Publication:
Feature extraction from unstructured texts as a combination of the morphological and the syntactic analysis and its usage in fake news classification tasks

Article, open access, peer-reviewed, published

Authors

Szabo Nagy, Kitti
Kapusta, Jozef
Munk, Michal

Publisher

Springer

Abstract

This paper proposes a new feature extraction technique. Feature extraction, an essential part of natural language processing, transforms unstructured text into a format that computers can process, namely a vector of numbers. The study evaluates and compares three methods: M1, the baseline TfIdf; M2, which combines TfIdf with POS tags; and M3, a novel technique called MDgwPosF that incorporates weighted TfIdf values based on word depths and the relative frequency of POS tags. The main focus is how M3 performs in comparison with M1 and M2. Two different datasets and feed-forward, LSTM and GRU neural networks were used. The feed-forward model with the proposed MDgwPosF method in a moderate topology achieved the best performance across various measures. The automatically created dataset performed better than the manually created one. The differences between methods and topologies were not statistically significant, whereas statistically significant differences between the classification models were proven. MDgwPosF achieved higher accuracy than the baseline TfIdf, indicating that incorporating additional information into the vector can enhance the performance of TfIdf.
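
To make the idea of turning text into a vector of numbers concrete, the following is a minimal sketch in the spirit of M2: a plain TfIdf vector with the relative frequencies of POS tags appended. It is not the authors' implementation. The abstract does not say how M2 combines TfIdf with POS tags, so the concatenation is an assumption, as are the smoothed IDF formula, the hand-written `TOY_POS` lexicon (a real pipeline would use a trained tagger) and all function names. The word-depth weighting of M3 (MDgwPosF) is not shown because the abstract does not specify it.

```python
import math
from collections import Counter

# Hypothetical toy POS lexicon for illustration only; unknown words get "X".
TOY_POS = {
    "the": "DET", "a": "DET",
    "president": "NOUN", "aliens": "NOUN", "budget": "NOUN",
    "signed": "VERB", "met": "VERB",
    "secret": "ADJ", "new": "ADJ",
}
POS_TAGS = ["ADJ", "DET", "NOUN", "VERB", "X"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def pos_relative_frequencies(doc):
    """Share of each POS tag among the tokens of one document."""
    tags = Counter(TOY_POS.get(tok, "X") for tok in doc)
    length = len(doc) or 1
    return [tags[tag] / length for tag in POS_TAGS]


def extract_features(texts):
    """Concatenate TF-IDF values with POS tag relative frequencies."""
    docs = [tokenize(t) for t in texts]
    vocab, tfidf = tfidf_vectors(docs)
    features = [vec + pos_relative_frequencies(doc)
                for vec, doc in zip(tfidf, docs)]
    return vocab, features


texts = ["the president signed a new budget",
         "the president met secret aliens"]
vocab, features = extract_features(texts)
```

Each row of `features` has one entry per vocabulary word followed by one entry per POS tag, so it can be passed directly to a classifier such as a feed-forward network.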

Keywords

Syntactic analysis, Morphological analysis, Feature extraction, Fake news classification, Neural networks
