Publication: Feature extraction from unstructured texts as a combination of the morphological and the syntactic analysis and its usage in fake news classification tasks
Article, open access, peer-reviewed, published
Date
Authors
Szabo Nagy, Kitti
Kapusta, Jozef
Munk, Michal
Journal title
Journal ISSN
Volume title
Publisher
Springer
Abstract
This paper proposes a new feature extraction technique. Feature extraction, an essential part of natural language processing, transforms unstructured text into a format that computers can process, namely a vector of numbers. The study evaluates and compares three methods: M1, the baseline TfIdf; M2, which combines TfIdf with POS tags; and M3, a novel technique called MDgwPosF that incorporates weighted TfIdf values based on word depths and the relative frequency of POS tags. The primary focus is on how M3 performs in comparison with M1 and M2. Two different datasets and three types of neural networks (feed-forward, LSTM and GRU) were used. The feed-forward model with the proposed MDgwPosF method in a moderate topology achieved the best performance across various measures. The automatically created dataset yielded better results than the manual dataset. The differences between methods and between topologies were not statistically significant, whereas statistically significant differences between the classification models were proven. MDgwPosF achieved higher accuracy than the baseline TfIdf, indicating that incorporating additional information into the vector can enhance the performance of TfIdf.
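The general idea of enriching a TfIdf vector with syntactic and morphological information can be illustrated with a minimal sketch. It is not the authors' MDgwPosF implementation: the abstract does not give the weighting formula, so the depth weight 1 / (1 + mean depth), the smoothed idf, the toy documents and all function names below are assumptions chosen only for illustration. The input is assumed to be already parsed into (word, POS tag, dependency-tree depth) triples.

```python
import math
from collections import Counter

# Hypothetical pre-parsed documents: (word, POS tag, depth in the dependency tree).
DOCS = [
    [("senator", "NOUN", 1), ("denies", "VERB", 0),
     ("shocking", "ADJ", 2), ("claim", "NOUN", 1)],
    [("study", "NOUN", 1), ("confirms", "VERB", 0), ("claim", "NOUN", 1)],
]
POS_TAGS = ["ADJ", "NOUN", "VERB"]  # assumed tag inventory for this toy example


def tfidf(docs):
    """Plain TfIdf per document (the idea behind the M1 baseline), smoothed idf."""
    n = len(docs)
    df = Counter(w for doc in docs for w in {w for w, _, _ in doc})
    vocab = sorted(df)
    rows = []
    for doc in docs:
        counts = Counter(w for w, _, _ in doc)
        rows.append([
            counts[w] / len(doc) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w in vocab
        ])
    return vocab, rows


def depth_weights(doc):
    """ASSUMED weighting: words nearer the root count more, 1 / (1 + mean depth)."""
    depths = {}
    for word, _, depth in doc:
        depths.setdefault(word, []).append(depth)
    return {w: 1.0 / (1.0 + sum(d) / len(d)) for w, d in depths.items()}


def pos_relative_frequency(doc):
    """Share of each POS tag among the tokens of one document."""
    counts = Counter(pos for _, pos, _ in doc)
    return [counts[tag] / len(doc) for tag in POS_TAGS]


def combined_vectors(docs):
    """Depth-weighted TfIdf values followed by POS relative frequencies."""
    vocab, rows = tfidf(docs)
    vectors = []
    for doc, row in zip(docs, rows):
        weights = depth_weights(doc)
        weighted = [v * weights.get(w, 0.0) for w, v in zip(vocab, row)]
        vectors.append(weighted + pos_relative_frequency(doc))
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = combined_vectors(DOCS)
    print(vocab + POS_TAGS)
    for vec in vectors:
        print([round(x, 3) for x in vec])
```

Each resulting vector has one depth-weighted TfIdf entry per vocabulary word followed by one relative-frequency entry per POS tag, so it can be fed to a classifier in the same way as a plain TfIdf vector.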
Description
Keywords
Syntactic analysis, Morphological analysis, Feature extraction, Fake news classification, Neural networks