Feature extraction from unstructured texts as a combination of the morphological and the syntactic analysis and its usage in fake news classification tasks

Szabo Nagy, Kitti; Kapusta, Jozef; Munk, Michal

doi:10.1007/s00521-023-08967-2


Article, open access, peer-reviewed, published

Files

s00521-023-08967-2.pdf (834.56 KB)

Date

2023

Authors

Szabo Nagy, Kitti

Kapusta, Jozef

Munk, Michal

Publisher

Springer

Abstract

This paper proposes a new feature extraction technique. Feature extraction, considered an essential part of natural language processing, is the process of transforming unstructured text into a format that computers can process, namely a vector of numbers. The study evaluates and compares the performance of three methods: M1, the baseline TfIdf method; M2, which combines TfIdf with POS tags; and M3, a novel technique called MDgwPosF, which incorporates TfIdf values weighted by word depth and the relative frequency of POS tags. The primary focus of the study is to assess how M3 performs in comparison with M1 and M2. Two different datasets and three neural network architectures (feed-forward, LSTM, and GRU) were used. The results showed that the feed-forward model with the proposed MDgwPosF method in a moderate topology achieved the best performance across various measures. The automatically created dataset performed better than the manually created one. The differences between methods and between topologies were not statistically significant, whereas statistically significant differences between the classification models were demonstrated. MDgwPosF achieved higher accuracy than the baseline TfIdf, indicating that incorporating additional information into the vector can enhance the performance of TfIdf.
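The abstract names the three feature-extraction methods but does not give their formulas. The sketch below is a minimal illustration under stated assumptions, not the authors' MDgwPosF implementation: it computes a plain TF-IDF vector (as in M1), divides each value by a hypothetical depth weight `1 + mean depth` (words closer to the dependency-tree root count more), and appends the relative frequency of each POS tag. The input format of `(token, pos_tag, depth)` triples, the function names, the weight, and the tag list are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical input format: each document is a list of (token, pos_tag, depth)
# triples, where depth is the token's distance from the root of a dependency
# parse tree. A real pipeline would obtain these from a POS tagger and a parser.
Doc = list[tuple[str, str, int]]


def tfidf(docs: list[Doc]) -> tuple[list[str], list[list[float]]]:
    """Baseline TF-IDF (M1-style): tf = count / document length, idf = ln(N / df)."""
    vocab = sorted({tok for doc in docs for tok, _, _ in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs at least once.
    df = Counter(tok for doc in docs for tok in {t for t, _, _ in doc})
    vectors = []
    for doc in docs:
        counts = Counter(tok for tok, _, _ in doc)
        vectors.append([counts[w] / len(doc) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


def depth_weighted_tfidf_with_pos(docs: list[Doc], pos_tags: list[str]) -> list[list[float]]:
    """Hypothetical M3-style vector: depth-weighted TF-IDF followed by relative POS frequencies.

    Assumption: each TF-IDF value is divided by (1 + mean depth of that token in
    the document). This weight is illustrative, not the published MDgwPosF formula.
    """
    _, base = tfidf(docs)
    vocab = sorted({tok for doc in docs for tok, _, _ in doc})
    result = []
    for doc, row in zip(docs, base):
        # Collect every depth at which each token occurs in this document.
        depths: dict[str, list[int]] = {}
        for tok, _, depth in doc:
            depths.setdefault(tok, []).append(depth)
        weighted = [
            value / (1 + sum(depths[w]) / len(depths[w])) if w in depths else value
            for w, value in zip(vocab, row)
        ]
        # Relative frequency of each POS tag in the document (fixed tag order).
        pos_counts = Counter(pos for _, pos, _ in doc)
        pos_rel = [pos_counts[tag] / len(doc) for tag in pos_tags]
        result.append(weighted + pos_rel)
    return result


# Two toy "headlines" with made-up tags and depths (root verb at depth 0).
docs = [
    [("vaccine", "NOUN", 1), ("causes", "VERB", 0), ("autism", "NOUN", 1)],
    [("vaccine", "NOUN", 1), ("is", "VERB", 0), ("safe", "ADJ", 1)],
]
vectors = depth_weighted_tfidf_with_pos(docs, ["NOUN", "VERB", "ADJ"])
```

Because the root has depth 0, its TF-IDF value is left unchanged, while deeper words are scaled down. Appending the POS frequencies lengthens each vector by the size of the tag set, so the downstream feed-forward, LSTM, or GRU model sees both lexical and grammatical information.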

Keywords

Syntactic analysis, Morphological analysis, Feature extraction, Fake news classification, Neural networks

Permanent identifier

https://hdl.handle.net/10195/83930

Collections

UPCE Research Outputs
FES Research Outputs
