Machine Learning Techniques in Spam Filtering

Barushka, Aliaksandr

Publication:
Machine Learning Techniques in Spam Filtering

Dissertation (open access)

Files

Full text of the thesis (2.47 MB)

Supervisor's review (626.68 KB)

Opponent's review (1.42 MB)

Date

2020

Authors

Barushka, Aliaksandr

Publisher

Univerzita Pardubice

Abstract

The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naive Bayes, support vector machines, and neural networks have been particularly effective in categorizing messages as spam or non-spam. To further enhance the performance of review spam detection, I propose a novel content-based approach that considers both bag-of-words and word context. More precisely, the proposed approach uses n-grams and the Skip-Gram word embedding method to build a vector model. As a result, a high-dimensional feature representation is generated. To handle this representation and classify spam accurately, ensemble learning techniques with regularized deep feed-forward neural networks as base learners are used to overcome the problems of slow optimization convergence to a poor local minimum and of overfitting. To verify the proposed approach, I use seven datasets from different spam filtering domains. I show that the proposed spam filtering model outperforms existing methods in terms of classification accuracy, false negative and false positive rates, F-score, area under the ROC curve, and misclassification cost. The only drawback of the proposed algorithm is its higher computational complexity.
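
As a rough illustration of the two kinds of features the abstract mentions, the sketch below counts word n-grams (the bag-of-words side) and generates (center, context) training pairs of the kind a Skip-Gram model learns from (the word-context side). It is a minimal sketch and not the thesis implementation: the function names, the example message, the maximum n-gram length, and the window size are all assumptions chosen for illustration, and the embedding training and ensemble classifier are left out.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, max_n=2):
    """Count every n-gram for n = 1..max_n (a bag-of-n-grams vector)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return counts


def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs within a symmetric window.

    A Skip-Gram model is trained to predict the context word from the
    center word; this function only prepares those training pairs.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical example message, tokenized on whitespace (an assumption).
message = "win a free prize now".split()
bag = ngram_counts(message, max_n=2)
pairs = skipgram_pairs(message, window=1)
```

In a complete pipeline, the n-gram counts and the learned word embeddings would be combined into one feature vector per message and passed to the classifier; how the thesis combines them is not specified in this record.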

Keywords

neural networks, ensemble learning, word embedding, spam, machine learning

Permanent identifier

https://hdl.handle.net/10195/75560

Collections

Disertační práce / Dissertations FES (Ph.D.)
Vysokoškolské kvalifikační práce / Theses, dissertations, etc.
