Machine Learning Techniques in Spam Filtering

dc.contributor.author Barushka, Aliaksandr
dc.date.accessioned 2020-07-08T10:45:37Z
dc.date.available 2020-07-08T10:45:37Z
dc.date.issued 2020
dc.date.submitted 2020-03-31
dc.identifier University Library (study room) eng
dc.identifier.uri https://hdl.handle.net/10195/75560
dc.description.abstract The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naive Bayes, support vector machines, and neural networks have been particularly effective at classifying messages as spam or non-spam. To further enhance the performance of review spam detection, I propose a novel content-based approach that considers both bag-of-words and word context. More precisely, the proposed approach uses n-grams and the Skip-Gram word embedding method to build a vector model. As a result, a high-dimensional feature representation is generated. To handle this representation and classify spam accurately, ensemble learning techniques with regularized deep feed-forward neural networks as base learners are used to overcome slow optimization convergence to a poor local minimum and overfitting issues. To verify the proposed approach, I use seven different types of datasets from different spam filtering domains. I show that the proposed spam filtering model outperforms existing methods in terms of classification accuracy, false negative and false positive rates, F-score, area under the ROC curve, and misclassification cost. The only drawback of the proposed algorithm is its higher computational complexity. eng
dc.format 116 pp.
dc.language.iso eng
dc.publisher Univerzita Pardubice cze
dc.rights No restrictions eng
dc.subject neural networks eng
dc.subject ensemble learning eng
dc.subject word embedding eng
dc.subject spam eng
dc.subject machine learning eng
dc.title Machine Learning Techniques in Spam Filtering eng
dc.type doctoral dissertation eng
dc.contributor.referee Bureš, Vladimír
dc.contributor.referee Pokorný, Miroslav
dc.date.accepted 2020-06-02
dc.description.department Fakulta ekonomicko-správní cze
dc.thesis.degree-discipline Applied Informatics cze
dc.thesis.degree-name Ph.D.
dc.thesis.degree-grantor Univerzita Pardubice. Fakulta ekonomicko-správní cze
dc.identifier.signature D40321
dc.thesis.degree-program Applied Informatics cze
dc.description.defence In his dissertation, the doctoral candidate addressed the definition of the spam problem and advanced methods for spam detection and filtering, in which I can see the societal contribution of the dissertation. In the discussion, he answered all questions from the committee members with confidence and a deep knowledge of the subject. eng
dc.identifier.stag 40602
dc.description.grade Completed thesis with successful defence eng
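
The abstract describes features built from word n-grams and from word context via Skip-Gram. The following is a minimal sketch of those two ingredients only, not the thesis implementation: it counts word n-grams for a bag-of-words representation and generates the (center, context) pairs on which a Skip-Gram model would be trained. The function names `word_ngrams` and `skipgram_pairs`, the whitespace tokenization, the window size, and the sample message are all assumptions made for illustration. Embedding training and the neural network ensemble are left out.

```python
from collections import Counter


def word_ngrams(tokens, n_max=2):
    """Count all word n-grams with 1 <= n <= n_max (bag-of-n-grams features)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs of the kind used to train a Skip-Gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample message, tokenized naively on whitespace.
tokens = "win a free prize now".lower().split()
features = word_ngrams(tokens, n_max=2)   # unigram and bigram counts
pairs = skipgram_pairs(tokens, window=1)  # one neighbour on each side
```

In a full pipeline, the n-gram counts and the learned word vectors would be combined into one high-dimensional feature vector per message before classification.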