Machine Learning Techniques in Spam Filtering

Barushka, Aliaksandr


dc.contributor.author	Barushka, Aliaksandr
dc.date.accessioned	2020-07-08T10:45:37Z
dc.date.available	2020-07-08T10:45:37Z
dc.date.issued	2020
dc.date.submitted	2020-03-31
dc.identifier	University Library (study room)	cze
dc.identifier.uri	https://hdl.handle.net/10195/75560
dc.description.abstract	The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naive Bayes, support vector machines and neural networks have been particularly effective in classifying messages as spam or non-spam. To further enhance the performance of review spam detection, I propose a novel content-based approach that considers both bag-of-words and word context. More precisely, the proposed approach uses n-grams and the Skip-Gram word embedding method to build a vector model. As a result, a high-dimensional feature representation is generated. To handle this representation and classify spam accurately, ensemble learning techniques with regularized deep feed-forward neural networks as base learners are used to overcome overfitting and slow optimization convergence to a poor local minimum. To verify the proposed approach, I use seven different datasets from different spam filtering domains. I show that the proposed spam filtering model outperforms existing methods in terms of classification accuracy, false negative and false positive rates, F-score, area under the ROC curve and misclassification cost. The only drawback of the proposed algorithm is its higher computational complexity.	eng
dc.format	116 s.
dc.language.iso	eng
dc.publisher	Univerzita Pardubice	cze
dc.rights	No restrictions
dc.subject	neural networks	eng
dc.subject	ensemble learning	eng
dc.subject	word embedding	eng
dc.subject	spam	eng
dc.subject	machine learning	eng
dc.title	Machine Learning Techniques in Spam Filtering	eng
dc.type	dissertation	cze
dc.contributor.referee	Bureš, Vladimír
dc.contributor.referee	Pokorný, Miroslav
dc.date.accepted	2020-06-02
dc.description.department	Faculty of Economics and Administration	cze
dc.thesis.degree-discipline	Applied Informatics	cze
dc.thesis.degree-name	Ph.D.
dc.thesis.degree-grantor	Univerzita Pardubice. Fakulta ekonomicko-správní	cze
dc.identifier.signature	D40321
dc.thesis.degree-program	Applied Informatics	cze
dc.description.defence	In his dissertation, the doctoral candidate addressed the definition of the spam problem and advanced methods for its detection and filtering, in which I can see the societal contribution of the dissertation. In the discussion, he answered all questions from the committee members with confidence and a deep knowledge of the subject.	cze
dc.identifier.stag	40602
dc.description.grade	Completed thesis with a successful defence	cze
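
The abstract mentions n-grams as one input to the vector model. As a rough illustration only, the sketch below counts word n-grams in a message using the Python standard library. The function names, the whitespace tokenizer, and the choice of unigrams plus bigrams are assumptions made for this example and are not taken from the dissertation; the Skip-Gram embeddings and the neural network ensemble are not shown.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; a real filter would likely use a proper tokenizer.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_ngrams(text, n_max=2):
    """Map each n-gram in the text to its number of occurrences."""
    return Counter(word_ngrams(text, n_max))


# Hypothetical example message, not drawn from any dataset in the thesis.
features = bag_of_ngrams("win a free prize win a free trip")
```

Each distinct n-gram becomes one dimension of a sparse count vector, which is one reason such representations grow high-dimensional as the vocabulary and the maximum n-gram length increase.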