dc.contributor.author |
Barushka, Aliaksandr
|
|
dc.date.accessioned |
2020-07-08T10:45:37Z |
|
dc.date.available |
2020-07-08T10:45:37Z |
|
dc.date.issued |
2020 |
|
dc.date.submitted |
2020-03-31 |
|
dc.identifier |
University Library (reading room) |
cze |
dc.identifier.uri |
https://hdl.handle.net/10195/75560 |
|
dc.description.abstract |
The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naive Bayes, support vector machines, and neural networks have been particularly effective in classifying messages as spam or non-spam. To further enhance the performance of review spam detection, I propose a novel content-based approach that considers both bag-of-words and word context. More precisely, the proposed approach utilizes n-grams and the Skip-Gram word embedding method to build a vector model. As a result, a high-dimensional feature representation is generated. To handle this representation and classify spam accurately, ensemble learning techniques with regularized deep feed-forward neural networks as base learners are used to overcome slow optimization convergence to a poor local minimum and overfitting issues. To verify the proposed approach, I use seven different types of datasets from different spam filtering domains. I show that the proposed spam filtering model outperforms existing methods in terms of classification accuracy, false negative and false positive rates, F-score, area under the ROC curve, and misclassification cost. The only drawback of the proposed algorithm is its higher computational complexity. |
eng |
dc.format |
116 s. |
|
dc.language.iso |
eng |
|
dc.publisher |
Univerzita Pardubice |
cze |
dc.rights |
No restrictions |
|
dc.subject |
neural networks |
eng |
dc.subject |
ensemble learning |
eng |
dc.subject |
word embedding |
eng |
dc.subject |
spam |
eng |
dc.subject |
machine learning |
eng |
dc.title |
Machine Learning Techniques in Spam Filtering |
eng |
dc.type |
doctoral dissertation |
cze |
dc.contributor.referee |
Bureš, Vladimír |
|
dc.contributor.referee |
Pokorný, Miroslav |
|
dc.date.accepted |
2020-06-02 |
|
dc.description.department |
Fakulta ekonomicko-správní |
cze |
dc.thesis.degree-discipline |
Applied Informatics |
cze |
dc.thesis.degree-name |
Ph.D. |
|
dc.thesis.degree-grantor |
Univerzita Pardubice. Fakulta ekonomicko-správní |
cze |
dc.identifier.signature |
D40321 |
|
dc.thesis.degree-program |
Applied Informatics |
cze |
dc.description.defence |
In his dissertation, the doctoral candidate addressed the definition of the spam problem and advanced methods for its detection and filtering, in which I can see the societal contribution of the dissertation. In the discussion, he answered all questions from the committee members with confidence and a deep knowledge of the subject matter. |
cze |
dc.identifier.stag |
40602 |
|
dc.description.grade |
Completed thesis with a successful defence |
cze |