Publication: Machine Learning Techniques in Spam Filtering
Dissertation, open access
| dc.contributor.author | Barushka, Aliaksandr | |
| dc.contributor.referee | Bureš, Vladimír | |
| dc.contributor.referee | Pokorný, Miroslav | |
| dc.date.accepted | 2020-06-02 | |
| dc.date.accessioned | 2020-07-08T10:45:37Z | |
| dc.date.available | 2020-07-08T10:45:37Z | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-03-31 | |
| dc.description.abstract | The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naive Bayes, support vector machines or neural networks have been particularly effective in categorizing spam/non-spam messages. In order to further enhance the performance of review spam detection, I propose a novel content-based approach that considers both bag-of-words and word context. More precisely, the proposed approach utilizes n-grams and the Skip-Gram word embedding method to build a vector model. As a result, a high-dimensional feature representation is generated. To handle the representation and classify the spam accurately, ensemble learning techniques with regularized deep feed-forward neural networks as base learners are used in order to overcome slow optimization convergence to a poor local minimum and overfitting issues. In order to verify the proposed approach, I use seven different types of datasets from different spam filtering domains. I show that the proposed spam filtering model outperforms existing methods in terms of classification accuracy, false negative and false positive rates, F-score, area under ROC and misclassification cost. The only drawback of the proposed algorithm is its higher computational complexity. | eng |
| dc.description.defence | In his dissertation, the doctoral candidate addressed the definition of the spam problem and advanced methods for its detection and filtering, in which I see the societal contribution of the dissertation. In the discussion, he answered all questions from the committee members with confidence and a deep knowledge of the subject. | cze |
| dc.description.department | Fakulta ekonomicko-správní | cze |
| dc.description.grade | Completed thesis, successfully defended | cze |
| dc.format | 116 pp. | |
| dc.identifier | University Library (reading room) | cze |
| dc.identifier.signature | D40321 | |
| dc.identifier.stag | 40602 | |
| dc.identifier.uri | https://hdl.handle.net/10195/75560 | |
| dc.language.iso | eng | |
| dc.publisher | Univerzita Pardubice | cze |
| dc.rights | No restrictions | |
| dc.subject | neural networks | eng |
| dc.subject | ensemble learning | eng |
| dc.subject | word embedding | eng |
| dc.subject | spam | eng |
| dc.subject | machine learning | eng |
| dc.thesis.degree-discipline | Applied Informatics | cze |
| dc.thesis.degree-grantor | Univerzita Pardubice. Fakulta ekonomicko-správní | cze |
| dc.thesis.degree-name | Ph.D. | |
| dc.thesis.degree-program | Applied Informatics | cze |
| dc.title | Machine Learning Techniques in Spam Filtering | eng |
| dc.type | dissertation | cze |
| dspace.entity.type | Publication |
Files
Original bundle
- Name:
- Disertacni_prace_Ing_Barushka.pdf
- Size:
- 2.47 MB
- Format:
- Adobe Portable Document Format
- Description:
- Full text of the thesis
- Name:
- Posudek_skolitele_Ing_Barushka.pdf
- Size:
- 626.68 KB
- Format:
- Adobe Portable Document Format
- Description:
- Supervisor's review
- Name:
- Posudky_oponentu_Ing_Barushka.pdf
- Size:
- 1.42 MB
- Format:
- Adobe Portable Document Format
- Description:
- Opponent's review