Spam Filtering Using Regularized Neural Networks with Rectified Linear Units

Barushka, Aliaksandr; Hájek, Petr

dc.contributor.author	Barushka, Aliaksandr	cze
dc.contributor.author	Hájek, Petr	cze
dc.date.accessioned	2017-05-11T10:45:45Z
dc.date.available	2017-05-11T10:45:45Z
dc.date.issued	2016	eng
dc.identifier.isbn	978-3-319-49129-5	eng
dc.identifier.issn	0302-9743	eng
dc.identifier.uri	http://hdl.handle.net/10195/67259
dc.description.abstract	The rapid growth of unsolicited and unwanted messages has inspired the development of many anti-spam methods. Machine-learning methods such as Naïve Bayes (NB), support vector machines (SVMs), and neural networks (NNs) have been particularly effective in categorizing spam/non-spam messages. They automatically construct word lists and the associated weights, usually in a bag-of-words fashion. However, traditional multilayer perceptron (MLP) NNs usually suffer from slow convergence to a poor local minimum and from overfitting. To overcome these problems, we use a regularized NN with rectified linear units (RANN-ReL) for spam filtering. We compare its performance on three benchmark spam datasets (Enron, SpamAssassin, and SMS spam collection) with four machine-learning algorithms commonly used in text classification, namely NB, SVM, MLP, and k-NN. We show that the RANN-ReL outperforms the other methods in terms of classification accuracy and false negative and false positive rates. Notably, it classifies both the majority (legitimate) and minority (spam) classes well.	eng
dc.format	p. 65-75	eng
dc.language.iso	eng	eng
dc.publisher	Springer	eng
dc.relation.ispartof	AI*IA 2016 Advances in Artificial Intelligence	eng
dc.rights	University access only	eng
dc.subject	Spam filter	eng
dc.subject	Email	eng
dc.subject	SMS	eng
dc.subject	Neural network	eng
dc.subject	Regularization	eng
dc.subject	Rectified linear unit	eng
dc.subject	Spamový filtr	cze
dc.subject	Email	cze
dc.subject	SMS	cze
dc.subject	neuronová síť	cze
dc.subject	regularizace	cze
dc.subject	rektifikovaná lineární jednotka	cze
dc.title	Spam Filtering Using Regularized Neural Networks with Rectified Linear Units	eng
dc.title.alternative	Filtrování nevyžádané pošty pomocí regularizovaných neuronových sítí s rektifikovanými lineárními jednotkami	cze
dc.type	ConferenceObject	eng
dc.description.abstract-translated	Rychlý růst nevyžádaných a nežádoucích zpráv inspiroval vývoj mnoha anti-spamových metod. Metody strojového učení, jako je Naive Bayes (NB), podpůrné vektorové stroje (SVM) nebo neuronové sítě (NN) byly při kategorizaci spamu obzvláště účinné. Tyto metody automaticky sestavují seznamy slov a jejich váhy obvykle v módu balíků slov. Nicméně, tradiční vícevrstvý perceptron (MLP) obvykle trpí pomalou konvergencí ke horšímu lokálním minimu a problémem přeučení. K překonání tohoto problému používáme pro filtrování nevyžádané pošty regularizované NN s rektifikovanými lineárními jednotkami (RANN-ReL). Porovnáváme jejich výkon na třech testovacích datových sadách (Enron, SpamAssassin a SMS spamu) se čtyřmi algoritmy strojového učení běžně používaných v textovém klasifikaci, a to NB, SVM, MLP a k-NN. Ukázali jsme, že RANN-ReL překonává jiné metody pokud jde o přesnost klasifikace, chybně negativní a chybně pozitivní míry. Tento systém klasifikuje jak majoritní (oprávněné) tak minoritní (spam) třídy.	cze
dc.event	15th International Conference of the Italian Association for Artificial Intelligence (28.11.2016 - 01.12.2016)	eng
dc.peerreviewed	yes	eng
dc.publicationstatus	postprint	eng
dc.relation.publisherversion	http://link.springer.com/chapter/10.1007/978-3-319-49130-1_6
dc.project.ID	SGS_2016_023/Economic and social development in the private and public sector	eng
dc.identifier.wos	000389797400006
dc.identifier.scopus	2-s2.0-85006021653
dc.identifier.obd	39877838	eng