The Effect of Text Preprocessing Strategies on Detecting Fake Consumer Reviews

Barushka, Aliaksandr; Hájek, Petr

doi:10.1145/3383902.3383908

Conference object, restricted access, peer-reviewed, postprint

dc.contributor.author	Barushka, Aliaksandr	cze
dc.contributor.author	Hájek, Petr	cze
dc.date.accessioned	2021-05-15T18:13:55Z
dc.date.available	2021-05-15T18:13:55Z
dc.date.issued	2019	eng
dc.description.abstract	Fake review detection is becoming crucial due to the rapid growth of internet purchases. It is important to choose the most efficient algorithm for detecting fake (deceptive, spam) reviews, whether positive or negative. It is equally important to preprocess the textual content of the reviews, both for training and later for the production environment. This study examines a number of text preprocessing choices, including feature dimensionality, tokenization, removal of stop words, stemming, and different term weighting schemes. Three well-known machine learning algorithms are used as benchmark classifiers: Naïve Bayes, a neural network, and a support vector machine. We show that text preprocessing strategies are important determinants of the classifiers' performance. The classifiers perform better on high-dimensional datasets represented by bigrams or trigrams selected according to a non-binary weighting scheme. Stemming and stop word removal seem to be less important.	eng
dc.description.abstract-translated	Detekce falešných recenzí získává na důležitosti díky rychlému růstu nákupů přes internet. Je zřejmé, že je důležité zvolit nejúčinnější algoritmus, aby bylo možné detekovat falešné (klamné, spamové) recenze, ať už pozitivní nebo negativní. Na druhou stranu je také důležité předběžně zpracovat textový obsah recenzí pro učení a později pro produkční prostředí. V této studii je zkoumána řada metod předzpracování textu, například dimenze atributů, tokenizace, odstranění častých slov, ořezávání a různá schémata vážení termů. Jako srovnávací klasifikátory se používají tři známé algoritmy strojového učení, včetně Naïve Bayes, neuronové sítě a podpůrného vektorového stroje. Zde ukazujeme, že strategie předzpracování textu jsou důležitými determinanty výkonu klasifikátorů. Zjistili jsme, že klasifikátory fungují lépe pro vysoko-dimenzionální datové sady reprezentované bigramy nebo trigramy vybranými podle ne-binárního váhového schématu. Ořezávání a odstranění častých slov se zdají být méně důležité.	cze
dc.event	3rd International Conference on E-Business and Internet, ICEBI 2019 (09.11.2019 - 11.11.2019, Praha)	eng
dc.format	p. 13-17	eng
dc.identifier.doi	10.1145/3383902.3383908	eng
dc.identifier.isbn	978-1-4503-7170-4	eng
dc.identifier.obd	39885186	eng
dc.identifier.scopus	2-s2.0-85096087042
dc.identifier.uri	https://hdl.handle.net/10195/77006
dc.language.iso	eng	eng
dc.peerreviewed	yes	eng
dc.publicationstatus	postprint	eng
dc.publisher	ACM (Association for Computing Machinery)	eng
dc.relation.ispartof	ICEBI 2019 : proceedings of the 2019 3rd International Conference on E-Business and Internet	eng
dc.relation.publisherversion	https://dl.acm.org/doi/abs/10.1145/3383902.3383908#sec-terms	eng
dc.rights	pouze v rámci univerzity (university access only)	cze
dc.subject	fake	eng
dc.subject	reviews	eng
dc.subject	text preprocessing	eng
dc.subject	bag of words	eng
dc.subject	machine learning	eng
dc.title	The Effect of Text Preprocessing Strategies on Detecting Fake Consumer Reviews	eng
dc.title.alternative	Vliv strategie předzpracování textu na detekci falešných spotřebitelských recenzí	cze
dc.type	ConferenceObject	eng
dspace.entity.type	Publication
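
The preprocessing choices named in the abstract (word n-gram tokenization, stop word removal, and binary versus non-binary term weighting) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy stop word list, and the smoothed TF-IDF formula are assumptions chosen for illustration, and stemming and the three classifiers are left out.

```python
import math
from collections import Counter

# Hypothetical toy stop word list; the list used in the paper is not given here.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it"}


def tokenize(text, n=1, remove_stop_words=False):
    """Lowercase, split on whitespace, strip punctuation, return word n-grams."""
    words = [w.strip(".,!?;:\"'()") for w in text.lower().split()]
    words = [w for w in words if w]
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    # Slide a window of n words over the token list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def vectorize(docs, n=1, weighting="tfidf", remove_stop_words=False):
    """Return (vocabulary, vectors) using binary or TF-IDF term weights."""
    tokenized = [tokenize(d, n, remove_stop_words) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        if weighting == "binary":
            # Binary scheme: 1 if the term occurs in the document, else 0.
            vec = [1.0 if t in tf else 0.0 for t in vocab]
        else:
            # Non-binary scheme: term frequency times a smoothed IDF
            # (an illustrative choice, not necessarily the paper's).
            vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                   for t in vocab]
        vectors.append(vec)
    return vocab, vectors


if __name__ == "__main__":
    reviews = ["The room was great, great staff.", "The room was dirty."]
    vocab, vectors = vectorize(reviews, n=2, weighting="tfidf")
    for term, weight in zip(vocab, vectors[0]):
        print(f"{term!r}: {weight:.3f}")
```

Changing `n` switches between unigram, bigram, and trigram representations, and changing `weighting` switches between the binary and non-binary schemes; the resulting vectors could then be passed to any classifier.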

Files

Original bundle

Name: The_effect_of_preprocessing_strategies_on_detecting_fake_reviews_-revised.pdf
Size: 317.7 KB
Format: Adobe Portable Document Format

Collections

Publikační činnost akademických pracovníků UPCE / UPCE Research Outputs
Publikační činnost akademických pracovníků FES / FES Research Outputs
