UPCE Digital Library
 

Application of POS Tagging in Machine Translation Evaluation

Conference object, peer-reviewed, postprint

Publication date

2016

Publisher

Wolters Kluwer ČR, a. s.

Abstract

This paper presents the full natural language processing workflow as applied to machine translation (MT) from English into Slovak, a representative inflectional language. We focus on the data preparation phase for automatic MT evaluation through POS tagging. This phase consists of several steps, but only the first, the creation of the dataset (a parallel corpus), is described in depth. We concentrate on collecting source texts of various styles and genres (dataset creation) and on collecting their machine translations. Two MT systems are used: the web-based SMT system Google Translator API and MT@EC. TreeTagger is used as the morphological analysis tool. We analyze the dataset creation process, which covers not only the creation of parallel corpora but also the creation of an error database of Slovak words with morphological annotation. The main contribution is a novel approach to MT evaluation research that uses POS tagging (machine learning methods) to identify differences between MT output and post-edited MT output. The research is grounded in the analysis, identification, and classification of errors in machine translation from English into Slovak.
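
The comparison of MT output with post-edited output by POS tags can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the tagger output is available as three tab-separated columns per line (token, tag, lemma), and the tag names used here (`ADJ.F.NOM`, `NOUN.M.ACC`, and so on) are hypothetical placeholders rather than the actual Slovak tagset. The alignment uses Python's standard `difflib`, which is a substitute chosen for illustration; the function names `parse_tagged` and `tag_differences` are likewise invented for this example.

```python
from difflib import SequenceMatcher


def parse_tagged(text):
    """Parse lines of the assumed form 'token<TAB>tag<TAB>lemma' into (token, tag) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        token, tag, _lemma = line.split("\t")
        pairs.append((token, tag))
    return pairs


def tag_differences(mt, pe):
    """Align two tagged sentences by their tag sequences.

    Returns a list of (operation, mt_span, pe_span) for every non-matching
    region, where operation is 'replace', 'delete', or 'insert'.
    """
    mt_tags = [tag for _, tag in mt]
    pe_tags = [tag for _, tag in pe]
    matcher = SequenceMatcher(a=mt_tags, b=pe_tags, autojunk=False)
    return [
        (op, mt[i1:i2], pe[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Hypothetical tagged sentences: the MT output has an adjective that does not
# agree with its noun, and the post-edited version corrects it.
mt_sentence = parse_tagged(
    "Vidím\tVERB\tvidieť\n"
    "veľká\tADJ.F.NOM\tveľký\n"
    "dom\tNOUN.M.ACC\tdom"
)
pe_sentence = parse_tagged(
    "Vidím\tVERB\tvidieť\n"
    "veľký\tADJ.M.ACC\tveľký\n"
    "dom\tNOUN.M.ACC\tdom"
)

for op, mt_span, pe_span in tag_differences(mt_sentence, pe_sentence):
    print(op, mt_span, pe_span)
```

Because the alignment looks only at tags, it flags morphological changes such as agreement or case corrections, but it will not flag a post-edit that swaps one word for another with the same tag; catching those would require comparing tokens or lemmas as well.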

Page range

p. 471-479

ISSN

2464-7470

Project

SGS_2016_023/Economic and Social Development in the Private and Public Sector

Source document

DIVAI 2016 ‐ 11th International Scientific Conference on Distance Learning in Applied Informatics

Access to electronic version

University network only

Event name

DIVAI 2016 ‐ 11th International Scientific Conference on Distance Learning in Applied Informatics (02.05.2016 - 04.05.2016)

ISBN

978-80-7552-249-8

Keywords

Natural language processing, Evaluation, Machine Translation quality, Sentence alignment, Tokenization, POS tagging
