Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)
by Serge Sharoff, Mikhail Kopotev, Tomaz Erjavec, Anna Feldman, Dagmar Divjak
Webpage: http://www.lrec-conf.org/proceedings/lrec2008/pdf/78_paper.pdf
Abstract: This paper reports the principles behind the design of a tagset covering Russian morphosyntactic phenomena, the modifications made to the core tagset, and its evaluation. The tagset is based on the MULTEXT-East framework; the design decisions aimed to balance the parameters important to linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers.
Keywords: Morphology, Multilinguality, Tagging, Linguistics
Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/78