Reusable Tagset Conversion Using Tagset Drivers

Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)

by Daniel Zeman

Paper (PDF): http://www.lrec-conf.org/proceedings/lrec2008/pdf/66_paper.pdf
Summary page: http://www.lrec-conf.org/proceedings/lrec2008/summaries/66.html

Part-of-speech or morphological tags are an important means of annotation in a vast number of corpora. However, different sets of tags are used in different corpora, even for the same language. Tagset conversion is difficult, and solutions tend to be tailored to a particular pair of tagsets. We propose a universal approach that makes the conversion tools reusable. We also provide an indirect evaluation in the context of a parsing task.

Keywords: Corpus (creation, annotation, etc.), Standards for LRs, Tagging, Linguistics


Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/66

