Influence of Text Type and Text Length on Anaphoric Annotation

Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)

by Daniela Goecke, Maik Stührenberg, Andreas Witt

Webpage: http://www.lrec-conf.org/proceedings/lrec2008/pdf/368_paper.pdf
Webpage: http://www.lrec-conf.org/proceedings/lrec2008/summaries/368.html

We report the results of a study that investigates inter-annotator agreement on anaphoric annotations. The study focuses on the influence of two factors, text length and text type, using a corpus of scientific articles and newspaper texts. To measure inter-annotator agreement, we compare existing approaches and propose measuring each step of the annotation process separately instead of measuring only the resulting anaphoric relations. A total of 3,642 anaphoric relations were annotated in a corpus of 53,038 tokens (12,327 markables). The results show that text type has more influence on inter-annotator agreement than text length. Furthermore, well-defined annotation instructions and coder training are crucial for obtaining good annotation results.
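As an illustration of measuring agreement per annotation step, the sketch below computes Cohen's kappa, one widely used chance-corrected agreement measure, separately for two steps. The abstract does not state which measures the study compares, so the choice of kappa, the step names, the label values, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which both coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: one (coder 1, coder 2) pair of label lists per step.
steps = {
    "relation_type": (
        ["ident", "ident", "bridging", "none", "ident", "none"],
        ["ident", "bridging", "bridging", "none", "ident", "ident"],
    ),
    "antecedent_choice": (
        ["m1", "m1", "m4", "m2"],
        ["m1", "m3", "m4", "m2"],
    ),
}

# One kappa value per annotation step instead of a single overall score.
kappa_per_step = {
    name: cohen_kappa(coder_1, coder_2)
    for name, (coder_1, coder_2) in steps.items()
}

for name, kappa in kappa_per_step.items():
    print(f"{name}: kappa = {kappa:.3f}")
```

Reporting one value per step makes it visible where coders diverge, for example on the choice of antecedent rather than on the relation type, which a single score over the final relations would hide.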

Keywords: Anaphora, Coreference, Corpus (creation, annotation, etc.), Validation of LRs, Linguistics


Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/368

