Exploitation of an Arabic Language Resource for Machine Translation Evaluation: using Buckwalter-based Lookup Tool to Augment CMU Alignment Algorithm

Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)

by Clare Voss, Jamal Laoudi, Jeffrey Micher

Webpage: http://www.lrec-conf.org/proceedings/lrec2008/pdf/887_paper.pdf
Webpage: http://www.lrec-conf.org/proceedings/lrec2008/summaries/887.html

Voss et al. (2006) analyzed newswire translations of three DARPA GALE Arabic-English MT systems at the segment level in terms of subjective judgment scores, automated metric scores, and correlations among these different score types. At this level of granularity, the correlations are weak. In this paper, we begin to reconcile the subjective and automated scores that underlie these correlations by explicitly grounding MT output with its Reference Translation (RT) prior to subjective or automated evaluation. The first two phases of our approach annotate {MT, RT} pairs with the same types of textual comparisons that subjects intuitively apply, while the third phase (not presented here) entails scoring the pairs. The three phases are: (i) automated calculation of MT-RT hits using the CMU aligner from METEOR; (ii) an extension phase in which our Buckwalter-based Lookup Tool generates six other textual comparison categories for items in the MT output that the CMU aligner does not identify; and (iii) given the fully categorized RT and MT pair, assignment of a final adequacy score to the MT output, either by an automated metric based on weighted category counts and segment length or by a trained human judge.
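As a rough illustration of the automated option in phase (iii), the sketch below computes a score from weighted category counts divided by segment length. The abstract does not give the category names, the weights, or the exact formula, so the labels (`aligner_hit`, `lookup_match`, `unmatched`), the weight values, and the function `adequacy_score` are all hypothetical placeholders, not the authors' metric.

```python
from collections import Counter


def adequacy_score(categories, weights, segment_length):
    """Sum weight * count over the category labels, divided by segment length.

    Illustrative only: the normalization and the weights are assumptions.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    counts = Counter(categories)
    # Labels with no entry in `weights` contribute nothing to the score.
    weighted = sum(weights.get(cat, 0.0) * n for cat, n in counts.items())
    return weighted / segment_length


# Hypothetical category labels and weights for a four-token MT segment.
weights = {"aligner_hit": 1.0, "lookup_match": 0.5, "unmatched": 0.0}
labels = ["aligner_hit", "aligner_hit", "lookup_match", "unmatched"]
score = adequacy_score(labels, weights, segment_length=len(labels))
print(score)
```

In this sketch, each MT token carries one category label from phases (i) and (ii); how the actual metric treats multi-token items or segment length is not specified in the abstract.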

Keywords: Evaluation methodologies, LR web services, Tools, systems, applications, Linguistics


Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/887

