Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)
Webpage: http://www.lrec-conf.org/proceedings/lrec2008/pdf/549_paper.pdf
This work presents improvements to a large-scale Arabic-to-French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for two domains (newswire text and broadcast news transcripts), and improved domain-dependent language models. Starting from an early prototype that participated in the second CESTA evaluation in 2005, the system was further upgraded and achieves favorable BLEU scores of 44.8% in the text setting and 41.1% in the audio setting. These results are compared with a system based on the freely available Moses toolkit. We show significant gains both in translation quality (up to +1.2% BLEU absolute) and in translation speed (up to 16 times faster) for comparable configuration settings.
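The abstract reports quality as BLEU percentages. To show what such a number measures, here is a minimal sketch of corpus-level BLEU that follows the standard definition: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It assumes one reference per sentence, pre-tokenized input, uniform n-gram weights and no smoothing. The CESTA evaluation's exact BLEU configuration (tokenization, number of references) is not given here, so this is an illustration and not the scorer that produced the reported figures. The function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1], assuming one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["le chat est sur le tapis".split()]
    ref = ["le chat est sur le tapis".split()]
    print(f"BLEU = {100 * corpus_bleu(hyp, ref):.1f}%")
```

Multiplying the result by 100 gives the percentage form used in the abstract. On that scale, a score of 44.8% corresponds to 0.448.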
Keywords: Corpus (creation, annotation, etc.), Machine Translation, Speech-to-Speech Translation, Tools, systems, applications, Linguistics
Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/549