The AUTONOMATA Spoken Names Corpus

Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)

by Henk van den Heuvel, Jean-Pierre Martens, Bart D'hoore, Kristof D'hanens, Nanneke Konings

Webpage: http://www.lrec-conf.org/proceedings/lrec2008/pdf/48_paper.pdf
Webpage: http://www.lrec-conf.org/proceedings/lrec2008/summaries/48.html

In the Autonomata project we have collected a corpus of spoken name utterances, together with manually corrected phonemic transcriptions of these utterances. The corpus was designed to become a major resource for the development of automatic speech recognition engines that achieve high accuracy in recognizing person and geographical names spoken in Dutch. The recorded names were selected to reveal the major pronunciation variations that a speech recognizer, e.g. in a navigation system with speech input, is going to be confronted with. This includes native speakers speaking foreign names and vice versa.

Keywords: Corpus (creation, annotation, etc.), Multilinguality, Speech resource/database, Linguistics


Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/48

