A Multi-Word Term Extraction Program for Arabic Language

Presented at: The Sixth International Language Resources and Evaluation Conference (LREC2008)

by Siham Boulaknadel, Beatrice Daille, Driss Aboutajdine

Paper: http://www.lrec-conf.org/proceedings/lrec2008/pdf/378_paper.pdf
Slides: http://www.lrec-conf.org/proceedings/lrec2008/slides/378.ppt
Summary: http://www.lrec-conf.org/proceedings/lrec2008/summaries/378.html

Terminology extraction commonly includes two steps: the identification of term-like units in texts, mostly multi-word phrases, and the ranking of the extracted units according to their domain representativity. In this paper, we design a multi-word term extraction program for the Arabic language. The linguistic filtering performs a morphosyntactic analysis and takes into account several types of variations. Domain representativity is measured using statistical scores. We evaluate several association measures and show that the results we obtained are consistent with those obtained for Romance languages.
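
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which association measures were evaluated, so the log-likelihood ratio used here is an assumption, chosen only because it is a widely used measure for collocation ranking. The function names `log_likelihood_ratio` and `rank_candidates` are hypothetical, the linguistic filtering step is omitted (every adjacent word pair is treated as a candidate), and the token list is a toy placeholder rather than data from the paper.

```python
import math
from collections import Counter


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of bigram counts.

    k11: count of (a, b); k12: (a, not b); k21: (not a, b); k22: neither.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def rank_candidates(tokens):
    """Score every adjacent word pair and return them sorted, best first."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    first = Counter()   # how often each word opens a bigram
    second = Counter()  # how often each word closes a bigram
    for (a, b), count in bigrams.items():
        first[a] += count
        second[b] += count
    scored = []
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = n - k11 - k12 - k21
        scored.append(((a, b), log_likelihood_ratio(k11, k12, k21, k22)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy placeholder tokens; real input would be tokenized Arabic text
# that has already passed the morphosyntactic filter.
tokens = "a b a b a b c d c e".split()
ranked = rank_candidates(tokens)
```

In this sketch a higher score means the two words co-occur more often than their individual frequencies would predict, which is the sense in which a statistical score can stand in for domain representativity.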

Keywords: Information Extraction, Information Retrieval, MultiWord Expressions & Collocations, Linguistics


Resource URI on the dog food server: http://data.semanticweb.org/conference/lrec/2008/papers/378

