Presented at: 16th International World Wide Web Conference (WWW2007)
by Marius Pasca
As part of a broader effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly supervised extraction of class attributes (e.g., "side effects" and "generic equivalent" for drugs) from anonymized query logs. The extraction is guided by a small set of seed attributes, without any need for handcrafted extraction patterns or further domain-specific knowledge. The attributes extracted for classes from various domains of interest to Web search users have accuracy levels significantly exceeding the current state of the art. Inherently noisy search queries are shown to be a highly valuable, albeit unexplored, resource for Web-based information extraction, both for class attribute extraction and for named entity discovery.
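The abstract does not spell out the extraction procedure, so the following is only a minimal, hypothetical sketch of what seed-guided attribute extraction from queries can look like. It assumes that candidate attributes are pulled from two query shapes ("what is the A of I" style and "I A"), that each candidate is represented by counts of the query templates it occurs in, and that candidates are ranked by Jensen-Shannon divergence from the pooled templates of the seed attributes. The function names (`extract`, `signatures`, `js_divergence`, `rank_attributes`), the stopword lists, the toy queries, and the choice of Jensen-Shannon divergence are all illustrative assumptions, not the paper's exact method.

```python
from collections import Counter, defaultdict
import math

# Hypothetical word lists used only to trim query text around a candidate
# attribute; they are illustrative assumptions, not the paper's heuristics.
LEADING = {"what", "is", "are", "the", "a", "an", "who", "how", "find"}
CONNECTORS = {"of", "for", "in"}


def extract(query, instance):
    """Split a query mentioning `instance` into (candidate attribute, template).

    Handles two assumed shapes: '<prefix> <attr> <connector> <instance>'
    and '<instance> <attr>'. Returns None for any other query.
    """
    tokens = query.lower().split()
    inst = instance.lower().split()
    n = len(inst)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != inst:
            continue
        left, right = tokens[:i], tokens[i + n:]
        if not right and len(left) >= 2 and left[-1] in CONNECTORS:
            # e.g. 'what is the generic equivalent of lipitor'
            body = left[:-1]
            k = 0
            while k < len(body) and body[k] in LEADING:
                k += 1
            attr = body[k:]
            if attr:
                template = " ".join(body[:k] + ["A", left[-1], "I"])
                return " ".join(attr), template
        elif right and not left:
            # e.g. 'aspirin side effects'
            return " ".join(right), "I A"
    return None


def signatures(queries, instances):
    """Map each candidate attribute to a Counter of the templates it occurs in."""
    sigs = defaultdict(Counter)
    for q in queries:
        for inst in instances:
            hit = extract(q, inst)
            if hit:
                attr, template = hit
                sigs[attr][template] += 1
    return sigs


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two non-empty count vectors."""
    sp, sq = sum(p.values()), sum(q.values())
    div = 0.0
    for k in set(p) | set(q):
        pk, qk = p[k] / sp, q[k] / sq
        mk = (pk + qk) / 2
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


def rank_attributes(queries, instances, seeds):
    """Rank non-seed candidates by how closely their templates match the seeds'."""
    sigs = signatures(queries, instances)
    seed_sig = Counter()
    for s in seeds:
        seed_sig.update(sigs.get(s, Counter()))
    if not seed_sig:
        return []
    candidates = [a for a in sigs if a not in seeds]
    # Lower divergence from the pooled seed signature means more seed-like.
    return sorted(candidates, key=lambda a: (js_divergence(sigs[a], seed_sig), a))


# Toy usage with made-up queries for a 'drug' class (all lowercase).
queries = [
    "side effects of aspirin",
    "what is the generic equivalent of lipitor",
    "lipitor side effects",
    "what is the dosage of aspirin",
    "aspirin coupons",
    "dosage of lipitor",
]
instances = ["aspirin", "lipitor"]
seeds = {"side effects", "generic equivalent"}
print(rank_attributes(queries, instances, seeds))  # ['dosage', 'coupons']
```

In this toy run, "dosage" ranks above "coupons" because it appears in the same question-style templates as the seeds, while "coupons" only appears after the instance name. Representing candidates by their query contexts is what lets a handful of seeds stand in for handcrafted extraction patterns in this sketch.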
Resource URI on the dog food server: http://data.semanticweb.org/conference/www/2007/paper/main/560