Identification and Disambiguation of Graph-structured Concepts for Enterprise Search

Presented at: 19th International World Wide Web Conference (WWW2010)

by Falk Brauer, Michael Huber, Gregor Hackenbroich, Ulf Leser, Felix Naumann, Wojciech Barczyński

Enterprise Search (ES) is challenging for a number of reasons, the most important of which are the high level of ambiguity in query and document terms and the fact that many concepts are addressed only implicitly. What most distinguishes ES from ordinary search problems is the existence of graph-structured enterprise data (ontologies) that describe the concepts of interest and their relationships to each other. We present a method that leverages this type of information to improve the quality of query answers. Our method identifies concepts from the enterprise ontology in the query and in the corpus. To this end, we propose a ranking scheme for top-k ontology sub-graphs on top of approximately matched token q-grams between text and ontology. The ranking scheme leverages the graph structure of the ontology to identify and disambiguate concepts that are not explicitly mentioned. It improves on previous solutions by using a fine-grained ranking function that combines relevance ratings derived from the enterprise data with a confidence rating that takes into account the constituent match situations derived from the document. Documents are then ranked by the similarity between the query-specific and document-specific sub-graphs. This method captures much more of the semantics of queries and documents than previous techniques. We support this claim with an evaluation of our method on three real-life document sets and two knowledge bases.
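
The approximate matching step mentioned in the abstract can be illustrated with a generic q-gram comparison. The following is only a minimal sketch, not the authors' implementation: it assumes character-level q-grams, Jaccard similarity, and a fixed threshold, and the names `qgrams`, `jaccard`, `match_concepts`, and the sample ontology labels are hypothetical. It does not cover the sub-graph ranking or disambiguation described above.

```python
def qgrams(token, q=3):
    """Return the set of character q-grams of a token, padded with '#'."""
    padded = "#" * (q - 1) + token.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def match_concepts(text, ontology_labels, q=3, threshold=0.5):
    """Return (token, concept_id, score) for approximate matches, best first.

    ontology_labels maps a concept identifier to its textual label
    (an assumed, simplified stand-in for an enterprise ontology).
    """
    matches = []
    for token in text.split():
        token_grams = qgrams(token, q)
        for concept_id, label in ontology_labels.items():
            score = jaccard(token_grams, qgrams(label, q))
            if score >= threshold:
                matches.append((token, concept_id, score))
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical labels; the misspelled "databse" still matches "database".
labels = {"c1": "database", "c2": "server"}
matches = match_concepts("restart the databse", labels)
```

In this sketch a misspelled token is still linked to a concept because most of its q-grams overlap with the label's q-grams; a real system would additionally need the graph-based ranking to choose among competing candidate concepts.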

Keywords: Semantic search, entity retrieval, geo/temporal search, sub/super-documents
