Thesaurus-based Search in Large Heterogeneous Collections

Presented at: 7th International Semantic Web Conference (ISWC 2008)

by Jan Wielemaker, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber


In cultural heritage, large virtual collections are coming into existence. Such collections contain heterogeneous sets of metadata and vocabulary concepts originating from multiple sources. In the context of the E-Culture demonstrator, we have shown earlier that such virtual collections can be effectively explored with keyword search and semantic clustering. In this paper we describe the design rationale of ClioPatria, the open-source E-Culture software, which provides APIs for scalable semantic graph search. The use of ClioPatria's search strategies is illustrated with a realistic use case: searching for "Picasso". We discuss details of scalable graph search and the required OWL reasoning functionality, and show why SPARQL queries are insufficient for solving the search problem.

Keywords: Data integration, e-culture, metadata, semantic search, vocabularies

