MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data

Presented at: 5th International Semantic Web Conference (ISWC2006)

by Andreas Harth, Jürgen Umbrich, Stefan Decker


The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. We contrast our approach with conventional web crawlers, and describe and evaluate a five-step pipelined architecture that crawls and indexes data from both the traditional web and the Semantic Web.
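The abstract does not spell out the five steps, so the following is only a generic illustration of what "pipelined" means here, not the authors' implementation. Each stage runs in its own thread and hands items to the next stage through a queue, so different documents can be processed by different stages at the same time. The stage names (`fetch`, `detect`, `transform`, `index`, `serialize`), the `INDEX` dictionary and the example URLs are all hypothetical placeholders; no real network access or RDF parsing takes place.

```python
import queue
import threading
from typing import Any, Callable, Dict, Iterable, List

_DONE = object()  # sentinel marking the end of the item stream


def _worker(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Apply one stage to every incoming item, then pass the sentinel downstream.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(stages: List[Callable[[Any], Any]], items: Iterable[Any]) -> List[Any]:
    # One queue between each pair of adjacent stages, plus an input and an output queue.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_worker, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stages, for illustration only.
INDEX: Dict[str, List[tuple]] = {}


def fetch(url: str) -> Dict[str, str]:
    return {"url": url, "body": "<placeholder content>"}  # no real HTTP request


def detect(doc: Dict[str, str]) -> Dict[str, str]:
    doc["format"] = "rdf" if doc["url"].endswith(".rdf") else "html"  # naive guess
    return doc


def transform(doc: Dict[str, str]) -> tuple:
    return (doc["url"], "hasFormat", doc["format"])  # a toy (subject, predicate, object)


def index(triple: tuple) -> tuple:
    INDEX.setdefault(triple[0], []).append(triple)  # only this stage's thread writes INDEX
    return triple


def serialize(triple: tuple) -> str:
    return " ".join(triple) + " ."


results = run_pipeline(
    [fetch, detect, transform, index, serialize],
    ["http://example.org/a.rdf", "http://example.org/b.html"],
)
print(results)
```

Because each stage is served by a single thread reading from a FIFO queue, items leave the pipeline in the order they entered. A real crawler would more likely run several workers per stage and accept reordering in exchange for throughput.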

