We are witnessing the first stages of the document web becoming a data web, with the implied new opportunities for discovering, re-purposing, "meshing up" and analyzing linked data. The volume of linked open data is increasing, and the first data web search engines are taking shape. Answering queries against the nascent data web may easily require two orders of magnitude more computing power than a text search engine needs. Queries may involve arbitrary joins, aggregation, filtering and so forth, compounded by the need for inference and on-the-fly schema mapping. This is the environment for which Virtuoso Cluster Edition is intended. This paper presents the main challenges encountered and the solutions arrived at during the development of this software product. We present adaptations of RDF loading, query execution and query planning suited to distributed memory platforms, with special emphasis on dealing with message latency and the special operations required by RDF.
Keywords: RDF, Scalability
Resource URI on the dog food server: http://data.semanticweb.org/workshop/ssws/2008/paper/main/1