Benchmarking the Performance of Linked Data Translation Systems

Presented at: Linked Data on the Web (LDOW2012)

by Carlos R. Rivero, Andreas Schultz, Chris Bizer, David Ruiz

Linked Data sources on the Web use a wide range of different vocabularies to represent the same type of entity. For some types of entities, such as people or bibliographic records, common vocabularies have emerged that are used by multiple data sources. But even for representing data of these common types, different user communities use competing common vocabularies. Linked Data applications that want to understand as much data from the Web as possible thus need to overcome this vocabulary heterogeneity and translate the original data into a single target vocabulary. To support application developers with this integration task, several Linked Data translation systems have been developed. These systems provide languages to represent correspondences in the form of declarative mappings and use these mappings to translate heterogeneous Web data into a single target vocabulary. In this paper, we present a benchmark for comparing the expressivity as well as the runtime performance of data translation systems. Based on a set of examples from the LOD Cloud, we developed a catalog of fifteen data translation patterns and surveyed how often these patterns occur in our example set. Based on these statistics, we designed LODIB (the Linked Open Data Integration Benchmark), which aims to reflect the real-world heterogeneities that exist on the Web of Data. We apply the benchmark to test the performance of two data translation systems, Mosto and LDIF, and compare their performance with the SPARQL CONSTRUCT performance of the Jena TDB RDF store.
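
To make the idea of a declarative mapping concrete, the following is a minimal sketch in plain Python, not the mapping language of Mosto or LDIF and not part of the benchmark itself. It assumes the simplest kind of correspondence, a one-to-one property renaming, and represents RDF triples as tuples of strings. The `MAPPINGS` table, the `translate` function, and the `example.org` URIs are hypothetical names chosen for illustration.

```python
# Namespace prefixes for two real vocabularies that both define "title".
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical declarative mapping: source property -> target property.
MAPPINGS = {
    DC + "title": DCTERMS + "title",
    DC + "creator": DCTERMS + "creator",
}


def translate(triples, mappings):
    """Rewrite the predicate of each (subject, predicate, object) triple.

    Predicates without an entry in `mappings` are passed through unchanged.
    """
    return [(s, mappings.get(p, p), o) for (s, p, o) in triples]


# Illustrative source data (the example.org URIs are assumptions).
source = [
    ("http://example.org/paper1", DC + "title", "An example title"),
    ("http://example.org/paper1", "http://example.org/vocab#pages", "10"),
]

target = translate(source, MAPPINGS)
for triple in target:
    print(triple)
```

Real translation systems have to handle far more than renaming, which is why the benchmark measures expressivity alongside runtime performance.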

Keywords: Benchmarking, Data translation, Linked Data
