When to Reach for the Cloud: Using Parallel Hardware for Link Discovery

Presented at: 10th ESWC 2013 (ESWC2013)

by Axel-Cyrille Ngonga Ngomo, Lars Kolb, Norman Heino, Michael Hartung, Sören Auer, Erhard Rahm

With the ever-growing amount of RDF data available across the Web, the discovery of links across datasets and the deduplication of resources within knowledge bases have become tasks of central importance. In recent years, several link discovery approaches have been developed to tackle the runtime and complexity problems that are intrinsic to link discovery. Yet, so far, the management of hardware resources for the execution of link discovery tasks has received little attention. This paper aims to address exactly this research gap by investigating the use of hardware resources for link discovery. We implement the HR3 approach within three different paradigms of parallel computing. Based on a comparison of the runtimes of the three implementations, we address the following question: Under which conditions should which hardware be used to link or deduplicate knowledge bases? Our results show that certain tasks that seem predestined to be carried out in the cloud can actually be run on standard massively parallel hardware. Moreover, our evaluation provides break-even points that can serve as guidelines for deciding when to use which hardware for link discovery.

Keywords: Cloud computing, Massively Parallel Computing, Parallel Hardware, Reduction Ratio

Resource URI on the dog food server: http://data.semanticweb.org/conference/eswc/2013/paper/eswc-2013/78
