Efficiently Evaluating Skyline Queries on RDF Databases

Presented at: 8th Extended Semantic Web Conference (ESWC2011)

by Ling Chen, Sidan Gao, Kemafor Anyanwu

Skyline queries are a class of preference queries that are valuable for multi-criteria decision-making scenarios. Such queries compute the Pareto-optimal tuples from a set of tuples. This problem has received significant attention in the context of relational data, where many techniques focus on answering queries over a single table. Consequently, for multi-relational skyline query scenarios, as would be the norm for RDF, query evaluation would need to follow a join-first-skyline-later strategy. However, such a split computational strategy rules out optimizations that prune the search space by passing information between the join phase and the skyline phase. Other available techniques for multi-relational skyline queries assume storage and indexing techniques that are not typically used with RDF, thereby requiring a preprocessing step. In this paper, we present an approach for optimizing skyline queries over RDF data. The approach is based on the concept of a “Header Point”, which maintains a concise summary of the visited region of the data space. This summary allows some fraction of non-skyline tuples to be pruned from the set advancing to the skyline processing phase, thus reducing the number of expensive dominance checks required in the skyline phase. We further present more aggressive pruning rules that compute near-complete skylines in significantly less time than the complete algorithm. A comprehensive performance evaluation of the different algorithms is presented using datasets with different types of data distributions, generated using a benchmark data generator.
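
To make the notions of dominance and Pareto-optimality concrete, the following is a minimal Python sketch of a skyline computed by a simple window-based scan. It is not the Header Point algorithm from the paper. It assumes that smaller values are preferred in every dimension, and the names `dominates`, `skyline`, and the `hotels` sample data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Compute the Pareto-optimal points with a simple window scan."""
    window: List[Point] = []
    for p in points:
        # Discard p if any current candidate dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise drop the candidates that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Hypothetical data: (price, distance to venue), both to be minimized.
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (120.0, 1.0), (90.0, 3.0)]
result = skyline(hotels)
print(result)  # [(50.0, 8.0), (80.0, 2.0), (120.0, 1.0)]
```

Each call to `dominates` is one of the dominance checks the abstract describes as expensive. In a join-first-skyline-later strategy, every joined tuple would reach a scan like this one, which is why pruning tuples before the skyline phase reduces the work.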

Keywords: RDF Databases, Skyline Queries, Skyline-Join

Resource URI on the dog food server: http://data.semanticweb.org/conference/eswc/2011/paper/semantic-data-management/49
