Why Dog Food?!

The call to "eat your own dog-food" is often heard in the Semantic Web research area. The motto encourages us to use the languages and tools that we are developing to support our own work and demonstrate convincing arguments for the introduction of explicit semantics.

The International Semantic Web and European Semantic Web Conference series have followed this maxim and published metadata describing the events. This metadata covers information about papers, schedules, attendees etc. Tools can then consume this information and provide services, such as intelligent scheduling or search, to conference attendees.

In 2006, ontologies were developed for ESWC that made use of existing vocabularies like Friend of a Friend (FOAF). Additions were also made for ISWC to encompass the SWRC ontology, which provides relationships for encoding BibTeX information, and iCal formats for scheduling events.

For previous conferences, data was hosted at the conference site. This introduces potential problems of curation and sustainability. At data.semanticweb.org we intend to provide a permanent, central home for this conference metadata. This is not just a site for ISWC and ESWC though. We hope that, in time, other metadata sets relating to Semantic Web activity will be hosted here: additional bibliographic data, test sets, community ontologies and so on.

I'm a conference or workshop organiser - How do I add my own data to the dog food corpus?

All the data on our site is stored in RDF, the format for linked, decentralised data on the Web. There are several ways to add a new workshop or conference to the corpus:

XML-based Import

We can start with some simple XML files containing the data for the event and convert them to the RDF we need. If you want to add your conference to our corpus, all you have to do is create these XML files and send them to us. We need two different files: a configuration file and a data file. The first one contains general information about the event (dates, homepage, chairs, ...), while the second contains the information about all papers, authors, etc. Below are some examples for both files. Just model your own event according to those. To make things easier, we have also set up a validator service for checking your XML files before you send them to us. The validator page also contains links to the schema files, in case you want to work from there.

If you are using EasyChair for your conference organisation and you are a premium subscriber, you can generate the data file directly from within EasyChair!

Spreadsheet-based Import

We are also using a lot of spreadsheet input as source data. The XML format is easiest to process for us, but if it seems a bit too daunting, you are welcome to use this method as well. There is a small Excel spreadsheet example you can use as a starting point here.

RDFa-based Import

You can also annotate your own event web page with RDFa, which we can then import. This way, you keep control over your own data, and can change it at any time. Ideally, you would use the same vocabularies we are using here, and you should make sure that URIs you use for people and organisations match those already present in our dataset. If there is no URI for a person or organisation that you want to include in your data, you can simply coin a new URI, but ideally in the http://data.semanticweb.org/person or http://data.semanticweb.org/organization namespaces. Some good examples for workshops annotated with SWC RDFa are:

Once you have annotated your event page, you can drop us an email, so that we can add your event to the list of sites we import.

What vocabularies are you using in the dog food corpus?

If you want to know more about the RDF vocabularies and ontologies we actually use behind the scenes, you can find some more information here. To describe all the entities in our corpus, we are using what we call the Semantic Web Conference Ontology (SWC). However, SWC is mainly a convention of how to use classes and properties from other ontologies, most prominently FOAF (for people) and SWRC (their BibTeX elements, for the papers). We are also throwing in some SIOC, Dublin Core and iCal and of course a lot of RDFS and OWL. Some glue is provided through our own swc namespace.


The documentation itself is basically just example usage, here showing you how Tom's and my own "Recipes for Semantic Web Dog Food" paper is modeled in SWC. The first figure shows how the main entities - the paper, the authors, the corresponding talk, topics and the authors' affiliations - are linked. The second figure then shows in more detail the data that is available for each type of instance.

How to get to the Data

The data on data.semanticweb.org is available in two flavours:
  • There are RDF/XML dumps for each conference and workshop in the repository. All those dumps are available at http://data.semanticweb.org/dumps/. The URIs of the named graphs also work as an alias to that graph's data dump.
  • There is a SPARQL repository which contains all data for all conferences and workshops:


    You can post a SPARQL query to the repository by simply appending ?query=$ESCAPED_QUERY. The repository itself is split up into a number of named graphs, one for each event (i.e., conference or workshop). This means that you can either do an integrated query over the whole repository, or restrict the query to a specific event, using the GRAPH directive.
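
As a rough illustration of the two query scopes, the following Python sketch wraps a graph pattern in a GRAPH clause when an event graph URI is given, and leaves it unrestricted otherwise. The helper name `build_query` is hypothetical and not part of the site. The prefixes and the example graph URI are the ones used in the example queries on this page.

```python
from typing import Optional

PREFIXES = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "PREFIX swrc: <http://swrc.ontoware.org/ontology#>\n"
)


def build_query(pattern: str, variables: str, graph: Optional[str] = None) -> str:
    """Return a SELECT query over the whole repository or over one named graph.

    Hypothetical helper for illustration only.
    """
    body = pattern.strip()
    if graph is not None:
        # Restrict matching to the statements stored for a single event.
        body = f"GRAPH <{graph}> {{ {body} }}"
    return f"{PREFIXES}SELECT DISTINCT {variables}\nWHERE {{ {body} }}"


pattern = "?person a foaf:Person . ?person swrc:affiliation ?affiliation"

# Integrated query over all events.
whole_repository = build_query(pattern, "?person ?affiliation")

# Same pattern, restricted to one event's named graph.
single_event = build_query(
    pattern,
    "?person ?affiliation",
    graph="http://data.semanticweb.org/conference/iswc-aswc/2007/complete",
)
print(single_event)
```

Only the presence of the GRAPH wrapper differs between the two strings, so the same pattern can be reused for any event.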

To control the format returned by the SPARQL endpoint, you have to specify the desired mimetype in the Accept header of your HTTP request. Possible mimetypes include application/sparql-results+xml and application/sparql-results+json. E.g., to run a SPARQL query with curl from the command line, you could do:

curl -H "Accept: application/sparql-results+xml" "http://data.semanticweb.org/sparql?query=PREFIX%20foaf%3A%20%
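
The same kind of request can be prepared from a script. The sketch below is a minimal example using only the Python standard library: it percent-escapes a query, appends it as the `query` parameter, and sets the Accept header. The query text and the helper name `make_request` are illustrative assumptions. The endpoint URL is the one from the curl call above. The request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

ENDPOINT = "http://data.semanticweb.org/sparql"

# Illustrative query, not taken from the site.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?person WHERE { ?person a foaf:Person } LIMIT 10"""


def make_request(query: str, mimetype: str = "application/sparql-results+json") -> Request:
    """Build (but do not send) a GET request for the SPARQL endpoint."""
    # quote_via=quote escapes spaces as %20, matching the curl example.
    url = f"{ENDPOINT}?{urlencode({'query': query}, quote_via=quote)}"
    return Request(url, headers={"Accept": mimetype})


request = make_request(QUERY)
print(request.full_url)
print(request.get_header("Accept"))
# To actually run the query: urllib.request.urlopen(request).read()
```

Passing `application/sparql-results+xml` as the second argument would request the XML result format instead.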

For an easy and fun way to explore the dataset, we have added the interactive SPARQL explorer Snorql to the site: http://data.semanticweb.org/snorql

SPARQL Queries

To see how the SPARQL queries for this server work in general, here are two small examples that show how to get the information about all people in the repository and their affiliations. The first version queries the complete repository, the second one only the part of the repository that contains data about ISWC+ASWC2007.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX swrc: <http://swrc.ontoware.org/ontology#>
SELECT DISTINCT $person $affiliation
WHERE {
    $person a foaf:Person.
    $person swrc:affiliation $affiliation
}

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX swrc: <http://swrc.ontoware.org/ontology#>
SELECT DISTINCT $person $affiliation
WHERE {
    GRAPH <http://data.semanticweb.org/conference/iswc-aswc/2007/complete> {
        $person a foaf:Person.
        $person swrc:affiliation $affiliation
    }
}
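
When application/sparql-results+json is requested, the answer to such a query arrives as a list of variable bindings. The following Python sketch shows one way to flatten those bindings into simple rows. It assumes the standard SPARQL Query Results JSON Format. The sample response is hypothetical, with made-up example.org URIs rather than real entries from the corpus, and the function name `rows` is likewise only illustrative.

```python
import json

# Hypothetical response body; the URIs are invented for illustration.
SAMPLE = """
{
  "head": {"vars": ["person", "affiliation"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/person/alice"},
     "affiliation": {"type": "uri", "value": "http://example.org/organization/example-lab"}}
  ]}
}
"""


def rows(document: str) -> list[dict[str, str]]:
    """Flatten each binding into a plain {variable: value} dictionary."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    return [
        # A variable may be unbound in a given solution, so check before reading it.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


for row in rows(SAMPLE):
    print(row["person"], "->", row["affiliation"])
```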

Named Graphs

All data on the dog food site lives in the same RDF repository. However, we have split this repository into several named graphs, one for each conference or workshop. Each graph should be interpreted as the context in which the statements it contains are true. For example, the statement "Richard works for DERI" is only true in the context of some of the events in the repository (e.g. ISWC2008). In other contexts (e.g. ISWC2007), it is true that "Richard works for Freie Universität Berlin". The following is a list of all named graphs in the repository:

What's the license for the data on this site?

The event datasets on SWDF are free for everyone to use, with no restrictions or strings attached. We do this in the spirit of free, open and linked data, and believe it benefits all parties involved: event organisers, who gain publicity for their event; us as the data hosts, who attract usage of our site; and you, the user of this data, who can use it in any way you see fit, unencumbered by any licence.

In particular, we apply the Open Data Commons Public Domain Dedication and Licence (it's called a licence, but it's really a waiver) and Attribution-Sharealike Community Norms: