Presented at: 9th Extended Semantic Web Conference (ESWC2012)
The three most common approaches for deriving or predicting instantiated relations, i.e., triple statements (s, p, o), are information extraction, reasoning, and relational machine learning. Information extraction takes sensory information, typically in the form of text, and extracts statements using methods that range from simple classifiers to sophisticated NLP approaches. Logical reasoning starts from a set of true statements and derives new statements via inference using higher-order logical axioms. Finally, machine learning exploits regularities in the data to predict the likelihood of new statements. In this paper we combine all three methods to exploit all available sources of information in a modular way: each component (information extraction, reasoning, machine learning) can be optimized independently and then combined in an overall system. For relational machine learning, we present a novel approach based on hierarchical Bayesian multi-label learning, which also sheds new light on common factorization approaches. We rank candidate statements by their probability of being true, which answers the question: if we are forced to make a decision, which option is best? We account for the fact that an entity can belong to more than one ontological class, and we discuss aggregation. We extend the approach to model nonlinear dependencies between relationships and to support personalization. We validate our model using data from the Yago and DBpedia ontologies.
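As a rough illustration of the modular idea, the sketch below treats each source (extraction, reasoning, learning) as an independent scorer over candidate triples and ranks the combined result. It is not the paper's model: the noisy-OR combination is a simple stand-in chosen for illustration, and the names `combine_noisy_or` and `rank_triples`, as well as the example entities and scores, are hypothetical.

```python
from typing import Dict, List, Tuple

# A triple statement (s, p, o): subject, predicate, object.
Triple = Tuple[str, str, str]


def combine_noisy_or(scores: List[float]) -> float:
    """Combine per-source probabilities with a noisy-OR.

    Assumption for illustration only: sources are treated as independent.
    The paper's hierarchical Bayesian model is not reproduced here.
    """
    remaining = 1.0
    for p in scores:
        remaining *= 1.0 - p
    return 1.0 - remaining


def rank_triples(sources: List[Dict[Triple, float]]) -> List[Tuple[Triple, float]]:
    """Rank every candidate triple seen by any source, best first."""
    candidates = set()
    for source in sources:
        candidates.update(source)
    combined = {
        t: combine_noisy_or([source.get(t, 0.0) for source in sources])
        for t in candidates
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from three independently tuned components.
extraction = {("Jane", "bornIn", "Berlin"): 0.6}
reasoning = {("Jane", "citizenOf", "Germany"): 1.0}
learning = {
    ("Jane", "bornIn", "Berlin"): 0.5,
    ("Jane", "bornIn", "Munich"): 0.2,
}

for triple, score in rank_triples([extraction, reasoning, learning]):
    print(triple, round(score, 3))
```

Because each source is just a mapping from triples to scores, any one component can be replaced or retuned without touching the others, which is the sense of "modular" used above.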
Keywords: Multivariate modeling, Probabilistic PCA, Relation prediction
Resource URI on the dog food server: http://data.semanticweb.org/conference/eswc/2012/paper/research/171