Automated interlinking of speech radio archives

Presented at: Linked Data on the Web (LDOW2012)

by Yves Raimond, Chris Lowis

The BBC currently tags programmes manually, using DBpedia as a source of tag identifiers and a list of suggested tags extracted from each programme's synopsis. These tags support navigation and topic-based search of BBC programmes. However, the archive holds a very large number of programmes, most of them with very little metadata attached, so we need a way of assigning tags to programmes automatically. We describe a framework for doing so, using speech recognition, text processing and concept tagging techniques. We evaluate this framework against manually applied tags and compare it with related work. We find that it is good enough to bootstrap the interlinking process for archived content.

Keywords: Concept Tagging, Linked Data, Named Entity Extraction, Speech Processing, Text Processing

