Automated semantic tagging of speech audio

Presented at: 21st International World Wide Web Conference (WWW2012)

by Yves Raimond, Chris Lowis, Jonathan Tweed, Roderick Hodgson

The BBC is currently tagging programmes manually, using DBpedia as a source of tag identifiers and a list of suggested tags extracted from their synopses. These tags are then used to help navigation and topic-based search of BBC programmes. However, given the very large number of programmes available in the archive, most of which have very little metadata attached, we need a way of automatically assigning tags to programmes. We describe a framework to do so, using speech recognition, text processing and concept tagging techniques. We evaluate this framework against manually applied tags, and compare it with related work. We find that this framework performs better than related work for this task, and is good enough to bootstrap the tagging process of archived content. We describe Tellytopic, an application using automatically extracted tags to aid discovery of archive content.

Keywords: Linked Data, Named Entity Extraction, Speech Processing, Text Processing

