A Densitometric Analysis of Web Template Content

Presented at: 18th International World Wide Web Conference (WWW2009)

by Christian Kohlschütter

Webpage: http://www2009.eprints.org/163/1/p1165.pdf

What makes template content on the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statistical laws from the field of Quantitative Linguistics. I analyze the idiosyncrasy of template content compared to regular "full-text" content and derive a simple yet suitable quantitative model.

Keywords: Poster Session

Resource URI on the dog food server: http://data.semanticweb.org/conference/www/2009/paper/163