Weinberger Needs Statisticians
I’ve always wondered how Weinberger could get meaningful information out of his “huge pile”, and in his interview with Doctorow[1], Weinberger mentioned a way to make use of it: statistical analysis. This is what he said:
“Tags are chaos, and as you get more and more of them, it will get more and more chaotic. It turns out that when you have a lot of them, the statistical analysis becomes really pretty precise.”
This reminds me of a paper I’ve previously read, “Towards Extracting Flickr Tag Semantics”, written by researchers at Yahoo! Research Berkeley and published at WWW2007[2]. The method described in the paper can identify “place tags” and “event tags” among the tags stored in Flickr. For instance, the authors could “detect that the tag Bay Bridge describes a place, and that the tag WWW2007 is an event.” (WWW2007 is a conference held in Canada in 2007.)
How did they do that? The main idea is that a “place tag” like Bay Bridge has a significant spatial pattern, tending to concentrate within a certain geographic range, while an “event tag” like a conference has a significant temporal pattern, tending to appear around a certain time period. So by using preexisting spatial and temporal statistical methods, computer scientists are able to discover the “semantics” of Flickr tags.
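To make the idea concrete, here is a rough sketch in Python. It is not the method from the paper; it is a much simpler heuristic I made up for illustration, and every name, threshold, and data point in it is an assumption. It treats each use of a tag as a (day, latitude, longitude) record, then asks two questions: what share of the uses fall inside the best single time window, and what share fall close to the median location?

```python
from statistics import median


def temporal_concentration(days, window):
    """Largest share of uses that fit inside any time window of length `window`."""
    if not days:
        return 0.0
    ts = sorted(days)
    best = 0
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window` days.
        while ts[end] - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best / len(ts)


def spatial_concentration(points, radius):
    """Share of (lat, lon) points within `radius` degrees of the median point."""
    if not points:
        return 0.0
    lat0 = median(lat for lat, _ in points)
    lon0 = median(lon for _, lon in points)
    near = sum(
        1
        for lat, lon in points
        if abs(lat - lat0) <= radius and abs(lon - lon0) <= radius
    )
    return near / len(points)


def classify_tag(uses, window_days=7, radius_deg=0.1, threshold=0.8):
    """uses: list of (day, lat, lon). All thresholds are illustrative guesses."""
    spatial = spatial_concentration([(lat, lon) for _, lat, lon in uses], radius_deg)
    temporal = temporal_concentration([day for day, _, _ in uses], window_days)
    return {"place": spatial >= threshold, "event": temporal >= threshold}


# Made-up data: one tag used all year at nearly the same spot,
# and one tag used within three days from scattered locations.
place_uses = [(d * 30, 37.80 + 0.001 * (d % 3), -122.37) for d in range(12)]
event_uses = [(100 + d % 3, 51.17 + d, -115.57 + d) for d in range(6)]

print(classify_tag(place_uses))
print(classify_tag(event_uses))
```

Note that in this sketch a tag can show both patterns at once (a conference usually happens in one place as well as at one time), so the two checks are reported separately rather than forced into a single label.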
In short, statistical analysis can help Weinberger make use of his huge pile of information, and it may also serve as a “filter” for dealing with information overload.
REFERENCES
[1] Metacrap and Flickr Tags: An Interview with Cory Doctorow, http://blog.wired.com/business/2007/05/metacrap_and_fl.html
[2] Towards Extracting Flickr Tag Semantics, http://www2007.org/posters/poster909.pdf