NYTimes TimesTags API

The New York Times has created an API for its “taxonomy and controlled vocabulary used by Times indexers since 1851”. Send the API a word and the NYTimes will send back a list of the most common relevant tags, each marked as a Person, Description, Organization or Location.
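To get a feel for what you could do with the result, here is a rough Python sketch that sorts returned tags by type. It makes no network call. I am assuming each tag comes back as a string with a short type code in parentheses at the end; the codes, the sample tags and the function names are all made up for illustration, so check the API's own documentation for the real format.

```python
import re

# Assumed type codes; the real API's markers may differ.
TAG_TYPES = {
    "Per": "Person",
    "Des": "Description",
    "Org": "Organization",
    "Geo": "Location",
}

# Matches "Some Label (Code)" and captures the label and the code.
_TAG_PATTERN = re.compile(r"^(?P<label>.*)\s+\((?P<code>\w+)\)$")


def parse_tag(raw):
    """Split 'Label (Code)' into (label, type name).

    Strings without a recognized code are returned unchanged with
    the type 'Unknown'.
    """
    match = _TAG_PATTERN.match(raw)
    if match is None or match.group("code") not in TAG_TYPES:
        return raw, "Unknown"
    return match.group("label"), TAG_TYPES[match.group("code")]


def group_by_type(raw_tags):
    """Group tag labels under their type name, keeping input order."""
    grouped = {}
    for raw in raw_tags:
        label, kind = parse_tag(raw)
        grouped.setdefault(kind, []).append(label)
    return grouped


# Invented sample standing in for the tags returned for one query word.
sample_response = [
    "Obama, Barack (Per)",
    "Presidential Election of 2008 (Des)",
    "Democratic Party (Org)",
]
tags_by_type = group_by_type(sample_response)
```

Grouping by type like this is what would let a site borrow the Times vocabulary for people, places and organizations separately.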

Why create our own structured vocabulary when highly trained people have been doing it since 1851 and we can borrow theirs?


Project Bamboo


Not sure if you have heard of Project Bamboo, but it is an effort to find ways to use and incorporate technology in humanities research to advance the field(s). Sponsored by the Mellon Foundation, its end goal is a proposal for an implementation strategy, including standards and the like. My husband has been attending the most recent workshop on behalf of Blackboard (because they want a seat at the table as the standards are being set, of course!) and it’s basically been a 202 extravaganza. At the table? Librarians, philosophers, artists, lit profs, computer scientists, even a few iSchool professors (Larson and Kansa), etc. This led to lengthy debates about the meaning of what they were actually trying to do, how explicitly they should define it, how to carve up their worlds, why the sky is blue, etc. One of the main things they apparently kept coming back to was, of course, The Tradeoff: who does the work, and who reaps the benefits?

Pretty cool stuff though, and hearing his recap (“classification”, “ontology”, “schemas”, “data interoperability”, “buzz”, “buzz”, “buzz”) was essentially like a mini-study session for the midterm.

If anyone is interested in contributing – especially those philosophers among us – there are links to join on their site.



No Child Left Behind as an Ontology:

Imposing standardization upon a domain (Education): 

Our discussions about authority and vocabulary authoring remind me of the recent attempt by the No Child Left Behind (NCLB) law of 2001 to standardize public education. For those who don’t know, the goals of NCLB were lofty: to close the achievement gap between high performing and low performing schools by implementing a system of accountability and high standards. This included giving annual academic assessments and imposing consequences on schools that fail to improve. NCLB was met with a great deal of hostility and resentment from many teachers. I had the chance to observe this during my time teaching. And as 202 is reframing my worldview, I’m coming to see that some of the backlash against NCLB might be linked to the imposition of a new ontology upon a fragmented community of interest.

What happens when a new ontology is imposed upon a collection of thousands of communities of interest when each member of the community of interest is expected to be an expert in his or her own practice? 

NCLB could be seen as an ontology that defines the terms used to describe and represent the various processes surrounding public education. Few would argue with the intent of the law: NCLB seeks to reform public education and close the gap between high and low performing schools. In order to do this, the federal government acts as an authority that imposes a certain redefinition of vocabulary and processes that carry with them a set of assumptions and values, not to mention consequences.

Fragmented Community of Interest (Teachers). 

For many teachers, NCLB came down to the five terms that defined the categories each of their students was placed into based on standardized test scores: Well Below Basic, Below Basic, Basic, Proficient, Advanced. Students were tracked over school years, and schools were rewarded or punished based on the number of students in each category (as well as the number of different kinds of students in each category). At the time of its implementation, most teachers had already developed unique tracking and assessment systems over their many years of teaching. Suddenly they were asked to give up their own framework and replace it with a new and unfamiliar one. Soon, school communities were filled with angry cries directed toward NCLB: “They’re doing it all wrong. They’re making it worse”. They could have been saying, “They’re not speaking my language!”

I am suggesting that the national body of teachers is a fragmented community of interest. Rosenthal defines communities of interest (COI) as “a set of stakeholders who must exchange information in pursuit of their shared goals, interests, missions, or business processes and who therefore must have shared vocabulary for the information they exchange” (Rosenthal 47). Even for a small COI like the English department at my high school, it was difficult enough to define a shared vocabulary to exchange information. NCLB asked all teachers in all schools to use a system of standards, categorization and information exchange that they had no hand in developing. Oh, and teachers would be held accountable for it as well.

The idea here is not that authoritative ontologies don’t work or can’t work. They are immensely useful. The point, I guess, is for us to be aware of the ontological inertia of a given domain. Any change in the structure of a system of categorization and accountability will be difficult to implement, and will probably be extremely painful to the existing users, especially if they have historically been free to create and implement their own systems. 


Rosenthal, Arnon (2004). From Semantic Integration to Semantics Management: Case Studies and a Way Forward. SIGMOD Record, 33, 44-50.


A Dogma of Categorization

In determining facets or categories for a set of objects, we might tend to think that some facets are better than others because they are more inherently essential to that set of objects. I believe this is a dogma we should be careful to avoid, and as a result I argue that we can only be pragmatic in evaluating ontologies.



A Semi-Automated Semantic Web?

I read a paper today that discussed a step covered in today’s class on document engineering. We learned how the Berkeley Calendar Network team manually harvested and consolidated tables of terms in a huge Excel spreadsheet. The IEEE paper I read argues for semi-automating this process (deriving homonyms in IS-A relationships, for instance) and integrating the terms with the WordNet ontology (http://wordnet.princeton.edu).

The buzzword-laden paper goes on to argue for creating a working Semantic Web by harnessing the large quantity of structured “Deep Web” data. The “Deep Web” (unindexed by conventional search engines) contains over 4 orders of magnitude more data than the “Surface Web”, and some of that data is structured in databases. The Semantic Web, they claim, has been hampered by the difficulty of manually creating large OWL and RDF ontologies, and harvesting the richer potential of the Deep Web points to a possible solution:
Semantic Web + Deep Web-Ontology-aware browser.

Ironically, the paper itself is buried in the Deep Web:


Dewey or Don’t We?

This article from May 07 is about a library that decided to move away from the Dewey Decimal system and towards a subject-based organization. They used 50 subject headings created by the Book Industry Study Group Inc. The library intentionally mimicked certain aspects of bookstores, not only in how the books are organized by subject, but also in physical layout. It appears they are trying to accommodate their customers’ habits and expectations.

To me, this sounds interesting. I recall thinking while reading Weinberger that I like bookstores, and that as long as the subject areas are clearly labeled I have little trouble finding the specific book I am seeking. At the very least it is no more difficult than in a library, and usually easier. Of course, this is a small library (24,000 books, DVDs, etc.). If you are dealing with a larger set of works this may become too difficult to manage. And it seems more “natural” to me to search for a subject than for a number.

However, one of the comments on the article is key (in my opinion) to the bookstore/Dewey decision. “That’s OK for leisure reading, but if you need to do research on a specific topic, you are going to have a hard time finding the particular information that you need.” The additional structure in the Dewey system makes it easier (once you know how to use the system) to find ever more granular information. Most bookstores just lump it all together.
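To make that “additional structure” concrete, here is a small Python sketch of how a Dewey number narrows a search one digit at a time. The three classes are real Dewey classes with the labels shortened; the shelf list and the function name are invented, and treating trailing zeros as “broader class” is a simplification of how the scheme actually works.

```python
# A few real Dewey classes, with labels shortened for illustration.
DEWEY_CLASSES = {
    "500": "Science",
    "510": "Mathematics",
    "516": "Geometry",
}

# Invented shelf list: (call number, title).
SHELF = [
    ("516.2", "A Geometry Primer"),
    ("512.5", "Linear Algebra Notes"),
    ("530.1", "Physics for Poets"),
    ("813.5", "A Novel"),
]


def narrow(shelf, dewey_class):
    """Return titles filed under the given three-digit Dewey class.

    Simplification: trailing zeros are treated as marking a broader
    class, so "500" matches any number starting with "5", "510"
    matches "51...", and "516" matches "516...".
    """
    prefix = dewey_class.rstrip("0")
    return [title for number, title in shelf if number.startswith(prefix)]


# Each extra digit narrows the result.
science = narrow(SHELF, "500")
mathematics = narrow(SHELF, "510")
geometry = narrow(SHELF, "516")

for code, label in DEWEY_CLASSES.items():
    print(code, label, narrow(SHELF, code))
```

A flat list of 50 subject headings stops at roughly the first level of this narrowing, which is the commenter’s point about research use.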

I’ve not been able to find any follow-up information as to whether it worked or not. Their page shows they now have over 30,000 items in the library, but nothing about its current layout/organization or popularity. I wish I’d found this article when we read Weinberger’s piece.

PS: I wish I could claim the title as original, but I borrowed it.


New Method for Building Multilingual Ontologies


Researchers from the Validation and Business Applications Group based at the Universidad Politécnica de Madrid’s School of Computing (FIUPM) have developed a new method for building multilingual ontologies that can be applied to the Semantic Web.

So ontologies are the cool thing to be developing these days given the promise of the Semantic Web looming over us. But up until yesterday, a big limitation with ontologies was that they were relatively single-minded when it came to language. “The application of ontologies to the Internet comes up against serious problems triggered by linguistic breadth and diversity. This diversity stands in the way of users making intelligent use of the web.”

People have tried to bridge the gap, but strategies like expert-based terminology (ahem, Svenonius, ahem) and using one language as the “pivot” have failed miserably. But these researchers claim to have created a method for building ontologies IRRESPECTIVE of language. And their secret weapons appear to be universal words and the assumption that “any text has implicit ontological relations that can be extracted by analysing certain grammatical structures of the sentences making up the text”. (I mean, I could’ve told them that, but whatever)

Interesting stuff, and will probably be even more interesting when I finally grasp what an ontology actually is. 😛 (just kidding) (sort of)


The Proto-Web


Apparently, Paul Otlet came up with what is arguably the first iteration of the World Wide Web in 1934, the Mundaneum. Located in Belgium, it comprised millions of 3 by 5 index cards stored in thousands of boxes. It never fully took off because of funding problems and the German invasion of Belgium during World War II. The article doesn’t describe in full detail how the system was implemented, but one thing that stands out is its revised version of hypertext, which differs from standard hypertext in that a link can indicate whether or not other links are similar to itself. Professor Buckland describes the Semantic Web as being “rather Otlet-ish.” The ill-fated story of the Mundaneum could hold a lesson for the Semantic Web: whether it fails may depend on how it is structured and how much labor it requires.

Relevant Lecture: The Semantic Web 
