Taxonomy of Philosophy

Weinberger links to this intriguing attempt to categorize philosophical papers for a system to “access online work in philosophy.”  

The best part is the discussion that follows David Chalmers’ blog post about the project, which sends me through a microcosm of the 202 course so far.  One commenter links to “An Essay towards a Real Character and a Philosophical Language,” in which John Wilkins attempts to create a language where every word defines itself based on a hierarchy of 40 Genera (each divided into Differences and then Species) of his own design.  The Wikipedia article points me to Borges’ response, “The Analytical Language of John Wilkins,” where he casts doubt on such universal categorization schemes by comparison to the Celestial Emporium of Benevolent Knowledge.  Other commenters on the blog post point out similar problems: a separate set of categories for the history of philosophy seems strange, since many of these papers are relevant to the philosophical topics themselves, and there seem to be “multiple principles of division.”

One of the authors of the philosophy taxonomy responds with a return to pragmatism:

OK, it’s a pseudo-taxonomy, or maybe just a category scheme. We’re not doing science here, just trying to come up with something useful and convenient.

Excellent.  We all know that classification systems should be judged by their usefulness rather than how essential their representations of the world are.

Finally, the other author of the taxonomy argues for the values of faceted classification:

our system allows massive cross-classification both of papers and categories: any paper or category can be in multiple categories. This allows us to cut the pie in many ways at once, and we hope that people will generally be able to find what they are looking for following their intuitive way of cutting the pie (along periods, figures, views, points of disagreement, etc).

Though if he is attempting to cut the pie in many different ways at once, I would think he would want explicitly orthogonal classifications, rather than one enormous tree.
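To make the contrast concrete, here is a minimal sketch of what explicitly orthogonal facets could look like, as opposed to one big tree. The paper ids, facet names (period, figure, topic), and values are all invented for illustration; this is not how the taxonomy's authors actually store anything.

```python
from collections import defaultdict

# Hypothetical papers; facet names and values are invented for illustration.
PAPERS = {
    "paper-a": {"period": {"17th century"}, "figure": {"Descartes"},
                "topic": {"philosophy of mind"}},
    "paper-b": {"period": {"20th century"}, "figure": set(),
                "topic": {"philosophy of mind", "metaphysics"}},
    "paper-c": {"period": {"17th century"}, "figure": {"Leibniz"},
                "topic": {"metaphysics"}},
}


def build_index(papers):
    """Map each (facet, value) pair to the set of paper ids carrying it."""
    index = defaultdict(set)
    for paper_id, facets in papers.items():
        for facet, values in facets.items():
            for value in values:
                index[(facet, value)].add(paper_id)
    return index


def find(index, **criteria):
    """Return ids of papers matching every facet=value criterion given."""
    matches = [index.get((facet, value), set())
               for facet, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


INDEX = build_index(PAPERS)
```

Each facet is an independent way of cutting the pie, so something like `find(INDEX, period="17th century", topic="metaphysics")` combines two cuts without either facet having to live underneath the other in a hierarchy.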

NYTimes TimesTags API

The New York Times has created an API against their “taxonomy and controlled vocabulary used by Times indexers since 1851”.  Send their API a word and the NYTimes will send back a list of the most common relevant tags (and whether it’s a Person, Description, Organization or Location).  

Why create our own structured vocabulary when highly trained people have been doing it since 1851 and we can borrow theirs?

Dewey Decimal

Based on section discussion of Cory Doctorow’s point “schemas aren’t neutral” and on a librarian friend complaining that Korea got shafted when it came to the folktale section of the Dewey Decimal system, I decided to look at the complete list of Dewey Decimal classes.

As Nick mentioned in section, the religion section is overwhelmingly dominated by Christianity. Also, any time languages are mentioned, European languages get multiple categories (English, Other Germanic Languages, French, Spanish, Italian, Slavic, Scandinavian) while the rest of the world is stuck in the “other” category. Wikipedia, font of all knowledge, mentions that the Library of Congress system is even more US-centric than the Dewey Decimal system.

Makes you wonder what sort of systems of categorization information scientists in other countries create.

Categorizing music

When I started collecting MP3s many years ago I was obsessive about filling in blank ID3 tags. No more “Track 09 — Unknown Artist” for me. There was one problem: I didn’t know how to fill out the genre tag. I actually remember posting a message to a newsgroup asking how to know what counted as rock, pop, hip-hop, R&B, and so forth. Someone had created these categories and I wanted to use them, but I didn’t have a clue how to do it (Doctorow’s “People are stupid,” I suppose).

I was having a conversation about this with Michael Manoochehri, who interjected that to some extent those are just commercial categories for music, which I hadn’t considered before. Nonetheless, there does seem to be at least some useful aspect of these categories: sometimes I feel like listening to music from one of them and not others. Using genres as categories is painting in broad strokes: different songs from the same album might properly belong to different genres, and an artist might move between genres during her career. A system like Pandora’s use of the Music Genome Project might more accurately select what I want to hear.

While I tried to conform to what I imagined were norms for genre categorization, I had a friend who created an entire set of unique genres for her music. Instead of pop and rock, she changed everything to “Coffee Shop Grooves” or “Rocking the Suburbs” or something similarly unusual, effectively using the genre namespace to sort music into her own categories.

Categorizing music is an issue across borders as well. I saw these two signs in a record store in South America:

Anglo rock and pop

Black music

Note that Eminem has a couple of albums in the “Black Music” section. Now there’s a funny categorization scheme for you.

Taxonomic Tagging

One of the problems with tagging is that the terms used can be ambiguous. Zigtag is a startup which offers delicious-style social bookmarking, but pairs it with a collection of meaningful tags. When a user goes to apply a tag, the system looks it up in their taxonomy and presents a list of matching entries from the taxonomy along with the meaning of each.

They do not currently have a tag specifically for i202’s syllabus.
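The lookup step described above might work roughly like the sketch below. I have no idea how Zigtag actually implements it; the taxonomy entries, field names, and function names here are all made up to show the idea of turning one ambiguous string into a list of distinct meanings to choose from.

```python
# Hypothetical taxonomy: each entry pairs a label with one specific meaning.
TAXONOMY = [
    {"id": 1, "label": "java", "meaning": "programming language"},
    {"id": 2, "label": "java", "meaning": "island in Indonesia"},
    {"id": 3, "label": "java", "meaning": "coffee"},
    {"id": 4, "label": "python", "meaning": "programming language"},
]


def candidate_senses(typed_tag, taxonomy=TAXONOMY):
    """Return every taxonomy entry whose label matches the typed tag."""
    needle = typed_tag.strip().lower()
    return [entry for entry in taxonomy if entry["label"] == needle]


def format_choices(entries):
    """Render entries as 'label (meaning)' strings to show the user."""
    return [f'{entry["label"]} ({entry["meaning"]})' for entry in entries]
```

The bookmark would then be stored against the chosen entry's id rather than the raw string, which is what removes the ambiguity. A tag with no match simply comes back as an empty list.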

Which “Class” of “Middle Class” Are You Part Of?

This Pew Research Center article, entitled “America’s Four Middle Classes,” has little explicitly to do with IR technology. However, it does feature a new model for the categorization of the “American Middle Class,” which I think is a useful example for a discussion of the ways in which redefining data categories can provide new insights and sweep away widely held myths. This report describes how social survey data was used by researchers to segment people who self-identify as “Middle Class” into four new categories that describe financial stability: Top of the Class, Satisfied Middle, Struggling Middle, and Anxious Middle. The report demonstrates how, within the self-identified category of “Middle Class,” there is great variation in financial status, from relative economic comfort to the potential for financial hardship. I was personally drawn to this report because it demonstrates that a simple recategorization of data can lead to sweeping changes in social perceptions.

Here is an excerpt from the article: Life is considerably tougher for the Struggling Middle, a group disproportionately composed of women and minorities. In fact, many members of the Struggling Middle have more in common with the lower class than they do with those in the other three groups and actually have a lower median family income than Americans who put themselves on the lowest rungs of the social ladder. About one-in-six self-identified middle class Americans fall into the Struggling Middle.

Rest of article here. Full, 19-page PDF report of this project available here.

Relevant lectures: 5. CONCEPTS & CATEGORIES (9/15)

The Proto-Web

http://nytimes.com/2008/06/17/science/17mund.html

Apparently, Paul Otlet came up with what is arguably the first iteration of the World Wide Web in 1934: the Mundaneum. Located in Belgium, it consists of millions of 3-by-5 index cards stored in thousands of boxes. It never fully took off because of funding problems and the German invasion of Belgium during World War II. The article doesn’t describe in full detail how the system is implemented, but one thing that stands out is its revised version of hypertext, which differs from standard hypertext in that a link can indicate whether or not other links are similar to itself. Professor Buckland describes the Semantic Web as being “rather Otlet-ish.” The ill-fated story of the Mundaneum could hold a lesson for the Semantic Web: whether it succeeds or fails may depend on the way it is structured and the labor it requires.

Relevant Lecture: The Semantic Web 

One Picture, 1,000 Tags

One Picture, 1000 tags (The New York Times, March 28, 2007)

If you try to find paintings on a museum’s web site, you will probably fail unless you know the title or artist. To make their collections more accessible, a dozen museums, including the Metropolitan Museum of Art, are redesigning their web sites to encourage the public to annotate the collections with descriptive tags.

However, this tagging could expose a huge semantic gap between the public and curators. For example, the Metropolitan Museum of Art ran a test in which volunteers supplied keywords for 30 images of paintings and sculptures. The tags were compared with the museum’s curatorial catalog, and more than 80 percent of the terms were not in the museum’s documentation. Ironically, that gap is part of why museums want the public to participate: art professionals can find it difficult to describe the visual elements of a picture, and there is no taxonomic system for doing so, so museums hope to collect many tags for each image. Tags, from the obvious to the personal, can also be used to proclaim a personal connection with a work of art. These “collective intelligence” projects might bring the collections alive.
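The comparison in the Met's test boils down to a set difference. Here is a rough sketch of how a figure like “80 percent not in the documentation” could be computed; the function name, the normalization (trimming and lowercasing), and the sample tags are my own assumptions, not details from the article.

```python
def share_missing(public_tags, curator_terms):
    """Fraction of distinct public tags absent from the curators' terms."""
    public = {tag.strip().lower() for tag in public_tags}
    curated = {term.strip().lower() for term in curator_terms}
    if not public:
        return 0.0
    return len(public - curated) / len(public)


# Invented example data for one image.
volunteer_tags = ["Woman", "blue dress", "sad", "portrait", "window"]
catalog_terms = ["Portrait", "oil on canvas"]
gap = share_missing(volunteer_tags, catalog_terms)
```

With this made-up data, four of the five volunteer tags are missing from the catalog terms. A real comparison would probably also need stemming or synonym matching, which this sketch ignores.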

[Relevant lectures]

5. CONCEPTS & CATEGORIES (9/15)
7. CONTROLLED NAMES AND VOCABULARIES (9/22)
14. SOCIAL / DISTRIBUTED CATEGORIZATION (10/15)
16. CONTENT MANAGEMENT (10/22)

Open Secrets

“Open Secrets: Enron, intelligence, and the perils of too much information” by Malcolm Gladwell in the January 8, 2007 New Yorker.

There are puzzles and there are mysteries. You solve puzzles by finding the missing information; with mysteries the problem is that you have too much information, and solving them requires analysis. In this article, Gladwell applies this paradigm from Gregory Treverton to Enron’s collapse (and also to the hunt for Osama bin Laden, Watergate, Nazi propaganda in WWII, and cancer). He disputes the widely accepted premise that Enron withheld information on its dubious practices. In reality Enron disclosed nearly everything and analysts failed to understand the sea of data.

Today’s problems force us to re-examine the human element of information processing. Though the amount of information is paralyzing and noise is high, “the complex, uncertain issues that the modern world throws at us require the mystery paradigm.”

This article fits best with the 9/3 lecture on issues and contexts.

Field Guides, Bigfoot, and the desire to grasp life

“… Guides do not, however, deceive their users into thinking that life is fully “knowable.” The National Science Foundation estimates that only two to 40 percent of the total species on Earth have even been identified. A true understanding of the planet’s biodiversity, then, remains elusive. Classify away and you’ve still only scratched the surface. This tension, between the desire to grasp life and its ultimate ungraspability, plays out on the pages of the field guide and in their use. (…)

Subsequent naming can offer only a fleeting illusion of knowability. Yet ephemerality does nothing to discourage identification; instead, it leaves us wanting more.”

How do we approach assessing the appropriate technology for a particular situation? Are video podcasts really necessary for birding? This article covers a lot of ground, from the history of field guides to some of the more philosophical questions about human information interaction and why we pursue study in this field.

Guiding Light by Jesse Smith, from The Smart Set from Drexel University, August 22, 2008.

This may touch on issues discussed in lectures:
5. CONCEPTS & CATEGORIES (9/15)
7. CONTROLLED NAMES AND VOCABULARIES (9/22)
8. CLASSIFICATION (9/24)
