Taxonomy of Philosophy

Weinberger links to this intriguing attempt to categorize philosophical papers for a system to “access online work in philosophy.”  

The best part is the discussion that follows David Chalmers’ blog post about the project, which sends me through a microcosm of the 202 course so far.  One commenter links to “An Essay towards a Real Character and a Philosophical Language” in which John Wilkins attempts to create a language where every word defines itself based on a hierarchy of 40 Genuses (each divided into Differences and then Species) of his design.  The Wikipedia article points me to Borges’ response, “The Analytical Language of John Wilkins”, where he casts doubt on such universal categorization schemes by comparison to The Celestial Emporium of Benevolent Knowledge.  Other commenters on the blog post point out similar problems: a separate set of categories for the history of philosophy seems strange since many of these papers are relevant to the philosophical topics themselves; there seem to be “multiple principles of division”.

One of the authors of the philosophy taxonomy responds with a return to pragmatism:

OK, it’s a pseudo-taxonomy, or maybe just a category scheme. We’re not doing science here, just trying to come up with something useful and convenient.

Excellent.  We all know that classification systems should be judged by their usefulness rather than how essential their representations of the world are.

Finally, the other author of the taxonomy argues for the values of faceted classification:

our system allows massive cross-classification both of papers and categories: any paper or category can be in multiple categories. This allows us to cut the pie in many ways at once, and we hope that people will generally be able to find what they are looking for following their intuitive way of cutting the pie (along periods, figures, views, points of disagreement, etc).

Though if he is attempting to cut the pie in many different ways at once, I would think he would want explicitly orthogonal classifications, rather than one enormous tree.
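To make the contrast with a single tree concrete, here is a minimal sketch of what explicitly orthogonal facets might look like. The facet names, paper ids, and helper functions are all invented for illustration; nothing here reflects how the actual philosophy taxonomy is stored.

```python
from collections import defaultdict

# Hypothetical papers, each described along three independent facets.
papers = {
    "paper_a": {"period": "early modern", "figure": "Descartes", "topic": "philosophy of mind"},
    "paper_b": {"period": "contemporary", "figure": "Chalmers", "topic": "philosophy of mind"},
    "paper_c": {"period": "early modern", "figure": "Hume", "topic": "epistemology"},
}

def build_index(papers):
    """Map each (facet, value) pair to the set of papers carrying it."""
    index = defaultdict(set)
    for paper_id, facets in papers.items():
        for facet, value in facets.items():
            index[(facet, value)].add(paper_id)
    return index

def find(index, **criteria):
    """Return the papers that match every facet=value criterion given."""
    sets = [index.get((facet, value), set()) for facet, value in criteria.items()]
    return set.intersection(*sets) if sets else set()

index = build_index(papers)
by_topic = find(index, topic="philosophy of mind")
by_topic_and_period = find(index, topic="philosophy of mind", period="early modern")
```

Because each facet is independent, "cutting the pie" a different way is just a different combination of criteria, and no paper has to live at one fixed spot in a tree.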

Weinberger Needs Statisticians

I’ve always wondered how Weinberger could get meaningful information out of his “huge pile”, and in his interview with Doctorow[1], Weinberger mentioned a way to make use of it: statistical analysis. This is what he said:

“Tags are chaos, and as you get more and more of them, it will get more and more chaotic.  It turns out that when you have a lot of them, the statistical analysis becomes really pretty precise.”

This reminds me of a paper I’ve previously read, “Towards Extracting Flickr Tag Semantics”, written by Yahoo! Research Berkeley and published at WWW2007[2]. The method described in the paper could identify “place tags” and “event tags” among the tags stored in Flickr. For instance, the authors could “detect that the tag Bay Bridge describes a place, and that the tag WWW2007 is an event.” (WWW2007 is a conference held in Canada in 2007.)

How did they do that? The main idea is that a “place tag” like Bay Bridge has significant spatial patterns, tending to concentrate within a certain geographic range, while an “event tag” like a conference has significant temporal patterns, tending to appear around a certain time period. So by using preexisting spatial and temporal statistical methods, computer scientists are able to discover the “semantics” of Flickr tags.
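The intuition can be sketched in a few lines. This is a deliberately crude stand-in, not the method from the paper: it just bins a tag's photos into fixed time windows and fixed latitude/longitude cells and checks whether most of them land in a single bin. The function names, bin sizes, threshold, and sample photos are all assumptions made up for illustration.

```python
import math
from collections import Counter

def peak_share(keys):
    """Fraction of items that fall into the single most common bin."""
    if not keys:
        return 0.0
    counts = Counter(keys)
    return max(counts.values()) / len(keys)

def classify_tag(photos, day_bin=7, degree_bin=0.1, threshold=0.8):
    """Label one tag from its photos, given as (day_number, latitude, longitude).

    Returns a list that may contain "place", "event", both, or neither.
    """
    time_bins = [math.floor(day / day_bin) for day, _, _ in photos]
    cells = [(math.floor(lat / degree_bin), math.floor(lon / degree_bin))
             for _, lat, lon in photos]
    labels = []
    if peak_share(cells) >= threshold:      # concentrated in space
        labels.append("place")
    if peak_share(time_bins) >= threshold:  # concentrated in time
        labels.append("event")
    return labels

# Made-up photos: same spot over many months vs. same week all over the world.
bridge_photos = [(0, 37.812, -122.361), (100, 37.815, -122.365),
                 (200, 37.818, -122.362), (300, 37.813, -122.368),
                 (400, 37.816, -122.364)]
event_photos = [(130, 51.17, -115.57), (131, 40.71, -74.00),
                (132, 35.68, 139.69), (131, 48.85, 2.35),
                (130, 37.77, -122.42)]

bridge_labels = classify_tag(bridge_photos)
event_labels = classify_tag(event_photos)
```

Fixed bins are a weak point of this sketch, since a real event or place can straddle a bin boundary. That is one reason to reach for proper spatial and temporal statistics instead.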

In all, statistical analysis can help Weinberger make use of the huge amount of information, and it may also serve as a “filter” to deal with information overload problems.



[1] Metacrap and Flickr Tags: An Interview with Cory Doctorow

[2] Towards Extracting Flickr Tag Semantics

“Genius” Feature Makes Music Miscellaneous

As many of you probably already know, last week Apple released iTunes 8. One of the most interesting features announced in this update is Genius playlist creation.  Select any song in your library and the Genius will create a playlist of songs in your own library that go well with it. So, if you’re in the mood for jazz, just select your favorite Ella Fitzgerald song, press the genius button and you’ll have a playlist of songs like it.

In my use of this feature, I’d say it works really well. It saves a lot of time and reintroduces me to music I already have but may not have listened to in a while.

Where this feature gets interesting is in how it relates to the material we’ve discussed in 202. The Genius works by first collecting and submitting (anonymously) all of your music’s metadata to Apple’s servers. There, these data are analyzed and compared to other users’ music metadata as well as the buying habits of iTunes music store customers, of which there are about 70 million. The algorithm that Apple uses to determine music matches has in effect made music miscellaneous. People buy music from the iTunes store, rip CDs, and tag their own music files anyway. This feature taps into these disparate cataloging systems collected from millions of users and creates something new from them. It ameliorates the problem of having to recall all the music you have in your library that might fit a particular mood. No music professionals required.
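Apple has not published how Genius actually works, so the following is only a guess at the general flavor: a toy co-occurrence recommender that counts how often two songs show up in the same library and ranks the songs you own by how often they appear alongside the seed. The function names, song ids, and libraries are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(libraries):
    """Count, for every pair of songs, how many libraries contain both."""
    counts = Counter()
    for library in libraries:
        for a, b in combinations(sorted(set(library)), 2):
            counts[(a, b)] += 1
    return counts

def genius_like_playlist(seed, my_library, counts, size=3):
    """Rank songs in my_library by how often they co-occur with the seed."""
    def score(song):
        return counts[tuple(sorted((seed, song)))]
    candidates = [s for s in set(my_library) if s != seed and score(s) > 0]
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(candidates, key=lambda s: (-score(s), s))[:size]

# Made-up libraries standing in for the metadata submitted by other users.
other_libraries = [
    ["ella_a", "billie_b", "miles_c"],
    ["ella_a", "billie_b"],
    ["ella_a", "metal_x"],
    ["metal_x", "metal_y"],
]
my_library = ["ella_a", "billie_b", "miles_c", "metal_y"]

counts = cooccurrence(other_libraries)
playlist = genius_like_playlist("ella_a", my_library, counts)
```

Nobody in this sketch ever says what genre a song belongs to. The grouping falls out of what people happen to keep together, which is the miscellaneous part.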

To me, this is a clear win for Weinberger.

Bringing the Third Order of Order to the Second Order

The title for this post is deliberately a mouthful, recalling the mess that is physical metadata. However, there is a company, Tikitag, that is seeking to apply some principles of digital metadata to the physical world by selling cheap RFID tags and readers. The goal is for users to stick these inexpensive unique digital identifiers on their physical belongings, and to manage the information about them on their computers. RFID tags are being used increasingly in the retail space for inventory management, but until now their use in consumer applications has been fairly limited.

Engadget has a writeup of their coverage at DemoFall.

Of Bits and Atoms in Long Time

news article from SFGate: Etched language data will last for 2000 years

In ‘Everything Is Miscellaneous’, David Weinberger argues the benefits of Bits over Atoms. He points out that bits can be duplicated without effort and that they can be re-organized instantaneously to conform to our immediate contexts.

Apart from pointing to this inherent mutability of bits, he refers to the fact that digital information is easier to preserve. This is, in a certain marginal sense, contradicted by a very interesting project to preserve a snapshot of all human languages. The Rosetta Project, run by The Long Now Foundation and The National Science Foundation, is a collaborative effort to create a record of all human languages from 02000 to 12000.

This near-permanent record, apart from being an online archive, is a physical artifact designed for long-term storage. The design of their Rosetta Stone is very intriguing. It is not digital, and is designed to be read without the need for any specific devices. The Stone is human readable, and has inscribed on one side, in many languages – “Languages of the World: This is an archive of over 1,500 human languages assembled in the year 02008 C.E. Magnify 1,000 times to find over 13,000 pages of language documentation.”

It is interesting to note how the design considerations and concerns change when designing for long time. Kevin Kelly, a participant designer (and founder of Wired), points to some of the issues on his blog. Good quality, properly stored paper can easily last for 2000 years. Moreover, we can be quite sure that it will be understandable and accessible after that time. On the other hand:

“Pages stored on plastic DVDs are neither stable over the very long term, nor readable over the long term. Unless digital information is ceaselessly migrated from one fading medium to another new one, it will quickly cease to be accessible. Two decades ago the floppy disk was ubiquitous. Most personal digital information then was stored on this format. Today, any information stored only on a floppy disk is essentially gone.  Imagine the incompatibility of today’s DVD in 1,000 years.”

Mentioned by Vannevar Bush in his 1945 paper, the most appropriate technology for long-term storage is still micro-etched film. The composition (and cost) of the film obviously changes based on how important the data is. The Long Now Rosetta Stone contains 350,000 pages of text written in 1000 languages. The pages, designed by linguists, are inscribed onto a nickel cast of a micro-etched silicon mold. It comes with a built-in magnifying glass. They came up with this design as part of the discussions on Managing Time Continuity. Following the archiving principle of LOCKS (Lots of copies, Keep ’em Safe), the project is distributing copies of the disk globally. They have also placed one in space, on the Rosetta Space Probe, which will sometime in the future land on a comet, leaving the disk to circle around the sun for many, many years to come.

Obviously this is not the first attempt to preserve human language and culture for posterity. Other interesting projects include the Crypt of Civilization, placed in an underground bunker in Atlanta, Georgia; the Westinghouse Time Capsules; and the popular (but short-lived) Yahoo! Time Capsule designed by Jonathan Harris.


