Archive for Assignment 1

The Proto-Web

http://nytimes.com/2008/06/17/science/17mund.html

Paul Otlet arguably came up with the first iteration of the World Wide Web in 1934: the Mundaneum. Located in Belgium, it consisted of millions of 3-by-5 index cards stored in thousands of boxes. It never fully took off because of funding problems and the German invasion of Belgium during World War II. The article doesn’t describe in full detail how the system was implemented, but one thing that stands out is its variant of hypertext, which differs from standard hypertext in that a link can indicate whether or not other links are similar to itself. Professor Buckland describes the Semantic Web as being “rather Otlet-ish.” The short-lived Mundaneum could hold a lesson for the Semantic Web: whether it fails may depend on the way it is structured and the labor it requires.

Relevant Lecture: The Semantic Web 

T-Shirt search engines tag and help you find shirt designs

http://www.techcrunch.com/2008/09/01/the-vaynerchucks-launch-t-shirt-search-engine-pleasedressme/

Sounds silly, right?  Why would you need an entire search engine devoted to t-shirts?  However, clothing falls into that category of items that are plentiful and searchable online, yet difficult to search by meaningful (visual) characteristics.

This new search engine, PleaseDressMe, was recently launched.  It searches some of the top t-shirt websites and tags their shirts with useful keywords, some general and some esoteric, such as “sarcasm”, “politics”, or “typography”.  Clothing is a good example of something that is easy to find when you’re not looking for it, but much more difficult to search precisely by concept, or by characteristics like fabric or sleeve length.  This search engine aims to make it easier for people to find comprehensive results for the kinds of t-shirts they’re looking for without having to visit sites individually or wade through pages of non-product Google search results.

The TechCrunch article also has several comments pointing to Teenormous, a similar search engine with many more shirts indexed.

Lecture: CONTROLLED NAMES AND VOCABULARIES

One Picture, 1,000 Tags

One Picture, 1000 tags (The New York Times, March 28, 2007)

If you try to find paintings on a museum’s web site, you will probably fail unless you know the title or artist. To increase accessibility, a dozen museums, including the Metropolitan Museum of Art, are redesigning their sites to encourage the public to annotate their collections with descriptive tags.

However, this tagging application could cause a huge semantic gap between the public and curators. For example, the Metropolitan Museum of Art ran a test in which volunteers supplied keywords for 30 images of paintings and sculptures. When the tags were compared with the museum’s curatorial catalog, more than 80 percent of the terms were not in the museum’s documentation. Ironically, because art professionals can find it difficult to describe the visual elements of a picture and there is no taxonomic system for doing so, museums want the public to contribute many tags for each image. Tags, from the obvious to the personal, can also be used to proclaim a personal connection with a work of art. These ‘collective intelligence’ projects might bring the collection alive.
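The comparison between public tags and the curatorial catalog can be sketched as a simple set difference. This is only an illustration: the function name `tag_gap` and the sample tags are made up for this example and are not taken from the museum’s test.

```python
def tag_gap(public_tags, catalog_terms):
    """Return the public tags missing from the catalog and their share of all public tags."""
    # Normalize case and whitespace so "Horse" and "horse " count as the same term.
    public = {t.strip().lower() for t in public_tags}
    catalog = {t.strip().lower() for t in catalog_terms}
    missing = public - catalog
    share = len(missing) / len(public) if public else 0.0
    return missing, share


# Hypothetical sample data, for illustration only.
public_tags = ["Horse", "sunset", "lonely", "oil on canvas", "blue "]
catalog_terms = ["oil on canvas", "landscape", "horse"]

missing, share = tag_gap(public_tags, catalog_terms)
print(sorted(missing), share)
```

A real comparison would probably also need stemming or synonym matching, since “horses” and “horse” would otherwise count as different terms.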

[Relevant lectures]

5. CONCEPTS & CATEGORIES (9/15)
7. CONTROLLED NAMES AND VOCABULARIES (9/22)
14. SOCIAL / DISTRIBUTED CATEGORIZATION (10/15)
16. CONTENT MANAGEMENT (10/22)

Open Secrets

“Open Secrets: Enron, intelligence, and the perils of too much information” by Malcolm Gladwell in the January 8, 2007 New Yorker.

There are puzzles and there are mysteries. You solve puzzles by finding the missing information; with mysteries the problem is that you have too much information, and solving them requires analysis. In this article, Gladwell applies this paradigm from Gregory Treverton to Enron’s collapse (and also to the hunt for Osama bin Laden, Watergate, Nazi propaganda in WWII, and cancer). He disputes the widely accepted premise that Enron withheld information on its dubious practices. In reality Enron disclosed nearly everything and analysts failed to understand the sea of data.

Today’s problems force us to re-examine the human element of information processing. Though the amount of information is paralyzing and noise is high, “the complex, uncertain issues that the modern world throws at us require the mystery paradigm.”

This article fits best with the 9/3 lecture on issues and contexts.

NTT, BayTSP Begin Joint Field Trial of NTT’s Robust Media Search Technology on BayTSP’s Content Authentication Platform

Reuters Mon Apr 21, 2008 10:00pm EDT
http://www.reuters.com/article/pressRelease/idUS28851+22-Apr-2008+BW20080422

NTT’s content recognition engine will be deployed in the U.S. for the first time combined with BayTSP’s Content Authentication Platform to enable content owners to monitor and manage how their intellectual property is used online.

Together, the two companies’ technologies will let content owners apply proven video and audio fingerprinting, primarily on user-generated content sites like YouTube, Daily Motion, Google Video and Yahoo Video.

NTT has been researching and developing media search based on proprietary audio and video fingerprinting technologies since 1996.  The newly announced field trial in collaboration with BayTSP is the first application of NTT’s most advanced third generation robust media search technology to Internet content authentication applications on a large scale, and the first deployment of NTT’s Robust Media Search systems in the United States.

Relevant lectures:

8. CLASSIFICATION
16. CONTENT MANAGEMENT
18. METADATA FOR MULTIMEDIA
27. MULTIMEDIA IR

Using Text Search Ideas to Speed Up Image Search

Microsoft Research news: Text-Search Tricks Speak Volumes in Image Search, May 2007

Finding similar images on the Web used to carry a prohibitively high computational cost. Now, however, researchers use text-search ideas to make content-based image search commercially feasible. For each image, a set of features is detected, and each feature is represented by a vector describing characteristics such as orientation and intensity. In this way, each image resembles a document in text search, and each vector resembles a token. A vocabulary is generated from the millions of tokens gathered, and an inverted index can then be built over it. As a result, the time to find a similar image on the Web falls to around 0.1 seconds.
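The idea of treating images as documents and feature vectors as tokens can be sketched roughly as follows. This is a toy illustration, not the researchers’ implementation: the four-entry `VOCABULARY`, the two-dimensional features, and the image names are all assumptions made up for the example, and a real system would learn a vocabulary from millions of high-dimensional features.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical toy "visual vocabulary": each entry is a cluster center in feature space.
VOCABULARY = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(feature):
    """Map a feature vector to the index of its nearest vocabulary entry (its 'visual word')."""
    return min(range(len(VOCABULARY)), key=lambda i: dist(feature, VOCABULARY[i]))


def build_index(images):
    """Build an inverted index: visual word -> set of ids of images containing that word."""
    index = defaultdict(set)
    for image_id, features in images.items():
        for feature in features:
            index[quantize(feature)].add(image_id)
    return index


def search(index, query_features):
    """Rank indexed images by how many distinct visual words they share with the query."""
    votes = Counter()
    for word in {quantize(f) for f in query_features}:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()


# Made-up example images, each described by a couple of 2-D feature vectors.
images = {
    "tower": [(0.1, 0.1), (0.9, 0.1)],
    "bridge": [(0.1, 0.9), (0.9, 0.9)],
}
index = build_index(images)
results = search(index, [(0.05, 0.0), (1.0, 0.2), (0.1, 1.0)])
print(results)
```

The speed-up comes from the lookup step: a query only touches the images listed under its own visual words instead of being compared against every image in the collection.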

A project called “Photo2search in Beijing” has turned this idea into reality. Geo-tagged street view images with longitude and latitude are crawled from photo-sharing websites like Flickr and put into an inverted index. Imagine you get lost in Beijing. Just pull out your camera phone, shoot a photo of your surroundings, and send it to the system. It matches your photo to the most similar geo-tagged street view images and returns a digital map with your position marked on it.

Relevant lecture:  

23. VECTOR MODELS (11/17)

27. MULTIMEDIA IR (12/1)

Fixing Broken Ballots

How Design Can Save Democracy
The New York Times, August 25, 2008.

Interactive Feature: Problems/Solutions in Ballot Design

In recent years, there has been controversy about election ballot designs that cause confusion for both voters and vote-counters. (Remember butterfly ballots?) Unfortunately, voting technology and ballot design are not standardized or consistent, and vary wildly across the country. Even setting aside the separate issue of electronic voting security, there are still many problems with ballots that use confusing language and layout and have small print that is difficult to read. These are especially problematic for people with visual impairments or those whose first language is not English.

Fortunately, the United States Election Assistance Commission created ballot design guidelines earlier this year. Following a guide to improve clarity in both language and design should reduce voter confusion, and will hopefully reduce problems of vote accuracy.

Local governments often have very limited funding, and it’s challenging to design forms that are clear to the hugely diverse population of “Americans 18 and older.” However, it seems to me that this is a case where budgeting for some extra thought and effort in the initial design can prevent many problems and their related costs later.

Relates to lectures:
3. Organization {and, or, vs} Retrieval
7. Controlled names and vocabularies
12. Enterprise/institutional categorization & standards
19. Information organization in user interfaces

Employee Social Networking

A Case Study in Employee Social Networking at Sabre

http://www.socialcomputingmagazine.com/viewcolumn.cfm?colid=601

With the explosive popularity of social networking sites in the last few years, business analysts have been scrambling to find a way to incorporate employee networking into their companies. The task of improving efficiency of communication and building corporate culture for large companies with thousands of employees stretched across the world might be best achieved with these emerging platforms.

In “A Case Study in Employee Social Networking at Sabre”, Toby Ward, Founder and CEO of Prescient Digital Media, documents some of the impacts a strong employee social network has made on the airline reservation company Sabre. He notes that while email is still the dominant application for company communication, more value can be delivered when a single employee can communicate “both actively and passively” with all connected employees. “SabreTown”, Sabre’s employee networking platform, offers most of the features any social network platform does: employee profiles, photo sharing, blogs, comments, etc.

SabreTown and other platforms might just be more than another excuse to ride the Web 2.0 and social networking wave. As users complete their profiles; write, comment on and edit blogs; and ask and answer questions, the platform engine compiles and categorizes relevant information in order to improve employee search and help “members find the right people with the right answers.” Sounds a lot like Google’s quest to display the exact result the user wants at the number one spot by collecting as much data about the user as possible.

It seems likely that these emerging platforms will become increasingly useful in industrial and public service domains. When I was teaching, I had my students complete MySpace profiles for characters from Romeo and Juliet. They had to fill out each profile according to the specific details of the character, as well as comment on and send messages to other characters. As oft-nebulous Shakespeare characters began to have personalities they could relate to, my students became more engaged and enjoyed reading the play much more.

14. SOCIAL / DISTRIBUTED CATEGORIZATION (10/15)

of bits and atoms in long time

news article from SFGate: Etched language data will last for 2000 years

In ‘Everything Is Miscellaneous’, David Weinberger argues the benefits of Bits over Atoms. He points out that bits can be duplicated without effort and that they can be re-organized instantaneously to conform to our immediate contexts.

Apart from pointing at this inherent mutability of bits, he refers to the fact that digital information is easier to preserve. This is, in a certain marginal sense, contradicted by a very interesting project to preserve a snapshot of all human languages. The Rosetta Project, run by The Long Now Foundation and The National Science Foundation, is a collaborative effort to create a record of all human languages from 02000 to 12000.

This near-permanent record, apart from being an online archive, is a physical artifact designed for long-term storage. The design of their Rosetta Stone is very intriguing. It is not digital, and is designed to be read without the need for any specific devices. The Stone is human readable, and has inscribed on one side, in many languages: “Languages of the World: This is an archive of over 1,500 human languages assembled in the year 02008 C.E. Magnify 1,000 times to find over 13,000 pages of language documentation.”

It is interesting to note how the design considerations and concerns change when designing for long time. Kevin Kelly, a participant designer (and founder of Wired), points to some of the issues on his blog. Good-quality, properly stored paper can easily last for 2000 years. Moreover, we can be quite sure that it will be understandable and accessible after that time. On the other hand, “Pages stored on plastic DVDs are neither stable over the very long term, nor readable over the long term. Unless digital information is ceaselessly migrated from one fading medium to another new one, it will quickly cease to be accessible. Two decades ago the floppy disk was ubiquitous. Most personal digital information then was stored on this format. Today, any information stored only on a floppy disk is essentially gone.  Imagine the incompatibility of today’s DVD in 1,000 years.”

Mentioned by Vannevar Bush in his 1945 paper, micro-etched film is still the most appropriate technology for long-term storage. The composition (and cost) of the film obviously changes based on how important the data is. The Long Now Rosetta Stone contains 350,000 pages of text written in 1000 languages; the pages, designed by linguists, are inscribed onto a nickel cast of a micro-etched silicon mold. It comes with a built-in magnifying glass. They came up with this design as part of the discussions on Managing Time Continuity. Following the archiving principle of LOCKSS (Lots of Copies Keep Stuff Safe), the project is distributing copies of the disk globally. They have also placed one in space, on the Rosetta Space Probe, which will sometime in the future land on a comet, leaving the disk to circle around the sun for many, many years to come.

Obviously this is not the first attempt to preserve human language and culture for posterity. Other interesting projects include the Crypt of Civilization, placed in an underground bunker in Atlanta, Georgia; the Westinghouse Time Capsules; and the popular (but short-lived) Yahoo! Time Capsule designed by Jonathan Harris.

Relevant Lectures: ISSUES AND CONTEXTS (9/3), ORGANIZATION {AND,OR,VS} RETRIEVAL (9/8), INFORMATION INTEGRATION & INTEROPERABILITY (10/6)

Fish Debates and Wine Dilemmas

For a new word to make it into the dictionary, Adam Gorlick [1] argues that it has to be more than a “flash-in-the-pan” fad. The word needs staying power to get in. Many word inclusions result from new classifications that emerge as a consequence of globalization and reflect changing usage. For instance, in July 2008, the word “Prosecco” was included in Merriam-Webster’s Collegiate dictionary to mean “sparkling Italian wine”. While this was encouraged by marketers seeking to promote it, the inclusion has not been without controversy. Many argue that the inclusion of the word as a synonym for sparkling wine would create confusion about its identity. How is Prosecco different from Champagne except for the fact that it is from Conegliano Valdobbiadene in Italy?

Another popular debate involves the inclusion of the word “Pescatarian” to mean a “vegetarian who eats fish”. In what is referred to as the “Great Fish Debate”, many argue that the definition is an oxymoron, one that challenges the notions of a simple vegetarian like myself who has all her life known a vegetarian diet to not involve any kind of animal flesh! And now they tell you that you can eat fish and still be a vegetarian. It will not be long before classifications like lacto-vegetarianism, ovo-vegetarianism, and asian-vegetarianism make their way into the dictionary.

Against this background, it becomes reasonable to ask: what is the impact of such new classifications on everyday language use? Do they serve as new dimensions to an existing concept and its beliefs, or do they simply add ambiguity to it? What is the basis of such classifications and how do they evolve with new discoveries? Does the etymology of these terms reflect changing perceptions of parent terms? For example, does the emergence of Pescatarian reflect openness among vegetarians to expand food options, or is it simply a market-driven branding technique for fish lovers?

[1] http://www.boston.com/news/local/massachusetts/articles/2006/07/05/mouse_potato_needing_bling_check_merriam_websters_new_entries/

[2] http://www.blogdolcevita.com/post/633/italian-wine-the-word-prosecco-included-in-merriam-websters-collegiate-dictionary

[3] http://blogs.citypages.com/food/2008/07/mirriam_webster.phps

Related to Lectures:

3. Information Organization and Retrieval

5. Information Categories

8. Classification
