Archive for October, 2008

Destination Search for Informational Queries

IR queries generally fall into two categories: navigational queries, which seek a specific website or home page, and informational queries, which seek general information (Manning’s IR book, Chapter 19). Destination Search, proposed by researchers from Microsoft Research, targets the latter.

 

Their observation is that for an informational query, users typically browse a trail of web pages before reaching a final page that satisfies them, at which point they stop browsing. This final web page is called the “destination”. The browsing trail is 4.8 web pages long on average. The aim of the research is to shorten the trail by bringing up the “destination” page immediately.

 

Unlike traditional IR, which matches a query against documents, Destination Search matches a new query against previous queries. A Destination Search engine maintains a dictionary of “query-destination” pairs, based on search logs collected from online users. When a new query arrives, it is matched against the previous queries in the “query-destination” dictionary, and the destinations of similar previous queries are returned as the query result. A user study also shows greater satisfaction among information seekers with this new IR system.
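To make the lookup concrete, here is a minimal Python sketch of a query-destination dictionary. It is only an illustration: the post does not say how the paper measures query similarity, so the token-overlap (Jaccard) score, the `DestinationIndex` class, the threshold, and the example URLs are all my own assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def tokenize(query):
    """Lowercase a query and split it into a set of words."""
    return set(query.lower().split())


def jaccard(a, b):
    """Token overlap between two word sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class DestinationIndex:
    """Hypothetical query-destination dictionary built from search logs."""

    def __init__(self):
        # Maps the word set of a past query to counts of where its trails ended.
        self._destinations = defaultdict(Counter)

    def add(self, query, destination):
        """Record one logged trail: the query and the page where browsing stopped."""
        self._destinations[frozenset(tokenize(query))][destination] += 1

    def search(self, query, min_similarity=0.5, top_k=3):
        """Return destinations of similar past queries, best first."""
        words = tokenize(query)
        scores = Counter()
        for past_words, destinations in self._destinations.items():
            similarity = jaccard(words, past_words)
            if similarity >= min_similarity:
                for destination, count in destinations.items():
                    # Weight each destination by similarity and popularity.
                    scores[destination] += similarity * count
        return [destination for destination, _ in scores.most_common(top_k)]
```

A real system would need a much better similarity measure and far larger logs, but the shape is the same: the index is keyed by past queries, not by documents.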

 

The idea comes from an ACM SIGIR ’07 paper titled “Studying the Use of Popular Destinations to Enhance Web Search Interaction”, http://research.microsoft.com/~mbilenko/papers/07-sigir.pdf


Controlling vocabularies with paper barcodes

This is something that came up in my GIS (Geographic Information Systems) class – perhaps not a great breakthrough, but as a technique for controlling vocabularies, I thought it was pretty neat.

One common way to develop a GIS database is to take little GPS handhelds out in the field, go up to the features you want to map, record their locations, and input the attributes you care about (for example, locate a tree and input its height and species). There are a couple of important interface issues here: one is that you have to input complex data with the limited interface of a handheld device, and the other is that you often want to use a complex controlled vocabulary for the data you input (e.g. a list of tree species, delineated categories of tree heights, etc.).

As it turns out, some of these handhelds have an integrated barcode reader. So you define your vocabularies, then print them out into a paper catalog of terms, each with a barcode, and when you’re standing next to your tree all you need to do is look up the terms in your paper catalog and scan them with your barcode reader. I thought this was a pretty elegant solution to the problem, and it addresses what I often see as the most important part of producing structured data: finding ways to make it easier for content creators to produce clean data than messy data.
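The catalog idea can be sketched in a few lines of Python. This is a rough illustration, not how any particular handheld works: the code format (`SPECIES-001` and so on), the function names, and the sample terms are all made up for the example.

```python
def build_catalog(vocabularies):
    """Assign each controlled term a short code that could be printed as a barcode.

    `vocabularies` maps a field name to its list of allowed terms.
    The FIELD-NNN code format is an assumption made for this sketch.
    """
    catalog = {}
    for field, terms in vocabularies.items():
        for number, term in enumerate(terms, start=1):
            code = f"{field.upper()}-{number:03d}"
            catalog[code] = (field, term)
    return catalog


def record_feature(catalog, location, scanned_codes):
    """Turn a GPS location plus scanned barcodes into one clean record."""
    record = {"location": location}
    for code in scanned_codes:
        if code not in catalog:
            # Anything outside the printed catalog is rejected, which is
            # what keeps the vocabulary controlled.
            raise ValueError(f"unknown barcode: {code}")
        field, term = catalog[code]
        record[field] = term
    return record
```

Because the person in the field can only scan codes that exist in the catalog, misspellings and ad hoc categories never reach the database.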


games with a purpose – computers can guess your gender

I am sure that almost everybody has already heard of the work by Prof. Luis von Ahn at CMU. His game ESP (for extrasensory perception) has been widely covered by all kinds of media. The game was licensed by Google to create (the not as good) Google Image Labeler.

Luis von Ahn was already quite famous for his work on steganography when he came up with CAPTCHAs to help solve the spam problem. And as if that was not enough, he has been coming up with amazing games that help computers become smarter than ever.

We play, to help computers learn. I like the idea. Check out some games at GWAP.com

Interesting article on Luis von Ahn by Wired


PHP Namespace Delimiter Controversy: “set sail for fail”

PHP is a popular and often maligned scripting language, which evolved originally as a means for creating dynamic web pages. The software that runs the iSchool website, “Drupal,” is written in PHP. One of the biggest criticisms of PHP is the fact that it has thousands of built-in functions, many of which share the same “namespace.” If you recall from our XML lectures, a namespace allows one to use the same element names from different XML vocabularies in the same document, as long as they are referenced in the context of their namespace. For example, two elements called “name” can be used in the same XML document, if each belongs to a different namespace:

<book:name>Some Book Title</book:name>

<store:name>Some Store Title</store:name>
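To see the two “name” elements being told apart in practice, here is a small Python sketch using the standard library’s `xml.etree.ElementTree`. Note that the fragment above is not a complete document: each prefix has to be bound to a namespace URI before a parser will accept it. The `catalog` wrapper element and the `example.org` URIs below are placeholders I added for the illustration.

```python
import xml.etree.ElementTree as ET

# The prefixes "book" and "store" must be declared; the URIs are placeholders.
DOCUMENT = """
<catalog xmlns:book="http://example.org/book"
         xmlns:store="http://example.org/store">
  <book:name>Some Book Title</book:name>
  <store:name>Some Store Title</store:name>
</catalog>
"""

# Prefix-to-URI map used when searching; it mirrors the declarations above.
NAMESPACES = {
    "book": "http://example.org/book",
    "store": "http://example.org/store",
}

root = ET.fromstring(DOCUMENT)

# Both elements are called "name", but each lookup is scoped to one namespace.
book_name = root.find("book:name", NAMESPACES).text
store_name = root.find("store:name", NAMESPACES).text
```

Internally the parser stores each tag together with its URI (for example `{http://example.org/book}name`), which is why the two elements never collide.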

Some PHP function names get around this problem by having long pseudo-namespace identifiers, such as “preg_match_all” (performs a global regex match). Unfortunately, there is not a lot of standardization between these names, meaning coders must deal with time-consuming inconsistencies. PHP developers recently decided that real namespaces are an important feature, but were not in agreement as to how to implement them. XML, as you know, uses the colon, “:”, to facilitate namespaces within a tag. Other languages, like Python and Java, use a dot, “.”, to define which classes are imported from a particular package’s namespace. However, PHP developers have decided to define a namespace using the backslash, “\”, which has generated quite a bit of criticism. One major concern is that the backslash is also used as an escape character for special characters, such as “Tab,” which can be represented as “\t.” So, will there be cases in PHP code where a function like “MyPackage\transform” will be misread as “MyPackage[TAB]ransform”? Also, the backslash is not in the same place on different keyboard layouts. Is this a choice that is biased toward certain users? One blogger has referred to the backslash choice as a decision that will cause PHP to “set sail for fail.”
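The escape concern is easy to reproduce in any language whose string literals treat the backslash as an escape character. The sketch below uses Python as a stand-in, not PHP, so it only shows the general mechanism; my understanding is that PHP’s double-quoted strings process “\t” the same way while its single-quoted strings do not, so the risk would mostly arise when a namespaced name is written inside a double-quoted string. The `describe` helper is made up for the example.

```python
# Python string literals, used here only as a stand-in for illustration.
plain = "MyPackage\transform"      # "\t" is read as a single tab character
doubled = "MyPackage\\transform"   # escaping the backslash keeps it literal
raw = r"MyPackage\transform"       # a raw string leaves backslashes alone


def describe(text):
    """Make tab characters visible so the difference can be seen."""
    return text.replace("\t", "[TAB]")


print(describe(plain))    # the backslash and the "t" have become a tab
print(describe(doubled))  # the name survives intact
print(describe(raw))      # same result as the doubled backslash
```

The first literal silently loses both the backslash and the “t”, which is exactly the “MyPackage[TAB]ransform” misreading described above.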

Like the standardization negotiations we discussed in class, the PHP namespace implementation came about after quite a lot of discussion. However, based on the controversy, the discussion probably could have used a little more time and a few more voices before a final decision was made. Today, Slashdot had a link to the long IRC chat log where the choice was finalized, and here is a link to the pros and cons of namespace design on php.net.


Social Semantic AI Twiney Thingy

I read about Twine in Technology Review (and then saw Shawna just posted on it). Well, the TR review is interesting in itself. Twine even seems to include a 202 kitchen sink: Autotagging, Autosummary, Bookmarking, Sharing, AI, Concept extraction, NLP, Semantic Web, etc., but they report bugs, too.


In continuation with Nick’s very valuable info on ‘NY Times tags API’

http://open.nytimes.com/2007/10/23/messing-around-with-metadata/

Jacob Harris highlights the importance of metadata in the news industry. And they have been using it since 1851, phew!!

On a different note the following excerpt (from this article) touches upon the ‘automation vs manual’ tradeoff discussed in today’s class. 

“Still my snarky aside has truth to it: people are ultimately controlling the process. In the beginning, rules for the automatic extraction and tagging are set by an Information Architect. In the end, final approval and correction of suggested metadata is done by various Web producers before publication. Web producers also do the important job of accurately summarizing the story. So, while we have machines to help out the process, it’s still ultimately a human endeavor, largely because automated summarization and classification has its problems.”


Twine

Probably a lot of you have heard of Twine, but it just came out of private beta and is now open to the public.  A social network built on the semantic web… hmm…


NYTimes TimesTags API

The New York Times has created an API against their “taxonomy and controlled vocabulary used by Times indexers since 1851”. Send their API a word and the NYTimes will send back a list of the most common relevant tags (and whether it’s a Person, Description, Organization or Location).

Why create our own structured vocabulary when highly trained people have been doing it since 1851 and we can borrow theirs?


Personal Financial Information Manager

From our discussion today, it seems that two of the key features of an effective PIM system are ease of use and transparency. You have to be able to just set it up and let it do its thing. For a while now I’ve been using such a system, called Mint, to manage my finances.

The way it works is that you give Mint the usernames and passwords of the online financial systems you use: bank accounts, credit cards, investments, loans, etc. From then on, the system will automatically download all the financial information available from the various sites and collect it into a single view. It also automatically categorizes all transactions so you can, say, compare spending at restaurants from one month to the next. You can always adjust these categories yourself, but Mint does a pretty good job by itself.

Of course, this system is not tied in with other elements of my life like, say, the pictures I’ve taken, or the phone call I made to set up a meeting at a particular restaurant whose bill is unusually high, etc. But as a system for managing the disparate financial systems in our lives, it’s great.

I highly recommend it!

 


Cooliris – Visualization of searching media

I found a fascinating plug-in for Firefox: Cooliris. (Maybe some of you already know it.)

It visualizes multimedia content returned by search engines like Google. I think it could be a good example of the importance of information presentation.

While Cooliris basically shows the same data as Google does, it gives a much better experience and navigation. Let me show some screenshots that I took myself.

The picture above is a normal view of the result of Google search with keyword “ontology.” Then how about Cooliris?

Cooliris provides a full 3D stack of image search results. I think this kind of presentation would really enhance the user experience of searching, although it just shows the same things.

[Added later]

I think that I described this poorly. As Nick explained in a comment, I captured just one frame of a dynamically moving scene. I can navigate the collection by dragging the mouse and enlarge what I want to see in detail by clicking on the picture.

