Search, Facet, and Filtering Examples

Konigi is a User Experience Design site that features interesting interfaces, with a handful of features on searches, filtering, and faceted navigation.

A couple of sites they’ve featured:, my favorite travel search interface

FanSnap, an event ticket site

Also, Cookstr is a recipe site that has a ton of interesting facets once you search or click a category: cuisine, cost, dietary considerations, kid friendly, holiday… Much cleaner and easier to use than other recipe sites I’ve played with.

Doesn’t it just make you happy when a company gets search right?

Comments off

What to do with the Nasties

BBC News is reporting that YouTube has removed some videos from its site that it judged to glorify the Columbine school shooters, which left me wondering what one does when one expunges “undesirable” data from a collection. Assuming the expunging is justified, do you keep the reference information so you have a record of having had the thing around (and thereby make yourself better able to detect its reappearance)? Do you expunge the thing from the entire database? It seems good general practice to have a place where one can keep old records that no longer point to something retrievable. Would it be wise to allow people to search and find that an item had been intentionally removed, to save them the trouble of searching and searching for it? Or would it be ethically questionable to have even just the record available, since it could give people the idea to seek it elsewhere or create copycat works? I’m guessing the videos will appear elsewhere on the net, and there is little anyone can do to keep them out of public view, but keeping them off popular sites could effectively marginalize them. I’m thinking the benefit of keeping something truly nasty beyond the view of the “tell me something about …” searcher outweighs the benefit of explaining the removal to the “I want this exact document” searcher.
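One way to picture the "keep the record, drop the thing" option is a tombstone table. The sketch below is only an illustration under my own assumptions: the `Catalog` class, its method names, and the use of a SHA-256 fingerprint are all hypothetical, and nothing here reflects how YouTube actually handles removals. An exact hash also only catches byte-identical re-uploads, so a re-encoded copy would slip through.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical collection that keeps a tombstone for every expunged item."""
    items: dict = field(default_factory=dict)       # item id -> content bytes
    tombstones: dict = field(default_factory=dict)  # content fingerprint -> removal reason

    @staticmethod
    def fingerprint(content: bytes) -> str:
        # Assumption: an exact SHA-256 digest stands in for "reference information".
        return hashlib.sha256(content).hexdigest()

    def add(self, item_id: str, content: bytes) -> bool:
        """Store the item unless its fingerprint matches a tombstone."""
        if self.fingerprint(content) in self.tombstones:
            return False
        self.items[item_id] = content
        return True

    def expunge(self, item_id: str, reason: str) -> None:
        """Delete the content itself but remember that it existed and why it went."""
        content = self.items.pop(item_id)
        self.tombstones[self.fingerprint(content)] = reason

    def removal_reason(self, content: bytes):
        """Return the recorded reason if this exact content was removed, else None."""
        return self.tombstones.get(self.fingerprint(content))


catalog = Catalog()
catalog.add("vid-1", b"offending video bytes")
catalog.expunge("vid-1", "violates content policy")
reupload_accepted = catalog.add("vid-2", b"offending video bytes")
other_accepted = catalog.add("vid-3", b"unrelated video bytes")
```

Whether `removal_reason` should be exposed to searchers at all is exactly the ethical question above; the sketch only shows that the record can outlive the content.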

Comments (1)

Search + Social Networking

A search engine that lets users benefit from their social network to improve search results:

From the about us page:

What is Delver?

Delver is an intelligent social search engine that enables you to find, experience and benefit from the wealth of information created and referenced by your social world. Our mission is to empower you to easily discover and benefit from the collective wisdom of your social world. Your circle of friends and extended network are increasingly creating and sharing useful information and media online through: blogs, videos, reviews, articles, websites, music… and the list is only growing. By indexing all that shared knowledge, media, opinions, and activities, we can deliver search results that are truly relevant to you.

Comments off

Semantic Image Retrieval

This may already be old news for regular readers of, but in case you missed it, here’s another search engine.

Pixolu is a semantic image search engine that lets users refine a search by selecting the images that best represent their query. I tried it for some queries and it seems to do a good job, factoring in color, object shapes, size and density in images.

The two-step search-and-refine process is very interesting and represents a more natural way of gathering information. Pixolu, a more 202-ish search engine, pays attention to recent (and older) research in information gathering and foraging.

Comments off

Search Flickr by Color

Searching for all the photos on Flickr that are tagged “red” is old hat. Besides, searching for colors in tags is fraught with problems: people don’t have the patience to tag their photos exhaustively with all the colors in them, people may not be able to distinguish all the colors in a photo, and worse, they may be “wrong” about the colors. After all, your red is my pink. (If you want to get philosophical, check out the inverted spectrum problem, though this doesn’t pose a problem for Flickr tagging.)

An obvious approach is to tag photos with all their colors algorithmically. We can scan photos for colors and tag any picture with lots of #ff0000 “red”. Users who search for red will retrieve these results. This approach would be consistent, but it is still open to the problem of disagreement about colors–someone still has to define red in the computation. In terms from a recent 202 lecture, a semantic gap remains between the photo and the metadata used to describe (and consequently retrieve) it.
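Here is a minimal sketch of that algorithmic tagging, assuming a photo is just a list of (r, g, b) pixels. The palette, the RGB value picked for each color name, and the 25% threshold are all my own assumptions, which is precisely the "someone still has to define red" problem.

```python
from collections import Counter

# Hypothetical reference palette: the RGB value chosen for each name is an assumption.
PALETTE = {
    "red": (255, 0, 0),
    "pink": (255, 192, 203),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_color(pixel):
    """Return the palette name closest to an (r, g, b) pixel by squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[name])),
    )


def color_tags(pixels, min_share=0.25):
    """Tag a photo with every palette color that covers at least min_share of its pixels."""
    counts = Counter(nearest_color(p) for p in pixels)
    total = len(pixels)
    return sorted(name for name, n in counts.items() if n / total >= min_share)


# A toy ten-pixel "photo": mostly near-red, some near-pink, one near-blue pixel.
photo = [(250, 10, 10)] * 6 + [(255, 180, 200)] * 3 + [(0, 0, 250)]
tags = color_tags(photo)  # ['pink', 'red']; blue falls below the threshold
```

Nudge the palette entry for "pink" a little and the same pixels get a different tag, so the tagging is consistent but the disagreement about colors has only moved into the palette.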

A solution to this problem is to search using criteria at the same semantic level that you require in your results. Idée has implemented this idea with its Multicolr interface for searching Flickr. You select a color and see pictures that contain that color. Using Multicolr is mesmerizing because you can adjust your search criteria to encompass multiple colors and see results matching your search. Selecting the same color multiple times (i.e., the equivalent of “redred”) increases its intensity in your search.
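I don’t know how Multicolr ranks its results, but the “redred” behavior can be imitated with a simple weighted score. In this sketch the `PHOTOS` index, the per-color shares, and the scoring rule are all hypothetical; repeating a color in the query just gives it a larger weight.

```python
from collections import Counter

# Hypothetical index: photo id -> share of the image covered by each color.
PHOTOS = {
    "sunset": {"red": 0.6, "orange": 0.3, "blue": 0.1},
    "ocean": {"blue": 0.8, "white": 0.2},
    "flag": {"red": 0.3, "white": 0.2, "blue": 0.5},
}


def search_by_color(query_colors, photos=PHOTOS):
    """Rank photos containing every queried color; repeated colors weigh more."""
    weights = Counter(query_colors)
    if not weights:
        return []
    total = sum(weights.values())
    scored = []
    for photo_id, shares in photos.items():
        # Keep only photos that contain every requested color.
        if all(shares.get(color, 0) > 0 for color in weights):
            score = sum(w / total * shares[color] for color, w in weights.items())
            scored.append((score, photo_id))
    return [photo_id for _, photo_id in sorted(scored, reverse=True)]


balanced = search_by_color(["red", "blue"])         # ['flag', 'sunset']
redder = search_by_color(["red", "red", "blue"])    # ['sunset', 'flag']
```

Selecting red twice flips the order: the red-heavy sunset overtakes the flag, while the ocean photo never appears because it contains no red at all.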

Textual search is likely to remain our primary means of retrieval for the foreseeable future–so much of our discourse is word-dominated–but this is an example of the frontiers of IR.

Comments off

New Research Engine Searches “Deep Web”

How much of the World Wide Web is actually indexed… 27.65 billion pages? Maybe about 0.2% of the total content? The “Deep Web” (web documents not immediately accessible by direct hyperlink from public pages) may contain something like 91,000 terabytes of data… as compared to an estimated 167 terabytes of Surface Web data.
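The “about 0.2%” figure roughly checks out if you compare data volumes. This quick calculation uses only the two terabyte estimates quoted above and assumes that “indexed” means the Surface Web’s share of all the data, which is a simplification on my part.

```python
surface_tb = 167      # estimated Surface Web data, in terabytes (from the post)
deep_tb = 91_000      # estimated Deep Web data, in terabytes (from the post)

# Share of all web data that sits on the Surface Web.
surface_share = surface_tb / (surface_tb + deep_tb)

# How many times larger the Deep Web estimate is than the Surface Web estimate.
ratio = deep_tb / surface_tb

print(f"Surface Web share: {surface_share:.2%}")   # Surface Web share: 0.18%
print(f"Deep Web is about {ratio:.0f}x larger")    # Deep Web is about 545x larger
```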

A new service, called Infovell, hopes to help users find more of this “Deep Web” data… yet unlike Google and other Surface Web engines, it won’t be ad-supported. Instead, the service will be subscription based. Read more at ReadWriteWeb blog. Here is an excerpt from the article:

The engine scours through open-access repositories of information like PubMed Central and the U.S. Patent and Trademark Office Claims, but it also allows access to scholarly journals such as those from Oxford University Press, SAGE, Taylor & Francis, Annual Reviews, Mary Ann Liebert Publications, and more. The culmination of these billions of pages currently unindexed by other engines, gives you access to content in the areas of Life Sciences, Medicines, Patents, Industry News, and other reference content from expert sources.

Comments (1)

Not so cool?

Stealth search start-up Cuil (pronounced “cool”) launched its product on July 29th of this year, and was promptly subject to an angry backlash from its users. The start-up – founded by three former senior Google employees – claims to have an index size of 120 billion web pages (larger than Google’s, they say). On the day of its launch, however, Cuil’s search results were not as rewarding. For example, a search for “Dog” resulted in 280 million hits on Cuil and 498 million on Google. Of course, quantity isn’t everything, but even in relevance, Google’s results were better.

The comparison to Google is but natural, since Google defines what search means to most of us today – search results that are relevant, but also photos, news articles, video files, etc. that complete the picture. It remains to be seen whether Cuil can offer a superior ‘universal’ search package to one that we’re already used to.

Relevant lectures: 21/22

Comments (1)

E-Discovery – Too Much Information (TMI!)

Electronic discovery, or e-discovery, is the process of demanding and
sifting through “digital evidentiary artifacts” for lawsuits.
Information from Facebook, MySpace, chat, email, laptops, smart phones,
memory sticks, back-up tapes, and service providers’ logs is now
considered “fair game” and subject to inspection when adversaries in
lawsuits demand and are granted access. E-discovery is an increasingly
expensive and Sisyphean reality of modern court proceedings. Court cases
more frequently face early settlement, parties are increasingly
unable to sue (or defend) “for fear of [enormous] e-discovery costs”,
and the justice system is increasingly overburdened.

Ordinary court cases risk millions of dollars and many hours bogged
down in e-discovery. As a Verizon attorney explains for his
business, “Almost every case [now] involves e-discovery and spits out
‘terabytes’ of information…. 200 lawyers can easily review electronic
documents for four months, at a cost of millions of dollars.” As a
result of the increased burden, e-discovery businesses are
booming, frequently charging $125-$600/hr. Annual revenues from
e-discovery businesses “have grown from $40m in 1999 to about $2
billion in 2006 and may hit $4 billion next year.”

“Results [of e-discovery] have to be indexed and reviewed by
humans. This usually falls to the junior staff at law firms, some of
whom are so fed up with the drudgery that they have quit the profession.”

Privacy is increasingly subject to invasion, as insurance
companies have demanded personal records of their clients when
disputing customer claims.  For example, in a recent lawsuit, “Horizon
Blue Cross Blue Shield of New Jersey… asked and were granted the
right to see practically everything the teenagers had said on their
Facebook and MySpace profiles, in instant-messaging threads, text
messages, e-mails, blog posts and whatever else the girls might have
done online.”

In this context, it looks like your memex could be your worst enemy!

For more, see the original article: The Big Data Dump

This may touch on the following lectures:

Comments off