“A New Era for Image Annotation”?

Searching for images on the Web has traditionally been more complicated than text search – for instance, a Google image search for “tiger” yields not only images of tigers but also images of Tiger Woods, tiger sharks and many other things that are ‘related’ to the text in the query string. This is because contemporary search engines look for images using the ‘text’ linked to an image rather than the ‘content’ of the picture itself. In an effort to improve image search, folks from UC San Diego are working on a search engine that works differently – one that analyzes the image itself. “You might finally find all those unlabeled pictures of your kids playing soccer that are on your computer somewhere,” says Nuno Vasconcelos, a professor of electrical engineering at the UCSD Jacobs School of Engineering. They claim that their Supervised Multiclass Labeling System “may be folded into next-generation image search engines for the Internet; and in the shorter term, could be used to annotate and search commercial and private image collections.”

What is the Supervised Multiclass Labeling System anyway?

“Supervised” refers to the fact that users train the image labeling system to identify classes of objects, such as “tigers,” “mountains” and “blossoms,” by exposing the system to many different pictures of tigers, mountains and blossoms. The supervised approach allows the system to differentiate between similar visual concepts – such as polar bears and grizzly bears – whereas “unsupervised” approaches to the same problem do not permit such fine-grained distinctions. “Multiclass” means that the training process can be repeated for many visual concepts: the same system can be trained to identify lions, tigers, trees, cars, rivers, mountains, sky or any other concrete object. This is in contrast to systems that can answer just one question at a time, such as “Is there a horse in this picture?” (Abstract concepts like “happiness” are currently beyond the reach of the new system, however.) “Labeling” refers to the process of linking specific features within images directly to words that describe those features.
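The press release stops at this conceptual description, but the general recipe it outlines – train one model per concept on example images, then score a new image against every model and keep the best-matching words – can be sketched in a few lines of Python. The sketch below is a hypothetical illustration, not the UCSD implementation: the toy patch-color features and the scikit-learn GaussianMixture models are assumptions chosen for brevity.

import numpy as np
from sklearn.mixture import GaussianMixture

def extract_features(image, patch=16):
    # Toy feature extractor (assumption): mean RGB of non-overlapping patches.
    # A real system would use richer localized color/texture descriptors.
    h, w, _ = image.shape
    return np.asarray([image[i:i + patch, j:j + patch].reshape(-1, 3).mean(axis=0)
                       for i in range(0, h - patch + 1, patch)
                       for j in range(0, w - patch + 1, patch)])

def train(labeled_images, n_components=8):
    # "Supervised": fit one density model per concept from its example images.
    # "Multiclass": the same loop handles any number of concepts.
    models = {}
    for concept, images in labeled_images.items():
        feats = np.vstack([extract_features(img) for img in images])
        models[concept] = GaussianMixture(n_components=n_components,
                                          random_state=0).fit(feats)
    return models

def label(image, models, top_k=3):
    # "Labeling": attach the words whose models best explain the image's features.
    feats = extract_features(image)
    scores = {c: m.score(feats) for c, m in models.items()}  # mean log-likelihood
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

Trained on collections of, say, “tiger,” “mountain” and “blossom” pictures, label() returns the few concepts whose models assign the highest likelihood to a new photo – the same idea, in miniature, as annotating unlabeled images with words.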

While the idea of searching images by their ‘content’ is indeed promising, some questions still need to be answered. To what extent does the system’s accuracy depend on the sample of images used for training? How do variations in photo quality affect the algorithm’s performance? How big a role will these factors play in the user’s supposedly improved search experience? Finally, could the algorithm be extended to recognize abstract concepts in images as well? These are interesting areas to explore; nevertheless, SML seems to be a significant step towards better image retrieval mechanisms.

Read more about the SML at http://www.jacobsschool.ucsd.edu/news/news_releases/release.sfe?id=650

1 Comment

  1. Ryan Shaw Said,

    November 10, 2008 @ 11:09 pm

    Content-based image search has been an active area of research for many years, and there are hundreds of systems that use various machine learning techniques to classify or cluster images based on analyzing pixels and shapes and learning relationships to text annotations or tags or ontologies. All of these systems have failed to make it out of the lab, for two big reasons. First, while these systems can often be tuned to work well with test collections that have fairly straightforward imagery, they often fail miserably with photos “in the wild” that can be a lot weirder. Second, it turns out that people rarely happen to be looking for pictures of “tigers”; usually their criteria are a bit more complicated than that. (Besides, current systems actually work fairly well for finding pictures of tigers, and scale a lot better than content-based techniques. E.g.: http://flickr.com/photos/tags/tiger/clusters/)

    Not that this isn’t interesting work; I just don’t see any evidence that there’s been a major breakthrough at UCSD, as opposed to improvements to well-established techniques. I still think the former is needed for content-based approaches to image search to become viable in the marketplace.
