Image-based mobile search by Vodafone and Nokia

Sunny’s blog post reminded me of the efforts toward ‘image-based’ search on mobile phones by Vodafone and Nokia. Internet service companies are not the only ones trying to find an effective and accurate way to do ‘image’ search. In order to provide better service to their customers and to invent their killer application, mobile telecommunication incumbents are also striving in this area.

  • Vodafone’s image-based mobile search ‘Otello’
    : At CeBIT 2008, Vodafone announced that they’ll be trialling their Otello image-based search technology in Germany. In other words, handset owners just snap a picture of anything — a landmark, a DVD case, an unidentified flying object, etc. — and Otello then “returns information relevant to the picture to the mobile phone.”
  • Nokia was also planning a semantic visual search engine, which makes plenty of sense given their push in high-quality cameras for their mobile phones.
    : The visual search engine uses three processing stages to extract semantic information from an image.

    1) When analyzing an image, it is first converted into a plurality of candidate low-level features (such as shape, color and texture strength), and these features are extracted locally around salient points of the image.

    2) Then a supervised learning approach is used to select prominent low-level features from the plurality of candidates. The prominent low-level features are associated with predefined object categories that describe generic objects (e.g., cars, planes), parts of a person’s body (e.g., faces), geographical landmarks (e.g., mountains, trees) or other items.

    3) When a new item is to be categorized, the target item is converted into a plurality of multi-scale local features, and each local feature is matched with the prominent low-level features using a probabilistic model. So, if the target item has a face, this feature will be matched against other items having a face, and the item will be categorized accordingly.
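The three stages above can be sketched in a few dozen lines. To be clear, this is only my own toy reading of the pipeline: the feature choices (mean color and a crude texture value around gradient-salient points), the "one prototype per category" training step and the softmax matcher are all illustrative assumptions, not Nokia's actual method.

```python
# A loose sketch of the three-stage pipeline described above. All function
# names, feature choices and the nearest-prototype matcher are illustrative
# assumptions on my part, not Nokia's actual implementation.
import numpy as np

def extract_local_features(image, n_points=16):
    """Stage 1: sample candidate low-level features (here, mean color plus a
    crude texture-strength value) around the most salient points, with
    saliency approximated by local gradient magnitude."""
    gy, gx = np.gradient(image.mean(axis=2))
    saliency = np.hypot(gx, gy)
    ys, xs = np.unravel_index(np.argsort(saliency.ravel())[-n_points:],
                              saliency.shape)
    feats = []
    for y, x in zip(ys, xs):
        patch = image[max(y - 2, 0):y + 3, max(x - 2, 0):x + 3]
        color = patch.reshape(-1, 3).mean(axis=0)  # color component
        texture = patch.std()                      # texture-strength proxy
        feats.append(np.append(color, texture))
    return np.array(feats)

def select_prominent_features(features_by_category):
    """Stage 2: 'supervised learning' reduced to its simplest form -- keep
    one prototype (the mean feature vector) per predefined object category."""
    return {cat: np.vstack(f).mean(axis=0)
            for cat, f in features_by_category.items()}

def categorize(image, prototypes):
    """Stage 3: convert the target item into local features, match each one
    probabilistically (softmax over negative distances to the prototypes),
    and return the category with the highest average probability."""
    feats = extract_local_features(image)
    cats = list(prototypes)
    dists = np.array([[np.linalg.norm(f - prototypes[c]) for c in cats]
                      for f in feats])
    probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
    return cats[int(np.argmax(probs.mean(axis=0)))]
```

Trained on a few synthetic "tree" (green) and "car" (gray) images, this correctly categorizes a new green image as a tree — which is about all one should expect from a sketch this small.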

Besides these two players, and even though I couldn’t find a success story so far, numerous mobile-related companies (including KDDI and Apple with its iPhone) are working on this huge project continuously. If it succeeds, this technology could one day be used to search for more information on famous places, DVDs, toys and so on simply by taking a picture with your handset while walking down the street.

Comments (1)

“A New Era for Image Annotation”..?

Searching for images on the Web has traditionally been more complicated than text search – for instance, a Google image search for “tiger” not only yields images of tigers, but also returns images of Tiger Woods, tiger sharks and many others that are ‘related’ to the text in the query string. This is because contemporary search engines look for images using any ‘text’ linked to images rather than the ‘content’ of the picture itself.  In an effort to improve the recall of image searches, folks from UC San Diego are working on a search engine that works differently – one that analyzes the image itself. “You might finally find all those unlabeled pictures of your kids playing soccer that are on your computer somewhere,” says Nuno Vasconcelos, a professor of electrical engineering at the UCSD Jacobs School of Engineering. They claim that their Supervised Multiclass Labeling System “may be folded into next-generation image search engines for the Internet; and in the shorter term, could be used to annotate and search commercial and private image collections.”

What is the Supervised Multiclass Labeling System anyway?

Supervised refers to the fact that the users train the image labeling system to identify classes of objects, such as “tigers,” “mountains” and “blossoms,” by exposing the system to many different pictures of tigers, mountains and blossoms. The supervised approach allows the system to differentiate between similar visual concepts – such as polar bears and grizzly bears. In contrast, “unsupervised” approaches to the same technical challenges do not permit such fine-grained distinctions. “Multiclass” means that the training process can be repeated for many visual concepts. The same system can be trained to identify lions, tigers, trees, cars, rivers, mountains, sky or any concrete object. This is in contrast to systems that can answer just one question at a time, such as “Is there a horse in this picture?” (Abstract concepts like “happiness” are currently beyond the reach of the new system, however.) “Labeling” refers to the process of linking specific features within images directly to words that describe these features.
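The three words in the name can be made concrete with a toy model. In the sketch below — my own illustration, not the UCSD system — "supervised" means we fit parameters from labeled example images, "multiclass" means any number of concepts can be registered, and "labeling" means the output is a word. A class-conditional Gaussian over a simple color histogram stands in for SML's far richer density models.

```python
# Toy illustration of "supervised", "multiclass" and "labeling". The color
# histogram feature and the per-class Gaussian are my simplifications; the
# real SML system models image features much more elaborately.
import numpy as np

def color_histogram(image, bins=8):
    """Reduce an RGB image (values in [0, 1]) to a normalized
    per-channel color histogram."""
    h = np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0.0, 1.0))[0]
        for c in range(3)
    ]).astype(float)
    return h / h.sum()

class MulticlassLabeler:
    def fit(self, labeled_images):
        """Supervised training: estimate one diagonal Gaussian
        (mean, variance) per class label from example images."""
        self.params = {}
        for label, imgs in labeled_images.items():
            feats = np.array([color_histogram(im) for im in imgs])
            self.params[label] = (feats.mean(axis=0),
                                  feats.var(axis=0) + 1e-6)
        return self

    def label(self, image):
        """Labeling: return the class word whose Gaussian assigns the
        image's feature vector the highest log-likelihood."""
        f = color_histogram(image)
        def loglik(mu, var):
            return -0.5 * np.sum((f - mu) ** 2 / var + np.log(2 * np.pi * var))
        return max(self.params, key=lambda c: loglik(*self.params[c]))
```

Because training is per-class and independent, adding "mountains" or "blossoms" is just another entry in the training dictionary — that is the "multiclass" part. What this toy cannot do, exactly as the article notes for the real system, is abstract concepts like "happiness".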

While the idea of searching images by their ‘content’ is indeed promising, there are some questions that still need to be answered. To what extent does the system’s efficiency depend on the sample of images used for training? What is the impact of variations in photo quality on the algorithm’s performance? How big a role will these play in affecting the user’s supposedly improved search experience? Finally, do we foresee an extension of the algorithm to determine abstract concepts in images too? Indeed, these are interesting areas to explore; nevertheless, the SML seems to be a significant step towards better image retrieval mechanisms.

Read more about the SML at

Comments (1)

semantic image retrieval

This may already be old news for regular readers of, but in case you missed it, here’s another search engine.

Pixolu is a semantic image search engine that lets users refine a search by selecting the images that best represent their query. I tried it for some queries and it seems to do a good job, factoring in color, object shapes, size and density in images.

The two-step search-and-refine process is very interesting and represents a more natural way of gathering information. Pixolu, a more Web 2.0-ish search engine, pays attention to recent (and older) research in information gathering and foraging.
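The refine step can be summed up in a few lines. This is my own guess at the mechanics — Pixolu doesn't publish its internals: average the feature vectors of the images the user selected into a refined query, then re-rank the candidate pool by similarity to it.

```python
# Sketch of a search-and-refine step, assuming each candidate image has
# already been reduced to a feature vector (color, shape, etc.). The mean
# query and cosine ranking are my assumptions, not Pixolu's published method.
import numpy as np

def refine_search(candidates, selected_ids):
    """candidates: dict mapping image id -> feature vector.
    selected_ids: ids the user picked as best representing the query.
    Returns all candidate ids re-ranked by similarity to the selection."""
    query = np.mean([candidates[i] for i in selected_ids], axis=0)
    def cosine(v):
        return float(np.dot(v, query) /
                     (np.linalg.norm(v) * np.linalg.norm(query) + 1e-12))
    return sorted(candidates, key=lambda i: cosine(candidates[i]), reverse=True)
```

The appeal of the two-step design is that the user never has to articulate the visual query in words — picking two or three good examples is the query.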

Comments off

GazoPa: Changing the way we do image search?

James’s blog post reminded me of another company from Japan that presented at TechCrunch50 in September called GazoPa. GazoPa uses proprietary image analytics technology to extract information such as color and shape from images and then identifies similar pictures from a pool of about 50 million different images found around the web.

I registered and played around by uploading a few images and noticed that the image search engine is keen on color and will retrieve a bunch of photos that have similar color patterns or palettes. A picture of a white puppy on a green pasture will generate anything from birds and horses to frogs and spiders as long as there is sizable green and white within the frame of the picture.
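That "anything goes, as long as the colors match" behavior is exactly what a global color-histogram comparison produces. A small sketch using histogram intersection — a common similarity measure for this kind of retrieval; whether GazoPa actually uses it is purely my guess:

```python
# Why a white puppy on green grass retrieves frogs and spiders: a global
# color histogram records *which* colors fill the frame, not where or on
# what object they appear. Histogram intersection is one standard way to
# compare them; GazoPa's actual measure is not public.
import numpy as np

def rgb_histogram(image, bins=4):
    """Joint RGB histogram: normalized counts over a bins^3 grid of
    color cells, for pixel values in [0, 1]."""
    h, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                          range=[(0.0, 1.0)] * 3)
    return h.ravel() / h.sum()

def histogram_intersection(h1, h2):
    """1.0 for identical color distributions, 0.0 for disjoint ones --
    and completely blind to shape and spatial layout."""
    return float(np.minimum(h1, h2).sum())
```

Two scenes that share a green-and-white palette score high against each other no matter what the subjects are, while a red scene scores near zero — matching the puppy-and-frogs behavior described above.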

A more complex picture of a hipster-ish girl looking blasé, surrounded by text and images (pulled from a magazine), generated a much more random, seemingly non-sequitur crop of photos, ranging from a picture of Christina Aguilera to a cat, a construction site and some others. But even then, the color tones remained within the same range as the original uploaded image. The service needs work on better identifying shapes. In addition, I wish it had an identification feature beyond the link at the bottom of the retrieved images: under the photo of Christina Aguilera, I’d like it to say Christina Aguilera.

Apparently a similar service was developed by AltaVista ten years ago but was abandoned shortly after. GazoPa hopes for better chances of success thanks to its large database and the ubiquity of digital and phone cameras these days.

Even with widespread use, I wouldn’t go so far as to say that it would render the keywords and tags used for image search obsolete, but it would definitely be an additional method we can use when searching for images.

Comments off