Archive for December, 2008

Folksonomy works well with others?

Here is a post about a Library of Congress report that we all might find pretty interesting. To copy from the blog, which copies from the summary:

The following statistics attest to the popularity and impact of the pilot. As of October 23, 2008, there have been:
• 10.4 million views of the photos on Flickr.
• 79% of the 4,615 photos have been made a “favorite” (i.e., are incorporated into personal Flickr collections).
• More than 15,000 Flickr members have chosen to make the Library of Congress a “contact,” creating a photostream of Library images on their own accounts.
• 7,166 comments were left on 2,873 photos by 2,562 unique Flickr accounts.
• 67,176 tags were added by 2,518 unique Flickr accounts.
• 4,548 of the 4,615 photos have at least one community-provided tag.
• Less than 25 instances of user-generated content were removed as inappropriate.
• More than 500 Prints and Photographs Online Catalog (PPOC) records have been enhanced with new information provided by the Flickr Community.

Kinda cool, no?
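
For a feel of the scale behind those figures, here is a small back-of-the-envelope sketch in Python. It only uses numbers quoted in the summary above; the variable names are my own, and the derived ratios are rough averages, not figures from the report.

```python
# Numbers copied from the report summary quoted above.
photos_total = 4615
photos_tagged = 4548
tags_added = 67176
tagging_accounts = 2518
comments_left = 7166
photos_commented = 2873

# Derived ratios (my own arithmetic, not stated in the report).
share_tagged = photos_tagged / photos_total             # fraction of photos with a tag
tags_per_tagger = tags_added / tagging_accounts         # average tags per tagging account
comments_per_photo = comments_left / photos_commented   # average over commented photos only

print(f"Photos with at least one tag: {share_tagged:.1%}")
print(f"Average tags per tagging account: {tags_per_tagger:.1f}")
print(f"Average comments per commented photo: {comments_per_photo:.1f}")
```

In other words, nearly every photo picked up at least one community tag, and the accounts that tagged at all were quite prolific on average.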

Comments off

Using GPS Tracks as Contextual Metadata for Multimedia

In the reading “Context Data in Geo-Referenced Digital Photo Collections”, a single GPS point is used as metadata for an image. In fact, a GPS device can capture more than just a point; it can capture the whole track of a user’s journey, and using GPS tracks as metadata may be more interesting than using a point alone. The image below shows a trip around Tian’anmen in Beijing, with images connected together by a GPS track.

There are at least two benefits to using a GPS track instead of a point:

1. Better visualization. Users could “replay” the whole journey on a map, with the images animated along the route (I implemented this myself with JavaScript and the Live Map API).

2. GPS tracks reveal more about users’ behavioral patterns, and these patterns can be used to improve mobile/location search. On an individual level, car drivers care less about distance than pedestrians do, and transportation mode can be inferred from GPS tracks using supervised learning [1]; on a social level, frequently visited locations could be regarded as more “popular” and thus be ranked higher in local search. The rich geographic information in GPS tracks is invaluable to mobile and location-based IR.

[1] “Learning Transportation Mode from Raw GPS Data for Geographic Applications on the Web”, WWW2008
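
To make the two ideas above a bit more concrete, here is a minimal Python sketch. It is not the supervised-learning method from [1]; it swaps in a crude average-speed threshold just to show what a track offers that a single point cannot. The track format, the function names, the 2.5 m/s cutoff, and the sample coordinates are all assumptions of mine.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def average_speed_mps(track):
    """track: list of (timestamp_seconds, lat, lon) tuples sorted by time."""
    dist = sum(haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(track, track[1:]))
    elapsed = track[-1][0] - track[0][0]
    return dist / elapsed if elapsed > 0 else 0.0

def guess_mode(track, walk_max_mps=2.5):
    """Crude stand-in for a trained classifier: threshold on average speed."""
    return "walking" if average_speed_mps(track) <= walk_max_mps else "driving"

def nearest_point(track, photo_time):
    """Track point closest in time to a photo, usable as its position metadata."""
    return min(track, key=lambda p: abs(p[0] - photo_time))

# Hypothetical tracks near Tian'anmen, one point per minute.
walk = [(0, 39.9050, 116.3976), (60, 39.9055, 116.3976), (120, 39.9060, 116.3976)]
drive = [(0, 39.9000, 116.3976), (60, 39.9050, 116.3976), (120, 39.9100, 116.3976)]

print(guess_mode(walk), guess_mode(drive))
print(nearest_point(walk, 70))
```

A real system would use richer features (acceleration, stop frequency, heading changes) and labeled training data, as the paper does; the point here is only that none of these features exist when each photo carries a single coordinate.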


DSM-V – Somewhat arbitrary?

Earlier in the semester, I posted about the story of the creation of the DSM-IV classification system based on psychological characteristics. Today’s New York Times addresses the latest release, the DSM-V. It’s an ideal case study for classification themes we’ve reviewed all semester:

“In psychiatry no one knows the causes of anything, so classification can be driven by all sorts of factors”: political, social and financial.

“What you have in the end,” Mr. Shorter said, “is this process of sorting the deck of symptoms into syndromes, and the outcome all depends on how the cards fall.”

http://www.nytimes.com/2008/12/18/health/18psych.html?hp

N grows over time...


Michigan Library Web 2.0-ed; Husband 202-ed

As if we needed further proof that taking 202 can change your life (or your wife), my husband today sent me an email worthy of mentioning on this blog. I think he has been vicariously 202-ed. What he felt compelled to share with me was the fact that the University of Michigan Library has a feature on its Mirlyn search engine where users can tag any item in the collection. I checked the UC “next-generation pilot” version of Melvyl, and indeed it supports tags as well, along with social bookmarking features. It’ll be interesting to see just how Weinbergian our libraries get in the next few years. I think this is a good thing, because it shows that libraries are paying attention to what is going on outside and are not afraid to experiment with it. Long live libraries!


Google advanced search gets query preview

Shortly after we talked about Google’s advanced search interface, someone in class pointed out that Google had changed it. I revisited their advanced search tonight and noticed that it now also shows you how your query would be formatted using Google’s syntax. This seems like a fabulous UI tool for teaching people how to construct better queries. For example, if you type ‘School of information’ in the “this exact wording or phrase” box, it shows that term in quotation marks in your query preview. Through observation, people can then recognize that quotation marks indicate exact phrases. The interface also teaches you how to specify the file type of results and how to restrict your search to a specific site.

Google Advanced Search


Not all elements of advanced search are represented in the search box itself, so the product isn’t making itself dispensable, but the interface does make it much easier to learn some of Google’s tricks. I learned that you can search for a range of numbers by putting two dots between them. The first result for ‘cars $500..$1000’ turns up a price of $690.
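
The mapping from form fields to query syntax is simple enough to sketch. The Python below is my own guess at the kind of translation the preview performs, not Google’s code; the function and parameter names are hypothetical, and only the operators themselves (quotation marks, `OR`, `-`, `..`, `site:`, `filetype:`) are real Google syntax.

```python
def build_query(all_words="", exact_phrase="", any_words=(), none_words=(),
                number_range=None, site="", filetype=""):
    """Translate advanced-search style fields into a single query string."""
    parts = []
    if all_words:
        parts.append(all_words)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')        # quotes mean exact phrase
    if any_words:
        parts.append(" OR ".join(any_words))     # OR between alternatives
    parts.extend(f"-{word}" for word in none_words)  # leading minus excludes a word
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")           # two dots mean a numeric range
    if site:
        parts.append(f"site:{site}")             # restrict to one site
    if filetype:
        parts.append(f"filetype:{filetype}")     # restrict to one file type
    return " ".join(parts)

print(build_query(exact_phrase="School of information"))
print(build_query(all_words="cars", number_range=("$500", "$1000")))
print(build_query(all_words="metadata", site="berkeley.edu", filetype="pdf"))
```

Seeing the form and the resulting string side by side is exactly what makes the preview a good teaching tool: each field maps to one small, learnable piece of syntax.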


XML was not always a silver bullet.

I programmed a simulator for another class this semester and tried to use an XML format for its input file. The simulator first reads a description of the graph topology and then has to parse it. Compare the two formats below, which describe the same graph.

:: GraphML (Standard XML format for describing graph data structure) ::

<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd" >
<graph edgedefault="undirected" parse.nodes="10000" parse.edges="20000">
<node id="0" />
<node id="1" />
<node id="2" />
…..
<node id="9997" />
<node id="9998" />
<node id="9999" />
<edge source="2" target="1" />
<edge source="2" target="0" />
<edge source="3" target="1" />
…..
<edge source="0" target="8068" />
<edge source="1" target="9731" />
<edge source="1" target="5549" />
</graph>
</graphml>

:: Normal text format ::

Topology: ( 10000 Nodes, 20000 Edges )
Model (1 – RTWaxman)

Nodes: ( 10000 )
0
1
2
…..
9997
9998
9999

Edges: ( 20000 )
0    2    1
1    2    0
2    3    1
…..
19997    0    8068
19998    1    9731
19999    1    5549

The second format was much better in terms of both file size and parsing speed. The XML format spends too much of the file on structural markup wrapped around the data. When the data will only be used within a limited domain, the cost of structuring and standardizing it can outweigh the benefit of doing so.

This case reminded me of Svenonius’s warning that “putting infinite number of metadata to data is economically impossible,” although my case did not involve an “infinite” amount of metadata. In any case, I ran into the tradeoff between IO and IR once again.
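
For anyone who wants to reproduce the comparison, here is a small Python sketch (not my simulator’s actual code). It writes the same tiny graph in both styles, parses each back into an edge list, and compares the sizes. The helper names are made up for this example, the plain-text layout is a simplified version of the one above (no Topology or Model header), and the three sample edges are just the first three from the listing.

```python
import xml.etree.ElementTree as ET

NS = "{http://graphml.graphdrawing.org/xmlns}"  # GraphML default namespace

def to_graphml(n_nodes, edges):
    """Serialize a graph as minimal GraphML text."""
    lines = ['<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
             '<graph edgedefault="undirected">']
    lines += [f'<node id="{i}" />' for i in range(n_nodes)]
    lines += [f'<edge source="{s}" target="{t}" />' for s, t in edges]
    lines += ["</graph>", "</graphml>"]
    return "\n".join(lines)

def to_plain(n_nodes, edges):
    """Serialize the same graph in the simplified plain-text layout."""
    lines = [f"Nodes: ( {n_nodes} )"]
    lines += [str(i) for i in range(n_nodes)]
    lines += ["", f"Edges: ( {len(edges)} )"]
    lines += [f"{i}    {s}    {t}" for i, (s, t) in enumerate(edges)]
    return "\n".join(lines)

def parse_graphml(text):
    """Return the edge list from GraphML text."""
    root = ET.fromstring(text)
    return [(int(e.get("source")), int(e.get("target")))
            for e in root.iter(NS + "edge")]

def parse_plain(text):
    """Return the edge list from the plain-text layout."""
    edges, in_edges = [], False
    for line in text.splitlines():
        if line.startswith("Edges:"):
            in_edges = True
        elif in_edges and line.strip():
            _, s, t = line.split()  # edge id, source, target
            edges.append((int(s), int(t)))
    return edges

sample_edges = [(2, 1), (2, 0), (3, 1)]
graphml_text = to_graphml(4, sample_edges)
plain_text = to_plain(4, sample_edges)

print(len(graphml_text), "characters as GraphML")
print(len(plain_text), "characters as plain text")
```

Both parsers recover the same edges, so the extra characters in the GraphML version are pure structure. Wrapping the two parse functions in `timeit` on a 10000-node file would show the speed side of the tradeoff; I have left timings out here because they depend on the machine and the parser.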


Bad IO/IR Case That I Found

I recently developed a Ruby wrapper for the API of one of the largest internet portals in Korea. Most of their RESTful APIs were well organized and returned RSS or XML, but there was one very interestingly bad case.

http://dev.naver.com/openapi/sample/rank.xml

If you open the link above (please ignore the Korean parts), you will see an XML file. The file describes the keywords people are searching for most in real time, with items ordered by rank. If you look at the tags enclosing each keyword, the element names are “R1”, “R2”, “R3” and so on. From the perspective of a 202er, it should be corrected to something like the following.

<result>
<items>
<item>
<rank>1</rank>
<keyword>ischool</keyword>
<change>+32</change>
</item>
…..
</items>
</result>

Or, at least, they should use an attribute to describe the rank instead of encoding it in the element name.
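
The difference shows up as soon as you write a client. The Python sketch below parses both styles; the two sample documents are hypothetical stand-ins I made up (the real feed has more fields), and the function names are mine. With the rank buried in the element name, the client has to pattern-match on tag names to recover what is really a data value.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of the feed: rank lives in the element name.
RANK_IN_NAME = "<result><item><R1>ischool</R1><R2>folksonomy</R2><R3>metadata</R3></item></result>"

# Hypothetical sample in the proposed style: rank is ordinary data.
RANK_AS_DATA = (
    "<result><items>"
    "<item><rank>1</rank><keyword>ischool</keyword></item>"
    "<item><rank>2</rank><keyword>folksonomy</keyword></item>"
    "<item><rank>3</rank><keyword>metadata</keyword></item>"
    "</items></result>"
)

def parse_rank_in_name(xml_text):
    """Recover (rank, keyword) pairs by pattern-matching element names."""
    ranked = []
    for el in ET.fromstring(xml_text).iter():
        match = re.fullmatch(r"R(\d+)", el.tag)
        if match:
            ranked.append((int(match.group(1)), el.text))
    return sorted(ranked)

def parse_rank_as_data(xml_text):
    """Read (rank, keyword) pairs from uniformly named child elements."""
    root = ET.fromstring(xml_text)
    return sorted((int(item.findtext("rank")), item.findtext("keyword"))
                  for item in root.iter("item"))

print(parse_rank_in_name(RANK_IN_NAME))
print(parse_rank_as_data(RANK_AS_DATA))
```

Both functions return the same list, but only the second survives a schema: you cannot write a sensible DTD or XPath query for an open-ended family of element names like R1, R2, R3 and so on.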


The Library of Congress releases a report on the success of Flickr Commons

The Library of Congress has released a report discussing the results of their experiment to put a few thousand historical photos on Flickr and allow users to add tags, comments, and notes to the photos. They’ve deemed the project a success, having gathered lots of additional information about the photos, including personal stories from commenters’ family histories. The LOC has employees verify user-contributed information, such as details on subject or location, before adding it to the official description.

The report does mention some concern with the presence of rudeness or snarkiness that results when you open a project to the public: “Notes (annotations left directly on the photos) have some utility, such as pointing out specific persons in a crowd or deciphering the words on a sign or placard. Notes are also a means of adding graffiti-type messages and smart-aleck humor to the images, which is a cause for some concern among Flickr members and Library staff.”

Link: Library of Congress Blog

On an unrelated note, here’s a comic depicting an alternative to the method we discussed in class for calculating the impact of a researcher’s work based on their citations:


http://www.phdcomics.com/comics/archive.php?comicid=1108


organizing our future | cleaning up art

<body intent:sarcasm>

<Introduction tone:praise>
An amazing presentation on the benefits of putting 202 into art. Using these methods artists can now estimate the amount of color they would need for their future works.
</Introduction>

<illustration> By creating ontologies and structurally separating elements in Art, the interoperability between artists would increase greatly. The BIG RED button would help us generate a 25% Monet + 75% Blake. </illustration>

<Dream> And the best part of all – Computers would be able to make sense of Art, and make decisions about it, just like they should be able to select the best doctor for Lucy (Tim Berners-Lee 2001)</Dream>

ART: I want it organized, just like I want my interface designed by an algorithm, and my Moby Dick in XML. 

amen.

</body intent:sarcasm>

<link: ted.com foolishTag:to-share>

“…Ursus Wehrli shares his vision for a cleaner, more organized, tidier form of art — by deconstructing the paintings of modern masters into their component pieces, sorted by color and size.”

http://www.ted.com/index.php/talks/ursus_wehrli_tidies_up_art.html

</link>


The Silicon Tower

BBC News’s Aleks Krotoski has a thought-provoking op-ed piece about how technophilically skewed the bulk of the internet really is. Her observation is based on spending some time with people who simply don’t use the web. She points out that they are not luddites but people who have simply found that the web doesn’t speak their language, doesn’t share their ways of structuring information. She mentions issues with search facilities like Google, but she also points out that even approaches meant to be more democratic (e.g., the semantic web, or facilities based on the intelligence of the masses) fall short for people who are not technologically oriented because the creators of web sites and the presumed intelligent masses are dominated by technophiles. For us 202ers, of course, the differences in how people organize information are nothing new, but it’s good to remind ourselves now and then that, as aware of the differences as we are, we are ourselves members of a particular community of thought. We at the iSchool are, I think, too focused on serving the needs of society to be considered residents of the traditional ivory tower; we live instead in a silicon one.

