In the 190 or so posts preceding this one we’ve shared a bunch of interesting tools related to 202 concepts. Nick Doty and I are collecting websites, tools, and ideas that could be used in future 202 classes. We’d like to find as many as we can. If you stumble across something that makes you think 202, please post it here, add a comment, or send me an email.

A couple of examples of nifty tools that people have already shared:

If it makes you think 202, we want to know about it.

202 Tools

I ran across a few tools/services which made me think of the good old 202 days, and seemed worth sharing:

In case your delicious bookmarks are still not as annotated/organized/categorized as you would like them to be, the fine folks over at MIT’s CSAIL have released a tool to help with that, dubbed Facette. It actually seems to be an interesting project, tackling familiar issues and questions.

And if you find yourself casting about for concept map/mind map software, the folks at Mindomo have both a web-based app and an AIR app which can be installed locally for those times when you just need to diagram it out.

Maybe this will be useful for the Fall…

What’s a Small Farm?

This year the USDA released the much-anticipated 2007 agricultural census. This census showed a rise in the number of small farms, and this statistic was celebrated in many farm and food articles and blogs.

Gristmill points out that former USDA Economic Research Service researcher Michael Roberts argues that there may not actually be more small farms; there may simply be a difference in what “counts” as a small farm.

The important revelation here is that the USDA uses statistical weighting to arrive at the numbers for these micro-farms since many of these people don’t even self-identify as farmers — and so their precision is entirely a question of their methodology, i.e. how they decide to model the presence/frequency of these small operations. Census weighting is, of course, both controversial and necessary. Counting everything by hand can have a larger margin for error than rigorous statistical modeling. Indeed, this “controversy” is right now at the heart of a monumental battle between Democrats and Republicans over the U.S. Census (just ask Sen. Judd Gregg).

That said, there is nothing inherently wrong with the practice. However, even if your overall approach is solid, if you then change your weighting techniques from year to year, comparing annual changes is all but impossible. And that appears to be exactly what the USDA is doing.

Needless to say, this is a pretty big deal. Is the number of small farms actually growing? Or is the current political climate in this realm simply pushing the USDA to fudge their methods a little, causing a shift in their categorization schemes?

The Modern Librarian

This New York Times article discusses the roles librarians have taken on in response to computer-based IO/IR. It presents some examples of the IO/IR skills librarians are now teaching students, which are reasons why librarians will not be replaced by search engines. It also mentions schools with tight budgets firing librarians to save money, and a lack of appreciation for the role of the library/librarian in schools, which are reasons why their jobs might be threatened anyway.

“The days of just reshelving a book are over,” said Ms. Rosalia, who came to P.S. 225 nearly six years ago after graduating at the top of her class at the Queens College Graduate School of Library and Information Studies. “Now it is the information age, and that technology has brought out a whole new generation of practices.”

In Web Age, Library Job Gets Update
Published: February 16, 2009
School librarians are increasingly teaching digital skills, but they often become the first casualties of budget crunches.

Folksonomy works well with others?

Here is a post about a Library of Congress report that we all might find pretty interesting. To copy from the blog, which copies from the summary:

The following statistics attest to the popularity and impact of the pilot. As of October 23, 2008, there have been:
• 10.4 million views of the photos on Flickr.
• 79% of the 4,615 photos have been made a “favorite” (i.e., are incorporated into personal Flickr collections).
• More than 15,000 Flickr members have chosen to make the Library of Congress a “contact,” creating a photostream of Library images on their own accounts.
• 7,166 comments were left on 2,873 photos by 2,562 unique Flickr accounts.
• 67,176 tags were added by 2,518 unique Flickr accounts.
• 4,548 of the 4,615 photos have at least one community-provided tag.
• Less than 25 instances of user-generated content were removed as inappropriate.
• More than 500 Prints and Photographs Online Catalog (PPOC) records have been enhanced with new information provided by the Flickr Community.

Kinda cool, no?

Using GPS Tracks as Contextual Metadata for Multimedia

In the reading “Context Data in Geo-Referenced Digital Photo Collections”, a single GPS point is used as metadata for an image. In fact, a GPS device can capture more than a single point; it can capture the whole track of a user’s journey, and using the GPS track as metadata may be more interesting than using a point alone. The image below shows a trip around Tian’anmen in Beijing, with images connected together by a GPS track.

There are at least two benefits to using a GPS track instead of a point:

1. Better visualization. Users can “replay” the whole journey as an animation on a map, together with the images (which I implemented myself with JavaScript and the Live Map API).

2. GPS tracks reveal more about users’ behavioral patterns, and these patterns can be used to improve mobile/location search. On an individual level, car drivers care less about distance than pedestrians, and traffic mode can be inferred from GPS tracks using supervised learning [1]; on a social level, locations frequently visited could be regarded as more “popular” and thus be given higher ranking in local search. Rich geographic information in GPS tracks is invaluable to mobile and location-based IR.
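To make the individual-level point concrete, here is a minimal sketch of how speed can be read off a GPS track. It is not the supervised-learning method from [1]; it is just a crude average-speed threshold, and the function names, the `(timestamp_s, lat, lon)` tuple layout, and the 2.5 m/s walking cutoff are all assumptions made for illustration.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def segment_speeds(track):
    """Speeds in m/s between consecutive (timestamp_s, lat, lon) fixes."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate or out-of-order timestamps
            speeds.append(haversine_m(la0, lo0, la1, lo1) / dt)
    return speeds

def guess_mode(track, walk_max_mps=2.5):
    """Very rough guess: 'walking' if the average speed is under the cutoff (an assumed value)."""
    speeds = segment_speeds(track)
    if not speeds:
        return "unknown"
    avg = sum(speeds) / len(speeds)
    return "walking" if avg <= walk_max_mps else "vehicle"

# Hypothetical fixes, each 0.001 degrees of latitude apart (roughly 111 m).
slow_track = [(0, 39.900, 116.39), (60, 39.901, 116.39), (120, 39.902, 116.39)]
fast_track = [(0, 39.900, 116.39), (5, 39.901, 116.39), (10, 39.902, 116.39)]
print(guess_mode(slow_track), guess_mode(fast_track))
```

A real classifier would use more than average speed (stops, acceleration, heading changes), which is where the labeled training data in [1] comes in.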

[1] “Learning Transportation Mode from Raw GPS Data for Geographic Applications on the Web”, WWW2008

DSM-V – Somewhat arbitrary?

Earlier in the semester, I posted about the story of the creation of the DSM-IV classification system based on psychological characteristics.   Today’s New York Times addresses the latest release, the DSM-V.  It’s an ideal case study for classification themes we’ve reviewed all semester:

“In psychiatry no one knows the causes of anything, so classification can be driven by all sorts of factors”: political, social and financial.

“What you have in the end,” Mr. Shorter said, “is this process of sorting the deck of symptoms into syndromes, and the outcome all depends on how the cards fall.”

N grows over time..

Michigan Library Web 2.0-ed; Husband 202-ed

As if we needed further proof that taking 202 can change your life (or your wife), my husband today sent me an email worthy of mentioning on this blog. I think he has been vicariously 202-ed. What he felt compelled to share with me was the fact that the University of Michigan Library has a feature on its Mirlyn search engine where users can tag any item in the collection. I checked the UC “next-generation pilot” version of Melvyl, and indeed it supports tags as well, along with social bookmarking features. It’ll be interesting to see just how Weinbergian our libraries get in the next few years. I think this is a good thing, because it shows that libraries are paying attention to what is going on outside and are not afraid to experiment with it. Long live libraries!

Google advanced search gets query preview

Shortly after we talked about Google’s advanced search interface, someone in class pointed out that Google had changed it. I revisited their advanced search tonight and noticed that it now also shows you how your query would be formatted using Google’s syntax. This seems like a fabulous UI tool for teaching people how to construct better queries. For example, if you type ‘School of information’ in the “this exact wording or phrase” box, it shows that term in quotation marks in your query preview. Through observation, people can then recognize that quotation marks indicate exact phrases. The interface also teaches you how to specify the file type of results and how to restrict your search to a specific site.

Google Advanced Search

Not all elements of advanced search are represented in the search box itself, so the product isn’t making itself dispensable, but the interface does make it much easier to learn some of Google’s tricks. I learned that you can search for a range of numbers by putting two dots between them (“\d..\d”). The first result for ‘cars $500..$1000’ contains $690.
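Here is a rough sketch of what the query preview is doing: turning form fields into a single query string. The function and its parameter names are my own invention, not anything from Google, and it covers only the operators mentioned above (quotation marks, `filetype:`, `site:`, and the `..` number range).

```python
def build_query(all_words="", exact_phrase="", site="", filetype="", number_range=None):
    """Assemble a search string from advanced-search style fields (hypothetical helper)."""
    parts = []
    if all_words:
        parts.append(all_words.strip())
    if exact_phrase:
        # Quotation marks ask for the exact phrase.
        parts.append('"' + exact_phrase.strip() + '"')
    if number_range:
        low, high = number_range
        # Two dots between numbers ask for a numeric range.
        parts.append(f"{low}..{high}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)

print(build_query(exact_phrase="School of information"))
print(build_query(all_words="cars", number_range=("$500", "$1000")))
# example.edu is a placeholder domain, used only for illustration.
print(build_query(all_words="syllabus", filetype="pdf", site="example.edu"))
```

Seeing the fields and the resulting string side by side is the same learning-by-observation trick the preview relies on.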

XML was not always a silver bullet.

I programmed a simulator for another class this semester and tried to use an XML format for its input file. The simulator first takes a description of a graph topology, which it then needs to parse. Compare the two formats below, which describe the same graph.

:: GraphML (Standard XML format for describing graph data structure) ::

<graphml xmlns="" xmlns:xsi="" xsi:schemaLocation="" >
<graph edgedefault="undirected" parse.nodes="10000" parse.edges="20000">
<node id="0" />
<node id="1" />
<node id="2" />
<node id="9997" />
<node id="9998" />
<node id="9999" />
<edge source="2" target="1" />
<edge source="2" target="0" />
<edge source="3" target="1" />
<edge source="0" target="8068" />
<edge source="1" target="9731" />
<edge source="1" target="5549" />

:: Normal text format ::

Topology: ( 10000 Nodes, 20000 Edges )
Model (1 – RTWaxman)

Nodes: ( 10000 )

Edges: ( 20000 )
0    2    1
1    2    0
2    3    1
19997    0    8068
19998    1    9731
19999    1    5549

The second format was much better in terms of both file size and parsing speed. The XML format spends too much of its bulk on wrapping the data in structured metadata. When data will only be used in a limited domain, the cost of structuring and standardizing it can outweigh the benefit of doing so.

This case reminded me of Svenonius’s warning that “putting infinite number of metadata to data is economically impossible,” although my case did not involve an “infinite” amount of metadata. Anyway, I experienced the tradeoff between IO and IR again.
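For anyone who wants to see the size difference on a toy example, here is a small sketch. It is not my simulator’s code: the helper names are made up, the GraphML is stripped of namespaces, and the plain format is reduced to bare “id, source, target” lines without the header lines shown above.

```python
import xml.etree.ElementTree as ET

def to_graphml(num_nodes, edges):
    """Write a namespace-free, GraphML-like document (simplified for illustration)."""
    lines = ['<graphml>', '<graph edgedefault="undirected">']
    lines += [f'<node id="{n}" />' for n in range(num_nodes)]
    lines += [f'<edge source="{s}" target="{t}" />' for s, t in edges]
    lines += ['</graph>', '</graphml>']
    return "\n".join(lines)

def to_edge_list(edges):
    """Write one 'edge_id<TAB>source<TAB>target' line per edge."""
    return "\n".join(f"{i}\t{s}\t{t}" for i, (s, t) in enumerate(edges))

def parse_graphml(text):
    """Read the edges back out of the XML."""
    root = ET.fromstring(text)
    return [(int(e.get("source")), int(e.get("target"))) for e in root.iter("edge")]

def parse_edge_list(text):
    """Read the edges back out of the plain text."""
    edges = []
    for line in text.splitlines():
        _, s, t = line.split()
        edges.append((int(s), int(t)))
    return edges

edges = [(2, 1), (2, 0), (3, 1)]
xml_text = to_graphml(4, edges)
txt_text = to_edge_list(edges)

# Both formats carry the same edges; only the packaging differs.
assert parse_graphml(xml_text) == parse_edge_list(txt_text) == edges
print(len(xml_text), "characters as XML vs", len(txt_text), "as plain text")
```

The plain parser is shorter and the file is smaller, but it only works because both ends already agree on what the three columns mean, which is exactly the limited-domain assumption.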

