XML was not always a silver bullet.

This semester I wrote a simulator for another class and tried using an XML format for its input file. The simulator first reads graph topology information, which it then has to parse. Compare the two formats below, which describe the same graph.

:: GraphML (Standard XML format for describing graph data structure) ::

<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd" >
<graph edgedefault="undirected" parse.nodes="10000" parse.edges="20000">
<node id="0" />
<node id="1" />
<node id="2" />
…..
<node id="9997" />
<node id="9998" />
<node id="9999" />
<edge source="2" target="1" />
<edge source="2" target="0" />
<edge source="3" target="1" />
…..
<edge source="0" target="8068" />
<edge source="1" target="9731" />
<edge source="1" target="5549" />
</graph>
</graphml>

:: Normal text format ::

Topology: ( 10000 Nodes, 20000 Edges )
Model (1 – RTWaxman)

Nodes: ( 10000 )
0
1
2
…..
9997
9998
9999

Edges: ( 20000 )
0    2    1
1    2    0
2    3    1
…..
19997    0    8068
19998    1    9731
19999    1    5549

The second format was much better in both file size and parsing speed. The XML format spends too much on wrapping the data in structured metadata. When the data will only be used in a limited domain, the cost of structuring and standardizing it can outweigh the benefit.
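To make the size comparison concrete, here is a minimal Python sketch that writes the same tiny graph in both formats and parses each back. It is not the simulator's actual code: the function names (write_graphml, write_plain, parse_graphml, parse_plain) are hypothetical, the GraphML header is simplified (no xsi:schemaLocation), and the plain format is assumed to be whitespace-separated with the edge index in the first column.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def write_graphml(num_nodes, edges):
    """Serialize a graph as simplified GraphML text (assumed minimal header)."""
    lines = [
        f'<graphml xmlns="{GRAPHML_NS}">',
        f'<graph edgedefault="undirected" parse.nodes="{num_nodes}" '
        f'parse.edges="{len(edges)}">',
    ]
    lines += [f'<node id="{n}" />' for n in range(num_nodes)]
    lines += [f'<edge source="{s}" target="{t}" />' for s, t in edges]
    lines += ["</graph>", "</graphml>"]
    return "\n".join(lines)


def write_plain(num_nodes, edges):
    """Serialize the same graph in the plain text layout (assumed tab-separated)."""
    lines = [f"Nodes: ( {num_nodes} )"]
    lines += [str(n) for n in range(num_nodes)]
    lines += ["", f"Edges: ( {len(edges)} )"]
    lines += [f"{i}\t{s}\t{t}" for i, (s, t) in enumerate(edges)]
    return "\n".join(lines)


def parse_graphml(text):
    """Return (nodes, edges) from GraphML text using the standard library parser."""
    root = ET.fromstring(text)
    ns = {"g": GRAPHML_NS}
    nodes = [int(n.get("id")) for n in root.iterfind("g:graph/g:node", ns)]
    edges = [
        (int(e.get("source")), int(e.get("target")))
        for e in root.iterfind("g:graph/g:edge", ns)
    ]
    return nodes, edges


def parse_plain(text):
    """Return (nodes, edges) from the plain text layout, line by line."""
    nodes, edges, section = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Nodes:"):
            section = "nodes"
        elif line.startswith("Edges:"):
            section = "edges"
        elif section == "nodes":
            nodes.append(int(line))
        elif section == "edges":
            _, source, target = line.split()  # first column is the edge index
            edges.append((int(source), int(target)))
    return nodes, edges


# Tiny illustrative graph, not the 10000-node topology from the post.
sample_edges = [(2, 1), (2, 0), (3, 1)]
xml_text = write_graphml(4, sample_edges)
plain_text = write_plain(4, sample_edges)

print("GraphML bytes:", len(xml_text.encode("utf-8")))
print("Plain bytes:  ", len(plain_text.encode("utf-8")))
print("Same graph:   ", parse_graphml(xml_text) == parse_plain(plain_text))
```

Wrapping the two parse calls in timeit on a larger generated graph is one way to check the speed difference on your own machine; the numbers will depend on the parser and the graph size.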

This case reminded me of Svenonius's warning that “putting infinite number of metadata to data is economically impossible,” although my case did not involve an “infinite” amount of metadata. Anyway, I experienced the tradeoff between IO and IR again.


Project Bamboo

http://projectbamboo.uchicago.edu/

Not sure if you have heard of Project Bamboo, but it is an effort to find ways to utilize and incorporate technology into humanities research to advance the field(s). Sponsored by the Mellon Foundation, the end goal is a proposal for an implementation strategy, including standards and the like. My husband has been attending the most recent workshop on behalf of Blackboard (because they want a seat at the table as the standards are being set, of course!!!) and it’s basically been a 202 extravaganza. At the table? Librarians, philosophers, artists, lit profs, computer scientists, even a few iSchool professors (Larson and Kansa), etc. This led to lengthy debates about the meaning of what they were actually trying to do, how explicitly they should define it, how to carve up their worlds, why the sky is blue, etc. One of the main things that they apparently kept coming back to was, of course, The Tradeoff: who does the work and who reaps the benefits?

Pretty cool stuff though, and hearing his recap (“classification”, “ontology”, “schemas”, “data interoperability”, “buzz”, “buzz”, “buzz”) was essentially like a mini-study session for the midterm.

If anyone is interested in contributing – especially those philosophers among us – there are links to join off of their site.

http://projectbamboo.uchicago.edu/join-us
