XML is not always a silver bullet.
I wrote a simulator for another class this semester and tried using an XML format for its input file. The simulator first reads a file describing a graph topology and then has to parse it. Compare the two formats below, which describe the same graph.
:: GraphML (standard XML format for describing graph data structures) ::
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph edgedefault="undirected" parse.nodes="10000" parse.edges="20000">
<node id="0" />
<node id="1" />
<node id="2" />
…..
<node id="9997" />
<node id="9998" />
<node id="9999" />
<edge source="2" target="1" />
<edge source="2" target="0" />
<edge source="3" target="1" />
…..
<edge source="0" target="8068" />
<edge source="1" target="9731" />
<edge source="1" target="5549" />
</graph>
</graphml>
:: Normal text format ::
Topology: ( 10000 Nodes, 20000 Edges )
Model (1 – RTWaxman)
Nodes: ( 10000 )
0
1
2
…..
9997
9998
9999
Edges: ( 20000 )
0 2 1
1 2 0
2 3 1
…..
19997 0 8068
19998 1 9731
19999 1 5549
The second format was much better in terms of both file size and parsing speed. The XML format spent too much of its size on structural metadata wrapped around the data. When the data will only be used in a limited domain, the cost of structuring and standardizing it can outweigh the benefit of doing so.
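To make the comparison concrete, here is a minimal Python sketch. It is not the simulator's actual code. The function names (parse_graphml, parse_plain, make_graphml, make_plain) are made up for illustration, and the plain-text parser assumes the "Nodes:" and "Edges:" headers each sit on their own line, with each edge line holding an edge id followed by the two endpoints.

```python
import xml.etree.ElementTree as ET

# Default namespace used by GraphML documents.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def parse_graphml(text):
    """Return (nodes, edges) parsed from a GraphML string."""
    root = ET.fromstring(text)
    graph = root.find(GRAPHML_NS + "graph")
    nodes = [int(n.get("id")) for n in graph.findall(GRAPHML_NS + "node")]
    edges = [(int(e.get("source")), int(e.get("target")))
             for e in graph.findall(GRAPHML_NS + "edge")]
    return nodes, edges


def parse_plain(text):
    """Return (nodes, edges) parsed from the plain text format (assumed layout)."""
    nodes, edges = [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Nodes:"):
            section = "nodes"
        elif line.startswith("Edges:"):
            section = "edges"
        elif section == "nodes":
            nodes.append(int(line))
        elif section == "edges":
            _edge_id, source, target = line.split()
            edges.append((int(source), int(target)))
        # Lines before the first section (Topology, Model) are skipped.
    return nodes, edges


def make_graphml(nodes, edges):
    """Serialize a graph as a small GraphML document (hypothetical helper)."""
    lines = ['<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
             '<graph edgedefault="undirected">']
    lines += ['<node id="%d" />' % n for n in nodes]
    lines += ['<edge source="%d" target="%d" />' % e for e in edges]
    lines += ['</graph>', '</graphml>']
    return "\n".join(lines)


def make_plain(nodes, edges):
    """Serialize the same graph in the plain text layout (hypothetical helper)."""
    lines = ["Topology: ( %d Nodes, %d Edges )" % (len(nodes), len(edges)),
             "Nodes: ( %d )" % len(nodes)]
    lines += [str(n) for n in nodes]
    lines.append("Edges: ( %d )" % len(edges))
    lines += ["%d %d %d" % (i, s, t) for i, (s, t) in enumerate(edges)]
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = list(range(1000))
    edges = [(i, (i * 7 + 1) % 1000) for i in range(1000)]
    xml_text = make_graphml(nodes, edges)
    plain_text = make_plain(nodes, edges)
    print("GraphML bytes:", len(xml_text.encode("utf-8")))
    print("Plain bytes:  ", len(plain_text.encode("utf-8")))
```

Both parsers recover the same node and edge lists, so the size gap comes entirely from the tag and attribute names repeated on every line. To compare parsing speed on your own data, you could wrap the two parse calls in timeit from the standard library.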
This case reminded me of Svenonius's warning that “putting infinite number of metadata to data is economically impossible,” although my case did not involve an “infinite” amount of metadata. Anyway, I ran into the tradeoff between IO and IR again.