case studies

Overview

Although we have witnessed tremendous technological innovation, approximately 1.2 billion people are still said to live in extreme poverty. How to solve this problem more rapidly and effectively has been a major question for the international community. The World Bank, one of the world's largest aid agencies with commitments of 52.6 billion in 2012, believes data can accelerate the effort. It has been collecting data through its research activities and project operations, and it leads knowledge sharing in the industry. In 2010 it launched an open data initiative, and it has since made available over 850 financial datasets, statistical data on 11,000 projects, and data collected through 700 surveys. This renewed, open information system includes a variety of digitized datasets, information retrieval functions, and interactive visualization features. In this essay, I focus my analysis on the Data Catalog, the online catalog that lists the available datasets with their metadata and allows users to understand what each dataset is and how to access it.

 

What resources are being used?

The Data Catalog contains 162 items, which are collection-level resources such as “World Development Indicators”. Each item is a collection of indicator data, typically organized in a faceted classification by region, country, topic, and year. An alternative design choice would be to set the granularity at a more specific level, that of particular topic indicators such as “Access to electricity”. That choice, however, would make it difficult to see how the overall survey system is organized and structured. Because this is important information for users who are thinking about how to apply the data, I believe the designers chose the current level of granularity.

As I stated in the overview, the scale of the Data Catalog is already very large. It also has no fixed upper bound, because new survey results are constantly being added. When an existing item receives a new set of annual data, the set goes to “Archive” and does not increase the total number of items in the Catalog. In this way, the designers succeed in presenting a large amount of data in a very simple manner. The Data Catalog also offers a range of description resources, and data in highly compatible formats, to support interactions with the primary resources, which I discuss in the next section.

 

Why are resources being organized?

The highest-level goal of this information system is to accelerate poverty reduction by enabling effective knowledge generation and sharing. To support this goal, the Data Catalog provides access to the data and a platform where users can retrieve, download, and interact with it. Users are typically international aid practitioners, activists, researchers, and software developers. The role of software developers used to receive little attention because they were not traditionally among the main stakeholders in the industry. Now, reflecting the World Bank’s strong commitment to using data more effectively, the system even provides developers with APIs for the Catalog. These make it possible to pass data from the Catalog to computational processes.

In addition, users can search for information with useful features such as faceted classification and sorting on multiple conditions. Links for downloading resources such as annual datasets are displayed in clearly marked blocks, so even first-time users can find them without difficulty. Most data are available in Excel and CSV formats, so users can manipulate and analyze them to obtain meaningful findings. To support data manipulation by users with little technical knowledge, some of the items in the Catalog can be viewed in an interactive visualization system. Most content is available in multiple languages.
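The combination of faceted filtering and sorting on multiple conditions can be illustrated with a minimal Python sketch. This is not the Catalog's actual implementation: the `CatalogItem` fields, the item names, and the years below are made-up placeholders, and only the two region names are taken from the Catalog itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    # Hypothetical description resources for one catalog item.
    name: str
    region: str
    topic: str
    last_updated: int  # year


# Placeholder records, invented for illustration only.
ITEMS = [
    CatalogItem("Sample A", "East Asia & Pacific", "Health", 2011),
    CatalogItem("Sample B", "East Asia & Pacific", "Education", 2012),
    CatalogItem("Sample C", "Europe & Central Asia", "Health", 2012),
    CatalogItem("Sample D", "East Asia & Pacific", "Health", 2012),
]


def facet_filter(items, **facets):
    """Keep only the items that match every requested facet value."""
    return [
        item for item in items
        if all(getattr(item, field) == value for field, value in facets.items())
    ]


def sort_by(items, *keys):
    """Sort on several conditions; a leading '-' means descending order."""
    result = list(items)
    # Apply the least significant key first; Python's sort is stable,
    # so earlier passes act as tie-breakers for later ones.
    for key in reversed(keys):
        descending = key.startswith("-")
        field = key.lstrip("-")
        result.sort(key=lambda item: getattr(item, field), reverse=descending)
    return result


selected = facet_filter(ITEMS, region="East Asia & Pacific")
ordered = sort_by(selected, "-last_updated", "name")
print([item.name for item in ordered])
```

Each additional facet narrows the result set, while the sort keys only reorder it, which is why the two features can be combined freely in a browsing interface.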

 

How much are the resources organized?

To enable precise browsing and search, the Data Catalog uses controlled vocabularies for its description resources, together with well-designed categories. As with authority control in library science, the system strictly follows standardized names and terms. For example, under “Economy Coverage” there are standardized categories and region names such as “East Asia & Pacific” and “Europe & Central Asia”. However, some items in the Catalog differ considerably from the others in content, which makes it difficult to categorize everything to the same degree. These items therefore seem to use standardized descriptions wherever they apply. I did not observe a social tagging system for any resource in the Catalog. I believe the designers left it out because most users have specific information needs and would search on the basis of controlled vocabularies.
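A minimal sketch of how authority control over such a vocabulary might work is shown below. The two authorized region names come from the Catalog; the variant spellings, the `authorize` function, and the rejection behavior are assumptions made for illustration and do not describe the World Bank's actual process.

```python
# Authorized forms, as displayed under "Economy Coverage" in the Catalog.
AUTHORIZED_REGIONS = ["East Asia & Pacific", "Europe & Central Asia"]

# Hypothetical variant spellings mapped to their authorized form.
VARIANTS = {
    "east asia and pacific": "East Asia & Pacific",
    "europe and central asia": "Europe & Central Asia",
}


def _normalize(term):
    """Lowercase the term and collapse runs of whitespace for comparison."""
    return " ".join(term.casefold().split())


def authorize(term):
    """Return the authorized form of a term, or raise if it is not known."""
    key = _normalize(term)
    for authorized in AUTHORIZED_REGIONS:
        if _normalize(authorized) == key:
            return authorized
    if key in VARIANTS:
        return VARIANTS[key]
    raise ValueError(f"{term!r} is not in the controlled vocabulary")


print(authorize("East Asia and Pacific"))
print(authorize("  europe &  central asia "))
```

Rejecting unknown terms, rather than storing them as entered, is what keeps the vocabulary controlled: every description ends up using exactly one spelling per concept, so a search on that spelling finds all matching items.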

 

When are the resources organized?

The resources in the Catalog are all collected officially by the World Bank. The World Bank has an obligation to publish these resources, so it organizes items soon after the studies are completed. Because most survey results are digitized, the process of adding new survey results to the Catalog may be automated. Description resources are generated at the time a primary resource is added to the Catalog, which makes users' interactions as efficient as possible.

 

Who does the organizing?

As the system is very large in scale, there must be staff in charge of overall governance and decision making for the organizing system. These staff analyze users’ interactions and improve the system by changing interfaces, keeping data consistent, and possibly creating new categories of description resources. The staff responsible for particular surveys, on the other hand, add further resources in the approved formats, with description resources drawn from the controlled vocabularies.

 

Other considerations

One of the biggest challenges for the Catalog is supporting interaction in multiple languages. This is not a trivial issue, as users come from many countries with different cultures and languages. The current interface of the Data Catalog is not ideal for non-English-speaking users. Even when I switch the language setting to Chinese, many description resources are still displayed in English. Because these are official terms that probably have corresponding official translations, I believe the designers did not want to use automatically generated translations. In addition, the information retrieval features are currently available only in English. Implementing the same features for Chinese, which is written without spaces between words, would likely be more complex. As the OECD successfully provides statistical data in both English and French, the World Bank may be able to learn from its practice.
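The difficulty with non-segmented text can be seen in a short Python sketch. It assumes a search index built from tokens; the sample phrase (“世界银行数据”, roughly “World Bank data”) and the character-bigram fallback are illustrative choices of mine, one simple approach among several, and not a description of how the Catalog is or would be implemented.

```python
def whitespace_tokens(text):
    """Split on whitespace, which is enough for languages like English."""
    return text.split()


def char_bigrams(text):
    """Produce overlapping two-character tokens, ignoring whitespace.

    A simple fallback for text that has no spaces between words.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


english = "World Bank data"
chinese = "世界银行数据"

# English splits into words, but the Chinese phrase stays one long token,
# so a query for "银行" (bank) would not match any indexed token.
print(whitespace_tokens(english))
print(whitespace_tokens(chinese))

# With bigrams, "银行" becomes one of the indexed tokens.
print(char_bigrams(chinese))
print("银行" in char_bigrams(chinese))
```

Bigrams also produce tokens that are not real words, such as “界银”, so they trade precision for recall; a dictionary-based or statistical word segmenter would be the more precise alternative.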