Tiffany’s Assignment 10 | I202 | Information Organization and Retrieval

Tiffany Barkley

INFO 202 Assignment 10

OVERVIEW

This case study discusses the organizational principles and processes behind a computational system that a start-up built to collect traffic speed data and provide real-time routing and trip time estimates to users of a new mobile application.

WHAT RESOURCES ARE BEING ORGANIZED?

The resources in this scenario are the data points that are collected by the system and turned into the travel guidance viewed by the end user. The foundational system resource is traffic speed data. During the resource selection process, the company needs to decide what types of data it will collect and where it will acquire them. Alternatives include collecting data from physical sensors maintained by the government for roadway operations (often free, but not always available), purchasing raw in-vehicle GPS data points from car companies and other vendors (costly), and collecting GPS data points from the company’s user base (free, but unlikely to offer enough data points to produce good information across an entire network). Deciding what types of data to collect and organize requires weighing trade-offs among purchase cost, degree of autonomy from other organizations, accuracy of results, computational complexity, and scalability to other markets.

WHY ARE THE RESOURCES ORGANIZED?

The resources are organized because they need to support the real-time interactions of a customer base that expects fast and accurate results. Underlying the transformation of the raw data into the processed trip time estimates is a highly organized database system and set of computations, which are further described in the following section. This organization is essential for quickly computing and updating information for the user. Beyond the real-time context, organizing the historical resource data has other benefits, such as allowing the company to train its prediction models by comparing its trip time estimates with measured trip times computed from the data.
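As a rough illustration of comparing estimates with measured trip times, the sketch below computes a mean absolute percentage error over a handful of trips. The function name, the choice of metric, and the sample values are assumptions made for illustration; the case study does not say how the company would score its predictions.

```python
from statistics import mean


def mean_absolute_percentage_error(estimated_minutes, measured_minutes):
    """Return the average of |estimate - measured| / measured as a fraction.

    Assumes every measured trip time is positive.
    """
    if len(estimated_minutes) != len(measured_minutes):
        raise ValueError("inputs must have the same length")
    if not estimated_minutes:
        raise ValueError("inputs must not be empty")
    return mean(
        abs(est - meas) / meas
        for est, meas in zip(estimated_minutes, measured_minutes)
    )


# Hypothetical trips: what the application predicted vs. what was later
# measured from the stored historical data.
estimated = [10.0, 20.0, 30.0]
measured = [8.0, 25.0, 30.0]
error = mean_absolute_percentage_error(estimated, measured)
print(f"Mean absolute percentage error: {error:.1%}")
```

A score like this is only available if the historical estimates and measurements are stored in a way that lets them be matched trip by trip, which is one reason the organization matters beyond the real-time use case.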

HOW MUCH ARE THE RESOURCES BEING ORGANIZED?

The answer depends on the number and types of data resources chosen during resource selection. All of the organization takes place in databases. It might make sense for the company to store the raw data it collects in a separate table for each source, since each source delivers data in its own format. From there, the data may be organized into source-specific tables that store the results of converting the raw data into speeds in consistent units, along a specified set of network links sufficiently granular to support routing use cases. This data may be further processed and organized into the final data set that represents the fusion of all of the collected data sources, generated with a weighting function that reflects the accuracy of each source. While the mobile application may only interface with this final table, the other tables are critical for generating consistent results that can seamlessly handle various error conditions. For example, if the real-time data feeds go down, the final table can be populated with historical default values for each link, potentially representing average speeds for each time of day and day of week. Organizing the data at each level of the processing chain allows this to happen without the application having to know or care where the data comes from.
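The fusion and fallback steps described above could be sketched as follows. This is a minimal illustration that assumes a simple weighted average as the weighting function; the source names, weights, and speeds are hypothetical, and a real system would likely use a more careful accuracy model.

```python
def fuse_link_speed(source_speeds, source_weights, historical_default):
    """Return one fused speed for a single network link.

    source_speeds: dict of source name -> speed in mph, or None if the
        source has no current reading for this link.
    source_weights: dict of source name -> non-negative accuracy weight.
    historical_default: speed to use when no source has a usable reading,
        for example the average for this time of day and day of week.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for source, speed in source_speeds.items():
        weight = source_weights.get(source, 0.0)
        if speed is None or weight <= 0:
            continue  # skip sources that are down or not trusted
        weighted_sum += weight * speed
        total_weight += weight
    if total_weight == 0:
        return historical_default  # all feeds are down: fall back
    return weighted_sum / total_weight


# Hypothetical weights: government sensors trusted most, user GPS least.
weights = {"gov_sensor": 3.0, "vendor_gps": 2.0, "user_gps": 1.0}

# Normal operation: the vendor feed has no reading for this link.
fused = fuse_link_speed(
    {"gov_sensor": 60.0, "vendor_gps": None, "user_gps": 40.0},
    weights,
    historical_default=45.0,
)
# fused is 55.0: (3.0 * 60 + 1.0 * 40) / (3.0 + 1.0)

# Outage: every real-time feed is down, so the historical default is used.
fallback = fuse_link_speed(
    {"gov_sensor": None, "vendor_gps": None, "user_gps": None},
    weights,
    historical_default=45.0,
)
# fallback is 45.0
```

Because the function always returns a single speed per link, the application reading the final table sees the same shape of data whether the value came from live feeds or from historical defaults.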

It is also important to consider how much the resource descriptions are being organized. With traffic data, it is most critical to capture and store where and when each data point was collected. How these resource descriptions are provided by the data source is highly variable, since there is no single, universally accepted standard for this domain. For example, location data can be received as a latitude/longitude pair in any of a number of coordinate systems, as a milemarker, or as a reference to a ramp crossing. As such, it is critical to have a mapping that translates the vocabulary used by each data source provider into the company’s own model.
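One way to picture such a mapping is a small adapter per source that converts each incoming record into a single shared record type. The sketch below is an assumption-laden illustration: the `Observation` type, the two feed formats, the field names, and the milemarker lookup table are all hypothetical, and matching latitude/longitude points to links is left out because it requires map-matching logic beyond a short example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

KMH_TO_MPH = 0.621371


@dataclass(frozen=True)
class Observation:
    """The company's common model: where, when, and how fast."""
    link_id: str
    observed_at: datetime  # timezone-aware, normalized to UTC
    speed_mph: float


# Hypothetical lookup: (route, start milemarker, end milemarker, link id).
MILEMARKER_LINKS = [
    ("I-80", 0.0, 5.0, "link-001"),
    ("I-80", 5.0, 12.0, "link-002"),
]


def link_for_milemarker(route, milemarker):
    """Translate a route and milemarker into the company's link identifier."""
    for link_route, start, end, link_id in MILEMARKER_LINKS:
        if link_route == route and start <= milemarker < end:
            return link_id
    raise LookupError(f"no link for {route} milemarker {milemarker}")


def from_sensor_feed(record):
    """Hypothetical government sensor record: milemarker, mph, ISO 8601 time."""
    return Observation(
        link_id=link_for_milemarker(record["route"], record["milemarker"]),
        observed_at=datetime.fromisoformat(record["time"]).astimezone(timezone.utc),
        speed_mph=float(record["speed_mph"]),
    )


def from_vendor_feed(record):
    """Hypothetical vendor record: pre-matched link id, km/h, epoch seconds."""
    return Observation(
        link_id=record["link"],
        observed_at=datetime.fromtimestamp(record["epoch_s"], tz=timezone.utc),
        speed_mph=record["speed_kmh"] * KMH_TO_MPH,
    )


sensor_obs = from_sensor_feed(
    {"route": "I-80", "milemarker": 6.5,
     "time": "2024-01-01T08:00:00+00:00", "speed_mph": 55}
)
vendor_obs = from_vendor_feed(
    {"link": "link-002", "epoch_s": 1704096000, "speed_kmh": 100}
)
```

Both records end up describing the same link at the same instant in the same units, even though the two sources expressed location, time, and speed differently.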

WHEN ARE THE RESOURCES BEING ORGANIZED?

The speed data resources are organized into the system in real time, as they are collected. The data model and system architecture, however, would be defined by the company at the outset of the project and continually refined as the system is implemented and data begins to feed in. Given the rapid change in sensing technologies, it is likely that the company will have to revisit its database model and algorithms to accommodate new forms of data as its product matures. Another interesting question related to resource maintenance is how long the company should keep the data it collects. For the purposes of the mobile application, the value of each data point decreases over time. However, because the company has added value to the data by quality-controlling it and fusing disparate sources together, it may want to keep the data and see whether it can be of value in future products or in a data sale to another organization. This especially makes sense given that storage costs are decreasing and cloud-based storage alternatives are abundant.

WHO DOES THE ORGANIZING?

Once the system is set up, it is fully automated, meaning that the system takes care of all collection and organization. The development and maintenance of the system are done by company employees. Given the complexity of the system, this is a major undertaking that grows more challenging as the number of data sources and data points grows. At the outset, the system organizers need to make decisions about the hardware needed to run the system (based on their estimates of how many data points will be collected and what needs to be done with them), structure the various database tables with a thorough understanding of inputs and outputs, and set up the feeds. On an ongoing basis, they need to maintain the system and add new sources and features as necessary.

OTHER CONSIDERATIONS

This case study considered traffic data to be the primary resource. As the system evolves, the company may want to fold in other resources helpful in prediction, such as data on traffic accidents, weather, and construction.