Category Archives: Case Studies

The Organizing System for Cricket

Overview
From the old-school charm of Calcutta’s grounds to the big-city vibe of Bombay, and popular from the foothills of the Himalayas to the beaches of Chennai, cricket is not just a sport in India but a religion that unites the whole nation. Many venues, hundreds of players, thousands of matches, and millions of spectators together generate a great deal of information, which, when systematically organized, becomes a powerful tool for post-match analysis. The domain of my case study is “Organizing system for Cricket matches”, and its scope is “to organize and provide Cricket match details on an online platform during and after a cricket match”.
What resources are being used?
This organizing system organizes information about all cricket matches, in-depth statistics for every player and team, match scorecards, and match commentary for each game. It also organizes in-depth post-match analysis of every aspect of the game by experts, whose judgments are based not merely on the keenest understanding of the game but also on a wider understanding of society, history, and human behavior, and their impact on the game.
Why are the resources organized?
In this system, resources are organized so that cricket fans can retrieve data about a match that is in progress or has already been played. Organizing match details on an online platform lets users interact with the resources at different levels of granularity, from live commentary and live scorecard updates during the match to post-match analysis. It also lets a fan interact not just with a single match but with the sport as a whole (through the resource descriptions provided for each stroke played and the post-match analysis), with individual players (through access and updates to a player’s profile), and with the collection of players that makes up a team.
This organizing system also assesses information about the agents involved in the game. The goal is to synthesize insights from that information, to give the teams participating in a tournament a richer understanding, and to help them select players. It is also used to analyze and predict players’ performances, to predict their likelihood of injury, and to predict the number of spectators for a particular match at a venue. The system makes it easier for teams to match their requirements to players’ capabilities, and so helps them decide on transfers of players among themselves.
How much are the resources organized?
Cricket has a predefined controlled vocabulary, which allows semantic comprehension for the users of this organizing system. The use of the controlled vocabulary is imperative as it is a means to communicate the statistics of the match, analyze the match, and also the performance of the players.
The expected lifetime of the organizing system is not the same as the expected lifetime of a cricket match. Although the match is a short-lived entity, resources such as the match scorecard, match commentary, player statistics, and post-match analysis report persist for the lifetime of the organizing system.
When are the resources organized?
In this organizing system, all the resources are digital. Some are generated after every match, while others are created by automated processes. These resources exhibit a high degree of organization and structure because they are generated automatically in conformance with data or document schemas. These schemas implement the rules of the game and the information models for updating scorecards and generating match statistics.
All the entities involved in the system become part of the organization as soon as they participate in the game in any form, and they remain part of it for the lifetime of the organizing system. Since the sport is governed by rules set by the International Cricket Council (ICC), rules and regulations change from time to time, and this ultimately affects the nature and extent of the organizing system. The classification of resources changes as well: for example, a player can be promoted from the domestic level to the international level based on his performance, which moves him to another class in the system.
Who does the organizing?
Organizing is performed by professional indexers and information feeders with the help of computer algorithms. They create and maintain the organizing system and ensure the accuracy of its data. They are also responsible for implementing the same logical organizing system across different classes by separating the organizing principles into the middle tier.
Other considerations
Currently, the game has fixed formats that depend on the duration of the match: one-day matches last 50 overs per side, while Test matches have no limit on overs but last up to 5 days. Newer formats such as Twenty20 (20-20) matches, which are fast-paced, last only 40 overs in total, and focus mainly on public entertainment, will affect many factors in the analysis of player statistics. They will also change the schema or structure of the organizing system.
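To make the schema impact concrete, here is a minimal sketch of how the formats described above might be recorded. The class name MatchFormat and its fields are hypothetical and do not come from any real scoring system; the over and day limits are the ones given in the paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchFormat:
    """Hypothetical record describing one format of the game."""
    name: str
    overs_per_side: Optional[int]  # None means there is no limit on overs
    max_days: int

    def total_overs(self) -> Optional[int]:
        """Overs in a full match (two sides), or None if overs are unlimited."""
        if self.overs_per_side is None:
            return None
        return 2 * self.overs_per_side


ONE_DAY = MatchFormat("One-day", overs_per_side=50, max_days=1)
TEST = MatchFormat("Test", overs_per_side=None, max_days=5)
TWENTY20 = MatchFormat("Twenty20", overs_per_side=20, max_days=1)
```

If formats are stored as data in this way, a new format becomes one more record, and player statistics can be grouped by format instead of assuming a fixed set.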

Lynea Lattanzio and The Cat House on the Kings

Overview

Lynea Lattanzio is the founder, president, and resident of The Cat House on the Kings, a nonprofit corporation and cat sanctuary with over 700 cats living on its 12-acre campus (see http://www.cathouseonthekings.com/index.php or the documentary http://channel.nationalgeographic.com/wild/videos/the-lady-with-700-cats/ ). This sanctuary is a fascinating, unique organizing system in itself. However, another organizing system will need to be designed on top of it, because as Lattanzio grows older she will need many integrated support systems to help both her and the cats interact effectively. Most seniors need an organizing system (including sub-systems) for “independent living” so that they can stay in their homes for as long as possible. In Lattanzio’s case, she cares as much about her cats as about herself, so any system will need to ensure both her wellness and the cats’. It must support her interactions as she ages, and also the interactions of her cats. (The corporation’s interactions will need to be supported as well, since this management entity in turn manages the resources that serve the cats.)

What resources are being used?

Lattanzio’s home and corporation are currently organized for many interactions that serve both her and the cats. The new organizing system must manage these resources as she ages in order to support her interactions; because she is also one of the corporation’s resources, it thereby supports the cats’ interactions too. As she ages, she will need her resources to change, and she herself will be a changing resource. At present she both manages the corporation effectively and interacts hands-on with the cats. The specific resource being organized will be an information technology system that monitors her declining vision, hearing, mobility, and mental faculties. It will be integrated with dashboards that monitor her corporation. The system includes an array of sensors on the cats, which will use EEG and other techniques to monitor their motion, mood, noises, and overall behavior. With machine learning, the emotional data from the cats, together with data from her own sensors, will help monitor and predict changes in the resources. These signals of change will concern the resources’ intrinsic and extrinsic dynamic properties. The resource descriptions will be only as granular as her health status and the cats’ require (if there are specific medical problems such as coronary disease or ulcers, those specific systems will need targeted monitoring). Likewise, the corporation’s KPIs and other metrics will be general, but if a problem arises and can be isolated, more granularity will be needed to monitor at least that specific issue. (If a food supply problem is traceable to liver pellets being eaten by one cat, those pellets and that cat might be monitored and interacted with to ensure that the specific problem is fixed and prevented.)

Why are the resources organized?

The resources support Lattanzio’s and the cats’ lives. However, decisions will prioritize the various interactions that constitute their lives and even lifestyles. The core interactions of life are typically related to eating, health, health care, and hygiene. Perhaps equally important are the wellness and emotional needs, which might require basic social interaction (communication, touch, or expression). There will be trade-offs when allocating resources to sometimes competing interactions.

How much are the resources organized?

The amount of organization depends on budget and the state of technology, since many types of data collection, user interface, and analysis tools are available or can be designed. Her own interactions will slowly simplify with age, so the interactions themselves may need to be less granular and precise (both mentally and physically). Since the current corporation already organizes her and the cats, the additional system will organize the immediate threats that aging poses to the current interactions. How much organizing is needed depends on which weakness needs targeting: if blindness is encroaching, her interactions with both the new system and the old systems would change greatly, so extensive organizing would be needed. The more she ages, the more organization will be needed to fill gaps that she formerly covered.

When are the resources organized?

The new resources of the technology solution will be organized in stages, from initial analysis to implementation to ongoing maintenance. However, the technology solution should also have continual auto-regulatory features that organize the resources and their dynamic properties (possibly including the resource descriptions). For example, as the system senses that she is slower, it might set reminders or alarms at different times to account for her taking longer to take her medication or feed the cats dinner. As the cats change and age, they might not only be classified differently (as arthritic or flu-ridden); new classification categories may also appear (such as having-rabies) so that they can be quarantined differently. The initial organization and design of the system will need to identify which organizing activities occur on a continual basis.

Who does the organizing?

An expert team will design the system (with her help). However, she will continue to do some organizing activities: certain screens and even haptic interfaces will be adjustable, so she can make the text bigger as her vision gets worse and enlarge the touch screen buttons as her coordination worsens. As she faces increasing problems, her staff, family, friends, or experts will adjust features of the system that are not auto-regulating. For example, they may change the type of security authentication when she can no longer easily remember passwords.

Other considerations

The staff management software and networked robotics can be scaled up manually or automatically to keep services at adequate levels. She can self-assess her decision-making and effectiveness via brain- and bio-sensor monitoring, and also by monitoring the wellness (“performance”) of the cats and the corporation via reports that, with machine learning or otherwise, synthesize the vast amounts of data into meaningful indicators or metrics in a “business intelligence” system that includes her and the cats’ behaviors and states. How to organize resources after she dies, to support the corporation (because it supports the cats), is beyond the scope of this case.

Assignment 10

Overview

I have chosen an enterprise data warehouse as the organizing system that I would like to focus on. A mid- to large-sized corporation has many types of immovable and movable infrastructure, such as office buildings, office furniture, transport buses, and equipment. Of these, the most prized and dynamic type of resource is IT infrastructure. Servers, phone lines, network routers, laptops, PCs, and proxy networks are ubiquitous in an information-centric company. Managing these resources as efficiently as possible is a primary goal of a company’s management, because the more efficiently these resources are handled, the less of a cost center IT equipment becomes; it can even be seen as a profit center. IT infrastructure therefore needs to be organized logically so that interactions (like adding or removing a resource) can be supported efficiently without affecting any dependent processes.

What resources are being used?

The resources used in a company are generally the hardware and software components it owns: a server, a network switch, a laptop, or an application. These resources support the activities of a company for both its internal customers and its external customers. But for our organizing system, which is a data warehouse, the resources are information components that hold data about these physical objects. This information can record when a server was added to a network, when it was upgraded, when an issue was reported about the hardware, and how long it took to resolve. The format of the source data fed into the data warehouse is an important distinction for the ‘Extract’ process. The data can be received as flat files (plain text) or from relational databases. Each format requires its own technologies and handling procedures, such as UNIX shell scripts for flat files and database procedures for relational data. Although all the information received consists of primary resources, the records link to each other and hence form description resources for each other. For example, a ticket raised to replace a server may be linked to a ticket raised to report that server crashing. Each of the two information resources is a primary resource in itself, containing details of when the ticket was raised, who it was assigned to, and when it was resolved, but by linking them together they become metadata for each other. This is a kind of shared information component network that the data warehouse tries to capture. Depending on our focus, the same record can be either a primary resource or a description resource. The resources primarily provide information about physical resources, but they also capture information about the processes related to these physical resources.

The resources are grouped according to the process or physical object they represent, for example information about a server versus information about adding a new server. Although both might relate to the same server, they clearly belong to different domains and carry different types of information. As a result, they will be treated as different resources.

The resources are named according to the domain they belong to. For example, information about incidents raised for a piece of equipment can be grouped as an Incident entity, and all change-related information can be grouped as a Change entity. The names of these entities in the data warehouse conform to a controlled vocabulary and a fixed syntax. For instance, ‘server change-related information’ can be referred to as CHG, and the corresponding table in the data warehouse would have a name with CHG as a suffix. Giving a server incident-related record a primary key such as ‘12345-INC’ ensures that it cannot collide with a record in a different domain. This qualified name also makes the identifier more informative.
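The naming scheme can be sketched in a few lines. This is only an illustration: the DOMAIN_CODES mapping, the helper names, and the SERVER_CHG table name are hypothetical; only the INC and CHG codes and the ‘12345-INC’ key shape come from the description above.

```python
# Controlled vocabulary of domain codes (hypothetical mapping; only the
# INC and CHG codes are taken from the text).
DOMAIN_CODES = {"incident": "INC", "change": "CHG"}


def qualified_key(record_number: int, domain: str) -> str:
    """Build a key such as '12345-INC' from a record number and its domain."""
    return f"{record_number}-{DOMAIN_CODES[domain]}"


def table_name(subject: str, domain: str) -> str:
    """Build a table name that carries the domain code as a suffix."""
    return f"{subject.upper()}_{DOMAIN_CODES[domain]}"


# The same source number in two domains yields two distinct keys.
incident_key = qualified_key(12345, "incident")
change_key = qualified_key(12345, "change")
```

Because the domain code is part of the key, an incident and a change that happen to share the number 12345 in their source systems remain distinguishable after they are loaded into the warehouse.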

Why are the resources organized?

The information resources are organized to facilitate easy reporting. When every resource is intrinsically linked to other resources, getting the bigger picture can be a problem for the mid-level and senior-level managers who are responsible for gauging the effectiveness of a new strategy and making the required corrections. A person who manages a team responsible for resolving server incidents would like to know how many issues the team resolved, whether the team is effective, whether more people are needed, and whether the way issues are assigned should change. A single go-to point that can help answer all these questions is an operational necessity. The faster this organizing system can answer the manager’s questions, the more relevant the data warehouse becomes. That ultimately depends on the types of interactions the system supports, like providing a canned report with trend graphs or exporting raw data so that it can be processed by another tool.

How much are the resources organized?

The user requirements decide how much the resources are organized. The end users decide how long captured information remains useful. Would they like to preserve data that was loaded into the data warehouse more than a year ago, or are they only interested in the last 2 months of data? If each user’s requirement is considered a set, then the union of the sets of all possible users establishes the minimum amount of data that needs to be maintained. The level of granularity of the data likewise depends on user requirements. If users are only interested in a high-level summarized view, we need not preserve raw data for each and every ticket in the system; in this case the interaction is based purely on a collection-level property. There could be a new entity that is reported as a standalone process without any relationships with other processes. In that case, if there is no reporting requirement, it makes sense to just add that resource to the system without explicitly modelling the relationships that it actually has with other entities. If a star schema is sufficient to satisfy user requirements, there is no need for a snowflake data model, which is inherently more complex and difficult to maintain. There could also be cases where two separate domains always go together, like ‘Incident’ information and ‘Root cause’ information. Every incident has a root cause, but should we introduce these as separate entities, or can we group them into a single ‘Incident’ domain? This is a one-to-one relationship that can easily be housed in a single table. I have noticed that this question is usually answered by the performance implications: with too many resources in the system, joining them and rendering them in a report becomes time-consuming. So wherever possible, collocation (denormalization) makes better sense than unnecessarily creating too many resource categories.
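The “union of requirements” idea can be illustrated with a small sketch. The user roles and month counts below are invented for illustration; the point is only that the warehouse must keep each entity for the longest window that any user asks for.

```python
# Hypothetical requirements: for each user, the entities they report on and
# how many months of history they need (all values are made up).
requirements = {
    "incident_manager": {"Incident": 2, "Change": 2},
    "capacity_planner": {"Incident": 12},
    "auditor": {"Change": 24},
}


def minimum_retention(reqs):
    """Combine per-user needs: keep each entity for the longest window requested."""
    retention = {}
    for needs in reqs.values():
        for entity, months in needs.items():
            retention[entity] = max(retention.get(entity, 0), months)
    return retention


retention = minimum_retention(requirements)
```

With these made-up inputs, Incident data would be kept for 12 months and Change data for 24, even though the incident manager alone needs only 2 months of each.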

When are the resources organized?

The resources are organized according to the level of time granularity the user wants. In this specific case, the manager might want to look at the number of incidents resolved on a day-to-day basis. If the application or group being supported is critical, day-to-day monitoring makes sense, because any dip in the team’s productivity can be caught quickly and remedial action taken. The frequency of the data warehouse loads reflects this priority.

Who does the organizing?

Automated processes designed to Extract, Transform and Load (ETL) data do the organizing on a daily basis. The designer of these processes, the data architect, creates the organizing principles of the system (the data model) after interacting with the end users and asking about their requirements and specific needs. These requirements almost always concern what kind of data the users would like to see together, in a single pie chart or in a single table. Understanding the relationships between the resources and maintaining the capability and efficiency of the system are the organizer’s key concerns.

Other considerations

Readability of the code and documentation matters a lot while creating a data warehouse. In this case, the person or team creating the organizing system moves on once the system has been created and a support team would maintain the system. If the design is too complex, or if the various processes were not documented properly, it would lead to a lot of issues in supporting the system and making modifications to existing functionality. The organizing principles need to be clearly enunciated so that new interactions can be added without affecting any existing functionality.

Personal Movie Collection

Overview

As a film aficionado, I am particular that my collection of movies stays organized. The organizing system for the film collection is influenced by the storage, file size, file type, and source of each movie. It involves several kinds of interactions: keeping up to date with the latest releases, maintaining a list of the movies that need to be added and deleted, finding sources from which to acquire the movies, categorizing the folders, naming the files, and updating the database whenever required. To facilitate this workflow, it is imperative to establish a robust set of design principles for the organizing system.

What resources are being used?
On a computer, each movie is represented by its corresponding digital file. Thus, the resources in this particular organizing system are the digital film files. These resources can have varied formats such as AVI, MPEG, or WMV, so it is important to check the compatibility of each file in order to avoid playback issues. Furthermore, the movies are stored on different storage media. For example, the classics from the ’70s and ’80s that are stored on VHS tapes, or the ’90s movies stored on CDs, require a converter to turn them into digital files. This process is important for obtaining a uniform resource format so that the organizing principles can be applied seamlessly to all resources.
Naming Conventions: Including metadata such as the release date in resource names can help differentiate between resources with similar names, for example Die Hard (1988) and Die Hard (2013).

Why are the resources organized?
I have an ever-growing collection of more than 400 movies on the computer. An organized system for classifying these resources by certain criteria is helpful in several ways. I can effectively retrieve the resources whenever I want to view these movies. The system can serve as a central database that other users, such as my family and friends, can access whenever they want. Furthermore, organizing the resources makes sharing them simpler and quicker. For example, a subfolder within the movie folder named ‘Horror’ (based on the movie genre) allows a friend who wishes to borrow a horror movie to find one easily. The overall goal of organizing the resources is to allow such simple and easy interactions with the system. Because the intended users include my friends and family, it is important that the system is designed for simplicity.

How much are the resources being organized?
At the least granular level, the main folder called ‘Movies’ branches into subfolders called ‘Genre’, ‘Decade’, ‘Cast’, ‘Director’, ‘Language’ and ‘Favorites’. Each of the subfolders is further subdivided. For example, the Genre folder is divided into Sci-Fi, Comedy, Thriller, Action, Period, Western, Drama, Romance, etc. Thus, the folder structure represents the hierarchical classification in the organizing system. In this way, assigning an extrinsic static property creates a distinct method of classifying the resources.

The classification of movies should be a faceted classification, so that each resource can be categorized under more than one organizing principle. A movie that spans genres, like an action-drama, can be placed in the ‘Action’ genre folder as well as in the ‘Drama’ genre folder. A movie like Die Hard is an ’80s action movie starring Bruce Willis and thus would be a member of the folders for that decade, genre, and cast member.
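A faceted arrangement like this can be sketched as an index that files each movie under every facet value it has. The record layout and the function name build_facet_index are assumptions made for illustration; in the collection itself the same effect is achieved with folders.

```python
from collections import defaultdict

# Hypothetical metadata records, of the kind one might copy from a movie database.
movies = [
    {"title": "Die Hard", "year": 1988,
     "genres": ["Action", "Thriller"], "cast": ["Bruce Willis"]},
    {"title": "Unforgiven", "year": 1992,
     "genres": ["Western", "Drama"], "cast": ["Clint Eastwood"]},
]


def build_facet_index(records):
    """Map facet name -> facet value -> set of titles filed under that value."""
    index = defaultdict(lambda: defaultdict(set))
    for movie in records:
        decade = f"{movie['year'] // 10 * 10}s"  # 1988 -> "1980s"
        index["Decade"][decade].add(movie["title"])
        for genre in movie["genres"]:
            index["Genre"][genre].add(movie["title"])
        for actor in movie["cast"]:
            index["Cast"][actor].add(movie["title"])
    return index


index = build_facet_index(movies)
```

A single movie appears under several facets at once, which is what placing the same file (or a shortcut to it) in several folders accomplishes.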

When are the resources organized?
Once the categories of folders have been created, the organizing activity begins whenever a new movie file is added to the computer. When a movie is acquired, it is saved to a temporary folder. The details of the movie are obtained from web sources such as IMDb and Metacritic. Once the metadata has been obtained, the resource can be added to the respective folders. However, the ‘Favorites’ folder is populated only when I watch a movie frequently. Thus, the resources in this organizing system are organized not only when they are added to the system but also when they are frequently accessed by the user.

Who does the organizing?
As mentioned above, I am the primary user of the system and would do the bulk of the organizing activity. However, the system is capable of catering to the other intended users such as my family or friends who wish to access the system.

Other Considerations
One important consideration while designing the system is storage. Every month, several movies are released that are worthy of being added to the system. Thus, as the system designer, I need to ensure that there is sufficient space for every resource that is added. Another consideration is maintenance. Once a movie has been viewed and doesn’t seem likely to become a favorite, it can be deleted. Also, when a movie file gets corrupted or for some technical reason doesn’t work, it needs to be deleted. Thus, operations and maintenance are two key considerations once the system is set up.

Asha Tea House: Boba Tea Shop in Berkeley

Overview

Asha Tea House is a drink shop in Berkeley. Boba tea, the main product of Asha Tea House, is a Taiwanese tea-based drink that is usually mixed with fruit or milk and can be customized with chewy tapioca balls or other ingredients. In addition to selling brewed tea in the shop, Asha also has an online shop selling raw tea leaves. As a Taiwanese, I found Asha Tea House interesting for its unique atmosphere, which blends traditional Taiwanese tea culture with the cosmopolitan style of a US café. Asha tries to convey knowledge of oriental tea and its culture to customers and to maintain long-term relationships with them through close interaction. This uniqueness distinguishes Asha Tea House from other boba tea shops in Taiwan and the Bay Area.

What resources are being used?

Like any other profit-oriented business, Asha Tea House has to organize many resources to keep the business running, including human resources, consumer resources, stock resources, and knowledge resources. Among these, I would like to focus on the stock resources and knowledge resources. As mentioned before, there are many variants of boba tea and many ingredients that can be added to it. Although the menu of Asha Tea House is simpler than those of other boba tea shops, it still lists about 20 different kinds of tea. The stock includes not only 16 different varieties of raw tea but also raw tapioca balls, condensed milk, fruit puree, and syrup, all of which expand the scale of the stock resources.

For the knowledge resources, Asha Tea House has to transform abstract knowledge into concrete information that can be conveyed to staff and customers. This covers not only the standardized processes for making tea and preparing ingredients, but also deeper knowledge of traditional Taiwanese tea culture and tea varieties. Since Asha is selling tea not merely as a “drink item” but rather “drinking tea” as a lifestyle, it puts a lot of effort into telling the story behind every single cup of tea. In the shop, you can see large photos of the Taiwanese tea farms where all the tea served in the store comes from, which give customers a vivid scene to picture. You can also find well-organized guidance on the online store. All of these are knowledge resources that Asha has to organize.

Why are the resources organized?

The resources are organized to improve the efficiency of certain interactions. For the stock resources, because the tea and ingredients are organized into categories, retrieval is more efficient. Although the barista is the one who directly retrieves the tea, the efficiency benefits not just the barista but also the customers who are waiting for their orders. In short, the efficiency gained through organization benefits people who are involved in an interaction both directly and indirectly. Moreover, organizing provides not only efficiency but also better understanding. For the mission of tea culture education, the well-organized guidelines answer the most common questions and thus give customers an easy way to gain a deep understanding of tea.

How much are the resources organized?

Since there is currently only one shop and the number of staff is small, the stock resources are not organized into many hierarchical levels or at fine granularity. For the tea, the different varieties are first categorized as black tea, green tea, or oolong tea, and then organized by quality level and by the location of the tea farm.

When are the resources organized?

Asha Tea House changes its menu seasonally, so when the menu is modified, the organizing system for the stock has to be redesigned to reflect the change in ingredients. The knowledge of tea should be persistent so that the shop serves the same quality and taste, so most of the knowledge resources will not be reorganized frequently. However, if the manager comes up with new stories about the tea, the knowledge resources will be reorganized at that time.

Who does the organizing?

The owner of Asha Tea House does most of the organizing. Since it is a relatively small business, I suppose the owner decides most of the organizing principles. Once the organizing schema is set, the staff follow it, but they can also suggest improvements.

Other considerations

Currently there is only one shop for Asha Tea House to maintain, so the scope and complexity of the organizing system are relatively small. However, I learned from a report that the owner of Asha is thinking of opening more shops in the near future. When it becomes a chain, the organizing system will have to be redesigned to support interactions with more people.

Shoe Retailing at Zappos.com

Overview
Zappos.com is a large online shoe retailer that sells thousands of types of shoes, as well as bags, clothing, and accessories. Like its parent company, Amazon.com, Zappos has mass consumer appeal, so its website has been designed to serve a broad range of users. This case study reviews Zappos’ website and how it has been designed to support the company’s mission to provide the best customer service possible.

What resources are being organized?
The primary resources on Zappos.com are digital descriptions of the physical products that Zappos sells. Product descriptions include pictures, the price, available sizes, and customer reviews. In terms of granularity, a single product page for a pair of shoes or article of clothing may actually include dozens of variations of that item given the size and color options, with some variations having different prices. Zappos groups these variations together rather than giving each one a separate page.

Shoes can be difficult to buy online because customers cannot try on the product and test the fit before purchase. Zappos uses its customer review feature not only to collect review information, but to administer a “fit survey”. The survey asks a reviewer to rate the shoe’s support and whether the shoe felt true to size and width. This information is aggregated across reviews for each shoe and is prominently featured on the product page. In addition, the customer reviews include other experiences with the products such as whether a shoe is of good quality and construction. The fit information and customer reviews are essential for helping customers understand what it is like to wear the shoes being described.

Why are the resources organized?
The product descriptions on Zappos.com are organized to help customers find and browse shoes and other products for purchase. This search system must be easy to use in order for each unique customer to be able to quickly filter out tens of thousands of products before browsing just a subset they are interested in. A typical customer starts by entering a general search query like “winter boots” into the search bar. Then the customer is presented with a grid of product images. At this point, the customer might filter items by gender (men’s boots), price ($200 and under), and color (black). The results can additionally be filtered by brand, style, and other properties, or sorted by price and customer rating.

Once the customer has reviewed an item’s product page, including the other customer reviews, the customer can “add to favorites” to remember the item for later or “add to cart” to save the item for purchase. After purchase, the customer may review the item, thus contributing additional information to Zappos’ collection of product descriptions.

How much are the resources organized?
Zappos.com employs both a hierarchical classification system and a faceted classification system to organize its product descriptions. From the home page, the classification system appears hierarchical with top-level categories for women’s, men’s, and kids’, which are each subdivided into clothing, shoes, and other product categories. After clicking on a subcategory like “women’s shoes”, the customer is taken to another page that shows that subcategory broken down further (e.g., boots and slippers).

After clicking on another subcategory, or when using the search bar at the top of the website, the faceted classification system is immediately apparent. The left-hand menu becomes a tool for filtering the products shown. Prominent facets like color and type (e.g., for shoes: boots, sandals, athletic, etc.) are based on intrinsic static properties. Other facets like price and average customer rating are based on extrinsic dynamic properties. Curiously, while the fit survey information is prominently displayed on each product page, this information is not accessible for filtering.
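The facet filtering described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Zappos’ implementation: the `Product` fields, the sample catalog, and the `filter_by_facets` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Hypothetical product description; field names are illustrative only.
    name: str
    gender: str
    color: str
    kind: str
    price: float
    rating: float


def filter_by_facets(products, max_price=None, **facets):
    """Keep products that match every requested facet value (logical AND)."""
    matches = []
    for p in products:
        if max_price is not None and p.price > max_price:
            continue
        if all(getattr(p, facet) == value for facet, value in facets.items()):
            matches.append(p)
    return matches


# Made-up catalog entries for demonstration.
catalog = [
    Product("Trail Boot", "men", "black", "boots", 180.0, 4.5),
    Product("Snow Boot", "men", "brown", "boots", 150.0, 4.8),
    Product("City Boot", "men", "black", "boots", 240.0, 4.9),
    Product("Beach Sandal", "women", "black", "sandals", 40.0, 4.1),
]

# Men's black boots at $200 and under, best rated first.
results = sorted(
    filter_by_facets(catalog, max_price=200, gender="men", color="black", kind="boots"),
    key=lambda p: p.rating,
    reverse=True,
)
print([p.name for p in results])
```

Each facet narrows the result set independently, which is why a customer can apply gender, price, and color in any order and arrive at the same subset.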

When are the resources organized?
Product descriptions are first organized when they are created for a new product. As customers buy and review the products, they provide sales and rating information by which the product descriptions are continually organized. This information is used by later customers to sort by “Best Sellers” and “Customer Rating”. As the physical product inventory is sold, and certain sizes or colors of a product sell out or get discounted, the product description page continues to be updated. This maintenance of the product description continues until the item is eventually removed from the collection of products sold by Zappos.

Who does the organizing?
The product descriptions are organized by both humans and automated processes. The descriptions and product reviews are originally created by people and then made accessible to customers by automated processes. Customers browsing product descriptions can help organize the reviews on any given product page by marking reviews as helpful, where the most helpful reviews will be listed first in the list of reviews. As products are sold, automated processes manage marking items as best sellers and sorting items by average rating.

Other Considerations
Zappos.com’s faceted organizing system can be used to help with the vocabulary problem of describing shoe styles. For example, what some people call “sneakers”, others call “tennis shoes” or “athletic shoes”. Zappos’ classification system allows the same product to be classified under all three terms. In addition, this system has been designed for thousands of new products to be added each year. As shoe styles change over time, the faceted classification scheme will allow Zappos to create and add new descriptive facets for filtering products in a way that is understandable to customers of the time.

Organizing Digital Tumors

OVERVIEW: After months of reading the scholarly literature, attending lectures, and ruminating over the associative trails, a biologist might suddenly think of an idea that they wish to test. There are many methods of testing a fledgling hypothesis; some biologists might take to the lab, while others might pack for a long trip in the wilderness.

Today, due to the growing abundance of digital data and the encouragement of open collaboration, some biologists have another option. Many researchers now upload their raw and processed datasets for community use. This is particularly relevant and useful for cancer biologists, who can use available genome sequences to answer research questions about a large sample of individuals without expending as many resources to collect the data by hand.

In this case study, I have chosen The Cancer Genome Atlas (TCGA), an online repository created through a joint effort between the National Cancer Institute, NHGRI, and certain divisions of NIH. To test a hypothesis about cancer, a biologist, like the one in this case study, might turn to the TCGA.

WHAT RESOURCES ARE BEING ORGANIZED?:

An online tumor data file is born digital, but acquires its identity from physical objects. Samples from the physical tissue are extracted, and then passed through a machine that sequences fragments of DNA. The process of sequencing returns many small fragments of DNA, which are represented as digital files. Different levels of processing and granularity will yield different types of files. In some cases, a whole genome sequence may be considered a resource, while another resource might consist of only a few mutations pulled from that sequence.

These resources and their associated metadata serve to represent and model the physical tumor. Even the process of creating the tumor sequence is largely intention driven, rather than an exact representation of the tumor. After the sequencer has output the DNA fragments, an unaligned genome resembles a puzzle, in which many pieces are duplicated, and many have errors. Computational methods using physical similarity and statistics play a role in creating and refining resources, as does the human biologist’s intuition.

WHY ARE THE RESOURCES ORGANIZED?:

Online tumor datasets are organized to facilitate easy retrieval for analysis. TCGA resources are arranged in a faceted classification system. This enables users to sort and retrieve based on multiple properties. Some users might be only interested in a certain type of data, such as clinical data, while others might wish to examine all types of data for a particular cancer type, such as ovarian. The faceted classification system enables yet a third user to look at ovarian clinical data.

Certain traits, closely resembling attributes, further enable precision in retrieval. For example, a user searching for ovarian cancer might also search for stage III as opposed to stage II ovarian cancer. As in many organizing systems, the imposition of categories and hierarchy often reveals patterns in resources, which can alter how users think about the resources. A glance at TCGA’s scheme demonstrates that the histology and anatomical origins of a tumor continue to drive how we analyze cancer.

HOW MUCH ARE THE RESOURCES ORGANIZED?:

Resources are organized at multiple levels of granularity. They are almost always organized by intrinsic static properties; the age of a file does not matter as much as its content, which should remain unchanged over time. Some of the organization draws from conventional medical standards that are used in many other domains.

Other levels of organization are based on standards drafted by TCGA or other organizations. An example of this can be seen in the TCGA-defined .maf file, which contains information about mutations. Each mutation has several required fields, such as position and allele, and each field must adhere to a predefined format. Presumably, if a resource is not formatted correctly, a researcher will not be allowed to upload it, and so this control is fairly strict. This is with good reason, as poor or inconsistent organization within files will cause problems when computational analysis is run.
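To make the idea of required fields with predefined formats concrete, here is a minimal sketch of such a check. The field names and format rules below are assumptions chosen for illustration; the actual TCGA specification defines its own columns and rules, which are not reproduced here.

```python
import re

# Hypothetical required fields and formats (NOT the real TCGA specification).
REQUIRED_FORMATS = {
    "chromosome": re.compile(r"^(?:[1-9]|1[0-9]|2[0-2]|X|Y)$"),
    "position": re.compile(r"^[1-9][0-9]*$"),
    "allele": re.compile(r"^[ACGT]+$"),
}


def validate_mutation(record):
    """Return a list of problems found in one mutation record (empty if valid)."""
    problems = []
    for field, pattern in REQUIRED_FORMATS.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing required field: {field}")
        elif not pattern.match(value):
            problems.append(f"badly formatted {field}: {value!r}")
    return problems


# Made-up records for demonstration.
good = {"chromosome": "17", "position": "12345", "allele": "T"}
bad = {"chromosome": "17", "position": "-5", "allele": ""}

print(validate_mutation(good))
print(validate_mutation(bad))
```

A repository could plausibly run a check like this at upload time and reject any file whose records return a non-empty problem list.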

WHEN ARE THE RESOURCES ORGANIZED?

Resources are organized as they are added to the collection. When the site was first created, the staff of TCGA defined the categories and hierarchies of the organizing system. Currently, unaffiliated researchers deposit the resources into this preset system. The timing of resource addition to the collection usually coincides with the timing of a related publication. Researchers will likely only deposit their data after a study is finished, to minimize chances of a competing publication.

WHO DOES THE ORGANIZING?

Organizing is a joint effort between the researchers and organizations that collect the data and those who curate the TCGA collection. TCGA maintains certain standards for formatting the different resource types and defines the overall structure into which resources are deposited. However, processing each tumor is also an organizing step. At the very lowest levels, when researchers “create” the resources, they must make some of the messier decisions, such as what stage a cancer is in and what quality of mutation is considered valid. In this sense, even though the TCGA will organize the physical files and provide structure, researchers play a role in adding the description and metadata that will determine where the resource is sorted.

OTHER CONSIDERATIONS:

It is likely that certain technical considerations will change over time, such as file formats, the sequencing platforms used, and the types of data available. It will be interesting to see how the site adapts and reorganizes, and what it does with older data, which may not be as high quality. The organizing principles of the collection will also likely parallel research developments. If patient age, for example, is shown to be more significant than histology, this change in research trends would likely be reflected in the organizing system.

Organizing Resources from Online Shopping Websites

OVERVIEW:

Online shopping websites are themselves usually complex organizing systems, so aggregating their resources for searching or sorting is very challenging. I used to work on a light project that drew on several shopping sites to offer the top 4-5 best-matched instances in one search result snippet. Many interaction design issues then came up. The project grew so large that we ended up building an entire organizing system (http://gouwu.sogou.com/) based on resources from different sites. We were also able to define the description format, and we eventually offered a unified API for all partners to update their resources in real time.

WHAT RESOURCES ARE BEING ORGANIZED?

At stage one, our service integrated 8 different websites: taobao.com (an eBay-like C2C site, and the only C2C website), tmall.com (an Amazon-like general B2C website), amazon.cn (the local version of amazon.com), 360buy.com (B2C, specializing in electronic devices), dangdang.com (B2C, specializing in books), vancl.com (B2C, specializing in clothes), m18.com (specializing in bags), and yhd.com (B2C, specializing in groceries).

As an aggregator, our system didn’t worry about the physical resources in our partners’ warehouses. All we wanted to organize was digital information about products, such as text and images. We also didn’t maintain any information about users’ profiles or transaction histories. It should be pointed out that we spent a lot of time designing the log format: what information to log and how to maintain it. In my experience, log information is a crucial part of an online organizing system, and logging should be built along with the system rather than added afterward. Proper analysis of user logs can clarify the requirements and verify the system design in any spiral or iterative lifecycle.

WHY ARE THE RESOURCES ORGANIZED?

We wanted to build an organizing system that provided a convenient and intelligent search function, as well as an independent ranking service, to our end users. The affordances of our system enabled large-scale, cross-system searching and ranking. Our system therefore targeted users with a clear search goal and a need to compare prices.

We also wanted to recommend good sites to our end users. A big concern of our end users was the credibility of online shopping websites, and this concern sometimes limited their choices when comparing prices. To address this problem, we carefully selected the sources of the instances in our system and did our best to keep the information timely and accurate.

HOW MUCH ARE THE RESOURCES ORGANIZED?

Scope-wise, an obvious obstacle was the heterogeneity of resources. By nature, some products, such as cameras or cellphones with the same model or serial number, can be grouped and compared, while others, such as clothes, cannot. We accordingly designed two user interface templates and eventually unified their interaction process. Meanwhile, integrating our partners’ different categorization systems also cost us a lot of effort. Apparently there was no standard or ecosystem yet. We therefore defined the granularity from scratch and managed to convert their categorization systems.

Lifecycle-wise, we went through several iterations at the first stage. E-commerce was a hot market at that time. Most of the domain-specific shopping sites later expanded their domains and changed their categorization systems to some extent. We had to address this problem and adjust accordingly, which echoed the lifecycle characteristics of a complex organizing system. It took us some time to redefine our goal and the users of our organizing system. Once the goal of our system was defined, the design and implementation could be carried out clearly. We eventually cut some unnecessary features (like the total number of similar items) and defined an API to get the description format.

WHEN ARE THE RESOURCES ORGANIZED?

As one of our first priorities was to make the information timely and accurate, we discarded the crawling method used in other vertical domains. Instead, we requested that information about each instance be transferred to us immediately after it was published on our partners’ sites. However, we still ran a verification process in the seconds before an instance was stored in our system. For example, we automatically checked URL availability and the accountability of the information in order to filter spam.
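A verification step of this kind might look roughly like the sketch below. It is not the code we ran: the required fields, the price rule, and the `verify_instance` helper are hypothetical, and the URL check here only tests that the URL is well formed, whereas a production check would also request the page.

```python
from urllib.parse import urlparse

# Hypothetical description format: every submitted item needs these fields.
REQUIRED_FIELDS = ("title", "price", "url")


def has_valid_url(url):
    """Check only that the URL is well formed; a real check would also fetch it."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def verify_instance(item, is_reachable=has_valid_url):
    """Return True if a partner-submitted item may be stored."""
    if any(not item.get(field) for field in REQUIRED_FIELDS):
        return False
    try:
        if float(item["price"]) <= 0:
            return False
    except (TypeError, ValueError):
        return False
    return is_reachable(item["url"])


# Made-up submissions for demonstration.
incoming = [
    {"title": "Camera X100", "price": "2999.00", "url": "https://example.com/x100"},
    {"title": "Free phone!!!", "price": "0", "url": "https://example.com/spam"},
    {"title": "Novel", "price": "35.50", "url": "not-a-url"},
]

accepted = [item for item in incoming if verify_instance(item)]
print([item["title"] for item in accepted])
```

Passing the reachability check in as a parameter keeps the rule logic testable without network access.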

WHO DOES THE ORGANIZING?

The whole organizing process was carried out automatically by our algorithms. We also applied data analysis techniques to monitor the performance of our algorithms. The log we carefully designed at the beginning came into play here. Our product manager received a daily report and analyzed possible issues. We also had a real-time warning system in case of unusual situations.

OTHER CONSIDERATIONS

Despite the progress we made, there were still some issues to be solved. Our partners’ category systems keep evolving, and our system should be able to adjust easily. For example, e-books are becoming more and more popular, and user demand is now large enough to make them a category independent of books.

Organizing Votes Against Voter ID in Minnesota

Overview

An amendment to limit voting rights was proposed to the Minnesota constitution in 2012. With many similar proposed laws popping up across the US and heavy support in Minnesota, a coalition of Minnesota organizations set out to convince voters to defeat the amendment. Concurrently, an effort to defeat an anti-gay marriage constitutional amendment was underway, and swallowed resources and attention that the voter ID amendment may have otherwise gotten. In May 2012, a poll showed 80% of Minnesota voters in support of the amendment.

What are the resources?

The main thrust of this organizing campaign was how to strategically organize volunteers, voters, and money to produce a simple majority against the amendment.

The campaign carefully chose voters to persuade and aggressively recruited volunteers to reach more voters. Initially, the campaign used the public voter file and roughly categorized voters based on their voting history. Factors like party registration and likelihood to go to the polls played into decisions about resource selection, usually called list segmentation in this context.

Volunteers were tasked with moving voters who were pro-amendment but likely to jump categories, and with then documenting whether the interaction yielded movement on the 1-5 scale.

Why are the resources being organized?

The voter-volunteer interaction is designed to produce data about voters to be folded into algorithms that produce subsequent call lists.

The persuasion interactions turned the voter file into increasingly meaningful data with growing granularity. The voters that could initially be thrown into only 5 categories could be segmented much further: towards the end of the campaign, we could create a list of all registered Democrat veterans living in zip 55407 with valid phone numbers with whom we’d had at least two conversations. Conversations with voters enriched the data and allowed volunteer efforts to be sharpened toward more strategic persuasion.

How much are the resources organized?

Voter data was organized iteratively throughout the campaign as more and more data came in.

Volunteers noted additional descriptions such as veteran, having tribal ID, disabled, unable to renew driver’s license, or otherwise personally affected by the amendment. Sometimes this identity “tagging” was used to match volunteers of a particular identity to similar voters for more effective conversations.

During the campaign, all volunteer-voter interactions were designed to describe voters in greater detail, which helped automatically filter out voters not receptive to conversations about voter identification. Based on certain behaviors and outcomes during volunteer-voter interactions, some voters were labeled Do Not Call, Wrong Number, or Deceased, handily eliminating them from future lists.

After the election, the campaign data could be used again, but the archived data was all that remained. The volunteers and voters don’t persist as campaign resources once votes are cast.

When are the resources organized?

Volunteers were trained in rapid voter classification before door-knocking or calling. Based on responses to the script, volunteers assigned each voter they spoke with a ranked resource description. Voters were given a number from 1 to 5 depending on whether they were likely to vote for or against the amendment, or somewhere in the middle. Their rankings were kept on file, and those ranked anywhere from 2 to 4 received repeat calls during the course of the campaign to gauge the effectiveness of the persuasion tactics. The campaign decided to save resources by deeming anyone who was strongly in favor of the amendment “immovable”.
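The list segmentation described in this case study can be sketched as a simple filter over voter records. The `Voter` record, its field names, and the `build_call_list` helper are assumptions for illustration; the campaign’s actual voter file and tools are not shown here.

```python
from dataclasses import dataclass, field

# Labels that remove a voter from future lists, as described in the case study.
EXCLUDED_STATUSES = {"Do Not Call", "Wrong Number", "Deceased"}


@dataclass
class Voter:
    # Hypothetical record; real voter-file fields would differ.
    name: str
    score: int                      # 1-5 rating recorded by a volunteer
    status: str = ""                # e.g. "Do Not Call"
    tags: set = field(default_factory=set)
    conversations: int = 0


def build_call_list(voters, required_tags=frozenset(), min_conversations=0):
    """Select persuadable voters (scores 2-4) who can still be contacted."""
    return [
        v for v in voters
        if 2 <= v.score <= 4
        and v.status not in EXCLUDED_STATUSES
        and set(required_tags) <= v.tags
        and v.conversations >= min_conversations
    ]


# Made-up voters for demonstration.
voters = [
    Voter("Voter A", 3, tags={"veteran"}, conversations=2),
    Voter("Voter B", 5),
    Voter("Voter C", 2, status="Deceased"),
    Voter("Voter D", 4, tags={"veteran"}, conversations=1),
]

repeat_calls = build_call_list(voters)
veterans_twice = build_call_list(voters, required_tags={"veteran"}, min_conversations=2)
print([v.name for v in repeat_calls])
print([v.name for v in veterans_twice])
```

Every documented conversation adds tags or updates a score, so the same filter produces narrower and more useful lists as the campaign goes on.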

Removing the extreme voters made sense at the beginning when the odds of beating the amendment were low and limited resources had to be used extremely frugally. The cost-benefit analysis changed once the campaign garnered more attention, and the campaign shifted to spend resources to reach these extreme voters: the supportive ones to ensure that they hadn’t been swayed by the opposition, and the opposing ones to begin the hard task of swaying them.

Who does the organizing?

The coalition of organizations behind the vote-no effort orchestrated the campaign’s organizing system.

The slate of volunteers working on the campaign grew in granularity as the campaign moved forward. Volunteers specialized by taking on particular roles within the campaign and called or visited voters within their district of Minnesota. Some climbed the hierarchy of leadership. As the campaign progressed, more categories of leaders were added to the leadership structure.

Other considerations

Because this organizing system is all about people organizing people, resource description is particularly difficult and subject to bias. All the “data” that results from persuasion interactions is necessarily fuzzy and one 4 is not necessarily equal to another 4 in likelihood to vote a particular way. However, a no vote from one person is interchangeable with a no vote from another. Conducting effective conversations with the right individuals has the potential to yield scores or hundreds of no votes because of the social capital and persuasive power of some individuals.

Interestingly, if a volunteer reached a wrong number with a receptive human at the end of it, they were advised not to spend time persuading them. Because the lists are produced ahead of time and individual volunteers didn’t have access to the voter file, they couldn’t match a receptive ear on the other end to a particular voter. The campaign couldn’t rely on conversations that weren’t documented.

Furthermore, it would be a problem of authentication even if they had access: how would they prove the voter’s identity, or resolve the name problem of many potential voters by the same name? Some volunteers flouted this advice and tried to convince the unknown individual to vote against the amendment anyway. Whether these undocumentable conversations resulted in wasted time is unknown, but in the end, voter ID was defeated.

Automatic description at this relatively small scale is impossible due to scarcity of financial resources: it took the massive efforts of many to gauge the political sentiments of others and make predictions about future behavior. At the scales of national elections, mountains of data and data scientists made it possible for the Democratic party to segment the voting public, test message effectiveness, and conduct remarkably successful fundraising, $5 at a time.

Comic Book Characters

Overview
There are a lot of back-stories and alternate versions of characters in comic book mythology, and few people are able to keep track of everything. The number of movies based on comic books is also growing, and fans may be interested in learning about their favorite superhero or villain. DC Entertainment decides to create a database that both its creative teams and fans can use to find information on prominent DC characters.

What resources are being used?
The resources being organized are the characters within the DC comic book universe and their descriptions. Characters from other companies and from independent publishers are not included because they’re not a part of the DC universe. The size and scope of the organizing system will be limited to characters that have a lasting or important impact in the DC universe. This will allow DC to focus its efforts on characters that draw more attention from fans. What is considered important or impactful can be ambiguous because many fans have biases about what is important or impactful. The definition used here will be based on the number of sales and the number of years in print. So a character that was a commercial failure and has been in print for only a short time is outside the scope of the organizing system.

A relational standard will also be used to determine who is important and impactful. This means that characters with a significant relationship to these prominent characters will also be included. So even though Damian Wayne is a relatively new character, he is included because he is Bruce Wayne’s son. Characters who have been a part of pivotal plots, such as Doomsday, are also considered. Even though Doomsday does not have a long print history, he is included because he is famous for killing Superman. Since the purpose of the organizing system is not to create an archive but to record prominent characters, a significant number of characters will be left out. All this information will be stored on DC Entertainment’s web servers and can be accessed online through the DC Entertainment website.

Why are the resources organized?
The resources are being organized because it is difficult to keep track of all the mythologies. Writers and artists need to remember what was done in the past in order not to repeat certain facts or get them wrong. DC Entertainment is creating the organizing system so that its creative teams can keep track of these facts. It also provides information for curious fans who want to read more about a character, which fosters interest and attracts new customers for DC.

The goal does not include archiving, providing DC’s own critique of characters, or even providing critiques from notable writers and artists. The organizing system is also not meant to sell books. Doing so would make the organizing system a store rather than a database, and DC has its own online store available.

How much are the resources organized?
Each character receives a basic biography covering background, family names, first appearance, how the character achieved hero or villain status, and the character’s contribution to the comic book universe. A character’s importance includes the relationship the character has with another prominent character. The varying details between different interpretations are also included, such as changes in super powers, secret identity, or back-story. This scope constrains granularity, as many details that appear important to devoted fans will be left out. Some may argue that extensive biographical information is necessary for each character, but the purpose is to help with the basic facts of characters rather than to create extensive biographies. It may also be necessary to create family trees in order to show relationships between characters.

When are the resources organized?
Work on the database would start as soon as DC Entertainment approves the project, long before the database is launched. There would also be scheduled updates and maintenance as new characters and plot developments are introduced with each new issue. Much of the organizing may already be done, as DC Entertainment most likely has its own historical archives. Organizers will also need to filter out information that does not fit the scope of the organizing system.

Who does the organizing?
DC Entertainment’s historians and web team will be involved in creating the online database. Members of the marketing team may need to help distinguish what should be considered a financial success and which titles have been more consistently popular over the years. If needed, DC Entertainment may also allow fans to help organize the resources. This itself would require much organization to ensure that volunteer contributions fit the scope of the organizing system. It may also have some drawbacks, as control over the organizing process would be harder to maintain depending on the number of volunteers involved. But once the database is finished, DC’s web team and historians would be in charge of maintaining it.

Other considerations
The organizing system may grow, and DC may change the scope of the database to include lesser-known characters or more extensive biographies, so the system must be implemented flexibly. DC may also think about adding an API so that fans and its web developers can use the information for other purposes. Other interactions may also be considered, such as allowing fans to make specific queries. For instance, a user might be curious about the number of LGBT characters in the DC universe. Supporting such queries will require tagging each character to allow easier searches, which opens up another interaction for users: tagging. This will also require a controlled tagging vocabulary to ensure that resources are properly tagged. The database might also become so popular that it is merged with DC’s online store in order to sell more comics. Doing so would change the organizing system from a database into more of a store.
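A controlled tagging vocabulary could work along the lines of the sketch below, in which free-form fan tags are mapped to a small set of preferred terms before being stored. The vocabulary, the placeholder character names, and the `normalize_tags` helper are all hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical controlled vocabulary: each preferred tag lists accepted variants.
CONTROLLED_TAGS = {
    "lgbt": {"lgbt", "lgbtq", "queer"},
    "villain": {"villain", "supervillain"},
    "hero": {"hero", "superhero"},
}

# Reverse lookup from any accepted variant to its preferred tag.
VARIANT_TO_TAG = {
    variant: tag
    for tag, variants in CONTROLLED_TAGS.items()
    for variant in variants
}


def normalize_tags(raw_tags):
    """Map free-form tags to preferred terms; return (accepted, rejected)."""
    accepted, rejected = set(), set()
    for raw in raw_tags:
        key = raw.strip().lower()
        if key in VARIANT_TO_TAG:
            accepted.add(VARIANT_TO_TAG[key])
        else:
            rejected.add(raw)
    return accepted, rejected


# Placeholder characters, not real DC data.
characters = {
    "Character A": normalize_tags(["Superhero", "LGBTQ"])[0],
    "Character B": normalize_tags(["supervillain", "scary"])[0],
}

lgbt_characters = sorted(name for name, tags in characters.items() if "lgbt" in tags)
print(lgbt_characters)
```

Rejected tags could be queued for review by DC’s historians, which keeps fan contributions within the scope of the organizing system.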

I202 | Information Organization and Retrieval

UC Berkeley Fall 2013 INFO 202
