Category Archives: Assignments

Knowledge Management Case Study – Ruchita

Overview 

Customer experience has become one of the key performance indicators for predicting the success or early exit of a fledgling enterprise software company. When a company shifts its reference point from revenue-driven processes to a user-experience-driven business, it must rethink how to architect the flow of information within and outside the company to provide a seamless user experience.

The enterprise software startup considered in this case study was facing several challenges that were starting to hurt the end-user experience. Its software processes were ad hoc; it was difficult to capture customer touch points and to drive the open source developer community's knowledge base; and it was challenging to weave a seamless story for customer use cases around the product platform.

Customers had a broken experience with the software across several dimensions, including deployment, customer interaction, training, and documentation.

What resources are being used? 

As a system designer, it was important to identify the information components whose information content was confounded. These included product documentation, blogs, training material, customer support tickets, and software tickets. This variability across information components meant a large and diverse set of resource descriptions, which made the collection as a whole hard to categorize. To classify such a diverse set of resources, we used common properties to find a higher level of abstraction that balanced the precision tradeoff, and we organized the resources into broad categories such as posts, product documentation, infographics, and customer tickets.

However, even after categorizing the resources at this higher level of abstraction, we still had to separate resources that pointed to the same instance. For example, knowledge base articles and product documentation sometimes referred to the same resource instance. Some resources were composite resources used by cross-functional teams; for example, the product marketing team often used customer support tickets to extract use cases. This made it ambiguous how many parts a resource description should have. It was also important to separate the tangible embodiment of each resource from the abstract information resource it carries: the distribution channel (print, ebook, blog post, etc.) also affects how resources will be organized.

Why are the resources organized? 

The teams faced vocabulary and interoperability problems because they used different document formats. For example, the product documentation and training teams could not share content with each other, because there was no single-sourcing solution and their formats were not interoperable. Additionally, multimedia resources, such as videos from training sessions, raised separate organization issues. These multimedia resources would have to be classified by interactions that are qualitatively different from those of other digital resources.

How much are the resources organized? 

Some resources must be organized by intrinsic static properties, such as the category of the resource. Other resources, such as customer Salesforce tickets and knowledge base articles, must be organized by extrinsic dynamic properties, and must be organized in tandem. Organizing decisions are driven by the business goal of providing seamless customer interaction with the product. Because the content flows are intertwined to provide an end-to-end product experience, the system designer must consider how information flows across the different resource components. XML-based resources, such as the documentation and blog posts, would follow the DocBook standard for organization and for interoperability across platforms.

When are the resources organized? 

Resources must be organized during creation. They would also be organized during the maintenance phase, to maintain archives for each version of the product platform. For community-driven content (especially the JIRA tickets filed with the open source community), organization would mostly happen during creation and triage.

Who does the organizing? 

Contributors, authors, curators, and the automated system are responsible for content creation and organization. Some organizing would happen at creation time, through the single-sourcing publication solution that was to be implemented. End users of the system would also do some organizing by tagging resources such as customer tickets and use-case-based articles.

Other considerations 

This was an ambitious project, given the range of components it covered. The most challenging part would be sweeping in open source content, because the company had no control over the form of the content it received. Additionally, as the company grows, challenges with content production and localization are inevitable. At the time, the support center and product documentation teams could work with each other seamlessly, but the product marketing and training teams still could not use the single-source publishing pipeline because of legacy content. Integrating these two teams would be another challenge for the system designers.

PeerLibrary: Organizing the world’s scientific knowledge

OVERVIEW

Since the dawn of science, humans are estimated to have published approximately 50 million scientific articles. In the biomedical and social sciences alone, a new publication is added approximately every minute of every day. This case study discusses one novel organizing system currently under development, a web application called PeerLibrary, that strives to create an enriched, collaborative experience around navigating this corpus of knowledge.

WHAT RESOURCES ARE BEING USED?

Scientific articles are the primary resources organized by PeerLibrary. The entire service is centered on access to these articles and on collaborative, human-generated descriptions of them in the form of digital document annotations. Resource descriptions extracted from each article include author names, the abstract text, publication year, and journal information. Because many articles are not available in HTML or XML formats, PeerLibrary must rely on descriptions provided by journals, by users, and through optical character recognition (OCR). Notably, users in this system do not only interact with the articles; they are also resources themselves, as potential authors of both articles and annotations.

One major design consideration in building this system was determining exactly how articles would be selected and added to the PeerLibrary resource collection. While the overall goal is to make all of the world’s scientific knowledge open for discussion, dynamically adding every article as it is published is impractical. APIs to journal and archive databases help fill in much of the collection, but complex issues persist, such as user authentication to access articles behind paywalls and timely discovery of newly published work. An Import feature was added to the current version of the application so that users can quickly bring all of their initial resources of interest into the cloud. Of course, this useful feature poses a challenge for the organizing system, because it creates opportunities for duplicate articles to be added to the resource collection. Resource descriptions such as article title, year of publication, and author names must be used consistently to handle duplicates properly.
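The write-up does not say how PeerLibrary detects duplicates. Here is a minimal sketch, assuming a duplicate is any import whose normalized title, publication year, and set of author surnames match an earlier record, and assuming names arrive in "Given Surname" order. The helper names `normalize_text`, `dedup_key`, and `merge_imports` are hypothetical, not PeerLibrary code.

```python
import re
import unicodedata


def normalize_text(s: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    s = re.sub(r"[^\w\s]", " ", s.lower())
    return " ".join(s.split())


def dedup_key(title: str, year: int, authors: list[str]) -> tuple:
    """Hypothetical duplicate key: normalized title, year, and sorted author surnames."""
    # Keep only the last token of each name (assumes "Given Surname" order),
    # so "J. Smith" and "John Smith" produce the same surname.
    surnames = sorted(normalize_text(a).split()[-1] for a in authors if normalize_text(a))
    return (normalize_text(title), year, tuple(surnames))


def merge_imports(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each duplicate key, preserving import order."""
    kept: dict[tuple, dict] = {}
    for rec in records:
        key = dedup_key(rec["title"], rec["year"], rec["authors"])
        kept.setdefault(key, rec)
    return list(kept.values())
```

Keying on surnames tolerates "J. Smith" versus "John Smith", but it will also merge distinct papers that happen to share a title, year, and author surnames, so a real system would likely want more evidence before merging.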

The question of which kinds of resources would be allowed into the collection was also an important, non-trivial conversation early in the development process. Many different kinds of writing could benefit from collaborative annotation, including fiction, news stories, and humanities research. However, building a tool that specifically supports interactions around scientific inquiry narrows the types of resource descriptions that are possible while maximizing their utility. For example, abstracts can consistently serve as comprehensive previews of articles, journals usually indicate the field of study an article is most relevant to, and authors can be connected to articles, fields of study, and other authors in a meaningful social network.

WHY ARE THE RESOURCES ORGANIZED?

Historically, scholarly publishing has exploited the research community while generating lucrative profits for publishing companies. To drive science forward, researchers need access to the highest-quality and most relevant past work to inform the context and decisions of current and future studies. Furthermore, it is not enough for articles to be openly and freely accessible. There is a growing need for a space to openly exchange knowledge, feedback, and insights about research. PeerLibrary recognizes that researchers need intuitive collaboration tools to adopt an open-science mindset. Down the road, this approach might help build a more open, sustainable, and high-quality peer review system.

HOW MUCH ARE THE RESOURCES ORGANIZED?

When scientific articles are added to PeerLibrary, they are parsed for all of the resource descriptions discussed earlier and immediately added to a NoSQL MongoDB database. The collection itself is largely unstructured: each article is saved as an independent Document object, and each Document holds its resource descriptions in structured fields. Unlike systems with more heterogeneous resource description formats, PeerLibrary pays little attention to how documents are arranged in the collection and instead relies on the structure of these resource descriptions to support user interactions such as searching for articles.

WHEN ARE THE RESOURCES ORGANIZED?

Article resources have the potential to be organized by any resource description properties as soon as they are added. Users dynamically organize articles into sub-collections of interest by narrowing their search with keywords, specific authors, publication date ranges, and other descriptors. Users can also share pointers to articles and specific annotations with others, creating the possibility of group collections.
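How PeerLibrary implements this narrowing isn't described here. As a hypothetical in-memory sketch, each Document can carry its resource descriptions in structured fields, and a user's sub-collection is whatever passes every filter they supply. The `Document` fields and the `narrow` function are illustrative assumptions, not PeerLibrary's schema or API.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical article record: resource descriptions as structured fields."""
    title: str
    authors: list[str]
    year: int
    abstract: str = ""
    journal: str = ""


def narrow(docs, keywords=(), author=None, year_from=None, year_to=None):
    """Return the sub-collection of docs that satisfy every supplied criterion."""
    result = []
    for doc in docs:
        text = f"{doc.title} {doc.abstract}".lower()
        # Every keyword must appear in the title or abstract.
        if any(k.lower() not in text for k in keywords):
            continue
        if author is not None and author not in doc.authors:
            continue
        if year_from is not None and doc.year < year_from:
            continue
        if year_to is not None and doc.year > year_to:
            continue
        result.append(doc)
    return result
```

Because every criterion is optional, the same function covers narrowing by keyword alone, by author alone, or by any combination with a publication date range.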

WHO DOES THE ORGANIZING?

The organization of articles in PeerLibrary is done largely by users. This crowdsourced approach lets users select articles relevant to their personal or group research interests, then read, share, and discuss them and export their citations. While journals impose structured descriptions on the articles, the collaborative layer of user-generated knowledge that PeerLibrary creates lets each user build custom collections.

OTHER CONSIDERATIONS

A notable complication in this system is that many users who want to sign up for an account to read and annotate documents may be authors of articles already in the resource collection. To avoid duplicating a person’s identity as both a user and an author, the designers decided to create a Person identity for each unique author in the current corpus. When a new user creates an account on PeerLibrary, the system first checks whether the information they provide potentially matches a Person already in the system. If so, the User and Person identities are linked, which avoids the vocabulary problem of synonymy and provides a richer user experience. When a group of authors publishes a new article, it is also important that the article be listed on the correct individuals’ profile pages. Multiple researchers sharing the same name, individuals changing their names, and inconsistent formatting of name strings across an individual’s publications (e.g., including a middle initial or not) pose significant obstacles here.
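A hypothetical sketch of the candidate lookup described above, assuming each name is reduced to a (surname, first initial) key so that accents, middle initials, and "Surname, Given" ordering don't matter. Because the key is coarse, it only proposes candidate Person records; it cannot tell apart two researchers who share a name, so the link still needs confirming (for example, by asking the new user). `name_key` and `PersonIndex` are illustrative names, not PeerLibrary code.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(full_name: str) -> tuple[str, str]:
    """Reduce a name string to (surname, first initial), dropping middle names."""
    s = unicodedata.normalize("NFKD", full_name)
    s = "".join(ch for ch in s if not unicodedata.combining(ch)).lower().strip()
    if not s:
        return ("", "")
    if "," in s:  # "Surname, Given Middle"
        surname, given = (part.strip() for part in s.split(",", 1))
    else:         # "Given Middle Surname"
        parts = s.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    given = re.sub(r"[^a-z ]", " ", given).strip()
    surname = re.sub(r"[^a-z'-]", "", surname)
    return (surname, given[:1])


class PersonIndex:
    """Maps name keys to Person ids so new users can be matched to existing authors."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add_person(self, person_id: str, name: str) -> None:
        self._by_key[name_key(name)].append(person_id)

    def candidates(self, name: str) -> list[str]:
        # .get() avoids creating empty entries for names that match nothing.
        return list(self._by_key.get(name_key(name), []))
```

Here "Jane Q. Doe" and "Doe, Jane" share a key, which handles the middle-initial and formatting cases, while a name change still produces a new key and would need to be recorded explicitly.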

Assignment 11 – Reflections on Alumni Day (Optional Extra Credit)

(This is optional extra credit that we’ll just tack onto your cumulative score for the other assignments; this assignment is worth 1/2 of an assignment.)

Write about 500 words as a reaction to one or more of the talks by Zach Gillen (Kaiser), Ryan Greenberg (Twitter), Thejo Kote (Automatic), Bryan Rea (Google), or Igor Pesenson (Salesforce). Analyze how their talk applies to one or more of the 202 lectures this semester, and demonstrate with your cogent use of 202 concepts and your knowledge of the course readings that you’ve benefited from coming to class.

(This is due before you take your final exam next week, because I want that exam to be the end of the semester for you. Send your assignment to me as an email attachment.)

(I don’t expect most of you to do this. Numerically this isn’t very significant with respect to your final grade, and you need to get your Assignment 10 case study written, and you need to study for the final exam. But I’m offering this optional assignment as another chance to think about what you learned in 202, and perhaps those of you who didn’t show me what you knew on the midterm can use this to demonstrate a little bit more that you get 202…)

Assignment 10 – Organizing System Case Study

Due: Thursday, Dec 12th 2013, 9am

For this assignment, you will write a case study of an organizing system in the domain you have explored throughout this class.  The assignment is intended to help you review and synthesize what you have learned this semester. Treat this assignment as part of your studying for the final exam.

You must select a particular organizing system, not a class of organizing systems, for this assignment. For example, “my personal photo database” would be an appropriate organizing system for this assignment, but “personal photo management” in general would not.

Your case study should have the same sections as the case studies in Chapter 10 of The Discipline of Organizing:

  • Overview (1 pt)
  • What resources are being used? (2 pts)
  • Why are the resources organized? (2 pts)
  • How much are the resources organized? (2 pts)
  • When are the resources organized? (1 pt)
  • Who does the organizing? (1 pt)
  • Other considerations (1 pt)

In each section, consider explaining the alternatives and their ramifications. When making each design decision, your system’s designer chose between several alternatives. Describe each of those alternatives, their ramifications and the reasons for the designer’s final choice.

Also, when appropriate, identify the challenges unique to your system. Describe the puzzles, problems or dilemmas your system’s designer faced that are unique to your system. When possible, also describe their solution.

Whenever possible, use the vocabulary of The Discipline of Organizing and this class. For example, instead of saying “We struggled with how specifically to describe each item,” say “We struggled with resource description granularity.” Although you might use a more domain-specific vocabulary in your work, use TDO vocabulary in this assignment to show us you understand it.

Refer to both Chapter 10 and the organizing system design decisions described in Chapter 1 for example case studies and other issues to consider.

Your entire assignment, including section headings, must be 1,000 words or less. The best case studies will be included in future editions of The Discipline of Organizing.

Submission Instructions
Post your case study to the 202 blog at http://bit.ly/1iw9z5o.

Check the “case studies” category on the right and tag your post with your TA’s first name.

(To do this, you will need to log in with your I School username and password. If you have log-in issues, contact Fred at fchasen@berkeley.edu with your I School username. If you run out of troubleshooting time, email the assignment to your TA.)

Assignment 9 – Text Toolkit and Document Analysis

Due Date: Thursday, December 5th, 9am

Assignment Overview

In this assignment, you will:

  1. Learn how to use a toolkit for text analysis
  2. Learn about stemming and using stop words
  3. Create an index for documents in a collection
  4. Process a search query and return relevant documents
  5. Reflect on your experiences

Text files for the assignment: A9_text.txt, A9_text2.txt, A9_text3.txt, and A9_text4.txt.

Submission Requirements

You will submit a file called YourNameA9report.pdf, which will include

  • Short answers to each of the reflection prompts below.
  • The chart that you will create in Part 4.
  • Your vector plot of your search query from Reflection 4.

Instructions

Peter Holme’s word stemmer is a web-based text analysis tool that walks you through the processes of stemming and analyzing documents. These are the steps you would go through to create an index of a document for information retrieval (IR).

Part 0

  1. Go to http://holme.se/stem/
  2. Open A9_text.txt. This file contains text you know well: the first three paragraphs of TDO.
  3. Read the text, copy it and paste it into the Text field in the word stemmer.

Part 1

  • Click the “Don’t use any stopwords” checkbox in Extra super fun stuff.
  • Increase the number of words for word count to 100.
  • Leave all other settings at their default.
  • Click Send.
  • Look at the results in Step 1 on the right.
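The tool's internals aren't documented in this assignment, but a rough sketch helps show what the settings in Parts 1 to 3 control. It assumes the text is lowercased, split into runs of the letters a to z (so digits and accented letters are dropped), filtered by a minimum word length and an optional stopword set, and then cut off at the most frequent terms. The function name and tokenization rule are assumptions, not the tool's actual code.

```python
import re
from collections import Counter


def word_counts(text, min_chars, stopwords=frozenset(), top_n=100):
    """Return the top_n (word, count) pairs after length and stopword filtering."""
    # Assumed tokenization: lowercase, then keep runs of letters a-z only.
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if len(t) >= min_chars and t not in stopwords]
    return Counter(kept).most_common(top_n)
```

For example, `word_counts(text, min_chars=3)` silently drops short words such as "a", "an", and "of" even with no stopword list, which is one reason words can disappear before any stopwords are applied.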

➪ Reflection 1 (1 point)

  • Why did some words disappear?
  • Why do we want to remove these terms/words and what are some problems that may arise when we do so?

Part 2.1

  • Change the minimum number of characters in a word to 1.
  • Run the analysis again. Look at the results in Step 1, and note what got added back and what is still missing. Now, look at Step 2 and check out how words were stemmed.

➪ Reflection 2.1 (1 point)

Pick three stemmed words from the list, and for each, answer the following:

  • Why was that stem chosen?
  • What problems might arise with that stem?
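The assignment doesn't specify which stemming algorithm the tool uses. To see why stems can cause problems, here is a deliberately crude suffix-stripping stemmer; it is a hypothetical toy with a made-up suffix list, not the Porter algorithm that many real stemmers implement.

```python
# Hypothetical suffix list; real stemmers apply many more rules and conditions.
SUFFIXES = ("ational", "ization", "ing", "ed", "es", "s")
_LONGEST_FIRST = sorted(SUFFIXES, key=len, reverse=True)


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in _LONGEST_FIRST:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ("organizing", "organization", "organs", "systems", "is"):
    print(w, "->", crude_stem(w))
```

With these rules, "organization" and "organs" both reduce to "organ" (overstemming: unrelated words conflated), while "organizing" becomes "organiz" and no longer matches "organization" (understemming: related words kept apart).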

Part 2.2

  • Next, look at the Word Count list in Step 4: Notice the number of words and the frequency.
  • Are there any words that surprise you in this list?

➪ Reflection 2.2 (1 point)

  • How much about the article can be inferred from the words on this list?
  • Looking at the word frequency, where would you draw the line to eliminate stop words?

Part 3

  • Uncheck the “Don’t use any stopwords” checkbox.
  • From the word frequency list produced by the last analysis, take the top 10 words that you think should be used as stopwords. Enter them into the stopwords box and then run the analysis.
  • Copy the resulting Word Count list (Step 4) into a table or spreadsheet.
  • Next, delete the list you just added to the stopwords and run the analysis again. (When the stopwords box is empty, it will run using a default list of stopwords, which you can see in Step 3).
  • Copy these new Word Count results (Step 4) into your table and compare them with your previous results to see how the list and frequencies changed.

➪ Reflection 3 (1 point)

  • Why do you think the system chose some of the stopwords? Are there any words that surprise you?
  • Why are the stopwords stemmed?

Part 4

  • Run A9_text2.txt, A9_text3.txt, and A9_text4.txt through the system and copy each resulting word count list into separate parts of a spreadsheet. Now you have an index for each document in your collection (note that in a real system you would want the full index – for simplicity’s sake, we’ve cut it off at 100 terms).
  • Now you get to play the part of a search engine. You will now manually run the query “Organizing System” on your collection of three documents.
  • For each document, find the term frequency (tf) for each term in the query. Record each calculation in a chart like the one below.
  • Calculate the idf for each term. List it in your chart and show your work in Reflection 4.
  • Calculate the tf-idf for each document. Again, show your work in Reflection 4.

➪ Reflection 4 (6 points)

  • As indicated before, show us your work in calculating tf, idf, and tf-idf. (2 points)
  • In a simple graph, draw the vector plot for your search query (you can do this by hand or in software). (2 points)
  • Which document appears to best fit your query? Why? (1 point)
  • What does tf-idf allow you to deduce that using term frequency alone does not? (1 point)

Term       | tf for doc 1 | tf for doc 2 | tf for doc 3 | idf | tf-idf for doc 1 | tf-idf for doc 2 | tf-idf for doc 3
organizing |              |              |              |     |                  |                  |
system     |              |              |              |     |                  |                  |
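To check your hand calculations for Part 4, here is a short sketch. It assumes raw counts for tf and idf = log10(N / df), where N is the number of documents and df is the number of documents that contain the term. Course definitions may differ (for example, in the log base or in tf normalization), so use the formula from class. Each dictionary stands in for one document's Word Count list; look terms up in the same form the index uses (stemmed, if the tool stemmed them). The function names and counts are hypothetical.

```python
import math


def tf(term, counts):
    """Raw term frequency: occurrences of term in one document's word-count index."""
    return counts.get(term, 0)


def idf(term, indexes):
    """log10(N / df); returns 0.0 if no document contains the term."""
    df = sum(1 for counts in indexes if counts.get(term, 0) > 0)
    return math.log10(len(indexes) / df) if df else 0.0


def tf_idf_table(query_terms, indexes):
    """Build one row per query term: tf per document, idf, and tf-idf per document."""
    table = {}
    for term in query_terms:
        term_idf = idf(term, indexes)
        table[term] = {
            "tf": [tf(term, c) for c in indexes],
            "idf": term_idf,
            "tf-idf": [tf(term, c) * term_idf for c in indexes],
        }
    return table


# Hypothetical counts, not the real A9 texts.
indexes = [
    {"organizing": 12, "system": 5},
    {"organizing": 3},
    {"system": 8, "resource": 4},
]
table = tf_idf_table(["organizing", "system"], indexes)
```

Note that a term appearing in every document gets idf = log10(1) = 0, so it contributes nothing to any document's tf-idf score no matter how frequent it is.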

Extra Credit (1 point):

  • Calculate the cosine similarity between your top two documents and the query. Show your work. Which is now the best fit for your query? Why?
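For the extra credit, a minimal cosine-similarity sketch. It assumes each document vector holds that document's tf-idf values for ("organizing", "system") and that the query vector gives each query term a weight of 1; weighting the query terms by idf is another common choice and can change the result. The vectors below are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Axes: ("organizing", "system"). A query weight of 1 per term is an assumption.
query = [1.0, 1.0]
balanced_doc = [3.0, 3.0]  # hypothetical tf-idf values
lopsided_doc = [6.0, 0.0]  # same summed score, all weight on one term
print(cosine_similarity(query, balanced_doc))
print(cosine_similarity(query, lopsided_doc))
```

Both hypothetical documents have the same summed tf-idf score (6), but only the balanced one points in the same direction as the query (cosine 1.0 versus about 0.71), which is why cosine similarity can rank documents differently from a simple sum.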

Assignment 8 – Search UI Evaluation

Due: Thursday, 21 Nov 2013, 9am

For this assignment you will analyze the search interfaces of Wine.com and Zillow.com through the lens of organizing systems presented thus far in class. Familiarize yourself with their search functionality by performing a few searches (say “pinot noir” for Wine.com and “Berkeley, CA” for Zillow.com). Screenshots of each website are attached so you can see the general area you ought to be looking at.

Write 450-550 words for each of the following questions (for a total of 900-1100 words). Submit a single file FirstnameLastnameA8Report.pdf that contains all parts of the assignment.

wine.com advanced search: http://www.wine.com/v6/search/advancedsearch.aspx

zillow.com search: http://www.zillow.com/homes/berkeley,-ca_rb/
Click the filter buttons to the right of the search bar to bring up the advanced search boxes if they don’t appear automatically.

Part 1 (4 points)

Analyze the search interface and the search results using the key concepts of organizing systems we’ve covered this semester. DO NOT concern yourself with whether the results are actually good or if they fulfill the user’s information needs. Focus on what these interfaces reveal to you about the underlying organizing system. Specifically, comment on each of the following:

1. Classification principles

2. Resource description

3. Granularity

4. One issue of your choosing.

Provide 1-2 specific examples for each, to illustrate your points.

Part 2 (6 points)

Discuss 3 specific design issues from the Hearst chapter and for each issue, evaluate how well it is satisfied in both applications.

Bonus (1 point)

Suggest a way that either of these search interfaces could be improved, justifying your suggestion using one of the design issues from the Hearst chapter (maximum of 50 words extra). If appropriate, you may use one of the issues you brought up in Part 2.

Assignment 7 – Analyzing Interactions in an Organizing System

Due: Thursday, 31 Oct 2013, 9am

Submission Requirements:

  • One ZIP file named FirstnameLastnameA7.zip, containing two files:
      • One PDF named FirstnameLastnameA7Chart.pdf containing a spreadsheet of parts 1, 2 and 3
      • One PDF named FirstnameLastnameA7Report.pdf containing your domain sentence (part 0) and reflection (part 4)

In this assignment you will:

  1. Restate your domain and clarify its scope
  2. Make a list of potential interactions in your A1 organizing system
  3. Analyze the designed interactions and create equivalence classes
  4. Further classify them based on a few sets of criteria
  5. Reflect on the assignment

INTERACTIONS include any activity, function, or service supported by or enabled with respect to the resources in a collection or with respect to the collection as a whole.

Your initial task in this assignment is to identify the potential interactions in your Assignment 1 Organizing System. You will then analyze and classify the interactions, and in doing so you will be refining the scope and scale of your organizing system.

Part 0. Write one sentence telling us your domain and its scope.

We know you already submitted a domain in A1, but you may have changed or refined it, or adjusted the scope. This will give us an idea of where you are with that now.

Part 1. Make a list of the potential interactions in your Assignment 1 organizing system. (3 points)

Be creative here. Don’t be limited to the interactions that you can see in other instances of organizing systems in your chosen domain. Aim to have no more than 10 interactions.

If your organizing system is in a domain with physical resources, you should focus on what TDO calls the “designed interactions” that are enabled by intentional acts of resource description and organization. (The same is true if your domain is digital, but since everything must be designed in a digital domain, it’s less of a potential problem.) We don’t want you analyzing interactions that are completely intrinsic and “non-designed”. (Recall the sidebar example in Chapter 1 about the “Digital Zoo” – if you visit an actual zoo, “viewing the animals” is an interaction that doesn’t take much design effort because “viewing” is pretty intrinsic for physical resources. In the digital zoo, however, instead of “access just by looking” you’d need to design and implement some “access via technology” mechanisms for locating and viewing the remote animals).

Some interactions are quite generic and are probably or potentially supported by every organizing system. Examples of generic interactions could be selecting a resource from a set of candidates in a collection, finding a resource that you already know exists, and discovering the identity of a particular resource. Include these in your list, but contextualize them to make them descriptive and relevant.

Again, keep your list to no more than 10 interactions. If you find yourself coming up with significantly more than that, you may need to adjust the granularity of the interactions you’re choosing, or modify the scope of your organizing system.

Part 2. Analyze the designed interactions and create some equivalence classes for those that are similar in function and/or implementation. (2 points)

In Part 1, you will have come up with a collection of all the interesting interactions you can contemplate for your organizing system. In this part, you will classify them. You will want to pay close attention to the granularity of the interactions that you are classifying. For example, some organizing systems may themselves collect information about every single interaction with their resources, the context of that interaction, etc. This “information collection” interaction may be something you want to record and classify, but you would probably not want to list every possible variation or report each one as a separate interaction.

Record each of your designed interactions and their equivalence classes in a spreadsheet based on the attached sample.

Part 3. Classify the interactions based on the following criteria: (3 points)

Record these in the same spreadsheet as in part 2, in the 3rd, 4th, and 5th columns as indicated by the sample.

A. Classify the interactions into those that are 1) based on specific resources (one at a time) or 2) interactions that utilize collection-level properties (i.e. the collection as a whole). (1 point)

B. Separately (independently from part A), classify the interactions into those that are 1) initiated by a user of the organizing system, 2) initiated by the resources themselves, or 3) performed with “mixed initiative” in which both the user and the resource initiate some aspect of their joint interaction. Explain why you chose each classification when you feel it isn’t self-explanatory. (1 point)

C. Take the interactions which you classified in part A as being based on specific resources. Further classify them into those that involve 1) interacting with the original resource, 2) interacting with descriptions of the original resource, or 3) interacting with copies of the original resource. (1 point)

Part 4. Reflect on the assignment (2 points)
Write a couple of paragraphs reflecting on the assignment, including but not limited to:

  • Did the number/type of interactions you found, or the difficulty/ease with which you thought of them, tell you anything about the scope of your organizing system?
  • Did you find yourself adjusting your scope in mid-exercise? If so, how did you go about this? If not, moving forward, how might you adjust your scope to make it more manageable in terms of identifying and classifying interactions (if needed)?
  • Did you find describing or classifying any of the interactions particularly difficult? If so, what was it about those interactions that made it hard?

Starter spreadsheet for parts 1, 2, and 3: Assignment 7 – Spreadsheet

Assignment 6 – Hierarchical Classification

Due: October 24, 2013, 9 a.m.

The goal of this assignment is to give you practice thinking about categorization and category membership, abstraction, classification, and ontologies. Pretend you are a prehistoric human, just starting to think about abstract concepts. You come across 15 animals, and decide to organize them into a hierarchical classification scheme, using properties and behaviors you can directly observe. (We are setting the stage this way to emphasize that we want you to create an intuitive classification and not a scientific one. We do not want you to consult authoritative biological classifications. Please don’t fret about the impossibility that a prehistoric human might encounter all 15 of these animals.)

In this assignment, you will:

  • Define equivalence classes for 15 animal instances
  • Sort those classes into a hierarchy of animal types
  • Create a diagram of your ontology
  • Write definitions for each part of your ontology using hypernyms and hyponyms
  • Reflect on your experience.

Submission Requirements
You will submit a zip file named FirstnameLastnameA6.zip that includes 3 files:

  1. A spreadsheet (modeled on the sample spreadsheet attached here) called FirstnameLastnameA6.pdf
  2. Your diagram, which will be called FirstnameLastnameA6.pdf
  3. A brief reflection (FirstnameLastnameA6Reflection, in pdf format)

Regardless of what method you use to make your diagram, please submit all of your files as a pdf so we can be sure we’re seeing what you intended for us to see.

Part 1: Identify Equivalence Classes (3 points)
Remember that an equivalence class, also known as a category or type, is a way of specifying that a group of resources should be considered the same thing in a given context. With that in mind, look at the 15 animals here: https://www.dropbox.com/sh/bdopy1ntiuzgjiy/6Km8_T8Sah
Using the attached spreadsheet, list the 15 animals using the animal name listed on each file (e.g., Fish, Penguin, Eagle, …).

For each animal instance, identify a possible equivalence class to which the instance belongs. For example, if we were classifying musical instruments, and you were creating an equivalence class for a drum set, you might pick something like “rhythm instrument”. Depending on the other musical instruments you were organizing, you might want a more abstract (instruments) or less abstract (four piece drum sets) equivalence class. You are making a choice about the level of abstraction you use.

As you are making your first pass through the instances, do not worry too much about naming these types. Organizing is an iterative process. You are likely to go back to them and revise them as you progress through the assignment. If the ambiguity is making you a little crazy, our advice is to come up with a placeholder and move on; new naming ideas might surface as you start to arrange your hierarchy.

Focus on what is observable from the image, and not from further research. You may use what you already know about the behavior of the different animals as long as it is something that can be observed (e.g. birds fly, fish swim, giraffes walk).

Part 2: Organize Your Equivalence Classes into a Hierarchy (2 points)
Now that you have identified equivalence classes for each of the animal instances, begin arranging the classes into a hierarchy. The root element of your hierarchy will be “Animals.” The leaf-level elements of your hierarchy will be the specific animal instances. When you created your equivalence classes in Part 1, you added a second level to the hierarchy, more abstract than your instances but less abstract than “Animals.” Take this one step further by adding one more level of abstraction: a new level between your equivalence classes and “Animals.” These are hypernyms, or “super-types.”

At this phase of the assignment, it’s important that you strive for a consistent level of abstraction among your “super-types” (hypernyms). For example, if “musical instruments” were our root and the next level down included the groupings “clarinets” and “stringed instruments,” that might be a sign that our classification system did not maintain a consistent level of abstraction, since “clarinets” names a single kind of instrument while “stringed instruments” spans a whole family of them.
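One way to picture Part 2 is to store each node’s hypernym (its parent) and walk upward to the root. The sketch below uses made-up animals and placeholder class names (Shark and Giraffe are hypothetical, as are all the classes) and checks that every instance sits the same number of steps below “Animals.” That is only a structural check: it is necessary but not sufficient, because “clarinets” and “stringed instruments” could sit at the same depth and still differ in abstraction, which takes judgment to catch.

```python
ROOT = "Animals"

# Hypothetical hierarchy as child -> parent (hypernym) links.
PARENT = {
    # instances -> equivalence classes (types)
    "Fish": "Finned swimmers",
    "Shark": "Finned swimmers",
    "Penguin": "Swimming birds",
    "Eagle": "Flying birds",
    "Giraffe": "Long-necked walkers",
    # types -> super-types
    "Finned swimmers": "Unfeathered animals",
    "Long-necked walkers": "Unfeathered animals",
    "Swimming birds": "Feathered animals",
    "Flying birds": "Feathered animals",
    # super-types -> root
    "Feathered animals": ROOT,
    "Unfeathered animals": ROOT,
}


def path_to_root(node: str, parent: dict[str, str]) -> list[str]:
    """Follow hypernym links from a node up to the root."""
    path = [node]
    while path[-1] != ROOT:
        path.append(parent[path[-1]])
    return path


def leaf_depths(parent: dict[str, str]) -> dict[str, int]:
    """Depth below the root of every leaf (a node that is nobody's parent)."""
    internal = set(parent.values())
    return {n: len(path_to_root(n, parent)) - 1 for n in parent if n not in internal}


def is_consistent(parent: dict[str, str]) -> bool:
    """True if all instances are the same number of levels below the root."""
    return len(set(leaf_depths(parent).values())) == 1


if __name__ == "__main__":
    print(" -> ".join(path_to_root("Penguin", PARENT)))
    print(is_consistent(PARENT))
```

Attaching a new instance directly to a super-type (skipping the type level) would make `is_consistent` return False, which is a cue to look for a missing equivalence class.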

Part 3: Create a Diagram or Visualization of Your Hierarchy (1 point)
The visualization does not have to be fancy. We repeat: it does not have to be fancy. Start with “Animals” as the root of your hierarchy, then add your “super-types,” then your types, and finally your animal instances as the last level in your diagram. You can use any tool you wish (including drawing by hand and scanning your drawing), as long as it allows you to represent the hierarchical relations in your ontology.
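If you want a quick text version of the diagram before drawing anything, a few lines of code can print the hierarchy as an indented outline, root first and instances last. This is just one of the many tools the assignment allows, and the tree here is hypothetical: Shark and Giraffe are invented stand-ins, and all class names are placeholders.

```python
# Hypothetical hierarchy as parent -> ordered list of children.
TREE = {
    "Animals": ["Feathered animals", "Unfeathered animals"],
    "Feathered animals": ["Flying birds", "Swimming birds"],
    "Unfeathered animals": ["Finned swimmers", "Long-necked walkers"],
    "Flying birds": ["Eagle"],
    "Swimming birds": ["Penguin"],
    "Finned swimmers": ["Fish", "Shark"],
    "Long-necked walkers": ["Giraffe"],
}


def render(node: str, depth: int = 0) -> list[str]:
    """Return the subtree under `node` as lines indented two spaces per level."""
    lines = ["  " * depth + node]
    for child in TREE.get(node, []):  # leaves (instances) have no entry
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render("Animals")))
```

The outline can be pasted into a document and exported as a PDF, which satisfies the submission format without any drawing software.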

Part 4: Define your Types and “Super-Types” (2 points)
Now write definitions for both your equivalence classes and your “super-types” so that an ordinary person would be able to categorize new instances. Follow this formula for definitions:

Hyponym = {adjective+} hypernym {distinguishing clause}

Example:
Clarinets = {reeded} Woodwinds {that are approximately cylindrical in shape and have numerous keys}
Woodwinds = {reed or flute} Instruments {that produce sound when air is blown into them}

Record each definition in your spreadsheet.
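The definition formula is regular enough to capture as a small data structure. The sketch below (the names Definition and undefined_hypernyms are our own, chosen for illustration) reproduces the clarinet example exactly and flags any hypernym that is used but never defined, since a reader can only categorize a new instance if every step up the chain has a definition. Joining several adjectives with a comma is an assumption; the formula itself does not say how multiple adjectives are written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """Hyponym = {adjective+} hypernym {distinguishing clause}."""
    hyponym: str
    adjectives: tuple[str, ...]  # one or more modifiers (the "adjective+" part)
    hypernym: str
    clause: str                  # the distinguishing clause

    def __post_init__(self) -> None:
        if not self.adjectives:
            raise ValueError(f"{self.hyponym}: the formula needs at least one adjective")

    def __str__(self) -> str:
        # Multiple adjectives are joined with ", " (a formatting assumption).
        return f"{self.hyponym} = {{{', '.join(self.adjectives)}}} {self.hypernym} {{{self.clause}}}"


def undefined_hypernyms(definitions: list[Definition], root: str) -> list[str]:
    """Hypernyms used in some definition but never defined themselves (root excepted)."""
    defined = {d.hyponym for d in definitions}
    return sorted({d.hypernym for d in definitions} - defined - {root})


DEFINITIONS = [
    Definition("Clarinets", ("reeded",), "Woodwinds",
               "that are approximately cylindrical in shape and have numerous keys"),
    Definition("Woodwinds", ("reed or flute",), "Instruments",
               "that produce sound when air is blown into them"),
]

if __name__ == "__main__":
    for d in DEFINITIONS:
        print(d)
    print(undefined_hypernyms(DEFINITIONS, root="Instruments"))
```

An empty list from `undefined_hypernyms` means every type and super-type in the chain has its own definition in the spreadsheet.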

Part 5: Reflect on Your Experience (2 points)
In your reflection document, write a few paragraphs about the approaches you used to identify equivalence classes and organize them into “super-types.” Consider your thought process throughout the assignment, the tips from readings or lectures you drew on, and how you achieved consistency across your layers of abstraction.

Download: Assignment 6 Starter Spreadsheet

Assignment 4 – XML

Due 10/1

The purpose of this assignment is to familiarize you with one of the two editors below and to help you understand how an XML instance, schema, and transform fit together.

XML Spy (Windows): https://www.ischool.berkeley.edu/intranet/computing/software/xmlspy

XML Spy is an award-winning XML editor and development environment that has been generously provided to the I School for use in courses and projects, but it runs only on Windows or on Mac/Linux using a Windows emulator.

oXygen (Mac / Linux): https://www.ischool.berkeley.edu/intranet/computing/software/oxygen

oXygen XML runs within a Java Virtual Machine, so it runs on any platform with a Java Runtime Environment. You can use the I School’s license to install it on your own computer.

Part 0:

This section is not graded, but it will show you how to use XML and the XML editors. The listed questions don’t need to be answered; they are here to help you think about and understand XML.

  1. Install XML Spy or oXygen.

  2. Download the attached zip file, which contains:
    Report.xml: an XML instance
    Report.dtd: an XML Document Type Definition
    Report.xsl: an XML transformation file

  3. Open Report.xml in the editor.

  4. Also open Report.xml in a web browser. Why is it rendered this way? (Use “View > Source” on the menu bar.)

  5. Back in the editor, check the XML instance for “well-formedness,” that is, conformance to the syntax rules for XML (F7 in XML Spy; in oXygen, click the blue-checked document icon in the toolbar).

  6. Delete the beginning <Name> tag. Is the instance still well-formed?
    Change <Para> to <para>. Is the instance well-formed?
    XML enforces more restrictive syntax rules than HTML. Put another way, XML doesn’t allow bad practices that browsers typically forgive in HTML.
    Undo these changes so that your instance is well-formed again.

  7. Specify an XML Document Type Definition for the XML instance by inserting

    <!DOCTYPE Report SYSTEM "Report.dtd">

    directly below the XML declaration (the <?xml ... ?> line) at the top of the file.

  8. Validate the XML instance (F8 in XML Spy; in oXygen, use the red-checked document icon near the well-formedness icon). Insert a second author element containing your name and email. Is this valid?

  9. Insert a <Phone> tag, your phone number, and </Phone> after your email element. Is this valid?

  10. Open the XML DTD in the editor. Try to figure out how you could have answered the previous two questions by examining the DTD rather than by experimentation.

  11. Specify a style transformation for the XML instance by inserting the following as the third line of the instance:

    <?xml-stylesheet type="text/xsl" href="Report.xsl"?>

  12. Open the XML instance in a browser again (in XML Spy, you can do this by clicking the “Browser” button at the bottom of the editor pane; in oXygen, click the red-triangle-in-a-circle icon to the right of the well-formedness icon).
    It should be formatted this time.

  13. Delete the DTD specification. Does the style transform still work?
    What does this imply about XML transformation programs?

  14. Open the XML transformation file (Report.xsl) in the editor.
    The third line of the program (where “xsl:template” occurs) matches the element named “Report” in the instance and then passes through as output everything up to the next “xsl:template” tag.

    Can you see how these 20 or so lines create the HTML “scaffold” for the formatted report?
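The questions above about deleting a start tag, changing <Para> to <para>, and adding a <Phone> element all turn on the difference between well-formed (obeys XML syntax) and valid (also obeys the DTD). The sketch below illustrates both in Python. The standard-library xml.etree.ElementTree parser checks only well-formedness and does not validate against a DTD, so the validity half is a deliberately simplified, hand-rolled check that treats a DTD content model as a regular expression over an element’s child names. The content models and element contents shown are hypothetical, not copied from Report.dtd or Report.xml, and mixed content (#PCDATA) is not handled.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML (syntax only; no DTD is consulted)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def model_to_regex(model: str) -> str:
    """Translate a simple content model such as "(Name, Email, Phone?)" into a
    regex over space-separated child element names. #PCDATA is not supported."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if tok == "(":
            parts.append("(?:")                     # group
        elif tok == ",":
            continue                                # sequence = concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)                       # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child element name
    return "".join(parts)


def children_match(element: ET.Element, model: str) -> bool:
    """True if the element's sequence of children satisfies the content model."""
    sequence = "".join(f"{child.tag} " for child in element)
    return re.fullmatch(model_to_regex(model), sequence) is not None


if __name__ == "__main__":
    print(is_well_formed("<Author><Name>Ann</Name><Email>ann@example.com</Email></Author>"))  # True
    print(is_well_formed("<Author>Ann</Name><Email>ann@example.com</Email></Author>"))        # False: start tag deleted
    print(is_well_formed("<Body><para>Hi</Para></Body>"))  # False: tag names are case-sensitive
    author = ET.fromstring("<Author><Name>Ann</Name><Email>ann@example.com</Email>"
                           "<Phone>555-0100</Phone></Author>")
    print(children_match(author, "(Name, Email)"))          # False: well-formed but not valid
    print(children_match(author, "(Name, Email, Phone?)"))  # True
```

Reading the content model is exactly how you could answer the author and phone questions from the DTD alone: a repeated element needs `+` or `*`, and an element not listed in the model is never allowed.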

Part 1 (4 points):

Rename a copy of Report.xml to YourlastnameA4.xml (e.g., GlushkoA4.xml).
Change the Author information to your own.

Note: You will have another section for your reflections from Part 2, listed below.

In the Body section of the report, change the Section title to “Reflections on Assignment 4” and write a short paragraph which:

  • Describes (using 202 terms) what each of the three files we gave you does. (2 points)

  • Explains the difference between an XML instance and the Document Type Definition. (1 point)

Validate your Report document and transform it to HTML. (1 point)

Part 2 (6 points):

Now that you have a feel for working in XML, we are asking you to return to A3 and encode an instance of the vocabulary you submitted.

If during this process you wish to change your vocabulary, feel free, but the idea is that all the conceptual work you did in A3 should transfer over seamlessly. If you do make changes to your vocabulary, clearly describe your new vocabulary and descriptors in your reflection.

Please note: we are not asking you to create a DTD, only an instance that follows the schema you laid out in A3. Put another way, pretend that the vocabulary you created in A3 is a DTD and follow it when creating your instance. Please name your instance YourNameA4Instance.xml. (3 points)
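If it helps to see this in code, the sketch below builds a small instance with Python’s xml.etree.ElementTree and checks it against a vocabulary written as a table of which child elements each element may contain, a loose stand-in for “pretend your vocabulary is a DTD.” The recipe vocabulary and its content are purely hypothetical; substitute the elements from your own A3 work. Unlike a real DTD, this check ignores the order and number of children.

```python
import xml.etree.ElementTree as ET

# Hypothetical A3 vocabulary: each element maps to the child elements it may contain.
VOCABULARY: dict[str, set[str]] = {
    "Recipe": {"Title", "Ingredient", "Step"},
    "Title": set(),
    "Ingredient": set(),
    "Step": set(),
}


def build_instance() -> ET.Element:
    """Encode one made-up recipe as an XML element tree."""
    recipe = ET.Element("Recipe", attrib={"cuisine": "Italian"})
    ET.SubElement(recipe, "Title").text = "Pesto"
    for item in ("basil", "pine nuts", "olive oil"):
        ET.SubElement(recipe, "Ingredient").text = item
    ET.SubElement(recipe, "Step").text = "Blend everything together."
    return recipe


def vocabulary_violations(element: ET.Element, vocabulary: dict[str, set[str]]) -> list[str]:
    """List every place where the instance strays from the vocabulary."""
    problems = []
    allowed = vocabulary.get(element.tag)
    if allowed is None:
        problems.append(f"undeclared element <{element.tag}>")
        allowed = set()
    for child in element:
        if child.tag not in allowed:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        problems.extend(vocabulary_violations(child, vocabulary))
    return problems


if __name__ == "__main__":
    instance = build_instance()
    print(ET.tostring(instance, encoding="unicode"))
    print(vocabulary_violations(instance, VOCABULARY))
```

To save the instance to disk, `ET.ElementTree(instance).write("YourNameA4Instance.xml", encoding="utf-8", xml_declaration=True)` writes it with an XML declaration at the top.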

Next, create another Section in the report from Part 1 and title it “Reflections on Encoding Your Instance”. Write a paragraph or two (depending on how much you changed) describing your experience creating your instance. (3 points)

  • Was it straightforward?

  • Did it match the instance you created for A3?

  • If not, what changed in the instance?

  • If you had to make changes to your vocabulary, what were they and why did you decide to make them?

  • Try to validate your document against Report.dtd. What happens and why?

Make sure your XML is still valid after you add this Section!

Extra Credit (up to 2 points):

Create a DTD for your vocabulary defining valid tags and attributes. Ensure that your instance validates against the DTD. Include the DTD in your submission zip with the name YourNameA4Vocabulary.dtd

Submission Instructions

Submit five files (six if you complete the extra credit), zipped into a single file named YourNameA4.zip.

You should include:

  • YourNameA4Instance.xml

  • YourNameA4Report.xml

  • YourNameA4Report.dtd

  • YourNameA4Report.xsl

  • YourNameA4Report.html

  • YourNameA4Vocabulary.dtd (extra credit)