Archive for August, 2008

Rating the Ratings

Rating the ratings by Stephen Whitty/The Star-Ledger

http://www.nj.com/entertainment/tv/index.ssf/2008/08/post_3.html

The way in which feature films are given MPAA ratings often appears haphazard, governed more by studio muscle and politics than by a reasonable set of guidelines.

The highly political nature of this governing body was featured in Kirby Dick’s documentary This Film Is Not Yet Rated, which came out in 2006.

Most recently, controversy arose over Kevin Smith’s latest film, Zack and Miri Make a Porno, starring Seth Rogen and Elizabeth Banks, when it was slapped with an NC-17 rating, which the appeals board then overturned in favor of a more box-office-friendly R rating.

The article talks about the current flawed system and lists examples of movies and their ratings to illustrate the inconsistency within it. The author ponders what and who governs these guidelines and laments the politics that clearly influence and dictate the direction of the board.

Categorizing a subjective and nuanced product, something that is difficult to calibrate scientifically, is a consistent challenge not only for movie ratings but also in a variety of fields beyond the arts. A product that is judged on opinion is naturally subject to subjectivity. Finding the best method of categorization will be an interesting and relevant challenge.

3 – Organization {and, of, vs} Retrieval

5 – Concepts and Categories

7 – Controlled Names and Vocabularies

8 – Classification

12 – Enterprise/Institutional Categorization & Standards

The Chameleon

The Chameleon, David Grann, The New Yorker, August 2008

I found this piece in the New Yorker so interesting that I felt inspired to relate it to our 202 discussions somehow. In the attempt, I gained a whole new appreciation of and perspective on the story and its characters.

The Chameleon is a true story about a Frenchman named Frederic Bourdin who, at the age of 16, runs away from home and wanders across Europe, taking on many different personas and fictional characters in search of the “perfect shelter.” By the time he turns 18 and legally becomes an adult, Interpol has a growing record of his deceits, and he has attracted the attention of the European media. These potential threats hardly affect his lifestyle as he continues to “insinuate himself into youth shelters, orphanages, foster homes, junior high schools, and children’s hospitals” across 15 countries, generally posing as a desperate child in order to “win sympathy.”

His deceits escalate until he eventually hatches a plan to impersonate a missing child named Nicholas Barclay, who is said to have run away from his home outside of San Antonio, Texas, three years earlier. He manages to convince not only Spanish and American authorities but also the missing child’s family. The missing child’s older sister eventually meets Frederic at the American Embassy in Spain, where he receives an American passport and is taken home to live with the family. Without giving the whole story away, suffice it to say that there are quite a few additional twists.

So how does this relate to 202? Well, it made me think about misinformation, people’s varying perceptions of information, and how they determine the validity of information. Do people believe what they see, what their instincts tell them, what their emotions or feelings make them want to believe, or what other people tell them to believe? It also made me think about the sharing of information across institutions, how we manage and interpret personal identification information, and whether or not somebody could pull this off in today’s post-9/11 world.

Related Lectures

7 – Controlled Names and Vocabularies (9/22)

11 – Information Integration and Interoperability (10/6)

15 – Personal Information Management (10/20)


Data Fusion: The Ups and Downs of All-Encompassing Digital Profiles

The current issue of the magazine Scientific American includes several articles on the rise of digital information and its use, from RFID and biometrics to eavesdropping. I found an article on data fusion, aka data integration, to be the most relevant for this discussion.

Data Fusion: The Ups and Downs of All-Encompassing Digital Profiles

The article begins with the author’s reflection on his experience traveling internationally several years ago, when his credit card issuer blacklisted his card because the company’s anti-fraud data mining algorithm detected potential fraud. He had merely bought a latte and a cell phone SIM in England. The company knew he was in England, as he’d bought his ticket to England with the same credit card! Shouldn’t they have known it was him?

From this introduction the author traces the history of data mining efforts in the United States, focusing in particular on the challenges of integrating multiple data sources in order to data mine effectively. He cites examples from DARPA’s counter-terrorism efforts and the Department of Health & Human Services’ anti-fraud efforts, exploring how data mining is viewed as an ultimate tool, albeit one that is still rife with inconsistency and errors.

Errors primarily arise from the difficulty of normalizing data from varying sources with varying levels of detail and uncertainty. And perhaps most importantly, once those data sources are aligned, how does one guarantee identity? Who’s who? Am I Andy Brooks, A.L. Brooks, and/or Andrew Brooks?
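
To make the “who’s who” problem concrete, here is a minimal Python sketch of a naive name-matching rule. The rule (same surname, same first initial) and the function names are illustrative assumptions, not anything taken from the article. It happily links the three Brooks variants, but it also links two people who are clearly different, which is exactly the kind of error the article describes.

```python
import re

def name_tokens(name):
    """Lowercase a name and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", name.lower())

def may_be_same_person(a, b):
    """Naive (hypothetical) rule: same surname and same first initial."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    return ta[-1] == tb[-1] and ta[0][0] == tb[0][0]

# The three variants from the post all match one another...
print(may_be_same_person("Andy Brooks", "A.L Brooks"))      # True
print(may_be_same_person("A.L Brooks", "Andrew Brooks"))    # True
# ...but so does an unrelated person with the same initial (a false positive).
print(may_be_same_person("Andrew Brooks", "Alice Brooks"))  # True
```

A real system would need more evidence than a name (addresses, dates, account numbers) and some way to express uncertainty rather than a yes/no answer.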

After examining data mining’s shortfalls, the author turns to examples of more effective work done with data fusion and data mining. The winners? Casinos! Think of those wallet-sized perks cards casinos are so happy to give you. In order to counter the efforts of cheaters, casinos have long funded development of non-obvious relationship analysis techniques. The techniques attempt to normalize data across multiple sources in an evolving way, one that tolerates error and uncertainty, and strives to grow more intelligent over time.

The article concludes with the author’s most important point: as in the history of cryptography, the public is essentially left out of any discussions about the use of data mining and data fusion. We don’t really know when it’s used, by whom, and for what purpose. We often only know after the fact, such as when the author’s credit card was declined as he tried to buy a train ticket in London.

Candidate Lectures:

11. Information Integration & Interoperability (10/6)

15. Personal Information Management (10/20)


Lines and Bubbles and Bars, Oh My!

New York Times, Aug. 30, 2008
http://www.nytimes.com/2008/08/31/technology/31novel.html?_r=1&ref=technology&oref=slogin

Many Eyes is a web service much like YouTube and Flickr, except that instead of sharing and tagging photos, users create, share, and tag visualizations of data. The tools for generating graphical displays range from text clouds, which highlight the words most frequently used in a document or speech, to more traditional circle and bar graphs. The coolest part, though, is that users can discuss the data and its representation in comments and post their visualizations to their own blogs or websites.
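
As a rough illustration of what sits behind a text cloud, the Python sketch below counts word frequencies in a piece of text. This is only an assumption about the general idea, not how Many Eyes itself is implemented; the function name and the tiny stopword list are made up for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = frozenset({"the", "a", "of", "and", "to"})

def top_words(text, n=3):
    """Return the n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

sample = "data and more data, the data of charts and charts"
print(top_words(sample, n=2))  # [('data', 3), ('charts', 2)]
```

A text cloud would then draw each word at a size proportional to its count.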

The part that struck me most in the article was the example of how a discussion in the comments led to the data in question being represented in a different way, thereby leading to a slightly different conclusion.

Relevant lectures: Classification; Documents and Data Models… and Modeling; Social/Distributed Categorization

And as a bonus link incorporating cool data visualization: Debunking myths about the “Third World”


Predicting Events with Public Data

With tools like Wikipedia becoming increasingly important to public perception, watching what edits are being made, and who’s making them, can tell you a lot about the future.

On Friday, John McCain announced his vice presidential selection, Sarah Palin. But for some savvy data hawks, that was old news. A data-mining and consulting firm, Cyveillance, decided to monitor the Wikipedia pages of the vice presidential hopefuls, and by correlating the edits on Palin’s page with those on McCain’s, they were able to predict the selection the night before it was announced.

Note: I already shared an article. I know. I just thought this was too cool not to share, in addition. If you were writing a post on it already, let me know and I’ll delete this one.

Article

Source: The Washington Post, Published: Friday, August 29, 2008; 5:47 PM


Tracking Your Money

As more banking transactions occur online, it becomes more difficult to keep track of where your money is going—even for banks’ wealthiest clients. Over the course of 15 months, someone managed to siphon over $300,000 out of Guy Wyser-Pratte’s JPMorgan Chase account. When he discovered that the funds were missing, he expected the bank to rectify the situation, but they would only cover $50,000.

The source of Wyser-Pratte’s woe is a combination of antiquated banking laws and the explosion of online transactions. Existing regulations require that bank customers notify the bank of suspicious activity within 60 days of the activity occurring, but with online services like automated bill pay and recurring transactions, keeping track of every dollar is a Sisyphean task. Furthermore, many transactions are inscrutable even if detected. Companies bill from unexpected locations and use unannounced bill processing providers, so classifying a transaction as erroneous is a non-trivial task.

As data overload grows, along with our inability to deal with it using traditional tools and practices, its effects will be felt in all sectors and by all members of society. While high fees and unsympathetic bankers used to be solely the purview of the unwashed masses, they now seem poised to strike at even the wealthiest and most privileged in society. While consumers and their advocates have been railing against outdated regulations and anti-consumer policies for years, the recent addition of more influential victims might aid their cause.

Article

Source: The New York Times, Published: August 29, 2008

Potentially relevant lectures: ENTERPRISE / INSTITUTIONAL CATEGORIZATION & STANDARDS (10/8), PERSONAL INFORMATION MANAGEMENT (10/20)


E-Discovery – Too Much Information (TMI!)

Electronic discovery, or e-discovery, is the process of demanding and sifting through “digital evidentiary artifacts” for lawsuits. Information from Facebook, Myspace, chat, email, laptops, smart phones, memory sticks, back-up tapes, and logs from service providers is now considered “fair game” and subject to inspection when adversaries in lawsuits demand and are granted access. E-discovery is an increasingly expensive and Sisyphean reality of modern court proceedings. Court cases more frequently face early settlement, plaintiffs are increasingly unable to sue (or defend) “for fear of [enormous] e-discovery costs”, and the justice system is increasingly over-burdened.

Ordinary court cases risk millions of dollars in costs and hours upon hours bogged down in e-discovery. As a Verizon attorney explains for his business, “Almost every case [now] involves e-discovery and spits out ‘terabytes’ of information…. 200 lawyers can easily review electronic documents for four months, at a cost of millions of dollars.” As a result of the increased burden of effort, e-discovery businesses are booming, frequently charging $125-$600/hr. Annual revenues from e-discovery businesses “have grown from $40m in 1999 to about $2 billion in 2006 and may hit $4 billion next year.”

“Results [of e-discovery] have to be indexed and reviewed by humans. This usually falls to the junior staff at law firms, some of whom are so fed up with the drudgery that they have quit the profession altogether.”

Privacy is increasingly subject to invasion, as insurance companies have demanded personal records of their clients when disputing customer claims. For example, in a recent lawsuit, “Horizon Blue Cross Blue Shield of New Jersey… asked and were granted the right to see practically everything the teenagers had said on their Facebook and MySpace profiles, in instant-messaging threads, text messages, e-mails, blog posts and whatever else the girls might have done online.”

In this context, it looks like your memex could be your worst enemy!

For more, see the original Economist.com article:  The Big Data Dump

This may touch on the following lectures:
ISSUES AND CONTEXTS (9/3)
ORGANIZATION {AND,OR,VS} RETRIEVAL (9/8)
PERSONAL INFORMATION MANAGEMENT (10/20)


Pensieve, Delicious and “trails”

About a month ago, IBM published a press release about a project for personal memory organization called “Pensieve”. There are a lot of similarities to MyLifeBits: its focus is on recording disparate types of information (business cards, photographs, timestamps, etc.) and then associating them together in the data store to ease retrieval.

That associative quality reminds me a lot of Vannevar Bush’s “trails”. The reader wants to connect several documents together (or have it done automatically) so that they can be easily retrieved together later. I can’t wait for this sort of technology to be commonplace (though I wonder if it will need to be done with a monolithic application like MyLifeBits or Pensieve rather than a series of integrated applications like Flickr, Delicious, GMail, etc.).

And just finding the links for this blog post gives an example of why I’d like this “associative” information organizer. Using delicious (a bookmark organizer that I’d heartily recommend to all of you), I wanted to connect the MyLifeBits link and the Pensieve link since there was such an explicit comparison there. But delicious doesn’t provide functionality for explicit connections (Vannevar Bush’s trails are still lacking, even for something as simple as links in a single service). Instead I’m forced to awkwardly create a unique tag (“cPensieve”) for the connection between them, one that won’t also pull in lots of Harry Potter links. So to see all the projects I’d like to compare to MyLifeBits (there’s another called Daytum that’s also worth looking at), you can go to this link: http://delicious.com/npdoty/cPensieve

(This should fit into the next lecture, or whenever we talk about Vannevar Bush and MyLifeBits.)


Field Guides, Bigfoot, and the desire to grasp life

“… Guides do not, however, deceive their users into thinking that life is fully “knowable.” The National Science Foundation estimates that only two to 40 percent of the total species on Earth have even been identified. A true understanding of the planet’s biodiversity, then, remains elusive. Classify away and you’ve still only scratched the surface. This tension, between the desire to grasp life and its ultimate ungraspability, plays out on the pages of the field guide and in their use. (…)

Subsequent naming can offer only a fleeting illusion of knowability. Yet ephemerality does nothing to discourage identification; instead, it leaves us wanting more.”

How do we approach assessing the appropriate technology for a particular situation? Are video podcasts really necessary for birding? This article covers a lot of ground, from the history of field guides to some of the more philosophical questions about human information interaction and why we pursue study in this field.

Guiding Light by Jesse Smith, from The Smart Set from Drexel University, August 22, 2008.

This may touch on issues discussed in lectures:
5. CONCEPTS & CATEGORIES (9/15)
7. CONTROLLED NAMES AND VOCABULARIES (9/22)
8. CLASSIFICATION (9/24)


Cosmic Ghosts and Galaxy Zoo

The Galaxy Zoo project, brainchild of astrophysicists Kevin Schawinski and Chris Lintott, is a web-based tool that allows “armchair astronomers” (otherwise known as normal people like us) to help catalogue archived photographs of galaxies. After a brief online tutorial, amateur astronomers can start classifying galaxies as spiral, elliptical or something else. The idea is that if an overwhelming number of people classify a certain photograph as, for instance, elliptical, chances are they are right.
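
The “overwhelming number of people” idea can be sketched as a simple vote count. The Python below is a minimal illustration under assumptions of my own choosing: the 80% agreement threshold and the function name are hypothetical, and Galaxy Zoo’s actual method may well be more sophisticated.

```python
from collections import Counter

def consensus(votes, threshold=0.8):
    """Return the most common label if its share of votes reaches the
    (hypothetical) threshold; otherwise return None for 'no consensus'."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None

# Nine of ten volunteers agree, so the photograph is labeled elliptical.
print(consensus(["elliptical"] * 9 + ["spiral"]))  # elliptical
# An even split yields no consensus.
print(consensus(["spiral", "elliptical"]))         # None
```

Photographs that never reach consensus are arguably the interesting ones, since they may be exactly the oddities worth a closer look.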

A 25-year-old teacher from the Netherlands who had been using Galaxy Zoo found a strange and new object while looking over photographs. After she posted it on the site, it quickly attracted the curiosity of astronomers all over the world, including the people who run the Hubble Space Telescope. The object is a bright, gaseous mass with a huge hole in the center, and observers are currently calling it a “cosmic ghost”. It might very well be a new class of astronomical object.

The discovery of cosmic ghosts would probably not have happened so soon were it not for the opportunity that projects like Galaxy Zoo give a multitude of people to participate in retrieving and organizing information. In the past year alone, 150,000 armchair astronomers have participated in 50 million classification activities on the site.

For more information on Galaxy Zoo, visit www.galaxyzoo.org.

From the article “Armchair astronomer discovers unique ‘cosmic ghost’”.

Relevant Lectures: 5. Concepts and Categories (9/15), 7. Controlled Names and Vocabularies (9/22), 8. Classification (9/24), 27. Multimedia IR (12/1)
