Data-Mining for Terrorists Not ‘Feasible’

Along with the article “Name Matching in Law Enforcement and Counter-Terrorism,” several posts on this blog have dealt with terror watch lists (see Michael L.’s post and Karen’s post) and cases of mistaken identity (Kentaro’s cell phone blacklist adventure). I just read this article from Wired’s Threat Level blog about a recent report entitled “Protecting Individual Privacy in the Struggle Against Terrorists,” which criticizes the US Department of Homeland Security’s attempts to identify terrorist activity by data mining every available database of personal information, from phone records to credit card transactions. The article states:

“Automated identification of terrorists through data mining (or any other known methodology) is neither feasible as an objective nor desirable as a goal of technology development efforts,” the report found. “Even in well-managed programs, such tools are likely to return significant rates of false positives, especially if the tools are highly automated.”

I am very glad Wired put up a blog post about this, because the actual report is over 350 pages long. The report’s executive summary notes that one huge problem with this type of data mining is, of course, that criminals can evade these databases by (a) not appearing in them at all and (b) using false identities when they do.
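
The report’s warning about false positives comes down to base-rate arithmetic: when real targets are vanishingly rare, even a very accurate screen flags far more innocent people than suspects. Here is a minimal Python sketch of that arithmetic. The population, target count, sensitivity, and specificity are hypothetical round numbers chosen for illustration, not figures from the report.

```python
def screening_outcomes(population, true_targets, sensitivity, specificity):
    """Expected results of screening everyone with an imperfect classifier.

    sensitivity: fraction of real targets the tool flags
    specificity: fraction of innocent people the tool correctly clears
    """
    innocents = population - true_targets
    true_pos = sensitivity * true_targets          # real targets correctly flagged
    false_pos = (1 - specificity) * innocents      # innocent people wrongly flagged
    precision = true_pos / (true_pos + false_pos)  # chance a flagged person is a real target
    return true_pos, false_pos, precision


# Hypothetical inputs: 300 million people, 3,000 real targets,
# and a tool that is 99% accurate in both directions.
tp, fp, precision = screening_outcomes(300_000_000, 3_000, 0.99, 0.99)
print(f"flagged targets:   {tp:,.0f}")
print(f"flagged innocents: {fp:,.0f}")
print(f"share of flags that are real: {precision:.3%}")
```

With these made-up inputs the tool flags about 3 million innocent people alongside roughly 2,970 real targets, so fewer than one flag in a thousand points at an actual target.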

An Army of Ones and Zeroes

“An Army of Ones and Zeroes: How I became a soldier in the Georgia-Russia cyberwar,” by Evgeny Morozov, via

Few would dispute that the last decade has brought significant changes to how our society defines warfare. The most obvious is the shift from the nation-against-nation principles of the Clausewitzian era to a 21st-century battlefield of non-state actors and Radio Shack-enhanced IT tactics. Meanwhile, our military schools and strategists are redefining their tactics and goals as they struggle to keep up.

In his journalistic experiment, Morozov channels Matthew Broderick’s cheekiness in “WarGames,” signing on as a cyber-soldier against Georgia in its recent military face-off with Putin’s Russia. Experimenting with simple page-reload scripts and denial-of-service (DoS) attacks, he describes his exploits against Georgian government information sites using widely available, pre-built tools. Joining the ranks was so easy that he was left with “concerns about the number of child soldiers who may just find it too fun and accessible to resist.”

Given that warring countries have always issued very different calls to arms to their citizen militias, I wonder how technologically sophisticated societies will harness their citizens for information warfare over the next decade. While it’s somewhat hard to imagine the United States asking its general population to militarize their home computers for an information assault on China, it’s not unrealistic to envision a war between the Korean states, or between China and Taiwan, fought in part by thousands of teenage, or even elderly, patriots. They could be recruited and trained in advanced cyber-warfare through a social network that uses internal feedback systems such as quests, rankings, and rewards to promote its soldiers. Warfare 2.0 FTW. Kinda scary.

What struck me about this article is that, given the expanding toolkit of tracking and surveillance hardware installed throughout this country, Morozov says nothing about the possibility of a nation retaliating against individual civilians. If Georgia discovers you are attacking its infrastructure, how can it strike back? Is this perhaps why he chose not to attack the Russians, despite his statement that his “geopolitical sympathies…lie with Moscow’s counterparts”?

I think it is fair to draw connections between this article and the concepts discussed by Vannevar Bush and in “Operation Clean Data.” When your information is centralized, is it not also weaker from a security standpoint? How are government information systems designed to provide data unification, internal transparency, redundancy, modularization, and usability all at the same time? How do cyber-warfare techniques exploit these systems through automation and retrieval? Surely information and security experts are weighing these issues as they anticipate the many changes in the future of information warfare.


Rating the Ratings

Rating the Ratings, Stephen Whitty, The Star-Ledger

The way feature films are given MPAA ratings often appears haphazard, governed more by studio muscle and politics than by a reasonable set of guidelines.

The highly political nature of this governing body was the subject of Kirby Dick’s 2006 documentary This Film Is Not Yet Rated.

Most recently, controversy erupted over Kevin Smith’s latest film, Zack and Miri Make a Porno, starring Seth Rogen and Elizabeth Banks, when it was slapped with an NC-17 rating, which the appeals board overturned in favor of a box-office-friendlier R rating.

The article discusses the current, flawed system and lists movies and their ratings to illustrate its inconsistency. The author ponders who governs these guidelines, and how, and laments the politics that clearly shape the board’s direction.

Categorizing a subjective and nuanced product, something that is hard to calibrate scientifically, seems to be a persistent challenge not only for movie ratings but in many fields beyond the arts. Anything measured or judged by opinion is inevitably subjective, so finding the best method of categorization will be an interesting and relevant challenge.

Related Lectures

3 – Organization {and, of, vs} Retrieval

5 – Concepts and Categories

7 – Controlled Names and Vocabularies

8 – Classification

12 – Enterprise/Institutional Categorization & Standards 



The Chameleon

The Chameleon, David Grann, The New Yorker, August 2008

I found this piece in the New Yorker so interesting that I felt inspired to relate it to our 202 discussions somehow. In trying to do so, I gained a whole new appreciation of, and perspective on, the story and its characters.

The Chameleon is a true story about a Frenchman named Frederic Bourdin who, at the age of 16, runs away from home and wanders across Europe, taking on many different personas and fictional characters in search of the “perfect shelter.” By the time he turns 18 and becomes a legal adult, Interpol has a growing record of his deceits, and he has attracted the attention of the European media. These potential threats hardly affect his lifestyle: he continues to “insinuate himself into youth shelters, orphanages, foster homes, junior high schools, and children’s hospitals” across 15 countries, generally posing as a desperate child in order to “win sympathy.”

His deceits escalate until he hatches a plan to impersonate a missing child named Nicholas Barclay, who is said to have run away from his home outside San Antonio, Texas, three years earlier. He manages to convince not only Spanish and American authorities but also the missing child’s family. The missing child’s older sister eventually meets Frederic at the American Embassy in Spain, where he receives an American passport and is taken home to live with the family. Without giving the whole story away, suffice it to say that there are quite a few more twists.

So how does this relate to 202? Well, it made me think about misinformation, people’s varying perceptions of information, and how they judge whether information is valid. Do people believe what they see, what their instincts tell them, what their emotions make them want to believe, or what other people tell them to believe? It also made me think about information sharing across institutions, how we manage and interpret personal identification information, and whether somebody could pull this off in today’s post-9/11 world.

Related Lectures

7 – Controlled Names and Vocabularies (9/22)

11 – Information Integration and Interoperability (10/6)

15 – Personal Information Management (10/20)

Data Fusion: The Ups and Downs of All-Encompassing Digital Profiles

The current issue of Scientific American includes several articles on the rise of digital information and its uses, from RFID and biometrics to eavesdropping. I found the article on data fusion, also known as data integration, the most relevant to this discussion.

Data Fusion: The Ups and Downs of All-Encompassing Digital Profiles

The article begins with the author’s reflection on traveling internationally several years ago, when his credit card issuer blacklisted his card because the company’s anti-fraud data mining algorithm flagged possible fraud. He had merely bought a latte and a cell phone SIM card in England. The company knew he was in England, since he had bought his ticket there with the same credit card! Shouldn’t it have known it was him?

From there, the author traces the history of data mining efforts in the United States, focusing in particular on the challenges of integrating multiple data sources so they can be mined effectively. He cites examples from DARPA’s counter-terrorism efforts and the Department of Health and Human Services’ anti-fraud efforts, exploring how data mining is viewed as the ultimate tool, yet one still rife with inconsistencies and errors.

Errors arise primarily from the difficulty of normalizing data from sources with varying levels of detail and uncertainty. Perhaps most importantly, once those data sources are aligned, how does one establish identity? Who’s who? Am I Andy Brooks, A.L. Brooks, and/or Andrew Brooks?
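
The “who’s who” question is the identity-resolution problem. Here is a minimal Python sketch of one naive rule for deciding whether two name strings could belong to the same person: surnames must match exactly, and the first given names must agree directly, agree after a nickname lookup, or be consistent with an initial. The nickname table and the rule itself are illustrative assumptions, not the method the article describes; real systems weigh many more attributes than a name.

```python
# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"andy": "andrew", "drew": "andrew"}


def parse(name):
    """Split 'A.L. Brooks' into (['a', 'l'], 'brooks'): given names, surname."""
    parts = name.lower().replace(".", " ").split()
    return parts[:-1], parts[-1]


def canonical(given):
    """Map a nickname to its formal form, if the table knows it."""
    return NICKNAMES.get(given, given)


def given_compatible(a, b):
    """Given names are compatible if they match, or one is the other's initial."""
    a, b = canonical(a), canonical(b)
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def may_be_same_person(x, y):
    """Naive rule: same surname and compatible first given names."""
    given_x, surname_x = parse(x)
    given_y, surname_y = parse(y)
    if surname_x != surname_y:
        return False
    if not given_x or not given_y:
        return True  # a bare surname cannot rule anyone out
    return given_compatible(given_x[0], given_y[0])


for a, b in [("Andy Brooks", "Andrew Brooks"),
             ("A.L. Brooks", "Andrew Brooks"),
             ("Andy Brooks", "Beth Brooks")]:
    print(a, "/", b, "->", may_be_same_person(a, b))
```

The rule says yes to the first two pairs and no to the third. That shows both sides of the problem: it links the three name variants above, but it would just as readily link two different people who share a surname and a first initial.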

After examining data mining’s shortfalls, the author turns to examples of more effective work with data fusion and data mining. The winners? Casinos! Think of those wallet-sized perks cards casinos are so happy to give you. To counter cheaters, casinos have long funded the development of non-obvious relationship analysis techniques. These techniques try to normalize data across multiple sources in an evolving way that tolerates error and uncertainty and grows more intelligent over time.
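
The article doesn’t describe how the casino systems work internally, so the following is only a minimal Python sketch of the basic idea behind relationship analysis: records that share an identifier (here a phone number or an address) are linked, and links are followed transitively with a union-find structure, so two records with nothing in common can still end up connected through a third. The records are made up, and unlike the systems the article describes, this sketch requires identifiers to match exactly rather than tolerating error.

```python
from collections import defaultdict

# Made-up records for illustration.
RECORDS = [
    {"id": "r1", "name": "Andy Brooks", "phone": "555-0101", "address": "12 Elm St"},
    {"id": "r2", "name": "A.L. Brooks", "phone": "555-0101", "address": "9 Oak Ave"},
    {"id": "r3", "name": "Pat Lee", "phone": "555-0199", "address": "9 Oak Ave"},
    {"id": "r4", "name": "Sam Roe", "phone": "555-0123", "address": "1 Pine Rd"},
]


def link_records(records, keys=("phone", "address")):
    """Group records connected, directly or through other records,
    by sharing an exact value in any of the given fields (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> id of the first record that had it
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                union(first_seen[(key, value)], r["id"])
            else:
                first_seen[(key, value)] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


print(link_records(RECORDS))  # [['r1', 'r2', 'r3'], ['r4']]
```

Records r1 and r3 share no field at all, yet they land in the same cluster because each shares something with r2; that indirect link is the “non-obvious” part. Note that the sketch never compares names.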

The article concludes with the author’s most important point: as with the history of cryptography, the public is essentially left out of discussions about the use of data mining and data fusion. We don’t really know when it’s used, by whom, or for what purpose. We often find out only after the fact, as the author did when his credit card was declined while he was trying to buy a train ticket in London.

Candidate Lectures:

11. Information Integration & Interoperability (10/6)

15. Personal Information Management (10/20)

Classifying Terrorism

From: Terror Watchlist is “Imploding,” Legislator Says

According to a letter from the chair of the House Science and Technology Committee, the Terrorist Identities Datamart Environment (TIDE), commonly known as the Terror Watch List, is failing miserably. In theory, it should be a database that accepts information from a number of government organizations, such as the FBI, the CIA, and the National Counterterrorism Center (NCTC), classifies that information, and then allows authorized organizations to query it.

Unfortunately, the letter says, the database can’t keep up with the data being submitted for classification, nor can the information already in the system be queried accurately. It also lacks the ability to do fuzzy searches, so if my name were on the watchlist, it wouldn’t be a big deal: I could just travel under my middle name and get right by. The letter says the database consists of “463 separate tables, 295 of which are undocumented.” The only way to query it is with SQL commands.
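
To make the middle-name loophole concrete, here is a minimal Python sketch contrasting an exact-match lookup with a simple fuzzy one. The watchlist entry is hypothetical, and the fuzzy rule (similar surnames, plus every other name the traveler gives resembling some given name on the entry, scored with the standard library’s `difflib.SequenceMatcher`) is an illustrative assumption, not how TIDE or any real watchlist works.

```python
from difflib import SequenceMatcher

# Hypothetical watchlist entry.
WATCHLIST = ["John Michael Doe"]


def tokens(name):
    """Lowercase a name and split it into parts, ignoring periods."""
    return name.lower().replace(".", " ").split()


def exact_match(query, watchlist):
    """What a plain equality lookup finds: only identical strings."""
    return [entry for entry in watchlist if entry.lower() == query.lower()]


def similar(a, b, threshold):
    """True if two name parts are close, which tolerates small misspellings."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def fuzzy_match(query, watchlist, threshold=0.8):
    """Flag entries whose surname resembles the query's surname and whose
    given names cover every other name the traveler used, in any order."""
    q = tokens(query)
    if not q:
        return []
    hits = []
    for entry in watchlist:
        e = tokens(entry)
        if not similar(q[-1], e[-1], threshold):
            continue
        if all(any(similar(t, g, threshold) for g in e[:-1]) for t in q[:-1]):
            hits.append(entry)
    return hits


print(exact_match("Michael Doe", WATCHLIST))  # []
print(fuzzy_match("Michael Doe", WATCHLIST))  # ['John Michael Doe']
```

The exact lookup misses the traveler entirely, while the fuzzy one catches the middle-name variant. The price is more false positives: a bare surname such as “Doe” satisfies this rule too.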

From the lecture: How do people search for information? How can we organize information? What is meaning? Where is meaning? Defining what something means.
