Data Fusion: The Ups and Downs of All-Encompassing Digital Profiles

The current issue of Scientific American includes several articles on the rise of digital information and its use, from RFID and biometrics to eavesdropping. I found the article on data fusion, also known as data integration, to be the most relevant to this discussion.

Data Fusion: The Ups and Downs of All-Encompassing Digital Profiles

The article begins with the author’s reflection on his experience traveling internationally several years ago, when his credit card issuer blacklisted his card because the company’s anti-fraud data mining algorithm detected potential fraud. He had merely bought a latte and a cell phone SIM in England. The company knew he was in England, as he’d bought his ticket to England with the same credit card! Shouldn’t they have known it was him?

From this introduction, the author traces the history of data mining efforts in the United States, focusing in particular on the challenge of integrating multiple data sources so that they can be mined effectively. He cites examples from DARPA’s counter-terrorism efforts and the Department of Health & Human Services’ anti-fraud efforts, exploring how data mining is viewed as an ultimate tool, but one that is still rife with inconsistency and errors.

Errors primarily arise from the difficulty of normalizing data from varying sources with varying levels of detail and uncertainty. Perhaps most importantly, once those data sources are aligned, how does one guarantee identity? Who’s who? Am I Andy Brooks, A.L. Brooks, and/or Andrew Brooks?
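To make the "who's who" problem concrete, here is a minimal Python sketch of my own, not something taken from the article. It assumes a tiny, hypothetical nickname table and a deliberately naive rule: surnames must match, and first names must either match after nickname expansion or agree on an initial. The names NICKNAMES, tokens, and could_be_same_person are invented for illustration.

```python
import re

# Hypothetical nickname table (an assumption for this sketch);
# a real system would need a far larger, curated one.
NICKNAMES = {"andy": "andrew", "drew": "andrew"}


def tokens(name):
    """Lowercase a name, drop punctuation, and expand known nicknames."""
    parts = re.findall(r"[a-z]+", name.lower())
    return [NICKNAMES.get(p, p) for p in parts]


def could_be_same_person(a, b):
    """Return True if two name strings are compatible.

    Compatible means "possibly the same person", never "proven identical".
    """
    ta, tb = tokens(a), tokens(b)
    if len(ta) < 2 or len(tb) < 2:
        return False  # need at least a given name and a surname
    if ta[-1] != tb[-1]:
        return False  # surnames must agree exactly
    first_a, first_b = ta[0], tb[0]
    if len(first_a) == 1 or len(first_b) == 1:
        # One side is only an initial, so compare first letters.
        return first_a[0] == first_b[0]
    return first_a == first_b


if __name__ == "__main__":
    for other in ("A.L Brooks", "Andrew Brooks", "Anna Brooks"):
        print("Andy Brooks", "vs", other, "->",
              could_be_same_person("Andy Brooks", other))
```

Note the weakness this exposes: under these rules "A.L Brooks" is compatible with both "Andrew Brooks" and "Anna Brooks", so a name match alone can only nominate candidates. Settling identity would take additional evidence, such as an address or a date of birth, which is exactly where the uncertainty creeps in.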

After examining data mining’s shortfalls, the author turns to examples of more effective work done with data fusion and data mining. The winners? Casinos! Think of those wallet-sized perks cards casinos are so happy to give you. In order to counter the efforts of cheaters, casinos have long funded development of non-obvious relationship analysis techniques. The techniques attempt to normalize data across multiple sources in an evolving way, one that tolerates error and uncertainty, and strives to grow more intelligent over time.

The article concludes with the author’s most important point – that, as in the history of cryptography, the public is essentially left out of any discussion about the use of data mining and data fusion. We don’t really know when it’s used, by whom, and for what purpose. We often only find out after the fact, such as when the author’s credit card was declined as he tried to buy a train ticket in London.

Candidate Lectures:

11. Information Integration & Interoperability (10/6)

15. Personal Information Management (10/20)
