Dataception: Discussing the Metadata of Your Data

By Gurdit Chahal | November 1, 2019

Many of us have heard the adage that a picture is worth a thousand words. Few of us realize, however, that a digital picture literally carries a thousand words as we share content across social media, email, and the internet in general. And no, I don’t mean your visible posts or hashtags. Instead, consider the metadata: the trail of breadcrumbs attached to your digital content that government intelligence agencies, hackers, advertisers, and others use to gain insight into private areas of our lives.

Aptly named, metadata is the data about your data. Its function is to help tag, organize, find, and work with the data it describes. For a digital image, the metadata often comes as EXIF (Exchangeable Image File Format) fields embedded in the file, with details such as the make of the camera, whether the flash was used, the date the picture was taken, and GPS coordinates. This information is usually generated automatically and shipped with the image itself. Other digital objects, like documents, can carry metadata as well. For another example showcasing the potential level of detail, check out the anatomy of the metadata for a tweet in the image below.


Caption: Metadata Diagram for a Twitter Tweet
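
Metadata is not limited to photos and tweets; the operating system records some for every file. The sketch below, in Python using only the standard library, writes a small temporary file and reads back its size and last-modified time. The helper name describe_file and the sample text are illustrative assumptions, not part of any tool mentioned in this post.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a small dict of file-system metadata for `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime is seconds since the Unix epoch; convert it to UTC.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Write a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("I like turtles")
    sample_path = handle.name

metadata = describe_file(sample_path)
print(metadata)
os.remove(sample_path)
```

None of these fields reveal what the file says, yet they already tell an observer how big it is and when it was last touched.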

Exploring examples of metadata-based applications can give a sense of how much we potentially expose ourselves in the digital world. A seemingly silly yet concerning demonstration of this capability is the self-described “data visualization experiment” at iknowwhereyourcatlives.com. To showcase how personal information can be exploited through public content, the creators collected millions of cat pictures available on the internet and pinned them to Google Maps using the GPS coordinates in the images’ metadata, which are accurate to within about twenty feet of where each picture was taken. In other words, if you post a picture of your cat (or anything or anyone else) at home, someone can figure out where you live and be off by only about three cars parked in a line.


Caption: Metadata Can Be Found With Basic Tools
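
EXIF stores GPS latitude and longitude as degrees, minutes, and seconds plus a hemisphere letter (N/S or E/W). As a minimal sketch of how such fields can be turned into a map pin, the function below converts them to signed decimal degrees. The function name and the sample coordinates are made up for illustration; they are not taken from any real photo or from the project described above.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in signed decimal notation.
    return -value if ref in ("S", "W") else value


# Hypothetical values shaped like EXIF GPSLatitude / GPSLongitude fields.
latitude = dms_to_decimal(37, 52, 18.0, "N")
longitude = dms_to_decimal(122, 15, 36.0, "W")
print(round(latitude, 6), round(longitude, 6))
```

A pair of decimal coordinates like this is all a mapping service needs to place a marker.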

Taking the “Big Brother” vibe up a notch is an example from 2009. German politician Malte Spitz took his telecommunications provider, Deutsche Telekom, to court to get access to the data, particularly the metadata, it had collected on him. With the help of journalists, Spitz produced an interactive map of six months of his life based on his metadata. It showed where he went, where he lived, whom he talked with and for how long, and his phone contacts, among other details. On top of that, combining the metadata with publicly available data on his political life, such as Twitter feeds and blog posts, let the map show not only where he was going and with whom he spoke, but also what he was likely talking about or doing throughout the day (such as rallies, flights, and lunch breaks). The map is here: https://www.zeit.de/datenschutz/malte-spitz-data-retention.


Caption: Sample Frame of Malte Spitz’s Data Story

Researchers have also tried to quantify how vulnerable we are depending on the metadata available. A 2018 study showed that, given only Twitter metadata and none of the users’ historical tweet content, a machine-learning algorithm could attribute a new tweet to the correct user out of a group of 10,000 identified individuals with about 96% accuracy. Moreover, even when the model was confronted with data obfuscation and randomization, standard techniques for adding noise and hiding information, accuracy remained near 95%. A rough analogy: take a local phone book, be told that one of the people listed said “I like turtles” at 2pm, and be able to use that list and some phone bill information to pinpoint who it was.
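
To make the idea concrete, here is a toy sketch in Python. It is not the study’s method, and the features, user names, and numbers are all invented for illustration. It simply matches a new post’s account metadata to the closest known profile with a nearest-neighbor rule, without ever looking at the post’s text.

```python
import math

# Made-up account metadata: (followers, following, total posts).
# These profiles are invented for illustration only.
KNOWN_USERS = {
    "alice": (120, 300, 4500),
    "bob": (15000, 210, 980),
    "carol": (640, 655, 12000),
}


def distance(a, b):
    """Euclidean distance on log-scaled counts, so large counts do not dominate."""
    return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)))


def guess_author(observed, known_users):
    """Return the known user whose stored metadata is closest to `observed`."""
    return min(known_users, key=lambda name: distance(observed, known_users[name]))


# Metadata attached to a new, unattributed post; the text itself is never used.
new_post_metadata = (15020, 212, 983)
print(guess_author(new_post_metadata, KNOWN_USERS))
```

With three hand-picked profiles the match is trivial; the point of the study is that a similar kind of matching remained accurate across thousands of real accounts.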

On the legal end, scholars like Helen Nissenbaum have pointed out that metadata often sits in a gray zone with respect to Fourth Amendment protections against search and seizure, since protection hinges on what is “expected” to be private. While Europe has GDPR, California offers one of the few American parallels: its Electronic Communications Privacy Act requires a warrant, and its Consumer Privacy Act of 2018 gives residents (starting in 2020) the right to see what data and metadata companies collect about them and to have it deleted.

Having provided a survey-level perspective on metadata and its potential, I hope that we can be mindful of not only the worth of the picture, but also of the thousand words used to describe it.

References
[1] iknowwhereyourcatlives.com
[2] https://www.law.nyu.edu/centers/ili/metadataproject
[3] https://arxiv.org/pdf/1803.10133.pdf
[4] https://www.zeit.de/datenschutz/malte-spitz-data-retention
[5] https://opendatasecurity.io/what-is-metadata-and-what-does-it-reveal/
[6] https://www.perspectiverisk.com/metadata-and-the-risks-to-your-security/
[7] https://www.digitalcitizen.life/what-file-s-metadata-and-how-edit-it
[8] https://www.eckerson.com/articles/if-data-is-the-new-oil-metadata-is-the-new-gold
[9] https://www.datasciencecentral.com/profiles/blogs/importance-of-metadata-in-a-big-data-world
[10] https://whatis.techtarget.com/definition/image-metadata
