Archive for April 12th, 2018

Facebook and Cambridge Analytica have recently sparked major outrages, discussions, movements, and official investigations [1]. In a previous class blog post [2], author Kevin Foley aptly suggests that business incentives may have enabled this incident and that government regulators should work with companies to ensure privacy is respected. This made me wonder about the users themselves, who – from much of what I’ve read in the media – seem to be painted as helpless victims of this breach in trust. Should users rely solely on investors and the government, or should the users themselves be able to take steps to better secure their privacy? Specifically, if users read and better understood relevant documents, such as Facebook’s privacy policy, could they have understood how their data could be inappropriately collected by an organization that could create psychological profiles used to manipulate them?

Recognizing that privacy is “a secondary task – unless you’re a privacy professional, it’s never at the top of your mind” [3] and that “users generally do not read [privacy] policies and those who occasionally do struggle to understand what they read” [4], professor Norman Sadeh at Carnegie Mellon developed the Usable Privacy Policy Project to make privacy policies easier to understand. The project involved using Artificial Intelligence to learn from training data – 115 privacy policies annotated by law students, essentially transformed from legal jargon to simpler, more digestible language – to allow for automated classification of privacy categories, reading level, and user choice sections. The project exists as an online tool ( and contains thousands of machine-annotated privacy policies. Reddit’s most current privacy policy from 2017, which I summarized for a previous assignment, was algorithmically classified as College (Grade 14) reading level material, had mostly appropriate categories with corresponding sections in the text version of the privacy policy highlighted [5]. The tool isn’t perfect, boasting 79% for relevant passage detection [6], but it provides areas to search in the text if interested in a topic, such as statements associated with “third-party sharing,” which could exist in multiple places in the document.

The machine-annotated privacy policy for Facebook from late 2017 is also available and has multiple sections where third-party sharing is highlighted by the tool [7]. Reading through these non-summarized sections, users could understand that apps, services, and third-party integrations (subject to their own terms) could collect user information. The written examples suggest active involvement on the user’s part could result in information sharing for *that* user (i.e. self-participation entailed consent), but I don’t think readers could reasonably expect that simple actions their friends perform would allow their own data to be shared. This kind of sharing (user to app), albeit vaguely described and not necessarily expected by users, is part of Facebook’s permissive data policy. The violation behind the scandal was inappropriate information sharing from the app developer, Kogan, to a third-party, Cambridge Analytica [8].

I have to agree with Foley that government regulators should be able to hold companies accountable for protecting user data. Even a good tool that that makes it easier to understand privacy policies can’t help users identify a risk if that risk (or factors leading up to it: which data is being shared, in this case) isn’t sufficiently detailed in the document. The question then becomes how balanced should a privacy policy be, in terms of generality and specificity? How many relevant hypothetical examples and risks should be included, and what role (if any) should government regulators have in shaping an organization’s privacy policy?

On a more positive note, Facebook seems to be taking action to prevent misuse; for example, they now provide a link showing users which apps they use along with information being shared to those apps [9]. A final question is left for the reader: as a user of any online service, how informed would you like to be about how your data is handled, and whether there more effective methods to be informed than static privacy policy pages (e.g. flowcharts, embedded videos, interactive examples, short stories, separate features)?