Archive for March, 2018

Privacy.  Something that I like to think I have a good amount of, in reality probably have a lot less of, and always want to have more of.  Today I want to explore the idea of “losing” privacy and why our fear of this phenomenon may be higher in the current climate than it was before.

Before going any further, let’s first define what privacy is.  For the purposes of this blog post, we’ll define “privacy” as the level of difficulty a stranger would have in finding out personal details about you.  The more privacy you have, the harder it is for someone to find out information about you, and the less privacy you have, the easier it is for someone to find out information about you.  We’ll define strangers as fellow human beings with whom you do not have a personal relationship and who do not work in government surveillance.

Now, what does it mean to “lose” privacy, and when did we start “losing” it?  According to our definitions above, we can say losing privacy is essentially making it less difficult for strangers to find out details about you.  As for when we actually started to lose it, the answer is a little less clear.  Technically speaking, we can say that we started losing privacy as soon as we started posting information about ourselves on publicly available spaces.  In practice this means using social media such as Xanga, MySpace, Facebook, and, more recently, Instagram, Twitter, and Snapchat.  However, with even these services alone, it can be argued that society’s collective privacy was still fairly safe, as, while it may be easy for an individual to stalk a few people, it would be quite the effort for an individual to find information about thousands or millions of people.  In other words, the information was all there, but the method for viewing a lot of it at once wasn’t something a layman could do.

This is, of course, before the widespread use of web scraping and web crawling.  Web scraping and web crawling, at their core, are methods by which one can scan and extract data from the internet en masse.  While these tools have been in existence almost as long as the internet has, their threat to privacy lies in their recent adoption and integration into facets of everyday life.  Consider Spokeo, a company that aggregates individuals’ information across their social media accounts, public data, and the deep web to create a “profile” of that person.  This profile would include information such as potential family members, place of residence, past places of residence, salary range, and estimated credit score.  Spokeo then sells this information to entities such as employers, creditors, and landlords, who then use this data to make hiring, loan, and rent decisions.  In a similar vein, consider Fama, a company that classifies one’s social media activities and posts as “positive”, “negative”, or “neutral”, and then sells the count of each type of action to potential employers.

From the two examples above, we see that there are now commercially available ways for strangers to not only obtain information about you, but also act on it.  This, I argue, is the ultimate chipping away of privacy, and why our societal fear is greater and more warranted than it was before.  The information is available as it has been for over a decade now – however, there are now very accessible tools with which those with minimal technical experience can not only aggregate information about an individual, but also aggregate that aggregated information for the millions of individuals who have some form of online presence.

So, is this how democracy dies, in a late-stage capitalistic state where oligarchical companies know everything about us?  I would like to think not.  Thankfully, there are ways that we can combat this very thing, all from the comfort of our own homes.  Spokeo and companies like it have an opt-out option where individuals can delete the information that these companies have been storing about them.  An incomplete list of how to do this can be found here.  Additionally, while our information may be available to the public, our “content” is under copyright.  Anything posted on Twitter or Instagram, according to the respective services’ Terms of Use, is the legal property of the user.  Further, there is legislation in the pipeline that would protect users’ online privacy.  For the state of California, one need only look here to see the upcoming legislation.  Finally, the simplest option is, of course, to discontinue use of social media platforms.  For many, this is understandably a challenge.  While we may not be able to completely stop using social media, there are ways to make more of our information on social media private, thus preventing companies like Spokeo and Fama from harvesting it.  Ways to do this on Facebook can be found here.

This past week has seen Facebook at the center of a news cycle in which, since the 2016 election, it has proven very difficult to get attention. Cambridge Analytica, a political consulting firm, has been accused of manipulating users to gain access to their data, and users are upset with Facebook for not doing more to protect such data. It’s not just users who have shown displeasure, though: the stock market reacted in a large way, with Facebook’s stock dropping over 13% in the two weeks since the revelations.

While users and investors are upset with Facebook, things won’t change. From investors’ perspective, they don’t want things to change. The reason they’re upset with the company is that Facebook got caught. Markets have valued Facebook’s stock so highly since its IPO because of the amount of data the company has available and what that data is worth to advertisers. Sure, investors would like Facebook to be slightly more careful with some privacy issues, but only so far as it doesn’t impact the business in a significant way.

Users, of course, are more upset. They feel violated, not only because their data was made available but because of how it was used to manipulate them by companies like Cambridge Analytica. Already user engagement has dropped. [1] Mark Zuckerberg has tried taking out full-page ads and going on a public relations tour to reassure users, but thus far this has shown little success.

The question is: if users aren’t using Facebook, then where are they going? There are few other options; other services like Twitter or Snapchat don’t provide quite the same service as Facebook, and more importantly, they’re just as happy to share your data with advertisers. Any competitor that might arise would be faced with the same issue as Facebook: little incentive to self-regulate. There have been attempts in the past at subscription-based social networks, which have demonstrated that users aren’t willing to pay for privacy.

If investors are only interested in privacy so far as any violations aren’t in the headlines, and users want some kind of privacy but aren’t willing to pay for it, what can be done? The best solution to this problem is to introduce government-backed regulations to monitor the handling of data by companies such as Facebook and other tech giants. It’s not just that they lack the incentive to self-regulate; in many ways they lack the perspective to gauge whether a business decision is a violation of users’ privacy. It takes someone with the ability and the experience to really look at a process and not just see best-case scenarios, such as the scenario where all developers respect the agreement they’ve signed.

Government regulators need the ability to hold companies accountable and help provide the necessary guidelines to make sure users’ data is safeguarded. It’s impossible to prevent every single violation of privacy when data is so prevalent; however, users have already shown that’s not what they’re expecting. They’re already aware that they’re sharing their data and that it’s being consumed not only by friends and family but also by third parties. They’re willing to make that tradeoff to enjoy free services, but only if basic precautions are implemented. Given that most users aren’t in a position to evaluate a company’s data practices and determine if they are up to snuff, the government must step in to work with companies to make sure users’ privacy is respected.


Privacy and Teenagers

March 27th, 2018

In the 2014 book “It’s Complicated: The Social Lives of Networked Teens”, author danah boyd presents a decade of research on the role of the internet and social media in the lives of teenagers. This book goes beyond the common stereotypes of teenagers always being on their phones, unable to interact face to face, and, through interviews with hundreds of teens in the United States across racial, class and geographic divides, creates a nuanced picture of how teens use and view social media. Of particular interest is a chapter on privacy. This chapter examines why teens share personal details on public forums, the conscious privacy decisions they are making in the process, and who they want privacy from.

In short, boyd asserts that teenagers are aware that postings on social media such as Facebook are accessible to the general public, but they view that as a standard part of life. They are more concerned about privacy from parents and teachers than privacy from corporations and data analytics. By posting in vague ways that require deep understanding of social context or inside jokes, teens communicate on public forums in ways that are meaningful to their social network while appearing as meaningless noise to outsiders.

This work suggests that teens are a unique subgroup of internet users, with a unique understanding of privacy, and unique concerns. These generational differences should inform how privacy policies are presented, what they contain, and how regulations are created.

Privacy Regulations for Teenagers

In the State of California, online data use is governed by CalOPPA, the California Online Privacy Protection Act. Because most websites in the US have users in CA, CalOPPA has become the national standard. There is a specific subsection of this act referred to as the “Online Eraser” rule. This requires that websites with users between the ages of 13 and 17 “provide a mechanism for minors to remove the content or information that they have posted”(1). This law acknowledges that the teenage years are a special time, and allows additional leeway for decisions made during this time. Teens are acknowledged as a subgroup deserving of additional privacy protections.

Societal Pressure to Clarify Privacy Policies and Increase Understanding

There is increasing pressure on tech companies to clarify privacy policies and ensure that users understand what they are consenting to, rather than clicking “I agree” to a long, unintelligible document full of legal jargon. In 2011, Facebook was investigated by the Federal Trade Commission (FTC) on charges “that it deceived consumers by telling them they could keep their information on Facebook private, and then repeatedly allowing it to be shared and made public.”(2) In the resulting settlement, Facebook overhauled its privacy and data use policy to create an easier-to-understand format, and became subject to bi-annual third-party audits. In light of the recent scandal with Cambridge Analytica, as of March 26th, 2018 the FTC has reopened the investigation into Facebook’s privacy practices(3).

Duty of Companies to Explain Privacy in Ways that Match Teens’ Worldview

Many current privacy concerns center around the use of data for unintended purposes. This includes the scraping of LinkedIn data to tell a current employer that someone is looking for a new job, the topic of a lawsuit between LinkedIn and HiQ Labs(4), or Cambridge Analytica scraping Facebook data to create psychological profiles of voters in order to target personalized political advertisements in the 2016 Brexit vote and the US Presidential election(5).

It would be logical to expect that as a result of these recent high profile stories, internet users will be analyzing privacy policies with a fresh eye towards data use by third parties. In the fallout of these scandals, many companies that rewrite or reformat privacy policies will be looking to assuage these fears and provide clear information where users have the option to consent before their data is shared, or used by outside analytic firms.  

However, in this moment of increased scrutiny, it is important to look at the data privacy expectations and understanding of all subgroups. Rewritten privacy policies should not assume that teens have the same privacy priorities and understanding as adults. There is a need for clear information on what sharing with a third party would mean and why that could be beneficial or dangerous, as well as clear information on the privacy from specific people that teens crave.

In January 2018, Amazon Go finally opened to the public, with its first store located in Seattle, WA. Another six Amazon Go stores are planned to open across the country this year as well.  Amazon Go is a move that could revolutionise the way we buy groceries: it has no human checkout operators or cashiers.

How does it work

In short, customers simply walk into the store and pick up the items that they want, and they’re automatically charged for their purchases on their Amazon Go account and associated credit card, with no need to stop and check out.

The store uses a variety of scanning technologies and algorithms to monitor patrons and verify purchases. Once consumers walk in, swiping their smartphones loaded with the Amazon Go app, they are free to put any of the sandwiches, salads, drinks and biscuits on the shelves straight into their personal shopping bags or backpacks instead of shopping carts.  The store uses hundreds of cameras and electronic sensors mounted on the ceiling and in the corners to identify each customer and track the items they select.

Why Amazon Go might succeed

For consumers, not having to wait in line during the grocery rush hour is a big relief, especially when you are in a hurry to catch the morning bus or have forgotten your wallet. You walk in and out much faster.  In addition, you can see the price (including the sale price) ahead of time. The app also provides valuable feedback data from consumers. Instead of waiting to fill out a feedback card in a grocery store when you are upset about the salad you purchased, you simply type a short review and click submit, and your store manager will know immediately that the salad might not be as fresh as expected. It also allows you to track your grocery expenditure and eating habits. You might want to tighten your lunch budget a little and spend more money on healthier choices.

For store managers and retail owners, the obvious benefit is that stores don’t have to hire more people for basic operations, but the data the system generates and offers in real time is the real game changer: the Amazon Go app provides customer feedback and helps dictate instantly what the store should stock. This is crucial across retail, because stores don’t want to miss out on sales or find themselves with unsold merchandise they have to chuck or sell at clearance prices.

What about their privacy policy

One would imagine a cashierless store would come at a big cost to privacy, as the system allows Amazon to track troves of data from cameras, sensors and microphones. So far, the Amazon Go app uses the same privacy policy as  The policy claims that the information it collects and stores includes the information you give them, automatic information (certain types of information whenever you interact with them), and also information from other sources. No sensor, camera or monitor information is mentioned in the privacy policy, though it could fall under the “information you give them” section.

How this will impact our life

You might not be able to enjoy Amazon Go now if you are not in the Seattle area, but in the very near future it could still impact our lives greatly. Although Amazon has not said whether or how far it plans to expand the concept, the technology is already proving to be something it could use more widely and extend to different areas.

The huge impact of this technology is felt not only online but also in the real, physical world. The patent that Amazon filed in 2014 shows that Amazon uses an algorithm that analyses the gestures captured by the cameras to identify which items a customer picks up, and weight sensors to assess which ones leave the shelves. Shoppers are basically on display throughout the whole process: tracked by hundreds of cameras and sensors from the first swipe of their phone to their last step out the door. Data generated from this process include: your location and time information, your body and physical information (your clothing and shoe size, gender, ethnicity and clothing details), your movement (walking pattern, gestures), and your health information (your height, weight, even heart rate).  So far, Amazon Go hasn’t said whether it will track any information about underage children. The smart convenience of the Amazon Go technology also brings potentially scary privacy concerns. Not only does it collect your shopping behavior, but also some of your physical behavior and data.
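
As a rough illustration of the weight-sensor idea mentioned above, here is a minimal Python sketch of how a drop in shelf weight might be matched to a catalogue item. This is not Amazon’s actual method, which is not public in this level of detail; the `Item` class, the `infer_removed_item` function, the item weights and the 5-gram tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Item:
    name: str
    weight_g: float  # hypothetical catalogue weight in grams


def infer_removed_item(
    weight_before_g: float,
    weight_after_g: float,
    shelf_items: Sequence[Item],
    tolerance_g: float = 5.0,  # assumed sensor tolerance, not a real spec
) -> Optional[Item]:
    """Return the item whose weight best explains the drop, or None."""
    drop = weight_before_g - weight_after_g
    if drop <= 0 or not shelf_items:
        return None  # nothing was removed, or nothing to match against
    # Pick the catalogue item whose weight is closest to the observed drop.
    best = min(shelf_items, key=lambda item: abs(item.weight_g - drop))
    if abs(best.weight_g - drop) > tolerance_g:
        return None  # no single item explains the change well enough
    return best


shelf = [Item("sandwich", 250.0), Item("salad", 320.0), Item("drink", 500.0)]
picked = infer_removed_item(1070.0, 750.0, shelf)
print(picked.name if picked else "no match")
```

Even a toy version like this makes the privacy point concrete: once every shelf change is tied to an identified shopper, the store holds an item-by-item record of what each person handled.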


Note: The following is a re-post of AFOG member Shreeharsh Kelkar’s September 25, 2017 post on Scatterplot responding to the controversy over Wang and Kosinski’s (2018) paper about using deep neural networks to recognize “gay” or “straight” faces. At the time of Shreeharsh’s post, Wang and Kosinski’s paper had been accepted for publication but not yet published. The final version of the paper is now published in the Journal of Personality and Social Psychology and can be found here. Shreeharsh argues in this post that at least a part of the opacity of algorithms comes from the ways in which their technical mechanisms and social meanings co-exist side-by-side.

On this blog, and elsewhere, Greggor Mattson, Phil Cohen, and many others have written thoughtful, principled critiques of the recent “gaydar” study by Yilun Wang and Michal Kosinski (henceforth I’ll refer to the authors as just Kosinski since he seems to be the primary spokesperson).  I fully agree with them: the study both does too much and too little.  It purports to “advance our understanding of the origins of sexual orientation and the limits of human perception” (!) through a paltry analysis of 35,326 images (and responses to these images by anonymous humans on Amazon Mechanical Turk).  And it aims to vaguely warn us about rapacious corporations using machine learning programs to surreptitiously identify sexual orientation but the warning seems almost like an afterthought: if the authors were really serious about this warning, they could have dug deeper with a feasibility study rather than sliding quickly into thinking about the biological underpinnings of sexuality.

As someone who follows and studies the history of artificial intelligence, I see some striking parallels between the argument between Kosinski and his critics and early controversies over AI in the 1960s-80s, and also, I will argue, some lessons to be learnt. Early AI was premised on the notion that when human beings did putatively “intelligent” things, they were processing information, a sort of “plan” that was worked out in their heads and then executed.  When philosopher Hubert Dreyfus wrote his famous “Alchemy and Artificial Intelligence” paper for the RAND Corporation in 1965 (later expanded into his book What Computers Can’t Do), he drew on the work of post-foundationalist philosophers like Heidegger, Wittgenstein, and Merleau-Ponty to argue that human action could not be reduced to rule-following or information processing, and that once AI systems were taken out of their toy “micro-worlds,” they would fail. For their part, AI researchers argued that critics like Dreyfus moved the “intelligence” goalposts when it suited them. When programs worked (as did the chess and checkers-playing programs in the 1960s and 1970s), the particular tasks they performed were just moved out of the realm of intelligence.

Figure 1: The canon of artificial intelligence. Source: Flickr, Creative Commons License.

One way to understand this debate—the way that participants often talked right past each other—is to understand the different contexts in which the AI researchers and their critics approached what they did.  In what I have found to be one of the best descriptions of what it means to do technical work, Phil Agre, who worked both as an AI researcher and a social scientist, points out that AI researchers rarely care about ideas by themselves.  Rather, an idea is only important if it can be built into a technical mechanism, i.e. if it can be formalized either in mathematics or in machinery.   Agre calls this the “work ethic”:

Computer people believe only what they can build, and this policy imposes a strong intellectual conservatism on the field. Intellectual trends might run in all directions at any speed, but computationalists mistrust anything unless they can nail down all four corners of it; they would, by and large, rather get it precise and wrong than vague and right. They often disagree about how much precision is required, and what kind of precision, but they require ideas that can be assimilated to computational demonstrations that actually get built. This is sometimes called the work ethic: it has to work (p13).

But the “work ethic” is often not something outsiders—and especially outside researchers—get.  To them, the exercise seems intellectually shoddy and perhaps even dangerous.  Here is Agre again:

To get anything nailed down in enough detail to run on a computer requires considerable effort; in particular, it requires that one make all manner of arbitrary commitments on issues that may be tangential to the current focus of theoretical interest. It is no wonder, then, that AI work can seem outrageous to people whose training has instilled different priorities—for example, conceptual coherence, ethnographic adequacy, political relevance, mathematical depth, or experimental support. And indeed it is often totally mysterious to outsiders what canons of progress and good research do govern such a seemingly disheveled enterprise. The answer is that good computational research is an evolving conversation with its own practical reality; a new result gets the pulse of this practical reality by suggesting the outlines of a computational explanation of some aspect of human life. The computationalist’s sense of bumping up against reality itself—of being compelled to some unexpected outcome by the facts of physical readability as they manifest themselves in the lab late at night—is deeply impressive to those who have gotten hold of it. Other details—conceptual, empirical, political, and so forth—can wait. That, at least, is how it feels. [p13, my emphasis].

Figure 2: Courses required to complete a graduate certificate in artificial intelligence. Source: Flickr, Creative Commons License.

This logic of technical work manifests itself even more strangely for something like AI, a field that is about building “intelligent” technical mechanisms, which therefore has to perform a delicate two-step between the “social” and the “technical” domains—but which is nevertheless also a key to its work and its politics.  Agre argues that the work of AI researchers can be described as a series of moves done together, a process that he calls “formalization”: taking a metaphor, often in an intentionalist vocabulary, (e.g. “thinking,” “planning”, “problem-solving,”), attaching some mathematics and machinery to it, and then being able to narrate the working of that machinery in intentional vocabulary.  This process of formalization has a slightly schizophrenic character: the mechanism is precise in its mathematical form and imprecise in its lay form; but being able to move fluidly between the precise and the imprecise is the key to its power. This is not perhaps very different from the contortions that quantitative social science papers perform to hint at causation without really saying it openly (which Dan has called the correlation-causation two-step on this blog).  The struggle in quantitative social science is between a formal definition of causation versus a more narrative one.  AI researchers, of course, perform their two-step with fewer caveats because their goal is to realize their mathematical machinery into actual “working” programs, rather than explain a phenomenon.

To switch abruptly to the present, we can see the same two-step at work in the Kosinski paper. There is the use of social categories (“gay,” “straight”), the precise reduction of these categories to self-labeled photos with faces, the also-precise realization of a feature-set and standard algorithm to derive the labels for these photos, and then the switch back into narrating the workings of the systems in terms of broader social categories (gender, sexuality, grooming, recognizing).  The oddest thing in the paper is the reference to the “widely accepted prenatal hormone theory (PHT) of sexual orientation” but a closer reading shows that the theory is invoked essentially to provide a “scientific” justification of choices in the design of what is a conventional machine learning classifier.  (My suspicion is that the classifier came first, and the theory came later because of the decision to submit to a psychology journal.  Alternatively, it may have evolved out of the peer review process.)

But if the two-step remains the same, the world of AI today is starkly different.  As I have written before, today’s artificial intelligence is steeped far more in the art of making (real-world) classifications, rather than in the abstract concepts of planning and state-space searching.  Moreover, far from operating in “microworlds” as they did before, contemporary AI programs are all too realizable in the massive infrastructures of Facebook and Google.  (Indeed, one of Dreyfus’ criticisms of early AI was that it would not work in the real world.  No one would argue that today.)   Not surprisingly, the debates over AI have shifted as well: they are much more about questions of bias and discrimination; there’s also far more talk of how “algorithms”—the classifying recipes of the new AI—sometimes seem similar to the discredited sciences of phrenology and physiognomy.

There have been three angles of critique of the Kosinski study.  The first has been over the researchers’ notion of “informed consent”: as Greggor Mattson points out (see also this Mary Gray post on the old Facebook contagion study), researchers, whether corporate or academic, need to be more cognizant of community norms around anonymity and privacy (especially for marginalized communities) when they scrape what they see as “public” data.   The second has been from quantitative social scientists who find the Kosinski study lacking by the standards of rigorous social science.  Again, you’ll find no argument from me on that score.  But it bears mentioning that AI researchers are not quantitative social scientists: they are not so much interested in explaining phenomena as they are in building technical systems.  Should quantitative social scientists take the logic of technical work into account when they criticize the big claims of the Kosinski study?  Maybe so, maybe not; there are certainly grounds to think that the dialogue between quantitative social scientists (accustomed to the correlation-causation two-step) and AI classifier-builders will be productive, given that the use of correlations is now emerging as central to both fields.

My own angle on the study is from the third perspective, that of interpretive social science. When we social scientists find the use of social categories in the Kosinski study dubious (and even outright wrong), we are reacting to what we see as the irresponsible use of a socially meaningful vocabulary to describe the working of an arcane technical mechanism.  On this score though, the history of the older debates over AI is worrying.  If my reading of the history of AI is right (I’m open to other interpretations), those debates went nowhere because people were talking past each other.  Much ink was spilled, feuds were born, but everything went right on as it did before: AI was still AI, the social sciences were still the social sciences, and the differences remained stark and deep. (Indeed, the work of people like Agre and Lucy Suchman got taken up more in the computer science sub-field of human-computer interaction (HCI) than in AI proper.)

Could we do better this time?  I don’t know.  I might start by asking the AI researchers to be careful with their use of metaphors and socially meaningful categories.  As the AI researcher Drew McDermott put it in his marvelously titled “Artificial Intelligence Meets Natural Stupidity” article written in the 1970s, some of the feuds over early AI really could have been avoided if the AI researchers had used more technical names for their systems rather than “wishful mnemonics.”

Many instructive examples of wishful mnemonics by AI researchers come to mind once you see the point.  Remember GPS? (Ernst and Newell 1969).  By now, “GPS” is a colorless term denoting a particularly stupid program to solve puzzles.  But it originally meant “General Problem Solver,” which caused everybody a lot of needless excitement and distraction.  It should have been called LFGNS–“Local-Feature-Guided Network Searcher.”

For our part, we may want to collaborate with AI researchers to think about social categories relationally and historically rather than through an essentialist lens.  But successful collaborations require care and at least a sense of the other culture.  First, we may want to keep in mind through our collaborations that there is an inner logic to technical work.  To put it in Agre’s terms, technical work evolves in conversation with its own practical reality and does not necessarily aim at conceptual coherence.  Second, when they do draw on the social sciences, AI researchers tend to look at psychology and economics (and philosophy), rather than, say, sociology, history or anthropology.  (And not surprisingly, it is also in psychology and economics that machine learning has been taken up enthusiastically.  Kosinski, for instance, has a PhD in psychology but seems to describe himself as a “data scientist.”)  This is not a coincidence: computer science, psychology and economics were all transformed by the cognitive revolution and took up, in various ways, the idea of information processing that was central to that revolution.  They are, all of them, in Philip Mirowski’s words, “cyborg sciences” and as such, concepts can travel more easily between them.  So interpretive social scientists have their work cut out.  But even if our effort is doomed to fail, it should be our responsibility to open a dialogue with AI researchers and push for what we might call a non-essentialist understanding of social categories in AI.

Slaves of the Machines

March 20th, 2018

In his book “Slaves of the Machine”, first published in 1997, Gregory J.E. Rawlins takes lay readers on a tour of the sometimes scary world to which computers are leading us. Today, 20 years later, and in a world where Artificial Intelligence (AI) has become a household name, his predictions are more relevant than ever.

Before we dive into the risks we are now facing, let us first start off with defining what Artificial Intelligence is. Stated simply, AI is machines doing things that are considered to require intelligence when humans do them, e.g. understanding natural language, recognizing faces in photos or driving a car. It’s the difference between a mechanical arm on a factory production line programmed to repeat the same basic task over and over again, and an arm that learns through trial and error how to handle different tasks by itself.

There are two risks that are most often brought up in relation to the introduction of Artificial Intelligence into our society and workplace:

  • Robots and further automation risk displacing a large set of existing jobs; and
  • Super-intelligent AI agents risk running amok, creating a so-called AI-mageddon.

In relation to the first risk, a recent research report by McKinsey Global Institute called “Harnessing Automation for a Future that Works” makes this threat quite clear by predicting that 49 percent of time spent on work activities today could be automated with “currently demonstrated technology” either already in the marketplace or being developed in labs. Luckily for us, McKinsey does think this will take a few decades to come to fruition because of other factors such as economics, labor markets, regulations and social attitudes.

As for the second risk, the doomsday thesis has perhaps most famously been described by the Swedish philosopher and Oxford University Professor Nick Bostrom in his book “Superintelligence: Paths, Dangers, Strategies”. The risk Bostrom describes is not that an extremely intelligent agent would misunderstand what humans want it to do and do something else. Instead, the risk is that intensely pursuing the precise (but flawed) goal that the agent is programmed to pursue could pose large risks. An open letter on the website of the Future of Life Institute shows the seriousness of this risk. The letter is signed not just by famous AI outsiders such as Stephen Hawking, Elon Musk, and Nick Bostrom but also by prominent computer scientists (including Demis Hassabis, a top AI researcher at Google).

Compared to the above two risks, less has been written about a potential third one, namely the threat of lost autonomy and fairness for, and potential deception of, workers who are controlled by AI. This arrangement, where machines are the brains and humans are the robots (or slaves), does not exist only in manufacturing and logistics today. It also occurs frequently in new sectors ranging from medical sales to transportation services, where human intervention is still required while AI is desired for productivity and profitability.

Ryan Calo and Alex Rosenblat touch on this dilemma in their paper “The Taking Economy: Uber, Information, And Power”. The paper gives a good picture of the limited autonomy Uber drivers have vis-à-vis the automated Uber AI control system. In order to maximize productivity, the system imposes severe restrictions on the information and choices available to drivers. Drivers are not allowed to know the destination of the next ride before pick-up; heat maps are shown without precise pricing or explanations of how they were created; and drivers are given no chance to opt out of default settings. The AI platform is in control and the information process is concealed to the degree that we cannot review or judge its fairness.

Thankfully, there are increasing efforts in academia (e.g. UC Berkeley – Algorithmic Fairness and Opacity Working Group) and among legislators (see Big Data – Federal Trade Commission) to help demystify AI and the underlying Machine Learning procedures on which it is built. These efforts look to implement:

  • Increased verification and audit requirements to prevent discrimination from creeping into algorithm designs;
  • Traceability and validation of models through defined test setups where both input and output data are well-known;
  • The possibility to override default settings to ensure fairness and control;
  • The introduction of security legislation to prevent unintended manipulation by unauthorized parties.
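
To make the first two items a little more concrete, here is a minimal Python sketch of what a "defined test setup" and a simple fairness audit might look like. Everything in it is an illustrative assumption rather than a description of any real system: the names failed_cases, selection_rates and toy_model are hypothetical, and the toy model and applicant records are made up.

```python
def failed_cases(model, known_cases):
    """Return the known (input, expected output) pairs the model gets wrong."""
    return [(inputs, expected) for inputs, expected in known_cases
            if model(inputs) != expected]


def selection_rates(model, records, group_key):
    """Return the share of positive decisions per group (a basic audit metric)."""
    totals, approved = {}, {}
    for record in records:
        group = record[group_key]
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if model(record) else 0)
    return {group: approved[group] / totals[group] for group in totals}


def toy_model(record):
    """Hypothetical decision rule: approve when income is at least 50."""
    return record["income"] >= 50


# Made-up applicants, used both as test inputs and as audit data.
applicants = [
    {"income": 60, "group": "A"},
    {"income": 70, "group": "A"},
    {"income": 40, "group": "B"},
    {"income": 55, "group": "B"},
]

# Validation: inputs whose correct outputs are known in advance.
known_cases = [(applicants[0], True), (applicants[2], False)]
failures = failed_cases(toy_model, known_cases)

# Audit: compare how often each group receives a positive decision.
rates = selection_rates(toy_model, applicants, "group")
print(failures, rates)
```

A gap in selection rates between groups does not prove discrimination on its own, but it is the kind of signal an audit requirement would ask a company to surface and explain.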

In a world of AI, it is the “free will” that separates humans from machines. It is high time that we exercise this will and define how we want a world with AI to be.

How many of us woke up to the news of a 1.5% dip in the stock market today? This is primarily due to the fallout from Cambridge Analytica’s illicit use of profile data from Facebook. Of course, the illegality, as far as Facebook is concerned, lies in holding data that Cambridge Analytica said it had voluntarily removed from its servers years before. The current fallout for Facebook (down 7% today) is not about the potentially catastrophic end use of that data, should it be proven to have been used in electioneering: Cambridge Analytica is under investigation in the UK for swinging the Brexit vote, as well as in the US for helping elect Trump, who paid handsomely ($6M) to get access to its user-profile-centered analyses.

Admittedly, with #deletefacebook trending up a storm on Twitter (of all places), there is a little bit of schadenfreude aimed at greedy Facebook ad executives baked into that 400 point drop in the Dow, but at its heart is an international call for better regulation of the deeply personal data that is housed and sold by Facebook and other tech giants. In this instance, the data policies that are in the limelight are two of the most problematic for Facebook: third party sharing/housing of data, and using research as a means for data acquisition. The research use of Facebook data is definitely tarnished.

The market volatility and the fact that Facebook actually lost daily users last quarter in the US, some of which was attributable to data privacy concerns from their user base, highlight the need for more secure third party data use policies. Weak policies of this kind are exactly the reason why, even if you delete your profile, the data can live on (indefinitely) on the servers of third party vendors, with no known or feasible recourse for Facebook users to demand its deletion. Facebook’s privacy policy makes this clear, though it takes a difficult read to figure that out.

Facebook’s outsized market value is based in great part on its ability to aggregate its users’ personal data and freely sell it as desired. The European Union’s upcoming May 25th deadline to implement the General Data Protection Regulation is likely to help move the needle towards more control of data deletion and usage by third parties in Europe, and it is exactly the specter of potentially further-reaching regulation of data usage that dragged down the market today and will ultimately lower Facebook’s value if more regulation comes about. The big question is whether Facebook and other large data acquiring companies will be able to balance their voracious profit motive and inherent need to sell our data with the ability to help protect our privacy, and/or whether heavy handed government tactics can achieve that second goal for them.

Seeing Through the Fog

March 19th, 2018

Welcome to the AFOG Blog! We will use this space to post what we hope are accessible and provocative think pieces and reactions to academic research and news stories. Posts about what? Allow us to use this initial blog post to answer that question and introduce ourselves.

Algorithms and computational tools/systems, particularly as applied to artificial intelligence and machine learning, are increasingly being used by firms and governments in domains of socially consequential classification and decision-making. But their construction, application, and consequences are raising new concerns over issues of fairness, bias, transparency, interpretability, and accountability. The development of approaches or solutions to address these challenges is still nascent. And these challenges require attention from more than just technologists and engineers, as they are playing out in domains of longstanding interest to social scientists and scholars of media, law, and policy, including social equality, civil rights, labor and automation, and the evolution of the news media.

In the fall of 2017, Professors Jenna Burrell and Deirdre Mulligan at the UC Berkeley School of Information began the Algorithmic Fairness and Opacity Group (AFOG), a working group that brings together UC Berkeley faculty, postdocs, and graduate students to develop new ideas, research directions, and policy recommendations around these topics. We take an interdisciplinary approach to our research, with members based at a variety of schools and departments across campus. These include UC Berkeley’s School of Information, Boalt Hall School of Law, Haas School of Business, the Goldman School of Public Policy, the departments of Electrical Engineering and Computer Sciences (EECS) and Sociology, the Berkeley Institute for Data Science (BIDS), the Center for Science, Technology, Medicine & Society (CSTMS), and the Center for Technology, Society & Policy (CTSP).

We meet roughly biweekly at the School of Information for informal discussions, presentations, and workshops. We also host a speaker series that brings experts from academia and the technology industry to campus to give public talks and take part in interdisciplinary conversations. AFOG is supported by UC Berkeley’s School of Information and a grant from Google Trust and Safety.

Below is a sampling of some of the questions that we seek to address:

  • How do trends in data-collection and algorithmic classification relate to the restructuring of life chances, opportunities, and ultimately the social mobility of individuals and groups in society?
  • How does an algorithmically informed mass media and social media shape the stability of our democracy?
  • How can we design user interfaces for machine-learning systems that will support user understanding, empowered decision-making, and human autonomy?
  • What tools and techniques are emerging that offer ways to mitigate transparency and/or fairness problems?
  • Which methods are best suited to particular domains of application?
  • How can we identify and transcend differences across disciplines in order to make progress on issues of algorithmic opacity and fairness?

Look for more from us on the AFOG Blog in the weeks and months to come!

2017 was a bad year for Uber. If you’re reading this, you probably don’t need me to tell you why. What you might not have seen, though, is how Uber used data science experiments to manipulate drivers.  In this New York Times article, Noam Scheiber discusses how Uber uses the results of data-driven experiments to influence drivers, including ways to get drivers to stay on the app and work longer, as well as to exhibit certain behaviors (e.g. drive in certain neighborhoods at certain times).

In light of Uber’s widespread bad behavior, it’s been brought up several times that maybe we should have seen this coming.  After all, this is a company that has flown in the face of laws and regulations with premeditation and exuberance, operating successfully in cities where by rule their model isn’t allowed.  Given this, the question I’ll pursue here is what we should make of Airbnb, a company whose growth to unicorn status has been fueled by a similarly brazen disregard for local laws, pushing into cities where hosts often break the law (or at least the tax code) by listing their homes.

In particular, I’d like to take a look at how Airbnb affects how their hosts price their listings. Why? Well, this is where Airbnb has invested a lot of their data science resources (from what’s known publicly) and it’s one of the key levers where they can influence hosts.  The genesis of their pricing model came in 2012, when Airbnb realized they had a problem. In a study, they found that many potential new hosts were going through the entire signup process, just to leave when prompted to price their listing. People didn’t know how much their listing was worth, or didn’t want to put in the work to find out.  So, Airbnb built hosts a tool that would offer pricing “tips”. The inference from Airbnb’s blog posts covering their pricing model is that this addressed the problem, as users happily rely on their tips – though they are careful to point out, repeatedly, that users are free to price at whatever they want.

As someone who is looking at this with the agenda of flagging any potential areas of concern, this caught my attention.  The inference I took from reading several accounts of their pricing model is that Airbnb believes users lean heavily (or blindly) on their pricing suggestions. I’d buy that. And that’s concerning because we don’t really know how their model works.  Yes, we know that it’s a machine learning classifier model that extracts features from a listing and incorporates dynamic market features (season, events, etc.) to predict the value of the listing.  In their postings about their model, they list features it uses, and many make sense.  Wifi, private bathrooms, neighborhood, music festivals: all of these are things we’d expect. And others like “stage of growth for Airbnb” and “factors in demand” seem innocuous at first pass. But wait, what do those really mean?

One of the underlying problems present in Scheiber’s Uber article was that, fundamentally, Uber’s and its drivers’ agendas were at odds. And while I wouldn’t say the relationship between Airbnb and their hosts is nearly as fraught as that between Uber and its drivers, their agendas might not be 100% aligned. For hosts, the agenda is pretty simple: on any given listing, they’re trying to make as much money as possible. But for Airbnb, there’s way more at play. They’re trying to grow, and establish themselves as a reliable, go-to source for short-term housing rentals. They’re competing with the hotel industry as a whole, trying to establish themselves in new markets, and trying to change legislation the world over.  Any of these could be a reason why they might include features in their pricing tips model that do not lead it to price listings at the maximum potential value.

The potential problem here is that while Airbnb likes to share their data science accomplishments, and even open source tools, they aren’t fully transparent with users and hosts about what factors go into some of the algorithms that affect user decisions. While it would be impossible to share every feature and its associated weights, it is entirely possible for them to inform users if their model takes into account factors whose intent is not to maximize user revenue.
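
To illustrate the kind of disclosure I have in mind, here is a purely hypothetical Python sketch. It is not Airbnb’s model: the function price_tip, the parameter growth_discount and all of the numbers are my own assumptions, chosen only to show how a platform-side factor could pull a tip below the revenue-maximizing price, and how cheaply that fact could be flagged to the host.

```python
def price_tip(base_value, demand_multiplier, growth_discount=0.0):
    """Return a suggested nightly price plus a disclosure flag.

    growth_discount is a hypothetical platform-side factor (for example,
    a push to win bookings in a new market) that lowers the tip below
    what the listing features and demand alone would suggest.
    """
    if not 0.0 <= growth_discount < 1.0:
        raise ValueError("growth_discount must be in [0, 1)")
    host_optimal = base_value * demand_multiplier
    suggested = host_optimal * (1.0 - growth_discount)
    return {
        "suggested_price": round(suggested, 2),
        # True when something other than host revenue shaped the tip.
        "non_revenue_factors_applied": growth_discount > 0.0,
    }


# Same made-up listing, with and without the platform-side factor.
plain_tip = price_tip(100.0, 1.2)
growth_tip = price_tip(100.0, 1.2, growth_discount=0.1)
print(plain_tip, growth_tip)
```

The host does not need to see the weights. A single flag saying that the tip reflects goals other than their own revenue would already change how much they should lean on it.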

Clearly, this is all speculative, as I can’t with any certainty say what is behind the curtain of Airbnb’s pricing model. In writing this, I’m merely hoping to bring attention to an interaction that is vulnerable to manipulation.

Filter Bubbles

March 13th, 2018

During our last live session, we discussed in detail the concept of filter bubbles: the condition in which we isolate ourselves inside an environment where everyone around us agrees with our points of view. It is being said a lot lately, not just during our live session, that these filter bubbles are exacerbated by the business models and algorithms that power most of the internet. For example, Facebook runs on algorithms that aim to show users the information that Facebook thinks they will be most interested in, based on what the platform knows about them. So if you are on Facebook and like an article from a given source, chances are you will continue to see more articles from that and other similar sources constantly showing up on your feed, and you will probably not see articles from other publications that are far away on the ideological spectrum. The same thing happens with Google News and Search, Instagram feeds, Twitter feeds, etc. The information that you see flowing through is based on the profile that these platforms have built around you, and they present the information that they think best fits that profile.
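
The mechanism described above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how Facebook or any other platform actually ranks content: the function rank_feed, the source names and the "count of past likes" scoring rule are all made up for the example.

```python
from collections import Counter


def rank_feed(articles, liked_sources):
    """Order candidate articles by how often the user liked their source."""
    affinity = Counter(liked_sources)  # sources never liked count as 0
    # sorted() is stable, so articles with equal affinity keep their order.
    return sorted(articles, key=lambda a: affinity[a["source"]], reverse=True)


# Made-up candidate articles and like history.
articles = [
    {"title": "Tax plan explained", "source": "Left Daily"},
    {"title": "Tax plan critiqued", "source": "Right Weekly"},
    {"title": "Local election recap", "source": "Left Daily"},
    {"title": "Market outlook", "source": "Center Post"},
]
liked = ["Left Daily", "Left Daily", "Center Post"]

ranked = rank_feed(articles, liked)
top_two = [a["source"] for a in ranked[:2]]
print(top_two)
```

With only two slots in the feed, both go to the source the user already likes, and the article from the opposing publication sinks to the bottom. Every new like on those top stories strengthens the same ranking next time, which is the feedback loop behind the bubble.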

Filter bubbles are highlighted as big contributors to the unexpected outcomes of some major political events around the world during 2016 such as the UK vote to exit the European Union as well as the result of the US presidential election in favor of Donald Trump. The idea is that in a politically divided society, filter bubbles make it even harder for groups to find common ground, compromise, and work towards a common goal. Another reason filter bubbles are seen as highly influential in collective decision making is that people tend to trust other individuals in their own circles much more than “impartial” third parties. For example, a person would much rather believe what his or her neighbor is posting on the Facebook wall over what the article in a major national newspaper is reporting on, if the two ideas are opposed to each other, even if the newspaper is a longstanding and reputable news outlet.

This last effect is, to me, the most detrimental aspect of internet-based filter bubbles, because it lends itself to easy exploitation and abuse. With out-of-the-box functionality, these platforms allow trolls and malicious agents to easily identify and join like-minded cohorts and present misleading and false information while pretending to be just another member of the trusted group. This type of exploitation is currently being exposed and documented, for example, as part of the ongoing investigation into Russian meddling in the 2016 US Presidential election. But I believe that the most unsettling aspect of this is not the false information itself; it is the fact that the tools being used to disseminate it are not backdoor hacking or sophisticated algorithms. It is being done using the very core and key functionality of the platforms, which is the ability of third party advertisers to identify specific groups in order to influence them with targeted messages. That is the core business model and selling point of all of these large internet companies and I believe it is fundamentally flawed.

So can we fix it? Do we need to pop the filter bubbles and reach across the aisle? That would certainly be helpful, but very difficult to implement. Filter bubbles have always been around. I remember in my early childhood, in the small town where I grew up, pretty much everyone around me believed somewhat similar things. We all shared relatively similar ideas, values, and world views. That is natural human behavior. We thrive in tribes. But because we all knew each other, it was also very difficult for external agents to use that close knit community to disguise false information and propaganda. So my recommendation to these big internet companies would not necessarily be to show views and articles across a wider range of ideas. That’d be nice. But most importantly, I would ask them to ensure that the information shared by their advertisers and the profiles they surface on users’ feeds are properly vetted. Put truth before bottom lines.