Archive for November 20th, 2017

In a W231 group project that I recently worked on, we attempted to combine Transparent California, an open database of California employees’ salary and pension data, with social media information to assemble detailed profiles of California’s employees. While we were able to do so for individual employees, one by one, the limits that large social networks place on scraping by third-party tools prevented us from doing this at scale. This may soon change.

In a recent ruling, the US District Court for the Northern District of California concluded that LinkedIn, the giant social network, cannot prevent HiQ Labs from accessing and using its data. HiQ, a startup, helps HR professionals fight attrition by scraping LinkedIn data and deploying machine learning algorithms to predict employees’ flight risk.


This is a fascinating case of the sometimes inevitable clash between the public’s right to access information and individuals’ right to privacy. On one hand, most would agree that liberating our data from tech giants such as LinkedIn, now owned by Microsoft, is a positive outcome; on the other, allowing universal access to it doesn’t come without risks.


Free speech advocates praise this ruling as potentially signaling a new direction in US courts’ approach to social networks’ control over data their users have shared publicly. This new approach was exemplified only a month earlier by another ruling, this one by the US Supreme Court, in which the use of social media was described as “speaking and listening in the modern public square”. If social media is indeed a modern public square, there should be little debate about whether information posted there can be used by anyone, for any reason.


There are, however, disadvantages to this increased access to publicly posted data. The first, as my group project discussed, is that combining multiple publicly available data sets can violate users’ privacy. Adding the vast sea of information held by social networks to these open data sets enables ever greater violations of privacy. It may be claimed that if users themselves post information online, companies that use it do nothing wrong. However, it is unclear whether users who post information to their public LinkedIn profiles intend for it to be scraped, analyzed and sold by other services. It is much more likely that most users expect this information to be viewed by other individual users.


Finally, as argued by LinkedIn, the right to privacy covers not only the data itself but also changes to it. When changing their profiles, social media users mostly do not wish to broadcast the change publicly, but to display it to friends or connections who visit their profiles. LinkedIn even allows users to choose whether to keep a change private, so that it appears only on the user’s own profile, or to make it public, so that it appears in others’ news feeds. Allowing third parties to scrape information takes this choice away from users: algorithms such as HiQ’s scrape profiles, pick up changes, and sell them.


LinkedIn has already appealed the court’s decision, and it will likely be a while before information on social media is treated literally as if it were posted in the public square. Courts will be required to choose, again, between the right to privacy and the right to access information in this case. But regardless of what the decision turns out to be, this is yet another warning sign reminding us to be thoughtful about what information we share online. It will likely reach more eyes, and be used in more ways, than we originally intended.

Algorithm is not a magic word

November 20th, 2017

People may throw the word “algorithm” around to justify that their decisions are good and unbiased, and it sounds technical enough that you might trust them. Today I want to demystify this concept for you. Fair warning: it may be a bit disillusioning.

What is an algorithm? You’ll be happy to know that it’s not even that technical. An algorithm is just a set of predetermined steps that are followed to reach some final conclusion. Often we talk about algorithms in the context of computer programs, but you probably have several algorithms that you use in your daily life without even realizing it. You may have an algorithm that you follow for clearing out your email inbox, for unloading your dishwasher, or for making a cup of tea.

Potential Inbox Clearing Algorithm

  • Until inbox is empty:
    • Look at the first message:
      • If it’s spam, delete it.
      • If it’s a task you need to complete:
        • Add it to your to-do list.
        • File the email into a folder.
      • If it’s an event you will attend:
        • Add it to your calendar.
        • File the email into a folder.
      • If it’s personal correspondence you need to reply to:
        • Reply.
        • File the email into a folder.
      • For all other messages:
        • File the email into a folder.
    • Repeat.
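For readers who like to see things in code, the steps above can be sketched in a few lines of Python. This is only an illustration: the `Message` and `Workspace` types, the `kind` labels, and the function name `clear_inbox` are all made up for this example, and "replying" is reduced to recording which message would get a reply.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    subject: str
    kind: str  # hypothetical labels: "spam", "task", "event", "personal", or anything else


@dataclass
class Workspace:
    # Hypothetical stand-ins for a to-do list, a calendar, sent replies, and a folder.
    todo: list = field(default_factory=list)
    calendar: list = field(default_factory=list)
    replies: list = field(default_factory=list)
    folder: list = field(default_factory=list)


def clear_inbox(inbox, ws):
    """Follow the predetermined steps until the inbox is empty."""
    while inbox:                    # "Until inbox is empty"
        msg = inbox.pop(0)          # "Look at the first message"
        if msg.kind == "spam":
            continue                # deleted: it is simply not kept anywhere
        if msg.kind == "task":
            ws.todo.append(msg.subject)
        elif msg.kind == "event":
            ws.calendar.append(msg.subject)
        elif msg.kind == "personal":
            ws.replies.append(msg.subject)   # placeholder for actually replying
        ws.folder.append(msg.subject)        # every non-spam message is filed
    return ws


inbox = [
    Message("You won a prize", "spam"),
    Message("Submit expense report", "task"),
    Message("Team lunch on Friday", "event"),
    Message("Hello from an old friend", "personal"),
    Message("Monthly newsletter", "other"),
]
result = clear_inbox(inbox, Workspace())
```

Notice that every branch in the sketch is a decision a person made in advance, such as which labels exist and what happens to each one. The computer only follows them.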

Computer programs use algorithms too, and they work in a very similar way. Since the steps are predetermined and the computer appears to make all the decisions and spit out the conclusion at the end, some people may think that algorithms are completely unbiased. But if you look more closely, you’ll notice that interesting algorithms can have pre-made decisions built into the steps. In that case, the computer isn’t making the decision at all; it’s just executing a decision that a person made. Consider this (overly simplified) algorithm that could decide when to approve credit for an individual requesting a loan:

Algorithm for Responding to Loan Request

  • Check requestor’s credit score.
  • If credit score is above 750, approve credit.
  • Otherwise, deny credit.
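Written as a short Python sketch, the same three steps might look like the following. The function name `respond_to_loan_request` and the `threshold` parameter are invented for this illustration, and "above 750" is read as strictly greater than 750.

```python
def respond_to_loan_request(credit_score, threshold=750):
    """Approve credit only when the score is above the threshold.

    The default of 750 comes from the example above; it is a number
    that a person picked, not something the computer worked out.
    """
    if credit_score > threshold:
        return "approve"
    return "deny"


decision_high = respond_to_loan_request(780)
decision_edge = respond_to_loan_request(750)
decision_low = respond_to_loan_request(700)
```

The whole "decision" lives in one comparison against one number, which is exactly the kind of pre-made choice worth questioning.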

This may seem completely unbiased because you are trusting the computer to act on the data provided (credit score) and you are not allowing any other external factors such as age, race, gender, or sexual orientation to influence the decision. In reality, though, a human being had to decide on the appropriate threshold for the credit score. Why did they choose 750? Why not 700, 650, or 642? They also had to choose to base their decision solely on credit score, but could there be other factors that the credit score subtly reflects, such as age or duration of time spent in America? (Hint: yes.) With more complicated algorithms, there are many more decisions about what thresholds to use and what information is worth considering, which brings more potential for bias to creep in, even if it’s unintentional.
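To see how much rides on that one human choice, here is a small Python sketch that runs the same rule with each of the thresholds mentioned above. The list of credit scores is entirely made up for illustration, and the helper name `count_approvals` is hypothetical.

```python
def count_approvals(scores, threshold):
    """Count how many applicants have a score above the threshold."""
    return sum(1 for score in scores if score > threshold)


# Made-up scores for seven imaginary applicants (not real data).
scores = [640, 645, 660, 705, 720, 760, 790]

# Same applicants, same rule; only the human-chosen threshold changes.
approvals = {t: count_approvals(scores, t) for t in (750, 700, 650, 642)}
```

With these invented scores, the number of approvals ranges from two applicants at a threshold of 750 to six at 642. Nothing about the applicants changed; only the number someone chose did.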

Algorithms are useful because they can help humans use additional data and resources to make more informed decisions in a shorter amount of time, but they’re not perfect or inherently fair. Algorithms are subject to the same biases and prejudices that humans are, simply because 1) a human designed the steps in the algorithm, and 2) the data that is fed into the algorithm is generated in the context of our human society, including all of its inherent biases.

These inherent biases built into algorithms can manifest in dark ways, with considerable negative impacts on individuals. If you’re interested in some examples, take a look at this story about Somali markets in Seattle that were prohibited from accepting food stamps, or this story about how facial recognition software used in criminal investigations can lead to a disproportionate targeting of minority individuals.

In the future, when you see claims that an important decision was based on some algorithm, I hope you will hold the algorithm to the same standards that you would any human decision-maker. We should continue to question the motivations behind the decision, the information that was considered, and the impact of the results.

For further reading: