Between Free Speech and the Right to Privacy – Data Science W231

In a W231 group project that I recently worked on, we attempted to combine Transparent California, an open database of California’s employees’ salaries and pension data, with social media information to assemble detailed profiles of California’s employees. While we were able to do so on a individual employees, one by one, limitations posed by large social networks on scraping by third party tools prevented us from doing this at scale. This may soon change.

In a recent ruling, the Federal US District Court for the Northern District of California concluded that the giant social network cannot prevent HiQ Labs from accessing and using its data. The startup helps HR professionals fight attrition by scraping LinkedIn data and deploying machine learning algorithms to predict employees’ flight risk.

This was a fascinating case of the sometimes-inevitable clash between the public’s right to access to information, and the individuals’ right to privacy. On one hand – most would agree that liberating our data from tech giants such as LinkedIn, now owned by Microsoft, is a positive outcome; on the other – allowing universal access to it doesn’t come without risks.

Free speech advocates praise this ruling as potentially signaling a new direction in US courts’ approach to social networks’ control over data their users shared publically. This new approach was exemplified only a month earlier by another ruling by the US supreme court, where usage of social media was described as “speaking and listening in the modern public square”. If social media indeed is a modern public square, there should be little debate on whether information posted there can be used by anyone, for any reason.

There are, however, disadvantages to this increased access to publically posted data. The first of which, as my group project discussed, is that by combining multiple publically available data sets one can violate users’ privacy. And, practically adding the vast sea of information held by social networks to these open data sets, enables an ever-increased violation of privacy. It may be claimed that if users themselves post information online, companies who use it do nothing wrong. However, it is unclear whether users who post information to their public LinkedIn profiles intend for it to be scraped, analyzed and sold by other services. It is much more likely that most users expect this information to be viewed by other individual users.

Finally, as argued by LinkedIn, the right to privacy covers not only the data itself but also changes to it. When changing their profiles, social media users mostly do not wish to broadcast the change publically, but to display it to friends or connections who visit their profiles. LinkedIn even allows users to choose whether to make changes private, and post them only to the user’s profile, or public and let them appear on others’ news feed. Allowing third parties to scrape information revokes this right from users – algorithms such as HiQ’s scrape profiles, pick up changes, and sell them.

LinkedIn already appealed the court’s decision, and it will likely be a while before information on social media will be treated literally as posted on the public square. Courts will be required to choose, again, between the right to privacy and the right to access to information in this case. But regardless of what the decision will be, this is yet another warning sign reminding us, again, to be thoughtful of what information we share online – it will likely reach more eyes, and be used in other ways than we originally intended.

Leave a Reply Cancel reply