The Blurry Line of Wearables

Imagine a world where your heart rate, sleep patterns, and other biometrics are measured and monitored by your employer on a daily basis. They can decide when and how much you work based on these metrics. They can decide how much to pay you, or even fire you, based on this information. This is the world that many athletes now live in. Wearable biometric trackers are becoming the norm in sports across the world. The NBA has begun adopting them to aid recovery and monitor training regimens. MLB is using them to monitor pitchers’ arms. Some European sports teams even display this information live on the jumbotron in the arena.

Currently in the NBA, the collective bargaining agreement between the players’ union and the league states that “data collected from a Wearable worn at the request of a team may be used for player health and performance purposes and Team on-court tactical and strategic purposes only. The data may not be considered, used, discussed or referenced for any other purpose such as in negotiations regarding a future Player Contract or other Player Contract transaction.” The line between using biometric data for strategic or health purposes and using it for contract talks or roster decisions can blur quickly. What begins as simple monitoring of heart rate in practice can end with a team cutting a player for being out of shape or not trying hard enough. A pitcher rested by a team because it saw his arm tiring can receive less money the next time he negotiates a contract, since he played in fewer games. If a team sees that a player isn’t sleeping as much as he should, it can offer less money, claim the player is going out partying too much, and question his character. These are all possible ethical and legal violations of a player’s rights. Being able to monitor a player during a game is one thing; being able to monitor his entire life, control what he can or cannot do, and tie that to his paycheck is another.

On top of the ethical and legal issues that wearables raise, there are also significant privacy risks. These players are in the public eye every day, and leaks happen within organizations. The more sensitive medical information a team collects, the greater the risk of HIPAA violations. Imagine a player being traded or cut for what seems, to the public, like an inexplicable reason. People might start questioning whether the player has medical issues. Fans love to dissect every part of a player’s life, and teams love to post as much information as possible online for their fans. We’ve seen the risks of open public data sets and how even anonymized information can be de-anonymized and tied back to an individual. With the vast amount of information in the world about these players, this is a real risk.

All of these issues come as wearable technology becomes commonplace, but there are other risks on the horizon. We’re beginning to see embeddables, such as a pill you swallow or a device implanted under the skin, being developed, and it will only be a matter of time before professional sports start taking advantage of them. These issues are not faced only by professional athletes. Think about your own job. Do you wear a badge at work? Can your employer track where you are at all times and analyze your performance based on that? What if, in the future, DNA testing is required? As more technology is developed and more data is collected, we need to be aware of the possible issues that come with it and ask ourselves these questions.

Between Free Speech and the Right to Privacy

In a W231 group project that I recently worked on, we attempted to combine Transparent California, an open database of California employees’ salary and pension data, with social media information to assemble detailed profiles of California’s employees. While we were able to do so for individual employees, one by one, the limitations that large social networks place on scraping by third-party tools prevented us from doing this at scale. This may soon change.
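As a rough illustration, record linkage at its simplest is just a join on a shared key. Below is a minimal Python sketch of the approach (the file names and column names are hypothetical, and real matching would need fuzzy logic to handle name variants and the fact that many people share a name):

import pandas as pd

# Hypothetical file and column names, for illustration only.
salaries = pd.read_csv("transparent_california_2016.csv")  # columns: name, agency, total_pay
profiles = pd.read_csv("scraped_public_profiles.csv")      # columns: name, employer, headline

# Normalize names so trivially different spellings still match.
for df in (salaries, profiles):
    df["name_key"] = df["name"].str.lower().str.strip()

# A naive exact-match join; real linkage needs fuzzy matching and disambiguation.
linked = salaries.merge(profiles, on="name_key", how="inner")
print(linked[["name_x", "agency", "total_pay", "headline"]].head())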

In a recent ruling, the US District Court for the Northern District of California concluded that the giant social network cannot prevent HiQ Labs from accessing and using its data. The startup helps HR professionals fight attrition by scraping LinkedIn data and deploying machine learning algorithms to predict employees’ flight risk.


This was a fascinating case of the sometimes-inevitable clash between the public’s right of access to information and individuals’ right to privacy. On one hand, most would agree that liberating our data from tech giants such as LinkedIn, now owned by Microsoft, is a positive outcome; on the other, allowing universal access to it doesn’t come without risks.


Free speech advocates praise this ruling as potentially signaling a new direction in US courts’ approach to social networks’ control over data their users share publicly. This new approach was exemplified only a month earlier by another ruling, by the US Supreme Court, which described the use of social media as “speaking and listening in the modern public square”. If social media is indeed a modern public square, there should be little debate over whether information posted there can be used by anyone, for any reason.


There are, however, disadvantages to this increased access to publicly posted data. The first, as my group project showed, is that by combining multiple publicly available data sets, one can violate users’ privacy. Adding the vast sea of information held by social networks to these open data sets enables ever-greater violations of privacy. It may be claimed that if users themselves post information online, companies who use it do nothing wrong. However, it is unclear whether users who post information to their public LinkedIn profiles intend for it to be scraped, analyzed, and sold by other services. It is much more likely that most users expect this information to be viewed by other individual users.


Finally, as argued by LinkedIn, the right to privacy covers not only the data itself but also changes to it. When changing their profiles, social media users mostly do not wish to broadcast the change publicly, but to display it to friends or connections who visit their profiles. LinkedIn even allows users to choose whether to make changes private, posting them only to the user’s profile, or public, letting them appear in others’ news feeds. Allowing third parties to scrape information revokes this right from users – algorithms such as HiQ’s scrape profiles, pick up changes, and sell them.
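To make concrete what “picking up changes” means in practice, here is a minimal Python sketch of how a scraper might diff two snapshots of the same profile; the field names and values are hypothetical:

# Two snapshots of the same (hypothetical) profile, scraped a week apart.
old_profile = {"title": "Data Analyst", "employer": "Acme Corp", "location": "SF"}
new_profile = {"title": "Senior Data Analyst", "employer": "Acme Corp", "location": "SF"}

def diff_profiles(old, new):
    """Return the fields that changed between two scrapes."""
    return {field: (old.get(field), new.get(field))
            for field in set(old) | set(new)
            if old.get(field) != new.get(field)}

print(diff_profiles(old_profile, new_profile))
# {'title': ('Data Analyst', 'Senior Data Analyst')} -- a possible flight-risk signal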


LinkedIn has already appealed the court’s decision, and it will likely be a while before information on social media is treated, literally, as if posted in the public square. Courts will be required to choose, again, between the right to privacy and the right of access to information. But regardless of what the decision will be, this is yet another warning sign reminding us to be thoughtful about what information we share online – it will likely reach more eyes, and be used in more ways, than we originally intended.

Algorithm is not a magic word

People may throw the word “algorithm” around to claim that their decisions are good and unbiased, and it sounds technical enough that you might trust them. Today I want to demystify this concept for you. Fair warning: it may be a bit disillusioning.

What is an algorithm? You’ll be happy to know that it’s not even that technical. An algorithm is just a set of predetermined steps that are followed to reach some final conclusion. Often we talk about algorithms in the context of computer programs, but you probably have several algorithms that you use in your daily life without even realizing it. You may have an algorithm that you follow for clearing out your email inbox, for unloading your dishwasher, or for making a cup of tea.

Potential Inbox Clearing Algorithm

  • Until inbox is empty:
    • Look at the first message:
      • If it’s spam, delete it.
      • If it’s a task you need to complete:
        • Add it to your to-do list.
        • File the email into a folder.
      • If it’s an event you will attend:
        • Add it to your calendar.
        • File the email into a folder.
      • If it’s personal correspondence you need to reply to:
        • Reply.
        • File the email into a folder.
      • For all other messages:
        • File the email into a folder.
    • Repeat.
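Rendered as a computer program, the same steps might look like this minimal Python sketch (the message format and the to-do, calendar, and archive lists are simplified stand-ins):

def clear_inbox(inbox):
    """Follow the predetermined steps until the inbox is empty."""
    while inbox:
        message = inbox.pop(0)               # look at the first message
        if message["kind"] == "spam":
            continue                         # delete it
        if message["kind"] == "task":
            todo_list.append(message["subject"])
        elif message["kind"] == "event":
            calendar.append(message["subject"])
        elif message["kind"] == "personal":
            print("Replying to:", message["subject"])
        archive.append(message)              # file the email into a folder

todo_list, calendar, archive = [], [], []
clear_inbox([{"kind": "spam", "subject": "You won!"},
             {"kind": "task", "subject": "Submit report"}])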

Computer programs use algorithms too, and they work in a very similar way. Since the steps are predetermined and the computer is making all the decisions and spitting out the conclusion in the end, some people may think that algorithms are completely unbiased. But if you look more closely, you’ll notice that interesting algorithms can have pre-made decisions built into the steps. In this case, the computer isn’t making the decision at all; it’s just executing the decision that a person made. Consider this (overly simplified) algorithm that could decide when to approve credit for an individual requesting a loan:

Algorithm for Responding to Loan Request

  • Check requestor’s credit score.
  • If credit score is above 750, approve credit.
  • Otherwise, deny credit.
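In code, this entire “algorithm” is only a few lines, which makes the human choices easy to spot. A minimal Python sketch, with the threshold taken straight from the steps above:

def respond_to_loan_request(credit_score):
    # A human chose this threshold; the computer merely applies it.
    APPROVAL_THRESHOLD = 750
    if credit_score > APPROVAL_THRESHOLD:
        return "approve credit"
    return "deny credit"

print(respond_to_loan_request(751))  # approve credit
print(respond_to_loan_request(749))  # deny credit

Nothing in these lines was decided by the computer; every branch, and the 750 itself, was chosen by a person.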

This may seem completely unbiased because you are trusting the computer to act on the data provided (credit score) and you are not allowing any other external factors, such as age, race, gender, or sexual orientation, to influence the decision. In reality, though, a human being had to decide on the appropriate threshold for the credit score. Why did they choose 750? Why not 700, 650, or 642? They also had to choose to base their decision solely on credit score, but could there be other factors that the credit score subtly reflects, such as age or duration of time spent in America? (Hint: yes.) With more complicated algorithms, there are many more decisions about what thresholds to use and what information is worth considering, which brings more potential for bias to creep in, even if it’s unintentional.

Algorithms are useful because they can help humans use additional data and resources to make more informed decisions in a shorter amount of time, but they’re not perfect or inherently fair. Algorithms are subject to the same biases and prejudices that humans are, simply from the fact that 1) a human designed the steps in an algorithm, and 2) the data that is fed into the algorithm is generated in the context of our human society, including all of its inherent biases.

These inherent biases built into algorithms can manifest in dark ways, with considerable negative impacts on individuals. If you’re interested in some examples, take a look at this story about Somali markets in Seattle that were prohibited from accepting food stamps, or this story about how facial recognition software used in criminal investigations can lead to disproportionate targeting of minority individuals.

In the future, when you see claims that an important decision was based on some algorithm, I hope you will hold the algorithm to the same standards that you would any other human decision-maker. We should continue to question the motivations behind the decision, the information that was considered, and the impact of the results.

For further reading:

https://fivethirtyeight.com/features/technology-is-biased-too-how-do-we-fix-it/  

http://culturedigitally.org/2012/11/the-relevance-of-algorithms/

https://link.springer.com/article/10.1007/s10676-010-9233-7

Open Data Challenges and Opportunities

The Open Data movement has won the day, and governments around the world – as well as scientific researchers, non-profits, and even private companies – are embracing data sharing. Open data has huge benefits: encouraging replicability of scientific research, fostering community and non-profit engagement with government data, providing accountability, and aiding businesses (especially new businesses which may not yet have detailed customer data). Open Data has already scored several key wins in areas from criminal justice reform to improved diagnostics. On the criminal justice side, open data led to the discovery that “stop and frisk” policies in New York were both ineffective and heavily biased against minorities, a discovery that was instrumental in the successful community effort to end the policy[1]. In health care, promising innovations in using deep learning to diagnose cancer are being assisted by publicly released data sets of labeled MRI and CAT scan images. These efforts demonstrate how Open Data can democratize the benefits of government data and publicly funded research, and in this positive context it is easy to see Open Data as a panacea. However, several key concerns remain around privacy, accessibility, and ethics.

The most prominent concern about Open Data revolves around the privacy risks that arise from the public release of data – especially the release of data that users might not have realized was private. Balancing privacy with providing the granularity of data needed for more sophisticated analysis is an ongoing concern, although a shared set of policies and practices is increasingly being developed around privacy protection[2]. But despite these advances in policy and practice, key privacy concerns remain, both in general and in specific instances in which clear privacy harms have been caused. For example, New York City released taxi trip data with license numbers hashed, which led to two distinct privacy harms[3]. First, the data was easily de-anonymized by a civic hacker, creating privacy concerns for the taxi drivers. Second, several data scientists demonstrated how they identified one particular individual as a frequent customer of a gentleman’s club (which, clearly, is potentially very publicly embarrassing)[4]. This case demonstrates how specific GPS data and a little clever analytics can easily de-anonymize particular users and expose them to substantial privacy risk. Differential privacy – seeking to provide accurate results at an aggregate level without allowing any individual to be identified – should be applied to avoid these situations, but advances in record linkage make this even more difficult, as researchers have to consider not only the privacy risks in their own data but also how it could be combined with other data in privacy-damaging ways. These high-profile failures demonstrate that there is still progress to be made in privacy protection.
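To see why hashing alone failed, consider that the space of valid license numbers is small enough to enumerate. The following minimal Python sketch assumes, purely for illustration, that IDs are one uppercase letter followed by three digits; the real medallion formats are different but similarly small:

import hashlib
import string

# Enumerate the entire (small) ID space and precompute each hash.
lookup = {}
for letter in string.ascii_uppercase:
    for n in range(1000):
        plain = f"{letter}{n:03d}"  # e.g. "B742"
        lookup[hashlib.md5(plain.encode()).hexdigest()] = plain

# A "de-identified" record from the released data set.
hashed_id = hashlib.md5(b"B742").hexdigest()

# Reversing the hash is a single dictionary lookup.
print(lookup[hashed_id])  # -> B742

With only 26,000 possibilities, building the full lookup table takes a fraction of a second, which is why hashing a small identifier space provides essentially no anonymity.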

In addition to these concerns about privacy, there are several other aspects of open data that merit further consideration. The first is access to open data – with the increasing size and complexity of data, just providing access to the data itself may not be enough. Several companies have built their business models on processing and serving open data[5], which illustrates that open data does not mean easily accessible data. If the public is funding data collection, some argue, it is not enough to provide the data; more effort needs to be put into making this data truly accessible to the public and to smaller firms. Second, concerns remain around the opacity of the ethics behind publicly accessible data. While the data itself – and often the code that produces results – is publicly accessible, the ethical decision-making around study design, eligibility, and protections for subjects often is not[6]. The lack of accessibility of Institutional Review Board reports and ethical evaluations impedes the public’s ability to make informed judgments about public data sets.

Overall, Open Data has great potential to improve accountability and foster innovative solutions to social issues, but still requires work in order to balance privacy and ethics with openness.

[1] https://sunlightfoundation.com/2015/05/01/the-benefits-of-criminal-justice-data-beyond-policing/

[2] http://reports.opendataenterprise.org/BriefingPaperonOpenDataandPrivacy.pdf

[3] ibid

[4] https://research.neustar.biz/author/atockar/

[5] http://www.computerweekly.com/opinion/The-problem-with-Open-Data

[6] https://www.forbes.com/sites/kalevleetaru/2017/07/20/should-open-access-and-open-data-come-with-open-ethics/#1b0bc7565426

Politicizing Data: The NOAA Data Controversy

In 2015, climatologists at the National Oceanic and Atmospheric Administration (NOAA) published a study calling into question the global warming “hiatus,” a period in which the rate of warming was supposedly slowing. Prior to this study, it was believed that the rate of warming from 1998 to 2012 was about 33% to 50% of the rate of the previous decades. The NOAA study found that the slowdown was much less significant (and that potentially there was no slowdown at all). However, later that year, a whistleblower alleged that the paper ignored major issues in a “rush to publication,” including improper data storage, improper data selection, lack of transparency, and failure to note that the data was “experimental.” This whistleblower was possibly Dr. John Bates, a former NOAA scientist who gave an interview to the Daily Mail in early 2017 in which he reiterated many of the accusations (1. Rose 2017). It’s worth noting that this Daily Mail article now has to open with a disclaimer that “The Independent Press Standards Organisation has upheld a complaint against this article” regarding journalistic integrity.

The allegations caused NOAA to review the paper, and fueled many climate change skeptics to decry the organization and climate change in general. Lamar Smith, the head of the House Science Committee, used this as a talking point and attempted to subpoena NOAA’s emails (in fairness, he was trying to subpoena them prior to the whistleblower stepping forward, and he has frequently tried to subpoena the emails of climatologists whose work he doesn’t agree with). Initially, the charges made it seem that the NOAA study improperly used data, and perhaps even weighted it in order to get a particular outcome. However, the study actually drew its data from previously published papers, and several subsequent papers using independent data sources corroborated NOAA’s findings (2. Wright 2017). In fact, the study’s methodology corrected bias in some previous data sets: earlier data was aggregated from buoys and ships without accounting for measurement differences between the two. Ship data is less accurate than buoy data because ships can generate their own heat, so the new data set puts more weight on the more accurate source (3. Hausfather 2017). Granted, there has been at least one major paper, in 2016, that still supported a potential “hiatus.”
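To illustrate the kind of correction involved, here is a minimal Python sketch with invented numbers: ship readings are adjusted downward by an estimated warm bias, and the blend weights the more accurate buoy readings more heavily (the offset and weight here are made up for illustration; the real corrections are estimated from the data):

# Made-up sea-surface temperatures (deg C) for one region and month.
buoy_readings = [15.2, 15.4, 15.3]
ship_readings = [15.5, 15.7, 15.6]  # ships tend to run warm (engine heat)

SHIP_WARM_BIAS = 0.12  # invented offset, for illustration only
BUOY_WEIGHT = 0.75     # invented weight favoring the more accurate instrument

buoy_mean = sum(buoy_readings) / len(buoy_readings)
ship_mean = sum(ship_readings) / len(ship_readings) - SHIP_WARM_BIAS

blended = BUOY_WEIGHT * buoy_mean + (1 - BUOY_WEIGHT) * ship_mean
print(f"bias-adjusted blend: {blended:.2f} C")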

In the end, Dr. Bates clarified that the data was not fraudulent, but that he was not happy with some protocol breaches. However, it seems his actual concern was not really about the data at all, but rather that he believed the paper’s authors were trying to influence policy (specifically in regard to the Paris Climate Agreement). In his own words,

“You really have to provide the most objective view and let the policymakers decide from their role. I’m getting much more wary of scientists growing into too much advocacy. I think there is certainly a role there, and yet people have to really examine themselves for their own bias and be careful about that.” (4. Waldman 2017).

While we should always be examining our own biases, the scientific community has a right to influence policy; it’s illogical to disseminate important information and then leave all the policy decisions to people who deny your findings with no factual basis. If the data had been improperly obtained, or if the article had incorrect results due to rushing, this would be a different story. However, neither of these is the case. I was recently reminded of this scandal when someone posted it as proof not to trust climatologists, so the allegations of impropriety have real-world effects in shaping how voters view climate change. Additionally, much of the groundwork for Paris had been laid before the 2015 paper was published, so it’s hard to say whether it even had any effect.

The reality is that, in the end, working quickly to get out a paper confirming what the scientific community already agreed upon was not much of a political act. As scientists, we not only have to understand what we are researching, but also how it will be used; this extends to the critiques we make as well. The irony is that in attempting to stop other scientists from advocating for policy, Dr. Bates drew far more negative political attention to the subject at hand. He was the one who politicized the paper.


Works Cited

1. Rose, D (2017, Feb 4). Exposed: How world leaders were duped into investing billions over manipulated global warming data. Daily Mail.

http://www.dailymail.co.uk/sciencetech/article-4192182/World-leaders-duped-manipulated-global-warming-data.html

2. Wright, P (2017, Feb 9). The Data is Right: Climate Change is Still Real. Weather.com

https://weather.com/science/environment/news/climate-change-noaa-controversy-study

3. Hausfather, Z (2017, Feb 5). Factcheck: Mail on Sunday’s ‘astonishing evidence’ about global temperature rise. CarbonBrief


4. Waldman, S (2017, Feb 7). ‘Whistleblower’ says protocol was breached but no data fraud. E&E News

https://www.eenews.net/stories/1060049630

Fall 2017 Test

Hi there, everyone! This is a “test” post to ensure that the process is working as intended and that everyone has access to create posts of their own!

Hello

Hello new MIDS students! This will be where your blog posts are listed; currently it is set so that your entries are only visible to members of the I School Community, but we highly encourage you to post them publicly as well!