Archive for November, 2017

Data Breaches

November 23rd, 2017

According to the United States Government, “A data breach is a security violation in which sensitive, protected or confidential data is copied, transmitted, viewed, stolen or used by an individual unauthorized to do so.”  The news has been filled with massive corporate data breaches exposing customer and employee information.

Notification Laws: Every state in the U.S., with the exception of Alabama and South Dakota, has a data breach notification law in place.  The National Conference of State Legislatures has a link to all the different state laws so you can see what your state requires.  Keeping track of all these laws can be very confusing, even before counting the international laws that apply to multinational corporations.  Currently, there is no federal law that covers breaches of general personal information.  Both the Data Security and Breach Notification Act of 2015 and the Personal Data Notification and Protection Act of 2017 were introduced in the House of Representatives, but neither advanced any further.  For health information specifically, two federal rules cover notification of those affected: the Health Breach Notification Rule and the HIPAA Breach Notification Rule.

Data Ownership: Discussion stemming from these breaches has brought up the topic of data ownership.  The personal information residing in companies' databases has long been thought of as their property.  This concept has been changing as our personal data has proliferated across many databases, with ever more personal information being collected and generated.  Users of these websites and services understand that organizations need their information to provide services, whether that's a personalized shopping experience or hailing a ride, but that does not make the information any less theirs.  This point of ownership cannot be emphasized enough.  Acquiring personal information through a data breach is not just an attack on the company; it is an attack on all the users whose personal information was stolen and could be sold or used for illegal activities.

Timing: Customers of these companies want to know if their information has been compromised, so they can check whether their accounts have been misused or other identity fraud has occurred.  There are several milestones in the data breach timeline.  The first is when the breach actually occurred.  The company may never learn this date if it lacks the digital trail and infrastructure to determine when it happened, and it may fall well before the next milestone: the company discovering the breach and assessing its extent.  The next milestone is the corrective action taken by the affected company or agency to ensure the data is now protected.  Currently, only eight states set a firm deadline for notification, usually 30 to 90 days after discovery of the breach.
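Because notification deadlines run from discovery rather than from the breach itself, a company can notify "on time" even though customers were exposed for months. The small sketch below illustrates that gap with made-up dates and an assumed 30-day deadline (the low end of the range above); it is not modeled on any particular state's law.

```python
from datetime import date, timedelta

# Hypothetical milestone dates, for illustration only.
breach_occurred = date(2017, 3, 1)
breach_discovered = date(2017, 7, 15)
notification_sent = date(2017, 8, 10)

DEADLINE_DAYS = 30  # assumed deadline; state deadlines are usually 30 to 90 days after discovery

# The clock starts at discovery, not at the breach itself.
deadline = breach_discovered + timedelta(days=DEADLINE_DAYS)
met_deadline = notification_sent <= deadline
days_exposed = (notification_sent - breach_occurred).days

print("Notification deadline:", deadline)
print("Met deadline:", met_deadline)
print("Days between breach and notice:", days_exposed)
```

With these dates the deadline is met, yet more than five months pass between the breach and the customers hearing about it.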

Encryption: California led the data breach notification effort in 2002 by passing a law requiring businesses and government agencies to notify California residents of data security breaches.  The California law exempts organizations from notifying those affected if the personal information was encrypted.  The law defines the term “encrypted” to mean “rendered unusable, unreadable, or indecipherable to an unauthorized person through a security technology or methodology generally accepted in the field of information security.”  This broad definition does not prescribe a particular level of encryption; instead, it leaves the bar free to rise with whatever the industry standard is at the time.  One possibility is that, after a breach, a government agency or third party could evaluate the company's encryption to determine whether notification is required.

The issue of data breaches is not going away.  If government agencies and companies do not respond in a way that customers find acceptable, users will grow wary of sharing their valuable personal information, and the insights that come with it will be lost.

The Blurry Line of Wearables

November 22nd, 2017

Imagine a world where your heart rate, sleep pattern, and other biometrics are being measured and monitored by your employer on a daily basis. They can decide when and how much you can work based on these metrics. They can decide how much to pay you or even fire you based on this information. This is the world that a lot of athletes now live in. Wearable biometric trackers are becoming the norm in sports across the world. The NBA has begun adopting them to aid recovery and monitor training regimens. MLB is using them to monitor pitchers' arms. Some European sports teams even display this information live on a jumbotron in the arena.

Currently in the NBA, the collective bargaining agreement between the players' union and the league states that “data collected from a Wearable worn at the request of a team may be used for player health and performance purposes and Team on-court tactical and strategic purposes only. The data may not be considered, used, discussed or referenced for any other purpose such as in negotiations regarding a future Player Contract or other Player Contract transaction.” The line between using biometric data for strategic or health purposes and using it in contract talks or roster decisions can blur quickly. What begins as simple heart-rate monitoring in practice can end with a team cutting a player for being out of shape or not trying hard enough. A team that rests a pitcher because it sees his arm tiring can cost him money the next time he negotiates a contract, since he will have played in fewer games. If a team sees that a player isn't sleeping as much as he should, it could offer less money, claiming he goes out partying too much and questioning his character. These are all potential ethical and legal violations of a player's rights. Monitoring a player during a game is one thing; monitoring his entire life, controlling what he can or cannot do, and tying that to his paycheck is another.

On top of the ethical and legal issues that wearables raise, there are also significant privacy risks. These players are in the public eye every day, and leaks happen within organizations. The more sensitive medical information a team collects, the greater the risk of HIPAA violations. Imagine a player getting traded or cut for what seems to the public like an inexplicable reason. People might start asking whether the player has medical issues. Fans love to dissect every part of a player's life, and teams love to post as much information as possible online for their fans. We've seen the risks of open public datasets and how even anonymized information can be de-anonymized and tied back to an individual. Given the vast amount of information available about these players, this is a real risk.

All of these issues arise as wearable technology becomes commonplace, but other risks are on the horizon. We're beginning to see embeddables, such as an ingestible pill or a device implanted under the skin, being developed, and it will only be a matter of time before professional sports start taking advantage of them. These issues are not only faced by professional athletes. Think about your own job. Do you wear a badge at work? Can your employer track where you are at all times and analyze your performance based on that? What if in the future DNA testing is required? As more technology is developed and data is collected, we need to be aware of the possible issues that come with it and ask ourselves these questions.

In a W231 group project that I recently worked on, we attempted to combine Transparent California, an open database of California employees' salary and pension data, with social media information to assemble detailed profiles of California's employees. While we were able to do so for individual employees, one by one, the restrictions that large social networks place on scraping by third-party tools prevented us from doing this at scale. This may soon change.

In a recent ruling, the US District Court for the Northern District of California concluded that LinkedIn, the giant professional social network, cannot prevent HiQ Labs from accessing and using its data. The startup helps HR professionals fight attrition by scraping LinkedIn data and applying machine learning algorithms to predict employees' flight risk.


This was a fascinating case of the sometimes-inevitable clash between the public's right of access to information and individuals' right to privacy. On the one hand, most would agree that liberating our data from tech giants such as LinkedIn, now owned by Microsoft, is a positive outcome; on the other, allowing universal access to it is not without risks.


Free speech advocates praise this ruling as potentially signaling a new direction in US courts' approach to social networks' control over data their users shared publicly. This new approach was exemplified only a month earlier by another ruling, in which the US Supreme Court described the use of social media as “speaking and listening in the modern public square”. If social media indeed is a modern public square, there should be little debate about whether information posted there can be used by anyone, for any reason.


There are, however, disadvantages to this increased access to publicly posted data. The first, as my group project discussed, is that combining multiple publicly available data sets can violate users' privacy. Adding the vast sea of information held by social networks to these open data sets makes such violations ever more extensive. It may be claimed that if users themselves post information online, companies that use it do nothing wrong. However, it is unclear whether users who post information to their public LinkedIn profiles intend for it to be scraped, analyzed and sold by other services. It is much more likely that most users expect this information to be viewed by other individual users.
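To see why combining data sets is the crux, here is a minimal Python sketch of a linkage join. The records, field names, and the decision to match on name plus employer are all hypothetical; they are not drawn from Transparent California, LinkedIn, or our actual project. Neither list says much on its own, but joining them attaches a salary and pension to a social profile.

```python
# Hypothetical toy records, for illustration only.
salary_records = [
    {"name": "Jane Doe", "employer": "City of Oakland", "salary": 98000, "pension": 12000},
    {"name": "John Roe", "employer": "City of Fresno", "salary": 76000, "pension": 8000},
]
social_profiles = [
    {"name": "Jane Doe", "employer": "City of Oakland", "city": "Berkeley", "headline": "Open to new roles"},
    {"name": "Sam Poe", "employer": "City of Davis", "city": "Davis", "headline": "Engineer"},
]

def link_records(left, right, keys):
    """Inner-join two lists of dicts on exact matches of the given key fields."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            linked.append({**row, **match})  # one merged profile drawing on both sources
    return linked

profiles = link_records(salary_records, social_profiles, keys=("name", "employer"))
for profile in profiles:
    print(profile)
```

Exact matching on two fields is a simplification; real linkage typically uses fuzzier matching, which only widens the set of people who can be re-identified.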


Finally, as LinkedIn argued, the right to privacy covers not only the data itself but also changes to it. When changing their profiles, social media users mostly do not wish to broadcast the change publicly, but to display it to friends or connections who visit their profiles. LinkedIn even lets users choose whether to make changes private, shown only on the user's own profile, or public, appearing in others' news feeds. Allowing third parties to scrape information takes this choice away from users: algorithms such as HiQ's scrape profiles, pick up changes, and sell them.


LinkedIn has already appealed the court's decision, and it will likely be a while before information on social media is treated as literally posted in the public square. In this case, courts will once more have to choose between the right to privacy and the right of access to information. But whatever the decision, this is yet another warning sign reminding us to be thoughtful about what information we share online: it will likely reach more eyes, and be used in other ways, than we originally intended.

Algorithm is not a magic word

November 20th, 2017

People may throw the word “algorithm” around to justify that their decisions are good and unbiased, and it sounds technical enough that you might trust them. Today I want to demystify this concept for you. Fair warning: it may be a bit disillusioning.

What is an algorithm? You’ll be happy to know that it’s not even that technical. An algorithm is just a set of predetermined steps that are followed to reach some final conclusion. Often we talk about algorithms in the context of computer programs, but you probably have several algorithms that you use in your daily life without even realizing it. You may have an algorithm that you follow for clearing out your email inbox, for unloading your dishwasher, or for making a cup of tea.

Potential Inbox Clearing Algorithm

  • Until inbox is empty:
    • Look at the first message:
      • If it’s spam, delete it.
      • If it’s a task you need to complete:
        • Add it to your to-do list.
        • File the email into a folder.
      • If it’s an event you will attend:
        • Add it to your calendar.
        • File the email into a folder.
      • If it’s personal correspondence you need to reply to:
        • Reply.
        • File the email into a folder.
      • For all other messages:
        • File the email into a folder.
    • Repeat.
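For readers curious how steps like these look inside a computer program, here is a minimal Python sketch of the inbox algorithm. The Message class and its kind labels ("spam", "task", and so on) are hypothetical stand-ins for however a real mail client would classify messages, and appending to lists stands in for actually updating a to-do list, a calendar, or sending a reply.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    subject: str
    kind: str  # hypothetical label: "spam", "task", "event", "personal", or anything else

def clear_inbox(inbox: List[Message]):
    """Follow the inbox-clearing steps until the inbox is empty."""
    todo, calendar, replies, folder = [], [], [], []
    while inbox:                          # Until inbox is empty (the loop is the "Repeat."):
        msg = inbox.pop(0)                # look at the first message, removing it from the inbox
        if msg.kind == "spam":
            continue                      # delete it: dropped and never filed
        if msg.kind == "task":
            todo.append(msg.subject)      # add it to your to-do list
        elif msg.kind == "event":
            calendar.append(msg.subject)  # add it to your calendar
        elif msg.kind == "personal":
            replies.append(msg.subject)   # stands in for sending a reply
        folder.append(msg)                # every non-spam message gets filed
    return todo, calendar, replies, folder

inbox = [
    Message("You won a prize!", "spam"),
    Message("Submit expense report", "task"),
    Message("Team lunch on Friday", "event"),
    Message("Hi from Grandma", "personal"),
    Message("Monthly newsletter", "other"),
]
todo, calendar, replies, folder = clear_inbox(inbox)
print(todo)         # ['Submit expense report']
print(len(folder))  # 4
```

Every `if` branch here is a choice someone made in advance about how each kind of message should be handled.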

Computer programs use algorithms too, and they work in a very similar way. Since the steps are predetermined and the computer is making all the decisions and spitting out the conclusion at the end, some people may think that algorithms are completely unbiased. But if you look more closely, you'll notice that interesting algorithms can have pre-made decisions built into the steps. In that case, the computer isn't making the decision at all; it's just executing a decision that a person made. Consider this (overly simplified) algorithm that could decide whether to approve credit for an individual requesting a loan:

Algorithm for Responding to Loan Request

  • Check requestor’s credit score.
  • If credit score is above 750, approve credit.
  • Otherwise, deny credit.

This may seem completely unbiased because you are trusting the computer to act on the data provided (credit score) and you are not allowing any other external factors such as age, race, gender, or sexual orientation to influence the decision. In reality, though, a human being had to decide on the appropriate threshold for the credit score. Why did they choose 750? Why not 700, 650, or 642? They also had to choose to base their decision solely on credit score, but could there be other factors that the credit score subtly reflects, such as age or length of time spent in America? (Hint: yes.) With more complicated algorithms, there are many more decisions about what thresholds to use and what information is worth considering, which brings more potential for bias to creep in, even if it's unintentional.
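The loan rule is short enough to write out in a few lines of Python. This is a minimal sketch: the function name, applicant names, and scores are made up for illustration. Reading "above 750" literally, a score of exactly 750 is denied. Running the same applicants through a few alternative cutoffs shows how much the human-chosen threshold alone changes who gets credit.

```python
# Hypothetical applicants with made-up credit scores, for illustration only.
applicants = {"A": 642, "B": 690, "C": 720, "D": 760}

def respond_to_loan_request(credit_score: int, threshold: int = 750) -> str:
    """Approve if the credit score is above the threshold; otherwise deny."""
    return "approve" if credit_score > threshold else "deny"

# Same data, same steps; only the cutoff a person picked changes.
for threshold in (750, 700, 650):
    approved = [name for name, score in applicants.items()
                if respond_to_loan_request(score, threshold) == "approve"]
    print(threshold, approved)
# 750 ['D']
# 700 ['C', 'D']
# 650 ['B', 'C', 'D']
```

Nothing in the code "decides" that 750 is the right line; that number came from a person, and moving it changes the outcome for real applicants.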

Algorithms are useful because they can help humans use additional data and resources to make more informed decisions in a shorter amount of time, but they’re not perfect or inherently fair. Algorithms are subject to the same biases and prejudices that humans are, simply from the fact that 1) a human designed the steps in an algorithm, and 2) the data that is fed into the algorithm is generated in the context of our human society, including all of its inherent biases.

These inherent biases built into algorithms can manifest in dark ways with considerable negative impacts to individuals. If you’re interested in some examples, take a look at this story about Somali markets in Seattle that were prohibited from accepting food stamps, or this story about how facial recognition software used in criminal investigations can lead to a disproportionate targeting of minority individuals.

In the future, when you see claims that an important decision was based on some algorithm, I hope you will hold the algorithm to the same standards that you would any other human decision-maker. We should continue to question the motivations behind the decision, the information that was considered, and the impact of the results.

For further reading:

Listen to the full interview with Dr. Blumenstock on the most recent Bloomberg Benchmark podcast

Circle Design Workbook - colored cards

Several speculative designs and design fictions from a design workbook were printed onto cards and into other formats for participants to interact with.


Richmond Wong, Deirdre Mulligan, Ellen Van Wyk, James Pierce, and John Chuang published a paper in CSCW 2018 (Computer Supported Cooperative Work) as part of its online-first publication in the Proceedings of the ACM on Human-Computer Interaction.

The paper, titled “Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks,” presents a case study where a set of design workbooks of conceptual speculative designs and design fictions were presented to technologists in training in order to surface discussions and critical reflections about privacy. From the paper:

Although “privacy by design” (PBD)—embedding privacy protections into products during design, rather than retroactively—uses the term “design” to recognize how technical design choices implement and settle policy, design approaches and methodologies are largely absent from PBD conversations. Critical, speculative, and value-centered design approaches can be used to elicit reflections on relevant social values early in product development, and are a natural fit for PBD and necessary to achieve PBD’s goal. Bringing these together, we present a case study using a design workbook of speculative design fictions as a values elicitation tool. Originally used as a reflective tool among a research group, we transformed the workbook into artifacts to share as values elicitation tools in interviews with graduate students training as future technology professionals. We discuss how these design artifacts surface contextual, socially-oriented understandings of privacy, and their potential utility in relationship to other values levers.

We suggest that technology professionals can view and interact with design workbooks—collections of design proposals or conceptual designs, drawn together to allow designers to investigate, explore, reflect on, and expand a design space—to elicit values reflections and
discussions about privacy before a system is built, in essence “looking around corners” by broadening the imagination about what is possible.

Download the paper from the ACM Digital Library, or the Open Access version on eScholarship.