Police Shootings: A Closer Look at Unarmed Fatalities

By Anonymous

Last year, fifty-five people who were “unarmed” were killed in police shootings. This number comes from the Washington Post’s database of fatal police shootings, which is aggregated from “local news reports, law enforcement websites and social media” as well as other independent databases. In this dataset, the recorded weapons that victims were armed with during the fatal encounter range from wasp spray to barstools. Here is a breakdown of the types of arms that were involved in the fatal police shootings of 2019.


We see a large number of fatalities among people who were armed with guns and knives, but also among those “armed” with vehicles and toy weapons. In my opinion, cars and toys are not weapons and would more appropriately fit the category of “unarmed.” But what exactly does “unarmed” mean? The basic Google search definition is “not equipped with or carrying weapons.” Okay, then what is a weapon? Another Google search defines a weapon as “a thing designed or used for inflicting bodily harm or physical damage.” Toys and cars were not designed for inflicting bodily harm, though they may have been used to do so. By the same logic, would we call our arms and legs “weapons,” since many people have used their appendages to inflict bodily harm? No. So why do we exclude cars and toys from the “unarmed” status?

This breakdown of the categories introduces bias into the data. When categorizing the armed status of victims of police shootings, the challenge of specificity arises. Some may find value in more specific descriptions for each case in the dataset, but this comes at the cost of splitting apart cases that really belong in the same bucket; here, “vehicles” and “toy weapons” should be contained in the “unarmed” bucket rather than standing as separate categories. Excluding those cases undercounts the number of unarmed people who were killed by police. Including the cases that involved vehicles and toy weapons raises the count of unarmed fatalities from 55 to 142. In other words, the bias introduced by overly granular categorization understates the number of unarmed victims of police shootings in 2019.
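As a rough illustration, a minimal pandas sketch of this recategorization might look like the following, assuming a local copy of the Washington Post CSV with its “armed” and “date” columns and the labels “unarmed”, “vehicle”, and “toy weapon”:

```python
import pandas as pd

# Hypothetical local copy of the Washington Post fatal police shootings data.
df = pd.read_csv("fatal-police-shootings-data.csv", parse_dates=["date"])
df_2019 = df[df["date"].dt.year == 2019]

# Count using the dataset's own "unarmed" label.
narrow_count = (df_2019["armed"] == "unarmed").sum()

# Re-bucket vehicles and toy weapons into the "unarmed" category.
broad_labels = {"unarmed", "vehicle", "toy weapon"}
broad_count = df_2019["armed"].isin(broad_labels).sum()

print(f"Unarmed (dataset label only): {narrow_count}")
print(f"Unarmed (including vehicles and toy weapons): {broad_count}")
```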

Now let’s look at the breakdown by race, specifically White versus Black (non-Hispanic).

Under the Washington Post’s definition of unarmed, 45% of the victims were White, while 25% were Black. For toy weapons, 50% were White and 15% were Black. For vehicles, 41% were White and 30% were Black. For all of those cases combined, 44% were White and 25% were Black.

Now some may interpret this as “more White people are being killed by police,” and that is true in absolute terms, but let’s consider the populations of White and Black folks in the United States. According to 2019 U.S. Census Bureau estimates, 60% of the population is White while only 13% is Black or African American. So when we compare, by race, the percentage of unarmed people who were killed by police with each group’s percentage of the U.S. population, we see a disproportionate effect on Black folks. If the experiences of Black and White folks were the same, we would expect about 13% of police-shooting victims to be Black and 60% to be White. Instead, White folks make up a much smaller share of unarmed victims than of the population (44% versus 60%), while Black folks make up a much larger share (25% versus 13%), nearly double their representation in the population.
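To make the comparison concrete, here is a quick back-of-the-envelope calculation, using only the percentages quoted above, of how each group’s share of unarmed victims compares with its share of the population:

```python
# Share of unarmed victims vs. share of the U.S. population, by race.
victim_share = {"White": 0.44, "Black": 0.25}      # combined "unarmed" cases above
population_share = {"White": 0.60, "Black": 0.13}  # 2019 U.S. Census Bureau estimates

for race in victim_share:
    ratio = victim_share[race] / population_share[race]
    print(f"{race}: {ratio:.2f}x their population share")
# White: ~0.73x (underrepresented); Black: ~1.92x (nearly double their population share)
```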

This highlights the disproportionate effect of police brutality on Black folks, yet the point estimates provided by this data may not be fully comprehensive. When police reports are fabricated, and when horrific police killings of Black and Brown folks go under the radar, the data provided by the Washington Post risks being further biased. That bias, however, would suggest an even greater disparity in the victimization of unarmed Black and Brown folks by police shootings. As we consider data in our reflections on current events, we have to be mindful of the potential biases that may exist in the creation and collection of the data, as well as in our interpretation of it.

Impact of Algorithmic Bias on Society

By Anonymous | December 11, 2018

Artificial intelligence (AI) is being widely deployed in a number of realms where it has never been used before. A few examples of areas in which big data and artificial intelligence techniques are used are selecting potential candidates for employment, deciding whether a loan should be approved or denied, and using facial recognition for policing activities. Unfortunately, AI algorithms are often treated as a black box in which the “answer” provided by the algorithm is presumed to be the absolute truth. What is missed is the fact that these algorithms are biased for many reasons, including the data that was used to train them. These hidden biases have a serious impact on society and, in many cases, on the divisions that have appeared among us. In the next few paragraphs we will present examples of such biases and what can be done to address them.

Impact of Bias in Education

In her book Weapons of Math Destruction, the mathematician Cathy O’Neil gives many examples of how the mathematics on which machine learning algorithms are based can easily cause untold harm to people and society. One such example is the goal set by Washington D.C.’s newly elected mayor, Adrian Fenty, to turn around the city’s underperforming schools. To achieve his goal, the mayor hired an education reformer as the chancellor of Washington’s schools. This individual, acting on the prevailing theory that students were not learning enough because their teachers were not doing a good job, implemented a plan to weed out the “worst” teachers. A new teacher assessment tool called IMPACT was put in place, and teachers whose scores fell in the bottom 2% in the first year of operation, and the bottom 5% in the second year, were automatically fired.

From a purely mathematical standpoint this approach seems sensible: evaluate the data and optimize the system to get the most out of it. Alas, as O’Neil points out, the factors used to determine the IMPACT score were flawed. Specifically, the score was based on a model that did not have enough data to reduce statistical variance and improve the accuracy of the conclusions one can draw from it. As a result, teachers in poor neighborhoods who were performing very well on a number of different metrics were the ones harmed by the use of the flawed model. The situation was further exacerbated by the fact that it is very hard to attract and grow talented teachers in schools in poor neighborhoods, many of which are underperforming.
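To see why too little data per teacher makes such a score unreliable, here is a small, purely illustrative simulation (all parameters invented) in which each teacher’s “value added” is estimated from a single class of 25 students:

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers, class_size = 100, 25
teacher_sd, student_sd = 0.1, 1.0   # invented: true teacher differences are small
true_effect = rng.normal(0, teacher_sd, n_teachers)

# One year of data: each teacher's score is the mean of one small, noisy class.
measured = true_effect + rng.normal(0, student_sd, (class_size, n_teachers)).mean(axis=0)

# With so little data per teacher, the measured score tracks the true effect weakly,
# so a "fire the bottom 5%" rule catches many perfectly adequate teachers.
print(f"Correlation(true effect, one-year score): {np.corrcoef(true_effect, measured)[0, 1]:.2f}")
bottom = np.argsort(measured)[: int(0.05 * n_teachers)]
print(f"Share of the fired 5% whose true effect is actually below average: "
      f"{(true_effect[bottom] < 0).mean():.0%}")
```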

Gender Bias in Algorithms Used By Large Public Cloud Providers

Bias in algorithms is not limited to small entities with limited amounts of data. Even large public cloud providers with access to huge numbers of records can easily create algorithms that are biased and cause irreparable harm when used to make impactful decisions. The website http://gendershades.org/ provides one such example. The research to determine whether there were any biases in the algorithms of three major facial recognition AI service providers (Microsoft, IBM and Face++) was conducted by providing 1,270 images of individuals originating from Africa and Europe. The sample had subjects from 3 African countries and 3 European countries, with a 54.4% male and 44.6% female split. Furthermore, 53.6% of the subjects had lighter skin and 46.4% had darker skin. When the algorithms from the three companies were asked to classify the gender of the samples, as seen in the figure below, they performed relatively well if one looks just at the overall accuracy.

However, on further investigation, as seen in the figure below, the algorithms performed poorly when classifying darker-skinned individuals, particularly women. Clearly, any decisions made based on the classification results of these algorithms would be inherently biased and potentially harmful to darker-skinned women in particular.
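Such a disaggregated check is straightforward once per-image predictions are available. Here is a minimal pandas sketch, with a small hypothetical results table standing in for the real benchmark, of how accuracy can be broken out by skin tone and gender:

```python
import pandas as pd

# Hypothetical evaluation table: one row per benchmark image, with the true gender,
# the classifier's prediction, and an annotated skin-tone group.
results = pd.DataFrame({
    "true_gender": ["female", "female", "male", "male", "female", "male"],
    "predicted":   ["male",   "female", "male", "male", "male",   "male"],
    "skin_tone":   ["darker", "lighter", "darker", "lighter", "darker", "darker"],
})

results["correct"] = results["true_gender"] == results["predicted"]

# Overall accuracy can look acceptable...
print(f"Overall accuracy: {results['correct'].mean():.0%}")

# ...while the disaggregated view reveals which subgroups are failed most often.
print(results.groupby(["skin_tone", "true_gender"])["correct"].mean())
```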

Techniques to Address Biases in Algorithms

Recognizing that algorithms are potentially biased is the first and most important step towards addressing the issue. Techniques to reduce bias and improve the performance of algorithms are an active area of research. A number of approaches, ranging from the creation of an oath similar to the Hippocratic Oath that doctors pledge, to a conscious effort to use diverse datasets that are much more representative of society, have been proposed and are being evaluated. There are many reasons to be optimistic that, although bias in algorithms can never be fully eliminated, in the near future the extent of that bias will be reduced.

Bibliography

  1. Cathy O’Neil, Weapons of Math Destruction, Crown Publishing Group, 2016.
  2. Gender Shades: How well do IBM, Microsoft and Face++ AI services guess the gender of a face? http://gendershades.org/

Potential Negative Consequences IoT Devices Could Have on Consumers

By Anonymous | December 4, 2018

IoT, or the Internet of Things, refers to devices that can collect and transmit data across the internet or to other devices. The number of internet-connected devices owned by consumers has grown rapidly. In the past, a typical person owned only a few connected devices, such as desktops, laptops, routers and smartphones. Now, due to technological advances, many people also own televisions, video game consoles, smart watches (e.g. Fitbit, Apple Watch), digital assistants (e.g. Amazon Alexa, Google Home), cars, security systems, appliances, thermostats, locks and lights that all connect and transmit information over the internet.

While companies are constantly trying to find new ways to implement IoT capabilities into the lives of consumers, security seems to be taking a back seat. Therefore, with all of these new devices, it is important for consumers to remain aware of the personal information that is being collected, and to be informed of the potential negative consequences that could result from owning such devices. Here are four things you may want to be aware of:

1. Hackers could spy on you


I am sure you have heard stories of people who have been spied on after the webcams on their laptops were hacked. Other devices, like the Owlet, a wearable baby monitor, were found to be hackable, along with SecurView smart cameras. What if someone were able to access your Alexa? They could learn a lot about your personal life through recordings of your conversations. If someone were to hack your smart car, they would be able to know where you are most of the time. Recently, researchers uncovered vulnerabilities in Dongguan Diqee vacuum cleaners that could allow attackers to listen in or perform video surveillance.

2. Hackers could sell or use your personal information

It may not seem like a big deal if a device such as your Fitbit is hacked. However, many companies would be interested in obtaining this information and could profit from it. What if an insurance company could improve its models with this data and, as a result, increase rates for customers with poor vital signs? Earlier this year, hackers were able to steal sensitive information from a casino after gaining access to a smart thermometer in a fish tank. If hackers can steal data from companies that prioritize security, they will probably have a much easier time doing the same to an average person. The data you generate is valuable, and hackers can find a way to monetize it.

3. Invasion of privacy by device makers

Our personal information is not only obtainable through hacks. We may be willingly giving it away to the makers of the devices we use. Each device and application has its own policies regarding the data it chooses to collect and store. A GPS app may store your travel history so it can make recommendations in the future. However, it may also use this information to make money on marketing offers for local businesses. Device makers are financially motivated to use your information to improve their products and target their marketing efforts.

4. Invasion of privacy by government agencies

Government agencies are another group that may have access to our personal information. Some agencies, like the FBI, have the power to request data from device makers in order to gather intelligence related to possible threats. Law enforcement may be able to access certain information for the purposes of investigations. Last year, police used data from a woman’s Fitbit to charge her husband with her murder. Lawyers may also be able to subpoena data in criminal and civil litigation.

IoT devices will continue to play an important role in everyone’s lives. They will continue to create an integrated system that leads to increased efficiency for all. However, consumers should remain informed, and if given a choice between brands of devices, like Alexa or Google Home, consider choosing the company that better prioritizes the security and policy issues discussed above. This will send a message that consumers care, and encourage positive change.

The View from The Middle

By Anonymous | December 4, 2018

If you are like me, you probably spend quite a bit of time online.

We read news articles online, watch videos, plan vacations, shop and much more. At the same time, we are generating data that is being used to tailor advertising to our personal preferences. Profiles constructed from our personal information are used to suggest movies and music we might like. Data-driven recommendations make it easier for us to find relevant content. Advertising also provides revenue for content providers, which allows us to access those videos and articles at reduced cost.

But is the cost really reduced? How valuable is your data and how important is your privacy? Suppose you were sharing a computer with other members of your household. Would you want all your activities reflected in targeted advertising? Most of the time we are unaware that we are under surveillance and have no insight into the profiles created using our personal information. If we don’t want our personal information shared, how do we turn it off?

To answer that question, let’s first see what is being collected. We’ll put a proxy server between the web browser and the internet to act as a ‘Man-in-the-Middle’. All web communication goes through the proxy server which can record and display the content. We can now see what is being shared and where it is going.
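One way to set up this kind of man-in-the-middle view is with an intercepting proxy such as mitmproxy. The sketch below is a minimal addon, with a hypothetical first-party domain hard-coded, that logs every request leaving the browser for a third-party host along with the Referer header it carries; it is an illustration of the setup described here, not the exact tooling used:

```python
# log_third_parties.py -- a minimal mitmproxy addon sketch.
# Run with:  mitmdump -s log_third_parties.py
# then point the browser's proxy settings at mitmproxy (default 127.0.0.1:8080).
from mitmproxy import http

FIRST_PARTY = "exampletravelsite.com"  # hypothetical: the site we typed into the browser

def request(flow: http.HTTPFlow) -> None:
    host = flow.request.pretty_host
    # Anything not going to the site we intended to visit is a third-party call.
    if FIRST_PARTY not in host:
        referer = flow.request.headers.get("Referer", "")
        print(f"third party: {host:40s} referer: {referer}")
```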

The Privacy Settings of our Chrome browser allow us to turn off web services that share data. We also enable ‘Do Not Track’ to request that sites not track our browsing habits across websites.

Let’s see what happens when we browse to the webpage of a popular travel site and perform a search for vacation accommodation. In our proxy server we observe that the travel website caused many requests to be sent from our machine to advertising and analytics sites.

We can see requests being made to AppNexus (secure.adnxs.com), a company which builds groups of users for targeted advertising. These requests have used the X-Proxy-Origin HTTP header to transmit our IP address. As IP addresses can be associated with geographic location, this is personal data we may prefer to protect.

Both the Google Marketing Platform (doubleclick.net) and AppNexus are sharing details of the travel search in the Referer HTTP header. They know the intended destination, the dates, and the number of adults and children travelling.

ATDMT (ad.atdmt.com) is owned by a Facebook subsidiary, Atlas Solutions. It is using a one-pixel image as a tracking bug even though the Do Not Track header is set to true. Clearbrain, a predictive analytics company, is also using a tracking bug.

Now we’ll have a look at the effectiveness of some popular privacy tools:

  1. The Electronic Frontier Foundation’s ‘Privacy Badger’ combined with ‘Adblock Plus’ in Chrome. Privacy Badger is a browser add-on from the Electronic Frontier Foundation that stops advertisers and other third-party trackers from secretly tracking what pages you look at on the web. Adblock Plus is a free open source ad blocker which allows users to customize how much advertising they want to see.
  2. The Cliqz browser with Ghostery enabled. Ghostery is a privacy plugin giving control over ads and tracking technologies. Cliqz is an open source browser designed for privacy.

There are now far fewer calls to third-party websites. Privacy Badger has successfully identified and blocked the ATDMT tracking bug. Our IP address and travel search are no longer being collected. However, neither Privacy Badger nor Ghostery detected the Clearbrain tracker. Since Privacy Badger learns to spot trackers as we browse, it may just need more time to detect this bug.

While these privacy tools are quite effective at providing some individual control over personal information, they are by no means a perfect solution. This approach places the burden of protecting privacy on individuals, who do not always understand the risks. And while these tools are designed to be easy to install, many people are unfamiliar with browser plugins.

Furthermore, we are making a trade-off between our privacy and access to tailored advertising. The content websites we love to use may be sponsored by the very advertising revenue we are now blocking.

For now, these tools at least offer the ability to make a choice.

The Customer Is Always Right: No Ethics in Algorithms Without Consumer Support

by Matt Swan | December 4, 2018

There is something missing in data science today: ethics. It seems like there is a new scandal every day; more personal data leaked to any number of bad actors in the greatest quantities possible. Big Data has quickly given way to Big Data Theft.

The Internet Society of France, for example, a public interest group advocating for online rights, is pushing Facebook to fix the problems that led to the recent string of violations. It is suing for €100 million (~$113 million USD) and threatening EU-based group action if appropriate remedies are not made. Facebook is also being pursued by a public interest group in Ireland and recently paid a fine of £500,000 (~$649,000 USD) for its role in the Cambridge Analytica breach. Is this the new normal?

Before we answer that question, it might be more prudent to ask why this happened in the first place. That answer is simple.

Dollars dictate ethics.

Facebook’s primary use of our data is to offer highly targeted (read: effective) advertising. Ads are the price of admission and it seems we’ve all come to terms with that. Amid all the scandals and breaches, Facebook made their money – far more money than they paid in fines. And they did it without any trace of ethical introspection. Move fast and break things, so long as they’re not your things.

Dollars dictate ethics.

Someone should be more concerned about this. In the recent hearings in the US Congress in early September, there was talk about regulating the tech industry to try to bring these problems under control. This feels like an encouraging move in the correct direction. It isn’t.

First, laws cannot enforce ethical behavior. Laws can put in place measures to reduce the likelihood of breaches, or punish those not sufficiently safeguarding personal data or those failing to correct algorithms with a measurable bias, but they cannot require a company to have a Data Ethicist on the payroll. We’ve already noted that Facebook made more money than it paid in fines, so what motivation does it have to change its behavior?

Second, members of Congress are more likely to believe TensorFlow is a new setting on their Keurig than to know it’s an open source machine learning framework. Because of this reality, organizations such as 314 Action prioritize electing more STEM professionals to government, on the grounds that technology has progressed quickly and government is out of touch. We need individuals who have a thorough understanding of technological methods.

Meanwhile, higher education is making an effort to incorporate ethics into computer and data science programs, but there are still limitations. Some programs, such as UC Berkeley’s MIDS program, have implemented an ethics course. However, at the time of this writing, no program includes an ethics course as a graduation requirement.

Dollars dictate ethics.

Consider the time constraints; only so many courses can be taken. If one program requires an ethics course, the programs that do not will be at an advantage in recruiting because they will argue the ethics course is a lost opportunity to squeeze in one more technology course. This will resonate with prospective students since there are no Data Ethicist jobs waiting for them and they’d prefer to load up on technology-oriented courses.

Also, taking an ethics course does not make one ethical. While each budding data scientist should be forced to consider the effects of his or her actions, a course is certainly no guarantee of future ethical behavior.

If companies aren’t motivated to pursue ethics themselves and the government can’t force them to be ethical and schools can’t force us to be ethical, how can we possibly ensure the inclusion of ethics in data science?

I’ve provided the answer three times. If it were “ruby slippers”, we’d be home by now.

Dollars dictate ethics.

All the dollars start with consumers. And it turns out that when consumers collectively flex their economic muscles, companies bend and things break. Literally.

In late 2017, Fox News anchor Sean Hannity made some questionable comments regarding a candidate for an Alabama Senate seat. Consumers contacted Keurig, whose commercials aired during Hannity’s show, and complained. Keurig worked with Fox to ensure its ads would no longer be shown during those times, which prompted some of Hannity’s supporters to destroy their own Keurig machines in protest.

The point is this: if we want to effect swift and enduring change within tech companies, the most effective way to do that is through consistent and persistent consumer influence. If we financially support companies that consider the ethical implications of their algorithms, or simply avoid those that don’t, we can create the necessary motivation for them to take it seriously.

But if we keep learning about the newest Facebook scandal from our Facebook feeds, we shouldn’t expect anything more than the same “ask for forgiveness, not permission” attitude we’ve been getting all along.

Sources:
https://www.siliconrepublic.com/companies/data-science-ethicist-future 
https://www.siliconrepublic.com/enterprise/facebook-twitter-congress
https://www.siliconrepublic.com/careers/data-scientists-ethics
https://news.bloomberglaw.com/privacy-and-data-security/facebook-may-face-100m-euro-lawsuit-over-privacy-breach
https://www.nytimes.com/2017/11/13/business/media/keurig-hannity.html
http://www.pewinternet.org/2018/11/16/public-attitudes-toward-computer-algorithms/

Freudian Slips

by Anonymous | December 4, 2018

In the ongoing battleground over the use and abuse of personal data in the age of big data, we often see the competing forces of companies that want to create new innovations and regulators who want to enforce limits in consideration of the privacy or sociological harms that could arise from unmitigated usage. Often we see companies or organizations that want as much data as possible, unfettered by regulation or other restrictions.

An interesting way to think about the underlying dynamic is to superimpose psychological models of human behavior on the cultural forces at play. Ruth Fulton Benedict wrote that culture is “personality writ large” in her book Patterns of Culture [1]. One model for understanding the forces underlying human behavior is the Freudian one of the Id, Superego and Ego. In this model, Freud identified the Id as the primal driving force of human gratification, whether the urge is to satiate hunger or is sexual in nature. It is an entirely selfish want or need for gratification. The Superego is the element of the mind that adheres to social norms and morality and is aware of the inherent social contract, the force of mutually agreed upon rules that enable individuals to co-exist. The Ego is the output of these two forces: the observed behavior presented to the world. An unhealthy balance of either Id or Superego, in this model, results in diseased behavior. Consider a person who feels hungry and simply takes food off other people’s plates without hesitation. This Id imbalance, without sufficient Superego restriction, would be considered unhealthy behavior.

Drawing an analogy to behavior in the world of big data, we can see the Id as a company’s urge for innovation or profit. It’s what drives people to search for new answers, create innovative products, make money, and dive into solving a problem regardless of the consequences. However, unchecked by adherence to any social contract, or the Superego, unhealthy behavior begins to leak out: privacy and data breaches, unethical use of people’s data, and even abusive workplace environments. Consider Uber, an extraordinarily innovative company with rapid growth. Led by Travis Kalanick, there was a strong Id component to its rapid, take-no-prisoners growth. In the process, people’s privacy was often overlooked. Uber often flouted city regulations and cease-and-desist orders. It created an application to evade law enforcement [2]. It also used data analytics to analyze whether passengers were having one-night stands [3].

Of course, an inherent lack of trust results from some of these unchecked forces. But without that driving Id, that drive to create, innovate and make money, it is unlikely Uber would have grown so rapidly. It is also likely no coincidence that some of the downfall from this unchecked Id came from similar Id-like behavior leaking into the workplace, resulting in rampant sexual harassment and misconduct allegations and the eventual resignation of the CEO. Google, which has quickly grown into one of the biggest companies in the world, has also recently faced similar allegations of rampant sexual misconduct.

Similarly, this is why, on the flip side, a heavily Superego organization, one that is overly protective and regulatory and always considering stringent rules, might also be considered unhealthy. Consider how little innovation comes out of governmental organizations and institutions. This Freudian perspective, superimposed on the dynamics of the battles between big data organizations and government regulators, is one way to interpret the different roles these groups are playing. Neither could exist without the other, and the balance between the two is what creates healthy growth and a healthy environment. A necessary amount of regulation, or reflection on social consequences, paired with the corresponding primal urges for recognition or power, can create the type of growth that actually serves both sides and builds healthy organizations.

References
[1] Retrieved from http://companyculture.com/113-culture-is-personality-writ-large/
[2] Retrieved from https://thehill.com/policy/technology/368560-uber-built-secret-program-to-evade-law-enforcement-report
[3] Retrieved from https://boingboing.net/2014/11/19/uber-can-track-your-one-night.html

The Next Phase of Smart Home Tech: Ethical Implications of Google’s New Patent

By Jennifer Podracky | December 4, 2018

On October 30, 2018, Google filed a new patent for an extension of their Google smart home technology, titled “SMART-AUTOMATION SYSTEM THAT SUGGESTS OR AUTOMATICALLY (sp) IMPLEMENTS SELECTED HOUSEHOLD POLICIES BASED ON SENSED OBSERVATIONS.” In summary, this patent proposes a system by which an interconnected smart home system can detect the status and activities of persons in the household via audio or visual cues, and then implement a home-wide policy across the rest of the system based on rules that Google users have set up.

In English, what that means is that Google will use microphones or cameras built into its smart devices to see who is home and what they are doing, and then decide how to activate or deactivate other smart devices as a result. For example, it may hear the sound of a child playing in his or her room, and then determine via cameras and microphones in other devices that there is no one else in the home. Based on earlier observation of the home, Google already knows that two adults and one child live in the home full-time; by adding this information to what it knows about the home’s current state, Google can infer that the child is home alone and unsupervised. From there, Google can do a multitude of things: notify the owner(s) of the Google smart home account that the child is unsupervised, lock any unsecured smart locks, turn off smart lights in the front rooms of the home, disable the smart television in the child’s room, and so on. The actions that Google takes will depend on the policies and rules that the smart home’s users have configured. Google can also suggest new policies to the smart home users based on the home status it has inferred; if it determines that a child is home alone and no policies have been configured for this situation, it can suggest the actions above.
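Conceptually, the patent describes a rules engine that maps sensed household state to actions. The toy sketch below, with entirely hypothetical class and field names that are not Google’s API, illustrates what one such “child home alone” policy might look like:

```python
from dataclasses import dataclass

@dataclass
class HomeState:
    """Toy snapshot of what the sensors think is happening (hypothetical fields)."""
    occupants: set[str]   # who appears to be home
    child_active: bool    # child-like sounds detected

def child_home_alone_policy(state: HomeState) -> list[str]:
    """If only the child seems to be home, return actions like those the patent describes."""
    adults_present = state.occupants & {"adult_1", "adult_2"}
    if state.child_active and not adults_present:
        return [
            "notify account owners",
            "lock unsecured smart locks",
            "turn off front-room lights",
            "disable TV in child's room",
        ]
    return []

# Example: sensors detect the child playing and no adults at home.
print(child_home_alone_policy(HomeState(occupants={"child_1"}, child_active=True)))
```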


Ethical Implications for the Smart Home Consumer

There are a couple of key components of this patent that could be cause for alarm with concerns to privacy.

What Google Can See

Thus far, commercially released Google smart home devices (specifically the Google Home product line) have not included cameras. Today’s products include microphones, and are constantly listening to all voices and sounds in the home, awaiting the “wake word” that prompts them to take some action. Google can use the data it collects from these microphones, even data not associated with device commands, to learn more about the individuals living in the home. Google devices can determine from regular household noises when different individuals usually arrive home, when they usually eat dinner, whether they cook or order out, how often they clean, and so on. By adding a camera to this product line, Google will always be both listening and watching. This means that Google won’t just know when you cook, but also see what you cook. It will also be able to see your kitchen, including what brands of kitchenware you use, how high-end your appliances are, and how often you buy fresh produce. Perhaps most alarmingly, Google will also be able to see what you look like while you’re in the kitchen. Google can then use this information to draw conclusions about your health, income, and more.

Smart Home in the Kitchen

What Google Can Learn

Additionally, Google can learn more about individuals in the home based on the policies that they choose to implement. By setting a policy that detects noise in a child’s room after 8pm, Google can infer that this child’s bedtime is 8pm and then suggest other policies related to that (e.g. restricting TV usage). By setting policies restricting channel availability to specific household members, Google can infer which TV shows and channels that specific individuals are (or aren’t) allowed to watch.

Why this matters

By watching and listening to the home, Google is amassing an incredible amount of data on both the individual(s) who purchased the smart devices and anyone else who is in the home at any time (including minors and non-consenting visitors).

What can Google do with all this data? Well, in a 2016 patent titled “PRIVACY-AWARE PERSONALIZED CONTENT FOR THE SMART HOME”, Google discusses how it could use visual clues, like the contents of your closet, to determine what kinds of clothes and brands you like, and then market related content to you. Specifically: “a sensing device 138 and/or client device may recognize a tee-shirt on a floor of the user’s closet and recognize the face on the tee-shirt to be that of Will Smith. In addition, the client device may determine from browser search history that the user has searched for Will Smith recently. Accordingly, the client device may use the object data and the search history in combination to provide a movie recommendation that displays, ‘You seem to like Will Smith. His new movie is playing in a theater near you.’” Google will use the audio and visual data that it collects to determine your likes and dislikes and market content to you accordingly.

Google can also provide the data of non-consenting individuals back to the owner of the smart home system. Suppose you’ve hired a babysitter for the evening to watch your child; Google can report back to you at the end of the night on how much time she spent with the child, what she watched on TV, and what she looked at on the internet. Google can hear if your child is whispering past their bedtime, “infer mischief” (a direct quote from the patent), and then tattle to you. Google can see and hear if your teenager is crying in their room, and then report its findings to you without their knowledge. For the record, these are all real examples listed in the patent, so Google is aware of these uses too.

As of today, these patents have not been implemented (as far as we know) as part of the commercially available Google smart home product line. However, as the product line advances, it is important that we keep the privacy and ethical concerns in mind before bringing the latest-and-greatest device into a home that is shared with others.

Is the GDPR’s Bark Bigger than its Bite?

by Zach Day on 10/21/2018

The landmark EU regulation, formally called the General Data Protection Regulation or GDPR, took effect on May 25, 2018. Among other protections, the GDPR grants “data subjects” a bundle of new rights and places increased obligations on the companies that collect and use their data. Firms were given two years’ notice to implement the changes that would bring them into compliance by May 2018.


Image Credit: https://www.itrw.net/2018/03/22/what-you-need-to-know-about-general-data-protection-regulation-gdpr/

I don’t think I’m reaching too far to make this claim: some for-profit enterprises won’t do the right thing just because it’s the right thing, especially when the right thing is costly. Do the EU member countries’ respective Data Protection Authorities, also called DPAs, have enforcement tools that are powerful enough to motivate firms to invest in the systems and processes required for compliance?

Let’s compare two primary enforcement tools/consequences, monetary fines and bad press coverage.

Monetary Fines

When the UK Information Commissioner’s Office released its findings on Facebook’s role in the Cambridge Analytica scandal, the fine was capped at £500,000, or about $661,000. This is because Facebook’s transgressions occurred before the GDPR took effect and were therefore subject to the UK Data Protection Act of 1998, the UK’s GDPR precursor, which specifies a maximum administrative fine of £500,000. How painful do you think a sub-million dollar fine is for a company that generated $40B of revenue in 2017?

The GDPR vastly increases the potential monetary fine to a maximum of €20 million or 4% of the company’s global annual turnover, whichever is greater. For Facebook, this would have amounted to a fine of roughly $1.6B. That’s more like it.

But how effectively can EU countries enforce the GDPR? GDPR enforcement occurs at the national level, with each member country possessing its own Data Protection Authority. Each nation’s DPA has full enforcement discretion. Because of this, there will inevitably be variation in enforcement trends from country to country. Countries like Germany, with a strong cultural value of protecting individual privacy, may enforce the GDPR with far more gusto than a country like Malta or Cyprus.

Monetary fines are not going to be the go-to tool for every enforcement case brought under the GDPR. DPAs have vast investigative powers, such as carrying out audits, obtaining access to stored personal data, and accessing the facilities of the data controller or processor, as well as the power to issue warnings, reprimands, orders, and bans on processing. It’s likely that these methods will be used with much more frequency. The first few cases will be anomalies, though, since (a) media outlets are chomping at the bit to report on the first enforcement actions taken under the GDPR and (b) DPAs will be trying to send a message.

PR Damage

Which do you think stung Facebook more: a $661,000 fine, or the front page of every international media outlet running the story for hundreds of millions of readers to see (imagine how many of them this was the last straw for, causing them to deactivate their Facebook accounts)? I would argue that the most powerful tool in the GDPR regulator’s toolbox is the bad press associated with a violation brought against a company, especially in the early years of the regulation while the topic is still fraught.

Mark Zuckerberg testifying before a joint hearing of the Senate Judiciary and Senate Commerce Committees, April 10, 2018. Image Credit: https://variety.com/2018/digital/news/zuckerberg-congress-testimony-1202749461/

A report published in July by TrustArc, outlining estimated GDPR compliance rates across the US, UK, and EU, noted that 57% of firms are motivated to comply with the GDPR by customer satisfaction, whereas only 39% are motivated by fines. Of course a small business with 100 employees in a suburb of London is chiefly concerned with a potential €20 million fine; it would simply be out of business. On the other hand, large Silicon Valley tech firms, with armies of experienced attorneys (Facebook’s attorneys have plenty of litigation experience in this area by now), have much more to lose from more bad press than from a fine of any amount allowed under the GDPR.

Path Forward

Firms are going to pursue any path that leads to maximum revenue growth and profitability, even if it means operating in ethical or legal grey areas. If GDPR regulators plan to effectively motivate compliance, they need to focus on the most sensitive pressure points. For some companies, that is the threat of a monetary penalty. For the tech behemoths, it is the threat of another negative front-page headline. Regulators will be at a strategic disadvantage if they don’t acknowledge this fact and master their PR strategies.

Ethical Issues of the Healthcare Internet of Things

By Osmar Coronel  | October 21, 2018


Tracking vital signs on the fly

Very likely you are already using an Internet of Things (IoT) product for your healthcare.


Our new connected world

IoT devices are small computing objects that constantly collect information and send it to the cloud, or turn something on or off automatically. This blog focuses mainly on the ethical risks of IoT devices used to improve your health.

IoT devices are predicted to expand in the healthcare industry due to the many benefits they provide. However, the Fair Information Practice Principles (FIPPs) might not be able to protect consumers against all the new ethical risks associated with upcoming healthcare IoT device applications.

IoT devices have several applications in healthcare. For instance, insulin pumps and blood-pressure cuffs can connect to mobile apps that track and monitor a patient’s readings. The power of this technology allows people to take control of their health. With IoT devices, people can also be more engaged with their health: a patient with a connected insulin pump can better track their blood glucose levels, which gives them more control over their diabetes. IoT devices applied to healthcare can monitor and collect information such as heart rate and skin temperature. The data captured from the consumer can be transmitted, stored, and analyzed, which creates opportunities for research.

The use of IoT devices is expanding in the medical area. According to a MarketWatch article, the healthcare IoT market is expected to be worth $158 billion by 2022. The 2018 Consumer Electronics Show (CES) showcased several companies’ IoT products created to diagnose, monitor and treat illnesses.

Under the lens of the Federal Trade Commission (“FTC”), the FIPPs focus on notice, access, accuracy, data minimization, security, and accountability. The recommendations most relevant to IoT are security, data minimization, notice, and choice.

Following the FIPPs’ security recommendation, companies should implement “security by design” from the very beginning. They should also train their employees and retain service providers that are able to enforce security in their services.


New Healthcare

Another risk of healthcare IoT devices is that they collect a large amount of consumer data over a long period. The data minimization principle proposes that companies limit the data they collect to what is needed, and keep it only for a limited time. Companies should weigh their business needs and develop policies and practices that impose reasonable limits on the collection and retention of consumer data. Security and data minimization have the most explicit initiatives to help minimize the ethical risks of IoT devices.

On the other hand, notice and choice could be a challenge. In general, there is a high risk that IoT companies do not provide notice or choice to the customer. Providing notice or choice is challenging because IoT devices are used in everyday life and typically lack a user interface. Furthermore, some people think that the benefits of IoT devices outweigh the cost of not giving the consumer notice and choice.

It is challenging to provide a choice when there is no user interface. However, according to the FIPPs, there are still suitable alternatives, such as video tutorials and QR codes printed on the devices. Also, in many cases the data use may fall within consumers’ expectations, which means that not every data collection requires the consumer to consent to it explicitly. Companies should implement opt-in choices at the point of sale, when the consumer is acquiring the device, in easy-to-understand language.

New technological advances in healthcare IoT devices offer a large number of benefits, and they will expand considerably in the healthcare sector. Nonetheless, they will require careful implementation. The expansion of healthcare IoT will come with a surge of new ethical problems and conflicts.

Unknown Knowns

by Anonymous on 10/21/2018


Image Credit: https://www.azquotes.com/quote/254214
Donald Rumsfeld during Department of Defense News Briefing, archive.defense.gov. February 12, 2002.

The taxonomy of knowledge laid out by Rumsfeld in his much-quoted news briefing conspicuously omits a fourth category: unknown knowns. In his critique of Rumsfeld’s analysis, the philosopher Slavoj Žižek defines the unknown knowns as “the disavowed beliefs and suppositions we are not even aware of adhering to ourselves, but which nonetheless determine our acts and feelings.” While this may seem like the realm of psychoanalysis, it’s a term that could also be applied to two of the most important topics in machine learning today: bias and interpretability.

The battle against bias, especially illegal biases that discriminate against protected classes, is a strong focus for both academia and industry. Simply testing an algorithm’s outputs for statistically significant differences across categories of people can reveal things about the decision-making process that were previously unknown, flipping them from the “unknown known” state to “known known.” More advanced interpretability tools, like LIME, are able to reveal even more subtle relationships between inputs and outputs.
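Such a check can be as simple as a contingency-table test on an algorithm’s outputs. The sketch below, using made-up approval counts for two groups, shows the idea with a chi-square test:

```python
from scipy.stats import chi2_contingency

# Made-up counts of an algorithm's decisions, broken out by a protected attribute:
# rows = groups, columns = [approved, denied].
decisions = [
    [480, 520],   # group A
    [410, 590],   # group B
]

chi2, p_value, dof, expected = chi2_contingency(decisions)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
# A small p-value flags a statistically detectable difference in approval rates:
# a "known known" that was previously unexamined.
```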

While swaths of “unknown knowns” are being converted to “known knowns” with new techniques and attention, there is still a huge amount of “unknown knowns” that we will miss forever. Explicitly called-out protected classes are becoming easier to measure, but it is rare to check all possible intersections of protected classes. For example, there may be no measurable bias in some task when comparing genders or races separately, yet there may be bias when looking at the combinations. The fundamental nature of intersections is that their populations become smaller as more dimensions are considered, so the statistical tests become less powerful and it is harder for automated tools to identify bias with certainty.


Image Credit: https://www.ywboston.org/2017/03/what-is-intersectionality-and-what-does-it-have-to-do-with-me/

There are also many sub-classes that we don’t even know to look for bias against and have to rely on chance to discover. For example, in 2012 Target was called out for predicting pregnancies based on shopping patterns. Its marketing analytics team had a hypothesis that they could target pregnant women and made the explicit choice to single out this population, but with modern unsupervised learning techniques it could just as easily have been an automatically deployed campaign where no human had ever seen the description of the target audience.

“Pregnant women” as a category is easy to describe, and concerns about such targeting readily stir up controversy and change corporate behaviour, but more niche groups that algorithms may be biased against may never be noticed. It is also troubling that an unsupervised learning algorithm may discover classes that have no obvious description yet, but that would be controversial if given a name.

So what can be done? It may seem like a contradiction to try to address unknown knowns, given that they are unknown, but new interpretability tools are changing what can be known. Practitioners could also start dedicating more of their model-validation time to exploring the full set of combinations of protected classes, rooting out the subtle biases that might be missed by separate analysis of each category. A less technical but more ambitious solution is for organizations and practitioners to start sharing the biases they have discovered in their models and to contribute to some sort of central repository that others can learn from.