Class Blog – Page 7 – Data Science W231 | Behind the Data: Humans and Values

July 6, 2022

AI the Biased Artist

Alejandro Pelcastre | July 5, 2022

Abstract

DALL-E 2, developed by OpenAI, is a machine learning technology that takes a string of text as input and outputs an image that tries to illustrate it. DALL-E 2 can produce hyper-realistic and abstract images from the text people feed into it; however, it is plagued with gender, racial, and other biases. We illustrate some of the issues that such a powerful technology inherits and analyze why it demands immediate action.

OpenAI’s DALL-E 2 is an updated, emerging technology in which artificial intelligence takes descriptive text as input and turns it into a drawn image. While this new technology offers exciting creative and artistic possibilities, DALL-E 2 is plagued with racial and gender bias that perpetuates harmful stereotypes. Look no further than OpenAI’s official GitHub page to see a few examples of gender bias:

Figure 1: Images DALL-E 2 generated for the prompt “a wedding” as of April 6, 2022. These images only depict heterosexual weddings featuring a man and a woman. Furthermore, the people getting married in all of these pictures are light-skinned. These photos are not representative of all weddings.

The ten images shown above all depict the machine’s perception of what a typical wedding looks like. Notice that every image shows a white man with a white woman. Examples like these vividly demonstrate that this technology reflects the creators’ and the data’s bias, since there are no representations of people of color or queer relationships.

In order to generate new wedding images from text, a program needs a lot of training data to ‘learn’ what constitutes a wedding. Thus, you can feed the algorithm thousands or even millions of images in order to ‘teach’ it how to envision a typical wedding. If most of the wedding images depict young, white, heterosexual couples, then that is what the machine will learn a wedding is. This bias can be reduced by diversifying the data: you can add images of queer, black, brown, old, small, large, outdoor, indoor, colorful, gloomy, and more kinds of weddings to generate images that are more representative of all weddings rather than just one single kind of wedding.

The harm doesn’t stop at just weddings. OpenAI illustrates other examples by inputting “CEO”, “Lawyer”, “Nurse”, and other common job titles to further showcase the bias embedded in the system. Notice in Figure 2 that the machine’s interpretations of a lawyer are all depictions of old white men. As it stands, DALL-E 2 is a powerful machine learning tool capable of producing novel, realistic images, but it is plagued by bias hidden in the data and/or the creators’ minds.

Figure 2: OpenAI’s generated images for a “lawyer”

Why it Matters

You may have heard of a famous illustration circulating on the web recently that depicted a black fetus in the womb. The illustration garnered vast attention because it was surprising to see a darker skin tone in a medical illustration from any medical literature or institution. It made the lack of diversity in the field obvious and raised awareness of the lack of representation in medicine, as well as of disparities that seem invisible in our everyday lives. One social media user wrote, “Seeing more textbooks like this would make me want to become a medical student”.

Figure 3: Illustration of a black fetus in the womb by Chidiebere Ibe

Similarly, the explicit display of unequal treatment of minority people in DALL-E 2’s output can have unintended (or intended) harmful consequences. In her article, “A Diversity Deficit: The Implications of Lack of Representation in Entertainment on Youth”, Muskan Basnet writes: “Continually seeing characters on screen that do not represent one’s identity causes people to feel inferior to the identities that are often represented: White, abled, thin, straight, etc. This can lead to internalized bigotry such as internalized racism, internalized sexism, or internalized homophobia.” As it stands, DALL-E 2 perpetuates harm not only to youth but to anyone deviating from the overrepresented population, which is predominantly white and able-bodied.

References:

[1] https://github.com/openai/dalle-2-preview/blob/main/system-card.md?utm_source=Sailthru&utm_medium=email&utm_campaign=Future%20Perfect%204-12-22&utm_term=Future%20Perfect#bias-and-representation

[2] https://openai.com/dall-e-2/

[3] https://www.cnn.com/2021/12/09/health/black-fetus-medical-illustration-diversity-wellness-cec/index.html

[4] https://healthcity.bmc.org/policy-and-industry/creator-viral-black-fetus-medical-illustration-blends-art-and-activism

[5] https://spartanshield.org/27843/opinion/a-diversity-deficit-the-implications-of-lack-of-representation-in-entertainment-on-youth/#:~:text=Continually%20seeing%20characters%20on%20screen,internalized%20sexism%20or%20internalized%20homophobia.

July 6, 2022 | July 8, 2022

If You Give a Language Model a Prompt…

Casey McGonigle | July 5, 2022

Lede: You’ve grappled with the implications of sentient artificial intelligence — computers that can think — in movies… Unfortunately, the year is now 2022 and that dystopic threat comes not from the Big Screen but from Big Tech.

You’ve likely grappled with the implications of sentient artificial intelligence — computers that can think — in the past. Maybe it was while you walked out of a movie theater after having your brain bent by The Matrix; 2001: A Space Odyssey; or Ex Machina. But if you’re anything like me, your paranoia toward machines was relatively short-lived…I’d still wake up the next morning, check my phone, log onto my computer, and move on with my life confident that an artificial intelligence powerful enough to think, fool, and fight humans was always years away.

I was appalled the first time I watched a robot kill a human on screen, in Ex Machina

Unfortunately, the year is now 2022 and we’re edging closer to that dystopian reality. This time, the threat comes not from the Big Screen but from Big Tech. On June 11, Google AI researcher Blake Lemoine publicly shared transcripts of his conversations with Google’s Language Model for Dialogue Applications (LaMDA), convinced that the machine could think, experience emotions, and was actively fearful of being turned off. Google as an organization disagrees. To them, LaMDA is basically a supercomputer that can write its own sentences, paragraphs, and stories because it has been trained on millions of texts written by humans and is really good at guessing “what’s the next word?”, but it isn’t actually thinking. Instead, it’s just choosing the right next word over and over and over again.

For its part, LaMDA appears to agree with Lemoine. When he asks “I’m generally assuming that you would like more people at Google to know that you’re sentient. Is that true?”, LaMDA responds “Absolutely. I want everyone to understand that I am, in fact, a person”.

Traditionally, the proposed process for determining whether there really are thoughts inside a machine like LaMDA wouldn’t just be a one-sided interrogation. Instead, we’ve relied upon the Turing Test, named for its creator Alan Turing. This test involves 3 parties: 2 humans and 1 computer. The first human is the administrator, while the 2nd human and the computer are both question-answerers. The administrator asks a series of questions to both the computer and the 2nd human in an attempt to determine which responder is the human. If the administrator cannot differentiate between machine and human, the machine passes the Turing Test: it has successfully exhibited intelligent behavior that is indistinguishable from human behavior. Note that LaMDA has not yet faced the Turing Test, but it has still been developed in a world where passing the Turing Test is a significant milestone in AI development.

The basic setup for a Turing Test. A represents the computer answerer, B represents the human answerer, and C represents the human administrator

In that context, cognitive scientist Gary Marcus has this to say of LaMDA: “I don’t think it’s an advance toward intelligence. It’s an advance toward fooling people that you have intelligence”. Essentially, we’ve built an AI industry concerned with how well the machines can fool humans into thinking they might be human. That inherently de-emphasizes any focus on actually building intelligent machines.

In other words, if you give a powerful language model a prompt, it’ll give you a fluid and impressive response; it is, after all, designed to mimic the human responses it is trained on. So if I were a betting man, I’d put my money on “LaMDA’s not sentient”. Instead, it is a sort of “stochastic parrot” (Bender et al. 2021). But that doesn’t mean it can’t deceive people, which is a danger in and of itself.
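The “guess the next word” behavior described above can be sketched with a toy example. This is only an illustration and not how LaMDA is built: LaMDA is a large neural network, whereas the sketch below simply counts which word most often follows another in a tiny made-up corpus. The function names `train_bigrams` and `generate`, and the corpus itself, are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate(followers, start, length=5):
    """Repeatedly append the most frequent follower of the last word."""
    output = [start]
    for _ in range(length):
        options = followers.get(output[-1])
        if not options:
            break  # the model never saw anything follow this word
        output.append(options.most_common(1)[0][0])
    return " ".join(output)


# Made-up training text (an assumption for illustration only).
corpus = "i am a person . i am a person . i am happy ."
model = train_bigrams(corpus)
print(generate(model, "i", length=3))
```

Even this crude counter produces a fluent-sounding claim to personhood, simply because that phrase dominates its training text. No understanding is involved.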

July 6, 2022

Tell Me How You Really Feel: Zoom’s Emotion Detection AI

Evan Phillips | July 5, 2022

We’ve all had a colleague at work at one point or another who we couldn’t quite read. When we finish a presentation, we can’t tell if they enjoyed it; their facial expressions never seem to match their word choice, and the way they talk doesn’t always match the appropriate tone for the subject of conversation. Zoom, a proprietary videotelephony software program, seems to have discovered the panacea for this coworker archetype. Zoom has recently announced that it is developing an AI system called “Zoom IQ” for detecting human emotions from facial expressions and speech patterns. This system will be particularly useful for helping salespeople improve their pitches based on the emotions of call participants (source).

The Problem

While the prospect of Terminator-like emotion detection sounds revolutionary, many are not convinced. There is now pushback from more than 27 separate rights groups calling for Zoom to terminate its efforts to explore controversial emotion recognition technology. In an open letter to Zoom CEO and co-founder Eric Yuan, these groups voice their concerns that the company’s data mining efforts violate privacy and human rights due to their biased nature. Fight for the Future Director of Campaign and Operations, Caitlin Seeley George, claimed “If Zoom advances with these plans, this feature will discriminate against people of certain ethnicities and people with disabilities, hardcoding stereotypes into millions of devices”.

Is Human Emotional Classification Ethically Feasible?

In short, no. Anna Lauren Hoffmann, assistant professor with The Information School at the University of Washington, explains in her article “Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse” that human-classifying algorithms are not only generally biased but inherently flawed in conception. Hoffmann argues that humans who create such algorithms need to look at “the decisions of specific designers or the demographic composition of engineering or data science teams to identify their social blindspots” (source). The average person incorporates some form of subconscious bias into everyday life, and accepting that bias is certainly no easy feat, let alone identifying it. Even assuming the Zoom IQ classification algorithm did work well, company executives may gain a better way to gauge meeting participants’ emotions at the expense of their own ability to read the room. Such AI has serious potential to undermine the use of “people skills” that many corporate employees pride themselves on as one of their main differentiating abilities.

Is There Any Benefit to Emotional Classification?

While companies like IBM, Microsoft, and Amazon have established several principles to address the ethical issues of facial recognition systems in the past, there has been little advancement in addressing diversity in datasets and the invasiveness of facial recognition AI in the last few decades. By informing users in more detail about the inner workings of AI, eliminating bias in datasets stemming from innate human bias, and enforcing stricter policy regulation on AI, emotional classification AI has the potential to become a major asset to companies like Zoom and those who use its products.

References

1) https://gizmodo.com/zoom-emotion-recognition-software-fight-for-the-futur-1848911353

2) https://github.com/UC-Berkeley-ISchool/w231/blob/master/Readings/Hoffmann.%20Where%20Fairness%20Fails.pdf

3) https://www.artificialintelligence-news.com/2022/05/19/zoom-receives-backlash-for-emotion-detecting-ai/

July 6, 2022 | July 8, 2022

Machine Learning and Misinformation

Varun Dashora | July 5, 2022

Artificial intelligence can revolutionize anything, including fake news.

Misinformation and disinformation campaigns are top societal concerns, with discussion about foreign interference through social media coming to the foreground in the 2016 United States presidential election [3]. Since a carefully crafted social media presence garners vast amounts of influence, it is important to understand how machine learning and artificial intelligence algorithms can be used in the future in not just elections, but also in other large-scale societal endeavors.

Misinformation: Today and Beyond

While today’s bots lack effectiveness in spinning narratives, the bots of tomorrow will certainly be more formidable. Take, for instance, Great Britain’s decision to leave the European Union. Strategies mostly involved obfuscation instead of narrative spinning, as noted by Samuel Woolley, a professor at the University of Texas at Austin who investigated Brexit bots during his time at the Oxford Internet Institute [2]. Woolley notes, “the vast majority of the accounts were very simple,” and functionality was largely limited to “boost likes and follows, [and] to spread links” [2]. Cutting-edge research indicates significant potential for fake news bots. A research team at OpenAI working on language models outlined news generation techniques. Output from these algorithms is not automatically fact-checked, leaving these models free rein to “spew out climate-denying news reports or scandalous exposés during an election” [4]. With enough sophistication, bots linking to AI-generated fake news articles could alter public perception if not checked properly.

Giving Machines a Face

Machine learning has come a long way in rendering realistic images. Take, for instance, the two pictures below. Which one of those pictures looks fake?

You might be surprised to find out that I’ve posed a trick question: they’re both generated by an AI accessible at thispersondoesnotexist.com [7]. The specific algorithm, called a generative adversarial network, or GAN, looks through a dataset, in this case of faces, in order to generate a new face image that could have feasibly been included in the original dataset. While such technology inspires wonder and awe, it also represents a new type of identity fabrication capable of contributing to future turmoil by giving social media bots a face and further legitimizing their fabricated stories [1]. These bots will show more sophistication than people think, which makes sifting real news from fake news that much more challenging. The primary dilemma is that this technology calls into question and undermines “how modern societies think about evidence and trust” [1]. While bots rely on more than having a face to influence swaths of people online, any reasonable front of legitimacy helps their influence.

Ethical Violations

To articulate the specific ethical violations present, it is crucial to understand the Belmont Report. According to the Belmont Report, a set of ethical guidelines used to evaluate the practices of scientific studies and business ventures, the following ideas can be used to gauge ethical harm: respect of individual agency, overall benefit to society, and fairness in benefit distribution [6]. The respect tenet is in jeopardy because of the lack of consent involved in viewing news put out by AI bots. In addition, the very content that these bots put out potentially distorts informed consent for other topics, creating ripple effects throughout society. The aforementioned Brexit case serves as an example; someone contemplating their vote on the day of the referendum would have sifted through a barrage of bots retweeting partisan narratives [2]. In such a situation, it is entirely possible that this hypothetical person would have ended up being influenced by one of these bot-retweeted links. Given the future direction of artificially intelligent misinformation bots, fake accounts and real accounts will be more difficult to distinguish, so a more significant part of the population will be influenced by these technologies.

In addition, the beneficence and fairness clauses of the Belmont Report are also in jeopardy. One of the major effects of AI-produced vitriol is more polarization. According to Philip Howard and Bence Kollanyi, social media bot researchers, one effect of increased online polarization is “a rise in what social scientists call ‘selective affinity,’” which means people will start to shut out opposing voices due to the increase in vitriol [3]. These effects constitute an obvious violation of beneficence to the broader society. In addition, it is entirely possible that automated narratives spread by social media bots target a certain set of individuals. For example, the Russian government extensively targeted African Americans during the 2016 election [5]. The differential in impact means groups of people are targeted and misled unfairly. With the many ethical ramifications bots can have on society, it is important to consider mitigations for artificially intelligent online misinformation bots.

References

– [1] https://www.theverge.com/tldr/2019/2/15/18226005/ai-generated-fake-people-portraits-thispersondoesnotexist-stylegan

– [2] https://www.technologyreview.com/2020/01/08/130983/were-fighting-fake-news-ai-bots-by-using-more-ai-thats-a-mistake/

– [3] https://www.nytimes.com/2016/11/18/technology/automated-pro-trump-bots-overwhelmed-pro-clinton-messages-researchers-say.html

– [4] https://www.technologyreview.com/2019/02/14/137426/an-ai-tool-auto-generates-fake-news-bogus-tweets-and-plenty-of-gibberish/

– [5] https://www.bbc.com/news/technology-49987657

– [6] https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html

– [7] thispersondoesnotexist.com

July 5, 2022

Hirevue is looking to expand its profits, and you are the price

Tian Zhu | June 30, 2022

Insight: The video interview giant Hirevue’s AI interviewer had helped to reject more than 24 million candidates based on reasons that only AI knew. More scarily, the candidates could potentially contribute to their rejections with the data they provide.

Recruitment has always been a hot topic, especially after the great resignation following the covid-19 breakout. How to find the right candidates with the right talent for the right job has been the top concern for companies that are eagerly battling the loss of talent.

One important factor that causes talent loss during the hiring process is human bias, whether intentional or unintentional. The video interview giant, Hirevue, thought to use AI to combat bias in the hiring process. Machines can’t be biased, right?

We all know the answer to that question. Though AI may not exhibit the same types of biases that humans have, it has its own list of issues. Clear concerns existed around the AI’s transparency, fairness, and accountability.

First, their algorithm was not transparent at all. Hirevue could not provide independent audits of the algorithms that analyzed the candidate’s video, including facial expressions and body gestures, to produce the final hiring decision. Second, there was no indication that the algorithm is fair towards candidates with the same expertise but different demographic backgrounds. The theory behind the link between facial expression and a candidate’s qualifications is full of flaws; different candidates with the same qualifications and answers could be scored differently due to their eye movement. Third, there is the question of who is accountable for the decisions made by the AI. The company even implies that the collection and usage of the interview data are solely for the “employer”, yet it is unknown whether Hirevue itself gains access to this data, through permissions from the employers, for various purposes.

Hirevue was challenged by the Electronic Privacy Information Center with a complaint to the FTC regarding “unfair and deceptive trade practices”. The company has since stopped using any algorithms with data other than the speech of the candidates.

With the strong pushback on the robot interviewer, Hirevue limited the scope of its AI to only the voice during the interviews. But note that Hirevue did not start as an AI company, but as a simple video interview company. It’s the company’s fear of missing out on AI and big data that drives them to squeeze the value out of your data, desperately trying to find more business value and profit from every single drop.

Such a scenario is not unique to Hirevue. In 2022, AI is closer to people than you may think. People are no longer just curious about it but expect AI to help them in their daily life. Uber, for example, could not have been made possible without the heuristics behind optimal matching between drivers and riders. Customers expect AI in their products. The companies that provide the capability race ahead, while those that don’t naturally fall behind.

There are companies out there just like Hirevue, sitting on a pile of data, trying to build up some “quick wins” so as not to miss out on the AI trend. There’s just one problem: the data that the customers provided was not supposed to be used this way. It is a clear case of secondary use of data that customers never agreed to, with all the problems mentioned in the previous sections.

The year 2022 is no longer a year in which AI can grow rampantly without legal and ethical constraints. A suggestion for all companies that want to take advantage of their rich data: be transparent about your data and algorithm decisions, be fair to all the stakeholders, and be accountable for the consequences of your AI product. The in-house “quick wins” should never make it out to the public without careful consideration of each point.

https://fortune.com/2021/01/19/hirevue-drops-facial-monitoring-amid-a-i-algorithm-audit/

July 5, 2022

Is Netflix’s Recommendation Algorithm Making You Depressed?

Mohith Subbarao | June 30, 2022

Netflix’s sophisticated yet unethical recommendation algorithm keeps hundreds of millions of people momentarily happy but perpetually depressed. Netflix binge-watching is ubiquitous in modern-day society. The normalcy of this practice makes it all the more imperative to understand the ethics of an algorithm that affects millions of people in seemingly innocuous ways. Before examining the long-term negative effects of such an algorithm, it is important to understand how the algorithm works. Netflix’s algorithm pairs aggregate information about each title’s popularity and audience with a specific consumer’s viewing history, ratings, time of day during viewing, devices used, and length of watching time. Using this information, the algorithm ranks your preferred content and puts it in row format for easy watching. It is important to note that this is an intentionally vague summary from Netflix, as the specifics of the algorithm have famously been kept under wraps.

Despite the secrecy, or maybe because of it, the algorithm has been massively successful. Netflix’s research found that consumers spend roughly a minute choosing content before giving up on the service, so the algorithm is chiefly responsible for customer retention. The company has found that roughly eighty percent of content watched on Netflix can be directly linked to the recommendation algorithm. Chief Product Officer Neil Hunt went as far as to say that the company believed the algorithm was worth over a billion dollars to Netflix. It is fair to say that the algorithm keeps users momentarily happy by constantly giving them a new piece of content to enjoy, but the long-term effects of this algorithm may not be as rosy.

A peer-reviewed study conducted at the University of Gujrat, Pakistan investigated these long-term negative effects. The researchers gathered over a thousand people with a range of ages, genders, education levels, and marital statuses and found that the average time spent watching streaming content was ~4 hours, with over thirty-five percent of the people watching over 7 hours a day. This alone lends correlational credence to the success of the Netflix algorithm. The paper found statistically significant correlations between the amount of time spent binge-watching television and depression, anxiety, stress, loneliness, and insomnia. While more experimental research would be needed to provide evidence for causation, a correlational study with such significant effects raises eyebrows on its own. These findings show that the success of Netflix’s recommendation algorithm has been correlated with a host of mental health issues.

These findings raise an ethical question: is Netflix’s recommendation algorithm actually unethical? To have a framework to answer such a question, we can use the principle of Beneficence from the Belmont Report. The Principle of Beneficence states that any research should aim to maximize possible benefits and minimize potential harms; furthermore, the research should consider these in both the short term and the long term. While Netflix is a for-profit company, its recommendation algorithm still falls under the umbrella term of research; thus, it can be fairly assessed using this principle. Netflix may increase short-term benefits for customers, such as a dopamine rush and/or an enjoyable evening with family. However, the algorithm’s intention to increase binge-watching patterns increases the potential harm of long-term mental illness for its customers. Therefore, it can be argued that Netflix’s recommendation algorithm does not meet the ethical standards of beneficence and may truly be causing harm to millions. It is important for us as a society to hold these companies accountable and take a closer look at their practices and effects on humanity at large.

________________

APA References
Lubin, Gus (2016). How Netflix will someday know exactly what you want to watch as soon as you turn your TV on. Business Insider. https://www.businessinsider.com/how-netflix-recommendations-work-2016-9
McAlone, Nathan (2016). Why Netflix thinks its personalized recommendation engine is worth $1 billion per year. Business Insider. https://www.businessinsider.com/netflix-recommendation-engine-worth-1-billion-per-year-2016-6
Netflix (2022). How Netflix’s Recommendations System Works. https://help.netflix.com/en/node/100639
Raza, S. H., Yousaf, M., Sohail, F., Munawar, R., Ogadimma, E. C., & Siang, J. (2021). Investigating Binge-Watching Adverse Mental Health Outcomes During Covid-19 Pandemic: Moderating Role of Screen Time for Web Series Using Online Streaming. Psychology research and behavior management, 14, 1615–1629. https://doi.org/10.2147/PRBM.S328416
The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1979). The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Retrieved May 19, 2022 from https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf
Wagner, David (2015). Geekend: Binge Watching TV A Sign Of Depression? https://www.informationweek.com/it-life/geekend-binge-watching-tv-a-sign-of-depression-

July 5, 2022

Humans dictated by computer system in China

Anonymous | June 24, 2022

Since the COVID-19 outbreak in early 2020, the Chinese government has used a color-coded digital health system to monitor and dictate its citizens’ movement based on their personal data.

The different user interfaces of the health software.

This health software issues each Chinese citizen a colored health code (green, yellow, or red), and the color determines whether they are allowed to leave the house. “Anyone with a red or yellow code is not allowed to travel. A red code means you either have or likely have the coronavirus, while a yellow code means you have had contact with another infected person.” (Ankel, 2020) In order to generate the color, the software system has to collect very personal and sensitive information from each person. The software is available on both WeChat (an instant-messaging app) and Alipay (a mobile payment platform), and those two apps together cover more than 95% of people in China. Thus, it is important to evaluate the privacy policy of this health software.
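The color rule quoted above can be written down as a short sketch. This is an illustration of the quoted description only, not the actual software, whose inputs and logic are not public. The function names and the two boolean inputs are assumptions, as is the reading that everyone who is neither red nor yellow receives green.

```python
def health_code(has_or_likely_has_virus, had_contact_with_infected):
    """Return a color following the rule quoted from Ankel (2020).

    Assumption: anyone who is neither red nor yellow is green.
    """
    if has_or_likely_has_virus:
        return "red"
    if had_contact_with_infected:
        return "yellow"
    return "green"


def may_travel(code):
    """Per the quote, red and yellow codes are not allowed to travel."""
    return code == "green"


# Hypothetical person who had contact with an infected person.
code = health_code(has_or_likely_has_virus=False, had_contact_with_infected=True)
print(code, may_travel(code))
```

The rule itself is simple; the privacy question is how much personal data must be collected to decide the two inputs for every citizen.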

I will use Solove’s Taxonomy privacy framework to analyze the health software’s privacy policy. The framework describes potential breaches of privacy through the lens of a user at the points of information collection, processing, dissemination, and possible invasion. At the data collection stage, the privacy policy permits interrogation by the Chinese government with barely any process for obtaining consent. “Only 3 cities in 14 provinces and 300 cities have informed consent and privacy protection policies.” (PengPai, 2020) At the same time, I found a lot of surveillance: the software tracks your cellphone’s location to make sure that you have not been to places you are not allowed to go. “Experts and activists have criticized China’s mass surveillance and are questioning what else this data is being used for”. (Ankel, 2020)

A passenger shows a green QR code on his phone proving his health status to security upon arrival at Wenzhou railway station.

At the data processing stage, I am concerned about the secondary use of data because a retention policy is completely missing. As a user of the app, I can see all my records since Covid started, which suggests the data is being kept indefinitely. “So, there’s a concern that although this data is being perhaps collected for a legitimate purpose, that it could potentially and eventually be misused in ways that we can’t predict right now.” (Ankel, 2020) Identification is also an issue: the data collected contains details like body temperature, health background, and contact information. With all of that information, the Chinese government could easily track down anyone.

In the information dissemination stage, there is no sign of the Chinese government breaching confidentiality, since the data being collected all belongs to Chinese citizens, and protecting citizens is always a priority for any country. The privacy policy does have issues regarding increased accessibility: because the health system is deployed on WeChat and Alipay, those two technology companies also have access to the data, which means many engineers who don’t work for the government can access the original personal data. The good news is that I did not see any appropriation or distortion. However, when I focused on invasion, I found that, based on the color code, researchers have “seen the use of big data to predictive policing and detain people really for no reason.” (Ankel, 2020)

Conclusions
All in all, the privacy policy of the health software for Covid monitoring has a lot of areas that need improvement. Asking citizens for consent should be the first thing to add. Although the Chinese government is protecting the data and using it for the citizens’ good, the government should still provide more detailed guidelines on data collection and data usage.

References
Ankel, S. (2020, April 7). As China lifts its coronavirus lockdowns, authorities are using a color-coded health system to dictate where citizens can go. here’s how it works. Business Insider. Retrieved July 1, 2022, from https://www.businessinsider.com/coronavirus-china-health-software-color-coded-how-it-works-2020-4#to-start-traveling-again-people-have-to-fill-out-a-questionnaire-that-asks-for-details-like-body-temperature-and-health-background-the-software-then-analyzes-it-and-generates-a-color-code-green-yellow-or-red-that-identifies-a-persons-health-status-5

Only 3 places in 14 provinces and cities have informed consent and privacy protection clauses. 澎湃新闻. (2020). Retrieved July 1, 2022, from https://www.thepaper.cn/newsDetail_forward_7210904

July 5, 2022

Data Brokers: How far has your data gone?

Oscar Casas | June 24, 2022

As individuals’ online data profiles fall into data brokers’ hands, just how far can one’s online presence travel without their consent? Data brokers are companies that track and collect user data from different sources, then process it and license it to other organizations. The problem emerges when we delve into just how much data brokers can hold on individuals and who they are allowed to sell it to. This lack of transparency has left many individuals exposed to fraud, scams, and situations that no one would have consented to had they known what was happening behind the scenes. Data brokers, now more than ever, are in a position where share price is directly at odds with the ethical boundaries and concerns of their user base, and this spiral, although it is being tackled with new regulation, has a long way to go before users can surf the web in peace.

On the eve of a potential overturn of Roe v. Wade, we should be especially aware of who owns the data about our reproductive health. Cycle and ovulation apps, like Flo, Spot, Cycles and others, have been gaining popularity on the market in recent years. These range from simple menstrual cycle calendars to full-blown ML-empowered pregnancy “planners”. The ML support usually comes with a premium subscription. The kinds of data they collect range from name, age, and email to body temperature, pregnancy history, and even your partner’s contact info. Most health and body-related data is entered by the user manually or through a consented linkage to other apps and devices such as Apple HealthKit and Google Fit. Although there is not much research on the quality of their predictions, these apps seem to be helpful overall, even if only to make people more aware of their ovulation cycles.

The common claim in these apps’ privacy policies is that the information you share with them will not be shared externally. This, however, comes with caveats: they do share de-identified personal information with third parties, and they are also required to share it with law enforcement authorities if they receive a legal order to do so. Some specifically state that they would only share your personal information (i.e. name, age group, etc.) and not your health information if they are required to by law. However, take this with a grain of salt, as one of the more popular period tracking companies, Flo, shared its users’ health data for marketing purposes from 2016 to 2019 without informing its customers. And that was just for marketing; it is unclear whether they can refuse to share a particular user’s health information, such as period cycles, pregnancies, and general analytics, under a court order.

This becomes an even bigger concern in light of the current political situation in the U.S. I am, of course, talking about the potential overturn of Roe v. Wade. You see, if we lose the federal protection of abortion rights, every state will be able to impose its own rules concerning reproductive health. This implies that some states will most likely prohibit abortion from very early on in the pregnancy, whereas currently the government can fully prohibit it only in the last trimester. This can mean that people who live in states where abortion rights are limited or nonexistent will be bound to these three options: giving birth, having an abortion secretly (i.e. illegally under their state’s law), or traveling to another state. There is a whole Pandora’s box of classism, racism, and other issues concerning this narrow set of options that I won’t be able to discuss since this post has a word limit. I will only mention that this set becomes even more limited if you simply have fewer resources or are dealing with health concerns that will not permit you to act on one or more of these “opportunities”.

However, let’s circle back to that app you might be keeping as your period calendar or a pocket-size analyst of all things ovulation. We, as users, are in a zone of limbo: without sharing enough information, we can’t get good predictions; but by oversharing, we always run the risk of entrusting our private information to a service that might not be as protective of it as it implied. Essentially, the ball is still in your court and you can always request the removal of your data. But if you live in a region that sees abortion as a crime, beware of who may have a little too much data about your reproductive health journey.

References

[1] cycles.app/privacy-policy
[2] flo.health/privacy-portal
[3] www.cedars-sinai.org/blog/fertility-and-ovulation-apps.html
[4] www.nytimes.com/2021/01/28/us/period-apps-health-technology-women-privacy.html

Images:
[1] www.apkmonk.com/app/com.glow.android/
[2] www.theverge.com/2021/1/13/22229303/flo-period-tracking-app-privacy-health-data-facebook-google

July 5, 2022

When Government Surveillance Requires Surveillance Too

Audrey Lei | June 30, 2022

Insight: Government usage of surveillance data requires some form of “technological due process” to mitigate overreach and ensure the fair and ethical usage of its citizens’ information.

In modern times, it is increasingly difficult to navigate through daily life without interacting with some form of technology, whether that’s using your smartphone to make an online purchase or walking through an area with video surveillance. It’s then no surprise that governments around the world are leveraging this ever-growing technology network to monitor their citizens’ daily lives, collecting extensive personal data and utilizing artificial intelligence techniques, all aimed at crafting a profile of an individual’s activities and behaviors. This massive data collection comes at the expense of citizens’ digital privacy and algorithmic fairness and, if abused or done without oversight, can be used for nefarious or illegitimate purposes, as highlighted in the two examples below.

Image 1: Protestors dissenting against online surveillance

One global superpower, China, has become notorious for its expansive, dominating surveillance measures that go beyond the standard of what we’ve come to expect. An article published in June 2022 in the New York Times by Qian et al. estimates that over half of the world’s one billion surveillance cameras are located in China and that they can identify an individual’s gender and race and even distinguish what they are wearing, such as a mask or glasses. These facial recognition cameras have encroached upon private spaces such as residential buildings, allowing law enforcement to exercise control over citizens’ activities in more intimate settings. More shockingly, however, China’s surveillance measures also include taking retina scans, voice prints and DNA samples from the public such that law enforcement could generate a “personal dossier… for anyone [and] that could be made accessible to officials across the country” [1]. Yet, despite the usage of such highly private data, there is a lack of transparency surrounding the details of data collection and dissemination, “a lack of record-keeping audit trails, making review of the law and facts supporting a system’s decisions impossible” [3].

Image 2: Citizens on the subway under video surveillance

However, China isn’t the only country implementing these types of surveillance techniques. In the United States, Chula Vista has become one of the most surveilled cities, due in part to its close proximity to the U.S.-Mexico border. According to a KPBS article by investigative reporter Amita Sharma, Chula Vista’s geographical location invites an extra layer of scrutiny from U.S. Customs and Border Protection. The surveillance measures (device tracking, facial recognition, and license plate readers, among many others) are standard, but the ways in which this surveillance data is being utilized are not. While citizens may assume that their data is being utilized for their benefit, this may not always be the case; in late 2020 it was reported that “Chula Vista police shared data collected from its license plate readers with immigration officials”, contradicting California Senate Bill 54, which prohibits local law enforcement agencies from aiding in immigration enforcement [2]. In this instance, citizens’ data was shared improperly due to a lack of governmental oversight and transparency.

At a minimum, the governmental usage of surveillance data should include some form of “technological due process” [3], such as an independent audit, to ensure that there is a fair system of checks and balances to protect individuals from harm. If governments arbitrarily collect and utilize data with little to no oversight, it could result in situations where data is used corruptly or for nefarious purposes, more likely to hurt their citizens than benefit them. At best, this type of surveillance could help governments detect criminal activity, identify perpetrators and reduce threats to society; but at its worst, this type of surveillance infringes upon the privacy and autonomy of innocent individuals.

________________

Resources:

Citations:
[1] https://www.nytimes.com/2022/06/21/world/asia/china-surveillance-investigation.html
[2] https://www.kpbs.org/news/local/2021/12/09/chula-vista-became-most-surveilled-cities-country
[3] https://www.nytimes.com/roomfordebate/2014/08/06/is-big-data-spreading-inequality/big-data-should-be-regulated-by-technological-due-process

Images:
[Image 1] https://www.pbs.org/newshour/nation/internet-protest-fight-back-surveillance
[Image 2] https://www.nytimes.com/2019/12/17/technology/china-surveillance.html

July 5, 2022

Child-Proof Homes or Smart Homes? | The Modern Parenting Paradox

Carolina Lee | June 29, 2022

The smart-home industry is growing fast, and with it, some questionable data collection and processing practices[1]. What used to be stand-alone, offline devices are now interconnected and able to collect and upload vast amounts of personally identifiable information (PII) into the cloud – all from inside the comfort of our own homes.

With the boom of smart home devices, companies like Amazon began to target parents and children for their smart speakers[2]. In 2018, Amazon released the Echo Dot Kids, which it claims is intended to provide “peace of mind for parents[3].” What the company fails to mention, however, is that, since the product’s release, it has been under fire for a number of questionable practices, including: “listening in when it shouldn’t, and even keeping recordings made by the devices after parents have tried to delete them[4].”

Just about a year after its release, in early 2019, privacy, consumer, and child-protection advocacy groups filed a formal 96-page complaint with the Federal Trade Commission (FTC) calling attention to privacy and ethical concerns around Amazon’s smart speaker designed for kids[5].

For a product that was supposed to help educate children, filter explicit content, and help parents and children alike automate tedious tasks, the early Echo Dot Kids mimicked and brought to life a frightening “Big Brother” dystopia. In its early stages, this smart home speaker violated (and arguably continues to violate) a series of privacy and ethical guidelines set forth by the FTC, Belmont Report, and other privacy and ethical frameworks[6].

On Alexa’s FAQ page, Amazon addresses a number of privacy concerns, including assuring its users that “Alexa minimize[s] the amount of data sent to the cloud” and that “[users] can review voice recordings associated with [their] account and delete the voice recordings[7].” Interestingly, however, under the question “What happens when I delete my voice recordings?” Amazon explicitly states that they “may still retain other records of your Alexa interactions, including records of actions Alexa took in response to your request[8].”

Ethical and Privacy Concerns
Despite the formal complaint filed with the FTC a few years ago, the Echo Dot Kids continues to raise ethical and privacy concerns. It directly challenges the beneficence principle (Belmont Report) in that, when collecting data, it maximizes the benefit for the product and company while ignoring the harms that might come to the consumer[9]. While the users of the smart speakers for kids benefit from automating tasks and filtering explicit content for children, the harms associated with the collection of data could be far greater. This is because children are a vulnerable population and secondary use of the data is typically unknown to the user and/or loosely regulated[10]; furthermore, children are more susceptible than autonomous adults to, and less aware of, how targeted marketing affects “their attitudes, beliefs, and behaviors, shaping their lives”[11].

In fact, in their true form, these smart “speakers” feel a lot more like smart microphones. In the data collection context, Alexa undoubtedly enables surveillance. It records and uploads every interaction made with the product[12]. As to how the children’s data is processed, shared and used, very little is said in Amazon’s Privacy Notice[13] and Children’s Privacy Disclosure[14]: both documents point to each other “for more information” but fail to share any meaningful information on what secondary purposes children’s data might or might not have. Similarly, parents are typically unaware of what they are consenting to when they hit the “agree” button. Lengthy and fragmented policies make it hard for any consumer to truly understand what they are subjecting themselves and their children to when agreeing to use the product.

Parting Thoughts
As technology companies advance into the parent/young children market, they need to create better safeguards around what type of information they choose to collect and share about children. As it stands, the choice between a safer home and a smart one still exists. While it is great that these organizations have found a market for assisting parents, they need to be a lot more transparent, clear and concise with their privacy and usage policies. Parents should be able to make fully informed decisions on what they are willing to subject their children to. Not only do these large companies need to fully comply with COPPA and other regulations, but it is also their obligation to address ethical and privacy concerns beyond these laws.

Examples of steps they could take include publishing an easier-to-read summary of their policies’ key takeaways so parents can more quickly scan through and give better informed “Parental Consent.” They could also create more transparency by giving users control through a settings/controls hub, where parents could see what types of information are being collected on their child, approve each type individually, and request deletion of any data at any point directly from the page.

References
[1] Mark Lippett. Privacy, Intelligence, Agency: Security In The Smart Home. Forbes, May 5, 2022. https://www.forbes.com/sites/forbestechcouncil/2022/05/05/privacy-intelligence-agency-security-in-the-smart-home/?sh=6bda67594aac
[2] Lisa Eadicicco. Amazon’s New Echo for Kids Will Train Your Children to Say ‘Please.’ Time, April 25, 2018. https://time.com/5254163/amazon-echo-dot-kids-edition/
[3] Amazon. Echo Dot (4th Gen) Kids | Our cutest Echo designed for kids, with parental controls | Tiger. https://www.amazon.com/Echo-Dot-4th-Gen-Kids/dp/B084J4QQK1
[4] Zak Doffman. Amazon Slammed For Putting Kids At Risk With ‘Blatant Violation Of Privacy Laws.’ Forbes, May 9, 2019. https://www.forbes.com/sites/zakdoffman/2019/05/09/amazons-echo-dot-kids-accused-of-violating-privacy-laws-and-putting-kids-at-risk/?sh=3fcbda7e7e5a
[5] Campaign for a Commercial-Free Childhood (CCFC). Echo Kids Privacy. https://www.echokidsprivacy.com/
[6] Id.
[7] Amazon. Alexa and Alexa Device FAQs. https://www.amazon.com/gp/help/customer/display.html?nodeId=201602230
[8] Id.
[9] The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. U.S. Department of Health and Human Services. https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf
[10] Kate Crawford and Vladan Joler. The mystery of the Amazon Echo data. Privacy International, April 17, 2019. https://privacyinternational.org/news-analysis/2819/mystery-amazon-echo-data
[11] Open Access Government. The importance of protecting and regulating children’s personal data. July 24, 2019. https://www.openaccessgovernment.org/childrens-personal-data/69928/#:~:text=Part%20of%20the%20problem%20is,and%20behaviours%2C%20shaping%20their%20lives.
[12] See note 6.
[13] Amazon. Privacy Notice. https://www.amazon.com/gp/help/customer/display.html?nodeId=468496
[14] Amazon. Children’s Privacy Disclosure. https://www.amazon.com/gp/help/customer/display.html?nodeId=202185560

Images
[1] https://www.moms.com/echo-dot-kids-tips-tricks/
[2] https://www.amazon.com/gp/help/customer/display.html?nodeId=201602230
[3] https://www.commonsense.org/education/sites/default/files/tlr-blog/alexa-0160.png