Hey Alexa, where’s my data going?

Anonymous | June 23, 2022

In exchange for comfort and convenience, households who opt for smart home devices like Amazon Alexa hand off a surprising amount of personal data and security – but how much, exactly?

Firstly, what are smart home devices? They can range anywhere from gaming systems to refrigerators, and most importantly the common thread between them is that they need to connect to the Internet to fully function. Many of these devices are touted to help improve and streamline your day-to-day life, with one’s smartphone often used as the remote control.[1] Well-known choices include Amazon Alexa/Echo, Google Home and Nest, Ring doorbells, and Samsung Smart TVs.


Fig. 1: An infographic showing the most popular smart home speakers.

They have gained popularity in recent years while also facing pushback from concerned groups. Interestingly, as a Consumers International survey shows, 63% of people surveyed distrust smart/connected devices because of how they collect data on people and their behaviors, yet about 72% of those surveyed own at least one smart device.[2] In another survey, conducted by CUJO AI, a whopping 98% of 4,000 participants expressed privacy concerns about smart home devices, yet a good half of them bought the devices anyway or do not take the necessary precautions to secure themselves and their devices.[3] In this post, I use Nissenbaum’s contextual integrity to investigate prominent privacy risks from one of the top smart home devices, Amazon Alexa.


Fig. 2: An infographic of how many smart home devices Alexa can control.

Smart home devices collect personal information for as long as they are in operation, which can mean 24/7 insight into someone’s life. For Alexa, a study by Lentzsch et al. found that skills, the voice-driven applications that can be installed to give Alexa different functions, had several privacy issues that could allow third parties to obtain personal information.[4] For example, fraud was a risk on the skills store because an unrelated party could masquerade as a reputable organization; when a user installed and used such a skill, their personal data went to this third party instead of the expected organization. Another oversight: while Amazon required publishers of skills to post a public privacy policy detailing how they intend to collect and use personal data, 23.3% of the roughly one thousand skills examined had incredibly opaque policies or none at all, yet still received access to personal information through Alexa.[4] Using Nissenbaum’s framework of contextual integrity, we can categorize the data subject and sender as the user of the smart home device, the primary recipient as Amazon Alexa, the information type as personal information such as shopping and living habits and voice recordings, and the transmission principle as delivery through the smart home device and the Internet.[5] That context has been compromised: personal data intended for Alexa’s use can now leak to third parties without the user’s knowledge.
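Nissenbaum’s parameters can be made concrete with a small sketch. This is purely illustrative (the class and function names are my own, not from the paper): a flow breaches contextual integrity when any of its five parameters deviates from the norms of the original context, which is exactly what happens when a fraudulent skill replaces the expected recipient.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InformationFlow:
    """One information flow, described by Nissenbaum's five parameters."""
    subject: str
    sender: str
    recipient: str
    information_type: str
    transmission_principle: str

def violates_contextual_integrity(expected: InformationFlow,
                                  actual: InformationFlow) -> bool:
    """A flow breaches contextual integrity when any parameter deviates
    from the norms of the originating context."""
    return expected != actual

# The Alexa case above: data meant for Amazon ends up with a third party.
expected = InformationFlow(
    subject="device user", sender="device user", recipient="Amazon Alexa",
    information_type="shopping/living habits, voice",
    transmission_principle="via smart home device and the Internet")
actual = InformationFlow(
    subject="device user", sender="device user",
    recipient="unvetted skill publisher",
    information_type="shopping/living habits, voice",
    transmission_principle="via smart home device and the Internet")

print(violates_contextual_integrity(expected, actual))  # True
```

Only the recipient changed, and that single deviation is enough to flag the flow as a violation.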

This is not to say that smart home devices are all bad; they indubitably provide lifestyle benefits, especially for people who are disabled or otherwise disadvantaged. The tangible benefits do not outweigh the current privacy costs, however, so more work, whether legal, technological, and/or ethical, is needed to protect people and their information. We can start by spreading awareness of exactly how much agency people have over their personal data and privacy, and by giving people the right to control them. Additionally, smart home device companies should take ownership of making privacy policies and disclosures more digestible and transparent for the average consumer, as well as allowing consumers to opt out of data harvesting.[6]

Hey Alexa, opt me out of data collection!

Positionality and Reflexivity Statement

I am an Asian American, middle class, cisgender woman. I have a smartphone, a smart watch, and a gaming system, but my household does not have any other smart home devices like TVs and kitchen appliances. Prior to this post, I was already wary of smart home devices and my stance remains the same. However, I have utilized several applications and features like fitness tracking on both my phone and watch that likely collected my data points and could be compromised. Moving forward, I will be more judicious of my smart device use and protect myself where possible, even in this increasingly data-driven world with little privacy where it seems like one’s every step is being watched. I encourage you to evaluate your relationship with your smart home devices and take your privacy into your own hands.

[1]Kaspersky. (n.d.). How safe are smart homes? Retrieved 2022, from https://usa.kaspersky.com/resource-center/threats/how-safe-is-your-smart-home
[2]Consumers’ International and Internet Society. (2019, May). The Trust Opportunity: Exploring Consumers’ Attitudes to the Internet of Things. Retrieved 2022, from https://www.internetsociety.org/wp-content/uploads/2019/05/CI_IS_Joint_Report-EN.pdf
[3]CUJO AI. (2021, October). Cybersecurity Perceptions Survey. Retrieved 2022, from https://cujo.com/wp-content/uploads/2021/10/Cybersecurity-Perceptions-Survey-2021.pdf
[4]Lentzsch, C. et al. (2021, February). Hey Alexa, is this Skill Safe?: Taking a Closer Look at the Alexa Skill Ecosystem. Retrieved 2022, from https://anupamdas.org/paper/NDSS2021.pdf
[5]Nissenbaum, H. F. (2011). A Contextual Approach to Privacy Online. Daedalus 140 (4), Fall 2011: 32-48, Available at SSRN: https://ssrn.com/abstract=2567042
[6]Tariq, A. (2021, January 21). The Challenges and Security Risks of Smart Home Devices. Retrieved 2022, from https://www.entrepreneur.com/article/362497

Images
[Fig. 1] https://www.statista.com/chart/16068/most-popular-smart-speakers-in-the-us/
[Fig. 2] https://www.statista.com/chart/22338/smart-home-devices-compatible-with-alexa/

Websites use privacy popups to track you on every website you visit.

Francisco Valdez | June 23, 2022

Privacy notice popups are annoying, and people are ignoring them. But they do more than just ask for consent to store cookies.

When the GDPR came into effect on May 25th, 2018, privacy notice popups suddenly appeared on websites, and the constant battle to keep our information private began to gain visibility. Although the GDPR didn’t specify any mechanism for providing consent, the European ad industry implemented privacy notice popups even before the law came into effect. The popups have several goals, the most important being to inform users about how they are being tracked and to provide a mechanism to opt out. Then the California Consumer Privacy Act (CCPA) became effective on January 1st, 2020, providing similar protections to California residents. The CCPA additionally requires an easy and accessible way to opt out. [1]

Most websites use third-party Consent Management Providers (CMP) that provide plugins to implement the privacy notice popups. When users consent, they store a cookie with the response given.
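The consent cookie a CMP stores is ordinary HTTP machinery. As a minimal sketch (the cookie name and lifetime here are my own assumptions, not any particular CMP’s), the plugin records the decision so the banner is not shown again on the next visit:

```python
from http.cookies import SimpleCookie

def build_consent_cookie(consented: bool) -> str:
    """Build the Set-Cookie header a CMP plugin might send to remember
    the visitor's consent decision across visits."""
    cookie = SimpleCookie()
    cookie["cookie_consent"] = "accepted" if consented else "rejected"
    cookie["cookie_consent"]["max-age"] = 60 * 60 * 24 * 365  # one year
    cookie["cookie_consent"]["path"] = "/"
    return cookie.output()

print(build_consent_cookie(True))
```

On later requests the site reads the cookie back and skips the popup, which is also why clearing your cookies makes every banner reappear.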

Example of a Privacy Notice popup. Image by WordPress Cookie Notice plugin https://wordpress.org/plugins/cookie-notice/

Nowadays, people consent to privacy notices without reading them in order to access the content they want. When you consent to be tracked, you agree not only to store cookies for that particular website on your device. You also agree to grant access to existing data on your device, such as third-party cookies, advertising identifiers, device identifiers, and similar technologies. The consent also includes a provision to personalize advertising and content for you on other websites or apps. In addition, your actions on other sites or apps are used to make inferences about your interests, influencing the advertising or content you see in the future.

Many websites have illegally made it harder to opt out, by forcing you to read very long privacy notices before you can opt out, or by making you hop through several pages to reach the opt-out button. Other websites don’t provide an opt-out at all. And when you are lucky enough to find the opt-out button, some websites simply ignore your decision [2].

uBlock Origin in action. Image by uBlock Origin https://ublockorigin.com/

There are several countermeasures against websites that make it harder to opt out. First, you can install an ad-blocker. Ad-blockers have gained popularity in the past few years, but it has become an arms race: publishers are always looking for ways to evade ad-blockers, and ad-blockers are always trying to detect ads and tracking signals. Some publishers have made arrangements with ad-blockers to provide easy opt-out mechanisms and unobtrusive ads; these publishers are allowed through. Ad-blockers generally stop websites from storing cookies in your browser and block tracking signals [3]. Since websites are constantly trying to evade ad-blockers, using one may interfere with a website’s correct functioning.

Do Not Track initiative logo. Image by https://www.eff.org/issues/do-not-track

An additional layer of protection is to enable your Do-Not-Track (DNT) signals in your browser. The problem with current opt-out mechanisms is that they require you to store a cookie with your consent decision on your device. Also, you would need to opt out of every website you use. So instead of opting out constantly, DNT is a setting stored in your browser, which signals every website you visit with your privacy preferences [4]. DNT is consistent with GDPR and CCPA. Both frameworks consider a mechanism where the users configure their devices to store their privacy preferences. For this to be a successful mechanism, mass adoption from all the stakeholders is required.
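The mechanics of DNT are deliberately simple: the browser attaches a `DNT: 1` header to every request, and a compliant site checks it before tracking. A minimal server-side sketch (the function name is mine, for illustration):

```python
def tracking_allowed(request_headers: dict) -> bool:
    """Honor the Do-Not-Track signal: a DNT header of "1" means the user
    has opted out of tracking on every site they visit, so no consent
    cookie or per-site opt-out is needed."""
    return request_headers.get("DNT") != "1"

print(tracking_allowed({"DNT": "1"}))  # False: user opted out
print(tracking_allowed({}))            # True: no preference expressed
```

The catch, as the paragraph above notes, is that nothing forces sites to run such a check, which is why mass adoption by all stakeholders is the crux.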

In summary, privacy notice popups are annoying, and the most viable path forward is mass adoption of Do-Not-Track. Consumers also share responsibility for reporting websites that do not comply with the GDPR, CCPA, or other regional privacy laws.



[1] Kamala D. Harris, Attorney General, California Department of Justice (2014, May). Making Privacy Practices Public. Retrieved from https://oag.ca.gov/sites/all/files/agweb/pdfs/cybersecurity/making_your_privacy_practices_public.pdf

[2] C. Matte, N. Bielova, and C. Santos (2019, May). Do Cookie Banners Respect my Choice? ANR JCJC PrivaWeb ANR-18-CE39-0008. Retrieved from http://www-sop.inria.fr/members/Nataliia.Bielova/papers/Matt-etal-20-SP.pdf and https://www-sop.inria.fr/members/Nataliia.Bielova/cookiebanners/

[3] uBlock Origin. About uBlock Origin. Retrieved (2022, June) from https://ublockorigin.com/

[4] Dan Goodin (2020, October 8). Now you can enforce your privacy rights with a single browser tick. Ars Technica. Retrieved from https://arstechnica.com/tech-policy/2020/10/coming-to-a-browser-near-you-a-new-way-to-keep-sites-from-selling-your-data/

Surveillance & Control – The Future of National Governance

Austin Sanders | June 23, 2022

Global powers are pursuing contrasting data privacy laws and regulations. Will government surveillance and control become the new norm worldwide?

The Chinese Communist Party (CCP) demonstrates an apparent lack of trust in or care for its citizens. With artificial intelligence systems and data collection methods incorporated into every part of society, the CCP monitors every text message, web search, and purchase made legally within its borders.[1] While the internet should facilitate the spread of ideas and knowledge throughout the world, the CCP installed the “Great Firewall” to suppress and control its population.[2] Government officials argue that it is in the nation’s best interest to remain united in its journey back to the top of the global order. In the process, the CCP has created a social credit system to track people’s behavior and encourage habits and actions that align with the CCP’s established norm.[3] The developing and underdeveloped world stands vulnerable as the CCP’s philosophy of governance and control spreads across the globe, undermining democratic principles.

China’s “Great Firewall” is a threat to democratic principles.[2]

Business Insider describes the Chinese social credit system as a way to rank the population. Chinese AI and technology systems track citizens’ behavior and score their actions based on subjective rules. Bad driving or posting the wrong news article online will hurt an individual’s score, and low social credit scores result in punishments ranging from throttled internet speeds to the inability to use public transportation.[3] A social system of checks and balances like this is a dangerous, slippery slope: who makes the rules for what is socially acceptable? Humanity should fight against this type of data hoarding and manipulation by national governments.

Chinese companies are at the forefront of the technological revolution. As China produces surveillance technology, cell towers, and cloud-based infrastructure worldwide, there is growing concern that other countries will follow a censorship model similar to China’s.[4] China’s substantial growth over the last thirty years can be attributed to a period of relative peace at home and abroad. Developing countries with authoritarian leaders may view the “Great Firewall” and the social credit system as a means to maintain control while facilitating growth and development.[4] While this reads well on paper, it comes at a high cost to fundamental human rights. Uyghur Muslims in Xinjiang, pro-democracy leaders in Hong Kong, and Buddhists in Tibet have borne the brunt of atrocious human rights abuses in China’s surveillance state.[5] Government control over digital communications allows the CCP to track and arrest people whose lifestyles do not align with its vision for the Chinese people. With such a diverse population, this method of governance is not only inappropriate for China; it would be a problematic system to implement in most countries worldwide.

Given China’s lack of domestic data privacy laws and its tight control over the Chinese tech giants, there is reason to believe these companies open backdoors to access data from foreign countries. This has led Western governments to steer away from Chinese technology companies.[6] However, countries trying to develop their digital infrastructure often have no other options. Relying on Chinese technology is essential to their domestic growth but leaves their governments and citizens vulnerable to Chinese surveillance.[4]

Ecuador’s surveillance system installed by Chinese tech companies.[4]

As global powers continue to diverge in their data privacy laws, getting developing and underdeveloped countries on board with governance norms similar to those of the United States and the European Union is essential. Authoritarian governments will likely side with the CCP and seek out Chinese technologies to monitor their citizens and maintain a stronghold on control. This is detrimental to the democratic global order and will stymie human ideas and development. Going head-to-head with the CCP is challenging and will inevitably cause uncomfortable interactions at the international table. However, the United States must partner with the EU to spread ethical technology and data governance principles worldwide and promote a world that encourages freedom of speech and expression.[4] No one wants to live in a world where they might go to jail for sending a text message professing their religious beliefs.

1. Yang J. WeChat Becomes a Powerful Surveillance Tool Everywhere in China. Wall Street Journal. https://www.wsj.com/articles/wechat-becomes-a-powerful-surveillance-tool-everywhere-in-china-11608633003. Published December 22, 2020. Accessed June 24, 2022.
2. Wang Y. In China, the ‘Great Firewall’ Is Changing a Generation. POLITICO. Published September 1, 2020. Accessed June 24, 2022. https://www.politico.com/news/magazine/2020/09/01/china-great-firewall-generation-405385
3. Canales K. China’s “social credit” system ranks citizens and punishes them with throttled internet speeds and flight bans if the Communist Party deems them untrustworthy. Business Insider. Published December 25, 2021. Accessed June 24, 2022. https://www.businessinsider.com/china-social-credit-system-punishments-
4. International Republican Institute. Chinese Malign Influence and the Corrosion of Democracy. Published online 2019.
5. Human Rights Watch. China: Events of 2021. In: World Report 2022; 2021. Accessed June 24, 2022. https://www.hrw.org/world-report/2022/country-chapters/china-and-tibet
6. Greene R, Triolo P. Will China Control the Global Internet Via its Digital Silk Road? Carnegie Endowment for International Peace. Published May 8, 2020. Accessed June 24, 2022. https://carnegieendowment.org/2020/05/08/will-china-control-global-internet-via-its-digital-silk-road-pub-81857

My body, my data, my choice: How data science enhancements threaten privacy to reproductive healthcare

Anonymous | June 23, 2022

Tweet: How surveillance technology and facial recognition software have entered the conversation about reproductive health care, and why the right to liberty and privacy is at stake

In May 2022, a draft opinion overturning the Supreme Court’s ruling in Roe v. Wade [1] was leaked to the public, indicating that US citizens (particularly women) could lose the right to privacy in reproductive health decisions such as abortion.

Protests in Washington D.C. in front of the U.S. Supreme Court following the leak of the potential Roe v. Wade overturning

While this decision has not been confirmed, there has been an uproar of public conversation about how families may be criminally prosecuted for seeking abortions where they are legal, and how facial recognition and other data science advancements may be used as evidence to prosecute or penalize them. This recent development demands that the right to privacy be protected at the federal level to preserve America’s fundamental right to liberty.

Since Roe v. Wade was decided in 1973, the topic of abortion and reproductive health care has been deeply politically charged. It has often strained the separation of church and state, compelling states whose social cultures are more intrinsically based in religion to make legislative decisions that protect life, while states with larger liberal populations make legislative decisions that protect choice. In the almost 50 years since the Supreme Court’s ruling, reproductive rights have been a constant topic in presidential elections, often used as a divisive wedge to win votes (or sabotage opponents). So, with the release of the draft opinion signaling a potential overruling, the conversation about the threat to individual privacy has started anew, but this time the public has realized how data science and technological enhancements play a potentially menacing part.

While it is not uncommon for pro-life protestors to be stationed outside of reproductive health clinics such as Planned Parenthood, these “peaceful” protestors have started to use facial & license plate recognition software to track and monitor those who seek reproductive health care from these locations [2]. While currently all U.S. citizens have a protected right to seek an abortion, the overturning of Roe v. Wade could mean that the information and data collected by these protestors could be used as criminal evidence. Even if someone chose to get an abortion in a state where that right is protected, they could still be recognized with these surveillance techniques and penalized upon returning to a state with different legal jurisdiction.

Protestors with list of license plates (redacted) of those who visit a Planned Parenthood site in Charlotte, North Carolina

Unfortunately, the concern for privacy doesn’t stop at facial or license plate recognition. Even indirectly associated services such as menstrual cycle monitors, private messaging, or search history logs could contain enough information to serve as evidence to assess if a user may have aborted a pregnancy. Without a clear understanding of how people could be criminalized with this information, any or all data collected could be used as social or criminal retribution and could deprive Americans of their 14th amendment right to “life, liberty, or property without due process of law” [3].

All things considered, if Roe v. Wade were overturned, there is still the option that various state legislatures could protect the right to abortion, however the right to individual or corporeal privacy would be lost. Roe v. Wade may explicitly protect reproductive rights, but its absence could create a chasm towards protecting individuals’ right to corporeal, mental, or emotional privacy. Regardless of the Roe v. Wade decision, there needs to be federal legislation protecting the right to privacy. Without a federally imposed safeguard for individual privacy, especially considering the growing enhancements of artificial intelligence, Americans could lose their basic liberty and could be socially or criminally penalized simply by making decisions for their mental, medical, or familial well-being.

For those who want information on safeguarding privacy while seeking reproductive health care, visit the Digital Defense Fund [4].

[1]. Roe v. Wade – Center for Reproductive Rights

[2]. Anti-abortion activists are collecting the data they’ll need for prosecutions post-Roe

[3]. Due Process and Equal Protection | CONSTITUTION USA with Peter Sagal | PBS

[4]. Digital Defense Fund


Cookieless world, Privacy & Disruption

Amey Mahajan | June 23, 2022

Personalized ads, tracking of digital behavior, and use of users’ data for targeted marketing have been major trends in the last few years. These practices have been a major contributor to the growth of the marketing and retail industries. (ICSID 2022) suggests that the purchasing behavior of nearly 44% of consumers is driven by ads, and these ads and digital marketing techniques rely heavily on 3rd-party cookie sharing. By 2023, though, Google plans to remove 3rd-party cookie support from Chrome (Cookiebot, n.d.) to support user privacy, as Firefox and Apple already have.

Four cookies, each smaller than the one before.

A decision taken by the platform with the largest user base, whose users are also the potential customers of businesses, definitely has repercussions across domains that need to be studied carefully. The discussion and analysis should be done not just through the lens of user privacy but also in terms of the far-reaching impacts on overall marketing strategies and the economic impact on smaller businesses that don’t hold a large share of 1st-party cookies (which will remain).

User privacy is a top concern in this ever-growing digital market for all companies. Until recently, topics like data collection practices, how collected data is packaged and shared across platforms, and the potential harms of such widely used techniques were rarely discussed. With increased awareness of the many dimensions of privacy and the legitimate practices that need to be enforced, along with stringent regulations like the GDPR and CCPA, companies like Google have realized the importance of putting customer interests first. Moreover, studies have shown that “81% of people say that the potential risks they face because of data collection outweigh the benefits” (Temkin 2021). Steps need to be taken in the right direction to make people feel comfortable as their digital footprints expand.

Third-party cookie retargeting process diagram

Ads generated from these 3rd-party cookies drive a lot of revenue for small and big businesses, because consumer behavior analysis and the overall business model depend on them. Given that nearly 81% of companies rely on this technique to drive their business (RetailWire 2022), there is undeniably going to be a huge disruption when the world goes “cookie-less.” This long-lasting impact needs to be studied and analyzed, and alternative techniques must be introduced to soften the blow. Apart from the retail industry, the sector hit hardest by this decision will be the marketing industry (Caccavale 2021). Revenues are directly tied to the techniques used to capture customer behavior and to the metrics used to analyze and strategize targeted marketing campaigns. Given that the foundational technique itself is changing radically, this sector should certainly gear up for the impact. Studies and articles have already begun describing alternatives for businesses that heavily leverage 3rd-party cookies in their strategies.
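What makes a cookie “third-party” is simply a domain mismatch. The sketch below (my own simplification; real browsers compare registrable domains using the Public Suffix List, not a naive last-two-labels rule) shows the distinction that Chrome’s change hinges on:

```python
def is_third_party(page_host: str, request_host: str) -> bool:
    """A cookie is third-party when the request setting it comes from a
    different site than the page the user is visiting.
    Naive comparison: treat the last two labels as the site."""
    def site(host: str) -> str:
        return ".".join(host.split(".")[-2:])
    return site(page_host) != site(request_host)

print(is_third_party("shop.example.com", "cdn.example.com"))  # False: same site
print(is_third_party("shop.example.com", "ads.tracker.net"))  # True: cross-site tracker
```

Because `ads.tracker.net` can set its cookie from every site that embeds it, it can recognize the same visitor across the web; blocking that class of cookie is what ends the retargeting flow pictured above, while 1st-party cookies (same-site case) survive.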

In today’s day and age, technological advancements play a pivotal role in reaching and analyzing customers and in driving businesses based on that analysis. With the landscape changing so quickly, and with privacy at the forefront of these changes, exhaustive research needs to be done on why the proposed changes are important, whom they will impact, and the scale of that impact.


Caccavale, Michael. “Council Post: Bye Bye, Third-Party Cookies.” Forbes, Forbes Magazine, 13 Apr. 2021, https://www.forbes.com/sites/forbesagencycouncil/2021/04/13/bye-bye-third-party-cookies/?sh=66d66d3a3788.

International Centre for Settlement of Investment Disputes. (n.d.). Retrieved June 23, 2022, from https://icsid.worldbank.org/

3 steps for marketers to prepare for a cookieless world. Gartner. (n.d.). Retrieved June 23, 2022, from https://www.gartner.com/en/marketing/insights/articles/three-steps-for-marketers-to-prepare-for-a-cookieless-world

Google ending third-party cookies in Chrome. (n.d.). Retrieved June 23, 2022, from https://www.cookiebot.com/en/google-third-party-cookies/

Will retailers be ready when the third-party cookies crumble? RetailWire. (n.d.). Retrieved June 23, 2022, from https://retailwire.com/discussion/will-retailers-be-ready-when-the-third-party-cookies-crumble/#:~:text=Eighty%2Done%20percent%20of%20companies,State%20of%20Customer%20Engagement%20Report.%E2%80%9D

Temkin, D. (2021, March 3). Charting a course towards a more privacy-first web. Google. Retrieved June 23, 2022, from https://blog.google/products/ads-commerce/a-more-privacy-first-web/


When an auto accident happens, is it due to the driver’s ability to drive or due to their credit score?

By Jai Raju | June 24, 2022

One car collides into the front of another.

We all know that in the USA, there are regulations barring industries from discriminating by race, gender, age, and other protected classes. So it should be no surprise that auto insurance companies are not allowed to use race as a parameter when charging you a premium. On the surface, it would appear they don’t. But if you take a deeper look and connect a few dots, you’ll see that’s exactly what they do – discriminate by race.

Context: Auto Insurance Industry as a case study

Insurance is a means of protection from financial loss. It is a form of risk management, primarily used to hedge against the risk of a contingent or uncertain loss.

— Wikipedia

Auto Insurance premium is a function of:

  1. Moving violations – number of tickets
  2. Loss history – prior accidents
  3. Location – some locations are densely populated, and the more vehicles on the road, the higher the chance of accidents
  4. Age – like everything in life, learning to drive well takes time, so younger drivers are more likely to get into accidents than mature drivers
  5. Gender – women are, statistically, safer drivers than men
  6. Marital status – married drivers are safer drivers than single drivers
  7. Credit score – people with lower credit scores are more likely to file a claim than those with higher scores

There are several other variables that determine how much you pay. At its core, the premium is a mathematical equation that data scientists build by relating these variables to the potential loss you may cause.

While all of that seems reasonable at the outset, note that there is no scientific proof behind any of these variables. The whole industry uses observational data to summarize, statistically, the impact of these factors on the potential loss from a driver. While these variables are correlated with loss, there is no proof that they actually cause it. When there is no causation, it is unreasonable to use variables that are discriminatory.
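To make the structure concrete, here is a toy rating function. The weights are entirely made up for illustration (no insurer publishes its actual model); the point is that every factor scales the base rate multiplicatively, whether or not it has any causal link to driving:

```python
def premium_quote(base_rate: float, tickets: int, prior_claims: int,
                  credit_score: int) -> float:
    """Toy premium model with invented weights, mirroring the structure
    described above: correlated factors scale the base rate."""
    violation_factor = 1.0 + 0.15 * tickets        # related to driving
    claims_factor = 1.0 + 0.25 * prior_claims      # related to driving
    credit_factor = 1.6 if credit_score < 650 else 1.0  # unrelated to driving
    return round(base_rate * violation_factor * claims_factor * credit_factor, 2)

# Two identical driving records; only the credit score differs.
print(premium_quote(1000, 0, 0, credit_score=750))  # 1000.0
print(premium_quote(1000, 0, 0, credit_score=600))  # 1600.0
```

Two drivers with spotless records pay very different premiums purely because of a financial score, which is the flaw the next section takes apart.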


Auto insurance protects you against a loss due to the operation of your vehicle; as such, it protects you from a mistake you could make while driving. Yet insurers use credit score as their most important variable in computing expected loss. Said another way, auto insurance companies claim that if you have a bad credit score, you are more likely to cause an accident. Here is why that claim is flawed:

  1. First, how is your credit score related to your driving behavior?
  2. Second, credit scores are determined by private, for-profit, publicly traded companies whose goal is to turn a profit. Credit bureaus are under no legal requirement to be accurate, and the current credit reporting bureaus make a tremendous number of mistakes at consumers’ cost. A 2013 Federal Trade Commission study of the U.S. credit reporting industry found that 5% of consumers had errors on their reports. This disproportionately affects the poor, who cannot afford lawyers to get the errors corrected. A Congressional Research Service report stated that consumers sometimes find it difficult to advocate for themselves when credit reporting issues arise because they are not aware of their rights and how to exercise them.
  3. Third, plenty of research shows that people of color disproportionately have lower credit scores. In 2020, 18% of Black Americans had no credit score, compared to 15% of Latinos, 13% of white Americans, and 10% of Asian Americans.

A similar argument can be made for the moving violations variable. There is plenty of research showing how people of color are disproportionately pulled over by the police and given citations. The insurance industry has turned a blind eye to it.

Parting thoughts:

Mobility is an essential part of the path to the middle class. The auto insurance industry has a responsibility to treat all drivers equally and assess their risk purely based on their driving behavior. It should not look for ways to discriminate.

“Racism and the economy are tied together: racism has shaped the economy, and the economy has fueled racism. We often think of them as separate, but putting them together is the only way to get at the issues and challenges associated with racism.” – Angela Glover Blackwell


Culpability in AI Incidents: Can I Have A Piece?

By Elda Pere | June 16, 2022

With so many entities deploying AI products, it is all too easy to diffuse blame when things go wrong. As data scientists, we should keep the pressure on ourselves and welcome the responsibility to create better, fairer learning systems.

The question of who should take responsibility for technology-gone-wrong situations is a messy one. Take the case mentioned by Madeleine Clare Elish in her paper “Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction”. If an autonomous car gets into an accident, is it the fault of the car owner who enabled this setting? Is it the fault of the engineer who built the autonomous functionality? The manufacturer that built the car? The city infrastructure’s unfriendliness toward autonomous vehicles? Or when banks disproportionately deny loans to marginalized communities: is it the fault of the loan officer, of the brokers they buy information from, or of the repercussions of a historically unjust system? The cases are endless, ranging from misgendering on social media platforms to misallocating resources on a national scale.

A good answer would be that blame is shared amongst all parties, but however true that may be, it does not prove useful in practice. It just makes it easier for each party to pass the baton and shed the pressure of doing something to resolve the issue. With this post, I hereby take on, in the name of all other data scientists, the responsibility to resolve the issues that a data scientist is skilled to resolve. (I expect rioting on my lawn sometime soon, with logistic regressions in place of pitchforks.)

Why Should Data Scientists Take Responsibility?

Inequalities that come from discriminating against demographic features such as age, gender or race occur because the users are categorized into specific buckets and stereotyped as a group. The users are categorized in this way because the systems that make use of this information need buckets to function. Data scientists control these systems. They choose between a logistic regression and a clustering algorithm. They choose between a binary gender option, a categorical gender with more than two categories, or a free form text box where users do not need to select from a pre-curated list. While this last option most closely follows the user’s identity, the technologies that make use of this information need categories to function. This is why Facebook “did not change the site’s underlying algorithmic gender binary” despite giving the user a choice of over 50 different genders to identify with back in 2014.

So What Can You Do?

While there have been a number of efforts in the field of fair machine learning, many of them remain in the format of a scientific paper and have not been used in practice, despite the growing interest demonstrated in Figure 1.

Figure 1: A Brief History of Fairness in ML (Source)

Here are a few methods and tools that are easy to use and that may help in practice.

  1. Metrics of fairness for classification models, such as demographic parity, equal opportunity, and equalized odds. “How to define fairness to detect and prevent discriminatory outcomes in Machine Learning” describes good use cases and potential pitfalls when using these metrics.
  2. Model explainability tools that increase transparency and make it easier to spot discrepancies. Popular options listed by “Eliminating AI Bias” include:
     1. LIME (Local Interpretable Model-Agnostic Explanations),
     2. Partial Dependence Plots (PDPs), which decipher how each feature influences the prediction,
     3. Accumulated Local Effects (ALE) plots, which decipher individual predictions rather than the aggregations used in PDPs.
  3. Toolkits and fairness packages such as:
     1. The What-If Tool by Google,
     2. The FairML bias audit toolkit,
     3. The Fair Classification, Fair Regression, or Scalable Fair Clustering Python packages.
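As a minimal sketch of the first item, demographic parity and equal opportunity can be computed directly from a model’s predictions and group labels. The loan-approval data below is entirely hypothetical, chosen only to show the mechanics:

```python
def demographic_parity_diff(y_pred, group):
    """Absolute gap in positive-prediction rates between groups 0 and 1."""
    def rate(g):
        preds = [p for p, grp in zip(y_pred, group) if grp == g]
        return sum(preds) / len(preds)
    return abs(rate(0) - rate(1))

def equal_opportunity_diff(y_true, y_pred, group):
    """Absolute gap in true-positive rates (recall) between groups 0 and 1."""
    def tpr(g):
        # Predictions for members of group g whose true label is positive.
        preds = [p for p, t, grp in zip(y_pred, y_true, group)
                 if grp == g and t == 1]
        return sum(preds) / len(preds)
    return abs(tpr(0) - tpr(1))

# Hypothetical loan outcomes and approvals for two demographic groups.
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]

print(demographic_parity_diff(y_pred, group))        # gap in approval rates
print(equal_opportunity_diff(y_true, y_pred, group)) # gap in recall
```

A nonzero gap on either metric does not by itself prove discrimination, but it flags exactly the kind of discrepancy the explainability tools above help diagnose.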

Parting Words

My hope for these methods is that they inform data science practices that have sometimes gained too much inertia, and that they encourage practitioners to model beyond the ordinary and choose methods that could make the future just a little bit better for the people using their products. With this, I pass the baton to the remaining culprits to see what they may do to mitigate –.

This article ended abruptly due to data science related rioting near the author’s location.

Protests in the Era of Data Surveillance

Protests in the Era of Data Surveillance
By Niharika Sitomer | June 16, 2022

Modern technology is giving law enforcement the tools to be increasingly invasive in their pursuit of protesters – but what can we do about it?

In the summer of 2020, the country exploded with Black Lives Matter protests spurred by the murder of George Floyd. Even today, the wave of demonstrations and dissent has not ended, with many protests cropping up regarding the recent developments on the overturning of Roe v. Wade and the March for Our Lives events in response to gun violence tragedies. These movements are a sign of robust public involvement in politics and human rights issues, which is a healthy aspect of any democracy and a necessary means of holding governing bodies accountable. However, the use of technological surveillance by law enforcement to track protesters is a dangerous and ongoing occurrence that many may not even realize is happening.

The use of facial recognition technology poses a significant threat of wrongful arrests of innocent people due to misclassification by untested and unfairly developed algorithms. For instance, the software used by the London Metropolitan Police achieved only 19% accuracy when tested by Essex University. Moreover, many of these algorithms lack adequate racial diversity in their training sets, leading the software to misclassify racial minorities at disproportionately high rates. The locations where facial recognition systems are deployed outside of protests are also heavily racially skewed, with the brunt falling disproportionately on Black neighborhoods. This represents a huge disparity in policing practices and increases the likelihood that innocent Black citizens will be misidentified as protesters and arrested. What’s more, the use of facial recognition by law enforcement is largely unregulated, meaning there are few repercussions for the harms caused by these systems.

It is not only the threat of uninvolved people being targeted, however, that makes police surveillance so dangerous. People who attend protests without endangering public safety are also at risk, despite constituting the vast majority of protesters (93% of summer 2020 protests were peaceful, and even violent protests contain many non-violent protesters). Drone footage is frequently used to record and identify people in attendance at protests, even if their actions do not warrant such attention. Perhaps even more concerning are vigilante apps and the invasion of private spaces. During the George Floyd protests, the Dallas Police launched an app called iWatch, where the public could upload footage of protesters to aid in their prosecution. Such vigilante justice entails the targeting of protesters by those who oppose their causes and seek to weaken them, even if doing so results in unjust punishments. Additionally, the LAPD requested that users of Ring, Amazon’s doorbell camera system, provide footage of people who could potentially be connected to protests, despite Ring being a private camera network whose users were unaware they could be surveilled without a warrant. Violations of privacy also occur on social media, as the FBI has requested personal information of protest planners from online platforms, even when their pages and posts had been set to private.

One of the most invasive forms of police surveillance of protesters is location tracking, which typically occurs through RFID chips, mobile technology, and automated license plate reader systems (ALPRs). RFID chips use radio frequencies to identify and track tags on objects, allowing both the scanning of personal information without consent and the tracking of locations long after people have left a protest. Similarly, mobile tracking uses signals from your phone to determine your location and access your private communications, and it can also be used at later times to track down and arrest people who attended previous protests; such arrests have been made in the past without real proof of any wrongdoing. ALPRs can track protesters’ vehicles and access databases containing their locations over time, effectively creating a real-time tracker.

You can protect yourself from surveillance at protests by leaving your phone at home or keeping it turned off as much as possible, bringing a secondary phone you don’t use often, using encrypted messages to plan rather than unencrypted texts or social media, wearing a mask and sunglasses, avoiding vehicle transportation if possible, and changing clothes before and after attending. You should also abstain from posting footage of protests, especially that in which protesters’ faces or other identifiable features are visible. The aforementioned methods of law enforcement surveillance are all either currently legal, or illegal but unenforced. You can petition your local, state, and national representatives to deliver justice for past wrongs and to pass laws restricting police from using such methods on protesters without sufficient proof that the target of surveillance has endangered others.

Generalization Furthers Marginalization

Generalization Furthers Marginalization
By Meer Wu | June 18, 2022

In the world of big data, where information is currency, people are interested in finding trends and patterns hidden within mountains of data. The cost of favoring these huge data sets is that the relatively small amounts of data representing marginalized populations are often overlooked and misused. How we currently deal with such limited data from marginalized groups is more a convenient convention than a true, fair representation. Two ways to better represent and understand marginalized groups through data are to ensure that they are proportionately represented and that each distinct group has its own category, rather than being lumped together in analysis.

How do we currently deal with limited demographic data of marginalized groups?

Studies and experiments where the general population is of interest typically lack comprehensive data on marginalized groups. Marginalized populations are “those excluded from mainstream social, economic, educational, and/or cultural life,” including, but not limited to, people of color, the LGBTQIA+ community, and people with disabilities [[1]](#References:). There are a number of reasons that marginalized populations tend to have small sample sizes. Common reasons include studies intentionally or unintentionally excluding their participation [[2]](#References:), people being unwilling to disclose their identities for fear of potential discrimination, and the lack of quality survey designs that accurately capture their identities [[3]](#References:). These groups with small sample sizes often end up being lumped together or excluded from the analysis altogether.

Disaggregating the “Asian” category: The category “Asian-American” can be broken down into many subpopulations.  Image source: [Minnesota Compass](https://www.mncompass.org/data-insights/articles/race-data-disaggregation-what-does-it-mean-why-does-it-matter)
What is the impact of aggregating or excluding these data?

While aggregating or excluding data of marginalized groups ensures anonymity and/or helps establish statistically meaningful results, it can actually cause harm to those very groups. Excluding or aggregating marginalized communities erases their identities, preventing access to fair policies guided by research and thus perpetuating the very systemic oppression that causes such exclusion in the first place. For example, the 1998 Current Population Survey reported that 21% of Asian-Americans and Pacific Islanders (AAPI) lacked health insurance, but a closer look into subpopulations within AAPI revealed that only 13% of Japanese-Americans lacked insurance coverage while 34% of Korean-Americans were uninsured [[4]](#References:). Similarly, the exclusion of pregnant women from clinical research jeopardizes fetal safety and prevents their access to effective medical treatment [[5]](#References:). Data from marginalized groups should be neither excluded nor lumped together, so that no population’s results are misrepresented.
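The AAPI insurance example can be reproduced with a few lines of arithmetic: an aggregate rate is just a population-weighted average, so two very different subgroup rates collapse into one unremarkable number. The subgroup rates below echo the survey figures cited above, but the population counts are hypothetical, chosen only to illustrate the effect:

```python
# Hypothetical population counts; the uninsured rates echo the 1998
# Current Population Survey figures cited above.
subgroups = {
    "Japanese-American": {"population": 800_000, "uninsured_rate": 0.13},
    "Korean-American":   {"population": 900_000, "uninsured_rate": 0.34},
}

total_pop = sum(g["population"] for g in subgroups.values())

# The aggregate rate is a population-weighted average of subgroup rates.
aggregate_rate = sum(
    g["population"] * g["uninsured_rate"] for g in subgroups.values()
) / total_pop

print(f"Aggregate uninsured rate: {aggregate_rate:.1%}")
```

The single aggregate figure sits between the subgroup rates and hides the 21-percentage-point gap between them, which is exactly the distortion that disaggregated reporting avoids.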

What happens when we report unaggregated results instead?

Reporting unaggregated data, or data that is separated into small units, can help provide more accurate representation, which will help create better care, support, and policies for marginalized communities. On the other hand, it may pose potential threats to individual privacy when the sample size is too small. This is often used as the motivation to not report data of marginalized populations. While protecting anonymity is crucial, aggregation and exclusion should not be solutions to the problem. Instead, efforts should be made to increase sample sizes of marginalized groups so that they are proportionally represented in the data.

While there are statistical methods that will give accurate results without risking individual privacy, these methods are more reactive than preventative toward the actual problem at hand: the lack of good-quality data from marginalized populations. One way to ensure a representative sample size is to create categories that are inclusive and representative of marginalized groups. A good classification system of racial, gender, and other categories should make visible populations that are more nuanced than what traditional demographic categories offer. For example, using multiple-choice selection and capturing changes in identities over time in surveys can better characterize the fluidity and complexity of gender identity and sexual orientation for the LGBTQ+ community [[3]](#References:). Having more comprehensive data on marginalized groups will help drive more inclusive policy decisions. Over time, the U.S. Census has been adding more robust racial categories to include more minority groups. Until 1860, American Indian was not recognized as a race category on the Census, and 2000 marked the first year the Census allowed respondents to select more than one race category. Fast forward to 2020, when people who marked their race as Black or White were asked to describe their origins in more detail [[6]](#References:). The Census has yet to create a non-binary gender category, but for the first time in 2021, the U.S. Census Bureau’s Household Pulse Survey included questions about sexual orientation and gender identity [[7]](#References:). This process will take time, but it will be time well spent.

U.S. Census Racial Categories in 1790 vs. 2020: Racial categories displayed in the 1790 U.S. Census (left) and in the 2020 U.S. Census (right). This image only shows a fraction of all racial categories displayed in the 2020 U.S. Census. Image source: [Pew Research Center](https://www.pewresearch.org/interactives/what-census-calls-us/)
References:

[[1]](https://doi.org/10.1007/s10461-020-02920-3) Sevelius, J. M., Gutierrez-Mock, L., Zamudio-Haas, S., McCree, B., Ngo, A., Jackson, A., Clynes, C., Venegas, L., Salinas, A., Herrera, C., Stein, E., Operario, D., & Gamarel, K. (2020). Research with Marginalized Communities: Challenges to Continuity During the COVID-19 Pandemic. AIDS and Behavior, 24(7), 2009–2012. https://doi.org/10.1007/s10461-020-02920-3

[[2]](https://doi.org/10.1371/journal.pmed.0030019) Wendler, D., Kington, R., Madans, J., Wye, G. V., Christ-Schmidt, H., Pratt, L. A., Brawley, O. W., Gross, C. P., & Emanuel, E. (2006). Are Racial and Ethnic Minorities Less Willing to Participate in Health Research? PLoS Medicine, 3(2), e19. https://doi.org/10.1371/journal.pmed.0030019

[[3]](https://doi.org/10.1177/2053951720933286) Ruberg, B., & Ruelos, S. (2020). Data for queer lives: How LGBTQ gender and sexuality identities challenge norms of demographics. Big Data & Society, 7(1), 2053951720933286. https://doi.org/10.1177/2053951720933286

[[4]](http://healthpolicy.ucla.edu/publications/Documents/PDF/Racial%20and%20Ethnic%20Disparities%20in%20Access%20to%20Health%20Insurance%20and%20Health%20Care.pdf) Brown, E. R., Ojeda, V. D., Wyn, R., & Levan, R. (2000). Racial and Ethnic Disparities in Access to Health Insurance and Health Care. UCLA Center for Health Policy Research and The Henry J. Kaiser Family Foundation, 105.

[[5]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2747530/?report=classic) Lyerly, A. D., Little, M. O., & Faden, R. (2008). The second wave: Toward responsible inclusion of pregnant women in research. International Journal of Feminist Approaches to Bioethics, 1(2), 5–22. https://doi.org/10.1353/ijf.0.0047

[[6]](https://www.pewresearch.org/fact-tank/2020/02/25/the-changing-categories-the-u-s-has-used-to-measure-race/) Brown, A. (2020, February 25). The changing categories the U.S. census has used to measure race. Pew Research Center. https://www.pewresearch.org/fact-tank/2020/02/25/the-changing-categories-the-u-s-has-used-to-measure-race/

[[7]](https://news.stlpublicradio.org/politics-issues/2020-03-17/the-2020-census-is-underway-but-nonbinary-and-gender-nonconforming-respondents-feel-counted-out) Schmid, E. (2020, March 17). The 2020 Census Is Underway, But Nonbinary And Gender-Nonconforming Respondents Feel Counted Out. STLPR. https://news.stlpublicradio.org/politics-issues/2020-03-17/the-2020-census-is-underway-but-nonbinary-and-gender-nonconforming-respondents-feel-counted-out

Cycle tracking apps: what they know and who they share it with

Cycle tracking apps: what they know and who they share it with
By Kseniya Usovich | June 16, 2022

With the potential overturn of Roe v. Wade looming, we should be especially aware of who owns the data about our reproductive health. Cycle and ovulation apps, like Flo, Spot, and Cycles, have been gaining popularity in recent years. These range from simple menstrual cycle calendars to full-blown ML-empowered pregnancy “planners,” with the ML support usually coming with a premium subscription. The kinds of data they collect range from name, age, and email to body temperature, pregnancy history, and even your partner’s contact info. Most health and body-related data is entered by users manually or through consented linkage to other apps and devices such as Apple HealthKit and Google Fit. Although there is not much research on the quality of their predictions, these apps seem helpful overall, even if only to make people more aware of their ovulation cycles.

The common claim in these apps’ privacy policies is that the information you share with them will not be shared externally. This, however, comes with caveats: they do share de-identified personal information with third parties and are also required to share it with law enforcement upon receiving a legal order to do so. Some specifically state that they would only share your personal information (i.e., name, age group, etc.), and not your health information, if required by law. Take this with a grain of salt, though, as one of the more popular period tracking companies, Flo, shared its users’ health data for marketing purposes from 2016 to 2019 without informing its customers. And that was just for marketing; it is unclear whether these companies can refuse to share a particular user’s health information, such as period cycles, pregnancies, and general analytics, under a court order.

This becomes an even bigger concern in light of the current political situation in the U.S. I am, of course, talking about the potential Roe v. Wade overturn. You see, if we lose federal protection of abortion rights, every state will be able to impose its own rules concerning reproductive health. This implies that some states will most likely prohibit abortion from very early in a pregnancy, whereas currently the government can fully prohibit it only in the last trimester. People living in states where abortion rights are limited or nonexistent would be left with three options: giving birth, obtaining an abortion secretly (i.e., illegally under their state’s law), or traveling to another state. There is a whole Pandora’s box of classism, racism, and other issues bound up in this narrow set of options that I won’t be able to discuss, since this post has a word limit. I will only mention that the set becomes even more limited if you have fewer resources or are dealing with health concerns that will not permit you to act on one or more of these “opportunities.”

However, let’s circle back to that app you might be keeping as your period calendar or a pocket-size analyst of all things ovulation. We, as users, are in a zone of limbo: without sharing enough information, we can’t get good predictions, but by oversharing, we are always at risk of entrusting our private information to a service that might not be as protective of it as it implied. Essentially, the ball is still in your court, and you can always request the removal of your data. But if you live in a region that sees abortion as a crime, beware of who may have a little too much data about your reproductive health journey.


[1] https://cycles.app/privacy-policy
[2] https://flo.health/privacy-portal
[3] https://www.cedars-sinai.org/blog/fertility-and-ovulation-apps.html
[4] https://www.nytimes.com/2021/01/28/us/period-apps-health-technology-women-privacy.html
[5] https://www.apkmonk.com/app/com.glow.android/
[6] https://www.theverge.com/2021/1/13/22229303/flo-period-tracking-app-privacy-health-data-facebook-google