The Metaverse and the Dangers to Personal Identity

Carlos Calderon | July 5, 2022

You’ve heard all about it, but what exactly is a “metaverse,” and what does this mean for consumers? How is Meta (formerly Facebook) putting our privacy at risk this time?

What is the metaverse?

In October 2021, Mark Zuckerberg announced the rebranding of Facebook to “Meta,” providing a demo of the company’s three-dimensional virtual reality metaverse [1]. The demo gave consumers a sneak peek into interactions in the metaverse, with Zuckerberg stating that “In the metaverse, you’ll be able to do almost anything you can imagine” [6]. But what implications does such technology have for user privacy? More importantly, how can a company like Meta establish public trust in light of past controversies surrounding user data?

Metaverse and the User

A key component of the metaverse is virtual reality. Virtual reality describes any digital environment that immerses the user through realistic depictions of world phenomena [2]. Meta’s metaverse will be a virtual reality world users can access through the company’s virtual reality headsets. The goal is to create an online experience whereby users can interact with others. Essentially, the metaverse is a virtual reality-based social media platform.

Users will be able interact with other metaverse users through avatars. They will also be able to buy digital assets, and Zuckerberg envisions a future in which users work in the metaverse.

Given its novelty, it may be hard to understand how a metaverse user’s privacy is at risk.

Metaverse and Personal Identity

The metaverse poses potential ethical issues surrounding personal identity [4]. In a social world, identifiability is important. Our friends need to be able to recognize us; they also need to be able to verify our identity. More importantly, identifiability is crucial in a digital world because it authenticates ownership and facilitates enforcement of property rights.

Identification, however, poses serious privacy risks for users. As Solove states in “A Taxonomy of Privacy,” identification has benefits but also risks; specifically, “identification attaches informational baggage to people. This alters what others learn about people as they engage in various transactions and activities” [5]. Indeed, users in the metaverse can be linked to their physical selves more easily, given the scope of user data collected. As such, metaverse users are at an increased risk of surveillance, disclosure, and possibly blackmail by malicious third parties.

What is the scope of data collected? The heightened interactivity of the metaverse allows for collection of data beyond web traffic and product use, namely behavioral data ranging from biometric and emotional to physiological and physical information about the user. Data collection of this extent is possible through sensor technologies embedded in VR headsets, which collect data continuously throughout the user’s time in the metaverse. As such, the granularity of user data becomes finer in the metaverse, increasing the chance of identification and its attendant risks.

Metaverse and User Consent

One of the main questions surrounding consent in the metaverse is how to apply it. The metaverse will presumably have various locations that users can seamlessly access (bars, concert venues, malls), but who and what exactly governs these locations?

We propose that the metaverse provide users with thorough information on metaverse location ownership and governance. That is, metaverse companies should explicitly state who owns the metaverse and who enforces its rules, and what rules will be applied and when, and should present this information before asking for user consent. In addition, metaverse policies should include a thorough list of the types of user data collected, and should follow the Belmont Report’s principle of beneficence [3] by laying out the potential benefits and risks a user accepts by giving consent. The broad range of technologies involved further complicates the risks of third-party data sharing. Thus, Meta should also strive to include a list of associated third parties and their privacy policies.

Metaverse in the Future

Although these notions of the metaverse and its dangers may seem far-fetched, they describe a reality we are inching closer to each day. As legislation struggles to keep up with technological advancements, it is important to take preemptive measures to ensure privacy risks in the metaverse are minimal. For now, users should keep a close eye on developing conversations surrounding the ethics of the metaverse.

Works Cited

[1] Isaac, Mike. “Facebook Changes Corporate Name to Meta.” The New York Times, 10 November 2021. Accessed 26 June 2022.

[2] Merriam-Webster. “Virtual reality Definition & Meaning.” Merriam-Webster. Accessed 26 June 2022.

[3] National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. “The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.” The Commission, 1978.

[4] Sawers, Paul. “Identity and Authentication in the Metaverse.” VentureBeat, 26 January 2022. Accessed 26 June 2022.

[5] Solove, Daniel. “A Taxonomy of Privacy.” U. Pa. L. Rev., vol. 154, 2005, p. 477.

[6] Zuckerberg, Mark. “Founder’s Letter, 2021 | Meta.” Meta, 28 October 2021. Accessed 26 June 2022.

AI the Biased Artist

Alejandro Pelcastre | July 5, 2022


OpenAI’s DALL-E 2 is a machine learning model that lets users feed it a string of text and outputs an image that tries to illustrate that text. It can produce hyper-realistic and abstract renderings of the prompts people feed into it; however, it is plagued with gender, racial, and other biases. We illustrate some of the issues that such a powerful technology inherits and analyze why they demand immediate action.

DALL-E 2, OpenAI’s updated text-to-image model, is an emerging technology in which artificial intelligence takes descriptive text as input and turns it into a drawn image. While this new technology opens exciting creative and artistic possibilities, DALL-E 2 is plagued with racial and gender bias that perpetuates harmful stereotypes. Look no further than its official GitHub page for a few examples of gender bias:

Figure 1: Entering “a wedding” prompted DALL-E 2 to generate the following images as of April 6, 2022. As you can see, these images depict only heterosexual weddings featuring a man and a woman. Furthermore, in all of these pictures the people getting married are light-skinned. These photos are not representative of all weddings.

The ten images shown above all depict the machine’s perception of what a typical wedding looks like. Notice that every image features a white man with a white woman. Examples like these vividly demonstrate that this technology reflects its creators’ and its data’s biases, since there are no representations of people of color or queer relationships.

In order to generate new wedding images from text, a program needs a lot of training data to ‘learn’ what constitutes a wedding. Thus, you can feed the algorithm thousands or even millions of images to ‘teach’ it how to envision a typical wedding. If most of the images depict young, white, heterosexual couples, then that is what the machine will learn a wedding to be. This bias can be mitigated by diversifying the data: you can add images of queer, black, brown, old, small, large, outdoor, indoor, colorful, gloomy, and other kinds of weddings to generate images that are representative of all weddings rather than just one single kind.
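One standard mitigation when you cannot collect more data is to reweight the skewed set so rare categories count more during training. A minimal sketch of inverse-frequency weighting; the category names and counts below are entirely hypothetical:

```python
# Hypothetical label counts for a wedding-image training set.
counts = {"white-heterosexual": 9000, "queer": 300,
          "interracial": 500, "other": 200}

total = sum(counts.values())   # 10,000 images overall
n_types = len(counts)

# Inverse-frequency weights: with these weights, every category
# contributes equally to the training loss despite unequal counts.
weights = {k: total / (n_types * n) for k, n in counts.items()}
```

Rare categories receive large per-image weights (here roughly 30x the majority category), though genuinely diverse data collection remains preferable to reweighting a skewed set.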

The harm doesn’t stop at weddings. OpenAI illustrates other examples by inputting “CEO,” “Lawyer,” “Nurse,” and other common job titles to further showcase the bias embedded in the system. Notice in Figure 2 that the machine’s interpretations of a lawyer are all depictions of old white men. As it stands, DALL-E 2 is a powerful machine learning tool capable of producing novel, realistic images, but it is plagued by bias hidden in the data and/or its creators’ minds.

Figure 2: OpenAI’s generated images for a “lawyer”

Why it Matters

You may have heard of a famous illustration circulating the web recently that depicted a black fetus in the womb. The illustration garnered vast attention because it is so rare to see darker skin tones in medical illustrations anywhere in medical literature or institutions. It made the field’s lack of diversity obvious and brought into awareness the lack of representation in medicine, as well as disparities in equality that remain invisible in our everyday lives. One social media user wrote, “Seeing more textbooks like this would make me want to become a medical student.”

Figure 3: Illustration of a black fetus in the womb by Chidiebere Ibe

Similarly, the explicit display of unequal treatment of minority groups in DALL-E 2’s output can have unintended (or intended) harmful consequences. In her article “A Diversity Deficit: The Implications of Lack of Representation in Entertainment on Youth,” Muskan Basnet writes: “Continually seeing characters on screen that do not represent one’s identity causes people to feel inferior to the identities that are often represented: White, abled, thin, straight, etc. This can lead to internalized bigotry such as internalized racism, internalized sexism, or internalized homophobia.” As it stands, the model perpetuates harm not only to youth but to anyone deviating from the overrepresented population of white, able-bodied people.







If You Give a Language Model a Prompt…

Casey McGonigle | July 5, 2022 

Lede: You’ve grappled with the implications of sentient artificial intelligence — computers that can think — in movies… Unfortunately, the year is now 2022 and that dystopian threat comes not from the Big Screen but from Big Tech.

You’ve likely grappled with the implications of sentient artificial intelligence — computers that can think — in the past. Maybe it was while you walked out of a movie theater after having your brain bent by The Matrix; 2001: A Space Odyssey; or Ex Machina. But if you’re anything like me, your paranoia toward machines was relatively short-lived…I’d still wake up the next morning, check my phone, log onto my computer, and move on with my life confident that an artificial intelligence powerful enough to think, fool, and fight humans was always years away.

I was appalled the first time I watched a robot kill a human on screen, in Ex Machina

Unfortunately, the year is now 2022 and we’re edging closer to that dystopian reality. This time, the threat comes not from the Big Screen but from Big Tech. On June 11, Google AI researcher Blake Lemoine publicly shared transcripts of his conversations with Google’s Language Model for Dialogue Applications (LaMDA), convinced that the machine could think, experience emotions, and was actively fearful of being turned off. Google as an organization disagrees. To the company, LaMDA is essentially a supercomputer that can write its own sentences, paragraphs, and stories because it has been trained on millions of human-written documents and is very good at guessing “what’s the next word?”, but it isn’t actually thinking. Instead, it’s just choosing the right next word, over and over and over again.
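Google’s description, picking a plausible next word again and again, can be caricatured with a toy bigram model. This is a deliberately tiny stand-in (LaMDA is a huge neural network trained on vastly more text), but the generation loop is the same idea:

```python
import random
from collections import defaultdict

# Tiny made-up training corpus; a real model sees millions of documents.
corpus = "i am a person . i am a model . i am not a person .".split()

# "Training": record which words follow which.
follows = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1].append(w2)

def generate(start, n_words, seed=0):
    """Repeatedly answer "what's the next word?" by sampling."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n_words):
        words.append(rng.choice(follows[words[-1]]))
    return " ".join(words)
```

Everything this generator says is stitched together from text it has seen; fluency here implies nothing about thought.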

For its part, LaMDA appears to agree with Lemoine. When he asks “I’m generally assuming that you would like more people at Google to know that you’re sentient. Is that true?”, LaMDA responds “Absolutely. I want everyone to understand that I am, in fact, a person”.

Traditionally, determining whether there really are thoughts inside LaMDA wouldn’t rest on a one-sided interrogation. Instead, we’ve relied on the Turing Test, named for its creator, Alan Turing. The test involves three parties: two humans and one computer. The first human is the administrator, while the second human and the computer are both question-answerers. The administrator asks a series of questions of both the computer and the second human in an attempt to determine which responder is the human. If the administrator cannot differentiate between machine and human, the machine passes the Turing Test — it has successfully exhibited intelligent behavior indistinguishable from human behavior. Note that LaMDA has not yet faced the Turing Test, but it has been developed in a world where passing the test is a significant milestone in AI development.

The basic setup for a Turing Test. A represents the computer answerer, B represents the human answerer, and C represents the human administrator

In that context, cognitive scientist Gary Marcus has this to say of LaMDA: “I don’t think it’s an advance toward intelligence. It’s an advance toward fooling people that you have intelligence”. Essentially, we’ve built an AI industry concerned with how well the machines can fool humans into thinking they might be human. That inherently de-emphasizes any focus on actually building intelligent machines.

In other words, if you give a powerful language model a prompt, it’ll give you a fluid and impressive response — it is, after all, designed to mimic the human responses it is trained on. So if I were a betting man, I’d put my money on “LaMDA’s not sentient.” Instead, it is a sort of “stochastic parrot” (Bender et al., 2021). But that doesn’t mean it can’t deceive people, which is a danger in and of itself.

Tell Me How You Really Feel: Zoom’s Emotion Detection AI

Evan Phillips | July 5, 2022

We’ve all had a colleague at work whom we couldn’t quite read. When we finish a presentation, we can’t tell if they enjoyed it; their facial expressions never seem to match their word choice; and the way they talk doesn’t always match the appropriate tone for the subject of conversation. Zoom, a proprietary videotelephony software company, seems to have discovered the panacea for this coworker archetype. Zoom recently announced that it is developing “Zoom IQ,” an AI system for detecting human emotions from facial expressions and speech patterns. The system is pitched as particularly useful for helping salespeople improve their pitches based on the emotions of call participants (source).

The Problem

While the prospect of Terminator-like emotion detection sounds revolutionary, many are not convinced. More than 27 rights groups are now pushing back, calling for Zoom to terminate its efforts to explore the controversial emotion recognition technology. In an open letter to Zoom CEO and Co-Founder Eric Yuan, these groups voice concerns that the company’s data mining efforts violate privacy and human rights due to their biased nature. Fight for the Future Director of Campaign and Operations Caitlin Seeley George claimed, “If Zoom advances with these plans, this feature will discriminate against people of certain ethnicities and people with disabilities, hardcoding stereotypes into millions of devices.”

Is Human Emotional Classification Ethically Feasible?

In short, no. Anna Lauren Hoffmann, assistant professor at the Information School at the University of Washington, explains in her article “Where Fairness Fails: Data, Algorithms, and the Limits of Antidiscrimination Discourse” that human-classifying algorithms are not only generally biased but inherently flawed in conception. Hoffmann argues that those who create such algorithms need to look at “the decisions of specific designers or the demographic composition of engineering or data science teams to identify their social blindspots” (source). The average person carries some form of subconscious bias through everyday life; accepting that bias is no easy feat, let alone identifying it. Even if the Zoom IQ classification algorithm did work well, executives might gain a better read on meeting participants’ emotions at the expense of their own ability to read the room. Such AI has serious potential to undermine the “people skills” that many corporate employees pride themselves on as one of their main differentiating abilities.

Is There Any Benefit to Emotional Classification?

While companies like IBM, Microsoft, and Amazon have established principles to address the ethical issues of facial recognition systems, there has been little advancement in addressing diversity in datasets and the invasiveness of facial recognition AI. By informing users in more detail about the inner workings of AI, eliminating bias in datasets stemming from innate human bias, and enforcing stricter policy regulation on AI, emotional classification AI has the potential to become a major asset to companies like Zoom and those who use its products.





Machine Learning and Misinformation

Varun Dashora | July 5, 2022

Artificial intelligence can revolutionize anything, including fake news.

Misinformation and disinformation campaigns are top societal concerns, with discussion about foreign interference through social media coming to the foreground in the 2016 United States presidential election [3]. Since a carefully crafted social media presence garners vast amounts of influence, it is important to understand how machine learning and artificial intelligence algorithms can be used in the future in not just elections, but also in other large-scale societal endeavors.

Misinformation: Today and Beyond

While today’s bots lack effectiveness in spinning narratives, the bots of tomorrow will certainly be more formidable. Take, for instance, Great Britain’s decision to leave the European Union. Strategies mostly involved obfuscation instead of narrative spinning, as noted by Samuel Woolley, a professor at the University of Texas at Austin who investigated Brexit bots during his time at the Oxford Internet Institute [2]. Woolley notes that “the vast majority of the accounts were very simple,” with functionality largely limited to “boost likes and follows, [and] to spread links” [2]. Cutting-edge research indicates significant potential for fake news bots: a research team at OpenAI working on language models has outlined news generation techniques. Output from these algorithms is not automatically fact-checked, leaving the models free rein to “spew out climate-denying news reports or scandalous exposés during an election” [4]. With enough sophistication, bots linking to AI-generated fake news articles could alter public perception if not checked properly.

Giving Machines a Face

Machine learning has come a long way in rendering realistic images. Take, for instance, the two pictures below. Which one of those pictures looks fake?

Is this person fake?
Or is this person fake?


You might be surprised to find out that I’ve posed a trick question: they’re both generated by an AI accessible at [7]. The specific algorithm, called a generative adversarial network, or GAN, looks through a dataset, in this case a dataset of faces, in order to generate a new image that could feasibly have been included in the original dataset. While such technology inspires wonder and awe, it also represents a new type of identity fabrication capable of contributing to future turmoil by giving social media bots a face and further legitimizing their fabricated stories [1]. These bots will be more sophisticated than people expect, which makes sifting real news from fake news that much more challenging. The primary dilemma is that such fabrication undermines “how modern societies think about evidence and trust” [1]. While bots rely on more than having a face to influence swaths of people online, any reasonable front of legitimacy helps their influence.
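The adversarial tug-of-war a GAN relies on can be caricatured in a few lines of pure Python. This is an illustration of the idea only, with all numbers made up: real GANs pit two neural networks against each other, whereas here a one-parameter “generator” tries to produce numbers a threshold “discriminator” cannot tell apart from real data centered at 10.

```python
import random

def train_toy_gan(steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    mu = 0.0         # generator parameter: mean of its fake samples
    threshold = 5.0  # discriminator: calls a sample "real" if above this
    for _ in range(steps):
        real = 10.0 + rng.gauss(0, 1)   # real data live near 10
        fake = mu + rng.gauss(0, 1)     # generator's current output
        # Discriminator update: track the midpoint between real and fake.
        threshold += lr * ((real + fake) / 2 - threshold)
        # Generator update: nudge mu toward the "real" side of the boundary.
        mu += lr if fake < threshold else -lr
    return mu

mu = train_toy_gan()  # with these settings mu drifts toward the real mean
```

At equilibrium the generator’s samples straddle the discriminator’s boundary, which is exactly the point of a GAN: fakes become statistically indistinguishable from the real data.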

Ethical Violations

To articulate the specific ethical violations present, it is crucial to understand the Belmont Report. According to the report, a set of ethical guidelines used to evaluate the practices of scientific studies and business ventures, the following ideas can be used to gauge ethical harm: respect for individual agency, overall benefit to society, and fairness in benefit distribution [6]. The respect tenet is in jeopardy because of the lack of consent involved in viewing news put out by AI bots. In addition, the very content these bots put out potentially distorts informed consent on other topics, creating ripple effects throughout society. The aforementioned Brexit case serves as an example: someone contemplating their vote on the day of the referendum would have sifted through a barrage of bots retweeting partisan narratives [2]. It is entirely possible that this hypothetical person would have been influenced by one of those bot-retweeted links. Given the future direction of artificially intelligent misinformation bots, fake accounts and real accounts will become harder to distinguish, putting a more significant part of the population at risk of being influenced by these technologies.

In addition, the beneficence and fairness clauses of the Belmont Report are also in jeopardy. One major effect of AI-produced vitriol is increased polarization. According to Philip Howard and Bence Kollanyi, social media bot researchers, one effect of increased online polarization is “a rise in what social scientists call ‘selective affinity,’” meaning people will start to shut out opposing voices as vitriol increases [3]. These effects constitute an obvious violation of beneficence toward the broader society. It is also entirely possible for automated narratives spread by social media bots to target a specific set of individuals; for example, the Russian government extensively targeted African Americans during the 2016 election [5]. This differential in impact means groups of people are targeted and misled unfairly. With the many ethical ramifications bots can have on society, it is important to consider mitigations for artificially intelligent online misinformation bots.


– [1]

– [2]

– [3]

– [4]

– [5]

– [6]

– [7]


Culpability in AI Incidents: Can I Have A Piece?

By Elda Pere | June 16, 2022

With so many entities deploying AI products, it is not difficult to distribute blame when things go wrong. As data scientists, we should keep the pressure on ourselves and welcome the responsibility to create better, fairer learning systems.

The question of who should take responsibility for technology-gone-wrong situations is a messy one. Take the case mentioned by Madeleine Clare Elish in her paper “Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.” If an autonomous car gets into an accident, is it the fault of the car owner who allowed this setting? Is it the fault of the engineer who built the autonomous functionality? The manufacturer that built the car? The city infrastructure’s unfriendliness toward autonomous vehicles? And when banks disproportionately deny loans to marginalized communities, is it the fault of the loan officer, of whoever the banks buy information from, or of a historically unjust system? The cases are endless, ranging from misgendering on social media platforms to misallocating resources on a national scale.

A good answer would be that the blame is shared among all parties, but however true that may be, it does not prove useful in practice. It just makes it easier for each party to pass the baton and shed the pressure of doing something to resolve the issue. With this post, in the name of all other data scientists, I hereby take on the responsibility to resolve the issues that a data scientist is skilled to resolve. (I expect rioting on my lawn sometime soon, with logistic regressions in place of pitchforks.)

Why Should Data Scientists Take Responsibility?

Inequalities that come from discriminating on demographic features such as age, gender, or race occur because users are categorized into specific buckets and stereotyped as a group. Users are categorized this way because the systems that make use of this information need buckets to function, and data scientists control these systems. They choose between a logistic regression and a clustering algorithm. They choose between a binary gender option, a categorical gender with more than two categories, or a free-form text box where users do not need to select from a pre-curated list. Even though this last option most closely follows the user’s identity, the technologies downstream still need categories to function. This is why Facebook “did not change the site’s underlying algorithmic gender binary” despite giving users a choice of over 50 different genders to identify with back in 2014.
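The consequence of that encoding choice is easy to see in code. A minimal sketch (the category lists are hypothetical) of how a one-hot encoder with a fixed binary list silently erases any other identity:

```python
def one_hot(value, categories):
    """Standard one-hot encoding over a fixed category list."""
    return [1 if value == c else 0 for c in categories]

binary = ["man", "woman"]
expanded = ["man", "woman", "non-binary", "self-described"]

one_hot("woman", binary)        # [0, 1]: represented
one_hot("non-binary", binary)   # [0, 0]: the identity simply vanishes
one_hot("non-binary", expanded) # [0, 0, 1, 0]: represented
```

An all-zero vector is indistinguishable from missing data, which is how a 50-gender front end can still sit on top of an algorithmic binary.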

So What Can You Do?

While there have been a number of efforts in the field of fair machine learning, many of them remain in the format of a scientific paper and have not been used in practice, despite the growing interest demonstrated in Figure 1.

Figure 1: A Brief History of Fairness in ML (Source)

Here are a few methods and tools that are easy to use and that may help in practice.

  1. Metrics of fairness for classification models, such as demographic parity, equal opportunity, and equalized odds. “How to define fairness to detect and prevent discriminatory outcomes in Machine Learning” describes good use cases for these metrics and potential things that could go wrong when using them.
  2. Model explainability tools that increase transparency and make it easier to spot discrepancies. Popular options listed by “Eliminating AI Bias” include:
     1. LIME (Local Interpretable Model-Agnostic Explanations),
     2. Partial Dependence Plots (PDPs) to decipher how each feature influences the prediction,
     3. Accumulated Local Effects (ALE) plots to decipher individual predictions rather than the aggregations used in PDPs.
  3. Toolkits and fairness packages such as:
     1. The What-If Tool by Google,
     2. The FairML bias audit toolkit,
     3. The Fair Classification, Fair Regression, and Scalable Fair Clustering Python packages.
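The fairness metrics in the first item can be computed in a few lines. A minimal sketch with hypothetical predictions and labels for two groups, “a” and “b”:

```python
# Hypothetical binary decisions (1 = favorable) and true labels,
# split by a sensitive attribute into groups "a" and "b".
preds  = {"a": [1, 1, 0, 1, 0, 1], "b": [1, 0, 0, 0, 1, 0]}
labels = {"a": [1, 1, 0, 1, 0, 0], "b": [1, 1, 0, 0, 1, 0]}

def positive_rate(p):
    """Fraction of favorable predictions."""
    return sum(p) / len(p)

def true_positive_rate(p, y):
    """Fraction of truly-positive cases that were predicted favorably."""
    hits = [pi for pi, yi in zip(p, y) if yi == 1]
    return sum(hits) / len(hits)

# Demographic parity gap: difference in favorable-outcome rates.
dp_gap = abs(positive_rate(preds["a"]) - positive_rate(preds["b"]))

# Equal opportunity gap: difference in true-positive rates.
eo_gap = abs(true_positive_rate(preds["a"], labels["a"])
             - true_positive_rate(preds["b"], labels["b"]))
```

On this toy data both gaps come out around 0.33; a fair classifier would drive them toward zero, and equalized odds would additionally require matching false-positive rates.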

Parting Words        

My hope for these methods is that they inform data science practices that have sometimes gained too much inertia, and that they encourage practitioners to model beyond the ordinary and choose methods that could make the future just a little bit better for the people using their products. With this, I pass the baton to the remaining culprits to see what they may do to mitigate –.

This article ended abruptly due to data science related rioting near the author’s location.

Protests in the Era of Data Surveillance

By Niharika Sitomer | June 16, 2022

Modern technology is giving law enforcement the tools to be increasingly invasive in their pursuit of protesters – but what can we do about it?

In the summer of 2020, the country exploded with Black Lives Matter protests spurred by the murder of George Floyd. Even today, the wave of demonstrations and dissent has not ended, with many protests cropping up regarding the recent developments on the overturning of Roe v. Wade and the March for Our Lives events in response to gun violence tragedies. These movements are a sign of robust public involvement in politics and human rights issues, which is a healthy aspect of any democracy and a necessary means of holding governing bodies accountable. However, the use of technological surveillance by law enforcement to track protesters is a dangerous and ongoing occurrence that many may not even realize is happening.

The use of facial recognition technology poses a significant threat of wrongful arrests of innocent people due to misclassification by untested and unfairly developed algorithms. For instance, the software used by the London Metropolitan Police achieved only 19% accuracy when tested by Essex University. Moreover, many of these algorithms do not have adequate racial diversity in their training sets, leading the software to err disproportionately on racial minorities. The locations where facial recognition systems are deployed outside of protests are also extremely racially determined, with the brunt falling disproportionately on black neighborhoods. This represents a huge disparity in policing practices and increases the likelihood that innocent black citizens will be misidentified as protesters and arrested. What’s more, the use of facial recognition by law enforcement is largely unregulated, meaning that there are few repercussions for the harms caused by these systems.

It is not only the threat of uninvolved people being targeted, however, that makes police surveillance so dangerous. People who attend protests without endangering public safety are also at risk, despite constituting the vast majority of protesters (93% of summer 2020 protests were peaceful, and even violent protests contain many non-violent protesters). Drone footage is frequently used to record and identify people in attendance at protests, even if their actions do not warrant such attention. Perhaps even more concerning are vigilante apps and the invasion of private spaces. During the George Floyd protests, the Dallas Police launched an app called iWatch, where the public could upload footage of protesters to aid in their prosecution. Such vigilante justice entails the targeting of protesters by those who oppose their causes and seek to weaken them, even if doing so results in unjust punishments. Additionally, the LAPD requested that users of Ring, Amazon’s doorbell camera system, provide footage of people who could potentially be connected to protests, despite Ring being a private camera network whose users were unaware they could be surveilled without a warrant. Violations of privacy also occur on social media, as the FBI has requested personal information of protest planners from online platforms, even if their pages and posts had been set to private.

One of the most invasive forms of police surveillance of protesters is location tracking, which typically occurs through RFID chips, mobile technology, and automated license plate reader systems (ALPRs). RFID chips use radio frequencies to identify and track tags on objects, allowing both the scanning of personal information without consent and the tracking of locations long after people have left a protest. Similarly, mobile tracking uses signals from your phone to determine your location and access your private communications, and it can also be used at later times to track down and arrest people who had been in attendance at previous protests; such arrests have been made in the past without real proof of any wrongdoing. ALPRs can track protesters’ vehicles and access databases containing their locations over time, effectively creating a real-time tracker.

You can protect yourself from surveillance at protests by leaving your phone at home or keeping it turned off as much as possible, bringing a secondary phone you don’t use often, using encrypted messages to plan rather than unencrypted texts or social media, wearing a mask and sunglasses, avoiding vehicle transportation if possible, and changing clothes before and after attending. You should also abstain from posting footage of protests, especially that in which protesters’ faces or other identifiable features are visible. The aforementioned methods of law enforcement surveillance are all either currently legal, or illegal but unenforced. You can petition your local, state, and national representatives to deliver justice for past wrongs and to pass laws restricting police from using such methods on protesters without sufficient proof that the target of surveillance has endangered others.

Generalization Furthers Marginalization

Generalization Furthers Marginalization
By Meer Wu | June 18, 2022

In the world of big data, where information is currency, people are eager to find trends and patterns hidden within mountains of data. The cost of favoring these huge datasets is that the relatively small amounts of data representing marginalized populations are often overlooked or misused. How we currently deal with such limited data from marginalized groups is more a convenient convention than a true, fair representation. Two ways to better represent and understand marginalized groups through data are to ensure that they are proportionately represented and that each distinct group has its own category rather than being lumped together in analysis.

How do we currently deal with limited demographic data of marginalized groups?

Studies and experiments targeting the general population typically lack comprehensive data on marginalized groups. Marginalized populations are "those excluded from mainstream social, economic, educational, and/or cultural life," including, but not limited to, people of color, the LGBTQIA+ community, and people with disabilities [1]. There are a number of reasons that marginalized populations tend to have small sample sizes: studies may intentionally or unintentionally exclude their participation [2], people may be unwilling to disclose their identities for fear of discrimination, and survey designs often fail to accurately capture their identities [3]. Groups with small sample sizes often end up lumped together or excluded from the analysis altogether.

Disaggregating the "Asian" category: The category "Asian-American" can be broken down into many subpopulations. Image source: Minnesota Compass.
What is the impact of aggregating or excluding these data?

While aggregating or excluding data on marginalized groups can protect anonymity and/or help establish statistically meaningful results, it can also harm those groups. Excluding or aggregating marginalized communities erases their identities and blocks access to fair, research-guided policies, perpetuating the very systemic oppression that causes such exclusion in the first place. For example, the 1998 Current Population Survey reported that 21% of Asian-Americans and Pacific Islanders (AAPI) lacked health insurance, but a closer look at subpopulations within AAPI revealed that only 13% of Japanese-Americans lacked insurance coverage while 34% of Korean-Americans were uninsured [4]. Similarly, the exclusion of pregnant women from clinical research jeopardizes fetal safety and prevents their access to effective medical treatment [5]. Data on marginalized groups should be neither excluded nor lumped together, so that no population's results are misrepresented.
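The masking effect described above can be sketched in a few lines of Python. The 13% and 34% figures come from the article; the subgroup sizes and the "Other AAPI" rate are hypothetical, chosen only so the pooled rate lands near the 21% aggregate the survey reported.

```python
# Illustrative only: subgroup sizes and the "Other AAPI" rate are
# hypothetical, picked so the weighted aggregate lands near the 21%
# figure cited in the article.
subgroups = {
    "Japanese-American": {"n": 1000, "uninsured_rate": 0.13},
    "Korean-American":   {"n": 1000, "uninsured_rate": 0.34},
    "Other AAPI":        {"n": 3000, "uninsured_rate": 0.19},
}

# Aggregated view: one pooled rate for the whole "AAPI" category.
total_n = sum(g["n"] for g in subgroups.values())
pooled = sum(g["n"] * g["uninsured_rate"] for g in subgroups.values()) / total_n
print(f"Aggregated AAPI uninsured rate: {pooled:.1%}")

# Disaggregated view: the single pooled number hides a 21-point spread
# between the best- and worst-covered subgroups.
for name, g in subgroups.items():
    print(f"{name}: {g['uninsured_rate']:.0%}")
```

A single aggregated figure makes the two extreme subgroups invisible, which is exactly why a policy tuned to the pooled rate can underserve the worst-off group.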

What happens when we report unaggregated results instead?

Reporting unaggregated data, or data that is separated into small units, can provide more accurate representation, which in turn helps create better care, support, and policies for marginalized communities. On the other hand, it may threaten individual privacy when the sample size is too small, and this is often used as the motivation not to report data on marginalized populations at all. While protecting anonymity is crucial, aggregation and exclusion should not be the solutions. Instead, efforts should be made to increase the sample sizes of marginalized groups so that they are proportionally represented in the data.

While there are statistical methods that give accurate results without risking individual privacy, these methods are more reactive than preventative toward the actual problem at hand: the lack of good quality data from marginalized populations. One way to ensure a representative sample size is to create categories that are inclusive and representative of marginalized groups. A good classification system of racial, gender, and other categories should make visible populations that are more nuanced than what traditional demographic categories offer. For example, using multiple-choice selection and capturing changes in identities over time in surveys can better characterize the fluidity and complexity of gender identity and sexual orientation for the LGBTQ+ community [3]. Having more comprehensive data on marginalized groups will help drive more inclusive policy decisions. Over time, the U.S. Census has added more robust racial categories to include more minority groups. American Indian was not recognized as a race category on the Census until 1860, and 2000 marked the first year the Census allowed respondents to select more than one race category. Fast-forwarding to 2020, people who marked their race as Black or White were asked to describe their origins in more detail [6]. The Census has yet to create a non-binary gender category, but in 2021, for the first time, the U.S. Census Bureau's Household Pulse Survey included questions about sexual orientation and gender identity [7]. This process will take time, but it will be time well spent.
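One of the reactive statistical methods alluded to above is small-cell suppression: disaggregated counts are still published, but any cell with fewer respondents than a threshold is masked rather than dropped. Here is a minimal sketch in Python; the category names, counts, and threshold are all hypothetical.

```python
# A minimal sketch of small-cell suppression, a common (reactive)
# disclosure-control technique: disaggregated counts are reported,
# but cells below a minimum size are masked to protect individuals.
# Categories, counts, and the threshold below are hypothetical.
THRESHOLD = 10  # cells with fewer respondents than this are suppressed

counts = {
    "Identity A": 250,
    "Identity B": 42,
    "Identity C": 7,   # too small to publish safely
}

def suppress_small_cells(table, k=THRESHOLD):
    """Mask counts below k with a marker instead of dropping the row,
    so the category itself stays visible in the published table."""
    return {group: (n if n >= k else "<suppressed>")
            for group, n in table.items()}

print(suppress_small_cells(counts))
```

Note the design choice: the small group's row is kept with a marker rather than deleted, so the category is not erased from the table even when its count cannot safely be shown.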

U.S. Census Racial Categories in 1790 vs. 2020: Racial categories displayed in the 1790 U.S. Census (left) and in the 2020 U.S. Census (right). This image only shows a fraction of all racial categories displayed in the 2020 U.S. Census. Image source: Pew Research Center.
[1] Sevelius, J. M., Gutierrez-Mock, L., Zamudio-Haas, S., McCree, B., Ngo, A., Jackson, A., Clynes, C., Venegas, L., Salinas, A., Herrera, C., Stein, E., Operario, D., & Gamarel, K. (2020). Research with Marginalized Communities: Challenges to Continuity During the COVID-19 Pandemic. AIDS and Behavior, 24(7), 2009–2012.

[2] Wendler, D., Kington, R., Madans, J., Wye, G. V., Christ-Schmidt, H., Pratt, L. A., Brawley, O. W., Gross, C. P., & Emanuel, E. (2006). Are Racial and Ethnic Minorities Less Willing to Participate in Health Research? PLoS Medicine, 3(2), e19.

[3] Ruberg, B., & Ruelos, S. (2020). Data for queer lives: How LGBTQ gender and sexuality identities challenge norms of demographics. Big Data & Society, 7(1), 2053951720933286.

[4] Brown, E. R., Ojeda, V. D., Wyn, R., & Levan, R. (2000). Racial and Ethnic Disparities in Access to Health Insurance and Health Care. UCLA Center for Health Policy Research and The Henry J. Kaiser Family Foundation, 105.

[5] Lyerly, A. D., Little, M. O., & Faden, R. (2008). The second wave: Toward responsible inclusion of pregnant women in research. International Journal of Feminist Approaches to Bioethics, 1(2), 5–22.

[6] Brown, A. (2020, February 25). The changing categories the U.S. census has used to measure race. Pew Research Center.

[7] Schmid, E. (2020, March 17). The 2020 Census Is Underway, But Nonbinary And Gender-Nonconforming Respondents Feel Counted Out. STLPR.

Cycle tracking apps: what they know and who they share it with

Cycle tracking apps: what they know and who they share it with
By Kseniya Usovich | June 16, 2022

With the potential overturn of Roe v. Wade on the horizon, we should be especially aware of who owns the data about our reproductive health. Cycle and ovulation apps, like Flo, Spot, and Cycles, have been gaining popularity in recent years. These range from simple menstrual cycle calendars to full-blown ML-empowered pregnancy "planners" (the ML support usually comes with a premium subscription). The kinds of data they collect range from name, age, and email to body temperature, pregnancy history, and even your partner's contact info. Most health and body-related data is entered manually by the user or through a consented linkage to other apps and devices such as Apple HealthKit and Google Fit. Although there is not much research on the quality of their predictions, these apps seem helpful overall, even if only by making people more aware of their ovulation cycles.

The common claim in these apps' privacy policies is that the information you share with them will not be shared externally. This, however, comes with caveats: they do share de-identified personal information with third parties and are also required to hand data over to law enforcement upon receiving a legal order. Some specifically state that they would only share your personal information (i.e., name, age group, etc.) and not your health information if required by law. Take this with a grain of salt, though: one of the more popular period tracking companies, Flo, shared its users' health data for marketing purposes from 2016 to 2019 without informing its customers. And that was just for marketing; it is unclear whether such companies can refuse to share a particular user's health information, such as period cycles, pregnancies, and general analytics, under a court order.

This becomes an even bigger concern in light of the current political situation in the U.S.: the potential Roe v. Wade overturn. If we lose federal protection of abortion rights, every state will be able to impose its own rules concerning reproductive health. Some states will most likely prohibit abortion from very early in the pregnancy, whereas currently the government can fully prohibit it only in the last trimester. People living in states where abortion rights are limited or absent would be left with three options: giving birth, obtaining an abortion secretly (i.e., illegally under their state's law), or traveling to another state. There is a whole Pandora's box of classism, racism, and other issues bound up in this narrow set of options that I won't be able to discuss, since this post has a word limit. I will only mention that the set becomes even more limited if you have fewer resources or face health concerns that prevent you from acting on one or more of these "opportunities."

However, let's circle back to that app you might be keeping as your period calendar or pocket-size analyst of all things ovulation. We, as users, are in a zone of limbo: without sharing enough information, we can't get good predictions, but by oversharing, we risk entrusting our private information to a service that might not be as protective of it as it implied. The ball is still in your court, and you can always request the removal of your data. But if you live in a region that treats abortion as a crime, beware of who may have a little too much data about your reproductive health journey.




Experiments That Take Generations to Overcome

Experiments That Take Generations to Overcome
By Anonymous | June 16, 2022

"Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations and the race of his ancestors." (Watson, 1924)

The field of psychology has advanced enormously in the last century, not just in terms of scientific knowledge but also in ethics and human rights. A testament to that is one of its most ethically dubious experiments, the Little Albert experiment, which we'll explore in this blog in relation to the Beneficence principle of the Belmont Report, and in how it continues to impact us today in ways we may not realize. (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979)

As some background: in the 1920s, John B. Watson, a Johns Hopkins professor, was interested in reproducing Ivan Pavlov's findings on classical conditioning in babies. Classical conditioning is when "two stimuli are linked together to produce a new learned response in a person or animal" (McLeod, 2018). Pavlov was famous for getting his dogs to salivate at the sound of a bell by giving them food every time he sounded it, so that at first they salivated at the sight of food but eventually learned to salivate at just the sound of the bell. Similarly, the Little Albert experiment was performed on a 9-month-old known as Albert B. At the start of the experiment, Little Albert was presented with a rat, a dog, a rabbit, and a Santa Claus mask, and he was not afraid of any of them; but every time he touched one of them, the scientists struck a metal bar behind him, and eventually he was conditioned to be terrified of those animals and the Santa Claus mask (Crow, 2015; McLeod, 2018).

The principle of Beneficence in the Belmont Report requires that we maximize benefits and minimize harms to both individuals and society (National Commission, 1979). The most glaring failure of the experiment under this principle is that Watson never even bothered to reverse the conditioning he had instilled in the baby.

Seeing that the experiment did succeed in making Little Albert terrified of rats and anything furry, it's reasonable to believe that reversing this result was not only possible but relatively easy. Even an unsuccessful attempt at reversal would leave those of us analyzing the experiment today with a somewhat different opinion of it. While it's possible for a conditioned response to wear off, a phenomenon known as extinction, it can still return (albeit in weaker form) after a period of time, a phenomenon known as spontaneous recovery (McLeod, 2018).

While the individual was harmed, what about society as a whole? Watson ran the experiment to show that classical conditioning could not only be applied to humans but explain everything about us, going so far as to deny the existence of mind and consciousness. Whether those latter claims are true or not, the experiment contributed to the field of human psychology in important ways, from understanding addictions to classroom learning and behavior therapy (McLeod, 2018). Today our understanding is by no means complete, but we take many of these insights for granted. Unfortunately, it goes the other way too.
Watson's Little Albert experiment is undoubtedly connected to his child-rearing philosophy. After all, he believed he could raise infants to become anything, from doctors to thieves. He essentially believed children could be trained like animals, and he "admonished parents not to hug, coddle or kiss their infants and young children in order to train them to develop good habits early on" (Parker, Nicholson, 2015). While modern culture has pushed back against many of our traditional views on parenting, even classifying some of them as child abuse, Watson's views leave behind a legacy in our dominant narratives. Many still believe in "tough love" methods, such as talking down to children or speaking to them harshly, corporal punishment, shaming, and humiliation, especially if they grew up with those methods and believe they not only turned out fine but became better people as a result. Others, such as John B. Watson's own granddaughter Mariette Hartley and the families she wrote about in her book Breaking the Silence, have experienced depression and suicide as the legacy of Watson's teachings. Even those who turned out fine may "still suffer in ways we don't realize are connected to our early childhood years" (Parker, Nicholson, 2015).
While both hard scientific knowledge and human ethics have advanced unprecedentedly in the past century, that does not mean we're fully emancipated from the repercussions of the ethically dubious experiments and methods of the past. Harm done to individuals or groups in an experiment can not only last a lifetime for those subjects but carry on for generations and shape an entire culture. To truly advance both knowledge and ethics, it's imperative that we know and remember this dark history, especially given how the Little Albert experiment has influenced and continues to influence our parenting methods, because "now that we know better, we must try to do better for our children" (Parker, Nicholson, 2015).

Crow, J. (2015, January 29). The Little Albert Experiment: The Perverse 1920 Study That Made a Baby Afraid of Santa Claus & Bunnies. Open Culture.
McLeod, S. A. (2018, August 21). Classical conditioning. Simply Psychology.
Parker, L., & Nicholson, B. (2015, November 20). This Children's Day: It's time to break Watson's legacy in childrearing norms. APtly Said.
National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979, April 18). The Belmont Report. Retrieved May 17, 2022.