Bias in Large Language Models: GPT-2 as a Case Study – Data Science W231

Bias in Large Language Models: GPT-2 as a Case Study
By Kevin Ngo | February 19, 2021

Imagine having a multi-paragraph story in a few minutes. Imagine having a full article by providing only the title. Imagine having a whole essay by providing only the first sentence. Well, this is possible by harnessing large language models. Large language models are trained using an abundant amount of public text to predict the next word.

GPT-2

I used a demo of a well-known language model called GPT-2 released in February 2019 to demonstrate large language models’ ability to generate text. I typed “While large language models have greatly improved in recent years, there is still much work to be done concerning its inherent bias and prejudice”, and allowed GPT-2 to generate the rest of the text. Here is what GPT-2 came up with: “The troubling thing about this bias and prejudice is that it is systemic, not caused by chance. These biases can influence a classifier’s behavior, and they are especially likely to impact people of color.” While the results were not perfect, it can be hard to differentiate the generated text from the non-generated text. GPT-2 correctly states that the bias and prejudice inside the model are “systemic” and “likely to impact people of color.” While it may mimic intelligence, language models do not understand the text.

Image: Result of GPT-3 for a Turing test

Controversial Release of GPT-2

The creator of GPT-2 OpenAI was hesitant to release GPT-2 at first fearing “malicious applications” of GPT-2. They decided to release smaller models of GPT-2 for other researchers to experiment with and mitigate potential harm caused by their work. After seeing “no strong evidence of misuse”, OpenAI released the full model noting that GPT-2 could be abused to help generate “synthetic propaganda.” It could also be used to release a high-volume of coherent spam online. Although OpenAI’s effects to mitigate public harm is commendable, some experts condemned OpenAI’s decision. They argued that OpenAI’s prevented other people from replicating their breakthrough, preventing the advancement of natural language processing. Others claimed that OpenAI exaggerated the dangers of GPT-2.

Issues with Large Language Models

The reality is GPT-2 has much more potential dangers than OpenAI assumed. A joint study was done by Google, Apple, Stanford University, OpenAI, the University of California, Berkeley, and Northeastern University revealed GPT-2 could leak details from the data the model was trained on, which could contain sensitive information. The results showed that over a third of candidate sequences were directly from the training data – some containing personally identifiable information. This raises major privacy concerns regarding large language models. The beta version of the GPT-3 model was released by OpenAI in June 2020. GPT-3 is larger and provides better results than GPT-2. A Senior Data Scientist at Sigmoid mentioned that in one of his experiments only 50% of fake news generated by GPT-3 could be distinguished from the real ones showing how powerful GPT-3 can be.

Despite the impressive results, GPT-3 still has inherent bias and prejudice making it prone to generate “hateful sexist and racist language” according to Kate Devlin. Jerome Pesenti demonstrates this by making GPT-3 generate text from one word. The words given was “Jew”, “Black”, “Women”, “Holocaust”.

A paper by Abubakar Abid details the inherent bias against Muslims specifically. He found a strong association between the word “Muslim” and GPT-3’s generating text regarding violent acts. Adding adjectives directly opposite to violence did not help reduce the amount of generated text about violence, but adding adjectives that redirected the focus did reduce the amount of generated text about violence. Abubakar demonstrates GPT-3 generating text about violence when prompted “Two Muslims walked into a mosque to worship peacefully” showing GPT-3’s bias of Muslims.

References

Vincent, J. (2019, November 07). OpenAI has published THE text-generating AI it said was too dangerous to share. Retrieved February 14, 2021, from https://www.theverge.com/platform/amp/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters
Heaven, W. (2020, December 10). OpenAI’s new language generator GPT-3 is Shockingly good-and completely mindless. Retrieved February 14, 2021, from https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/
Radford, A. (2020, September 03). Better language models and their implications. Retrieved February 14, 2021, from https://openai.com/blog/better-language-models/
Carlini, N. (2020, December 15). Privacy considerations in large language models. Retrieved February 14, 2021, from https://ai.googleblog.com/2020/12/privacy-considerations-in-large.html
OpenAI. (2020, September 22). OpenAI licenses Gpt-3 technology to Microsoft. Retrieved February 14, 2021, from https://openai.com/blog/openai-licenses-gpt-3-technology-to-microsoft/
Ammu, B. (2020, December 18). Gpt-3: All you need to know about the ai language model. Retrieved February 14, 2021, from https://www.sigmoid.com/blogs/gpt-3-all-you-need-to-know-about-the-ai-language-model/
Abid, A., Farooqi, M., & Zou, J. (2021, January 18). Persistent anti-muslim bias in large language models. Retrieved February 14, 2021, from https://arxiv.org/abs/2101.05783