Embedding Ethics in the Code We Write
By Allison Fox | February 18, 2022

In the last few years, several researchers and activists have pulled back the curtain on algorithmic bias, sharing glaring examples of how artificial intelligence (AI) models can discriminate based on age, sex, race, and other identities. A 2013 study conducted by Latanya Sweeney revealed that if you have a name that is more often given to black babies than white babies, an ad suggestive of an arrest is 80% more likely to be displayed when someone performs a Google search for your name (Sweeney 2013).

Joy Buolamwini, Coded Bias

Similar discrimination is documented in the Netflix documentary Coded Bias, in which MIT researcher Joy Buolamwini discovers that facial recognition technologies fail to accurately classify women's faces and to detect darker-skinned faces (Coded Bias 2020). Another case of algorithmic bias surfaced recently when news reports revealed that an algorithmic tool the Justice Department uses to assess the risk of prisoners returning to crime generated inconsistent results based on race (Johnson 2022). As the use of AI decision-making continues to grow, and more decisions are made by algorithms instead of humans, these algorithmic biases will only be amplified. Data science practitioners can take steps to mitigate these biases and their impacts by embedding ethics in the code they write, both figuratively and literally.

To better integrate conversations about ethics into the actual process of doing data science, the company DrivenData developed Deon, a command line tool that generates an ethics checklist, giving developers reminders about ethics throughout the entire lifecycle of their project (DrivenData 2020).

Deon Checklist: Command Line Tool
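
Getting started is lightweight. A minimal sketch of the workflow, following the Deon documentation: install the package, then generate the default checklist as a markdown file that can live in the project repository.

```bash
# Install the Deon CLI
pip install deon

# Write the default ethics checklist to ETHICS.md in the current directory
deon -o ETHICS.md
```

The generated file can then be committed to version control, so the checklist travels with the code and items can be checked off as the team discusses them.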

The checklist is organized into five sections that mirror the stages of a data science project: data collection, data storage, analysis, modeling, and deployment. Each section includes several questions that aim to provoke discussion and ensure that important steps are not overlooked. DrivenData also put together a table of real-world ethical issues with AI that might have been avoided had the corresponding checklist question been discussed during the project.

For example, during analysis it is important to examine the dataset for possible sources of bias and then take steps to address them. Skipping this step can have unintended consequences: garbage in often results in garbage out, meaning that a model trained on biased data is likely to produce outputs that reflect that bias. One study found that female jobseekers are more likely than male jobseekers to be shown Google ads for lower-paying jobs (Gibbs 2015). This discriminatory behavior could be a result of biased training data, and it potentially could have been avoided had that data been examined and corrected. By using Deon to embed ethics in the code they write, data scientists are reminded of these ethical risks while coding and can address biased data before a model is released into the wild, avoiding or mitigating unintended harm.
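
What does examining a dataset for bias look like in code? Below is a minimal sketch in pandas, using a hypothetical DataFrame with columns `group` (a protected attribute) and `outcome` (the label a model would learn to predict); the column names and threshold are illustrative assumptions, not part of Deon.

```python
# A minimal bias check during analysis: compare base rates of the
# positive outcome across groups and flag large gaps for review.
import pandas as pd

def outcome_rates_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Base rate of the positive outcome within each group."""
    return df.groupby(group_col)[outcome_col].mean()

def has_rate_disparity(rates: pd.Series, threshold: float = 0.1) -> bool:
    """True if base rates differ across groups by more than `threshold`."""
    return (rates.max() - rates.min()) > threshold

# Hypothetical toy data
df = pd.DataFrame({
    "group":   ["a", "a", "a", "a", "b", "b", "b", "b"],
    "outcome": [1, 1, 1, 0, 1, 0, 0, 0],
})

rates = outcome_rates_by_group(df, "group", "outcome")
print(rates)
if has_rate_disparity(rates):
    print("Warning: outcome base rates differ notably across groups; investigate before modeling.")
```

A gap in base rates is not proof of harmful bias on its own, but it is exactly the kind of signal the analysis-stage checklist questions are designed to surface before modeling begins.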

Ethics are also relevant during the modeling stage of a data science project, where it is important to test model results for fairness across groups. The Deon checklist includes an item for this step, and several open-source, code-based toolkits like AI Fairness 360 and Fairlearn have been developed recently to help data scientists assess and improve fairness in AI models. If this step is ignored, models may treat people differently based on certain identities, as when Apple's credit card first launched and offered women smaller lines of credit than men (Knight 2019).
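
To make the fairness test concrete, here is a minimal sketch using Fairlearn's MetricFrame, which computes metrics separately for each group defined by a sensitive feature; the labels, predictions, and group assignments below are hypothetical toy data.

```python
# Disaggregate model performance by a sensitive feature with Fairlearn.
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# Hypothetical toy data: true labels, model predictions, and group membership
y_true    = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred    = [1, 0, 1, 0, 0, 1, 1, 1]
sensitive = ["f", "f", "f", "f", "m", "m", "m", "m"]

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive,
)

print(mf.by_group)      # accuracy and selection rate for each group
print(mf.difference())  # largest between-group gap for each metric
```

A large between-group gap in selection rate, for example, is precisely the kind of disparity that could flag a credit model approving one gender at a much higher rate than the other before it ever reaches customers.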

As the use of AI to make decisions that were previously made by humans becomes even more widespread, classification decisions will be made faster and at a larger scale, reaching more people than ever before. While this will have its benefits, in that new technologies such as the ones discussed in this post can improve quality of life and access to opportunity, it will also have its consequences, and minority populations who already face discrimination have been shown to be the most susceptible to them. Open-source tools that embed ethical considerations in the data science process, like Deon, AI Fairness 360, and Fairlearn, can help combat these consequences by encouraging data scientists to place ethics at the forefront during each stage of a data science project.

References:

1. Coded Bias. (2020). About the Film. Coded Bias. https://www.codedbias.com/

2. DrivenData. (2020). About – Deon. Deon. https://deon.drivendata.org/ 

3. Gibbs, S. (2015, July 8). Women less likely to be shown ads for high-paid jobs on Google, study shows. The Guardian. https://www.theguardian.com/technology/2015/jul/08/women-less-likely-ads-high-paid-jobs-google-study 

4. Johnson, C. (2022, January 26). Flaws plague a tool meant to help low-risk federal prisoners win early release. NPR. https://www.npr.org/2022/01/26/1075509175/justice-department-algorithm-first-step-act

5. Knight, W. (2019, November 19). The Apple Card Didn’t “See” Gender—and That’s the Problem. Wired. https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/ 

6. Sweeney, L. (2013, January 28). Discrimination in Online Ad Delivery. SSRN. http://dx.doi.org/10.2139/ssrn.2208240