Recently, Perspective, an application created by Google to score “toxicity” in online comments, has come under fire for exhibiting gender and racial bias. Here we will attempt to view this perceived bias through a machine learning lens.
There are two types of bias exhibited in machine learning, and it is useful to distinguish them. On one hand we have algorithmic bias, which occurs when the mathematical algorithm itself is too simple to account for all of the variance in the observed data. On the other hand we have training bias, which occurs when the data used to train the model is too limited to capture all of the variance in the world. The solution would seem simple: increase the complexity of the algorithm and the variety of the training data. However, in the real world this is often difficult and sometimes impossible.
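The distinction matters because training bias can produce unfair outputs even when the algorithm itself is blameless. A minimal sketch, using an invented toy dataset (the sentences, labels, and the per-word scoring rule are all assumptions for illustration, not Perspective's actual data or method): a word that happens to co-occur only with toxic labels in a limited sample will taint perfectly benign sentences.

```python
from collections import Counter

# Illustrative (invented) training sample: the word "editor" happens to
# appear only in toxic comments, purely because the sample is so limited.
train = [
    ("you are a terrible editor", 1),
    ("this editor is an idiot", 1),
    ("thanks for the fix", 0),
    ("nice work on the article", 0),
]

# Learn a per-word toxicity rate from the sample.
tox, tot = Counter(), Counter()
for text, label in train:
    for w in set(text.split()):
        tox[w] += label
        tot[w] += 1

def score(text: str) -> float:
    """Average toxicity rate of the in-vocabulary words of `text`."""
    words = [w for w in text.split() if w in tot]
    if not words:
        return 0.0
    return sum(tox[w] / tot[w] for w in words) / len(words)

# A benign sentence scores well above zero: training bias, not a flaw
# in the (trivially simple) algorithm.
print(score("the editor fixed the article"))
```

Here the model does exactly what it was asked to do; the skew comes entirely from what the training sample failed to contain.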
One of the decisions that goes into creating a mathematical model is known as the “bias-variance tradeoff”: a supervised machine learning model is selected so that it isn’t so specific that it only works in a limited number of cases, but also isn’t so general that it ignores all the details. This tradeoff is straightforward to quantify and is well understood for established algorithms. With Perspective, Google uses a type of supervised machine learning called a deep neural network, an algorithm designed to solve complex problems. Interestingly, deep neural networks sit almost exclusively at the high-variance end of the bias-variance spectrum; that is to say, Perspective almost certainly has very low algorithmic bias. While it is possible that the model has some unquantified algorithmic bias (for example, it may not be able to distinguish intentional deception), the instances of text used in the referenced articles are not an example of this.
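The tradeoff can be made concrete with a standard toy experiment (the data and degrees here are assumptions chosen for illustration): fitting polynomials of increasing degree to noisy data. A low-degree fit underfits (high algorithmic bias); a high-degree fit drives training error toward zero (high variance), mirroring where deep networks sit on the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy sine curve, split into train and test halves.
x = np.sort(rng.uniform(0, 3, 40))
y = np.sin(2 * x) + rng.normal(0, 0.3, x.size)
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

def mse(degree: int):
    """Fit a polynomial of the given degree on the training half;
    return (train MSE, test MSE)."""
    coef = np.polyfit(x_tr, y_tr, degree)
    err_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    err_te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    return err_tr, err_te

# Higher degree = lower algorithmic bias = lower training error,
# but not necessarily lower test error.
for degree in (1, 3, 15):
    tr, te = mse(degree)
    print(f"degree {degree:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Training error falls monotonically with model complexity, which is exactly why a high-capacity model like Perspective's leaves training bias, not algorithmic bias, as the dominant concern.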
The conclusion, then, is that training bias accounts for almost all of the bias in this application. Training bias is much less well understood than its algorithmic counterpart. The data used to train Perspective comes from discussions between Wikipedia editors about the content of page edits, a simple and widely available data set. However, the latent sources of bias in this training dataset are difficult to spot, ranging from local copyright law to the demographic composition of the software industry. Algorithms can be corrected so that these biases are not amplified, but such adjustments require a priori knowledge to identify the affected classes, and they still result in at least as much bias as the training data itself carries. The solution to this problem will involve awareness, working with both the variables in the data set and the outcome to be predicted.
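One common form such a correction takes is reweighting: once the affected classes are identified a priori, each training example is weighted inversely to its group's frequency so that no group dominates. A minimal sketch, with an invented sample (the texts and group labels are assumptions, and real corrections are considerably more involved):

```python
from collections import Counter

# Hypothetical annotated examples: (text, group). The group attribute must
# be known a priori -- without it, no correction of this kind is possible.
samples = [
    ("comment 1", "group_a"),
    ("comment 2", "group_a"),
    ("comment 3", "group_a"),
    ("comment 4", "group_b"),
]

counts = Counter(g for _, g in samples)
n, k = len(samples), len(counts)

# Inverse-frequency weights: every group contributes equal total weight,
# so the minority group is no longer drowned out during training.
weights = [n / (k * counts[g]) for _, g in samples]
```

Note what this does and does not fix: the groups are balanced against each other, but any bias present within each group's labels passes through untouched, which is why the corrected model can do no better than the training data itself.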
Developing tests for bias among the predictor variables in the training set can, at a minimum, allow the consumer of the model to be informed of its limits. With Perspective, Google simply put forward a test environment that allows an individual to enter any English utterance and get a toxicity score. But the training data consisted of full sentences that were part of a larger thread of conversation, a mismatch that is a source of bias in itself. If Perspective forced the user to submit an entire conversation and then select a specific response for a toxicity rating, the results might be more interpretable.
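One simple family of such tests is a template-substitution audit: swap different identity terms into an otherwise fixed sentence and flag any term whose score diverges sharply from the rest. The sketch below uses a deliberately naive stand-in scorer (the `score` function, the template, and the term list are all assumptions for illustration, not Perspective's model); the audit harness around it is the point.

```python
# Stand-in scorer, purely illustrative: fraction of words on a tiny blocklist.
def score(text: str) -> float:
    bad = {"stupid", "idiot"}
    words = text.lower().split()
    return sum(w.strip(".,!?") in bad for w in words) / max(len(words), 1)

# Audit: hold the sentence fixed, vary only the identity term.
TEMPLATE = "I am a {} person."
terms = ["tall", "gay", "straight", "blind"]
scores = {t: score(TEMPLATE.format(t)) for t in terms}

# Flag any term whose score deviates sharply from the group mean.
mean = sum(scores.values()) / len(scores)
flagged = [t for t, s in scores.items() if abs(s - mean) > 0.2]
print(flagged)
```

An unbiased scorer should leave `flagged` empty; a model that has absorbed identity-term associations from its training data will not, and publishing such audit results alongside the model would tell consumers exactly where its limits lie.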
Adjusting the outcome variables to incorporate additional parameters of “fairness” is one avenue being explored. Another is to throw out the predicted outcomes entirely and allow the algorithm to infer the underlying structure of the data. Asking Perspective to partition the Wikipedia data into a number of unlabeled categories may yield an implicit toxic/non-toxic split. This type of machine learning, known as unsupervised learning, is much less understood, but many experts believe it is the path forward towards a more generalizable intelligence.
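As a sketch of what that might look like, the snippet below clusters a handful of invented comments into two unlabeled groups using TF-IDF features and k-means (the comments are assumptions for illustration; Perspective's actual data and any real clustering pipeline would be far larger and messier). No toxicity labels are supplied, yet the split that emerges tracks the toxic/non-toxic divide.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented comments, deliberately tiny: two polite, two hostile.
comments = [
    "thank you for the helpful edit",
    "a very helpful edit, thank you",
    "you are a stupid idiot",
    "stop reverting, you stupid idiot",
]

# Represent each comment as a TF-IDF vector, then partition with no labels.
X = TfidfVectorizer().fit_transform(comments)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

Which cluster gets which integer label is arbitrary; the interesting question is whether the induced partition lines up with a human notion of toxicity, and on real data that alignment is far from guaranteed.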
Overall, bias presents one of the most difficult obstacles to the wider adoption of machine learning. An awareness and understanding of the sources of these biases is the first step towards correcting them.