April 2020 – Data Science W231 | Behind the Data: Humans and Values

April 15, 2020

A Problem to A Dress: Algorithmic Transparency and Appeal

By Adam Johns | April 13, 2020

Once upon a time, a million years ago (Christmas 2019), people cared about things like buying fashionable gifts for their friends and family, rather than access to bleach. At the time, I was attempting to purchase a dress for my partner from an online store I’d shopped with in the past. Several days before the cutoff for Christmas delivery, I received an unceremonious computer-generated email saying that my order had been cancelled. No sweat, I thought, and repeated the purchase. Cancelled again. As the deadline for the holidays approached, I called the merchant, who informed me that my order had been flagged by an algorithm as a security risk, that my purchase had been cancelled, that there was in fact nobody I could appeal to, and that there was no possibility of determining what factors had contributed to this verdict. I hung up the phone, licked my wounds, and moved on to other merchants for my last-minute shopping.

Upon later reflection, chastened by a nearly missed holiday gift deadline, I mused on what could possibly have resulted in the rejection. Looking back over my past purchases, I realized that in a year or two of shopping with this particular retailer, I hadn’t actually bought any women’s clothes. Perhaps it was the sudden change from menswear to dresses that led the algorithm to flag me (a not very progressive criterion for an otherwise progressive-seeming retailer). Whatever the reason, this frivolous example got me thinking about some very serious aspects of algorithmic decision making. What made this particular example so grating? Firstly, the decision was not transparent: I wasn’t informed that an algorithm had flagged my purchase until I had made a number of calls to customer service. Secondly, I had no recourse to appeal: even after I called up, credit card info and personal identification in hand, nobody at the company was willing or able to overturn the decision. While such an algorithmic “hard no” was easy to shake off for a gift purchase, imagining such an approach applied to a credit decision, an insurance purchase, or a college application was disconcerting.

In 2020, algorithmic adjudication is becoming an increasingly frequent part of life. Machine learning may be broadly accurate in the aggregate, but individual decisions can always suffer from false positives and false negatives. When such a decision is applied to customer service or security, bad decisions can alienate customers and lead previously loyal customers to take their business elsewhere. When algorithms affect more consequential social matters like a person’s access to health care, housing, or education, a poor prediction carries higher stakes. Instead of just resulting in disappointed customers writing snarky blog posts, such decision making can amplify inequity, reinforce detrimental trends in society, and lead to self-reinforcing feedback loops of diminished individual and societal potential.

The importance of machine learning in commercial and government decision making isn’t likely to decline any time soon. But to apply algorithms for maximum benefit, organizations should ensure that algorithmic decision making embeds transparency and a right to appeal. Let somebody know when they’ve been flagged, and what factored into the decision. Give them the right to speak to a person and correct the record if the decision is wrong (Crawford and Schultz’s concept of algorithmic due process offers a solid base for any organization trying to apply algorithms fairly). As a bonus, letting subjects of algorithmic decision making appeal offers a tantalizing opportunity to the data scientist: more training data to improve the algorithm. While it requires more investment, and a person on the other end of a phone, transparency and a right to appeal can result in a rare win-win for algorithm designers and the people to whom those algorithms are being applied, and ultimately lead us toward a more perfect future of algorithmic coexistence.

Reference:
Kate Crawford & Jason Schultz, Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms, 55 B.C.L. Rev. 93 (2014), https://lawdigitalcommons.bc.edu/bclr/vol55/iss1/4

April 15, 2020

Transgender Lives and COVID-19

By Ollie Downs | April 10, 2020

Transgender Day of Visibility (TDOV) is March 31st every year; it is a day to celebrate the trans experience and “to bring attention to the accomplishments of trans people around the globe while fighting cissexism and transphobia by spreading knowledge of the trans community”. I spent this year’s TDOV voluntarily sheltering in place in my home in Berkeley, California, with two other non-binary housemates of mine. During this shelter-in-place, I am reminded of the struggles faced uniquely by trans and non-binary folks in light of COVID.

Being Counted
Being counted is essential to dealing with issues like COVID-19, but there are challenges associated with counting trans people. Who is getting the disease, where and when, and how they are dealing with it are all crucial questions to answer. Unique groups like the non-binary community may very well be at higher risk of contracting COVID-19, and we need to know that. The ethical implications of collecting this data are tricky. Being visible as trans/non-binary is crucial for some people, and dangerous for others. On one hand, being able to quantify how many people, and what kinds of people, identify that way, and where, allows us to understand the demographics of those people, and thus their potential challenges and experiences. Especially in public health and government settings, knowing where things are happening, and to whom, is crucial in designing solutions like enforcing quarantines and distributing resources. On the other hand, forcing people to identify themselves as one thing or the other is challenging for many, and divides the world into discrete parts when actual identities are fluid and exist on spectrums. The truth may be lost when a person is forced to choose between imprecise options.

Social Isolation and Abuse
Shelter-in-place orders are effective tools for containing the spread of a disease. But they’re also very effective at containing people who may not get along. As many of us have experienced firsthand, being isolated with others can create tension and conflict, which can be deadly for people with identities or characteristics outside ‘the norm.’ Transgender people, especially youth, may be trapped with abusive parents, partners, or other people who may seek to harm them, especially in situations where other identities intersect with their gender. Many transgender individuals find community in social spaces like community centers or bars, and without access to them, these communities (like many other marginalized communities) will suffer.

Other intersecting identities
The intersection of gender with other identities is complex and precarious. Examples of discrimination against people with marginalized identities are everywhere. One comes from a post by Nadya Stevens, who reveals the danger faced by “poor people, Black people and Brown people” who are “essential workers” and must commute on crowded, reduced-service public transportation. Transgender and non-binary people, who face poverty and racism at alarmingly high levels, are directly affected by policy changes like those of the MTA. There is some light at the end of this particular tunnel. Actor Indya Moore began a campaign to take direct action to support transgender people of color (donate on Cashapp to $IndyaAMoore), and Moore’s campaign raised so much money in its first week that their account was frozen. This cannot be an isolated campaign: policy efforts must be made to continue this action.

Education at Home
Policy shifts towards moving education online during this time have been extremely difficult, especially for anyone who is in an unsafe home environment, lacks access to the Internet, is otherwise unable to consume online material, or learns better in a classroom setting. Transgender and non-binary people, again, experience poverty and violence at high rates, which may be worsened by these policy measures. They also often face medical discrimination, and may be affected by failures to make online learning accessible to deaf, blind, or otherwise ‘non-normative’ students.

Medical issues
It makes sense that hospitals and medical care providers are halting ‘non-essential’ services like surgeries to focus on the care of COVID-19 patients. But the classification of some surgeries as ‘non-essential’ can be devastating, especially for trans and non-binary patients. Gender-affirming procedures are often categorized this way, but for many patients, they are crucial for their health and safety in a transphobic world. Additionally, patients with AIDS–many of whom are transgender–are at a higher risk of death from COVID-19.

The Unknowns
What we don’t know could be the worst part of this epidemic. We don’t know if, or how, COVID-19 interacts with hormone treatments or HIV medication. We don’t know how it will impact the future of education or policy, or how social isolation and intersecting identities might change these outcomes.

What’s next?
Taking action is very difficult in a pandemic. This situation affects everyone differently, but it affects transgender people as a community especially. What can be done? Until we can return to normal life, there are several actionable ideas: donate to funds you know will go towards transgender lives (Cashapp: $IndyaAMoore and many others); check in with friends, family, colleagues, and acquaintances who you know are transgender and offer your support; educate yourself and others about the struggles of the trans community; and volunteer for organizations committed to transgender health. Finally, have hope. The transgender community has been more than resilient before. We will continue to be resilient now.

If you or anyone you know is trans/non-binary/gender non-conforming and facing suicidality, please call Trans Lifeline at 877-565-8860.

Photo credits:
https://www.state.gov/coronavirus/
https://commons.wikimedia.org/wiki/File:Nonbinary_Gender_Symbol.svg
https://en.m.wikipedia.org/wiki/File:A_TransGender-Symbol_black-and-white.svg

April 15, 2020

The Ethics of Not Sharing

By George Tao | April 10, 2020

In this course, we’ve thoroughly covered the potential dangers of data in many different forms. Most of our conclusions have led us to believe that sharing our data is dangerous, and while this is true, we still must remember that data is and will be an instrumental part in societal development. To switch things up, I’d like to present the data ethics behind not sharing your data and steps that can be taken to improve trust between the consumer and the corporation.

The Facebook-Cambridge Analytica data scandal and the Ashley Madison data leaks are among many news stories about data misuse that have been etched into our minds. We often remember the bad more vividly than the good, so as consumers, we seek to hide our data whenever possible to protect ourselves. However, we also must remember the tremendous benefits that data can provide for us.

One company has created a sensor that pregnant women can wear to predict when they are going into labor. This device can provide great benefits in reducing maternal and infant mortality, but it can also be very invasive in the type of data it collects. However, childbirth is an area where this invasive type of data collection can improve upon current research. Existing research on labor is severely outdated. The study that modern medicine bases its practices on was done in the 1950s on a population of 500 women who were exclusively white. By allowing this company to collect data on women’s pregnancy and labor patterns, we are able to replace these outdated practices.


This may seem like an extremely naive perspective on sharing data, and it is. As a society, we have not progressed to the point where consumers can trust corporations with their data. One suggestion from the Wired article cited below is that data collectors should provide their consumers with a list of worst-case scenarios that could happen with their data, similar to how a doctor lists the side effects that can come with a medicine. This information not only provides consumers with necessary knowledge, but also helps corporations make decisions that will avoid these outcomes.

I believe that one issue that hinders trust between consumer and corporation is the privacy policy. Privacy policies and terms of agreement are filled with technical jargon that makes them too lengthy and too confusing for consumers to read. This is a problem because I believe that privacy policies should be the bridge that builds trust between the consumer and the corporation. My proposed solution is to create two separate versions of the same privacy policy, identical in substance: one designed for legal purposes and one designed for understandability. By doing this, we give consumers knowledge of what the policy is saying without losing any of the legal protections that the policy provides.

There are many different ways to approach the problem of trust, but ultimately, the goal is to create trust between the consumer and the corporation. When we have achieved this trust, we can use the data built by this trust to improve upon current practices that may be outdated.

Works Cited
https://www.wired.com/story/ethics-hiding-your-data-from-machines/

April 15, 2020

Ethical CRISP-DM: The Short Version

By Collin Cunningham | April 11, 2020

If you could impart one lesson to a fledgling data scientist, what would it be? I asked myself this question last year when data science author Bill Franks called for contributors to his upcoming book, 97 Things About Ethics Every Data Scientist Should Know.

The data scientists I have managed and mentored most often struggle with transitioning from academic datasets to real world business problems. In machine learning classes, we are given clearly defined problems with manicured datasets. This could not be further from the reality of a data science job: requirements are vague, data is messy and often doesn’t exist, and causality hides behind spurious correlations.

This is why I teach junior data scientists the Cross Industry Standard Process for Data Mining (CRISP-DM). Even though it was developed for data mining long ago, it is perfectly applicable to modern data science. The steps of CRISP-DM are:

Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

These steps are not necessarily sequential as shown in the diagram; you often find yourself back at Business Understanding after an unsuccessful deployment. However, this framework gives much-needed structure, which smooths the awkward transition from academia to industry.

And yet, this would not be the singular lesson I would impart. That lesson would be ethics. Without instilling ethics in data science education, we are arming millions of young professionals with tools of immense power but no notion of responsibility. Thus, I sought to combine the simplicity and applicability of CRISP-DM with ethical guardrails in developing Ethical CRISP-DM. Each step in CRISP-DM is augmented with a question on which to reflect during that stage.

Business understanding – What are potential externalities of this solution? We ask data scientists to lean on those with domain experience when refining requirements into problem statements. Similarly, these subject matter experts are the people who have the most insight into those who may be affected by a model.

Data understanding – Does my data reflect unethical bias? As imperfect creatures, it is naive to view anyone as void of bias. It follows that data generated by humans inevitably holds the shadow of these biases. We must reflect on what biases could exist in our data and perform specific analysis to identify these biases.

Data preparation – How do I cleanse data of bias? The data cleansing we are all familiar with has a parallel cleansing phase in which we seek to mitigate the biases identified in the previous step. Some of these biases are easier to address than others; filtering explicitly racist words from a language model is easier than removing relationships between sex and career choice. Furthermore, we must acknowledge that it is impossible to completely scrape bias from data, but attempting to do so is a worthwhile endeavor.

Modeling – Is my model prone to outside influence? With the growing ubiquity of online learning, models often adapt to their environment without human oversight. To maintain the ethical standard we have cultivated so far, guardrails must be put in place to prevent nefarious evolutions of a model. When Microsoft released Tay onto Twitter, users were able to pervert her language model, resulting in a racist, antisemitic, sexist, Trump-supporting cyborg.

Evaluation and Deployment – How can I quantify an unethical consequence? The foundation of artificial intelligence is feedback. It is critical we create metrics to monitor high-risk ethical consequences. For example, predictive policing applications should monitor the distribution of crimes across neighborhoods to avoid over-policing.
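One way such a monitoring metric could look is sketched below. This is only an illustration under stated assumptions, not a method from the original post: the function names, the neighborhood names, the patrol and incident counts, and the 1.5 threshold are all hypothetical. The idea is to compare each neighborhood's share of patrols with its share of reported incidents and flag neighborhoods that receive disproportionate attention.

```python
def exposure_ratios(patrols, incidents):
    """Return, per neighborhood, (share of patrols) / (share of reported incidents).

    Both arguments are dicts keyed by neighborhood name. This sketch assumes
    every neighborhood has at least one reported incident.
    """
    total_patrols = sum(patrols.values())
    total_incidents = sum(incidents.values())
    ratios = {}
    for hood in patrols:
        patrol_share = patrols[hood] / total_patrols
        incident_share = incidents[hood] / total_incidents
        ratios[hood] = patrol_share / incident_share
    return ratios


def flag_over_policed(ratios, threshold=1.5):
    """List neighborhoods whose ratio exceeds the (hypothetical) threshold."""
    return sorted(hood for hood, ratio in ratios.items() if ratio > threshold)


# Made-up counts, for illustration only.
patrols = {"north": 60, "south": 20, "east": 20}
incidents = {"north": 30, "south": 35, "east": 35}

ratios = exposure_ratios(patrols, incidents)
print(flag_over_policed(ratios))
```

A ratio near 1 means patrols roughly track reported incidents. Note that reported incidents are themselves shaped by where police are sent, so a metric like this is a starting point for reflection rather than proof of fairness.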

Ultimately, we are responsible for the entire products we deliver including their consequences. Ethical CRISP-DM holds us to a strict regime of reflection throughout the development lifecycle, thereby assuring the models we deliver are built ethically.

April 13, 2020

The Robot of Wall Street

By Vinicio De Sola | April 12, 2020

Since the start of the pandemic, the market has become more like a rollercoaster than the “Little Engine That Could” it had been for the previous 8 to 10 years. Some weeks seem to wipe out all the hard-earned gains in our 401(k)s, pension plans, and investments, while in others it looks like the worst is over, investors are regaining confidence, and the market will rebound. For the high rollers, the people with high capital, this translates into long calls to their financial analysts, asking them what to do, when to sell, when to buy, and how to weather the storm. But for the majority of Americans, these analysts have undergone a transformation: enter the Robots.

Robo-advisors differ from human financial analysts in several ways. First, they automate investment management decisions by using computer algorithms. Without boring the reader with financial jargon, this means that they use portfolio optimization techniques with constraints based on the risk tolerance of the user. In practice, they ask the investor a set of predetermined questions to place that tolerance on a given scale (the scale varies among robo-advisors). Second, because there is no real research focused on the individual client, trading can be done in bulk by clustering similar users. As a result, the fees are far smaller than those of human advisors (0.25% compared to around 2 to 3%), and minimum balances are far lower, with some robo-advisors allowing the user to open an account with $0. In contrast, human advisors typically require a minimum of around $250,000 before taking on a client.
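To get a feel for what that fee gap means over time, here is a rough sketch. The 0.25% and 2% fees and the $250,000 starting balance come from the figures above; the 5% annual return, the 20-year horizon, and the way the fee is charged (once a year, as a fraction of the balance) are assumptions made purely for illustration.

```python
def final_balance(initial, annual_return, annual_fee, years):
    """Grow a balance year by year, then deduct the fee as a fraction of assets.

    Simplified model: one return and one fee charge per year, with no
    deposits, withdrawals, or taxes.
    """
    balance = initial
    for _ in range(years):
        balance *= (1 + annual_return) * (1 - annual_fee)
    return balance


initial = 250_000      # minimum balance cited for human advisors
assumed_return = 0.05  # hypothetical annual return, identical for both
years = 20             # hypothetical horizon

robo = final_balance(initial, assumed_return, 0.0025, years)
human = final_balance(initial, assumed_return, 0.02, years)

print(f"Robo-advisor (0.25% fee): ${robo:,.0f}")
print(f"Human advisor (2% fee):   ${human:,.0f}")
print(f"Difference:               ${robo - human:,.0f}")
```

The sketch gives both advisors the same return, which is exactly the assumption a human advisor would dispute, so treat the difference as the amount an advisor's extra attention would need to earn back.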

So, did the Robot kill the financial star? Not so fast. Robo-advisors are far from perfect. Some recent research in the financial space has shed light on their shortcomings. First, the process is far from standard: each company or brokerage firm has its own method of evaluating investor risk tolerance, but on average, this just means administering a questionnaire of around 35 questions, which can’t capture the whole reality of an investor. Some authors even found that some of the questions weren’t related to assessing risk tolerance at all, but to selling products from the same brokerage firm or fund. Also, despite their name, these advisors don’t use Big Data, Artificial Intelligence, or social media to paint a clearer picture of the user, instead focusing on broad clusters of risk to keep fees small and trades free.

Another essential question is whether the robo-advisor is a fiduciary of the investor (that is, whether the advisor has the responsibility to act in the best interest of the investor over its own). A considerable chunk of robo-advisors are not fiduciaries, and instead only need to follow the suitability obligation (they are only bound to provide suitable recommendations to their clients). As for the small group of robo-advisors that consider themselves fiduciaries, the view within the industry is that they can’t be: they only offer advice to their clients based on their goals, not on their full financial situation, so any advice will be flawed from the start.

So, what should you, the reader, do? Many robo-advisor services saw a substantial inflow of new users during the pandemic from people who want to buy the dip (buying stocks after a significant drop in price). Still, long-term investors saw their portfolios suffer significant losses, given the massive tail risk that many advised portfolios carry in times of crisis. This is not financial advice, but I would recommend that you, the reader, research the following when selecting a robo-advisor:

1. Does the company have a fiduciary duty with its clients?
2. Does the company have any form of human interaction or services? Is it a call center? Or just email-based? How quickly can they respond to a crisis?
3. Please read their privacy policies: many advisors also have other products that relate to marketing, so all your financial information will be on display for them.
4. Finally, decide whether the robo-advisor portfolio matches your risk profile. If not, move to another one: you won’t pay much if you hop between advisors during set-up, but once you settle on one, always monitor your performance.

Sources

– What Is a Fiduciary?, https://smartasset.com/financial-advisor/what-is-fiduciary-financial-advisor
– To Advise, or Not to Advise — How Robo-Advisors Evaluate the Risk Preferences of Private Investors, https://www.cfainstitute.org/research/cfa-digest/2019/01/dig-v49-n1-1
– Robo-Advisor vs. Personal Financial Advisor: How to Decide, https://www.nerdwallet.com/blog/investing/personal-financial-advisor-robo-advisor/
– Robo-Advisor Fee Comparison, https://www.valuepenguin.com/comparing-fees-robo-advisors

April 13, 2020

Collecting citizens’ data to slow down coronavirus spread without harming privacy

By Tomas Lobo | April 5, 2020

In recent weeks, we have seen how coronavirus has been able to paralyze 4 billion humans, locking them in their houses. The human losses and economic consequences are far from over. Some countries have been better than others at containing the spread, and there’s a lot that other countries can learn from them in order to contain their own outbreaks more effectively. In all cases of success (e.g., China (arguably), Singapore, Taiwan, Hong Kong, and South Korea), the common denominator has been to a) test fast and b) increase surveillance.

Increased surveillance during a global health crisis can be very positive for containing the virus. For example, if somebody who has the virus decides to go grocery shopping, the consequences can be catastrophic. More surveillance would allow the Government to know the exact whereabouts of people who could infect others, and to fine them if they disobey mobility rules. For a system like that to work and be endorsed by citizens, it would be necessary to a) have multiple sources of real-time data collection and b) ensure that people’s privacy is respected.

In order to understand who is more likely to have the virus, who already has the virus, and the whereabouts of both groups, the Government would need: 1) to use the available camera systems with added facial recognition capabilities, 2) to connect test data to a national database, 3) to monitor people’s location via their smartphones, and 4) to monitor people’s temperature with IoT thermometers (like Kinsa). A surveillance system like this would not be cheap, but considering that the coronavirus is expected to cost the economy trillions of dollars (4% of global GDP this year, and who knows how much more in future years until a vaccine is invented), this cost looks small in comparison.

To implement a system like this, people need to endorse and embrace it, as most of the world’s political systems are democratic by nature. For this to happen, Governments should demonstrate to citizens that a system like the one described above would only be used to contain the spread of the virus and to save hundreds of thousands of lives. The flow of information should be carefully controlled, so there are no leaks that could potentially damage individuals’ lives. Also, this data should be kept strictly for Governmental use.

In the coronavirus era, it’s imperative to innovate and embrace the use of data collection technologies to impose efficient, sustainable lockdowns that target the individuals most at risk of spreading the disease. For that to be implemented effectively, securing citizens’ privacy is a key component that will help win the program’s endorsement. The clock is ticking, and it’s time to learn from what some Asian governments have been able to put together relatively quickly.

Sources:
https://www.aljazeera.com/news/2020/03/china-ai-big-data-combat-coronavirus-outbreak-200301063901951.html