Telematics in automobile insurance: Are you giving too much for little in return?
By Aditya Mengani | July 9, 2021

Telematics-based programs like “Pay as you drive” have been dramatically transforming the automobile insurance industry over the past few years. Many traditional insurance providers like Allstate, Progressive, and Geico, newer startups like Metromile and Root, and even car manufacturers like Tesla have introduced programs, or are planning new ones, centered around a “Pay as you drive” model in which a user consents to sharing their driving data through telematics devices.

The data collected is used to assess the customer’s risk and to provide discounts or other offers personalized to their driving habits. The only caveat is that the customer must agree to a continuous stream of data points from the vehicle, collected through telematics embedded in the car via built-in sensors, plug-in devices, GPS, or mobile phones. This can open up a can of worms with regard to the privacy and ethics issues surrounding the collected data. Many consumers have been skeptical about enrolling in these programs, but the pandemic has been changing that perception: more consumers are enrolling in the hope of reducing their insurance costs because they are driving less. This trend is expected to continue as more and more consumers opt in to these services.

There is a lack of transparency about what these telematics devices actually collect, as many insurance providers give only a vague definition of the metrics involved and of what constitutes driving behaviour. In traditional insurance, multiple factors affect the risk, and in turn the premium paid by the customer: location, age, gender, marital status, years of driving experience, driving and claims history, vehicle information, and so on. With “Pay as you drive,” insurance providers claim that they additionally track real-time metrics related to driving habits, including speed, acceleration, braking, miles driven, and time of day, and what gets collected varies by insurance provider. For example, in 2015 Allstate obtained a patent for a system that uses sensors and cameras to detect potential sources of driver distraction within a vehicle and that could also evaluate heart rate, blood pressure, and electrocardiogram signals recorded from steering wheel sensors. Concerns like these have made consumers skeptical about enrolling in these programs.

Another challenging aspect is algorithmic transparency. With the widespread use of telematics data, insurance providers and regulators need to define a clear set of collected factors for the actuarial justification of rating premiums and underwriting policies. Most of the algorithms are proprietary, and insurance providers do not disclose how they use these factors to derive a score. From a fairness and transparency perspective, consumers have very little choice or information available to them before opting in to these programs.
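To make that opacity concrete, here is a purely hypothetical Python sketch of what a usage-based scoring function might look like. The factors, weights, and thresholds below are invented for illustration; no insurer publishes its actual model, which is exactly the transparency problem described above.

```python
# Purely hypothetical sketch of a usage-based "driving score".
# Factors, weights, and thresholds are invented for illustration only.

def driving_score(avg_speed_mph, hard_brakes_per_100mi, night_miles_pct, miles_per_month):
    """Return a 0-100 score (higher = lower assumed risk)."""
    score = 100.0
    score -= max(0.0, avg_speed_mph - 70) * 1.5       # sustained speeding
    score -= hard_brakes_per_100mi * 2.0               # harsh braking events
    score -= night_miles_pct * 0.3                     # share of late-night driving
    score -= max(0.0, miles_per_month - 1000) * 0.01   # high-mileage exposure
    return max(0.0, min(100.0, score))

def premium(base_premium, score):
    """Map the score to a discount or surcharge (again, invented numbers)."""
    if score >= 80:
        return base_premium * 0.85   # 15% discount
    if score >= 60:
        return base_premium          # no change
    return base_premium * 1.10       # 10% surcharge

# A driver who cannot see this function has no way to know why their rate moved.
print(premium(150.0, driving_score(72, 4, 20, 1200)))
```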

Widespread use of artificial-intelligence-based telematics to predict risk can expose consumers to predictive privacy harms, and without proper regulation insurers can collect a variety of data that has no causal relationship with the factors used to predict risk scores. Such regulations are currently not enforced in many countries. This can also lead to discriminatory practices and create unintended biases during the data collection and exploration processes.

Who gets to build these telematics services is another concern. Most of them are created by third-party vendors like Verisk, non-insurance firms that provide software and analytics solutions for these programs. These vendors escape the radar of regulators, who scrutinize the insurance providers but have not established similar protocols for non-insurance vendors.

With so much data being collected and so little transparency and regulation, there is a risk that the data will be used for advertising or sold to third parties for monetization. Even though regulations like the CCPA require companies to disclose what they do with the data, these regulations may change over time, and consumers already enrolled in a program might not be aware of changes to its policy terms.

Finally, privacy itself poses major dangers to the consumer. Many companies retain the collected data indefinitely, or do not state clearly in their privacy policies what they intend to do with the data after a period of time or what their retention policy is. Without proper measures to protect the collected data, this could lead to privacy breaches.

With all of the above risks laid out, for a consumer it comes down to a risk-versus-return decision. Whether a consumer will give away a lot of information for little in return depends on the individual’s perception of the practices involved and their needs with respect to insurance prices. Regulators and the law are still evolving to embrace this new arena and have a lot of catching up to do. For now, we can hope that a comprehensive set of laws and regulations will help this new area of insurance thrive, and make a conscious choice about our own needs.

References:

https://www.forbes.com/advisor/car-insurance/usage-based-insurance/

https://www.insurancethoughtleadership.com/are-you-ready-for-telematics/

https://content.naic.org/cipr_topics/topic_telematicsusagebased_insurance.htm

https://www.chicagotribune.com/business/ct-allstate-car-patent-0827-biz-20150826-story.html

https://dataethics.eu/insurance-companies-should-balance-personalisation-and-solidarity/

https://consumerfed.org/reports/watch-where-youre-going/

https://www.accenture.com/nl-en/blogs/insights/telemethics-how-to-materialize-ethics-in-pay-how-you-drive-insurance

https://www.internetjustsociety.org/challenges-of-gdpr-telematics-insurance

The Death of the Third-party Cookie
By Anonymous | July 9, 2021

It’s a Sunday afternoon. You’re on your couch looking at a pair of sneakers on Nike.com but decide not to buy them because you already have a pair that’s only slightly worn out. Then, you navigate to Facebook to catch up on your friends’ lives. You scroll through various posts, some more interesting than others, and lo and behold, you see an ad for the exact same pair of sneakers you just saw.

If you’ve ever wondered why this happens, it’s because of something called third-party cookies. These cookies allow information about your activity on one website to be used to target advertisements to you on another website.

What are third-party cookies?
A cookie is a small text file that your browser stores on your computer to keep track of your online activity. For example, when you’re shopping online, cookies store information about the items in your shopping cart. A third-party cookie is a cookie that was created by a website other than the one you are visiting. In this example, you’re visiting facebook.com but the cookie was created by nike.com.

Cookies Store Various Forms of User Data
How Third-party Cookies are Generated
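To make the mechanism concrete, here is a toy Python simulation of a browser’s cookie jar. It is an illustration of the concept only, not how real browsers or these sites actually work, and the ad-network domain is a made-up placeholder for whichever third party both pages happen to embed.

```python
# Toy simulation of first- vs third-party cookies. Domains are illustrative
# placeholders; real browsers and ad networks are far more complex.

from collections import defaultdict

# The "cookie jar": cookies are keyed by the domain that set them.
cookie_jar = defaultdict(dict)

def visit(page_domain, embedded_third_parties):
    """Simulate loading a page that also loads resources from other domains."""
    # First-party cookie: set by the site you are actually visiting.
    cookie_jar[page_domain]["session"] = f"session-on-{page_domain}"

    # Third-party cookies: set by domains whose content is embedded in the page
    # (ad pixels, like buttons, analytics scripts, ...).
    for tracker in embedded_third_parties:
        cookie = cookie_jar[tracker].setdefault("tracking_id", "user-123")
        # Because the cookie is keyed to the tracker's domain, the tracker
        # sees the SAME id no matter which site embeds it.
        print(f"{tracker} sees cookie {cookie!r} while you browse {page_domain}")

# You look at sneakers; the product page embeds an ad network's pixel.
visit("nike.com", embedded_third_parties=["ads.example-network.com"])
# Later you scroll Facebook, which embeds the same ad network.
visit("facebook.com", embedded_third_parties=["ads.example-network.com"])
# The ad network has now linked your activity on both sites through one cookie.
```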

What are third-party cookies used for?
They are primarily used for online advertising, so that ad companies can target advertisements towards specific users based on what they know about them.

Do all browsers support third-party cookies?
Two major browsers, Safari and Firefox, both block third-party cookies by default. Google Chrome, the most popular browser, still allows third-party cookies, but Google announced in Jan 2020 that Chrome would stop supporting them in the next few years.

Why is “the death of the third-party cookie” significant?
A large part of why browsers are no longer supporting third-party cookies is a change in public opinion. With incidents like Facebook’s Cambridge Analytica scandal, where a third-party company misused Facebook’s user data to manipulate voters, consumers have become increasingly aware of and concerned about data privacy. Because targeted ads are so prevalent, they are one of the biggest pain points. According to a poll conducted by the Pew Research Center, 72% of Americans worry that everything they do is being tracked by various parties, which is not too far from the truth.

The “death of the third-party cookie” means that advertisers will no longer be able to track users across different domains, such that the cookies created on a particular website can only affect the user’s experience on that site. This is called a first-party cookie. This means that it will be more difficult for an advertiser to develop a user profile based on your actions, given that they cannot consolidate your actions between various sites.

With third-party cookies going away, advertisers will be increasingly reliant on first-party data, data collected directly from the user (e.g. name, email, and viewed content) for targeting advertising. Hence, users will have to be more attentive to the data points that they willingly provide online and how they can be used.

Does this mean I should be less worried about ad tracking?
Yes and no. Although the phasing out of third-party cookies helps reduce privacy harms committed by Adtech firms, it also results in more power for companies like Facebook that not only have an immense amount of user data but also a large stake in the ad industry. New approaches to targeted advertising are already in the works as a replacement for third-party cookies, and it is yet to be seen how well these will guard user privacy.

References
* https://qz.com/2000490/the-death-of-third-party-cookies-will-reshape-digital-advertising/
* https://blog.hubspot.com/marketing/third-party-cookie-phase-out?toc-variant-a=
* https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/the-demise-of-third-party-cookies-and-identifiers
* https://www.epsilon.com/us/insights/trends/third-party-cookies

Are we ready to trust machines and algorithms to decide, for all?
By Naga Chandrasekaran | July 9, 2021

Science Fiction to Reality:

I wake up to soothing alarm music and mutter, “Alexa, turn off!” I pick up an espresso from the automated coffee machine and begin my Zoom workday. The Ring doorbell alerts me that my prepaid lunch has been delivered by Uber Eats. After lunch, I interview candidates recommended to me by an algorithm. After work, I ride in an autonomous Tesla to a date with someone I met on Tinder. Back home, I share my day on social media and watch a Netflix-recommended movie. Around midnight, I ask Alexa to switch off the lights and set an alarm for the morning. Machines are becoming our life partners!

Digital Transformation Driving Data Generation Imbalance:

Through the seamless integration of technology into every aspect of life, we share personally identifiable information (PII) and beyond, generating over 2.5 exabytes of data per day [1]. Advances in semiconductors, algorithmic power, and the availability of big data have led to significant progress in data science, artificial intelligence (AI), and machine learning (ML). This progress is helping solve cosmetic, tactical, and strategic issues affecting individuals and societies everywhere. But is it making the world a better place, for all?

Digitally Connected World Driving Information Flow [2]

Digital transformation is influencing only part of the world’s population. In January 2021, 59.5% of the global population were active internet users [3]. The number drops further for usage of digital devices at the edge. These users contribute to data generation. The categories and classifications created by data scientists are therefore only a representation of the wealthy individuals and developed nations that created the data, so such classifications are incomplete.

The interconnected world also generates data from unwitting participants who are thrust into a system through surveillance and interrogation [4, 6]. Even for willing participants, privacy challenges emerge when their data is used outside its original context [5]. Data providers bear a high degree of risk of harm relative to benefit, for example from data leaks [6]. Privacy policies established by the organizations that collect such data are focused on defensive measures instead of ethics. These issues drive users to avoid participation or provide limited information, which leads to inaccuracies in the dataset.

As Sandra Harding pointed out, “all scientific knowledge is always socially situated” [7]. Knowledge generated from data has built-in biases and exclusions. In addition, timely access to this data is limited to a few. The imbalance generated by power positions, social settings, data inaccuracy, and incomplete datasets creates bias in our accumulated knowledge.

AI Cannot be Biased!

We apply this imbalanced knowledge to further our wisdom, which is to discern and make decisions. This is the field of AI/ML, also termed predictive analytics. When our data and knowledge are inaccurate and biased, our algorithms and decisions reconfirm our bias (as in Amazon’s recruiting tool). When decisions have limited impact (e.g., movie recommendations), we have the opportunity to explore algorithmic decision making. However, when decisions have deep societal impact (e.g., criminal sentencing), should we turn our decision making over to AI? [8, 9]

Big data advocates claim that with sufficient data we can reach the same conclusions as scientific inquiry; however, data is just an input, with inherent issues. There are other external factors that shape reality. We have to interrogate how the data was generated: Who is included and excluded? Does the variance account for diversity? Whose interests are represented? Without such exploration of the input data, the outputs do not represent the true world. To become wiser, we have to recognize that our knowledge is incomplete and our algorithms are biased.

Collaborative Human – Machine Model:

In the scene enacted at the beginning of this article, it appears that humans are making decisions while enjoying technological benefits. However, it is possible that our decisions are influenced by hidden technology biases. As depicted in the Disney-Pixar movie WALL-E, are we creating a world where humans will forget their purpose?

Scene from Wall-E showing Humans Living a Digitally Controlled Life

With these identified issues in the digitally transforming world and its associated datasets, how can we progress? Technology is always a double-edged sword: it can force change in the world despite existing social systems, and the converse is true as well. The interplay between technology and the people who interact with it is critical to making sure the social fabric is protected and moving in the right direction. We cannot delegate all our decisions to algorithms and machines given the data issues identified above. We need to continue to complement our data and algorithms with human judgment [10]. Data scientists have a role to play beyond data analysis. How power is delegated and distributed between humans and machines is extremely important in making the digitally transformed world a better place for all.

Collaborative Human-Machine Model [10]

References:

[1] Jacquelyn Bulao, 2021, How much data is created everyday in 2021, Link

[2] https://www.securehalo.com/improve-third-party-risk-management-program-four-steps/

[3] Global digital population, Statista analysis, 2021, Link

[4] Daniel Solove, 2006, A Taxonomy of Privacy, Link

[5] Helen Nissenbaum, 2004, Privacy as Contextual Integrity, Link

[6] The Belmont Report, 1979, Link

[7] Sandra Harding, 1986, The Science Question in Feminism, Link 

[8] Ariel Conn, 2017, When Should Machines Make Decisions, Link

[9] Janssen et al., 2019, History and Future of Human Automation Interaction, Link

[10] Eric Colson, 2019, What AI Driven Decision Making Looks Like, Link

Do we actually agree to these terms and conditions?
By Anonymous | July 9, 2021

Pic 1 Fictional Representation of Terms of Service Agreement Buttons

Every time I go to a new website or online service, a terms of service agreement and privacy policy pops up. This pop-up covers three quarters of the page and has two buttons at the bottom right: I Agree or Decline. In this scenario, do you think I read every line of this long document carefully and take time to consider what I am agreeing to, or do you think I quickly drag the scrollbar to the bottom without reading a single word and press I Agree without thinking much about it? Like most of the online population, I always do the latter. In fact, a 2017 study by Deloitte found that 91% of consumers accept terms and conditions without reading them (Cakebread, 2017). ProPrivacy.com, a digital privacy group, claims that the number is even higher, with only 1% of subjects in a social experiment actually reading the terms and conditions (Sandle, 2020). The other 99% of respondents unknowingly agreed to absurd things buried in the terms and conditions, like permission to give their mother full access to their browsing history, the naming rights to their first-born child, and the ability to “invite” a personal FBI agent to Christmas dinner for the next 10 years (Sandle, 2020). Since they clicked the I Agree button, does that mean ProPrivacy.com could really name their first-born child, and could the users dispute it? This question of whether an agreed-upon terms of service can be disputed boils down to the question: “Does clicking the I Agree button signify informed consent?”

I argue that even though they did indeed press the button, it isn’t informed consent through the lens of the first Belmont principle, respect for persons. The Belmont Report was published in 1979 in response to ethical failures in medical research. It outlines three ethical principles for the protection of human subjects in research: 1) respect for persons; 2) beneficence; and 3) justice. Respect for persons is about participants giving informed consent to be in the research. Informed consent is broken down further to say that participants should be presented with relevant information in a comprehensible format and should then voluntarily agree to participate. Do terms of service include relevant information in a comprehensible format? This is debatable: the terms of service do include all the relevant information, but they are almost always too long to read in a reasonable amount of time.

Pic 2: Word length of the terms and privacy policies of top sites and apps

Terms and conditions are also not in a comprehensible format, as they are really difficult to read and often employ legalese, the terminology and phrasing used by those in the legal field. A study published in the Boston College Law Review found that the terms of service of the top 500 websites in the U.S. had the average reading level of articles in academic journals, which do not use the terminology of the general public (Benoliel & Becher, 2019). So even if people try to carefully read these long terms of service, they may not understand what they are agreeing to. As for voluntarily agreeing to terms of service, while it isn’t forced acceptance, accepting the terms is required to use the website. Saying no to the terms of service isn’t a no-penalty outcome; rather, it shuts you out of the service you wanted.

Pic 3: Infographic of the obscure wording of terms of service

So, how can we turn agreements to terms of service into actual informed consent? Some ideas: provide summaries alongside the terms of service so they are more comprehensible, bold and highlight the important parts so people notice and read them, and add a mandatory wait time before the Agree button can be clicked so that people must spend at least some time reading the terms of service.

References

Benoliel, U., & Becher, S. I. (2019). The Duty to Read the Unreadable. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3313837

Cakebread, C. (2017, November 15). You’re not alone, no one reads terms of service agreements. Business Insider. https://www.businessinsider.com/deloitte-study-91-percent-agree-terms-of-service-without-reading-2017-11.

Sandle, T. (2020, January 29). Report finds only 1 percent reads ‘Terms & Conditions’. Digital Journal. https://www.digitaljournal.com/business/report-finds-only-1-percent-reads-terms-conditions/article/566127.

Facing the Issues of Facial Recognition and Surveillance
By Anonymous | July 9, 2021

Facial recognition technology is an already well-developed and widespread technology that is rapidly expanding to cover all areas of public and even private life. Its expansion into a near-ubiquitous presence threatens not only individuals’ most fundamental notions of privacy but also the freedoms of assembly and protest. Facial recognition technology serves the interests of the existing power structure. Its negative implications are not limited to the ways it can be used to infringe on an individual’s privacy. Rather, its reach and potential harm are much broader, infringing on society and the power of groups.

One ostensible justification for facial recognition technology is its use by police departments to protect against crime. However, as the ACLU has pointed out, the narratives that support this use are deceptive. For instance, in 2016 the Detroit Police Department partnered with eight gas stations to install real-time camera connections with police headquarters as part of “Project Green Light Detroit,” a ground-breaking crime-fighting partnership between local businesses, the City of Detroit, and community groups. This collaboration was presented to the community as a positive step along the lines of a neighborhood watch system.

A store displaying its partnership with Project Green Light

However, facial recognition technology is a form of general surveillance that allows monitoring of community members even without a warrant or determination of probable cause for the need to monitor them. This is one reason the ACLU is concerned about the expanded use of facial recognition technology in our society: it could easily be used for general surveillance searches because there are ample databases of individual photographs from each state’s motor vehicle license identifying photographs. (https://www.aclu.org/issues/privacy-technology/surveillance-technologies/face-recognition-technology.)

Individual privacy concerns are impacted by the use of facial recognition technology because it can be used in a passive way that does not require the knowledge, consent, or participation of the subjects. Additionally, there are security concerns related to the storage and use of facial recognition data. Facial data is very sensitive and can be quite accurate in identifying an individual. It is not clear that firms and government agencies have adequately managed the security of this data.

https://www.crainsdetroit.com/article/20180104/news/649206/detroit-aims-to-mandate-project-green-light-crime-monitoring

However, even more concerning is the broader societal impact that results from the widespread use of facial recognition data. Because of the extremely broad scope of facial recognition’s surveillance and power, there are more than just individual rights that need to be protected: it is the nature of society as a whole that is at risk of changing. The philosopher Michel Foucault considered the societal impact of surveillance systems, and he used the example of a panopticon to illustrate his theory. The panopticon explanation is an apt metaphor for the far-reaching societal impact of facial recognition systems, as well.

In 1791, British utilitarian philosopher Jeremy Bentham articulated the concept of a panopticon as a type of institutional building designed for social control. (Jeremy Bentham (1791). Panopticon, or The Inspection House.) The building’s design allows a single guard to observe and monitor all of the prisoners of an institution without the inmates being able to tell whether they are being watched at any particular moment.

 

Competition: A Solution to Poor Data Privacy Practices in Big Tech?
By Anonymous | July 9, 2021

Competition

President Biden recently gave a press conference during which he spoke of a newly signed executive order on anticompetitive practices. In his introductory remarks, he highlighted the effects of a lack of competition on hearing aids. He explained, “Right now, if you need a hearing aid, you can’t just walk into a pharmacy and pick one up over the counter. You have to get it from a doctor or a specialist. Not only does that make getting hearing aids inconvenient, it makes them considerably more expensive, and it makes it harder for new companies to compete, innovate, and sell hearing aids at lower prices. As a result… a pair of hearing aids can cost thousands of dollars.” (Biden, 2021) This example, however, is not unique. It illustrates the fundamental relationship between consumer interests and companies’ products and practices.

Commonly, a lack of competition allows companies to charge higher prices than consumers would reasonably pay for an item under ideal competition. If there is one gas station in a town charging $4.50 per gallon of gas, people will pay $4.50 per gallon. If 9 more gas stations open up, and the stations’ costs are equivalent to $2.50 per gallon, each gas station will lower its price per gallon to gain customers while still earning a healthy margin, resulting in a gas price that might hover around $2.70 per gallon. This results in fair pricing for residents of the town. Most people have thought about and understand this simple economic reality, but often do not think about a less tangible but equally real application of the same effect: over-consolidation and anti-competitive practices among tech companies have led to the prevalence of poor privacy practices.

“Rather than competing for consumers, they are consuming their competitors.” – President Joseph Biden (Biden, 2021 )

All other variables held equal, higher competition between companies produces a larger variety of products, services, and practices. As President Biden proclaimed, “The heart of American capitalism is a simple idea: open and fair competition — that means that if your companies want to win your business, they have to go out and they have to up their game; better prices and services; new ideas and products.” (Biden, 2021) If this is the case, as the President implies in his speech, the logical inverse is also true: lower competition between companies leads to a smaller variety of products, services, and practices. Privacy and data protection practices are one of the many casualties of low competition. Note that while not all lack of competition is due to anti-competitive practices, a lack of meaningful competition exists among tech companies nonetheless, and even examples that are not attributable to anti-competitive practices are useful for seeing the effect of a lack of competition on privacy. If we consider the case where only one company provides an important or essential service, privacy practices become nearly irrelevant: if a user wants to use that service, the user must accept the privacy policy no matter its contents. Due to a lack of user-privacy-focused legislation, current privacy policy writing and presentation practices lead a large majority of the population to almost never read a privacy policy (Auxier et al., 2019).

Despite this lack of readership, which might be fixed through education about privacy policies and reforms to their complexity and presentation, an increasing number of people care significantly about data privacy practices, as can be seen from the increase in articles focused on privacy.

One example of this lack of competition leading to poor privacy policies is Snapchat, owned by Snap, Inc. Almost everyone in the younger generations uses Snapchat, and it is not interoperable with other platforms. It can even be considered a social necessity in many groups. A user is therefore pressured into using the platform despite the severe privacy violations allowed by its terms of service and privacy policy, including Snap, Inc.’s right to save and use any user-produced content – including self-destructing messages (Snap, Inc., 2019). Imagine a hypothetical society in which the network effect is a nonissue and privacy policies are easily accessible to everyone. There are three companies that offer services similar to Snapchat. Company A takes users’ privately sent pictures and uses them, stating as much in its privacy policy. Company B generally does not take or use users’ privately sent pictures, but states in its privacy policy that it has the right to do so if it chooses. Company C does not take or use users’ privately sent pictures, and specifically states in its privacy policy that it does not have the right to do so. Company B here represents how Snapchat actually operates. Which company would you choose? Through competition, which company do you think would come out on top?

Snapchat logo

While the lack of competition in the Snapchat example is due primarily to the network effect and not, as far as documented, to anticompetitive practices by Snap, Inc., promoting competition in tech more generally can change prevailing privacy and data security practices, leading to a systemic shift toward fairer and more protective privacy and data practices.

References:

Regarding the University of California’s Newest Security Event
By Ash Tan | July 9, 2021


Figure 1. The University of California describes its security event in an email to a valued member of the UC community.

If you’re reading this, there’s a good chance that your personal data has been leaked. Important data too – your address, financial information, even your social security number could very well be floating around the Internet at this very moment. Of course, this prediction is predicated on the assumption that you, the reader, have some connection to the University of California system. Maybe you’re a student, or a staff or faculty member, or even a retired employee; it doesn’t really matter. This spring, the UC system announced that its data, along with the data of roughly a hundred other institutions and schools, had been compromised in a euphemistically described “security event.” In short, the UC system hired Accellion, an external firm, to handle its file transfers, and Accellion was the victim of a massive cybersecurity attack in December of 2020 (Fn. 1). This resulted in the information of essentially every person involved in the UC system being leaked to the internet, and while the UC system has provided access to credit monitoring and identity theft protection for the space of one year (Fn. 2), it should be noted that Experian, its chosen credit monitoring company, was responsible for a massive data breach of roughly 15 million people’s financial information in 2015 (Fn. 3).

Figure 2. Affected individual receiving recompense for damages sustained from the 2015 Experian data breach.

Perhaps the framework that applies most intuitively to this system is Solove’s Taxonomy of Privacy (Fn. 4), which compels us to seek a comparable physical analog in order to better understand this situation. One might consider the relation between paperwork and a filing cabinet: we give our paperwork to the UC, which then stores it in a filing cabinet which is maintained by Accellion. We entrust our data to the UC system with the expectation that they safeguard our information, while the UC system entrusts our data to Accellion with the same expectation. When something goes wrong, this results in a chain of broken expectations that can make parsing accountability a difficult issue. Who, then, is to blame when the file cabinet is broken into: the owner of the cabinet, or the one who manages the paperwork within?

Figure 3. Cyber crime.

One take is that the laws that enabled a data breach of this scale are to blame. In an op-ed in The Hill, two contributors with backgrounds in education research, public policy, and political science point to a section of California Proposition 24 that exempts schools from privacy protection requirements (Fn. 5). These exemptions, including denying students the right to be forgotten, open up a greater possibility of data mismanagement and misuse. The authors, Evers and Hofer, claim that stronger regulatory protections could have prevented this data breach along with numerous other ransomware attacks on educational institutions across the country, and that, in line with Nissenbaum’s framework of contextual privacy (Fn. 6), “opt in/out” options could have limited the amount of information leaked in this event while respecting individual agency. The “right to be forgotten” could have limited the amount of leaked information about graduated students, retired employees, and everyone else who has exited the UC system. In addition, California currently does not define a right for individuals to privately pursue recompense for damages resulting from negligent data management; the authors hold that defining such a right would incentivize entities such as the UC system to secure data more responsibly.

Notably, Evers and Hofer make no mention of Nissenbaum or Solove or even the esteemed Belmont Report (Fn. 7) when prescribing their recommendations. These proposed policy changes are not necessarily grounded in a theoretical, ethical framework of abstract rights and conceptual wrongs; they are intended to minimize real-life harms of situations that have already happened and could happen again. In the context of this very close-to-home example, we can see how these frameworks are more than academic in nature. But then again, the difference between framework and policy is the same as that between a skeleton and a living, breathing creature. The question that remains to be answered is whether the UC system will take this as an opportunity to improve student data protections – between the GDPR and California’s more recent privacy laws, there is plenty of groundwork to draw upon – or whether they will consider it nothing more than an unfortunate security event, the product of chance rather than the result of an increasingly dangerous digital world (Fn. 5).

References:
1. https://www.businesswire.com/news/home/20210510005214/en/UC-Notice-of-Data-Breach
2. https://ucnet.universityofcalifornia.edu/data-security/updates-faq/index.html
3. https://www.theguardian.com/business/2015/oct/01/experian-hack-t-mobile-credit-checks-personal-information
4. https://www.law.upenn.edu/journals/lawreview/articles/volume154/issue3/Solove154U.Pa.L.Rev.477(2006).pdf
5. https://thehill.com/opinion/technology/550959-massive-school-data-breach-shows-we-need-better-privacy-policies?rl=1
6. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2567042
7. https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

Image sources:
1. Edited from a personal email.
2. https://topclassactions.com/lawsuit-settlements/lawsuit-news/hp-printer-experian-data-breach-settlement-checks-mailed/
3. https://www.linkedin.com/pulse/experian-data-breach-andrew-seldon?trk=public_profile_article_view

Brain-Machine Interfaces and Neuralink: privacy and ethical concerns
By Anonymous | July 9, 2021

Brain-Machine Interfaces

As the development of microchips advances and neuroscience progresses, the possibility of seamless brain-machine interfaces, where a device decodes inputs from the user’s brain to perform functions, becomes more of a reality. Various forms of these technologies already exist; however, recent technological advances have made implantable and portable devices possible. Imagine a future where humans don’t need to talk to each other, but can instead transmit their thoughts directly to another person. This is the eventual goal of Elon Musk, the founder of Neuralink. Currently, Neuralink is one of the main companies involved in advancing this type of technology. An analysis of Neuralink’s technology and its overall mission statement provides interesting insight into the future of this type of human-computer interface and the potential privacy and ethical concerns that come with it.

Diagram of brain-computer interface

Brain-machine interfaces have actually been in existence for over 50 years. Research on these interfaces began in the 1970s at UCLA. However, with recent developments in wireless technologies, implantable devices, computational power, and electrode design, a world where a device can be implanted to read motor signals from the brain is now possible. In fact, Neuralink has already achieved this in a macaque monkey: the company enabled the monkey to control a game of Pong with its mind. Neuralink’s current goal is to advance the prosthetics space by allowing prosthetic devices to read input directly from the user’s motor cortex. However, the applications of this technology are vast, and Musk has floated other ideas, such as downloading languages into the brain, essentially allowing the device to write onto the brain. For now, this remains out of the realm of possibility, as our current understanding of the brain is not advanced enough. Yet we are moving in this direction every year. A paper was just published in Nature demonstrating high-performance decoding of motor cortex signals into handwriting using a recurrent neural network.

Picture of the monkey controlling a game of Pong, from Neuralink

Privacy

As this technology develops further, several privacy and ethical concerns come into question. To begin, using Solove’s Taxonomy as a privacy framework reveals many areas of potential harm. In the realm of information collection, there is much risk. Brain-computer interfaces, depending on where they are implanted, could have access to people’s most private thoughts and emotions, and this information would need to be transmitted to another device for processing. The collection of this information by companies such as advertisers would represent a major breach of privacy. Additionally, there is risk to the user from information processing. These devices must work concurrently with other devices, often wirelessly. Given the widespread importance of cloud computing in much of today’s technology, offloading information from these devices to the cloud would be likely. Having the data stored in a database puts the user at risk of secondary use if proper privacy policies are not implemented. The trove of information contained in data collected from the brain is vast. These datasets could be combined with existing databases, such as browsing history on Google, to provide third parties with unimaginable context on individuals. There is also risk in information dissemination, more specifically exposure. The information collected and processed by these devices would need to be stored digitally, and keeping such private information, even if anonymized, carries huge potential for harm, as the contents may themselves be re-identifiable to a specific individual. Lastly, there is risk of invasions such as decisional interference. Brain-machine interfaces would not only be able to read information from the brain but also write information to it. This would allow the device to make potential emotional changes in its users, which would be a major example of decisional interference. Similar capabilities are already present in devices that implant electrodes for deep brain stimulation to treat major depression.

Ethics

One of the most common sets of ethical principles for guiding ethical behavior in research and science is the Belmont principles, which include respect for persons, beneficence, and justice. Future brain-machine interfaces present challenges to all three guiding principles. To maintain respect for persons, people’s autonomy must be respected. However, with these devices, users’ emotions could be physically altered by the device itself, affecting their autonomy. Beneficence involves doing no harm to participants; however, as discussed with the potential privacy harms, there is likely to be harm to the first adopters of the technology. With regard to justice, these devices may also fall short. The first iterations of the devices will be extremely expensive and unattainable for most people, while the potential cognitive benefits of such devices would be vast. This could further widen the already large wealth inequality gap. The benefits of such a device would not be spread fairly across all participants and would mostly accrue to those who could afford the devices.

According to other respected neuroscientists invested in brain-machine interfaces, devices with the abilities Elon Musk describes are still quite far away. We currently lack fundamental knowledge about the details of the brain and its inner workings. Yet our existing guidelines for privacy and ethics fail to encompass the potential of such advances in brain-machine interfaces, which is why further thought is needed to provide policies and frameworks to properly guide the development of this technology.

What Will Our Data Say About Us In 200 years?
By Jackson Argo | June 18, 2021

Just two weeks ago, Russian scientists published a paper explaining how they extracted a 24,000-year-old living bdelloid rotifer, a microorganism with exceptional survival skills, from Siberian permafrost. This creature is not only a biological wonder, but comes with a trove of genetic curiosities soon to be studied by biotechnologists. Scientists have found many other creatures preserved in ice, including Otzi the Iceman, a man naturally preserved in ice for over 5,300 years. Unlike the rotifer, Otzi is a human, and even though neither he nor any of his family can give consent for the research conducted on his remains, he has been the subject of numerous studies. This research does not pose a strong moral dilemma for the same reason it is impossible to get consent: he has been dead for more than five millennia, and it’s hard to imagine what undue harm could come to Otzi or his family. Frameworks such as the Belmont Report emphasize the importance of consent from the living, but make no mention of the deceased. However, the dead are not the only ones whose data is at the mercy of researchers. Even with legal and ethical frameworks in place, there are many cases where the personal data of living people is used in studies they might not have consented to.

*A living bdelloid rotifer from 24,000-year-old Arctic permafrost.*

It’s not hard to imagine that several hundred years from now, historians will be analyzing the wealth of data collected by today’s technology, regardless of the privacy policies we may or may not have read. Otzi’s remains only provide a snapshot of his last moments, and this limited information has left scientists with many unanswered questions about his life. Similarly, today’s data does not capture a complete picture of our world, and some of it may even be misleading. Historians are no strangers to limited or misleading data, and are constantly filling in the gaps and revising their understanding as new information surfaces. But what kind of biases will historians face when looking at these massive datasets of personal and private information?

Missing in Action

To answer this question, we first look for the parts of our world that are not captured, or are underrepresented, in these datasets. Kate Crawford gives us two examples of this in the article Hidden Biases in Big Data. A study of Twitter and Foursquare data revealed interesting features of New Yorkers’ activity during Hurricane Sandy. However, this data also revealed its inherent bias: the majority of the data was produced in Manhattan, and little data was produced in the harder-hit areas. In a similar way, a smartphone app designed to detect potholes will be less effective in lower-income areas where smartphones are not as prevalent.

For some, absence from these datasets is directly built into legal frameworks. The GDPR, as one example, gives citizens in the EU the right to be forgotten. There are some constraints, but this typically allows an individual to request that a data controller, a company like Google that collects and stores data, erase that individual’s personal data from the company’s databases. Provided the data controller complies, this individual will no longer be represented in that dataset. We should not expect the people who exercise this right to be evenly distributed demographically; tech-savvy and security-conscious individuals may be more likely to fall into this category than others.
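As a rough illustration, here is a minimal Python sketch of what honoring an erasure request might look like inside a data controller's systems. The data model and function names are hypothetical, and real compliance also involves backups, logs, downstream processors, and legal exemptions that this toy example ignores.

```python
# Hypothetical sketch of an erasure ("right to be forgotten") request.
# The records and fields below are invented for illustration.

records = [
    {"user_id": 1, "email": "alice@example.com", "search_history": ["shoes"]},
    {"user_id": 2, "email": "bob@example.com",   "search_history": ["permafrost"]},
]

erasure_log = []  # a minimal record that a request was honored, without the data itself

def erase_user(user_id):
    """Remove every record belonging to user_id from the (toy) datastore."""
    global records
    before = len(records)
    records = [r for r in records if r["user_id"] != user_id]
    erasure_log.append({"user_id": user_id, "erased_records": before - len(records)})

erase_user(2)
print(records)      # Bob is no longer represented in the dataset
print(erasure_log)  # only the fact of erasure remains
```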

The US has COPPA, the children’s privacy act, which puts heavy restrictions on the data that companies can collect from children. Many companies, such as the discussion website Reddit, choose to exclude children under 13 entirely in their user agreements or terms of service. Scrolling through the posts in r/Spongebob, a subreddit community for the TV show SpongeBob SquarePants, might suggest that no one under 13 is talking about SpongeBob online.

Context Clues

For those of us who are swept up into the nebulous big-data sphere, how accurately does that data actually represent us? Data collection is getting more and more sophisticated as the years go on. To name just a few sources of your data: virtual reality devices capture your motion data, voice-controlled devices capture your speech patterns and intonation, and cameras capture biometric data like faceprints and fingerprints. There are now even devices that interface directly with the neurons in primate brains to detect intended actions and movements.

Unfortunately, this kind of data collection is not free from contextual biases. When companies like Google and Facebook collect data, they are only collecting data particular to their needs, which is often to inform advertising or product improvements. Data systems are not able to capture all the information that they detect; this is far too ambitious, even for our biggest data centers. A considerable amount of development time is spent deciding what data is important and worth capturing, and the result is never to paint a true picture of history. Systems that capture data are designed to emphasize the important features, and everything else is either greatly simplified or dropped. Certain advertisers may only be interested in whether an individual is heterosexual or not, and nuances like gender and sexuality are heavily simplified in their data. 

Building an indistinguishable robot replica of a person is still science fiction, for now, but several AI-based companies are already aiming to replicate people and their emotions through chatbots. These kinds of systems learn from our text and chat history on apps like Facebook and Twitter to create a personalized chatbot version of ourselves. Perhaps there will even be a world where historians ask chatbots questions about our history. But therein lies another problem historians are all too familiar with: the meanings of the words and phrases we use today can change dramatically in a short amount of time. This is, of course, assuming we can even agree on the definitions of words today.

In the article Excavating AI, Kate Crawford and Trevor Paglen discuss the political context surrounding data used in machine learning. Many machine learning models are trained using a set of data and corresponding labels that indicate what the data represents. For example, a training dataset might contain thousands of pictures of different birds along with the species of the bird in each picture. This dataset could train a machine learning model to identify the species of a bird in a new image. The process begins to break down when the labels are more subjectively defined. A model trained to differentiate planets from other celestial bodies may incorrectly determine that Pluto is a planet if the training data was compiled before 2006. The rapidly evolving nature of culture and politics makes this kind of model training heavily reliant on the context of the dataset’s creation.

*A Venezuelan Troupial in Aruba*
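The Pluto example can be made concrete with a tiny sketch: the same objects, labeled before and after 2006, yield models that disagree. The encoding below (approximate diameter in kilometers plus a crude "has cleared its orbit" flag) is an invented toy, not a real astronomical dataset.

```python
# Toy illustration of how labels freeze the context of their creation.

from sklearn.tree import DecisionTreeClassifier

# Features: [approximate diameter in km, crude "has cleared its orbit" flag]
bodies = [[4879, 1], [12104, 1], [12756, 1], [6792, 1], [2376, 0], [946, 0]]
#          Mercury    Venus       Earth       Mars       Pluto      Ceres

# Labels assigned before 2006, when Pluto was still called a planet.
labels_2005 = ["planet", "planet", "planet", "planet", "planet", "not planet"]
model_2005 = DecisionTreeClassifier(random_state=0).fit(bodies, labels_2005)
print(model_2005.predict([[2376, 0]]))  # -> ['planet']  (it repeats its 2005-era labels)

# Relabel the same rows under the post-2006 IAU definition and retrain.
labels_2006 = ["planet", "planet", "planet", "planet", "not planet", "not planet"]
model_2006 = DecisionTreeClassifier(random_state=0).fit(bodies, labels_2006)
print(model_2006.predict([[2376, 0]]))  # -> ['not planet']
```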

Wrapping Up

200 years from now, historians will undoubtedly have access to massive amounts of data to study, but they will face the same historical biases and misinformation that plague historians today. In the meantime, we can focus on protecting our own online privacy and addressing the biases and misinformation in our data to make future historians’ jobs just a little easier.

Thank you for reading!

References

  • https://www.cell.com/current-biology/fulltext/S0960-9822(21)00624-2
  • https://www.nationalgeographic.com/history/article/131016-otzi-ice-man-mummy-five-facts
  • https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html
  • https://hbr.org/2013/04/the-hidden-biases-in-big-data
  • https://gdpr.eu/right-to-be-forgotten/
  • https://www.ftc.gov/tips-advice/business-center/privacy-and-security/children%27s-privacy
  • https://www.wired.com/story/replika-open-source/
  • https://excavating.ai/

AI Bias: Where Does It Come From and What Can We Do About It?
By Scott Gatzemeier | June 18, 2021

Artificial Intelligence (AI) bias is not a new topic but it is certainly a heavily debated and hot topic right now. AI can be an incredibly powerful tool that provides tremendous business value from automating or accelerating routine tasks to discovering insights not otherwise possible. We are in the big data era and most companies are working to take advantage of these new technologies. However, there are several examples of poor AI implementations that enable biases to infiltrate the system and undermine the purpose of using AI in the first place. A simple search on DuckDuckGo for ‘professional haircut’ vs ‘unprofessional haircut’ depicts a very clear gender and racial bias.

Professional Haircut
Unprofessional Haircut

In this case, a picture is truly worth a thousand words. This gender and racial bias is not maliciously hard-coded into the algorithm by its developers. Rather, it is a reflection of the word-to-picture associations that the algorithm picked up from the authors of the web commentary. So the AI is simply reflecting historical societal biases back to us in the images returned. If these biases are left unchecked by AI developers, they are perpetuated. Such perpetuated biases have proven especially harmful in several cases, such as Amazon’s sexist hiring algorithm, which inadvertently favored male candidates, and the racist criminological software COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), where black defendants were 45% more likely to be assigned higher risk scores than white defendants.

Where does AI Bias Come From?

There are several potential sources of AI bias. First, AI inherits whatever biases are in the training data. (Training data is the collection of labeled information used to build a machine learning (ML) model; through training data, an AI model learns to perform its task at a high level of accuracy.) Garbage in, garbage out: AI reflects the views of the data it is built on and can only be as objective as that data. Any historical data will carry the societal biases of the time it was generated. When such data is used to build predictive AI, it can perpetuate stereotypes that shape decisions with real consequences and harms.
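A minimal sketch of "garbage in, garbage out": a model fit to synthetic "historical" hiring decisions that penalized one group learns a negative weight on that group. All of the data and feature names below are made up for illustration.

```python
# Synthetic demonstration that a model trained on biased decisions learns the bias.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
experience = rng.normal(5, 2, n)    # years of experience
group = rng.integers(0, 2, n)       # 0 / 1: a protected attribute

# Historical decisions: driven by experience, but group 1 was penalized.
hired = (experience + rng.normal(0, 1, n) - 1.5 * group > 5).astype(int)

X = np.column_stack([experience, group])
model = LogisticRegression().fit(X, hired)

# The learned coefficient on `group` is strongly negative: the model has
# faithfully reproduced the historical bias, not "discovered" merit.
print(dict(zip(["experience", "group"], model.coef_[0].round(2))))
```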

Next, most ML algorithms are built on statistics: they make decisions based on distributions of the data and on key features that separate data points into categories or group items the model associates together. Outliers that don’t fit the primary pattern tend to be weighted lower, especially when the focus is only on overall model accuracy. When working with people-focused data, the outlier data points often belong to an already marginalized group. This is how biased AI can emerge even from clean, unbiased data. A model can only learn accurate patterns for different groups (by race, gender, etc.) if each group appears frequently enough in the data set; the training data must contain an adequate sample for each group, otherwise this statistical bias can further perpetuate marginalization.
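The effect of under-representation can be sketched in a few lines: when a minority group follows a different pattern but makes up only 5% of the training data, a model can score well overall while performing near chance on that group. The data here is synthetic and the 95/5 split is an arbitrary choice.

```python
# Overall accuracy can hide poor performance on a small group (synthetic data).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_major, n_minor = 1900, 100               # 95% / 5% of the training data

# The two groups follow different patterns, but the minority group is rare.
X_major = rng.normal(0, 1, (n_major, 2))
y_major = (X_major[:, 0] > 0).astype(int)   # majority: signal in feature 0
X_minor = rng.normal(0, 1, (n_minor, 2))
y_minor = (X_minor[:, 1] > 0).astype(int)   # minority: signal in feature 1

X = np.vstack([X_major, X_minor])
y = np.concatenate([y_major, y_minor])
model = LogisticRegression().fit(X, y)

print("overall accuracy :", round(model.score(X, y), 2))              # looks fine
print("majority accuracy:", round(model.score(X_major, y_major), 2))  # high
print("minority accuracy:", round(model.score(X_minor, y_minor), 2))  # near coin-flip
```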

Finally, most AI algorithms are built on correlations in the training data, and as we know, correlation doesn’t always equal causation. The algorithm doesn’t understand what any of the inputs mean in context. For example, suppose you get a few candidates from a particular school but don’t hire them because a position freeze is in place due to business conditions. The fact that they weren’t hired gets added to the training data. The AI would start to correlate that school with bad candidates and could stop recommending candidates from that school, even if they are great, because it doesn’t know the real reason they weren’t selected.
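A small synthetic sketch of the hiring-freeze example: the freeze, a confounder the training data never records, makes "School B" look like a bad signal, and the model learns a negative coefficient for it even though candidates from both schools are equally skilled. Everything below is invented for illustration.

```python
# A confounder (hiring freeze) that is absent from the features creates a
# spurious negative correlation with "School B". All data is synthetic.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 1000
school_b = rng.integers(0, 2, n)     # 1 = candidate from School B
skill = rng.normal(0, 1, n)          # equally distributed in both schools

# School B candidates happened to apply during a hiring freeze.
freeze = school_b & (rng.random(n) < 0.8)
hired = ((skill > 0) & (freeze == 0)).astype(int)  # outcome: skill AND no freeze

# The freeze is not recorded in the training data, only school and skill.
X = np.column_stack([skill, school_b])
model = LogisticRegression().fit(X, hired)
print("coefficient on School B:", round(model.coef_[0][1], 2))  # strongly negative
```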

What can we do about AI Bias?

Before applying AI to a problem, we need to ask what level of AI is appropriate. What should the role of AI be, depending on the sensitivity and impact of the decision on people’s lives? Should it be an independent decision maker, a recommender system, or not used at all? Some companies apply AI even where it is not at all suited to the task in question and other means would be more appropriate. So there is a moral decision to be made prior to implementing AI. Obed Louissaint, the Senior Vice President of Transformation and Culture at IBM, talks about “Augmented Intelligence”: leveraging AI algorithms as “colleagues” to assist company leaders in making better decisions and reasoning better, rather than replacing human decision making. We also need to focus on the technical aspects of AI development and work to build models that are more robust against bias and bias propagation. Developers need to focus on explainable, auditable, and transparent algorithms. When major decisions are made by humans, the reasoning behind the decision is expected and there is accountability; algorithms should be subject to the same expectations, regardless of IP protection. Visualization tools that help explain how an AI works and the ‘why’ behind its conclusions continue to be a major area of focus and opportunity.
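Full explainability is an open research area, but even simple tools help auditors see which inputs a model leans on. Below is a small sketch using permutation importance from scikit-learn on synthetic data; the feature names are hypothetical.

```python
# Permutation importance: how much does performance drop when each input is shuffled?
# Data and feature names are synthetic/hypothetical.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([
    rng.normal(size=n),   # "years_experience": actually drives the label
    rng.normal(size=n),   # "zip_code_index": irrelevant noise
])
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# An auditor (or regulator) can at least see which inputs the model relies on.
for name, importance in zip(["years_experience", "zip_code_index"], result.importances_mean):
    print(f"{name}: {importance:.3f}")
```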

In addition to AI transparency, there are emerging AI technologies, such as Generative Adversarial Networks (GANs), that can be used to create synthetic, unbiased training data based on parameters defined by the developer. Causal AI is another promising area that is building momentum and could give algorithms an understanding of cause and effect. This could give AI some ‘common sense’ and prevent several of these issues.
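A real GAN is too much for a short post, but the underlying idea, generating synthetic rows for an under-represented group so the training set is balanced, can be sketched with a much simpler generative stand-in: fit a Gaussian to that group's features and sample from it. The data, group sizes, and feature dimensions below are all invented.

```python
# Simple generative stand-in for synthetic-data augmentation (not a GAN):
# fit a Gaussian to the minority group and sample synthetic rows from it.

import numpy as np

rng = np.random.default_rng(4)
X_minor = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(100, 2))    # 100 real minority rows
X_major = rng.normal(loc=[0.0,  0.0], scale=1.0, size=(1900, 2))   # 1900 majority rows

# Fit a simple generative model (mean + covariance) to the minority group...
mean, cov = X_minor.mean(axis=0), np.cov(X_minor, rowvar=False)
# ...and sample enough synthetic rows to match the majority group's size.
X_synth = rng.multivariate_normal(mean, cov, size=len(X_major) - len(X_minor))

X_balanced = np.vstack([X_major, X_minor, X_synth])
print("before:", len(X_major), "vs", len(X_minor))
print("after :", len(X_major), "vs", len(X_minor) + len(X_synth))
```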

AI is being adopted rapidly, and the world is just beginning to capitalize on its potential. As data scientists, it is increasingly important that we understand the sources of AI bias and continue to develop fair AI that prevents the social and discriminatory harms that arise from that bias.

References

  • https://www.inc.com/guadalupe-gonzalez/amazon-artificial-intelligence-ai-hiring-tool-hr.html
  • https://hbr.org/2020/10/ai-fairness-isnt-just-an-ethical-issue
  • https://www.logically.ai/articles/5-examples-of-biased-ai
  • https://towardsdatascience.com/why-your-ai-might-be-racist-and-what-to-do-about-it-c081288f600a
  • https://medium.com/ai-for-people/the-ethics-of-algorithmic-fairness-aa394e12dc43
  • https://towardsdatascience.com/survey-d4f168791e57
  • https://techcrunch.com/2020/06/24/biased-ai-perpetuates-racial-injustice/
  • https://towardsdatascience.com/reducing-ai-bias-with-synthetic-data-7bddc39f290d
  • https://towardsdatascience.com/ai-is-flawed-heres-why-3a7e90c48878