November 2020 – Data Science W231 | Behind the Data: Humans and Values

November 30, 2020

Twitch and the U.S. Military – an Ethical Issue

By Harinandan Srikanth | November 27, 2020

Image: How do we address the ethics or lack thereof in the military recruiting minors and young adults on Twitch when many of the representatives in Congress have never heard of Twitch?
(The Verge)

This was the frustration expressed by Congresswoman Alexandria Ocasio-Cortez of New York’s 14th district after the House of Representatives failed to pass her amendment banning the U.S. Army from recruiting on Twitch. The draft amendment to the House Appropriations Bill, proposed on July 22nd, “would ban U.S. military organizations from using funds to ‘maintain a presence on Twitch.com or any video game, e-sports, or live-streaming platform.’” (Polygon.com). Twitch is a subsidiary of Amazon that specializes in live video streaming. The platform primarily hosts gaming channels but also supports non-gaming content. Twitch has grown over the past decade to become the leading platform for online gaming, surpassing YouTube Gaming’s audience in recent years.

With 72% of men and 49% of women ages 18 to 29 engaging in gaming as a source of entertainment, the U.S. Military saw prominent live-streaming platforms like Twitch as an opportunity to recruit from “Gen Z”. The U.S. Army launched its esports team in 2018, receiving 7,000 applicants for 16 spots, with team members streaming war-related games on “Twitch, Discord, Rivals, Mixer, and Facebook” (Military.com). The Army primarily uses fake prize giveaways on its esports channels to direct viewers to the recruitment page (Polygon.com). The number of recruiting leads has been growing rapidly, from 3,500 last year to 13,000 this year. The U.S. Navy and U.S. Air Force followed suit in recruiting gamers on live-streaming platforms (Military.com).

There was, however, an exception to the U.S. Military’s embrace of recruitment via online gaming: the U.S. Marine Corps. Last year, the Marine Corps Recruiting Command wrote that they would “not establish eSports teams or create branded games… due in part to the belief that the brand and issues associated with combat are too serious to be ‘gamified’ in a responsible manner” (Military.com). Representative Ocasio-Cortez echoed this concern in another tweet:

Image: AOC justifying her legislation to ban the military from recruiting on Twitch (Polygon.com).

This tweet highlights the dangerous potential for the U.S. Army’s recruitment via fake prize giveaways on esports channels to lead minors and young adults to conclude that being a member of the armed forces is as easy as playing a war-related game. Sgt. 1st Class Joshua David, deployed as a Green Beret, says that reality could not be more different from the game. According to Sgt. 1st Class Christopher Jones, “He’ll tell every single person that we engage with that there’s no comparison between the two. There’s no way soldiers are going to carry 90 pounds’ worth of equipment moving in an environment like that, essentially superhuman. You know, these environments are made up; they’re fictional” (Military.com). There is also an informed consent issue presented by this method of recruitment. If minors and young adults who are led to the U.S. Army’s recruitment page via Twitch and similar platforms get the impression that military service is just like playing games, then they are not making the choice of signing up for service in the U.S. Army with the knowledge of what being in the Army is actually like.

On the flip side, Allen Owens, deputy chief marketing officer for Navy Recruiting Command, says that esports also has the potential to enlighten young people about the realities of military service. If, for example, an aircraft mechanic is good at a shooter game and the person they’re playing with asks whether shooting is their specialty in the military, the mechanic can explain what their real job is and that being good at shooting in real life is completely different from being good at it in a game (Military.com). While those possibilities are on the horizon, however, there are steps that both the U.S. Military and live-streaming companies need to take to resolve the ethical issues presented by recruitment via platforms like Twitch.

References

1. “After impassioned speech, AOC’s ban on US military recruiting via Twitch fails House vote”. The Verge. https://www.theverge.com/2020/7/30/21348451/military-recruiting-twitch-ban-block-amendment-ocasio-cortez
2. “Amendment would ban US Army from recruiting on Twitch”. Polygon. https://www.polygon.com/2020/7/30/21348126/twitch-military-ban-alexandria-ocasio-cortez-aoc-law-congress-amendment-army-war-crimes
3. “As Military Recruiters Embrace Esports, Marine Corps Says it Won’t Turn War into a Game”. Military.com. https://www.military.com/daily-news/2020/05/12/military-recruiters-embrace-esports-marine-corps-says-it-wont-turn-war-game.html

November 30, 2020

Should we regulate apps like we do addictive drugs?

By Blake Allen | Nov. 26, 2020

image source: Rosyscription.com

You pull your phone out at midnight; there’s the familiar buzz of a notification. Waking the screen and loading your app, you see that someone tagged a friend at a party you weren’t invited to. You get a sinking feeling… maybe they forgot? You press on, reading, scrolling and liking. Desperate for something that you can’t quite find, you eventually give up. That’s when you notice it’s 3am. Shocked at how much precious sleep you just wasted, you wonder: how did it get to this?

While the above story is fictionalized, it may not be for many people. As our phones and technology become more sophisticated, the apps we use are becoming more addictive… and this is by design [1]. Addictive technology is defined as software that attempts to hijack normal user behavior through subtle manipulation of our innate reward systems. Trying to create a product that consumers use over and over again is nothing new; the original Coca-Cola, for example, had cocaine in its recipe [6]. As our societies progressed, many highly addictive substances were outlawed and controlled for our safety. Is it time for technology to receive the same treatment?

Technology addiction is estimated to affect between 1.5% and 8.2% of individuals. [2] In a country like the US this could correspond to roughly 3 to 20 million individuals. Despite this shockingly high number of affected individuals, there has been no formal governmental response to the issue of internet addiction. In fact, there is debate as to whether the Diagnostic and Statistical Manual of Mental Disorders (DSM) should even seek to define internet addiction. The American Society of Addiction Medicine (ASAM) recently released a new definition of addiction as a chronic brain disorder, officially proposing for the first time that addiction is not limited to substance use. [3]
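
As a rough check on the scale, the conversion from a prevalence range to head counts can be sketched as below. The base population of 250 million is an assumed round figure chosen for illustration, not a number taken from the cited study, and the function name is hypothetical. A different base shifts both ends of the range.

```python
# Convert a prevalence range (in percent) into approximate head counts.
# ASSUMPTION: the base population passed in below is a hypothetical round
# figure for illustration; it does not come from the cited study.

def affected_range(base_population, low_pct, high_pct):
    """Return (low, high) estimated counts of affected individuals."""
    low = round(base_population * low_pct / 100)
    high = round(base_population * high_pct / 100)
    return low, high

low, high = affected_range(250_000_000, 1.5, 8.2)
print(f"{low:,} to {high:,} people")
```

The spread between the two ends is more than fivefold, which is worth keeping in mind when reading any single headline figure.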

How to classify technological addiction?
What distinguishes someone who is highly engaged from an addicted individual? Internet addiction can be summarized as the inability to control the amount of time spent interfacing with digital technology, withdrawal symptoms when not engaged, a diminishing social life, and adverse work or academic consequences. [1] This may not describe you, but how many people who call themselves ‘influencers’ could it describe? In fact, phone addiction is becoming so prevalent that a new word has been coined: “phubbing,” short for phone snubbing, the act of ignoring someone in order to look at your phone. [7]

How technology gets us hooked:
How did addictive technology even get created in the first place? Addictive apps such as Instagram employ what is known as the Hooked model, described by Nir Eyal in his aptly titled book “Hooked: How to Build Habit-Forming Products”. [4] In it, Eyal describes a four-step process: a trigger, which causes an action, followed by a variable reward and an investment by the user.
The trigger, often a push notification, interrupts our daily life and sends us down a distracting rabbit hole that may consume precious hours of our daily life. It could be argued that social media is actually manipulating us to take action. This manipulation is being codified, studied and amplified through the use of machine learning which can scale this in an unprecedented way. The end result is that each year our technology learns exactly what we like and how to press our buttons in order to increase engagement.

1. Trigger – External or internal cues that prompt certain behavior
2. Action – Use of the product, based on ease of use and motivation
3. Variable Reward – The reason for product use, which keeps the user engaged
4. Investment – A useful input from the user that commits them to go through the cycle again
Source: “Hooked” by Nir Eyal [4]
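
The four steps above can be read as a loop. The following is a minimal sketch of that loop; the class, its method names, and the reward probability are all hypothetical illustrations, not anything taken from Eyal’s book or from a real app.

```python
import random

# Illustrative simulation of the four-step loop: trigger -> action ->
# variable reward -> investment. All names and numbers are hypothetical.

class HookLoop:
    def __init__(self, reward_probability=0.3, seed=0):
        self.reward_probability = reward_probability
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.investment = 0             # e.g. posts, follows, saved content
        self.log = []

    def run_cycle(self):
        self.log.append("trigger")      # 1. a notification arrives
        self.log.append("action")       # 2. the user opens the app
        # 3. the reward is variable: sometimes there is one, sometimes not
        rewarded = self.rng.random() < self.reward_probability
        self.log.append("reward" if rewarded else "no reward")
        self.investment += 1            # 4. the user adds something of their own
        self.log.append("investment")
        return rewarded

loop = HookLoop()
results = [loop.run_cycle() for _ in range(10)]
print(f"rewards: {sum(results)}/10, accumulated investment: {loop.investment}")
```

The reward is drawn at random on purpose, reflecting the “variable” in step 3, while the investment counter only ever grows, which is what pulls the user back into the next cycle.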

Who is at most risk?
While most individuals are likely to have a handle on their technology use, which users are most likely to fall victim to addictive technology? In a study done with rats, it was shown that rats preferred social interaction to highly addictive substances such as heroin and meth. [5] This is actually quite surprising and leads to some interesting conclusions. The reward for social engagement is actually more motivating than extremely addictive substances. When applying this to humans, one could argue that individuals who are socially isolated are at a higher risk for all forms of addictions including internet addiction.

Ethical concerns:
If a user is psychologically dependent on a technology, are they in fact being manipulated by that technology? I argue that addictive technology is manipulation, as it hijacks internal reward systems in order to create a habitual activity that benefits a private company (e.g., Facebook, Twitter, Netflix). These companies have a perverse incentive to make the most addictive technologies, as addictiveness corresponds directly to a larger bottom line. The more addictive their technology, the more successful they are. With a complete lack of regulation there is no incentive to rein this technology in; in fact, companies that don’t employ addictive techniques may be at a disadvantage in the marketplace.

Inherent in addictive technology is a lack of informed consent. On the surface, each social media platform seems to provide a simple service: connecting with friends. What is often lurking underneath the surface are highly sophisticated machine learning agents that are learning exactly which buttons to press in order to get you to engage with the service. This isn’t something the average user is informed about, nor are they aware that it is going on behind the scenes. If users were presented with a warning label similar to the ones found on cigarettes, they might have a better understanding of the potential harms of using such a service.

In addition, the algorithms driving user interaction could be problematic, as they could be causing a net negative experience for an individual. For instance, if an individual feels they aren’t beautiful enough, an application may feed this insecurity because it is a primary driver of that user’s behavior. The user’s activity increases, but over time the application has an increasingly negative impact on their mental health as their insecurities are constantly reinforced. This becomes a runaway process which could lead to disastrous consequences if left unchecked.

Mitigation / Management:
Addictive technology is a fairly new field, but it leverages years of psychological research into which actions shape user behavior. One could codify these addictive elements and either put safeguards in place or outlaw them outright in order to protect consumers. Additional outreach could be directed at individuals whose technology use exceeds a certain threshold, seeking to engage them and promote social interaction, which could lessen the desire for addictive technology.

What is important is that we as a society understand the root causes of addiction and treat this as a mental health issue. If safeguards can be made, then it’s possible we can have a meaningful and safe interaction with our applications that doesn’t lead us down the rabbit hole of addiction. Furthermore we should penalize any company employing these addictive techniques without regulation. There is far too much at stake for our mental health if these companies are left unchecked.

Question to the reader: do you feel like you’re addicted to an app or apps? If so, which ones and why? Comment below!

Sources:
[1] Hilarie Cash et al. Internet Addiction: A Brief Summary of Research and Practice. November, 2012. Current Psychiatry Reviews. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3480687/

[2] Weinstein A, Lejoyeux M. Internet addiction or excessive Internet use. The American Journal of Drug and Alcohol Abuse. 2010 Aug;36(5):277–83. https://www.ncbi.nlm.nih.gov/pubmed/20545603

[3] American Society of Addiction Medicine. Public Policy Statement: Definition of Addiction. 2011 [cited 2011 August 21]; http://www.asam.org/1DEFINITION_OF_ADDICTION_LONG_4-11.pdf

[4] Hooked: How to build Habit-Forming Products, Nir Eyal. Penguin Random House. December 26, 2013

[5] Venniro, M., Zhang, M., Caprioli, D., et al. Volitional social interaction prevents drug addiction in rat models. Nature Neuroscience. 21(11):1520-1529, 2018.

[6] Did Coca Cola have cocaine in the original recipe? https://teens.drugabuse.gov/blog/post/coca-colas-scandalous-past

[7] Phubbing, a definition. https://www.healthline.com/health/phubbing

November 30, 2020

Automatic License Plate Readers – For your Safety?

By Anonymous | November 27, 2020

Imagine a system that can track where you have been and knows where you will be going in the future. This system is not science fiction; it is reality. A network of interconnected video cameras located on interstate highways, local streets, outside of homes, and in police cars creates a universal mesh of data from your personal breadcrumbs. When interconnected, the system can find individuals in real time by capturing their license plate numbers. Automatic License Plate Reader (ALPR) technology has exploded in growth, with databases containing more than 15 billion license plates, covering the majority of the United States and present in over 50 countries. Though the technology was sold as a way to reduce crime while keeping up with shrinking law enforcement budgets, personal privacy protection has lagged behind.

Image: Electronic Frontier Foundation

Historically, law enforcement would look for license plate numbers by walking and driving down streets within the city. This process naturally limited how much information was gathered. Today, officers can mount cameras in their cars that indiscriminately capture images of vehicles, not because of suspected criminal activity, but because the information might be useful in future investigations. This method is called “gridding” and feeds the data into the ALPR system.

Any law enforcement agency that participates in ALPR will have access to real-time data of over 150 million plate reads per month at no cost. In return, most ALPR systems require access to license plate reads from cameras established in that specific jurisdiction. The use of ALPR systems goes beyond law enforcement agencies and is available to anyone, even private citizens. Local neighborhoods can pay $2,000/mo for a neighborhood camera that will scan license plates, allowing neighbors to view travel patterns and follow up on suspicious vehicles.

Image: Rekor Recognition System

The collection of massive amounts of data is not new. Google has been collecting images of everything outdoors, viewable through its product Google Street View. The same argument was used for the mass collection of license plate information: it is already publicly visible, and anyone can record it or take pictures of it. Companies creating ALPR systems are taking publicly available information and using the data to help catch criminals and reduce city expenses. So what is the harm?

The harm is that law enforcement could misuse ALPR systems to stalk individuals at their work, events, or political rallies, or even at their doctors’ offices. It enables law enforcement or anyone to analyze travel patterns that could reveal sensitive information, regardless of whether they are suspected of criminal activity. For instance, police can use ALPR data to determine the places people visit, which doctors they go to, and what religious services they attend.

If these technologies were deployed without reasonable suspicion, personal bias could intervene and police could deploy this technology more heavily in low income and minority neighborhoods. Police could grid these neighborhoods more often, leading to over-policing these areas. In 2016, a BuzzFeed investigation found that ALPRs in Port Arthur, Texas, were primarily used to find individuals with unpaid traffic citations, leading to their incarceration.

Image: Gridding – Tempe Police Department

Eric J. Richard was driving his white Buick LaCrosse on Interstate 10, when Louisiana State Police stopped him for following a truck too closely. During the roadside interrogation, the trooper asked where Richard was traveling from. “I was coming from my job right there in Vinton,” Richard replied. The officer had already looked up the travel records for Richard’s car and already knew it had crossed into Louisiana from Texas earlier in the day. Based on this “apparent lie,” the trooper extended the traffic stop by asking more questions and calling in a drug dog.

The privacy harms of ALPR systems become very apparent when millions of license plate reads from innocent individuals are collected and analyzed together, creating new use cases that were never possible before. The ability to track individuals in real time and to analyze travel patterns to predict where people might be provides capabilities that were not previously available. These new capabilities, coupled with weak privacy policies, have resulted in numerous harms stemming from a lack of training, inconsistent data retention policies, and unclear access, auditing, and security policies.
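
To make the aggregation concern concrete, here is a minimal sketch of how individually innocuous plate reads could be combined into a travel pattern. The record layout, plate numbers, and camera locations are invented for illustration and do not reflect any real ALPR product or database schema.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical plate reads: (plate, ISO timestamp, camera location).
reads = [
    ("ABC123", "2020-11-02T08:05", "Main St & 1st"),
    ("ABC123", "2020-11-02T18:40", "Clinic parking lot"),
    ("ABC123", "2020-11-03T08:10", "Main St & 1st"),
    ("XYZ789", "2020-11-03T09:00", "Highway 10 on-ramp"),
    ("ABC123", "2020-11-04T08:02", "Main St & 1st"),
]

def travel_patterns(reads):
    """Count, per plate, how often it was seen at each (hour, location)."""
    patterns = defaultdict(Counter)
    for plate, timestamp, location in reads:
        hour = datetime.fromisoformat(timestamp).hour
        patterns[plate][(hour, location)] += 1
    return patterns

def most_likely_spot(patterns, plate):
    """Return the (hour, location) pair seen most often for a plate."""
    return patterns[plate].most_common(1)[0][0]

patterns = travel_patterns(reads)
print(most_likely_spot(patterns, "ABC123"))  # (8, 'Main St & 1st')
```

No single read says much, but three morning sightings at the same intersection already suggest a commute, and the one evening read at a clinic is exactly the kind of sensitive detail that falls out of the same table.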

References:

Image Source: Electronic Frontier Foundation. https://www.youtube.com/watch?v=ofpxX49vdXY
Image Source: Rekor Recognition System. https://www.google.com/streetview/
Image Source: Gridding – Tempe Police Department. https://www.azmirror.com/2019/07/08/arizona-police-agencies-gather-share-license-plate-data-but-few-ensure-rules-are-being-followed
https://www.eff.org/deeplinks/2020/02/california-auditor-releases-damning-report-about-law-enforcements-use-automated
https://www.nbcbayarea.com/news/local/south-bay/neighbors-install-license-plate-reader-in-los-gatos/2233057
https://massprivatei.blogspot.com/2019/09/massive-30-state-real-time-alpr.html?q=Rekor+Systems
https://massprivatei.blogspot.com/2020/02/rekor-systems-uses-video-doorbells-to.html
https://www.ncjrs.gov/pdffiles1/nij/grants/247283.pdf
https://www.technocracy.news/police-use-license-plate-readers-to-grid-neighborhoods/

November 4, 2020

Section 230: Congress Seeks Testimony, Ignores It

By EJ Haselden, October 30, 2020

It’s a timeless trope from the era of afterschool specials: misbehaving children stand before Mom and Dad’s kitchen-table duumvirate to answer for their schoolyard shenanigans, but the pretense of discipline soon wears through and the scene devolves into a nasty argument between the grownups. The kids’ real punishment is that they are made pawns and captive audience to a painful display of parental dysfunction. So unfolded this week’s Senate hearing on social media regulation, rhetorically titled “Does Section 230’s Sweeping Immunity Enable Big Tech Bad Behavior?”

Section 230 (47 U.S.C. § 230) is a part of the 1996 Communications Decency Act, and it is perhaps best known for shielding social media companies (among others) from liability for content that their users post:

“No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.”

The titular “bad behavior” and “sweeping immunity” that prompted this hearing, however, relate to another, lesser-known protection granted by Section 230, which shields platforms when they choose to filter, fact-check, or otherwise annotate content that they consider harmful and/or inaccurate:

“No provider or user of an interactive computer service shall be held liable on account of any action voluntarily taken in good faith to restrict access to or availability of material that the provider or user considers to be obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable, whether or not such material is constitutionally protected”

The nominal debate here surrounds the “otherwise objectionable” material in that description. Social media companies have chosen to interpret it as any content of questionable origin or veracity that could result in public harm (most recently regarding health advisories, voter suppression, and influence campaigns orchestrated by foreign intelligence services). Their caution stems from lessons learned in the rapid spread of disinformation leading up to the 2016 election, as well as a once-in-a-century pandemic that has seen deadly irresponsible claims espoused by supposed authority figures. Republican lawmakers claim that this content moderation has disproportionately muted conservative voices on social media. Democratic lawmakers, meanwhile, argue that these companies not only have the right, but the responsibility, to assess content based on its potential consequences and without regard for its ideological bent. It should be noted that multiple independent studies and a Facebook internal audit failed to find the alleged anti-conservative bias, but the fact that right-leaning engagement actually dwarfs that of center and left-leaning sources means that flagging only a small fraction of it still provides ample anecdotal evidence of prejudice (which is obviously enough to prompt Congressional hearings).

The administration has called for an outright repeal of Section 230, despite the fact that this would almost certainly lead to more content restrictions as companies adapt to the increased threat of liability. The consensus on Capitol Hill and in Silicon Valley therefore appears to be some amount of targeted Section 230 reform, while keeping the basic framework intact.

Which brings us back to this week’s hearing (or spectacle, or charade, or sham, depending on whom you ask). The Senate Committee on Commerce, Science, and Transportation subpoenaed the CEOs of Google, Twitter, and Facebook to testify on behalf of Social Media. Most commentators agree that the face time with Tech Actual was not spent productively. As with those quarrelling parents, it was never really about the kids.

Republicans’ line of soi-disant questioning focused almost entirely on what they consider censorship of conservatives (69 of 81 questions, per the New York Times), as they demanded examples of the same (loosely defined) censorship directed at liberal outlets. Senator Ron Johnson asked the witnesses about the ideological makeup of their respective workforces—rhetorically, because it would be illegal for them to maintain that sort of record—in an effort to prove anti-conservative bias by virtue of microcultural majority (which almost sounded like an argument for some variant of affirmative action).

Democrats, for their part, focused most of their attention on the legitimacy and impact of the hearing itself, expressing concern that it could serve to intimidate social media companies into relaxing moderation policies at a time when the nation is perhaps most vulnerable to manipulative media. The bulk of their more on-topic questioning concerned dis- and misinformation and what actions the companies were taking to combat it ahead of the election. Still, not that much about Section 230 reform.

In keeping with the scripted, postured non-discussion, the most meaningful witness testimony came in the form of prepared opening statements. In those, Pichai reasserted Google’s anti-bias philosophy and cautioned against reactionary changes to Section 230, Dorsey promoted increased transparency and user inclusion in Twitter’s decision-making processes, and Zuckerberg praised Section 230 while inviting a stricter and more explicit rewrite of its provisions (for which Facebook would gladly provide input). Their full statements are available on the committee’s hearing website.

The timing and tenor of this eleventh-hour pre-election partisan screed exchange never inspired much hope for substantive debate, but even so, there was a jarring lack of effort to better understand the pressing and complex problems that Section 230 is still, at this moment, expected to resolve. The reason this matters, the reason it’s so alarming that neither side was terribly interested in the companies’ offers of greater transparency—something we’d consider a win for democracy in saner times—is that our government has abdicated its responsibility of oversight on this topic except in cases where the threat of enforcement can be used as a political weapon.

In the end, it’s probably fitting that Congress used a social media hearing as a platform to amplify and disseminate entrenched views that they had no intention of rethinking.

November 4, 2020

Can there truly be ethics in autonomous machine intelligence?

By Matt White, October 30, 2020

Some would say that we are in the infancy of the fourth industrial revolution, where artificial intelligence and the autonomy it is ushering in are positioned to become life-altering technologies. Most people understand the impact of autonomous technologies as it relates to jobs: they are concerned that autonomous vehicles and robotic assembly lines will relegate them to the unemployment line. But very little thought, and consequently little research, has gone into the ethical implications of the autonomous decisions these systems are confronted with. Although AI and automation have far-reaching ethical implications, there are opposing views on who is truly responsible for the ethical decisions made by an autonomous system. Is it the designer? The programmer? The supplier of the training data? The operator? Or should the system itself be responsible for any moral or ethical dilemmas and their outcomes?

Take for instance the incident with Uber’s self-driving car a few years ago, where one of its cars killed a pedestrian crossing the road in the middle of the night. The vehicle’s sensors collected data which revealed it was aware of a person crossing in front of its path, but the vehicle took no action and struck and killed the pedestrian. Who is ultimately responsible when an autonomous vehicle kills a person? In this case it was the supervising driver, but what happens when there is no driver in the driver’s seat? What if the vehicle had to make a choice like the one in the trolley problem, between hitting a child and hitting a grown man? How would it make such a challenging moral decision?

Image Source: Singularity Hub

The Moral Machine, a project from MIT’s Media Lab, is tackling just this, developing a dataset on how people would react to particular moral and ethical decisions when it comes to driverless cars. Should you run over 1 disabled person and 1 child or 3 obese people, or should you crash into a barrier and kill your 3 adult passengers to save two men and two women of a healthy weight pushing a baby? However, the thought that autonomous vehicles will base their moral decisions on crowd-sourced datasets of varying moral perspectives seems absurd. Only those who participate in the process will have their opinions included, and anyone can go online and contribute to the dataset without any form of validation. Notwithstanding all of the opinions that are not included, there are various theories of moral philosophy that could be applied to autonomous ethical decision making and that would overrule rules derived from datasets. Does the system follow utilitarianism, Kantianism, virtue ethics, and so forth? Although the Moral Machine is considered to be a study in its current incarnation, it uses a very primitive set of parameters (number of people, binary gender, weight, age, visible disability) to allow users to determine the value they place on human life. In real life, people have more than this handful of dimensions, including race, socio-economic status, non-binary gender, and so forth. Could adding these real-life dimensions create a bias that would further devalue people who meet certain criteria and are in the way of an autonomous vehicle? Might the value placed on a homeless person be less than that placed on a Wall Street stockbroker?

Image Source: Moral Machine

There is certainly a lot to unpack here, especially if we change contexts and look at armed unmanned autonomous vehicles (AUAVs) which are used in warfare to varying degrees. As we transition from remote pilots to fully autonomous war machines, who makes the decision whether to drop a bomb on a school containing 100 terrorists and 20 children? Does the operator absolve himself of any responsibility when the AUAV makes the decision to drop a bomb and kill innocent people? Does the programmer or the trainer of the system bear any responsibility?

As you can see, the idea of ethical decision making by autonomous systems is highly problematic and presents some very serious challenges that require further research and exploration. Systems that are designed to have a moral compass will not be sufficient, as they will adopt the moral standpoint of their creators. Training data is likely to be short-sighted, shallow in dimensions, and biased by the ethical standpoints of its contributors. It is clear that ethical decision making in autonomous systems needs further discourse and research to ensure that the systems we come to rely on can make ethical decisions in a manner that demonstrates no bias; or perhaps we will have to accept that autonomous machines cannot make ethical decisions in an unbiased manner.

November 2, 2020

The Looming Revolution of Online Advertising

By Anonymous, October 30, 2020

In the era of the internet, advertising is getting creepily accurate and powerful. Large ad networks like Google, Facebook, and more collect huge amounts of data, through which they can infer a wide range of user characteristics, from basic demographics like age, gender, education, and parental status to broader interest categories like purchasing plan, lifestyle, beliefs, and personality. With such powerful ad networks out there, users often feel like they are being spied on and chased around by ads.

Image credit: privateinternetaccess.com

How is this possible?
How did we leak so much data to these companies? The answer is cross-site and app tracking. When you surf the internet, going from one page to another, trackers collect data on where you have been and what you do. According to one Wall Street Journal study, the top fifty Internet sites, from CNN to Yahoo to MSN, install an average of 64 trackers[1]. The tracking can be done by scripts, cookies, widgets, or invisible image pixels embedded on the sites you visit. You have probably seen social media sharing buttons like the ones below. Those buttons, whether you click them or not, can record your visits and send data back to the social platform.

Image credit: pcdn.co

A similar story is playing out in mobile apps. App developers often link in SDKs from other companies, through which they can gain analytic insights or show ads. As you can imagine, those SDKs also report data back to the companies that provide them and track your activities across apps.
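The mechanism described above can be sketched with a toy simulation. This is only an illustration under simplifying assumptions: `ToyTracker`, `ToyBrowser`, and the `.example` site names are hypothetical, and no real network requests or real tracker APIs are involved. The point it shows is that a single identifier stored in the browser lets one third party stitch visits to unrelated sites into a single profile.

```python
import uuid
from collections import defaultdict


class ToyTracker:
    """Hypothetical third-party tracker whose button or pixel is embedded on many sites."""

    def __init__(self):
        # Maps a cookie identifier to the list of sites where it was seen.
        self.visits = defaultdict(list)

    def handle_request(self, cookie_id, embedding_site):
        # First contact: the browser has no cookie yet, so mint an identifier.
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        # Record which site embedded the tracker for this request.
        self.visits[cookie_id].append(embedding_site)
        # The identifier is returned so the browser can store it as a cookie.
        return cookie_id


class ToyBrowser:
    """Hypothetical browser that stores one cookie for the tracker's domain."""

    def __init__(self):
        self.tracker_cookie = None

    def load_page(self, site, tracker):
        # Loading the page also loads the embedded widget, which contacts the
        # tracker whether or not the user clicks anything.
        self.tracker_cookie = tracker.handle_request(self.tracker_cookie, site)


tracker = ToyTracker()
browser = ToyBrowser()
for site in ["news.example", "shop.example", "health.example"]:
    browser.load_page(site, tracker)

# The tracker now holds a cross-site browsing history for this one browser.
profile = tracker.visits[browser.tracker_cookie]
print(profile)
```

Blocking or partitioning the stored identifier, so that `handle_request` sees a fresh `None` on every site, is roughly what third-party cookie restrictions aim for: each visit would then land in a separate, unlinkable profile.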

Why is it problematic?
Cross-site and cross-app tracking poses great privacy concerns. Firstly, the whole tracking process happens behind the scenes. Most users are not aware of it until they see some creepily accurate ads, and even when they are aware of it, they often have no idea how the data is collected and used, or who owns it. Secondly, only very technically sophisticated people know how to prevent this tracking, which can involve tedious configuration or even the installation of additional software. To make things worse, even if we can prevent future tracking, there is no clear way to wipe out the data that has already been collected.

In general, cross-site and cross-app activities are collected, sold, and monetized in various ways with very limited user transparency and control. GDPR and CCPA have significantly improved transparency and control. Big trackers such as Google and Facebook now provide dedicated ad settings pages (1, 2), which allow users to delete or correct their data, to choose how they want to be tracked, and so on. Though GDPR and CCPA gave users more control, most users stay with the default options, and cross-site tracking remains prevalent.

The looming revolution
With growing concerns about user privacy, Apple took radical action to kill cross-site and cross-app tracking. Over the past couple of years, Apple gradually rolled out Safari Intelligent Tracking Prevention (ITP)[2], which curtailed companies’ ability to install third-party cookies. With Apple taking the lead, the Firefox and Chrome browsers are also launching features similar to ITP. With the release of iOS 14, Apple brought a feature similar to ITP to the world of apps.

Image credit: clearcode.com

While at first glance this may sound like a long-overdue change to safeguard users’ privacy, on closer inspection it could provoke a backlash. Firstly, internet companies collect data in exchange for their free services: products like Gmail, Maps, and Facebook are all free to use. According to one study from Vox, in an ad-free internet, the user would need to pay $35 every month to compensate for ad revenue[3]. Some publishers even threatened to proactively stop working on Apple devices. Secondly, Apple’s ITP solution doesn’t give users much chance to participate. Cross-site tracking can in general enable more personalized services, more accurate search results, better recommendations, and so on. Some users may choose to opt in to cross-site tracking for this purpose. Thirdly, Apple’s ITP disables only third-party cookies, and there are many other ways to continue the tracking. For example, ad platforms can switch to device IDs or “fingerprint” users by combining IP address and geolocation.
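To make the fingerprinting idea concrete, here is a minimal sketch, assuming the tracker sees only an IP address and a coarse location string. The function name `fingerprint`, the separator, and the sample values are all hypothetical; real fingerprinting schemes typically combine many more signals. The sketch shows why blocking cookies alone does not stop tracking: the identifier is recomputed from attributes the browser reveals anyway, so nothing needs to be stored on the device.

```python
import hashlib


def fingerprint(ip_address: str, geolocation: str) -> str:
    """Derive a stable pseudo-identifier from attributes visible on every request."""
    # Join the attributes with a separator so ("ab", "c") and ("a", "bc") differ.
    raw = f"{ip_address}|{geolocation}"
    # Hashing yields a fixed-length identifier without storing a cookie.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Illustrative values only (203.0.113.0/24 is reserved for documentation).
visit_on_site_a = fingerprint("203.0.113.7", "Berkeley, CA")
visit_on_site_b = fingerprint("203.0.113.7", "Berkeley, CA")
other_user = fingerprint("203.0.113.8", "Berkeley, CA")

# Same attributes on two different sites give the same identifier.
print(visit_on_site_a == visit_on_site_b)
# A different IP address gives a different identifier.
print(visit_on_site_a == other_user)
```

With only two attributes the identifier is coarse, since everyone behind the same IP address in the same area collides, which is one reason such schemes tend to fold in additional signals.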

Other radical solutions have also been proposed, such as Andrew Yang’s Data Dividend Project. With many ethical concerns and the whole ads industry at stake, it will be interesting to see how things play out and what other alternatives to cross-site and cross-app tracking are proposed.

References

November 2, 2020

We see only shadows

By David Linnard Wheeler, October 30, 2020

After the space shuttle Challenger disaster (Figure 1) on January 28th, 1986, most people agreed on the cause of the incident – the O-rings that sealed the joints on the right solid rocket booster failed under cold conditions (Lewis, 1988). What most failed to recognize, however, was a more fundamental problem. The casual disregard of outliers, in this case from a data set used by scientists and engineers involved in the flight to justify the launch in cold conditions, can yield catastrophic consequences. The purpose of this essay is to show that a routine procedure for analysts and scientists – outlier removal – not only introduces biases but, under some circumstances, can actually lead to lethal repercussions. This observation raises important moral questions for data scientists.

Figure 1. Space shuttle Challenger disaster. Source: U.S. NEWS & WORLD REPORT

The night before the launch of the space shuttle Challenger, executives and engineers from NASA and Morton Thiokol, the manufacturer of the solid rocket boosters, met to discuss the scheduled launch over a teleconference call (Dalal et al. 1989). The subject of conversation was the sensitivity of O-rings (Figure 2) on the solid rocket boosters to the cold temperatures forecasted for the next morning.

Figure 2. Space shuttle Challenger O-rings on solid rocket boosters. Source: https://medium.com/rocket-science-falcon-9-and-spacex/space-shuttle-challenger-disaster-1986-7e05fbb03e43

Some of the engineers at Thiokol opposed the planned launch. The performance of the O-rings during the previous 23 test flights, they argued, suggested that temperature was influential (Table 1). When temperatures were low, for example between 53 and 65°F, more O-rings failed than when temperatures were higher.

Table 1: Previous flight number, temperature, pressure, number of failed O-rings, and number of total O-rings

Some personnel at both organizations did not see this trend. They focused only on the flights where at least one O-ring had failed. That is, they ignored outlying cases where no O-rings failed because, from their perspective, these cases did not contribute any information (Presidential Commission on the Space Shuttle Challenger Accident, 1986). Their conclusion, upon inspection of data from Figure 3, was that “temperature data [are] not conclusive on predicting primary O-ring blowby” (Presidential Commission on the Space Shuttle Challenger Accident, 1986). Hence, they asked Thiokol for an official recommendation to launch. It was granted.

Figure 3. O-ring failure as a function of temperature

The next morning the Challenger launched and 7 people died.

After the incident, President Reagan ordered William Rogers, former Secretary of State, to lead a commission to determine the cause of the explosion. The O-rings, the Commission found, became stiff and brittle in response to cold temperatures and were thereby unable to maintain the seal between the joints of the solid rocket boosters. The case was solved. But a more fundamental lesson was missed.

Outliers and their removal from data sets can introduce consequential biases. Although this may seem obvious, it is not. Some practitioners of data science essentially promote cavalier removal of observations that are different from the rest. They focus instead on the biases that can be introduced when certain outliers are included in analyses.

This practice is hubristic for at least one reason. We, as observers, do not, in most cases, completely understand the processes by which the data we collect are generated. To use Plato’s allegory of the cave, we just see the shadows, not the actual objects. Indeed, this is one motivation to collect data. To remove data without defensible justification (e.g., measurement or execution error) is to claim, even if implicitly, that we know how the data should be distributed. If true, then why collect data at all?

To be clear, I am not arguing that outlier removal is indefensible under any condition. Instead, I am arguing that we should exercise caution and awareness of the consequences of our actions, both when classifying observations as outliers and when ignoring or removing them. This point was acknowledged by the Rogers Commission in the statement: “a careful analysis of the flight history of O-ring performance would have revealed the correlation in O-ring performance in low temperature[s]” (Presidential Commission on the Space Shuttle Challenger Accident, 1986).
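The effect of discarding the no-failure flights can be illustrated with a small sketch. The numbers below are made up for illustration and are not the values from Table 1; the 65°F cutoff and the helper `failure_rate` are likewise assumptions chosen only to make the comparison easy to read. The sketch compares the share of flights with at least one failed O-ring in cold and warm conditions, first using every flight and then using only the flights where something failed.

```python
# Hypothetical (temperature in °F, number of failed O-rings) pairs.
# Made-up numbers for illustration only, NOT the data from Table 1.
flights = [
    (50, 2), (55, 1), (60, 1),
    (65, 0), (68, 0), (70, 1), (70, 0), (72, 0),
    (75, 2), (75, 0), (78, 0), (80, 0), (82, 0),
]


def failure_rate(records, low, high):
    """Share of flights with at least one failure where low <= temperature < high."""
    in_range = [failed for temp, failed in records if low <= temp < high]
    if not in_range:
        return None  # no flights in this temperature band
    return sum(1 for failed in in_range if failed > 0) / len(in_range)


# The subset examined when flights without failures are set aside.
failures_only = [(temp, failed) for temp, failed in flights if failed > 0]

CUTOFF = 65  # assumed boundary between "cold" and "warm" launches

# With all flights, cold launches fail far more often than warm ones.
cold_all = failure_rate(flights, 0, CUTOFF)
warm_all = failure_rate(flights, CUTOFF, 200)

# With failures only, both rates are 1.0 by construction: the contrast is gone.
cold_subset = failure_rate(failures_only, 0, CUTOFF)
warm_subset = failure_rate(failures_only, CUTOFF, 200)

print("all flights:   cold", cold_all, "warm", warm_all)
print("failures only: cold", cold_subset, "warm", warm_subset)
```

In this toy data, the warm flights with no failures are exactly the observations that carry the contrast; once they are dropped, the failures appear spread across the whole temperature range.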

Unlike other issues in fields like data science, the solution here may not be technical. That is, a new diagnostic technique or test will likely not emancipate us from our moral obligations to others. Instead, we may need to iteratively update our philosophies of data analysis to maximize benefits, minimize harms, and satisfy our fiduciary responsibilities to society.

References:

Dalal, S.R., Fowlkes, E.B., Hoadley, B. 1989. Risk analysis of the space shuttle: Pre-Challenger prediction of failure. Journal of the American Statistical Association.
Lewis, S. R. 1988. Challenger: The Final Voyage. New York: Columbia University Press.
United States. 1986. Report to the President. Washington, D.C.: Presidential Commission on the Space Shuttle Challenger Accident.