Freedom of Speech vs Sedition

Gajarajan Nagarajan | January 29, 2021

2021 storming of the United States Capitol

Ideas that offend are becoming more prominent, driven by divisive and hateful rhetoric cultivated by the major political parties, their associated news channels, and ever-growing, unmonitored social media platforms. As the US reels from the recent storming of the US Capitol, passionate debates have begun across the country over who should act as the enforcer. Freedom of speech does have limits when it comes to threats, racism, hostility, and violence, including acts of sedition. Hate crime laws are constitutional so long as they punish violence or vandalism.

The US First Amendment protects a very broad range of speech, and so hate speech gets amplified in the new digital era, in which millions of followers can be induced or swayed by propaganda. Under the First Amendment, there is no such thing as a false idea. However pernicious an opinion may seem, we depend for its correction not on the conscience of judges and juries but on the competition of other ideas.

Weaponization of Social Media

The January 6th events at the US Capitol triggered an important change across all major social media companies and their primary cloud infrastructure providers. Twitter, Facebook, YouTube, Amazon, Apple and Google banned President Trump and scores of his supporters from their platforms for inciting violence. How big will this challenge remain going forward? Aren’t these companies the original enablers and accelerators, with no effective controls for preventing violence? Should large media companies take the law into their own hands (or onto their platforms) while state and federal governments pause on moderation? Or does this call for action from society itself, since we the people are the source of the pervasive, polarizing conspiracy-theory content in American society?

Private companies have shown themselves able to act far more nimbly than our government, imposing consequences on a would-be tyrant who has until now enjoyed a corrosive degree of impunity. But in doing so, these companies have also shown a power that goes beyond that of many nation-states and lacks democratic accountability. Technology companies have employed AI/ML and NLP tools to attract more visitors and keep users engaged longer on their platforms, which has made those platforms a breeding ground for hate groups. The unilateral power exercised by technology companies can also set a precedent that enemies of free speech around the world will exploit. Dictators, authoritarian regimes and others in power can do extreme harm to democracy by colluding with technology companies, or forcing them, to bend the rules for political gain.

In a democratic government, public opinion impacts everything. It is all-important that truth should be the basis of public information. If public opinion is ill formed, poisoned by lies, deception, misrepresentations or mistakes, the consequences could be dire. Government, which is the preservative of the general happiness and safety, cannot be secure if falsehood and malice are injected to rob it of the confidence and trust of the people.

Looking back into history, combined with data science, may provide some options for protecting the future of our democracy.

  • The Sedition Act of 1918 covered a broad range of offenses, notably speech and expression of opinion that cast the government or the war effort in a negative light. In 2007, a bill named the “Violent Radicalization and Homegrown Terrorism Prevention Act” was sponsored by Representative Jane Harman (Democrat from California). The bill would have amended the Homeland Security Act to add provisions to prevent and control homegrown terrorism and to establish a grant program to prevent radicalization. Congress could revisit this bill with bipartisan support.
  • Section 3 of the 14th Amendment provides guidelines, including a prohibition on current or former military officers, along with current and former federal and state public officials, serving in a variety of government offices if they have engaged in insurrection or rebellion against the United States Constitution.
  • Social media bans are key defense mechanisms and need to be nurtured, enhanced and implemented across all nations, democratic or otherwise. Strong enforcement of social media bans effectively neutralizes the ability to drive conversation and reach wider audiences for recruitment and, perhaps more importantly, the monetization of anger and distrust by conflict entrepreneurs.
  • Consumer influence on large companies has a major role in regulating nefarious online media houses. For example, de-platforming pressure to turn off cloud and app store access for Parler (a competitor to Twitter), pressure on publishing houses to block book proposals, and FCC regulation of podcasts may help contain fanaticism and fear mongering on both the extreme left and the extreme right.

Ethical Implications with Autonomous Vehicles

Surya Gutta | January 29, 2021

Autonomous vehicles are poised to revolutionize the transportation industry as they could dramatically reduce automotive accidents. Apart from saving human lives, they could save billions of dollars in accident damages in the U.S.[1] They could also give people ample free time and increase productivity by removing time wasted driving. The cost of ride-sharing could also decrease, as labor accounts for roughly 60%[2] of the taxi business’s total cost.

Autonomous vehicles use data from radar or LiDAR sensors to detect obstacles, such as human beings, in support of Advanced Driver Assistance Systems (ADAS). ADAS allows a vehicle to operate autonomously in its environment (other vehicles, bicyclists, pedestrians, traffic signals, and obstacles in the scene). Autonomous vehicles process large amounts of data generated by these sensors, along with real-time traffic data and personal data such as locations and start and stop times.


Ethical challenges

Data collection and analysis: Autonomous vehicles collect large amounts of data. The sensors capture images of human beings (e.g., a pedestrian detected as an obstacle in front of the car) without their consent. There is no regulation on how much data can be collected. Once the data is collected, there are no regulations on who can access it or on how it is distributed and stored. Moreover, a data breach would have many implications. The collected data can be used for other purposes without the users’ consent, leading to unintended consequences. Variation in the data due to human body size and shape might also influence the autonomous software’s decisions.


Quality of vehicle sensors: Sensors are one of the costly components in autonomous vehicles. High-end sensors increase the cost drastically. If the vehicle purchase price increases beyond a specific limit in certain countries, there won’t be incentives from the local government to the vehicle owners. To minimize the cost, vehicle manufacturers might not use all the required sensors[3] at the expense of increased risk to human beings.


Jobs: While autonomous vehicles will create jobs in engineering and customer service [4,5], many driving jobs could be lost because drivers will no longer be needed. More than 3 million taxi, truck, and bus drivers may lose their livelihoods and professions in the U.S.[6] As accidents decrease due to autonomous vehicles (95% of recent accidents are due to human error[7]), the importance of vehicle insurance might decrease. People working in collision repair centers and chiropractic care centers might also lose jobs. People might opt for autonomous ride-shares over public transit services[8] because of their cheaper prices, which will affect jobs in public transit. What happens to the people who depend on the construction and maintenance of the public transit system? Also, ample parking space might no longer be required, and people who depend on it directly or indirectly will lose their livelihoods. Even though it will be a long time before autonomous vehicles take over, giving affected people time to change careers, doing so is hard for some people because of their age, family circumstances, etc.

Regulations and Guidelines
Most of the current regulations[9] on the safety of motor vehicles are based on the assumption of humans driving vehicles. New regulations [10,11] should be adopted where ethics should be given utmost importance starting from the vehicle’s design to its adoption in society. Also, there should be transparency on the algorithms being used and data being collected by the autonomous vehicles.

There should be a uniform policy on what data can be collected and how it can be used. The federal government should regulate data privacy [12]: a vehicle manufacturer can promise to de-identify personal information [13] (such as what time a user left home and where the user went), but because different manufacturers maintain different standards, there is a risk that some of them will allow re-identification. Since autonomous vehicles are in the early stages, many questions remain unanswered. What is the expected behavior if the sensors fail? When an accident occurs, who is at fault: the owner or the manufacturer of the autonomous vehicle? All of these need to be considered when drafting regulations and guidelines.

Policymakers should act now to prepare for and minimize disruptions to the millions of jobs due to autonomous vehicles that may come in the future. There should be a timeline to come up with new regulations and guidelines protecting humans and their privacy.

[1] Ramsey, M. (2015, March 5). Self-Driving Cars Could Cut 90% of Accidents. WSJ; Wall Street Journal.
[2] Noonan, K. (2019, September 30). What Does the Future Hold for Self-Driving Cars? The Motley Fool; The Motley Fool.
[3] Insider Q&A: Velodyne advocates for safer self-driving cars. (2019, May 19). AP NEWS.
[4] DeNisco Rayome, A. (2019, January 11). Self-driving cars will create 30,000 engineering jobs that the US can’t fill. TechRepublic; TechRepublic.
[5] Gray, R. (n.d.). Driving your career towards a booming sector.
[6] Balakrishnan, A. (2017, May 22). Self-driving cars could cost America’s professional drivers up to 25,000 jobs a month, Goldman Sachs says; CNBC.
[7] Crash Stats: Critical Reasons for Crashes Investigated in the National Motor Vehicle Crash Causation Survey. (2015).
[8] Will autonomous cars change the role and value of public transportation? (2015, June 23). The Transport Politic.
[9] Laws and Regulations- As a Federal agency, NHTSA regulates the safety of motor vehicles and related equipment. (2016, August 16). NHTSA.
[10] DOT/NHTSA Policy Statement Concerning Automated Vehicles: 2016 Update to ‘Preliminary Statement of Policy Concerning Automated Vehicles’. (2016).
[11] NHTSA Federal Automated Vehicles Policy. (2016).
[12] U.S. Government Accountability Office. (2014). In-Car Location-Based Services: Companies Are Taking Steps to Protect Privacy, but Some Risks May Not Be Clear to Consumers. www.gao.gov, GAO-14-81.
[13] Goodman, E. P. (2017, July 14). Self-driving cars: overlooking data privacy is a car crash waiting to happen. The Guardian; The Guardian.

Never Let Them See You Sweat

Steve Dille | February 2, 2021

The global pandemic hasn’t been bad for one company. Peloton, the maker of internet- and social-media-connected exercise bikes, has seen an explosion of demand from exercise shut-ins. Peloton bikes let you stream live classes, communicate with other riders, and integrate with social media. President Biden rides a Peloton, which has raised some security eyebrows at the NSA. So, just how secure and private is your information on Peloton? Here are answers to some common questions.

How Visible am I?
The Peloton bike has a camera and microphone. But can Peloton instructors watch me work out and hear me? According to the Peloton Privacy Policy, the camera and microphone can only be activated by you to accept a video chat from another user. The instructors cannot see you.

What Data does Peloton Collect?
When you set up your profile, Peloton asks you to provide information such as a username, email address, weight, height, age, location, birthday, phone number and an image. Only the email address and username are required. Payment information is collected for the monthly subscription but only stored at secure third-party processors.

Peloton also collects information about your exercise participation – date, class, time, total output, and heart rate monitor information. Peloton user profiles are set to public by default, allowing other registered Peloton users to view your fitness performance history, leaderboard name, location and age (if provided). Those users can also contact or follow you through the Peloton service. You have the option to set your profile to “Private,” so only members you approve as followers can see your profile and fitness history.

As you navigate the service, certain passive information is collected through cookies. Peloton uses personal information and other information about you to create anonymized, aggregated demographic, location and device information. This information is used to measure rider interest and usage of various features of the Peloton services.

Does Peloton Sell My Information to Advertisers?
Peloton’s privacy policy states “We currently do not “sell” your information as we understand this term.” However, they seem to “share” your information. The privacy policy contains a section on “Marketing – Interest-Based Advertising and Third-Party Marketing.” Peloton does make your data available for interest-based advertising and may use it in making services available to you that would seem of interest. Peloton enables you to minimize sharing of your information with third parties for marketing purposes with this form.

What About Peloton and Social Media?
This is an area where your privacy can be violated in ways that are hard to envision if you choose to participate. Peloton offers publicly accessible blogs, social media pages, private messages, video chat, community forums and the ability to connect to Facebook and to other fitness gadgets like Fitbit. When you disclose information about yourself in any of these areas, Peloton collects and stores the information. Further, if you choose to submit content to any public area of the Peloton Service or any other public sites, such content will be considered “public” and will not be subject to the Peloton privacy protections. This can be problematic for riders posting their new personal record to an instructor’s Facebook page. Whether or not they realize it, they have just made some previously private profile information public.

Once you start connecting your Peloton information to social networks, it becomes very possible for others to piece together information about you. For example, Amazon has a leaderboard group called “Pelozonians.” When you join that group, anyone on Peloton or the free app can tell that you work at Amazon.

What Can I Do to Protect My Privacy?
Configuring your settings incorrectly can allow others to look into your personal information. Remember, your profile is public by default, so make sure you don’t include private information you don’t want shared, like your city or age. Better yet, set your profile to private. Make sure your username isn’t easily associated with you offline or on social media so others can’t piece together information about you. Do you really need to post your rides on Facebook? This just opens another complex layer of connection between your personal life and your information on Peloton. Remember to use the forms from Peloton to opt out of interest-based advertising.

The Peloton is a wonderful bike requiring a “privacy” update to an old, humorous politeness adage. Today, when you meet someone new, it’s now impolite to ask their age, weight or Peloton leaderboard name.

Peloton Privacy Policy

Peloton Terms of Service

Section 230: Congress Seeks Testimony, Ignores It

By EJ Haselden, October 30, 2020

It’s a timeless trope from the era of afterschool specials: misbehaving children stand before Mom and Dad’s kitchen-table duumvirate to answer for their schoolyard shenanigans, but the pretense of discipline soon wears through and the scene devolves into a nasty argument between the grownups. The kids’ real punishment is that they are made pawns and captive audience to a painful display of parental dysfunction. So unfolded this week’s Senate hearing on social media regulation, rhetorically titled “Does Section 230’s Sweeping Immunity Enable Big Tech Bad Behavior?”

Section 230 (47 U.S.C. § 230) is a part of the 1996 Communications Decency Act, and it is perhaps best known for shielding social media companies (among others) from liability for content that their users post:

“No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.”

The titular “bad behavior” and “sweeping immunity” that prompted this hearing, however, relate to another, lesser-known protection granted by Section 230, which shields platforms when they choose to filter, fact-check, or otherwise annotate content that they consider harmful and/or inaccurate:

“No provider or user of an interactive computer service shall be held liable on account of any action voluntarily taken in good faith to restrict access to or availability of material that the provider or user considers to be obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable, whether or not such material is constitutionally protected”

The nominal debate here surrounds the “otherwise objectionable” material in that description. Social media companies have chosen to interpret it as any content of questionable origin or veracity that could result in public harm (most recently regarding health advisories, voter suppression, and influence campaigns orchestrated by foreign intelligence services). Their caution stems from lessons learned in the rapid spread of disinformation leading up to the 2016 election, as well as a once-in-a-century pandemic that has seen deadly irresponsible claims espoused by supposed authority figures. Republican lawmakers claim that this content moderation has disproportionately muted conservative voices on social media. Democratic lawmakers, meanwhile, argue that these companies not only have the right, but the responsibility, to assess content based on its potential consequences and without regard for its ideological bent. It should be noted that multiple independent studies and a Facebook internal audit failed to find the alleged anti-conservative bias, but the fact that right-leaning engagement actually dwarfs that of center and left-leaning sources means that flagging only a small fraction of it still provides ample anecdotal evidence of prejudice (which is obviously enough to prompt Congressional hearings).

The administration has called for an outright repeal of Section 230, despite the fact that this would almost certainly lead to more content restrictions as companies adapt to the increased threat of liability. The consensus on Capitol Hill and in Silicon Valley therefore appears to be some amount of targeted Section 230 reform, while keeping the basic framework intact.

Which brings us back to this week’s hearing (or spectacle, or charade, or sham, depending on whom you ask). The Senate Committee on Commerce, Science, and Transportation subpoenaed the CEOs of Google, Twitter, and Facebook to testify on behalf of Social Media. Most commentators agree that the face time with Tech Actual was not spent productively. As with those quarrelling parents, it was never really about the kids.

Republicans’ line of soi-disant questioning focused almost entirely on what they consider censorship of conservatives (69 of 81 questions, per the New York Times), as they demanded examples of the same (loosely defined) censorship directed at liberal outlets. Senator Ron Johnson asked the witnesses about the ideological makeup of their respective workforces—rhetorically, because it would be illegal for them to maintain that sort of record—in an effort to prove anti-conservative bias by virtue of microcultural majority (which almost sounded like an argument for some variant of affirmative action).

Democrats, for their part, focused most of their attention on the legitimacy and impact of the hearing itself, expressing concern that it could serve to intimidate social media companies into relaxing moderation policies at a time when the nation is perhaps most vulnerable to manipulative media. The bulk of their more on-topic questioning concerned dis- and misinformation and what actions the companies were taking to combat it ahead of the election. Still, not that much about Section 230 reform.

In keeping with the scripted, postured non-discussion, the most meaningful witness testimony came in the form of prepared opening statements. In those, Pichai reasserted Google’s anti-bias philosophy and cautioned against reactionary changes to Section 230, Dorsey promoted increased transparency and user inclusion in Twitter’s decision-making processes, and Zuckerberg praised Section 230 while inviting a stricter and more explicit rewrite of its provisions (for which Facebook would gladly provide input). Their full statements are available on the committee’s hearing website.

The timing and tenor of this eleventh-hour pre-election partisan screed exchange never inspired much hope for substantive debate, but even so, there was a jarring lack of effort to better understand the pressing and complex problems that Section 230 is still, at this moment, expected to resolve. The reason this matters, the reason it’s so alarming that neither side was terribly interested in the companies’ offers of greater transparency—something we’d consider a win for democracy in saner times—is that our government has abdicated its responsibility of oversight on this topic except in cases where the threat of enforcement can be used as a political weapon.

In the end, it’s probably fitting that Congress used a social media hearing as a platform to amplify and disseminate entrenched views that they had no intention of rethinking.


Can there truly be ethics in autonomous machine intelligence?

By Matt White, October 30, 2020

Some would say that we are in the infancy of the fourth industrial revolution, where artificial intelligence and the autonomy it is ushering in are positioned to become life-altering technologies. Most people understand the impact of autonomous technologies as it relates to jobs: they are concerned that autonomous vehicles and robotic assembly lines will relegate them to the unemployment line. But very little thought, and consequently little research, has gone into the ethical implications of the autonomous decisions these systems are confronted with. Although AI and automation have far-reaching ethical implications, there are opposing views on who is truly responsible for the ethical decisions made by an autonomous system. Is it the designer? The programmer? The supplier of the training data? The operator? Or should the system itself be responsible for any moral or ethical dilemmas and their outcomes?

Take for instance the incident with Uber’s self-driving car a few years ago, where one of its cars killed a pedestrian crossing the road in the middle of the night. The vehicle’s sensors collected data which revealed it was aware of a person crossing in front of its path, but the vehicle took no action and struck and killed the pedestrian. Who is ultimately responsible when an autonomous vehicle kills a person? In this case it was the supervising driver, but what happens when there is no driver in the driver’s seat? What if the vehicle had to make a choice like the one in the trolley problem, between hitting a child and hitting a grown man? How would it make such a challenging moral decision?

Image Source: Singularity Hub

The Moral Machine, a project from MIT’s Media Lab, is tackling just this, developing a dataset on how people would react to particular moral and ethical decisions when it comes to driverless cars. Should you run over 1 disabled person and 1 child or 3 obese people, or should you crash into a barrier and kill your 3 adult passengers to save two men and two women of a healthy weight pushing a baby? However, the thought that autonomous vehicles will base their moral decisions on crowd-sourced datasets of varying moral perspectives seems absurd. Only those who participate in the process will have their opinions included, and anyone can go online and contribute to the dataset without any form of validation. Notwithstanding all of the opinions that are not included, there are various theories of moral philosophy that could be applied to autonomous ethical decision making and that would overrule rules derived from datasets. Does the system follow utilitarianism, Kantianism, virtue ethics, and so forth? Although the Moral Machine is considered to be a study in its current incarnation, it uses a very primitive set of parameters (number of people, binary gender, weight, age, visible disability) to let users determine the value they place on human life. In real life, people have more than this handful of dimensions: there is also race, socio-economic status, non-binary gender, and so forth. Could adding these real-life dimensions create a bias that would further devalue people who meet certain criteria and are in the way of an autonomous vehicle? Might the value placed on a homeless person be less than that placed on a Wall Street stockbroker?

Image Source: Moral Machine

There is certainly a lot to unpack here, especially if we change contexts and look at armed unmanned autonomous vehicles (AUAVs) which are used in warfare to varying degrees. As we transition from remote pilots to fully autonomous war machines, who makes the decision whether to drop a bomb on a school containing 100 terrorists and 20 children? Does the operator absolve himself of any responsibility when the AUAV makes the decision to drop a bomb and kill innocent people? Does the programmer or the trainer of the system bear any responsibility?

As you can see, the idea of ethical decision making by autonomous systems is highly problematic and presents some very serious challenges that require further research and exploration. Systems that are designed to have a moral compass will not be sufficient, as they will adopt the moral standpoint of their creators. Training data is likely to be short-sighted, shallow in dimensions and biased by the ethical standpoints of its contributors. It is obvious that the issue of ethical decision making in autonomous systems needs further discourse and research to ensure that the future systems we come to rely on can make ethical decisions in a manner that demonstrates no bias; or perhaps we may have to accept that autonomous machines will not be able to make ethical decisions in an unbiased manner.


The Looming Revolution of Online Advertising

By Anonymous, October 30, 2020

In the era of the internet, advertising is getting creepily accurate and powerful. Large ad networks like Google, Facebook, and more collect huge amounts of data, through which they can infer a wide range of user characteristics, from basic demographics like age, gender, education, and parental status to broader interest categories like purchasing plan, lifestyle, beliefs, and personality. With such powerful ad networks out there, users often feel like they are being spied on and chased around by ads.

How is this possible?
How did we leak so much data to these companies? The answer is cross-site and app tracking. When you surf the internet, going from one page to another, trackers collect data on where you have been and what you do. According to one Wall Street Journal study, the top fifty Internet sites, from CNN to Yahoo to MSN, install an average of 64 trackers[1]. The tracking can be done by scripts, cookies, widgets, or invisible image pixels embedded on the sites you visit. You probably have seen the following social media sharing buttons. Those buttons, whether you click them or not, can record your visits and send data back to the social platform.

A similar story is happening on mobile apps. App developers often link in SDKs from other companies, through which they can gain analytic insights or show ads. As you can imagine, those SDKs will also report data back to the companies and track your activities across apps.
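The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTracker`, the cookie value, and the `.example` sites are all hypothetical, and real trackers are far more elaborate. It shows how a single identifier, reported from widgets or pixels embedded on unrelated sites, is enough to assemble a browsing profile.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ToyTracker:
    """Hypothetical third-party tracker embedded on many unrelated sites."""

    def __init__(self):
        # cookie ID -> list of page URLs on which the embedded widget loaded
        self.visits = defaultdict(list)

    def record_page_load(self, cookie_id, page_url):
        # Runs when a page containing the widget or pixel loads;
        # no click on the widget is required.
        self.visits[cookie_id].append(page_url)

    def sites_for(self, cookie_id):
        # Distinct hostnames where this cookie ID has been seen, sorted.
        return sorted({urlparse(url).netloc for url in self.visits[cookie_id]})


tracker = ToyTracker()
tracker.record_page_load("cookie-abc", "https://news.example/politics")
tracker.record_page_load("cookie-abc", "https://shop.example/running-shoes")
tracker.record_page_load("cookie-abc", "https://health.example/knee-pain")
print(tracker.sites_for("cookie-abc"))
```

None of the three sites knows about the others, yet the tracker sees all three visits under one identifier. An SDK linked into several mobile apps can play the same role with a device identifier in place of the cookie.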

Why is it problematic?
Cross-site or app tracking poses great privacy concerns. Firstly, the whole tracking process happens behind the scenes. Most users are not aware of it until they see some creepily accurate ads, and even if they are aware of it, they often have no idea how the data is collected and used, or who owns it. Secondly, only very technically sophisticated people know how to prevent this tracking, which can involve tedious configuration or even installing other software. To make things worse, even if we can prevent future tracking, there is no clear way to wipe out the data that has already been collected.

In general, cross-site and app activities are collected, sold, and monetized in various ways with very limited user transparency and control. GDPR and CCPA have significantly improved this. Big trackers like Google, Facebook, and more provide dedicated ad setting pages (1, 2), which allow users to delete or correct their data, to choose how they want to be tracked, etc. Though GDPR and CCPA gave users more control, most users stay with the default options and cross-site tracking remains prevalent.

The looming revolution
With growing concerns about user privacy, Apple took radical action to kill cross-site and app tracking. Over the past couple of years, Apple gradually rolled out Safari Intelligent Tracking Prevention (ITP)[2], which curtailed companies’ ability to install third-party cookies. With Apple taking the lead, the Firefox and Chrome browsers are also launching features similar to ITP. In the release of iOS 14, Apple brought a feature similar to ITP to the world of apps.

While at first glance this may sound like a long-overdue change to safeguard users’ privacy, on closer inspection it could create a backlash. Firstly, internet companies collect data in exchange for their free services: products like Gmail, Maps, and Facebook are all free to use. According to one study from Vox, in an ad-free internet, the user would need to pay $35 every month to compensate for ad revenue[3]. Some publishers even threatened to proactively stop working on Apple devices. Secondly, Apple’s ITP solution doesn’t give users much chance to participate. Cross-site tracking can in general enable more personalized services, more accurate search results, better recommendations, etc. Some users may choose to opt in to cross-site tracking for this purpose. Thirdly, Apple’s ITP only disabled third-party cookies, and there are many other ways to continue the tracking. For example, ad platforms can switch to a device ID or “fingerprint” users by combining IP address and geolocation.
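To make the last point concrete, here is a minimal sketch of cookie-free fingerprinting, assuming only the two signals named above. The function name `toy_fingerprint`, the rounding precision, and the sample addresses and coordinates are hypothetical choices for illustration; production fingerprinting typically combines many more signals.

```python
import hashlib


def toy_fingerprint(ip_address, latitude, longitude, precision=2):
    """Return a stable pseudo-identifier built from IP address and coarse location."""
    # Round coordinates so small GPS jitter maps to the same identifier.
    coarse_location = f"{round(latitude, precision)},{round(longitude, precision)}"
    raw = f"{ip_address}|{coarse_location}"
    # Hash the combined string; nothing needs to be stored on the user's device.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


# Hypothetical visits: same user twice (slightly different GPS fix), then another user.
first_visit = toy_fingerprint("203.0.113.7", 37.8716, -122.2727)
later_visit = toy_fingerprint("203.0.113.7", 37.8721, -122.2731)
other_user = toy_fingerprint("198.51.100.4", 37.8716, -122.2727)

print(first_visit == later_visit)  # same IP and same coarse location
print(first_visit == other_user)   # different IP address
```

Because the identifier is recomputed on the server from data the request already carries, blocking third-party cookies does nothing to stop it.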

Other radical solutions were also proposed, such as Andrew Yang’s Data Dividend Project. With many ethical concerns and the whole ads industry at stake, it is very interesting to see how things play out and what other alternatives are proposed around cross-site and app tracking.



We see only shadows

By David Linnard Wheeler, October 30, 2020

After the space shuttle Challenger disaster (Figure 1) on January 28th, 1986, most people agreed on the cause of the incident – the O-rings that sealed the joints on the right solid rocket booster failed under cold conditions (Lewis, 1988). What most failed to recognize, however, was a more fundamental problem. The casual disregard of outliers, in this case from a data set used by scientists and engineers involved in the flight to justify the launch in cold conditions, can yield catastrophic consequences. The purpose of this essay is to show that a routine procedure for analysts and scientists – outlier removal – not only introduces biases but, under some circumstances, can actually lead to lethal repercussions. This observation raises important moral questions for data scientists.

Figure 1. Space shuttle Challenger disaster. Source: U.S. NEWS & WORLD REPORT

The night before the launch of the space shuttle Challenger, executives and engineers from NASA and Morton Thiokol, the manufacturer of the solid rocket boosters, met to discuss the scheduled launch over a teleconference call (Dalal et al. 1989). The subject of conversation was the sensitivity of O-rings (Figure 2) on the solid rocket boosters to the cold temperatures forecasted for the next morning.

Figure 2. Space shuttle Challenger O-rings on solid rocket boosters.

Some of the engineers at Thiokol opposed the planned launch. The performance of the O-rings during the previous 23 test flights, they argued, suggested that temperature was influential (Table 1). When temperatures were low, for example between 53 and 65°F, more O-rings failed than when temperatures were higher.

Table 1: Previous flight number, temperature, pressure, number of failed O-rings, and number of total O-rings

Some personnel at both agencies did not see this trend. They focused only on the flights where at least one O-ring had failed. That is, they ignored outlying cases where no O-rings failed because, from their perspective, they did not contribute any information (Presidential Commission on the space shuttle Challenger Accident, 1986). Their conclusion, upon inspection of data from Figure 3, was that “temperature data [are] not conclusive on predicting primary O-ring blowby” (Presidential Commission on the space shuttle Challenger Accident, 1986). Hence, they asked Thiokol for an official recommendation to launch. It was granted.

Figure 3. O-ring failure as a function of temperature

The next morning the Challenger launched and 7 people died.

After the incident, President Reagan ordered William Rogers, former Secretary of State, to lead a commission to determine the cause of the explosion. The O-rings, the Commission found, became stiff and brittle in response to cold temperatures and were therefore unable to maintain the seal between the joints of the solid rocket boosters. The case was solved. But a more fundamental lesson was missed.

Outliers and their removal from data sets can introduce consequential biases. Although this may seem obvious, it is not. Some practitioners of data science essentially promote cavalier removal of observations that are different from the rest. They focus instead on the biases that can be introduced when certain outliers are included in analyses.
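A small sketch can make this bias concrete. The numbers below are hypothetical and are NOT the actual flight record; they only mimic the pattern described above, and the 65°F cut-off and the helper name `failure_rate` are assumptions chosen for illustration.

```python
# Hypothetical (temperature in °F, failed O-rings) pairs, NOT the real flight data.
flights = [
    (53, 2), (57, 1), (58, 1), (63, 1),           # cold launches
    (66, 0), (67, 0), (68, 0), (70, 1), (70, 0),  # warm launches
    (72, 0), (75, 2), (76, 0), (78, 0), (81, 0),
]

def failure_rate(records, low, high):
    """Share of flights with low <= temperature < high that had at least one failure."""
    in_range = [failed for temp, failed in records if low <= temp < high]
    return sum(1 for failed in in_range if failed > 0) / len(in_range)

# All flights: cold launches fail far more often than warm ones.
cold_rate = failure_rate(flights, 50, 65)
warm_rate = failure_rate(flights, 65, 85)

# Dropping the zero-failure flights erases that contrast entirely.
failures_only = [(temp, failed) for temp, failed in flights if failed > 0]
cold_rate_subset = failure_rate(failures_only, 50, 65)
warm_rate_subset = failure_rate(failures_only, 65, 85)

print(cold_rate, warm_rate)                # with every flight included
print(cold_rate_subset, warm_rate_subset)  # with "uninformative" flights removed
```

With every flight included, the cold and warm failure rates differ sharply; once the zero-failure flights are discarded, both rates are identical and temperature appears to carry no information. The flights that were dropped are exactly the ones that reveal the trend.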

This practice is hubristic for at least one reason. We, as observers, do not, in most cases, completely understand the processes by which the data we collect are generated. To use Plato’s allegory of the cave, we just see the shadows, not the actual objects. Indeed, this is one motivation to collect data. To remove data without defensible justification (e.g., measurement or execution error) is to claim, even if implicitly, that we know how the data should be distributed. If true, then why collect data at all?

To be clear, I am not arguing that outlier removal is indefensible under any condition. Instead, I am arguing that we should exercise caution and remain aware of the consequences of our actions, both when classifying observations as outliers and when ignoring or removing them. This point was acknowledged by the Rogers Commission in the statement: “a careful analysis of the flight history of O-ring performance would have revealed the correlation in O-ring performance in low temperature[s]” (Presidential Commission on the space shuttle Challenger Accident, 1986).

Unlike other issues in fields like data science, the solution here may not be technical. That is, a new diagnostic technique or test will likely not emancipate us from our moral obligations to others. Instead, we may need to iteratively update our philosophies of data analysis to maximize benefits, minimize harms, and satisfy our fiduciary responsibilities to society.



  • Dalal, S.R., Fowlkes, E.B., Hoadley, B. 1989. Risk analysis of the space shuttle: Pre-Challenger prediction of failure. Journal of the American Statistical Association.
  • Lewis, R. S. 1988. Challenger: The Final Voyage. New York: Columbia University Press.
  • United States. 1986. Report to the President. Washington, D.C.: Presidential Commission on the Space Shuttle Challenger Accident.

A Short Case for a Data Marketplace

By Linda Dong, October 23, 2020

In today’s digital, internet age, data is power. Using data, Netflix can generate recommendations, Facebook can tailor advertisements, and Visa can detect fraud. Google can predict your search phrase, Alexa can prompt you to restock household products, and Wealthfront can create your personalized retirement path, taking into account individual savings, spending, and investment goals.

Not only are data products powerful, but they also tend to be lucrative. Data products tend to be high-margin because the cost of goods sold is so low: companies generally do not pay users to collect their data. Whether companies are channeling these lucrative products into customer savings (by making other services free) or purely amassing these gains as company profits, the central question remains: should data collection be free?

– – – –

Image Source: Robinhood

Just like oil, labor, and water, data is a commodity. True – it happens to be a non-finite commodity that humans can create; however, it is also a raw material used to create sold products. Just as a bar of chocolate is made from many cacao beans, so is a web marketing analytics insight crafted from many individual browser interactions.

If you’re a chocolate maker, you’ll likely have a handful of cocoa suppliers. If you’re a web analytics company, you’ll likely have millions of users providing a little data each. However, the simple facts that your suppliers are (i) distributed and (ii) orders of magnitude more numerous do not constitute adequate justification for not compensating them.

The logistics might be simpler than you think. The idea of web-based microtransactions is not new; though few people know it, HTTP status code 402 [2] has long been reserved for “Payment Required” use cases. While this was meant to power the opposite flow (a requestor presenting payment to access content, rather than a content provider paying a visitor for data gathered during an interaction), it nevertheless brings us one step closer to a future where browsers might contain native wallets that can enable hundreds of microtransactions per hour.
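For readers who have never seen the status code in use, here is a minimal sketch of the original flow, in which a server asks for payment before serving content. The header name `X-Payment-Token` and the function `respond` are hypothetical, invented for this illustration; only the 402 code and its reason phrase come from the HTTP standard, as exposed by Python’s standard library.

```python
from http import HTTPStatus

def respond(headers: dict) -> tuple:
    """Return (status code, reason phrase) for a request to paid content.

    "X-Payment-Token" is a hypothetical header name, not part of any standard.
    """
    if not headers.get("X-Payment-Token"):
        # No proof of payment was presented: ask the client to pay first.
        status = HTTPStatus.PAYMENT_REQUIRED
    else:
        status = HTTPStatus.OK
    return status.value, status.phrase

print(respond({}))
print(respond({"X-Payment-Token": "example-token"}))
```

A data marketplace would need the reverse arrangement, with the site rather than the visitor presenting payment, and nothing in the standard defines that today.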

Image Source: Mozilla Foundation

– – – –

Regulation lags behind innovation. While privacy concerns have culminated in new statutes regulating how entities should collect and use data, most protections today concern only data subjects’ rights and obligations. They have not yet evolved to address questions of compensation and profit-sharing.

Some of this is due to a lack of pressure from the general public, which, in turn, results from a lack of awareness regarding the value of data, as well as opacity regarding how companies collect and use data. Some of this is due to coercive user policies that foist consent to data collection on users. And some of it is due to the lack of a clear solution and path forward.

What if we reimagined the concept of privacy in an economic, rather than rights-based, context? Could browsers compete for users by providing more sophisticated privacy customizations? Could they better enable user control to select and disclose limited and specific data in exchange for monetary earnings? Could they auto-respond to pesky cookie preference pop-ups? Could they broker a new type of data marketplace between companies who want to buy data and users who want to sell data? Are these features valuable enough for them to charge users a fee, and would the public pay?

I, for one, would.



All about Grandma

By Anonymous, October 23, 2020

My grandma Diane lives in Tulsa, OK on a small farm with one of my aunts, Heather, my uncle Carl, my two cousins Carl III and Toby, and my uncle Carl’s mom Bethanne. They raise goats and fowl, have a couple house dogs and some cats that come and go as they are wont to. The farm has a pond that the dogs swim in sometimes. These are things that I know because they’re my family. I’ve spent countless Thanksgivings and Christmases and been to several weddings with them.

What I didn’t know until today was that grandma is a registered Republican and Heather and Carl are registered Democrats. I didn’t intend to find this information. Rather, with the 2020 election on my mind and news media covering early voting, I decided to do a cursory search about what voting information exists in the public domain. It took less than a minute to stumble onto grandma’s voter registration on a data aggregator site, where voter registration records are available in searchable form for 16 states, Oklahoma included.


Of course, voter registration records have been public for a long time, but before sites like this one, it took real effort to peruse voter rolls. While the process differed from state to state, you typically had to go to the local county office or the secretary of state’s office to formally request access. These barriers meant only the most interested of actors, like political parties or investigative journalists, took the time to do it. Now, this information is available almost accidentally to anyone with an internet connection anywhere in the world.

While the internet makes access to voter records fundamentally different than in the past, what makes it concerning now is the degree to which political affiliation has become enmeshed with personal identity, particularly for more extreme actors on both ends of the political spectrum, some of whom threaten violence.

To make matters much worse, the aggregator connects voter registration information to sites that conduct extensive background searches, Truthfinder among them, all without transparent labeling to indicate that prominently displayed buttons will trigger a background search.

Truthfinder conducts a search of property records, criminal records, bankruptcy records, social media accounts, etc. While Truthfinder exploits public records databases for much of this information, its site is set up to use users’ interactions to reinforce algorithmic conclusions about which records are related to the actual person in question. Its follow-on questions are presented so that most users are likely to think the site is trying to isolate a particular individual’s records; in fact, the questions ask users to confirm or deny algorithmically generated relationships with other records it has come across, thereby strengthening the person-matching algorithms that form the core of such sites.

After asking several such questions, the site prompts users to search for more people, including people with whom the user likely has no personal connection, such as ‘celebrities’. Truthfinder charges for its services, and its model invites people to conduct ‘unlimited’ searches over a month rather than purchase individual reports. Furthermore, the generated report contains information not just about the person you’ve gone down a rabbit hole searching for but also about several people that Truthfinder has determined are related to that person.

It is through this that I learned, despite having known grandma all my life, that a lien was put on the farm last year, that she received her social security number and card around the time she turned 18 rather than at birth, and what the VIN on her Toyota Sequoia is. While she doesn’t have a criminal record, several people in neighboring states with similar names do. While I know those people aren’t her, someone who doesn’t know her as well may not, and might mistakenly conclude that my grandma has a problem with shoplifting. Truthfinder’s presentation of this information makes this outcome more likely by exaggerating its findings and failing to disclaim that the information may not be linked to the right person, as happened in this case. This is all in addition to a litany of phone numbers, email addresses, social media accounts, Amazon wish lists, and the addresses she has lived at or co-signed for going back decades. A couple more clicks yield similar information about all of my Oklahoma relatives over the age of 18.

While voter registration records, and for that matter each of the other sets of public records used by these sites, historically may have had valid reasons for being in the public domain, the internet has enabled aggregation across these datasets such that it literally takes less than 10 minutes to stumble unintentionally from a person’s voter record to some of the most personal aspects of their life, like bankruptcy and criminal records, and not much longer to unearth similar information about nearly everyone they are related to.

This is made all the more troubling by the devolution in public discourse and increase in othering as personal identities of all sorts and stripes are increasingly coalescing into constellations around bipolar political affiliations. This is all paired with increasing rhetoric of political violence. Americans should consider carefully what information is put into the public domain, and should advocate to their state legislatures to curtail the publication and aggregation of such data sources.

To Broadcast, Promote, and Prepare: Facebook’s Alleged Culpability in the Kenosha Shootings

By Matt Kawa | October 9, 2020

The night of August 25, 2020 saw Kenosha, WI engrossed with peaceful protests, riots, arson, looting, and killing in the wake of the shooting of Jacob Blake. In many ways Kenosha was not unlike cities all around the country facing protests both peaceful and violent sparked by the killing of George Floyd and others by police forces. However, Kenosha manages to distinguish itself by the fact that in the midst of the responses to the untimely death of these individuals, more individuals were killed. Namely, two protestors were shot and killed, and another injured, by seventeen-year-old Antioch, IL resident, Kyle Rittenhouse.

Rittenhouse was compelled and mobilized to cross state lines, illegally (as a minor) in possession of a firearm, to “take up arms and defend out City [sic] from the evil thugs” who would be protesting, as posted by a local vigilante militia that calls itself the Kenosha Guard. The Kenosha Guard set up a Facebook event (pictured below) entitled “Armed Citizens to Protect our Lives and Property” in which the administrators posted the aforementioned quote (also pictured).

In addition to egregious proliferation of racist and antisemitic rhetoric, the administrators of these Facebook groups blatantly promote commission of acts of violence against protestors and rioters, not only via the groups per se, but on their personal accounts as well.

On September 22, a complaint and demand for jury trial was filed by the life partner of one of Rittenhouse’s victims and three other Kenosha residents with the United States District Court for the Eastern District of Wisconsin against shooter Kyle Rittenhouse; Kevin Mathewson, “commander” of the Kenosha Guard; co-conspirator Ryan Balch, a member of a similar violent organization called the “Boogaloo Bois”; both organizations per se; and, most surprisingly, Facebook, Inc.

The complaint effectively alleges intentional negligence on the part of Facebook for allowing the vigilantes to coordinate their violent presence unchecked. The claim states that Facebook “provides the platform and tools for the Kenosha Guard, Boogaloo Bois, and other right-wing militias to recruit members and plan events.” In anticipation of the defense of ignorance, the complaint then cites over four hundred reports filed by users regarding the Kenosha Guard group and event page, expressing concern that members would seek to cause violence, intimidation, and injury. Those reports speculated about events which, as the complaint summarizes, ultimately did transpire.

While Facebook CEO Mark Zuckerberg did eventually apologize for his platform’s role in the incident, calling it an “operational mistake” and removing the Kenosha Guard page, the complaint claims that, as part of an observable pattern of similar behavior, Facebook “failed to act to prevent harm to Plaintiffs and other protestors” by ignoring material numbers of reports attempting to warn it.

Ultimately, the Plaintiffs’ case rests on the Wisconsin legal principle that, “A duty consists of the obligation of due care to refrain from any act which will cause foreseeable harm to others . . . . A defendant’s duty is established when it can be said that it was foreseeable that [the] act or omission to act may cause harm to someone.” Or, simply put, Facebook had a duty to “stop the violent and terroristic threats that were made using its tools and platform,” a duty it could breach through inaction as well as action.

Inevitably, defenses will be made on First Amendment grounds, claiming that the Kenosha Guard and Boogaloo Bois, and their leaders and members, were simply exercising their right to freedom of speech, a right Facebook ought to afford its users. However, the Supreme Court has read numerous exceptions into the First Amendment, including, quite prominently, one for incitement to violence. Whether Facebook has a moral obligation to adjudicate First Amendment claims is less clear-cut. But a decision must be made in the modern, rapidly evolving world of social media as to what the role of the platform is in society and what ought or ought not to be permissible enforcement of standards across the board.

The full text of the complaint can be found here.