Cross-Border Data Transfers & Privacy

By Anne Yu | March 22, 2019

Cross-border Data Transfers (CBDTs):
Personal data collected in one location is transferred to a third country or an international organization.

Today's global economy depends on CBDTs. From my investigation, modern governments recognize the

importance of moving data freely to wherever those data are needed…the economic and social benefits of protecting the personal information of users of digital trade.

But there is also a trend for countries to restrict data exchange and require that data be kept local.

CBDT restrictions fall into two general categories:

1. Privacy Regulations: the transfer process is subject to compliance with a set of conditions, including conditions for onward transfer. Once the conditions are met, transfers are allowed. These regulations typically cover a variety of matters [1]; if any of them is overlooked or jeopardized, operators face legal and civil consequences.

    • Data subject consent
    • Data anonymization
    • Breach notification
    • Appointment of data protection officers
        • One government agency with enforcement authority
        • One third-party accountability agent

2. Data Localization: a ban on transferring data out of the country, or a requirement that the organization build or use local infrastructure and servers.

Cross-Border Privacy Regulations

CBDT rules and policies vary depending on where the data comes from and where it goes. The type of data also matters. For example,

  • The EU GDPR generally prohibits CBDTs of personal data outside EU territory, unless the transfer is to a third country and a set of conditions is fulfilled.
  • The USMCA promotes cross-border data flows under less strict conditions.
  • The APEC CBPRs established a principles-based model for national privacy laws that recognizes the importance of effective privacy protections that avoid barriers to information flows. Within this framework, each APEC member is encouraged to implement its domestic privacy laws based on the framework's principles, which seems less strict.
  • Other countries, such as China and South Korea, might have even stricter rules.

When Do CBDTs Happen?

Surprisingly, CBDTs occur on many everyday occasions. For example,

  • Corporate emails and customer support communications.
  • Data analysis to optimize global logistics.
  • Outsourced services.
  • HR for global workforces.
  • Conducting global research.
  • Using the Internet to query, post, or update information located overseas.

We will discuss each of these cases using examples. There are also unexpected situations, such as accidental or intentional data breaches.

Some Case Studies

Overall, one can query  for a better understanding. The service can also be used to compare laws from two countries side by side.

European Union (EU) Countries: General Data Protection Regulation (GDPR)

GDPR contains a set of rules for protecting the personal data of all EU residents plus visitors. GDPR also provides strict rules protecting data transferred across borders, including significant fines and penalties for non-compliant data controllers and processors. New protections were added up to 2018, especially focusing on EU citizens’ data. For more details, see GDPR Articles 44 through 49, which lay out the conditions under which data can be transferred beyond the EU/EEA.

Canada and Mexico (USMCA)

In 2018, the U.S., Mexico, and Canada announced a new trade agreement, the United States-Mexico-Canada Agreement (USMCA), which includes a chapter on digital trade that builds on the APEC Cross-Border Privacy Rules (APEC CBPRs) and aims to

> “adopt or maintain a legal framework that provides for the protection of the personal information of the users.”

The USMCA parties formally recognize the APEC CBPRs within their respective legal systems. [2]

Asia-Pacific (APEC CBPRs)

The APEC CBPRs system was developed by the 21 APEC member economies as a cross-border transfer mechanism and comprehensive privacy program for private sector organizations to enable the accountable free flow of data across the APEC region. The APEC CBPRs system has now been formally joined by the United States, Canada, Japan and Mexico.

Comparison of CBPRs and GDPR

The rest of the world

While most countries follow either the GDPR or the APEC CBPRs, some countries impose their own, stricter systems for regulating CBDTs.

Russia enacted data protection (privacy) laws that permit CBDTs as long as the operator ensures that the recipient state provides adequate protection of personal data.

In 2017, the Cybersecurity Law of the People’s Republic of China (CSL) took effect, including policies involving cross-border data transfer. It is much stricter [3]. In general,

it requires Critical Information Infrastructure (CII) operators to localize data within the territory of China and all network operators to conduct security assessments prior to data export.

South Korea
South Korea is stricter as well, with new constraints on CBDTs. In March 2019, amendments to the IT Networks Act took effect. The most important point is that they require the appointment of a local agent responsible for Korean data privacy compliance regarding CBDTs [4].

Data Localization
At the other extreme, banning data transfers is called Data Localization or Data Residency,

which means that data about a nation’s citizens or residents must be collected, processed, or stored inside the country. Demand for localization has increased since 2013, when former CIA employee *Edward Snowden* leaked highly classified information from the NSA. Governments in Europe and across the world are also starting to realize the perils brought by data flowing through technology. The emerging trend is to require that data be consumed on the spot before it serves upper applications. Germany and France were the first to approve data localization laws, followed by the EU in 2017.

Data Types Matter as Well
Each country can have its own laws for different types of data. For example, Australia regulates its health records, Canada restricts personal data from public service providers, and China restricts more, including personal, business, and financial data [5].

[1] Top 10 operational impacts of the GDPR: Part 4 – Cross-border data transfers

[2] APEC Cross-Border Privacy Rules Enshrined in U.S.-Mexico-Canada Trade Agreement

[3] China: New Challenges Ahead: How To Comply With Cross-Border Data Transfer Regulation In China

[4] Korean data law amendments pose new constraints for cross-border online services and data flows

[5] Data localization

      • [GDPR]: General Data Protection Regulation
      • [CBDTs]: Cross-border Data Transfers
      • [EU]: European Union
      • [APEC]: Asia-Pacific Economic Cooperation
      • [APEC CBPRs]: APEC Cross-Border Privacy Rules
      • [USMCA]: United States-Mexico-Canada Agreement

Enforcing Antitrust Laws in Tech

By Anonymous | March 14, 2019

A lot of the contemporary focus on antitrust regulation with respect to the technology sector was first sparked by the Justice Department’s case against Microsoft in 2000. The case brought up questions about whether our legal framework was strong enough to oversee the quickly growing and evolving tech industry. Because technology companies are so different from the companies involved in the first antitrust cases in the early 20th century, some measures and approaches have become inadequate for their proper regulation. In the long run, this may have concerning implications for consumers and market participants. So, how are large tech companies different from large monopolies of the past?

Intellectual Property

The first major difference is that tech companies’ most valuable assets are frequently their intellectual property as opposed to physical property. Given the intangible nature of intellectual property, it can be difficult to value. Regulators then have a hard time comparing one firm to another or calculating market metrics that help make decisions about going after a firm for antitrust. In the eyes of regulators, however, intellectual property is just another form of property.

Network Effects

Additionally, many technology companies depend on network effects for their services to be viable. This means that competitors of a certain service are competing for the entirety of the market instead of just a share of it. In other words, a traditional firm may sell a product and a consumer will buy that product without considering what other consumers do. However, in the example of a social media company, a potential user will consider signing up only if their friends are also using the service.

Dynamic Markets

Identifying substitute products can also be a complex proposition for antitrust regulators in the tech sector. Many distinct services can appear to be fulfilling different needs in a market but, in a matter of one update, can suddenly be competing directly. So, while it’s easy to discern that Coke and Pepsi are substitute products and competitors, services like Instagram and Snapchat are not so clear cut. These two social media technology companies started out offering different services, and their offerings have only recently started to overlap. When studying antitrust cases in the technology sector, regulators must make careful case-by-case analyses and tease out the details of each market.

How to measure effects

Lastly, recent antitrust claims have been evaluated by judges by considering whether a company’s practices have raised consumer prices. This means some tech companies have been permitted to grow while being overlooked by regulators because many of the services they offer are free to consumers.

Implications for individuals and market participants

As companies become large technology conglomerates, they increase their ability to collect larger amounts of data from different aspects of our lives. So, a company like Alphabet has information from a user’s email, location data from their cellular phones, and browsing data collected through Chrome. This increased amount of data collection leaves consumers exposed to potential breaches by the companies, and violations of privacy.

Consolidation has also made it difficult for smaller startups to enter the market and innovate by creating new services. Many times, the choice that these smaller firms have is to either accept an acquisition by a much larger firm, or have their service cloned by the larger firm with more resources. In situations where firms have created markets, like advertising platforms, these large companies might be inclined to give unfair special pricing to certain businesses. In addition, vertically integrated tech companies may prioritize their own products to users within other products.

There is still much research left to be done on the effects of large tech companies on consumers, businesses and the markets in which they participate. What is clear is that antitrust regulators need a different approach to overseeing the industry. As we continue to have conversations about the ideal way to regulate large technology companies, we should consider different metrics than we have used in the past to measure their potential negative effects.

Works Cited

  • Baker, Jonathan B. “Can Antitrust Keep Up?: Competition policy in high-tech markets.” The Brookings Institution (December 1, 2001).
  • Finley, Klint. “Legal scholar Tim Wu says the US must enforce antitrust laws.” Wired Magazine (March 11, 2019).
  • Hart, David M. “Antitrust and Technological Innovation.” Issues in Science and Technology 15, no. 2 (Winter 1999).
  • “2018 Antitrust and Competition Conference.” Stigler Center for the Economy and the State (April 19-20, 2018).

Image source
Image 1: LA Times

Trust me, I’m a Company

By Mumin Khan | March 12, 2019

With great economic prosperity comes great consequences. Americans hold the tenets of capitalism and growth closely as a fundamental part of their identity. The rewards America reaps by obsessing about profits are undeniable; America is the single largest economy in the world by a number of metrics. Americans enjoy some of the best wages in the world, have access to the best postsecondary education, and enjoy a reasonably high quality of life. Simply earning over $32,400 per year puts you in the top 1% globally; the median income for US households in 2017 was $61,372. This same profit addiction, which has taken Americans to such highs, has also brought them new lows. The legislative climate in the United States has all but guaranteed that corporations can play fast and loose with the lives of consumers and face little to no consequences when things go south. The larger the corporation is, the more it can get away with.

On September 7th, 2017, Equifax, one of the largest American credit bureaus, which collects information on an estimated 820+ million people and 91+ million businesses, disclosed that it had been hacked several months before. Over 143 million people had sensitive information, including names, addresses, dates of birth, Social Security numbers, and driver’s license numbers, stolen from Equifax over a period of 76 days. Obtaining some or all of this information would allow a malicious actor to assume someone’s identity for financial gain and wreak havoc on their life.

Pictured above, Revenue of the largest credit bureaus in millions of dollars.

The method of intrusion was a known vulnerability in Apache Struts, a web technology that powered Equifax’s dispute portal. The United States Computer Emergency Readiness Team notified Equifax of this vulnerability, CVE-2017-5638, in March 2017. Internally, Equifax circulated the information using an email list of system administrators. Unfortunately, the list was out of date and certain key SAs did not get the notice to update Struts. To make matters worse, an expired certificate allowed the hackers to evade automatic malicious activity detection software throughout the 76-day breach. Once inside, the hackers found that individual databases were not isolated from one another, which allowed them to access more personal information. During this process, the hackers gained access to a database of unencrypted credentials, which then allowed them to query against even more user information. More information can be found in the following GAO Report: Actions Taken by Equifax and Federal Agencies in Response to the 2017 Breach.

Given the facts of the hack, it’s hard to view Equifax as a victim alongside the 143 million people whose information was stolen. Rather, its systematic failure to protect the private data of people who often don’t have a say in what Equifax collects on them makes it an accomplice to the hack. Yet nearly two years after the hack was initiated, no charges against a single Equifax employee have been filed. No fines have been levied on the corporation. No legislative action has been taken to audit and monitor Equifax in the future. In fact, the opposite happened: Congress passed Senate Bill 2155, which shielded Equifax from class action lawsuits.

Pictured above, Senator Mike Crapo, sponsor of S. 2155

Equifax abdicated its responsibility to guard the data that it collects on people. Why aren’t there regulatory requirements on private companies that collect extremely sensitive personal information on American citizens? Where are our institutions that hold these organizations accountable? Someone will always pay for data breaches like this one. As of now, only the American consumer has paid. Until we start guaranteeing each American’s right to the protection of their data, these types of incidents will continue to happen.

Where Incentives Collide: Maintaining Privacy in P2P Lending

By Kyle Redfield | March 10, 2019

Peer to peer lending is a growing channel for borrowing funds. Peer to peer lending apps and websites act as a marketplace for individual borrowers and individual lenders to distribute funds outside typical venues. The concept is simple: a prospective borrower will log into the app and request funds to be paid back over some duration of time. The app will decide, based on available information about the borrower, an appropriate interest rate for the situation. In reality, the system is more complex and, depending on your point of view, nefarious.

A 2010 study investigated whether borrowers who offered lenders more of their personal data received more favorable interest rates in return. By investigating about 600 lending projects, they found that, in some cases, releasing more information does tend to lower the borrower’s interest rate. This finding has been supported anecdotally as well. A 2011 legal review of the issues in online lending finds cases where online lenders rigorously interview, request further information from, or otherwise use some fancy Googling to re-identify the originally de-identified borrower.

Perhaps these results are unsurprising. After all, information asymmetry has plagued financial institutions long before the Internet existed. However, as Bohme and Potzsch conclude, it is more often those with economic disadvantages that might seek peer to peer lending solutions. Therefore, “one form of inequality is replaced by another, potentially more subtle one: socially disadvantaged members of society are more likely to act as borrowers and thus are in a worse position to protect their informational privacy” (Bohme and Potzsch 2010). So, while peer to peer lending may promote economic parity, so too does it exacerbate privacy disparity.

But this need not be a tale of outrage and despair. Companies such as Uber or OkCupid have been exploiting the privacy of individuals for the sake of profit without compensation for years. On the other hand, we may be glimpsing into the future of privacy. By offering lower interest rates or higher probabilities of receiving a loan in exchange for disclosing more information, peer to peer lending apps are implicitly compensating their users for disclosing their personally identifiable information (PII).

Increasingly, public expectation has been trending away from any prospect of privacy (take a look at page 86). Just today, my friend proclaimed “I don’t really care that Facebook has my data, I just wish I could get something for it”. Peer to peer lending offers insight into exactly that opportunity. Rather than attempting to regulate and outwit at every turn the massive and intelligent organizations that face every incentive to exploit users’ privacy, governments can acknowledge the long-standing tradition of Pareto efficiency. In the spirit of Pareto efficiency, simply by granting individuals the right to own and sell their own privacy, particularly in the online space, the free market can begin to organize around a new regime.

The cost to the individual may well be trivial for releasing the rights to oneself, but – hey – at least I could get something for it.



Guilty or Innocent? The Use of Algorithms in Bail Reform

By Rachel Kramer | March 10, 2019

There are efforts being made across the country to reform our criminal justice system; to incarcerate fewer people of color, to revise or remove the system of bail, to change how drug and other non-violent offences are treated in our courts. One major path of reform many states are traveling is through technology: implementing risk assessment algorithms as a way to mitigate human bias, error, or inability to systematically compile many pieces of information on a person in order to statistically infer a range of specific outcomes. These risk assessment algorithms, while containing a lot of diversity in and of themselves, all perform the same basic function: they take past information on a defendant and, using machine learning, predict the likelihood of a future event that the court wants to prevent, such as fleeing the state, not showing up to court dates, or being arrested for violent or non-violent crimes after pretrial release. Judges use these risk score outputs to decide a variety of outcomes for the defendant at every stage of the criminal justice system, including, as this post focuses on, pretrial decisions such as bail and how the defendant is monitored before sentencing.

The purpose of bail as it stands now is to assess the dangerousness of the defendant to the public if they are released back into society, and to set bail that is in line with that dangerousness. In extreme cases, the court can withhold bail and mandate pre-trial imprisonment. The original purpose of bail was to incentivise defendants to show up to their court dates and to discourage the accused from fleeing the jurisdiction. Over the years, however, the purpose and goal of bail have shifted. While the civil rights movement ushered in a brief period of bail reform, meant to remedy the poverty-driven high pretrial incarceration rates of populations unable to make bail, sentiments reversed during the conservative Nixon and Reagan eras, landing us back in assessments of a defendant’s dangerousness to the public as the primary goal of bail hearings.

Assessment of the threat a defendant poses to the general population is a highly subjective matter, and it is no surprise that most states are beginning to favor a system that is statistical and backed by data. Machine learning represents a shining chance at an objective and neutral decision-maker — or so goes the prevailing sentiment of many industries at the moment. If the criminal justice system is desperate for reform, why should we turn to a decision-maker trained with data from the very system we are trying to reform? John Logan Koepke and David G. Robinson, both scientists at a technology think tank, ask this question in their comprehensive article, “Danger Ahead: Risk Assessment and the Future of Bail Reform.”

As every industry moves toward machine learning and AI applications, the question should be not only how we use the algorithms, but if we should use them. It is well-publicized that our criminal justice system is biased against black and/or impoverished communities. Risk assessment algorithms repeat and enhance these biases because the algorithms learn patterns inherent to the larger justice system, even ones we aren’t aware of enough to name or address. Most risk assessment programs don’t use race as an input, but there are so many other predictors of race in our lives and communities that the system learns to disenfranchise based on race even without the explicit information. There is a high chance that algorithms trained on data from a broken system will lead to what the authors call “zombie predictions,” or predictions that reanimate biases or overestimations of risks of certain defendants (usually black) and underestimations for others (usually white). Even if the bias in the training data were to be alleviated or worked around through data science procedures such as bootstrapping or feature weighting, the fix is not strong enough for many reformers, including Koepke and Robinson. Making our punishment systems more efficient ultimately does little to reform the system as a whole.

Koepke and Robinson suggest that the system can and should be reformed without algorithms. Such reform ideas include automatic pretrial release for certain categories of crime, different standards for imposing pretrial detention, or replacing or cabining money bail entirely, like the recent law in California ruling cash bail unconstitutional. Many pretrial arrests are due to infringements of pretrial restrictions set out in bail hearings, and failure to show up to court dates is often due to the defendant being unable to miss work, find childcare, or access transportation. Simple processes can alleviate these problems, such as court-funded ride services or text reminders about appointments. Reforms at the police level are also vital, though outside the scope of this post.

If machine learning algorithms are here to stay in our justice system, which is likely the case, there are actionable ways to improve their performance and reduce harm and injustice in their use. Appallingly, many of the algorithms in use have not been externally validated or audited. Beyond guaranteeing accountability in the software itself, courts could follow up on defendants to compare the system’s predictions against the real outcomes specific to their jurisdiction. This is especially important to repeat after any bail reforms have been put into place. The algorithms need to be trained on the most recent local data available, and importantly, data coming from an already reformed or reforming system. Recently in New York, to improve their flawed bail system, the city’s Criminal Justice Agency announced they would train an algorithm using data, but this data came from the stop-and-frisk era of policing, a policy now ruled unconstitutional. Egregious oversights like these can further marginalize already vulnerable populations.
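The follow-up described above, comparing a tool's risk scores against what actually happened in a given jurisdiction, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a numeric risk score plus a flag for whether the defendant later missed court or was rearrested), the function names, and the toy numbers are all hypothetical and do not describe any real risk assessment product or court dataset.

```python
def observed_rates_by_score(cases):
    """Return the observed failure rate for each risk score.

    `cases` is a list of (risk_score, failed) pairs. `failed` is True when the
    defendant missed court or was rearrested. Both fields are hypothetical.
    """
    totals = {}
    for score, failed in cases:
        seen, failures = totals.get(score, (0, 0))
        totals[score] = (seen + 1, failures + (1 if failed else 0))
    return {score: failures / seen
            for score, (seen, failures) in sorted(totals.items())}


def scores_rank_risk(rates):
    """True if observed failure rates never decrease as the score rises."""
    ordered = [rates[score] for score in sorted(rates)]
    return all(low <= high for low, high in zip(ordered, ordered[1:]))


# Made-up local outcomes: 10 defendants at each of three score levels.
toy_cases = (
    [(1, False)] * 9 + [(1, True)] * 1 +
    [(2, False)] * 7 + [(2, True)] * 3 +
    [(3, False)] * 8 + [(3, True)] * 2
)

local_rates = observed_rates_by_score(toy_cases)
print(local_rates)
print(scores_rank_risk(local_rates))
```

In this toy data the highest score has a lower observed failure rate than the middle score, so the check returns False. In a real audit, a result like that would be a prompt to look more closely at how the tool behaves locally, not a verdict on its own.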

Our focus in data science has generally been on improving and refining the tools of our trade. This paper, along with other reform movements, invites data science and its associated fields to take a step back in the implementation process. We need to ask ourselves what consequences an algorithmic or machine learning application could engender, and if there are alternative ways to address change in a field before leaning on technologies whose impacts we are only just beginning to understand.



What your Fitness Apps say about you: Should you be worried?

By Laura Chutny | March 10, 2019

If you run or cycle, meditate, track your diet or sleep, you probably use Strava, Garmin, MyFitnessPal, Fitbit or one of the dozens of other health and fitness applications. When you signed up for those services, did you read the privacy policy and determine what might happen to your personal data? If you did, are you concerned about the fate of your data? It is a concern that many of us have, but privacy policies are often long, obtuse, and dreadfully boring to read. Knowing what those companies may or may not do with your very personal data, however, is important.

(Image courtesy United News Desk)

Health and Fitness apps are among the top 10 categories in both the Google Play and Apple App stores. Many mobile devices come with at least one health and fitness app preinstalled (e.g. Apple’s Health).

Data and Privacy Concerns

Health and fitness apps take data from you and store it in your account in the cloud. This data includes things like your weight, height, birth date, blood pressure, pulse, location during exercise, menstrual cycle, diet, and many more. By installing one of the apps, you have consented to share your data with the company that created the app.

In some cases, your data becomes part of a wider set of data through aggregation, as in Strava’s Heat Maps. This particular feature has recently come under fire for allowing re-identification of location. In this particular instance, the heat map highlighted the location of military bases after soldiers logged their exercise through Strava which potentially put soldiers at risk. Those soldiers most likely were not aware their data in Strava would allow this type of reverse engineering. Single people may also be put at risk if they can be tracked to their home, gym or workplace from their publicly available data.
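To make the aggregation risk concrete, here is a minimal Python sketch of how location points might be binned into a heat map, with cells dropped when too few distinct users contributed to them. This is an assumption-laden illustration, not a description of how Strava builds its Heat Maps: the function name `build_heatmap`, the grid size, the `min_users` threshold, and the sample points are all hypothetical.

```python
import math
from collections import defaultdict


def build_heatmap(points, cell_size=0.01, min_users=3):
    """Bin (user_id, lat, lon) points into grid cells and drop sparse cells.

    A cell is kept only if at least `min_users` distinct users appear in it,
    so a route used by one or two people does not show up on the map.
    """
    users_per_cell = defaultdict(set)
    hits_per_cell = defaultdict(int)
    for user_id, lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        users_per_cell[cell].add(user_id)
        hits_per_cell[cell] += 1
    return {cell: hits for cell, hits in hits_per_cell.items()
            if len(users_per_cell[cell]) >= min_users}


# Made-up data: a busy city cell and a remote cell used by a single person.
toy_points = [
    ("u1", 10.005, 20.005), ("u1", 10.005, 20.005),
    ("u2", 10.005, 20.005), ("u3", 10.005, 20.005),
    ("u4", 50.005, 60.005), ("u4", 50.005, 60.005),
    ("u4", 50.005, 60.005), ("u4", 50.005, 60.005),
    ("u4", 50.005, 60.005),
]

heatmap = build_heatmap(toy_points)
print(heatmap)
```

A threshold like this hides a lone runner's route, but it would not help where many users share the same sensitive location, which is what made the military base example possible: enough people logged activity in the same place for the pattern to survive aggregation.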

(Image courtesy Mashable)

In other cases, your data may be shared with analytics companies, advertisers and social networks. Even if your data is not shared, the security of your data within the application itself may be at risk, with no standards of practice or regulation on how applications use, store or transfer data. Recently, one company has begun to use your data to adjust your life insurance policy. It is not inconceivable then that unregulated sharing of your personal information with health insurance providers may affect your eligibility or premiums. Maybe you should rethink that third beer on Fridays!

Dimensions of Privacy

Daniel Solove created a Taxonomy of Privacy that we can use to evaluate the risks presented by health and fitness applications. Many of the risks surrounding surveillance, interrogation, and security have been discussed.

Unwanted disclosure and exposure could be damaging to an individual. For example, imagine a young woman whose menstrual cycle tracker in her health app alerts an advertiser that she has missed a cycle 3 times in a row and has gained 5 lbs. That advertiser may calculate she is pregnant and start offering her ads for maternal vitamins. This is eerily similar to the Target case of the early 2010s.

If the app is leaking your personal data or allowing inappropriate secondary uses, your information could be distorted, for example by a faulty prediction algorithm, which may have unintended consequences for you. Imagine if inaccurate measurement and tracking of your resting heart rate resulted in a faulty prediction of your cardiovascular health, which leads a health insurer to deny you coverage for future heart attacks.

What happens next?

What does this all mean for you? As much as it is not fun to do, take the time to read the privacy policies of the apps you use. At least focus on the use and sharing of your data. If you do not understand it, contact the company. If they cannot explain it to your satisfaction, you might want to consider not doing business with that company any longer.

Cut down the number of apps you use, eliminate accounts for apps you no longer use and ensure the privacy settings for the apps you keep are appropriate for your level of comfort. For example, set your ‘home zone’ in Strava to protect your place of residence from showing up in your public feed, or more drastically, change your account to private.

Additionally, the app sector should be taking responsibility for the privacy aspects of personal health and fitness data. Companies need to give users options to opt in to sharing each activity or chunk of data and clarify what it means to not opt in. Paying customers might have the option to share less data. Reminders to users to check privacy settings are already beginning to happen. In Europe, with the advent of the GDPR, some of these actions are becoming part of doing digital business, but there is a long way to go to fully protect our personal privacy with health and fitness apps.

Human versus Machine: Can the struggle for better decision-making apparatuses prove to be a forum for partnership?

By Brian Neesby | March 10, 2019

“Man versus Machine” has been a refrain whose origins are lost to history; perhaps it dates back to the Industrial Revolution, perhaps to John Henry and the steam drill. Searching the reams of books on Google’s archives, the first mention of the idiom appears to hail from an 1833 article in the New Anti-Jacobin Review. Authorship is credited to Percy Bysshe Shelley, posthumously, but the editor was his cousin Thomas Medwin. Both poets are famous in their own right, but Shelley’s second wife, Mary Shelley, is probably more renowned. Personally, I choose to believe that the author of Frankenstein herself coined the phrase.

Not only must the phrase be updated for modern sensibilities (take note of the blog’s gender-agnostic title), but the debate itself must be reimagined. Our first concerns were over who was the best at certain strategic, memory, or mathematical tasks. The public watched as world chess champion Garry Kasparov beat IBM’s Deep Blue in 1996, only to be conquered by the computer just one year later, when the machine could evaluate 200 million chess moves per second. I think in modern times, we can safely say that machines have won. In 2011, Watson, an artificial intelligence named after IBM’s founder, soundly beat Jeopardy champions Ken Jennings and Brad Rutter in the classic trivia challenge; it wasn’t close. But do computers make better decisions? They certainly make faster decisions, but are those decisions substantively better? The modern debate with these first “thinking computers” centers on the use of automated decision making, especially those decisions that affect substantive rights.

Automated Decision Making

One does not have to look far to find automated decision-making gone awry. Some decisions are not about rights, per se, but they can still have far-reaching consequences.

  • Beauty.AI, a deep-learning system supported by Microsoft, was programmed to use objective factors, such as facial symmetry and lack of wrinkles, to identify the most attractive contestants in beauty pageants. It was used in 2016 to judge an international beauty contest of over 6000 participants. Unfortunately, the system proved racist; its algorithms equated beauty with fair skin, despite the numerous minority applicants. Alex Zhavoronkov, Beauty.AI’s Chief Science Officer, blamed the system’s training data, which “did not include enough minorities”.
  • Under the guise of objectivity, a computer program called the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) was created to rate a defendant’s likelihood of recidivism, particularly of the violent variety. The verdict: the algorithm was given high marks for predicting recidivism in general, but it had one fundamental flaw; it was not color blind. Black defendants who did not commit crimes over the next two years were nearly twice as likely to be misclassified as higher risks vis-à-vis their white counterparts. The inverse was also true. White defendants who reoffended within the two-year period had been mislabeled low risk approximately twice as often as black offenders.
  • 206 teachers were terminated in 2009 when Washington DC introduced an algorithm to assess teacher performance. Retrospective analysis eventually proved that the program had disproportionately weighted a small number of student survey results; other teachers had gamed the system by encouraging their students to cheat. At the time, the school could not explain why excellent teachers had been fired.
  • A Massachusetts resident had his driving license privileges suspended when a facial recognition system mistook him for another driver, one who had been flagged in an antiterrorist database.
  • Algorithms in airports inadvertently classify over a thousand customers a week as terrorists. A pilot for American Airlines was detained eighty times within a single year because his name was similar to that of a leader of the Irish Republican Army (IRA).
  • An Asian DJ was denied a New Zealand passport because his photograph was automatically processed; the algorithm decided that he had his eyes closed. The victim was gracious: “It was a robot, no hard feelings,” he told Reuters.

Human Decision-Making is all too “Human”

Of course, one could argue that the problem with biased algorithms is the humans themselves. Algorithms just entrench existing stereotypes and biases. Put differently, do algorithms amplify existing prejudice, or can they be a corrective? Unfortunately, decision-making by human actors does not fare much better than that of their robotic counterparts. Note the following cases and statistics:

  • When researchers studied parole decisions, the results were surprising. The prisoner’s chance of being granted parole was heavily influenced by the timing of the hearing – specifically its proximity to the judge’s lunch hour. 65% of cases were granted parole in the morning hours. This fell precipitously over the next couple of hours, occasionally to 0%. The rate returned to 65% once the ravenous referee had been satiated. Once again, late afternoon hours brought a resurgence of what Daniel Kahneman calls decision fatigue.
  • College-educated Blacks are twice as likely to face unemployment as all other students.
  • One study reported that applicants with white-sounding names received a callback 50% more often than applicants with black-sounding names, even when identical resumes were submitted to prospective employers.
  • A 2004 study found that when police officers were handed a series of pictures and asked to identify faces that “looked criminal”, they chose Black faces more often than White ones.
  • Black students are suspended three times more often than White students, even when controlling for the type of infraction.
  • Black children are 18 times more likely than White children to be sentenced as adults.
  • The Michigan State Law Review presented the results of a simulated capital trial. Participants were shown one of four simulated trial videotapes. The videos were identical except for the race of the defendant and/or the victim. The research participant – turned juror – was more likely to sentence a black defendant to death, particularly when the victim was white. The researchers’ conclusion speaks for itself: “We surmised that the racial disparities that we found in sentencing outcomes were likely the result of the jurors’ inability or unwillingness to empathize with a defendant of a different race—that is, White jurors who simply could not or would not cross the ’empathic divide’ to fully appreciate the life struggles of a Black capital defendant and take those struggles into account in deciding on his sentence.”

At this point, dear reader, your despair is palpable. Put succinctly, society has elements that are bigoted, racist, misogynist – add your ‘ism’ of choice – and humans, and algorithms created by humans, reflect that underlying reality. Nevertheless, there is reason for hope. I shared the litany of bad decisions that are attributable to humans, without the aid of artificial intelligence, to underscore the reality that humans are just as prone to making unforgivable decisions as their robotic counterparts. Even so, I contend that automated decision-making can be an important corrective for human frailty. As a data scientist, I might be biased in this regard – according to Kahneman, this would be an example of my brain’s self-serving bias. I think that the following policies can marry the benefits of human and automated decision-making, for a truly cybernetic solution – if you’ll permit me to misuse that metaphor. Here are some correctives that can be applied to automated decision-making to provide a remedial effect for prejudiced or biased arbitration.

  • Algorithms should be reviewed by government and nonprofit watchdogs. I am advocating turning over both the high-level logic and the source code to the proper agency. I think there should be no doubt that government-engineered algorithms require scrutiny, since they involve articulable rights. The citizen’s Sixth Amendment right to face their accuser would alone necessitate this, even if the accuser in this case is an inscrutable series of 1s and 0s. Nevertheless, I think that corporations could also benefit from such transparency, even if it is not legally compelled. If a trusted third-party watchdog or government agency has vetted a company’s algorithm, the good publicity – or, more likely, the avoidance of negative publicity – could be advantageous. The liability of possessing a company’s proprietary algorithm would need to be addressed. If a nonprofit agency’s security were compromised, damages would likely be insufficient to remedy a company’s potential loss. Escrow companies routinely take on such liability, but usually not for clients as big as Google, Facebook, or Amazon. The government might provide some assistance here, by guaranteeing damages in the case of a security breach.
  • There also need to be publicly-accessible descriptions of company algorithms. The level of transparency for the public cannot be expected to be quite as formulaic as above; such transparency should not expose proprietary information, nor permit the system to be gamed in a meaningful way.
  • Human review should be interspersed into the process. I think a good rule of thumb is that automation may preserve rights or other endowments, but rights, contractual agreements, or privileges should only be revoked after human review. Human review, by definition, necessitates a diminution in privacy. This should be weighed appropriately.
  • Statistical review is a must. The search for a discriminatory effect can be used to continually adjust and correct algorithms, so that bias does not inadvertently creep in.
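
As a rough illustration of what such a statistical review might look like, here is a minimal Python sketch that compares false positive rates across groups, the same kind of disparity described in the COMPAS example above. The function names, the record layout, and the sample records are all made up for illustration; a real audit would use the algorithm's actual predictions and outcomes, and likely more than one fairness measure.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return the false positive rate per group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    A false positive is someone flagged high risk who did not reoffend.
    """
    flagged = defaultdict(int)    # flagged high risk, did not reoffend
    negatives = defaultdict(int)  # everyone who did not reoffend
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / negatives[group] for group in negatives}


def disparity_ratio(rates):
    """Ratio of the highest to the lowest group rate (1.0 means parity)."""
    lowest, highest = min(rates.values()), max(rates.values())
    if highest == 0:
        return 1.0  # no false positives in any group
    return float("inf") if lowest == 0 else highest / lowest


# Hypothetical records, invented purely for this example.
sample = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("A", True, True), ("B", True, True),
]

rates = false_positive_rates(sample)
print(rates, disparity_ratio(rates))
```

A ratio well above 1.0 would be a signal to investigate and adjust the algorithm before it is allowed to affect anyone's rights.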

One final problem presents itself. Algorithms, especially those based on deep learning techniques, can be so opaque that it becomes difficult to explain their decisions. Alan Winfield, professor of robot ethics at the University of the West of England, is leading a project to solve this seemingly intractable problem. “My challenge to the likes of Google’s DeepMind is to invent a deep learning system that can explain itself,” Winfield said. “It could be hard, but for heaven’s sake, there are some pretty bright people working on these systems.” I couldn’t have said it better. We want the best and the brightest humans working not only to develop algorithms to get us to spend our money on merchandise, but also to develop algorithms to protect us from the algorithms themselves.


YouTube and the Momo Challenge

By Matt Vay | March 10, 2019

YouTube has been in hot water recently over a series of high-profile incidents that have gained massive media coverage and called into question the algorithms that drive its business and the role it should play in censoring the content it puts out. The first incident involved predatory comments made on videos showing children. The second major incident, and the focus of this blog post, is a new dangerous challenge called the “Momo Challenge”.

What is the Momo Challenge?
Momo began as an urban legend created in a public forum online but evolved over time. The Momo Challenge has become a series of images that appear in children’s videos, telling kids to harm themselves. Given the lack of evidence that these videos exist on YouTube, many believe this story has been perpetuated by mainstream media and has unnecessarily frightened parents across the world. However, it has once again raised the question: what role should YouTube play in censoring the content it puts out on its website?

What are the legal and ethical issues?
YouTube’s recommender algorithm has been the subject of great debate over the past few years. It has a tendency to place individuals into “filter bubbles” where they are shown videos similar to those they have watched in the past. But what kind of dangers could that lead to when the videos it records our children watching are dangerous pranks? Could the algorithm see a child watching a Momo Challenge video and then recommend a Tide Pod Challenge video? Companies with this much power have a responsibility to protect our young children from seeing disturbing content. If a child watches one of these videos and then harms themselves, how much is YouTube to blame for its part in recommending these videos?

What has YouTube done?
The Momo Challenge is not the first time our nation has been captivated by a dangerous challenge targeted at our youth. From the Tide Pod Challenge to the Bird Box Challenge, YouTube has experienced these dangerous pranks before and recently updated its Community Guidelines. YouTube’s policies now ban challenge and prank videos that could lead to serious physical injury. The company went one step further with the Momo Challenge and demonetized all videos even referencing Momo. Many of those videos also have warning screens that classify the video as having potentially offensive content.

Where do we go from here?
Unfortunately, these types of videos do not seem to be going away. YouTube has taken the right steps toward censoring its content for children, but how much further does it need to go? I think that answer is very unclear. Nobody will ever be fully happy with all of the content found on YouTube, and that is the nature of the beast. It is an open video-sharing platform where users can upload a video file with anything they want in it. But with children gaining access to these sites with ease and at such a young age, we always need to be challenging YouTube to be better with its policies, its censorship, and its algorithms, as what it does will likely never be enough.


Alexander, Julia. “YouTube Is Demonetizing All Videos about Momo.” The Verge, 1 Mar. 2019.

Hale, James Loke. “YouTube Bans Stunts Like Particularly Risky ‘Bird Box,’ Tide Pod Challenges In Updated Guidelines.” Tubefilter, 16 Jan. 2019.

A Day in My Life According to Google: The Case for Data Advocacy

By Stephanie Seward | March 10, 2019

Recently I was sitting in an online class for the University of California-Berkeley’s data science program discussing privacy considerations. If someone from outside the program were to listen in, they would interpret our dialogue as some sort of self-help group for data scientists who fear an Orwellian future that we have worked to create. It’s an odd dichotomy, potentially akin to Oppenheimer’s proclamation that he had become death, destroyer of worlds, after he worked diligently to create the atomic bomb.

One of my fellow students mentioned, as part of our in-depth, perhaps somewhat paranoid, dialogue, that users can download the information Google has collected on them. He said he hadn’t downloaded the data, and the rest of the group insisted that they wouldn’t want to know. It would be too terrifying.

I, however, a battle-hardened philosopher who graduated from a military school in my undergraduate days, thought: I’m not scared, so why not have a look? I was surprisingly naïve just four weeks ago.

What follows is my story. This is a story of curiosity, confusion, fear, and a stark understanding that data transparency and privacy concerns are prevalent, prescient, and more pervasive than I could have possibly known. This is the (slightly dramatized) story of a day in my life according to Google.
This is how you can download your data.

A normal workday according to Google
0500: Wake up, search “News”/click on a series of links/read articles about international relations
0530: Movement assessed as “driving” start point: home location end point: work location
0630: Activity assessed as “running” grid coordinate: (series of coordinates)
0900: Shopping, buys swimsuit, researches work fashion

1317: Uses integral calculator
1433: Researches military acquisition issues for equipment
1434: Researches information warfare
1450: Logs into maps, views area around City (name excluded for privacy), views area around post
1525: Calls husband using Google assistant
1537: Watches Game of Thrones Trailer (YouTube)
1600: Movement assessed as “driving” from work location to home location
1757: Watches Inspirational Video (YouTube)
1914-2044: Researches topics in Statistics
2147: Watches various YouTube videos including Alice in Wonderland-Chesire Cat Clip (HQ)
Lists all 568 cards in my Google Feed and annotates which I viewed
Details which Google Feed Notifications I received and which I dismissed

I’m not a data scientist yet, but it is very clear to me that the sheer amount of information Google has on me (about 10 GB in total) is dangerous. Google knows my interests and activities almost every minute of every day. What does Google do with all that information?

We already know that it is used in targeted advertising, to generate news stories of interest, and sometimes even in hiring practices. Is that, however, where the story ends? I don’t know, but I doubt it. I also doubt that we are advancing toward some Orwellian future in which everything about us is known by some big brother figure. We will probably fall somewhere in between.

I also know that I am not the only one on whom Google has about 10 GB, if not more, of information. If you would like to view your own data, visit: or to view your data online visit

Privacy considerations cannot remain in the spheres of data science and politics; we each have a role in the debate. This post is a humble attempt to drum up more interest from everyday users. Consider researching privacy concerns. Consider advocating for transparency. Consider the data, and consider the consequences.

Looking for more?
Here is a good place to start: This article, “The Privacy Battle to Save Google from Itself” by Lily Hay Newman is in the security section of It details Google’s recent battles, as of late 2018, with privacy concerns. Newman discusses Google’s emphasis on transparency efforts, contrasted with its increased data collection on users. She describes Google’s struggle to remain transparent to the public and its own employees when it comes to data collection and application use. In her final remarks, Newman reiterates, “In thinking about Google’s extensive efforts to safeguard user privacy and the struggles it has faced in trying to do so, this question articulates a radical alternate paradigm – one that Google seems unlikely to convene a summit over. What if the data didn’t exist at all?”

GDPR: The tipping point for a US Privacy Act?

By Harith Elrufaie | March 6, 2019

GDPR, which is short for General Data Protection Regulation, was probably one of the top ten buzzwords of 2018! This new regulation fundamentally reshapes the way data is handled across every sector. According to the new law, any company that is based in the EU, or does business with EU customers, must comply with the new regulations. Failing to comply can result in fines of up to 4% of annual global turnover or €20 million (whichever is greater). Here in the US, companies revamped their privacy policies and revised their architectures, data storage, and encryption policies. It is estimated that US companies spent over $40 billion to become GDPR compliant.
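
To make the fine arithmetic concrete, here is a minimal Python sketch of the "whichever is greater" rule. The function name is hypothetical, and the sketch assumes only the two figures quoted above (4% of annual global turnover and €20 million); actual penalties depend on the nature of the violation and on regulators' discretion.

```python
FLAT_CAP_EUR = 20_000_000  # the fixed EUR 20 million figure
TURNOVER_PERCENT = 4       # 4% of annual global turnover


def max_gdpr_fine_eur(annual_global_turnover_eur):
    """Upper bound on the fine: the greater of 4% of turnover or EUR 20 million."""
    turnover_based = annual_global_turnover_eur * TURNOVER_PERCENT / 100
    return max(turnover_based, FLAT_CAP_EUR)


# A company with EUR 100 million in turnover: 4% is only EUR 4 million,
# so the EUR 20 million figure applies.
print(max_gdpr_fine_eur(100_000_000))

# A company with EUR 1 billion in turnover: 4% is EUR 40 million,
# which exceeds the flat figure.
print(max_gdpr_fine_eur(1_000_000_000))
```

Under this rule the two figures are equal at €500 million in turnover; above that, the percentage-based amount is the larger one.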

To be GDPR compliant, a company must address the following:

1. Obtaining consent: consent requests must be simple. This means complex legal terms and conditions are not accepted.
2. Timely breach notification: if a data security breach occurs, the company must not only inform the users, but must also do so within 72 hours.
3. Right to data access: the user has the right to request all their stored data, free of charge.
4. Right to be forgotten: the user has the right to request the deletion of their data at any time, free of charge.
5. Data portability: the user has the right to obtain their data and reuse the same data in a different system.
6. Privacy by design: data protection must be included from the onset of system design, rather than added afterward.
7. Data protection officers: in some cases, the company must appoint a Data Protection Officer (DPO) to oversee compliance.
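
The 72-hour breach notification window in item 2 is the easiest of these requirements to express in code. The following is a minimal Python sketch, assuming the clock starts when the company becomes aware of the breach; the function names and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(breach_detected_at):
    """Latest moment at which a notification still falls inside the window."""
    return breach_detected_at + NOTIFICATION_WINDOW


def is_notification_on_time(breach_detected_at, notified_at):
    """True if the notification was sent within 72 hours of detection."""
    return notified_at <= notification_deadline(breach_detected_at)


# Hypothetical breach detected on 1 March 2019 at 09:00 UTC.
detected = datetime(2019, 3, 1, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(detected)
print(deadline.isoformat())
```

Using timezone-aware timestamps avoids ambiguity when the company and its users sit in different time zones.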

Is this the tipping point?

The last few years were a revolving door of data privacy scandals: the shutdown of websites, data mishandling, public apologies, and CEOs testifying before the US Congress. A question on the minds of many is whether a GDPR-like act will appear in the United States sometime soon.

The answer is maybe.

In January 2019, two U.S. senators, Amy Klobuchar and John Kennedy, introduced the Social Media Privacy and Consumer Rights Act, bipartisan legislation intended to protect the privacy of consumers’ online data. Senator Kennedy’s involvement is no surprise to many. He has been an advocate of data privacy and has been vocal about Facebook’s user agreement. During Mark Zuckerberg’s testimony before Congress, Senator Kennedy said: “Your user agreement sucks. The purpose of that user agreement is to cover Facebook’s rear end. It’s not to inform your users of their rights.” The act is very similar to GDPR in many respects. After reading the bill, I could not identify anything unique or different from GDPR. While this is a big step towards consumer data privacy, many believe such measures will never become law, because of the power of the tech lobby and the lack of public demand for a data privacy overhaul.

The second good move happened here in California with the new California Consumer Privacy Act of 2018. The act grants consumers the right to know what data businesses and edge providers are collecting from them and offers them specific controls over how that data is handled, kept, and shared. This new act will take effect on January 1, 2020 and will only apply to residents of California.

To comply with the California Consumer Privacy Act, companies must:

1. Disclose to consumers the personal information being collected, how it is used, and to whom it is being disclosed or sold.
2. Allow consumers to opt out of the sale of their data.
3. Allow consumers to request the deletion of their personal information.
4. Offer an opt-in service for consumers under the age of 16.
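
Items 2 and 4 together describe two different consent defaults: opt-out for most consumers and opt-in for those under 16. Here is a minimal Python sketch of that decision, with a hypothetical function name and parameters; it reflects only the summary above and ignores the act's finer details.

```python
MINIMUM_OPT_OUT_AGE = 16  # below this age, the summary above requires opt-in


def may_sell_personal_data(age, opted_in=False, opted_out=False):
    """Decide whether a consumer's data may be sold under the rules summarized above.

    Under 16: sale is allowed only if the consumer has opted in (and not opted out).
    16 and over: sale is allowed unless the consumer has opted out.
    """
    if opted_out:
        return False
    if age < MINIMUM_OPT_OUT_AGE:
        return opted_in
    return True


print(may_sell_personal_data(age=15))                 # False: no opt-in recorded
print(may_sell_personal_data(age=15, opted_in=True))  # True: explicit opt-in
print(may_sell_personal_data(age=30))                 # True: has not opted out
print(may_sell_personal_data(age=30, opted_out=True)) # False: opted out
```

The design choice worth noticing is the default: for younger consumers, silence means no, while for everyone else, silence means yes.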

While the United States has a rich history of data protection acts, such as HIPAA, COPPA, etc., there is no single act that addresses online consumer privacy. Corporations have benefited for many years by invading our privacy and selling our data without our knowledge. It is time to put an end to this and voice our concerns and demands to our representatives. There is no better time than now for an online consumer privacy act.