Where Incentives Collide: Maintaining Privacy in P2P Lending
By Kyle Redfield | March 10, 2019

Peer-to-peer lending is a growing channel for borrowing funds. Peer-to-peer lending apps and websites act as marketplaces where individual borrowers and individual lenders can exchange funds outside typical venues. The concept is simple: a prospective borrower logs into the app and requests funds to be paid back over some duration of time. The app then decides, based on available information about the borrower, an appropriate interest rate for the situation. In reality, the system is more complex and, depending on your point of view, more nefarious.

A 2010 study investigated whether borrowers who offered lenders more of their personal data received more favorable interest rates in return. Examining about 600 lending projects, the authors found that, in some cases, releasing more information does tend to lower the borrower’s interest rate. This finding has anecdotal support as well: a 2011 legal review of the issues in online lending describes cases where lenders rigorously interview borrowers, request further information from them, or otherwise use some fancy Googling to re-identify the originally de-identified borrower.

Perhaps these results are unsurprising. After all, information asymmetry plagued financial institutions long before the Internet existed. However, as Böhme and Pötzsch conclude, it is more often those with economic disadvantages who seek out peer-to-peer lending. Therefore, “one form of inequality is replaced by another, potentially more subtle one: socially disadvantaged members of society are more likely to act as borrowers and thus are in a worse position to protect their informational privacy” (Böhme and Pötzsch 2010). So, while peer-to-peer lending may promote economic parity, it also exacerbates privacy disparity.

But this need not be a tale of outrage and despair. Companies such as Uber and OkCupid have been exploiting the privacy of individuals for profit, without compensation, for years. Peer-to-peer lending, by contrast, may offer a glimpse into the future of privacy. By granting lower interest rates or higher probabilities of receiving a loan in exchange for disclosure, peer-to-peer lending apps implicitly compensate their users for releasing their personally identifiable information (PII).

Increasingly, public expectation has been trending away from any prospect of privacy (take a look at page 86). Just today, a friend proclaimed, “I don’t really care that Facebook has my data, I just wish I could get something for it.” Peer-to-peer lending offers insight into exactly that opportunity. Rather than attempting to regulate and outwit at every turn the massive, intelligent organizations that face every incentive to exploit users’ privacy, governments can acknowledge the long-standing tradition of Pareto efficiency. In that spirit, simply by granting individuals the right to own and sell their own privacy – particularly in the online space – the free market can begin to organize around a new regime.

The compensation an individual receives for releasing the rights to their own data may well be trivial, but – hey – at least I could get something for it.


Sources:

  1. https://www.aaai.org/ocs/index.php/SSS/SSS10/paper/viewFile/1048/1472
  2. https://lawreview.law.ucdavis.edu/issues/45/2/articles/45-2_Verstein.pdf
  3. https://www.digitalcenter.org/wp-content/uploads/2013/10/2017-Digital-Future-Report.pdf

Guilty or Innocent? The Use of Algorithms in Bail Reform
By Rachel Kramer | March 10, 2019

There are efforts being made across the country to reform our criminal justice system: to incarcerate fewer people of color, to revise or remove the system of bail, to change how drug and other non-violent offenses are treated in our courts. One major path of reform many states are traveling is technological: implementing risk assessment algorithms as a way to mitigate human bias, error, or inability to systematically compile many pieces of information about a person in order to statistically infer a range of specific outcomes. These risk assessment algorithms, while quite diverse among themselves, all perform the same basic function: they take past information on a defendant and, using machine learning, predict the likelihood of a future event that the court wants to prevent, such as fleeing the state, not showing up to court dates, or being arrested for violent or non-violent crimes after pretrial release. Judges use these risk scores to decide a variety of outcomes for the defendant at every stage of the criminal justice system, including, as this post focuses on, pretrial decisions such as bail and how the defendant is monitored before sentencing.
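To make the mechanics concrete, here is a minimal sketch of what such a tool amounts to under the hood: a classifier fit on historical defendant records that outputs a probability for a new case. The features, records, and labels below are invented for illustration and do not represent any vendor’s actual model.

```python
# Illustrative sketch only: a toy pretrial "risk assessment" classifier.
# Features, records, and labels are invented, not any vendor's actual model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [age, prior_arrests, prior_failures_to_appear, charge_severity]
X = np.array([
    [19, 3, 1, 2],
    [34, 0, 0, 1],
    [27, 5, 2, 3],
    [45, 1, 0, 1],
    [22, 2, 1, 2],
    [31, 0, 0, 3],
])
# 1 = failed to appear or was rearrested pretrial in the historical record
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

new_defendant = np.array([[25, 2, 0, 2]])
risk_score = model.predict_proba(new_defendant)[0, 1]
print(f"Predicted pretrial risk: {risk_score:.2f}")
# Note: the model can only reproduce whatever patterns -- including biased
# policing and charging patterns -- are already baked into the labels y.
```

The point of the sketch is that nothing in the fitting step knows whether the historical labels were produced fairly; the model simply learns them.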

The purpose of bail as it stands now is to assess the dangerousness of the defendant to the public if they are released back into society, and to set bail in line with that dangerousness. In extreme cases, the court can withhold bail and mandate pretrial imprisonment. The original purpose of bail was to incentivize defendants to show up to their court dates and to discourage the accused from fleeing the jurisdiction. Over the years, however, the purpose and goal of bail has shifted. While the civil rights movement ushered in a brief period of bail reform–meant to remedy the poverty-driven high pretrial incarceration rates among populations unable to make bail–sentiments reversed during the conservative Nixon and Reagan eras, landing us back at assessments of a defendant’s dangerousness to the public as the primary goal of bail hearings.

Assessment of the threat a defendant poses to the general population is a highly subjective matter, and it is no surprise that most states are beginning to favor a system that is statistical and backed by data. Machine learning represents a shining chance at an objective and neutral decision-maker — or so goes the prevailing sentiment in many industries at the moment. But if the criminal justice system is desperate for reform, why should we turn to a decision-maker trained on data from the very system we are trying to reform? John Logan Koepke and David G. Robinson, both researchers at a technology think tank, ask this question in their comprehensive article, “Danger Ahead: Risk Assessment and the Future of Bail Reform.”

As every industry moves toward machine learning and AI applications, the question should be not only how we use the algorithms, but whether we should use them at all. It is well publicized that our criminal justice system is biased against black and/or impoverished communities. Risk assessment algorithms repeat and amplify these biases because they learn patterns inherent to the larger justice system, even ones we are not yet aware of enough to name or address. Most risk assessment programs don’t use race as an input, but there are so many other proxies for race in our lives and communities that the system learns to disenfranchise based on race even without the explicit information. There is a high chance that algorithms trained on data from a broken system will lead to what the authors call “zombie predictions”: predictions that reanimate old biases, overestimating the risk of certain defendants (usually black) and underestimating it for others (usually white). Even if the bias in the training data could be alleviated or worked around through data science procedures such as bootstrapping or feature weighting, the fix is not strong enough for many reformers, including Koepke and Robinson. Making our punishment systems more efficient ultimately does little to reform the system as a whole.

Koepke and Robinson suggest that the system can and should be reformed without algorithms. Such reform ideas include automatic pretrial release for certain categories of crime, different standards for imposing pretrial detention, and replacing or cabining money bail entirely, as in California’s recent law eliminating cash bail. Many pretrial arrests stem from violations of the restrictions set out at bail hearings, and failure to appear at court dates is often due to the defendant being unable to miss work, find childcare, or access transportation. Simple measures can alleviate these problems, such as court-funded ride services or text reminders about appointments. Reforms at the police level are also vital, though outside the scope of this post.

If machine learning algorithms are here to stay in our justice system, which is likely the case, there are actionable ways to improve their performance and reduce harm and injustice in their use. Appallingly, many of the algorithms in use have not been externally validated or audited. Beyond guaranteeing accountability in the software itself, courts could follow up on defendants to compare the system’s predictions against the real outcomes specific to their jurisdiction. This is especially important to repeat after any bail reforms have been put into place. The algorithms need to be trained on the most recent local data available and, importantly, data coming from an already reformed or reforming system. Recently in New York, in an effort to improve the city’s flawed bail system, the Criminal Justice Agency announced it would train an algorithm on historical data–but that data came from the stop-and-frisk era of policing, a practice since ruled unconstitutional. Egregious oversights like these can further marginalize already vulnerable populations.
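Such a local validation need not be elaborate. Below is a rough sketch of the kind of audit a jurisdiction could run, assuming it can export scored cases with observed outcomes and a group label; the column names, values, and the 0.5 cutoff are placeholders, not any agency’s real data.

```python
# Minimal outcome audit: compare the tool's "high risk" labels
# against observed outcomes, broken out by group.
# Column names, values, and the 0.5 threshold are illustrative placeholders.
import pandas as pd

cases = pd.DataFrame({
    "risk_score": [0.8, 0.3, 0.7, 0.2, 0.9, 0.4, 0.6, 0.1],
    "rearrested": [0, 0, 1, 0, 0, 1, 0, 0],   # observed outcome after release
    "group":      ["A", "B", "A", "B", "A", "B", "A", "B"],
})

for group, sub in cases.groupby("group"):
    high_risk = sub["risk_score"] >= 0.5
    not_rearrested = sub["rearrested"] == 0
    # False positive rate: share of people who did NOT reoffend
    # but were still labeled high risk by the tool.
    fpr = (high_risk & not_rearrested).sum() / max(not_rearrested.sum(), 1)
    print(f"group {group}: false positive rate = {fpr:.2f}")
```

A large gap between groups in a table like this is exactly the kind of signal that should trigger retraining on post-reform local data, or retiring the tool.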

Our focus in data science has generally been on improving and refining the tools of our trade. This paper, along with other reform movements, invites data science and its associated fields to take a step back in the implementation process. We need to ask ourselves what consequences an algorithmic or machine learning application could engender, and whether there are alternative ways to effect change in a field before leaning on technologies whose impacts we are only beginning to understand.

——-

Source:
(1) http://digital.law.washington.edu/dspace-law/bitstream/handle/1773.1/1849/93WLR1725.pdf

What your Fitness Apps say about you: Should you be worried?
By Laura Chutny | March 10, 2019

If you run or cycle, meditate, track your diet or sleep, you probably use Strava, Garmin, MyFitnessPal, Fitbit or one of the dozens of other health and fitness applications. When you signed up for those services, did you read the privacy policy and determine what might happen to your personal data? If you did, are you concerned about the fate of your data? It is a concern many of us share, but privacy policies are often long, obtuse, and dreadfully boring to read. Knowing what those companies may or may not do with your very personal data, however, is important.


(Image courtesy United News Desk)

Health and fitness apps are among the top 10 categories in both the Google Play and Apple App stores. Many mobile devices come with at least one health and fitness app preinstalled (e.g. Apple’s Health).

Data and Privacy Concerns

Health and fitness apps take data from you and store it in your account in the cloud. This data includes things like your weight, height, birth date, blood pressure, pulse, location during exercise, menstrual cycle, diet, and many more. By installing one of the apps, you have consented to share your data with the company that created the app.

In some cases, your data becomes part of a wider data set through aggregation, as in Strava’s heat maps. This feature recently came under fire for allowing re-identification of location: the heat map highlighted the locations of military bases after soldiers logged their exercise through Strava, potentially putting those soldiers at risk. They most likely were not aware that their Strava data would allow this type of reverse engineering. Individuals may also be put at risk if they can be tracked to their home, gym, or workplace from their publicly available data.
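The underlying aggregation problem is easy to see in miniature. The sketch below (with invented coordinates and user IDs, not Strava’s actual pipeline) bins GPS points into grid cells to build a toy heat map; a cell in a busy city blends many users together, while a cell in a remote area can be traced back to a single person.

```python
# Toy heat map: bin GPS points into grid cells and count distinct users per cell.
# Coordinates and user IDs are invented; this is not Strava's actual pipeline.
from collections import defaultdict

points = [
    ("user_1", 37.77, -122.42), ("user_2", 37.77, -122.42),  # busy city cell
    ("user_3", 37.77, -122.42), ("user_4", 37.77, -122.42),
    ("user_5", 34.25, 45.33), ("user_5", 34.25, 45.33),      # remote area: one user
]

cells = defaultdict(set)
for user, lat, lon in points:
    cell = (round(lat, 2), round(lon, 2))   # coarse grid cell
    cells[cell].add(user)

for cell, users in sorted(cells.items()):
    status = "re-identifiable" if len(users) < 3 else "safely crowded"
    print(cell, f"-> {len(users)} user(s), {status}")
```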


(Image courtesy Mashable)

In other cases, your data may be shared with analytics companies, advertisers and social networks. Even if your data is not shared, the security of your data within the application itself may be at risk, with no standards of practice or regulation on how applications use, store or transfer data. Recently, one company has begun to use your data to adjust your life insurance policy. It is not inconceivable then that unregulated sharing of your personal information with health insurance providers may affect your eligibility or premiums. Maybe you should rethink that third beer on Fridays!

Dimensions of Privacy

Daniel Solove created a Taxonomy of Privacy that we can use to evaluate the risks presented by health and fitness applications. Many of the risks surrounding surveillance, interrogation, and security have been discussed.

Unwanted disclosure and exposure could be damaging to an individual. For example, imagine a young woman whose menstrual cycle tracker in her health app alerts an advertiser that she has missed a cycle three times in a row and has gained 5 lbs. That advertiser may infer she is pregnant and start serving her ads for prenatal vitamins. This is eerily similar to the widely reported Target pregnancy-prediction case.
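The inference itself requires no sophistication. Here is a hypothetical sketch of the kind of rule an ad platform could run over fields shared by fitness apps; the field names and thresholds are invented for illustration.

```python
# Hypothetical secondary-use inference from shared fitness-app fields.
# Field names and thresholds are invented for illustration only.
user_profile = {
    "missed_cycles": 3,        # from a cycle-tracking app
    "weight_change_lbs": 5.0,  # from a diet or weight-tracking app
}

def likely_pregnant(profile: dict) -> bool:
    # Crude rule: several missed cycles plus recent weight gain.
    return profile["missed_cycles"] >= 3 and profile["weight_change_lbs"] >= 4

if likely_pregnant(user_profile):
    print("Ad segment: prenatal vitamins, maternity products")
```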

If the app is leaking your personal data or allowing inappropriate secondary uses, your information could also be distorted, for example by a faulty prediction algorithm, with unintended consequences for you. Imagine if inaccurate measurement and tracking of your resting heart rate resulted in a faulty prediction of your cardiovascular health, which then led a health insurer to deny you coverage for a future heart attack.

What happens next?

What does this all mean for you? As much as it is not fun to do, take the time to read the privacy policies of the apps you use. At least focus on the use and sharing of your data. If you do not understand it, contact the company. If they cannot explain it to your satisfaction, you might want to consider not doing business with that company any longer.

Cut down the number of apps you use, eliminate accounts for apps you no longer use and ensure the privacy settings for the apps you keep are appropriate for your level of comfort. For example, set your ‘home zone’ in Strava to protect your place of residence from showing up in your public feed, or more drastically, change your account to private.

Additionally, the app sector should be taking responsibility for the privacy aspects of personal health and fitness data. Companies need to give users options to opt-in to sharing each activity or chunk of data and clarify what it means to not opt-in. Paying customers might have the option to share less data. Reminders to users to check privacy settings are already beginning to happen. In Europe, with the advent of the GDPR, some of these actions are becoming part of doing digital business, but there is a long way to go to fully protect our personal privacy with health and fitness apps.

Human versus Machine
Can the struggle for better decision-making apparatuses prove to be a forum for partnership?
By Brian Neesby | March 10, 2019

“Man versus Machine” has been a refrain whose origins are lost to history—perhaps it dates back to the Industrial Revolution, perhaps to John Henry and the steam drill. Searching the reams of books in Google’s archives, the first mention of the idiom appears to hail from an 1833 article in the New Anti-Jacobin Review. Authorship is credited, posthumously, to Percy Bysshe Shelley, whose editor was his cousin Thomas Medwin. Both poets are famous in their own right, but Shelley’s wife, Mary Shelley, is probably more renowned. Personally, I choose to believe that the author of Frankenstein herself coined the phrase.

Not only must the phrase be updated for modern sensibilities—take note of this blog post’s gender-agnostic title—but the debate itself must be reimagined. Our first concerns were over who was best at certain strategic, memory, or mathematical tasks. The public watched as world chess champion Garry Kasparov beat IBM’s Deep Blue in 1996, only to be conquered by the computer just one year later, when the machine could evaluate 200 million chess moves per second. In modern times, we can safely say that the machines have won. In 2011, Watson, an artificial intelligence named after IBM’s founder, soundly beat Jeopardy champions Ken Jennings and Brad Rutter in the classic trivia challenge; it wasn’t close. But do computers make better decisions? They certainly make faster decisions, but are they substantively better? The modern debate over these “thinking computers” centers on the use of automated decision making, especially for decisions that affect substantive rights.

Automated Decision Making

One does not have to look far to find automated decision-making gone awry. Some decisions are not about rights, per se, but they can still have far-reaching consequences.

  • Beauty.AI, a deep-learning system supported by Microsoft, was programmed to use objective factors, such as facial symmetry and lack of wrinkles, to identify the most attractive contestants in beauty pageants. It was used in 2016 to judge an international beauty contest of over 6000 participants. Unfortunately, the system proved racist; its algorithms equated beauty with fair skin, despite the numerous minority applicants. Alex Zhavoronkov, Beauty.AI’s Chief Science Officer, blamed the system’s training data, which “did not include enough minorities”.
  • Under the guise of objectivity, a computer program called the Correctional Offender Management Profiling for Alternative Sanctions (Compas) was created to rate a defendant on the likelihood of recidivism, particularly of the violent variety. The verdict? The algorithm was given high marks for predicting recidivism in general, but with one fundamental flaw: it was not color-blind. Black defendants who did not commit crimes over the next two years were nearly twice as likely to be misclassified as higher risk than their white counterparts. The reverse was also true: white defendants who reoffended within the two-year period had been mislabeled low risk approximately twice as often as black offenders.
  • 206 teachers were terminated in 2009 when Washington DC introduced an algorithm to assess teacher performance. Retrospective analysis eventually showed that the program had disproportionately weighted a small number of student survey results; other teachers had gamed the system by encouraging their students to cheat. At the time, the school district could not explain why excellent teachers had been fired.
  • A Massachusetts resident had his driving license privileges suspended when a facial recognition system mistook him for another driver, one that had been flagged in an antiterrorist database.
  • Algorithms in airports inadvertently classify over a thousand customers a week as terrorists. A pilot for American Airlines was detained eighty times within a single year because his name was similar to that of a leader of the Irish Republican Army (IRA).
  • An Asian DJ was denied a New Zealand passport because his photograph was automatically processed; the algorithm decided that he had his eyes closed. The victim was gracious: “It was a robot, no hard feelings,” he told Reuters.

Human Decision-Making is all too “Human”

Of course, one could argue that the problem with biased algorithms is the humans themselves. Algorithms merely entrench existing stereotypes and biases. Put differently, do algorithms amplify existing prejudice, or can they be a corrective? Unfortunately, decision-making by human actors does not fare much better than that of our robotic counterparts. Note the following cases and statistics:

  • When researchers studied parole decisions, the results were surprising. A prisoner’s chance of being granted parole was heavily influenced by the timing of the hearing – specifically, its proximity to the judge’s lunch hour. About 65% of cases heard in the morning hours were granted parole. This rate fell precipitously over the next couple of hours, occasionally to 0%, and returned to 65% once the ravenous referee had been satiated. Once again, late afternoon hours brought a resurgence of what Daniel Kahneman calls decision fatigue.
  • College-educated Black workers are twice as likely to face unemployment as their college-educated peers.
  • One study reported that applicants with white-sounding names received a call back 50% more often than applicants with black-sounding names, even when identical resumes were submitted to prospective employers.
  • A 2004 study found that when police officers were handed a series of pictures and asked to identify faces that “looked criminal”, they chose Black faces more often than White ones.
  • Black students are suspended three times more often than White students, even when controlling for the type of infraction.
  • Black children are 18 times more likely than White children to be sentenced as adults.
  • The Michigan State Law Review presented the results of a simulated capital trial. Participants were shown one of four simulated trial videotapes. The videos were identical except for the race of the defendant and/or the victim. The research participant – turned juror – was more likely to sentence a black defendant to death, particularly when the victim was white. The researchers’ conclusion speaks for itself: “We surmised that the racial disparities that we found in sentencing outcomes were likely the result of the jurors’ inability or unwillingness to empathize with a defendant of a different race—that is, White jurors who simply could not or would not cross the ’empathic divide’ to fully appreciate the life struggles of a Black capital defendant and take those struggles into account in deciding on his sentence.”

At this point, dear reader, your despair is palpable. Put succinctly, society has elements that are bigoted, racist, sexist – add your ‘ism’ of choice – and humans, and the algorithms created by humans, reflect that underlying reality. Nevertheless, there is reason for hope. I shared the litany of bad decisions attributable to humans, without the aid of artificial intelligence, to underscore that humans are just as prone to making unforgivable decisions as their robotic counterparts. I contend that automated decision-making can be an important corrective for human frailty. As a data scientist, I might be biased in this regard – according to Kahneman, this would be an example of my brain’s self-serving bias. I think the following policies can marry the benefits of human and automated decision-making, for a truly cybernetic solution – if you’ll permit me to misuse that metaphor. Here are some correctives that can be applied to automated decision-making to provide a remedy for prejudiced or biased arbitration.

  • Algorithms should be reviewed by government and nonprofit watchdogs. I am advocating turning over both the high-level logic and the source code to the proper agency. There should be no doubt that government-engineered algorithms require scrutiny, since they involve articulable rights. The citizen’s Sixth Amendment right to face their accuser would alone necessitate this, even if the accuser in this case is an inscrutable series of 1s and 0s. Corporations could also benefit from such transparency, even if it is not legally coerced: if a trusted third-party watchdog or government agency has vetted a company’s algorithm, the good publicity – or, more likely, the avoidance of negative publicity – could be advantageous. The liability of possessing a company’s proprietary algorithm would need to be addressed. If a nonprofit agency’s security were compromised, damages would likely be insufficient to remedy the company’s potential loss. Escrow companies routinely take on such liability, but usually not for clients as big as Google, Facebook, or Amazon. The government might provide some assistance here by guaranteeing damages in the case of a security breach.
  • There also need to be publicly-accessible descriptions of company algorithms. The level of transparency for the public cannot be expected to be quite as formulaic as above; such transparency should not expose proprietary information, nor permit the system to be gamed in a meaningful way.
  • Human review should be interspersed into the process. A good rule of thumb is that automation may preserve rights or other endowments, but rights, contractual agreements, or privileges should only be revoked after human review. Human review, by definition, entails a diminution in privacy; this should be weighed appropriately.
  • Statistical review is a must. Searching for discriminatory effects can be used to continually adjust and correct algorithms, so that bias does not inadvertently creep in (a minimal sketch of such a check follows this list).
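As one example of what that statistical review could look like, the sketch below applies the “80% rule” used in employment-discrimination analysis to an algorithm’s favorable-decision rates; the decisions and group labels are invented.

```python
# Minimal disparate-impact check (the "80% rule") on an algorithm's decisions.
# The decisions and group labels are invented for illustration.
decisions = [
    ("A", 1), ("A", 1), ("A", 1), ("A", 0),   # (group, favorable decision?)
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

rates = {}
for group in {g for g, _ in decisions}:
    outcomes = [d for g, d in decisions if g == group]
    rates[group] = sum(outcomes) / len(outcomes)

ratio = min(rates.values()) / max(rates.values())
verdict = "flag for review" if ratio < 0.8 else "within the 80% rule"
print(f"Favorable-decision rates: {rates}")
print(f"Impact ratio: {ratio:.2f} -> {verdict}")
```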

One final problem presents itself. Algorithms, especially those based on deep learning techniques, can be so opaque that it becomes difficult to explain their decisions. Alan Winfield, professor of robot ethics at the University of the West of England, is leading a project to solve this seemingly intractable problem. “My challenge to the likes of Google’s DeepMind is to invent a deep learning system that can explain itself,” Winfield said. “It could be hard, but for heaven’s sake, there are some pretty bright people working on these systems.” I couldn’t have said it better. We want the best and the brightest humans working not only to develop algorithms to get us to spend our money on merchandise, but also to develop algorithms to protect us from the algorithms themselves.

Sources:
https://www.theguardian.com/technology/2017/jan/27/ai-artificial-intelligence-watchdog-needed-to-prevent-discriminatory-automated-decisions
https://www.marketplace.org/2015/02/03/tech/how-algorithm-taught-be-prejudiced
https://humanhow.com/en/list-of-cognitive-biases-with-examples/
https://www.forbes.com/sites/markmurphy/2017/01/24/the-dunning-kruger-effect-shows-why-some-people-think-theyre-great-even-when-their-work-is-terrible/#541115915d7c
https://www.pnas.org/content/108/17/6889
https://deathpenaltyinfo.org/studies-racial-bias-among-jurors-death-penalty-cases

Youtube and the Momo Challenge
By Matt Vay | March 10, 2019

Youtube has been in hot water recently over a series of high-profile incidents that have gained massive media coverage, calling into question the algorithms that drive its business and the role it should play in censoring the content it serves. The first incident involved predatory comments made on videos showing children; the second major incident, and the focus of this blog post, involves a dangerous new challenge called the “Momo Challenge”.

What is the Momo Challenge?
Momo began as an urban legend created in a public online forum but evolved over time. The Momo Challenge supposedly consists of a series of images that appear in children’s videos, telling kids to harm themselves. Many believe the story has been perpetuated by mainstream media and has unnecessarily frightened parents across the world, given the lack of evidence that these videos actually exist on Youtube. Even so, it has once again raised the question: what role does Youtube play in censoring the content that it puts out on its website?

What are the legal and ethical issues?
Youtube’s recommender algorithm has been the subject of great debate over the past few years. It has a tendency to place individuals into “filter bubbles”, where they are shown videos similar to those they have watched in the past. But what dangers could that lead to when the videos it records our children watching are dangerous pranks? Could it see a child watching a Momo Challenge video and then recommend a Tide Pod Challenge video? Companies with this much power have a responsibility to protect young children from disturbing content. If a child watches one of these videos and then harms themselves, how much of the blame does Youtube bear for its part in recommending those videos?

What has Youtube done?
The Momo Challenge is not the first time our nation has been captivated by a dangerous challenge targeted at our youth. From the Tide Pod challenge to the Bird Box challenge, Youtube has experienced these dangerous pranks before and recently updated its Community Guidelines. Youtube’s policies now ban challenge and prank videos that could lead to serious physical injury. The company went one step further with the Momo Challenge and demonetized all videos even referencing Momo. Many of those videos also carry warning screens classifying them as potentially offensive content.

Where do we go from here?
Unfortunately, these types of videos do not seem to be going away. Youtube has taken the right steps toward censoring its content for children, but how much further does it need to go? I think that answer is very unclear. Nobody will ever be fully happy with all of the content found on Youtube, and that is the nature of the beast. It is an open video-sharing platform where users can upload a video file with anything they want in it. But with children gaining access to these sites with ease and at such a young age, we always need to be challenging Youtube to do better with its policies, its censorship, and its algorithms, as what exists today will likely never be enough.

Sources:

Alexander, Julia. “YouTube Is Demonetizing All Videos about Momo.” The Verge, The Verge, 1 Mar. 2019, www.theverge.com/2019/3/1/18244890/momo-youtube-news-hoax-demonetization-comments-kids.

Hale, James Loke. “YouTube Bans Stunts Like Particularly Risky ‘Bird Box,’ Tide Pod Challenges In Updated Guidelines.” Tubefilter, Tubefilter, 16 Jan. 2019, www.tubefilter.com/2019/01/16/youtube-bans-bird-box-tide-pod-community-guidelines-strikes/.

A Day in My Life According to Google: The Case for Data Advocacy
By Stephanie Seward | March 10, 2019

Recently I was sitting in an online class for the University of California, Berkeley’s data science program discussing privacy considerations. If someone from outside the program were to listen in, they might interpret our dialogue as some sort of self-help group for data scientists who fear an Orwellian future that we have worked to create. It’s an odd dichotomy, potentially akin to Oppenheimer’s proclamation that he had become death, the destroyer of worlds, after working diligently to create the atomic bomb (https://www.youtube.com/watch?v=dus_M4sn0_I).

One of my fellow students mentioned, as part of our in-depth and perhaps somewhat paranoid dialogue, that users can download the information Google has collected on them. He said he hadn’t downloaded the data, and the rest of the group insisted that they wouldn’t want to know. It would be too terrifying.

I, however, a battle-hardened philosopher who graduated from a military school in my undergraduate days, thought: I’m not scared, why not have a look? I was surprisingly naïve just four weeks ago.

What follows is my story. This is a story of curiosity, confusion, fear, and a stark understanding that data transparency and privacy concerns are prevalent, prescient, and more pervasive than I could have possibly known. This is the (slightly dramatized) story of a day in my life according to Google.
This is how you can download your data.

https://support.google.com/accounts/answer/3024190?hl=en

A normal workday according to Google
0500: Wake up, search “News”/click on a series of links/read articles about international relations
0530: Movement assessed as “driving” start point: home location end point: work location
0630: Activity assessed as “running” grid coordinate: (series of coordinates)
0900: Shopping, buys swimsuit, researches work fashion


1317: Uses integral calculator
1433: Researches military acquisition issues for equipment
1434: Researches information warfare
1450: Logs into maps, views area around City (name excluded for privacy), views area around post
1525: Calls husband using Google assistant
1537: Watches Game of Thrones Trailer (YouTube)
1600: Movement assessed as “driving” from work location to home location
1757: Watches Inspirational Video (YouTube)
1914-2044: Researches topics in Statistics
2147: Watches various YouTube videos including Alice in Wonderland - Cheshire Cat Clip (HQ)
Lists all 568 cards in my Google Feed and annotates which I viewed
Details which Google Feed Notifications I received and which I dismissed

I’m not a data scientist yet, but it is very clear to me that the sheer amount of information Google has on me (about 10 GB in total) is dangerous. Google knows my interests and activities almost every minute of every day. What does Google do with all that information?

We already know that it is used in targeted advertising, to generate news stories of interest, and sometimes even in hiring practices. Is that, however, where the story ends? I don’t know, but I doubt it. I also doubt that we are advancing toward some Orwellian future in which everything about us is known by some Big Brother figure. We will probably fall somewhere in between.

I also know that I am not the only person on whom Google holds 10 GB or more of information. If you would like to download your own data, visit https://support.google.com/accounts/answer/3024190?hl=en, or to view your data online visit https://myactivity.google.com/.
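For anyone who does download their archive, even a few lines of code make the scale concrete. The sketch below assumes a Google Takeout export of My Activity in JSON format; the file path and field names are assumptions about that export and may differ in your own archive.

```python
# Rough sketch: summarize a Google Takeout "My Activity" JSON export.
# The path and field names ("header", "title") are assumptions about the
# export format and may differ in your archive.
import json
from collections import Counter

with open("Takeout/My Activity/Search/MyActivity.json", encoding="utf-8") as f:
    records = json.load(f)

products = Counter(r.get("header", "unknown") for r in records)

print(f"{len(records)} activity records in this one file")
for product, count in products.most_common(10):
    print(f"{count:6d}  {product}")
```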

Privacy considerations cannot remain solely in the spheres of data science and politics; we each have a role in the debate. This post is a humble attempt to drum up more interest from everyday users. Consider researching privacy concerns. Consider advocating for transparency. Consider the data, and consider the consequences.


Looking for more?
Here is a good place to start: https://www.wired.com/story/google-privacy-data/. This article, “The Privacy Battle to Save Google from Itself” by Lily Hay Newman, appears in the security section of wired.com. It details Google’s recent battles, as of late 2018, with privacy concerns. Newman contrasts the company’s emphasis on transparency efforts with its increased data collection on users, and discusses Google’s struggle to remain transparent to the public and its own employees when it comes to data collection and application use. In her final remarks, Newman writes, “In thinking about Google’s extensive efforts to safeguard user privacy and the struggles it has faced in trying to do so, this question articulates a radical alternate paradigm – one that Google seems unlikely to convene a summit over. What if the data didn’t exist at all?”

GDPR: The tipping point for a US Privacy Act?
By Harith Elrufaie | March 6, 2019

GDPR, short for the General Data Protection Regulation, was probably among the top ten buzzwords of 2018, and for good reason: the new regulation fundamentally reshapes the way data is handled across every sector. Under the new law, any company that is based in the EU or does business with EU customers must comply with the new regulations. Failing to comply can result in fines of up to 4% of annual global turnover or €20 million, whichever is greater. Here in the US, companies revamped their privacy policies, architectures, data storage, and encryption practices; it is estimated that US companies spent over $40 billion to become GDPR compliant.

To be GDPR compliant, a company must do the following (a rough sketch of how the access and erasure rights might be served in code appears after the list):

1. Obtain consent: consent requests must be simple, which means burying them in complex legal terms and conditions is not acceptable.
2. Provide timely breach notification: if a data breach occurs, the company must not only inform users but must do so within 72 hours.
3. Honor the right to data access: users may request all of their stored data, free of charge.
4. Honor the right to be forgotten: users may request the deletion of their data at any time, free of charge.
5. Support data portability: users may obtain their data and reuse it in a different system.
6. Practice privacy by design: data protection must be included from the outset of system design, rather than added on later.
7. Appoint a Data Protection Officer (DPO) where required: in some cases, a DPO must be designated to oversee compliance.
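As a rough illustration of what the access and erasure rights imply for an engineering team, here is a minimal sketch of two endpoints that return or delete everything held about a user. The framework choice (Flask), the routes, and the in-memory “database” are illustrative assumptions, not a compliance recipe.

```python
# Minimal sketch of serving GDPR-style access and erasure requests.
# Flask, the routes, and the in-memory store are illustrative assumptions.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for a real datastore keyed by user ID.
user_store = {
    "42": {"name": "Alice", "email": "alice@example.com", "searches": ["flights to Lisbon"]},
}

@app.route("/users/<user_id>/data", methods=["GET"])
def export_data(user_id):
    # Right to access / portability: return everything held, in a portable format.
    data = user_store.get(user_id)
    return (jsonify(data), 200) if data else ("not found", 404)

@app.route("/users/<user_id>/data", methods=["DELETE"])
def erase_data(user_id):
    # Right to be forgotten: delete on request, free of charge.
    removed = user_store.pop(user_id, None)
    return ("", 204) if removed else ("not found", 404)

if __name__ == "__main__":
    app.run()
```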

Is this the tipping point?

The last few years have been a revolving door of data privacy scandals: website shutdowns, data mishandling, public apologies, and CEOs testifying before the US Congress. The question on many minds is whether a GDPR-like act will appear in the United States sometime soon.

The answer is maybe.

In January 2019, two U.S. senators, Amy Klobuchar and John Kennedy, introduced the Social Media Privacy and Consumer Rights Act, a bipartisan bill intended to protect the privacy of consumers’ online data. Senator Kennedy’s involvement is no surprise to many; he has been an advocate of data privacy and has been vocal about Facebook’s user agreement. During Mark Zuckerberg’s testimony before Congress, Senator Kennedy told him: “Your user agreement sucks. The purpose of that user agreement is to cover Facebook’s rear end. It’s not to inform your users of their rights.” The act is very similar to GDPR in many respects; after reading the bill, I could not identify anything unique or different from GDPR. While this is a big step toward consumer data privacy, many believe such measures will never become law, because of the power of the tech lobby and the lack of public demand for a data privacy overhaul.

A second encouraging move happened here in California with the new California Consumer Privacy Act of 2018. The act grants consumers the right to know what data businesses and edge providers are collecting about them and offers them specific controls over how that data is handled, kept, and shared. The act takes effect on January 1, 2020 and applies only to residents of California.

To comply with the California Consumer Privacy Act, companies must:

1. Disclose to consumers the personal information being collected, how it is used, and to whom it is being disclosed or sold.
2. Allow consumers to opt out of the sale of their data.
3. Allow consumers to request the deletion of their personal information.
4. Offer opt-in consent for consumers under the age of 16.

While the United States has a rich history of data protection acts, such as HIPAA and COPPA, there is no single act that addresses online consumer privacy. Corporations have benefited for many years from invading our privacy and selling our data without our knowledge. It is time to put an end to this and voice our concerns and demands to our representatives. There is no better time than now for an online consumer privacy act.

Privacy Reckoning Comes For Healthcare
By Anonymous | March 3, 2019

The health insurance industry (“payors”), compared to other industries, is relatively late to the game in utilizing data science and advanced analytics in its core business. While actuarial science has long been at the heart of pricing and risk management in insurance, actuarial methods are not only years behind the latest advances in applied statistics and data science, but the scope of these advanced analytical tools has also been limited largely to underwriting and risk management.

But times are a-changing. Many leading payors are investing in data science capabilities in applications ranging from the traditional stats-heavy domain of underwriting to a range of other enterprise functions including marketing, care management, member engagement, and beyond. With this larger foray into data science has come requisite concerns with data privacy. ProPublica and NPR teamed up last year to publish the results of an investigation into privacy concerns related to the booming industry of using aggregated personal data in healthcare applications (link); while sometimes speculative and short on details, the report brings up skin-crawling possibilities of how this can go horribly wrong. Given the sensitivity of healthcare generally and the alarming scope of data collection in process, it’s high time for the healthcare industry to take a stand on how they intend to use this data and confront privacy issues top of mind for consumers. Let’s explore a few issues in particular.

Data usage: “Can they do that?”

One issue raised in the article — which would be an issue for any person with a health insurance plan — is how personal data will actually be used. There are a number of protections in place that prevent some of the more egregious imagined uses of personal data, the most important being that insurance companies cannot price-discriminate for individual plans (though insurers can charge different prices for different plan tiers in different geographies). Beyond this, however, one could imagine other uses that might raise concerns about expectations of privacy, including: using personal data in group plan pricing (insurance plans fully underwritten by the payor and offered to employers with <500 employees); outreach to individuals that may alert others to personal medical information (consider the infamous Target incident where a father learned of his daughter’s then-unannounced pregnancy through pregnancy-related mailers sent by Target); and individualized pricing that takes into account data collected from social media, in a world where the laws governing health care pricing are in flux. Data usage is something that payors need to be transparent about with consumers if they hope to engender and maintain the already-mercurial trust of their members…and ultimately voters.

Data provenance: “Do I really subscribe to ‘Guns & Ammo’?”

It is clear that payors are making significant investments in personal data, sourced from a cottage industry of providers that aggregate data using a variety of proprietary methods. Given the potential uses laid out above, consider the following: what if major decisions about the healthcare offered to consumers are based on data that is factually incorrect? Data aggregation firms sometimes resort to imputing values for people with missing data points — so that, if all my neighbors subscribe to Guns & Ammo magazine, for instance, a firm may assume I am also a subscriber. Notwithstanding what my hypothetical Guns & Ammo subscription might signal, what is the impact of erroneous data on important healthcare decisions? How do we protect consumers from being the victims of erroneous decisions based on erroneous data that is out of their control? A standard is required here to ensure decisions are not made based on inaccurate data.
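To see how an erroneous attribute gets manufactured in the first place, consider a toy version of neighbor-based imputation; the households, the zip code, and the subscription flag are all invented.

```python
# Toy neighbor-based imputation: a missing attribute is filled in from
# households in the same zip code. All data here are invented.
households = [
    {"zip": "12345", "subscribes_guns_ammo": True},
    {"zip": "12345", "subscribes_guns_ammo": True},
    {"zip": "12345", "subscribes_guns_ammo": True},
    {"zip": "12345", "subscribes_guns_ammo": None},   # me: no data on file
]

known = [h["subscribes_guns_ammo"] for h in households
         if h["zip"] == "12345" and h["subscribes_guns_ammo"] is not None]

# Majority vote among neighbors becomes "my" imputed value, and downstream
# models may treat it as fact even though it was never observed.
imputed_value = sum(known) > len(known) / 2
print(f"Imputed subscription flag for the unknown household: {imputed_value}")
```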

Conclusion: Miles to go before we sleep on this issue

ProPublica and NPR merely scratched the surface of the potential data privacy issues that can arise from questionable data usage, data inaccuracy, and other issues not addressed in the article. As the healthcare industry continues to build out its data science capabilities — which, by the way, have the potential to help millions of people — it will be critical for payors to take a clear stand by articulating a clear data privacy policy with, at the very least, well-understood standards of data usage and data accuracy.

—————————

IMAGE SOURCES: both are examples of what a ‘personal dossier’ of an individual’s health risk might look like, including personal data. Both come from the main ProPublica article mentioned above (“Health Insurers Are Vacuuming Up Details About You – And It Could Raise Your Rates”, by Marshall Allen, July 17, 2018), found here: https://www.propublica.org/article/health-insurers-are-vacuuming-up-details-about-you-and-it-could-raise-your-rates

Both images are credited to Justin Volz, special to ProPublica

Contextual Violations of Privacy
By Anonymous | March 3, 2019

Facebook’s data processing practices are once again in the headlines (shocker, right?). One recent outrage surrounds the way data from unrelated mobile applications is shared with the social media platform in order to improve the efficacy of targeting users on Facebook. This practice has raised serious questions about end-user privacy harm and has prompted the New York Department of Financial Services to request documents from Facebook. In this post we will discuss some of the evidence concerning the data sharing practices of third-party applications with Facebook, and then discuss a useful lens for evaluating the perceived privacy harm. Perhaps we will also provide some insight into alternative norms under which we might construct the web to be less of a commercial, surveillance-oriented tool for technology platforms.

The Wall Street Journal recently investigated 70 of the top Apple iOS 11 apps and found that 11 of them (16%) shared sensitive, user-submitted data with Facebook in order to enhance the ad-targeting effectiveness of Facebook’s platform. The health and fitness data provided by the culprit apps includes very intimate details such as ovulation tracking, sexual activity logged as “exercise”, alcohol consumption, and heart rate. These popular apps use a Facebook feature called “App Events” that feeds Facebook’s ad-targeting tools. In essence, this feature enables companies to track users across platforms to improve their ad-targeting effectiveness.

A separate, earlier study conducted by Privacy International on devices running Android 8.1 (Oreo) provides a more technical discussion and details of this data sharing. In tests of 34 common apps, it found that 23 (61%) automatically transferred data to Facebook the moment a user opened the application. This occurred regardless of whether the user had a Facebook account. The data includes the specific application accessed by the user, events such as the opening and closing of the application, device-specific information, the user’s suspected location based on language and time zone settings, and a unique Google advertising ID (AAID) provided by the Google Play Store. For example, the travel app Kayak sent detailed search behavior of end users to Facebook.
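What makes this cross-app picture possible is the shared advertising ID. The sketch below shows, hypothetically, the kind of join a platform could perform once several apps report events keyed on the same ID; the event fields are invented and are not the actual App Events schema.

```python
# Hypothetical cross-app join on a shared advertising ID.
# The event fields are invented, not any SDK's real schema.
from collections import defaultdict

events = [
    {"ad_id": "aaid-123", "app": "travel_app",    "event": "flight_search", "detail": "SFO->SJO"},
    {"ad_id": "aaid-123", "app": "cycle_tracker", "event": "app_open"},
    {"ad_id": "aaid-123", "app": "cycle_tracker", "event": "log_missed_cycle"},
    {"ad_id": "aaid-999", "app": "travel_app",    "event": "app_open"},
]

profiles = defaultdict(list)
for e in events:
    profiles[e["ad_id"]].append((e["app"], e["event"]))

# One identifier now ties together behavior from otherwise unrelated apps.
for ad_id, history in profiles.items():
    print(ad_id, history)
```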

In response to the Wall Street Journal report, a Facebook spokesperson commented that it’s common for developers to share information with a wide range of platforms for advertising and analytics. To be clear, the report focused on how other apps use people’s information to create Facebook ads. If it is common practice to share information across platforms, which on the surface appears to be true (although the way targeted marketing and data exchanges work is not entirely clear), then why are people so upset? Moreover, why did the report published by the Wall Street Journal spark regulatory action while the reports from Privacy International were not as polarizing?

Importance of Context

Helen Nissenbaum, an NYU researcher, criticizes the current approach to online privacy, which is dominated by discussion of transparency and choice. One central challenge to the whole paradigm is what Nissenbaum calls the “transparency paradox”: privacy policies that are simple, digestible, and easy to comprehend are, with few exceptions, directly at odds with a detailed understanding of how data are really controlled in practice. Instead, she argues for an approach that leverages contextual integrity to define the ways in which data and information ought to be handled. For example, if you operate an online bank, then the norms governing how information is used and handled in a banking context ought to apply whether the interaction is online or in person.

Now apply Nissenbaum’s approach to the specific topic of health applications sharing data: when a woman logs her menstrual cycle on her personal device, would she reasonably expect that information to be accessed and used on social media platforms (e.g., Facebook)? Moreover, would she reasonably expect her travel plans to Costa Rica to be algorithmically combined with her menstrual cycle information in order to predict whether she would be more or less inclined to purchase trip insurance? What if that information were then used to charge her more for the trip insurance? The number of combinations and permutations of this scenario is constrained only by one’s imagination.

Arguably, many of us would be uncomfortable with this contextual violation. Sharing flight information with Facebook does not provoke the same level of outrage as sharing health data, because the norms that govern health data tend to privilege autonomy and privacy much more than those of other commercial activities like airline travel. While greater transparency would have been a meaningful step toward minimizing the public outrage in the health example, it would still not be sufficient to remove the privacy harm that could be, was, or is experienced.

As Nissenbaum has proposed, perhaps it is time that we rethink the norms of how data are governed and whether informed consent, as it works on today’s internet, is really a sufficient approach to protecting individual privacy. We can’t agree on much in America today, but keeping our medical histories safe from advertisers feels like one area where we could find majority support.

A Case Study on the Evolution and Effectiveness of Data Privacy Advocacy and Litigation with Google
By Jack Workman | March 3, 2019

2018 was an interesting year for data privacy. Multiple data breaches, the Facebook Cambridge Analytica scandal, and the arrival of the European Union’s General Data Protection Regulation (GDPR) mark just a few of the many headlines. Of course, data privacy is not a new concept, and it is gaining prominence as more online services collect and share our personal information. Unfortunately, as 2018 showed, this gathered personal information is not always safe, which is why governments are introducing and exploring new policies and regulations like GDPR to protect our online data. Some consumers might be surprised that this is not the first time governments have attempted to tackle the issue of data privacy. GDPR actually replaced an earlier EU data privacy initiative, the Data Protection Directive of 1995. In the US, California’s Online Privacy Protection Act (CalOPPA) of 2003 governs many actions involving privacy and is set to be succeeded by the California Consumer Privacy Act (CCPA), which takes effect in 2020. Knowing this, you might be wondering: what’s changed? Why do these earlier policies need replacing? And are these policies actually effective in setting limits on data privacy practices? To answer these questions, we turn to the history of one of the internet’s most well-known superstars: Google.

Google: Two Decades of Data Privacy History

Google’s presence and contributions in today’s ultra-connected world cannot be overstated. It owns the most used search engine, the most used internet browser, and the most popular smartphone operating system. Perhaps more than any other company, Google has experienced and been at the forefront of the evolution of the internet’s data privacy debates.

As such, it is a perfect subject for a case study to answer our questions. Even better, Google publishes an archive of all of its previous privacy policy revisions, with highlights of what changed. Why are privacy policies important? Because they are documents a company is legally required to publish to explain how it collects and shares personal information. If a company changes its approach to using personal information, that change should be reflected in a privacy policy update. By reviewing the changes between Google’s privacy policies, we can assess how Google responded to, and was affected by, the major data privacy events of the last two decades of advocacy and policy.

2004: The Arrival of CalOPPA

Google’s first privacy policy, published in June of 1999, is a simple affair: only 3 sections and 617 words. The policy remained mostly the same until July 1, 2004, the same date that CalOPPA went into effect, when Google added a full section on “Data Collection” and much further detail on how it shared your information. Both additions were required under the new regulations set forth by CalOPPA and can be considered positive steps toward more transparent data practices.
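Because the archive exposes each revision, this kind of comparison can be done mechanically. The sketch below uses Python’s standard difflib to compare two policy versions by word count and by a unified diff; the snippet texts are placeholders, not the actual policies.

```python
# Sketch: compare two privacy-policy revisions by length and by a unified diff.
# The texts below are placeholders, not Google's actual policy language.
import difflib

policy_1999 = "We collect limited information when you use the service."
policy_2004 = (
    "We collect limited information when you use the service.\n"
    "Data Collection: we may collect device and log information\n"
    "and share it as described below."
)

print("1999 word count:", len(policy_1999.split()))
print("2004 word count:", len(policy_2004.split()))

diff = difflib.unified_diff(
    policy_1999.splitlines(), policy_2004.splitlines(),
    fromfile="policy_1999", tofile="policy_2004", lineterm="",
)
print("\n".join(diff))
```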

2010: Concerns Over Government Data Requests

An update in 2010 brings the first mention of the Google Dashboard. The Dashboard, launched amid rising media attention on reports that Google shared user data with governments upon request, is a utility that lets users view the data Google has collected. This sizable increase in transparency can be considered a big win for data privacy advocates.

2012: A New Privacy Policy and Renewed Scrutiny

March 2012 brings Google’s biggest policy change yet. In a sweeping move, Google overhauled its policy to give itself the freedom to share user data across all of its services. At least, all except for ads: “We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”. This move drew negative attention from international media and fines from governments.

2016: The Ad Wall Falls

With a simple, one-line change in its privacy policy, Google drops the barrier preventing it from using data from all of its services to better target its advertisements. This move shows that, despite previous negative attention, Google is not afraid of expanding its use of our personal information.

2018: The Arrival of GDPR

It is still far too soon to assess the impact of GDPR, but, if the impact on Google’s privacy policy is any indicator, it represents a massive change. With the addition of videos, additional resources, and clearer language, it seems as if Google is taking these new regulations very seriously.

Conclusion

Comparing Google’s first privacy policy to its most recent depicts a company that has become more aware of, and more interested in communicating, its data practices. As demonstrated, this growth was driven by media scrutiny and governmental legislation along the way. However, while the increased transparency is appreciated, that same media scrutiny and legislation have not prevented Google from expanding its use and sharing of our personal information. This raises a new question that only time will answer: will GDPR and the pending US regulations actually place real limits on the use of and protections for our personal information, or will they just continue to increase transparency?