Breast Cancer, Genetic Testing, and Privacy

By Anna Jacobson | June 24, 2019

An estimated 5% to 10% of breast cancer cases are believed to be hereditary, meaning that they result directly from a genetic mutation passed on from a parent. The most common known cause of hereditary breast cancer is an inherited mutation in the BRCA1 or BRCA2 gene; about 70% of women with these mutations will develop breast cancer before the age of 80. Identification of these mutations can determine a breast cancer patient’s course of treatment and post-treatment monitoring, inform decisions about if and how she has children, and raise awareness in her family members of their potentially higher risk.

Because of this, newly diagnosed breast cancer patients may be referred for genetic risk evaluation if they meet criteria laid out in the National Comprehensive Cancer Network (NCCN) genetic testing guidelines, including family medical history, tumor pathology, ethnicity, and age. These at-risk patients typically undergo multi-gene panel testing that looks for BRCA1 and BRCA2 mutations, as well as a handful of other less common gene mutations, some of which are associated with inherited risk for other forms of cancer as well as breast cancer.

Genetic testing for breast cancer is a complex issue that raises many concerns. One concern is that not enough patients have access to the testing; some recent studies have shown that the genetic testing guidelines’ criteria are too restrictive, excluding many patients who in fact do carry hereditary gene mutations. Another concern is that the testing is not well understood; for example, patients and even doctors may not be aware that there are many BRCA mutations that are not detected by current tests, including some that are more common than those that are currently tested for. Yet another set of concerns revolves around the value of predictive genetic testing of family members who do not have a positive cancer diagnosis, and whether the benefit of the knowledge of possible risk outweighs the potential harms.

To help a patient navigate this complexity, genetic testing is ideally offered in the context of professional genetic expertise, with pre- and post-test counseling. However, under a 2013 Supreme Court ruling which declared that genes are not patentable, companies like 23andMe now offer direct-to-consumer BRCA testing without professional medical involvement or oversight. And even at its best, genetic counseling comes at a time when breast cancer patients and their caregivers may be least able to comprehend it. They may be suffering from the shock of their recent diagnoses. They may be overwhelmed by the vast amount of information that comes with a newly diagnosed illness. Most of all, they may only be able to focus on the immediate and urgent need to take the steps required to treat their disease. To many, it is impossible to think about anything other than whether the test results are positive, and if they are, what to do.

But for a breast cancer survivor, other concerns about her genetic testing may arise months or years later. One such concern is privacy. Genetic testing for breast cancer is not anonymous; as with all medical testing, the patient’s name is on the test order and the results, which then become part of the patient’s medical record. All medical records, including genetic test results, are protected under HIPAA (the Health Insurance Portability and Accountability Act of 1996). However, the recent proliferation of health data breaches from cyberattacks and ransomware has given rise to a growing awareness that the confidentiality of medical records can be compromised. This in turn leads to fears that exposure of a positive genetic test result – one that suggests increased lifetime cancer risk – could lead to discrimination by employers, insurers, and others.

In the United States, citizens are protected against such discrimination by GINA (the Genetic Information Nondiscrimination Act of 2008), which forbids most employers and health insurers from making decisions based on genetic information. However, GINA does not apply to small businesses (those with fewer than 15 employees), to federal and military health insurance, or to other types of insurance, such as life, disability, and long-term care. It also does not address other settings of potential discrimination, such as housing, social services, education, financial services and lending, elections, and legal disputes. Furthermore, in practice it could be very difficult to prove that discrimination prohibited by GINA took place, particularly in hiring, where an employer is not required to give a prospective employee complete or truthful reasons – or sometimes any reasons at all – for why they were not hired. And perhaps the greatest weakness of GINA, from the standpoint of a breast cancer survivor, is that it only prohibits discrimination based on genetic information about someone who has not yet been diagnosed with a disease.

Though not protected by GINA, cancer survivors are protected by the Americans with Disabilities Act (ADA), which prohibits discrimination in employment, public services, accommodations, and communications based on a disability. In 1995, the Equal Employment Opportunity Commission (EEOC) issued an interpretation that discrimination based on genetic information relating to illness, disease, or other disorders is prohibited by the ADA. In 2000, the EEOC Commissioner testified before the Senate that the ADA “can be interpreted to prohibit employment discrimination based on genetic information.” However, these EEOC opinions are not legally binding, and whether the ADA protects against genetic discrimination in the workplace has never been tested in court.

Well beyond existing legislative and legal frameworks, genetic data, perhaps more than any other health data, may have implications in the future of which we have no conception today. The field of genomics is rapidly evolving; it is possible that a genetic mutation that is currently tested for because it signals an increased risk of ovarian cancer might in the future be shown to signal something completely different and possibly more sensitive. And unlike many medical tests, which are relevant at the time of the test but have decreasing relevance over time, genetic test results are eternal, as true on the day of birth as on the day of death. Moreover, an individual’s genetic test results can provide information about their entire family, including family members who never consented to the testing and family members who did not even exist at the time the test was done.

The promise of genetic testing is that it will become a powerful tool for doctors to use in the future for so-called “precision prevention”, as well as personalized, targeted treatment. However, in our eagerness to prevent and cure cancer, we must remember to consider that as the area of our knowledge grows, so too grows its vulnerable perimeter – and so must our defenses against those who might wish to misuse it.



Maintaining Data Integrity in an Enterprise

By Keith Wertsching | June 21, 2019

Everyone suffers when an enterprise does not maintain the integrity of its data and its leaders nonetheless employ that data to make important decisions for the enterprise. There are many roles involved in mitigating the risk of poor data integrity, which is defined by Digital Guardian as “the accuracy and consistency (validity) of data over its lifecycle.” But who should be responsible for making sure that the integrity of the data is preserved throughout collection, extraction, and use by the data consumers?
The agent who maintains data accuracy should ideally be someone who:

  • Understands where the data is collected from and how it is collected
  • Understands where and how the data is stored
  • Understands who is accessing the data and how they are accessing it
  • Has the ability to recognize when that data is not accurate and understands the steps required to correct it

Too often, the person responsible for maintaining data integrity is focused primarily on the second bullet point, with a casual understanding of the first and third bullet points. Take this job description for a data integrity analyst from Investopedia:
“The primary responsibility of a data integrity analyst is to manage a company’s computer data by way of monitoring its security…the data integrity analyst tracks records indicating who is accessing what information held by company computer systems at specific times.”

The job description demonstrates that someone working in data integrity should be an expert on where and how the data is stored, and be familiar with who should be accessing that information in order to make sure that company data is not stolen or used inappropriately. But who is ultimately responsible for making sure that the information is accurate in the first place, and for making sure that any changes needed are done in a timely fashion and tracked for future records?

In today’s world of enterprise database administrators, there is often a distinct separation between the person or team that understands how the data is stored and maintained and the person or team that has the ability to recognize when the data is not accurate. Let’s take the example of a configuration management database (CMDB) to highlight the potential issues from separation of data integrity responsibility. SearchDataCenter defines a CMDB as “a database that contains all relevant information about the hardware and software components used in an organization’s IT services and the relationships between those components.” The information stored in the CMDB is important because it allows the entire organization to refer to technical components in the same manner. In a larger organization, the team that is responsible for provisioning hardware and software components will often be responsible for also making sure that any information related to newly provisioned components makes its way into the CMDB. There is often an administrator or set of administrators that will maintain the information in the CMDB. The data will then be consumed by a large number of teams, including IT Support, Project Teams, and Finance.

When the data is inaccurate or incomplete, the teams consuming it lose the ability to speak the same language regarding IT components. The Finance Team may allocate dollars based on the number of components or the breakdown of component types. If they do not have accurate information, they may fail to allocate the right budget for the project teams to complete their work on time. A differing understanding of enterprise components may also delay assistance from the IT Support organization, which has the potential to push out timelines and delay projects.

One potential solution to this issue: make one team responsible for maintaining the accuracy of the data from collection to consumption. As mentioned before, this team needs to have an understanding of where the data comes from, how it is stored, how it is consumed, and the ability to recognize when the data is not accurate and the steps required to correct the information. The data integrity team must be accessible to the rest of the organization to correct data accuracy problems when they arise. As the team grows and matures, they should target developing proactive measures to test that data is accurate and complete so that they can solve data integrity issues before they impact the user. By assigning specific ownership over the entire data lifecycle to one team, the organization can enforce accountability and integrity and mitigate the risk that leaders make poor decisions based on false information.
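The proactive measures described above can be sketched in code. The record fields, freshness window, and validation rules below are hypothetical illustrations, not drawn from any particular CMDB product; a real implementation would pull records through the CMDB's own API.

```python
from datetime import date, timedelta

# Hypothetical CMDB extract; a real check would query the CMDB's API.
CMDB = [
    {"ci_id": "srv-001", "type": "server", "owner": "IT Support",
     "last_verified": date(2019, 6, 1)},
    {"ci_id": "srv-002", "type": "server", "owner": None,
     "last_verified": date(2018, 1, 15)},
]

def audit(records, today, max_age_days=90):
    """Flag records that fail two basic integrity rules: every configuration
    item must have an owner, and must have been re-verified within the last
    max_age_days days. Returns a list of (ci_id, problem) pairs."""
    problems = []
    for rec in records:
        if not rec.get("owner"):
            problems.append((rec["ci_id"], "missing owner"))
        if today - rec["last_verified"] > timedelta(days=max_age_days):
            problems.append((rec["ci_id"], "stale: not verified recently"))
    return problems

print(audit(CMDB, today=date(2019, 6, 21)))
```

Run on a schedule, a report like this lets the data integrity team correct records before a consuming team such as Finance or IT Support ever sees the bad data.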



Using Social Media to Screen Job Candidates: Ethical and Future Implications

By Anonymous | June 24, 2019


Hiring qualified people is hard. Most of the time, the foundation of a hiring manager’s decision is built on a one-page resume, a biased reference or two (sometimes none), and a few hours of interviews with the candidate on their best behavior.

It’s no surprise that around 70% of employers have admitted to snooping on personal social media sites as a method for screening candidates [1]. Since hiring someone who isn’t the right fit can be expensive, it’s only natural for companies to turn to Facebook, Twitter, Instagram, or other social media sites to get a deeper glimpse into the personality they’re hiring. Unfortunately, there’s a lot that can go wrong for all parties involved due to the ethical implications.

What could go wrong?

Using social media to screen candidates doesn’t just weed out people who are vocal online about their criminal or illegal behavior; it can also lead hiring managers to screen out perfectly qualified candidates.

Recently, CIPD (an employee advocate group based in London) wrote a comprehensive pre-employment guide for organizations to follow, and included a section on using social media for job screening [2]. They outlined the risks of employers doing this, which included a case study about a company deciding not to hire a transgender candidate, even after indicating that the individual was suitable for the job prior to the social media check. This was considered an act of direct discrimination based on a protected characteristic, brought on by the company using social media to get more information on the candidate.

It doesn’t stop there. For some people, it’s common sense that employers review social media profiles, and they are able to keep their private thoughts secured. However, not everybody is a social media expert, and deciphering exactly what is and isn’t private can be unwieldy. Many people are not aware that they are consenting to disclose posts from 5+ years ago to potential employers. When companies don’t directly disclose that all content from personal social media sites is subject to review, this could be considered a breach of privacy for individuals who are unaware.

The Future of Social Media Screening

Manually reading through social media sites for potential issues with a candidate is time consuming. Why can’t someone just create an algorithm that parses through social media content when it’s available, and labels attributes of your employees for you?


With the massive influx of artificial intelligence being leveraged within the job-hunting industry, it’s surprising that this isn’t already an industry norm. However, there are a myriad of potential ethical concerns around creating algorithms to do this.

It’s entirely possible that job candidates can fall victim to algorithmic bias and be categorized as something they’re not because of an imperfect algorithm. If someone is new to social media and undergoes a screening like this, it’s possible the result will find no positive traits for the candidate, and the company will reject them based on the algorithm’s decision.
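To make that failure mode concrete, here is a deliberately naive sketch of the kind of keyword-scoring screen imagined above. The trait keywords and scoring rule are invented for illustration only; a candidate with no post history earns a score of zero and is rejected by default, purely for lacking data.

```python
# Invented keyword lists; real screening products would be far more complex,
# but can fail in the same way.
POSITIVE = {"volunteer", "mentor", "award", "team"}
NEGATIVE = {"fight", "arrested", "fired"}

def screen(posts):
    """Score a candidate by counting keyword hits across their posts.
    Returns (score, verdict). An empty or thin history yields no positive
    signal at all, so such candidates are rejected by default."""
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score, ("advance" if score > 0 else "reject")

print(screen(["Proud to volunteer as a mentor this year!"]))
print(screen([]))  # new to social media: no data at all, so rejected
```

The bias here is structural, not malicious: the algorithm cannot distinguish "no evidence of positive traits" from "no evidence at all."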

Between the start-ups that continue to sprout up for the purpose of data mining to gain valuable insights on individuals and the “Social Credit Score” going live in China in 2020 [3], it’s hard to discount the possibility that algorithmic social media screenings that score how “hirable” a candidate is will become prevalent. Because of this, all aspects of the hiring process should continually be subjected to ethical laws and frameworks to protect job candidates from unfair discrimination.





Ethical Implications of Generative AI

By Gabriel Hudson | April 1, 2019

Generative data models are rapidly growing in popularity and sophistication in the world of artificial intelligence (AI). Rather than using existing data to classify an individual or predict some aspect of a dataset, these models actually generate new content. Recent developments in generative data modeling have begun to blur the lines not only between real and fake, but also between machine- and human-generated content, creating a need to look at the ethical issues that arise as the technologies evolve.

Bots are an older technology that has already been used for a wide range of functions, such as automated customer service and directed personal advertising. Bots are generative (almost exclusively creating language), but have historically been very narrow in function and limited to brief interactions on a specified topic. In May of 2018, Google debuted a bot system called Duplex that was able to successfully “fool” a significant number of test subjects while carrying out daily tasks such as booking restaurant reservations and making hair salon appointments (link). This, combined with the ubiquity of digital assistants, sparked a resurgence in bot advancement.

“Deepfake” is a generalized term used to describe very realistic media (such as images, videos, music, and speech) created with an AI technology known as a Generative Adversarial Network (GAN). GANs were originally introduced in 2014 but came into prominence when a new training method was published in 2018. GANs represent the technology behind seemingly innocuous generated media such as the first piece of AI-generated art sold (link):

as well as a much more harmful set of false pornographic videos created using celebrities’ faces.

The key technologies in this area were fully released to the public upon their completion.

Open AI’s GPT-2
In February 2019, OpenAI (a non-profit AI research organization founded in part by Elon Musk) released a report claiming a significant technological breakthrough in generating human-sounding text, along with promising sample results (link). OpenAI, however, against longstanding trends in the field and its own history, chose not to release the full model, citing potential for misuse on a large scale. Similar to GPT-2, there have also been breakthroughs in generative technology for other media, like images, that have been released to the public. All of the images in the subsequent frame were generated with technology developed by Nvidia.

By limiting access to a new technology, OpenAI brought to the forefront discussions about how the rapid evolution of generative models must be handled. Now that almost indistinguishable “false” content can be generated in large volume with ease, it is important to consider who is tasked with deciding and maintaining the integrity of online content. In the near future, these discussions must be extended to the responsibilities of both consumers and distributors of data, and to the ways their “rights” to know fact from fiction and human from machine may be changing.

ESG Investing and Data Privacy

By Nate Velarde | March 31, 2019

Much of the focus on how to better protect individuals’ data privacy revolves around legal remedies and more stringent regulatory requirements. Market-based solutions are either not discussed or are seen as unrealistic, ineffective, or impractical. However, the “market”, in the form of “responsible” or “sustainability”-driven investors, is imposing market discipline on companies with insufficient data privacy safeguards through lower share prices, redirecting investment capital to companies with lower data privacy risks. Responsible investing as a market force is poised to grow dramatically: BlackRock, the world’s largest asset manager, forecasts that responsible investing strategies will comprise 21% of total fund assets by 2028, up from only 3% today.

Responsible investing involves the integration of environmental, social and governance (“ESG”) factors into investment processes and decision-making. Many investors recognize that ESG information about companies is vital to understand a company’s business model, strategy and management quality. Several academic studies have shown that good corporate sustainability performance is associated with good financial results and superior investment returns. The best known ESG factors having financial relevance are those related to climate change. The reason for this is that climate change is no longer a hypothetical threat, but one that is real with multi-billion dollar consequences for investment portfolios.

Why Do ESG Investors Care About Data Privacy?

ESG investors are becoming increasingly focused on data privacy issues. Under the ESG framework, data privacy is considered a human rights issue – falling under the “S” of ESG. Privacy is a fundamental human right, according to international norms established by the United Nations and by the US and EU constitutions, but it is increasingly at odds with the business models of technology companies. As these companies have become more reliant on personal data collection, processing, and distribution, they have faced increased scrutiny from users and regulators, heightening reputational, litigation, and regulatory risks.

Data has been dubbed the “new oil”, the commodity that powers the digital economy. But, as investors are finding, scandals caused by privacy breaches can be just as damaging to tech behemoths as oil spills are to fossil fuel companies. Facebook-Cambridge Analytica was the tech industry’s Exxon Valdez moment for data privacy: $120 billion was wiped off Facebook’s market value in the aftermath of the scandal. Many of the sellers were ESG investors who sold the stock because of what they perceived as Facebook’s poor data stewardship.

For ESG investors, data privacy risk has become a crucial metric in assessing the companies in which they invest. ESG funds are pushing companies to be more transparent in their data-handling processes (collection, use and protection) and privacy safeguards with shareholders. ESG investors want companies to be proactive and self-regulate rather than wait for government involvement, which often tends to be overbearing and ultimately, more damaging to long-term profitability.

How ESG Investors Advocate for Data Privacy

ESG investors have three levers to advocate for stronger privacy safeguards – one carrot and two sticks. The first is dialog with senior management. As shareholders and/or potential shareholders, ESG investors are given the opportunity to meet regularly with the CEO, CFO, and other key executives. ESG investors use their management face time to discuss business opportunities and risks, of which privacy is top of mind. ESG investors can highlight any deficiencies in privacy policies (relative to what they see as industry best practice) and advocate for increased management and board oversight, spending on privacy and security audits and staff training, and a shift in executives’ mindset toward designing privacy into their products and services. The key message ESG investors convey to tech executives is that companies that are better at managing privacy risks have a lower probability of suffering incidents that can meaningfully impact their share price. Any direct incremental expense associated with privacy risk mitigation is minuscule (in dollar terms) compared to the benefit of the higher share price valuation that is associated with lower risk.

As demonstrated by the Facebook-Cambridge Analytica share price sell-off in mid-2018, ESG investors’ second lever is to vote with their feet and sell their shares if companies fall short of data privacy expectations. Large share price declines are never pleasant, but they are often temporary. As long as business model profitability is not permanently impaired, the share price will eventually recover in most cases. Management may then not feel enough pain to see through the hard work of implementing the technical and cultural changes required to adequately protect their users’ data. This is when ESG investors’ third lever can be deployed. Acting in concert with other shareholders, ESG investors can engage in a proxy fight and vote to replace the company management and/or board with one more focused on data privacy concerns. The mere threat of a proxy fight has proved to be a powerful catalyst for change at many companies across many industries. While this has yet to happen specifically over data privacy, given the growing market power of ESG investors and their focus on privacy issues, that day is likely to come sooner rather than later.


Data privacy researchers and advocates should establish relationships with ESG investors, ESG research firms (such as Sustainalytics), and influential proxy voting advisory firms (Institutional Shareholder Services and Glass Lewis) to highlight concerns, make recommendations, and mold the overall data privacy conversation at publicly traded technology companies. Data privacy advocacy through ESG investors is a more direct, and likely much faster, route to positive (albeit incremental) change than litigation or regulation.

The Privacy Tradeoff

By John Pette | March 31, 2019

I often see privacy referenced as an all-or-nothing proposition, framed as whether one has it or one does not. In the realm of data, though, privacy exists on a continuum. It is a tradeoff between the benefits of having data readily available and the protection of people’s privacy. There is tremendous gray area in this discussion, but some things are clear. Few would argue that all Social Security numbers should be public. Things like people’s names and addresses are less clear. It is easy to argue that this information has always been publicly available in America via the White Pages. This is not a valid argument, as it ignores context. While that information was certainly available, the internet was not. Name, phone, and address records were not collected in one location; they existed only at the local level, and were not digitized. As such, there were limits to the danger of dissemination. There was also only so much a bad actor could do with the information. In the modern world, anyone can use these basic data elements to commit fraud from anywhere in the world. The context has changed, and the need to protect information has changed with it.

Of course, to what extent data should be protected is also a gray area. Technology and, arguably, society benefit greatly from data availability. People want Waze to work reliably. Many of those same people probably do not want Google to track their locations. It is easy to go too far in either direction. These sorts of situations should all have privacy assessments to evaluate the benefits and risks.

The privacy tradeoff is particularly tricky in government, which has the responsibility for protecting its citizens, but also an obligation of transparency. In studying public crime data from all U.S. municipalities with populations of more than 100,000, I uncovered enormous differences in privacy practices. Some cities made full police reports publicly available to any anonymous user, exposing the privacy details of anyone involved in an incident. Others locked down all data under a blanket statement like, “All data are sensitive. If you want access to a report, file a FOIA request in person.” In the latter case, the data are certainly protected, but the police departments provide no data of value to their citizens. At the risk of making a fallacious “slippery slope” argument, I fear the expansion of government using privacy as a catch-all excuse for hiding information and eliminating transparency. The control of information is a key element of any authoritarian regime, and it is easy to reach that point without the public noticing.

The Freedom of Information Act (FOIA) is intended to provide the American public transparency in government information. It is a flawed system with good intentions. Having worked in an office responsible for FOIA responses for one government bureau, I have seen both sides of FOIA in action. When people discuss their FOIA requests publicly, it is generally in the form of complaints, and usually in one of two contexts:

  1. “They are incompetent.”
  2. “They’re hiding something.”

Most of the time, no one is intentionally hiding anything, though that makes for the most convenient conspiracy theories. In reality, there is an unspeakable volume of FOIA requests. Records are not kept in any central database, so each response requires any involved employee to dig through their email, and their regular jobs are already full-time affairs. Then, each response goes through multiple legal reviews to redact privacy data of U.S. citizens. Eventually, this all gets packaged, approved, and delivered to the requestor. It is far from a perfect system. However, it does, to a sufficient degree, serve its original intent. As long as FOIA is in place and respected, I do not see the information control aspect of government devolving into authoritarianism.

What is the proper balance? This is the ultimate question in the privacy tradeoff. Privacy risk should be assessed with every new technology or application that could contain threats of exposure, and the benefits should always outweigh those risks to the public. If companies provide transparency in their privacy policies and mechanisms for privacy data removal, the benefits and risks should coexist harmoniously.

The Bias is Real

By Collin Reinking | March 31, 2019

In 2017, after a very public controversy in which a Google employee wrote about Google’s “ideological echo chamber”, a poll conducted by Digital Examiner found that 57.1% of respondents believed search results were “biased” in some way. In the years since, Google and other big tech companies have increasingly found themselves at the center of public debate about whether their products are biased.

Of course they are biased.

This is nothing new.

That doesn’t mean it can’t be a problem.

Search is Biased
In their purest form, search engines filter the massive corpus of media hosted on the world wide web down to just the selections that relate to our desired topic. Whatever bias that corpus has, the search engine will reflect. Search engines are biased because the Internet is made by people, and people are biased.

This aspect of bias rose to the international spotlight in 2016 when a teenager from Virginia posted a video showing how Google’s image search results for “three white teenagers” differed from the results for “three black teenagers”. The results for “three white teenagers” were dominated by stock photos of three smiling teens, while the results for “three black teenagers” were dominated by mugshots (performing the same searches today mostly returns images from articles referencing the controversy).

In its response to the controversy, Google asserted that its search engine results are driven by which images are found next to which text on the Internet. In other words, Google was only reflecting the corpus its search engine was searching over. Google didn’t create the bias; the Internet did.

This is Not New
Before the Internet there were libraries. Before search engines there were card catalogs, many of which relied on the Dewey Decimal Classification system. Melvil Dewey was a serial sexual harasser whose classification system reflected the racism and homophobia, along with other biases, that were common in the dominant culture at the time of its invention in 1876. If you had searched the Google of 1919 for information about homosexuality, you would have landed in the section for abnormal psychology, or similar. Of the 100 numbers in the system dedicated to religion, 90 of them covered Christianity. Google didn’t invent search bias.

Pick your Bias
Do we want Google, or any other company, to try to filter or distort our view of the corpus? This is the first question we must ask ourselves when we consider the conversation around “fixing” bias in search. Some instances clearly call for action, such as Google’s early image-labeling efforts failing to adequately distinguish images of African Americans from images of gorillas. Other questions, like how to handle content that some might consider political propaganda or hate speech, are murkier and would require Google to serve as an arbiter of truth and social norms.

But We Know The Wrong Answer When We See It
Google is currently working to build an intentionally censored search engine, Dragonfly, to allow itself to enter the Chinese market. This project, which is shrouded in more secrecy than usual (even for a tech company), is the wrong answer. By developing a robust platform for managing censorship, Google is pouring grease onto the slippery slope. With the current political climate, both here in the United States and around the globe, it is not hard to imagine actors of all political stripes looking to exert more control over the flow of information. Developing an interface for bias to exert that control is not a solution; it’s a problem.

A New Danger To Our Online Photos
By Anonymous | March 29, 2019

This is the age of photo sharing.

We humans have offloaded some of our socialization needs onto posting our captured moments online. Those treasured pictures on Instagram and Facebook fulfill many psychological and emotional needs: keeping in touch with our families, reinforcing our egos, collecting our memories, and even keeping up with the Joneses.

You knew what you were doing when you posted your Lamborghini to your FB group. Photo credit to @Alessia Cross

We do this even when the dangers of posting photos appear, at times, to outweigh the benefits. Our pictures can be held for ransom by digital kidnappers, used in catfishing scams, used to power fake GoFundMe campaigns, or gathered up by registered sex offenders. Our photos can expose us to real-world perils such as higher insurance premiums, real-life stalking (using location metadata), and blackmail. That doesn’t even include activities that aren’t criminal but still expose us to harm, like our photos being used against us in job interviews, being taken out of context, or being used to embarrass us years later. As they say, the internet never forgets.
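The location-metadata risk above is very concrete: many phones embed GPS coordinates in a photo’s EXIF data as (degrees, minutes, seconds) values, which anyone who downloads the photo can convert into a point on a map. A minimal sketch of that conversion, with illustrative coordinates rather than data from any real photo:

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style GPS (degrees, minutes, seconds) plus a
    hemisphere reference ("N"/"S"/"E"/"W") to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value

# Illustrative coordinates (roughly downtown San Francisco):
lat = dms_to_decimal(37, 46, 29.64, "N")   # ≈ 37.7749
lon = dms_to_decimal(122, 25, 9.84, "W")   # ≈ -122.4194
```

Libraries such as Pillow can pull the raw GPS tags out of an uploaded JPEG; the practical defense is to strip metadata before posting, and some (but not all) platforms do this for you.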

As if all that weren’t enough, our private photos are now being used by companies to train their algorithms. According to an article in Fortune, IBM released a collection of nearly a million photos that were scraped from Flickr and then annotated to describe the subjects’ appearance. IBM touted the collection as a way to help eliminate bias in facial recognition. The pictures were used without consent from the photographers and subjects; IBM relied on Creative Commons licenses to use them without paying licensing fees.

IBM has issued the following statement:

IBM has been committed to building responsible, fair and trusted technologies for more than a century and believes it is critical to strive for fairness and accuracy in facial recognition. We take the privacy of individuals very seriously and have taken great care to comply with privacy principles, including limiting the Diversity in Faces dataset to publicly available image annotations and limiting the access of the dataset to verified researchers. Individuals can opt-out of this dataset.

Opting out, however, is easier said than done. To remove any images, photographers must email IBM links to the specific images they would like removed, which is difficult given that IBM has not revealed the usernames of any users whose photos it pulled.

Given all the dangers our photos are already exposed to, it might be easy to dismiss this. Is a company training models on your pictures really more concerning than, say, what your creepy uncle is doing with downloaded pictures of your kids?

Well, it depends.

The scary part of our pictures being used to train machines is how much we don’t know. We don’t know which companies are doing it, and we don’t know what they are doing it for. They could be doing it for a whole spectrum of purposes, from the beneficial (making camera autofocus algorithms smarter) to the innocuous (detecting whether someone is smiling) to the iffy (detecting whether someone is intoxicated) to the ethically dubious (detecting someone’s race or sexual orientation) to the downright dangerous (teaching Terminators to hunt humans).

It’s all fun and games until your computer tries to kill you. Photo by @bwise

Not knowing means we don’t get to choose. Our online photos are currently treated as a public good, usable for any conceivable purpose, even purposes that we not only wouldn’t support but that might actively harm us. Could your Pride Parade photos be used to train detection of sexual orientation? Could insurance companies use your photos to train detection of participation in risky activities? Could T-1000s use John Connor’s photos to work out what Sarah Connor would look like? Maybe these are extreme examples, but it is not much of a leap to think there are companies developing models you would find objectionable. And now your photos could be helping them.

All of this is completely legal, of course, though it goes against the principles laid out in the Belmont Report. It disregards consent (Respect for Persons), it provides no real advantage to the photographers or subjects (Beneficence), and all of the benefits go to the companies exploiting our photos while we absorb all of the costs (Justice).

With online photo sharing, a Pandora’s box has been opened and there is no going back. As much as your local Walgreens Photo Center might wish otherwise, wallet-sized photos and printed 5×7 glossies are things of the past. Online photos are here to stay, so we have to do better.

Maybe we can start with not helping Skynet.

Hasta la vista, baby.


Price, Emily. “Millions of Flickr Photos Were Scraped to Train Facial Recognition Software.” Fortune, 12 Mar. 2019.

Apple’s Privacy Commercial: A Deconstruction
By Danny Strockis | March 29, 2019

On March 14, Apple released its most recent advertisement, ‘Privacy on iPhone – Private Side’. Reflecting classic Apple style, it’s a powerful piece of subliminal advertising that is both timely and emotional. In 54 short seconds, Apple comments on a variety of privacy matters and not-so-subtly positions the company and its products as the antidote to the surveillance economy run by the likes of Google and Facebook.

Privacy is a notoriously difficult concept to define; there’s not a single way to capture its full essence. So let’s have a closer look at Apple’s commercial and break down the privacy messages within. I’ll also touch on implications of the commercial as a whole.

0:03 – Keep out

Apple starts out by touching on the privacy of the home. A home is a hugely valued center of privacy, where people feel most comfortable. When the privacy of a home is violated, harsh reactions often follow. Examples of recent home privacy violations include the introduction of always-on personal assistants like Google Home, Amazon’s ability to deliver packages inside your home, and Google’s Wi-Fi-sniffing Street View cars.

This scene simultaneously addresses what lawyers Warren and Brandeis called, in 1890, “the right to be let alone”. When we feel that our solitude has been violated against our wishes, we often claim a privacy violation has taken place. Privacy expert Daniel Solove would identify these as violations of “Surveillance” or “Intrusion”.

0:08 – The eavesdropping waitress

I personally identify with this next scene; just last week I found myself pausing a conversation with my brother to let a waitress refill my water. It’s a unique way for Apple to comment on the importance of privacy in conversations, even when those conversations take place in a public forum. We often say or post something in a forum that is public, and yet maintain a certain expectation around the privacy of our words. Many online examples exist of intrusion on conversations – for instance, Facebook reading texts to serve advertisements.

0:19 – Urinals

The most laugh-out-loud scene playfully acknowledges our desire for privacy of our physical selves and bodies. It’s an often under-appreciated part of privacy in the technology world, but with the advent of selfies, fitness trackers, and always-on video cameras, protection of people’s physical selves is an increasingly relevant subject. Solove calls privacy violations of this nature “Exposure”.

0:23 – Paper shredder

In perhaps the most direct scene, Apple succinctly addresses many important topics around data privacy. The credit application being shredded contains many pieces of highly sensitive personally identifiable information, which in the wrong hands could be used for identity theft and many other privacy violations. The scene plainly depicts our desire to keep our information out of the wrong hands and to exercise some control over our data.

Importantly, this scene also captures our desire to have our information destroyed once it is no longer needed. While many online businesses have made a habit of collecting historical information for eternal storage, policies in recent years have begun to enforce maximum retention periods and the right to be forgotten.

0:25 – Window locks

A topic closely associated with privacy, especially online, is security. When a company’s databases are breached and our personal information is leaked to hackers, we feel our privacy has been violated in what Solove would call “Insecurity” or “Disclosure”. In the offline world, we take great strides to ensure our security, like locking our windows or installing a security system in our home. In the online world, security is often far more out of our control; we are only as safe as the weakest security practices of the websites we visit.

0:31 – Traffic makeup

Interestingly, the final and longest segment of the commercial is perhaps the least obvious (but maybe that’s because I’m a male). I believe the image of a woman being watched while she applies makeup is primarily intended to describe how we don’t want creepy observers in our lives. But I like to think Apple comments on something else here: our desire to control how we are perceived in life.

Solove says that “people want to manipulate the world around them by selective disclosure of facts about themselves… they want more power to conceal information that others might use to their disadvantage.” A person’s right to control their physical appearance might be the most basic form of this desire. When the unwelcome driver in the next lane watches the businesswoman apply makeup, he steals from her the ability to control her appearance to the world.

Apple and Privacy

Apple has been heralded as a privacy- and security-conscious company, an industry leader among a sea of companies with lackluster views toward consumer privacy. A flagship example of Apple’s commitment to privacy is its early adoption of end-to-end encryption in iMessage, which protects the privacy of your written conversations on the Apple platform. Apple has also made a point of highlighting that even though it could, it has chosen not to collect information on its customers and use it for advertising or secondary uses. As the company likes to say, its customers are not its product.

Even still, Apple hasn’t been immune to privacy problems of its own. A 2014 breach of iCloud celebrity accounts made headlines. More recently, a FaceTime bug allowed callers to view the recipient’s video feed before the recipient accepted the call. Loose rules about third-party application access to user information have also come under scrutiny.

Reception for Apple’s privacy commercial has been largely positive, though some have highlighted Apple’s imperfect privacy record. But I believe the more significant event here is the promotion of privacy into the forefront of consumer advertising. Apple has long been an innovator in creative advertisements. The fact that Apple has promoted privacy so heavily shows that Apple believes privacy has reached a tipping point and has become something customers look for in purchasing decisions. This goes against previous research studies, which have shown that consumers de-prioritize privacy in favor of other factors until all other factors are equal. Privacy historically takes a distant back seat to price, convenience, appeal, and functionality.

Apple seems to think privacy concern is at an all-time high and represents a business opportunity for the company. Google search trends for the “privacy” topic would say otherwise:

Only time will tell if privacy has become enough of an issue to drive a change in Apple’s bottom line. But for their part, Apple has once again done a masterful job of distilling a highly complex range of emotions into a beautiful and powerful piece of art.

Health Insurance and Our Data
By Ben Thompson | March 29, 2019

Health insurance is a part of life. It is something most of us need and have to purchase each year whether we like it or not. If you’re lucky, this involves picking a plan offered by an employer who takes on some of the financial burden, which is helpful considering that plan prices continue to rise. Otherwise, you’re left to either choose a full-price plan from the marketplace (at least $1,000/month for a family of three) or take the risk of going uninsured, paying out of pocket for any medical expenses and rolling the dice that a catastrophe won’t bury you in major debt.

When you sign up for a health insurance plan, you probably assume that the insurance company has some of your basic data, like the data you shared when you signed up or some medical record data. You might be surprised to find out that insurance companies collect far more than basic demographic information. These companies are collecting, or purchasing, all sorts of data about people, including information on income, hobbies, social media posts, recent purchases, types of cars owned, and television viewing habits. This should be concerning. Consider that health insurance companies make more money by insuring healthy people than unhealthy ones. What if they begin to use this data to predict who is healthier and make coverage decisions from those predictions? LexisNexis, for example, collects data on people and uses hundreds of non-medical personal attributes, like those mentioned above, to estimate the cost of insuring a person, then sells this information to insurance companies and actuaries. LexisNexis says this information is not used to set plan prices, but no law prohibits it.
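To make the concern concrete, here is a deliberately toy sketch of how non-medical attributes could be folded into a cost-risk score. Every attribute, weight, and number below is hypothetical, invented purely for illustration; this is not LexisNexis’s actual model, which is proprietary and far more complex.

```python
import math

# Hypothetical weights: positive values push the predicted cost up.
WEIGHTS = {
    "owns_motorcycle": 0.8,       # proxy for risky hobbies
    "late_night_purchases": 0.3,  # proxy for lifestyle
    "gym_membership": -0.5,       # proxy for activity level
    "income_under_30k": 0.4,      # proxy for access to care
}

def risk_score(attributes):
    """Logistic score in (0, 1) built from binary non-medical attributes.
    Sums the weights of the attributes present, then squashes the sum
    through a sigmoid so the output reads like a probability-style score."""
    z = sum(WEIGHTS[a] for a in attributes if a in WEIGHTS)
    return 1 / (1 + math.exp(-z))

high = risk_score({"owns_motorcycle", "late_night_purchases"})  # above 0.5
low = risk_score({"gym_membership"})                            # below 0.5
```

The unsettling part is not the arithmetic, which is trivial, but the inputs: none of these attributes is medical, yet together they can quietly stand in for health status.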

Currently, HIPAA and the Genetic Information Nondiscrimination Act regulate only how health records are used, not other data, and even the regulations on health records are fairly lax. For example, the Genetic Information Nondiscrimination Act does not apply to life insurance. This means that a life insurer can use your genetic data to alter your policy if you’ve had a genetic test, like 23andMe. If you refuse to share the requested data after having a test done, the insurer can legally terminate your policy. More generally, there are no laws prohibiting insurance companies from collecting non-health-related data about you or governing how they use it. It is all fair game.

When you use the internet, you don’t assume that your actions could influence your ability to get fair health coverage. You’re not anticipating that data brokers are tracking your every action, attempting to infer as much as possible about you, often getting it wrong, and selling it all to insurance companies. There is ample evidence that the data these brokers compile is frequently incorrect. We need to start demanding policies that regulate what data insurance companies can collect and how they can use it.

With the passage of the GDPR in the EU, an EU citizen can legally request to view all of the data an insurer holds on them, request that it be deleted from the insurer’s databases, and/or make corrections to it. It is time for the U.S. to implement similar regulations. These straightforward rights would move the U.S. a long way toward ensuring that everyone has access to fair health coverage and control over their personal data.

Allen, Marshall. “Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates.” ProPublica, 17 Jul. 2018.

Andrews, Michelle. “Genetic Tests Can Hurt Your Chances Of Getting Some Types Of Insurance.” NPR, 7 Aug. 2018.

Leetaru, Kalev. “The Data Brokers So Powerful Even Facebook Bought Their Data – But They Got Me Wildly Wrong.” Forbes, 5 Apr. 2018.

Miller, Caitlyn Renee. “I Bought a Report on Everything That’s Known About Me Online.” The Atlantic, 6 Jun. 2017.

Morrissey, Brian, et al. “The GDPR and Key Challenges Faced by the Insurance Industry.” KPMG, Feb. 2018.

Probasco, Jim. “Why Do Healthcare Costs Keep Rising?” Investopedia, 29 Oct. 2018.

Song, Kelly. “4 Risks Consumers Need to Know About DNA Testing Kit Results and Buying Life Insurance.” CNBC, 4 Aug. 2018.