Hirevue is looking to expand its profits, and you are the price

Tian Zhu | June 30, 2022

Insight: The video interview giant Hirevue’s AI interviewer has helped reject more than 24 million candidates for reasons that only the AI knew. More alarmingly, candidates may unknowingly contribute to their own rejections through the data they provide.

Recruitment has always been a hot topic, especially after the Great Resignation that followed the COVID-19 outbreak. How to find the right candidates with the right talent for the right job has been the top concern for companies eagerly battling the loss of talent.

One important factor that causes talent loss during the hiring process is human bias, whether intentional or unintentional. The video interview giant Hirevue thought to use AI to combat bias in the hiring process. Machines can’t be biased, right?

We all know the answer to that question. Though AI may not exhibit the same types of biases that humans do, it has its own list of issues. Clear concerns existed around the AI’s transparency, fairness, and accountability.

First, the algorithm was not transparent at all. Hirevue could not provide independent audits of the algorithms that analyzed a candidate’s video, including facial expressions and body gestures, to produce the final hiring decision. Second, there was no indication that the algorithm was fair toward candidates with the same expertise but different demographic backgrounds. The theory linking facial expressions to a candidate’s qualifications is full of flaws; candidates with the same qualifications and answers could be scored differently simply because of their eye movements. Third, it was unclear whether the company held itself accountable for the decisions made by the AI. The company even implied that the collection and usage of interview data were solely for the “employer,” yet it is unknown whether Hirevue itself gains access to this data, with permission from the employers, for various purposes.

Hirevue was challenged by the Electronic Privacy Information Center, which filed a complaint with the FTC regarding the company’s “unfair and deceptive trade practices”. The company has since stopped using any algorithms on data other than the candidate’s speech.

With the strong pushback against the robot interviewer, Hirevue limited the scope of its AI to only the voice during interviews. But note that Hirevue did not start as an AI company, but as a simple video interview company. It’s the company’s fear of missing out on AI and big data that drives it to squeeze out the value of your data, desperately trying to find more business value and profit in every single drip.

Such a scenario is not unique to Hirevue. In 2022, AI is closer to people than you may think. People are no longer just curious about it but expect AI to help them in their daily lives. Uber, for example, would not have been possible without the heuristics behind optimally matching drivers with riders. Customers expect AI in their products. The companies that provide the capability race ahead, while those that don’t naturally fall behind.

There are companies out there just like Hirevue, sitting on a pile of data, trying to build some “quick wins” so as not to miss out on the AI trend. There’s just one problem: the data that customers provided was not supposed to be used this way. It is a clear violation via secondary use of data, with all the problems mentioned in the previous sections.

The year 2022 is no longer a year in which AI can grow rampantly, free of both legal and ethical constraints. A suggestion for all companies that want to take advantage of their rich data: be transparent about your data and algorithmic decisions, be fair to all stakeholders, and be accountable for the consequences of your AI product. In-house “quick wins” should never make it out to the public without careful consideration of each of these points.


Is Netflix’s Recommendation Algorithm Making You Depressed?

Mohith Subbarao | June 30, 2022

Netflix’s sophisticated yet unethical recommendation algorithm keeps hundreds of millions of people momentarily happy but perpetually depressed. Netflix binge-watching is ubiquitous in modern-day society. The normalcy of this practice makes it all the more imperative to understand the ethics of an algorithm that affects millions of people in seemingly innocuous ways. Before examining the long-term negative effects of such an algorithm, it is important to understand how it works. Netflix’s algorithm pairs aggregate information about content popularity and audience with a specific consumer’s viewing history, ratings, time of day of viewing, devices used, and length of watching time. Using this information, the algorithm ranks your preferred content and presents it in row format for easy watching. It is important to note that this is an intentionally vague summary from Netflix, as the specifics of the algorithm have famously been kept under wraps.
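
The real ranker is proprietary, but the general shape described above, blending a per-user signal with aggregate popularity into a ranked row, can be sketched in a purely illustrative way. Every signal name and weight below is invented for demonstration and is not Netflix’s actual model:

```python
# Purely illustrative sketch: the real Netflix ranker is proprietary.
# Signal names and weights below are invented for demonstration only.

def rank_titles(titles, user_affinity, popularity,
                w_affinity=0.7, w_popularity=0.3):
    """Score each title by blending a per-user affinity signal (derived
    from viewing history, ratings, etc.) with aggregate popularity, then
    sort best-first into a 'row' of recommendations."""
    scored = []
    for t in titles:
        score = (w_affinity * user_affinity.get(t, 0.0)
                 + w_popularity * popularity.get(t, 0.0))
        scored.append((t, score))
    return [t for t, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

row = rank_titles(
    ["Drama A", "Comedy B", "Thriller C"],
    user_affinity={"Drama A": 0.9, "Comedy B": 0.2, "Thriller C": 0.6},
    popularity={"Drama A": 0.3, "Comedy B": 0.8, "Thriller C": 0.7},
)
```

Even in this toy form, the design choice matters: weighting personal affinity above popularity is what makes the resulting row feel tailor-made, and therefore hard to stop watching.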

Despite the secrecy, or maybe because of it, the algorithm has been massively successful. Netflix’s research has found that consumers take roughly a minute to decide on content before giving up on the service, so the algorithm is chiefly responsible for customer retention. The company has found that roughly eighty percent of the content watched on Netflix can be directly linked to the recommendation algorithm. Chief Product Officer Neil Hunt went as far as to say that the algorithm is worth over a billion dollars a year to Netflix. It is fair to say that the algorithm keeps users momentarily happy by constantly serving them a new piece of content to enjoy, but the long-term effects of this algorithm may not be as rosy.

A peer-reviewed study conducted at the University of Gujrat, Pakistan investigated these long-term negative effects. The researchers gathered over a thousand people spanning a range of ages, genders, education levels, and marital statuses and found that the average time spent watching streaming content was ~4 hours a day, with over thirty-five percent of participants watching more than 7 hours a day. This alone lends correlational credence to the success of the Netflix algorithm. The study found statistically significant correlations between time spent binge-watching television and depression, anxiety, stress, loneliness, and insomnia. While more experimental research would be needed to provide evidence of causation, a correlational study with such significant effects raises eyebrows. These findings show that the success of Netflix’s recommendation algorithm is correlated with a host of mental health issues.

These findings raise an ethical question: is Netflix’s recommendation algorithm actually unethical? To have a framework for answering such a question, we can use the principle of Beneficence from the Belmont Report. The Principle of Beneficence states that any research should aim to maximize possible benefits and minimize potential harms; furthermore, the research should consider both the short term and the long term. While Netflix is a for-profit company, its recommendation algorithm still falls under the umbrella of research; thus, it can fairly be assessed using this principle. Netflix may increase short-term benefits for customers, such as a dopamine rush and/or an enjoyable evening with family. However, the algorithm’s intention to increase binge-watching increases the potential harm of long-term mental illness for its customers. Therefore, it can be argued that Netflix’s recommendation algorithm does not meet the ethical standard of beneficence and may truly be causing harm to millions. It is important for society to hold these companies accountable and keep a closer eye on their practices and their effects on humanity at large.


APA References
Lubin, Gas (2016). How Netflix will someday know exactly what you want to watch as soon as you turn your TV on. Business Insider. www.businessinsider.com/how-netflix-recommendations-work-2016-9
McAlone, Nathan (2016). Why Netflix thinks its personalized recommendation engine is worth $1 billion per year. Business Insider. www.businessinsider.com/netflix-recommendation-engine-worth-1-billion-per-year-2016-6
Netflix (2022). How Netflix’s Recommendations System Works. help.netflix.com/en/node/100639
Raza, S. H., Yousaf, M., Sohail, F., Munawar, R., Ogadimma, E. C., & Siang, J. (2021). Investigating Binge-Watching Adverse Mental Health Outcomes During Covid-19 Pandemic: Moderating Role of Screen Time for Web Series Using Online Streaming. Psychology research and behavior management, 14, 1615–1629. doi.org/10.2147/PRBM.S328416
The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1979). The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Retrieved May 19, 2022 from www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf
Wagner, David (2015). Geekend: Binge Watching TV A Sign Of Depression?

Humans dictated by computer system in China

Anonymous | June 24, 2022

Since the COVID-19 outbreak in early 2020, the Chinese government has used a color-coded digital health system to monitor and dictate its citizens’ movements based on their personal data.

The different UI interface of the health software.

This health software issues each Chinese citizen a colored health code (green, yellow, or red), and based on the color, it determines whether they are allowed to leave the house. “Anyone with a red or yellow code is not allowed to travel. A red code means you either have or likely have the coronavirus, while a yellow code means you have had contact with another infected person.” (Ankel, 2020) To generate the color, the software has to collect very personal and sensitive information from each person. The software is available on both WeChat (an instant-messaging app) and Alipay (a mobile payment platform), and together these apps reach more than 95% of the population in China. Thus, it is important to evaluate the privacy policy of this health software.
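
The actual decision logic was never disclosed; as a minimal sketch of just the color rule reported by Ankel (the function and parameter names are invented, and the real system drew on far more data), it might look like:

```python
# Illustrative only: the real health-code algorithm was never published.
# This sketch encodes only the reported color rule; the actual system
# reportedly used location tracking, questionnaires, and other data.

def assign_health_code(infected_or_likely: bool,
                       contact_with_infected: bool) -> str:
    """Map the two reported risk conditions to a color code:
    red    -> has or likely has the coronavirus (travel barred)
    yellow -> had contact with an infected person (travel barred)
    green  -> neither condition applies (free to travel)."""
    if infected_or_likely:
        return "red"
    if contact_with_infected:
        return "yellow"
    return "green"
```

The opacity criticized in this post is precisely that citizens could not inspect anything like the conditionals above, nor the data feeding them.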

I will use Solove’s taxonomy of privacy to analyze the health software’s privacy policy. The framework describes potential breaches of privacy through the lens of a user at the points of information collection, processing, dissemination, and possible invasion. At the data collection stage, the privacy policy suggests interrogation by the Chinese government, with barely any process for acquiring consent. “Only 3 cities in 14 provinces and 300 cities have informed consent and privacy protection policies.” (PengPai, 2020) At the same time, I did find a great deal of surveillance: the software tracks your cellphone location to make sure that you have not been to places you are not allowed. “Experts and activists have criticized China’s mass surveillance and are questioning what else this data is being used for”. (Ankel, 2020)

A passenger shows a green QR code on his phone proving his health status to security upon arrival at Wenzhou railway station.

At the data processing stage, I am concerned about the secondary use of data, because the retention policy is completely missing; as a user of the app, I can see all my records since COVID-19 started, so the data is being kept forever. “So, there’s a concern that although this data is being perhaps collected for a legitimate purpose, that it could potentially and eventually be misused in ways that we can’t predict right now.” (Ankel, 2020) Identification is also an issue: the data collected contains details like body temperature, health background, and contact information. With all of that information, the Chinese government could easily track down anyone.

In the information dissemination stage, there is no sign of the Chinese government breaching confidentiality, since the data collected all belongs to Chinese citizens, and protecting citizens is always a priority for any country. The privacy policy does have issues regarding increased accessibility: since the health system is deployed on WeChat and Alipay, those two technology companies also have access to the data, meaning many engineers who do not work for the government can access it, which increases the accessibility of the original personal data. The good news is that I did not see any appropriation or distortion. However, when I focused on invasion, researchers have, based on the color code, “seen the use of big data to predictive policing and detain people really for no reason.” (Ankel, 2020)

All in all, the privacy policy of the health software for COVID-19 monitoring has many areas that need improvement. Asking citizens for consent should be the first thing to add. Although the Chinese government is protecting the data and using it for the citizens’ good, it should still provide more detailed guidelines on data collection and data usage.

Ankel, S. (2020, April 7). As China lifts its coronavirus lockdowns, authorities are using a color-coded health system to dictate where citizens can go. here’s how it works. Business Insider. Retrieved July 1, 2022, from www.businessinsider.com/coronavirus-china-health-software-color-coded-how-it-works-2020-4#to-start-traveling-again-people-have-to-fill-out-a-questionnaire-that-asks-for-details-like-body-temperature-and-health-background-the-software-then-analyzes-it-and-generates-a-color-code-green-yellow-or-red-that-identifies-a-persons-health-status-5

Only 3 places in 14 provinces and cities have informed consent and privacy protection clauses. 澎湃新闻. (2020). Retrieved July 1, 2022, from www.thepaper.cn/newsDetail_forward_7210904

Data Brokers: How far has your data gone?

Oscar Casas | June 24, 2022

As individuals’ online data profiles fall into data brokers’ hands, just how far can one’s online presence travel without their consent? Data brokers are companies that track and collect user data from different sources, then process it and license it to other organizations. The problem starts when we delve into just how much data brokers can hold on individuals and whom they are allowed to sell it to. This lack of transparency has left many individuals exposed to fraud, scams, and situations no one would have consented to had they known what was happening behind the scenes. Data brokers are now more than ever in a position where share price is directly at odds with the ethical boundaries and concerns of their user base, and this spiral, although being tackled with new regulation, has a long way to go before users are safe to surf the web in peace.

With the potential overturn of Roe v. Wade on the horizon, we should be especially aware of who owns the data about our reproductive health. Cycle and ovulation apps, like Flo, Spot, Cycles, and others, have been gaining popularity in recent years. They range from simple menstrual cycle calendars to full-blown ML-empowered pregnancy “planners”. The ML support usually comes with a premium subscription. The data they collect ranges from name, age, and email to body temperature, pregnancy history, and even your partner’s contact info. Most health and body-related data is entered manually by the user or through a consented linkage to other apps and devices such as Apple HealthKit and Google Fit. Although there is not much research on the quality of their predictions, these apps seem helpful overall, even if just to make people more aware of their ovulation cycles.

The common claim in these apps’ privacy policies is that the information you share with them will not be shared externally. This, however, comes with caveats: they do share de-identified personal information with third parties and are also required to share data with law enforcement upon receiving a legal order. Some specifically state that they would only share your personal information (i.e. name, age group, etc.) and not your health information if required by law. However, take that with a grain of salt, as one of the more popular period-tracking companies, Flo, shared its users’ health data for marketing purposes from 2016 to 2019 without putting its customers in the know. And that was just for marketing; it is unclear whether they can refuse to share a particular user’s health information, such as period cycles, pregnancies, and general analytics, under a court order.

This becomes an even bigger concern in light of the current political situation in the U.S. I am, of course, talking about the potential overturn of Roe v. Wade. You see, if we lose federal protection of abortion rights, every state will be able to impose its own rules concerning reproductive health. This implies that some states will most likely prohibit abortion from very early in the pregnancy, whereas currently the government can fully prohibit it only in the last trimester. People living in states where abortion rights are limited or nonexistent will be left with three options: giving birth, getting an abortion secretly (i.e. illegally under their state’s law), or traveling to another state. There is a whole Pandora’s box of classism, racism, and other issues concerning this narrow set of options that I won’t be able to discuss, since this post has a word limit. I will only mention that the set becomes even more limited if you simply have fewer resources or are dealing with health concerns that will not permit you to act on one or more of these “opportunities”.

However, let’s circle back to that app you might be keeping as your period calendar or pocket-size analyst of all things ovulation. We, as users, are in a zone of limbo: without sharing enough information, we can’t get good predictions; but by oversharing, we risk entrusting our private information to a service that might not be as protective of it as it implied. Essentially, the ball is still in your court, and you can always request the removal of your data. But if you live in a region that sees abortion as a crime, beware of who may have a little too much data about your reproductive health journey.


[1] cycles.app/privacy-policy
[2] flo.health/privacy-portal
[3] www.cedars-sinai.org/blog/fertility-and-ovulation-apps.html
[4] www.nytimes.com/2021/01/28/us/period-apps-health-technology-women-privacy.html
[5] www.apkmonk.com/app/com.glow.android/
[6] www.theverge.com/2021/1/13/22229303/flo-period-tracking-app-privacy-health-data-facebook-google

When Government Surveillance Requires Surveillance Too

Audrey Lei | June 30, 2022

Insight: Government usage of surveillance data requires some form of “technological due process” to mitigate overreach and ensure the fair and ethical usage of its citizens’ information.

In modern times, it is increasingly difficult to navigate daily life without interacting with some form of technology, whether that’s using your smartphone to make an online purchase or walking through an area with video surveillance. It’s then no surprise that governments around the world are leveraging this ever-growing technology network to monitor their citizens’ daily lives, collecting extensive personal data and utilizing artificial intelligence techniques, all aimed at crafting a profile of an individual’s activities and behaviors. This massive data collection comes at the expense of citizens’ digital privacy and algorithmic fairness and, if abused or done without oversight, can be used for nefarious or illegitimate purposes, as highlighted in the two examples below.

Image 1: Protestors dissenting against online surveillance

One global superpower, China, has become notorious for expansive, dominating surveillance measures that go beyond the standard of what we’ve come to expect. An article published in June 2022 in the New York Times by Qian et al. estimates that over half of the world’s one billion surveillance cameras are located in China and can identify an individual’s gender and race and even distinguish the type of clothing they are wearing, such as a mask or glasses. These facial recognition cameras have encroached upon private spaces such as residential buildings, allowing law enforcement to exercise control over citizens’ activities in more intimate settings. More shockingly, however, China’s surveillance measures also include taking retina scans, voice prints, and DNA samples from the public, such that law enforcement could generate a “personal dossier… for anyone [and] that could be made accessible to officials across the country” [1]. Yet, despite the usage of such highly private data, there exists a lack of transparency surrounding the details of data collection and dissemination, “a lack of record-keeping audit trails, making review of the law and facts supporting a system’s decisions impossible” [3].

Image 2: Citizens on the subway under video surveillance

However, China isn’t the only country implementing these types of surveillance techniques. In the United States, Chula Vista has become one of the most surveilled cities, due in part to its close proximity to the U.S.-Mexico border. According to a KPBS article by investigative reporter Amita Sharma, Chula Vista’s geographical location invites an extra layer of scrutiny from U.S. Customs and Border Protection. The surveillance measures — device tracking, facial recognition, license plate readers, among many others — are standard, but the ways in which the surveillance data is being utilized are not. While citizens may assume that their data is being used for their benefit, this may not always be the case; in late 2020 it was reported that “Chula Vista police shared data collected from its license plate readers with immigration officials,” contradicting California Senate Bill 54, which prohibits local law enforcement agencies from aiding in immigration enforcement [2]. In this instance, citizens’ data was shared improperly due to a lack of governmental oversight and transparency.

At a minimum, the governmental usage of surveillance data should include some form of “technological due process” [3], such as an independent audit, to ensure that there is a fair system of checks and balances to protect individuals from harm. If governments arbitrarily collect and utilize data with little to no oversight, it could result in situations where data is used corruptly or for nefarious purposes, more likely to hurt its citizens than benefit them. At best, this type of surveillance could help governments detect criminal activity, identify perpetrators and reduce threats to society; but at its worst, this type of surveillance infringes upon the privacy and autonomy of innocent individuals.



[1] www.nytimes.com/2022/06/21/world/asia/china-surveillance-investigation.html
[2] www.kpbs.org/news/local/2021/12/09/chula-vista-became-most-surveilled-cities-country
[3] www.nytimes.com/roomfordebate/2014/08/06/is-big-data-spreading-inequality/big-data-should-be-regulated-by-technological-due-process

[Image 1] www.pbs.org/newshour/nation/internet-protest-fight-back-surveillance
[Image 2] www.nytimes.com/2019/12/17/technology/china-surveillance.html

Child-Proof Homes or Smart Homes? | The Modern Parenting Paradox

Carolina Lee | June 29, 2022

The smart-home industry is growing fast, and with it, some questionable data collection and processing practices[1]. What used to be stand-alone, offline devices are now interconnected and able to collect and upload vast amounts of personally identifiable information (PII) into the cloud – all from inside the comfort of our own homes.

With the boom of smart home devices, companies like Amazon began to target parents and children with their smart speakers[2]. In 2018, Amazon released the Echo Dot Kids, which they claim is intended to provide “peace of mind for parents”[3]. What they fail to mention, however, is that since its release they have been under fire for a number of questionable practices, including “listening in when it shouldn’t, and even keeping recordings made by the devices after parents have tried to delete them”[4].

Just about a year after its release, in early 2019, privacy, consumer, and children’s protection advocacy groups filed a formal 96-page complaint with the Federal Trade Commission (FTC), calling attention to privacy and ethical concerns around Amazon’s smart speaker designed for kids[5].

For a product that was supposed to help educate children, filter explicit content, and help parents and children alike automate tedious tasks, the early Echo Dot Kids mimicked and brought to life a frightening “Big Brother” dystopia. In its early stages, this smart home speaker violated (and arguably continues to violate) a series of privacy and ethical guidelines set forth by the FTC, Belmont Report, and other privacy and ethical frameworks[6].

On Alexa’s FAQ page, Amazon addresses a number of privacy concerns, assuring its users that “Alexa minimize[s] the amount of data sent to the cloud” and that “[users] can review voice recordings associated with [their] account and delete the voice recordings”[7]. Interestingly, however, under “What happens when I delete my voice recordings?”, Amazon explicitly outlines that it “may still retain other records of your Alexa interactions, including records of actions Alexa took in response to your request”[8].

Ethical and Privacy Concerns
Despite the formal complaint filed with the FTC a few years ago, the Echo Dot Kids continues to raise ethical and privacy concerns. It directly challenges the beneficence principle (Belmont Report) in that, when collecting data, it maximizes the benefit to the product and company while ignoring the harms that might come to the consumer[9]. While users of the kids’ smart speaker benefit from automated tasks and the filtering of explicit content for children, the harms associated with the collection of data could be far greater. This is because children are a vulnerable population and secondary use of the data is typically unknown to the user and/or loosely regulated[10]; furthermore, children are more susceptible and unsuspecting of how targeted marketing affects “their attitudes, beliefs, and behaviors, shaping their lives” than autonomous adults[11].

In fact, in their true form, these smart “speakers” feel a lot more like smart microphones. In the data collection context, Alexa undoubtedly enables surveillance: it records and uploads every interaction made with the product[12]. As for how children’s data is processed, shared, and used, very little is said in Amazon’s Privacy Notice[13] and Children’s Privacy Disclosure[14] – both documents point to each other “for more information” but fail to share any meaningful information on what secondary purposes children’s data might or might not serve. Similarly, parents are typically unaware of what they are consenting to when they hit the “agree” button. Lengthy and fragmented policies make it hard for any consumer to truly understand what they are subjecting themselves and their children to when agreeing to use the product.

Parting Thoughts
As technology companies advance into the parent/young-children market, they need to create better safeguards around what type of information they choose to collect and share about children. As it stands, the choice between a safer home and a smart one still exists. While it is great that these organizations have found a market in assisting parents, they need to be far more transparent, clear, and concise in their privacy and usage policies. Parents should be able to make fully informed decisions about what they are willing to subject their children to. Not only do these large companies need to fully comply with COPPA and other regulations, but it is also their obligation to address ethical and privacy concerns beyond these laws.

Examples of steps they could take include making an easier-to-read summary of their policies’ key takeaways so parents can quickly scan them and give better-informed “Parental Consent.” They could also create more transparency by giving users control through a settings/controls hub, where parents could see what type of information is being collected on their child, approve each item individually, and request the deletion of any data at any point directly from the page.

[1] Mark Lippett. Privacy, Intelligence, Agency: Security In The Smart Home. Forbes, May 5, 2022. www.forbes.com/sites/forbestechcouncil/2022/05/05/privacy-intelligence-agency-security-in-the-smart-home/?sh=6bda67594aac
[2] Lisa Eadicicco. Amazon’s New Echo for Kids Will Train Your Children to Say ‘Please.’ Time, April 25, 2018. time.com/5254163/amazon-echo-dot-kids-edition/
[3] Amazon. Echo Dot (4th Gen) Kids | Our cutest Echo designed for kids, with parental controls | Tiger. www.amazon.com/Echo-Dot-4th-Gen-Kids/dp/B084J4QQK1
[4] Zak Doffman. Amazon Slammed For Putting Kids At Risk With ‘Blatant Violation Of Privacy Laws.’ Forbes, May 9, 2019. www.forbes.com/sites/zakdoffman/2019/05/09/amazons-echo-dot-kids-accused-of-violating-privacy-laws-and-putting-kids-at-risk/?sh=3fcbda7e7e5a
[5] Campaign for a Commercial-Free Childhood (CCFC). Echo Kids Privacy. www.echokidsprivacy.com/
[6] Id.
[7] Amazon. Alexa and Alexa Device FAQs. www.amazon.com/gp/help/customer/display.html?nodeId=201602230
[8] Id.
[9] The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. U.S. Department of Health and Human Services. www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf
[10] Kate Crawford and Vladan Joler. The mystery of the Amazon Echo data. Privacy International, April 17, 2019. privacyinternational.org/news-analysis/2819/mystery-amazon-echo-data
[11] Open Access Government. The importance of protecting and regulating children’s personal data. July 24, 2019. www.openaccessgovernment.org/childrens-personal-data/69928/#:~:text=Part%20of%20the%20problem%20is,and%20behaviours%2C%20shaping%20their%20lives.
[12] See note 6.
[13] Amazon. Privacy Notice. www.amazon.com/gp/help/customer/display.html?nodeId=468496
[14] Amazon. Children’s Privacy Disclosure. www.amazon.com/gp/help/customer/display.html?nodeId=202185560

[Image 1] www.moms.com/echo-dot-kids-tips-tricks/
[Image 2] www.amazon.com/gp/help/customer/display.html?nodeId=201602230
[Image 3] www.commonsense.org/education/sites/default/files/tlr-blog/alexa-0160.png

Is someone listening to my conversation with my doctor?

Radia Abdul Wahab | July 5, 2022

Literature has shown that 43.9% of U.S. medical offices had adopted either full or partial EHR systems by 2009 [1]. Every time we visit the doctor, whether in the office or virtually, a series of sensitive pieces of information is recorded. This includes, but is not limited to, demographics, health problems, medications, progress notes, medical history, and lab data [3]. Lab data itself may not seem sensitive. However, as genome sequencing comes more and more within our reach, a lot of lab data now includes genomic information.

On the other hand, through social media and other online tools, we as individuals continuously and voluntarily share large amounts of information on the internet. This poses a huge risk of re-identification.

Additionally, by using mobile health-monitoring devices to track our health and well-being, we add yet another flood of private health information to even more databases. Figure 1 below shows a wheel of the various sources of information.

All this information, combined with emerging web sniffing/crawling technologies and modern information science, poses a huge challenge for patient privacy.

Figure 1: Sources of Health Information we share using our mobile devices [2]

Who has access to my data?
As more and more data is collected and digitized, there is a tendency to consolidate it into large big-data databases in order to enhance scientific assessment. Governments and various corporations have also made a lot of data available with the intention of advancing science. Often these datasets are fully accessible on public websites, or available for a minimal payment.
Is Re-identification really possible?
Various forms of information are collected when we go to the doctor. These include, but are not limited to, identifier attributes (such as name and SSN), quasi-identifier attributes (such as gender and zip code), and sensitive attributes (such as disease conditions or genomic data). Most of the time, the identifier attributes are "sanitized", that is, removed, before the data is made available to external parties [3].
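The gap in this sanitization can be sketched in a few lines. The records below are entirely hypothetical; the point is that dropping the identifier attributes leaves the quasi-identifiers untouched in the released record:

```python
# Illustrative sketch (hypothetical record): "sanitizing" a medical record
# removes direct identifiers (name, SSN) but leaves quasi-identifiers
# (gender, zip code, birth year) and sensitive attributes in place.

IDENTIFIERS = {"name", "ssn"}

def sanitize(record):
    """Drop direct identifier attributes; keep everything else."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}

record = {
    "name": "Jane Doe",        # identifier
    "ssn": "123-45-6789",      # identifier
    "gender": "F",             # quasi-identifier
    "zip": "02139",            # quasi-identifier
    "birth_year": 1985,        # quasi-identifier
    "diagnosis": "epilepsy",   # sensitive attribute
}

released = sanitize(record)
print(released)
# The released record still carries (gender, zip, birth_year), which is
# often enough to single a person out when joined with public data.
```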

Li et al. [3] frame the threat as follows: "when an attacker possesses a small amount of (possibly inaccurate) information from healthcare-related sources, and associate such information with publicly-accessible information from online sources, how likely the attacker would be able to discover the identity of the targeted patient, and what the potential privacy risks are."

One of our most critical misunderstandings is the belief that information from one source cannot be linked with information from a different source. With the advent of modern technologies, however, it has become quite easy for algorithms to crawl across various web pages and consolidate information.

Another area of risk is that many algorithms use "smart" techniques to bridge gaps left by missing or inaccurate information. Figure 2 below is a schematic showing a case study of such an algorithm.

Figure 2: Re-identification using various web sources. [3]

What is the process of Re-identification?
There are three main steps in re-identification: attribution, inference, and aggregation. Attribution is the collection of sensitive or identifiable information from online sources. Inference is when additional information is either "fitted" to that data or learned by algorithms. Aggregation is the combination of information from the various sources. Together, these three steps provide a clear path to re-identification. Figure 3 below shows some aspects of these processes.
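The aggregation step can be sketched as a simple join: a "sanitized" health release is matched against a public profile list on shared quasi-identifiers, and any record with exactly one match is likely re-identified. All records here are hypothetical:

```python
# Illustrative sketch of the aggregation step of re-identification:
# join a sanitized health release with public profiles on the
# quasi-identifiers (gender, zip, birth_year) they share.

def link(health_release, public_profiles):
    """Return (name, diagnosis) pairs for sanitized records whose
    quasi-identifiers match exactly one public profile."""
    matches = []
    for h in health_release:
        candidates = [
            p for p in public_profiles
            if (p["gender"], p["zip"], p["birth_year"])
               == (h["gender"], h["zip"], h["birth_year"])
        ]
        if len(candidates) == 1:  # unique match: likely re-identification
            matches.append((candidates[0]["name"], h["diagnosis"]))
    return matches

health_release = [
    {"gender": "F", "zip": "02139", "birth_year": 1985, "diagnosis": "epilepsy"},
]
public_profiles = [
    {"name": "Jane Doe", "gender": "F", "zip": "02139", "birth_year": 1985},
    {"name": "John Roe", "gender": "M", "zip": "02139", "birth_year": 1990},
]

print(link(health_release, public_profiles))  # [('Jane Doe', 'epilepsy')]
```

Real attacks replace the exact-match join with the "smart" fuzzy-matching techniques mentioned above, but the shape of the problem is the same.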

Figure 3: Process of Re-identification [4]

With the flood of health information entering the web, and with these emerging technologies, almost no aspect of our health is truly concealable. It is very important to minimize the sharing of our information on the web to the extent possible, since there is far more out there than we will ever know of. Smart technologies are reaching in and listening to all of it, and what they collect may be used against us by adversaries.


[1] Hsiao CJ, Hing E, Socey TC, Cai B. Electronic medical record/electronic health record systems of office-based physicians: United States, 2009, and preliminary 2010 state estimates. National Center for Health Statistics Health E-stat, 2010.
[2] Isma Masood, Yongli Wang, Ali Daud, Naif Radi Aljohani, and Hassan Dawood. Towards smart healthcare: Patient data privacy and security in sensor-cloud infrastructure. Wireless Communications and Mobile Computing, Volume 2018, Article ID 2143897.
[3] Fengjun Li, Xukai Zou, Peng Liu, and Jake Y. Chen. New threats to health data privacy. BMC Bioinformatics, Volume 12, Article S7 (2011).
[4] Lucia Bianchi and Pietro Liò. Opportunities for community awareness platforms in personal genomics and bioinformatics education. Briefings in Bioinformatics, Volume 18, Issue 6, November 2017, pp. 1082–1090. doi.org/10.1093/bib/bbw078

Privacy Risks in Brain-Computer Interfaces and Recommendations

By Mia Yin | March 16, 2022

1. What is a BCI?
A brain-computer interface (BCI) is a pathway connecting the brain with an external device, most commonly a computer or a robotic limb. A BCI collects and processes the brain's neurodata, which is then translated into outputs used in visualizations or as commands that tell the external interface or machine how to assist a person's behavior or modulate neural activity. Because neurodata is generated by a person's nervous system, it is also personal data.

BCIs are currently used mostly in gaming and healthcare. In the gaming industry, for example, BCIs use neurodata to let players control in-game actions with their conscious thoughts, providing greater immersion.

There are three main categories of BCIs:
a. BCIs that record brain activity;
b. BCIs that modulate brain activity;
c. BCIs that do both, also called bi-directional BCIs (BBCIs).

BCIs can be invasive or non-invasive. Invasive BCIs are implanted in the brain and enable direct communication between the brain and an external device, such as a computer. Non-invasive BCIs, by contrast, are not implanted; they are worn outside the head and can also record neurodata.

2. BCI risks: accuracy and mental privacy
BCI accuracy: The accuracy of BCI data is especially important in the healthcare industry. Patients who use BCIs depend on accurate translation to express their thoughts to doctors. Some patients also rely on BCIs to mitigate disorders; patients suffering from epilepsy, for example, rely on BCIs for relief. If a BCI processes neurodata incorrectly, patients may face severe health consequences, even death. Doctors likewise depend on accurate neurodata to provide the best treatment. The device's data and interpretation accuracy need to be verifiable, sufficient, and reliable.

Mental privacy: Since BCIs collect and process personal neurodata to infer people's thoughts and conscious or unconscious intentions, they raise new mental-privacy risks on top of the existing privacy risks surrounding personal health data. For example, some wheelchairs are controlled by BCIs: a patient can steer the wheelchair toward food simply by thinking about food. However, the same BCI can also collect information about the patient's food preferences, or at what times the patient feels hungry or thirsty. Such neurodata can reveal a great deal of biological and private information. If it is shared with other organizations, it may cause many privacy problems, such as disclosing a patient's medical condition to an employer or other public entities.

3. Technical and policy recommendations
Technical recommendation: BCIs can give users more control over the collection of their neurodata. For example, a BCI can ask the user whether they want to start the neurodata collector. This feature prevents users from switching on data collection unintentionally and gives them more control over their personal neurodata flows.
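This opt-in control can be sketched as follows. The class and method names are hypothetical, not any vendor's actual API; the point is that collection is off by default, starts only after an explicit answer, and can be revoked:

```python
# Minimal sketch of consent-gated neurodata collection (hypothetical API):
# nothing is recorded until the user explicitly opts in, and consent can
# be revoked at any time, discarding what was buffered.

class NeurodataCollector:
    def __init__(self):
        self.enabled = False   # collection is OFF by default (opt-in)
        self.buffer = []

    def request_consent(self, user_says_yes: bool) -> bool:
        """Start collection only after an explicit, affirmative answer."""
        self.enabled = bool(user_says_yes)
        return self.enabled

    def record(self, sample):
        """Silently drop samples unless the user has opted in."""
        if self.enabled:
            self.buffer.append(sample)

    def revoke(self):
        """Stop collection and discard any buffered neurodata."""
        self.enabled = False
        self.buffer.clear()

collector = NeurodataCollector()
collector.record({"channel": 1, "uv": 4.2})   # dropped: no consent yet
collector.request_consent(user_says_yes=True)
collector.record({"channel": 1, "uv": 4.3})   # kept: user opted in
print(len(collector.buffer))  # 1
```

The design choice worth noting is the default: privacy-preserving behavior should not depend on the user remembering to turn collection off.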

Policy recommendation: Privacy policies should be more transparent. A policy should tell users what data the BCI may collect, for what purposes it will be used, who controls and has access to it, and how it will be stored. Developers and regulators should clearly spell out the particular privacy risks of BCI applications and let users decide whether or not to give informed consent to use them.

4. Conclusion
A BCI is an advanced computer-based system that collects and processes a great deal of personal neurodata. Stakeholders must understand how BCIs work and what they store and translate. BCIs carry many privacy risks that may expose personal data to public entities; therefore, both technical methods and privacy policies need to be improved to protect private data and ensure it is secured and not used for any unwanted purpose.

[1] fpf.org/blog/bci-commercial-and-government-use-gaming-education-employment-and-more/
[2] fpf.org/blog/bcis-data-protection-in-healthcare-data-flows-risks-and-regulations/
[3] fpf.org/blog/bci-technical-and-policy-recommendations-to-mitigate-privacy-risks/

Data Sharing during the COVID-19 Pandemic

By Javier Romo | March 16, 2022

Patient data privacy and security is, by standard, something all healthcare organizations must guarantee. However, when a global pandemic threatens humanity, data sharing among healthcare entities is vital to understanding, controlling, and responding to the spread of a virus, as we saw in the 2020 COVID-19 pandemic. The problem, at least early in the pandemic, was that the barriers put in place before COVID-19 to isolate patient data for privacy and security created data silos and prevented a coordinated response to the crisis. In normal times we study the privacy, ethics, and legalities of data sharing, but the pandemic forced us to abandon those strongholds in the hope of saving as many lives as possible.

Why Was Data Sharing Important?

Data sharing during the pandemic does not involve only lab or vaccine data, but also demographics such as age, race, and location, and perhaps even previous diagnoses. Electronic health records allowed this data to be collected at scale. As scientists learn about a virus, they must understand whom the virus impacts most, and that generally requires curating all of this data to describe the affected population. While many of us would normally prefer that our data stay private to our healthcare organization of choice, the benefits of this practice became obvious.

First, vaccine distribution was tailored toward individuals at high risk of a serious COVID-19 reaction or more likely to become infected. For example, the first groups prioritized for the first dose were older adults, many in nursing homes or assisted-living facilities, and healthcare staff, especially at hospitals actively treating infected patients (Stieg, 2020).

Secondly, the curation of patient data allowed governments to make significant policy decisions and issue orders to reduce the spread of the virus. For example, as the data showed the pandemic spreading, governors across the United States issued shutdown orders. These decisions were based on the story the data told and the impact the virus could have on regions under threat.

What Is Happening Now?

The COVID-19 pandemic presented many lessons that will shape how the United States, and likely the world, addresses a future pandemic. For example, the National COVID Cohort Collaborative, a data-sharing project sponsored by the National Institutes of Health (NIH), is developing a database named N3C that allows participating healthcare organizations to share "entire medical records" with the database (Frieden, 2021). This framework is specific to the COVID-19 pandemic; however, it can be recreated and deployed for a new virus or disease. And while all of this sounds good during a pandemic, now that it is 2022 and the pandemic appears to be nearing its end, patient privacy concerns are reemerging. It is important that we review when data sharing, especially on this scale, is allowed, and rebuild trust in our healthcare data security and privacy.

To conclude, the COVID-19 pandemic was a shock to the healthcare system and to the world. It required rapid changes to data sharing to move data out of healthcare-system silos and into the hands of the healthcare entities and governments that could help combat the pandemic. In war, militaries use information gathered by spies, reconnaissance, or other intelligence to strategize a battle plan; healthcare systems during a pandemic need something similar, and in this case it was data gathered from electronic health records. Now that the pandemic is nearing its end, we must review what was done and rebuild trust in our healthcare data privacy. At the same time, we must research and develop contingency plans for sharing data should another pandemic threaten many lives.


[1] Frieden, J. (2021, April 23). Health Data Sharing Improved During Pandemic, but Barriers Remain. Medpage Today. www.medpagetoday.com/practicemanagement/informationtechnology/92263
[2] Stieg, C. (2020, December 14). When Dr. Fauci and other experts say you can expect to get vaccinated for Covid-19. CNBC.com. www.cnbc.com/2020/12/14/who-gets-the-covid-vaccine-first-timeline-and-priority-explained.html

Image 1: www.medpagetoday.com/practicemanagement/informationtechnology/92263
Image 2: chicagoitm.org/learn-how-to-harness-this-national-covid-19-database-for-your-research/

Operationalizing Privacy Protection through Data Capitalization

By Pow Chang | March 16, 2022

Many companies have emerged in the last two decades to provide software-as-a-service (SaaS). They harvest massive datasets and aggregate them into high-value service products. The primary business model of these data-centric organizations is to design and build a digital platform in which data and information are the primary assets, generating revenue perpetually. Data-centric companies such as Facebook, Twitter, Netflix, and Amazon have harvested petabytes of data-subject information, including PII (personally identifiable information). Since this valuable data collected on their platforms is the asset and tool that generates future cash flow, it is in their best interest to protect it and, furthermore, to comply with privacy regulations such as the GDPR and CCPA. The public wants to hold these companies accountable for privacy protection [2]; stakeholders would therefore want this data capitalized as a tangible asset in the collection and storage process to ensure financial integrity and good data governance.

There are a few plausible reasons for capitalizing PII data on the balance sheet to operationalize data privacy. First, it serves as a proxy for privacy protection: capitalized PII is reflected in the financial statements and subject to scrutiny and auditing every quarter by professional auditors. In current practice, companies usually expense the data-acquisition cost even though the data has a significant impact on their future cash flow. Expensing the acquisition cost does not capture the intrinsic value and physical presence of the PII on their books. According to the Ponemon Institute's 2020 "Cost of a Data Breach" study, the average cost per compromised PII record is $175, compared to $151 for intellectual property [8]. Capitalizing the data does not mean replacing data-acquisition costs; it adds a tangible-asset component that validates the existence of the PII.

Second, there is no practical way to quantify the actual loss and degree of damage from a data breach [7]. A good example is Equifax: on September 7, 2017, Equifax announced that it had lost the personal information of over 140 million consumers in a catastrophic data breach [3], including social security numbers, driver's license numbers, email addresses, and credit card information. Equifax eventually settled, agreeing to pay $1.38 billion, including $1 billion in security upgrades. Customers whose data was compromised could be entitled to claims of up to $20,000 in damages; however, the claims process puts the onus on the consumer to justify the claim. If Equifax had capitalized PII on its books, this could have provided a detailed assessment of the damage and supported budget planning for security technology expenditures to safeguard the PII assets.
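A back-of-envelope calculation, using only the figures cited above, shows what capitalizing PII at a per-record value would have implied for Equifax. This is illustrative accounting, not either company's actual method:

```python
# Rough sketch: value breached PII records at the Ponemon 2020 average
# per-record cost of $175 [8], applied to the ~140M Equifax records [3].

PII_COST_PER_RECORD = 175          # USD per compromised PII record [8]
EQUIFAX_RECORDS = 140_000_000      # consumers affected in the breach [3]

estimated_exposure = EQUIFAX_RECORDS * PII_COST_PER_RECORD
print(f"${estimated_exposure:,}")  # $24,500,000,000
```

At roughly $24.5 billion, even this crude per-record valuation dwarfs the $1.38 billion settlement, which is precisely the gap between consumer losses and recorded liabilities that capitalization would make visible.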

Third, since capitalized data is captured on the books, the resulting audit and transparency could deter poor corporate governance and a dishonest culture that nurtures serious conflicts of interest or even unethical behavior, as in the Facebook/Cambridge Analytica case [4]. All assets must be matched to their equivalent market value; they can be impaired or appreciate. In either case, the company must plausibly explain any adjustment reflecting an incident that materially affects the fundamentals of its business. For instance, Target paid $18.5M for the 2013 data breach that affected 41 million consumers [5]. That settlement could have been considerably higher, and could have revealed the millions of consumers' actual losses, had PII records been captured in the financial statements.

Many studies are still needed to understand the implications of using data capitalization as a proxy for strengthening privacy protection. Mulligan et al. [6] provide a comprehensive list of privacy dimensions and attributes; this framework could define the construct of the data to be capitalized. To operationalize privacy is to protect vulnerable groups and create a fair system: these organizations reap the profit, but customers bear the cost in the long term. This does not align with the Belmont Report's principle of justice, fairness in distribution [9]. Most data breaches are due to poor internal security controls, human factors, overdue patches, and known application vulnerabilities. The cost suffered by consumers is colossal and can never be adequately estimated unless PII data is capitalized as a tangible asset.


[1] www.cnbc.com/2019/07/25/how-to-claim-your-compensation-from-the-equifax-data-breach-settlement.html
[2] Nicholas Diakopoulos. Accountability in Algorithmic Decision Making. Communications of the ACM, February 2016, Vol. 59 No. 2, Pages 56-62. cacm.acm.org/magazines/2016/2/197421-accountability-in-algorithmic-decision-making/fulltext
[3] www.inc.com/maria-aspan/equifax-data-breach-worst-ever.html
[4] www.wired.com/story/cambridge-analytica-facebook-privacy-awakening/
[5] www.usatoday.com/story/money/2017/05/23/target-pay-185m-2013-data-breach-affected-consumers/102063932/
[6] Deirdre K. Mulligan, Colin Koopman, and Nick Doty (2016). Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy. Philosophical Transactions of the Royal Society A, 374.
[7] Solove, D. J. (2005). A Taxonomy of Privacy. GWU Law School Public Law Research Paper No. 129, 477.
[8] www.ponemon.org/
[9] U.S. Department of Health, Education, and Welfare (1979, April 18). The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.
[10] Pictures source: pixabay.com