Class Blog – Page 10 – Data Science W231 | Behind the Data: Humans and Values

June 20, 2022

Cycle tracking apps: what they know and who they share it with

By Kseniya Usovich | June 16, 2022

With a potential overturn of Roe v. Wade looming, we should be especially aware of who owns the data about our reproductive health. Cycle and ovulation apps such as Flo, Spot, and Cycles have been gaining popularity in recent years. They range from simple menstrual cycle calendars to full-blown ML-powered pregnancy “planners”, with the ML support usually reserved for a premium subscription. The data they collect ranges from name, age, and email to body temperature, pregnancy history, and even your partner’s contact info. Most health and body-related data is entered manually by the user or comes through a consented link to other apps and devices such as Apple HealthKit and Google Fit. Although there is not much research on the quality of their predictions, these apps seem to be helpful overall, even if only by making people more aware of their ovulation cycles.

A common claim in these apps’ privacy policies is that the information you give them will not be shared externally. This comes with caveats, however: they do share de-identified personal information with third parties, and they are required to hand data over to law enforcement authorities when they receive a legal order to do so. Some specifically state that, if required by law, they would share only your personal information (i.e. name, age group, etc.) and not your health information. Take that with a grain of salt, though: one of the more popular period tracking companies, Flo, shared its users’ health data for marketing purposes from 2016 to 2019 without informing its customers. And that was just for marketing; it is unclear whether these companies can refuse to share a particular user’s health information, such as period cycles, pregnancies, and general analytics, under a court order.

This becomes an even bigger concern in light of the current political situation in the U.S. I am, of course, talking about the potential overturn of Roe v. Wade. If we lose federal protection of abortion rights, every state will be able to impose its own rules concerning reproductive health. Some states will most likely prohibit abortion from very early in pregnancy, whereas currently the government can fully prohibit it only in the last trimester. People who live in states where abortion rights are limited or nonexistent will then be left with three options: giving birth, having an abortion secretly (i.e. illegally under their state’s law), or traveling to another state. There is a whole Pandora’s box of classism, racism, and other issues surrounding this narrow set of options that I won’t be able to discuss, since this post has a word limit. I will only mention that the set becomes even narrower if you have fewer resources or are dealing with health concerns that do not permit you to act on one or more of these “opportunities”.

However, let’s circle back to that app you might be keeping as your period calendar or as a pocket-size analyst of all things ovulation. We, as users, are in limbo: without sharing enough information, we can’t get good predictions, but by oversharing, we always risk entrusting our private information to a service that might not be as protective of it as it implied. Essentially, the ball is still in your court, and you can always request the removal of your data. But if you live in a region that treats abortion as a crime, beware of who may have a little too much data about your reproductive health journey.

References

[1] https://cycles.app/privacy-policy
[2] https://flo.health/privacy-portal
[3] https://www.cedars-sinai.org/blog/fertility-and-ovulation-apps.html
[4] https://www.nytimes.com/2021/01/28/us/period-apps-health-technology-women-privacy.html

Images:
[1] https://www.apkmonk.com/app/com.glow.android/
[2] https://www.theverge.com/2021/1/13/22229303/flo-period-tracking-app-privacy-health-data-facebook-google

June 20, 2022

Experiments That Take Generations to Overcome

By Anonymous | June 16, 2022

“Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I’ll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations and the race of his ancestors.” (Watson, 1924)

The field of psychology has advanced a great deal in the last century, not just in scientific knowledge but also in ethics and human rights. A testament to that is one of its most ethically dubious studies, the Little Albert experiment. In this post we’ll explore how it relates to the Beneficence principle of the Belmont Report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979) and how it continues to affect us today in ways we may not realize.

As some background, in the 1920s, John B. Watson, a Johns Hopkins professor, was interested in reproducing Ivan Pavlov’s findings on classical conditioning in babies. Classical conditioning is when “two stimuli are linked together to produce a new learned response in a person or animal” (McLeod, 2018). Pavlov was famous for getting his dogs to salivate at the sound of a bell: he gave them food every time he sounded the bell, so that at first they salivated at the sight of food but eventually learned to salivate at the sound of the bell alone. Similarly, the Little Albert experiment was performed on a 9-month-old known as Albert B. At the start of the experiment, Little Albert was presented with a rat, a dog, a rabbit, and a Santa Claus mask, and he was not afraid of any of them. Then, every time he touched any of them, the scientists struck a metal bar behind him, and eventually he was conditioned to be terrified of those animals and the Santa Claus mask (Crow, 2015; McLeod, 2018).

The principle of Beneficence in the Belmont Report requires that we maximize benefits and minimize harms to both individuals and society (National Commission, 1979). The most glaring failure of the experiment under this principle is that Watson did not even attempt to reverse the effects of his experimentation on the baby.

Seeing that the experiment did succeed in making Little Albert terrified of rats and anything furry, it’s safe to believe that reversing this result was not only possible but easy. Even an unsuccessful attempt at reversal would give those of us analyzing the experiment today a slightly different opinion of it. While it’s possible for a conditioned response to wear off, a phenomenon known as extinction, it can still return (albeit in weaker form) after a period, a phenomenon known as spontaneous recovery (National Commission, 1979; McLeod, 2018).

While the individual was harmed, what about society as a whole? Watson ran the experiment to show that classical conditioning can not only be applied to humans but can explain everything about us, going so far as to deny the existence of mind and consciousness. Whether or not those latter points are true, the experiment contributed to the field of human psychology in important ways, from understanding addictions to classroom learning and behavior therapy (McLeod, 2018). Today, our understanding is not complete by any means, but we do take for granted many of the insights gained. Unfortunately, it goes the other way too.

Watson’s Little Albert experiment is undoubtedly connected to his child-rearing philosophy. After all, he did believe he could raise infants to become anything, from doctors to thieves. He essentially believed children could be trained like animals, and he “admonished parents not to hug, coddle or kiss their infants and young children in order to train them to develop good habits early on” (Parker & Nicholson, 2015). While modern culture has fought against many of our traditional views on parenting, and even classifies some of those practices as “child abuse,” Watson’s views have left a legacy in our dominant narratives. Many still believe in “tough love” methods, such as talking down to children or speaking to them harshly, corporal punishment, shaming, humiliation, and various others, especially if they grew up with those methods and believe they not only turned out fine but also became better people as a result. Others, such as John B. Watson’s very own granddaughter Mariette Hartley, and all the families she wrote about in her book Breaking the Silence, have experienced suicide and depression as the legacy of Watson’s teachings. Even those who turned out fine may “still suffer in ways we don’t realize are connected to our early childhood years” (Parker & Nicholson, 2015).

While both hard scientific knowledge and human ethics have advanced at an unprecedented pace in the past century, that does not mean we’re completely free of the repercussions of the ethically dubious experiments and methods of the past. Harm done to individuals or groups in an experiment can not only last a lifetime for those subjects but also carry on for generations and shape our entire culture. To truly advance both knowledge and ethics, it’s imperative that we know this dark history and remember it, especially given how the Little Albert experiment has influenced and continues to influence our parenting methods, because “now that we know better, we must try to do better for our children” (Parker & Nicholson, 2015).

References:
Crow, J. (2015, January 29). The Little Albert Experiment: The Perverse 1920 Study That Made a Baby Afraid of Santa Claus & Bunnies. Open Culture. https://www.openculture.com/2015/01/the-little-albert-experiment.html
McLeod, S. A. (2018, August 21). Classical conditioning. Simply Psychology. www.simplypsychology.org/classical-conditioning.html
Parker, L. and Nicholson, B. (2015, November 20). This Children’s Day: It’s time to break Watson’s legacy in childrearing norms. APtly Said. https://attachmentparenting.org/blog/2015/11/20/this-childrens-day-its-time-to-break-watsons-legacy-in-child-rearing-norms/
National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979, April 18). The Belmont Report. Retrieved May 17, 2022, from https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

June 20, 2022

Digital health in the metaverse: overview of the landscape and legal considerations

By Anonymous | June 16, 2022

Key takeaway: A brave new world for healthcare innovation, the metaverse presents sci-fi-like solutions, from immersive exposure therapy to whole-body digital twins. But, like all new technology, it brings its own challenges around health inequities and data privacy.

[13] Example of virtual reality headsets.

As the next generation of the internet, the metaverse promises immersive, three-dimensional experiences through digital marketplaces and social interactions [1]. In the context of digital health, the metaverse changes the relationship between people and technology: users experience things within or alongside virtual content rather than simply interacting with digital products and services. Right now, digital health consists predominantly of products and solutions that let patients and providers view, share, exchange, or create digital content. Examples include entering patient data into electronic health records, sharing video during a telemedicine consultation, and sending payment through online portals [2]. In the digital health metaverse, the offerings shift to patients attending virtual reality group therapy sessions, surgeons planning procedures on holograms, and people with cognitive disabilities practicing social cues through simulated social interactions.

[14] Human anatomy and physiology.

While there is a wide range of possibilities for healthcare within the metaverse, the two most common categories of metaverse applications in digital health are immersive environments and digital twins [3].

Immersive environments

These are virtual or hybrid worlds in which providers and consumers engage with each other for educational, assistive, or therapeutic purposes. The biggest category of digital health in the metaverse, immersive environments are accessed through virtual reality (VR) or a hybrid of real-world and virtual components that come together via augmented reality (AR) technology or holograms [3]. Educational applications range from medical libraries to surgical training platforms and immersive emergency situations for clinicians to practice without worrying about real-world consequences [4][5]. In the operating room, some VR headsets help surgeons control minimally-invasive surgical robots while others help them place implants [6][7]. Therapeutic metaverse environments allow for specialized settings for different kinds of interventions, such as allowing patients to try exposure therapy virtually to address phobias [8].

Digital twins

Digital twins are representations of real-world entities that exist in virtual worlds and can be manipulated to extract insights for healthcare decision making. In healthcare, digital twins can represent organs, individuals, patients, or populations. Although they are a form of synthetic data, they are modeled on real entities and are often connected in an ongoing manner to their real-world counterparts [3]. Starting with organs and muscle groups, large corporations are pioneering cardiac digital twins: simulations that reflect the molecular structure and biological function of individual patients’ hearts [9]. This allows doctors to simulate how each patient’s heart would respond to different medications or surgeries. Bones and muscle groups are also at the forefront, allowing scientists to simulate how medical devices and implants may interact with, or degrade within, the patient’s body over time [10]. Beyond organs, whole-body digital twins of individuals are being created, in which patient vitals, scans, medical history, and genetic tests are combined to create simulations of patient anatomy and physiology [11].

Healthcare hurdles & legal considerations

Healthcare applications in the metaverse can compound health inequities related to device ownership, digital literacy, and internet accessibility [3]. The creation of virtual entities like digital twins and avatars also raises new questions about patient health data and privacy [12]. In terms of legal considerations, health care providers and professionals must consider custody of digital assets, select a platform, register IP or file trademarks, secure blockchain domains to facilitate metaverse payments, and reserve metaverse rights [2]. The decentralized nature of the metaverse poses challenges to businesses that are used to having predictable law enforcement mechanisms to protect their legal interests. In addition, specifically for healthcare, it remains uncertain how traditional state-based licensure requirements apply to metaverse providers and whether the use of blockchain technology for health data sharing complies with state and federal data privacy and security requirements.

The rise of the metaverse has presented healthcare with endless possibilities, allowing providers, patients and businesses to interact in a way that was considered science fiction just a few years ago. However, like all new technologies, the metaverse brings its own hurdles and legal challenges.

References

[1] https://insights.omnia-health.com/technology/dawn-metaverse-healthcare

[2] https://www.natlawreview.com/article/metaverse-legal-primer-health-care-industry

[3] https://rockhealth.com/insights/digital-health-enters-the-metaverse/?mc_cid=46fab87845&mc_eid=2d0859afdf

[4] https://www.giblib.com

[5] https://www.healthscholars.com

[6] https://www.vicarioussurgical.com

[7] https://augmedics.com

[8] https://ovrhealth.com

[9] https://www.siemens-healthineers.com/perspectives/mso-solutions-for-individual-patients.html

[10] https://www.virtonomy.io

[11] https://q.bio

[12] https://xrsi.org/publication/the-xrsi-privacy-framework

[13] https://unsplash.com/s/photos/health-tech

[14] https://www.pexels.com/search/health%20tech/

June 20, 2022

The Tip of the ICE-berg: How Immigration and Customs Enforcement uses your data for deportation

By Anonymous | June 16, 2022

Photo by SIMON LEE on Unsplash

The United States is a country made of immigrants. It is a great social experiment in anti-tribalism, meaning that there is no single homogeneous group. Everyone came from somewhere with different ideals, ancestry, religion, and life goals that have all blended to make this country the multifaceted melting pot it is today. But for a country made of immigrants, the U.S. certainly goes the extra mile, using any means available, to find and punish those who would immigrate now.

Before Immigration and Customs Enforcement (ICE), the U.S. created the Immigration and Naturalization Service (INS) in 1933, which handled the country’s immigration process, border patrol, and enforcement.[1] During the Great Depression, immigration rates dropped, and the INS shifted its focus to law enforcement.[2] From the 1950s through the 1990s, the INS answered the public outcry over illegal aliens working in the U.S. by cracking down on immigration and enforcing deportation. However, the INS did not have a meaningful way of tracking those already in the United States and lacked a border exit system for those who had entered the country on visas. The INS’s shortcomings were further highlighted in the aftermath of 9/11, when it was uncovered that at least two of the hijackers were in the U.S. on expired visas.[3] In response, the Homeland Security Act dissolved the INS and created ICE, Customs and Border Protection (CBP), and U.S. Citizenship and Immigration Services (USCIS). ICE wasted no time acquiring data to fulfill its mission “[T]o protect America from the cross-border crime and illegal immigration that threaten national security and public safety.”[4]

According to a recent report by Wang et al. (2022), ICE has contracts to increase its surveillance abilities by collecting and using data in the following categories:

Biometric Data – Fingerprints, face recognition programs, and DNA
Data Analysis – Combining and managing different data sources
Data Brokers – Private companies’ databases and vendors of third-party data, such as Thomson Reuters and LexisNexis, that sell information such as utility data, credit data, and DMV records
Geolocation Data – GPS tracking, license plate readers, and CCTV
Government Databases – Access to databases of government agencies not under the Department of Homeland Security
Telecom Interception – Wiretapping, Wi-Fi interception, and translation services[5]

This practice violates the five basic principles (transparency, individual participation, limitations of purpose, data accuracy, and data integrity) set forth by the HEW Report to safeguard personal information contained in data systems, which form the framework for the Fair Information Practice Principles (FIPPs).[6] The Department of Homeland Security (DHS), which oversees ICE, uses a formulation of the FIPPs in its treatment of Personally Identifiable Information (PII).[7] However, ICE has been able to blur these ethical lines in its pursuit of finding and deporting illegal immigrants.

ICE can access information on individuals who are American citizens, legal residents, and legal visa holders, in addition to those it has classified as “criminal aliens”, all without consent from the individual. ICE has purchased, collected, and used your information to locate and deport immigrants (some with no criminal record) without warrants and without legislative, judicial, or public oversight (Wang et al., 2022). Unfortunately, this may be by design: purchasing data and combining it to identify individuals is not illegal, and it gives government organizations a way around legal requirements such as warrants or reasonable cause.

Setting up basic utilities such as power, water, and internet, or being detained (not convicted of a crime) and having biometric data such as fingerprints or DNA taken, should not count as consent to make your data accessible to ICE.[8] Your trust in these essential services and in state and federal government agencies to safeguard your PII and use it only for its originally intended purpose is being abused. ICE continues to spend $388 million in taxpayer dollars per year to purchase as much data as possible and build an unrestricted domestic surveillance system (Wang et al., 2022). While ICE’s stated focus is illegal immigration, what or who is to stop it from targeting other kinds of immigrants, legal resident aliens, dual citizens, or foreign-born American citizens?

This process gives Solove’s Taxonomy of Privacy[9] whiplash as it is rife with opportunity for surveillance, identification, insecurity, blackmail, increased accessibility among other government agencies, invasion, and intrusion for what should be a mundane piece of data they collect or purchase about you – repeatedly.

Photo by ThisisEngineering RAEng on Unsplash

As a nation of immigrants, we owe the newest arrivals, no matter how they got here, a basic expectation of privacy, respect, and protection. Viewed through Nissenbaum’s[10] contextual approach, the exploitation of this data violates an expectation of privacy: U.S. citizens believe they have a right to privacy, including from the government, so why should this be different based on your immigration status? When you apply for a basic service such as water and power, you should not be fearful that your home could be raided at any moment. An individual will only generate more data as technology and digitalization progress. U.S. laws and policies, with severe penalties for companies and the government, need to evolve to protect PII before you become a target of ICE or any other government agency looking to play the role of law enforcement.

Positionality and Reflexivity Statement:
I am an American biracial, able-bodied, cisgender woman, wife, and mother. I belong to the upper-middle class, non-religious, unaffiliated voter bloc. I am the product of growing up in a family made up of newly arrived immigrants from all parts of the globe. Some came here through visa programs, some came sponsored, and some overstayed their tourist visas, fell in love, and married Americans. Regardless of their path to legal status to live and work in the United States, nothing can describe the weight lifted and the relief felt once approval to remain in this country is granted, not only for the individual being reviewed but for the entire community of family and friends they have created around them. I tell you this from experience but with limited information so that I don’t give ICE another data point or a reason to come looking.

[1] USCIS. (n.d.). USCIS. USCIS. Retrieved 2022, from https://www.uscis.gov/about-us/our-history

[2] USCIS History Office and Library. (n.d.). Overview of INS History. Retrieved May 19, 2022, from https://www.uscis.gov/sites/default/files/document/fact-sheets/INSHistory.pdf

[3] 9/11 Commission Report: Staff Monographs. (2004). Monograph on 9/11 and Terrorist Travel. National Commission on Terrorist Attacks Upon the United States. Retrieved May 19, 2022, from https://www.9-11commission.gov/staff_statements/911_TerrTrav_Ch1.pdf

[4] ICE. (n.d.). ICE. U.S. Immigration and Customs Enforcement. Retrieved May 19, 2022, from https://www.ice.gov

[5] Wang, N., McDonald, A., Bateyko, D., & Tucker, E. (2022, May). American Dragnet, Data-Driven Deportation in the 21st Century. Center on Privacy & Technology at Georgetown Law. https://americandragnet.org

[6] U.S. Department of Health, Education and Welfare. (July 1973) Records, Computers and the Rights of Citizens, Retrieved May 19, 2022, from https://aspe.hhs.gov/reports/records-computers-rights-citizens

[7] Fair Information Practice Principles (FIPPs) in the Information Sharing Environment. https://pspdata.blob.core.windows.net/webinarsandpodcasts/The_Fair_Information_Practice_Principles_in_the_Information_Sharing_Environment.pdf

[8] Department of Homeland Security. (2020, July). Privacy Impact Assessment for CBP and ICE DNA Collection, DHS/ALL/PIA-080. Retrieved May 19, 2022, from https://www.dhs.gov/sites/default/files/publications/privacy-pia-dhs080-detaineedna-october2020.pdf

[9] Solove, Daniel J. (2006). A Taxonomy of Privacy. University of Pennsylvania Law Review, 154:3 (January 2006), p. 477. https://ssrn.com/abstract=667622

[10] Nissenbaum, Helen F. (2011). A Contextual Approach to Privacy Online. Daedalus 140:4 (Fall 2011), 32-48.

June 20, 2022

Big Data to Battle COVID

By Anonymous | June 16, 2022

How China used new technology to control the spread of the deadly disease, but can it be recreated in the west and at what expense to personal privacy?

When COVID first emerged on the world stage in late 2019, many governments were unprepared to respond to a global pandemic. Despite being the initial source of the outbreak, China has arguably been one of the best at containing and limiting the spread of the virus. That success is often credited in part to new technologies they deployed, but critics have questioned whether it comes at the expense of personal privacy and if it could be reproduced in other countries.

Containing COVID through Smart Phones & A New “Health Code” System

Within weeks of the virus first being detected, two tech giants in China, Alibaba and Tencent, began developing competing but similar solutions to the challenge of controlling the spread of the virus. On February 9th, WeChat and AliPay, two of the biggest smartphone apps used in China, launched “health code” systems that use multiple sources of data to determine your exposure risk and display a green, yellow, or red QR code to indicate your risk level (Liang, 2020). Green indicates the user is healthy and able to travel freely. Yellow or red indicates a risk of exposure and the need to quarantine. The solutions were quickly adopted by local governments, employers, and businesses across China and became mandatory in over 300 cities for entry into public places like restaurants, grocery stores, or transportation. Within its first month, the WeChat app alone had been used over 6 billion times (CSIS, 2020).

Figure 1: Alipay Health Code app displaying a green QR code to indicate the user is low risk and can move freely (CSIS, 2020)

Solutions like this have been credited with helping China contain the spread of COVID much more effectively than most other countries. For example, in June of 2020, a rare “super spreader” event occurred at a wet market in Beijing and hundreds of individuals were infected. In a densely populated city of 21 million people, this represented a significant risk to China’s containment efforts but through the use of this system, mass testing and containment, the number of new infections was back to zero within three weeks (Tian, 2020).

Figure 2: A person’s health code is scanned and validated at a checkpoint (CSIS, 2020)

How does the health code system work?

The reality is that there is very little public information about the exact algorithm used to determine an individual’s exposure-risk “color code” or about how the data is managed. In past interviews, the Alipay parent company has declined to answer questions and has stated only that “government departments set the rules and control the data” (Mozur et al., 2021). What is known is that the system relies on a considerable amount of personal data. Users self-certify whether they have any COVID symptoms. Information about their movements comes from their “check-ins” at public places as well as from geolocation data held by the phone carriers. Finally, their personal interactions are mined from their digital transactions with others (Liang, 2020). All of this is combined with data from local health authorities about infections to determine whether the user was in a high-risk location or interacted with someone with known exposure (Zhang, 2022).
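
To make the description above concrete, here is a toy sketch of how such inputs could be combined into a color. Because the real scoring rules are not public, everything in it is an assumption made for illustration: the `UserRecord` fields, the `assign_color` function, and the rule ordering (symptoms or contact with an exposed person gives red, a visit to a high-risk place gives yellow) are hypothetical and are not the logic used by Alipay or WeChat.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """Hypothetical per-user inputs, loosely following the data sources described above."""
    has_symptoms: bool                                  # self-certified by the user
    visited_places: set = field(default_factory=set)    # check-ins and carrier geolocation
    contacts: set = field(default_factory=set)          # people linked via digital transactions


def assign_color(user, high_risk_places, exposed_people):
    """Return "red", "yellow", or "green" under made-up illustrative rules.

    high_risk_places and exposed_people stand in for the infection data
    supplied by local health authorities.
    """
    # Assumed rule 1: reported symptoms or contact with a known exposure is highest risk.
    if user.has_symptoms or (user.contacts & exposed_people):
        return "red"
    # Assumed rule 2: having been in a high-risk location is intermediate risk.
    if user.visited_places & high_risk_places:
        return "yellow"
    # Otherwise the user is treated as low risk.
    return "green"


if __name__ == "__main__":
    traveler = UserRecord(has_symptoms=False, visited_places={"wet_market"})
    print(assign_color(traveler, {"wet_market"}, {"person_42"}))
```

Even in this simplified form, the sketch shows why the system depends on so much personal data: every branch needs a different stream of information about the user.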

A Model for Other Countries?

Could this model be recreated in western countries? Not likely. The amount of personal data being collected by private and government entities, the lack of transparency about how it’s used, and the fact that use of the system is required all conflict with the trend towards data privacy in most western nations. Multiple privacy standards, including the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Children’s Online Privacy Protection Act (COPPA), and the Health Insurance Portability and Accountability Act (HIPAA), would make implementation difficult to navigate, and the many concerns they raise about user consent and the risk of misuse from this mass surveillance would need to be addressed. Additionally, an estimated 12% of the U.S. population doesn’t own a smartphone, which risks harm to this population and would limit the system’s success (Kolmar, 2022).

Most important is the human element. The health code system is a tool, but its effectiveness depends on mass adoption by the population, and governments would need to deploy significant resources to enforce its use, as we see in China. The high priority many western countries place on individual freedoms, and the challenges they experienced in adopting even simple masking policies, suggest that the health code system is a good fit for China but difficult to reproduce outside its borders.

References

Zhou, S. L., Jia, X., Skinner, S. P., Yang, W., & Claude, I. (2021). Lessons on mobile apps for COVID-19 from China. Journal of Safety Science and Resilience, 2(2). https://doi.org/10.1016/j.jnlssr.2021.04.002
China’s Novel Health Tracker: Green on Public Health, Red on Data. (2020). CSIS: Center for Strategic & International Studies. https://www.csis.org/blogs/trustee-china-hand/chinas-novel-health-tracker-green-public-health-red-data-surveillance
Liang, F. (2020). COVID-19 and Health Code: How Digital Platforms Tackle the Pandemic in China. Social Media + Society, 6(3), 205630512094765. https://doi.org/10.1177/2056305120947657
O’Neill, P. H. (2020, October 9). A flood of coronavirus apps are tracking us. Now it’s time to keep track of them. MIT Technology Review. https://www.technologyreview.com/2020/05/07/1000961/launching-mittr-covid-tracing-tracker/
Tian, T. (2020, July 16). Covid health code reveals China’s big data edge. Fidelity International. https://www.fidelityinternational.com/editorial/blog/covid-health-code-reveals-chinas-big-data-edge-71af0e-en5/
Zhang, P. (2022, April 5). The colour-coded Covid app that’s become part of life in China – despite the red flags. South China Morning Post. https://www.scmp.com/news/china/science/article/3173123/colour-coded-covid-app-thats-become-part-life-china-despite-red
Kolmar, C. (2022, June 7). U.S. Smartphone Industry Statistics [2022]: Facts, Growth, Trends, And Forecasts – Zippia. Zippia. https://www.zippia.com/advice/us-smartphone-industry-statistics/#:%7E:text=12%25%20of%20Americans%20own%20non,of%20Americans%20were%20smartphone%20owners.
Mozur, P., Zhong, R., & Krolik, A. (2021, July 26). In Coronavirus Fight, China Gives Citizens a Color Code, With Red Flags. The New York Times. https://www.nytimes.com/2020/03/01/business/china-coronavirus-surveillance.html

June 20, 2022

STOP using ZOOM or risk being WATCHED by the Chinese Communist Party (CCP)

By Anonymous | June 16, 2022

Tweet Long Lede Sentence:

Zoom, an American success story or quasi-spy agency to the CCP? Since its rise in popularity, thanks to COVID, Zoom has terminated American users’ accounts at the behest of the CCP, routed American users’ data through China, and faced an investigation by the Department of Justice.

Background

Unless you have lived under a rock, you have heard about Zoom, right? It is a video chatting app that was started in 2011 by a Chinese native in the United States and is headquartered in San Jose, California. It exploded in popularity during the COVID era, going from obscurity to a household name. Sounds like the quintessential American dream. However, behind the façade of a Silicon Valley success story, Zoom has some very disturbing secrets. The app has already been banned by NASA, SpaceX, Taiwan, New York City’s Department of Education, and many more. Why, you may ask?

Zoom capitulates to Chinese censorship

So why is Zoom Controversial?

In 2020, US federal prosecutors launched an investigation into Zoom executives working with the Chinese Communist Party to “surveil users and suppress video calls.” The Justice Department indicated that the accounts of Americans who were holding a video call about the 1989 Tiananmen massacre were terminated. This does not sound appetizing, does it? Imagine being an American, making a Zoom call from the US to advocate for democracy, only for your account to get terminated. What was Zoom’s response to these allegations, you may ask? We “..will no longer allow requests from the Chinese government to affect users outside mainland China,” said the company.

In another case, in 2021, Zoom agreed to pay $86m to settle a US lawsuit over shortcomings in its security practices. This lawsuit was brought against Zoom due to “Zoombombing,” which happens when a hacker enters a Zoom meeting to create trouble. At this point, you may think that surely Zoom has changed and updated its privacy policies in response to these shortcomings.

Should you still use Zoom in 2022?

Red Flags in Zoom’s Privacy Policy

Like most tech companies, Zoom collects both active and passive data on its users and uses it for marketing purposes. Nothing out of the ordinary here in comparison to other tech companies. However, it also collects meeting and messaging content. In other words, any and all content generated in a meeting is being collected, such as audio, video, messages, and chats. Basically, everything you say is being stored passively. Shocking, right? Wrong! Here is the kicker. Under its “Legal Reasons” section, Zoom says that it will share personal information with government agencies if required. Keep in mind that the CCP is a government agency. At this point, you may argue that your data may not be stored in China, so the CCP will not have access to it. Wrong! In 2020, Zoom admitted that “calls got mistakenly routed through China.” Conveniently, Zoom did not say how many users were affected. Even more conveniently, all companies in China are required by law to give the CCP access to user data. So what guarantees that the CCP did not store users’ data while it was being routed through China? Oh, and as the cherry on top, Zoom has of course “taken” precautions so that this mistake does not happen again.

Despite all these red flags, most academic institutions, including prestigious universities like Berkeley, and American companies unfortunately still use Zoom as their primary means of communication. This needs to stop. I recommend that Zoom be forced to sell its American branch to another American company or sever its ties completely with its Chinese branch.

References

Bloomberg. (n.d.). Bloomberg.com. Retrieved June 16, 2022, from https://www.bloomberg.com/billionaires/profiles/eric-s-yuan/#:~:text=Eric%20Yuan%20was%20born%20on,University%20of%20Mining%20and%20Technology.

Video conferencing, web conferencing, webinars, screen sharing. Zoom. (2022, April 19). Retrieved June 16, 2022, from https://explore.zoom.us/en/about/

Vigliarolo, B., Staff, T. R., Wolber, A., Whitney, L., Pernet, C., Alexander, M., & Combs, V. (2020, April 9). Who has banned zoom? Google, NASA, and more. TechRepublic. Retrieved June 16, 2022, from https://www.techrepublic.com/article/who-has-banned-zoom-google-nasa-and-more/

Harwell, D., & Nakashima, E. (2020, December 19). Federal prosecutors accuse Zoom Executive of working with Chinese government to surveil users and suppress video calls. The Washington Post. Retrieved June 16, 2022, from https://www.washingtonpost.com/technology/2020/12/18/zoom-helped-china-surveillance/

BBC. (2021, August 1). Zoom settles US class action privacy lawsuit for $86M. BBC News. Retrieved June 16, 2022, from https://www.bbc.com/news/business-58050391

Privacy. Zoom. (2022, April 5). Retrieved June 16, 2022, from https://explore.zoom.us/en/privacy/

Wood, C. (2020, April 6). Zoom admits calls got ‘mistakenly’ routed through China. Business Insider. Retrieved June 16, 2022, from https://www.businessinsider.com/china-zoom-data-2020-4

Bradley A. Thayer, opinion contributor. (2021, January 7). For Chinese firms, theft of your data is now a legal requirement. The Hill. Retrieved June 16, 2022, from https://thehill.com/opinion/cybersecurity/532583-for-chinese-firms-theft-of-your-data-is-now-a-legal-requirement/

Image credit:

https://www.salon.com/2020/06/15/zoom-capitulates-to-chinese-censorship-shutting-down-activists-accounts/

All Things Secured. (2022, February 3). Should you still use Zoom in 2022? (hint: Security is not an issue anymore). Retrieved June 16, 2022, from https://www.allthingssecured.com/tips/stop-using-zoom/

June 20, 2022

Can fake data be good?

By Anonymous | June 20, 2022

With the chaos caused by deepfake videos in recent years, can fake, generated data also do good? Apparently yes: synthetic data has been playing an important role in machine learning in recent years. Many in the AI industry even think that using synthetic data will become more commonplace than using real data as techniques to generate synthetic data improve.

Image source: NVIDIA Blog – Graph of projected synthetic data usage in 10 years

And there’s an ever-growing list of new companies that focus on technology to generate all kinds of synthetic data (Devaux 2022). An interesting, albeit slightly creepy, example is the https://thispersondoesnotexist.com/ website, which generates a realistic, photo-like image of a person who does not actually exist. The image below shows examples from that site.

Image source: thispersondoesnotexist.com – Examples of generated human face images

This article provides a brief overview of synthetic data and its uses. For more details, refer to the Towards Data Science podcast on synthetic data that much of this information originates from.

What is synthetic data?

Synthetic data is data generated by an algorithm instead of collected from the real world (Devaux 2021). Depending on the use case the data is generated for, it can have different properties from its source data. Even though it’s generated, the statistics behind it are based on real world data so that its predictive value remains intact.
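
To make the idea concrete, here is a minimal sketch of generating synthetic records from the statistics of a small "real" dataset. The records, the column meanings (age and resting heart rate), and the function names are all made up for illustration, and real synthetic data tools use far richer models than the simple linear fit assumed here.

```python
import random
import statistics


def fit_statistics(rows):
    """Summarize (x, y) pairs with a mean, a spread, and a linear trend."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x = statistics.stdev(xs)
    # Least-squares slope and intercept of y on x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Spread of y around the fitted line.
    residual_sd = statistics.stdev([y - (intercept + slope * x) for x, y in rows])
    return {
        "mean_x": mean_x,
        "sd_x": sd_x,
        "slope": slope,
        "intercept": intercept,
        "residual_sd": residual_sd,
    }


def generate_synthetic(stats, n, seed=0):
    """Sample n new (x, y) records from the fitted statistics only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        x = rng.gauss(stats["mean_x"], stats["sd_x"])
        noise = rng.gauss(0, stats["residual_sd"])
        synthetic.append((x, stats["intercept"] + stats["slope"] * x + noise))
    return synthetic


# HYPOTHETICAL "real" records: (age, resting heart rate).
real_rows = [(25, 62), (31, 64), (38, 67), (44, 70), (52, 71), (60, 75), (67, 78)]
stats = fit_statistics(real_rows)
synthetic_rows = generate_synthetic(stats, n=1000)
```

None of the 1,000 generated rows is drawn directly from the seven source records, yet refitting the same statistics on `synthetic_rows` should give roughly the same trend, which is the sense in which the predictive value carries over.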

You can also generate synthetic data from simulations of the real world. For example, self-driving vehicles require a lot of data to run safely. Data on some situations a vehicle needs to handle, such as an accident, is difficult to come by, and those situations would be unethical to create deliberately (Devaux 2022). By simulating such incidents, you can generate data about them without endangering anyone.

Why synthetic data?

Synthetic data has many use cases, from nefarious actors generating deepfake videos to more positive ones like augmenting data collected from the real world when, for example, the dataset is too small or limited in some way (Andrews 2022). Two of the most touted reasons for using synthetic data are privacy protection and speed of development.

Privacy is a huge factor in using synthetic data instead of real world data. Data from industries like finance, medicine and other sensitive areas isn’t readily available, and there are a lot of hurdles to go through to get access. But synthetic data generated from the mathematical properties of those data is generally seen as not needing the same protection, because it is not meant to reveal anything about the individuals in the original dataset.

This brings me to the next reason for using synthetic data which is the efficiency with which researchers can get access to data. Often real world data is either buried in privacy and security restrictions or expensive and time consuming to collect and transform properly. Synthetic data provides an alternative to that without losing the predictive power of the original data.

Another useful way to use synthetic data is to test AI systems. As regulators and companies alike use AI more in their products and businesses, they need a way to test those systems without violating privacy. Synthetic data provides a good alternative.

Challenges of synthetic data

Overfitting is one of the challenges of generating synthetic data. This can happen if you generate a large synthetic dataset from a small real world dataset. Because the pool of source data is limited, the model you create from the synthetic data will usually overfit to the characteristics of that smaller dataset. In extreme cases, this can lead to a model memorizing specific individuals’ data, which violates privacy completely. Techniques that help prevent this include detecting and removing generated records that are too similar to records in the source dataset, and removing data points that are outliers.
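
One of the techniques mentioned above, dropping generated records that sit too close to a real record, can be sketched roughly as follows. The records, the distance threshold, and the function name are assumptions for illustration; in practice the features would be scaled first, and the threshold would need careful tuning.

```python
import math


def drop_near_copies(synthetic, real, min_distance):
    """Keep only synthetic records at least min_distance from every real record."""
    kept = []
    for candidate in synthetic:
        # Euclidean distance to the closest real record.
        nearest = min(math.dist(candidate, record) for record in real)
        if nearest >= min_distance:
            kept.append(candidate)
    return kept


# HYPOTHETICAL records: (age, systolic blood pressure).
real = [(34.0, 120.0), (51.0, 135.0), (29.0, 110.0)]
synthetic = [(34.1, 120.2), (45.0, 128.0), (29.0, 110.0), (60.0, 140.0)]

# The first synthetic record is nearly identical to a real one and the third
# is an exact copy, so both are filtered out.
safe = drop_near_copies(synthetic, real, min_distance=1.0)
```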

Another big challenge is bias. If you’re unaware of a bias in the original dataset, generating another dataset from that original will just duplicate that bias. In some cases, it can even exacerbate the bias if the generated dataset is much larger than the original. There are a lot of tools and currently a lot of work going on in the field to detect and prevent bias in data.
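
As a rough illustration of how bias carries over, the sketch below uses a deliberately naive generator that draws new records in proportion to what the source contains. The group labels, the 90/10 split, and the function name are made up for this example.

```python
import random
from collections import Counter


def group_shares(records):
    """Return each group's share of the dataset."""
    counts = Counter(records)
    total = len(records)
    return {group: counts[group] / total for group in counts}


# HYPOTHETICAL source data: group "B" is badly under-represented.
source = ["A"] * 90 + ["B"] * 10

# Naive generator: sample new records in proportion to the source.
rng = random.Random(42)
synthetic = rng.choices(source, k=10_000)

source_shares = group_shares(source)
synthetic_shares = group_shares(synthetic)
```

The synthetic dataset is a hundred times larger, but group "B" still makes up only about a tenth of it. More data has not fixed the imbalance; it has only made it look more authoritative.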

Conclusion

Synthetic data is becoming a mainstay of machine learning. It provides a way to continue to innovate despite the difficulty of collecting real data while still protecting the privacy of the individuals in the original data. Even though there are still big challenges in using these techniques, it seems using synthetic data will continue to be a growing part of AI development.

References

Andrews, G. (2022, May 19). What Is Synthetic Data? | NVIDIA Blogs. NVIDIA Blog. Retrieved June 17, 2022, from https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/
Devaux, E. (2021, December 15). Introduction to privacy-preserving synthetic data – Statice. Medium. Retrieved June 17, 2022, from https://medium.com/statice/introduction-to-privacy-preserving-synthetic-data-f5bccbdb8e0c
Devaux, E. (2022, January 7). List of synthetic data startups and companies — 2021. Medium. Retrieved June 17, 2022, from https://elise-deux.medium.com/the-list-of-synthetic-data-companies-2021-5aa246265b42
Hao, K. (2021, June 14). These creepy fake humans herald a new age in AI. MIT Technology Review. Retrieved June 17, 2022, from https://www.technologyreview.com/2021/06/11/1026135/ai-synthetic-data/
Harris, J. (2022, May 21). Synthetic data could change everything – Towards Data Science. Medium. Retrieved June 17, 2022, from https://towardsdatascience.com/synthetic-data-could-change-everything-fde91c470a5b
Somers, M. (2020, July 21). Deepfakes, explained. MIT Sloan. Retrieved June 17, 2022, from https://mitsloan.mit.edu/ideas-made-to-matter/deepfakes-explained
Watson, A. (2022, March 24). How to Generate Synthetic Data: Tools and Techniques to Create Interchangeable Datasets. Gretel.Ai. Retrieved June 17, 2022, from https://gretel.ai/blog/how-to-generate-synthetic-data-tools-and-techniques-to-create-interchangeable-datasets#:%7E:text=Synthetic%20data%20is%20artificially%20annotated,learning%20dataset%20with%20additional%20examples.

Image References

Wang, P. (n.d.). [Human faces generated by AI]. Thispersondoesnotexist.com. https://imgix.bustle.com/inverse/4b/17/8f/0e/cf91/4506/99c7/e6a491c5d4ac/these-people-are-not-real–they-were-produced-by-our-generator-that-allows-control-over-different-a.png?w=710&h=426&fit=max&auto=format%2Ccompress&q=50&dpr=2
Andrews, G. (2022, May 19). What Is Synthetic Data? | NVIDIA Blogs. NVIDIA Blog. Retrieved June 17, 2022, from https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/

June 20, 2022

Tesla: Should You Pay For My Car Insurance?

By Melinda Leung | June 17, 2022

Tweeter Lede: NHTSA published a report identifying that Tesla is involved in 75% of incidents involving autonomous vehicles. But before we blame Tesla, we need to better understand the data and context behind those numbers.

The National Highway Traffic Safety Administration (NHTSA) recently published its first-ever report on vehicles using driver-assist technologies. It found that there have been 367 crashes in the last nine months involving vehicles that were using these types of advanced driver assistance systems. Almost 75% of the incidents involved a Tesla running on Tesla’s iconic Autopilot, three of which led to injuries and five of which resulted in death.

Before we jump the gun, blame Tesla for these issues, and forever veto all self-driving vehicles, we need to take a step back and understand the context behind the numbers.

Levels of Autonomy

First, how do autonomous vehicles even work? There are actually five levels of autonomy.

Level 0: We’re still driving. Hence, this is not even considered a level in the autonomy scale.
Level 1: The vehicle can assist with basic steering, braking and accelerating.
Level 2: The vehicle can control steering as well as braking and acceleration (adaptive cruise control, lane keep). However, the human driver still needs to monitor the environment at all times. This is the level Tesla is officially at currently.
Level 3: The vehicle can perform most driving tasks. However, the human driver needs to be ready to take control if needed and essentially acts as the figure behind the wheel.
Level 4: The vehicle can perform all driving tasks and monitor the driving environment in most conditions. The human doesn’t have to pay attention during those times.
Level 5: Forget steering wheels. At this point, the vehicle completely drives for you and the human occupants are just passengers. They are not involved in the driving whatsoever.

There are 5 levels of autonomous vehicles. (Source: Lemberg Law)

Now that we understand the levels of autonomy, we know that there is still a human component to Tesla’s Autopilot feature. Full self-driving isn’t here yet, so accidents that occur with driver-assist technologies still very much involve human interaction.

So Can We Blame Tesla Yet?

Not quite yet. Because Tesla is the brand name most associated with autonomous vehicles, it is no surprise that it also sells the largest number of vehicles with the most advanced driver assistance technologies. Simply by being the largest and best-known player in the autonomous vehicle industry, Tesla can be expected to account for the largest count of crashes. What is more useful is to understand the number of accidents relative to the number of miles driven. Judging by raw counts alone is a classic base rate fallacy.
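
To see why raw counts can mislead, here is a small sketch that normalizes crashes by miles driven. The two fleets and all of their figures are hypothetical and are not taken from the NHTSA report, which does not include mileage.

```python
def crashes_per_million_miles(crashes, miles_driven):
    """Normalize a raw crash count by exposure (miles driven)."""
    return crashes / (miles_driven / 1_000_000)


# HYPOTHETICAL figures, for illustration only; not from the NHTSA report.
fleets = {
    "Maker A": {"crashes": 270, "miles_driven": 3_000_000_000},
    "Maker B": {"crashes": 90, "miles_driven": 500_000_000},
}

rates = {
    name: crashes_per_million_miles(**figures) for name, figures in fleets.items()
}
```

In this made-up example, Maker A has three times as many crashes as Maker B but half the crash rate per million miles, because its vehicles cover six times the distance.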

Unlike most automakers, Tesla also knows exactly which vehicles were using Autopilot at the time of a crash. Its vehicles are equipped with cellular connectivity that automatically reports this information back to Tesla when a crash occurs. Not all vehicles do so. Therefore, Tesla’s systems may also be better at relaying crash information than others.

Next, what if the crash was going to happen regardless of whether the vehicle was in Autopilot or not? For example, if the car behind you was driving too quickly and rear-ended you, it didn’t matter who was driving: you would have been hit no matter what. Because we don’t really have any context on the type of accident, it is difficult to understand who is at fault.

Lastly, according to NHTSA, these companies need to document crashes if any automated technologies were used within 30 seconds of impact. According to Waymo, the autonomous driving company under Google’s parent, Alphabet, a third of its reported crashes took place when the vehicle was in manual mode but still fell within this 30-second range. Waymo is one of the oldest players in this industry, so similar stats for Tesla would not be surprising. If that’s the case, it’s really difficult to 100% blame Tesla.
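
The effect of the 30-second window can be sketched as below. The crash list and the way each crash is encoded (seconds between the last moment automation was active and the impact) are assumptions made for this example, not the format of the actual NHTSA filings.

```python
REPORTING_WINDOW_SECONDS = 30  # the window described above


def is_reportable(seconds_since_automation_active):
    """True if automation was in use within the window before impact.

    0 means the system was still engaged at impact; None means it was
    never engaged on that trip.
    """
    if seconds_since_automation_active is None:
        return False
    return 0 <= seconds_since_automation_active <= REPORTING_WINDOW_SECONDS


# HYPOTHETICAL crashes, encoded as seconds since automation was last active.
crashes = [0, 12, 45, None, 29.5, 0]

reportable = [c for c in crashes if is_reportable(c)]
# Reportable crashes where the human had already taken over before impact.
manual_at_impact = [c for c in reportable if c > 0]
share_manual = len(manual_at_impact) / len(reportable)
```

In this toy list, four of the six crashes count as reportable, and in half of those the driver was already in manual control at the moment of impact.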

If two cars on Autopilot crash and this is a common occurrence, then yes, let’s make Elon Musk pay for our increased car insurance policies. But until we have a lot more data about the conditions of these crashes, it’s hard for us to determine who is really at fault and make sweeping assumptions about the safety of these vehicles.

References

National Highway Traffic Safety Administration. (2022, June). Summary Report: Standing General Order on Crash Reporting for Level 2 Advanced Driver Assistance Systems. https://www.nhtsa.gov/sites/nhtsa.gov/files/2022-06/ADAS-L2-SGO-Report-June-2022.pdf
Lemberg Law. What You Need to Know about Driverless Cars. https://lemberglaw.com/are-driverless-cars-safe/
McFarland, Matt. (2022, March 16). CNN. Tesla owners say they are wowed — and alarmed — by ‘full self-driving’. https://www.cnn.com/2021/11/03/cars/tesla-full-self-driving-fsd/index.html
Hawkins, Andrew. (2022, June 15). The Verge. US releases new driver-assist crash data & surprise, it’s mostly Tesla. https://www.theverge.com/2022/6/15/23168088/nhtsa-adas-self-driving-crash-data-tesla

June 20, 2022

This Pride Month, the Fight for LGBTQ Equality Continues

By Dustin Cox | March 16, 2022

Is it time for another Stonewall? Progress, as LGBTQ folks have learned, is often hard fought. Recent events have reminded us that equality is still elusive in places across the United States, and progress is far from inevitable. Limitations on how we talk about, categorize, and analyze data are a little-understood front in GOP culture wars targeting LGBTQ people at various levels of government. These limitations fundamentally undermine the ability for researchers, public policy makers, and data scientists to understand the LGBTQ community, craft effective measures to promote general welfare, and design effective AI and machine-learning-powered capabilities for good.

Republican Governor Ron DeSantis recently rammed through Florida’s “Don’t Say Gay” law, which will “prohibit instruction about sexual orientation and gender identity” and goes into effect just two weeks from now on July 1st (CBS, 2022). This chilling limitation on speech is a stark reminder that many on the political right desire to take us back to a time where LGBTQ people are relegated to the closet, marginalized, and even criminalized in society. It threatens LGBTQ youths’ safety by ensuring they have less knowledge and feel more isolated; jeopardizes LGBTQ teachers’ jobs and livelihoods if they let slip that they have spouses who aren’t straight; and demeans LGBTQ families in Florida. And that’s the point. No discussing data, no studying it, and certainly no progress if it has anything to do with LGBTQ topics.

It’s not only at the state level that we see such policies being enacted. The federal government, under former President Donald J. Trump, advanced various changes that were aimed at erasing LGBTQ people from critical data sources used for a wide array of government initiatives. The Health and Human Services Department removed sexual orientation from its data collection activities, most notably the National Survey of Older Americans Act Participants and the Annual Program Performance Report for the Centers for Independent Living, which will limit the ability to serve LGBTQ seniors (Kozuch, 2017). The US Census Bureau even altered a report to Congress by literally erasing its plans to measure sexual orientation and gender identity in the American Community Survey (HRC, 2017).

As many data scientists will tell you, “more data beats better algorithms.” That is to say that – embarrassingly often – simply having higher quality or more training data will yield more accurate predictions than new algorithms do. The right-wing push to suppress, eliminate, and criminalize data and speech about LGBTQ people strikes at the very core of our ability to utilize AI and machine learning techniques to categorize, understand, model, and predict in ways that would benefit LGBTQ communities. When LGBTQ people are categorized incorrectly, lumped together with broader groups, or thrown away as “residual,” it makes for worse health outcomes, fewer government resources, inequitable public policy, and more. Again… that’s the point.

While these policies are more abstract in nature, they discriminate, harass, and abuse the LGBTQ community like police officers did decades ago in New York City. Drag queens, trans women of color, lesbians, and gays rose up against their antagonists in the Stonewall Uprising in 1969 and demanded equality (History.com, 2022)… it just may be time to rise up against this new wave of oppressors who would see us erased.

References
[1] CBS Miami Team (2022). CBS Broadcasting, Inc. https://www.cbsnews.com/miami/news/gov-ron-desantis-addresses-woke-gender-ideology-dont-say-gay-law/
[2] Kozuch, Elliott (2017). Human Rights Campaign. https://www.hrc.org/news/hrc-calls-on-trump-admin-to-reinstate-sexual-orientation-question
[3] HRC Staff (2017). Human Rights Campaign. https://www.hrc.org/news/trump-administration-eliminates-lgbtq-data-collection-from-census
[4] Schnoebelen, Tyler (2016). More Data Beats Better Algorithms. Data Science Central. https://www.datasciencecentral.com/more-data-beats-better-algorithms-by-tyler-schnoebelen/
[5] History.com Editors (2022). History.com. https://www.history.com/topics/gay-rights/the-stonewall-riots

Images
Image 1: https://news.harvard.edu/gazette/story/2019/06/harvard-scholars-reflect-on-the-history-and-legacy-of-the-stonewall-riots/
Image 2: https://cbs4indy.com/news/bill-passes-in-senate-would-allow-businesses-to-deny-service-to-gay-couples/
Image 3: https://www.documentarytube.com/articles/stonewall-riots-the-protest-that-inspired-modern-pride-parades

March 16, 2022

Privacy Risks in Brain-Computer Interfaces and Recommendations

By Mia Yin | March 16, 2022

1. What is BCI?
A brain-computer interface (BCI) is a pathway connecting the brain with an external device, most commonly a computer or a robotic limb. A BCI collects and processes the brain’s neurodata, which is then translated into outputs that are used in visualizations or as commands that tell the external interfaces or machines how to control people’s behavior or modulate neural activity. Because neurodata is generated by a person’s nervous system, it is also personal data.

BCIs are currently used mostly in gaming and healthcare. For example, in the gaming industry, BCIs use neurodata to let players control their in-game actions with their conscious thoughts. BCI games provide greater immersion.

There are three main categories of BCIs:
a. BCIs that record brain activity;
b. BCIs that modulate brain activity;
c. BCIs that do both, also called bi-directional BCIs/BBCIs

BCIs can be invasive or non-invasive. Invasive BCIs are inserted in the brain and enable direct communication between the brain and an external device, such as a computer.

Unlike invasive BCIs, non-invasive BCIs are not inserted in the brain. They are worn externally and can also record neurodata.

2. BCI risks, including accuracy and mental privacy
BCI accuracy: The accuracy of BCI data is especially important in the healthcare industry. Patients who use BCIs depend on accurate translation to express their thoughts to doctors. Some patients also rely on BCIs to mitigate disorders; for example, patients who suffer from epilepsy rely on BCIs for mitigation. If BCIs process neurodata incorrectly, patients may suffer serious health consequences, even death. Doctors also depend on accurate neurodata from BCIs to provide the best treatment. The device data and its interpretation need to be verifiable, sufficient, and reliable.

Mental privacy: Since BCIs collect and process personal neurodata to read people’s thoughts and their conscious or unconscious intentions, BCIs raise new mental privacy risks in addition to the existing privacy risks related to people’s personal health data. For example, some wheelchairs are controlled by BCIs. Patients who use such a wheelchair can direct it to go to a place for food when they are thinking about food. However, these BCIs can also collect information about a patient’s food preferences, the times at which a patient feels hungry or thirsty, and so on. This neurodata can reveal a lot of personal biological and private information. If the data is shared with other organizations, it may cause many privacy problems, such as disclosing a patient’s medical condition to an employer or other public entities.

3. Technical and policy recommendations
Technical recommendation: BCIs can give users more control over when neurodata is collected. For example, a BCI can ask the user whether they want to start the neurodata collector. This feature prevents users from switching on data collection unintentionally and gives them more control over personal neurodata flows.
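
A minimal sketch of such a consent gate might look like the following. The class and method names are hypothetical and do not correspond to any real BCI software; the point is only that collection stays off by default and nothing is stored until the user explicitly opts in.

```python
class NeurodataCollector:
    """Hypothetical collector that records only after an explicit opt-in."""

    def __init__(self):
        self.consented = False  # collection is off by default
        self.samples = []

    def request_consent(self, user_answer):
        # Only an explicit "yes" turns collection on; anything else keeps it off.
        self.consented = user_answer.strip().lower() == "yes"
        return self.consented

    def revoke_consent(self):
        # The user can switch collection off again at any time.
        self.consented = False

    def record(self, sample):
        if not self.consented:
            return False  # the sample is discarded, not stored
        self.samples.append(sample)
        return True


collector = NeurodataCollector()
stored_before = collector.record(0.42)  # ignored: no consent yet
collector.request_consent("yes")
stored_after = collector.record(0.57)   # stored: user opted in
collector.revoke_consent()
stored_revoked = collector.record(0.61)  # ignored again after revoking
```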

Policy recommendation: Privacy policies should be more transparent. The policy should tell users what data BCIs may collect, what purposes it will be used for, who controls and has access to the data, how the data will be stored, and so on. Developers and regulators should clearly describe the particular privacy risks of BCI applications and let users decide whether or not to give informed consent to use BCIs.

4. Conclusion:
A BCI is an advanced computer-based system that collects and processes a lot of personal neurodata. Stakeholders must understand how BCIs work and what they store and translate. BCIs carry many privacy risks that may expose personal data to public entities, so technical safeguards and privacy policies need to be improved to protect private data and ensure it is secured and not used for any unwanted purposes.

References:
[1] https://fpf.org/blog/bci-commercial-and-government-use-gaming-education-employment-and-more/
[2] https://fpf.org/blog/bcis-data-protection-in-healthcare-data-flows-risks-and-regulations/
[3] https://fpf.org/blog/bci-technical-and-policy-recommendations-to-mitigate-privacy-risks/