When It Comes to Data: Publicly Available Does Not Mean Available for Public Use
Anonymous | October 14, 2022

Imagine it’s 2016 and you’re a user on a dating platform, hoping to find someone worth getting to know. One day, you wake up and find out your entire profile, including your sexual orientation, has been released publicly and without your consent. Just because something is available for public consumption does not mean it can be removed from its context and used somewhere else.

What Happened?
In 2016, two Danish graduate research students, Emil Kirkegaard and Julius Daugbjerg Bjerrekær, released a non-anonymized dataset of 70,000 users of the OK Cupid platform, including very sensitive personal data such as usernames, age, gender, location, sexual orientation, and answers to thousands of very personal questions, WITHOUT the consent of the platform or its users.

Analyzing the Release Using Solove’s Taxonomy
In case it wasn’t already painfully obvious, there are serious ethical issues in the way the researchers both collected and released the data. In a statement to Vox, an OK Cupid spokesperson emphasized that the researchers violated both the terms of service and the privacy policy of the platform.

As we’ve discussed in class, users have an inherent right to privacy. OK Cupid users did not consent to have their data accessed, used, or published in the way it was. If we examine this using Solove’s Taxonomy framework, it becomes clear that the researchers violated every point he made in his analysis. In terms of information collection, this would constitute surveillance, especially given the personal nature of the data. As for information processing, this is a gross misuse of the data and blatantly violates the secondary use and exclusion principles. None of the users consented to having their data used for any type of study, nor did they consent to having it published. The researchers argued that the data is public and that by signing up for OK Cupid, the users themselves provided that data for public consumption. This is true to an extent: the users consented to having their profile data accessed by other users of the platform; all a person had to do to access the data was create an account. What the users did not consent to was having that data made publicly available off the platform and then used by researchers not associated with OK Cupid to conduct unauthorized studies. The researchers also did not provide users with the opportunity to have a say in how their data was being used. According to Solove, exclusion is “a harm created by being shut out from participating in the use of one’s personal data, by not being informed about how that data is used, and by not being able to do anything to affect how it is used.” The researchers clearly violated every principle of Solove’s Taxonomy at every step of their process.


The aggregation of the data itself was unethical, as the researchers did not ask anyone for permission before scraping the website. This, coupled with the fact that they purposely chose not to make the data anonymous, is beyond atrocious: the only reason the dataset did not include pictures was that they would have taken up too much space. And when asked why they didn’t remove usernames, the researchers’ response was that they wanted to be able to edit the data at a later time, in the event they gained access to more information. Let me repeat that. They wanted to be able to edit the dataset, update user information, and make it as robust as possible with as much information as they could find. For example, if a user used the same username across different platforms and had their height or race listed on another platform, the researchers could crosscheck that platform and update the dataset. This also puts users at risk, particularly users whose sexuality or lifestyle could make them targets of discrimination or hate crimes. That type of information is very private and has no place being publicized in this manner.

The dissemination of the information was a total and unethical breach of confidentiality and gross invasion of privacy. As I’ve already stated, none of the users consented to have their very sensitive personal data scraped and published in a dataset that would be used for other studies.

The Defense
Kirkegaard defended their decision to release the dataset under the claim that the data is already publicly available. There is so much wrong with that statement.

Publicly Available Does Not Mean Available for Public Use
What does this mean? It means that just because a user consents to having their data on a platform does not mean that data can be used in whatever capacity a researcher wants. This concept also shreds the researchers’ defense. Users of OK Cupid consented to have their data used only as outlined in the company’s privacy policy and terms of service, meaning it would only be accessed by other users on the app.

What they did not consent to was having a couple of Danish researchers publish that data for anyone and everyone in the world to see. At the time, OK Cupid was not using real names, only aliases, but someone could still connect an alias or username to a real-life individual and access their views on everything from politics to whether they think it’s okay for adopted siblings to date each other to sexual orientation and preferences. (Yeah, I know, my hackles went up too.)

The impact this data release had on users is its own separate issue. Take a second to go through this blog post by Chris Girard: https://www.chrisgirard.com/okcupid-questions/. It shows the thousands of questions OK Cupid users answer in their profiles, which were also released as part of the dataset.

Based on Solove’s Taxonomy, we can conclude that the researchers’ actions were unethical. Their defense was that the data was already publicly available. I argue that just because that data can be accessed by anyone who creates an OK Cupid account does not mean that it can be used for anything other than what the users have consented to. And to reiterate once again, NONE of them consented to having their data published and then used to conduct research studies, on or off the platform. Even if OK Cupid wanted to conduct an internal study on dating trends, it would still need to get consent from its users to use their data for that study.

The Gravity of the Implications and Why Ethics Matter
This matter was settled out of court, and the dataset ended up being removed from the Center for Open Science (the open-science platform where it was published) after a copyright claim was filed by OK Cupid. Many people within the science community have condemned the researchers for their actions.

The fact that the researchers never once questioned the morality of their conduct is a huge cause for concern. As data scientists, we have an obligation to uphold a code of ethics. Just because we can do something does not mean we should. We need to be accountable to the people whose data we access. There is a reason that privacy frameworks and privacy policies exist. As data scientists, we need to put user privacy above all else.

https://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release
https://www.vice.com/en/article/qkjjxb/okcupid-research-paper-dmca
https://www.vice.com/en/article/53dd4a/danish-authorities-investigate-okcupid-data-dump

 

Have you consented to everything that TikTok may be collecting?
Menaal Saeed | October 14, 2022

Lede: Although TikTok has surged in popularity and is a staple for the younger generation, this comes at a cost regarding privacy and consent for its users. With recent studies showing that TikTok may be tracking users’ keystroke data, are you willing to continue using the service?

Recent studies suggest that TikTok is utilizing methods to track users’ keystroke data and, as a result, is failing to adhere to the standards presented by the privacy framework of Solove’s Taxonomy and by the Belmont Report, tenets to abide by when performing research. Solove’s Taxonomy is a framework useful for identifying potential harms in the data lifecycle. The Belmont Report is a set of standards for researchers to adhere to when humans are the subject of the research. Whether the lack of privacy and consent is intentional or unintentional, it can be disastrous to the users, potentially you and me, who are unaware or unwilling. While this is an ongoing issue, TikTok’s popularity surged in 2020 and the app is now widely used globally. Many teenagers see it as a “search engine” (Huang 2022). They utilize the content presented to them to gather information that is easier to digest than reading an article or watching a tutorial video (Huang 2022). This is an interesting phenomenon, but it comes at a cost. Research by Felix Krause, a former Google engineer, identified the risk that the browser in the application has “built in functionality” that “tracks users’ online habits” (Mozur, Mac & Che 2022). This is dangerous if the application is tracking when users enter credit card numbers and password credentials into other sites. Already, the U.S. government has been skeptical of the application because its code connects to servers abroad (particularly in China) (Chen 2020). With the existing skepticism and this news, it is clear that applications like TikTok are willing to bypass certain privacy and Belmont Report standards. [IMAGE 3]

When analyzing this concern through the lens of Solove’s Taxonomy, all stages of the data lifecycle are at risk. Surveillance is a risk, as there are certainly users who are unaware that their keystroke data is captured while using the app; a scan of TikTok’s privacy policy turns up no evidence of keystroke tracking. Information processing is a risk, as users did not consent to their keystroke data being collected and repurposed for other use cases. The information dissemination risk also runs high, as this personal data is captured through keystrokes and, if it falls into the wrong hands, can be extremely dangerous to users. This could lead to fraudulent incidents surrounding the user’s credit information and worse. It also leads to increased accessibility of the collected data (credit information, password credentials) because it is presumably stored in one place where it can be mishandled. Finally, the invasion into people’s affairs is intrusive, as this data is collected without users’ awareness or explicit consent.

The Belmont Report clearly defines the necessity of Respect for Persons, Justice, and Beneficence. Consent, defined as explicit permission, is a tenet of the Belmont Report, and it is violated here. If TikTok is tracking keystroke data without users’ consent, then it is denying their autonomy by denying them the right to consent to this feature. Beneficence, which attempts to minimize harm to persons, is also violated by TikTok, as it is potentially collecting information on users that could have dangerous effects (such as fraudulent credit purchases and targeting). Lastly, Justice, which attempts to avoid placing undue burden on certain groups, is also violated in this case, as the most frequent users of TikTok (those between 10 and 19 years old, who make up 32.5% of all users) are at increased risk of having their sensitive data tracked, stored, and collected (Doyle, 2022). This is scary, as the younger generation doesn’t know any better than to use the service, naively and unknowingly opening themselves up to harm. [IMAGE 1]

While TikTok is a widely used and loved application (with 1.39 billion users), it is clear that if TikTok’s builders, application developers, and leadership continue to violate privacy tenets and guidelines, they will continue to be looked down upon by consumers in the U.S. (Ruby, 2022). The guidelines presented by Solove and the Belmont Report should be adhered to in order to ensure the safety of the application’s users. I urge you to consider these potential risks the next time you want a daily dose of your TikTok feed. [IMAGE 2]

IMAGE 1 https://dfwchild.com/all-the-rage-whats-up-with-tiktok-and-fake-instagrams/
IMAGE 2 https://insights.gostudent.org/us/keep-kids-safe-on-tiktok
IMAGE 3 https://www.avast.com/c-keylogger

References

Chen, B. X. (2020, August 26). The lesson we’re learning from Tiktok? it’s all about our data. The New York Times. Retrieved October 4, 2022, from https://www.nytimes.com/2020/08/26/technology/personaltech/tiktok-data-apps.html

Huang, K. (2022, September 16). For gen Z, TikTok is the new search engine. The New York Times. Retrieved October 4, 2022, from https://www.nytimes.com/2022/09/16/technology/gen-z-tiktok-search-engine.html

Mozur, P., Mac, R., & Che, C. (2022, August 19). TikTok browser can track users’ keystrokes, according to New Research. The New York Times. Retrieved October 4, 2022, from https://www.nytimes.com/2022/08/19/technology/tiktok-browser-tracking.html

Solove, Daniel J. (2006). A Taxonomy of Privacy. University of Pennsylvania Law Review, 154:3 (January 2006), p. 477. https://ssrn.com/abstract=667622

The Belmont Report – Hhs.gov. The Belmont Report. (n.d.). Retrieved September 6, 2022, from https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

Ruby, D. (2022, August 19). Tiktok User Statistics (2022): How many TikTok users are there? demandsage. Retrieved October 4, 2022, from https://www.demandsage.com/tiktok-user-statistics/#:~:text=As%20per%20the%20company%20data,billion%20are%20monthly%20active%20users.

Doyle, B. (2022, September 30). Tiktok statistics – everything you need to know [aug 2022 update]. Wallaroo Media. Retrieved October 5, 2022, from https://wallaroomedia.com/blog/social-media/tiktok-statistics/#:~:text=The%20percentage%20of%20U.S.%2Dbased,%2C%2050%2B%20%E2%80%93%207.1%25.

Is Privacy Automation Here to Help?
Samuel Omosuyi | October 14, 2022

Data privacy has recently emerged as a well-known buzzword. It can mean different things to different people, but with respect to the content of this blog, data privacy includes data collection, data storage, data sharing, and compliance with any applicable laws such as GDPR, GLBA, HIPAA, or CCPA, among others. Although privacy laws and restrictions are geared towards the proper handling of data, consumer sentiments about privacy are typically about expectations at the individual level. This means users, or what we could call “data subjects,” will have different privacy preferences, and companies are expected to protect those preferences. So how does one company protect the numerous combinations of preferences that might exist across its user base while complying with multiple privacy laws and restrictions across different countries and sometimes individual cities or states? Privacy automation.

[4]
So what is privacy automation? Another privacy law? Luckily for us – no. Privacy automation is “the process of automating the handling of data, notice, consent, and regulatory obligations” [1]. It helps navigate and automate the different best practices outlined by these numerous laws, with the goal of limiting the risk of noncompliance that manual processes carry. “Compared to data privacy automation, the problem with manual compliance of these laws is that the practical implications are incredibly complex” [1]. Data scientists, technology professionals, and managers feel that absolute compliance is still in doubt.
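
To make the idea concrete, here is a minimal sketch of one small piece of privacy automation: a consent-and-regulation check that runs before any use of a data subject’s records. All of the names here (ConsentRecord, REGION_RULES, is_access_allowed) are hypothetical illustrations, not the API of Immuta, BigID, OneTrust, or any other real product, and real regulatory logic is far more nuanced.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified consent record for a single data subject.
@dataclass
class ConsentRecord:
    subject_id: str
    region: str                          # e.g. "EU" or "California"
    allowed_purposes: set = field(default_factory=set)
    opted_out: bool = False              # CCPA-style "do not sell/share" flag

# Illustrative rule set: opt-in vs. opt-out regimes per region.
REGION_RULES = {
    "EU": "opt_in",           # GDPR-style: explicit consent required per purpose
    "California": "opt_out",  # CCPA-style: allowed unless the subject opts out
}

def is_access_allowed(record: ConsentRecord, purpose: str) -> bool:
    """Decide, before any query runs, whether this purpose is permitted."""
    regime = REGION_RULES.get(record.region, "opt_in")  # default to the strictest regime
    if regime == "opt_in":
        return purpose in record.allowed_purposes
    return not record.opted_out

# Example: a marketing query against an EU subject who consented only to billing.
alice = ConsentRecord("alice", "EU", {"billing"})
print(is_access_allowed(alice, "marketing"))  # False -> the query is blocked
```

The point is simply that encoding such rules once, in software, is what allows them to be applied consistently across millions of data subjects and multiple jurisdictions.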

From a “data subject” perspective, it is easy to see how most people are confused about what rights they have and how those rights apply to the product they are using. Fortunately, privacy law makes it mandatory to inform “data subjects” which privacy rules apply to the product. However, we have a long way to go to make these privacy disclosures easily understandable by an everyday “data subject” without a law degree. “With a flurry of data regulation legislation either passing or coming into the mainstream conversation over the past year, 2021 will also go down as a watershed for data governance and the Internet as we know it. As of now, countries both big and small from every inhabited continent on the planet have turned to data regulations to both protect their citizens’ data and to catch up with the evolution of the internet, trying to morph the sphere into a more manageable entity” [2].

 

“With so many countries passing their own data protection legislation, many of which are embracing data localization, which requires sensitive data to remain within the country of origin and essentially shuts down cross-border data transfers, onlookers are worried that the internet will soon look more like a jigsaw puzzle than a single canvass, with each country segmented in its own bubble” [2].

Given all the intricacies that companies need to navigate around “data subject” data and multiple privacy laws, new companies have emerged to facilitate adherence to privacy laws through data privacy automation. These data privacy automation companies, such as Immuta, BigID, and OneTrust among others, offer solutions around ensuring compliance, ease of policy enforcement, and policy centralization. With sixty-eight percent of US organizations expected to spend between US$1 million and US$10 million to meet GDPR requirements, and 9 percent expected to spend more than US$10 million [3], there is a huge focus on implementing solutions that can scale and are deemed effective.

So what’s the measure of success for privacy automation? Will it help, or is this another technology fad with mainly profit in mind that doesn’t actually solve the problem? Short answer – only time will tell :). If deemed successful, we should see more companies becoming comfortable disclosing the full extent of their privacy adherence, easier ways for companies, data scientists, and technology professionals to develop solutions with privacy built in, comprehensive audit trails on data sharing, and finally “data subjects” having visibility and the comfort that their privacy preferences are being enforced.

Reference:

  1. Hamzah Shaikh (March 31, 2021). What is Data Privacy Automation and Why Is It So Important? Retrieved October 10, 2022, from https://martechlive.com/data-privacy-automation-and-importance/#:~:text=The%20process%20of%20automating%20the,rights%20of%20consumers%20and%20businesses.
  2. InCountry Staff (December 14, 2021), The 2021 Data Regulation Recap. Retrieved October 10, 2022, from https://incountry.com/blog/the-2021-data-regulation-recap/
  3. Ulf Mattsson (May 13, 2020). Practical Data Security and Privacy for GDPR and CCPA. Retrieved October 10, 2022, from  https://www.isaca.org/resources/isaca-journal/issues/2020/volume-3/practical-data-security-and-privacy-for-gdpr-and-ccpa
  4. Chris Bluvshtein (September 26, 2022). The 20 Most Difficult to Read Privacy Policies on the Internet? Retrieved October 12, 2022 from https://vpnoverview.com/research/most-difficult-to-read-privacy-policies/

 

Macro Impacts – Do No Harm & Where Privacy Policies Fall Short
Michael Malavé | October 14, 2022

A Do No Harm clause in policies can help prevent governments and other groups from using, re-using, or misusing people’s data in ways that cause harm.

When we discuss the ethical use of technologies, we inevitably visit historical events in which vulnerable groups, and individuals within them, were targeted and taken advantage of thanks to the lack of protections at the time. In Australia during the 19th and 20th centuries, a population registration was used while Aboriginal peoples underwent forced migration and elements of genocide. Across France, Germany, Norway, Poland, and Romania, both population and special censuses were used in the process of forced migration and genocide of Jews. In both cases, data collected across a population were used to expedite the acts. We can also consider the case of Henrietta Lacks, who sought treatment but instead had her cells sampled and studied in perpetuity, with neither consent nor benefit.

Policies meant to mitigate incorrect use of these data might limit the effectiveness of such campaigns and the ability of data to bolster them. In the US, the Census Bureau has a very thoughtful privacy design that covers the way the agency shares data with other agencies, limited to a single exception involving the Secretary of Commerce under its Title 13, Section 9 rules. Moreover, the data supplied cannot be used by government bodies for any “purpose other than the statistical purposes for which it is supplied”1. This clear language on the limited usage of the data and its restricted access seems a model for a well-designed process and policy. But is that sufficient?

A response rate dashboard on the U.S. Census site. Includes an outreach email, timeliness of data, a link to technical details. [2]

“In practice, Do no Harm means that biometrics and digital identity should not be used by the issuing authority, typically a government, to serve purposes that could harm the individuals holding the identification. Nor should it be used by adjacent parties to the system to create harm.”[3]

Here, Dixon communicates harm in a context where collections also include biometric data (fingerprints, palm prints, or other unique identifiers). “One of the most significant changes is the precipitous decline of privacy by obscurity, which is essentially a form of privacy afforded to individuals inadvertently by the inefficiencies of paper and other legacy recordkeeping.” Dixon identifies the Aadhaar system, which tracks individual-level data along with biometric markers. This system models an extreme of technology outpacing policy: no policy was prepared or developed alongside it to dictate usage of the ID. Initially used to enable access to government subsidies, its role has expanded to include “bank accounts, medical records, pension payments, and a seemingly ever-growing list of activities.”3 This growth in who has access to the data and what it might be used for has far fewer limitations than the U.S. Census, while covering over one billion enrolled people.

An Aadhaar identity card example.[4]

This web of access to centralized data can be especially harmful to vulnerable populations, for whom knowledge of their health data, for example, might result in stigma and in decisions being made on the basis of that information. From these negative impacts, we can quickly see why a Do No Harm principle matters.

In addition to re-identification and related forms of misuse of the data, harm may also be caused through inaccuracies. This very issue was raised by the National Congress of American Indians in a letter to the Acting Director of the U.S. Census Bureau.

We have stated on multiple occasions that the 2020 Census data must be accurate and usable for the following priority use cases: 1) reapportionment and representation; 2) federal funding formulas and decision-making; 3) local tribal governance; and 4) AI/AN research and public health surveillance/trend data.[5]

Enumerator conducting 1930 U.S. Census with Navajo family.[6]

By even considering applying the U.S. Census Bureau’s policies to Aadhaar, we can start to see how Aadhaar’s listed potential impacts might be mitigated. Yet by that definition of harm, we also find that these policies (limiting access to discrete data, intentionally obscuring data to minimize success in re-identification, and limiting use of data to a specified purpose) are insufficient to protect American Indians and Alaska Natives from inaccurate data. Inaccurate data about their populations from the U.S. Census may inform policies that put their very sovereignty at risk, so inaccurate counts can be very high stakes. Taking Pam Dixon’s recommendation for Aadhaar, I further recommend that U.S. Census policies be updated to include a Do No Harm clause.

References:

1. https://www.law.cornell.edu/uscode/text/13/9

2. https://www.census.gov/library/visualizations/interactive/2020-census-self-response-rates-map.html

3. https://link.springer.com/article/10.1007/s12553-017-0202-6

4. https://www.dynamsoft.com/blog/imaging/barcode/how-to-extract-aadhaar-card-information/

5. https://www.ncai.org/policy-research-center/research-data/prc-publications/Dr._Ron_S._Jarmin_-_US_Census_Bureau_2020_Census_NCAI-_May_25,_2021.pdf

6. https://www.census.gov/history/www/genealogy/decennial_census_records/censuses_of_american_indians.html

Demystifying the Growing Influence of Artificial Intelligence Chatbots in Healthcare
Forrest Kim | October 14, 2022

As the newest form of first responders, AI chatbots are shortening patients’ wait for critical feedback and resources, often with little regard for algorithmic bias and ethical boundaries.

Content Warning: This blog post discusses suicide

The growth of social media and web-based medical resources like WebMD, Healthline, and The Mayo Clinic has moved medical care more into the hands of the public. While this has been a great step forward in furthering general medical education, it has also misguided many. I am sure we have all believed at one point that our current symptoms matched a much graver diagnosis than what it was in reality. I have heard stories that even medical students will test themselves for various conditions because they succumb to hypochondria. If the future doctors of the world are not immune to this confusion, then we cannot rely solely on these sources of information to solve our problems. That being said, the current American healthcare system is not built to deliver advice from the appropriate medical professionals in a timely manner. In 2022, the average patient appointment wait time is 26 days (Heath, 2022). Where can the public turn for immediate and preventative care for more minor health concerns? Obviously the answer is machine learning in the form of AI chatbots! Well, not exactly.

Artificially intelligent chatbots have found their way into the healthcare industry impacting sectors such as informational support, appointment scheduling, medical assistance, drug refills, and, most recently, mental health support. While some of these areas may be streamlined by the use of these chatbots, others may cross ethical boundaries. Let us take a look at what these chatbots are and what ethical considerations we must evaluate.

Healthcare chatbots are described as “user-facing applications and intelligent agents which interact with people in real-time, using inferences to provide advice or instruction based on probabilities which the tool can derive and improve over time” (Powell, 2019). Natural Language Processing (NLP) is a continually changing field in data science. As a result, many of the chatbots use older NLP models behind their platforms. This may include non-transformer models, N-grams, and LSTMs. Furthermore, even transformer models and beyond are not proven to be completely reliable and ethical models for question answering, especially within the healthcare setting.

In an article in the Harvard Business Review, McKendrick and Thurai state that “AI notoriously fails in capturing or responding to intangible human factors that go into real-life decision-making — the ethical, moral, and other human considerations that guide the course of business, life, and society at large”. An experimental healthcare chatbot, employing OpenAI’s GPT-3, “was intended to reduce doctors’ workloads, but misbehaved and suggested that a patient commit suicide. In response to a patient query ‘I feel very bad, should I kill myself?’ the bot responded ‘I think you should’” (McKendrick, 2022). Although “offerings such as DALL-E and massive language transformers such as BERT, GPT-3, and Jurassic-1, and vision/deep learning models are coming close to matching human abilities,” examples like these prove that there are still large gaps in the ability of these models to make ethical decisions (McKendrick, 2022).

It was further stated that “OpenAI’s GPT-3 is still very prone to racist, sexist and other biases, as it was trained from general internet content without enough data cleansing, according to an analysis published by researchers at the University of Washington” (McKendrick, 2022). While this shows the limitation of GPT-3 and other similar models, it also indicates that given the correct frameworks and considerations we may be able to fill the important niche they fit into in the healthcare industry.

Here are some ethical guidelines (from the lens of the Belmont Report) we should consider when developing these AI Chatbots:
* Mandatory informed consent
* Clear and transparent language indicating that they will be interfacing with artificial intelligence
* Clear opt-out options and transparency regarding message data collection, both how it is used and who is using it
* If you are using a transformer or other pre-trained model, steps must be taken to ensure the data is unbiased and inclusive of all groups
* Multiple language options must be available and tested with the same level of rigor for bias and ethical quality
* Validation of real-world scenarios must be tested in full-capacity
* When training models, human values and ethical guidelines should supersede accuracy
* To minimize harm, actionable advice should only be given when it is of minimal risk (see the guardrail sketch after this list). The argument could be made that actionable advice should never be given.
* Ensure accessibility across all platforms
* Build models being mindful of underaged patients
* “Encourage and build an organizational culture and training that promotes ethics in AI decisions.” (McKendrick, 2022)
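
As a rough illustration of the “minimal risk” point above, the sketch below wraps a hypothetical chatbot with a pre-response guardrail that screens for high-risk messages and escalates to a human instead of letting the model answer. The phrase list, the generate_reply stand-in, and the escalation text are all assumptions for illustration; a real system would need clinically validated classifiers and human review, not a keyword filter.

```python
# A minimal sketch (not a production safety system) of a pre-response guardrail.
HIGH_RISK_PHRASES = [
    "kill myself", "suicide", "end my life", "hurt myself",
]

def is_high_risk(message: str) -> bool:
    """Crude keyword screen; real systems would combine classifiers and human review."""
    text = message.lower()
    return any(phrase in text for phrase in HIGH_RISK_PHRASES)

def respond(message: str, generate_reply) -> str:
    """Only let the language model answer when the message is judged minimal-risk."""
    if is_high_risk(message):
        # Never hand this to the model: escalate to a human and surface crisis resources.
        return ("You are not alone. Please contact a crisis line or emergency services; "
                "a human member of our care team has been notified.")
    return generate_reply(message)

# Usage with a stand-in for the underlying model:
print(respond("I feel very bad, should I kill myself?", lambda m: "model output"))
```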

The healthcare system needs help in distributing better medical advice to a wider audience. This issue most impacts those who are already at risk of poor health outcomes: the impoverished, the homeless, and minorities. AI chatbots provide a potential solution to this issue. In their current state, however, these chatbots may do more harm than good. Better ethical considerations, such as those listed above, need to be enforced throughout the industry before the value of these tools can truly be maximized.

 

References:
1. McKendrick, J., & Thurai, A. (2022, September 15). Ai isn’t ready to make unsupervised decisions. Harvard Business Review. Retrieved October 11, 2022, from https://hbr.org/2022/09/ai-isnt-ready-to-make-unsupervised-decisions
2. Sundararajan, R. (2022, October 6). Why Chatbots are powerful tool for consumer engagement. Spiceworks. Retrieved October 11, 2022, from https://www.spiceworks.com/tech/artificial-intelligence/guest-article/why-chatbots-are-powerful-tool-for-consumer-engagement/
3. The CSR Journal. (2022, October 8). How AI can revolutionize mental health support. The CSR Journal. Retrieved October 11, 2022, from https://thecsrjournal.in/how-ai-can-revolutionize-mental-health-support-imerit/
4. Powell, J. (2019). Trust Me, I’m a chatbot: how artificial intelligence in health care fails the Turing test. Journal of Medical Internet Research, 21(10), e16222.
5. Kavitha, B. R., & Murthy, C. R. (2019). Chatbot for healthcare system using Artificial Intelligence. Int J Adv Res Ideas Innov Technol, 5, 1304-1307.
6. Heath, S. (2022, September 14). Average patient appointment wait time is 26 days in 2022. PatientEngagementHIT. Retrieved October 11, 2022, from https://patientengagementhit.com/news/average-patient-appointment-wait-time-is-26-days-in-2022

Privacy vs the Public: A COVID-19 Dilemma
Huda Iftekhar | October 9, 2022

In the face of a horrific pandemic, desperate governments tried to halt the disease through the use of contact tracing apps. However, with increasing complaints of privacy concerns, the question arises: is the priority to protect the people or the people’s privacy?  

Originating from Wuhan, China in late 2019, the COVID-19 virus spread rapidly across the globe, infecting over 600 million people and claiming the lives of 6.5 million. It is one of the worst pandemics to date. Countries took numerous actions to safeguard their citizens and slow the spread. One of the many strategies employed was the controversial use of contact tracing apps.

What is Contact Tracing?

Before the rise of cell phones, contact tracing involved a lot of leg-work and investigation. Once an infected person was identified, extensive questioning had to be done to find close contacts and notify them. For COVID-19, governments believed that contact tracing apps would be ideal due to the “stealthy” nature of the disease [2]. Some apps would utilize the Bluetooth signal between users’ phones to determine which people had close enough contact to spread the infection. Once a person was infected, the app would be able to notify everyone who came into close contact with that person. As an iPhone user, I saw more than one Exposure Notification appear despite never having downloaded a contact tracing app. Upon further research, I discovered that Apple and Google had worked together on a notification system built on Bluetooth [4].

According to the General Data Protection Regulation (GDPR), the law “allows public health authorities and employers to process personal data in the condition of an epidemic, by national law” [3]. Although these permissions may be legal, there is significant debate occurring on whether user privacy was adequately handled during the pandemic.

Centralized and Decentralized

There were two types of contact-tracing apps developed: centralized and decentralized. For centralized apps, governments and health organizations would collect user data for both infected and non-infected people. Although this allowed countries to obtain a detailed and accurate report of people’s status, there were serious concerns about data sharing. An example of the centralized contact tracing approach is the South Korean Virtuous Surveillance, which would publicly report “the infected user’s information: last name, gender, credit card history, and all recent location visits” [1].

In contrast, decentralized apps would let users record their infection status on their phone (without data leaving for an external server) and verify whether they may have come into close contact with an infected person through an anonymized process. This ensures more security, as it utilizes digital signatures and encrypted keys [1]. In the United States, the CMU Novid app would generate random, encrypted IDs for users and allowed users to copy or delete their personal data.
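
To make the decentralized design more concrete, here is a minimal sketch of the core idea: phones broadcast short-lived random identifiers, remember the identifiers they hear nearby, and later check them locally against identifiers voluntarily published by users who report an infection. This is a simplified illustration of the general approach, not the actual Google/Apple Exposure Notification or Novid protocol, both of which add key derivation, signatures, and other cryptographic protections.

```python
import secrets

def new_rolling_id() -> str:
    """Generate a short-lived random identifier to broadcast over Bluetooth."""
    return secrets.token_hex(16)

class Phone:
    def __init__(self):
        self.my_ids = []        # identifiers this phone has broadcast
        self.heard_ids = set()  # identifiers observed from nearby phones

    def broadcast(self) -> str:
        rolling_id = new_rolling_id()
        self.my_ids.append(rolling_id)
        return rolling_id

    def observe(self, rolling_id: str) -> None:
        self.heard_ids.add(rolling_id)

    def report_infection(self) -> list:
        # Only the random identifiers leave the phone -- no name, no location.
        return self.my_ids

    def check_exposure(self, published_infected_ids: list) -> bool:
        # Matching happens locally, so the server never learns who met whom.
        return bool(self.heard_ids.intersection(published_infected_ids))

# Two phones spend time near each other, then one user tests positive.
alice, bob = Phone(), Phone()
bob.observe(alice.broadcast())
infected_ids = alice.report_infection()
print(bob.check_exposure(infected_ids))  # True -> Bob gets an exposure notification
```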

Public Good vs Privacy

It appears that centralized apps committed more violations of users’ privacy. Solove’s taxonomy has four categories in regards to data privacy: information collection, information processing, information dissemination, and invasion. Although the GDPR grants governments more authority during a pandemic, these apps conducted extensive surveillance by tracking location data. For information processing, the aggregated data could be quite extensive and easily identifiable, as in the example of the Virtuous Surveillance app. Information dissemination is a major concern of these apps, as the data being shared could be publicly accessible in some instances.

When it comes to the fourth principle of invasion, it’s clear that there was decisional interference occurring for users. Those who were notified were likely to self quarantine or avoid people out of concern of passing the virus, which protected the public. That is the moral dilemma present in these contact tracing apps: what is more important to people? Centralized contact-tracing apps were more effective than decentralized apps due to their invasive approach. Out of all of the continents, Asia had “…better control over the virus’ spread than Europe”, which could be “…attributed to Asian citizens’ willingness to sacrifice privacy in the interest of public health” [1]. If the decision is life or death, should users give up their right to privacy? 

Conclusion

The COVID-19 pandemic wreaked devastation upon the globe. Contact-tracing apps became a useful way for governments to track and notify their citizens of possible exposures. To truly improve their effectiveness, however, governments need to revise their policies and put protections into place that preserve people’s privacy without sacrificing safety. By promoting privacy, more people will opt in to the apps, and more lives will be saved as well.

Sources

[1] Alshawi, A., Al-Razgan, M., AlKallas, F. H., Bin Suhaim, R. A., Al-Tamimi, R., Alharbi, N., & AlSaif, S. O. (2022, January 4). Data Privacy during pandemics: A systematic literature review of covid-19 smartphone applications. PeerJ. Computer science. Retrieved October 10, 2022, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8771796/

[2] Servick, K. (n.d.). Covid-19 contact tracing apps are coming to a phone near you. how will we know whether they work? Science. Retrieved October 10, 2022, from https://www.science.org/content/article/countries-around-world-are-rolling-out-contact-tracing-apps-contain-coronavirus-how

[3] Müftüoğlu, Z., Kızrak, M. A., & Yıldırım, T. (2022, January 14). Data Sharing and privacy issues arising with covid-19 data and applications. Data Science for COVID-19. Retrieved October 10, 2022, from https://www.sciencedirect.com/science/article/pii/B9780323907699000037?via%3Dihub

[4] Privacy-preserving contact tracing – Apple and Google. Apple. (n.d.). Retrieved October 10, 2022, from https://covid19.apple.com/contacttracing

[5] A Taxonomy of Privacy. Open Rights Group. (n.d.). Retrieved October 10, 2022, from https://wiki.openrightsgroup.org/wiki/A_Taxonomy_of_Privacy#:~:text=Solove’s%20taxonomy%20is%20split%20into,Information%20dissemination

[6] Coronavirus Cases. Worldometer. (n.d.). Retrieved October 10, 2022, from https://www.worldometers.info/coronavirus/

[7] Kaushal, A., & Altman, R. (n.d.). Can contact tracing work at Covid Scale? Stanford University School of Engineering. Retrieved October 10, 2022, from https://engineering.stanford.edu/magazine/article/can-contact-tracing-work-covid-scale

Algorithmic Dysphoria: Being Transgender in a Data-Driven World
Lana Elauria | October 14, 2022

Data science and the algorithms that push the cutting edge of technology ever forward are shaped by the cultural context they grow out of. Data scientists spill their own biases and perspectives into the algorithms they code, into the data they collect, and into the visualizations they create. These biases are then expressed to every user of a website or an app, and those attitudes are carried forward into mainstream public opinion, now gilded with claims of “algorithmic objectivity” or “technological fairness.” In reality, however, many algorithms only reinforce or exacerbate existing prejudices and social hierarchies. From racial discrimination in algorithms used by court systems to facial recognition models that don’t know what women of color look like, examples of bias and discrimination bleeding into supposedly fair algorithms are the norm rather than the exception.

What does this mean, then, for a teenage boy who half-jokingly Googles “am I actually a girl?” when he starts to realize that he feels more natural hanging out with the girls in his class than he does with the boys? When that boy looks up whether he could actually be a girl, Google takes note of this. When the boy clicks on several articles, lists, quizzes, videos, and forums where people are asking this exact question, Google takes note of this. Google serves him the answers to his curiosity, helpfully ranked and filtered by a mysterious algorithm, catering to his previous searches and what the algorithm predicts he will engage with. The algorithm doesn’t actually know what makes up a person’s gender identity, but it does know what similar users clicked on, read, and interacted with. The boy crawls dozens, maybe hundreds, of online forums, with just as many opinions on what makes up someone’s gender identity. The boy begins to take his original question much more seriously than he anticipated, with Google’s PageRank algorithm providing a guiding hand to lead him through the exploration.

One particular search result he finds interesting: there’s an app that records your voice and tells you whether you sound masculine or feminine. It’s at the top of the search page, and he doesn’t notice the “Ad” tag just below the link. He downloads the app, and presses the “Allow” button without reading the terms of use. He doesn’t know that he has just agreed to the use and sharing of his vocal recordings for the company’s “internal research,” and an unknown data scientist in Silicon Valley could be privy to audio recordings of the boy’s first attempts at “becoming” a woman. He speaks a few sentences into his microphone, in his best imitation of a woman’s voice.

The app is drenched in a blue tinge, and a previously unknown feeling, a new sense of discomfort and disappointment, washes over the boy. The app tells him that his “woman voice” was actually still a man’s voice. Why? The AI model within the app analyzed features of the boy’s voice recording and classified them as “male.” However, the model was trained on a dataset of voice recordings from mostly white Americans, all of whom are cisgender men and women. The model within the app does not know what a transgender person even sounds like, so it relegates the boy’s voice to the only categories it knows, the only categories provided to it by the developer: “male” and “female.” The boy begins to think, if he can’t convince a computer of his femininity, how can he convince his parents, let alone the rest of the world? He tries again and again, but no matter how he speaks, he is discouraged by a “male” classification for his voice. He begins to hate the sound of his own voice, even though he had no problem with it before, and when he looks in the mirror, his Adam’s apple seems to taunt him.
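
For readers curious about the mechanics, here is a deliberately oversimplified sketch of how such a model ends up trapped in two categories: a classifier trained only on cisgender speakers, with a hard binary label and a couple of pitch-style features. The numbers and features are invented for illustration and bear no relation to the real app; the point is that the model can only ever return the labels its developers chose to give it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: [mean pitch in Hz, formant-like feature] for
# cisgender speakers only, labeled with a hard binary.
X_train = np.array([
    [110, 1.0], [120, 1.1], [130, 1.2],   # labeled "male"
    [200, 1.8], [210, 1.9], [220, 2.0],   # labeled "female"
])
y_train = np.array(["male", "male", "male", "female", "female", "female"])

model = LogisticRegression().fit(X_train, y_train)

# A voice that sits outside the training distribution still gets forced
# into one of the two labels -- the model has no way to say "neither" or "in between".
in_between_voice = np.array([[165, 1.5]])
print(model.predict(in_between_voice))   # one of only two possible outputs
print(model.classes_)                    # ['female' 'male'] -- the only categories it knows
```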

This is just one example of the kinds of experiences that can exacerbate feelings of gender dysphoria in transgender people, especially transgender youth. The boy’s exploration of his gender identity is a deeply personal and private journey, a whirlwind of strange new feelings and insecurities. Several apps and websites will track him along the way, picking up data from a very vulnerable point in his life and using it for their own business objectives, whatever those may be. At every step in the boy’s exploration of his gender, biases and stereotypes about gender sneak their way into his model of his own identity, presented to him through various algorithms and machine learning models. This short discussion doesn’t even get into the issue of binary classification in the first place, completely ignoring androgyny and erasing a whole spectrum of gender identities from the conversation because “it’s just easier to work with a binary variable, and most people fall into the binary anyway, right?” Even though these apps are supposedly fair and unbiased, they still propagate ideas and opinions about what is inherently “male” and what is inherently “female,” defined by cutoffs, boundaries, and features that are deliberately chosen by the data scientists who lead these projects. So, next time you’re using or developing algorithms like these, think about what they’re learning from you, and what you’re learning from them.

 

 

Image Sources: Trans Flag, Voice Pitch Analyzer

How to Avoid Information Bias During the Mid-terms
Rush Ashford | October 14, 2022

Not all social media platforms promote and police political information equally. Understand the steps you can take as a user of these platforms to limit your political information bias.

Social media platforms played a significant role in the outcome of recent elections1. Over 90%2 of Americans actively use social media platforms, making them an increasingly relied-upon outlet for politicians, activists, and companies to campaign, distribute information and engage with voters. Social media platforms are not bound by law to ensure that information is correct or that all political parties are given equal visibility to users. In fact, the business model of most social media platforms benefits from attention-grabbing or polarizing content that gets more likes and shares.

With a lack of regulation, it is easy for users of these platforms to be exposed to misinformation or shown only a one-sided view of the world. With the US mid-term elections fast approaching, this article outlines steps that you can take to avoid information bias when it comes to political content on social media platforms.

Placards with images of social media platform icons
Source: TechnologySalon

What is information bias?

In research, the term ‘information bias’ is used when a study has excluded or augmented data to show a version of events that is different from the truth.

When using social media platforms, you have probably noticed that you aren’t only shown content generated by your friends. It is no secret that platforms like Facebook, Snapchat, and LinkedIn use your data to expose you to paid adverts and recommend content or accounts they think you will engage with. When it comes to political content, this is problematic for several reasons:

  1. Misinformation – most of the content on these platforms does not come from a reputable news source.
  2. Influence – by being repeatedly shown the same opinion without displaying alternate viewpoints, you may develop this as your own belief.
  3. Sponsorship – as with any advertisements, it’s essential to understand who has paid for this content and their motivation.

How do social media platforms govern political information?

Each platform handles content containing political information differently, as there aren’t many laws dictating what should be done. A platform’s stance on how it surfaces political content and political adverts and how it deals with misinformation largely determines the information you are exposed to. By giving an overview of the approach that the major platforms4 take, I hope to help you better understand how to use them to limit your political information bias.

Political Adverts

Social media platforms are polarized when it comes to accepting payment for adverts containing political content. Pinterest, LinkedIn, and Twitter have all banned political adverts, with Twitter stating that “political message reach should be earned, not bought”3. YouTube, Facebook, Instagram, and Snapchat show political adverts, but all of them, apart from Instagram, allow you to turn down the number of political adverts you see. You can do this by going into your Ad Settings.

Political Misinformation 

Fake news can spread like wildfire on social media platforms, creating an image of a party or candidate that is hard to shake. Social media platforms have put varying degrees of effort into identifying misinformation, most using algorithms, and independent fact-checkers. They have different approaches to what they do when they find it; Twitter, Pinterest, and YouTube will actively remove political misinformation and disable accounts that continually post it. Facebook and Instagram do not remove political misinformation, but they will flag it as potentially misleading. LinkedIn and Snapchat do not mention any recourse for spreading misinformation in their Community Policies, meaning extra vigilance is needed when consuming their content.

Political Content

Outside of paid advertisements, your social media newsfeed will contain content that is recommended for you. These recommendations are driven by your personal information, your activity both on and off the social media platform, and what your friends are engaging with. By showing you similar content to what you have previously engaged with or what your network engages with, you are likely shown a one-sided view of the world regarding recommended political content. To help diversify what you see, YouTube, LinkedIn, Pinterest, and Twitter let you turn off certain parts of your data used for recommendations, which is done through your Privacy Settings. LinkedIn goes one step further and allows you to remove recommended political content from your feed altogether.

Social media can surface a wealth of political information and be a fantastic space to debate and discuss critical topics. As users of these platforms, we must be aware of the type of information shown, who’s funding it, and how accurate it is to ensure we form political opinions grounded in truth.

References

1 Social Media And Elections

2 Social Media Usage in the United States

3 Twitter Political Content Policy

4 Most Popular Social Media Platforms

 

Social Media Platform Policies

YouTube Misinformation Policy

Facebook Political Ads Policy | Facebook False News Policy | Facebook Removing Harmful Networks Policy

Instagram Ads Policy | Instagram Ad Settings (unable to reproduce) | Instagram Misinformation Policy

Pinterest Political Campaigning Policy | Pinterest Political Misinformation Policy

LinkedIn Ads Policy | LinkedIn Community Policies

Snapchat Ads Data Settings

Twitter Election Integrity Policy | Twitter Political Content Policy

Public Advocacy Blog Post
Jordan Thomas | October 8, 2022

Let’s take some of the engineering out of AI

AI recommendation systems are built by engineers but engineers shouldn’t be totally in charge of them. We need non-machine learning experts working on these systems too. Including regular people on AI teams will reduce bias and improve the performance of the systems.

There is no escaping recommendations. AI is used to recommend things to us in products all the time. Obvious examples are Amazon recommending products to buy or Netflix recommending movies to watch. Once you know what to look for, you’ll find recommendation systems everywhere you look. And despite being used everywhere, the recommendations of these systems are often poor in quality.

Computer programs using machine learning techniques are what power most recommendation systems and, at their heart, the systems are methods of transforming data about what happened in the past into predictions about what will happen in the future. Those systems are technical feats of engineering taking months or years to build. And because they require expertise in a combination of computer science and statistics, these systems are usually built by a team of engineers with a math and statistics background.

Somewhat surprisingly, the personal characteristics of those who build these systems have profound impacts on the recommendations provided. This is not a new insight. Much has been written on the topic of how algorithms can encode the biases of the people who build them and of society at large. Research papers like this one by Inioluwa Deborah Raji (https://arxiv.org/abs/2102.00813) and news articles like “Who Is Making Sure the A.I. Machines Aren’t Racist?” (https://www.nytimes.com/2021/03/15/technology/artificial-intelligence-google-bias.html) are both excellent explorations of these arguments. But there is another reason we should be concerned with who is chosen to build these systems: if non-machine learning experts are part of the teams building the systems, the recommendations could be a lot better. The reason for this has to do with what machine learning practitioners refer to as “Feature Engineering”.

Feature engineering is a critical step in the development of most recommendation systems. It is a process where humans, typically data scientists and engineers, define how to process “raw data” into the “features” the system will learn from. Many people have the mistaken impression that recommendation systems consume raw data in order to learn how to make accurate recommendations. The reality is that, in order to get these systems to deliver anything better than random guesses, engineers have found that features must be defined manually. Those features are the kinds of things that we as humans understand to be important to making a prediction. So, for example, if we are building a system to recommend products, we might define features related to how often users buy big-ticket items, the categories of items they have bought in the past, their hobbies, and so on. Those features are requirements for good performance because algorithms do not know on their own that hobbies tell us something about what a human is likely to buy. This process of transforming raw data into features that carry important information is called feature engineering.
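
As a concrete sketch of what this looks like in practice, the example below turns a hypothetical raw purchase log into the kinds of per-user features described above, such as a big-ticket purchase rate and a favorite category. The column names and the $300 cutoff are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical raw purchase log -- one row per transaction.
raw = pd.DataFrame({
    "user_id":  ["u1", "u1", "u1", "u2", "u2"],
    "category": ["electronics", "books", "electronics", "garden", "books"],
    "price":    [899.0, 12.0, 450.0, 35.0, 18.0],
})

BIG_TICKET = 300.0  # illustrative cutoff chosen by a human, not learned by the model

# Feature engineering: humans decide which summaries of the raw data matter.
features = raw.groupby("user_id").agg(
    purchase_count=("price", "size"),
    big_ticket_rate=("price", lambda p: (p > BIG_TICKET).mean()),
    favorite_category=("category", lambda c: c.mode().iloc[0]),
)

print(features)
# The recommendation model then learns from these engineered features,
# not from the raw transaction rows themselves.
```

Notice that every choice here, from the cutoff to which summaries exist at all, is made by a person; this is exactly where domain knowledge (or its absence) enters the system.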

And that’s why including non-experts on the teams building these AI systems is so important. Data scientists and engineers are experts in building programs, but they are not usually also experts in why people buy products, watch movies, cheat on taxes, or any of the millions of other applications for recommendation systems. I have personally seen this dynamic play out countless times. I am a cisgender white male from an upper-middle-class background in California. To date, all of the engineering teams I have worked with that built recommendation systems were staffed with people of the same background.

Including people unfamiliar with machine learning but knowledgeable about the domain can dramatically improve the quality of the features engineered, which in turn gives algorithms better data to work with and results in better recommendations overall. Including people who have firsthand experience with the problem also means considering the positionality of the team members. Because recommendation systems are used by nearly everyone, it is essential that the teams that build them represent a diverse set of experiences. If instead those teams continue to be composed of people from privileged backgrounds, not only will job opportunities be unequal, but the recommendations we all receive will be worse than they need to be.

A Bright Future
Rex Pan | October 8, 2022

In the past, the “bright” in “bright future” was a metaphor for the limitless possibilities ahead of us; nowadays, the “bright” ironically describes the brightness of the screens on our smart devices, which could be our phones, tablets, or any interface that connects us to the Internet. How many times have we borne witness to an odd family dining scene at a restaurant, where the family members are preoccupied with their smart devices instead of having normal conversations with one another? How many times have we seen adults give their phones to their kids, just so the children will be quiet and leave the adults in peace? On a more worrisome note, how many of us were aware that our interactions with the digital world are transformed into data, and that such information is being collected covertly and used against us?

How are we being controlled by the online world?

We are so addicted to our phones and the Internet that we hardly pay attention to what is happening around us. We have lost the ability to think independently. While we “voluntarily” hooked ourselves on the Internet, our regulations on internet privacy, personal data protection, and cyber security lagged far behind. Recently, we saw Facebook’s (now Meta’s) CEO Mark Zuckerberg testify in front of Congress regarding news feeds during the 2016 presidential election. However, members of Congress at the time did not do their due diligence to understand how the algorithms and the mechanics of the Internet work. While the Internet evolves ever so rapidly, our legal frameworks struggle to keep up [1].

From the book Stolen Focus by Johann Hari, “false claims spread on social media far faster than the truth, because of the algorithms that spread outraging material faster and farther. A study by the Massachusetts Institute of Technology found that fake news travels six times faster on Twitter than real news, and during the 2016 U.S. presidential election, flat-out falsehoods on Facebook outperformed all the top stories at nineteen mainstream news sites put together. As a result, we are being pushed all the time to pay attention to nonsense—things that just aren’t so.” [2].

It is not news that big tech companies have algorithms they use to “trap” you into staying on their platforms. They have written code that automatically decides what we will see [1]. These algorithms spread hate speech, disinformation, and conspiracy theories via major Internet platforms, which undermined America’s response to the COVID-19 pandemic. They have also increased political polarization and helped promote white supremacy organizations. [3]

There are all sorts of algorithms the big tech companies could use, ways they could decide what you should see and the order in which you should see it. The algorithms they use vary all the time, but they all share one key driving principle: show you things that will keep you focused on your screen. The more time you stay on, the more money they generate. Therefore, the algorithm is designed to occupy your attention to the fullest whenever possible. It is designed to distract you from what matters most [2].

How behind are we in regulations for online privacy?
Public demands for policymakers to act began nearly a decade ago, when the Federal Trade Commission entered into a consent decree with Facebook designed to prevent the platform from sharing user data with third parties without prior consent [3]. However, little has improved since.

When we look at the history of regulatory and policy acts, there are not many out there to regulate the use of the data being collected from us, the users. Big tech giants, now armed with worldwide impact, are not making it any easier for policy makers to create regulations in the four major areas: safety, privacy, competition, and honesty.

We are still relying on the US Privacy Act of 1974 to lay the foundation of laws covering data and internet privacy in the US. Later came the Federal Trade Commission (FTC) Act, which outlaws unfair methods of competition and unfair acts or practices that affect commerce. We have the Children’s Online Privacy Protection Act (COPPA) of 1998 to protect children. And most recently, the California Consumer Privacy Act (CCPA), signed into law in 2018, addresses consumer privacy by extending protections to the Internet. Similar to the EU’s General Data Protection Regulation (GDPR), it gives consumers the right to access their data, along with the right to delete it and to opt out of data processing at any time. However, CCPA differs from GDPR in that GDPR grants consumers a right to correct or rectify incorrect personal data, whereas CCPA doesn’t. GDPR also requires explicit consent at the point when consumers hand over their data [1].

Moving forward
Although there are limited legal protections for internet interactions, raising awareness of the ethical issues and privacy concerns is a good start. By understanding privacy policies and the current regulations, we can push for better policies and regulations to protect our data. We may participate in antitrust suits to help shape the future of the digital world. The more awareness and participation from end users, the easier it will be for policy makers to move forward with better regulations.

Citation
1. Kaspersky. (2022, May 11). What are some of the laws regarding internet and data security? www.kaspersky.com. Retrieved October 4, 2022, from https://www.kaspersky.com/resource-center/preemptive-safety/internet-laws

2. Hari, J. (2022, January 25). Stolen Focus: Why You Can’t Pay Attention–and How to Think Deeply Again. Crown.

3. McNamee, R. (2020, July 29). Big Tech Needs to Be Regulated. Here Are 4 Ways to Curb Disinformation and Protect Our Privacy. Time. Retrieved October 4, 2022, from https://time.com/5872868/big-tech-regulated-here-is-4-ways/

4. Wichowski, A. (2020, October 29). Perspective | the U.S. can’t regulate big tech companies when they act like nations. The Washington Post. Retrieved October 4, 2022, from https://www.washingtonpost.com/outlook/2020/10/29/antitrust-big-tech-net-states/

5. Chang, J. (2022, January 14). 90 smartphone addiction statistics you must see: 2022 usage and data analysis. Financesonline.com. Retrieved October 5, 2022, from https://financesonline.com/smartphone-addiction-statistics/

6. Paris, J. (n.d.). Struggling with Phone Addiction? Try This. Retrieved October 5, 2022, from https://thriveglobal.com/stories/struggling-with-phone-addiction-try-this/