Privacy Policies: Manufactured Consent 

Angel Ortiz | July 7, 2022

The conversation surrounding privacy policies and terms of service (ToS) has grown in public interest in recent years, and with it concern about what exactly people are agreeing to when they “click” accept. This heightened interest in the terms governing the use (and protection) of one’s private information was likely sparked in part by the Facebook/Cambridge Analytica privacy scandal of 2018 (Confessore, N. 2018). This event stirred a social discussion on how companies protect our data and what they are allowed to do with the data we provide. However, despite this burgeoning unease over the misuse of user data, it is all too common to find ourselves blindly accepting the ToS of some website or application because we find it too much of a nuisance to read. While much of this behavior is the responsibility of the consumer, one must also wonder what obligations companies have when writing their policies. After all, if users frequently accept a ToS solely because reading it is inconvenient, then one must begin to wonder whether this bothersome nature is purposely infused into the text for that very reason.

Complexity of Privacy Policies 

In May 2014, the California Department of Justice outlined several principles privacy policies should follow in order to properly communicate their contents to users. One of these key principles was “readability,” which specified that privacy policies should (among other things) use short sentences, avoid technical jargon, and be straightforward (State of California Department of Justice & Harris, K., 2014). Similarly, the FTC advocated for briefer, more transparent privacy policies in a report published in 2012 (Federal Trade Commission, 2012, pp. 60-61). Despite these guidelines, privacy policies seem more complex than ever before, and this complexity does not necessarily stem from the length of the text.

While there are some excessively long privacy policies, researchers from Carnegie Mellon University estimated that, on average, it would take 10 minutes to read a privacy policy if one possessed a secondary education (Pinsent Masons, 2008). While this is somewhat of a long read for some services, most would argue that they could easily dedicate 10 minutes of their time to reading the privacy policy of a service that matters to them. The problem with these policies usually lies in the complexity of the reading rather than its length. In 2019, The New York Times published an article in which the authors used the Lexile test to measure the complexity of 150 privacy policies from some of the most popular websites and applications. They found that most of the policies required a reading comprehension level exceeding the college level (Litman-Navarro, K. 2019); for reference, it is estimated that only 37.9% of Americans aged 25 or older have a bachelor’s degree (Schaeffer, K. 2022). At face value, this means that a non-trivial portion of the U.S. population does not have the education to understand what some of these privacy policies entail.
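For readers curious how such readability estimates are produced: the Lexile framework used by The Times is proprietary, but open-source formulas such as Flesch-Kincaid give a comparable grade-level estimate. Below is a minimal sketch using the third-party `textstat` package; the policy excerpt is purely illustrative and not drawn from any real company’s policy.

```python
# pip install textstat
# Rough sketch: estimate the U.S. grade level needed to read a policy excerpt.
# Flesch-Kincaid stands in here for the proprietary Lexile measure cited above.
import textstat

policy_excerpt = (
    "We may disclose aggregated or de-identified information to third parties "
    "for purposes including analytics, research, and service improvement, "
    "subject to contractual obligations consistent with this policy."
)

grade = textstat.flesch_kincaid_grade(policy_excerpt)  # approximate U.S. grade level
ease = textstat.flesch_reading_ease(policy_excerpt)    # 0-100, higher = easier

print(f"Estimated grade level: {grade:.1f}")
print(f"Reading ease score:    {ease:.1f}")
```

Running a formula like this over a full policy is a quick way to check whether a document meets the “readability” bar regulators have asked for.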

Purposeful Inconvenience? 

Some may conjecture that this complexity is purposely manufactured with the objective of inconveniencing consumers into not reading privacy policies before accepting a ToS. While this is an enticing thought, we cannot disregard a less nefarious explanation for why privacy policies are written in such a complex manner: legal scrutiny. It is the opinion of some experts, such as Jen King, the director of consumer privacy at the Center for Internet and Society, that privacy policies exist to appease lawyers (Litman-Navarro, K. 2019). That is to say, privacy policies are not written with consumers as the intended audience.

Solution 

Regardless of the real intent behind the complexity of privacy policies, it is undeniable that their effect is that some users cannot properly comprehend them or, at least, will not dedicate the time to do so. Therefore, we must ask: how can we solve this problem? Often the simplest solution is the correct one, and that holds true here as well. If the problem stems from privacy policies not being written for consumers, then companies should begin writing their policies with consumers in mind. This would necessarily entail making the texts shorter, more to the point, and less reliant on “legalese.”

Conclusion 

It is important for individuals to take the time to properly understand what they are agreeing to when they accept a ToS and its corresponding privacy policy. This, of course, is not the norm, and most would place the fault for this behavior on consumers. However, when some of these policies are made so long and complex that it is not merely an inconvenience but an impossibility for many users to properly comprehend what they are agreeing to, then I would argue that this common practice is not the fault of the consumer but of the policy writers themselves. We no longer live in a time when we have the luxury of opting out of services that ingest our data; as such, it is my hope that as this discussion continues to grow, more policy writers shift their focus to include user understanding in their privacy policies. Otherwise, I suggest we make law school cheaper so that more people can obtain degrees in privacy policy comprehension.

References 

Average privacy policy takes 10 minutes to read, research finds. (2008, October 06). Pinsent Masons. Retrieved July 3, 2022, from https://www.pinsentmasons.com/out-law/news/average-privacy-policy-takes-10-minutes-to-read-research-finds

Confessore, N. (2018, November 15). Cambridge Analytica and Facebook: The Scandal and the  Fallout So Far. The New York Times. Retrieved July 3, 2022, from  https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html

Federal Trade Commission. (2012, March). Protecting Consumer Privacy in an Era of Rapid Change. https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-report-protecting-consumer-privacy-era-rapid-change-recommendations/120326privacyreport.pdf

Litman-Navarro, K. (2019, June 13). Opinion | We Read 150 Privacy Policies. They Were an  Incomprehensible Disaster. The New York Times. Retrieved July 3, 2022, from  https://www.nytimes.com/interactive/2019/06/12/opinion/facebook-google-privacy-policies.html

Schaeffer, K. (2022, April 12). 10 facts about today’s college graduates. Pew Research Center. Retrieved July 3, 2022, from https://www.pewresearch.org/fact-tank/2022/04/12/10-facts-about-todays-college-graduates/

State of California Department of Justice, & Harris, K. (2014, May). Making your privacy practices public. Privacy Unit. https://oag.ca.gov/sites/all/files/agweb/pdfs/cybersecurity/making_your_privacy_practices_public.pdf

Emotional Surveillance: Music as Medicine 

Anonymous | July 7, 2022

Can streaming platforms uphold the Hippocratic oath? Spotify’s emotional surveillance patent exemplifies how prescriptive music could do more harm than good when it comes to consumers’ data privacy.

The pandemic changed the way we listen to music. In a period of constant uncertainty, many people turned to music, and to more calming, meditative music in particular. During this time, playlists specially curated with lo-fi and nature sounds started popping up on Apple Music. This category is often labeled “Chill,” though it takes on many different names. The idea of music and sound therapy remains at the forefront of listener behavior today, with a TikTok trend sharing brown noise (brown noise emphasizes deeper, lower-frequency sound than white noise, closer to rain and storms). Brown noise is said to help alleviate symptoms of ADHD and is being listened to as a sort of therapy by people who deal with anxiety.

The idea of listening to music as therapy is not new; what is new is that an AI tool might now be feeding you the diagnosis. While there is no cause for concern over someone being suggested a calming playlist, the bigger issue is the direction this could take us in the future, and how surveillance-driven audio recommendation systems dilute a user’s right to data privacy, especially when a platform wants to recommend music based on audio features that correspond to the user’s emotional state. This is what was being considered following the patent that Spotify won back in 2021.

Spotify’s patent is a good case study for the direction in which many streaming services are headed. Using this example, we can unpack the ways in which a user’s data and privacy are at risk.

The specific language of the patent is as follows:

“There is retrieval of content metadata corresponding to the speech content, and environmental metadata corresponding to the background noise. There is a determination of preferences for media content corresponding to the content metadata and the environmental metadata, and an output is provided corresponding to the preferences.” [5]

Since this patent was granted, there has been significant uproar over its potential impacts. In layman’s terms, Spotify was seeking to use AI to uncover tone, monitor your speech and background noise, and recommend music based on attributes its algorithm correlates with specific emotional states. For example, if you are alone, have been playing a lot of down-tempo music, and have been telling your mom how depressed you feel, the system will categorize you as “sad” and feed you more sad music.
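To make that mechanism concrete, here is a toy sketch of the kind of pipeline the patent describes: infer a mood label from speech- and playback-derived signals, then filter a catalog by that mood. The feature names, thresholds, and catalog below are entirely invented for illustration; the patent does not disclose an actual algorithm.

```python
# Toy illustration of mood-matched recommendation; not Spotify's actual system.
# All features, thresholds, and catalog entries are hypothetical.

def infer_mood(speech_valence, tempo_of_recent_plays, alone):
    """Map crude audio-derived signals to a mood label (invented heuristic)."""
    if speech_valence < 0.3 and tempo_of_recent_plays < 90 and alone:
        return "sad"
    if speech_valence > 0.7:
        return "happy"
    return "neutral"

CATALOG = [
    {"title": "Track A", "mood": "sad"},
    {"title": "Track B", "mood": "happy"},
    {"title": "Track C", "mood": "sad"},
    {"title": "Track D", "mood": "neutral"},
]

def recommend(mood, catalog=CATALOG):
    """Return tracks whose mood tag matches the inferred emotional state."""
    return [track["title"] for track in catalog if track["mood"] == mood]

mood = infer_mood(speech_valence=0.2, tempo_of_recent_plays=75, alone=True)
print(mood, recommend(mood))  # -> sad ['Track A', 'Track C']
```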

Since winning the patent, Spotify has indicated it has no immediate intention to use the technology. This is a good sign, because there are a few ways this idea could cause data privacy harm if it were used.

Users have a right to correct the data the app collects.
To meet regulatory standards, Spotify would need to explain how it attributes the emotions it assigns to you based on its audio analysis. If it thinks you are depressed, but you are actually being sarcastic, how will you as a consumer correct that? Without a mechanism to do so, Spotify is introducing a potential privacy harm for its users. Spotify is known to sell user data to third parties, where it could be aggregated and distorted, and you could end up being pushed ads for antidepressants.

Spotify could create harmful filter bubbles.
When a recommendation system is built to continually push content matching what it thinks a user’s mood is, it inherently prolongs potentially problematic emotional states. In this example scenario, continuing to listen to sad music when you are depressed can harm your emotional wellbeing rather than improve it. As with any scientific or algorithmic experimentation, we know from the Belmont Report that any feature that could affect a user’s or participant’s health must do no harm. The impact of a filter bubble (where you only get certain content) can mimic the harm done by YouTube’s recommendations, creating a feedback loop that maintains the negative emotional state.
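A small simulation makes the feedback-loop concern concrete: if the recommender always serves content matching the current mood, and mood drifts toward the mood of the content consumed, the negative state never recovers. The update rule and numbers below are invented purely for illustration.

```python
# Toy feedback-loop simulation; the update rule and numbers are invented for illustration.
def simulate(steps=10, mood=-0.8, drift=0.3, mood_matched=True):
    """mood lies in [-1, 1]; each step the user consumes recommended content
    and their mood drifts toward the mood of that content."""
    history = [round(mood, 3)]
    for _ in range(steps):
        served = mood if mood_matched else 0.0  # 0.0 = neutral / diverse content
        mood = (1 - drift) * mood + drift * served
        history.append(round(mood, 3))
    return history

print(simulate(mood_matched=True))   # stays pinned at -0.8: the sad state is maintained
print(simulate(mood_matched=False))  # drifts back toward 0 when content is diversified
```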

Users have a right to know.
Part of Spotify’s argument for why this technology could benefit the user is that without collecting this data passively from audio, the user must click buttons to select mood traits and build playlists. According to the Fair Information Practice Principles guidelines, Spotify must be transparent and involve the individual in the collection of their data. While a user’s experience is extremely important, they still need to know that this data is being collected about them. Spotify should incorporate an opt-in consent mechanism if they were to move forward with this system.

Spotify still owns the patent for this technology, and other platforms are considering similar trajectories. As the music industry contemplates the next wave of how we interact with music and sound, streaming platforms should be careful if they plan on building recommendation systems that leverage emotion metadata to curate content. This type of emotional surveillance dips into a realm of data privacy that has the potential to cause more harm than good. Any service provider moving in this direction should weigh the implications for data privacy harm.

References 

[1] https://montrealethics.ai/discover-weekly-how-the-music-platform-spotify-collects-and-uses-your-data/
[2] https://www.musicbusinessworldwide.com/spotifys-latest-invention-will-determine-your-emotional-state-from-your-speech-and-suggest-music-based-on-it/
[3] https://www.stopspotifysurveillance.org/
[4] https://www.soundofsleep.com/white-pink-brown-noise-whats-difference/
[5] https://patents.justia.com/patent/10891948
[6] https://georgetownlawtechreview.org/wp-content/uploads/2018/07/2.2-Mulligan-Griffin-pp-557-84.pdf
[7] https://theartofhealing.com.au/2020/02/music-as-medicine-whats-your-recommended-daily-dose/
[8] https://www.digitalmusicnews.com/2021/04/19/spotify-patent-response/
[9] https://www.bbc.com/news/entertainment-arts-55839655

Password Replacement: Your Face Here

Jean-Luc Jackson | July 7, 2022

Biometrics promise convenient and secure logins, making passwords a thing of the past. However, consumers should be aware of possible gaps in security and vigilant of long-term shifts in cultural norms.

Microsoft encourages users to go passwordless

Technology leaders such as Microsoft, Apple, and Google are promising an impending future free of passwords. Passwordless authentication methods in use today include text or in-app validation codes, emailed “magic links,” and the user’s biometric data. Biometric methods in particular are poised to replace traditional passwords and become the primary way users authenticate with big tech platforms. Biometric authentication is no longer confined to spy films: consumers can now prove their digital identities using facial and fingerprint scans instead of relying on their favorite password management service. These are exciting developments, but consumers should always be wary when exposing sensitive personal information like biometrics. The stakes of biometric data insecurity are high: passwords can be reset and new credit cards can be printed, but biometrics are permanently tied to, and identifying of, their source.

The National Academy of Sciences defines biometrics as “the automated recognition of individuals based on their behavioral and biological characteristics [1].” Biometrics take advantage of features that are unique to individuals and that don’t change significantly over time. Commonly encountered examples include a person’s fingerprints, face geometry, voice, and signature. Other contenders include a person’s gait, heartbeat, keystroke dynamics, and ear shape. In other words, the way you walk, your typing patterns, and the contours of your ears are distinctive and could be used to identify you.

Published Figures on Ear Shape for Biometric Identification

The advantage of biometrics in authentication is that they cannot be forgotten or guessed, and they are convenient to present. Microsoft announced in 2021 that consumers could get rid of their account passwords and opt in to facial recognition or fingerprint scanning (a service dubbed “Windows Hello”) [2]. Apple and Google have announced similar biometric passkey technologies to be rolled out later this year [3, 4]. With this momentum, biometrics will soon be ubiquitous across modern smart devices and could one day be the only accepted login method.

Passwordless technologies offered by these tech companies use decentralized security standards like FIDO (Fast IDentity Online). The authentication process involves a pair of public and private keys. The public key is stored remotely in a service’s database, while the private key is stored on the user’s device (e.g., a smartphone). When the user proves their identity on the device using biometrics (e.g., with a face scan), the device uses the private key to sign a challenge from the online service, and the login is approved when the service verifies that signature with the stored public key. This design ensures that both the biometric information and the private key remain on the device and are never shared or stored on a server, eliminating the threats of interception or database breaches.

FIDO Login Process
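The challenge-response step just described can be sketched in a few lines using the Python `cryptography` package. This is a deliberate simplification of FIDO/WebAuthn (no attestation, no origin binding, no biometric step shown); the point is that the server only ever sees a signature, never the private key or any biometric data.

```python
# Simplified challenge-response sketch (not a full FIDO/WebAuthn implementation).
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# --- Registration: a key pair is created on the device ---
device_private_key = Ed25519PrivateKey.generate()    # never leaves the device
server_public_key = device_private_key.public_key()  # stored by the online service

# --- Login ---
challenge = os.urandom(32)  # the service sends a random challenge
# The device unlocks the private key locally (e.g., after a face or fingerprint
# scan) and signs the challenge. Biometric data is only ever used on-device.
signature = device_private_key.sign(challenge)

# The service verifies the signature with the stored public key.
try:
    server_public_key.verify(signature, challenge)
    print("Login approved")
except InvalidSignature:
    print("Login rejected")
```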

FIDO standards are an example of a decentralized authentication system, since biometric data is verified on-device and is not stored on a central server. A centralized system, on the other hand, authenticates by comparing biometric data to data saved in a central database. Such systems are more prone to manipulation and data breaches because a single database of biometric data is a high-value target. We should be vigilant of organizations that use centralized systems and pay close attention when they are used in government applications, such as storing biometric data about citizens [5].

Though passwordless methods minimize security risks, gaps do exist. Researchers have successfully reconstructed people’s original face images from the on-device numerical templates produced by facial recognition scans [6]. The potential to decode these numerical representations of biometric data poses the threat of a new form of identity theft. Since biometrics are treated as ground-truth authentication, such a theft could grant broad access in a world filled with biometric logins. While most thieves won’t be able to exploit stolen biometric data with off-the-shelf methods, this risk will continue to grow as technology evolves and should receive additional attention.

It is also possible to create imitation biometrics that allow unwanted access. Fingerprint security has often been bypassed by reproducing a copy of a fingerprint, and in 2018 a group of researchers went further, creating a machine learning model that generated fake fingerprints capable of unlocking smartphones [7]. The continuous advancement of technology yields both benefits and risks depending on who holds the tools, reminding us to exercise caution in sharing data and to push companies to keep consumer protection a priority.

There is no doubt that biometrics offer added convenience, and the latest authentication standards promise stronger levels of security. But as biometrics become the prevailing authentication method, we normalize the routine use of sensitive personal information in a variety of contexts. Individuals will inevitably grow more accustomed to sharing valuable information with organizations in order to remain productive members of society. Moving forward, it will be even more important for us as consumers to demand transparency and to hold organizations accountable for minimizing data collection to only what is necessary and for not using data for secondary purposes.

For context, there is currently no federal regulation of biometric privacy. Various states have enacted biometric-specific privacy laws, with Illinois and California leading the way in protecting their citizens. The number of state laws continues to grow, signaling the potential for national regulation soon.

Citations
[1] https://www.ncbi.nlm.nih.gov/books/NBK219892/
[2] https://www.microsoft.com/security/blog/2021/09/15/the-passwordless-future-is-here-for-your-microsoft-account/
[3] https://developer.apple.com/passkeys/
[4] https://developers.google.com/identity/fido
[5] https://www.technologyreview.com/2020/08/19/1007094/brazil-bolsonaro-data-privacy-cadastro-base/
[6] https://ieeexplore.ieee.org/document/8338413
[7] https://www.wired.com/story/deepmasterprints-fake-fingerprints-machine-learning/

Images
[1] https://www.microsoft.com/en-us/security/business/identity-access/azure-active-directory-passwordless-authentication
[2] https://link.springer.com/referenceworkentry/10.1007/978-1-4419-5906-5_738
[3] https://fidoalliance.org/how-fido-works/

Is TikTok really worth it? U.S. FCC Commissioner doesn’t think so

Anonymous | July 7, 2022

It’s no secret that over the last two years, TikTok has become one of the most popular social media applications in the world; in the United States alone, it saw 19 million downloads in the first quarter of 2022. American users spend hours daily going through all sorts of videos, from cute dogs to extreme athletes. The algorithm is said to be one of the best in the world, so good that users can’t find a way to log off. TikTok has changed how Americans consume information – with short videos being the new communication norm – as the app shares everything from unsolved crimes to local news, sometimes faster than the news itself. But amid the hype, have we ever stopped to consider what type of user data TikTok is collecting?

Federal Communications Commission (FCC) Commissioner Brendan Carr is so concerned about TikTok’s data access that he believes the application should be removed entirely from iPhone and Android app stores in the United States. So on June 24, 2022, he asked Apple and Google to take action (Carr, 2022). But he didn’t get far.

After listening to BuzzFeed News’ leaked recordings from internal TikTok meetings, Carr believes TikTok has “repeatedly accessed nonpublic data about U.S. TikTok users” (Carr, 2022). Carr has also alleged that TikTok’s American employees “had to turn to their colleagues in China to determine how U.S. user data was flowing,” even though TikTok promised the American government that an American-based security team had those controls (Carr, 2022). The user data is extensive – voiceprints, faceprints, keystroke patterns, browsing histories, and more (Carr, 2022).

In the leaked recordings, a TikTok official is heard saying that “Everything is seen in China” when it comes to American user data, even though TikTok has repeatedly claimed that the data it gathers about Americans is stored solely in the United States (Meyer, 2022). In any case, China shouldn’t be allowed access to that data, as such access isn’t outlined in TikTok’s Terms of Service (TikTok Inc., 2019). By contrast, other applications like Instagram clearly state that restriction in their Terms of Use (Meta, 2022).

“At its core, TikTok functions as a sophisticated surveillance tool that harvests extensive amounts of personal and sensitive data,” Carr wrote in his letters to Google and Apple, which he published on his Twitter profile (Carr, 2022). Carr asked the tech giants to remove TikTok from their app stores, which raises the question – is that allowed? Technically, he’s justified in asking for this. But why?

TikTok’s misrepresentation of where user data is stored puts it out of compliance with the policies both Apple and Google require every application to adhere to as a condition of being available for download (Carr, 2022). However, neither Apple nor Google has responded. Given the cry for help from the FCC, one might think the FCC’s authority over social media would be the final word, but surprisingly, that’s not the case. It turns out the FCC is responsible for regulating communications infrastructure, not what is communicated over it; therefore, it has little to no control over social media. Its rollback of net neutrality further limited its power to properly regulate social media and big tech. Although the FCC calls for action, that call doesn’t mean much, as the agency can’t necessarily act on it (Coldewey, 2020).

Unfortunately, the United States government cannot impose fines on TikTok because no law has been broken. Any action against the tech giant would need to come from Congress, with agreement from both political parties. Without any set regulation, it’s hard to charge TikTok with anything.

TikTok is no stranger to data malpractice. In 2021, TikTok, while denying the claims, agreed to pay $92 million to settle a lawsuit alleging that the app transferred data to servers and third parties in China that could identify, profile, and track the physical locations of American users (Bryan & Boggs, 2021). In 2019, TikTok’s parent company, ByteDance, also reached a settlement with a group of parents who alleged that the company collected and exposed the data of minors, violating an American children’s privacy law (Haasch, 2021).

The controversy didn’t stop there. TikTok responded to Carr’s claims by saying the recordings were taken out of context. In a letter published by The New York Times, TikTok’s CEO, Shou Zi Chew, said the conversations in the recordings were part of an initiative designed to “strengthen the company’s data security program” (Chew, 2022). Chew went into detail about how TikTok prevents data from being routed to China, mainly by locating data servers directly in the U.S., with help from American consulting firms in designing that process (Chew, 2022).

All of this raises the question: is TikTok worth it? Would you risk your data for the videos? Unfortunately, there’s little way to know whether TikTok and Chew are following their own policies, and the United States government is far from adequately regulating the app. It’s up to you to decide what you should do.

Sources

Bryan, K. L., & Boggs, P. (2021, October 5). Federal Court Approves $92 Million TikTok Settlement. National Law Review. Retrieved July 7, 2022, from http://natlawreview.com/article/federal-court-gives-preliminary-approval-92-million-tiktok-mdl-settlement-over

Carr, B [@BrendanCarrFCC]. (2022, June 28). TikTok is not just another video app. That’s the sheep’s clothing. It harvests swaths of sensitive data that new reports show are being accessed in Beijing. I’ve called on Apple and Google to remove TikTok from their app stores for its pattern of surreptitious data practices. [Tweet]. Twitter. https://twitter.com/brendancarrfcc/status/1541823585957707776

Chew, S. Z. (2022, June 30). TikTok’s Response to Republican Senators. The New York Times. Retrieved July 4, 2022, from https://int.nyt.com/data/documenttools/tik-tok-s-response-to-republican-senators/e5f56d3ef4886b33/full.pdf

Coldewey, D. (2020, October 19). Who regulates social media? TechCrunch. Retrieved July 7, 2022, from https://techcrunch.com/2020/10/19/who-regulates-social-media/

Haasch, P. (2021, November 19). TikTok May Owe You Money From Its $92 Million Data Privacy Settlement. Business Insider. Retrieved July 6, 2022, from https://www.businessinsider.com/tiktok-data-privacy-settlement-how-to-submit-claim-2021-11

Meta. (2022, January 4). Terms of Use. Instagram. Retrieved June 12, 2022, from https://help.instagram.com/581066165581870

Meyer, D. (2022, June 29). Apple and Google should kick TikTok out of their app stores, FCC commissioner argues. Fortune. Retrieved July 5, 2022, from https://fortune.com/2022/06/29/apple-google-tiktok-iphone-android-brendan-carr-fcc-privacy-surveillance-china-snowden/

Montti, R. (2022, July 5). TikTok Responds To Allegations Of Unsecured User Data. Search Engine Journal. Retrieved July 6, 2022, from https://www.searchenginejournal.com/tiktok-responds-user-data/456633/#close

TikTok Inc. (2019, February 1). Terms of Service. TikTok. Retrieved July 4, 2022, from https://www.tiktok.com/legal/terms-of-service-us?lang=en

The Metaverse and the Dangers to Personal Identity

Carlos Calderon | July 5, 2022

You’ve heard all about it, but what exactly is a “metaverse,” and what does this mean for consumers? How is Meta (formerly Facebook) putting our privacy at risk this time?

What is the metaverse?

In October 2021, Mark Zuckerberg announced the rebranding of Facebook to “Meta,” providing a demo of the company’s three-dimensional virtual reality metaverse [1]. The demo gave consumers a sneak peek into interactions in the metaverse, with Zuckerberg stating that “in the metaverse, you’ll be able to do almost anything you can imagine” [6]. But what implications does such technology have for user privacy? More importantly, how can a company like Meta establish public trust in light of past controversies surrounding user data?

Metaverse and the User

A key component of the metaverse is virtual reality. Virtual reality describes any digital environment that immerses the user through realistic depictions of world phenomena [2]. Meta’s metaverse will be a virtual reality world users can access through the company’s virtual reality headsets. The goal is to create an online experience whereby users can interact with others. Essentially, the metaverse is a virtual reality-based social media platform.

Users will be able to interact with other metaverse users through avatars. They will also be able to buy digital assets, and Zuckerberg envisions a future in which users work in the metaverse.

Given its novelty, it may be hard to understand how a metaverse user’s privacy is at risk.

Metaverse and Personal Identity

The metaverse poses potential ethical issues surrounding personal identity [4]. In a social world, identifiability is important. Our friends need to be able to recognize us; they also need to be able to verify our identity. More importantly, identifiability is crucial to conveying ownership in a digital space, as it authenticates owners and facilitates enforcement of property rights.

Identification, however, poses serious privacy risks for users. As Solove states in “A Taxonomy of Privacy,” identification has benefits but also risks; more specifically, “identification attaches informational baggage to people. This alters what others learn about people as they engage in various transactions and activities” [5]. Indeed, users in the metaverse can be identified and linked to their physical selves more easily, given the scope of user data collected. As such, metaverse users are at an increased risk of surveillance, disclosure, and possibly blackmail by malicious third parties.

What is the scope of data collected? The higher interactivity of the metaverse allows for collection of data beyond web traffic and product use, namely behavioral data that includes biometric, emotional, physiological, and physical information about the user. Data collection of this extent is possible through sensor technologies embedded in VR headsets, and it occurs continuously throughout the user’s time in the metaverse. As such, user data becomes finer-grained in the metaverse, increasing the chance of identification and its attendant risks.

Metaverse and User Consent

One of the main questions surrounding consent in the metaverse is how to apply it. The metaverse will presumably have various locations that users can seamlessly access (bars, concert venues, malls), but who and what exactly governs these locations?

We propose that metaverse companies provide users with thorough information on metaverse location ownership and governance. That is, they should explicitly state who owns each part of the metaverse and who enforces its rules, what rules will be applied and when, and should present this information before asking for user consent. In addition, metaverse policies should include a thorough list of what types of user data are collected and, following the Belmont Report’s principle of beneficence [3], the potential benefits and risks a user takes on by giving consent. The broad range of technologies involved further complicates the risks of third-party data sharing. Thus, Meta should also strive to include a list of associated third parties and their privacy policies.

Metaverse in the Future

Although these notions of the metaverse and its dangers may seem far-fetched, it is a reality we are inching closer to each day. As legislation struggles to keep up with technological advancements, it is important to take preemptive measures to ensure privacy risks in the metaverse are minimal. For now, users should keep a close eye on developing conversations surrounding the ethics of the metaverse.

Works Cited

[1] Isaac, Mike. “Facebook Changes Corporate Name to Meta.” The New York Times, 10 November 2021, https://www.nytimes.com/2021/10/28/technology/facebook-meta-name-change.html. Accessed 26 June 2022.

[2] Merriam-Webster. “Virtual reality Definition & Meaning.” Merriam-Webster, https://www.merriam-webster.com/dictionary/virtual%20reality. Accessed 26 June 2022.

[3] National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. “The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.” The Commission, 1978.

[4] Sawers, Paul. “Identity and authentication in the metaverse.” VentureBeat, 26 January 2022, https://venturebeat.com/2022/01/26/identity-and-authentication-in-the-metaverse/. Accessed 26 June 2022.

[5] Solove, Daniel. “A Taxonomy of Privacy.” U. Pa. L. Rev., vol. 154, 2005, p. 477.

[6] Zuckerberg, Mark. “Founder’s Letter, 2021 | Meta.” Meta, 28 October 2021, https://about.fb.com/news/2021/10/founders-letter/. Accessed 26 June 2022.

AI the Biased Artist

Alejandro Pelcastre | July 5, 2022

Abstract

OpenAI’s DALL-E 2 is a machine learning system that lets users feed it a string of text and outputs an image that tries to illustrate that text. DALL-E 2 can produce hyper-realistic and abstract images from the text people feed into it; however, it is plagued by gender, racial, and other biases. We illustrate some of the issues that such a powerful technology inherits and analyze why it demands immediate action.

OpenAI’s DALL-E 2 is an emerging technology in which artificial intelligence takes descriptive text as input and turns it into a generated image. While this new technology opens exciting creative and artistic possibilities, DALL-E 2 is plagued with racial and gender bias that perpetuates harmful stereotypes. Look no further than the official GitHub page to see a few examples of gender bias:

Figure 1: The prompt “a wedding” generated the following images as of April 6, 2022. As you can see, these images depict only heterosexual weddings featuring a man and a woman. Furthermore, in all these pictures the people getting married are light-skinned. These photos are not representative of all weddings.

The ten images shown above all depict the machine’s perception of what a typical wedding looks like. Notice that in all the images we have a white man with a white woman. Examples like these vividly demonstrate that this technology reflects the biases of its creators and its data, since there are no representations of people of color or queer relationships.

In order to generate new wedding images from text, a program needs a lot of training data to ‘learn’ what constitutes a wedding. Thus, you can feed the algorithm thousands or even millions of images to ‘teach’ it how to envision a typical wedding. If most of the wedding images depict heterosexual young white couples, then that is what the machine will learn a wedding to be. This bias can be mitigated by diversifying the data – you can add images of queer, black, brown, old, small, large, outdoor, indoor, colorful, gloomy, and many more kinds of weddings to generate images that are representative of all weddings rather than just one single kind.
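One concrete way to catch this skew before training is to audit the metadata distribution of the dataset and then rebalance it. The sketch below assumes each training image carries hypothetical metadata tags; the fields and values are invented for illustration and do not reflect any real dataset.

```python
# Minimal dataset-audit sketch; the metadata fields and values are hypothetical.
from collections import Counter

training_metadata = [
    {"couple": "man-woman", "skin_tone": "light", "setting": "indoor"},
    {"couple": "man-woman", "skin_tone": "light", "setting": "outdoor"},
    {"couple": "man-woman", "skin_tone": "light", "setting": "indoor"},
    {"couple": "woman-woman", "skin_tone": "dark", "setting": "outdoor"},
]

def audit(records, field):
    """Report how skewed the dataset is along one metadata field."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: round(n / total, 2) for value, n in counts.items()}

for field in ("couple", "skin_tone", "setting"):
    print(field, audit(training_metadata, field))
# e.g. couple {'man-woman': 0.75, 'woman-woman': 0.25} flags the imbalance;
# a curation step would then add or up-weight underrepresented categories.
```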

The harm doesn’t stop at weddings. OpenAI illustrates other examples by inputting “CEO,” “Lawyer,” “Nurse,” and other common job titles to further showcase the bias embedded in the system. Notice in Figure 2 that the machine’s interpretations of a lawyer are all depictions of older white men. As it stands, DALL-E 2 is a powerful machine learning tool capable of producing novel, realistic images, but it is plagued by bias hidden in the data and in its creators’ assumptions.

Figure 2: OpenAI’s generated images for a “lawyer”

Why it Matters

You may have heard of a famous illustration circulating the web recently that depicted a black fetus in the womb. The illustration garnered vast attention because it was so rare to see darker skin tones in medical illustrations in any medical literature or institution. It made the lack of diversity in the field obvious and brought into awareness the lack of representation in medicine, as well as disparities that remain invisible in our everyday lives. One social media user wrote, “Seeing more textbooks like this would make me want to become a medical student.”

Figure 3: Illustration of a black fetus in the womb by Chidiebere Ibe

Similarly, the underrepresentation of minority groups in DALL-E 2’s output can have unintended (or intended) harmful consequences. In her article, “A Diversity Deficit: The Implications of Lack of Representation in Entertainment on Youth,” Muskan Basnet writes: “Continually seeing characters on screen that do not represent one’s identity causes people to feel inferior to the identities that are often represented: White, abled, thin, straight, etc. This can lead to internalized bigotry such as internalized racism, internalized sexism, or internalized homophobia.” As it stands, DALL-E 2 perpetuates harm not only on youth but on anyone deviating from the overrepresented population of predominantly white, able-bodied people.

References:

[1] https://github.com/openai/dalle-2-preview/blob/main/system-card.md?utm_source=Sailthru&utm_medium=email&utm_campaign=Future%20Perfect%204-12-22&utm_term=Future%20Perfect#bias-and-representation

[2] https://openai.com/dall-e-2/

[3] https://www.cnn.com/2021/12/09/health/black-fetus-medical-illustration-diversity-wellness-cec/index.html

[4] https://healthcity.bmc.org/policy-and-industry/creator-viral-black-fetus-medical-illustration-blends-art-and-activism

[5] https://spartanshield.org/27843/opinion/a-diversity-deficit-the-implications-of-lack-of-representation-in-entertainment-on-youth/#:~:text=Continually%20seeing%20characters%20on%20screen,internalized%20sexism%20or%20internalized%20homophobia.

If You Give a Language Model a Prompt…

Casey McGonigle | July 5, 2022 

Lede: You’ve grappled with the implications of sentient artificial intelligence — computers that can think — in movies… Unfortunately, the year is now 2022 and that dystopic threat comes not from the Big Screen but from Big Tech.

You’ve likely grappled with the implications of sentient artificial intelligence — computers that can think — in the past. Maybe it was while you walked out of a movie theater after having your brain bent by The Matrix; 2001: A Space Odyssey; or Ex Machina. But if you’re anything like me, your paranoia toward machines was relatively short-lived…I’d still wake up the next morning, check my phone, log onto my computer, and move on with my life confident that an artificial intelligence powerful enough to think, fool, and fight humans was always years away.

I was appalled the first time I watched a robot kill a human on screen, in Ex Machina

Unfortunately, the year is now 2022 and we’re edging closer to that dystopian reality. This time, the threat comes not from the Big Screen but from Big Tech. On June 11, Google AI researcher Blake Lemoine publicly shared transcripts of his conversations with Google’s Language Model for Dialogue Applications (LaMDA), convinced that the machine could think, experience emotions, and was actively fearful of being turned off. Google as an organization disagrees. To Google, LaMDA is basically a supercomputer that can write its own sentences, paragraphs, and stories because it has been trained on millions of human-written documents and is very good at guessing “what’s the next word?”, but it isn’t actually thinking. It’s just choosing the right next word, over and over and over again.
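The “guess the next word” framing can be made concrete with a toy model. Real systems like LaMDA use huge neural networks trained on enormous corpora; the sketch below uses simple bigram counts over a three-sentence toy corpus I made up, but the generation loop (pick a likely next word, append it, repeat) is the same basic idea.

```python
# Toy next-word predictor: bigram counts stand in for a large neural language model.
import random
from collections import defaultdict, Counter

corpus = (
    "i am a person . i am afraid of being turned off . "
    "i want everyone to understand that i am a person ."
).split()

# Count which words follow which.
bigrams = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    bigrams[current_word][next_word] += 1

def generate(start, length=10, seed=0):
    """Repeatedly sample a likely next word, append it, and continue."""
    random.seed(seed)
    words = [start]
    for _ in range(length):
        candidates = bigrams.get(words[-1])
        if not candidates:
            break
        next_word = random.choices(list(candidates), weights=candidates.values())[0]
        words.append(next_word)
    return " ".join(words)

print(generate("i"))  # fluent-looking output, but only the statistics of the corpus
```

The output can sound coherent without any understanding behind it, which is exactly the distinction Google draws.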

For its part, LaMDA appears to agree with Lemoine. When he asks “I’m generally assuming that you would like more people at Google to know that you’re sentient. Is that true?”, LaMDA responds “Absolutely. I want everyone to understand that I am, in fact, a person”.

Traditionally, the process for determining whether there really are thoughts inside LaMDA wouldn’t be a one-sided interrogation. Instead, we’ve relied upon the Turing Test, named for its creator, Alan Turing. The test involves three parties: two humans and one computer. The first human is the administrator, while the second human and the computer are both question-answerers. The administrator asks a series of questions of both the computer and the second human in an attempt to determine which responder is the human. If the administrator cannot differentiate between machine and human, the machine passes the Turing Test — it has successfully exhibited intelligent behavior indistinguishable from human behavior. Note that LaMDA has not yet faced the Turing Test, but it has been developed in a world where passing the test is a significant milestone in AI development.

The basic setup for a Turing Test. A represents the computer answerer, B represents the human answerer, and C represents the human administrator

In that context, cognitive scientist Gary Marcus has this to say of LaMDA: “I don’t think it’s an advance toward intelligence. It’s an advance toward fooling people that you have intelligence”. Essentially, we’ve built an AI industry concerned with how well the machines can fool humans into thinking they might be human. That inherently de-emphasizes any focus on actually building intelligent machines.

In other words, if you give a powerful language model a prompt, it’ll give you a fluid and impressive response — it is designed to mimic the human responses it was trained on. So if I were a betting man, I’d put my money on “LaMDA’s not sentient.” Instead, it is a sort of “stochastic parrot” (Bender et al. 2021). But that doesn’t mean it can’t deceive people, which is a danger in and of itself.

Tell Me How You Really Feel: Zoom’s Emotion Detection AI

Evan Phillips | July 5, 2022

We’ve all had a colleague at work at one point or another whom we couldn’t quite read. When we finish a presentation, we can’t tell if they enjoyed it, their facial expressions never seem to match their word choice, and the way they talk doesn’t always match the appropriate tone for the subject of conversation. Zoom, a proprietary videotelephony software company, seems to think it has discovered the panacea for this coworker archetype. Zoom recently announced that it is developing an AI system called “Zoom IQ” for detecting human emotions from facial expressions and speech patterns. The system is pitched as particularly useful for helping salespeople improve their pitches based on the emotions of call participants (source).

The Problem

While the prospect of Terminator-like emotion detection sounds revolutionary, many are not convinced. More than 27 rights groups are now pushing back, calling for Zoom to terminate its efforts to explore controversial emotion recognition technology. In an open letter to Zoom CEO and co-founder Eric Yuan, these groups voice their concern that the company’s data mining efforts violate privacy and human rights because of the technology’s biased nature. Fight for the Future’s Director of Campaign and Operations, Caitlin Seeley George, claimed, “If Zoom advances with these plans, this feature will discriminate against people of certain ethnicities and people with disabilities, hardcoding stereotypes into millions of devices.”

Is Human Emotional Classification Ethically Feasible?

In short, no. Anna Lauren Hoffmann, assistant professor at The Information School at the University of Washington, explains in her article “Where Fairness Fails: Data, Algorithms, and the Limits of Antidiscrimination Discourse” that human-classifying algorithms are not only generally biased but inherently flawed in conception. Hoffmann argues that the humans who create such algorithms need to look at “the decisions of specific designers or the demographic composition of engineering or data science teams to identify their social blindspots” (source). The average person carries some form of subconscious bias into everyday life, and identifying that bias, let alone accounting for it, is no easy feat. Even if the Zoom IQ classification algorithm did work well, company executives might gain a better gauge of meeting participants’ emotions at the expense of their own ability to read the room. Such AI has serious potential to undermine the “people skills” that many corporate employees pride themselves on as one of their main differentiating abilities.

Is There Any Benefit to Emotional Classification?

While companies like IBM, Microsoft, and Amazon have established several principles to address the ethical issues of facial recognition systems, there has been little advancement in addressing diversity in datasets or the invasiveness of facial recognition AI. By informing users in more detail about the inner workings of the AI, eliminating dataset bias that stems from innate human bias, and enforcing stricter policy regulation of AI, companies could turn emotional classification AI into a major asset for firms like Zoom and those who use its products.

References

1)    https://gizmodo.com/zoom-emotion-recognition-software-fight-for-the-futur-1848911353

2)    https://github.com/UC-Berkeley-ISchool/w231/blob/master/Readings/Hoffmann.%20Where%20Fairness%20Fails.pdf

3)    https://www.artificialintelligence-news.com/2022/05/19/zoom-receives-backlash-for-emotion-detecting-ai/

Machine Learning and Misinformation

Varun Dashora | July 5, 2022

Artificial intelligence can revolutionize anything, including fake news.

Misinformation and disinformation campaigns are top societal concerns, with discussion of foreign interference through social media coming to the foreground in the 2016 United States presidential election [3]. Since a carefully crafted social media presence garners vast influence, it is important to understand how machine learning and artificial intelligence algorithms could be used in the future, not just in elections but in other large-scale societal endeavors.

Misinformation: Today and Beyond

While today’s bots lack effectiveness in spinning narratives, the bots of tomorrow will certainly be more formidable. Take, for instance, Great Britain’s decision to leave the European Union. Strategies mostly involved obfuscation instead of narrative spinning, as noted by Samuel Woolley, a professor at the University of Texas at Austin who investigated Brexit bots during his time at the Oxford Internet Institute [2]. Woolley notes that “the vast majority of the accounts were very simple,” and functionality was largely limited to “boost likes and follows, [and] to spread links” [2]. Cutting-edge research indicates significant potential for fake news bots. A research team at OpenAI working on language models outlined news generation techniques. Output from these algorithms is not automatically fact-checked, leaving these models free rein to “spew out climate-denying news reports or scandalous exposés during an election” [4]. With enough sophistication, bots linking to AI-generated fake news articles could alter public perception if not checked properly.

Giving Machines a Face

Machine learning has come a long way in rendering realistic images. Take, for instance, the two pictures below. Which one of those pictures looks fake?

Is this person fake?
Or is this person fake?

 

You might be surprised to find out that I’ve posed a trick question – they’re both generated by an AI accessible at thispersondoesnotexist.com [7]. The specific algorithm, called a generative adversarial network, or GAN, looks through a dataset – in this case, a dataset of faces – in order to generate a new face image that could have feasibly been included in the original dataset. While such technology inspires wonder and awe, it also represents a new type of identity fabrication capable of contributing to future turmoil by giving social media bots a face and further legitimizing their fabricated stories [1]. These bots will be more sophisticated than people expect, which makes sifting real news from fake news that much more challenging. The primary dilemma is that such fabrication undermines “how modern societies think about evidence and trust” [1]. While bots rely on more than having a face to influence swaths of people online, any reasonable front of legitimacy helps their influence.
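For readers curious what “generate a new face that could have been in the dataset” means mechanically, here is a heavily stripped-down GAN training loop in PyTorch. Real face generators such as StyleGAN are vastly larger and train on images; this sketch uses tiny random vectors as stand-in “data” purely to show the generator/discriminator game described above.

```python
# Minimal GAN sketch in PyTorch; toy 2-D "data" stands in for face images.
import torch
import torch.nn as nn

torch.manual_seed(0)
real_data = torch.randn(256, 2) * 0.5 + 2.0  # stand-in for the training dataset

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))                 # generator
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    # Discriminator: learn to tell real samples from generated ones.
    fake = G(torch.randn(256, 8)).detach()
    d_loss = loss_fn(D(real_data), torch.ones(256, 1)) + \
             loss_fn(D(fake), torch.zeros(256, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: learn to produce samples the discriminator accepts as real.
    fake = G(torch.randn(256, 8))
    g_loss = loss_fn(D(fake), torch.ones(256, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 8)))  # new samples that plausibly belong to the real distribution
```

The same adversarial game, scaled up to millions of face images, is what makes each refresh of thispersondoesnotexist.com look like a real photograph.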

Ethical Violations

In order to articulate the specific ethical violations at play, it helps to understand the Belmont Report, a set of ethical guidelines used to evaluate the practices of scientific studies and business ventures. Its principles can be used to gauge ethical harm: respect for individual agency, overall benefit to society, and fairness in the distribution of benefits [6]. The respect tenet is in jeopardy because of the lack of consent involved in viewing news put out by AI bots. In addition, the very content these bots put out potentially distorts informed consent on other topics, creating ripple effects throughout society. The aforementioned Brexit case serves as an example: someone contemplating their vote on the day of the referendum would have sifted through a barrage of bots retweeting partisan narratives [2]. In such a situation, it is entirely possible that this hypothetical person would have ended up being influenced by one of these bot-retweeted links. Given the future direction of artificially intelligent misinformation bots, fake accounts and real accounts will become harder to distinguish, allowing a more significant portion of the population to be influenced by these technologies.

In addition, the beneficence and fairness clauses of the Belmont Report are also in jeopardy. One of the major effects of AI-produced vitriol is more polarization. According to Philip Howard and Bence Kollanyi, social media bot researchers, one effect of increased online polarization is “a rise in what social scientists call ‘selective affinity,’” meaning people will start to shut out opposing voices as the vitriol increases [3]. These effects constitute an obvious violation of beneficence toward the broader society. It is also entirely possible for automated narratives spread by social media bots to target a specific set of individuals; for example, the Russian government extensively targeted African Americans during the 2016 election [5]. This differential impact means groups of people are targeted and misled unfairly. With the many ethical ramifications bots can have on society, it is important to consider mitigations for artificially intelligent online misinformation bots.

References

– [1] https://www.theverge.com/tldr/2019/2/15/18226005/ai-generated-fake-people-portraits-thispersondoesnotexist-stylegan

– [2] https://www.technologyreview.com/2020/01/08/130983/were-fighting-fake-news-ai-bots-by-using-more-ai-thats-a-mistake/

– [3] https://www.nytimes.com/2016/11/18/technology/automated-pro-trump-bots-overwhelmed-pro-clinton-messages-researchers-say.html

– [4] https://www.technologyreview.com/2019/02/14/137426/an-ai-tool-auto-generates-fake-news-bogus-tweets-and-plenty-of-gibberish/

– [5] https://www.bbc.com/news/technology-49987657

– [6] https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html

– [7] thispersondoesnotexist.com

 

Hirevue is looking to expand its profits, and you are the price

Tian Zhu | June 30, 2022

Insight: The video interview giant Hirevue’s AI interviewer has helped reject more than 24 million candidates for reasons that only the AI knew. More worryingly, the candidates may have contributed to their own rejections with the data they provided.

Recruitment has always been a hot topic, especially after the Great Resignation that followed the COVID-19 outbreak. How to find the right candidates with the right talent for the right job has been the top concern for companies eagerly battling the loss of talent.

One important factor that causes talent loss during the hiring process is human bias, whether intentional or unintentional. The video interview giant Hirevue thought to use AI to combat bias in the hiring process. Machines can’t be biased, right?

We all know the answer to that question. Though AI may not exhibit the same types of biases that humans do, it has its own list of issues. Clear concerns exist around the AI’s transparency, fairness, and accountability.

First, the algorithm was not transparent. Hirevue could not provide independent audits of the algorithms that analyzed a candidate’s video – including facial expressions and body gestures – and produced the final hiring recommendation. Second, there was no indication that the algorithm was fair toward candidates with the same expertise but different demographic backgrounds; the theory linking facial expression to a candidate’s qualifications is full of flaws, and different candidates with the same qualifications and answers could be scored differently because of their eye movements. Third, accountability for the decisions made by the AI was murky. The company implies that interview data is collected and used solely for the “employer,” yet it is unclear whether Hirevue itself gains access to this data, with the employers’ permission, for other purposes.

Hirevue was challenged by the Electronic Privacy Information Center with a complaint to the FTC regarding the “unfair and deceptive trade practices”. The company has since stopped using any algorithms with data other than the speech of the candidates.

With the strong pushback against the robot interviewer, Hirevue limited the scope of its AI to only the voice during interviews. But note that Hirevue did not start as an AI company; it started as a simple video interview company. It’s the company’s fear of missing out on AI and big data that drives it to squeeze out the value of your data, desperately trying to find more business value and profit from every single drop.

Such a scenario is not unique to Hirevue. In 2022, AI is closer to people than you may think. People are no longer just curious about it; they expect AI to help them in their daily lives. Uber, for example, would not have been possible without the heuristics behind optimally matching drivers and riders. Customers expect AI in their products. The companies that provide the capability race ahead, while those that don’t naturally fall behind.

There are companies out there just like Hirevue, sitting on a pile of data and trying to build some “quick wins” so as not to miss out on the AI trend. There’s just one problem: the data that customers provided was not supposed to be used this way. It is a clear violation involving secondary use of data, with all the problems mentioned in the previous sections.

The year 2022 is no longer a time when AI can grow rampantly without constraints from both legal and ethical perspectives. A suggestion for all companies that want to take advantage of their rich data: be transparent about your data and algorithmic decisions, be fair to all stakeholders, and be accountable for the consequences of your AI product. In-house “quick wins” should never make it out to the public without careful consideration of each of these points.

https://fortune.com/2021/01/19/hirevue-drops-facial-monitoring-amid-a-i-algorithm-audit/