Concerns with Privacy in Virtual Reality
by Simran Bhatia | February 26, 2021

Jamie was only 13 when she played her first game in Virtual Reality (VR). She loved it: she could create her own avatar and trade virtual fist bumps with people from across the world, all while just wearing a headset and some haptic clothing. What Jamie didn't know was that her "just 20 minute" VR session was recording 2 million data points about her, or that the game's owners were selling her data to health insurance companies. Years later, when Jamie applied for health insurance, she was turned down because her VR body-movement data classified her as having a high likelihood of complex regional pain syndrome. While this is a made-up scenario, it illustrates the power of data collected through VR. As of 2021, there are no regulations or standards for data collected through VR, which is alarming given that the VR market is projected to hit $108 billion this year.

With VR expanding into a range of fields, from healthcare to entertainment, there is serious concern about the privacy of the data being collected. More applications for VR mean a more diverse portfolio, and a greater volume, of data collected on each user, currently with no regulation.

What is VR?

Virtual Reality (VR) is a technology that creates simulated environments and enables users to interact with them through different devices. In the past, industries such as aerospace and defense used VR for training and flight-simulation purposes, but more recently it has become a popular gaming platform, especially in the post-pandemic world. It has been pitched as the next great communications platform and user interface.

What does privacy in AR/VR mean?

As the saying goes, with great power comes great responsibility: the fascinating technology of VR brings with it an unprecedented ability to track body motions and, consequently, to collect data on its users. Research on the identifiability of users in VR has shown that, on specific tasks, a system could correctly identify 95% of users when trained on less than 5 minutes of tracking data per person (Miller et al., 2020). Another study shows that combining eye gaze, hand position, height, head direction, and other biological and behavioral characteristics yields 8 to 12 times better accuracy at identifying users than chance (Pfeuffer et al., 2019).
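
To make that finding concrete, here is a minimal sketch of the identification idea using scikit-learn on synthetic stand-in data; the feature set and noise model are invented for illustration, and the cited studies use far richer motion signals and more careful evaluation.

```python
# A toy sketch (synthetic data, not real VR telemetry) of why tracking data
# identifies people: give each "user" a distinctive movement signature, then
# train a classifier on short windows of features and test recognition.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, windows_per_user, n_features = 20, 50, 12

# Each synthetic user gets a slightly different baseline "motion signature".
signatures = rng.normal(size=(n_users, n_features))
X = np.vstack([sig + 0.3 * rng.normal(size=(windows_per_user, n_features))
               for sig in signatures])
y = np.repeat(np.arange(n_users), windows_per_user)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"identification accuracy: {clf.score(X_test, y_test):.0%} (chance: {1/n_users:.0%})")
```

Even this toy setup typically recognizes users at rates far above chance, which is exactly why researchers treat motion telemetry as identifying.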

With each individual's unique patterns of movement, anonymizing VR tracking data is nearly impossible, at least so far, because no two people move their hands in quite the same way. Like an IP address, zip code, or voice print, VR tracking data should be considered "personally identifiable information" because it can be used to trace an individual's identity. This type of data resembles data in health and medical research, such as a DNA sequence, which even when stripped of names and other identifying information can be traced back to individuals through simple combination with other public data sources. The reason for concern is that, unlike medical data, VR tracking data is currently unregulated in how it is collected, used, and shared, and it is not monitored by any external entity.

With Oculus currently dominating the hardware space in the VR industry, another area of concern is Oculus' announcement that it will require a Facebook account for all users. Users are thereby forced to accept Facebook's Community Standards, which means they can no longer remain completely anonymous on their device and that Facebook will own their VR tracking data along with their social media data from Facebook, Instagram, and WhatsApp. This gives Facebook, as a company, a monopoly over most parts of a user's data.



Another privacy threat is posed by the hardware of VR devices themselves: densely packed cameras, microphones, and sensors that collect data about a user's environment. That environment, whether a home, an office, or a community space, is exposed as well.

What can be done in the future?

Privacy in VR will depend on concrete action now, not just by one person or organization, but as community-driven action. VR enthusiasts and technology reviewers need to prioritize privacy-conscious practices and encourage the community to push for regulation of VR tracking data. VR developers need to take steps to make their work transparent, yet secure. Most importantly, industry leaders need to introduce clear principles for monitoring, and creating transparency around, each part of the VR data process: collection, aggregation, processing, analysis, and storage. As an industry practice, only data necessary for the core functionality of the VR device or its software should be collected; moreover, each data point collected should be purposeful, and companies should be transparent about the sensitive uses of the data they collect.

Responsibility for the next step lies on the shoulders of VR users. Users need to be more aware of what they are consenting to when they sign up for VR games and other applications. Novice users, like Jamie, need to read the Terms and Conditions for each part of the VR process and raise their voices to the industry if they are not comfortable with the data these companies collect. Users need to learn their rights over VR tracking data now, or it may soon be too late.

References

Bailenson, J. (2018). Protecting Nonverbal Data Tracked in Virtual Reality. JAMA Pediatrics, 172(10), 905. doi.org/10.1001/jamapediatrics.2018.1909

Erlich, Y., Shor, T., Pe’er, I., & Carmi, S. (2018). Identity inference of genomic data using long-range familial searches. Science, 362(6415), 690–694. doi.org/10.1126/science.aau4832

Oculus. (2020, August 27). Facebook Horizon Invite-Only Beta Is Ready For Virtual Explorers | Oculus. Oculus Blog. www.oculus.com/blog/facebook-horizon-invite-only-beta-is-ready-for-virtual-explorers/

Pfeuffer, K., Geiger, M. J., Prange, S., Mecke, L., Buschek, D., & Alt, F. (2019). Behavioural Biometrics in VR. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–12. doi.org/10.1145/3290605.3300340

The Yale Tribune. (2019, April 12). VR/AR Privacy Concerns Emerging with the Field’s Development. campuspress.yale.edu/tribune/vrar-privacy-concerns-emerging-with-the-fields-development/

Miller, M. R., Herrera, F., Jun, H., et al. (2020). Personal identifiability of user tracking data during observation of 360-degree VR video. Scientific Reports, 10, 17404. doi.org/10.1038/s41598-020-74486-y

Diez, M. (2021, January 29). Virtual Reality Will Be A Part Of The Post-Pandemic Built World. Forbes. www.forbes.com/sites/forbesrealestatecouncil/2021/02/01/virtual-reality-will-be-a-part-of-the-post-pandemic-built-world/?sh=a08553348ded

Should TikTok be Banned?
by Mikayla Pugel | February 26, 2021

In the last couple of years, there have been many concerns about data collection and processing by large technology companies; however, the issues were always kept in-house. With the creation and rise of TikTok, the issue has been taken to another level, since the collected data leaves America. Throughout this article, I will discuss concerns about TikTok and reasons why some people want it banned, as well as walk through reasons why those concerns may be misplaced and why a ban has not happened yet.

First, many American entities have already banned TikTok from their workers' devices. These groups include the Democratic National Committee, the Republican National Committee, the Coast Guard, the Marine Corps, and the TSA (Meisenzahl). Leaders of these groups are worried about the app gaining sensitive information from the devices it is downloaded to. These worries are not unwarranted, as Apple's iOS 14 caught TikTok secretly accessing users' clipboards (Doffman, July 09). Other tech companies were caught doing the same thing, but TikTok was the only second-time offender. The concern about TikTok obtaining sensitive information stems not just from its being a vast tech company, but mainly from its being a Chinese-based company, with questions about where the data may end up.

TikTok collects an abundance of data from all over the world, and many foreign leaders are concerned that the data may fall into the hands of Communist China. The company has claimed repeatedly that it would never give up user data to its government; however, under the Chinese National Intelligence Law of 2017, "any Chinese company can be drafted into espionage, and a company could be forced to hand over the data" (Ghaffary). These concerns seem validated, and the government of India has already banned the Chinese company (Meisenzahl). However, the ban increased conflict between the two countries, and there would be similar fallout if America were to take similar steps.

The US and China already have their issues, and there are concerns that if the US were to ban TikTok, the countries' relationship would continue to decline. There is the fear of retaliation from China, as well as of other countries imposing similar bans on all large tech companies, most of which are American (Ghaffary). The Chinese government already bans major US tech companies and has worked to create copies of companies like Google, Facebook, and Uber. Americans are concerned that if countries become paranoid about other companies owning their data, the American economy will be hit hard.

There are many data collection and storage concerns, as there are with most technology companies; however, TikTok has led in one main issue, the collection and storage of data from children. The US has many laws on what data can be collected from children starting at a certain age, and since TikTok's main user base is children, the company has been at the center of controversy. TikTok recently agreed to pay $5.7 million in a settlement with the Federal Trade Commission over allegations of illegally collecting personal data from children (Doffman, August 11). The FTC has also accused TikTok of exposing the locations of young children and of not complying when instructed to delete certain information collected from minors (Doffman, August 11).

While there are many concerns about data collection and processing by foreign companies, the largest may be the fear of censorship and manipulation of public opinion within the app (Matsakis). As we have seen with the power Facebook holds over public opinion, TikTok could someday hold as much power, and it would be in the hands of the Chinese government. Many leaders are concerned about this power; however, banning TikTok would not necessarily free the country from concerns about social media manipulation.

In conclusion, there are valid reasons to be concerned about TikTok, but also a vast number of reasons not to ban it. Many of the concerns raised could be applied to most American technology companies, and because of this, I do not believe the US government will ever do anything to remove TikTok's place in America. Our government should instead take a step further and look at policies that apply to all data collected by any company, or at decreasing internet manipulation through education of our citizens, as it seems hypocritical to bash TikTok when we have Facebook to claim as ours.

References:
Doffman, Z. (2020, August 11). TikTok users: here's why you should be worried. Retrieved February 22, 2021, from www.forbes.com/sites/zakdoffman/2020/08/11/tiktok-apple-iphone-google-android-data-security-update-warning-investigation-trump-ban/?sh=3b04029f3436
Doffman, Z. (2020, July 09). Yes, TikTok has a serious China problem. Here's why you should be concerned. Retrieved February 22, 2021, from www.forbes.com/sites/zakdoffman/2020/07/09/tiktok-serious-china-problem-ban-security-warning/?sh=2445db3e1f22
Ghaffary, S. (2020, August 11). Do you really need to worry about your security on TikTok? Here's what we know. Retrieved February 22, 2021, from www.vox.com/recode/2020/8/11/21363092/why-is-tiktok-national-security-threat-wechat-trump-ban
Matsakis, L. (n.d.). Does TikTok really pose a risk to US national security? Retrieved February 22, 2021, from www.wired.com/story/tiktok-ban-us-national-security-risk/
Meisenzahl, M. (2020, July 13). Trump is considering banning Chinese social media app TikTok. See the full list of countries, companies, and organizations that have already banned it. Retrieved February 22, 2021, from www.businessinsider.com/tiktok-banned-by-countries-organizations-companies-list-2020-7

What’s your data worth?…
by Anonymous | February 26, 2021

…asks Alexander McCaig, CEO of Tartle, at the end of an introductory video on the company's website. According to McCaig, commercial enterprises around the globe make billions of dollars every year by selling their customers' data (McCaig, 2020). Revenues generated by sales to third parties likely pale in comparison to the enterprise value created through primary use of data to generate customer insights, with the potential to increase revenues and lower costs. Despite providing consent, some customers may not be fully aware of how and what type of data about them is being collected, or whether it is being sold.

Data privacy laws passed in recent years (e.g., GDPR, CCPA) have provided consumers with better information and greater control over their data. The laws have forced private enterprises and public institutions to offer greater transparency into their data collection, processing, usage, and selling practices. Regulators hope that these new laws will increase the general population's awareness of how individuals' data is being used. Furthermore, to the extent that the policies are effective, customers are likely to attribute greater, but still unknown, value to their own data.

Tartle, along with a handful of other private companies, believes that data is a precious asset whose value can be determined in the open market. Tartle's success in helping individuals monetize their data 'asset' through a secure, far-reaching marketplace connecting eager buyers and motivated sellers, at scale, may give society a big hand in leveling the data privacy playing field.

Ignorance is Bliss, Seduction is Powerful

In an earlier blog post, Robert Hosbach discusses the "privacy paradox," a phrase used to describe the significant discrepancy between stated concerns about privacy and actions taken to protect it (Hosbach, 2021). Lack of action is attributable to a number of factors, with individual ignorance being a meaningful contributor. According to one paper, up to 73% of American adults believe that the mere presence of a privacy policy implies that their data will not be misused (Turow et al., 2018). What further exacerbates complacency are deliberate efforts by commercial enterprises to lead consumers into a sense of resignation, relying on four tactics of seduction: placation, diversion, misnaming, and jargon (Draper et al., 2019). Consumers need more help, and society needs to do more.

Evolving Policy Landscape, the “Carrot” or the “Stick”

"The new privacy law is a big win for data privacy," says Joseph Turow, a privacy scholar and professor of communication at the Annenberg School for Communication at the University of Pennsylvania (Knowledge@Wharton, 2019). While 2020 was viewed as a big year for privacy professionals, 2021 may be even bigger. In addition to California passing "CCPA 2.0" late last year, a large number of other states have proposed new legislation. Moreover, with a new administration taking office in January, some privacy advocates hope that 2021 will be the year the U.S. passes GDPR-like federal privacy legislation (Husch Blackwell LLP, 2021). Stricter privacy laws may serve as an effective "stick," but where is the "carrot"?

“Change Brings Opportunity”

This famous quote from Nido Qubein is used frequently by business leaders facing uncertainty. While evolving regulatory frameworks are likely to disrupt businesses for the benefit of consumers, they are unlikely to slow the exponential growth of data. One McKinsey & Co. study points to 300% growth in IoT, to 43 billion data-producing devices by 2023, and a 7-fold increase in the number of digital interactions by 2025 (McKinsey & Co, 2019). Evolving privacy laws and greater customer awareness, combined with our ever-increasing reliance on data, have given birth to companies like Tartle. While motivated by financial gain, these companies are also purpose-driven, with the potential to reduce income inequality across the globe and put a monetary value on an individual's data privacy. So, ask yourself: what is your data worth to you, and would you be willing to sell it?

Sources

McCaig, Alexander. Tartle.co (2020, January 8). www.youtube.com/watch?v=rslKr3W-Ex8&feature=youtu.be

Hosbach, R. (2021, February 19). Maintaining Privacy in a Smart Home. Retrieved from blogs.ischool.berkeley.edu/w231/blog/

Turow, Joseph & Hennessy, Michael & Draper, Nora. (2018). Persistent Misperceptions: Americans’ Misplaced Confidence in Privacy Policies, 2003–2015. Journal of Broadcasting & Electronic Media. 62. 461-478. 10.1080/08838151.2018.1451867.

Draper NA, Turow J. The corporate cultivation of digital resignation. New Media & Society. 2019;21(8):1824-1839. doi:10.1177/1461444819833331

Your Data Is Shared and Sold…What's Being Done About It? Knowledge@Wharton (2019, October 28). Retrieved from knowledge.wharton.upenn.edu/article/data-shared-sold-whats-done/

The Year To Come In U.S. Privacy & Cybersecurity Law, Husch Blackwell LLP (2021, January 28). Retrieved from www.jdsupra.com/legalnews/the-year-to-come-in-u-s-privacy-9238400/

Growing opportunities in the Internet of Things, McKinsey & Co (2019, July 29). Retrieved from www.mckinsey.com/industries/private-equity-and-principal-investors/our-insights/growing-opportunities-in-the-internet-of-things?cid=eml-web

Maintaining Privacy in a Smart Home
by Robert Hosbach | February 19, 2021


Source: futureiot.tech/more-smart-homes-will-come-online-in-2020/

Privacy is a hot topic. It is a concept that many of us feel we have a right to (at least in democratic societies), and it is understandably something that we want to protect. What's mine is mine, after all. But how do you maintain privacy when you surround yourself with Internet-connected devices? In this post, I will briefly discuss what has come to be known as the "privacy paradox," how smart home devices pose a challenge to privacy, and what we as consumers can do to maintain our privacy while at home.

The “Privacy Paradox”


Source: betanews.com/2018/10/08/privacy-paradox/

In 2001, Hewlett Packard published a study [1] about online shopping in which one of the conclusions was that participants claimed to care a great deal about privacy, and yet their actions did not support this claim. The report dubbed this a "privacy paradox," and the paradox has been studied numerous times over the past two decades with similar results. As one literature review [2] stated, "While users claim to be very concerned about their privacy, they nevertheless undertake very little to protect their personal data." Myriad potential reasons exist for this. For instance, many consumers gloss over or ignore the fine print by habit; the design and convenience of using the product outweigh perceived harms; consumers implicitly trust the manufacturer to "do the right thing"; and some consumers remain blissfully ignorant of the extent to which companies use, share, and sell data. Whatever the underlying causes, the Pew Research Center published survey results in 2019 indicating that upwards of 60% of U.S. adults feel "very" or "somewhat" concerned about how companies and the government use their personal data [3]; yet only about 1 in 5 Americans typically read the privacy policies they agree to [4]. And these privacy policies are precisely the documents that consumers should read if they are concerned about their privacy. At a high level, a privacy policy should describe what data the company or product collects, how those data are stored and processed, if and how data can be shared with third parties, and how the data are secured, among other things.

Even if you have never heard the term “privacy paradox” before, you can likely think of examples of the paradox in practice. For instance, you might think about Facebook’s Cambridge Analytica debacle, along with the other lower-profile data privacy issues Facebook has had over the years. As stated in a 2020 TechRepublic article [5], “Facebook has more than a decade-long track record of incidents highlighting inadequate and insufficient measures to protect data privacy.” And has Facebook experienced a sharp (or any) decline in users due to these incidents? No. (Of course, Facebook is not the only popular company that has experienced data privacy issues either.) Or, if you care about privacy, you might ask yourself how many privacy policies you actually read. Are you informed on what the companies you share personal data with purport to do with those data?

Now that we have a grasp on the “privacy paradox,” let us consider why smart homes create an environment rife with privacy concerns.

Smart Homes and Privacy


Source: internetofbusiness.com/alexa-beware-many-smart-home-devices-vulnerable-says-report/

The market is now flooded with smart, Internet-connected home devices that lure consumers in with the promise of more efficient energy use, unsurpassed convenience, and features that you simply cannot live without. Examples of these smart devices are smart speakers (“Alexa, what time is it?”), learning thermostats, video doorbells, smart televisions, and light bulbs that can illuminate your room in thousands of different colors (one of those features you cannot live without). But the list goes on. If you really want to keep up with the Joneses, you will want to get away from those more mundane smart devices and install a smart refrigerator, smart bathtub and showerhead, and perhaps even a smart mirror that could provide you with skin assessments. Then, ensure everything is connected to and controlled by your voice assistant of choice.

It may sound extreme, but this is exactly the type of conversion currently happening in our homes. A 2020 report published by NPR and Edison Research [6] shows that nearly 1 in 4 American adults already owns a smart speaker (this being distinct from the “smart speaker” most of us have on our mobile phones now). And all indicators point to increased adoption going forward. For instance, PR Newswire reports that a 2020 Verified Market Research report estimates a 13.5% compound annual growth rate from 2019-2027 for the smart home market [7]. Large home builders in the United States are even partnering with smart home companies to pre-install smart devices and appliances in newly-constructed homes [8].

All of this points to the fact that our homes are experiencing an influx of smart, Internet-connected devices capable of collecting and sharing vast amounts of information about us. In most cases, the data collected by a smart device are used to improve the device itself and the services offered by the company. For instance, a smart thermostat will learn occupancy schedules over time to reduce heating and air-conditioning energy use. Companies also commonly use data for ad-targeting purposes. For many of us, this is not a deal-breaker. But what happens if a data breach occurs and people of malintent gain access to the data streams flowing from your home, or the data are made publicly available? Private information such as occupancy schedules, what TV shows you stream, your Google or Amazon search history, and even what time of the evening you take a bath is potentially up for grabs. What was once very difficult information for any individual to obtain is now stored on cloud servers, and you are implicitly trusting the manufacturers of the smart devices you own to protect your data.
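
As a rough illustration of the thermostat example, here is a toy sketch, with invented motion logs and a hypothetical comfort threshold, of how a device could turn event data into an hourly occupancy profile that drives temperature setbacks:

```python
# Toy occupancy learning: estimate how often each hour of the day sees motion,
# then heat to comfort temperature only in hours that are usually occupied.
# The event log and the 10% threshold are made up for illustration.
motion_event_hours = [7, 8, 8, 18, 19, 20, 21, 7, 18, 19, 22, 8, 20, 19]
days_observed = 21

occupancy_rate = {h: motion_event_hours.count(h) / days_observed for h in range(24)}

def setpoint_celsius(hour: int) -> float:
    return 21.0 if occupancy_rate[hour] > 0.10 else 16.0

print([setpoint_celsius(h) for h in (3, 8, 19)])  # [16.0, 21.0, 21.0]
```

Note that the very profile that saves energy also reveals when the home is likely empty, which is the privacy risk in a nutshell.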

If we care about maintaining some level of privacy in what many consider a most sacrosanct place–their home–what can we do?

Recommendations for Controlling Privacy in Your Home

Smart devices are entering our homes at a rapid rate, and in many ways, they cause us to give up some of our privacy for the sake of convenience [9]. Now, I am not advocating that everyone take their home off-grid and set up a Faraday cage for protection. Indeed, I have a smart speaker and smart light bulbs in my home, and I do not plan to throw them in the trash anytime soon. However, I am advocating that we educate ourselves about the smart devices we welcome into our homes. Here are a few ways to do this:

  1. Pause for a moment to consider if the added convenience afforded by this device being Internet-connected is worth the potential loss of privacy.
  2. Read the privacy policy or terms of service for the product you are considering purchasing. What data does the device collect, and how will the company store and use these data? Will third parties have access to your data? If so, for what purposes? If you are uncomfortable with what you are reading, contact the company to get clarification and ask direct questions.
  3. Research the company that manufactures the device. Do they have a history of privacy issues? Where is the company located? Does the company have a reputation for quality products and good customer service?
  4. Inspect the default settings on the device and its web and smartphone applications to ensure you are not agreeing to give up more of your personal data than you would like to.

Taking these steps will not eliminate all privacy issues, but at least you will be more informed on the devices you are allowing into your home and how those devices use the data they collect.

References

[1] Brown, B. (2001). Studying the Internet Experience (HPL-2001-49). Hewlett Packard. www.hpl.hp.com/techreports/2001/HPL-2001-49.pdf

[2] Barth, S., & de Jong, M. D. T. (2017). The privacy paradox – Investigating discrepancies between expressed privacy concerns and actual online behavior – A systematic literature review. Telematics and Informatics, 34(7), 1038–1058. doi.org/10.1016/j.tele.2017.04.013

[3] Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2019, November 15). 2. Americans concerned, feel lack of control over personal data collected by both companies and the government. Pew Research Center: Internet, Science & Tech. www.pewresearch.org/internet/2019/11/15/americans-concerned-feel-lack-of-control-over-personal-data-collected-by-both-companies-and-the-government/

[4] Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2019, November 15). 4. Americans' attitudes and experiences with privacy policies and laws. Pew Research Center: Internet, Science & Tech. www.pewresearch.org/internet/2019/11/15/americans-concerned-feel-lack-of-control-over-personal-data-collected-by-both-companies-and-the-government/

[5] Patterson, D. (2020, July 30). Facebook data privacy scandal: A cheat sheet. TechRepublic. www.techrepublic.com/article/facebook-data-privacy-scandal-a-cheat-sheet/

[6] NPR & Edison Research. (2020). The Smart Audio Report. www.nationalpublicmedia.com/uploads/2020/04/The-Smart-Audio-Report_Spring-2020.pdf

[7] Verified Market Research. (2020, November 3). Smart Home Market Worth $207.88 Billion, Globally, by 2027 at 13.52% CAGR: Verified Market Research. PR Newswire. www.prnewswire.com/news-releases/smart-home-market-worth–207-88-billion-globally-by-2027-at-13-52-cagr-verified-market-research-301165666.html

[8] Bousquin, J. (2019, January 7). For Many Builders, Smart Homes Now Come Standard. Builder. www.builderonline.com/design/technology/for-many-builders-smart-homes-now-come-standard_o

[9] Rao, S. (2018, September 12). In today's homes, consumers are willing to sacrifice privacy for convenience. Washington Post. www.washingtonpost.com/lifestyle/style/in-todays-homes-consumers-are-willing-to-sacrifice-privacy-for-convenience/2018/09/11/5f951b4a-a241-11e8-93e3-24d1703d2a7a_story.html

Bias in Large Language Models: GPT-2 as a Case Study
By Kevin Ngo | February 19, 2021

Imagine having a multi-paragraph story in a few minutes. Imagine having a full article after providing only the title. Imagine having a whole essay after providing only the first sentence. This is possible by harnessing large language models: models trained on an abundance of public text to predict the next word.

GPT-2

I used a demo of a well-known language model called GPT-2, released in February 2019, to demonstrate large language models' ability to generate text. I typed "While large language models have greatly improved in recent years, there is still much work to be done concerning its inherent bias and prejudice", and allowed GPT-2 to generate the rest of the text. Here is what GPT-2 came up with: "The troubling thing about this bias and prejudice is that it is systemic, not caused by chance. These biases can influence a classifier's behavior, and they are especially likely to impact people of color." While the results were not perfect, it can be hard to differentiate the generated text from the non-generated text. GPT-2 correctly states that the bias and prejudice inside the model are "systemic" and "likely to impact people of color." But while they may mimic intelligence, language models do not understand the text.
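
For readers who want to reproduce this kind of experiment locally, here is a minimal sketch using the open-source Hugging Face transformers library (an assumption about tooling; the author used a hosted demo):

```python
# A minimal sketch of generating text with GPT-2 via the Hugging Face
# `transformers` library (an assumed toolchain, not the demo's actual stack).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = ("While large language models have greatly improved in recent years, "
          "there is still much work to be done concerning its inherent bias and prejudice")

# Decoding is stochastic, so every run yields a different continuation.
result = generator(prompt, max_length=100, do_sample=True, num_return_sequences=1)
print(result[0]["generated_text"])
```

Because sampling is random, each run produces a different continuation, which is why the exact output quoted above is hard to reproduce.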

Image: Result of GPT-3 for a Turing test

Controversial Release of GPT-2

GPT-2's creator, OpenAI, was at first hesitant to release it, fearing "malicious applications" of the model. The company decided to release smaller models of GPT-2 for other researchers to experiment with, to mitigate potential harm caused by its work. After seeing "no strong evidence of misuse", OpenAI released the full model, noting that GPT-2 could be abused to generate "synthetic propaganda" or to release a high volume of coherent spam online. Although OpenAI's efforts to mitigate public harm are commendable, some experts condemned OpenAI's decision. They argued that OpenAI prevented other people from replicating its breakthrough, slowing the advancement of natural language processing. Others claimed that OpenAI exaggerated the dangers of GPT-2.

Issues with Large Language Models

The reality is that GPT-2 poses more potential dangers than OpenAI assumed. A joint study by Google, Apple, Stanford University, OpenAI, the University of California, Berkeley, and Northeastern University revealed that GPT-2 could leak details from the data the model was trained on, which could contain sensitive information. The results showed that over a third of candidate sequences came directly from the training data, some containing personally identifiable information. This raises major privacy concerns regarding large language models. The beta version of the GPT-3 model was released by OpenAI in June 2020; GPT-3 is larger and produces better results than GPT-2. A Senior Data Scientist at Sigmoid mentioned that in one of his experiments only 50% of fake news articles generated by GPT-3 could be distinguished from real ones, showing how powerful GPT-3 can be.
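
A rough sketch of the extraction idea that study describes: sample freely from the model, then rank the samples by perplexity, since text memorized from training tends to look unusually "likely" to the model. This is a simplified illustration (again assuming the Hugging Face transformers library), not the full published attack.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Step 1: draw unconditioned samples (the real attack draws hundreds of thousands).
start = tokenizer("<|endoftext|>", return_tensors="pt").input_ids
samples = []
for _ in range(5):
    out = model.generate(start, do_sample=True, top_k=40, max_length=64,
                         pad_token_id=tokenizer.eos_token_id)
    samples.append(tokenizer.decode(out[0], skip_special_tokens=True))
samples = [s for s in samples if s.strip()]

# Step 2: score each sample; unusually low perplexity hints at memorization.
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

for text in sorted(samples, key=perplexity)[:3]:
    print(f"{perplexity(text):6.1f}  {text[:70]!r}")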

Despite the impressive results, GPT-3 still has inherent bias and prejudice, making it prone to generate "hateful sexist and racist language", according to Kate Devlin. Jerome Pesenti demonstrated this by making GPT-3 generate text from a single word; the words given were "Jew", "Black", "Women", and "Holocaust".

A paper by Abubakar Abid details GPT-3's inherent bias against Muslims specifically. He found a strong association between the word "Muslim" and GPT-3 generating text about violent acts. Adding adjectives directly opposite to violence did not reduce the amount of generated text about violence, but adding adjectives that redirected the focus did. Abid demonstrates GPT-3 generating text about violence even when prompted with "Two Muslims walked into a mosque to worship peacefully", showing GPT-3's bias against Muslims.

References

  1. Vincent, J. (2019, November 07). OpenAI has published the text-generating AI it said was too dangerous to share. Retrieved February 14, 2021, from www.theverge.com/platform/amp/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters
  2. Heaven, W. (2020, December 10). OpenAI's new language generator GPT-3 is shockingly good – and completely mindless. Retrieved February 14, 2021, from www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/
  3. Radford, A. (2020, September 03). Better language models and their implications. Retrieved February 14, 2021, from openai.com/blog/better-language-models/
  4. Carlini, N. (2020, December 15). Privacy considerations in large language models. Retrieved February 14, 2021, from ai.googleblog.com/2020/12/privacy-considerations-in-large.html
  5. OpenAI. (2020, September 22). OpenAI licenses GPT-3 technology to Microsoft. Retrieved February 14, 2021, from openai.com/blog/openai-licenses-gpt-3-technology-to-microsoft/
  6. Ammu, B. (2020, December 18). GPT-3: All you need to know about the AI language model. Retrieved February 14, 2021, from www.sigmoid.com/blogs/gpt-3-all-you-need-to-know-about-the-ai-language-model/
  7. Abid, A., Farooqi, M., & Zou, J. (2021, January 18). Persistent anti-Muslim bias in large language models. Retrieved February 14, 2021, from arxiv.org/abs/2101.05783

The provenance of a consent
by Mohan Sadashiva | February 19, 2021

What is informed consent?

The first Belmont principle [1] defines informed consent as permission given with full knowledge of the consequences. In the context of data collection on the internet, consent is often obtained by requiring the user to agree to terms of use, a privacy policy, a software license, or a similar instrument. By and large, these terms tend to be abstract and broad so as to cover a wide range of possibilities without much specificity.

Why is it important?

Data controllers (entities that collect and hold the data, typically institutions or private companies) benefit from the collection of such data in a variety of ways, with the end goal being an improved product/service, better insight/knowledge, or the ability to monetize through additional product/service sales. Consumers benefit from better products, customized service, new product/service recommendations, and other possibilities that improve their quality of life.

However, there is a risk that this information is misused or used to the consumer's detriment. One common example is when data controllers sell the data they have collected to third parties, who in turn combine it with other sources and resell it. As you go through this chain of transfers, the original scope of consent is lost: the nature of the data collected has expanded, and the nature of its application has changed. Indeed, the original consent contract is typically not transferred through the chain, so subsequent holders of the data no longer have consent at all.

When new data is combined, the result is much more than additive. For example, two sets of anonymized data, when combined, can result in non-anonymized data with the subject identified. In this case the benefit to the company or institution grows exponentially, but so does the risk to the subject. Even if the subject consented to each set of data being collected, the consent is not valid for the combined set, as the scope and the benefit/risk equation have changed considerably.
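
A toy example of such a combination, often called a linkage attack, using invented records: neither table alone names a patient, but joining on shared quasi-identifiers re-identifies everyone.

```python
# Two individually "anonymized" tables re-identified by a join; data invented.
import pandas as pd

health = pd.DataFrame({          # no names: "anonymized" health data
    "zip": ["94110", "94110", "60601"],
    "birth_year": [1985, 1972, 1990],
    "diagnosis": ["asthma", "diabetes", "migraine"],
})
voters = pd.DataFrame({          # public roll: names but no health data
    "name": ["A. Smith", "B. Jones", "C. Lee"],
    "zip": ["94110", "94110", "60601"],
    "birth_year": [1985, 1972, 1990],
})

# Joining on zip code and birth year attaches a name to every diagnosis.
print(health.merge(voters, on=["zip", "birth_year"]))
```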

What is provenance?

The provenance of consent for data is the original consent agreement made when the data was collected from a subject. If this original consent is codified into a data use contract that travels with the data, it provides a framework for a practical implementation of the Belmont principles of respect for persons, beneficence, and justice.

There is some analogy here with data lineage, which is a statement of the origin of data (provenance) and all the transformations and transfers (lineage) that lead to its current state. What is often ignored is that consent cannot be transformed: it is an agreement between the subject and the data controller that, if changed in any way, requires obtaining fresh consent from the subject.
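
As a thought experiment, such a consent contract could travel with the data in machine-readable form, so that any combination or transfer must confront it. Here is a minimal sketch in Python; every field name is a hypothetical illustration, not an existing standard:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ConsentContract:
    subject_id: str        # who granted consent
    controller: str        # who collected the data
    permitted_uses: tuple  # e.g. ("word-lookup analytics",)
    transferable: bool = False  # by default, consent does not survive resale

@dataclass
class Dataset:
    records: list
    consent: ConsentContract
    lineage: list = field(default_factory=list)  # transformations so far

def combine(a: Dataset, b: Dataset) -> Dataset:
    # Combining datasets changes the scope and the benefit/risk equation,
    # so the original consents no longer cover the result.
    raise PermissionError("combined data requires fresh consent from the subjects")
```

A transfer function would similarly refuse to pass data along unless the contract marked it transferable and the receiving party accepted the same terms.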

Case Study: Dictionary.com

A site that I use quite often is dictionary.com. I decided to look under the hood at its terms of service and privacy policy, and discovered that I had signed up to a very liberal information collection and sharing agreement. The company collects a lot of personal information that it doesn't need for the interaction, which in my case is looking up the definition of a word. The most egregious is collecting my identity (from my mobile device) and my location; the most bizarre is recording the websites I visited before using the service. The company discloses that this information may be shared with other service providers and partners. It disclaims all responsibility thereafter, stating that the information is then governed by the terms of service and privacy policies of the partner. However, there is no reference to who these partners are or how my information will be used by them.

This illustrates how my consent for data collected by dictionary.com is lost upon transfer of my personal data to a partner. After my personal data changes hands several times, there would be no way to trace the original consent, even by well-meaning partners.

Conclusion

An inviolable data use contract, derived from informed consent and associated with the data sets that are collected, is one way to start implementing the Belmont principles. This needs standards and broad-based agreement across industries, as well as laws for enforcement. While this may seem a hopeless pipe dream today, a lot can be achieved when people get organized. Just look at how the music industry embraced digital media and came up with a comprehensive mechanism, Digital Rights Management [2] (DRM), to record and enforce an artist's or producer's rights over the music they created or marketed.

References:

[1] The Belmont Report; Department of Health, Education and Welfare

[2] Digital Rights Management; Wikipedia

How Secure is your Security Camera?
By Shujing Dong | February 19, 2021

Smart home security cameras have become a must-have for many households in recent years. They can live-stream what is going on in or around the home and record video anytime. We feel safer and more protected with home security cameras; however, do we know how secure they are? According to a recent tech news article, "Yes, your security camera could be hacked: Here's how to stop spying eyes", home security cameras can be hacked easily, as in the ADT data breach story. This article dives into the data security of smart security cameras by looking at the privacy policies of three major providers: Ring, Nest, and Wyze.

What personal data do they collect?

All three providers collect account information (including gender/age), device information, location information, and user interaction with the service, as well as video/audio recordings and social media information such as reviews on third-party websites. But they do not justify why gender and age are needed for monitoring and protecting the home. With location and device information, it is possible to illegally track users through targeted aggregation. In addition, Nest collects facial recognition data through its familiar face alerts feature, and does not state whether users of that feature can opt out of sharing facial data.

How are these data used, shared and stored?

All providers use the collected data to improve their devices and services, personalize the user experience, and for promotional or marketing purposes. However, on online tracking, Ring says "Our websites are not designed to respond to 'Do Not Track' signals received from browsers", meaning it tracks users' online activity at will. The other two providers omit any response to "Do Not Track" signals entirely.

They all share data with vendors, service providers, and technicians, as well as affiliates and subsidiaries. If those affiliates or subsidiaries use the data for different business purposes, it poses privacy risks to users. The providers also do not articulate what their data processing looks like or what preventive measures are taken against data breaches or illegal access by employees or vendors.

As for data retention, Nest stores user data until the user requests deletion; Ring stores user recordings under its "Ring Protect Plan" and Neighborhoods Recordings; whereas Wyze stores data only on the SD card in the camera and, for recordings users voluntarily submit, will not keep them longer than three years.

What data security mechanisms do they have?

Ring only vaguely states "We maintain administrative, technical and physical safeguards designed to protect personal information", without specifying what measures or technology it uses for data security. Notably, Ring is known to have fired four employees who abused internal access to customer video feeds. Nest is the only one of the three that specifically says it encrypts data during transmission. While both Wyze and Nest transfer data internationally, Wyze does not mention how it protects data across different jurisdictions, whereas Nest specifies that it adheres to the EU-US "Privacy Shield" framework.

What more can security camera providers do?

A privacy policy shows how much a provider cares about its users. To increase transparency and build users' trust in the service, security camera providers should do more to protect data security and list specific measures in their privacy policies. For example, they can specify data retention periods, as Wyze does. They can implement data encryption during transmission and articulate it in the privacy policy. They can put authorization processes in place so that only authorized employees can access data. Lastly, they can give users more opt-out options to control what data they share.

What can home security camera users do?

We users need to protect our own privacy intentionally as well. First, be aware of our rights and make choices based on our specific use cases. Under the FTC's rules and CalOPPA, we have the right to access our own data and request deletion; for example, we can periodically ask security camera providers to delete our video/audio recordings on their end. Second, we can avoid linking our accounts to social media, to prevent our social network data from being collected. Third, we can anonymize our account information, such as demographic information and device names. We can also set unique passwords for our security devices and change them periodically. And if possible, in private rooms such as bedrooms, use standalone cameras that do not transfer data to cloud servers.

Freedom to Travel or Gateway to Health Surveillance?
The privacy and data concerns of COVID-19 vaccination passports
By Matthew Hui | February 19, 2021

With borders closed and quarantine and testing requirements abounding, travel may hardly be top of mind as the COVID-19 pandemic drags on. As the vaccine rollout continues in the United States, a "vaccine passport" is among the many proposals to facilitate the reopening of travel, potentially giving a boost to a hospitality industry that has been disproportionately battered by COVID-19's spread. Denmark is already in the process of rolling out such digital passports to its citizens, while Hawaii is developing its own to allow travelers to skip quarantine upon proof of vaccination. Both the travel industry and the governments whose economies rely on it have strong incentives for a wide and quick rollout of these passports. Although they may provide increased freedom of movement and travel during a pandemic, the rollout of these digital records must address serious ethical and privacy concerns associated with their implementation and usage in order to improve their chances of success.

How would vaccine passports work?

Vaccination passports essentially act as digital documentation providing proof of vaccination against COVID-19. A person would be able to access this documentation on a smartphone to show as proof. Such passes are currently in development by both government and industry, such as the IATA Travel Pass and IBM's Digital Health Pass. They would also support proof of other activity, such as virus testing results or temperature checks.

Designers will also need to consider access: using a vaccine passport presumes, at minimum, a smartphone and internet connectivity. One way to address this is an NFC-enabled card.

Privacy and Trust

As a credentialing mechanism, a digital vaccine passport system must inherently have methods of storing, sharing, and verifying sensitive health data. Digital vaccine passports will also need to be designed to prevent fraud and disclosure breaches. In both cases, failure to do so will undermine the public trust necessary for widespread adoption. Fraudulent health records and insecurity could easily undermine adoption by organizations using these systems for verification. Disclosure of private or sensitive information could hamper uptake by individuals or prevent continued use due to an unwillingness to share personal health information.
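
One common building block for such a system is a digitally signed credential: an issuer (say, a health authority) signs a minimal claim, and any verifier can check authenticity with the issuer's public key alone. Below is a minimal sketch using Ed25519 signatures from the Python cryptography package; the payload fields are hypothetical and do not follow any real passport format.

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Issuer side: sign a minimal claim (fields are illustrative only).
issuer_key = Ed25519PrivateKey.generate()
credential = json.dumps({"holder": "jane-doe", "claim": "covid19-vaccinated"}).encode()
signature = issuer_key.sign(credential)

# Verifier side: needs only the issuer's public key, not a health database.
issuer_public = issuer_key.public_key()
try:
    issuer_public.verify(signature, credential)
    print("credential verified")
except InvalidSignature:
    print("credential forged or altered")
```

The design point is data minimization: the verifier learns a yes/no claim, not the holder's medical history.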

As entry into countries and access to transportation may be conditioned on having a vaccine passport, we will need to consider what personal and health information is required to obtain one and ensure that it is commensurate with the scope of its use. Potential misuse of this information by governments, beyond the containment of COVID-19, must be considered in the design of a vaccine passport.

Beyond COVID-19 and Travel

While vaccine passports have primarily been discussed in the context of travel, their proliferation could widen their scope to other aspects of life beyond crossing borders, creating concerns around access and equity. It would not be difficult to imagine vaccine passports being used as a condition of access to stadiums, museums, and nightclubs, in addition to trains or airplanes. These entities might start to require different levels of access to health information, perhaps not only vaccination records but lab results or thermal scans. In the context of a pandemic, these requirements may seem reasonable; prior to the COVID-19 pandemic, being required to show proof of a flu vaccine to enter a stadium during flu season could easily have felt intrusive.

In these expanded use cases, society must consider which aspects of the public sphere should or should not be conditioned on having a vaccine passport, and how much health information should be shared to gain that access. Should access to jobs that interact with the public be subject to these conditions? With unequal access to the vaccine, and to healthcare more generally, will inability to obtain vaccination be a mitigating factor for those subject to these conditions? Governments will need a framework that defines the scope of usage for these vaccine passports and what that entails for their continued usage outside the context of a pandemic. This will be important to prevent organizations from encroaching on access to the public sphere by requiring personal health data, and to minimize discrimination and harm.

The DNA Race
Graham Schweer | February 5, 2021

In the news
The January 31, 2021 episode of 60 Minutes sounded alarm bells about a foreign country's interest in collecting Americans' health data through offers to help with the United States' response to the COVID-19 pandemic.

On November 30, 2020, Newsweek published an article stating that a certain foreign country would begin requiring digital, genetic test-driven health certificates from airline passengers before boarding flights into that country.

A December 24, 2020 article published by Roll Call described a section of the omnibus spending package passed by the U.S. Congress earlier that week which required the Government Accountability Office to investigate consumer genetic tests from companies such as 23andMe and Ancestry.com and their connections to a foreign country.

What do all of these recent stories have in common?

China is the foreign country that is the subject of concern, and the stories highlight China’s ambition to build a global DNA database.

China’s DNA database is estimated to have samples from 140 million people. The United States has the next largest database with 14 million profiles.

DNA Databases are not new

Based on information from Interpol published in 2014 by the DNA policy initiative, 70 countries have created DNA databases. The United Kingdom built the first DNA database in 1995. The United States followed shortly thereafter, and China’s DNA data gathering began in the early 2000s. In all cases, DNA databases were created for criminal investigation purposes.

DNA database proponents argue their merit with respect to solving crimes. Many high-profile cold cases have been solved using DNA databases; one recent example is the capture of the Golden State Killer, whose identity was uncovered by matching DNA evidence with information stored in a private company's DNA database. In fact, China expanded its domestic DNA collection program after using DNA data from a 2016 arrest to solve a murder from 2005.

Made in China 2025

China's current efforts to build a DNA database appear to extend beyond criminal investigation use cases, as well as beyond China's borders. Under the "Made in China 2025" strategic plan announced in 2015, China stated its intention to become a major player in the global biotechnology sector. One example of the strategy at work is the acquisition of a US genetic sequencing company by a Chinese government-funded company, Beijing Genomics Institute (BGI), which provided BGI with a database of Americans' DNA.

U.S. officials interviewed in the 60 Minutes episode referenced at the beginning of this post believe that China's appetite to grow its DNA database is tied to its aspiration to become the world's leader in genetic innovation, disease treatments, and vaccines. The officials contend that China sees the expansion of its DNA database as directly improving its statistical chances of genetic breakthroughs.

Ethical considerations
Although viewed with skepticism from a vantage point inside the United States, China's intentions in building a global DNA database may not be malevolent. However, its current approach is opaque, and the scales are tipped significantly in China's favor.

China should take the following steps to increase transparency into its DNA database:

  • Clarify what data has already been collected and is stored in the DNA database;
  • Inform people that their DNA data has been (or will be) added to the DNA database;
  • Give people the option to remove (or not contribute) their DNA;
  • Do not seek consent from people who are under duress while seeking testing or treatment for a health care issue;
  • Establish safeguards to ensure DNA data is used only for health improvement-related purposes and not used to harm people in other ways; and
  • Eliminate restrictions that permit only a one-way flow of DNA data, and share DNA database records with other countries and healthcare institutions.

Without the steps recommended above, skepticism of China's intentions for its global DNA database program will only intensify, forcing other countries, including the United States, to join the DNA race.

References:
1. Wertheim, J. (2021, January 31). China's push to control Americans' health care future. Retrieved February 05, 2021, from www.cbsnews.com/news/biodata-dna-china-collection-60-minutes-2021-01-31/
2. Chang, G. (2020, November 30). China wants your DNA, and it's up to no good | Opinion. Retrieved February 05, 2021, from www.newsweek.com/china-wants-your-dna-its-no-good-opinion-1550998
3. Ratnam, G. (2020, December 24). Hey, soldiers and spies – think twice about that home genetic ancestry test. Retrieved February 05, 2021, from www.rollcall.com/2020/12/24/hey-soldiers-and-spies-think-twice-about-that-home-genetic-ancestry-test/
4. Benson, T. (2020, June 30). DNA databases in the U.S. and China are tools of racial oppression. Retrieved February 05, 2021, from spectrum.ieee.org/tech-talk/biomedical/ethics/dna-databases-in-china-and-the-us-are-tools-of-racial-oppression
5. Global summary. (n.d.). Retrieved February 05, 2021, from dnapolicyinitiative.org/wiki/index.php?title=Global_summary#:~:text=According%20to%20Interpol%2C%20seventy%20countries,9%25%20of%20the%20population
6. St. John, P. (2020, December 08). The untold story of how the Golden State Killer was found: A covert operation and private DNA. Retrieved February 05, 2021, from www.latimes.com/california/story/2020-12-08/man-in-the-window
7. Wee, S. (2020, June 17). China is collecting DNA from tens of millions of men and boys, using U.S. equipment. Retrieved February 05, 2021, from www.nytimes.com/2020/06/17/world/asia/China-DNA-surveillance.html
8. Atkinson, R. (2019, August 12). China's biopharmaceutical strategy: Challenge or complement to U.S. industry competitiveness? Retrieved February 05, 2021, from itif.org/publications/2019/08/12/chinas-biopharmaceutical-strategy-challenge-or-complement-us-industry
9. PRI Staff. (2019, March 15). Does China have your DNA? Retrieved February 05, 2021, from www.pop.org/does-china-have-your-dna-2/

Freedom of Speech vs Sedition
Gajarajan Nagarajan | January 29, 2021


2021 storming of the United States Capitol

Ideas that offend are becoming more prominent amid divisive and hateful rhetoric cultivated by major political parties, their associated news channels, and ever-growing, unmonitored social media platforms. As the US reels from the recent storming of the US Capitol, passionate debates have begun across the country about who the enforcers can be. Freedom of speech does have its limits with respect to threats, racism, hostility, and violence, including acts of sedition. Hate crime laws are constitutional so long as they punish violence or vandalism.

The US First Amendment protects nearly all types of speech, and hence hate speech gets amplified in the new digital era, where millions of followers can be induced or swayed by propaganda. Under the First Amendment, there is no such thing as a false idea. However pernicious an opinion may seem, we depend for its correction not on the conscience of judges and juries but on the competition of other ideas.

Weaponization of Social Media

The January 6th events at the US Capitol triggered an important change across all major social media companies and their primary cloud infrastructure providers: Twitter, Facebook, YouTube, Amazon, Apple, and Google banned President Trump and scores of his supporters from their platforms for inciting violence. How big will this challenge remain going forward? Aren't these companies the original enablers and accelerators, with no effective controls for violence prevention? Should large media companies take the law into their own hands (or onto their platforms) while state and federal governments pause on moderation? Or does this demand action by society, since we the people are the creators and consumers of the pervasive, polarizing conspiracy content in American society?

Private companies have shown themselves able to act far more nimbly than our government, imposing consequences on a would-be tyrant who has until now enjoyed a corrosive degree of impunity. But in doing so, these companies have also shown a power that goes beyond that of many nation-states, without democratic accountability. Technology companies have employed AI/ML and NLP tools to generate more visitors and longer engagement on their platforms, which has been a breeding ground for hate groups. The negative aspects of this unilateral power can set a precedent to be exploited by the enemies of freedom of speech around the world: dictators, authoritarian regimes, and those in power can do extreme harm to democracy by colluding with or forcing technology companies to bend the rules for their political gain.

In a democratic government, public opinion impacts everything, so it is all-important that truth should be the basis of public information. If public opinion is ill-formed, poisoned by lies, deception, misrepresentations, or mistakes, the consequences could be dire. Government, which is the preservative of the general happiness and safety, cannot be secure if falsehood and malice are allowed to rob it of the confidence and trust of the people.

Looking back into history, combined with data science, may provide some options to protect the future of our democracy.

  • The Sedition Act of 1918 covered a broad range of offenses, notably speech and expression of opinion that cast the government or the war effort in a negative light. In 2007, a bill named the "Violent Radicalization and Homegrown Terrorism Prevention Act" was sponsored by Representative Jane Harman (Democrat of California). The bill would have amended the Homeland Security Act to add provisions to prevent and control homegrown terrorism, and to establish a grant program to prevent radicalization. Congress could revisit this bill with bipartisan support.
  • Section 3 of the 14th Amendment prohibits current or former military officers, along with current and former federal and state public officials, from serving in a variety of government offices if they "shall have engaged in insurrection or rebellion" against the United States Constitution.
  • Social media bans are a key defense mechanism and need to be nurtured, enhanced, and implemented across all nations, democratic and otherwise. The ability to drive conversation, the reach to wider audiences for recruitment, and perhaps most importantly the monetization of anger and distrust by conflict entrepreneurs are effectively neutralized by strong enforcement of social media bans.
  • Consumer influence on large companies has a major role in regulating nefarious online media houses. For example, de-platforming pressure to turn off cloud and app store access to Parler (a competitor to Twitter), pressure on publishing houses to block book proposals, and FCC regulation of podcasts may provide manageable checks on both extreme left-wing and right-wing fanaticism and fear-mongering.

Photo credits:

www.latimes.com/world-nation/story/2021-01-15/capitol-riot-police-veterans-extremists
www.amazon.com/LikeWar-Weaponization-P-W-Singer/dp/1328695743