February 2021 – Data Science W231 | Behind the Data: Humans and Values

February 24, 2021

Maintaining Privacy in a Smart Home

by Robert Hosbach | February 19, 2021

Source: https://futureiot.tech/more-smart-homes-will-come-online-in-2020/

Privacy is a hot topic. It is a concept that many of us feel we have a right to (at least in democratic societies) and is understandably something that we want to protect. What’s mine is mine, after all. But how do you maintain privacy when you surround yourself with Internet-connected devices? In this post, I will briefly discuss what has come to be known as the “privacy paradox,” how smart home devices pose a challenge to privacy, and what we as consumers can do to maintain our privacy while at home.

The “Privacy Paradox”

Source: https://betanews.com/2018/10/08/privacy-paradox/

In 2001, Hewlett Packard published a study [1] about online shopping in which one of the conclusions was that participants claimed to care a great deal about privacy, and yet their actions did not support this claim. The report dubbed this a “privacy paradox,” and this paradox has been studied numerous times over the past two decades with similar results. As one literature review [2] stated, “While users claim to be very concerned about their privacy, they nevertheless undertake very little to protect their personal data.” Myriad potential reasons exist for this. For instance, many consumers gloss over or ignore the fine print by habit; the design and convenience of using the product outweigh perceived harms from using the product; consumers implicitly trust the manufacturer to “do the right thing”; and some consumers remain blissfully ignorant of the extent to which companies use, share, and sell data. Whatever the underlying causes, the Pew Research Center published survey results in 2019 indicating that upwards of 60% of U.S. adults reported that they feel “very” or “somewhat” concerned about how companies and the government use their personal data [3]; yet, only about 1 in 5 Americans typically read the privacy policies they agree to [4]. And these privacy policies are precisely the documents that consumers should read if they are concerned about their privacy. At a high level, a privacy policy should contain information about what data the company or product collects, how those data are stored and processed, whether and how data can be shared with third parties, and how the data are secured, among other things.

Even if you have never heard the term “privacy paradox” before, you can likely think of examples of the paradox in practice. For instance, you might think about Facebook’s Cambridge Analytica debacle, along with the other lower-profile data privacy issues Facebook has had over the years. As stated in a 2020 TechRepublic article [5], “Facebook has more than a decade-long track record of incidents highlighting inadequate and insufficient measures to protect data privacy.” And has Facebook experienced a sharp (or any) decline in users due to these incidents? No. (Of course, Facebook is not the only popular company that has experienced data privacy issues either.) Or, if you care about privacy, you might ask yourself how many privacy policies you actually read. Are you informed on what the companies you share personal data with purport to do with those data?

Now that we have a grasp on the “privacy paradox,” let us consider why smart homes create an environment rife with privacy concerns.

Smart Homes and Privacy

Source: https://internetofbusiness.com/alexa-beware-many-smart-home-devices-vulnerable-says-report/

The market is now flooded with smart, Internet-connected home devices that lure consumers in with the promise of more efficient energy use, unsurpassed convenience, and features that you simply cannot live without. Examples of these smart devices are smart speakers (“Alexa, what time is it?”), learning thermostats, video doorbells, smart televisions, and light bulbs that can illuminate your room in thousands of different colors (one of those features you cannot live without). But the list goes on. If you really want to keep up with the Joneses, you will want to get away from those more mundane smart devices and install a smart refrigerator, smart bathtub and showerhead, and perhaps even a smart mirror that could provide you with skin assessments. Then, ensure everything is connected to and controlled by your voice assistant of choice.

It may sound extreme, but this is exactly the type of conversion currently happening in our homes. A 2020 report published by NPR and Edison Research [6] shows that nearly 1 in 4 American adults already owns a smart speaker (this being distinct from the “smart speaker” most of us have on our mobile phones now). And all indicators point to increased adoption going forward. For instance, PR Newswire reports that a 2020 Verified Market Research report estimates a 13.5% compound annual growth rate from 2019-2027 for the smart home market [7]. Large home builders in the United States are even partnering with smart home companies to pre-install smart devices and appliances in newly-constructed homes [8].
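To make the growth figure concrete, here is a minimal Python sketch of the compound annual growth arithmetic. The 13.52% rate and the $207.88 billion figure for 2027 come from reference [7]; treating 2019 to 2027 as eight compounding periods is an assumption, and the function names are purely illustrative.

```python
def project(value: float, rate: float, years: int) -> float:
    """Grow `value` at a constant annual `rate` for `years` years."""
    return value * (1.0 + rate) ** years


def implied_base(final_value: float, rate: float, years: int) -> float:
    """Back out the starting value that reaches `final_value` at `rate`."""
    return final_value / (1.0 + rate) ** years


CAGR = 0.1352          # 13.52% per year, from reference [7]
MARKET_2027 = 207.88   # billions of US dollars, from reference [7]
YEARS = 2027 - 2019    # assumption: eight compounding periods

# Starting market size implied by the two reported numbers.
base_2019 = implied_base(MARKET_2027, CAGR, YEARS)

# How many times larger the market is at the end of the period.
growth_multiple = (1.0 + CAGR) ** YEARS

if __name__ == "__main__":
    print(f"Implied 2019 market size: ${base_2019:.1f}B")
    print(f"Growth over {YEARS} years: {growth_multiple:.2f}x")
```

Under these assumptions, a 13.52% annual rate compounds to a market well over twice its starting size by 2027.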

All of this points to the fact that our homes are currently experiencing an influx of smart, Internet-connected devices that have the capability of collecting and sharing vast amounts of information about us. In most cases, the data collected by a smart device are used to improve the device itself and the services offered by the company. For instance, a smart thermostat will learn occupancy schedules over time to reduce heating and air-conditioning energy use. Companies also commonly use data for ad-targeting purposes. For many of us, this is not a deal-breaker. But what happens if a data breach occurs and people with malicious intent gain access to the data streams flowing from your home, or the data are made publicly available? Private information such as occupancy schedules, what TV shows you stream, your Google or Amazon search history, and even what time of the evening you take a bath is potentially up for grabs. Information that was once very difficult to obtain about any individual is now stored on cloud servers, and you are implicitly trusting the manufacturers of the smart devices you own to protect your data.

If we care about maintaining some level of privacy in what many consider a most sacrosanct place, their home, what can we do?

Recommendations for Controlling Privacy in Your Home

Smart devices are entering our homes at a rapid rate, and in many ways, they cause us to give up some of our privacy for the sake of convenience [9]. Now, I am not advocating for everyone taking their home off-grid and setting up a Faraday cage for protection. Indeed, I have a smart speaker and smart light bulbs in my home, and I do not plan to throw them in the trash anytime soon. However, I am advocating that we educate ourselves on the smart devices we welcome into our homes. Here are a few ways to do this:

Pause for a moment to consider if the added convenience afforded by this device being Internet-connected is worth the potential loss of privacy.
Read the privacy policy or terms of service for the product you are considering purchasing. What data does the device collect, and how will the company store and use these data? Will third parties have access to your data? If so, for what purposes? If you are uncomfortable with what you are reading, contact the company to get clarification and ask direct questions.
Research the company that manufactures the device. Do they have a history of privacy issues? Where is the company located? Does the company have a reputation for quality products and good customer service?
Inspect the default settings on the device and in its web and smartphone applications to ensure you are not agreeing to give up more of your personal data than you would like to.

Taking these steps will not eliminate all privacy issues, but at least you will be better informed about the devices you are allowing into your home and how those devices use the data they collect.

References

[1] Brown, B. (2001). Studying the Internet Experience (HPL-2001-49). Hewlett Packard. https://www.hpl.hp.com/techreports/2001/HPL-2001-49.pdf

[2] Barth, S., & de Jong, M. D. T. (2017). The privacy paradox – Investigating discrepancies between expressed privacy concerns and actual online behavior – A systematic literature review. Telematics and Informatics, 34(7), 1038–1058. https://doi.org/10.1016/j.tele.2017.04.013

[3] Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2019, November 15). 2. Americans concerned, feel lack of control over personal data collected by both companies and the government. Pew Research Center: Internet, Science & Tech. https://www.pewresearch.org/internet/2019/11/15/americans-concerned-feel-lack-of-control-over-personal-data-collected-by-both-companies-and-the-government/

[4] Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2019, November 15). 4. Americans’ attitudes and experiences with privacy policies and laws. Pew Research Center: Internet, Science & Tech. https://www.pewresearch.org/internet/2019/11/15/americans-concerned-feel-lack-of-control-over-personal-data-collected-by-both-companies-and-the-government/

[5] Patterson, D. (2020, July 30). Facebook data privacy scandal: A cheat sheet. TechRepublic. https://www.techrepublic.com/article/facebook-data-privacy-scandal-a-cheat-sheet/

[6] NPR & Edison Research. (2020). The Smart Audio Report. https://www.nationalpublicmedia.com/uploads/2020/04/The-Smart-Audio-Report_Spring-2020.pdf

[7] Verified Market Research. (2020, November 3). Smart Home Market Worth $207.88 Billion, Globally, by 2027 at 13.52% CAGR: Verified Market Research. PR Newswire. https://www.prnewswire.com/news-releases/smart-home-market-worth–207-88-billion-globally-by-2027-at-13-52-cagr-verified-market-research-301165666.html

[8] Bousquin, J. (2019, January 7). For Many Builders, Smart Homes Now Come Standard. Builder. https://www.builderonline.com/design/technology/for-many-builders-smart-homes-now-come-standard_o

[9] Rao, S. (2018, September 12). In today’s homes, consumers are willing to sacrifice privacy for convenience. Washington Post. https://www.washingtonpost.com/lifestyle/style/in-todays-homes-consumers-are-willing-to-sacrifice-privacy-for-convenience/2018/09/11/5f951b4a-a241-11e8-93e3-24d1703d2a7a_story.html

February 24, 2021

Bias in Large Language Models: GPT-2 as a Case Study

By Kevin Ngo | February 19, 2021

Imagine having a multi-paragraph story written in a few minutes. Imagine getting a full article by providing only the title. Imagine getting a whole essay by providing only the first sentence. This is possible by harnessing large language models, which are trained on large amounts of public text to predict the next word.

GPT-2

I used a demo of a well-known language model called GPT-2, released in February 2019, to demonstrate large language models’ ability to generate text. I typed “While large language models have greatly improved in recent years, there is still much work to be done concerning its inherent bias and prejudice”, and allowed GPT-2 to generate the rest of the text. Here is what GPT-2 came up with: “The troubling thing about this bias and prejudice is that it is systemic, not caused by chance. These biases can influence a classifier’s behavior, and they are especially likely to impact people of color.” While the results were not perfect, it can be hard to differentiate the generated text from the non-generated text. GPT-2 correctly states that the bias and prejudice inside the model are “systemic” and “likely to impact people of color.” While they may mimic intelligence, language models do not understand the text.

Image: Result of GPT-3 for a Turing test

Controversial Release of GPT-2

OpenAI, the creator of GPT-2, was hesitant to release the model at first, fearing “malicious applications” of GPT-2. It decided to release smaller versions of GPT-2 so that other researchers could experiment with them and help mitigate potential harm caused by its work. After seeing “no strong evidence of misuse”, OpenAI released the full model, noting that GPT-2 could be abused to help generate “synthetic propaganda.” It could also be used to release a high volume of coherent spam online. Although OpenAI’s efforts to mitigate public harm are commendable, some experts condemned OpenAI’s decision. They argued that OpenAI prevented other people from replicating its breakthrough, holding back the advancement of natural language processing. Others claimed that OpenAI exaggerated the dangers of GPT-2.

Issues with Large Language Models

The reality is that GPT-2 poses more potential dangers than OpenAI assumed. A joint study by Google, Apple, Stanford University, OpenAI, the University of California, Berkeley, and Northeastern University revealed that GPT-2 could leak details from the data the model was trained on, which could contain sensitive information. The results showed that over a third of candidate sequences came directly from the training data, some containing personally identifiable information. This raises major privacy concerns regarding large language models. The beta version of the GPT-3 model was released by OpenAI in June 2020. GPT-3 is larger and provides better results than GPT-2. A Senior Data Scientist at Sigmoid mentioned that in one of his experiments only 50% of the fake news generated by GPT-3 could be distinguished from real news, showing how powerful GPT-3 can be.

Despite the impressive results, GPT-3 still has inherent bias and prejudice, making it prone to generate “hateful sexist and racist language,” according to Kate Devlin. Jerome Pesenti demonstrated this by making GPT-3 generate text from a single word. The words given were “Jew”, “Black”, “Women”, and “Holocaust”.

A paper by Abubakar Abid details the inherent bias against Muslims specifically. He found a strong association between the word “Muslim” and GPT-3 generating text about violent acts. Adding adjectives directly opposite to violence did not help reduce the amount of generated text about violence, but adding adjectives that redirected the focus did. Abid demonstrates GPT-3 generating text about violence when prompted with “Two Muslims walked into a mosque to worship peacefully”, showing GPT-3’s bias against Muslims.

References

Vincent, J. (2019, November 7). OpenAI has published the text-generating AI it said was too dangerous to share. Retrieved February 14, 2021, from https://www.theverge.com/platform/amp/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters
Heaven, W. (2020, December 10). OpenAI’s new language generator GPT-3 is shockingly good, and completely mindless. Retrieved February 14, 2021, from https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/
Radford, A. (2020, September 3). Better language models and their implications. Retrieved February 14, 2021, from https://openai.com/blog/better-language-models/
Carlini, N. (2020, December 15). Privacy considerations in large language models. Retrieved February 14, 2021, from https://ai.googleblog.com/2020/12/privacy-considerations-in-large.html
OpenAI. (2020, September 22). OpenAI licenses GPT-3 technology to Microsoft. Retrieved February 14, 2021, from https://openai.com/blog/openai-licenses-gpt-3-technology-to-microsoft/
Ammu, B. (2020, December 18). GPT-3: All you need to know about the AI language model. Retrieved February 14, 2021, from https://www.sigmoid.com/blogs/gpt-3-all-you-need-to-know-about-the-ai-language-model/
Abid, A., Farooqi, M., & Zou, J. (2021, January 18). Persistent anti-Muslim bias in large language models. Retrieved February 14, 2021, from https://arxiv.org/abs/2101.05783

February 24, 2021

The provenance of a consent

by Mohan Sadashiva | February 19, 2021

What is informed consent?

The first Belmont principle[1] defines informed consent as permission given with full knowledge of the consequences. In the context of data collection on the internet, consent is often obtained by requiring the user to agree to terms of use, privacy policy, software license or a similar instrument. By and large, these terms of use tend to be abstract and broad so as to cover a wide range of possibilities without much specificity.

Why is it important?

Data controllers (entities that collect and hold the data, typically institutions or private companies) benefit from the collection of such data in a variety of ways with the end goal being improved product/service, better insight/knowledge, or ability to monetize through additional product/service sales. Consumers benefit from better products, customized service, new product/service recommendations and other possibilities that improve their quality of life.

However, there is also a risk that this information is misused or used to consumers’ detriment. One common example is when data controllers sell the data they have collected to third parties, who in turn combine this information with other sources and resell it. As the data moves through this chain of transfers, the original scope of consent is lost because the nature of the data collected has expanded and the way the data is applied has changed. Indeed, even the original consent contract is typically not transferred through the chain, so subsequent holders of the data no longer have consent.

As new data is combined, the result is much more than additive. For example, two sets of anonymized data, when combined, can yield non-anonymized data in which the subject is identified. In this case the benefit to the company or institution grows sharply, but so does the risk to the subject. Even if the subject consented to each set of data being collected, the consent is not valid for the combined set, as the scope and the benefit/risk equation have changed considerably.
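The re-identification risk described above can be illustrated with a minimal Python sketch. All records, names, and field choices here are hypothetical: one data set has names removed but keeps quasi-identifiers (ZIP code, birth date, sex), and a second data set carries the same fields alongside names. Joining the two on those fields is one simple way such linkage could happen, not a description of any particular company's practice.

```python
from typing import Dict, List, Tuple

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
health_records: List[Dict[str, str]] = [
    {"zip": "94704", "birth_date": "1985-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "94110", "birth_date": "1990-07-02", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical second data set with names and the same quasi-identifiers.
marketing_list: List[Dict[str, str]] = [
    {"name": "Alice Example", "zip": "94704", "birth_date": "1985-03-14", "sex": "F"},
    {"name": "Bob Example", "zip": "94607", "birth_date": "1979-11-30", "sex": "M"},
]

QUASI_IDENTIFIERS: Tuple[str, ...] = ("zip", "birth_date", "sex")


def link_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous: List[Dict[str, str]],
         identified: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return anonymous records whose key matches exactly one named person."""
    names_by_key: Dict[Tuple[str, ...], List[str]] = {}
    for person in identified:
        names_by_key.setdefault(link_key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(link_key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the subject
            matches.append({"name": candidates[0], **record})
    return matches


reidentified = link(health_records, marketing_list)

if __name__ == "__main__":
    for row in reidentified:
        print(row["name"], "->", row["diagnosis"])
```

Neither data set links a name to a diagnosis on its own; only the combination does, which is why consent given for each set separately does not cover the merged result.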

What is provenance?

The provenance of consent for data is the original consent agreement when data was collected from a subject. If this original consent was codified into a data use contract that is passed with the data, then it provides a framework for a practical implementation of the Belmont principles of respect for persons, beneficence and justice.

There is some analogy here with data lineage, which is a statement of the origin of data (provenance) and all the transformations and transfers (lineage) that lead to its current state. What is often ignored is the notion that consent cannot be transformed: it is an agreement between the subject and the data controller, and changing it in any way would require obtaining another consent from the subject.
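One way to picture a data use contract that travels with the data is the following Python sketch. It is an illustration under stated assumptions, not an existing standard: the class names, fields, and the `transfer` rule are all hypothetical. The consent terms are immutable, a hash of the terms is recorded at collection time, and every transfer is checked against the original purposes while the list of holders grows.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ConsentContract:
    """Original consent terms; frozen so they cannot be altered in place."""
    subject_id: str
    collected_by: str
    allowed_purposes: Tuple[str, ...]
    transfer_allowed: bool

    def fingerprint(self) -> str:
        """Stable hash of the terms, so a changed contract is detectable."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DataPackage:
    """Data bundled with its consent (provenance) and holders (lineage)."""
    payload: Dict[str, Any]
    consent: ConsentContract
    consent_fingerprint: str
    holders: Tuple[str, ...]


def collect(payload: Dict[str, Any], consent: ConsentContract) -> DataPackage:
    """Create the first package at the point of collection."""
    return DataPackage(payload, consent, consent.fingerprint(),
                       (consent.collected_by,))


def transfer(package: DataPackage, recipient: str, purpose: str) -> DataPackage:
    """Hand the data on only if the original consent still covers the use."""
    if package.consent.fingerprint() != package.consent_fingerprint:
        raise PermissionError("consent terms changed; new consent is required")
    if not package.consent.transfer_allowed:
        raise PermissionError("subject did not consent to any transfer")
    if purpose not in package.consent.allowed_purposes:
        raise PermissionError(f"purpose {purpose!r} is outside the original consent")
    return DataPackage(package.payload, package.consent,
                       package.consent_fingerprint,
                       package.holders + (recipient,))


# Hypothetical example: a lookup site shares a query with one partner.
consent = ConsentContract("user-123", "lookup-site", ("word_lookup",), True)
package = collect({"query": "provenance"}, consent)
shared = transfer(package, "partner-a", "word_lookup")
```

A fingerprint alone does not stop a dishonest holder from swapping in new terms and recomputing the hash; an enforceable version would presumably need signatures from the subject or a trusted registry, which is where standards and law come in.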

Case Study: Dictionary.com

A site that I use quite often is dictionary.com. I decided to take a look under the hood to examine the terms of service and privacy policy and discovered that I had signed up to a very liberal information collection and sharing agreement. The company collects a lot of personal information about me that it doesn’t need for the interaction, which in my case is to look up the definition of a word. The most egregious is collecting my identity (from my mobile device) and my location. The most bizarre is recording the websites I visited prior to using the service. The company discloses that this information could be shared with other service providers and partners. It disclaims all responsibility thereafter and states that the information is then governed by the terms of service and privacy policies of the partner. However, there is no mention of who these partners are or how my information will be used by them.

This illustrates how my consent for data collected by dictionary.com is lost when my personal data is transferred to a partner. After my personal data changes hands several times, there would be no way to trace the original consent, even for well-meaning partners.

Conclusion

An inviolable data use contract, derived from informed consent and associated with the data sets that are collected, is one way to start implementing the Belmont principles. This needs standards and broad-based agreement across industries, as well as laws for enforcement. While this may seem a hopeless pipe dream today, a lot can be achieved when people get organized. Just look at how the music industry embraced digital media and came up with a comprehensive mechanism for Digital Rights Management[2] (DRM) to record and enforce an artist’s or producer’s rights over the music they created or marketed.

References:

[1] The Belmont Report; Department of Health, Education and Welfare

[2] Digital Rights Management; Wikipedia

February 24, 2021

How Secure is your Security Camera?

By Shujing Dong | February 19, 2021

Smart home security cameras have become must-haves for most households in recent years. They can live stream what is going on in or around the home and record video at any time. We feel safer and more protected with home security cameras, but do we know how secure they are? According to a recent tech news article, “Yes, your security camera could be hacked: Here’s how to stop spying eyes”, home security cameras can be easily hacked, as in the ADT data breach story. This article examines the data security of smart security cameras by looking at the privacy policies of three major security camera service providers: Ring, Nest, and Wyze.

What personal data do they collect?

All three providers collect account information (including gender and age), device information, location information, and user interaction with the service, as well as video and audio recordings and social media information such as reviews on third-party websites. But they do not justify why gender and age are needed for monitoring and protecting the home. With location and device information, it is possible to illegally track users through targeted aggregation. In addition, Nest collects facial recognition data through its familiar face alerts feature, and it does not state whether users of that feature can opt out of sharing facial data.

How are these data used, shared and stored?

All providers use the collected data to improve their devices and services, to personalize the user experience, and for promotional or marketing purposes. However, on online tracking, Ring says “Our websites are not designed to respond to ‘Do Not Track’ signals received from browsers”, meaning it tracks users’ online activity at will. The other two providers do not say how they respond to “Do Not Track” signals at all.

They all share data with vendors, service providers, and technicians, as well as affiliates and subsidiaries. However, if their affiliates or subsidiaries use the data for different business purposes, this poses privacy risks to users. They also do not articulate what the data processing looks like or what preventive measures are taken against data breaches or illegal access by employees or vendors.

As for data retention, Nest stores user data until the user requests deletion; Ring stores user recordings with “Ring Protected Plan” and Neighborhoods Recordings; and Wyze stores data only on the SD card in the camera, keeping any recordings that users voluntarily submit to Wyze for no longer than three years.

What data security mechanisms do they have?

Ring only vaguely states “We maintain administrative, technical and physical safeguards designed to protect personal information”, without specifying what measures or technology it uses for data security. Moreover, Ring is known to have fired four employees for abusing internal access to customer video feeds. Nest is the only one of the three that specifically points out that it encrypts data during transmission. While both Wyze and Nest transfer data internationally, Wyze does not mention how it protects data across different jurisdictions, whereas Nest specifies that it adheres to the EU-US “Privacy Shield” policy.

What more can security camera providers do?

A privacy policy shows how much a provider cares about its users. To increase transparency and build users’ trust in the service, security camera providers should do more to protect data security and list specific measures in their privacy policies. For example, they can specify the data retention period, as Wyze does. They can also encrypt data during transmission and say so in the privacy policy. In addition, they can put authorization processes in place so that only authorized employees can access data. Lastly, they can give users more opt-out options to control what data they share.

What can home security camera users do?

We users need to intentionally protect our own privacy as well. First, we should be aware of our rights and make choices based on our specific use cases. According to the FTC and CalOPPA, we have the right to access our own data and request its deletion. For example, we can periodically ask security camera service providers to delete our video and audio recordings on their end. Second, we can avoid linking our accounts to social media to prevent our social network data from being collected. Third, we can anonymize our account information, such as demographic information and device names. We can also set unique passwords for our security devices and change them periodically. If possible, use standalone cameras that do not transfer data to cloud servers in private rooms such as bedrooms.
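The advice to use a unique password for each device can be sketched with Python’s standard `secrets` module, which is designed for security-sensitive random values. The device names, the 20-character default, and the 12-character minimum below are assumptions chosen for illustration, not requirements from any of the three providers.

```python
import secrets
import string

# Characters to draw from: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password built with a cryptographically strong source."""
    if length < 12:  # assumed minimum for this sketch
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Hypothetical device names; each one gets its own password.
DEVICES = ["front-door-camera", "backyard-camera", "garage-camera"]
passwords = {device: generate_password() for device in DEVICES}

if __name__ == "__main__":
    for device in DEVICES:
        print(f"{device}: generated a {len(passwords[device])}-character password")
```

In practice the generated values would go straight into a password manager rather than being printed or stored in a script.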

February 23, 2021

Freedom to Travel or Gateway to Health Surveillance? The privacy and data concerns of COVID-19 vaccination passports

By Matthew Hui | February 19, 2021

With borders closed and quarantine and testing requirements abounding, travel may hardly be top of mind as the COVID-19 pandemic drags on. As the vaccine rollout continues in the United States, a “vaccine passport” is among the many ways to facilitate the reopening of travel and potentially give a boost to a hospitality industry that has been disproportionately battered by COVID-19’s spread. Denmark is already in the process of rolling out such digital passports to its citizens, while Hawaii is currently developing its own to allow travelers to skip quarantine upon proof of vaccination. Both the travel industry and the governments whose economies rely on it have strong incentives for a wide and quick rollout of these passports. Although they may provide increased freedom of movement and travel during a pandemic, the rollout of these digital records must address serious ethical and privacy concerns associated with their implementation and usage in order to improve the chances of success.

How would vaccine passports work?

Vaccination passports essentially act as digital documentation that provides proof of vaccination against COVID-19. A person would be able to access this documentation on a smartphone to show as proof. These are currently in development by both government and industry; examples include the IATA Travel Pass and IBM’s Digital Health Pass. They would also support proof of other activity, such as virus test results or temperature checks.

Because using a vaccine passport requires, at a minimum, a smartphone and internet access, its design will need to account for people who lack them. One way to address this is an NFC-enabled card.

Privacy and Trust

As a credentialing mechanism, a digital vaccine passport system must inherently have methods of storing, sharing, and verifying sensitive health data. Digital vaccine passports will also need to be designed to prevent fraud and disclosure breaches. In both cases, failure to do so will undermine public trust necessary for widespread adoption. Fraudulent health records and insecurity could easily undermine adoption by organizations using these systems for verification. Disclosure of private or sensitive information could hamper uptake by individuals or prevent continued use due to an unwillingness to share personal health information.
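The verification and fraud-prevention requirement can be sketched in a few lines of Python. This is a simplified illustration, not how the IATA or IBM systems work: it uses a shared-secret HMAC from the standard library as a stand-in, whereas a real deployment would more likely use public-key signatures so that verifiers never hold the issuer’s secret. The key, field names, and values are all hypothetical, and the credential deliberately carries only the minimum claims.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

# Hypothetical issuer secret; a real system would protect and rotate this.
ISSUER_KEY = b"demo-key-not-for-production"


def _canonical(claims: Dict[str, str]) -> bytes:
    """Serialize claims deterministically so the same claims sign the same way."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue(claims: Dict[str, str], key: bytes = ISSUER_KEY) -> Dict[str, Any]:
    """Attach an HMAC-SHA256 tag to the claims."""
    tag = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": tag}


def verify(credential: Dict[str, Any], key: bytes = ISSUER_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(credential["claims"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, credential["signature"])


# Minimal claims: no name, birth date, or medical history in the credential.
credential = issue({
    "holder_ref": "a1b2c3",
    "status": "vaccinated",
    "valid_until": "2021-12-31",
})

# A forged copy that extends the validity date but reuses the old signature.
forged = {
    "claims": {**credential["claims"], "valid_until": "2031-12-31"},
    "signature": credential["signature"],
}

if __name__ == "__main__":
    print("genuine credential accepted:", verify(credential))
    print("forged credential accepted:", verify(forged))
```

Keeping the signed claims this small reflects the disclosure concern: the less health information a credential carries, the less there is to leak when it is shown or stored.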

As entry into countries and access to transportation may be conditioned on having a vaccine passport, we will need to consider what personal and health information is required to obtain a vaccine passport and ensure that it is commensurate with the scope of its use. Potential misuse of this information that goes outside of the containment of COVID-19 by governments must be considered in the design of a vaccine passport.

Beyond COVID-19 and Travel

While usage of vaccine passports has primarily been discussed in the context of travel, the proliferation of these passes could widen their scope to other aspects of life beyond crossing borders and create concerns around access and equity. It would not be difficult to imagine vaccine passports being used as a condition of access to stadiums, museums, and nightclubs, in addition to trains or airplanes. These entities may start to require different levels of access to health information, perhaps requiring not only vaccination records but also lab results or thermal scans. In the context of a pandemic, these requirements may seem reasonable. Prior to the COVID-19 pandemic, being required to show proof of a flu vaccine to enter a stadium during flu season could easily have felt intrusive.

In these expanded use cases of the vaccine passport, society must consider which aspects of the public sphere should or should not be conditioned on having a vaccine passport, and how much health information should be shared to gain that access. Should employment in jobs that involve interacting with the public be subject to these conditions? With unequal access to the vaccine and to healthcare more generally, will inability to obtain vaccination be a mitigating factor when someone is subject to these conditions? Governments will need to have a framework in place that defines the scope of usage for these vaccine passports and what that entails for their continued usage outside the context of a pandemic. This will be important to keep organizations from increasingly demanding personal health data as a condition of access to the public sphere, and to minimize discrimination and harm.

References

https://www.sfgate.com/hawaii/article/Hawaii-is-developing-a-vaccine-passport-to-15957610.php
https://www.nytimes.com/2021/02/04/travel/coronavirus-vaccine-passports.html
https://www.iata.org/en/programs/passenger/travel-pass/
https://www.covidcreds.org
https://www.ibm.com/blogs/watson-health/health-pass-puts-privacy-first/

February 9, 2021

The DNA Race

Graham Schweer | February 5, 2021

In the news
The January 31, 2021 episode of 60 Minutes sounded alarm bells about a foreign country’s interest in collecting Americans’ health data through offers to help the United States’ response to the COVID-19 pandemic.

On November 30, 2020, Newsweek published an article stating that a certain foreign country would begin requiring digital, genetic test-driven health certificates from airline passengers before boarding flights into that country.

A December 24, 2020 article published by Roll Call described a section of the omnibus spending package passed by the U.S. Congress earlier that week which required the Government Accountability Office to investigate consumer genetic tests from companies such as 23andMe and Ancestry.com and their connections to a foreign country.

What do all of these recent stories have in common?

China is the foreign country that is the subject of concern, and the stories highlight China’s ambition to build a global DNA database.

China’s DNA database is estimated to have samples from 140 million people. The United States has the next largest database with 14 million profiles.

DNA Databases are not new

Based on information from Interpol published in 2014 by the DNA policy initiative, 70 countries have created DNA databases. The United Kingdom built the first DNA database in 1995. The United States followed shortly thereafter, and China’s DNA data gathering began in the early 2000s. In all cases, DNA databases were created for criminal investigation purposes.

Proponents of DNA databases point to their value in solving crimes. Many high-profile cold cases have been solved using DNA databases. One recent example is the capture of the Golden State Killer, whose identity was uncovered by matching DNA evidence with information stored in a private company’s DNA database. In fact, China expanded its domestic DNA collection program after using DNA data from a 2016 arrest to solve a murder from 2005.

Made in China 2025

China’s current efforts to build a DNA database appear to extend beyond criminal investigation use cases as well as beyond China’s borders. Under the “Made in China 2025” strategic plan announced in 2015, China stated its intention to become a major player in the global biotechnology sector. One example of the strategy at work is the acquisition of a US genetic sequencing company by a Chinese government-funded company, Beijing Genomics Institute (BGI), which provided BGI with a database of Americans’ DNA.

U.S. officials interviewed in the 60 Minutes episode referenced at the beginning of this post believe that China’s appetite to grow their DNA database is related to their aspiration to become the world’s leader in genetic innovation, disease treatments and vaccines. The U.S. officials contend that China sees the expansion of their DNA database as directly related to their increased statistical chances at discovering genetic breakthroughs.

Ethical considerations
Although viewed with skepticism from a vantage point inside the United States, China’s intentions to build a global DNA database may not be malevolent. However, their current approach is opaque, and the scales are tipped significantly in China’s favor.

China should take the following steps to increase transparency into their DNA database:
Clarify what data has already been collected and is stored in the DNA database;
Inform people that their DNA data has been (will be) added to the DNA database;
Give people the option to remove (not contribute) their DNA;
Do not seek consent from people who are under duress and seeking testing or treatment of a health care issue;
Establish safeguards to ensure DNA data is used only for health improvement-related purposes and is not used to harm people in other ways; and
Eliminate restrictions that only permit a one-way flow of DNA data and share DNA database records with other countries and healthcare institutions.

Without taking the steps recommended above, skepticism of China’s intentions with respect to its global DNA database program will only intensify and force other countries, including the United States, to join the DNA race.

References:
1. Wertheim, J. (2021, January 31). China’s push to control Americans’ health care future. Retrieved February 05, 2021, from https://www.cbsnews.com/news/biodata-dna-china-collection-60-minutes-2021-01-31/
2. Chang, G. (2020, November 30). China wants your DNA-and it’s up to no good: Opinion. Retrieved February 05, 2021, from https://www.newsweek.com/china-wants-your-dna-its-no-good-opinion-1550998
3. Ratnam, G. (2020, December 24). Hey, soldiers and spies – think twice about that home genetic ancestry test. Retrieved February 05, 2021, from https://www.rollcall.com/2020/12/24/hey-soldiers-and-spies-think-twice-about-that-home-genetic-ancestry-test/
4. Benson, T. (2020, June 30). DNA Databases in the U.S. and China Are Tools of Racial Oppression. Retrieved February 05, 2021, from https://spectrum.ieee.org/tech-talk/biomedical/ethics/dna-databases-in-china-and-the-us-are-tools-of-racial-oppression
5. Global summary. (n.d.). Retrieved February 05, 2021, from http://dnapolicyinitiative.org/wiki/index.php?title=Global_summary#:~:text=According%20to%20Interpol%2C%20seventy%20countries,9%25%20of%20the%20population
6. St. John, P. (2020, December 08). The untold story of how the Golden State Killer was found: A covert operation and private DNA. Retrieved February 05, 2021, from https://www.latimes.com/california/story/2020-12-08/man-in-the-window
7. Wee, S. (2020, June 17). China is collecting DNA from tens of millions of men and boys, using U.S. equipment. Retrieved February 05, 2021, from https://www.nytimes.com/2020/06/17/world/asia/China-DNA-surveillance.html
8. Atkinson, R. (2019, August 12). China’s biopharmaceutical strategy: Challenge or complement to U.S. industry competitiveness? Retrieved February 05, 2021, from https://itif.org/publications/2019/08/12/chinas-biopharmaceutical-strategy-challenge-or-complement-us-industry
9. PRI Staff. (2019, March 15). Does China have your DNA? Retrieved February 05, 2021, from https://www.pop.org/does-china-have-your-dna-2/

February 3, 2021

Freedom of Speech vs Sedition

Gajarajan Nagarajan | January 29, 2021

2021 storming of the United States Capitol

Ideas that offend are becoming more prominent, driven by divisive and hateful rhetoric cultivated by major political parties, their associated news channels, and ever-growing, unmonitored social media platforms. As the US reels from the recent storming of the US Capitol, passionate debates have begun across the country over who should enforce the limits on speech. Freedom of speech does have its limits when it comes to threats, racism, hostility, and violence, including acts of sedition. Hate crime laws are constitutional so long as they punish violence or vandalism.

The US First Amendment protects a very broad range of speech, and so hate speech gets amplified in the new digital era, when millions of followers can be induced or swayed by propaganda. Under the First Amendment, there is no such thing as a false idea. However pernicious an opinion may seem, we depend for its correction not on the conscience of judges and juries but on the competition of other ideas.

Weaponization of Social Media

The January 6 events at the US Capitol triggered an important change across all major social media companies and their primary cloud infrastructure providers. Twitter, Facebook, YouTube, Amazon, Apple and Google banned President Trump and scores of his supporters from their platforms for inciting violence. How big will this challenge remain going forward? Aren’t these companies the original enablers and accelerators, with no effective controls for violence prevention? Should large media companies take the law into their own hands (or onto their platforms) while state and federal governments take a pause in moderation? Or does this call for action by society as a whole, since we the people are the creators of the pervasive and polarizing conspiracy-theory content in American society?

Private companies have shown themselves able to act far more nimbly than our government, imposing consequences on a would-be tyrant who has until now enjoyed a corrosive degree of impunity. But in doing so, these companies have also shown a power that goes beyond that of many nation-states and comes without democratic accountability. Technology companies have employed AI/ML and NLP tools to attract more visitors and lengthen user engagement on their platforms, which have become a breeding ground for hate groups. The unilateral power exercised by technology companies can set a precedent to be exploited by the enemies of freedom of speech around the world. Dictators, authoritarian regimes and those in power can do extreme harm to democracy by colluding with technology companies or forcing them to bend the rules for political gain.

In a democratic government, public opinion impacts everything. It is all-important that truth should be the basis of public information. If public opinion is ill formed, poisoned by lies, deception, misrepresentations or mistakes, the consequences could be dire. Government, which is the preservative of the general happiness and safety, cannot be secure if falsehood and malice are injected to rob it of the confidence and trust of the people.

Looking back into history, combined with data science, may provide some options to protect the future of our democracy.

The Sedition Act of 1918 covered a broad range of offenses, notably speech and expression of opinion that cast the government or the war effort in a negative light. In 2007, a bill named the “Violent Radicalization and Homegrown Terrorism Prevention Act” was sponsored by Representative Jane Harman (Democrat from California). The bill would have amended the Homeland Security Act to add provisions to prevent and control homegrown terrorism and to establish a grant program to prevent radicalization. Congress could revisit this bill with bipartisan support.
Section 3 of the 14th Amendment provides guidelines, including a prohibition on current or former military officers, along with current and former federal and state public officials, serving in a variety of government offices if they have engaged in insurrection or rebellion against the United States Constitution.
Social media bans are a key defense mechanism and need to be nurtured, enhanced and implemented across all nations, democratic and otherwise. The ability to drive conversation, to reach wider audiences for recruitment and, perhaps most important, to monetize anger and distrust, all of which benefit conflict entrepreneurs, is effectively neutralized by strong enforcement of social media bans.
Consumer influence on large companies has a major role in regulating nefarious online media houses. For example, de-platforming pressure to turn off cloud and app store access to Parler (a competitor to Twitter), pressure on publishing houses to block book proposals, and FCC regulation of podcasts may make the impact of both extreme left- and right-wing fanaticism and fear mongering manageable.

Photo credits:

https://www.latimes.com/world-nation/story/2021-01-15/capitol-riot-police-veterans-extremists
https://www.amazon.com/LikeWar-Weaponization-P-W-Singer/dp/1328695743

February 3, 2021

Ethical Implications with Autonomous Vehicles

Surya Gutta | January 29, 2021

Introduction
Autonomous vehicles are poised to revolutionize the transportation industry as they could dramatically reduce automotive accidents. Apart from saving human lives, they can reduce billions of dollars in accident damages in the U.S.[1] They could also give people ample free time and increase productivity by removing time wasted driving. The cost of ride-sharing would also decrease, as labor accounts for roughly 60%[2] of the taxi business’s total cost.

Autonomous vehicles use data from radar or LiDAR sensors to detect obstacles, such as human beings, in support of Advanced Driver Assistance Systems (ADAS). ADAS allows a vehicle to operate autonomously in its environment (other vehicles, bicyclists, pedestrians, traffic signals, and obstacles in the scene). Autonomous vehicles process large amounts of data generated by these sensors, along with real-time traffic data and personal data that includes locations and start and stop times.

source: freepik.com

Ethical challenges

Data collection and analysis: Autonomous vehicles collect large amounts of data. The sensors collect images of human beings (e.g., a pedestrian detected as an obstacle in front of the car) without their consent. There is no regulation on how much data can be collected. Once the data is collected, there are no regulations on who can access that data and how it is distributed and stored. Moreover, a data breach would have many implications. The collected data can be used for other purposes, without the users’ consent, leading to unintended consequences. Variation in the data due to human body size and shape might also influence the autonomous software’s decisions.

source: freepik.com

Quality of vehicle sensors: Sensors are one of the costly components in autonomous vehicles. High-end sensors increase the cost drastically. If the vehicle purchase price increases beyond a specific limit in certain countries, there won’t be incentives from the local government to the vehicle owners. To minimize the cost, vehicle manufacturers might not use all the required sensors[3] at the expense of increased risk to human beings.

source: freepik.com

Jobs: While autonomous vehicles will create jobs in engineering and customer service [4,5], many driver jobs could be lost as there won’t be any need for drivers. More than 3 million taxi, truck, and bus drivers may lose their livelihoods and professions in the U.S.[6] As accidents decrease due to autonomous vehicles (95% of recent accidents are due to human error[7]), the importance of vehicle insurance might decrease. Also, people working in collision repair centers and chiropractic care centers might lose jobs. People might opt for autonomous ride-shares over public transit services[8] because of the cheaper prices offered by autonomous ride-shares, which will impact jobs in public transit services. What happens to the people dependent on the construction and maintenance of the public transit system? Also, ample parking spaces might not be required, and people either directly or indirectly dependent on them will lose their livelihood. Even though there is a lot of time before autonomous vehicles take over, giving affected people a chance to change careers, doing so is hard for some because of their age, family circumstances, etc.

Regulations and Guidelines
Most of the current regulations[9] on the safety of motor vehicles are based on the assumption of humans driving vehicles. New regulations [10,11] should be adopted where ethics should be given utmost importance starting from the vehicle’s design to its adoption in society. Also, there should be transparency on the algorithms being used and data being collected by the autonomous vehicles.

There should be a uniform policy on what data can be collected and how it can be used. The federal government should regulate data privacy [12]: a vehicle manufacturer can promise to de-identify personal information [13] (what time a user left home and where the user went), but because different manufacturers maintain different standards, there is a risk that some of them will allow re-identification. Since autonomous vehicles are in the early stages, there are many unanswered questions. What is the expected behavior if the sensors fail? When an accident occurs, who is at fault: the owner or the manufacturer of the autonomous vehicle? All of these need to be considered when drawing up regulations and guidelines.

Policymakers should act now to prepare for and minimize the disruption that autonomous vehicles may bring to millions of jobs in the future. There should be a timeline for drawing up new regulations and guidelines that protect humans and their privacy.

References
[1] Ramsey, M. (2015, March 5). Self-Driving Cars Could Cut 90% of Accidents. WSJ; Wall Street Journal. https://www.wsj.com/articles/self-driving-cars-could-cut-down-on-accidents-study-says-1425567905
[2] Noonan, K. (2019, September 30). What Does the Future Hold for Self-Driving Cars? The Motley Fool; The Motley Fool. https://www.fool.com/investing/what-does-the-future-hold-for-self-driving-cars.aspx
[3] Insider Q&A: Velodyne advocates for safer self-driving cars. (2019, May 19). AP NEWS. https://apnews.com/article/714640aa989846c5bd32cfd12b0e3b9d
[4] Alison DeNisco Rayome. (2019, January 11). Self-driving cars will create 30,000 engineering jobs that the US can’t fill. TechRepublic; TechRepublic. https://www.techrepublic.com/article/self-driving-cars-will-create-30000-engineering-jobs-that-the-us-cant-fill/
[5] Gray, R. (n.d.). Driving your career towards a booming sector. Www.bbc.com. https://www.bbc.com/worklife/article/20181029-driving-your-career-towards-a-boom-sector
[6] Balakrishnan, A. (2017, May 22). Self-driving cars could cost America’s professional drivers up to 25,000 jobs a month, Goldman Sachs says; CNBC. https://www.cnbc.com/2017/05/22/goldman-sachs-analysis-of-autonomous-vehicle-job-loss.html
[7] Crash Stats: Critical Reasons for Crashes Investigated in the National Motor Vehicle Crash Causation Survey. (2015). https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115
[8] Will autonomous cars change the role and value of public transportation? (2015, June 23). The Transport Politic. https://www.thetransportpolitic.com/2015/06/23/will-autonomous-cars-change-the-role-and-value-of-public-transportation/
[9] Laws and Regulations- As a Federal agency, NHTSA regulates the safety of motor vehicles and related equipment. (2016, August 16). NHTSA. https://www.nhtsa.gov/laws-regulations
[10] Dot/NHTSA Policy Statement Concerning Automated Vehicles 2016 Update to ‘preliminary statement of policy concerning automated vehicles’.(2016). Nhtsa.gov. http://www.nhtsa.gov/staticfiles/rulemaking/pdf/Autonomous-Vehicles-Policy-Update-2016.pdf
[11] NHTSA Federal Automated Vehicles Policy. (2016). https://www.transportation.gov/sites/dot.gov/files/docs/AV%20policy%20guidance%20PDF.pdf
[12] Office, U. S. G. A. (2014). In-Car Location-Based Services: Companies Are Taking Steps to Protect Privacy, but Some Risks May Not Be Clear to Consumers. Www.Gao.Gov, GAO-14-81. https://www.gao.gov/products/GAO-14-81
[13] Goodman, E. P. (2017, July 14). Self-driving cars: overlooking data privacy is a car crash waiting to happen. The Guardian; The Guardian. https://www.theguardian.com/technology/2016/jun/08/self-driving-car-legislation-drones-data-security

February 2, 2021

Never Let Them See You Sweat

Steve Dille | February 2, 2021

The global pandemic hasn’t been bad for one company. Peloton, the maker of internet- and social media-connected exercise bikes, has seen an explosion of demand from exercise shut-ins. Peloton bikes let you stream live classes, communicate with other riders, and integrate with social media. President Biden rides a Peloton, which has raised some security eyebrows at the NSA. So, just how secure and private is your information on Peloton? Here are answers to some common questions.

How Visible am I?
The Peloton bike has a camera and microphone. But can Peloton instructors watch me work out and hear me? According to the Peloton Privacy Policy, the camera and microphone can only be activated by you to accept a video chat from another user. The instructors cannot see you.

What Data does Peloton Collect?
When you set up your profile, Peloton asks you to provide information such as a username, email address, weight, height, age, location, birthday, phone number and an image. Only the email address and username are required. Payment information is collected for the monthly subscription but only stored at secure third-party processors.

Peloton also collects information about your exercise participation – date, class, time, total output, and heart rate monitor information. Peloton user profiles are set to public by default, allowing other registered Peloton users to view your fitness performance history, leaderboard name, location and age (if provided). Those users can also contact or follow you through the Peloton service. You have the option to set your profile to “Private,” so only members you approve as followers can see your profile and fitness history.

As you navigate the service, certain passive information is collected through cookies. Peloton uses personal information and other information about you to create anonymized, aggregated demographic, location and device information. This information is used to measure rider interest and usage of various features of the Peloton services.

Does Peloton Sell My Information to Advertisers?
Peloton’s privacy policy states “We currently do not “sell” your information as we understand this term.” However, they seem to “share” your information. The privacy policy contains a section on “Marketing – Interest-Based Advertising and Third-Party Marketing.” Peloton does make your data available for interest-based advertising and may use it to offer you services that seem likely to interest you. Peloton lets you minimize sharing of your information with third parties for marketing purposes through an opt-out form.

What About Peloton and Social Media?
This is an area where your privacy can be violated in ways that are hard to envision if you choose to participate. Peloton offers publicly accessible blogs, social media pages, private messages, video chat, community forums and the ability to connect to Facebook and other fitness gadgets like Fitbit. When you disclose information about yourself in any of these areas, Peloton collects and stores the information. Further, if you choose to submit content to any public area of the Peloton Service or any other public sites, such content will be considered “public” and will not be subject to the Peloton privacy protections. This can be problematic for riders posting their new personal record to an instructor’s Facebook page. Whether or not they realize it, they just made some previously private profile information public.

Once you start connecting your Peloton information to social networks, it becomes very possible for others to piece together information about you. For example, Amazon has a leaderboard group called “Pelozonians.” When you join that group, anyone on Peloton or the free app can tell that you work at Amazon.

What Can I Do to Protect My Privacy?
Configuring your settings incorrectly can allow others to look into your personal information. Remember, your profile is public by default, so make sure you don’t include private information you don’t want shared, like city or age. Better yet, set your profile to private. Make sure your username isn’t easily associated with you offline or on social media so others can’t piece together information about you. Do you really need to post your rides on Facebook? This just opens another complex layer of connection between your personal life and information on Peloton. Remember to use the forms from Peloton to opt out of interest-based advertising.

The Peloton is a wonderful bike requiring a “privacy” update to an old, humorous politeness adage. Today, when you meet someone new, it’s now impolite to ask their age, weight or Peloton leaderboard name.

Peloton Privacy Policy
https://www.onepeloton.com/privacy-policy

Peloton Terms of Service
https://www.onepeloton.com/terms-of-service