The Appeal and the Dangers of Digital ID for Refugees
By Joshua Noble | October 29, 2021

Digitization of national identity is growing in popularity as governments across the world seek to modernize access to services and streamline their own data stores. Refugees, especially those fleeing war-torn areas at short notice with few belongings or those who have made dangerous or arduous journeys, often lack any form of ID. Governments are often unwilling to provide ID to the stateless, since they may not yet have decided whether to allow a displaced person to stay, and in some cases the stateless person may not want to stay in that country. Many agencies are beginning to explore non-state digital ID as a way of providing some form of identity to stateless persons, among them the UNHCR, the Red Cross, the International Rescue Committee, and the UN Migration Agency. For instance, a UNHCR press release states: “UNHCR is currently rolling out its Population Registration and Identity Management EcoSystem (PRIMES), which includes state of the art biometrics.”

The need for a way for a stateless person to identify themselves is made all the more urgent by the approaches governments have begun to take to identifying refugees. Governments are increasingly using migrants’ electronic devices as verification tools. This practice is made easier by mobile extraction tools, which allow an individual to download key data from a smartphone, including contacts, call data, text messages, stored files, location information, and more. In 2018, the Austrian government approved a law forcing asylum seekers to hand over their phones so authorities could check their origin and invalidate their asylum request if they were found to have previously entered another EU country.

NGO-provided ID initiatives may convince governments to abandon or curtail these highly privacy-invasive strategies. But while the intention of these initiatives is often charitable, seeking to help provide assistance to refugees, they share a challenge common to many attempts to uniquely identify persons: access to services is often tied to the creation of the ID itself. For a person who is stateless, homeless, and in need of aid, arriving in a new country and shuttled to a camp, this can feel like coercion. There is an absence of informed consent on the part of refugees. The agencies creating these data subjects often fail to adequately educate them on what data is being collected and how it will be stored. Once data is collected, refugees face extensive bureaucratic challenges if they want to change or update it. The agencies creating the data offer little transparency around how data is stored and used and, most importantly, with whom it might be shared, both inside and outside of the organizations collecting it.

Recently, as NGOs and aid agencies fled Afghanistan when the US military abandoned the country, thousands of Afghans who had worked with those organizations began to worry that biometric databases and their own digital history might be used by the Taliban to track and target them. In another example of the risks of using biometric data, the UNHCR shared information on Rohingya refugees with the government of Bangladesh. The Bangladeshi government then sent that same data to Myanmar to verify people for possible repatriation. Both of these cases illustrate the real and present risk that creating and storing biometric data and IDs can pose.

While the need for ID and the benefits that it can provide are both valid concerns, the challenge of ad hoc and temporary institutions providing those IDs and collecting and storing data associated with them presents not only privacy risks to refugees but often real and present physical danger as well.

UNHCR. 2018. “UNHCR Strategy on Digital Identity and Inclusion” [https://www.unhcr.org/blogs/wp-content/uploads/sites/48/2018/03/2018-02-Digital-Identity_02.pdf](https://www.unhcr.org/blogs/wp-content/uploads/sites/48/2018/03/2018-02-Digital-Identity_02.pdf)

IOM & APSCA. 2018. 5th border management and identity conference (BMIC) on technical cooperation and capacity building. Bangkok: BMIC. [http://cb4ibm.iom.int/bmic5/assets/documents/5BMIC-Information-Brochure.pdf](http://cb4ibm.iom.int/bmic5/assets/documents/5BMIC-Information-Brochure.pdf).

Red Cross 510. 2018. An Initiative of the Netherlands Red Cross Is Exploring the Use of Self-Managed Identity in Humanitarian Aid with Tykn.Tech. [https://www.510.global/510-x-tykn-press-release/](https://www.510.global/510-x-tykn-press-release/)

UNHCR. 2018. Bridging the identity divide – is portable user-centric identity management the answer? [https://www.unhcr.org/blogs/bridging-identity-divide-portable-user-centric-identity-management-answer/](https://www.unhcr.org/blogs/bridging-identity-divide-portable-user-centric-identity-management-answer/)

Data & Society. 2020. “Digital Identity in the Migration & Refugee Context” [https://datasociety.net/wp-content/uploads/2019/04/DataSociety_DigitalIdentity.pdf](https://datasociety.net/wp-content/uploads/2019/04/DataSociety_DigitalIdentity.pdf)

India’s National Health ID – Losing Privacy with Consent
By Anonymous | October 29, 2021

Source: Ayushman Bharat Digital Mission (ABDM)

“Every Indian will be given a Health ID,” Prime Minister Narendra Modi promised on India’s Independence Day this year, adding, “This Health ID will work like a health account for every Indian. Your every test, every disease – which doctor, which medicine you took, what diagnosis was there, when they were taken, what was their report – all this information will be included in your Health ID.”[1] The 14-digit Health ID will be linked to a health data consent manager, used to seek patients’ consent for connecting and sharing health information across healthcare facilities (hospitals, laboratories, insurance companies, online pharmacies, telemedicine firms).

Source: Ayushman Bharat Digital Mission (ABDM)

Technology Is The Answer, But What Was The Question?
India’s leadership of the landmark resolution on digital health by the World Health Organization (WHO) has been recognized globally. With a growing population widening the gap between the number of health‑care professionals and patients (0.7 doctors per 1,000 patients[3]) and with the increasing cost of health care, investing in technology to enable health‑care delivery seems to be the approach to leapfrog public health in India. The National Digital Health Mission (NDHM) is India’s first big step toward improving its health care system and a move towards universal health coverage.

PM Modi says “This mission will play a big role in overcoming problems faced by the poor and middle class in accessing treatment”[4]. It aims to digitize medical treatment facilities by connecting millions of hospitals. The Health ID will be free of cost and completely voluntary, and citizens will be able to manage their records in a private, secure, and confidential environment. The analysis of population health data should lead to better planning, budgeting, and implementation for states and health programmes, helping save costs and improve treatment. But, for all its good intentions, this hasty rush to do something may be disconnected from ground realities, and challenges abound.

Source: Ayushman Bharat Digital Mission (ABDM)

Consent May Not Be The Right Way To Handle Data Privacy Issues
Let’s start with ‘voluntary’ consent. The government might be playing a digital sleight of hand here. Earlier this month, the Supreme Court of India issued notices to the Government seeking removal of the requirement for a National ID (Aadhar) from the government’s CoWin app. The CoWin app is used to schedule COVID vaccine appointments. For registration, Aadhar is voluntary (you can use a Driver’s License), but the app makes Aadhar required to generate a certificate[5]. You may be wondering what the National ID has to do with the National Digital Health ID. During the launch of the National Digital Health ID, the government automatically created Health IDs for individuals who had used the National ID to schedule a vaccine appointment. 122 million (approx. 98%) of the 124 million IDs generated have been for people registered on CoWin. Most recipients of the vaccine were not aware that their unique Health ID had been generated[6].

Then there is the issue of ‘forced’ consent. Each year, 63 million Indians are pushed into poverty by healthcare costs[7], i.e., roughly two citizens every second, and 50% of the population lives in poverty (under USD 3.10 per day). One of the stated benefits of the Health ID is that it will be used to determine the distribution of benefits under the Government’s health welfare schemes. So if you depend on Government schemes, or are looking to participate in them, you have to create a Health ID and link it with the National ID. As Amulya Nidhi of the non-profit People’s Health Movement puts it, “People’s vulnerability while seeking health services may be misused to get consent. Informed consent is a real issue when people are poor, illiterate or desperate[8].”

Source: Ayushman Bharat Digital Mission (ABDM)

Good Digital Data Privacy Is Hard To Get Right
Finally, there is the matter of ‘privacy regulation’. The NDHM depends on the Personal Data Protection Bill (PDP), which would overhaul the outdated Information Technology Act 2000. After two years of deliberation the PDP has yet to be passed, and 124 million Health IDs have already been generated. Moreover, principles such as qualified consent and specific user rights have no legal precedent in India[9]. In its haste, the Government has moved forward without a robust legal framework to protect health data. And without a data protection law or an independent data protection authority, there are few safeguards and no recourse when rights are violated.

The lack of a PDP could lead to misuse of data by private firms and bad actors. An insurance agency might choose to grant coverage only to customers willing to link their Health IDs and share digitised records. Similarly, insurers may offer incentives to those who share medical history and financial details for customised insurance premium plans[10], or they may reject insurance applications and push up premium rates for those with pre-existing medical conditions. If insurance firms, hospitals, and the like demand Health IDs, the ID will become mandatory in practice, even if not required by law.

The New Normal: It’s All Smoke and Mirrors
In closing, medical data can lead to better planning, cost optimization, and implementation for health programs. But without a robust legal framework, the regulatory gap poses implementation challenges for a National Digital Health ID. Moreover, the government has to rein in intimidatory data collection practices, or else people will have no choice but to consent in order to access essential resources to which they are entitled. Lastly, as the GDPR explains, consent must be freely given, specific, informed, and an unambiguous indication of the data subject’s wishes. The Government of India needs to decouple initiatives and remove any smoke and mirrors, so people are clearly informed about what they are agreeing to in each case. In the absence of such efforts, there will be one added ‘new normal’ for India – losing privacy with consent.

References:
1. Mehrotra Karishma (2020). PM Announces Health ID for Every Indian. The Indian Express. Accessed on October 25, 2021 from: https://indianexpress.com/article/india/narendra-modi-health-id-coronavirus-independence-day-address-6556559/
2. Bertalan Mesko et al (2017). Digital Health is a Cultural Transformation of Traditional Healthcare. Mhealth. Accessed on October 25, 2021 from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682364/
3. Anup Karan et al (2021). Size, composition and distribution of health workforce in India. Human Resources for Health. Accessed on October 25, 2021 from: https://human-resources-health.biomedcentral.com/articles/10.1186/s12960-021-00575-2
4. Kaunain Sheriff (2021). PM Modi launches Ayushman Bharat Digital Mission. The Indian Express. Accessed on October 25, 2021 from: https://indianexpress.com/article/india/narendra-modi-pradhan-mantri-ayushman-bharat-digital-health-mission-7536669/
5. Ashlin Mathew (2021). Modi government issuing national health ID stealthily without informed consent. National Herald. Accessed on October 25, 2021 from: https://www.nationalheraldindia.com/india/modi-government-issuing-national-health-id-stealthily-without-informed-consent
6. Regina Mihindukulasuriya (2021). By 2025, rural India will likely have more internet users than urban India. ThePrint. Accessed on October 25, 2021 from: https://theprint.in/tech/by-2025-rural-india-will-likely-have-more-internet-users-than-urban-india/671024/
7. Vidhi Doshi (2018). India is rolling out a health-care plan for half a billion people. But are there enough doctors? Washington Post. Accessed on October 25, 2021 from: https://www.washingtonpost.com/world/2018/08/14/india-is-rolling-out-healthcare-plan-half-billion-people-are-there-enough-doctors/
8. Rina Chandran (2020). Privacy concerns as India pushes digital health plan, ID. Reuters. Accessed on October 25, 2021 from: https://www.reuters.com/article/us-india-health-tech/privacy-concerns-as-india-pushes-digital-health-plan-id-idUSKCN26D00B
9. Shahana Chatterji et al (2021). Balancing privacy concerns under India’s Integrated Unique Health ID. The Hindu. Accessed on October 25, 2021 from: https://www.thehindubusinessline.com/opinion/balancing-privacy-concerns-under-indias-integrated-unique-health-id/article36760885.ece
10. Mithun MK (2021). How the Health ID may impact insurance for patients with pre-existing conditions. The News Minute. Accessed on October 25, 2021 from: https://www.thenewsminute.com/article/how-health-id-may-impact-insurance-patients-pre-existing-conditions-156306

Social Media Analytics for Security: Freedom of Speech vs. Government Surveillance
By Nitin Pillai | October 29, 2021

Introduction

The U.S. Department of Homeland Security’s (DHS) U.S. Customs and Border Protection (CBP) takes steps to ensure the safety of its facilities and personnel from natural disasters, threats of violence, and other harmful events and activities. To aid these efforts, CBP personnel monitor publicly available social media to provide situational awareness and to monitor potential threats or dangers to CBP personnel and facility operators. CBP may collect publicly available information posted on social media sites to create reports and disseminate information related to personnel and facility safety. CBP conducted a Privacy Impact Assessment (PIA) because, as part of this initiative, it may incidentally collect, maintain, and disseminate personally identifiable information (PII) over the course of these activities.

Social Media Surveillance’s impact on Privacy

Social Media Surveillance

The Privacy Impact Assessment (PIA) states that CBP searches public social media posts to bolster the agency’s “situational awareness”—which includes identifying “natural disasters, threats of violence, and other harmful events and activities” that may threaten the safety of CBP personnel or facilities, including ports of entry. The PIA aims to inform the public of privacy and related free speech risks associated with CBP’s collection of personally identifiable information (PII) when monitoring social media. CBP claims it only collects PII associated with social media—including a person’s name, social media username, address or approximate location, and publicly available phone number, email address, or other contact information—when “there is an imminent threat of loss of life, serious bodily harm, or credible threats to facilities or systems.”

Chilling Effect on Free Speech
CBP’s social media surveillance poses a risk to the free expression rights of social media users. The PIA claims that CBP only monitors public social media posts, and thus that individuals retain the right and ability to refrain from making information public or to remove previously posted information from their accounts. But while social media users retain control of their privacy settings, CBP’s policy chills free speech by causing people to self-censor, for example by not expressing opinions publicly on the Internet for fear that CBP could collect their PII for discussing a topic of interest to the agency. Additionally, people running anonymous social media accounts might fear that collected PII could lead to their true identities being unmasked. This chilling effect is made worse by the fact that CBP does not notify users when their PII is collected. CBP also may share information with other law enforcement agencies, which could result in immigration consequences or in being added to a government watchlist.

CBP’s Practices Don’t Mitigate Risks to Free Speech
The PIA claims that any negative impacts on free speech of social media surveillance are mitigated by both CBP policy and the Privacy Act’s prohibition on maintaining records of First Amendment activity. Yet, these supposed safeguards ultimately provide little protection.

Social Network Analysis

Collecting information in emergency situations and to ensure public safety undoubtedly are important, but CBP collects vast amounts of irrelevant information – far beyond what would be required for emergency awareness – by amassing all social media posts that include matches to designated keywords. Additionally, CBP agents may use “situational awareness” information for “link analysis,” that is, identifying possible associations among data points, people, groups, events, and investigations. While that kind of analysis could be useful for uncovering criminal networks, in the hands of an agency that categorizes protests and immigration advocacy as dangerous, it may be used to track activist groups and political protesters.
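
To illustrate what link analysis can look like in practice, here is a small, hypothetical sketch (Python with the networkx library; the accounts and interactions are invented for illustration) that groups social media accounts into clusters based on who interacts with whom:

```python
# A small illustration (with made-up data) of the kind of "link analysis"
# described above: build a graph of who interacts with whom and find
# clusters of associated accounts. Requires the networkx library.
import networkx as nx

# Hypothetical edges: (account, account) pairs extracted from public replies
# or mentions in posts that matched a monitored keyword.
interactions = [
    ("@org_a", "@user1"), ("@user1", "@user2"),
    ("@user2", "@org_a"), ("@user3", "@user4"),
]

G = nx.Graph(interactions)
# Connected components group accounts that are linked to one another.
for group in nx.connected_components(G):
    print("associated accounts:", sorted(group))
```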

Conclusion

Some argue that society must “balance” freedom and safety, and that in order to better protect ourselves from those who would do us harm, we have to give up some of our liberties. But this is often a false choice. Especially in the world of data analysis, liberty does not have to be sacrificed to enhance security.

Freedom of speech is a critical stitch in the fabric of democracy. The public needs to know more about how agencies are gathering our data, what they’re doing with it, any policies that govern this surveillance, and the tools agencies use, including algorithmic surveillance and machine learning techniques. A single Facebook post or tweet may be all it takes to place someone on a watchlist, with effects that can range from repeated, invasive screening at airports to detention and questioning in the United States or abroad.

Our government should be fostering, not undermining, our ability to maintain obscurity in our online personas, for reasons including individual privacy, security, and consumer protection.

References :

1. Privacy Impact Assessment for Publicly Available Social Media Monitoring and Situational Awareness Initiative – DHS/CBP/PIA-058
https://www.dhs.gov/sites/default/files/publications/privacy-pia-cbp58-socialmedia-march2019.pdf
2. CBP’s New Social Media Surveillance: A Threat to Free Speech and Privacy
3. We’re demanding the government come clean on surveillance of social media
https://www.aclu.org/blog/privacy-technology/internet-privacy/were-demanding-government-come-clean-surveillance-social

 

Time flies when ISPs are having fun
By Anonymous | October 29, 2021

More than four years have passed since the US Congress repealed FCC rules that would have brought essential privacy protections to ISP consumers. This is a matter affecting millions of Americans, and measures need to be taken so consumers are not left to their own peril and at big corporations’ mercy while accessing the Internet.

**What Happened?**

In March 2017, as the country transitioned from Obama’s second term to newly elected President Trump, the US Congress, without much alarm, repealed regulation providing citizens with privacy protections when using ISP and broadband services. The regulation’s main aim was to inhibit ISPs’ appetite to freely collect, aggregate, and sell consumer data, including web browsing history.

The repeal was a massive victory for ISPs such as Verizon, Comcast, and AT&T, and a blow to consumers’ privacy rights. Not only was the “wild west” privacy status quo maintained, but the repeal also bars the FCC from issuing similar regulations (!) in the future.

The main argument for repealing the regulation was that the FTC has traditionally been the agency regulating corporate and business privacy affairs. It was also argued that by regulating ISPs, the FCC would put them at a disadvantage compared to FTC-regulated web services such as Google, Apple, and Yahoo. Never mind that the ISP business model is based on charging for access and bandwidth, not on monetization via data brokerage or advertising services. And never mind that the FCC’s newly appointed chair, Ajit Pai, who recommended voting against his own agency’s rules, was a former lawyer for Verizon.[1]

So four years have passed, and the FTC has not issued, nor is it expected to issue, any robust regulatory framework on ISP privacy. Consumers are left in privacy limbo, and states are scrambling to pass related laws [2]. How bad is it, and what can be done?

**What can ISPs see?**

The Internet – a network of networks – is an open architecture of technologies and services, where information flows through its participant nodes in little virtual envelopes called “packets”. Every information-containing packet passing through any of the network’s edges (known as routers) can be inspected, exposing its source address, destination address, and information content (known as the payload).

Since the ISP is your first node entering the Internet (also known as the default gateway), it presents a great opportunity to collect data about everything sent or received by households. This complete-visibility risk is only mitigated by the use of encryption, which prevents any node except the sender and receiver from seeing packets’ contents. As long as encryption is being used (think of HTTPS, for example), the payload is not visible to ISPs.

The good news is that encryption is becoming more pervasive across the internet. As of early 2021, an estimated 90% of internet traffic is encrypted, and the trend is still upward.

But even with encryption present, ISPs can collect a lot of information. ISPs have to route your packets after all, so they know exactly whom you are communicating with, along with how many packets are being exchanged and their timestamps. ISPs can easily deduce when one is, for example, watching Netflix movies, even though the communication with Netflix is encrypted.
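
As a rough illustration of how much metadata remains visible, here is a minimal sketch (Python with the scapy library, requiring root privileges; purely illustrative, not a tool ISPs are known to use) that prints packet endpoints, sizes, and timestamps without ever touching their contents:

```python
# Even when payloads are encrypted, an on-path observer such as an ISP can
# still read the metadata printed below for every packet it forwards.
from scapy.all import sniff, IP

def show_metadata(pkt):
    # Endpoints, size, and time: visible regardless of payload encryption.
    if IP in pkt:
        print(f"{float(pkt.time):.0f}s  {pkt[IP].src} -> {pkt[IP].dst}  {len(pkt)} bytes")

# Capture 20 packets on the default interface (requires root privileges) and
# print what an ISP-level observer would see: who talks to whom, how much, when.
sniff(count=20, prn=show_metadata)
```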

In addition to transporting the packets themselves, ISPs have another avenue for collecting data: the Domain Name System (DNS). Every time one needs to reach a domain (say by visiting the URL [www.nyt.com](http://www.nyt.com)), the translation of that domain into routable IP addresses is visible to the ISP, either because the ISP provides the DNS service (usually the default setting) or by examining DNS traffic (port 53). ISPs can easily collect important web browsing data in this fashion.

Beyond what ISPs are known to use to collect usage data, other technologies could also be employed. ISPs could use techniques such as sophisticated traffic fingerprinting [3], in extreme cases even deep packet inspection, or other nefarious techniques such as Verizon’s infamous X-UIDH headers [4]. Fingerprinting is how, for example, ISPs were supposed to detect movies being shared illegally via torrent streams, a failed imposition by the Recording Industry Association of America (RIAA) [5]. While it is speculative that ISPs are resorting to such technologies, it is important to note that abuses by ISPs have occurred in the past, so without specific regulations the potential danger remains.

**So what can you do?**

Since our legislators failed to protect us, some do-it-yourself work is needed, and some of these actions require a good level of caution.

Opt-in consent was one of the most important FCC provisions repealed in 2017, so consumers now need to take opt-out action themselves, typically through their ISP’s privacy or marketing preference settings.

Another measure is to configure your home router (or each individual device) so that it no longer uses the ISP as the DNS server, and to encrypt DNS traffic. Here one needs to be careful selecting a DNS provider, otherwise you are exposed to the same privacy risks from the new provider. Make sure you select a DNS service with a good privacy policy. For example, Cloudflare DNS (server 1.1.1.1) publishes its privacy policy here: https://developers.cloudflare.com/1.1.1.1/privacy/public-dns-resolver
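
As a concrete example, the sketch below (Python with the requests library) resolves a domain name through Cloudflare’s public DNS-over-HTTPS JSON endpoint; because the lookup travels inside HTTPS, an ISP sees only a connection to Cloudflare, not which domain is being resolved:

```python
# A minimal sketch of an encrypted DNS lookup using Cloudflare's public
# DNS-over-HTTPS JSON endpoint (https://cloudflare-dns.com/dns-query).
import requests

resp = requests.get(
    "https://cloudflare-dns.com/dns-query",
    params={"name": "www.nytimes.com", "type": "A"},
    headers={"accept": "application/dns-json"},
)
# Print the resolved records; the query itself was hidden from the ISP.
for answer in resp.json().get("Answer", []):
    print(answer["name"], answer["data"])
```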

Setting up private DNS on Android device. Credits: cloudflare

For a complete “cloak” of your traffic, making it virtually invisible to the ISP, one can use a VPN service. These services make internet traffic extremely difficult for your ISP to analyze: except for volumetrics, the ISP will not have much information about your traffic. The drawback is that the VPN provider in turn can see all your traffic, just like the ISP, so one has to be EXTREMELY diligent in selecting this type of service. Some of these providers are incorporated abroad in countries with lax regulations, with varying degrees of privacy assurance. For example, vendor NordVPN is incorporated and regulated in Panama, while ExpressVPN has its privacy practices independently audited by the renowned firm PwC.

Last but most importantly, contact your representative and voice your concern about the current state of ISP privacy. As things stand, the FCC has its arms tied by Congress, and the FTC has done very little to protect consumer privacy. With the midterm elections approaching, this is a good time to make your voice heard. Your representative, along with ways to contact them, can be found here: https://www.house.gov/representatives/find-your-representative

References:

[1] <https://www.reuters.com/article/us-usa-internet-trump-idUSKBN1752PR>

[2] <https://www.ncsl.org/research/telecommunications-and-information-technology/2019-privacy-legislation-related-to-internet-service-providers.aspx>

[3] <https://www.ndss-symposium.org/wp-content/uploads/2017/09/website-fingerprinting-internet-scale.pdf>

[4] https://www.eff.org/deeplinks/2014/11/verizon-x-uidh

[5] https://www.pcworld.com/article/516230/article-4652.html

Privacy Computing
By Anonymous | October 29, 2021

The collection, use, and sharing of user data can enable companies to better judge users’ needs and provide better services to customers. From the perspective of contextual integrity [1], all of the above can be reasonable. However, multi-dimensional privacy models [2] and privacy classification methods [3] reveal many privacy risks in the processing and sharing of user data, such as data abuse, third-party leakage, and data blackmail. Because enterprises and institutions guard both the commercial value of their data and the privacy authorizations of their users, data is stored in separate silos that are difficult to connect and use together effectively. Traditional commercial agreements cannot effectively protect the security of the data: once raw data leaves its home database, its owner risks losing control of it completely. A typical negative case is the Cambridge Analytica incident at Facebook. User data on tens of millions of people was shared under an agreement that it would be used only for academic research [4]; once the raw data left Facebook’s control, however, it was used for non-academic purposes, resulting in huge fines for Facebook. A more secure, technical solution is needed to ensure that rights to use data can be subdivided as data circulates and is used collaboratively.

“Privacy computing” is a new computing theory and method for protecting private information across its entire life cycle [5]. It combines models of privacy leakage, privacy protection, and privacy calculation with methods such as an axiomatic separation of data ownership from the right to use data, so that information can be protected while it is being used. Privacy computing essentially aims to solve data-service problems such as data circulation and data application on the premise of protecting data privacy. Its guiding ideas include: “data is available but not visible, the data does not move, the model moves”, “data is available but invisible, data is controllable and measurable”, and “share the value of data, not the data itself”. The main privacy computing technologies on the market fall into three categories: secure multi-party computation, trusted hardware, and federated learning.

Federated learning is a distributed machine learning technology and system involving two or more participants. The participants jointly train a model through a secure algorithmic protocol, exchanging only intermediate results (such as encrypted model updates) rather than raw data, and can then jointly provide model inference and prediction services. The exchanged intermediate data can be protected with techniques such as homomorphic encryption, which allows specific algebraic operations to be performed on encrypted data so that the decrypted result equals the result of performing the same operation on the plaintext. A model trained this way is almost as accurate as a traditional, centrally trained machine learning model, as shown in Fig. 1.
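
To make the idea concrete, here is a minimal sketch (Python with numpy, made-up data, and a plain gradient-averaging scheme rather than any particular production protocol) of how participants can jointly fit a model while raw data stays local; only per-participant updates are sent to a coordinator:

```python
# A toy sketch of the federated learning idea: each participant computes a
# model update on its own private data, and only the updates are aggregated.
import numpy as np

rng = np.random.default_rng(0)
# Three participants, each with private local data (hypothetical).
local_datasets = [(rng.normal(size=(100, 3)), rng.integers(0, 2, 100)) for _ in range(3)]

def local_gradient(weights, X, y):
    # One logistic-regression gradient step computed on local data only.
    preds = 1 / (1 + np.exp(-X @ weights))
    return X.T @ (preds - y) / len(y)

weights = np.zeros(3)
for _ in range(50):  # communication rounds
    grads = [local_gradient(weights, X, y) for X, y in local_datasets]
    weights -= 0.1 * np.mean(grads, axis=0)  # coordinator averages the updates

print("jointly trained weights:", weights)
```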

Secure multi-party computation is a technology and system that can safely compute an agreed function without requiring participants to share their own data and without a trusted third party. Through cryptographic algorithms and protocols, participants encrypt or transform their plaintext data before providing it to the other parties; no participant can access another party’s data in plaintext, which ensures the security of every party’s data, as shown in Fig. 2.
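
A toy sketch of one building block of secure multi-party computation, additive secret sharing (plain Python, with illustrative values): three parties learn the sum of their private inputs without revealing those inputs to each other.

```python
# Additive secret sharing: split each private value into random shares that
# sum to the value, so no single share reveals anything about it.
import random

Q = 2**61 - 1  # arithmetic is done modulo a large prime

def share(secret, n_parties=3):
    # Split a secret into n random shares that sum to the secret mod Q.
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

private_inputs = [42, 17, 99]           # each party's private value
all_shares = [share(v) for v in private_inputs]

# Each party i locally adds up the i-th share it received from everyone...
partial_sums = [sum(col) % Q for col in zip(*all_shares)]
# ...and only these partial sums are combined to reveal the total.
print("sum of private inputs:", sum(partial_sums) % Q)  # 158
```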

Trusted computing starts from a security root of trust, from which a chain of trust is established across the hardware platform, the operating system, and the application system. Along this chain, each level measures and verifies the next level before handing control to it, extending trust step by step from the root and thereby constructing a secure and trustworthy computing environment. A trusted computing system consists of a root of trust, a trusted hardware platform, a trusted operating system, and trusted applications, and its goal is to improve the security of the computing platform.
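
The “measure then extend” step at the heart of such a chain of trust can be sketched in a few lines (Python; the stage names are placeholders, and the register is modeled on a TPM Platform Configuration Register):

```python
# Each boot stage is hashed, and the hash is folded into a running register
# before control passes to that stage.
import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    # PCR-style extend: new value = H(old value || H(component))
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()

register = bytes(32)  # starts at all zeros at power-on
for stage in [b"firmware image", b"bootloader", b"os kernel", b"application"]:
    register = extend(register, stage)

# Any change to any earlier stage changes the final value, so a verifier that
# knows the expected value can detect tampering anywhere along the chain.
print("final measurement:", register.hex())
```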

With increasing attention from many fields, privacy computing has become a hot emerging technology and a crowded track for business and capital competition. Data circulation is a key link in releasing the value of data, and privacy computing provides a technical solution for it. The development of privacy computing has clear advantages and broad application space, but because the technology is still immature, it also faces problems. Whether through engineering breakthroughs or through optimization and co-design of software and hardware, the performance of privacy computing still has a long way to go.

References:
[1] Helen Nissenbaum, “Privacy as Contextual Integrity”, Washington Law Review, Volume 79, Number 1, Symposium: Technology, Values, and the Justice System, Feb 1, 2004.
[2] Daniel J. Solove, “A Taxonomy of Privacy”, University of Pennsylvania Law Review, Vol. 154, No. 3, pp. 477-564, 2006. https://doi.org/10.2307/40041279.
[3] Mulligan Deirdre K., Koopman Colin and Doty Nick, 2016, “Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy”, Phil. Trans. R. Soc. A. 374:20160118. http://doi.org/10.1098/rsta.2016.0118
[4] Confessore, Nicholas (April 4, 2018). “Cambridge Analytica and Facebook: The Scandal and the Fallout So Far”. The New York Times. ISSN 0362-4331. Retrieved May 8, 2020.
[5] F. Li, H. Li, B. Niu, J. Chen, “Privacy Computing: Concept, Computing Framework, and Future Development Trends”, Engineering 5, 1179-1192, 2019.

Alternative credit scoring – a rocky road to credit
By Teerapong Ninvoraskul | October 29, 2021

Aimee, a food truck owner in the Philippines, was able to expand her business after getting access to a loan. She opened a second business selling beauty products on the side. Stories like Aimee’s are common in the Philippines, where 85% of the Filipino population is outside of the formal banking system.

“Aimee makes money, she’s clearly got an entrepreneurial spirit, but previously had no way of getting a formal bank to cooperate,” said Shivani Siroya, founder and CEO of Tala, a fintech company providing financial access to individuals and small businesses.

Loan providers using alternative credit scoring, like Tala, are spreading fast through developing countries. In just a few years, China’s Ant Financial, an affiliate of Alibaba Group, has built up an extensive scoring system, called Zhima Credit (or Sesame Credit), covering 325m people.

Alternative credit scoring can be viewed as a development in loan-default prediction. Unlike the traditional credit score system, which estimates a consumer’s probability of default from financial information such as payment history, alternative scoring models use consumers’ behavior on the Internet to predict default rates.

Personal information such as email address, devices used, time of day when browsing, IP address, and purchase history is collected; these data have been found to correlate with loan default rates.
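
As a purely illustrative sketch (hypothetical features, random placeholder data, and scikit-learn as an assumed dependency), a lender might fit a simple model relating such behavioral signals to observed repayment outcomes:

```python
# Toy model of "alternative" credit scoring: relate behavioral signals to
# default risk with a logistic regression. All features and labels are fake.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
# Hypothetical behavioral features: typical browsing hour, number of phone
# contacts, fraction of purchases made at night, device age in years.
X = np.column_stack([
    rng.integers(0, 24, n),
    rng.integers(0, 500, n),
    rng.random(n),
    rng.random(n) * 5,
])
defaulted = rng.integers(0, 2, n)  # placeholder labels; real ones come from repayment history

model = LogisticRegression(max_iter=1000).fit(X, defaulted)
print("estimated default probability for a new applicant:",
      model.predict_proba([[22, 35, 0.8, 3.0]])[0, 1])
```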


Alternative credit scoring

Financial access for the unbanked
Historically, lower-income consumers are the market segment that has been too costly for traditional banking to serve, given small ticket sizes, the expensive infrastructure investment required, and high default rates.

For this market segment, traditional credit-scorers have limited data to work with. They could use payment records for services that are provided first and paid later, such as utilities, cable TV or internet. Such proven payment data are a good guide to default risk in the absence of credit history. In most cases, this specialized score is the only possible channel to get credible scores for consumers that were un-scorable based on traditional credit data alone.

In smaller and poorer countries with no financial infrastructure, credit-scorers have even more limited financial data to work with. Utilities are registered to households, not individuals, if they are registered at all. Thanks to high penetration of pay-as-you-go mobile phones among the poor, rapidly emerging alternative lenders are able to look at payment records for mobile phones.

A new breed of startups spots opportunities to bring these data-driven, algorithm-based approaches to individuals and small businesses. Tala, which operates in India, Mexico, the Philippines, and east Africa, says it uses over 10,000 data points collected from a customer’s smartphone to determine whether to grant a loan. It has lent more than $2.7 billion to over 6 million customers since 2014.

With inexpensive cost structure and lower loan default rates, these fintech startups achieve attractive investment returns and are able to provide cost-efficient financing to the previously unbanked and underbanked.

Lack of transparency & fairness challenges
Despite its benefits in expanding financial inclusion, alternative credit scoring presents new challenges that raise issues of transparency and fairness.

First, it is harder to explain to people seeking credit than traditional scores. While consumers generally have some sense of how their financial behavior affects their traditional credit scores, it may not always be readily apparent to consumers, or even to regulators, what specific information is utilized by certain alternative credit scoring systems, how such use impacts a consumer’s ability to secure a loan or its pricing, and what behavioral changes consumers might take to improve their credit access and pricing.

Difficulty in explaining the alternative scores is further amplified by the secretive “blackbox” roles that alternative scoring systems play as competitive edges against each other in producing better default predictions for lenders.

Second, improving one’s own credit standing is more difficult. Traditional credit scoring is heavily influenced by a person’s own financial behavior; there are therefore clear, targeted actions one can take to improve one’s credit standing, e.g., making punctual monthly mortgage payments.

However, most alternative data may not be related to a person’s own financial conduct, making it beyond consumers’ control to positively influence the scores. For example, a scoring system using your social media profile, or where you attended high school, or where you shop to determine your creditworthiness would be very difficult for you to take actions to positively influence.

Third, big data can contain inaccuracies and biases that might lead to discrimination against low-income consumers, thereby failing to provide equitable opportunity for the underserved population.

Using some alternative data, especially data about a trait or attribute that is beyond a consumer’s control to change, even if not illegal to use, could harden barriers to economic and social mobility, particularly for those currently outside the financial mainstream. For example, landlords often don’t report the rental payments that millions of people make on a regular basis, including more than half of Black Americans.

Predicting the predictors

The ultimate goal of an alternative scoring system is to predict the likelihood of timely payment, which is also what the predictive factors in the traditional FICO scoring system are built around. One could argue that alternative scoring is simply an attempt to use correlations between these non-traditional characteristics and payment history to arrive at a creditworthiness prediction.

It is arguable whether these alternative models can match the predictive power of actual financial records, and whether they are simply a transitional road toward the traditional model while financial payment records remain unavailable for the underserved population.

References:

  • www.economist.com/international/2019/07/06/a-brief-history-and-future-of-credit-scores
  • Big Data: A tool for inclusion or exclusion? Understanding the issues (FTC Report)
  • CREDIT SCORING IN THE ERA OF BIG DATA Mikella Hurley* & Julius Adebayo** 18 YALE JL & TECH. 148 (2016)
  • Is an Algorithm Less Racist Than a Loan Officer? New York Times, Sep 2020
  • What Your Email Address Says About Your Credit Worthiness, Duke University’s Fuqua School of Business, Sep 2021,
  • Data Point: Credit Invisibles, The Consumer Financial Protection Bureau Office of Research
  • On the Rise of FinTechs – Credit Scoring using Digital Footprints
  • Zest AI Comments on the Federal Guidance for Regulating AI
  • MEMORANDUM FOR THE HEADS OF EXECUTIVE DEPARTMENTS AND AGENCIES FROM: Russell T. Vought Acting Director

The New Need to Teach Technology Ethics
By Tony Martinez | October 29, 2021

The Hippocratic oath was written in the 5th century BC, with one of the first lines stating “I will use those dietary regimens which will benefit my patients according to my greatest ability and judgement, and I will do no harm or injustice to them.”1 Iterations of this oath have been adopted and updated for use in medicine and in other industries, with the main purpose of stating: do no harm. For these industries, the onus of the oath falls on the industry and not on the patients or users. Is it time now for technology companies to take a similar oath?

Discussion:
Like many people, I use a plethora of applications and websites for things like mobile banking and my daily work, or for the occasional dog video. In doing this I blindly accept terms of service, cookie policies, and even share data such as my email for more targeted advertisements. Then I took W231 “Behind the Data: Humans and Values” at the University of California, Berkeley and was tasked with reviewing these terms and understanding them. It was here that, even as a master’s-level student, I was frustrated and unable to grasp some of the concepts companies discussed in their terms of service. So how can we expect the 88.75% of US households with social media accounts to navigate such technical legalese?

With the average reading level in the United States being slightly above the 8th-grade level…

…the onus to protect the users of an application is shifting to the developers. As this shift occurs, and as the same public outcries recur over data breaches or research like the Facebook contagion study, we must explore whether these developers have the tools to make ethical choices, or whether companies should require them to be better trained to think through the ethical implications.

These ethical issues are not new to technology or to Silicon Valley. Evidence can be found in the founding of the Markkula Center in 1986. The purpose of the center was to provide Silicon Valley decision makers with the tools to properly practice ethics when making decisions. The founder of the center, former Apple chairman Mike Markkula Jr., created it after he felt it was clear “…that there were quite a few people who were in decision-making positions who just didn’t have ethics on their radar screen.” To him, it was not that decision makers were being unethical, but that they didn’t have the tools needed to think ethically. Today the center provides training to companies with regard to technology, AI, and machine learning. This has led larger companies like Google to send a number of employees to train at the Markkula Center, and Google has since developed a fairness module to train developers on notions of fairness and ethics. More importantly, after its creation Google made the module publicly available, as it felt the onus of protecting the users of its virtual world fell on the system developers. Google’s fairness module even signals this by stating “As ML practitioners build, evaluate, and deploy machine learning models, they should keep fairness considerations (such as how different demographics of people will be affected by a model’s predictions) in the forefront of their minds.”

It is clear from Google’s stance and the growing coursework at some public universities that an oath of no harm is needed in technology and is making its way into the education of developers. Such large paradigm shifts regarding ethics at these companies show the increasing importance of training employees. The public now expects companies not only to state their ethical views but to prove them with action, and by making items like the fairness module publicly available, they lay the groundwork for eventually making such training standard in the technology sector and for developers.

References:
1. National Institute of Health. (2012, February 07). Greek Medicine: “I Swear by Apollo Physician …” Greek Medicine from the Gods to Galen. https://www.nlm.nih.gov/hmd/greek/greek_oath.html
2. Statista Research Department. (2021, June 15). Social media usage in the United States – Statistics & Facts. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/#dossierKeyfigures
3. Wriber. (Accessed on 2021, October 27). A case for writing below a grade 8 reading level. https://wriber.com/writing-below-a-grade-8-reading-level/
4. Kinster, L. (2020, February 2020). “Ethicists were hired to save tech’s soul. Will anyone let them?”. https://www.protocol.com/ethics-silicon-valley
5. Kleinfeld, S (2018, October 18). “A new course to teach people about fairness in machine learning”. https://www.blog.google/technology/ai/new-course-teach-people-about-fairness-machine-learning/

Are the kids alright?
By Anonymous | October 29, 2021

Today 84% of teenagers in the US own a cellphone. Further, teens spend an average of 9 hours per day online. While half of parents with teenagers aged 14 to 17 say they are “extremely” or “very aware” of what their kids are doing online, only 30 percent of teens say their parents are “extremely” or “very aware” of what they’re doing online. There are plenty of books, resources, and programs/applications to help parents track what their teens are doing online. In truth, however, there are just as many ways for kids to get around these types of controls.

This is even more disturbing when we consider that the privacy policies of many companies only protect children 13 and under but do not apply to teenagers. This means that teens are treated as adults when it comes to privacy. For example, TikTok, the number one app used by teenagers in the US today, states the following in its privacy policy:

By contrast here is an excerpt from TikTok’s privacy policy for children under 13. It states clear retention and deletion processes.

While teens may be fine sharing their data with TikTok in what feels like a friendly community, they may not realize how many partners TikTok is sharing their data with. This list of partners includes ones that we might expect like payment processors, but it also includes advertising vendors that might be less expected/desirable.

In turn, each of these partners has their own data handling, retention, sharing, privacy and deletion policies and practices that are completely unknown to TikTok users.

What about the government?
While we might expect private corporations to do what is in their own best interests, even Congress has been slow to protect the privacy of teens. This week the Congressional subcommittee on Consumer Protection, Product Safety and Data Security questioned policy leaders from TikTok and Snap about the harmful effects of social media on kids and teens.

While these types of investigations are necessary and increase visibility into these companies’ opaque practices, the bottom line is that there are no formal protections for teens today. The Children’s Online Privacy Protection Act (COPPA), enacted in 1998, does impose certain restrictions on websites targeted at children, but it only protects children 13 and under. Bill S.1628, which looks to amend COPPA to extend protections to teenagers, was only introduced in May of this year. Additionally, the Kids Internet Design and Safety Act (KIDS), proposed last month, aims to protect the online safety of children under 16. However, all of this is still only under discussion – nothing has been approved.

What about protections such as GDPR and CCPA?
The General Data Protection Regulation (GDPR) which went into effect in Europe in 2018, was enacted to give European citizens more control over their data. It includes the “right to be forgotten” which states:

“The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay” if one of a number of conditions applies. “Undue delay” is considered to be about a month.

Similarly, in the US, California has enacted the California Consumer Privacy Act (CCPA), which went into effect in 2020 and extends similar protections to California residents. While it is likely that many other states will follow suit with similar protections, companies are able to interpret their implementation of these regulations as they see fit, and many are still figuring out exactly how to implement these policies tactically in their organizations. Until then, teens will continue to create a digital footprint and audit trail that could follow them for many years into the future.

How do we move forward?
As we see, there are many places where privacy protections for teens break down. They are legally children, but have none of the protections that kids should have. Google this week announced that children (persons under the age of 18) or adults on their behalf have the ability to request that photos of them be removed from the search engine. This is a step in the right direction. However, we need more. We need governmental agencies to move more quickly to enact legislation to provide stronger, explicit protections for teens so that their privacy protections are not dictated by the whims of online companies – we owe them that much.

Sources:
“It’s A Smartphone Life: More Than Half Of US Children Now Have One.” 31 Oct. 2019, https://www.npr.org/2019/10/31/774838891/its-a-smartphone-life-more-than-half-of-u-s-children-now-have-one. Accessed 7 Oct. 2021.
“How much time does a teenager spend on social media?.” 31 May. 2021, https://www.mvorganizing.org/how-much-time-does-a-teenager-spend-on-social-media/. Accessed 25 Oct. 2021.
“Think You Know What Your Teens Do Online? – ParentMap.” 16 Jan. 2018, https://www.parentmap.com/article/teen-online-digital-internet-safety. Accessed 25 Oct. 2021.
“Text – S.1628 – 117th Congress (2021-2022): Children and Teens ….” https://www.congress.gov/bill/117th-congress/senate-bill/1628/text. Accessed 7 Oct. 2021.
“Google now lets people under 18 or their parents request to delete ….” 27 Oct. 2021, https://techcrunch.com/2021/10/27/how-to-delete-your-kids-pictures-google-search/. Accessed 28 Oct. 2021.

Trends in Modern Medicine and Drug Therapy
By Anonymous | October 11, 2021

The prescription drug industry has been a constant headline in the news over the past decade for a variety of reasons. Opioid addiction is probably the most prominent, drawing attention to the negative aspects of prescription drug abuse. Another current headline and topic in Congress is prescription drug costs, a large issue for certain demographics who are unable to access drugs essential to their well-being. Overshadowed are discussions of the benefits of drug therapy and the opportunities for advancement in the medical field through research and a combination of modernized and alternative methodologies.

Three interesting methodologies and fields of research that overlap with drug therapy are personalized medicine, patient engagement, and synergies between modern and traditional medicine. Interestingly, data collection, data analytics, and data science are important components of each. Below is a quick synopsis of these topics including some of the opportunities and challenges with the integration of data in the research. I include a number of research papers I reviewed at the end.

Patient engagement defined broadly is the practice of the patient being involved in decision making throughout their treatment. A key component of patient engagement is education in various aspects of one’s own personal health and the treatment options available. A key benefit is collection of better treatment intervention and outcome data.

One of the primary aspects of decision making in pursuing a treatment option is that the benefits outweigh the risks (FDA). Patients who take an active role in their treatment and are more aware of the associated risks are naturally better able to minimize negative effects. One common example of a risk is weight gain. Another benefit of patient engagement is better decision making with respect to lifestyle changes such as having children.

Patient engagement also creates the opportunity to gather better data through technological advances in smartphone devices and apps, which allow patients to enter data or collect it through automatic sensors. Social media data is actually a common data source today, and it is tough to argue that patient-provided data is not a better alternative.

Traditional medicine, also known as alternative medicine, comprises practices used by indigenous cultures that rely on natural products and therapies to provide health care. Two examples are Traditional Chinese Medicine and Ayurveda in India. For the purposes of this discussion, I would broaden the field to include the evolving use of natural products such as CBD and medicinal marijuana.

While the efficacy of alternative medicine is debated, it can probably be agreed that components of traditional medicine can provide practical medical benefits to modern health care. One of the main constraints of identifying these components is the access to data. In the case of Ayurveda, one researcher has proposed a data gathering framework combining a digital web portal, operational training of practitioners, and leveraging India’s vast infrastructure of medical organizations to gather and synthesize data (P. S. R. K. Haranath). As the developed world becomes more comfortable with alternative medicine, these types of data collection frameworks will be critical to formalizing treatments.

Personalized medicine is the concept of medicine tailored to the individual rather than a one-size-fits-all approach (Bayer). Its complex scientific framework relies on biomarkers, biogenetics, and patient stratification to develop targeted treatments for individual patients.

Data analytics and data professionals will play a vital role in the R & D of personalized medicine and the pharmaceutical industry in general. Operationalized data is a key component to the research methodologies. Many obstacles exist with clinical data including the variety of data sources, types, and terminology, siloed data across the industry, and data privacy and security. Frameworks are being developed to lead to more data uniformity and promising efforts are being made to share data across organizations. With operationalized data, advanced predictive and prescriptive analytics can be conducted to develop customized treatments and decision support (Peter Tormay). Although complex, hopefully continued progress in research and application of data analytics will lead to incremental innovations for medical treatment.

The broader purpose of the discussion is to bring awareness and advocacy for these fields of research as healthcare data is a sensitive topic for patients. The opportunities with respect to data are also highlighted to help build confidence in the prospect of jobs in the fields of data engineering, data analytics, and data science in medicine. Hopefully, the long term results of this medical research will be to provide patients with more and better treatment options, increase treatment effectiveness and long term sustainability, and lower costs and increase availability.

Resource Materials:

Pharmaceutical Cost, R&D Challenges, and Personalized Medicine
Ten Challenges in Prescription Drug Market – Cost <https://www.brookings.edu/research/ten-challenges-in-the-prescription-drug-market-and-ten-solutions/>
Big Data in Pharmaceutical R&D: Creating a Sustainable R&D Engine <https://link.springer.com/article/10.1007/s40290-015-0090-x> Peter Tormay
Bayer’s Explanation of Personalized Medicine <https://www.bayer.com/en/news-stories/personalized-medicine-from-a-one-size-fits-all-to-a-tailored-approach>

Patient Engagement and Centricity
Making Patient Engagement a Reality <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5766722/>
Think It Through: Managing the Benefits and Risks of Medicines <https://www.fda.gov/drugs/information-consumers-and-patients-drugs/think-it-through-managing-benefits-and-risks-medicines>
Patient Centricity and Pharmaceutical Companies: Is It Feasible? <https://journals.sagepub.com/doi/full/10.1177/2168479017696268>

Traditional and Alternative Medicine
The Traditional Medicine and Modern Medicine from Natural Products <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6273146/>
Role of pharmacology for integration of modern medicine and Ayurveda <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4621664/> , P. S. R. K. Haranath
Number of States with Legalized Medical Marijuana <https://www.insidernj.com/press-release/booker-warren-call-doj-decriminalize-cannabis/>

Prescription Drug Stats
https://hpi.georgetown.edu/rxdrugs/ <https://hpi.georgetown.edu/rxdrugs/>
https://www.cdc.gov/nchs/data/hus/2019/039-508.pdf

Images
5 elements of successful patient engagement <https://hitconsultant.net/2015/07/17/5-elements-of-successful-patient-engagement/#.YWUB3bhKhyw> – HIT Consultant News
Personalized Medicine Image <https://blog.crownbio.com/pdx-personalized-medicine>