
Data Sharing during the COVID-19 Pandemic
By Javier Romo | March 16, 2022

Patient data privacy and security are assurances that every healthcare organization must provide. However, when a global pandemic threatens humanity, data sharing among healthcare entities is vital to understanding, controlling, and responding to the spread of a virus, as we saw in the 2020 COVID-19 pandemic. The problem was that, at least early in the pandemic, the barriers put in place before COVID-19 to isolate patient data for privacy and security created data silos and left healthcare entities unable to tackle the crisis in a coordinated fashion. In normal times we study the privacy, ethics, and legalities around data sharing, but the pandemic forced us to abandon those strongholds in the hope of saving as many lives as possible.

Why Was Data Sharing Important?

Data sharing during the pandemic did not simply involve lab or vaccine data, but also demographics like age, race, location, and perhaps even previous diagnoses. Electronic health records allowed for the mass collection of this data. As scientists learn about a virus, they must understand who the virus is impacting the most, and generally that involves curating all of this data to describe the affected population. While many of us would normally prefer that our data stay private to our healthcare organization of choice, the benefits of this practice became obvious.

First, vaccine distribution was tailored toward individuals who were at high risk of a serious COVID-19 reaction or more likely to get infected. For example, the first groups prioritized for vaccination were older adults, many in nursing homes or assisted living facilities, and healthcare staff, especially those in hospitals actively treating infected patients (Stieg, 2020).

Second, the curation of patient data allowed governments to make significant policy decisions and issue orders to reduce the spread of the virus. For example, as the data showed the pandemic spreading, governors across the United States issued shutdown orders. These decisions were based on the story the data told and the impact the virus could have on regions under threat.

What Is Happening Now?

The COVID-19 pandemic offered many lessons that will shape how the United States, and likely the world, addresses a future pandemic. For example, the National COVID Cohort Collaborative, a data sharing project sponsored by the National Institutes of Health (NIH), is developing a database named N3C that allows participating healthcare organizations to share “entire medical records” with the database (Frieden, 2021). This framework is specific to the COVID-19 pandemic; however, it can be recreated and deployed for a new virus or disease. And while all of this sounds good during a pandemic, now that it is 2022 and the pandemic appears to be nearing its end, patient privacy concerns are reemerging. It is important that we review when data sharing, especially to this extent, is allowed, and that we rebuild trust in our healthcare data security and privacy.

To conclude, the COVID-19 pandemic was a shock to the healthcare system and to the world. It required rapid changes to data sharing to move data out of healthcare system silos and into the hands of the healthcare entities and government agencies that could help combat the pandemic. In war, militaries use information gathered by spies, reconnaissance, or other intelligence to strategize a battle plan; healthcare systems during a pandemic need something similar, and in this case it was data gathered from electronic health records. Now that the pandemic is nearing its end, we must review what was done and rebuild trust in our healthcare data privacy. At the same time, we must research and develop contingency plans to share data in case another pandemic threatens many lives.

References

[1] Frieden, J. (2021, April 23). Health Data Sharing Improved During Pandemic, but Barriers Remain. Retrieved from Medpage Today: <https://www.medpagetoday.com/practicemanagement/informationtechnology/92263>
[2] Stieg, C. (2020, December 14). When Dr. Fauci and other experts say you can expect to get vaccinated for Covid-19. Retrieved from CNBC.com: <https://www.cnbc.com/2020/12/14/who-gets-the-covid-vaccine-first-timeline-and-priority-explained.html>

Images
Image 1: <https://www.medpagetoday.com/practicemanagement/informationtechnology/92263>
Image 2: <https://chicagoitm.org/learn-how-to-harness-this-national-covid-19-database-for-your-research/>


Operationalizing Privacy Protection through Data Capitalization
By Pow Chang | March 16, 2022

Many companies have emerged to provide software-as-a-service (SaaS) in the last two decades. They harvested massive datasets and aggregated them into high-value service products. The primary business model of these data-centric organizations is to design and build a digital platform in which data and information are the primary assets used to generate revenue perpetually. Data-centric companies such as Facebook, Twitter, Netflix, and Amazon have harvested petabytes of data subjects' information, including personally identifiable information (PII). Since the valuable data collected on their platforms is the asset and tool that generates future cash flow, it is in their best interest to protect it and, furthermore, to comply with privacy regulations such as GDPR and CCPA. The public wants to hold these companies accountable for privacy protection [2], and stakeholders would want this data capitalized as a tangible asset in the collection and storage process to ensure financial integrity and good data governance.

There are a few plausible reasons for capitalizing PII data on the balance sheet to operationalize data privacy. First, it serves as a proxy for privacy protection: capitalized PII is reflected in the financial statements and is subject to scrutiny and audit every quarter by professional auditors. In current practice, companies usually expense the data acquisition cost even though the data has a significant impact on their future cash flow. Expensing the acquisition cost does not capture the intrinsic value and presence of the PII on their books. According to the Ponemon Institute’s 2020 “Cost of Data Breach Study”, the average cost per compromised PII record is $175, compared to $151 for a compromised intellectual property record [8]. Capitalizing data does not mean replacing data acquisition costs; it adds a tangible asset component that validates the existence of the PII.

Second, there is no practical way to quantify the actual loss and degree of damage from data breach harms [7]. One good example is the case of Equifax: on September 7, 2017, Equifax announced that it had lost the personal information of over 140 million consumers in a catastrophic data breach [3], including people’s social security numbers, driver’s license numbers, email addresses, and credit card information. Equifax eventually settled the breach and agreed to pay $1.38 billion, which includes $1 billion in security upgrades. Customers whose data was compromised could be entitled to claim up to $20,000 in damages, but this claims process puts the onus on the consumer to prove that they deserve it. If Equifax had capitalized PII on its books, it could have provided a detailed basis for assessing damage and for budgeting security expenditures to safeguard its PII assets.

Third, since capitalized data is captured on the books, the resulting audits and transparency could deter poor corporate governance and the kind of dishonest culture that nurtures severe conflicts of interest or even unethical behavior, as in the case of Facebook and Cambridge Analytica [4]. All assets must be matched to their equivalent market value and can be impaired or appreciated; in either case, the company must have a plausible explanation for any adjustment reflecting an incident that materially affects the underlying fundamentals of its business. For instance, Target paid $18.5M for the 2013 data breach that affected 41 million consumers [5]. This settlement could have been considerably higher, and the millions of consumers' actual losses more visible, had PII records been captured in the financial statements.
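To make the argument concrete, here is a minimal, illustrative calculation comparing the headline settlements above with a notional per-record valuation. The $175 figure is the Ponemon number cited earlier; the resulting asset values are only a proxy for discussion, not an actual accounting treatment.

```python
# Illustrative only: compare settlements with a notional per-record PII value,
# using the figures quoted in this post.
PONEMON_COST_PER_PII_RECORD = 175  # USD per compromised PII record (Ponemon, 2020)

breaches = {
    "Equifax (2017)": {"records": 140_000_000, "settlement": 1.38e9},
    "Target (2013)":  {"records": 41_000_000,  "settlement": 18.5e6},
}

for name, b in breaches.items():
    settlement_per_record = b["settlement"] / b["records"]
    notional_asset_value = b["records"] * PONEMON_COST_PER_PII_RECORD
    print(f"{name}: settlement ~ ${settlement_per_record:.2f} per record; "
          f"a $175-per-record capitalization would imply ~ ${notional_asset_value / 1e9:.1f}B on the books.")
```

The point is not that $175 is the "right" book value, only that capitalizing the records would force the gap between the nominal value of the asset and what a breach settlement actually pays out to be explained and audited.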

Many studies are still needed to understand the implications of using data capitalization as a proxy for boosting privacy protection. Mulligan et al. [6] provide a comprehensive list of privacy dimensions and attributes; this framework could define the construct of the data to be capitalized. To operationalize privacy is to protect vulnerable groups and create a fair system: today, these organizations reap the profit while customers bear the cost in the long term. This does not align with the Belmont Report's Principle of Justice, fairness in distribution [9]. Most data breaches are due to poor internal security controls, people factors, overdue patches, and known application vulnerabilities. The cost suffered by the consumer is colossal and can never be adequately estimated unless PII data is capitalized as a tangible asset.

References:

[1] https://www.cnbc.com/2019/07/25/how-to-claim-your-compensation-from-the-equifax-data-breach-settlement.html
[2] Nicholas Diakopoulos. Accountability in Algorithmic Decision Making. Communications of the ACM, February 2016, Vol. 59 No. 2, Pages 56-62. https://cacm.acm.org/magazines/2016/2/197421-accountability-in-algorithmic-decision-making/fulltext
[3] https://www.inc.com/maria-aspan/equifax-data-breach-worst-ever.html
[4] https://www.wired.com/story/cambridge-analytica-facebook-privacy-awakening/
[5] https://www.usatoday.com/story/money/2017/05/23/target-pay-185m-2013-data-breach-affected-consumers/102063932/
[6] Deirdre K. Mulligan, C. K. (2016). Privacy is an essentially contested concept: a multi- dimensional analytic for mapping privacy. The Royal Society Publishing, 374.
[7] Solove, D. J. (2005). A Taxonomy of Privacy. GWU Law School Public Law Research Paper No. 129, 477.
[8] https://www.ponemon.org/
[9] Department of Health, Education, and Welfare. (1979, April 18). The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.
[10] Pictures source: pixabay.com


Exploring Student Data Privacy
By Jamie Smith | March 16, 2022

When I was in high school, an athletic coach sat me and the team down and shared that someone on the team would not be playing for us that season. This came as a shock to all of us. The coach then shared that this student had failed to meet the eligibility requirements of the team, which led to a discussion about how important maintaining academic standing was and how serious this was for all of us. In hindsight, this oversharing of the individual's academic standing was not only a breach of confidentiality but also one of the many ways that educators fail to uphold the legal requirements of FERPA.


Image: Depicting the Department of Education Seal and FERPA: Family Educational Rights and Privacy Act

What is FERPA?

FERPA is the Family Educational Rights and Privacy Act. FERPA was passed in 1974 as a means to keep students' data confidential at any institution that benefits from tax dollars, from pre-K through college. FERPA protects students' personal information from being shared. This data includes a student's race and ethnicity, specifics about their academic standing, disciplinary actions, learning disabilities, and more.

FERPA also lays the groundwork for how schools can share some data that most would consider necessary to effectively run a school. This might include a student's name, picture, email address, phone number, or home address. Examples of this data being used might be in the school yearbook, playbills for theater, or honor roll announcements. This category of data is called “Directory Information.”


Image of Directory Information versus protected data. Image taken from the Department of Education

Directory Information is a very broad category of data and is not explicitly defined under FERPA. Each school district may have differing opinions about what is considered Directory Information and will also have different policies around how and with whom this data is shared. FERPA requires that school districts make these policies known, either through school bulletins provided at the start of the year or on the school's website. Parents have the right to “opt out” of the sharing of this data, but many do not believe that parents are properly informed of what this entails:

“The World Privacy Forum’s (WPF) Pam Dixon studied FERPA forms from districts across the country and found that many are worded in ways that “discourage parents” from opting-out of information sharing.” (Weisbaum 2015)

Though there are options to limit the amount of data collected from our students, a new age of data collection is occurring with little oversight.

Student Data Collection through a Virtual Environment


Image of a book transforming into a laptop

There has been a steady shift toward online learning, and with the onset of COVID, which halted in-person learning for many in the US, it became the only option. Unfortunately, as is often the case, regulations around data privacy have not kept up with the changing times.

As educators scrambled to find the best ways to teach students virtually, a deluge of apps was created to help with this transition to a virtual world. However, most of these apps come with their own terms and agreements that do not necessarily align with FERPA. This often means that the educator is unwittingly allowing these companies access to student information that can then be processed and used for things like marketing or advertisements.

As students and educators transition more and more to leveraging online resources, there are rights that caregivers should better understand if they want to keep their children’s data private.

The Parent Coalition for Student Privacy has a number of resources and tips for how to protect student data. An example is when students log into apps not managed by the school system. They encourage students to only use usernames that do not have any personally identifiable information. They also encourage caregivers to speak up if they are not ok with the privacy terms and agreements of these third party apps. It’s the right of the caregiver to remove their student from using an app if it collects data that they are not comfortable with.

As we move to a virtual world, it’ll be important for the Department of Education to act to protect students’ data, while also teaching data privacy to educators, caregivers and students.


Physical Implications of Virtual Smart Marketing: How the Rise in Consumerism Powered by AI/ML Fuels Climate Change
By Anonymous | March 16, 2022

Introduction
Suspiciously relevant ads materialize in our social media feeds, our e-mails, and even our texts. It has become commonplace for digital marketing groups to invest in teams of data scientists in the hopes of building the perfect recommendation engine. At a glance, sales are increasing, web traffic is at an all-time high, and feedback surveys imply a highly satisfied customer base. But at what cost? This rise of consumerism, incited by big data analytics, has caused an increase in carbon emissions due to heightened manufacturing and freight. In this blog post, I will explore the machine learning techniques used to power personalized advertisements in the consumer goods space, the resulting expedited rise of consumerism, and how our planet, in turn, is adversely affected.

Data Science in the Retail Industry
Data science enables retailers to utilize customer data in a multitude of ways, actively growing sales and improving profit margins. Recommendation engines consume your historical purchase history to predict what you'll buy next. Swaths of transaction data are used to optimize pricing strategy across the board. Computer vision is expanding as well, powering the augmented reality features in certain mobile apps, such as IKEA's app that customers can use to virtually place furniture in their own homes.
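As a rough illustration of the first of these use cases, the sketch below implements a bare-bones purchase-history recommender based on item co-occurrence. The baskets and product names are invented, and production engines are far more sophisticated; this only shows the basic mechanic.

```python
from collections import Counter

# Toy purchase histories (invented for illustration).
baskets = [
    {"running shoes", "socks", "water bottle"},
    {"running shoes", "socks"},
    {"yoga mat", "water bottle"},
    {"running shoes", "water bottle"},
]

def recommend(owned: set, k: int = 2) -> list:
    """Score items by how often they co-occur with items the customer already bought."""
    scores = Counter()
    for basket in baskets:
        if basket & owned:                # the basket shares an item with this customer
            for item in basket - owned:   # credit everything else in that basket
                scores[item] += 1
    return [item for item, _ in scores.most_common(k)]

print(recommend({"running shoes"}))  # e.g. ['socks', 'water bottle']
```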



But arguably one of the largest use cases would have to be personalized marketing and advertising. Both proprietary and third-party machine learning algorithms have massively improved with time, predicting the unique purchases a single consumer will make with tremendous accuracy. According to a 2015 McKinsey Report, research shows that personalization can deliver five to eight times the ROI on marketing spend and lift sales 10 percent or more [1]. Modern day retailers understand this lucrativeness, and in turn, scramble to assemble expert data science teams. But what of their understanding of the long-term implications beyond turning a profit?

The Rise in Consumerism
As data science continues to assert its dominance in the consumer goods industry, customers are finding it hard to resist such compelling marketing. This pronounced advancement in marketing algorithms has unabashedly caused a frenzy in purchases by consumers throughout the years. According to Oberlo, the US retail sales number has grown to $5.58 trillion in the year 2020—the highest US retail sales recorded in a calendar year so far. This is a 36 percent increase over nine years, from 2011 [2]. These optimized marketing campaigns, coupled with the advent of nearly instantaneous delivery times (looking at you, Amazon Prime), have fostered a culture that sanctions excessive amounts of consumer spending.



The Greater Rise in Temperature
To keep up with demand, retailers must produce a higher volume of goods. Unfortunately, this increased production leads to higher pollution rates from both a manufacturing and a freight standpoint. Much of this manufacturing relies on coal-based energy, which emits greenhouse gases into the atmosphere. These goods are then transported in bulk by truck, train, ship, or aircraft, emitting carbon dioxide and further exacerbating the problem.

Although consumer goods production is not solely responsible for all emissions, it undeniably contributes to the accelerating warming of the planet. According to National Geographic, NOAA and NASA confirmed that 2010 to 2019 was the hottest decade since record keeping began 140 years ago [3].



Furthermore, these purchased goods will eventually comprise earth’s MSW, or municipal solid waste (various items consumers throw away after they are used). The United States Environmental Protection Agency claims that the total generation of MSW in 2018 was 292.4 million tons, which was approximately 23.7 million tons more than the amount generated in 2017. This is a marked increase from the 208.3 million tons of MSW in 1990 [4]. The decomposition of organic waste in landfills produces a gas which is composed primarily of methane, another greenhouse gas contributing to climate change [5]. There are clearly consequential and negative effects of this learned culture of consumerism.

What You Can Do
To combat climate change, begin by understanding your own carbon footprint. You can do your own research or use one of the many tools available on the internet, such as a carbon footprint calculator (https://www.footprintcalculator.org). If you incorporate fewer processed foods into your diet, include more locally sourced fruits and vegetables, and avoid eating meat, you are taking small but important steps in the fight against climate change. Consider carpooling or taking public transit to work and/or social events to decrease carbon emissions from your commute. Steps like these seem small, but they build good habits and cultivate lifestyle changes that contribute to the health of our planet.

References

[1] https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/Marketing%20and%20Sales/Our%20Insights/EBook%20Big%20data%20analytics%20and%20the%20future%20of%20marketing%20sales/Big-Data-eBook.ashx
[2] https://www.oberlo.com/statistics/us-retail-sales
[3] https://www.nationalgeographic.com/science/article/the-decade-we-finally-woke-up-to-climate-change
[4] https://www.epa.gov/facts-and-figures-about-materials-waste-and-recycling/national-overview-facts-and-figures-materials#:~:text=The%20total%20generation%20of%20MSW,208.3%20million%20tons%20in%201990.
[5] https://www.crcresearch.org/solutions-agenda/waste#:~:text=The%20decomposition%20of%20organic%20waste,potential%20impact%20to%20climate%20change.


Predictive policing algorithms: Put garbage in, get garbage out
By Elise Gonzalez | March 16, 2022


Image source: https://tinyurl.com/nz8n7xda

In recent years, “data-driven decision making” has seen a big increase in use across industries [1]. One industry making use of this approach, which relies on data rather than just human intuition to inform decisions, is law enforcement. Predictive policing tools have been developed to alert police as to where crime is likely to occur in the future, so that they can more effectively and efficiently deter it.

In a different and unbiased world, maybe tools like this would be reliable. In reality, because of the way they are designed, predictive policing tools merely launder the bias that has always existed in policing.

So, how are these tools designed? Let's use two popular predictive policing software packages as examples: PredPol and Azavea's HunchLab, which have been used in Los Angeles, New York, and Philadelphia, among other, smaller cities [2]. Each of these companies has designed an algorithm, or a set of instructions on how to handle different situations, that ranks locations by their relative future crime risk. These algorithms base that risk on past instances of crime at or around each location. That information comes from historical policing data. PredPol uses addresses where police have made arrests or filed crime reports; HunchLab uses the same, as well as addresses to which police have been called in the past [3, 4]. This information is presented to the algorithm as a good and true indicator of where crimes occur. The algorithm then predicts where crimes are likely to occur in the future based on the examples it has seen, and nothing else. Those predictions are used to inform decisions about where police should patrol, or where their presence may be the strongest crime deterrent.
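A heavily simplified sketch of this kind of ranking is below. It scores locations purely by counts of past recorded incidents, an assumption standing in for the proprietary PredPol and HunchLab models, but it captures the key design choice: the only signal is what police have previously recorded.

```python
from collections import Counter

# Toy historical records: (grid_cell, source) pairs standing in for arrests,
# crime reports, and calls for service. Entirely invented for illustration.
historical_records = [
    ("cell_A", "arrest"), ("cell_A", "call"), ("cell_A", "report"),
    ("cell_B", "call"),
    ("cell_C", "arrest"), ("cell_C", "arrest"),
]

def rank_cells(records):
    """Rank locations by the count of past recorded incidents -- the only input used."""
    counts = Counter(cell for cell, _source in records)
    return counts.most_common()

# Patrols would be directed toward the top of this list.
print(rank_cells(historical_records))  # [('cell_A', 3), ('cell_C', 2), ('cell_B', 1)]
```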


HunchLab (left) and PredPol (right) user interfaces.
Image sources: https://tinyurl.com/2p8vbh7x (top), https://tinyurl.com/2u2u7cpu (bottom)

Algorithms like these lose their credibility because they base predictions of future crime on past police activity in an area. We know from years of research on the subject that minority and particularly Black communities in the United States are over-policed relative to their majority white counterparts [5]. For example, Black and white people are equally likely to possess or sell drugs in the United States, but Black people are arrested at a rate 3 to 5 times higher than whites nationally [6]. Policing trends like this one cause Black communities to be over-represented in police records. This makes them far more likely to appear as hot-spots for crime, when in reality they are hot-spots for police engagement.
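A small numerical illustration of that feedback loop, using invented population and offense numbers and an arrest-rate disparity in the spirit of the statistic above: two neighborhoods with identical behavior produce very different police records.

```python
# Two neighborhoods with the SAME underlying offense rate, but different
# arrest rates (the 3x disparity echoes the drug-arrest statistic cited above;
# the population and offense numbers are invented for illustration).
true_offense_rate = 0.02
population = 10_000
arrest_rate_given_offense = {"Neighborhood A": 0.15, "Neighborhood B": 0.45}  # 3x disparity

for name, arrest_rate in arrest_rate_given_offense.items():
    offenses = true_offense_rate * population   # 200 offenses in both neighborhoods
    recorded = offenses * arrest_rate           # what ends up in the training data
    print(f"{name}: true offenses = {offenses:.0f}, recorded incidents = {recorded:.0f}")

# Neighborhood A records 30 incidents and B records 90, so a model trained on
# records alone flags B as the 'hot spot' even though behavior is identical.
```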

Calls for service also do not represent the actual incidence of crime. Popular media has reported many examples in the last few years of police being called on Black Americans who are simply going about their lives – barbecuing at a park, sitting at Starbucks, or eating lunch [7]. Because of examples like this, presenting calls for service as a good and true representation of where crimes occur is misleading.

In short, predictive policing algorithms do not have minds of their own. They cannot remove bias from the data they are trained on. They cannot even identify bias in that data. They take as fact what we know to be the result of years of biased policing – that more crime happens in neighborhoods with more Black residents, and that less crime happens in majority white neighborhoods. This leads them to make predictions for future crimes that reproduce that bias. This is the idea of garbage in, garbage out: “If you produce something using poor quality materials, the thing that you produce will also be of poor quality” [8]. As these allegedly unbiased algorithms and those like them are increasingly used to make life-altering decisions, it is critically important to be aware of the ways that they can reproduce human bias. In this case, as with many, human bias is never removed from the process of making predictions; it is only made more difficult to see.

References
[1] Harvard Business Review Analytic Services. (2012). The Evolution of Decision Making: How Leading Organizations Are Adopting a Data-Driven Culture. Harvard Business Review. https://hbr.org/resources/pdfs/tools/17568_HBR_SAS%20Report_webview.pdf

[2] Lau, T. (2020, April 1). Predictive Policing Explained. Brennan Center for Justice. Retrieved March 10, 2022, from https://www.brennancenter.org/our-work/research-reports/predictive-policing-explained

[3] PredPol. (2018, September 30). Predictive Policing Technology. https://www.predpol.com/technology/

[4] Team Upturn. HunchLab — a product of Azavea · Predictive Policing. (n.d.). Team Upturn Gitbooks. https://teamupturn.gitbooks.io/predictive-policing/content/systems/hunchlab.html

[5] American Civil Liberties Union. (2020, December 11). ACLU News & Commentary. Retrieved March 10 2022 from https://www.aclu.org/news/criminal-law-reform/what-100-years-of-history-tells-us-about-racism-in-policing/

[6] Human Rights Watch. (2009, March 2). Decades of Disparity. Retrieved March 10, 2022, from https://www.hrw.org/report/2009/03/02/decades-disparity/drug-arrests-and-race-united-states#

[7] Hutchinson, B. (2018, October 20). From “BBQ Becky” to “Golfcart Gail,” list of unnecessary 911 calls made on blacks continues to grow. ABC News. Retrieved October 3, 2022, from https://abcnews.go.com/US/bbq-becky-golfcart-gail-list-unnecessary-911-calls/story?id=58584961

[8] The Free Dictionary by Farlex. (n.d.) garbage in, garbage out. Collins COBUILD Idioms Dictionary, 3rd ed. (2012). Retrieved March 10 2022 from https://idioms.thefreedictionary.com/garbage+in%2c+garbage+out


Let’s talk about AirTags!
By Jillian Luicci | March 16, 2022

Apple released a product called the AirTag in April 2021. The product costs $29 and is advertised as helping users keep track of their belongings (Apple, 2022). Some of the items Apple suggests tracking with an AirTag include keys, bikes, and luggage. The AirTag integrates with “Find My,” the Apple application previously used to locate Apple products like iPhones and AirPods. The tag itself has no GPS; instead, it uses Bluetooth to be detected by nearby Apple devices, which relay their own GPS position to report where the tag is.

However, there have been many recent incidents of stalking with AirTags highlighted in news outlets (Levitt, 2022). In most of these cases, the victims received a notification on their phone that an AirTag had been tracking them for some number of hours. Victims feel violated because they did not consent to their location being tracked in the first place. Further, they did not consent to the dissemination of this data to the AirTag's owner.

(Image: illustration of a character hiding in the bushes and spying through binoculars)

The California Attorney General has released privacy recommendations that appear to have been violated by this abuse of AirTags for stalking (California Attorney General, 2014). First, they recommend a “Do Not Track” principle, which traditionally refers to notifying and requesting consent from users prior to tracking their clicks and activities while web browsing. This principle draws parallels to the use of Apple AirTags. While the AirTag victim is not web browsing, the “Do Not Track” principle can still be applied: regardless of the technology used for tracking, the principle broadly speaks to the necessity of consent prior to passively tracking people's data. Additionally, the recommendations include principles around data sharing, individual access, and accountability. These recommendations highlight gaps in Apple's privacy policy. In this case, however, the recommendations do not extend far enough to protect the rights of victims, who never consented to, and may never have reviewed, Apple's AirTag policies.

When stalking victims become aware of the device via the Apple alert, they often seek the assistance of police to deactivate the AirTag. The victims typically leave disappointed because the police are unable to assist without identifying the physical device. AirTags are often difficult to find due to their small size and the stalkers deliberately camouflaging them. Notably, only Apple users can receive alerts that an AirTag is tracking them, which excludes Android users from this safety control.

As a result of these stalking cases, Apple recently released updates to AirTag and Find My (Apple, 2022). The software updates notify AirTag users that the device is not meant for tracking people and provide more precise locating when an unknown AirTag is detected nearby. While these updates define the intent of the product, they do not promote informed consent, nor do they prevent unwanted data dissemination. Further, these changes can only be effective if the victim is an informed Apple user. There are still risks for targeted people who do not know the risks associated with being tracked by an AirTag or the options to remove unwanted AirTags.

Apple should consider performing an ethical assessment of AirTags. The Belmont Report is a respected paper that defines three basic ethical principles: respect for persons, beneficence, and justice (Belmont, 1979). The application of AirTags for stalking violates all three of these principles. First, AirTags violate respect for persons because the victims do not consent to the collection and dissemination of their data. Second, beneficence is violated because the physical risks related to stalking far outweigh the benefits of finding an item such as keys. Third, justice is violated because the burden of harm falls on unsuspecting victims who gain nothing from the product; stalking is, after all, illegal. Overall, this product has potentially harmful applications to unsuspecting people. While Apple has attempted to resolve some of the concerns, there are still many glaring problems with AirTags that Apple should address immediately.

References

[1] Apple. 2022. AirTag. [online] Available at: <https://www.apple.com/shop/buy-airtag/airtag/1-pack?afid=p238%7CsyU1UIAS3-dc_mtid_1870765e38482_pcrid_516270124941_pgrid_116439818610_pntwk_g_pchan_online_pexid__&cid=aos-us-kwgo-pla-btb–slid—product-MX532AM%2FA> [Accessed 11 March 2022].

[2] Levitt, M., 2022. AirTags are being used to track people and cars. Here’s what is being done about it. [online] Npr.org. Available at: <https://www.npr.org/2022/02/18/1080944193/apple-airtags-theft-stalking-privacy-tech> [Accessed 11 March 2022].

[3] California Attorney General. Making Your Privacy Practices Public: Recommendations on Developing a Meaningful Privacy Policy. May 2014. https://oag.ca.gov/sites/all/files/agweb/pdfs/cybersecurity/making_your_privacy_practices_public.pdf

[4] Apple. 2022. AirTag. [online] Available at: <https://www.apple.com/shop/buy-airtag/airtag/1-pack?afid=p238%7CsyU1UIAS3-dc_mtid_1870765e38482_pcrid_516270124941_pgrid_116439818610_pntwk_g_pchan_online_pexid__&cid=aos-us-kwgo-pla-btb–slid—product-MX532AM%2FA> [Accessed 11 March 2022].

[5] The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. April 18, 1979. https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf


Online Privacy in a Global App Market
By Julia H. | March 9, 2022

The United States' west coast is home to thousands of technology companies trying to innovate, find a niche, and make it big. Inevitably, much of the product developed here reflects its Western roots and doesn't adequately consider the risks to its most vulnerable users, who may be thousands of miles away. This was the case with Grindr, which prides itself on being the world's largest social networking app for gay, bi, trans, and queer people. Instead of being a safe space for a marginalized community, a series of security failures combined with too little emphasis on user privacy has put some LGBTQ+ communities around the world at serious risk over the past decade. Grindr has thankfully responded by making updates that focus on the safety of its users. Still, much can be learned from the ways the platform was abused and from how different implementation decisions can be made to protect users, especially in high-stakes situations.


Human Dignity Trust, Map of Countries that Criminalise LGBT People, 2022

Today, “71 jurisdictions criminalise private, consensual, same-sex sexual activity” [1]. Even in places where it isn't a criminal offence, individuals can find themselves facing harassment and other hate crimes because of their gender or sexual orientation. In Egypt, for example, police have been known to entrap gay men by detecting their location on apps like Grindr and using the existence of the app itself, as well as screenshots and messages from the app, as part of a debauchery case [2]. This has been a particularly prevalent problem since 2014, when Grindr security issues, especially surrounding easy access to user location by non-app users, were first brought to light by cybersecurity firm Synack [3]. Grindr's first response was to note that location sharing can be disabled, and to disable the feature by default in countries well known for homophobia, such as Russia, Nigeria, Egypt, Iraq, and Saudi Arabia. Despite this, triangulating the location of a user was still possible due to the order in which profiles appear in the app [4].


@Seppevdpll, Trilateration via Grindr, 2018

Sharing exact user location with third parties, or enough information to triangulate an individual, violates privacy laws such as GDPR and California's CCPA; a huge miss for Grindr, quite apart from how this information could be abused by conservative governments. In parts of California, where Grindr is based, there is a large, vibrant, and welcoming gay community, and there is a certain anonymity in numbers that can be lost elsewhere. Thus, maintaining the safe online space the app was likely meant to be is not just about implementing technical security practices and adhering to legislation. It means taking into account the cultural differences among app users when designing interactions.
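To see why leaking relative position or distance is so dangerous, here is a minimal trilateration sketch: given three spoofed observer positions and the distances reported to a target, the target's location falls out of simple algebra. The coordinates and distances are invented; this is not Grindr's actual interface, only the geometry that researchers exploited.

```python
import numpy as np

def trilaterate(points, distances):
    """Recover a 2-D position from three known observer points and their distances to the target."""
    (x1, y1), (x2, y2), (x3, y3) = points
    d1, d2, d3 = distances
    # Subtracting the first circle equation from the other two yields a linear system.
    A = np.array([[2 * (x2 - x1), 2 * (y2 - y1)],
                  [2 * (x3 - x1), 2 * (y3 - y1)]])
    b = np.array([x2**2 - x1**2 + y2**2 - y1**2 + d1**2 - d2**2,
                  x3**2 - x1**2 + y3**2 - y1**2 + d1**2 - d3**2])
    return np.linalg.solve(A, b)

# Three spoofed observer positions and the distances "reported" to each (toy units).
observers = [(0, 0), (10, 0), (0, 10)]
reported_distances = [5.0, 65 ** 0.5, 45 ** 0.5]
print(trilaterate(observers, reported_distances))  # -> approximately [3. 4.]
```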

Grindr has faced much scrutiny and backlash and has luckily reacted with some updates to its application. They have launched kindr, a campaign to promote “diversity, inclusion, and users who treat each other with respect” [5] that included an update to their Community Guidelines. They have also introduced the ability for users to unsend messages, set an expiration time on the photos they send, and block screenshots [6]. These features, in combination with the use of VPNs, have made it easier for members of the LGBTQ+ community to protect themselves while using Grindr.


Kindr Grindr, 2018

Having a security- and privacy-first policy when developing apps should be the standard. Companies all over the world should take on the responsibility of protecting their users through the decisions made during design and implementation. Moreover, given the global audience that most companies target these days, they should strive to consider the implications of their technology being released in settings different from those of its developers, particularly by including input from different types of users during the development process.

Citations
[1] “Map of Countries That Criminalise LGBT People.” Human Dignity Trust, https://www.humandignitytrust.org/lgbt-the-law/map-of-criminalisation.
[2] Brandom, Russell. “Designing for the Crackdown.” The Verge, The Verge, 25 Apr. 2018, https://www.theverge.com/2018/4/25/17279270/lgbtq-dating-apps-egypt-illegal-human-rights.
[3] “Grindr Security Flaw Exposes Users’ Location Data.” NBCNews.com, NBCUniversal News Group, 28 Mar. 2018, https://www.nbcnews.com/feature/nbc-out/security-flaws-gay-dating-app-grindr-expose-users-location-data-n858446.
[4] @seppevdpll. “It Is Still Possible to Obtain the Exact Location of Millions of Men on Grindr.” Queer Europe, https://www.queereurope.com/it-is-still-possible-to-obtain-the-exact-location-of-cruising-men-on-grindr/.
[5] “Kindr Grindr.” Kindr, Grindr, 2018, https://www.kindr.grindr.com/.
[6] King, John Paul. “Grindr Rolls out New Features for Countries Where LGBTQ Identity Puts Users at Risk.” Washington Blade: LGBTQ News, Politics, LGBTQ Rights, Gay News, 13 Dec. 2019, https://www.washingtonblade.com/2019/12/13/grindr-rolls-out-new-features-for-countries-where-lgbtq-identity-puts-users-at-risk/.

Singer, Natasha, and Aaron Krolik. “Grindr and OkCupid Spread Personal Details, Study Says.” New York Times, New York Times, 13 Jan. 2020, https://www.nytimes.com/2020/01/13/technology/grindr-apps-dating-data-tracking.html.

“The Digital Rights of LGBTQ+ People: When Technology Reinforces Societal Oppressions.” European Digital Rights (EDRi), 15 Sept. 2020, https://edri.org/our-work/the-digital-rights-lgbtq-technology-reinforces-societal-oppressions.


Your phone is following you around.
By Theresa Kuruvilla | March 9, 2022

We live in a technologically advanced world where smartphones have become an essential part of our daily lives. Most individuals start their day with a smartphone wake-up alarm, scrolling through daily messages and news items, checking traffic conditions, work emails, calling family, or watching a movie or sports; the smartphone has become a one-stop-shop for everything. However, many people are unaware of what happens behind the screens.

From the time you put the SIM card in the phone, regardless of whether it is an Android phone or an iPhone, the phone's IMEI, hardware serial number, SIM serial number, IMSI, and phone number are sent to Apple or Google. These companies' telemetry applications also access the MAC addresses of nearby devices to pinpoint the phone's location. Many people think turning off GPS location on the phone prevents them from being tracked. They are mistaken. These companies capture your every movement thanks to advances in cell phone technology.

Under the law, listening to someone else's phone call without a court order is a federal crime. But no rules prevent private companies from capturing citizens' precise movements and selling the information for a price. This shows the dichotomy between the regulation of legacy methods of privacy invasion and the lack of regulation around intrusive modern technologies.

On January 6, 2021, a political rally of Trump supporters turned into a riot at the US Capitol. The event's digital detritus has been key to identifying the riot participants: location data, geotagged photos, facial recognition, surveillance cameras, and crowdsourcing. The data collected that day included location pings for thousands of smartphones, revealing around 130 devices inside the Capitol exactly when Trump supporters stormed the building. There were no names or phone numbers; however, with the proper use of the technology available, many devices were connected to their owners, tying anonymous locations back to names, home addresses, social networks, and phone numbers of people in attendance. The disconcerting fact is that the technology to gather this data is available for anyone to purchase at an affordable price, and third parties use it routinely. Most consumers whose names appear in these datasets are unaware that their data was collected, and the data is insecure and vulnerable to law enforcement and bad actors who might use it to inflict harm on innocent people.


(Image: Location pings from January 6, 2021, rally)

Government Tracking of Capitol Mob Riot

For law enforcement to use this data, they must go through courts, warrants, and subpoenas. The data in this example is a bird's-eye view of an event. But the hidden story behind it is how the new digital era, and the various tacit agreements we accept, invade our privacy.

When it comes to law enforcement, this data is primary evidence. On the other hand, these IDs tied to smartphones allow companies to track people across the internet and on their apps. Even though these data are supposed to be anonymous, several tools allow anyone to match the IDs with other databases.
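A minimal sketch of how such a match can work, with entirely synthetic records: an "anonymous" advertising ID is linked to a person simply by joining its most frequent overnight location against a dataset of known home addresses. The heuristic, names, and coordinates here are invented for illustration.

```python
# All records below are synthetic; the point is the join, not the data.
anonymous_pings = [
    {"ad_id": "a1f3", "hour": 2,  "lat": 38.8899, "lon": -77.0091},   # overnight pings
    {"ad_id": "a1f3", "hour": 3,  "lat": 38.8899, "lon": -77.0091},
    {"ad_id": "a1f3", "hour": 13, "lat": 38.8899, "lon": -77.0090},   # daytime ping elsewhere
]

# A separate, commercially available dataset tying coordinates to residents.
address_records = {(38.8899, -77.0091): "J. Doe, 123 Example St."}

def likely_home(pings):
    """Crude heuristic: the most common location observed during overnight hours."""
    overnight = [(p["lat"], p["lon"]) for p in pings if p["hour"] < 6]
    return max(set(overnight), key=overnight.count)

home = likely_home(anonymous_pings)
print(address_records.get(home, "no match"))  # -> "J. Doe, 123 Example St."
```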

The example below, from the New York Times, shows one way to identify an anonymous ping.


(Image: graphical user interface from the New York Times example)

Some Americans might cheer the use of location databases to identify Trump supporters, but they ignore the fact that these commercial databases invade their privacy as well. The demand for location data grows daily, and deanonymization has become simpler. While smartphone makers might argue that they provide options to minimize tracking, this is only an illusion of control over individual privacy. Location data is not the only aspect; tech companies capture every activity through all the smart devices deployed around you, with the stated purpose of making your life better. Under the Belmont principle of beneficence, we must maximize the advantages of technology while minimizing the risk. In this case, even though consumers receive many benefits, such as better traffic maps, safer cars, and good recommendations, surreptitiously gathering this data, storing it forever, and selling it to the highest bidder puts privacy at risk. Privacy laws such as GDPR and CCPA are in place to protect consumer privacy, but they do not protect all people in the same manner. People should have the right to know how their data is gathered and used. They should be given the freedom to choose a life without surveillance.

References:

Thompson, Stuart A. and Warzel, Charlie (2021, February 6). They Stormed the Capitol. Their Apps Tracked Them. The New York Times. https://www.nytimes.com/2021/02/05/opinion/capitol-attack-cellphone-data.html

Thompson, Stuart A. and Warzel, Charlie (2019, December 21). How Your Phone Betrays Democracy. The New York Times. https://www.nytimes.com/interactive/2019/12/21/opinion/location-data-democracy-protests.html?action=click&module=RelatedLinks&pgtype=Article

The Editorial Board (2019, December 21). Total Surveillance Is Not What America Signed Up For. The New York Times. https://www.nytimes.com/interactive/2019/12/21/opinion/location-data-privacy-rights.html?action=click&module=RelatedLinks&pgtype=Article

Nissenbaum, Helen F. (2011). A Contextual Approach to Privacy Online. Daedalus, the Journal of the American Academy of Arts & Sciences.

Solove, Daniel J. (2006). A Taxonomy of Privacy. University of Pennsylvania Law Review, 154:3 (January 2006), p. 477. https://ssrn.com/abstract=667622

The Belmont Report (1979). https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf


A market-based counterweight to AI driven polarization
By Anonymous | March 9, 2022

Stuart Russell examines the rise and existential threats of AI in his book Human Compatible. While he takes on a broad range of issues related to AI and machine learning, he begins the book by pointing out that AI already shapes the way we live today. Take the example of a social media algorithm tasked with increasing user engagement and revenue. It's easy to see how an AI might do this, but Russell presents an alternative theory of how such an algorithm solves the problem: “The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user's preferences so that they become more predictable.” Russell pulls on this thread to its conclusion, where algorithms tasked with one thing create other harms to achieve their end, in this case creating polarization and rage to drive engagement.

As Russell states, AI is present and a major factor in our lives today, and so is the harm that it creates. One of these harms is the polarization caused by the distribution of content online through search engines and social networks. Is there a counterweight to the current financial incentives of content distribution networks that could help create a less polarized society?

The United States is more polarized than ever, and the pandemic is only making things worse. Just before the 2020 presidential election, “roughly 8 in 10 registered voters in both camps said their differences with the other side were about core American values, and roughly 9 in 10—again in both camps—worried that a victory by the other would lead to ‘lasting harm’ to the United States.” (Dimock & Wike, 2021) This gap only widened over the summer, leading Dimock and Wike to conclude that the US has become more polarized more quickly than the rest of the world.


(Kumar, Jiang, Jung, Lou, & Leskovec, MIS2: Misinformation and Misbehavior Mining on the Web)

While we cannot attribute all of this to the rise of digital media, social networks, and online echo chambers, they are certainly at the center of the problem and one of the major factors. A study conducted by Hunt Allcott, Luca Braghieri, Sarah Eichmeyer, and Matthew Gentzkow, published in the American Economic Review, found that users became less polarized when they stopped using social media for only a month. (Allcott, Braghieri, Eichmeyer, & Gentzkow, 2020)

Much of the public focus on social media companies and polarization has centered on the legitimacy of the information presented: fake news. Freedom of speech advocates have pointed out that the label of fake news stifles speech. While many of these voices today come from the right, and much of the evidence about preferential treatment of points of view online does not support their argument, the concern is valid on its face. If we accept that there is no universal truth upon which to build facts, then it follows that fake news is simply what isn't accepted today. This is not to say that every claim online may one day be seen as true, but rather that just because something is perceived to be fake or false today doesn't mean it is fake.

This means we would need a system to clean up the divisive online echo chamber based not on truth but on the perspectives presented. Supreme Court Justice Louis D. Brandeis famously said that the counter to false speech is more speech (Jr., 2011), so it should be possible to create an online environment where users are presented with multiple points of view instead of the same one over and over.

Most content recommendation algorithms are essentially cluster models: articles with content and points of view similar to the articles a user has liked in the past are the ones presented to that user. The simple explanation is that if you like one article, you'll also be interested in a similar one. If I like fishing articles, I'm more likely to see articles about fishing, while if I read articles about overfishing, I'm going to see articles with that perspective instead. This is a simple example of the problem: depending on the point of view one starts with, one only gets dragged deeper into that hole. Apply this to politics and the thread of polarization is obvious.
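The toy simulation below makes the "dragged deeper" dynamic explicit. It assumes, following Russell's argument, that among similar articles the recommender favors the more intense one because intensity drives engagement, and that a user's preferences shift toward whatever they are shown; the article scores and parameters are invented.

```python
# Articles sit on one axis from -1 to +1, representing two opposing points of view.
articles = [-0.9, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 0.9]

def next_article(user_pos, radius=0.4):
    """Consider only 'similar' articles (within radius), then pick the most intense one,
    a stand-in for engagement optimization."""
    similar = [a for a in articles if abs(a - user_pos) <= radius]
    return max(similar, key=abs)

def simulate(user_pos, rounds=6, drift=0.5):
    history = [user_pos]
    for _ in range(rounds):
        shown = next_article(user_pos)
        user_pos += drift * (shown - user_pos)   # preferences shift toward what is shown
        history.append(round(user_pos, 2))
    return history

print(simulate(0.2))   # a mildly leaning user is walked steadily toward one extreme
print(simulate(-0.2))  # a user leaning the other way ends up at the opposite extreme
```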

Countering this is possible. Categorize content on multiple vectors, including topic and point of view, then present similar topics with opposing points of view. That delivers not only more speech but more diverse speech, and it puts the power of decision back in the hands of the human and away from the distributor. In response to the recent Joe Rogan controversy, Spotify pledged to invest $100 million in more diverse voices but has not presented a plan to promote them on the platform to create a more well-rounded listening environment for its users. Given how actively Spotify promotes Joe Rogan, it needs a plan to ensure that different voices are as easy to discover, not just available.

The hindrance to any private entity adopting this is the same as with almost any required change at a shareholder-driven company: financial. As Russell pointed out, polarizing the public is profitable because it makes users more predictable and easier to keep engaged. But there is an existing model in which a broader harm is calculated and paid for by a company, creating an entirely new market in the process: cap and trade.


(How cap and trade works, 2022)

Cap and trade is the idea that companies are allowed to pollute only a certain amount. That creates two types of companies: those that create more pollution than they are allocated and those that produce less. Cap and trade allows over-polluters to buy the allocations of under-polluting companies to ensure an equilibrium.
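A tiny worked example of the mechanism, with invented firms, caps, and prices:

```python
# Invented numbers purely to illustrate the cap-and-trade mechanism.
cap_per_firm = 100                              # allowances each firm receives
allowance_price = 30                            # market price per allowance, in dollars
emissions = {"Firm A": 130, "Firm B": 60}       # A over-pollutes, B under-pollutes

shortfall = emissions["Firm A"] - cap_per_firm  # A is 30 allowances short
surplus = cap_per_firm - emissions["Firm B"]    # B has 40 allowances to spare
traded = min(shortfall, surplus)                # 30 allowances change hands

print(f"Firm A buys {traded} allowances from Firm B for ${traded * allowance_price}.")
print(f"Total emissions: {sum(emissions.values())} vs. total cap: {2 * cap_per_firm}.")
# Exceeding the cap now carries an explicit price, which is the incentive the
# speech analogy below tries to borrow.
```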

A similar model could apply to speech: companies that algorithmically promote one point of view would need to offset that distribution by also promoting the other point of view to the same user or group of users. This has two effects. First, it creates a financial calculus for content distributors: is it still worth promoting a single point of view to a subset of users, a model that has been highly profitable in the past, if they must pay for the balancing speech? At the same time, it creates a new market of companies selling offsets, which would promote to specific groups of users the opposing points of view they are not already receiving and are less likely to engage with, knowing they can be compensated for these efforts by the polarizing offenders.

Before taking this to its logical conclusion, where two companies each promote opposing points of view and so are equally guilty of polarization, let's talk about how this might work in practice. There are some very complicated issues that would make it difficult to implement, like personal privacy and private companies' strategy of keeping what they know about their users proprietary.

Suppose a social media company presents an article to a user arguing that mask mandates create an undue negative effect on society. It has then pushed that user toward one end of the spectrum and would be required to present that user with an article making the argument that mask mandates are necessary to slow the spread of the virus. The social media company could either present that new piece of information itself or sell the obligation to another company willing to create the offset by presenting it on an outside platform, creating equilibrium. Here the social media company must do the calculus of whether it is more profitable to continue polarizing that user on its own platform or to create balance within its own walls.

This example is clearly oversimplified, but ‘level of acceptance’ could be quantified, and companies could be required to create a balance of opinion for specific users or subsets of users. If 80% of publications are publishing one idea and 20% of publications are presenting the opposing idea, then content distributors would be required to create an 80/20 balance for their users.
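As a sketch of what that requirement could mean in practice (with invented numbers): if a user's feed has been 95/5 on one idea and the required balance is 80/20, the distributor, or an offset seller, would owe that user a calculable number of opposing-view impressions.

```python
import math

def required_offsets(shown_a: int, shown_b: int, target_share_b: float = 0.20) -> int:
    """Minimum additional view-B impressions needed for B to reach the target share."""
    if shown_b / (shown_a + shown_b) >= target_share_b:
        return 0
    # Solve (shown_b + x) / (shown_a + shown_b + x) = target_share_b for x.
    x = (target_share_b * (shown_a + shown_b) - shown_b) / (1 - target_share_b)
    return math.ceil(x)

# A user who has been shown 95 articles of one viewpoint and 5 of the other:
print(required_offsets(95, 5))  # -> 19 additional opposing-view impressions
```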

This is an imperfect starting point for creating algorithmic balance online, but one from which to discuss an incentive- and market-based approach to providing fairness and de-polarizing the most polarized society at the most polarized moment in recorded history.

Bibliography
Allcott, H., Braghieri, L., Eichmeyer, S., & Gentzkow, M. (2020). The Welfare Effects of Social Media. American Economic Review.
Dimock , M., & Wike, R. (2021, March 29). America Is Exceptional in Its Political Divide. Retrieved from Pew Trusts: https://www.pewtrusts.org/en/trust/archive/winter-2021/america-is-exceptional-in-its-political-divide
How cap and trade works. (2022). Retrieved from Environmental Defense Fund: https://www.edf.org/climate/how-cap-and-trade-works
Jr., D. L. (2011, December). THE FIRST AMENDMENT ENCYCLOPEDIA. Retrieved from mtsu.edu: https://www.mtsu.edu/first-amendment/article/940/counterspeech-doctrine#:~:text=Justice%20Brandeis%3A%20%22More%20speech%2C%20not%20enforced%20silence%22&text=%E2%80%9CIf%20there%20be%20time%20to,speech%2C%20not%20enforced%20silence.%E2%80%9D
Kumar, S., Jiang, M., Jung, T., Lou, R., & Leskovec, J. (2018). MIS2: Misinformation and Misbehavior Mining on the Web. The Eleventh ACM International Conference.


Dangers of Predicting Criminality
By Kritesh Shrestha | March 9, 2022

Facial recognition technology has seen major improvements within the last five years, and today it is common to use facial recognition commercially for biometric identification. According to testing conducted by the National Institute of Standards and Technology, the highest performing facial identification algorithm as of April 2020 had an error rate of 0.08%, compared to 4.1% for the highest performing algorithm in 2014. [3] Though these improvements are commendable, concerns arise when attempting to apply these algorithms to high-stakes issues such as criminality.

Tech to Prison Pipeline
On May 5th, 2020, Harrisburg University announced that a publication entitled “A Deep Neural Network Model to Predict Criminality Using Image Processing” was being finalized. In this publication, a group of Harrisburg University professors and a Ph.D. student claim to have developed automated facial recognition software capable of predicting whether someone is likely to become a criminal. [4] This measure of criminality is said to have 80% accuracy with no racial bias, using only a picture of an individual's face. The data behind the software is biometric and criminal legal data provided by the New York City Police Department (NYPD). While the intent of this software is to help prevent crime, it caught the eye of 2,435 academics, who signed an open letter demanding the research remain unpublished.

Those that signed the open letter, the Coalition for Critical Technology (CCT), raised concerns over the data used to create the algorithm. The CCT argue that data generated by the criminal justice system cannot be used for classifying criminality because the data is unreliable. [5] The dataset contains a history of racially biased and unjust convictions, which feeds that same bias into the algorithm. Another study, _”The ‘Criminality from Face’ Illusion”_, which examined the plausibility of predicting criminality with facial recognition, asserts that “there is no coherent definition on which to base development of such an algorithm. Seemingly promising experimental results in criminality-from-face are easily accounted for by simple dataset bias”. [2] A study published through the National Criminal Justice Reference Service concluded that for sexual assault alone, wrongful conviction occurred at a rate of 11.6%. [6] Using unreliable data to classify an individual's likelihood of committing crimes is harmful because it validates unjust practices that have occurred over the years.
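A small, invented illustration of the CCT's point: if the labels a model learns from reflect biased enforcement rather than actual behavior, the model can look well calibrated against those labels while scoring one group as far riskier than another whose behavior is identical.

```python
# Invented numbers. Two groups have the SAME true offense rate, but Group B is
# policed more heavily, so its members end up convicted (labeled) 3x as often.
true_offense_rate = {"Group A": 0.05, "Group B": 0.05}
label_rate        = {"Group A": 0.01, "Group B": 0.03}   # conviction labels in the training data

# A model fit to these labels simply learns the label base rates as risk scores.
model_risk_score = dict(label_rate)

for group in true_offense_rate:
    print(f"{group}: true offense rate {true_offense_rate[group]:.0%}, "
          f"model risk score {model_risk_score[group]:.0%}")

# Scored against the conviction labels it was trained on, the model looks perfectly
# calibrated, yet it rates Group B as 3x riskier than Group A despite identical
# behavior. Accuracy metrics computed on biased labels cannot surface this.
```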

If an individual were wrongly convicted and awaiting exoneration, their family members or those who look like them might be labeled as “likely” to commit crimes. The study announced by Harrisburg University has since been withdrawn from publication following public discussion and the CCT's open letter.

Resurgence of Physiognomy
While the use of facial recognition algorithms as a predictor is relatively new, the practice of using outer appearance to predict characteristics, __physiognomy__, dates back to the 18th century. [1] Physiognomy has been used in the past to promote racial bigotry, block immigration, justify slavery, and permit genocide. While physiognomy has been disproven, the pseudoscience seems to be on the rise with the increasing use of facial recognition. The issue with physiognomy lies in the belief that physical features are good indicators of complex human behavior. This simplistic belief is problematic in that it skips several levels of abstraction, ignoring the …role of learning and environmental factors in human development. [2] Predicting criminality in a vacuum might not be harmful, but given the history of physiognomy, it seems regressive.

Conclusion
The use of facial features as an identifier for criminality is inherently biased, as it means accepting the assumption that individuals with certain facial features are more likely to commit crime. With the knowledge that bias exists within our criminal justice system, it is irresponsible to recommend the use of criminal justice data to predict criminality. The implication of an algorithm being able to predict criminality is frightening, as it could be used to further unjust actions.

Open Ended Thought Experiment in Predicting Criminality
What would the world look like if an algorithm had reliable data and were 100% accurate at predicting criminality?
– If a child were to be born into this world with all of the features that classify as “likely to commit crime”; should that child be monitored?
– What rights would that child have to their own privacy if the algorithm is certain that the child will be a criminal?
– What does it mean for the future of the child, should they be denied rights due to this classification?

References
[1] Arcas, Blaise Aguera y, et al. “Physiognomy’s New Clothes.” _Medium_, Medium, 20 May 2017, https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a.
[2] Bowyer, Kevin W., et al. “The ‘Criminality from Face’ Illusion.” _IEEE Transactions on Technology and Society_, vol. 1, no. 4, 2020, pp. 175–183., https://doi.org/10.1109/tts.2020.3032321.
[3] Crumpler, William. “How Accurate Are Facial Recognition Systems – and Why Does It Matter?” _How Accurate Are Facial Recognition Systems – and Why Does It Matter? | Center for Strategic and International Studies_, 16 Feb. 2022, https://www.csis.org/blogs/technology-policy-blog/how-accurate-are-facial-recognition-systems-%E2%80%93-and-why-does-it-matter#:~:text=Facial%20recognition%20has%20improved%20dramatically,Standards%20and%20Technology%20(NIST).
[4] “Hu Facial Recognition Software Predicts Criminality.” _Harrisburg University_, 5 May 2020, https://web.archive.org/web/20200506013352/https://harrisburgu.edu/hu-facial-recognition-software-identifies-potential-criminals/.
[5] Technology, Coalition for Critical. “Abolish the #TechToPrisonPipeline.” _Medium_, Medium, 21 Sept. 2021, https://medium.com/@CoalitionForCriticalTechnology/abolish-the-techtoprisonpipeline-9b5b14366b16.
[6] Walsh, Kelly, et al. _Estimating the Prevalence of Wrongful Convictions._ Office of Justice Programs, 1 Sept. 2017, https://www.ojp.gov/pdffiles1/nij/grants/251115.pdf.