Clearview AI: The startup that is threatening privacy
By Stefania Halac, October 16, 2020

Imagine walking down the street when a stranger points their camera at you and can immediately pull up every picture of you from across the internet: your Instagram posts, your friends’ posts, any photo you appear in, including some you may never have seen before. This stranger could now ascertain where you live, where you work, where you went to school, whether you’re married, who your children are… This is one of many compromising scenarios that could become part of normal life if facial recognition software is widely available.

Clearview AI, a private technology company, offers facial recognition software that can effectively identify any individual. Facial recognition technology is intrinsically controversial, so much so that certain companies like Google don’t offer facial recognition APIs due to ethical concerns. And while some large tech companies like Amazon and Microsoft do sell facial recognition APIs, there is an important distinction between Clearview’s offering and that of the other tech giants. Amazon and Microsoft only allow you to search for faces in a private database of pictures supplied by the customer. Clearview instead allows for recognition of individuals in the public domain: practically anyone can be recognized. What sets Clearview apart is not its technology, but rather its database of over three billion pictures scraped from the public internet and social media. Clearview AI did not obtain consent from individuals to scrape these pictures, and major tech companies including Twitter, Facebook, and YouTube have sent it cease-and-desist orders over these policy violations.

In the wake of the Black Lives Matter protests earlier this year, IBM, Microsoft and Amazon updated their policies to restrict the sale of their facial recognition software to law enforcement agencies. On the other hand, Clearview AI not only sells to law enforcement and government agencies, but until May of this year was also selling to private companies, and has even been reported to have granted access to high net-worth individuals.

So what are the risks? On one hand, the algorithms behind these technologies are known to be heavily biased and to perform more poorly on certain populations, such as women and African Americans. In a recent study, Amazon’s Rekognition was found to misclassify women as men 19% of the time, and darker-skinned women as men 31% of the time. If this technology were used in the criminal justice system, one implication is that darker-skinned people would be more likely to be wrongfully identified and convicted.

Another major harm is that this technology essentially provides its users the ability to find anyone. Clearview’s technology would enable surveillance at protests, AA meetings and religious gatherings. Attending any one of these events or locations would become a matter of public record. In the wrong hands, such as those of a former abusive partner or a white supremacist organization, this surveillance technology could even be life-threatening for vulnerable populations.

In response, the ACLU filed a lawsuit against Clearview AI in May for violation of the Illinois Biometric Information Privacy Act (BIPA), alleging the company illegally collected and stored data on Illinois citizens without their knowledge or consent and then sold access to its technology to law enforcement and private companies. While some cities like San Francisco and Portland have enacted facial recognition bans, there is no overarching national law protecting civilians from such blatant privacy violations. With no such law in sight, this may be the end of privacy as we know it.

References:

We’re Taking Clearview AI to Court to End its Privacy-Destroying Face Surveillance Activities


The Gender Square: A Different Way to Encode Gender
By Emma Tebbe, October 16, 2020


Image: square with two axes, the horizontal reading Masculine and Feminine and the vertical reading Low Gender Association / Agender and Strong Gender Association

As non-gender-conforming and transgender folks become more visible and normalized, the standard male / female / other gender selections we all encounter in forms and surveys become more tired and outdated. First of all, the terms “male” and “female” generally refer to sex, or someone’s biological configuration, “including chromosomes, gene expression, hormone levels and function, and reproductive/sexual anatomy.” Male and female are not considered the correct terms for gender, which “refers to socially constructed roles, behaviours, expressions and identities of girls, women, boys, men, and gender diverse people.” While sex exists on a spectrum that includes intersex people, gender spans a wide range of identities, including agender, bigender, and genderqueer. The gender square method of encoding gender aims to encompass more of this spectrum than a simple male / female / other selection.


Image: triangle defining sex, gender expression, gender attribution, and gender identity

Upon encountering this square in a form or survey, the user would drag the marker to the spot on the square that most accurately represents their gender identity. This location would then be recorded as a coordinate pair, where (0, 0) is the center of the square. The entity gathering the data would then likely use those coordinates to categorize respondents. However, using continuous variables to represent gender identity allows for many methods of categorization. The square could be divided into quadrants, as pictured above, vertical halves (or thirds, or quarters), or horizontal sections. This simultaneously allows for flexibility in how to categorize gender and reproducibility of results by other entities. Other analysts would be able to reproduce results if they are given respondents’ coordinates and the categorization methodology used. Coordinate data could even be used as it was recorded, turning gender from a categorical variable into a continuous one.
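To make the encoding concrete, here is a minimal sketch in Python of how a response could be stored as a coordinate pair and then bucketed under two of the categorization schemes described above. The class and function names, the axis orientation, and the category thresholds are all hypothetical choices for illustration, not part of any published gender-square specification.

```python
from dataclasses import dataclass

# Hypothetical encoding: each axis runs from -1 to 1, with (0, 0) at the
# center of the square. Axis orientation is an assumption for illustration.
@dataclass
class GenderSquareResponse:
    x: float  # horizontal: -1 = most masculine, +1 = most feminine
    y: float  # vertical: -1 = low gender association / agender, +1 = strong gender association

def quadrant(resp: GenderSquareResponse) -> str:
    """Categorize a coordinate pair into one of the four quadrants."""
    horizontal = "feminine" if resp.x >= 0 else "masculine"
    vertical = "strongly gendered" if resp.y >= 0 else "low gender association"
    return f"{vertical} / {horizontal}"

def horizontal_thirds(resp: GenderSquareResponse) -> str:
    """An alternative scheme: split only the masculine-feminine axis into thirds."""
    if resp.x < -1 / 3:
        return "masculine-leaning"
    if resp.x > 1 / 3:
        return "feminine-leaning"
    return "center / androgynous"

# A respondent who identifies as a man and was assigned that gender at birth
# might place the marker toward the upper left of the square:
resp = GenderSquareResponse(x=-0.7, y=0.8)
print(quadrant(resp))           # strongly gendered / masculine
print(horizontal_thirds(resp))  # masculine-leaning
```

Because the raw coordinates are stored alongside any derived category, other analysts could re-bucket the same responses under a different scheme without re-surveying anyone.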

Although this encoding of gender encompasses more dimensions, representing gender as a spectrum that includes agender identities, it still comes with its own problems. First of all, the gender square does not leave room for flexible gender identities, including those whose gender is in flux or who identify as genderfluid or bigender. There are a few potential solutions for this misrepresentation on the UI side, but these create new problems for data encoding. Genderfluid folks could perhaps draw an enclosed area in which their gender generally exists, but recording that area is much more complex than recording a single point: the data becomes an array of values rather than a coordinate pair. People who identify as bigender could potentially place two markers, one for each of the genders they experience. Both this approach and an area selection approach make the process of categorization more complex: if an individual’s gender identity spans two categories, would they be labeled twice? Or would there be another category for people who fall into multiple categories?


Image: a gender spectrum defining maximum femininity as “Barbie” and maximum masculinity as “G.I. Joe”

Another issue might arise with users who haven’t questioned their gender identity along either of these axes and may not understand the axes (particularly the Strong Gender Association / Agender axis) well enough to use the gender square accurately. When implemented, the gender square would likely need an explanation, definitions, and potentially suggestions. Suggestions could include examples such as “If you identify as a man and were assigned that gender at birth, you may belong in the upper left quadrant.” Another option may be to include examples such as the somewhat problematic illustration above.

This encoding of gender would likely first be adopted by groups occupying primarily queer spaces, where concepts of masculinity, femininity, and agender identities are more prominent and considered. If used in places where data on sex and transgender status is vital information, such as at a doctor’s office, then the gender square would need to be supplemented by questions obtaining that necessary information. Otherwise, it is intended for use in spaces where a person’s sex is irrelevant information (which is most situations where gender information is requested).

Although still imperfect, representing and identifying gender along two axes captures more of the gender spectrum than a simple binary while still allowing for categorization, which is necessary for data processing and analytics. Its potential weaknesses are misunderstanding and inflexibility; its strength is allowing individuals to represent their own identities more accurately and easily.

References:
https://cihr-irsc.gc.ca/e/48642.html
https://www.glsen.org/activity/gender-terminology
https://journals.sagepub.com/doi/full/10.1177/2053951720933286
Valentine, David. The Categories Themselves. GLQ: A Journal of Lesbian and Gay Studies, Volume 10, Number 2, 2004, pp. 215-220
https://www.spectator.co.uk/article/don-t-tell-the-parents for image only

 


When Algorithms Are Too Accurate
By Jill Cheney, October 16, 2020

An annual rite of passage every spring for innumerable students is the college entrance exam. Whatever the exam is called, the end result is the same: it shapes admission decisions. When the Covid-19 pandemic swept the globe in 2020, this milestone changed overnight. Examinations were cancelled, leaving students and universities with no traditional way to evaluate admission. Alternative solutions emerged, with varying degrees of validity.

In England, the solution for replacing the cancelled A-level exams was a computer algorithm that predicted student performance. In the spirit of a parsimonious model, it used two parameters: the student’s current grades and the historical test record of the school they attended. The outcome elicited nationwide ire by highlighting inherent testing realities.

Overall, the predicted exam scores were higher: more students did better than in any previous exam year, with 28% earning top grades across England, Wales, and Northern Ireland. However, incorporating a school’s previous test performance into the algorithm created a self-fulfilling reality. Students at historically high-performing schools had inflated scores; conversely, students from lower-performing schools had deflated ones. Immediate cries of AI bias erupted. However, the data wasn’t wrong; the algorithm simply highlighted the biases and disparities inherent in the data being modeled.
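To see how a school-history term can swamp an individual student’s own record, consider the toy calculation below. This is only a sketch of the general idea, not Ofqual’s actual standardization procedure; the 0-100 grading scale and the blending weight are assumptions chosen purely for illustration.

```python
# Toy model: blend a student's own current grade with their school's
# historical average. The weight given to school history is hypothetical.
def predicted_score(student_grade: float,
                    school_history_avg: float,
                    school_weight: float = 0.6) -> float:
    return school_weight * school_history_avg + (1 - school_weight) * student_grade

# Two equally strong students (current grade 85) at different schools:
print(predicted_score(85, school_history_avg=80))  # 82.0: nudged down slightly
print(predicted_score(85, school_history_avg=55))  # 67.0: pulled far below their own grade
```

The second student’s prediction is dragged toward their school’s past results, which is exactly the self-fulfilling pattern that drew accusations of bias.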

Reference points did exist for the predicted exam scores. One came from teachers, who provide their own predictions of student performance; the other came from students’ scores on earlier ‘mock’ exams. Around 40 percent of students received a predicted score one step lower than their teachers’ predictions, and, not surprisingly, the largest downgrades occurred among poorer students. Many others received predicted scores below their ‘mock’ exam scores. Mock exam results can support initial university acceptance, but they must be followed up with commensurate official exam scores. For many students, the disparity between their predicted and ‘mock’ exam scores jeopardized their university admission.

Attempting to rectify the disparities came with its own challenges. Opting to use teacher-predicted scores meant accepting that not all teachers provide meticulous predictions: based on teacher predictions alone, 38% of scores would have been at the highest levels, A*s and As. Other alternatives included permitting students to retake the exam in the fall or allowing the ‘mock’ exam scores to stand in where they were higher than the predicted ones. No easy answers existed when attempting to craft an equitable national response.

As designed, the computer model weighted the past performance of a school over individual student performance. Individual grades could not offset the influence of a school’s testing record, and the model necessarily discounted more qualitative factors, such as test-taking skills. In the face of a computer-generated scoring model, a feeling of powerlessness emerged: students no longer felt they had control over their future and schooling opportunities.

Ultimately, the predictive model simply exposed the underlying societal realities and quantified how wide the gap actually is. Absent the pandemic, testing would have continued under the status quo: affluent schools would have received higher scores on average than fiscally limited schools, while many students from disadvantaged schools would still have succeeded individually and gained university admission. The public outcry this predictive algorithm generated underscores how the guise of traditional test conditions assuages our concerns about the realities of standardized testing.

Sources:
https://www.theverge.com/2020/8/17/21372045/uk-a-level-results-algorithm-biased-coronavirus-covid-19-pandemic-university-applications

https://www.bbc.com/news/education-53764313


Data as Taxation
By Anonymous, October 16, 2020

Data is often analogized to a transaction: we frame our interactions with tech companies as an exchange of our data as payment for services, which in turn allows for the continued provision of those services.

Metaphors like these can be useful in that they allow us to port developed intuitions from a well-trodden domain (transactions) to help us navigate less familiar waters (data). In this spirit, I wanted to develop this “data collection = economic transaction” metaphor further and explore how our perceptions of data collection change with a slight tweak: “data collection = taxation.”


In the context of data collection, the following quote from Supreme Court Justice Oliver Wendell Holmes might give one pause. Is it applicable, or entirely irrelevant?

Here’s what I mean: with taxation, government bodies mandate that citizens contribute a certain amount of resources to fund public services. The same goes for data: while Google, Facebook, and Amazon are not governments, they too create and maintain enormous ecosystems that facilitate otherwise impossible interactions. Governments enable coordination around national security, education, and supply chains; Big Tech provides the digital analogues. Taxation and ad revenue allow for the perpetual creation of this value. Both embody some (deeply imperfect) notion of “consent of the governed” through voter and consumer choice, although neither provides an easy way to “opt out.”

Is this metaphor perfect? Not at all, but there is still value in the comparison: we can recycle centuries of bickering over fairness in taxation.

For instance, one might ask “when is taxation / data collection exploitative?” On one end, some maintain that “all taxation is theft,” a process by which private property is coercively stripped. Some may feel a similar sense of violation as their personal information is harvested – for them, perhaps the amorphous concept of “data” latches onto the familiar notion of “private property,” which might in turn suggest the need for some kind of remuneration.

At the other extreme, some argue that taxation cannot be the theft of private property because the property was never private to begin with: governments create the institutions and infrastructure that allow the concept of “ownership” to exist at all, and thus all property is on loan. One privacy analogue could be that the generation of data is impossible and worthless without the scaffolding of Big Tech, and thus users have a similarly tenuous claim on their digital trails.

The philosophy of just taxation has provided me with an off-the-shelf frame for parsing a less familiar space. Had I stayed with the “data collection = economic transaction” metaphor, I would never have thought about data from this angle. As is often the case, a different metaphor illuminates different dimensions of the issue.

Insights can flow the other way as well. For example, in data circles there is a developing sophistication around what it means to be an “informed consumer.” It is recognized by many that merely checking the “I agree” box does not constitute a philosophically meaningful notion of consent, as the quantity and complexity of relevant information is too much to expect from any one consumer. Policies and discussions around the “right to be forgotten”, user control of data, or the right to certain types of transparency acknowledge the moral tensions inherent in the space.

These discussions are directly relevant to the justifications often given for a government’s right to tax, such as the “social contract” or the “consent of the governed.” Both rest on some notion of informed consent, but this sits on similarly shaky ground. How many voters know how their tax dollars are being spent? Government budgets are publicly available, but how many people are willing to sift through reams of legalese? How many voters can state military spending to within even an order of magnitude? Probably about as many as know exactly how their data is packaged and sold. The data world and its critics have much to contribute to the question of how to promote informed decision-making in a world of increasing complexity.


Linguist George Lakoff and philosopher Mark Johnson suggest that metaphors are central to our cognitive processes.

Of course, all of these comparisons are deeply imperfect and would require much more space to elaborate. My main interest in writing this was exploring how an analogical shift leads to different questions and frames. The metaphors we use have a deep impact on our ability to think through novel concepts, particularly when navigating the abstract. They shape the questions we ask, the connections we make, and even the conversations we can have. To the extent that that’s true, metaphors can profoundly reroute society’s direction on issues of privacy, consent, autonomy, and property, and are thus well worth exploring.


When an Algorithm Replaces Cash Bail
Allison Godfrey
October 9th, 2020

In an effort to make the criminal justice system more equitable, California Senate Bill 10 replaced cash bail with a predictive algorithm that produces a risk assessment score to determine whether the accused needs to remain in jail before trial. The risk assessment places suspects into low, medium, or high risk categories. Low risk individuals are generally released before trial, while high risk individuals remain in jail. For medium risk individuals, the judge has much more discretion in determining their placement before trial and the conditions of release. The bill also releases all suspects charged with a misdemeanor without requiring a risk assessment. Signed into law in 2018 and set to take effect in October 2019, the bill now faces California Proposition 25, which asks voters whether to keep this system or return to cash bail; opponents argue that the algorithm biases the system even more than cash bail did. People often see data and algorithms as purely objective, since they are based on numbers and formulas. However, they are often “black box” models, where we have no way of knowing exactly how the algorithm arrived at its output. If we cannot follow the model’s logic, we have no way of identifying and correcting its bias.



By their nature, predictive algorithms learn from data in much the same way humans learn from life’s inputs (experiences, conversations, schooling, family, etc.). Our life experiences make us inherently biased, since we each hold a unique perspective shaped by that set of experiences. Similarly, algorithms learn from the data we feed them and spit out the perspective that data creates: an inherently biased one. Say, for example, we feed a predictive model data about 1,000 people with pending trials. While the Senate Bill is not clear on the model’s exact inputs, suppose we give it the following attributes for each person: age, gender, charge, past record, income, zip code, and education level. We exclude the person’s race from the model in an effort to eliminate racial bias. But have we really eliminated racial bias?



Let’s compare two people: Fred and Marc. Fred and Marc have the exact same charge, identify as the same gender, have similar incomes, and both have bachelor’s degrees, but they live in different towns. The model learns from past data that people from Fred’s zip code are generally more likely to commit another crime than people from Marc’s zip code. Thus, Fred receives a higher risk score than Marc and awaits his trial in jail while Marc is allowed to go home. Due to the history and continuation of systemic racism in this country, neighborhoods are often racially and economically segregated, so people from one zip code may be much more likely to be people of color and lower income than those from the neighboring town. By including an attribute like zip code, we introduce economic and racial bias into the model even though those attributes are never explicitly stated. While the original goal of Senate Bill 10 was to eliminate wealth as a determining factor in bail decisions, it inadvertently reintroduces wealth as a predictor through the economic bias woven into the algorithm. Instead of balancing the scale in the criminal justice system, the algorithm tips it even further.
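The proxy effect is easy to reproduce on synthetic data. The sketch below is purely illustrative: it is not the SB 10 risk tool (whose inputs and implementation are not public), and the population sizes, re-arrest rates, and feature names are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Synthetic, segregated world: zip code 1 (Fred's) is mostly residents of color,
# and historical over-policing gives it more recorded re-arrests in the data.
zip_code = rng.integers(0, 2, size=n)                            # 0 = Marc's zip, 1 = Fred's zip
race = rng.binomial(1, np.where(zip_code == 1, 0.8, 0.2))        # correlated with zip code
rearrest = rng.binomial(1, np.where(zip_code == 1, 0.35, 0.15))  # label reflects policing patterns
income = rng.normal(50, 5, size=n)                               # income in $1,000s, same in both zips

# Train on features that deliberately exclude race.
X = np.column_stack([zip_code, income])
model = LogisticRegression(max_iter=1000).fit(X, rearrest)

print("Correlation(zip, race):", round(np.corrcoef(zip_code, race)[0, 1], 2))

# Fred and Marc: identical except for zip code.
fred = [[1, 50.0]]
marc = [[0, 50.0]]
print("Fred's predicted risk:", round(model.predict_proba(fred)[0, 1], 2))  # ~0.35
print("Marc's predicted risk:", round(model.predict_proba(marc)[0, 1], 2))  # ~0.15
```

Even though race never enters the model, Fred’s score comes out far higher than Marc’s, and because zip code is correlated with race in the synthetic data, the disparity falls disproportionately on people of color.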



Additionally, the purpose of cash bail is to ensure that the accused shows up to trial. While it is true that the cash bail system can be economically inequitable, the algorithm does not address this primary purpose of bail: nothing in Senate Bill 10 helps ensure that the accused will be present at their trial.

Lastly, Senate Bill 10 allows judicial discretion in any case, particularly medium risk cases. Human bias in the courtroom has historically played a large role in the inequality of our justice system. The discretion a judge has to overrule the risk assessment score could re-introduce the very human bias the model partly seeks to avoid, and judges have been shown to exercise this power more often to place someone in jail than to release them. In the time of Covid-19, going to jail also carries an increased risk of infection. Given that heightened risk, our decision system, whether algorithmic, monetary, and/or human-centered, should err more on the side of release, not detainment.

The fundamental question is one that neither cash bail nor algorithms can answer:
How do we eliminate wealth as a determining factor in the justice system while also not introducing other biases and thus perpetuating systemic racism in the courtroom?


To Broadcast, Promote, and Prepare: Facebook’s Alleged Culpability in the Kenosha Shootings
By Matt Kawa | October 9, 2020

The night of August 25, 2020 saw Kenosha, WI engulfed in peaceful protests, riots, arson, looting, and killing in the wake of the police shooting of Jacob Blake. In many ways Kenosha was not unlike cities around the country facing protests, both peaceful and violent, sparked by the killings of George Floyd and others by police. What distinguishes Kenosha is that, in the midst of the response to these shootings, more people were killed: two protestors were shot and killed, and another injured, by seventeen-year-old Antioch, IL resident Kyle Rittenhouse.

Rittenhouse was compelled and mobilized to cross state lines, illegally (as a minor) in possession of a firearm, to “take up arms and defend out City [sic] from the evil thugs” who would be protesting, as posted by a local vigilante militia calling itself the Kenosha Guard. The Kenosha Guard set up a Facebook event (pictured below) entitled “Armed Citizens to Protect our Lives and Property,” in which the administrators posted the aforementioned quote (also pictured).

In addition to the egregious proliferation of racist and antisemitic rhetoric, the administrators of these Facebook groups blatantly promoted acts of violence against protestors and rioters, not only via the groups themselves but on their personal accounts as well.

On September 22, a complaint and demand for jury trial was filed with the United States District Court for the Eastern District of Wisconsin by the life partner of one of Rittenhouse’s victims and three other Kenosha residents. It names shooter Kyle Rittenhouse; Kevin Mathewson, “commander” of the Kenosha Guard; co-conspirator Ryan Balch, a member of a similar violent organization called the “Boogaloo Bois”; both organizations themselves; and, most surprisingly, Facebook, Inc.

The complaint effectively alleges negligence on Facebook’s part for allowing the vigilantes to coordinate their violent presence unchecked. It states that Facebook “provides the platform and tools for the Kenosha Guard, Boogaloo Bois, and other right-wing militias to recruit members and plan events.” Anticipating a defense of ignorance, the complaint then cites the more than four hundred reports users filed about the Kenosha Guard group and event page, expressing concern that members would seek to cause violence, intimidation, and injury; as the complaint summarizes, that is ultimately what transpired.

While Facebook CEO Mark Zuckerberg did eventually apologize for his platform’s role in the incident, calling it an “operational mistake” and removing the Kenosha Guard page, the complaint claims that, as part of an observable pattern of similar behavior, Facebook “failed to act to prevent harm to Plaintiffs and other protestors” by ignoring a material number of reports attempting to warn it.

Ultimately, the Plaintiffs’ case rests on the Wisconsin legal principle that, “A duty consists of the obligation of due care to refrain from any act which will cause foreseeable harm to others . . . . A defendant’s duty is established when it can be said that it was foreseeable that [the] act or omission to act may cause harm to someone.” Or, simply put, Facebook had a duty to “stop the violent and terroristic threats that were made using its tools and platform,” including through inaction.

Inevitably, defenses will be made on First Amendment grounds, claiming that the Kenosha Guard and Boogaloo Bois, and their leaders and members, were simply exercising their right to freedom of speech, a right Facebook ought to afford its users. However, the Supreme Court has interpreted numerous exceptions into the First Amendment, including, quite prominently, incitement to violence. Whether Facebook has a moral obligation to adjudicate First Amendment claims is less clear cut. But in the modern, rapidly evolving world of social media, a decision must be made about the role of the platform in society and about what ought or ought not be permissible enforcement of standards across the board.

The full text of the complaint can be found here.


Facing Security and Privacy Risks in the Age of Telehealth
By Anonymous | October 9, 2020

As the world grapples with the coronavirus pandemic, more healthcare providers and patients are turning to telehealth visits: visits where the patient is remote and communicates with her provider through a phone call or video conference. While telehealth visits will continue to facilitate great strides forward in patient access, there are privacy risks that need to be mitigated to secure the success of remote visits.


Image: National Science Foundation

Participating in a remote visit opens a patient up to many potential security risks. For example, ordinary data transmissions from a mobile application or medical device, such as an insulin pump, may be inadvertently shared with a third party based on the permissions granted to applications on a patient’s mobile device. Additionally, devices that stream recordings of what users say, such as Amazon’s Alexa, may capture sensitive information communicated over the course of a remote visit. In some cases, a patient may have trouble using a HIPAA (Health Insurance Portability and Accountability Act) compliant telemedicine service such as Updox, and the patient and provider might instead turn to a non-compliant, ordinary Zoom call to complete their visit. How does one make the tradeoff between patient privacy and patient access?

There are steps that both patients and providers can take to mitigate the security risks that surround telehealth visits. Patients can limit the permissions of the mobile applications they use to reduce the risk of sharing sensitive information with third parties. Patients may also briefly turn off any devices that record activity in their homes. Medical professionals can ensure that only the current patient’s lab results and records are open on their laptops, to avoid inadvertently screen-sharing another patient’s data. Additionally, medical professionals and patients can work to become familiar with HIPAA-compliant telemedicine services, ensuring improved security and seamless telehealth visits.


Image: Forbes

Beyond the actions of patients and providers, patient privacy is often addressed through regulatory institutions such as the U.S. Department of Health and Human Services (HHS) and laws such as HIPAA. HHS has recognized the need for telehealth visits during the coronavirus pandemic, stating that its Office for Civil Rights (OCR) “will not impose penalties for noncompliance with the regulatory requirements under the HIPAA Rules against covered health care providers in connection with the good faith provision of telehealth during the COVID-19 nationwide public health emergency”. As a supplement, HHS has stated that only non-public facing telecommunication products should be used in telemedicine visits. While it remains to be seen when the world will start to recover from the COVID-19 pandemic, protecting patient privacy through improved regulatory guidelines around telehealth should become a higher priority.

Further regulatory control around patient privacy with respect to telehealth will help to ensure its success. The potential benefits of remote visits are great and are quickly becoming realized. Patients with autoimmune diseases can speak to their providers from home, alleviating their higher-than-average risk of COVID-19 complications. Rural patients who once had to travel hours to see the right provider can participate in lab work and testing closer to home and discuss results and steps forward with talented healthcare providers across the country. Providers may be able to see more patients than before. Patients and providers alike can look forward to a world where telemedicine is more easily integrated into daily life, but steps should be taken to ensure patient privacy.

References

  • Germain, T. (2020, April 14). Medical Privacy Gets Complicated as Doctors Turn to Videochats. Retrieved October 05, 2020, from https://www.consumerreports.org/health-privacy/medical-privacy-gets-complicated-video-chats-with-doctors-coronavirus/
  • Hall, J. L., & McGraw, D. (2014, February 01). For Telehealth To Succeed, Privacy And Security Risks Must Be Identified And Addressed. Retrieved October 05, 2020, from https://www.healthaffairs.org/doi/full/10.1377/hlthaff.2013.0997
  • McDougall, J., Ferucci, E., Glover, J., & Fraenkel, L. (2017, October). Telerheumatology: A Systematic Review. Retrieved October 06, 2020, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5436947/
  • Notification of Enforcement Discretion for Telehealth. (2020, March 30). Retrieved October 07, 2020, from https://www.hhs.gov/hipaa/for-professionals/special-topics/emergency-preparedness/notification-enforcement-discretion-telehealth/index.html
  • Schwab, K. (2020, August 21). Telehealth has a hidden downside. Retrieved October 07, 2020, from https://www.fastcompany.com/90542626/telehealth-has-a-hidden-downside


The TikTok Hubbub: What’s Different This Time Around?
By Anonymous | September 25, 2020

Barely three years since its creation, TikTok is the latest juggernaut to emerge in the social media landscape. With over two billion downloads (more than 600 million of them this year alone), the short-video app that lets users lip sync and share viral dances now sits among the likes of Facebook, Twitter, and Instagram in both the size of its user base and its ubiquity in popular culture. Along with this popularity has come a firestorm of criticism related to privacy concerns, with powerful players in the U.S. government categorizing the app as a national security threat.


Image from: https://analyticsindiamag.com/is-tiktok-really-security-risk-or-america-being-paranoid/

Censorship
The largest reason TikTok garners such scrutiny is that the app’s parent company, ByteDance, is a Chinese company and, as such, is governed by Chinese law. Early criticisms of the company noted possible examples of censorship, including the removal of the account of a teen who was critical of human rights abuses by the Chinese government, and a German study that found TikTok hid posts made by LGBTQ users and users with disabilities. Exclusion of these viewpoints from the platform certainly raises censorship concerns. It is worth noting that TikTok is not actually available in China, and the company maintains that they “do not remove content based on sensitivities related to China”.

Data Collection
Like many of its counterparts, TikTok collects a vast amount of data from its users, including location, IP addresses, and browsing history. In the context of social media apps, this is the norm; it is the question of where this data might ultimately flow that garners the most criticism. The Wall Street Journal notes “concerns grow that Beijing could tap the social-media platform’s information to gather data on Americans.” The idea that this personal information could be shared with a foreign government is indeed alarming, but it might leave one wondering why regulators have been fairly easy on U.S.-based companies like Facebook, whose role in 2016’s election interference is still up for debate, or why citizens do not find it more problematic that the U.S. government frequently requests user information from Facebook and Google. In contrast to the U.S. government, the European Union has been at the forefront of protecting user privacy and took preemptive steps by implementing the GDPR so that foreign companies, such as Facebook, could not misuse user data without consequence. It seems evident that control of personal data is a global concern, but one the U.S. takes seriously only selectively, when the threat stems from a foreign company.


Image from: https://www.usnews.com/news/technology/articles/2020-03-04/us-senator-wants-to-ban-federal-workers-from-using-chinese-video-app-tik-tok

The Backlash
In November 2019, with bipartisan support, a U.S. national security probe of TikTok was initiated over concerns about user data collection, content censorship, and the possibility of foreign influence campaigns. In September 2020, President Trump went so far as to order a ban on TikTok in the U.S. Currently, it appears that Oracle will become TikTok’s “trusted tech partner” in the United States, possibly allaying some fears about where the application’s data is stored and processed, and under whose authority, and providing a path for TikTok to keep operating within the U.S.

For its part, TikTok is attempting to navigate very tricky geopolitical demands (the app has also been banned in India, and Japan and others may follow), even establishing a Transparency Center to “evaluate [their] moderation systems, processes and policies in a holistic manner”. Whether these actions will assuage both the public’s and the government’s misgivings is anyone’s guess, and it can also be argued that where the collected data is purportedly stored and who owns the company are largely irrelevant to the issues raised.

As the saga over TikTok’s platform and policies continues to play out, hopefully the public and lawmakers will not miss the broader issues raised over privacy practices and user data. It is somewhat convenient to scrutinize a company from a nation with which the U.S. has substantive human rights, political, and trade disagreements. While TikTok’s policies should indeed raise concern, we would do well to ask many of the same questions of the applications we use, regardless of where they were founded.


Steps to Protect Your Online Data Privacy
By Andrew Dively | September 25, 2020

Some individuals, when asked why they don’t take more steps to protect their privacy, respond with something along the lines of, “I don’t have anything to hide.” Yet if I asked those same individuals to send me the usernames and passwords to their email accounts, very few would grant me permission. When a lot of personal information about us sits on the internet, it can harm us in ways we never intended: future employers scouring social media for red flags, past connections searching for our physical addresses on Google, or potential litigators looking up our employer and job title on LinkedIn to determine whether we’re worth suing. This guide covers the various ways our data and lives are exposed on the web and how we can protect ourselves.

Social media is by far the worst offender when it comes to data privacy, not only because of the companies’ practices but also because of the information people willingly give up, which can be purchased by virtually any third party. I’d encourage you to Google yourself and see what comes up. If you see your page from networking sites like LinkedIn or Facebook, there are settings to remove these from public search engines; you then have to file a request with Google to remove the links once they no longer work. Next, within the same Google results page, go to the images tab and see what appears. These can usually be removed as well. I would recommend removing as much Personally Identifiable Information (PII) as possible from these pages, such as current city, employer, spouse, birth date, age, gender, pet names, or anything else that could potentially compromise your identity. Then go through your contacts and remove individuals you don’t know: even the highest security settings on these apps, which I recommend using, can be circumvented if someone makes a fake account and sends you a friend request. Each of these social media sites has a method, under its privacy settings, to view your page from the perspective of an outsider; nothing should be visible other than your name and profile picture. Next, we will move on to protecting your physical privacy.

If I walked up to most individuals, they wouldn’t give me their physical address either, yet it takes only five seconds to find it on Google. If you scroll down further on the page where you searched your name, you will see other sites like BeenVerified.com, Whitepages.com, and MyLife.com. All it takes for someone to find where you live on these sites is your full name, age range, and the state you live in. These sites aggregate personal information from public records and other sources and sell it to companies and individuals who may be interested. You will find your current address and the places you’ve lived for the past ten years, all of your relatives and their information, net worth, birth date, age, credit score, criminal history, and so on. The good news is that you can wipe your information from most of these sites by searching for their “opt out” form, which they are required to honor by law. If you want to take a further step, you can set up a third-party mail service or P.O. Box with a physical mailing address for less than $10 per month, to avoid having to give out your home address. Most people aren’t aware that even entities such as the Department of Motor Vehicles sell individuals’ address information, which gets aggregated by these companies. Protecting your physical address and other vital details can go a long way toward protecting your privacy.

As we wrap up, the key takeaway is to think about how your data can be compromised and to take steps to protect it before something happens. There are many more potential harms out there beyond identity theft. Rather than relying on the government to regulate data privacy in the US, we as individuals can take steps to reclaim our personal privacy and freedom.


Private Surveillance versus Public Monitoring
By Anonymous | September 18, 2020

In an era where digital privacy is regarded highly, we put ourselves in a contradictory position when we embed digital devices into every aspect of our lives.

One such device with a large fan club is the Ring doorbell, a product sold by Ring, an Amazon company. It serves the purpose of a traditional doorbell, but combined with its associated phone application, it can record audio and video and monitor motion detected within five to thirty feet of the fixture. Neighbors can then share their footage with each other for alerting and safety purposes.


Ring Video Doorbell from shop.ring.com

However, as most users of smart devices can anticipate, the data our devices generate rarely remains solely ours. Our data can enter the free market for alternate uses, analysis, and even sale. One of the main concerns that has surfaced for these nifty devices is the behind-the-scenes access to their data: Ring has been in partnership with law enforcement agencies across the United States. While the intentions of this partnership are broadcast as a way to increase efficiency in solving crime, it raises a larger question. The Washington Post’s Drew Harwell points out that “Ring users consent to the company giving recorded video to ‘law enforcement authorities, government officials and/or third parties’ if the company thinks it’s necessary to comply with ‘legal process or reasonable government request,’ according to its terms of service. The company says it can also store footage deleted by the user to comply with legal obligations.” This raises a larger ethical question of whether such policies infringe on an individual consumer’s autonomy under the Belmont principle of Respect for Persons. If we can’t control what our devices record and store, and what that data is used for, who should have that power?

What began as a product to protect personal property has gained the power to become a tool for nationwide monitoring, voluntary or involuntary. A product intended for private property surveillance can become a tool for public surveillance, given the authority law enforcement has to access device data. While the discussion of the power given to law enforcement agencies is larger in scope, in the context of the Ring device it leaves us wondering whether one product has acquired a beastly capability to become a tool for mass surveillance. This creates a direct link to the Fair Information Practice Principles: per the Collection Limitation principle, the collection of personal information should be limited and obtained by consent. Ring devices blur the definition of personal information in this instance. Is the record of when you leave and enter your home your personal information? If your neighbor captures your movements with their device, does their consent to police requests for access compromise your personal autonomy, since it is your activity being shared?

In contrast to this, an ethical dilemma also arises. If neighbors sharing their device data with law enforcement can help catch a dangerous individual (as the Ring terms and conditions contemplate), is there a moral obligation to share that data even without the consent of the recorded individual? This is the blurry line between informed consent and public protection.

Lastly, as law enforcement comes to rely more easily on devices like Ring, a larger question of protection equity arises. With a base cost of approximately $200 and a monthly subscription of approximately $15 to maintain the device’s monitoring, there is a possibility of protection disparity. Will areas where people can afford these devices inherently receive better protection from local law enforcement because it is faster and easier to solve those crimes? Per the Belmont principle of Justice, the burdens and benefits of civilian protection should be distributed evenly across the communities that local law enforcement agencies serve. Would it be equitable if police relied on devices like this as a precursor to offering aid in resolving crimes? On the contrary, these devices can also hinder law enforcement by giving a potential suspect early warning of police searches. Is that a fair advantage?


Police officer facing Ring doorbell

These foggy implications leave once crime-cautious citizens wondering whether these devices are blurring the lines of data privacy and ethics, and even contributing to a larger digital dystopia.