Class Blog – Page 2 – Data Science W231 | Behind the Data: Humans and Values

October 28, 2022

Credit-Score Algorithm Bias: Big Data vs. Human

Credit-Score Algorithm Bias: Big Data vs. Human
By Anonymous | October 27, 2022

Can Big Data eliminate human bias? The U.S. credit market shows minority groups disproportionately receiving unfavorable terms or outright denials on their loans, mortgages, or credit cards applications. Often, these groups tend to be subject to higher interest rates as opposed to their peer groups. Such decisions rely on the data available to lenders as well as their discretion, thus inserting human bias into the mix.

The reality is a stark contrast in access to credit for minorities, especially for African Americans on interests on business loans, bank branch density, fewer banking location in predominantly Black neighborhoods, and finally a stunted growth of local businesses in those areas. Several solutions were proposed to tackle this issue. The U.S. government signed the Community Reinvestment Act into law in 1977. Initiatives such as the African American minority depository institutions were put in place to increase access to banking for those underserved.

The ever-growing role of Big Data is an opportunity to remove prevalent selection biases in making lending decisions. Nonetheless, the limitations of Big Data are becoming apparent as these minority groups are still largely marginalized. Specifically, much of the existing machine learning models place a heavy emphasis on certain traits of the population to determine their credit worthiness. Demographic characteristics such as education, location, and income are closely intertwined with a population’s profile for instance. In some way, the human features translate into the data a specific way, which would then be validated on some outdated premise, scoring different groups various weights.

Big Data could eliminate human biases by assigning different weights to different population categories. Credit-Score Algorithms ensuring fair and non-biased decisions should put in place. In fact, demographic features used to calculate credit worthiness such as FICO scores may be beneficial if the algorithm used is fair, unbiased, and undergone a strict regulatory review. The key point is that there should not be a single standard that is applied to populations of such differing makeup.

A credit score is a mathematical model comparing an individual’s credit to millions of other people. It is an indication of credit worthiness based on review of past and current relationships with lenders, aiming to provide insight on how an individual manages their debts. Currently, credit scores algorithms leverage information on payment history (35%), credit utilization ratio (30%), length of credit history (15%), new credit (10%), and credit mix or types of credit (10%). The resulting number is then assigned into categories of very bad credit (300-499), bad credit (500–600), fair credit (601–660), good credit (661–780), and excellent credit (781–850).

Error in credit reporting for instance could lead to a long lasting and negative credit worthiness. This is particularly damaging to vulnerable groups since the repercussion can last years.

References:

October 28, 2022

WARNING: Your Filter Bubble Could Kill You

WARNING: Your Filter Bubble Could Kill You
By Author: Anonymous | October 27, 2022

Filter bubbles can distort the health information accessible to you and result in misinformed medical decision making.

Filter bubbles are the talk of the town. Nowadays the political and social landscape seems to be increasingly polarized and it’s easy to point fingers at the social media sites and algorithms that are tailoring the information we all consume. Some of the filters we find ourselves in are harmlessly generating content we want to see – such as a new hat to go with the gloves we just bought. On the other hand, the dark side of filter bubbles can be dangerous in situations where personal health information is customized for us in a way that leads to misinformed decisions.

The ‘Filter Bubble’ is a term coined by Eli Parisor in his book The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think. He describes our new “era of personalization” where social media platforms and search engines are using advanced models to craft content it predicts a user would like to see, based on past browsing history and site engagement. This virtually conjured world is intended to perfectly match a user’s preferences and opinions so that they’ll enjoy what they’re seeing and will keep coming back. All of this aims to increase user traffic and pad their bottom line.

An illustration of filter bubble isolation

The negative implications of filter bubbles have been extensively researched, especially as they relate to political polarization. A more serious consequence of tailored content and search results is when the filter interferes with the quality of health information produced. In his article The filter bubble and its effect on online personal health information, Harald Holone describes how the technology can present serious new challenges for doctors and their patients. People are increasingly turning to ‘Doctor Google’ or other social forums to ask questions and become informed on decisions that will affect their health instead of their primary care, licensed professional. Holone describes how the relationship between doctors and patients is shifting because people are starting to draw their own conclusions before stepping foot in a doctor’s office, which can diminish their medical authority. The real issue occurs when people start using their personalized feeds or biased public forums for guidance and are still expecting objective results. As touched on before, the reality is that search results can be heavily skewed by prior browsing history and there’s no guarantee that what turns up is dependable for medical direction and the results could give only a distorted version of reality. An effective example he used is a scenario in which a person is deciding whether or not to have their child vaccinated. A publicized case of this is the 2014 measles outbreak in California; many blamed the root cause on a spread of misinformation leading to lower vaccination rates. Health misinformation can also transcend to other decisions relating to cancer treatment, diets or epidemic outbreaks.

There is no simple answer to combat this issue because challenges arise that prevent people from accessing more trustworthy medical information. One problem is the opaqueness of the filter algorithms can make people oblivious to the fact that what they are seeing is only a sliver of the real world and not objective. It is also not evident to people where they fall within the realm of filter bubbles and in what direction their information is biased, especially if they have been in a siloe for a long time. Another dilemma that Halone presents is our lack of control over the content we see. Even if we do become conscious of our bubble, many of the prevalent social media sites don’t offer a way to intentionally shift back to center.

So what can we do to become objectively informed, especially when it comes to medical information? Several solutions have been proposed to help us regain our power. A browser tool named Balancer was created by Munson et al that can track the sites you visit and provide an overview of your reading habits and biases to increase awareness. Another interesting tool that could help combat misinformation is a Chrome add-on called Rbutr that informs a user if what they are currently viewing has been previously disputed or contradicted elsewhere. A more simple place to start could be deleting your browser history or using DuckDuckGo when searching for information that will be used for health decisions.

The conversations surrounding filter bubbles seem to be mainly political in nature but a scary reality is that these bubbles have more grave implications. If you’re not paying attention they can be what’s sneakily driving your medical decisions, not you. Luckily there are steps that you can – and should – take so that you’re privy to all the information you need to make an informed decision within your own best interest.

Sources

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937233/#R7

https://link.springer.com/article/10.1007/s10676-015-9380-y

https://www.wired.com/story/facebook-vortex-political-polarization/

https://fs.blog/filter-bubbles/

October 28, 2022

The search for the Holy Grail, privacy, profit & global good: Patagonia

The search for the Holy Grail, privacy, profit & global good: Patagonia
By Luis Delgado | October 27, 2022

As we focus on Mulligan, Koopman, et. al and their “Analytic for mapping privacy” it is easy to mistake the lofty goals of privacy as ideals not compatible with business, profits, and especially the most ambitious capitalistic goals. Yet a company has led the path of doing good while breaking revenue records for itself. They lead with a spirit of transparency and good will, not without fault, but with a clear desire to outweigh the negative effects on the earth. Despite inflationary and consumer discretionary spending pressures, those profits continue to rise. I explore the structure of Patagonia, and how it could map to modern tech companies and the careful balance of privacy and top/bottom lines. (Mulligan, Koopman, et. al., 2016)

Patagonia’s core values include:

“Build the best product”
“Cause no unnecessary harm”
“Use business to protect nature”
“Not bound by convention” (Patagonia.com/core-values, 2022)

Within these principles we see a clear venture into the execution of business and sales. How are these balanced with a desire to good? Is that even possible?

Companies like Patagonia are not necessary. No one really needs expensive brand jackets and gear. They are created for the sake of making profits for a company. The innovation comes in being clear about that goal while creating plans to balance that impact through innovation and measurable revenue going towards doing “good”. The concrete actions Patagonia has taken include:

Historically giving away 1% of profits to earth preservation causes.
Creating a trust with the sole intent of global protection.
Giving away the 2% share owned by the founder’s company.
Giving away the remaining 98% of common shares to the same trust.
Focusing on creating long lasting products.
Creating channels for secondhand sales.
Providing all products with lifelong repair support.

(Outside Magazine, Oct 2022)

These actions could easily adversely affect the bottom line of the company, but it is arguably exactly why their revenue has risen. The company’s honest execution, transparency, and focus away from profits has created a customer following attracted waves of new business.

These principles mirror the underlying goals of the Federal Trade Commission’s views for current and future legislation (FTC, Mar 2012). Pillars such as clarity of privacy practices, clear consent, and stronger transparency guide Patagonia’s business model. It is no surprise that Patagonia’s privacy policy also reflects these values. It includes clearly defined systems for information safeguarding, user correction, and customer choice (Patagonia.com/privacy-policy.html, 2022).

With this in mind, I attempt to create a parallel structure that utilizes “dimensions” (Mulligan, Koopman, Doty, Oct 2016), to describe company policy and execution in terms of privacy and sales:

Object: What is privacy for? / Who does revenue benefit?
Justification: Why should this be private? / Why care about the environment and customers?
Contrast concept: What is not private? / What unavoidable harm does the company produce?
Target: What is privacy about and of what? / Besides top and bottom line, who and what benefits from revenue?
Action / Offender: What/who defines a privacy violation? / What does a failure to act ethically look like? To customers? Towards global preservation?

If Patagonia can essentially reinvent itself with a focus on the earth and on customers and be incredibly successful, could that create a framework for other companies? Could it become a framework for legislation at the micro and macro level? I believe it can, and the case for it has been proven by many companies who are pivoting in this direction. If these companies could shift the status quo of what it means to be a profitable corporation, then could this influence the greater public to not only endorse, but expect this from modern business?

I believe that at many points, the study of clearcut ethical privacy practices can seem overwhelming to execute and to enforce, but examples like Patagonia show us that not only is it very much doable but can create greatly benefit not just the environment affected by a company, but the company itself.

References:

Mulligan D., Koopman, C., Doty, N. (October 2016), Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy. The Royal Society Publishing.
Federal Trade Commission (March 2012), Protecting Consumer Privacy in an Era of Rapid Change.
Peterson, C., (Oct 2022), Patagonia Will Donate $100 Million a year to Fight Climate Change. What Can that Money Accomplish?, Outside Magazine. https://www.outsideonline.com/business-journal/brands/patagonia-new-corporate-structure-analysis/
Patagonia’s privacy policy (Oct 2022), https://www.patagonia.com/privacy-policy.html
Patagonia’s core values (2022), https://www.patagonia.com/core-values/

October 28, 2022

We are the Product

We are the Product
By Anonymous | October 24, 2022

E-commerce companies want to sell you something, but they really want your digital personas as their bonus. E-commerce websites made a fortune in the past two years with YoY growth above 20% from 2020 and it’s only expected to increase from here on out. With all of this potential revenue floating around, it’s no wonder that e-commerce websites spend millions analyzing their customer behavior in order to figure out the secret recipe that causes a user to make a purchase.

After establishing a product flow and generating some revenue, the next idea that e-commerce companies have is user tracking behavior. Understanding why some people make purchases and using that data to influence those who have not purchased their product is the fastest way to grow their revenue. There are a few companies that are big in the space such as Google Analytics, Pendo, Mixpanel and Amplitude, but the one this blog will focus on is Heap. The core feature of Heap is its “Autocapture” technology which is advertised by Heap as “Heap provides the easiest and most comprehensive way to automatically capture the user interactions on your site, from the moment of installation forward. A single snippet grabs every click, swipe, tap, pageview, and fill — forever.” This data would give e-commerce companies the ability to analyze their user base and understand how to better advertise and market themselves in order to increase their revenue by moving the needle of the non-converted purchasers to become buyers. This in turn reveals a third source of revenue for these companies, the user behaviors themselves.

Here’s the basic flow for understanding how Heap works. First, you go and create a sign-up to get access to your company’s environment. After you sign-up you’re given a random “environment_id” which is a random integer value. You use this with their javascript snippet to push data from your website to the Heap Platform. Once you’ve installed the Heap javascript snippet (takes two seconds) you’ll start to see your users flood in through the Heap system.

From here you can define any particular event you are looking for to analyze it more carefully. Want to see how many users looked at a pop-up on your site before making a purchase, that’s easily done. Want to see how many users came from search companies like Google or Bing, that’s easily done. Want to see the geographic location of where your users are located, that’s also easily done. Through this tool you can get a one-stop shot to answer almost all of your analytical needs to understand your users and therefore take actions to nudge your users into a particular behavior. For example, say that you notice through the Heap dataset that if your users are exposed to certain pop-ups they’re 90% more likely to make a purchase on that pop-up. Well now you’ll show that pop-up to all of your users to see how much more revenue that generates. All of this is done with ease and that’s what makes this tool incredibly powerful and terrifying.

The privacy and ethical concerns around these companies (Google Analytics, Pendo, Mixpanel, Amplitude and Heap) are numerous. First, when visiting company websites that utilize these types of tracking companies, the mention of them is often small within the Cookie Banner consent and oftentimes there’s so much information in these cookie banner consent that it’s hard to know what each script is doing. Second, there’s no option to opt-out of having these companies track your usage around the company website unless you’re using an ad-blocker as that was found to prevent these companies’ snippets from running. Third, without going through and reading the privacy policy of these companies it’s hard to know what that data collection could be used for. The policies surrounding these companies are often vague, and are left open to interpretation. The ethical implications revolve around consent and whether there was an adequate amount of informed consent from the user as a data subject to be used in this regard for analysis later on.

All of these details point to the new future for e-commerce companies, where the product they’re selling is actually secondary to what they desire. What they really want is our data interacting with their website. All of those clicks, scrolls, form fill in are valuable to them in order to market their next product and idea for us to purchase. In this ever increasing advertising world, we must stay wary that the product we are purchasing isn’t at the cost of our digital personas.

October 26, 2022

Data Privacy in the Metaverse: Real Threats in a Virtual World

Data Privacy in the Metaverse: Real Threats in a Virtual World
By Alexa Coughlin | October 24, 2022

The metaverse promises to revolutionize the way we interact with each other and the world – but at what cost?

https://media.cybernews.com/images/featured/2022/01/metaverse-featured-image.jpg

Imagine having the ability to connect, work, play, learn, and even shop all from the comfort of your own home. Now imagine doing it all in 3D with a giant pair of goggles strapped to your face. Welcome to the metaverse.

Dubbed ‘the successor to the mobile internet’, the metaverse promises to revolutionize the way we engage with the world around us through a network of 3D virtual worlds. While Meta (formerly Facebook) is leading the charge into this wild west of virtual reality, plenty of other companies are along for the ride. Everyone from tech giants like Microsoft and NVIDIA, to retailers like Nike and Ralph Lauren is eager to try their hand at navigating cyberspace.

https://cdn.searchenginejournal.com/wp-content/uploads/2022/02/companies-active-in-the-metaverse-621cdb2c5fb11-sej-480×384.jpeg

Boundless as the promise of this new technology may seem, there’s no such thing as a free lunch when it comes to an industry where people (and their data) have historically been the product. The metaverse is no exception.

Through the core metaverse technology of VR and AR headsets, users will provide companies like Meta with access to new categories of personal information that have historically been extremely difficult, if not totally impossible, to track. Purveyors of this cyberworld will have access to extremely sensitive biometric data such as facial expressions, eye movements, and even a person’s gait at their fingertips. Medical conditions, emotions, preferences, mental states, and even subconscious thoughts – it’s all there. All just inferences waiting to be extracted and monetized. Having recently lost $10 billion in ad revenue to changes in Apple’s App Tracking Transparency feature, it’s not hard to fathom what plans a company like Meta might have in store for this new treasure trove of prized data.

https://venturebeat.com/wp-content/uploads/2022/01/5-Identity-and-authentication-in-the-metaverse.jpg?fit=2160%2C1078&strip=all

Given the extremely personal and potentially compromising nature of this data, some users have already started thinking about how they might combat this invasion of privacy. Some have chosen to rely on established data privacy concepts like differential privacy (i.e. adding noise to different VR tracking measures) to obfuscate their identities in the metaverse. Others still have turned to physical means of intervention, like privacy shields, to prevent eye movement tracking.

Creative as these approaches to privacy might be, users should not have to rely on the equivalent of statistical party tricks or glorified sunglasses to protect themselves from exploitation in the metaverse. That said, given the extreme variance in robustness of data regulations across the world, glorified sunglasses may not in fact be the worst option for some. For example, while most of this biometrically derived data may classify as ‘special category’ under the broad categorization of the EU’s GDPR regulation, it may not warrant special protections under the narrower definition of Illinois’ BIPA (Biometric Information Privacy Act) for instance. And that’s to say nothing of the 44 U.S. states with no active data protection laws at all.

With the metaverse still in its fledgling stages, awaiting mass market adoption, it is crucial that regulators take this opportunity to develop new laws that will protect users from these novel and specific threats to their personal data and safety. Until then, consumer education on data practices in the metaverse remains hugely important. It’s essential that users stay informed about what’s at stake for them in the real world when they enter the virtual one.

October 26, 2022

Do the financial markets care about data privacy?

Do the financial markets care about data privacy?
By Finnian Meagher | October 24, 2022

While long-term outperformance of large tech companies may indicate that privacy policy mishaps aren’t priced into stock prices, recent stock performances and relevant pieces of research may suggest otherwise – companies may be forced by the demands of the market and shareholders into adopting more rigid privacy policy practices.
The data privacy practices and policies of companies, especially large tech and social media organizations, are often scrutinized by the public, academics, and government regulators. However, as these companies have seen years of significant outperformance over the rest of the market, as exemplified by the below chart of the FAANG ETF (Facebook, Apple, Amazon, Netflix, and Google), it begs the question: do the financial markets care, and if so, what is the impact of negative press about privacy on market sentiment and stock prices?

Source: https://portfolioslab.com/portfolio/faang

Paul Bischoff of Comparitech believes that there exists a relationship between data breaches and falls in company share prices, including an average share price fall of -3.5% and underperformance of -3.5% compared to the NASDAQ in the months following a data breach. Bischoff observes that “in the long term, breached companies underperformed the market,” with share prices falling -8.6% on average one year after a data breach. Bischoff also notes that more severe data breaches (i.e., leaking highly sensitive information) see “more immediate drops in share price performance.” (Bischell)

However, not every data breach is equally punished. As an example of a deviation from the observations that Bischoff made, Facebook outperformed the NASDAQ after their April 2019 data breach of over 533 million people’s data:

Similarly, Harvard Business Review found that “a good corporate privacy policy can shield firms from the financial harm posed by a data breach… while a flawed policy can exacerbate the problems caused by a breach” (Martin et al.). HBS notes that firms that have high control and transparency in terms of data are “buffered from stock price damage during data breaches,” but “only 10% of Fortune 500 firms fit this profile” (Martin et al.). Additionally, it is observed that data breaches in neighboring companies within one’s industry can have a negative effect on your own company’s stock prices, but those that have strong control and transparency weather those storms better than others.

More recently, Apple announced that their privacy features could amount to billions of dollars in costs for the firm, and this rippled out into their neighboring competitors such as Twitter, Meta, Pinterest, and Snapchat. Zuckerberg, of Meta, noted that Apple’s newly introduced privacy features “could cost… $10 billion in lost sales [to Meta] this year” (Conger and Chen) which contributed to a drop of 26% of the company’s stock. Given that “people can’t really be targeted the way they were before,” (Conger and Chen) companies like Meta will have to go through a comprehensive rebuild of their business plan which could cause financial distress and ultimately share prices to drop. While there have been many factors in the general macroeconomic environment that have contributed to a pull back in share prices of large tech companies this year, implications of evolving privacy policies and practices have contributed to a sharp pullback in stock prices in the sector:

While looking at the long-term outperformance and growth of large tech companies’ financials and stock prices may lead one to take a cynical capitalist view of ‘profit over anything’ even at the expense of user privacy, research in the space and the recent exemplification of stock prices dropping amidst Apple’s change in privacy features shows that maybe these tech giants are not immune. Taking the other view, as ‘money makes the world go around,’ if markets signal the need for adoption of more strict privacy practices and policies, even without regulatory pressure, then companies could be influenced to have a more comprehensive adoption of the principals of data privacy practices.

Works Cited:

Bischoff, Paul. “How Data Breaches Affect Stock Market Share Prices.” Comparitech, 25 June 2021, https://www.comparitech.com/blog/information-security/data-breach-share-price-analysis/).
Conger, Kate, and Brian X. Chen. “A Change by Apple Is Tormenting Internet Companies, Especially Meta.” The New York Times, The New York Times, 3 Feb. 2022, https://www.nytimes.com/2022/02/03/technology/apple-privacy-changes-meta.html.
“Faang Portfolio.” PortfoliosLab, https://portfolioslab.com/portfolio/faang.
Martin, Kelly D, et al. “Research: A Strong Privacy Policy Can Save Your Company Millions.” Harvard Business Review, 30 Aug. 2021, https://hbr.org/2018/02/research-a-strong-privacy-policy-can-save-your-company-millions.

October 26, 2022

Is the New Internet Prone to Old School Hacks?

Is the New Internet Prone to Old School Hacks?
By Sean Lo | October 20, 2022

The blockchain is commonly heralded as the future of the internet, however, Opensea’s email phishing incident in June 2022 proved that we may still be far away from true online safety. We are currently in the midst of one of the biggest technological shifts in the past few decades. What many people are referring to as Web3.0; blockchain is the main technology that is helping build the future of the internet. Underneath this shift, is the idea that the new age internet will be decentralized. In other words, the internet should be owned by the collective group of people that actually uses it, v.s what we have today.

In June 2022, Opensea, one of the largest Non-fungible token (NFT) marketplaces got hacked and lost 17 of their customers their entire NFT collection. It was reported that the value of the combined stolen assets was north of $2 million dollars. The question is then, how is it possible that the blockchain got hacked? Wasn’t the security of the blockchain the main feature that is promised in the new age internet? These are 2 very valid questions as a flaw in the blockchain would ultimately highlight the potential flaws of using the blockchain entirely. What was interesting about this specific incident was that the hack was actually a simple phishing scam, a type of scam that existed since the beginning of email. Opensea reported that an employee at customer.io, the email automation and lifecycle marketing platform Opensea uses, downloaded their email database to send a phishing email. In the attached image below reads the email that was sent to all Opensea customers. This was a couple months after the long awaited Ethereum merge, and the email used this opportunity to trick some users into signing malicious contract.

Opensea phishing email

As mentioned before, social engineering and phishing hacks have always been part of the internet. In fact, the “Nigerian prince” email scam still ranks in roughly $700K a year in lost funds. What makes this specific phishing incident so interesting is because it was done through a Web3.0 native company, and the stolen funds were all stolen directly on the blockchain. By pretending to be Opensea, they were able to get customers to sign a smart contract which the contract proceeded to drain the signers digital wallet. For context, smart contracts are a set of instructions that are binded by the blockchain, think of it as a set of instructions for the computer program to run. Smart contracts are written in the coding language called Solidity, so unless you can read that language, its highly likely that you aren’t aware of what you are signing.

Fake smart contract message

As we venture into the world of Web3.0 where blockchain is the underlying technology that is central to many types of online transactions, there comes a question of how liability and security should be governed in this new world. We’re still very early in the innings around Web3.0 adoption, and I truly believe we’re still likely half a decade away from true mass adoption. On top of all the existing Web2.0 regulations that companies need to follow, the government must also step up to create new laws to keep the regular citizen from malicious online acts. The anonymity of the blockchain does pose potential risks to the entire ecosystem, which is why I believe there must be federal laws around the technology to push us towards mass adoption. It’s really a matter of when rather than if, as there is a pretty clear increase in uses across the entire tech industry.

October 26, 2022

Anonymity as a Means of Abusing Privacy

Anonymity as a Means of Abusing Privacy
By Mima Mirkovic | October 20, 2022

It’s spooky season, and what’s spookier than the Dark Web?

Where’d All the Web Go?
Traditional search engines like Google, Yahoo, and Bing make up 4% of the “surface web”… but where’s the remaining 96%???

At the intersection of anonymity and privacy exists the Dark Web, an elusive section of the internet not indexed by web crawlers and home to 3,000 hidden sites. The market share of the dark web is 6%, serving as a secret marketplace notorious for illicit drugs, arms dealing, human trafficking, major fraud, and more.

This brings me to an ethical, head-scratching conundrum that I’ve been mulling over for years: how is any of this legal?

It isn’t, but it is.

When privacy originated in the 14th century, I don’t think it ever expected the internet to exist. The arrival of the internet mutated common definitions of privacy, but the arrival of the Dark Web completely obliterated these definitions because it offered a means through which privacy could be abused: anonymity.

Dark Web Origins

Time for a history lesson!

In the late 1990s, the US Department of Defense developed an encrypted network using “onion routing” to protect sensitive communications between US spies. This network was intended to protect dissidents, whistleblowers, journalists, and advocates for democracy in authoritarian states.

In the early 2000s, a group of computer scientists used onion routing to develop the Tor (“The Onion Router”) Project, a nonprofit software organization whose mission is to “advance human rights and defend your privacy online through free software and open networks”. By simply downloading the Tor browser, anyone – ANYONE – can access the dark web. The Tor browser works to anonymize your location and protect your data from hackers and web trackers.

In short, the Tor browser offers users an unmatched level of security and protects your human right to privacy via anonymity, but not all who lurk in the shadows are saints.

Ethics, Schmethics
Privacy is malleable. Its definition is groundless. As Solove would say, “privacy suffers from an embarrassment of meanings”. Privacy is bound to whichever context it is placed in, which, conjoined with anonymity, invites the opportunity for violation.

Through a critical multi-dimensional analytic lens, privacy suffers from its own internal complexity. In the context of onion routing, the malleable nature of privacy allows for it to be used for harm despite its objectives, justifications, and applications being intended for good:

Objective – Provide an encrypted, anonymized network
Justification – Privacy is a human right
Application – A secure network for individuals to avoid censorship and scrutiny from their authoritarian regimes

From the “good guy” perspective, the Tor Project was created to uphold an entity we value the most. You could even argue that it was an ethical approach to protecting privacy. In fact, the Tor Project upholds the central tenets of The Belmont Report: users are given full autonomy over their own decisions, users are free from obstruction or legal harm, and every user is given access to the same degree of privacy.

On the flip side, the “bad guys” quickly learned that their malicious actions online could be done without trace or consequence. Take these stats for example: 50,000 terrorist groups operate on the dark web, 8.1% of listings on darknet marketplaces are for illicit drugs, and illegal financing takes up around 6.3% of all dark web markets. You can purchase someone’s credit card number for as little as $9 on the dark web – how is any of this respectful, just, or fair?

Think about it this way…

In 2021, a hacker posted 700M LinkedIn records on the dark web, exposing 92% of LinkedIn users. Your data, the one you work hard to protect, was probably (if not almost certainly) exposed in that breach. That means that your phone number, geolocation, and connected social media accounts were posted for sale by hackers on the dark web. The “bad guys” saw an opportunity to exploit your privacy, my privacy, your friends’ privacy, and your family’s privacy in exchange for a profit – yet their actions were permissible under the guise of privacy and anonymity.

Let’s look at this example through the lens of the Belmont Report:

* Respect for Persons – Hacking is clearly detrimental to innocent users of the web, yet hacking is a repeatable offense and difficult to prevent from occurring
* Beneficence – Hackers don’t consider the potential risks that would befall on innocent people, only the benefits they would stand to gain from exposing these accounts
* Justice – 700M records were unfairly exposed, and the repercussions were not evenly distributed nor was there appropriate remediation

There are thousands of more examples (some much more horrifying), where we could apply these frameworks to show how anonymity enables and promotes the abuse of our human right to privacy. The main takeaway is that no, these actions do not reflect a respect for persons approach, they’re not just in nature, and they’re certainly not fair.

Conclusion
Privacy is a fundamental part of our existence and it deserves to be protected – to an extent. The Tor browser originally presented itself as a morally righteous platform for users to evade censorship, but the dark deeds that occur on darknets nowadays defeat the purpose of privacy entirely. With that in mind, the Belmont Report is a wonderful framework for assessing data protection, but I believe it requires some (major) tweaks to encompass more extreme scenarios.

At the end of the day, your privacy is not nearly as protected as the privacy of criminals on the dark web. Criminals are kept safe because privacy is a human right, yet they are permitted to abuse this privacy in a way that exploits innocent people, harms society, and provides a hub for lawbreaking of the highest degree. At the same time, the law enforcement and government agencies that work to uphold privacy are the same ones breaking this human right in order to catch these “bad guys”. If you ever find yourself scouring through the dark web, proceed with caution, because even in the most private of locations, you’re always being watched!

Like I said earlier – an ethical, head-scratching conundrum that I will continue to mull over for years.

References

[1] Dark Web and Its Impact in Online Anonymity and Privacy: A Critical Analysis and Review
[2] How Much of the Internet is the Dark Web in 2022?
[3] The Truth About The Dark Web – IMF F&D.
[4] Taking on the Dark Web: Law Enforcement Experts ID Investigative Needs | National Institute of Justice

October 26, 2022

Are You Playing Games or Are the Games Playing You?

Are You Playing Games or Are the Games Playing You?
By Anonymous | October 21, 2022

Is our data from playing games being used to manipulate our behavior? When we or our children play online games, there is no doubt we generate an enormous amount of data. This data includes what we would expect from a Google or Facebook (such as location, payment, or device data), but what is not often considered is that this also includes biometric and detailed decision data from playing the game itself. Of course, this in-game data can be used for a variety of purposes such as fixing bugs or improving the game experience for users, but many times it is used to exploit players instead.

Source: https://www.hackread.com/gaming-data-collection-breach-user-privacy/

Data Usage
To be more specific, game developers nowadays are utilizing big datasets like those shown in the image above to gain insights into how to keep players playing for longer and spend more money on the game.[1] While big developers have historically had analytics departments in order to figure out how users were playing their game, even smaller developers today have access to middleware created by external parties that can help to refine their monetization strategies.[2] Some gaming companies even aggregate external sources of data on users such as surveys that infer personality characteristics based on how they play the game. In fact, game developers specifically use decision data like dialogue choices to build psychological profiles of their players, allowing the developers to figure out how impulsive or social they are, isolating players that might be more inclined to spend money or be more engaged.[3] Games such as Pokemon GO can take it a step further by aggregating data from our phones such as facial expressions and room noises in order to further refine this profile.

To capitalize on these personality profiles, developers then build in “nudges” into the games that are used to manipulate players into taking certain actions such as purchasing online goods or revealing personal information. This includes on-screen hints about cash shops, locking content behind a pay wall, or forcing players to engage in loot box mechanics in order to remain competitive. This is highly profitable from games ranging from FIFA to Candy Crush, allowing their parent companies to generate billions in revenue per year.[1]

Source: https://www.polygon.com/features/2019/5/9/18522937/video-game-privacy-player-data-collection

Aside from microtransactions, developers can also monetize this data through targeted advertising to their users, matching the best users based on the requirements of the advertiser.[4] Online games not only provide advertisers with the ability to reach a large-scale audience, but to engage players through rewarded ads as well.

Worse Than a Casino for Children
Given external parties ranging from middleware providers to advertisers have access to intimate decision-making data, this brings up a whole host of privacy concerns. If we were to apply Nissenbaum’s Contextual Integrity framework for privacy to gaming, we could compare online games to a casino. In fact, loot boxes specifically function like a slot machine in that it provides uncertain reward and dopamine spikes to players if they win, encouraging addiction. Similar to how a casino targets “whales” that account for the majority of their revenue, online games also try do the same, allowing them to maximize revenue through microtransactions. Yet unlike casinos, online games are not only allowed, but prevalent amongst young adults under the age of 18 and has problems that extend beyond gambling addiction. In Roblox, one of the most popular children’s games in the world (that allows children to monetize in-game items in the games that they create), there have been numerous reports of financial exploitation, sexual harassment, and threats of dismissal for noncompliance.[5]

Conclusion
While there have been efforts to raise awareness about the manipulative practices of online gaming, the industry still has a long way to go before a clear regulatory framework is established. The California Privacy Rights Act is a step in the right direction as it prohibits obtaining consent through “dark patterns” (nudges), but whether gamers will ultimately have the ability to limit sharing of decision data or delete it from the online game itself remains to be seen.

Sources:
[1] https://www.brookings.edu/techstream/a-guide-to-reining-in-data-driven-video-game-design-privacy/

[2] https://www.wired.com/story/video-games-data-privacy-artificial-intelligence/

[3] https://www.polygon.com/features/2019/5/9/18522937/video-game-privacy-player-data-collection

[4] https://www.hackread.com/gaming-data-collection-breach-user-privacy/

[5] https://www.theguardian.com/games/2022/jan/09/the-trouble-with-roblox-the-video-game-empire-built-on-child-labour

October 26, 2022

Synthetic Data: Silver Bullet?

Synthetic Data: Silver Bullet?
By Vinod Viswanathan | October 20, 2022

One of the biggest harms that organizations and government agencies can cause to customers and citizens is exposing personal information arising out of security breaches exploited by bad actors, both internal and external. A lot of the security vulnerability is the result of a conflict between securing data while allowing safe sharing of data; goals that are primarily at odds with each other.

Synthetic data is artificially generated data through machine learning techniques that model the real world. Artificial data, to qualify as synthetic data, must have two properties. It must retain all the statistical properties of the real world data and it must not be possible to reconstruct the real world data from the artificial data. This technique was first developed in 1993, in Harvard University, by Prof. Donald Rubin who wanted to anonymize census data for his studies and was failing to do it. He instead used statistical methods to create an artificial dataset that mirrored the population statistics of the census data allowing him and his colleagues to analyze and draw inferences without compromising the privacy of the citizens. In addition to privacy, synthetic data allowed for large data sets to be generated and solved the data scarcity problem as well.

As privacy legislation progressed along with efficient large-scale compute, synthetic data started to play a bigger role in machine learning and artificial intelligence by providing anonymous, safe, accurate, large-scale, flexible training data. The anonymity guarantees allowed collaboration; cross-team, cross-organization and cross-industry collaboration providing cost effective research.

Synthetic data mirrors the real world including its biases. One way the bias shows up is through the underrepresentation of certain classifications (groups) in the dataset. As this technique is capable of generating data, it can be used to boost the representation in the dataset while being representative of the classification.

Gartner report, released in June 2022, estimates that by 2030 synthetic data will completely replace real data in training models.

So, have we solved the data problem ? Is synthetic data the silver bullet that is going to allow R&D with personal data with all of the privacy harms.

Definitely not.

Synthetic data can improve representation only if a human involved in the research is able to identify the bias in the data. Bias, by nature, is implicit in humans. We have it and typically we do not know or realize it. Therefore, it is hard for us to pick

it up in the dataset; real or synthetic. This realization of bias continues to be a problem even though safe sharing and collaboration with a diverse group of researchers increases the odds of removing the blindfolds and addressing the inherent bias in the data.

The real world is hardly constant and the phrase “the only constant in life is change” is unfortunately true. The safe, large, accurate and anonymous dataset that can support open access can blind researchers into using these datasets even when the real world has changed. Depending on the application, even a small change in the real world can introduce large deviations in the inferences and predictions from the models that use the incorrect dataset.

Today, the cost of computing power needed to generate synthetic datasets is expensive and not all organizations can afford it. The cost is exponentially higher if the datasets involve rich media assets; images and video, which are very common in the healthcare and transportation automation industries. It is also extremely hard to validate synthetic datasets and their source real world data to generate identical results in all research experiments.

The ease and the advantages of synthetic data can incentivize laziness in researchers, where the researchers simply stop doing the hard work of collecting real-world data and default to synthetic data. In a worst-case scenario, deep-fakes for example makes it extremely difficult to distinguish real and synthetic data allowing misinformation to propagate into the real world and through real world events and data back into synthetic data creating a vicious cycle with devastating consequences.

In summary, don’t drop your guard if you are working with synthetic data. What Is Synthetic Data? Gerard Andrews, Nvidia, June 2021

Sources:

https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/

The Real Deal About Synthetic Data, MIT Sloan Review, Winter 2022

https://sloanreview.mit.edu/article/the-real-deal-about-synthetic-data/

How Synthetic Data is Accelerating Computer Vision

https://hackernoon.com/how-synthetic-data-is-accelerating-computer-vision-xp153 w6q