Is the New Internet Prone to Old School Hacks?
By Sean Lo | October 20, 2022

The blockchain is commonly heralded as the future of the internet; however, Opensea’s email phishing incident in June 2022 showed that we may still be far from true online safety. We are in the midst of one of the biggest technological shifts of the past few decades, what many people are calling Web3.0, and blockchain is the main technology building this next iteration of the internet. Underneath this shift is the idea that the new internet will be decentralized: in other words, owned by the collective group of people who actually use it, in contrast to what we have today.

In June 2022, Opensea, one of the largest non-fungible token (NFT) marketplaces, was hacked, and 17 of its customers lost their entire NFT collections. The combined value of the stolen assets was reported to be north of $2 million. The question, then, is how is it possible that the blockchain got hacked? Wasn’t security the main feature promised by this new-age internet? These are two very valid questions, since a flaw in the blockchain itself would highlight the potential flaws of using the technology at all. What was interesting about this incident is that the hack was actually a simple phishing scam, a type of scam that has existed since the beginning of email. Opensea reported that an employee at customer.io, the email automation and lifecycle marketing platform Opensea uses, downloaded its email database and used it to send a phishing email. The image below shows the email that was sent to Opensea customers, which exploited anticipation around the long-awaited Ethereum merge to trick some users into signing a malicious contract.

Opensea phishing email

As mentioned before, social engineering and phishing attacks have always been part of the internet. In fact, the “Nigerian prince” email scam still rakes in roughly $700K a year in stolen funds. What makes this particular phishing incident so interesting is that it was carried out through a Web3.0-native company, and the funds were stolen directly on the blockchain. By pretending to be Opensea, the attackers got customers to sign a smart contract that proceeded to drain the signer’s digital wallet. For context, smart contracts are sets of instructions bound to the blockchain; think of them as a program the blockchain executes on the parties’ behalf. Smart contracts are written in a programming language called Solidity, so unless you can read that language, it’s highly likely you aren’t aware of what you are signing.
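
To make the mechanics concrete, here is a minimal Python sketch of one common NFT-phishing pattern: the victim is asked to sign a transaction whose calldata they never read, and decoding the first four bytes (the function selector) reveals that it is really a setApprovalForAll call handing an attacker control of all of their NFTs. This is an illustration, not OpenSea’s or any wallet’s actual code, and the addresses are invented.

```python
# Minimal sketch of why signing an opaque transaction can drain a wallet.
# Not OpenSea's or any wallet's real code; the addresses below are invented.
# The first 4 bytes of a transaction's calldata are a "function selector"
# that tells the contract which function to run.

# Well-known ERC-721 selectors (first 4 bytes of keccak256 of the signature).
KNOWN_SELECTORS = {
    "a22cb465": "setApprovalForAll(address,bool)",        # hands an operator control of ALL your NFTs
    "23b872dd": "transferFrom(address,address,uint256)",  # moves a single token
}

def explain_calldata(calldata_hex: str) -> str:
    """Turn raw calldata into a human-readable warning."""
    data = calldata_hex.removeprefix("0x")
    selector = data[:8]
    meaning = KNOWN_SELECTORS.get(selector, "an unknown function")
    if selector == "a22cb465":
        operator = "0x" + data[8:72][-40:]  # last 20 bytes of the first argument word
        return (f"This transaction calls {meaning}: it makes {operator} an operator "
                f"for every NFT you own in this collection. An operator can transfer "
                f"them all out without asking again.")
    return f"This transaction calls {meaning}."

# What the phishing victim is actually asked to sign (fake attacker address):
malicious_calldata = (
    "0xa22cb465"
    + "0" * 24 + "deadbeef" * 5   # operator = attacker's address
    + "0" * 63 + "1"              # approved = true
)
print(explain_calldata(malicious_calldata))
```

Wallets increasingly surface this kind of decoded warning before you sign, and reading it is the best defense against this class of scam.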

Fake smart contract message

As we venture into the world of Web3.0, where blockchain is the underlying technology central to many types of online transactions, the question arises of how liability and security should be governed in this new world. We are still in the early innings of Web3.0 adoption, and I believe we are likely half a decade away from true mass adoption. On top of all the existing Web2.0 regulations that companies must follow, the government must also step up and create new laws to protect the average citizen from malicious online acts. The anonymity of the blockchain does pose risks to the entire ecosystem, which is why I believe federal laws around the technology are needed to push us toward mass adoption. It is really a matter of when rather than if, as there is a clear increase in use across the entire tech industry.

Anonymity as a Means of Abusing Privacy
By Mima Mirkovic | October 20, 2022

It’s spooky season, and what’s spookier than the Dark Web?

Where’d All the Web Go?
Traditional search engines like Google, Yahoo, and Bing only index the “surface web,” which makes up about 4% of the internet… but where’s the remaining 96%?

At the intersection of anonymity and privacy sits the Dark Web, an elusive section of the internet that is not indexed by web crawlers and is home to some 3,000 hidden sites. Making up roughly 6% of the internet, it serves as a secret marketplace notorious for illicit drugs, arms dealing, human trafficking, major fraud, and more.

This brings me to an ethical, head-scratching conundrum that I’ve been mulling over for years: how is any of this legal?

It isn’t, but it is.

When the concept of privacy first took shape in the 14th century, no one could have anticipated the internet. The arrival of the internet mutated common definitions of privacy, but the arrival of the Dark Web completely obliterated them, because it offered a means through which privacy could be abused: anonymity.

Dark Web Origins

Time for a history lesson!

In the late 1990s, the US Department of Defense developed an encrypted network using “onion routing” to protect sensitive communications between US spies. This network was intended to protect dissidents, whistleblowers, journalists, and advocates for democracy in authoritarian states.

In the early 2000s, a group of computer scientists used onion routing to develop the Tor (“The Onion Router”) Project, a nonprofit software organization whose mission is to “advance human rights and defend your privacy online through free software and open networks”. By simply downloading the Tor browser, anyone – ANYONE – can access the dark web. The Tor browser works to anonymize your location and protect your data from hackers and web trackers.
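
As a rough illustration of what the “onion” in onion routing means, the toy Python sketch below wraps a message in three layers of encryption, one per relay, so that each relay can peel exactly one layer and learn only where to send the rest. It uses symmetric Fernet keys as stand-ins for the keys a Tor client negotiates with its relays; the real protocol is considerably more involved.

```python
# Toy illustration of the "onion" in onion routing (not the actual Tor protocol).
# The client wraps its message in one encryption layer per relay; each relay can
# peel exactly one layer and learns only the next hop, never the whole path.
from cryptography.fernet import Fernet

# Stand-ins for the keys a Tor client negotiates with its guard, middle, and exit relays.
relay_keys = [Fernet(Fernet.generate_key()) for _ in range(3)]

def build_onion(message: bytes, keys) -> bytes:
    """Encrypt for the exit relay first, then wrap that in the middle and guard layers."""
    for key in reversed(keys):
        message = key.encrypt(message)
    return message

onion = build_onion(b"GET https://example.org/", relay_keys)

for i, key in enumerate(relay_keys):
    onion = key.decrypt(onion)  # each relay removes only its own layer
    layer = "the plaintext request" if i == len(relay_keys) - 1 else "another encrypted blob"
    print(f"relay {i} peeled its layer and sees {layer}")

print(onion)  # only the exit relay ever recovers b"GET https://example.org/"
```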

In short, the Tor browser offers users an unmatched level of security and protects their human right to privacy via anonymity, but not all who lurk in the shadows are saints.

Ethics, Schmethics
Privacy is malleable. Its definition is groundless. As Solove would say, “privacy suffers from an embarrassment of meanings”. Privacy is bound to whichever context it is placed in, which, conjoined with anonymity, invites the opportunity for violation.

Through a critical multi-dimensional analytic lens, privacy suffers from its own internal complexity. In the context of onion routing, the malleable nature of privacy allows for it to be used for harm despite its objectives, justifications, and applications being intended for good:

  • Objective – Provide an encrypted, anonymized network
  • Justification – Privacy is a human right
  • Application – A secure network for individuals to avoid censorship and scrutiny from their authoritarian regimes

From the “good guy” perspective, the Tor Project was created to uphold an entity we value the most. You could even argue that it was an ethical approach to protecting privacy. In fact, the Tor Project upholds the central tenets of The Belmont Report: users are given full autonomy over their own decisions, users are free from obstruction or legal harm, and every user is given access to the same degree of privacy.

On the flip side, the “bad guys” quickly learned that their malicious actions online could be done without trace or consequence. Take these stats for example: 50,000 terrorist groups operate on the dark web, 8.1% of listings on darknet marketplaces are for illicit drugs, and illegal financing takes up around 6.3% of all dark web markets. You can purchase someone’s credit card number for as little as $9 on the dark web – how is any of this respectful, just, or fair?

Think about it this way…

In 2021, a hacker posted 700M LinkedIn records on the dark web, exposing 92% of LinkedIn users. Your data, the data you work hard to protect, was probably (if not almost certainly) exposed in that breach. That means your phone number, geolocation, and connected social media accounts were posted for sale by hackers on the dark web. The “bad guys” saw an opportunity to exploit your privacy, my privacy, your friends’ privacy, and your family’s privacy in exchange for a profit, yet their actions were permissible under the guise of privacy and anonymity.

Let’s look at this example through the lens of the Belmont Report:

* Respect for Persons – Hacking is clearly detrimental to innocent users of the web, yet hacking is a repeatable offense and difficult to prevent from occurring
* Beneficence – Hackers don’t consider the potential risks that would befall innocent people, only the benefits they stand to gain from exposing these accounts
* Justice – 700M records were unfairly exposed, and the repercussions were not evenly distributed nor was there appropriate remediation

There are thousands more examples (some far more horrifying) where we could apply these frameworks to show how anonymity enables and promotes the abuse of our human right to privacy. The main takeaway is that no, these actions do not reflect a respect-for-persons approach, they are not just in nature, and they are certainly not fair.

Conclusion
Privacy is a fundamental part of our existence and it deserves to be protected – to an extent. The Tor browser originally presented itself as a morally righteous platform for users to evade censorship, but the dark deeds that occur on darknets nowadays defeat the purpose of privacy entirely. With that in mind, the Belmont Report is a wonderful framework for assessing data protection, but I believe it requires some (major) tweaks to encompass more extreme scenarios.

At the end of the day, your privacy is not nearly as protected as the privacy of criminals on the dark web. Criminals are kept safe because privacy is a human right, yet they are permitted to abuse this privacy in a way that exploits innocent people, harms society, and provides a hub for lawbreaking of the highest degree. At the same time, the law enforcement and government agencies that work to uphold privacy are the same ones breaking this human right in order to catch these “bad guys”. If you ever find yourself scouring through the dark web, proceed with caution, because even in the most private of locations, you’re always being watched!

Like I said earlier – an ethical, head-scratching conundrum that I will continue to mull over for years.

References

[1] Dark Web and Its Impact in Online Anonymity and Privacy: A Critical Analysis and Review
[2] How Much of the Internet is the Dark Web in 2022?
[3] The Truth About The Dark Web – IMF F&D.
[4] Taking on the Dark Web: Law Enforcement Experts ID Investigative Needs | National Institute of Justice

Are You Playing Games or Are the Games Playing You?
By Anonymous | October 21, 2022

Is our data from playing games being used to manipulate our behavior? When we or our children play online games, there is no doubt we generate an enormous amount of data. This data includes what we would expect from a Google or Facebook (such as location, payment, or device data), but what is not often considered is that this also includes biometric and detailed decision data from playing the game itself. Of course, this in-game data can be used for a variety of purposes such as fixing bugs or improving the game experience for users, but many times it is used to exploit players instead.


Source: https://www.hackread.com/gaming-data-collection-breach-user-privacy/

Data Usage
To be more specific, game developers now use big datasets like those shown in the image above to learn how to keep players playing longer and spending more money on the game.[1] While big developers have historically had analytics departments to figure out how users played their games, even smaller developers today have access to middleware from external parties that helps refine their monetization strategies.[2] Some gaming companies also aggregate external sources of data on users, such as surveys that infer personality characteristics from how they play. In fact, developers specifically use decision data like dialogue choices to build psychological profiles of their players, figuring out how impulsive or social they are and isolating players who might be more inclined to spend money or stay engaged.[3] Games such as Pokemon GO can take it a step further by aggregating data from our phones, such as facial expressions and room noises, to refine these profiles further.

To capitalize on these personality profiles, developers then build “nudges” into their games that manipulate players into taking certain actions, such as purchasing online goods or revealing personal information. These include on-screen hints about cash shops, locking content behind a paywall, or forcing players to engage in loot box mechanics in order to remain competitive. This is highly profitable for games ranging from FIFA to Candy Crush, allowing their parent companies to generate billions in revenue per year.[1]
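
To make the profiling-and-nudging loop concrete, here is a hypothetical Python sketch. The field names, weights, and thresholds are all invented for illustration; no studio’s actual code or models are shown.

```python
# Hypothetical sketch of the profiling-and-nudging loop described above.
# Field names, weights, and thresholds are invented; this is no studio's real code.
from dataclasses import dataclass

@dataclass
class PlayerProfile:
    avg_session_minutes: float
    purchases_last_30d: int
    losses_followed_by_retry: float    # share of losses where the player immediately retried
    impulsive_dialogue_choices: float  # share of "act now" dialogue picks

def spend_propensity(p: PlayerProfile) -> float:
    """Crude score combining engagement and impulsivity signals (illustrative weights)."""
    return (0.3 * min(p.avg_session_minutes / 60, 1.0)
            + 0.3 * min(p.purchases_last_30d / 5, 1.0)
            + 0.2 * p.losses_followed_by_retry
            + 0.2 * p.impulsive_dialogue_choices)

def choose_nudge(p: PlayerProfile) -> str:
    """Decide which monetization nudge (if any) to surface to this player."""
    score = spend_propensity(p)
    if score > 0.7:
        return "show limited-time loot box offer after the next loss"
    if score > 0.4:
        return "highlight the cash shop on the post-match screen"
    return "no nudge"

player = PlayerProfile(avg_session_minutes=95, purchases_last_30d=3,
                       losses_followed_by_retry=0.8, impulsive_dialogue_choices=0.6)
print(choose_nudge(player))  # -> limited-time loot box offer
```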


Source: https://www.polygon.com/features/2019/5/9/18522937/video-game-privacy-player-data-collection

Aside from microtransactions, developers can also monetize this data through targeted advertising to their users, matching the best users based on the requirements of the advertiser.[4] Online games not only provide advertisers with the ability to reach a large-scale audience, but to engage players through rewarded ads as well.

Worse Than a Casino for Children
Given that external parties ranging from middleware providers to advertisers have access to intimate decision-making data, this brings up a whole host of privacy concerns. If we were to apply Nissenbaum’s Contextual Integrity framework for privacy to gaming, we could compare online games to a casino. In fact, loot boxes function much like a slot machine: they provide uncertain rewards and dopamine spikes to players who win, encouraging addiction. Similar to how a casino targets the “whales” that account for the majority of its revenue, online games try to do the same, allowing them to maximize revenue through microtransactions. Yet unlike casinos, online games are not only permitted for, but prevalent among, players under the age of 18, and the problems extend beyond gambling addiction. In Roblox, one of the most popular children’s games in the world (which allows children to monetize in-game items in the games they create), there have been numerous reports of financial exploitation, sexual harassment, and threats of dismissal for noncompliance.[5]
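
A quick simulation makes the slot-machine comparison tangible. With a purely hypothetical 1% drop rate and a $3 box price (not any real game’s published odds), the expected spend to obtain one specific item is around $300, and an unlucky player spends far more, the same uncertain-reward dynamic that drives gambling addiction.

```python
# Simulating the slot-machine dynamic: how much does a player spend, on average,
# to pull one specific item out of a loot box? Drop rate and price are hypothetical.
import random
import statistics

DROP_RATE = 0.01   # illustrative 1% chance per box
BOX_PRICE = 3.00   # illustrative price per box, in dollars

def boxes_until_drop() -> int:
    """Open boxes until the desired item drops; return how many it took."""
    opened = 0
    while True:
        opened += 1
        if random.random() < DROP_RATE:
            return opened

costs = [boxes_until_drop() * BOX_PRICE for _ in range(10_000)]
print(f"mean spend:            ${statistics.mean(costs):.2f}")   # ~ $300 (1/DROP_RATE boxes)
print(f"median spend:          ${statistics.median(costs):.2f}")
print(f"95th-percentile spend: ${sorted(costs)[int(0.95 * len(costs))]:.2f}")
```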

Conclusion
While there have been efforts to raise awareness about the manipulative practices of online gaming, the industry still has a long way to go before a clear regulatory framework is established. The California Privacy Rights Act is a step in the right direction as it prohibits obtaining consent through “dark patterns” (nudges), but whether gamers will ultimately have the ability to limit sharing of decision data or delete it from the online game itself remains to be seen.

Sources:
[1] https://www.brookings.edu/techstream/a-guide-to-reining-in-data-driven-video-game-design-privacy/

[2] https://www.wired.com/story/video-games-data-privacy-artificial-intelligence/

[3] https://www.polygon.com/features/2019/5/9/18522937/video-game-privacy-player-data-collection

[4] https://www.hackread.com/gaming-data-collection-breach-user-privacy/

[5] https://www.theguardian.com/games/2022/jan/09/the-trouble-with-roblox-the-video-game-empire-built-on-child-labour

Synthetic Data: Silver Bullet?
By Vinod Viswanathan | October 20, 2022

One of the biggest harms that organizations and government agencies can cause to customers and citizens is exposing personal information through security breaches exploited by bad actors, both internal and external. Much of this vulnerability results from the conflict between securing data and sharing it safely, goals that are fundamentally at odds with each other.

Synthetic data is data generated artificially, using statistical and machine learning techniques that model the real world. To qualify as synthetic data, artificial data must have two properties: it must retain the statistical properties of the real-world data, and it must not be possible to reconstruct the real-world data from it. The technique was first developed in 1993 at Harvard University by Prof. Donald Rubin, who wanted to anonymize census data for his studies and was struggling to do so. He instead used statistical methods to create an artificial dataset that mirrored the population statistics of the census data, allowing him and his colleagues to analyze and draw inferences without compromising the privacy of citizens. In addition to privacy, synthetic data allowed large datasets to be generated, solving the data scarcity problem as well.

As privacy legislation progressed along with efficient large-scale compute, synthetic data started to play a bigger role in machine learning and artificial intelligence by providing anonymous, safe, accurate, large-scale, flexible training data. The anonymity guarantees allowed collaboration; cross-team, cross-organization and cross-industry collaboration providing cost effective research.

Synthetic data mirrors the real world, including its biases. One way this bias shows up is through the underrepresentation of certain groups in the dataset. Because the technique can generate arbitrary amounts of data, it can also be used to boost the representation of those groups while remaining statistically faithful to them, as the sketch below illustrates.
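
A minimal sketch of the core idea, assuming a toy dataset and a simple Gaussian model: fit the model to the real records, sample brand-new synthetic ones that preserve the statistics, and oversample the underrepresented group. Real synthetic-data tools use far richer generators (copulas, GANs, diffusion models) and add formal privacy guarantees; the column meanings and numbers here are invented.

```python
# Minimal sketch: fit a simple statistical model to real data, then sample new records.
import numpy as np

rng = np.random.default_rng(0)

# Pretend "real" data: two numeric attributes (say, income and monthly spend) for two
# groups, with the second group badly underrepresented.
real_majority = rng.multivariate_normal([70, 30], [[100, 40], [40, 60]], size=900)
real_minority = rng.multivariate_normal([50, 22], [[80, 25], [25, 40]], size=100)

def fit_and_sample(real: np.ndarray, n: int) -> np.ndarray:
    """Fit a Gaussian to the real records and draw n brand-new synthetic ones."""
    return rng.multivariate_normal(real.mean(axis=0), np.cov(real, rowvar=False), size=n)

# Property 1: the synthetic data preserves the real data's statistics.
synth_majority = fit_and_sample(real_majority, 900)
print("real means     ", real_majority.mean(axis=0).round(1))
print("synthetic means", synth_majority.mean(axis=0).round(1))

# Bias correction: generate extra records for the underrepresented group so a
# downstream model sees both groups at the same rate.
synth_minority = fit_and_sample(real_minority, 900)
balanced = np.vstack([synth_majority, synth_minority])
print("balanced dataset shape:", balanced.shape)  # (1800, 2), 50/50 across groups
```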

A Gartner report released in June 2022 estimates that by 2030 synthetic data will completely replace real data in training models.

So, have we solved the data problem? Is synthetic data the silver bullet that will allow R&D with personal data without all of the privacy harms?

Definitely not.

Synthetic data can improve representation only if a human involved in the research is able to identify the bias in the data. Bias, by nature, is implicit in humans: we have it, and we typically do not know or realize it. That makes it hard for us to pick it up in a dataset, real or synthetic. This remains a problem, even though safe sharing and collaboration with a diverse group of researchers increase the odds of removing the blindfolds and addressing the inherent bias in the data.

The real world is hardly constant and the phrase “the only constant in life is change” is unfortunately true. The safe, large, accurate and anonymous dataset that can support open access can blind researchers into using these datasets even when the real world has changed. Depending on the application, even a small change in the real world can introduce large deviations in the inferences and predictions from the models that use the incorrect dataset.

Today, the computing power needed to generate synthetic datasets is expensive, and not all organizations can afford it. The cost is exponentially higher if the datasets involve rich media assets such as images and video, which are very common in the healthcare and transportation-automation industries. It is also extremely hard to validate that a synthetic dataset and its source real-world data produce identical results across research experiments.

The ease and advantages of synthetic data can also incentivize laziness, where researchers simply stop doing the hard work of collecting real-world data and default to synthetic data. In a worst-case scenario, deep fakes, for example, make it extremely difficult to distinguish real from synthetic data, allowing misinformation to propagate into the real world and, through real-world events and data, back into synthetic data, creating a vicious cycle with devastating consequences.

In summary, don’t drop your guard if you are working with synthetic data.

Sources:

What Is Synthetic Data?, Gerard Andrews, NVIDIA, June 2021: https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/

The Real Deal About Synthetic Data, MIT Sloan Review, Winter 2022: https://sloanreview.mit.edu/article/the-real-deal-about-synthetic-data/

How Synthetic Data is Accelerating Computer Vision

Your Social Credit Score: Everything and Everywhere Recorded and Analyzed

By Anonymous | October 20, 2022

Imagine a world where your hygiene, fashion, and mannerisms directly affect your ability to rent an apartment. Every part of you is nitpicked and reflected in a social credit score that in turn affects your career, finances, and even living situation. Seems extreme, right? Well, this scenario is becoming a reality for Chinese citizens under the country’s Social Credit System.


[IMAGE1: scored_passerbys.jpg]

What is it?

Though China has floated ideas about a credit system since 1999, the official Chinese Social Credit System was announced in 2014, after the necessary infrastructure had been built for the system to sit on. By using a combination of data gathering and sharing, curation of blacklists and redlists, and punishments, sanctions, and rewards, the system is meant to uphold values like financial creditworthiness, judicial enforcement, societal trustworthiness, and government integrity. With the right score, you can expect rewards like getting fast-tracked for approvals or facing fewer inspections and audits. But with the wrong score, you can face punishments like employment, school, and even travel bans. From this, we can see the system as an oppressive way to force “good behavior” through methods that are invasive and dismissive of people’s privacy and autonomy.

The main cause for concern, though, is the integration of technology into this social credit system, namely the 200 million surveillance cameras from China’s artificial intelligence facial recognition project (aka SkyNet), online behavior tracking systems across the internet, and predictive analytics for identifying “political violations”. With all these technologies at its disposal, the government can commit numerous privacy harms without any direct consent from its citizens.


[IMAGE2: commuters_and_cameras.jpg]

What are the privacy harms?

This entire system carries data risks at every stage, from collection to analysis to the actions taken after the predictive analysis. On top of that, I want to reiterate that citizens never gave consent to participate in such an invasive system.

How is it ethical to gather so much sensitive data in a single system and allow numerous data scientists to have access to such personal data? Even if the participants are anonymous, it doesn’t change the fact that these scientists have access to personal identifying data, financial data, social media data, and more; a unique ID would do nothing to protect these participants from bad actors hoping to use this data in a malicious way. Additionally, the government is tight-lipped about how all this data computes a score that affects the participant’s own livelihood and potentially their dependents. This single score dictates so much, yet there isn’t a way for citizens to gain insight into the system or have some technological due process to let the government know if there is an issue. This clear lack of transparency from the system’s creators makes the treatment oppressive to everyone involved.

In addition to the lack of transparency, there is a clear lack of accountability and, again, of due process that would allow some form of correction if the algorithm doesn’t output a score reflective of a citizen’s standing. As with all data-related endeavors, there is inherent bias in how the data is analyzed and what comes out of it; if the people building the system don’t know much about the day-to-day struggles of a Chinese citizen, how can they correctly infer that citizen’s relative standing from the data? How can a handful of algorithms accurately score more than a billion citizens? There are bound to be experiences and actions that would seem perfectly acceptable to a human being but are deemed “dishonorable” by the logic of the algorithms. Without an explanation of the system, or even a safeguard against such situations, there are bound to be numerous participants needlessly fighting a system built against them from the very beginning.

What can we learn from this?

Just like with any system built on people and their data, there is a level of harm committed against the participants that we need to be aware of. It’s important to continue advocating for the rights of the participants rather than the “justice” of the project because incorrect or rushed uses of the data can create harmful consequences for those involved. From this analysis of China’s Social Credit System, we hopefully can learn a thing or two about how powerful the impact of data can be in the wrong context.

Sources

Donnelly, Drew. (2022). “China Social Credit System Explained – What is it & How Does it Work?”. https://nhglobalpartners.com/china-social-credit-system-explained/

Thoughts, Frank. (2022). “Social Credit Score: A Dangerous System”. https://frankspeech.com/article/social-credit-score-dangerous-system

Reilly, J., Lyu, M., & Robertson, M. (2021). “China’s Social Credit System: Speculation vs. Reality”. https://thediplomat.com/2021/03/chinas-social-credit-system-speculation-vs-reality/

Would privacy best practices make workforce surveillance less creepy?

By Andrew Sandico | October 20, 2022

Workforce surveillance is an emerging threat to a person’s privacy and there are limited ways to protect employees.

With the rise of platforms for collecting employee data, there is more opportunity to measure employees’ activity at work and their productivity. This monitoring of employee activity has been termed workplace surveillance, and these new measurements create risks to employees’ privacy because the data can easily be misused. New data sources for workplace surveillance are becoming more available, such as video recordings, pictures, keystroke monitoring, and mouse-click counting. All of these examples risk producing incorrect measurements of worker productivity (what employees produce at work) because they miss context. Although it is possible to measure workforce productivity through surveillance, should we?


[6] Examples of Workplace surveillance methods from the NYPost.

 

The Bad….but some Good?

During the COVID pandemic, employees went through one of the largest unplanned social experiments many companies have ever seen. Specifically, employees began to work from home, with limited opportunity for managers to physically see what their employees were doing. As a result, many businesses and managers explored options to better understand what their workers were doing, and there was strong interest in measuring employee productivity. Some examples include platforms that measure employees’ working time by their keystrokes, yet for some occupations work isn’t measured by typing but by other activities, such as discussions with a customer or talking to a colleague about a project. A few key areas of concern to highlight are:

  • Intrusion – Leveraging a privacy framework such as Solove’s Taxonomy, workforce surveillance can intrude on a person’s privacy by gathering more information than needed. When a platform uses an employee’s camera, it can pick up events that could be used against them. For example, a manager seeing children in an employee’s camera might judge that employee as less productive than a worker who has no children in the background.
  • Discrimination – Data pulled from multiple sources without the appropriate context could inaccurately classify employees into categories (called predictive privacy harm, or the inappropriate generalization of personal data). For example, if a marginalized demographic group has a lower keystroke average than other groups due to a small sample size, that group could be inaccurately categorized.
  • Micromanaging – Gartner’s research has found that micromanaging makes employees less engaged, motivated, and/or productive [7]. Workforce surveillance would only amplify this through the perception of constant monitoring.

 

 


[7] Gartner Diagram of micromanagement impacts on a team

Although there are concerns, is there a gain for the subject? Even with all these risks, there is positive feedback about workforce surveillance. Examples include making performance evaluation more scientific and less dependent on subjective relationships. On the Rise of Workplace Surveillance podcast from The New York Times [1], women in particular expressed support because the data gave them empirical evidence of their work performance, whereas male counterparts might be subjectively rated higher because of relationships. Others who appreciate seeing their work quantified said the measurements either validated their accomplishments or helped them become more productive. Balancing all these concerns, will privacy best practices make workforce surveillance less creepy and more beneficial?

How Privacy Can Help

Respect for persons is key. Using the Belmont Report as a privacy framework, data can be used to help the employee directly, or aggregated into themes that help a manager improve as a leader. Development for an employee means synthesizing the data into nudges about their work practices. Imagine a Fitbit for your work, with reminders that help you manage work-life balance. For example, if you set your work hours as 9 to 5 but always find yourself responding to emails at night, those few emails can quickly add up to hours of extra work. Nudges can flag those extra hours and reduce the risk of burnout. Managers are employees with development needs as well. A late-night email from a manager, though easy to send in the moment, can translate into extra hours of nighttime work for an employee; understanding that behavior can help the manager change the habit and reduce their employees’ burnout. Products such as Viva, built into Microsoft 365, are an example of how data can be used by a manager for their own development while maintaining the privacy of the employee [5]. Viva nudges managers about some of their influences on their employees’ work habits, which the manager can then adjust.
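
Here is an illustrative “Fitbit for your work” nudge in Python. This is not how Viva actually computes its insights; the thresholds, sample timestamps, and message are assumptions made purely to show the shape of a development-oriented nudge.

```python
# Illustrative "Fitbit for your work" nudge. Not Viva's real logic;
# the sample timestamps, threshold, and message are invented.
from datetime import datetime

WORKDAY_START, WORKDAY_END = 9, 17  # the 9-to-5 window the employee configured

# Timestamps of emails the employee sent this week (sample data).
sent_emails = [
    datetime(2022, 10, 17, 10, 12), datetime(2022, 10, 17, 21, 40),
    datetime(2022, 10, 18, 22, 5),  datetime(2022, 10, 19, 14, 30),
    datetime(2022, 10, 20, 23, 15),
]

after_hours = [t for t in sent_emails if not WORKDAY_START <= t.hour < WORKDAY_END]
share = len(after_hours) / len(sent_emails)

# The nudge goes to the employee (or, aggregated across a team, to the manager),
# not into an individual performance score.
if share > 0.25:
    print(f"{len(after_hours)} of your {len(sent_emails)} emails this week went out "
          f"after your working hours. Consider scheduling them for the morning to "
          f"protect your work-life balance.")
```

The key design choice is that the output goes to the employee, or in aggregate to their manager as a coaching prompt, rather than into an individual performance score.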

 


[5] Example of how products like Microsoft Viva help managers understand their employee habits to help them lead their team.

Workforce surveillance is a slippery slope with limited privacy safeguards today, but with emerging platforms and stronger interest from businesses, appropriate privacy policies increase the possibility of using the data for the good of employee development rather than for measuring performance. Using this data for performance evaluation is often labeled weaponizing, since it can be used to justify giving someone a lower bonus, denying a promotion, and/or terminating them. Without privacy guidelines, it will be very possible for companies to put the same data to incorrect use. This raises another dilemma for the platform companies that are the sources of this data: would they sell the data they collect as a service to other companies that do want to weaponize it? The risk of platform companies aggregating this information is another reason they should have privacy by design embedded into their work.

 

Conclusion

Workforce surveillance poses a major risk to a person’s privacy, but if it is done correctly, for the benefit of the employee, there are rewards. With meaningful privacy best practices that are actually followed, the benefits of workforce surveillance could outweigh the risks.

 

References

[1] The Rise of Workplace Surveillance – The New York Times (nytimes.com)

[2] Workplace Productivity: Are You Being Tracked? – The New York Times (nytimes.com)

[3] ‘Bossware is coming for almost every worker’: the software you might not realize is watching you | Technology | The Guardian

[4] Management Culture and Surveillance

[5] Viva Insights: what is it and what are the benefits? | Creospark

[6] ‘It’s everywhere’: Workplace surveillance methods now common (nypost.com)

[7] Micromanaging Your Remote Workers? Act Now to Stop Yourself. (gartner.com)

 

Big Corporate: Big Brother?

By Tajesvi Bhat | October 20, 2022

Have big corporations taken advantage of social media platforms in order to expand their reach and power beyond all limits?

Social media is so integrated into our lives that it has introduced an entirely new subcategory of communication: DMs (Direct Messages). While originally used only between friends and family, DMs and social media have spread to the professional world. The advent of LinkedIn, a professional networking service, encouraged the application of social media conventions such as DMs to professional opportunities, networking, and recruiting. Hiring managers did not limit themselves to the confines of LinkedIn, though, and began using mainstream social media platforms to assess the “professional value” of an employee (Robards, 2022).

Cybervetting, a thorough investigation of an individual’s social media presence, is increasingly used by hiring managers and companies to decide which candidates seem to match the company’s values and would integrate successfully with the existing company culture. A study published in March 2022 analyzed reports of employment terminations resulting from social media postings. It found that the terminations largely fell into two categories: those resulting from self-posts and those resulting from third-party posts (i.e., someone else posting something in which the individual is mentioned or involved). The employers’ stated reasons for each termination were recorded. The image below shows a breakdown of the most common reasons for employee termination for both self-posts and third-party posts:

IMAGE 1: Distribution of Employment Termination Reasons across Self and Third-Party posts (Robards,2022)

 

The most common source for this information was Facebook, closely followed by Twitter. The most common reason for termination over self-posts was racially discriminatory content; the corresponding breakdown for third-party posts is shown in the image above. The careers most commonly held accountable for their social media presence are law enforcement, education, hospitality, media, medical, retail, government, and transport work.

Employers justify the use of cybervetting as a “form of ‘risk work’, with the information collected online used to mitigate risk by determining compatibility and exposing red flags that would suggest a misfit, compromise the image of the employer, or disrupt the workplace” (Robards, 2022). The normalization of this practice raises questions about employee visibility as well as the ethical boundaries of employers and freedom-of-speech protections. According to Robards’ study, “27% of participants experienced employer ‘disapproval’ of their social media activities, and some even experienced attempted influence of their use of social media by employers” (Robards, 2022). Clearly, cybervetting is not limited to hiring; it continues for the duration of employment. It is deeply concerning that employers watch employees’ social media posts, most likely without their consent or knowledge and for reasons that are not entirely clear. While it is understandable that an employer would want to ensure employees follow its policies and general code of conduct, is that something employees need to follow outside of work hours and off of company property? Cybervetting also introduces further opportunity for bias and discrimination (which are already prevalent in the hiring process) and narrows the gap between personal and professional lives.

While high school students are often told to be wary of their social media presence as they apply to colleges, no one reminds the everyday adult to be cautious about social media use, since it is rarely considered. Yet cybervetting is clearly a common occurrence, and it is causing adults to reconsider their social media use, limit their accounts’ accessibility, or create fake profiles. This surveillance has an enormous impact on young people, who are more likely to alter their personality, or at least their digital personality, and develop a false persona to match future employers’ expectations. In fact, this general awareness of surveilled social media has driven the creation of “finstas” (fake Instagram accounts) and tactics including “finely tuned privacy settings, deleting content, and maintaining multiple profiles for different imagined audiences” (Robards, 2022) in an attempt to provide security and anonymity. Beyond fake accounts, individuals now gravitate toward platforms employers are less likely to check, such as Tumblr rather than Facebook, and it will be interesting to see how the platform of choice for the most security and lowest visibility changes over time. This creates, and promotes, an alternate reality in which individuals cater to expectations rather than being their true selves, which was the original intent of social media platforms.

IMAGE 2: Most commonly used social media platforms

 

Public response to employment terminations over social media postings has been divided between two general categories: either the termination was caused by public outcry, or the termination resulted in public outcry. The first is a pleasant surprise because it indicates that companies are learning to hold not only themselves but also their employees accountable for actions that are generally deemed inappropriate. However, it also implies that a company takes action only to satisfy the public and that otherwise it would have done nothing; the employer then falls prey to the same habit of putting on a false persona to regain social approval. The second category is when corporations misuse their reach and monitoring of social media platforms to terminate employees for speaking out against harsh working conditions or health and safety risks, or for sharing their general disapproval of corporate policies or proceedings. Social media monitoring by employers seems, overall, to be an invasion of personal rights and freedom of speech, and termination or hiring decisions based on it are incredibly prone to bias, yet the practice remains heavily prevalent today. Caution is advised to those who post on social media: Big Brother truly is watching.

Sources
Robards, B., & Graf, D. (2022). “How a Facebook Update Can Cost You Your Job”: News Coverage of Employment Terminations Following Social Media Disclosures, From Racist Cops to Queer Teachers. Social Media + Society, 8(1). https://doi.org/10.1177/20563051221077022

Image 2:

https://www.socialsamosa.com/2019/03/infographic-most-popular-social-media-platforms-of-2019/

Balancing Checks and Balances

Anonymous | October 20, 2022

While the US Government guides data policies and practices for private companies, their own policies and practices have caused them to fall behind with the current digital era.

In today’s digital era, data is collected in every interaction, whether that’s at the grocery store, while scrolling through TikTok, or while walking the streets of New York. Companies collect this data to understand the patterns of the average citizen, whether that’s how often people buy Double Stuffed Oreos™, how long people spend watching the newest teenage dance trend, or the amount of foot traffic that crosses 6th and 42nd at 5 PM every day. This information shapes the way companies evolve their products to appeal to the “average citizen” and reach the widest range of users. To do this, they must collect terabytes and terabytes of data to determine how the “average citizen” acts.

Congress sets the data privacy laws that govern the way private companies can collect, share, and use these terabytes of personal data. These laws are put in place to protect US citizens and give them power over how and what information can be collected, used, and shared. The rules and regulations exist to ensure companies are good data stewards and that citizens’ personal information is not exposed in ways that hurt them, most commonly in the form of identity theft and other personal harms. There are still millions of cases of hacks and data leaks every year, but the rules and regulations have forced private companies to implement safer data practices.

In recent years, there have been multiple data breaches at US Government entities, including the Departments of Defense, Homeland Security, Health, and Transportation. In 2015, the Office of Personnel Management was successfully targeted, exposing personal information for 22 million federal employees. This office governs all federal employees, which means that Social Security Numbers, bank information, and medical histories were captured. All of the data these government agencies had collected was exposed, leaving millions of citizens vulnerable to identity theft. In order to limit the power and freedom of any one individual or entity, there is a system of Checks and Balances in place. I understand that and I agree, but it comes at the expense of the technology and infrastructure needed to be good data stewards. The US government values ethics and effectiveness over efficiency.

I have worked for the federal government for my whole career, so I am not writing this as an anti-government enthusiast. I have seen how internal rules and regulations have hindered the success of data scientists like myself. We have trouble implementing safer data practices and policies because of the hoops and hurdles that must be cleared to change the way things have always been done. None of these leaks was intentional, nor was any of them the fault of one person. In my experience as a federal employee, the aggregation of many small mistakes can lead to one very large one, like a data leak. Data practices in the government are far behind those of private industry. Again, this is not intentional, nor is it the result of one person’s sly decisions. The rules and regulations that government entities have to follow are strict and tedious, with reason: many levels of approvals and authorizations were put in place to keep power away from any one person or entity aspiring to make monumental change. This system of Checks and Balances is necessary to keep a large government functioning ethically and effectively, but it sacrifices efficiency.

 

While the government has valid reasons for its extensive processes, there must be change in order to quickly implement safer data practices to protect US citizens and their information. I know there is not one button to push or one lever to pull that will fix everything. It will be a slow and generational process, but essential to stay in line with the rapidly evolving digital era we are in.

While the government may be utilizing data for ethical endeavors, like trying to find the best food markets to subsidize to help low-income families, understanding the interests of children to better target teen smokers, or identifying the areas with the highest rates of homelessness, there is still a lot of data being collected every day and data practices are not updated to match the current digital era. If we cannot change culture and streamline implementation processes to balance the system of Checks and Balances, we will continue to leave US Citizens at risk of exposure.

“US Government: Deploying yesterday’s technology tomorrow”

  • Anonymous Federal Employee

 

The Hidden Perils of Climbing the Alternative Credit Ladder

Alberto Lopez Rueda

Alternative data could expand credit availability but in the absence of clear regulation, consumers should carefully consider the risks before asking for an alternative credit score

25 million Americans do not have a credit history; they are “credit invisible”. This can make it very difficult to qualify for financial products such as home mortgages or student loans. Credit is an important cornerstone of achieving the American dream [1] of homeownership, education, and success, which in turn is key to social mobility [2].

In order to access credit, consumers need to have a credit score. Yet, traditional credit scores are based on consumers’ credit history. How could millions of users dodge this “catch-22” problem and qualify for credit? Alternative data may be the answer according to some companies [3]. Alternative data is defined as any data that is not directly related to a consumer’s credit history and can include datasets such as individuals’ social media, utility bills and web browsing history.

Caption: Types of alternative data in credit scoring. Source.

Credit agencies argue that alternative data can build a more inclusive society by expanding the availability of affordable credit to generally vulnerable populations. But, at what cost?

 

Ethical and privacy concerns

The use of alternative credit scores raises important questions about transparency and fairness. Some types of alternative data may entrench or even amplify existing social inequalities, and could discriminate against consumers based on protected characteristics like race. There are also hidden privacy risks, as companies generally collect vast amounts of data that can then be shared or re-sold to third parties.

Alternative credit models are less transparent than traditional scores. Companies generally do not detail the alternative data that was used or disclose the methodology behind the alternative scores. Alternative credit models are also harder to explain than traditional scores, owing to the sophisticated models that produce the results. In addition, alternative data can contain errors and inaccuracies that may be very challenging to detect or correct. Without full transparency, consumers risk being treated unfairly.

Alternative data may also perpetuate existing inequalities. According to data from FICO, one of the best-known credit scoring companies, the FICO XD scores based on utility bills are meaningfully lower than those derived from traditional data. A low score can still provide access to credit, although it generally means being charged higher interest rates or premiums. Receiving a FICO score below 580 [4] may even harm consumers, since having a score classified as “poor” is generally worse than having no score at all [5]. However, these results cannot be generalized, since the scores largely depend on the types of alternative data and algorithms used.

Caption: Distribution of FICO Scores between alternative data (FICO Score XD) and traditional scores (FICO 9). Source

New datasets could even facilitate discrimination against consumers. The Equal Credit Opportunity Act of 1974 forbids companies from using information about protected characteristics such as race, sex, and age for credit purposes. Yet some alternative data indicators, such as educational background [6], can be highly correlated with one of these protected classes.
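
A small simulation shows how that correlation becomes discrimination in practice. The lender below never feeds the protected attribute to its model, but because a made-up “college tier” feature is correlated with it, the model reproduces the disparity anyway. All numbers are invented for illustration; this is not any lender’s actual model.

```python
# Simulating the proxy problem: the lender never uses the protected attribute,
# but a correlated "alternative data" feature reproduces the disparity anyway.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

protected = rng.integers(0, 2, size=n)  # 0/1 protected-class membership (never given to the model)
# Proxy feature: group 1 is steered toward lower "college tier" values on average.
college_tier = rng.normal(loc=np.where(protected == 1, -0.8, 0.8), scale=1.0)
utility_on_time = rng.normal(size=n)    # a genuinely neutral alternative-data feature

# Historical repayment outcomes that already embed the structural disadvantage.
logits = 1.2 * utility_on_time + 0.8 * college_tier
repaid = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# The lender "fairly" trains only on the alternative data, never on the protected class.
X = np.column_stack([college_tier, utility_on_time])
approved = LogisticRegression().fit(X, repaid).predict_proba(X)[:, 1] > 0.5

print("approval rate, group 0:", round(approved[protected == 0].mean(), 2))
print("approval rate, group 1:", round(approved[protected == 1].mean(), 2))  # noticeably lower
```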

Lastly, users of alternative data scores may also be exposed to meaningful privacy harms. Companies may collect vast amounts of information about consumers, and although consumers are generally notified, they may not be aware of the extent to which their privacy can be invaded. In addition, all of the information collected could be shared or re-sold for completely different purposes, potentially harming consumers in the future.

Caption: Credit agencies can collect vast and detailed information about consumers

Source

 

The US regulatory landscape around alternative data scores

In 2019, five US financial regulatory agencies [7] backed the use of alternative information in traditional credit-evaluation systems, encouraging lenders to “take steps to ensure that consumer-protection risks are understood and addressed”. Yet most of the relevant regulation, such as the Fair Credit Reporting Act, dates back several decades, and the advent of big data has inevitably created regulatory blind spots. It is imperative that the US government enact substantive regulation if alternative credit scoring is to fulfil its potential. Important issues, such as which kinds of alternative data and indicators can be used, must be addressed. Mechanisms should also be put in place to prevent discrimination, enhance transparency, and protect consumers.

Without clear regulatory boundaries, vulnerable consumers might be forced to choose between a life without credit and a risky path of discrimination and lost privacy. In other words: damned if you do, damned if you don’t.

 

Conclusion

The advent of alternative data and new algorithms could help expand affordable credit to those who are credit invisible. More credit could in turn lower social inequalities by helping vulnerable consumers. Yet, the use of alternative credit scores remains in its infancy and the lack of explicit regulation can result in harmful practices against users. Against this backdrop, consumers should thoroughly compare the different alternative credit scores and carefully consider the sometimes-hidden disadvantages of climbing the alternative credit ladder.

Citations

[1] Barone, A. (2022, August 1). What Is the American Dream? Examples and How to Measure It. Investopedia.

https://www.investopedia.com/terms/a/american-dream.asp

[2] Ramakrishnan, K., Champion, E., Gallagher, M., Fudge, K. (2021, January 12). Why Housing Matters for Upward Mobility: Evidence and Indicators for Practitioners and Policymakers. Urban Institute

https://www.urban.org/research/publication/why-housing-matters-upward-mobility-evidence-and-indicators-practitioners-and-policymakers

[3] FICO (2021). Expanding credit access with alternative data

https://www.fico.com/en/resource-access/download/15431

[4] McGurran, B. (2019, July 29). How to “Fix” a Bad Credit Score. Experian

https://www.experian.com/blogs/ask-experian/credit-education/improving-credit/how-to-fix-a-bad-credit-score/

[5] Black, M., Adams, D. (2020, December 10). Is No Credit Better Than Bad Credit?. Forbes

https://www.forbes.com/advisor/credit-score/is-no-credit-better-than-bad-credit/

[6] Hayashi, Y. (2019, August 8). Where You Went to College May Matter on Your Loan Application. The Wall Street Journal.

https://www.wsj.com/articles/where-you-went-to-college-may-matter-on-your-loan-application-11565258402?mod=searchresults&page=1&pos=1&mod=article_inline

[7] Hayashi, Y. (2019, December 3). Bad Credit? Regulators Back Ways for Risky Borrowers to Get Loans. The Wall Street Journal.

https://www.wsj.com/articles/bad-credit-alternative-data-wins-support-as-a-way-to-ease-lending-11575420678

 

DuckDuckGo: What happens in search stays in search
Li Jin | October 20, 2022

We are used to searching for everything online without even thinking about it: medical, financial, and personal issues, most of which should stay private. However, on Google, the most popular search engine in the world, searches are tracked, saved, and used to target advertisements, or for something even worse. The Incognito mode that Google provides is not truly private: it simply deletes information related to your browsing session, and only after you end the session by closing all your tabs.

Established in 2008, DuckDuckGo set out with the mission of making “privacy protection available to everyone.” It allows for completely anonymous web browsing, which makes it an ideal choice for anyone who hates ads and/or being tracked online. Its privacy promise is straightforward: no data on users’ online searches is collected or stored, no ads target users based on their previous searches, and no social engineering techniques are applied based on users’ searches and other interests. Everyone can be sure of getting the same search results as every other user.

duckduckgo_overview-1.png

DuckDuckGo protects users’ privacy in several ways, sketched in the example below. First, it exposes the major tracking networks that follow you over time and blocks hidden third-party trackers on the websites you visit, including those from Google and Facebook. Second, searches made through DuckDuckGo automatically connect to the encrypted versions of websites wherever possible, making it harder for anyone to see what you are looking at online. Third, DuckDuckGo calculates and shows users a Privacy Grade as a reminder about online privacy. Fourth, of course, it searches privately: DuckDuckGo search doesn’t track you. Ever.
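
As a rough sketch of the first point, third-party tracker blocking boils down to checking each request a page makes against a blocklist of known tracking domains. DuckDuckGo’s real protection relies on its own, much larger tracker dataset and more nuanced rules; the domains and logic below are simplified for illustration.

```python
# Bare-bones sketch of third-party tracker blocking: every request the page makes
# is checked against a blocklist of known tracking domains. Simplified illustration,
# not DuckDuckGo's actual implementation or dataset.
from urllib.parse import urlparse

TRACKER_DOMAINS = {"google-analytics.com", "doubleclick.net", "facebook.net"}

def is_blocked(page_url: str, request_url: str) -> bool:
    """Block requests that go to a known tracker on a different site than the page."""
    page_host = urlparse(page_url).hostname or ""
    req_host = urlparse(request_url).hostname or ""
    third_party = not req_host.endswith(page_host)
    on_blocklist = any(req_host == d or req_host.endswith("." + d) for d in TRACKER_DOMAINS)
    return third_party and on_blocklist

page = "https://news.example.com/article"
for request in ["https://news.example.com/style.css",
                "https://www.google-analytics.com/collect?v=1",
                "https://connect.facebook.net/en_US/fbevents.js"]:
    print(("BLOCKED " if is_blocked(page, request) else "allowed ") + request)
```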

As for its downsides, some argue that the quality of DuckDuckGo’s search results is not as good as Google’s. DuckDuckGo offers a more unfiltered version of the internet: unlike Google, it does not constantly tune its search algorithm to users’ browsing habits. You may find results that Google has penalized or removed because they are dangerous, patently false, or simply misinformation designed to confuse people. DuckDuckGo says it still uses algorithms to rank results, filtering millions of possible matches down to a ranked order. As a search engine, DuckDuckGo doesn’t offer as many services as Google and is, as a result, less convenient. And of course, because DuckDuckGo doesn’t track and profile users, it cannot provide customized searches the way Google does.

Currently, DuckDuckGo ranks #4 among Utilities in the App Store for iPhone. It is not as popular as #1 Google and #2 Chrome, which offer built-in features such as Google Maps, Flights, and more, integrated with your other Google accounts and products, which can sometimes be rewarding. But DuckDuckGo holds a 4.9 rating across 1.4M ratings among Utilities in the App Store for iPhone. Users appreciate DuckDuckGo for its respect for privacy and the heads-up it offers when something underhanded goes down.

duckduckgo_appstore_rank-1.png

DuckDuckGo is essential to people who care about privacy more than convenience. It will likely thrive in its niche market.