Jared Maslin – Page 2 – Data Science W231 | Behind the Data: Humans and Values

October 26, 2022

Are You Playing Games or Are the Games Playing You?

Are You Playing Games or Are the Games Playing You?
By Anonymous | October 21, 2022

Is our data from playing games being used to manipulate our behavior? When we or our children play online games, there is no doubt we generate an enormous amount of data. This data includes what we would expect from a Google or Facebook (such as location, payment, or device data), but what is not often considered is that this also includes biometric and detailed decision data from playing the game itself. Of course, this in-game data can be used for a variety of purposes such as fixing bugs or improving the game experience for users, but many times it is used to exploit players instead.

Source: https://www.hackread.com/gaming-data-collection-breach-user-privacy/

Data Usage
To be more specific, game developers nowadays are utilizing big datasets like those shown in the image above to gain insights into how to keep players playing for longer and spend more money on the game.[1] While big developers have historically had analytics departments in order to figure out how users were playing their game, even smaller developers today have access to middleware created by external parties that can help to refine their monetization strategies.[2] Some gaming companies even aggregate external sources of data on users such as surveys that infer personality characteristics based on how they play the game. In fact, game developers specifically use decision data like dialogue choices to build psychological profiles of their players, allowing the developers to figure out how impulsive or social they are, isolating players that might be more inclined to spend money or be more engaged.[3] Games such as Pokemon GO can take it a step further by aggregating data from our phones such as facial expressions and room noises in order to further refine this profile.

To capitalize on these personality profiles, developers then build in “nudges” into the games that are used to manipulate players into taking certain actions such as purchasing online goods or revealing personal information. This includes on-screen hints about cash shops, locking content behind a pay wall, or forcing players to engage in loot box mechanics in order to remain competitive. This is highly profitable from games ranging from FIFA to Candy Crush, allowing their parent companies to generate billions in revenue per year.[1]

Source: https://www.polygon.com/features/2019/5/9/18522937/video-game-privacy-player-data-collection

Aside from microtransactions, developers can also monetize this data through targeted advertising to their users, matching the best users based on the requirements of the advertiser.[4] Online games not only provide advertisers with the ability to reach a large-scale audience, but to engage players through rewarded ads as well.

Worse Than a Casino for Children
Given external parties ranging from middleware providers to advertisers have access to intimate decision-making data, this brings up a whole host of privacy concerns. If we were to apply Nissenbaum’s Contextual Integrity framework for privacy to gaming, we could compare online games to a casino. In fact, loot boxes specifically function like a slot machine in that it provides uncertain reward and dopamine spikes to players if they win, encouraging addiction. Similar to how a casino targets “whales” that account for the majority of their revenue, online games also try do the same, allowing them to maximize revenue through microtransactions. Yet unlike casinos, online games are not only allowed, but prevalent amongst young adults under the age of 18 and has problems that extend beyond gambling addiction. In Roblox, one of the most popular children’s games in the world (that allows children to monetize in-game items in the games that they create), there have been numerous reports of financial exploitation, sexual harassment, and threats of dismissal for noncompliance.[5]

Conclusion
While there have been efforts to raise awareness about the manipulative practices of online gaming, the industry still has a long way to go before a clear regulatory framework is established. The California Privacy Rights Act is a step in the right direction as it prohibits obtaining consent through “dark patterns” (nudges), but whether gamers will ultimately have the ability to limit sharing of decision data or delete it from the online game itself remains to be seen.

Sources:
[1] https://www.brookings.edu/techstream/a-guide-to-reining-in-data-driven-video-game-design-privacy/

[2] https://www.wired.com/story/video-games-data-privacy-artificial-intelligence/

[3] https://www.polygon.com/features/2019/5/9/18522937/video-game-privacy-player-data-collection

[4] https://www.hackread.com/gaming-data-collection-breach-user-privacy/

[5] https://www.theguardian.com/games/2022/jan/09/the-trouble-with-roblox-the-video-game-empire-built-on-child-labour

October 26, 2022

Synthetic Data: Silver Bullet?

Synthetic Data: Silver Bullet?
By Vinod Viswanathan | October 20, 2022

One of the biggest harms that organizations and government agencies can cause to customers and citizens is exposing personal information arising out of security breaches exploited by bad actors, both internal and external. A lot of the security vulnerability is the result of a conflict between securing data while allowing safe sharing of data; goals that are primarily at odds with each other.

Synthetic data is artificially generated data through machine learning techniques that model the real world. Artificial data, to qualify as synthetic data, must have two properties. It must retain all the statistical properties of the real world data and it must not be possible to reconstruct the real world data from the artificial data. This technique was first developed in 1993, in Harvard University, by Prof. Donald Rubin who wanted to anonymize census data for his studies and was failing to do it. He instead used statistical methods to create an artificial dataset that mirrored the population statistics of the census data allowing him and his colleagues to analyze and draw inferences without compromising the privacy of the citizens. In addition to privacy, synthetic data allowed for large data sets to be generated and solved the data scarcity problem as well.

As privacy legislation progressed along with efficient large-scale compute, synthetic data started to play a bigger role in machine learning and artificial intelligence by providing anonymous, safe, accurate, large-scale, flexible training data. The anonymity guarantees allowed collaboration; cross-team, cross-organization and cross-industry collaboration providing cost effective research.

Synthetic data mirrors the real world including its biases. One way the bias shows up is through the underrepresentation of certain classifications (groups) in the dataset. As this technique is capable of generating data, it can be used to boost the representation in the dataset while being representative of the classification.

Gartner report, released in June 2022, estimates that by 2030 synthetic data will completely replace real data in training models.

So, have we solved the data problem ? Is synthetic data the silver bullet that is going to allow R&D with personal data with all of the privacy harms.

Definitely not.

Synthetic data can improve representation only if a human involved in the research is able to identify the bias in the data. Bias, by nature, is implicit in humans. We have it and typically we do not know or realize it. Therefore, it is hard for us to pick

it up in the dataset; real or synthetic. This realization of bias continues to be a problem even though safe sharing and collaboration with a diverse group of researchers increases the odds of removing the blindfolds and addressing the inherent bias in the data.

The real world is hardly constant and the phrase “the only constant in life is change” is unfortunately true. The safe, large, accurate and anonymous dataset that can support open access can blind researchers into using these datasets even when the real world has changed. Depending on the application, even a small change in the real world can introduce large deviations in the inferences and predictions from the models that use the incorrect dataset.

Today, the cost of computing power needed to generate synthetic datasets is expensive and not all organizations can afford it. The cost is exponentially higher if the datasets involve rich media assets; images and video, which are very common in the healthcare and transportation automation industries. It is also extremely hard to validate synthetic datasets and their source real world data to generate identical results in all research experiments.

The ease and the advantages of synthetic data can incentivize laziness in researchers, where the researchers simply stop doing the hard work of collecting real-world data and default to synthetic data. In a worst-case scenario, deep-fakes for example makes it extremely difficult to distinguish real and synthetic data allowing misinformation to propagate into the real world and through real world events and data back into synthetic data creating a vicious cycle with devastating consequences.

In summary, don’t drop your guard if you are working with synthetic data. What Is Synthetic Data? Gerard Andrews, Nvidia, June 2021

Sources:

https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/

The Real Deal About Synthetic Data, MIT Sloan Review, Winter 2022

https://sloanreview.mit.edu/article/the-real-deal-about-synthetic-data/

How Synthetic Data is Accelerating Computer Vision

https://hackernoon.com/how-synthetic-data-is-accelerating-computer-vision-xp153 w6q

October 25, 2022October 25, 2022

Your Social Credit Score: Everything and Everywhere Recorded and Analyzed

Your Social Credit Score: Everything and Everywhere Recorded and Analyzed

By Anonymous | October 20, 2022

Imagine a world where your hygiene, fashion, and mannerisms directly affect your ability to rent an apartment. Every part of you is nitpicked for the world and reflected back to your social credit score which in turn affects your career, finances, and even living situation. Seems extreme, right? Well, this scenario is becoming a reality to Chinese citizens with the country’s Social Credit System.

[IMAGE1: scored_passerbys.jpg]

What is it?

Though China has had ideas about a credit system since 1999, the official Chinese Social Credit System was announced in 2014 after building the necessary infrastructure for the system to be added upon. By using a combination of data gathering and sharing, curation of blacklists and redlists, and punishments, sanctions and rewards, the system would uphold values like financial creditworthiness, judicial enforcement, societal trustworthiness, and government integrity. With the right score, you can expect rewards like getting fast-tracked for approvals or having fewer inspections and audits. But if you have the wrong score, you can face punishments like employment, school, and even travel bans. From this, we can see the system as an oppressive way to force “good behavior” despite the methods being invasive and dismissive of their people’s privacy and autonomy.

The main cause of concern though is the integration of technology into this “social credit system”, namely with the use of their 200 million surveillance cameras from their Artificial Intelligence facial recognition project aka SkyNet, online behavior tracking systems in all spots of the internet, and predictive analytics for identifying “political violations”. With all these technologies at their disposable, we can see the numerous different privacy harms being committed without any direct consent from their citizens.

[IMAGE2: commuters_and_cameras.jpg]

What are the privacy harms?

This entire system has many potential data risks from data collection to the data analysis to the actions after the predictive analysis. On top of that, I want to reiterate that the citizens had never given consent to participate in such an invasive system.

How is it ethical to gather so much sensitive data in a single system and allow numerous data scientists to have access to such personal data? Even if the participants are anonymous, it doesn’t change the fact that these scientists have access to personal identifying data, financial data, social media data, and more; a unique ID would do nothing to protect these participants from bad actors hoping to use this data in a malicious way. Additionally, the government is tight-lipped about how all this data computes a score that affects the participant’s own livelihood and potentially their dependents. This single score dictates so much, yet there isn’t a way for citizens to gain insight into the system or have some technological due process to let the government know if there is an issue. This clear lack of transparency from the system’s creators makes the treatment oppressive to everyone involved.

In addition to the lack of transparency, there is a clear lack of accountability and again due process that would allow some form of correction if the algorithm doesn’t correctly output a score reflective of a citizen’s standing. Like with all data-related endeavors, there is an inherent bias that comes with how the data is being analyzed and what comes out of it; if someone who doesn’t know much about the day-to-day struggles of a Chinese citizen, how can they correctly infer from the data the relative standing of said citizen? How can a couple of algorithms accurately score more than a billion citizens in China? There are bound to be numerous experiences and actions that may actually be okay in the eyes of a human being but deemed “dishonorable” in the logic of the algorithms. By not having an explanation of the system or even a safeguard to avoid situations like this, there are bound to be numerous participants that needlessly fight against a system built against them from the very beginning.

What can we learn from this?

Just like with any system built on people and their data, there is a level of harm committed against the participants that we need to be aware of. It’s important to continue advocating for the rights of the participants rather than the “justice” of the project because incorrect or rushed uses of the data can create harmful consequences for those involved. From this analysis of China’s Social Credit System, we hopefully can learn a thing or two about how powerful the impact of data can be in the wrong context.

Sources

Donnelly, Drew. (2022). “China Social Credit System Explained – What is it & How Does it Work?”. https://nhglobalpartners.com/china-social-credit-system-explained/

Thoughts, Frank. (2022). “Social Credit Score: A Dangerous System”. https://frankspeech.com/article/social-credit-score-dangerous-system

Reilly, J., Lyu, M., & Robertson, M. (2021). “China’s Social Credit System: Speculation vs. Reality”. https://thediplomat.com/2021/03/chinas-social-credit-system-speculation-vs-reality/

October 25, 2022

Would privacy best practices make workforce surveillance less creepy?

Would privacy best practices make workforce surveillance less creepy?

By Andrew Sandico | October 20, 2022

Workforce surveillance is an emerging threat to a person’s privacy and there are limited ways to protect employees.

With the rise of available platforms for employee data, there is more opportunity to measure employee’s activity at work and their productivity. This monitoring of employee activity at work has been defined as workplace surveillance and these new measurements create risks of employee’s privacy since the data can easily be misused. New data sources are becoming more available to do workplace surveillance such as video recordings, pictures, monitoring keystrokes, or counting mouse clicks. All these examples potentially create risk of incorrect measurements for evaluating worker productivity (what employees produce at work) due to missing context. Although it is possible to measure workforce productivity through surveillance, should we?

[6] Examples of Workplace surveillance methods from the NYPost.

The Bad….but some Good?

During the COVID pandemic, many employees went through one of the largest social experiments that was unplanned for many companies. Specifically, employees began to work from home with limited physical availability for managers to see what their employees were doing. Due to this rise, many businesses and managers explored options to better understand what their workers were doing and there was strong interest in measuring their employee’s productivity. Some of these examples include having a platform which could measure their employees working time of when hitting keystrokes, but for some occupations, work isn’t measured by typing but by other activities such as discussions with a customer or talking to a colleague at work about a project. A few key areas of concern to highlight are:

Intrusion– Leveraging a privacy framework such as Solove’s Taxonomy, workforce surveillance could impede a person’s privacy by gathering more information than needed. When a platform is using an employee’s camera they can pick up events that could be used against them. For example, a manager seeing a lot of children in their employee’s camera might determine that specific employee as being less productive compared to another worker that don’t have children in the background.
Discrimination – Data pulled from multiple sources without the appropriate context could inaccurately classify employees into categories (called predictive privacy harm or the inappropriate generalization of personal data). For example, if a marginalized demographic group has a lower key stroke average to other groups due to a small sample size, that group could be inaccurately categorized.
Micromanaging – Per Gartner’s research, they have found that micromanaging has negative impacts on employees such as making them less engaged, motivated, and/or productive [7]. With workforce surveillance, this would only be amplified by technology due to a perceived constant monitoring.

[7] Gartner Diagram of micromanagement impacts on a team

Although there are concerns, is there a gain for the subject? On the flip side, even with all these risks there is feedback of positive reasons for workforce surveillance. Examples include leveling the performance evaluation in companies to be more scientific vs subjective relationships. Per the Rise of Workplace surveillance podcast from the New York times [1], females particularly expressed positive feedback for this since it gave them empirical evidence on their work performance vs where other male counterparts might be subjectively given higher performance results due to relationships. For others that appreciate seeing their work quantified, they stated the benefit from seeing their own measurements to either validate their work as an accomplishment or help them be more productive. Balancing all these concerns, will privacy best practices make workforce surveillance less creepy and more beneficial?

How Privacy Can Help

Respect for a person is key. Leveraging the Belmont Report as a privacy framework, using data to help the employee or by giving aggregated themes to a manager to improve as a leader can help with development. Development for an employee is using data and synthesizing it to help nudge them on their work practices. Imagine a Fitbit for your work, where there are reminders of how to manage work life balance. For example, if you set your work hours from 9-5 but you always find yourself responding to emails at night those few emails can quickly add up to a few more hours of work. These nudges can help remind you of working extra hours and reduce the risk of burnout. For managers, they are also employees with development as well. Understanding the impact of a manager sending a late email to an employee could lead to extra hours of working in the night. Although easier for the manager to send at the moment, the good of understanding this behavior could help the manager change their work habit which would reduce their employee’s burnout. Products such as Viva built into Microsoft 365 is an example of how data can be used by a manager for their development while maintaining the privacy of the employee [5]. Viva provides nudges to managers on some of their influences on their employees’ work habits which a manager can then adjust.

[5] Example of how products like Microsoft Viva help managers understand their employee habits to help them lead their team.

Workforce surveillance is a slippery slope with limited privacy safeguards today, but with emerging platforms and stronger interest from businesses, appropriate privacy policies increase the possibility to use the data for the good of employee development and not for measuring performance. Using this data for performance is often labeled as weaponizing since it can be used as a way to justify giving someone a lower bonus, not giving a promotion, and/or terminate them. Without these privacy guidelines it will be very possible for companies to use the same data for incorrect use. Which brings another dilemma to platform companies that are sources of this data, would they sell the data that they collect from their platform as a service to other companies that do want to use that information for weaponizing. This additional risk of platform companies aggregating this information is another reason to have these companies have privacy by design embedded into their work.

Conclusion

The risk of workforce surveillance poses a major risk to a person’s privacy but if done correctly for the benefit of the person (employee) there are rewards. With meaningful privacy best practices that are followed, the benefits of workforce surveillance could outweigh the risks.

References

[1] The Rise of Workplace Surveillance – The New York Times (nytimes.com)

[2] Workplace Productivity: Are You Being Tracked? – The New York Times (nytimes.com)

[3] ‘Bossware is coming for almost every worker’: the software you might not realize is watching you | Technology | The Guardian

[4] Management Culture and Surveillance

[5] Viva Insights: what is it and what are the benefits? | Creospark

[6] ‘It’s everywhere’: Workplace surveillance methods now common (nypost.com)

[7] Micromanaging Your Remote Workers? Act Now to Stop Yourself. (gartner.com)

October 25, 2022October 25, 2022

Big Corporate: Big Brother?

Big Corporate: Big Brother?

By Tajesvi Bhat | October 20, 2022

Have big corporations taken advantage of social media platforms in order to expand their reach and power beyond all limits?

Social media is so integrated in our lives that it has introduced an entire new subcategory of communication: DMs (Direct Messages). While originally only between friends and family, DMs and social media have spread to the professional world. The advent of LinkedIn- a professional networking service- encouraged the applications of social media mannerisms such as DMs for professional opportunities, networking, and recruiting. Hiring managers did not limit themselves to the confines of LinkedIn though, and used mainstream social media platforms in order to assess the “professional value” of an employee (Robards,2022).

Cybervetting, a thorough investigation of an individual’s social media presence, is increasingly being used by hiring managers and companies in an attempt to decide which employees seem to match the company values and would be able to integrate themselves successfully with the existing company culture. A study published in March 2022 analyzed reports of employment terminations as a result of social media postings. They found that the terminations largely fell in 2 categories: those as a result of self-posts and those as a result of third party posts (ie. someone else posting something in which the individual is mentioned or is involved). As a result of social media investigations, these employees were consequently terminated and the employer’s reasonings for terminations were noted. The below image shows a breakdown for the most common reasons for employee termination for both self-posts and third-party posts:

IMAGE 1: Distribution of Employment Termination Reasons across Self and Third-Party posts (Robards,2022)

The most common source for this information was Facebook, closely followed by Twitter. The most common reasonings for firing for self-posts being racial discrimination. The most common reasonings for firing for third party posts was: . The most common careers who are and have been held accountable as a result of their social media presence are: law enforcement, education workers, hospitality, media, medical, retail, government, and transport workers.

Employers justify the use of cybervetting as a “form of ‘risk work’, with the information collected online used to mitigate risk by determining compatibility and exposing red flags that would suggest a misfit, compromise the image of the employer, or disrupt the workplace” (Robards,2022). The normalization of this practice raises questions about employee visibility as well as the ethical boundaries of employers and Freedom of Speech protections. According to Robards’ study, “27% of participants experienced employer “disapproval” of their social media activities, and some even experienced attempted influence of their use of social media by employers” (Robards,2022). Clearly, cybervetting is a process that is not limited to hiring, and rather continues for the duration of employment. It is incredibly concerning that employers are watching employee social media posts as it is most likely being done without their consent or knowledge, and also the reasoning behind it is not entirely clear. While it is understandable that an employer would want to ensure employees are following their policies and general code of conduct, is that something employees need to follow outside of work hours and off of company
property? Cybervetting also introduces further opportunity for bias and discrimination (which are already prevalent in the hiring process) and narrows the gap between personal and professional lives.

While high school students are often told to be wary of their social media presence as they apply to colleges, no one reminds the every-day adult to be cautious about social media usage since it is often not considered. However, it clearly is a common occurrence that is causing adults to either reconsider their social media usage, limit their account accessibility, or create fake profiles. This surveillance has an enormous impact on the youths who are more likely to alter their personality, or at least their digital personality, and develop a false persona in order to portray themselves as an exact match for future employer expectations. In fact, this general awareness on surveilled social media has influenced the creation of “finstas” (fake instagram accounts) and tactics including “finely tuned privacy settings, deleting content, and maintaining multiple profiles for different imagined audiences” (Robards,2022) in an attempt to provide security and anonymity. In addition to fake accounts, individuals now attempt to use platforms that employers may be less likely to see such as Tumblr versus Facebook, and it will be interesting to see how the platform of choice for the most security and lowest visibility change over time. This creates, and promotes, an alternate reality in which individuals are catering to expectations rather than being their true selves as was the original intent of social media platforms.

IMAGE 2: Most commonly used social media platforms

Public response to employment terminations due to social media postings have been divided between two general categories: either termination was caused by public outcry, or the termination resulted in public outcry. The first is a pleasant surprise because it indicates that the companies are learning to hold not only themselves, but also their employees accountable for actions that are generally deemed inappropriate. However, it also implies that the company is taking action only to satisfy the public and that otherwise it would have done nothing. The employer also then falls prey to the idea of putting on a false persona in order to regain social approval. The second category is when corporations misuse their reach and monitoring of social media platforms in order to terminate employees for speaking out against harsh working conditions, health and safety risks, or share their general disapproval for corporate policies or proceedings. Social media monitoring by employers overall seems to be an invasion of personal rights and freedom of speech, and termination or hiring decisions are incredibly prone to bias but unfortunately continue to be heavily prevalent today. Caution is advised to those who post on social media, big brother truly is watching.

Sources
Robards, B., & Graf, D. (2022). “How a Facebook Update Can Cost You Your Job”: News Coverage of Employment Terminations Following Social Media Disclosures, From Racist Cops to Queer Teachers. Social Media + Society, 8(1). https://doi.org/10.1177/20563051221077022

Image 2:

https://www.socialsamosa.com/2019/03/infographic-most-popular-social-media-platforms-of-2019/

October 25, 2022

Balancing Checks and Balances

Balancing Checks and Balances

Anonymous | October 20, 2022

While the US Government guides data policies and practices for private companies, their own policies and practices have caused them to fall behind with the current digital era.

In today’s digital era, there is data being collected in every interaction, whether that’s the grocery store, scrolling through Tik Tok or walking through the streets of New York. Companies collect this data to understand the patterns of the average citizen, whether that’s how often people buy Double Stuffed Oreos™, how long people spend watching the newest teenage dance trend or the amount of foot traffic that crosses 6th and 42nd at 5 PM every day. This information is used to shape the way that companies evolve their products to appeal to the “average citizen” and reach the widest range of users. In order to do this, they must collect terabytes and terabytes of data to determine how the “average citizen” acts.

Congress sets the data privacy laws that govern the way private companies can collect, share and use these terabytes and terabytes of personal data. These laws are put in place to protect US citizens and give them power on how and what information can be collected, used and shared. These rules and regulations are made to make sure companies are good data stewards to ensure their personal information is not exposed to hurt them. These damages are most commonly in the form of personal and identity theft. There are still millions of cases of hackings and data leaks every year, but the rules and regulations have forced private companies to implement safer data practices.

In recent years, there have been multiple data breaches from US Government entities, like the Department of Defense, Homeland Security, Health and Transportation. In 2015, the Office of Personnel Management was successfully targeted, which exposed personal information for 22 million federal employees. This office governs all federal employees, which means that Social Security Numbers, bank information and medical histories was captured. All of the data that these government agencies collected was exposed and left millions of citizens vulnerable to personal and identity theft. In order to limit the power and freedom of any one individual or entity, there is a system of Checks and Balances in place. I understand that and I agree, but this comes at the expense of adequate technology and infrastructure to be good data stewards. The US government values ethics and effectiveness over efficiency.

I have worked for the federal government for my whole career, so I am not writing this as an anti-government enthusiast. I have seen how internal rules and regulations have hindered the success of data scientists, like myself. We have trouble implementing safer data practices and policies because of the hoops and hurdles that must be jumped through in order to change the way things have always been done. None of these leaks were intentional and were of no fault of just one person. In my experience as a federal employee, the aggregation of many small mistakes can lead to one very large one, like a data leak. Data practices in the government are far behind that of the private industry. Again, this is not intentional or because of one person’s sly decisions. The rules and regulations that government entities have to follow are strict and tedious, with reason. Many levels of approvals and authorizations were put in place to keep the power away from one person or entity if anyone aspires to make any monumental change. This system of Checks and Balances is necessary to keep a large government functioning ethically and effectively, but it sacrifices efficiency.

While the government has valid reasons for its extensive processes, there must be change in order to quickly implement safer data practices to protect US citizens and their information. I know there is not one button to push or one lever to pull that will fix everything. It will be a slow and generational process, but essential to stay in line with the rapidly evolving digital era we are in.

While the government may be utilizing data for ethical endeavors, like trying to find the best food markets to subsidize to help low-income families, understanding the interests of children to better target teen smokers, or identifying the areas with the highest rates of homelessness, there is still a lot of data being collected every day and data practices are not updated to match the current digital era. If we cannot change culture and streamline implementation processes to balance the system of Checks and Balances, we will continue to leave US Citizens at risk of exposure.

“US Government: Deploying yesterday’s technology tomorrow”

Anonymous Federal Employee

October 25, 2022October 25, 2022

The Hidden Perils of Climbing the Alternative Credit Ladder

The Hidden Perils of Climbing the Alternative Credit Ladder

Alberto Lopez Rueda

Alternative data could expand credit availability but in the absence of clear regulation, consumers should carefully consider the risks before asking for an alternative credit score

25 million Americans do not have credit history; they are “credit invisible”. This can make it very difficult to qualify for financial products, such as home mortgages or student loans. Credit is an important cornerstone of achieving the American dream [1] of homeownership, education and success which in turn is key in social mobility [2].

In order to access credit, consumers need to have a credit score. Yet, traditional credit scores are based on consumers’ credit history. How could millions of users dodge this “catch-22” problem and qualify for credit? Alternative data may be the answer according to some companies [3]. Alternative data is defined as any data that is not directly related to a consumer’s credit history and can include datasets such as individuals’ social media, utility bills and web browsing history.

Caption: Types of alternative data in credit scoring. Source.

Credit agencies argue that alternative data can build a more inclusive society by expanding the availability of affordable credit to generally vulnerable populations. But, at what cost?

Ethical and privacy concerns

The use of alternative credit scores raises important questions about transparency and fairness. In addition, some types of alternative data may entrench or even amplify existing social inequalities and could even discriminate against consumers based on protected characteristics like race. There are also hidden privacy risks, as companies generally collect vast amounts of data that can then be shared or re-sold to third-parties.

Alternative credit models are less transparent than traditional scores. Companies generally do not detail the alternative data that was utilized or disclose the methodology used to derive the alternative scores. Furthermore, alternative credit models are also harder to explain than traditional scores, due to the sophisticated models that produce the results. In addition, alternative data can contain potential errors and inaccuracies, which may be very challenging to detect or correct. Without full transparency, consumers’ risk being treated unfairly.

Alternative data may also perpetuate existing inequalities. According to data from FICO, one of the most known credit scoring companies, the FICO XD scores based on utility bills are meaningfully lower than those derived with traditional data. A low score can still provide access to credit although it generally means being charged higher interest rates or premiums. Receiving a FICO score below 580 [4] may even harm consumers since having a score classified as “poor” is generally worse than having no score [5]. However, these results cannot be generalized since the scores largely depend on the types of alternative data and algorithms used.

Caption: Distribution of FICO Scores between alternative data (FICO Score XD) and traditional scores (FICO 9). Source

New datasets could even facilitate discrimination against consumers. The Equal Credit Opportunity Act of 1974 forbids companies from using information from protected classes such as race, sex, and age for credit purposes. Yet, some alternative data indicators such as educational background [6] can be highly correlated to one of these protected classes.

Lastly, users of alternative data scores might be also exposed to meaningful privacy harms. Firstly, companies may collect vast amounts of information about consumers and although consumers are generally notified, they may not be aware the extent to which their privacy can be invaded. In addition, all this information collected could be shared or re-sold for completely different purposes, potentially harming consumers in the future.

Caption: Credit agencies can collect vast and detailed information about consumers

Source

The US regulatory landscape around alternative data scores

In 2019, five US financial regulatory agencies [7] backed the use of alternative information in traditional credit-evaluation systems, encouraging lenders to “take steps to ensure that consumer-protection risks are understood and addressed”. Yet, most of the relevant regulation, such as the Fair Credit Reporting Act, dates back several decades and the advent of big data has inevitably brought some regulatory blind spots. It is imperative the US government enacts substantiative regulation for alternative credit scores to fulfil its potential. Important issues, such as the kinds of alternative data and indicators that can be used, must be addressed. Mechanisms should also be put in place to prevent discrimination, enhance transparency and protect consumers.

Without clear regulatory boundaries, vulnerable consumers might be forced to choose between a life without credit or a risky path of discrimination and lack of privacy. In other words: Dammed if you do, dammed if you don’t.

Conclusion

The advent of alternative data and new algorithms could help expand affordable credit to those who are credit invisible. More credit could in turn lower social inequalities by helping vulnerable consumers. Yet, the use of alternative credit scores remains in its infancy and the lack of explicit regulation can result in harmful practices against users. Against this backdrop, consumers should thoroughly compare the different alternative credit scores and carefully consider the sometimes-hidden disadvantages of climbing the alternative credit ladder.

Citations

[1] Barone, A. (2022, August 1). What Is the American Dream? Examples and How to Measure It. Investopedia.

https://www.investopedia.com/terms/a/american-dream.asp

[2] Ramakrishnan, K., Champion, E., Gallagher, M., Fudge, K. (2021, January 12). Why Housing Matters for Upward Mobility: Evidence and Indicators for Practitioners and Policymakers. Urban Institute

https://www.urban.org/research/publication/why-housing-matters-upward-mobility-evidence-and-indicators-practitioners-and-policymakers

[3] FICO (2021). Expanding credit access with alternative data

https://www.fico.com/en/resource-access/download/15431

[4] McGurran, B. (2019, July 29). How to “Fix” a Bad Credit Score. Experian

https://www.experian.com/blogs/ask-experian/credit-education/improving-credit/how-to-fix-a-bad-credit-score/

[5] Black, M., Adams, D. (2020, December 10). Is No Credit Better Than Bad Credit?. Forbes

https://www.forbes.com/advisor/credit-score/is-no-credit-better-than-bad-credit/

[6] Hayashi, Y. (2019, August 8). Where You Went to College May Matter on Your Loan Application. The Wall Street Journal.

https://www.wsj.com/articles/where-you-went-to-college-may-matter-on-your-loan-application-11565258402?mod=searchresults&page=1&pos=1&mod=article_inline

[7] Hayashi, Y. (2019, December 3). Bad Credit? Regulators Back Ways for Risky Borrowers to Get Loans. The Wall Street Journal.

https://www.wsj.com/articles/bad-credit-alternative-data-wins-support-as-a-way-to-ease-lending-11575420678

October 25, 2022

DuckDuckGo: What happens in search stays in search

DuckDuckGo: What happens in search stays in search
Li Jin | October 20, 2022

We’ve been used to searching everything online without even thinking: the medical, financial, and personal issues, most of which should be private. However, on Google, the most popular search engine in the world, searches are tracked, saved, and used for targeting advertisements or something even worse. The Incognito mode that Google provides is not entirely private. It simply deletes information related to your browsing session. Still, it only does that after you end your session by closing all your tabs.

Established in 2008, DuckDuckGo set out with the mission of “privacy protection available to everyone.” It allows for complete anonymous web browsing. This makes it an ideal choice for anyone that hates ads and/or being tracked online. It offers perfect privacy: No data on users’ online searches are collected or stored. No ads targeting users based on their previous searches. No social engineering techniques are used based on users’ searches and other interests. Everyone can be sure of getting the same search results as all other users.

There are several ways DuckDuckGo did to ensure users’ privacy is protected. First, it exposes the major tracking networks tracking you over time and blocks the hidden third-party trackers on websites you visit, including those from Google and Facebook. Second, Searches made through DuckDuckGo also automatically connect to the encrypted versions of websites wherever possible, making it harder for anyone to see what you’re looking at online. Third, DuckDuckGo calculates and shows users a Privacy Grade as a reminder regarding online privacy. Fourth, of course, Search Privately. DuckDuckGo search doesn’t track you. Ever.

As for its down side, some argued that the quality of DuckDuckGo’s search results are not as good as Google. DuckDuckGo offers a more unfiltered version of the internet. Unlike Google constantly updates its search algorithm tailored to its users’ browsing habits. You may find the results that have been penalized or removed by Google because they are dangerous, patently false, or just misinformation designed to confuse people. But DuckDuckGo claims they use algorithms to calculate all the results and filter millions of possible results down to a ranked order. As a search engine, DuckDuckGo doesn’t offer as many services as Google, and as a result, is less convenient. And of course, because DuckDuckGo doesn’t track and profile users, it does not and cannot provide customized searches like Google.

Currently, DuckDuckGo is Ranking #4 of Utilities in the App Store for iPhone. It’s not as popular as the #1 Google and #2 Chrome, which offers built-in features such as Google Maps, Flights, etc. They are integrated with your other Google accounts and products, which can sometimes be rewarding. But DuckDuckGo rates 4.9 in 1.4M Ratings among Utilities in the App Store for iPhone. The users appreciate DuckDuckGo due to the respect for privacy and the heads-up it offers when something underhanded goes down.

DuckDuckGo is essential to people who care about privacy more than convenience. It will likely thrive in its niche market.

October 25, 2022

Beauty is in the eye of the algorithm?

Beauty is in the eye of the algorithm?
Anonymous | October 20, 2022

AI generated artwork has been making headlines over the past few years, but many artists wonder if these images can even be considered art.

Can art be created by a computer? Graphic design and digital art have been around for decades, and these techniques require experience, skill and often years of education to master. Artists have used tools such as Adobe Photoshop to traverse this medium and produce beautiful and intricate works of art. However, in the last few years new types of artificial intelligence software, such as DALL-E 2, have enabled anyone, even those without experience or artistic inclination, to produce elaborate images by entering just a few key words. This is exactly what Jason Allen did to win the Colorado State Fair art competition in September 2022: he used the AI software Midjourney to generate his blue ribbon winning piece “Théâtre D’opéra Spatial”. Allen did not create the software, but had been fascinated by its ability to create captivating images and wanted to share it with the world. Many artists were outraged by the outcome, but Allen maintains that he did nothing wrong as he did not break any of the rules of the competition nor did he try to pass of the work as his own, submitting it under “Jason Allen via Midjourney”. The competition judges also maintained that though they did not initially know that Midjourney was an AI program, they would have awarded Allen the top prize regardless. However, it seems that the ethical impacts here go deeper than the rules of the competition themselves.

“Théâtre D’opéra Spatial” via Jason Allen from the New York Times

A few questions come to mind when analyzing this topic. Can the output of a machine learning algorithm be considered art? Is the artist here the person who entered the phrase into the software, the software itself, or the developer of the software? Many artists argue that AI cannot produce art because its outputs are devoid of any meaning or intention. They say that art requires emotion and vulnerability in order to be truly creative, though it seems incorrect to try and define the term “art”. Additionally, critics of AI softwares claim that they are a means of plagiarism as the person inputting the key words did not create the work themselves and the software requires previous work as a basis for its learning, so the output is necessarily based on other people’s effort. This is not, however, the first time that AI generated art has made headlines. In 2018, an AI artwork sold for $432,500 after being auctioned at Christie’s. The work, Portrait of Edmond Belamy, was created by Obvious, a group of Paris based artists, who programmed what they coined a “generative adversarial network”. This system consists of two parts, a “generator” and a “discriminator”, the first which creates the image and the second which tries to differentiate between human created and machine generated works. Their goal was to fool the “discriminator”. The situation here was slightly different from the Midjourney generated art as the artists were also the developers of the algorithm and the algorithm itself seems to be credited.

Portrait of Edmond Belamy from Christie’s

As someone who has no artistic ability or vested interest in the world of art, it was difficult to even form an opinion on some of these ethical questions, but the complexity of this topic intrigued me. However, even though this is not a research experiment, the principles of The Belmont Report are relevant in this situation. First, there seems to be an issue with Beneficence here. Beneficence has to do with “maximizing possible benefits and minimizing possible harms.” Allowing or entering AI generated art in a competition is in conflict with this principle. Artists often spend countless hours perfecting their works whereas software can create it in a matter of seconds, so there is a lot of potential for harm here. The winner of the Colorado state fair competition in particular also received a $300 prize, which is money that could have gone to an artist who had a more direct impact on their submission. Furthermore, there is the issue of Justice also mentioned in the Belmont Report. Justice has to do with people fairly receiving benefits based on the effort they have contributed to some project. As mentioned above, the works generated by AI will be based on other people’s intellectual property, but those people will not receive any credit. Additionally, in both the Christie’s and Colorado State Fair cases, the artists are profiting from these works, so there is a case to be made that those whose art was used to train the algorithms are also entitled to some compensation. In the end, it seems that this is another case of technology moving faster than the governing bodies of particular industries. Moving forward the art world must decide how these newer and more advanced softwares fit into spaces where technology has historically, and often intentionally, been excluded.

References:

https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html

https://www.smithsonianmag.com/smart-news/artificial-intelligence-art-wins-colorado-state-fair-180980703/

https://www.christies.com/features/a-collaboration-between-two-artists-one-human-one-a-machine-9332-1.aspx

https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

July 5, 2022

Hirevue is looking to expand its profits, and you are the price

Hirevue is looking to expand its profits, and you are the price
Tian Zhu | June 30, 2022

Insight: The video interview giant Hirevue’s AI interviewer had helped to reject more than 24 million candidates based on reasons that only AI knew. More scarily, the candidates could potentially contribute to their rejections with the data they provide.

Recruitment has always been a hot topic, especially after the great resignation following the covid-19 breakout. How to find the right candidates with the right talent for the right job has been the top concern for companies that are eagerly battling the loss of talent.

One important factor that causes talent loss during the hiring process is human bias, whether intentional or unintentional. The video interview giant, Hirevue, thought to use AI to combat bias in the hiring process. Machines can’t be biased right?

We all know the answer to that question. Though AI may not exhibit the same type of biases that human has, it has its list of issues. Clear concerns existed around the AI’s transparency, fairness, and accountability.

First, their algorithm was not transparent at all. Hirevue could not provide independent audits on their algorithms that analyzed the candidate’s video, including facial expressions and body gestures, that produced the final hiring decision. On top of that, there was no indication that the algorithm is fair towards candidates with the same expertise with different demographic backgrounds. The theory behind the link between facial expression and the candidate’s qualification is full of flaws; different candidates with the same qualifications and answers could be scored differently due to their eye movement. Thirdly, the company is accountable for the decision made by the AI. The company even implies the collection and usage of the interview data are solely for the “employer”, yet it is unknown whether they gain access to this data through permissions from the employers for various purposes.

Hirevue was challenged by the Electronic Privacy Information Center with a complaint to the FTC regarding the °∞unfair and deceptive trade practices°±. The company has since stopped using any algorithms with data other than the speech of the candidates.

With the strong push back on the robot interviewer, Hirevue limited their scope of AI to only the voice during the interviews. But note that Hirevue did not start as an AI company, but as a simple video interview company. It°Øs the company°Øs fear of missing out on the AI and big data that drives them to squeeze out the value of your data, desperately trying to find more business value and profit from every single drip.

Such a scenario is not unique to Hirevue. In 2022, AI is closer to people than you think it may be. People are no longer just curious about it but expect AI to help them in their daily life. Uber, for example, could not have been made possible without the heuristic behind optimal matching between the drivers and the users. Customers expect AI in their products. The companies that provide the capability race ahead while those who don’t fall behind naturally.

There are companies out there just like Hirevue, sitting on a pile of data, trying to build up some “quick wins” to not miss out on the AI trend. There’s just one problem, the data that the customers provided was not supposed to be used this way. It is a clear violation of secondary usage of data with all the problems mentioned in the previous sections.
The year 2022 is no longer the year where AI can grow rampantly without constraints from both legal and ethical perspectives. A suggestion for all potential companies that want to take advantage of their rich data: Be transparent about your data and algorithm decisions, be fair to all the stakeholders, and be accountable for the consequences of your AI product. The in-house “quick wins” should never make it out to the public without careful consideration of each point.

https://fortune.com/2021/01/19/hirevue-drops-facial-monitoring-amid-a-i-algorithm-audit/