Your Social Credit Score: Everything and Everywhere Recorded and Analyzed


By Anonymous | October 20, 2022

Imagine a world where your hygiene, fashion, and mannerisms directly affect your ability to rent an apartment. Every part of you is nitpicked and fed back into your social credit score, which in turn affects your career, finances, and even living situation. Seems extreme, right? Well, this scenario is becoming a reality for Chinese citizens with the country's Social Credit System.


[IMAGE1: scored_passerbys.jpg]

What is it?

Though China has floated ideas for a credit system since 1999, the official Chinese Social Credit System was announced in 2014, once the necessary infrastructure for the system to build upon was in place. By combining data gathering and sharing, the curation of blacklists and redlists, and punishments, sanctions, and rewards, the system is meant to uphold values like financial creditworthiness, judicial enforcement, societal trustworthiness, and government integrity. With the right score, you can expect rewards like being fast-tracked for approvals or facing fewer inspections and audits. But with the wrong score, you can face punishments like employment, school, and even travel bans. Viewed this way, the system is an oppressive means of forcing "good behavior" through methods that are invasive and dismissive of people's privacy and autonomy.

The main cause for concern, though, is the integration of technology into this social credit system: the roughly 200 million surveillance cameras of China's AI facial recognition project (aka SkyNet), online behavior tracking across the internet, and predictive analytics for identifying "political violations." With all these technologies at its disposal, the state commits numerous privacy harms without any direct consent from its citizens.


[IMAGE2: commuters_and_cameras.jpg]

What are the privacy harms?

This entire system carries data risks at every stage, from data collection to data analysis to the actions taken after predictive analysis. On top of that, I want to reiterate that citizens never gave consent to participate in such an invasive system.

How is it ethical to gather so much sensitive data in a single system and allow numerous data scientists to have access to such personal data? Even if the participants are anonymous, it doesn’t change the fact that these scientists have access to personal identifying data, financial data, social media data, and more; a unique ID would do nothing to protect these participants from bad actors hoping to use this data in a malicious way. Additionally, the government is tight-lipped about how all this data computes a score that affects the participant’s own livelihood and potentially their dependents. This single score dictates so much, yet there isn’t a way for citizens to gain insight into the system or have some technological due process to let the government know if there is an issue. This clear lack of transparency from the system’s creators makes the treatment oppressive to everyone involved.

In addition to the lack of transparency, there is a clear lack of accountability and, again, of due process that would allow some form of correction if the algorithm outputs a score that does not reflect a citizen's actual standing. As with all data-related endeavors, there is inherent bias in how the data is analyzed and in what comes out of it; if the people building the system don't know much about the day-to-day struggles of a Chinese citizen, how can they correctly infer that citizen's relative standing from the data? How can a handful of algorithms accurately score more than a billion citizens? There are bound to be many experiences and actions that a human being would consider perfectly fine but that the algorithms' logic deems "dishonorable." Without an explanation of the system or even a safeguard against such situations, many participants will needlessly fight against a system built against them from the very beginning.

What can we learn from this?

Just like with any system built on people and their data, there is a level of harm committed against the participants that we need to be aware of. It’s important to continue advocating for the rights of the participants rather than the “justice” of the project because incorrect or rushed uses of the data can create harmful consequences for those involved. From this analysis of China’s Social Credit System, we hopefully can learn a thing or two about how powerful the impact of data can be in the wrong context.

Sources

Donnelly, Drew. (2022). “China Social Credit System Explained – What is it & How Does it Work?”. https://nhglobalpartners.com/china-social-credit-system-explained/

Thoughts, Frank. (2022). “Social Credit Score: A Dangerous System”. https://frankspeech.com/article/social-credit-score-dangerous-system

Reilly, J., Lyu, M., & Robertson, M. (2021). “China’s Social Credit System: Speculation vs. Reality”. https://thediplomat.com/2021/03/chinas-social-credit-system-speculation-vs-reality/

Would privacy best practices make workforce surveillance less creepy?


By Andrew Sandico | October 20, 2022

Workforce surveillance is an emerging threat to a person’s privacy and there are limited ways to protect employees.

With the rise of platforms that collect employee data, there is more opportunity to measure employees' activity at work and their productivity. This monitoring of employee activity has been defined as workplace surveillance, and these new measurements create risks to employees' privacy since the data can easily be misused. New data sources for workplace surveillance are becoming more available, such as video recordings, pictures, keystroke monitoring, or mouse-click counting. All of these can produce inaccurate measurements of worker productivity (what employees produce at work) because they lack context. Although it is possible to measure workforce productivity through surveillance, should we?


[6] Examples of Workplace surveillance methods from the NYPost.

 

The Bad….but some Good?

During the COVID pandemic, many employees went through one of the largest unplanned social experiments in corporate history: they began working from home, with limited ability for managers to physically see what they were doing. As a result, many businesses and managers explored options to better understand what their workers were doing, and there was strong interest in measuring employee productivity. Some platforms, for example, measure working time by recording keystrokes, but for many occupations work isn't typing; it's other activities, such as discussions with a customer or talking to a colleague about a project. A few key areas of concern to highlight are:

  • Intrusion – Leveraging a privacy framework such as Solove's Taxonomy, workforce surveillance can impede a person's privacy by gathering more information than needed. When a platform uses an employee's camera, it can pick up events that could be used against them. For example, a manager seeing a lot of children in an employee's camera might deem that employee less productive than another worker who doesn't have children in the background.
  • Discrimination – Data pulled from multiple sources without the appropriate context could inaccurately classify employees into categories (called predictive privacy harm, or the inappropriate generalization of personal data). For example, if a marginalized demographic group shows a lower average keystroke count than other groups simply because of a small sample size, that group could be inaccurately categorized (a toy illustration of this sampling problem appears after this list).
  • Micromanaging – Gartner's research has found that micromanaging has negative impacts on employees, such as making them less engaged, motivated, and/or productive [7]. Workforce surveillance would only amplify this through the perception of constant monitoring.
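To make the small-sample concern concrete, here is a toy simulation (all numbers are made up for illustration, not drawn from any real platform) showing how a small group's observed keystroke average can drift well below the true mean purely by chance, inviting an inaccurate "less productive" label:

```python
# Toy simulation: two groups with the same true typing behaviour, but the
# smaller group's observed average can look much lower purely by chance.
import random

random.seed(7)
TRUE_MEAN_KEYSTROKES = 8000   # hypothetical daily keystrokes for every employee

def observed_average(group_size: int) -> float:
    """Average of noisy daily keystroke counts for a group of employees."""
    samples = [random.gauss(TRUE_MEAN_KEYSTROKES, 2000) for _ in range(group_size)]
    return sum(samples) / group_size

majority_avg = observed_average(group_size=500)   # large group: stable estimate
minority_avg = observed_average(group_size=5)     # small group: noisy estimate

print(f"majority group average: {majority_avg:,.0f}")
print(f"minority group average: {minority_avg:,.0f}")
# With only 5 people, the minority average can easily fall far below the true
# mean, even though both groups behave identically.
```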

 

 


[7] Gartner Diagram of micromanagement impacts on a team

Although there are concerns, is there a gain for the subject? Even with all these risks, there is positive feedback about workforce surveillance. One example is making performance evaluations more scientific rather than driven by subjective relationships. On the Rise of Workplace Surveillance podcast from the New York Times [1], women in particular expressed positive feedback because the data gave them empirical evidence of their work performance, where male counterparts might otherwise be subjectively rated higher because of relationships. Others who appreciate seeing their work quantified said the measurements either validated their work as an accomplishment or helped them be more productive. Balancing all these concerns, will privacy best practices make workforce surveillance less creepy and more beneficial?

How Privacy Can Help

Respect for persons is key. Leveraging the Belmont Report as a privacy framework, data can be used to help the employee develop, or to give a manager aggregated themes to improve as a leader. Development here means synthesizing an employee's data to nudge them on their work practices. Imagine a Fitbit for your work, with reminders that help you manage work-life balance. For example, if you set your work hours from 9 to 5 but always find yourself responding to emails at night, those few emails can quickly add up to a few more hours of work. These nudges can remind you that you are working extra hours and reduce the risk of burnout. Managers are employees too, with their own development needs. Understanding that a manager sending a late email can lead to an employee working extra hours at night is valuable: although it is easier for the manager to send it in the moment, recognizing this behavior could help the manager change the habit and reduce their employees' burnout. Products such as Viva, built into Microsoft 365, are an example of how data can be used by a manager for their own development while maintaining the privacy of the employee [5]. Viva nudges managers about some of their influences on their employees' work habits, which the manager can then adjust.
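As a rough illustration of this kind of nudge, the sketch below (hypothetical timestamps, a made-up 10-minutes-per-email assumption, and not an actual Viva or Microsoft 365 API) estimates after-hours email time against a stated 9-to-5 schedule and surfaces a reminder when it crosses half an hour:

```python
# Hypothetical "Fitbit for your work" nudge: estimate after-hours email time.
from datetime import datetime

WORKDAY_START, WORKDAY_END = 9, 17   # the employee's stated 9-to-5 working hours

def after_hours_minutes(sent_emails: list[datetime], minutes_per_email: int = 10) -> int:
    """Rough estimate of time spent on email outside working hours
    (assumes ~10 minutes of attention per after-hours email)."""
    after_hours = [t for t in sent_emails if not WORKDAY_START <= t.hour < WORKDAY_END]
    return len(after_hours) * minutes_per_email

# One week of (made-up) sent-email timestamps for a single employee.
week_of_emails = [datetime(2022, 10, 17, 21, 5), datetime(2022, 10, 18, 22, 40),
                  datetime(2022, 10, 19, 13, 15), datetime(2022, 10, 20, 23, 10)]

extra = after_hours_minutes(week_of_emails)
if extra >= 30:
    print(f"Nudge: you spent roughly {extra} minutes on email outside 9-5 this week.")
```

The point of the design is that the output goes to the employee (or, in aggregate, to the manager about their own habits), not into a performance score.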

 


[5] Example of how products like Microsoft Viva help managers understand their employee habits to help them lead their team.

Workforce surveillance is a slippery slope with limited privacy safeguards today, but with emerging platforms and stronger interest from businesses, appropriate privacy policies increase the possibility of using the data for the good of employee development rather than for measuring performance. Using this data for performance is often labeled weaponizing, since it can be used to justify giving someone a lower bonus, denying a promotion, and/or terminating them. Without these privacy guidelines, it will be very possible for companies to put the same data to the wrong use. This raises another dilemma for the platform companies that are the sources of this data: would they sell the data they collect as a service to other companies that do want to use it for weaponizing? This additional risk of platform companies aggregating the information is another reason these companies should have privacy by design embedded into their work.

 

Conclusion

Workforce surveillance poses a major risk to a person's privacy, but if done correctly, for the benefit of the person (the employee), there are rewards. With meaningful privacy best practices that are actually followed, the benefits of workforce surveillance could outweigh the risks.

 

References

[1] The Rise of Workplace Surveillance – The New York Times (nytimes.com)

[2] Workplace Productivity: Are You Being Tracked? – The New York Times (nytimes.com)

[3] ‘Bossware is coming for almost every worker’: the software you might not realize is watching you | Technology | The Guardian

[4] Management Culture and Surveillance

[5] Viva Insights: what is it and what are the benefits? | Creospark

[6] ‘It’s everywhere’: Workplace surveillance methods now common (nypost.com)

[7] Micromanaging Your Remote Workers? Act Now to Stop Yourself. (gartner.com)

 

Big Corporate: Big Brother?


By Tajesvi Bhat | October 20, 2022

Have big corporations taken advantage of social media platforms in order to expand their reach and power beyond all limits?

Social media is so integrated into our lives that it has introduced an entirely new subcategory of communication: DMs (Direct Messages). While originally only between friends and family, DMs and social media have spread to the professional world. The advent of LinkedIn, a professional networking service, encouraged the application of social media mannerisms such as DMs to professional opportunities, networking, and recruiting. Hiring managers did not limit themselves to the confines of LinkedIn, though, and used mainstream social media platforms to assess the "professional value" of an employee (Robards, 2022).

Cybervetting, a thorough investigation of an individual's social media presence, is increasingly being used by hiring managers and companies to decide which candidates seem to match the company's values and could integrate successfully into the existing company culture. A study published in March 2022 analyzed reports of employment terminations resulting from social media postings. The terminations largely fell into two categories: those resulting from self-posts and those resulting from third-party posts (i.e., someone else posting something in which the individual is mentioned or involved). In each case, the employees were terminated after a social media investigation, and the employer's stated reasons for termination were recorded. The image below shows a breakdown of the most common reasons for employee termination for both self-posts and third-party posts:

IMAGE 1: Distribution of Employment Termination Reasons across Self and Third-Party posts (Robards, 2022)

 

The most common source of this information was Facebook, closely followed by Twitter. The most common reason for termination over self-posts was racial discrimination; the study reports a corresponding breakdown for third-party posts as well (see the image above). The careers most commonly held accountable for their social media presence are law enforcement, education, hospitality, media, medical, retail, government, and transport work.

Employers justify the use of cybervetting as a "form of 'risk work', with the information collected online used to mitigate risk by determining compatibility and exposing red flags that would suggest a misfit, compromise the image of the employer, or disrupt the workplace" (Robards, 2022). The normalization of this practice raises questions about employee visibility as well as the ethical boundaries of employers and Freedom of Speech protections. According to Robards' study, "27% of participants experienced employer 'disapproval' of their social media activities, and some even experienced attempted influence of their use of social media by employers" (Robards, 2022). Clearly, cybervetting is not limited to hiring; it continues for the duration of employment. It is concerning that employers are watching employee social media posts, since this is most likely done without employees' consent or knowledge, and the reasoning behind it is not entirely clear. While it is understandable that an employer would want to ensure employees follow its policies and general code of conduct, is that something employees must follow outside of work hours and off of company property? Cybervetting also introduces further opportunity for bias and discrimination (which are already prevalent in the hiring process) and narrows the gap between personal and professional lives.

While high school students are often told to be wary of their social media presence as they apply to colleges, no one reminds the everyday adult to be cautious about social media usage. Yet it is clearly a common enough occurrence that adults are reconsidering their social media usage, limiting their account accessibility, or creating fake profiles. This surveillance has an enormous impact on young people, who are more likely to alter their personality, or at least their digital personality, and develop a false persona in order to match future employer expectations. In fact, this general awareness of surveilled social media has driven the creation of "finstas" (fake Instagram accounts) and tactics including "finely tuned privacy settings, deleting content, and maintaining multiple profiles for different imagined audiences" (Robards, 2022) in an attempt to provide security and anonymity. In addition to fake accounts, individuals now gravitate toward platforms that employers may be less likely to see, such as Tumblr versus Facebook, and it will be interesting to see how the platforms of choice for the most security and lowest visibility change over time. This creates, and promotes, an alternate reality in which individuals cater to expectations rather than being their true selves, as was the original intent of social media platforms.

IMAGE 2: Most commonly used social media platforms

 

Public response to employment terminations over social media postings has been divided between two general categories: either the termination was caused by public outcry, or the termination resulted in public outcry. The first is a pleasant surprise because it indicates that companies are learning to hold not only themselves but also their employees accountable for actions that are generally deemed inappropriate. However, it also implies that the company is taking action only to satisfy the public and that otherwise it would have done nothing; the employer then falls prey to the same idea of putting on a false persona in order to regain social approval. The second category is when corporations misuse their reach and monitoring of social media platforms to terminate employees for speaking out against harsh working conditions or health and safety risks, or for sharing general disapproval of corporate policies or proceedings. Social media monitoring by employers overall seems to be an invasion of personal rights and freedom of speech, and termination or hiring decisions based on it are incredibly prone to bias, yet the practice unfortunately remains heavily prevalent today. Caution is advised to those who post on social media: Big Brother truly is watching.

Sources
Robards, B., & Graf, D. (2022). “How a Facebook Update Can Cost You Your Job”: News Coverage of Employment Terminations Following Social Media Disclosures, From Racist Cops to Queer Teachers. Social Media + Society, 8(1). https://doi.org/10.1177/20563051221077022

Image 2:

https://www.socialsamosa.com/2019/03/infographic-most-popular-social-media-platforms-of-2019/

Balancing Checks and Balances


Anonymous | October 20, 2022

While the US Government guides data policies and practices for private companies, its own policies and practices have caused it to fall behind in the current digital era.

In today's digital era, data is collected in every interaction, whether at the grocery store, scrolling through TikTok, or walking through the streets of New York. Companies collect this data to understand the patterns of the average citizen, whether that's how often people buy Double Stuffed Oreos™, how long people spend watching the newest teenage dance trend, or the amount of foot traffic that crosses 6th and 42nd at 5 PM every day. This information shapes the way companies evolve their products to appeal to the "average citizen" and reach the widest range of users. To do this, they must collect terabytes and terabytes of data to determine how the "average citizen" acts.

Congress sets the data privacy laws that govern the way private companies can collect, share, and use these terabytes and terabytes of personal data. These laws are put in place to protect US citizens and give them power over how and what information can be collected, used, and shared. The rules and regulations are meant to make sure companies are good data stewards so that people's personal information is not exposed in ways that hurt them, most commonly in the form of personal and identity theft. There are still millions of hackings and data leaks every year, but the rules and regulations have forced private companies to implement safer data practices.

In recent years, there have been multiple data breaches at US Government entities, including the Departments of Defense, Homeland Security, Health, and Transportation. In 2015, the Office of Personnel Management was successfully targeted, exposing personal information for 22 million federal employees. This office governs all federal employees, which means that Social Security Numbers, bank information, and medical histories were captured. All of the data these agencies collected was exposed, leaving millions of citizens vulnerable to personal and identity theft. To limit the power and freedom of any one individual or entity, there is a system of Checks and Balances in place. I understand that and I agree, but it comes at the expense of the adequate technology and infrastructure needed to be good data stewards. The US government values ethics and effectiveness over efficiency.

I have worked for the federal government my whole career, so I am not writing this as an anti-government enthusiast. I have seen how internal rules and regulations have hindered the success of data scientists like myself. We have trouble implementing safer data practices and policies because of the hoops and hurdles that must be jumped through to change the way things have always been done. None of these leaks was intentional, and none was the fault of just one person. In my experience as a federal employee, the aggregation of many small mistakes can lead to one very large one, like a data leak. Data practices in the government are far behind those of private industry. Again, this is not intentional or the result of one person's sly decisions. The rules and regulations that government entities must follow are strict and tedious, with reason: many levels of approvals and authorizations were put in place to keep power away from any one person or entity aspiring to make a monumental change. This system of Checks and Balances is necessary to keep a large government functioning ethically and effectively, but it sacrifices efficiency.

 

While the government has valid reasons for its extensive processes, there must be change in order to quickly implement safer data practices to protect US citizens and their information. I know there is not one button to push or one lever to pull that will fix everything. It will be a slow and generational process, but essential to stay in line with the rapidly evolving digital era we are in.

While the government may be utilizing data for ethical endeavors, like trying to find the best food markets to subsidize to help low-income families, understanding the interests of children to better target teen smokers, or identifying the areas with the highest rates of homelessness, there is still a lot of data being collected every day and data practices are not updated to match the current digital era. If we cannot change culture and streamline implementation processes to balance the system of Checks and Balances, we will continue to leave US Citizens at risk of exposure.

“US Government: Deploying yesterday’s technology tomorrow”

  • Anonymous Federal Employee

 

The Hidden Perils of Climbing the Alternative Credit Ladder


Alberto Lopez Rueda

Alternative data could expand credit availability, but in the absence of clear regulation, consumers should carefully consider the risks before asking for an alternative credit score.

25 million Americans do not have a credit history; they are "credit invisible". This can make it very difficult to qualify for financial products, such as home mortgages or student loans. Credit is an important cornerstone of achieving the American dream [1] of homeownership, education, and success, which in turn is key to social mobility [2].

In order to access credit, consumers need to have a credit score. Yet, traditional credit scores are based on consumers’ credit history. How could millions of users dodge this “catch-22” problem and qualify for credit? Alternative data may be the answer according to some companies [3]. Alternative data is defined as any data that is not directly related to a consumer’s credit history and can include datasets such as individuals’ social media, utility bills and web browsing history.

Caption: Types of alternative data in credit scoring. Source.

Credit agencies argue that alternative data can build a more inclusive society by expanding the availability of affordable credit to generally vulnerable populations. But, at what cost?

 

Ethical and privacy concerns

The use of alternative credit scores raises important questions about transparency and fairness. In addition, some types of alternative data may entrench or even amplify existing social inequalities and could even discriminate against consumers based on protected characteristics like race. There are also hidden privacy risks, as companies generally collect vast amounts of data that can then be shared or re-sold to third-parties.

Alternative credit models are less transparent than traditional scores. Companies generally do not detail the alternative data that was utilized or disclose the methodology used to derive the alternative scores. Furthermore, alternative credit models are also harder to explain than traditional scores, due to the sophisticated models that produce the results. In addition, alternative data can contain errors and inaccuracies, which may be very challenging to detect or correct. Without full transparency, consumers risk being treated unfairly.

Alternative data may also perpetuate existing inequalities. According to data from FICO, one of the best-known credit scoring companies, the FICO XD scores based on utility bills are meaningfully lower than those derived from traditional data. A low score can still provide access to credit, although it generally means being charged higher interest rates or premiums. Receiving a FICO score below 580 [4] may even harm consumers, since having a score classified as "poor" is generally worse than having no score at all [5]. However, these results cannot be generalized, since the scores depend heavily on the types of alternative data and algorithms used.

Caption: Distribution of FICO Scores between alternative data (FICO Score XD) and traditional scores (FICO 9). Source

New datasets could even facilitate discrimination against consumers. The Equal Credit Opportunity Act of 1974 forbids companies from using information about protected classes such as race, sex, and age for credit purposes. Yet some alternative data indicators, such as educational background [6], can be highly correlated with these protected classes.

Lastly, users of alternative data scores might also be exposed to meaningful privacy harms. First, companies may collect vast amounts of information about consumers, and although consumers are generally notified, they may not be aware of the extent to which their privacy can be invaded. In addition, all the information collected could be shared or re-sold for completely different purposes, potentially harming consumers in the future.

Caption: Credit agencies can collect vast and detailed information about consumers

Source

 

The US regulatory landscape around alternative data scores

In 2019, five US financial regulatory agencies [7] backed the use of alternative information in traditional credit-evaluation systems, encouraging lenders to "take steps to ensure that consumer-protection risks are understood and addressed". Yet most of the relevant regulation, such as the Fair Credit Reporting Act, dates back several decades, and the advent of big data has inevitably created regulatory blind spots. It is imperative that the US government enact substantive regulation if alternative credit scoring is to fulfill its potential. Important issues, such as which kinds of alternative data and indicators can be used, must be addressed. Mechanisms should also be put in place to prevent discrimination, enhance transparency, and protect consumers.

Without clear regulatory boundaries, vulnerable consumers might be forced to choose between a life without credit and a risky path of discrimination and lack of privacy. In other words: damned if you do, damned if you don't.

 

Conclusion

The advent of alternative data and new algorithms could help expand affordable credit to those who are credit invisible. More credit could in turn lower social inequalities by helping vulnerable consumers. Yet, the use of alternative credit scores remains in its infancy and the lack of explicit regulation can result in harmful practices against users. Against this backdrop, consumers should thoroughly compare the different alternative credit scores and carefully consider the sometimes-hidden disadvantages of climbing the alternative credit ladder.

Citations

[1] Barone, A. (2022, August 1). What Is the American Dream? Examples and How to Measure It. Investopedia.

https://www.investopedia.com/terms/a/american-dream.asp

[2] Ramakrishnan, K., Champion, E., Gallagher, M., Fudge, K. (2021, January 12). Why Housing Matters for Upward Mobility: Evidence and Indicators for Practitioners and Policymakers. Urban Institute

https://www.urban.org/research/publication/why-housing-matters-upward-mobility-evidence-and-indicators-practitioners-and-policymakers

[3] FICO (2021). Expanding credit access with alternative data

https://www.fico.com/en/resource-access/download/15431

[4] McGurran, B. (2019, July 29). How to “Fix” a Bad Credit Score. Experian

https://www.experian.com/blogs/ask-experian/credit-education/improving-credit/how-to-fix-a-bad-credit-score/

[5] Black, M., Adams, D. (2020, December 10). Is No Credit Better Than Bad Credit?. Forbes

https://www.forbes.com/advisor/credit-score/is-no-credit-better-than-bad-credit/

[6] Hayashi, Y. (2019, August 8). Where You Went to College May Matter on Your Loan Application. The Wall Street Journal.

https://www.wsj.com/articles/where-you-went-to-college-may-matter-on-your-loan-application-11565258402?mod=searchresults&page=1&pos=1&mod=article_inline

[7] Hayashi, Y. (2019, December 3). Bad Credit? Regulators Back Ways for Risky Borrowers to Get Loans. The Wall Street Journal.

https://www.wsj.com/articles/bad-credit-alternative-data-wins-support-as-a-way-to-ease-lending-11575420678

 

DuckDuckGo: What happens in search stays in search

Li Jin | October 20, 2022

We are used to searching for everything online without a second thought: medical, financial, and personal issues, most of which should be private. However, on Google, the most popular search engine in the world, searches are tracked, saved, and used for targeted advertisements or something even worse. The Incognito mode that Google provides is not entirely private. It simply deletes information related to your browsing session, and it only does that after you end your session by closing all your tabs.

Established in 2008, DuckDuckGo set out with the mission of making "privacy protection available to everyone." It allows for completely anonymous web browsing, which makes it an ideal choice for anyone who hates ads and/or being tracked online. Its privacy promise is simple: no data on users' online searches is collected or stored; no ads target users based on their previous searches; no social engineering techniques are used based on users' searches and other interests. Everyone can be sure of getting the same search results as all other users.

[IMAGE: duckduckgo_overview-1.png]

There are several things DuckDuckGo does to ensure users' privacy is protected. First, it exposes the major tracking networks tracking you over time and blocks the hidden third-party trackers on websites you visit, including those from Google and Facebook. Second, searches made through DuckDuckGo automatically connect to the encrypted versions of websites wherever possible, making it harder for anyone to see what you're looking at online. Third, DuckDuckGo calculates and shows users a Privacy Grade as a reminder about online privacy. Fourth, of course, it lets you search privately: DuckDuckGo search doesn't track you. Ever.
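To illustrate the general idea behind third-party tracker blocking (a simplified sketch with a made-up blocklist, not DuckDuckGo's actual implementation), a browser extension can compare each outgoing request's domain against a list of known tracking domains and block the request when it is both third-party and on the list:

```python
# Simplified sketch of third-party tracker blocking (illustrative blocklist only).
from urllib.parse import urlparse

TRACKER_BLOCKLIST = {"doubleclick.net", "google-analytics.com", "facebook.net"}

def is_blocked(request_url: str, page_url: str) -> bool:
    """Block a request if it goes to a known tracker on a different (third-party) site."""
    req_host = urlparse(request_url).hostname or ""
    page_host = urlparse(page_url).hostname or ""
    third_party = not req_host.endswith(page_host)
    on_blocklist = any(req_host == d or req_host.endswith("." + d)
                       for d in TRACKER_BLOCKLIST)
    return third_party and on_blocklist

print(is_blocked("https://www.google-analytics.com/collect", "https://example.com"))  # True
print(is_blocked("https://example.com/style.css", "https://example.com"))             # False
```

Real blockers use far larger, regularly updated lists and more careful domain matching, but the principle is the same.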

As for its downside, some argue that the quality of DuckDuckGo's search results is not as good as Google's. DuckDuckGo offers a more unfiltered version of the internet, unlike Google, which constantly updates its search algorithm and tailors results to its users' browsing habits. You may find results that Google has penalized or removed because they are dangerous, patently false, or simply misinformation designed to confuse people. DuckDuckGo, for its part, says it uses algorithms to calculate all the results and filter millions of possible results down to a ranked order. As a search engine, DuckDuckGo doesn't offer as many services as Google and, as a result, is less convenient. And of course, because DuckDuckGo doesn't track and profile users, it does not and cannot provide customized searches like Google.

Currently, DuckDuckGo ranks #4 among Utilities in the App Store for iPhone. It's not as popular as #1 Google and #2 Chrome, which offer built-in features such as Google Maps, Flights, etc., integrated with your other Google accounts and products, which can sometimes be rewarding. But DuckDuckGo holds a 4.9 rating across 1.4 million ratings among Utilities in the App Store for iPhone. Users appreciate DuckDuckGo for its respect for privacy and the heads-up it offers when something underhanded goes down.

[IMAGE: duckduckgo_appstore_rank-1.png]

DuckDuckGo is essential to people who care about privacy more than convenience. It will likely thrive in its niche market.

Beauty is in the eye of the algorithm?

Anonymous | October 20, 2022

AI generated artwork has been making headlines over the past few years, but many artists wonder if these images can even be considered art.

Can art be created by a computer? Graphic design and digital art have been around for decades, and these techniques require experience, skill, and often years of education to master. Artists have used tools such as Adobe Photoshop to traverse this medium and produce beautiful and intricate works of art. However, in the last few years, new types of artificial intelligence software, such as DALL-E 2, have enabled anyone, even those without experience or artistic inclination, to produce elaborate images by entering just a few key words. This is exactly what Jason Allen did to win the Colorado State Fair art competition in September 2022: he used the AI software Midjourney to generate his blue-ribbon-winning piece "Théâtre D'opéra Spatial". Allen did not create the software, but he had been fascinated by its ability to create captivating images and wanted to share it with the world. Many artists were outraged by the outcome, but Allen maintains that he did nothing wrong, as he did not break any of the rules of the competition, nor did he try to pass off the work as his own, submitting it under "Jason Allen via Midjourney". The competition judges also maintained that though they did not initially know that Midjourney was an AI program, they would have awarded Allen the top prize regardless. However, the ethical impacts here go deeper than the rules of the competition themselves.

 


“Théâtre D’opéra Spatial” via Jason Allen from the New York Times

A few questions come to mind when analyzing this topic. Can the output of a machine learning algorithm be considered art? Is the artist here the person who entered the phrase into the software, the software itself, or the developer of the software? Many artists argue that AI cannot produce art because its outputs are devoid of any meaning or intention. They say that art requires emotion and vulnerability in order to be truly creative, though it seems misguided to try to pin down the term "art". Additionally, critics of AI software claim that it is a means of plagiarism: the person inputting the key words did not create the work themselves, and the software requires previous work as a basis for its learning, so the output is necessarily based on other people's effort. This is not, however, the first time that AI-generated art has made headlines. In 2018, an AI artwork sold for $432,500 after being auctioned at Christie's. The work, Portrait of Edmond Belamy, was created by Obvious, a group of Paris-based artists, who programmed a "generative adversarial network". This system consists of two parts, a "generator" and a "discriminator": the first creates the image and the second tries to differentiate between human-created and machine-generated works. Their goal was to fool the "discriminator". The situation here was slightly different from the Midjourney-generated art, as the artists were also the developers of the algorithm, and the algorithm itself seems to be credited.
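For readers curious what a generative adversarial network looks like in code, here is a minimal PyTorch sketch (an illustrative toy with made-up layer sizes and image dimensions, not the system Obvious built): a generator turns random noise into an image, a discriminator is trained to tell real images from generated ones, and the generator is trained to fool it.

```python
# Minimal GAN sketch: generator vs. discriminator (illustrative only).
import torch
import torch.nn as nn

LATENT_DIM = 100          # size of the random noise vector fed to the generator
IMG_PIXELS = 64 * 64      # a small grayscale image, flattened (assumption)

# The "generator" turns random noise into an image-shaped tensor.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_PIXELS), nn.Tanh(),
)

# The "discriminator" scores how likely an image is to be real (human-made).
discriminator = nn.Sequential(
    nn.Linear(IMG_PIXELS, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images: torch.Tensor) -> None:
    """One adversarial update: the discriminator learns to tell real from fake,
    while the generator learns to fool it."""
    batch = real_images.size(0)
    noise = torch.randn(batch, LATENT_DIM)
    fake_images = generator(noise)

    # Discriminator update: push real images toward 1, generated images toward 0.
    opt_d.zero_grad()
    loss_d = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake_images.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # Generator update: try to make the discriminator output 1 for fakes.
    opt_g.zero_grad()
    loss_g = bce(discriminator(fake_images), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
```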

 

Portrait of Edmond Belamy from Christie’s

As someone who has no artistic ability or vested interest in the world of art, I found it difficult to even form an opinion on some of these ethical questions, but the complexity of the topic intrigued me. Even though this is not a research experiment, the principles of the Belmont Report are relevant here. First, there seems to be an issue with Beneficence, which has to do with "maximizing possible benefits and minimizing possible harms." Allowing or entering AI-generated art in a competition conflicts with this principle. Artists often spend countless hours perfecting their works, whereas software can create an image in a matter of seconds, so there is a lot of potential for harm. The winner of the Colorado State Fair competition also received a $300 prize, money that could have gone to an artist who had a more direct hand in their submission. Furthermore, there is the issue of Justice, also discussed in the Belmont Report. Justice has to do with people fairly receiving benefits based on the effort they have contributed to a project. As mentioned above, works generated by AI are based on other people's intellectual property, but those people will not receive any credit. Additionally, in both the Christie's and Colorado State Fair cases, the artists are profiting from these works, so there is a case to be made that those whose art was used to train the algorithms are also entitled to some compensation. In the end, this seems to be another case of technology moving faster than the governing bodies of a particular industry. Moving forward, the art world must decide how these newer and more advanced software tools fit into spaces where technology has historically, and often intentionally, been excluded.

 

References:

https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html

https://www.smithsonianmag.com/smart-news/artificial-intelligence-art-wins-colorado-state-fair-180980703/

https://www.christies.com/features/a-collaboration-between-two-artists-one-human-one-a-machine-9332-1.aspx

https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

 

Fear? AI is coming for your career. Is there a different point of view?

Anonymous | October 14, 2022

The robots are coming, and in just a few short years, everyone's mundane job tasks will be taken over by advancements in AI. The human-like intelligence required for more complex tasks can now be given to AI.

As seen by many humans today, the rise of machines…

However, the history of AI is the history of automation, and that history has proven such declarations of doom to be something of a farce. Many accounts emphasize the negative downside without showing the whole truth, such as "A short history of jobs and automation" from the World Economic Forum (WEF, Sep 3, 2020). Over past millennia, humans have invented numerous solutions to problems, many of them static in nature, and static tasks are easier to automate. But we have never stopped working to automate dynamic systems, and AI is just the latest success in this category. I submit several exhibits from the past to support this realization: AI is here to replace less creative jobs with ones that are more creative and intellectual, with more basis in human culture. This point of view is well expressed in the McKinsey & Company blog post by Susan Lund and James Manyika (McKinsey, Nov 29, 2017).

History of automata

In the ancient world, our ancestors made more advances than many people today realize. The Hellenistic Greek world, for instance, created prototypes to demonstrate basic scientific principles, including mechanisms, hydraulics, pneumatics, and the programmable cart. The article from New Scientist, "The programmable robot of ancient Greece" by Noel Sharkey, describes this in deeper detail (NS, July 4, 2007). The programmable cart was the first effort to mechanize something more dynamic in function, a mechanical forerunner of AI. Further advances in programmability came around 150 BCE with the Antikythera mechanism, which calculated the positions of astronomical objects, something humans had not thought could be done with mechanical devices. The article in Smithsonian Magazine titled "Decoding the Antikythera Mechanism, the First Computer" by Jo Marchant gives greater insight into this device (SM, Feb 2015). While such automata remained feasibility studies and demonstrations for many centuries, they eventually did displace human jobs, since producing the same value by hand took far more labor; the innovations themselves came from divergent, creative human endeavors. At the same time, these inventions opened new opportunities for jobs, such as university professors, inventors, and researchers: jobs that created devices like these to improve humanity's daily life, with less convergent work, allowing the greatness of the human mind to wander the immense open unknown, to explore!

Ethical implications

Seeing that automation displaces jobs from one skill set to another, there is an ethical implication for companies and cultures to consider. This notion is well articulated in the MIT Sloan School of Management article "Ethics and automation: What to do when workers are displaced" by Tracy Mayor (MIT, July 8, 2019). While companies can adopt new automation technologies quickly, the labor force is not as quick to change and adapt. Since the companies building and adopting the technology capture the gains in capital and revenue at the largest magnitudes, the responsibility to act arguably falls on them. So should companies pay the bill for the change, or the workforce? Perhaps this calls for a deeper discussion on fairness and on sharing the responsibilities of society.

Conclusion

Today, we can see that the ML/AI revolution is repeating history, and humans are worried once again about the incoming change. This time the change will bring cultural, legal, and ethical risks for business, and the difference in our society will be more dramatic than ever before. Many economists suggest that a government jobs guarantee would help settle community uncertainty. Bill Gates has suggested that a robot tax would help offset job loss: even with the tax, the robot would remain the cheaper option, while the tax revenue would give the government funds to support job guarantees. Perhaps the era of Star Trek is upon us, a utopian society where machines do the mundane work of producing our needs and wants while we focus on human culture and meaning. But is the mundane meaningless? There are more questions to unpack, but it is easy to see that AI might just benefit us in future jobs!

 

The Credit Conundrum

Anonymous | October 14, 2022

A person's credit score is an important piece of personal data that lenders use to evaluate a borrower's ability to pay back a loan (i.e., creditworthiness). The unfortunate reality is that most consumers don't have a grasp of the nuances of the credit score model, the most prominent of which was developed by Fair Isaac Corp. (FICO). Credit scores can determine an individual's ability to get a loan (e.g., auto loan, mortgage, business loan, student loan), the interest rate associated with that loan, and the amount of deposit required for larger purchases. FICO categorizes estimated creditworthiness into the following ranges: Excellent: 800–850, Very Good: 740–799, Good: 670–739, Fair: 580–669, Poor: 300–579.
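As a quick illustration of these bands (a toy mapping based only on the ranges above, not FICO's actual scoring algorithm, which weighs payment history, utilization, and other factors), one could classify a numeric score like this:

```python
# Toy classifier for the FICO rating bands listed above (not FICO's algorithm).
def fico_band(score: int) -> str:
    """Classify a FICO score into the standard rating bands."""
    if not 300 <= score <= 850:
        raise ValueError("FICO scores range from 300 to 850")
    if score >= 800:
        return "Excellent"
    if score >= 740:
        return "Very Good"
    if score >= 670:
        return "Good"
    if score >= 580:
        return "Fair"
    return "Poor"

print(fico_band(615))  # -> "Fair", yet still below the ~640 "subprime" cutoff discussed below
```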

Features of the credit score algorithm [2]

Rarely does the average consumer comprehend the factors that affect their FICO credit score, and it's quite possible that many consumers don't know their FICO score or how to check it. The reality is that a credit score below 640 is usually considered "subprime" and puts the borrower in danger of falling into a debt trap. "Data shows that more than 1 in 5 Black consumers and 1 in 9 Hispanic consumers have FICO scores below 620; meanwhile, 1 out of every 19 white people are in the sub-620 category" [1]. Subprime borrowers frequently become the target of predatory lending, which only exacerbates the situation. A Forbes article written by Natalie Campisi asserts that the current credit scoring model has been negatively influenced by a long history of discriminatory practices. Algorithmic bias in the credit industry was acknowledged in 1974, when the "Equal Credit Opportunity Act disallowed credit-score systems from using information like sex, race, marital status, national origin and religion" [1]. However, the newer credit score evaluation criteria still don't take into account generations of socioeconomic inequity. Federal legislation has been passed in addition to the Equal Credit Opportunity Act to make the credit industry more transparent and equitable.

Despite these efforts on a federal level, the issue of algorithmic bias remains when credit agencies aggregate data points into individual credit scores. Generational wealth has passed disproportionately to white people, so the concept of creditworthiness should be reimagined with feature engineering for equity and inclusion. For example, “FinReg Labs, a nonprofit data testing center, analyzed cash-flow underwriting and the results showed that head-to-head it was more predictive than traditional FICO scoring. Their analysis also showed that using both the FICO score and cash-flow underwriting together offered the most accurate predictive model” [1].

Enhancing the fairness of the credit industry could prove pivotal to the advancement of disenfranchised communities. Credit scoring models ignore rental payment history even though they take mortgage payments into account when generating credit scores. This prevents many otherwise creditworthy individuals from improving their credit score, given the massive gap in homeownership between white (74.5% by end of 2020) and non-white communities (44% by end of 2020) [1]. The FICO credit scoring model has gone through many iterations, and a variant is used in about 90% of lending cases [2]. However, lenders may use different versions of the algorithm to determine loan amounts, interest rates, payback periods, and any deposits. Therefore, there's a need for uniform credit standards across different lending opportunities to prevent lending bias. A recent Pew Research paper found that in New York City, over half of debt claims judgments and lawsuits affected individuals in predominantly Black or Hispanic communities, and 95% of the lawsuits affected people in low- to moderate-income neighborhoods [1]. "Using data that reflects bias perpetuates the bias, critics say. A recent report by CitiGroup states that the racial gap between white and Black borrowers has cost the economy some $16 trillion over the past two decades. The report offers some striking statistics:

● Eliminating disparities between Black and white consumers could have added $2.7 trillion in income or +0.2% to GDP per year.

● Expanding housing credit availability to Black borrowers would have expanded Black homeownership by an additional 770,000 homeowners, increasing home sales by $218 billion.

● Giving Black entrepreneurs access to fair and equitable access to business loans could have added $13 trillion in business revenue and potentially created 6.1 million jobs per year.” [1]

Data taken from this 2010 survey by the Federal Reserve. A more recent survey is available from the Urban Institute, although Asian-Americans aren’t included in their data. [6]

I've worked as a financial consultant for both Merrill Lynch Wealth Management and UBS Private Wealth Management, so I have first-hand insight into the credit conundrum. The credit industry could be enhanced through the development of structured lending products. Furthermore, the bankers who develop these lending products should form criteria that account for both years of economic inequality and a reinterpretation of "creditworthiness". Institutional banks should also run financial literacy and credit workshops in disenfranchised communities and publish relevant content to remedy the credit disparity. Clients who employ financial consulting services are educated on how to leverage the banking system to reach their financial goals, but the vast majority of the U.S. population doesn't qualify for personalized financial services. However, these same financial services organizations interface with the masses. Banks should cater to the masses to empower the proletariat rather than exploit them through discriminatory or predatory lending practices.

References

[1] From Inherent Racial Bias to Incorrect Data—The Problems With Current Credit Scoring Models – Forbes Advisor

[2] Credit Score: Definition, Factors, and Improving It

[3] What are the Texas Fair Lending Acts?

[4] Credit History Definition

[5] Subprime Borrower Definition

[6] Average Credit Score by Race: Statistics and Trends

When It Comes to Data: Publicly Available Does Not Mean Available for Public Use

Anonymous | October 14, 2022

Imagine it’s 2016 and you’re a user on a dating platform, hoping to find someone worth getting to know. One day, you wake up and find out your entire profile, including your sexual orientation, has been released publicly and without your consent. Just because something is available for public consumption does not mean it can be removed from its context and used somewhere else.

What Happened?
In 2016, two Danish graduate research students, Emil Kirkegaard and Julius Daugbjerg Bjerrekær, released a non-anonymized dataset of 70,000 users of the OK Cupid platform, including very sensitive personal data such as usernames, age, gender, location, sexual orientation, and answers to thousands of very personal questions, WITHOUT the consent of the platform or its users.

Analyzing the Release Using Solove’s Taxonomy
In case it wasn't already painfully obvious, there are serious ethical issues in the way the researchers both collected and released the data. In a statement to Vox, an OK Cupid spokesperson emphasized that the researchers violated both the platform's terms of service and its privacy policy.

As we've discussed in class, users have an inherent right to privacy. OK Cupid users did not consent to have their data accessed, used, or published in the way it was. If we examine this using Solove's Taxonomy framework, it becomes clear that the researchers violated every point in his analysis. In terms of information collection, this constitutes surveillance, especially given the personal nature of the data. As for information processing, this is a gross misuse of the data that blatantly violates the secondary use and exclusion principles. None of the users consented to having their data used for any type of study, nor did they consent to having it published. The researchers argued that the data is public and that, by signing up for OK Cupid, the users themselves provided that data for public consumption. This is true to an extent: the users consented to having their profile data accessed by other users of the platform, and all a person had to do to access the data was create an account. What the users did not consent to was having that data made publicly available off the platform and then used by researchers not associated with OK Cupid to conduct unauthorized studies. The researchers also did not provide users with the opportunity to have a say in how their data was being used. According to Solove, exclusion is "a harm created by being shut out from participating in the use of one's personal data, by not being informed about how that data is used, and by not being able to do anything to affect how it is used." The researchers clearly violated these principles at every step of their process.


The aggregation of the data itself was unethical, as the researchers did not ask anyone for permission before scraping the website. Coupled with the fact that they purposely chose not to make the data anonymous, this is beyond atrocious: the only reason the dataset did not include pictures was that they would have taken up too much space. And when asked why they didn't remove usernames, the researchers' response was that they wanted to be able to edit the data at a later time, in the event they gained access to more information. Let me repeat that. They wanted to be able to edit the dataset, update user information, and make the dataset as robust as possible with as much information as they could find. For example, if a user used the same username across different platforms and had their height or race listed on another platform, the researchers could cross-check that platform and update the dataset. This puts users at risk, particularly users whose sexuality or lifestyle could make them targets of discrimination or hate crimes. That type of information is very private and has no place being publicized in this manner.

The dissemination of the information was a total and unethical breach of confidentiality and gross invasion of privacy. As I’ve already stated, none of the users consented to have their very sensitive personal data scraped and published in a dataset that would be used for other studies.

The Defense
Kirkegaard defended their decision to release the dataset under the claim that the data is already publicly available. There is so much wrong with that statement.

Publicly Available Does Not Mean Available for Public Use
What does this mean? It means that just because a user consents to having their data on a platform does not mean that data can be used in whatever capacity a researcher wants. This concept also shreds the researchers’ defense. Users of OK Cupid consented to have their data used only as outlined in the company’s privacy policy and terms of service, meaning it would only be accessed by other users on the app.

What they did not consent to was having a couple of Danish researchers publish that data for anyone and everyone in the world to see. At the time, OK Cupid was not using real names, only aliases, but someone could connect an alias or username to a real-life individual and access their views on everything from politics, to whether they think it's okay for adopted siblings to date each other, to sexual orientation and preferences. (Yeah, I know, my hackles went up too.)

The impact this release had on users is its own separate issue. Take a second to go through this blog post by Chris Girard: https://www.chrisgirard.com/okcupid-questions/. It shows the thousands of questions OK Cupid users answer in their profiles, which were also released as part of the dataset.

Based on Solove's Taxonomy, we can conclude that the researchers' actions were unethical. Their defense was that the data was already publicly available. I argue that just because the data can be accessed by anyone who creates an OK Cupid account does not mean it can be used for anything other than what the users consented to. And to reiterate once again, NONE of them consented to having their data published and then used to conduct research studies, on or off the platform. Even if OK Cupid wanted to conduct an internal study on dating trends, it would still need consent from its users to use their data for that study.

The Gravity of the Implications and Why Ethics Matter
This matter was settled out of court and the dataset ended up being removed from the Center for Open Science (the open-source website where it was published) after a copyright claim was filed by the platform. Many people within the science community have condemned the researchers for their actions.

The fact that the researchers never once questioned the morality of their conduct is a huge cause for concern. As data scientists, we have an obligation to uphold a code of ethics. Just because we can do something does not mean we should. We need to be accountable to the people whose data we access. There is a reason that privacy frameworks and privacy policies exist. As data scientists, we need to put user privacy above all else.

https://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release
https://www.vice.com/en/article/qkjjxb/okcupid-research-paper-dmca
https://www.vice.com/en/article/53dd4a/danish-authorities-investigate-okcupid-data-dump