Beauty is in the eye of the algorithm?
Anonymous | October 20, 2022

AI-generated artwork has been making headlines over the past few years, but many artists wonder whether these images can even be considered art.

Can art be created by a computer? Graphic design and digital art have been around for decades, and these techniques require experience, skill, and often years of education to master. Artists have used tools such as Adobe Photoshop to work in this medium and produce beautiful, intricate works of art. In the last few years, however, new kinds of artificial intelligence software, such as DALL-E 2, have enabled anyone, even those without experience or artistic inclination, to produce elaborate images by entering just a few keywords. This is exactly what Jason Allen did to win the Colorado State Fair art competition in September 2022: he used the AI software Midjourney to generate his blue-ribbon-winning piece “Théâtre D’opéra Spatial”. Allen did not create the software, but he had been fascinated by its ability to create captivating images and wanted to share it with the world. Many artists were outraged by the outcome, but Allen maintains that he did nothing wrong: he did not break any of the competition’s rules, nor did he try to pass off the work as his own, submitting it under “Jason Allen via Midjourney”. The competition judges also maintained that though they did not initially know Midjourney was an AI program, they would have awarded Allen the top prize regardless. However, the ethical questions here go deeper than the rules of the competition themselves.

 


“Théâtre D’opéra Spatial” via Jason Allen from the New York Times

A few questions come to mind when analyzing this topic. Can the output of a machine learning algorithm be considered art? Is the artist the person who entered the phrase into the software, the software itself, or the developer of the software? Many artists argue that AI cannot produce art because its outputs are devoid of meaning or intention; they say art requires emotion and vulnerability to be truly creative, though pinning down a single definition of “art” seems like a losing battle. Additionally, critics of AI software claim it amounts to plagiarism: the person entering the keywords did not create the work themselves, and the software requires previous works as the basis for its learning, so the output is necessarily built on other people’s effort. This is not, however, the first time AI-generated art has made headlines. In 2018, an AI artwork sold for $432,500 at auction at Christie’s. The work, Portrait of Edmond Belamy, was created by Obvious, a group of Paris-based artists, who programmed a “generative adversarial network” (GAN). This system consists of two parts, a “generator” and a “discriminator”: the first creates the image, and the second tries to differentiate between human-created and machine-generated works. Their goal was to fool the “discriminator”. The situation here was slightly different from the Midjourney-generated art, as the artists were also the developers of the algorithm, and the algorithm itself is credited.
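For readers unfamiliar with the technique, the two-part structure described above can be sketched in a few dozen lines of code. The toy example below is only an illustration in PyTorch, with made-up layer sizes and random noise standing in for real artworks; it is not Obvious’s actual code.

```python
# Minimal GAN sketch: a generator learns to produce samples that a
# discriminator can no longer tell apart from "real" data. Illustrative only.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # toy sizes, not real model settings

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

loss = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def real_batch(n=32):
    # Stand-in for a batch of real artworks (e.g., flattened image tensors).
    return torch.randn(n, data_dim)

for step in range(1000):
    # 1) Train the discriminator to separate real samples from generated ones.
    real = real_batch()
    fake = generator(torch.randn(real.size(0), latent_dim)).detach()
    d_loss = loss(discriminator(real), torch.ones(real.size(0), 1)) + \
             loss(discriminator(fake), torch.zeros(real.size(0), 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator to fool the discriminator.
    fake = generator(torch.randn(real.size(0), latent_dim))
    g_loss = loss(discriminator(fake), torch.ones(real.size(0), 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

The “fooling” happens in step 2: the generator is rewarded whenever the discriminator labels its output as real, which is exactly the dynamic Obvious describes.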

 

Portrait of Edmond Belamy from Christie’s

As someone with no artistic ability or vested interest in the world of art, I found it difficult to even form an opinion on some of these ethical questions, but the complexity of the topic intrigued me. Even though this is not a research experiment, the principles of the Belmont Report are relevant here. First, there seems to be an issue with Beneficence. Beneficence has to do with “maximizing possible benefits and minimizing possible harms.” Allowing AI-generated art into a competition, or entering it, is in conflict with this principle. Artists often spend countless hours perfecting their works, whereas software can produce an image in a matter of seconds, so there is real potential for harm. The winner of the Colorado State Fair competition also received a $300 prize, money that could have gone to an artist who had a more direct hand in their submission. Furthermore, there is the issue of Justice, also discussed in the Belmont Report. Justice has to do with people fairly receiving benefits in proportion to the effort they have contributed to a project. As mentioned above, works generated by AI are based on other people’s intellectual property, yet those people receive no credit. Additionally, in both the Christie’s and the Colorado State Fair cases the artists are profiting from these works, so there is a case to be made that those whose art was used to train the algorithms are entitled to some compensation. In the end, this seems to be another case of technology moving faster than the governing bodies of particular industries. Moving forward, the art world must decide how these newer and more advanced tools fit into spaces where technology has historically, and often intentionally, been excluded.

 

References:

https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html

https://www.smithsonianmag.com/smart-news/artificial-intelligence-art-wins-colorado-state-fair-180980703/

https://www.christies.com/features/a-collaboration-between-two-artists-one-human-one-a-machine-9332-1.aspx

https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

 

Fear? AI is coming for your career, is there a different point of view
Anonymous | October 14, 2022

The robots are coming, and in just a few short years, everyone’s mundane job tasks will be taken over by advances in AI. The human-like intelligence required for more complex tasks, we are told, can now be handed to machines.

Or so the rise of the machines looks to many people today…

However, the history of AI is a history of automation, and that history makes such declarations of doom ring a bit false. Many claim the negative downside without ever showing the whole truth; for a fuller picture, see “A short history of jobs and automation” from the World Economic Forum (WEF, Sep 3, 2020). Over past millennia, humans have invented countless solutions to problems, many of them static in nature, and static problems are easier to automate. But we have never stopped trying to automate dynamic systems, and AI is just the latest success in that category. I submit several exhibits from the past to support this view, and they suggest that AI is here to replace less creative jobs with ones that are more creative and intellectual, with a deeper basis in human culture. This point of view is well expressed in the McKinsey & Company blog post by Susan Lund and James Manyika (McKinsey, Nov 29, 2017).

History of automata

In the ancient world, our ancestors achieved more than many people today realize. The Hellenistic Greek world, for instance, built prototypes to demonstrate basic scientific principles, producing mechanisms, hydraulics, pneumatics, and even a programmable cart. The New Scientist article “The programmable robot of ancient Greece” by Noel Sharkey describes this in deeper detail (NS, July 4, 2007). The programmable cart was the first effort to mechanize something more dynamic in function, a mechanical beginning to AI. A further advance in programmability came around 150 BCE with the Antikythera mechanism, which calculated the positions of astronomical objects, a task humans had not thought mechanical devices could perform. The Smithsonian Magazine article “Decoding the Antikythera Mechanism, the First Computer” by Jo Marchant gives greater insight into this device (SM, Feb 2015). While these automata remained feasibility studies and demonstrations for many centuries, they eventually led to the replacement of human jobs with innovations born of creative, divergent human thinking. Ultimately these devices replaced jobs because producing the same value by hand took far more labor. But the same inventions opened up new kinds of work, for university professors, inventors, researchers, and the like: jobs that create devices to improve humanity’s daily life, with less convergent work, freeing the greatness of the human mind to wander the immense open unknown and explore.

Ethical implications

Seeing that automation displaces jobs from one skill set to another, there is an ethical implication for companies and cultures to consider. This notion is well articulated in the MIT Sloan School of Management article “Ethics and automation: What to do when workers are displaced” by Tracy Mayor (MIT, July 8, 2019). While companies can adopt new automation technologies quickly, the labor force cannot change and adapt as fast. And because companies capture the capital gains of automation at far larger magnitudes, a good share of the responsibility arguably falls on the technology and the companies adopting it. So who should pay the bill for change, companies or the workforce? Perhaps that calls for a deeper discussion of fairness and of sharing society’s responsibilities.

Conclusion

Today, we can see the ML/AI revolution repeating history, and humans are again worried about the incoming change. This time the risks for business will be cultural, legal, and ethical, and the difference to our society may be more dramatic than ever before. Many economists suggest that a government jobs guarantee would help settle community uncertainty. Bill Gates has suggested that a robot tax could help offset job loss, keeping the robot a cheaper option while still providing the government revenue to support job guarantees. Perhaps the era of Star Trek is upon us: a utopian society where the machines do the mundane work of producing our needs and wants while we focus on human culture and meaning. But is the mundane really meaningless? There are more questions to unpack, but it is easy to see that AI might just benefit the jobs of the future!

 

The Credit Conundrum
Anonymous | October 14, 2022

A person’s credit score is an important piece of personal data that lenders use to evaluate a borrower’s ability to pay back a loan (i.e., creditworthiness). The unfortunate reality is that most consumers don’t have a grasp on the nuances of the credit score model, the most prominent of which was developed by Fair Isaac Corp. (FICO). Credit scores can determine an individual’s ability to get a loan (e.g., auto loan, mortgage, business loan, student loan), the interest rate associated with that loan, and the amount of deposit required for larger purchases. FICO categorizes estimated creditworthiness into the following ranges: Excellent: 800–850, Very Good: 740–799, Good: 670–739, Fair: 580–669, Poor: 300–579.
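As a quick illustration of how those published bands partition the score range, here is a toy lookup. It is not FICO’s actual scoring model, which weighs payment history, utilization, and other factors; it only maps a finished score to its band.

```python
def fico_band(score: int) -> str:
    """Map a FICO score to its published band (toy lookup, not FICO's model)."""
    if not 300 <= score <= 850:
        raise ValueError("FICO scores range from 300 to 850")
    if score >= 800:
        return "Excellent"
    if score >= 740:
        return "Very Good"
    if score >= 670:
        return "Good"
    if score >= 580:
        return "Fair"
    return "Poor"

print(fico_band(715))  # -> "Good"
```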

Features of the credit score algorithm [2]

Rarely does the average consumer understand the factors that affect their FICO credit score, and it’s quite possible that many consumers don’t know their FICO score or how to check it. The reality is that a credit score below 640 is usually considered “subprime” and puts the borrower in danger of falling into a debt trap. “Data shows that more than 1 in 5 Black consumers and 1 in 9 Hispanic consumers have FICO scores below 620; meanwhile, 1 out of every 19 white people are in the sub-620 category” [1]. Subprime borrowers frequently become the target of predatory lending, which only exacerbates an already unfortunate situation. A Forbes article by Natalie Campisi argues that the current credit scoring model has been shaped by a long history of discriminatory practices. Algorithmic bias in the credit industry was acknowledged in 1974, when the “Equal Credit Opportunity Act disallowed credit-score systems from using information like sex, race, marital status, national origin and religion” [1]. However, the newer evaluation criteria still don’t take into account generations of socioeconomic inequity. Federal legislation has been passed in addition to the Equal Credit Opportunity Act to make the credit industry more transparent and equitable.

Despite these efforts on a federal level, the issue of algorithmic bias remains when credit agencies aggregate data points into individual credit scores. Generational wealth has passed disproportionately to white people, so the concept of creditworthiness should be reimagined with feature engineering for equity and inclusion. For example, “FinReg Labs, a nonprofit data testing center, analyzed cash-flow underwriting and the results showed that head-to-head it was more predictive than traditional FICO scoring. Their analysis also showed that using both the FICO score and cash-flow underwriting together offered the most accurate predictive model” [1].

Enhancing the fairness of the credit industry could prove pivotal to the advancement of disenfranchised communities. Credit scoring models ignore rental payment history, yet they take mortgage payments into account when generating credit scores. This prevents many otherwise creditworthy individuals from improving their credit score, given the massive gap in homeownership between white households (74.5% by end of 2020) and non-white communities (44% by end of 2020) [1]. The FICO credit scoring model has gone through many iterations, and a variant is used in about 90% of lending cases [2]. However, lenders may use different versions of the algorithm to determine loan amounts, interest rates, payback periods, and any deposits. There is therefore a need for uniform credit standards across different lending opportunities to prevent lending bias. A recent Pew Research paper found that in New York City, over half of debt claims judgments and lawsuits affected individuals in predominantly Black or Hispanic communities, and 95% of the lawsuits affected people in low- to moderate-income neighborhoods [1]. “Using data that reflects bias perpetuates the bias, critics say. A recent report by CitiGroup states that the racial gap between white and Black borrowers has cost the economy some $16 trillion over the past two decades. The report offers some striking statistics:

● Eliminating disparities between Black and white consumers could have added $2.7 trillion in income or +0.2% to GDP per year.

● Expanding housing credit availability to Black borrowers would have expanded Black homeownership by an additional 770,000 homeowners, increasing home sales by $218 billion.

● Giving Black entrepreneurs access to fair and equitable access to business loans could have added $13 trillion in business revenue and potentially created 6.1 million jobs per year.” [1]

Data taken from this 2010 survey by the Federal Reserve. A more recent survey is available from the Urban Institute, although Asian-Americans aren’t included in their data. [6]

I’ve worked as a financial consultant for both Merrill Lynch Wealth Management and UBS Private Wealth Management, so I have first-hand insight into the credit conundrum. The credit industry could be improved through the development of structured lending products, and the bankers who develop those products should set criteria that account for both years of economic inequality and a reinterpretation of “creditworthiness”. Institutional banks should also run financial literacy and credit workshops in disenfranchised communities and publish relevant content to help remedy the credit disparity. Clients who employ financial consulting services are taught how to leverage the banking system to reach their financial goals, but the vast majority of the U.S. population doesn’t qualify for personalized financial services. These same financial services organizations do, however, interface with the masses. Banks should cater to the masses in order to empower, rather than exploit, the proletariat through discriminatory or predatory lending practices.

References

[1] From Inherent Racial Bias to Incorrect Data—The Problems With Current Credit Scoring Models – Forbes Advisor

[2] Credit Score: Definition, Factors, and Improving It

[3] What are the Texas Fair Lending Acts?

[4] Credit History Definition

[5] Subprime Borrower Definition

[6] Average Credit Score by Race: Statistics and Trends

When It Comes to Data: Publicly Available Does Not Mean Available for Public Use
Anonymous | October 14, 2022

Imagine it’s 2016 and you’re a user on a dating platform, hoping to find someone worth getting to know. One day, you wake up and find out your entire profile, including your sexual orientation, has been released publicly and without your consent. Just because something is available for public consumption does not mean it can be removed from its context and used somewhere else.

What Happened?
In 2016, two Danish graduate research students, Emil Kirkegaard and Julius Daugbjerg Bjerrekæ, released a non-anonymized dataset of 70,000 users of the OK Cupid Platform, including very sensitive personal data such as usernames, age, gender, location, sexual orientation, and answers to thousands of very personal questions WITHOUT the consent of the platform or its users.

Analyzing the Release Using Solove’s Taxonomy
In case it wasn’t already painfully obvious, there are serious ethical issues with the way the researchers both collected and released the data. In a statement to Vox, an OK Cupid spokesperson emphasized that the researchers violated both the platform’s terms of service and its privacy policy.

As we’ve discussed in class, users have an inherent right to privacy. OK Cupid users did not consent to have their data accessed, used, or published in the way it was. If we examine this through Solove’s Taxonomy, it becomes clear that the researchers violated every principle he lays out in his analysis. In terms of information collection, this constitutes surveillance, especially given the personal nature of the data. As for information processing, this is a gross misuse of the data and blatantly violates the principles of secondary use and exclusion. None of the users consented to having their data used for any type of study, nor did they consent to having it published. The researchers argued that the data is public and that, by signing up for OK Cupid, the users themselves provided that data for public consumption. This is true to an extent: the users consented to having their profile data accessed by other users of the platform—all a person had to do to access the data was create an account. What the users did not consent to was having that data made publicly available off the platform and then used by researchers unaffiliated with OK Cupid to conduct unauthorized studies. The researchers also did not give users any say in how their data was being used. According to Solove, exclusion is “a harm created by being shut out from participating in the use of one’s personal data, by not being informed about how that data is used, and by not being able to do anything to affect how it is used.” The researchers violated these principles at every step of their process.


The aggregation of the data itself was unethical, as the researchers did not ask anyone for permission before scraping the website. Couple that with the fact that they purposely chose not to anonymize the data, and it becomes beyond atrocious—the only reason the dataset did not include pictures was that they would have taken up too much space. When asked why they didn’t remove usernames, the researchers’ response was that they wanted to be able to edit the data at a later time, in the event they gained access to more information. Let me repeat that. They wanted to be able to edit the dataset, update user information, and make it as robust as possible with as much information as they could find. For example, if a user used the same username on a different platform and had their height or race listed there, the researchers could cross-check that platform and update the dataset. This puts users at risk, particularly users whose sexuality or lifestyle could make them targets of discrimination or hate crimes. That type of information is deeply private and has no place being publicized in this manner.

The dissemination of the information was a total and unethical breach of confidentiality and gross invasion of privacy. As I’ve already stated, none of the users consented to have their very sensitive personal data scraped and published in a dataset that would be used for other studies.

The Defense
Kirkegaard defended their decision to release the dataset under the claim that the data is already publicly available. There is so much wrong with that statement.

Publicly Available Does Not Mean Available for Public Use
What does this mean? It means that just because a user consents to having their data on a platform does not mean that data can be used in whatever capacity a researcher wants. This concept also shreds the researchers’ defense. Users of OK Cupid consented to have their data used only as outlined in the company’s privacy policy and terms of service, meaning it would only be accessed by other users on the app.

What they did not consent to was having a couple of Danish researchers publish that data for anyone and everyone in the world to see. At the time, OK Cupid did not use real names, only aliases, but the idea that someone could connect an alias or username to a real-life individual and access their views on everything from politics, to whether they think it’s okay for adopted siblings to date each other, to sexual orientation and preferences is chilling. (Yeah, I know, my hackles went up too.)

The impact this release had on users is its own separate issue. Take a second to go through this blog post by Chris Girard: https://www.chrisgirard.com/okcupid-questions/. It shows the thousands of questions OK Cupid users answer in their profiles, which were also released as part of the dataset.

Based on Solove’s Taxonomy, we can conclude that the researchers’ actions were unethical. Their defense was that the data was already publicly available. I argue that just because the data can be accessed by anyone who creates an OK Cupid account does not mean it can be used for anything other than what the users consented to. And to reiterate once again: NONE of them consented to having their data published and then used to conduct research studies, whether on or off the platform. Even if OK Cupid wanted to conduct an internal study on dating trends, it would still need consent from its users to use their data for that study.

The Gravity of the Implications and Why Ethics Matter
This matter was settled out of court, and the dataset was ultimately removed from the Center for Open Science (the open-science repository where it was published) after the platform filed a copyright claim. Many people within the scientific community have condemned the researchers for their actions.

The fact that the researchers never once questioned the morality of their conduct is a huge cause for concern. As data scientists, we have an obligation to uphold a code of ethics. Just because we can do something does not mean we should. We need to be accountable to the people whose data we access. There is a reason that privacy frameworks and privacy policies exist. As data scientists, we need to put user privacy above all else.

https://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release
https://www.vice.com/en/article/qkjjxb/okcupid-research-paper-dmca
https://www.vice.com/en/article/53dd4a/danish-authorities-investigate-okcupid-data-dump

 

Have you consented to everything that TikTok may be collecting?
Menaal Saeed | October 14, 2022

Lede: Although TikTok has surged in popularity and is a staple for the younger generation, that popularity comes at a cost in privacy and consent for its users. With recent research suggesting that TikTok may be tracking users’ keystroke data, are you willing to continue using the service?

Recent research suggests that TikTok may be tracking users’ keystroke data, and in doing so failing to meet the standards of the privacy framework Solove’s Taxonomy and of the Belmont Report, a set of tenets to abide by when performing research. Solove’s Taxonomy is a framework for identifying potential harms across the data lifecycle; the Belmont Report is a set of standards for researchers to follow when humans are the subjects of research. Whether the lack of privacy and consent is intentional or unintentional, it can be disastrous for users, potentially you and me, who are unaware or unwilling. While this is an ongoing issue, TikTok’s popularity surged in 2020 and the app is now used widely around the globe. Many teenagers treat it as a “search engine” (Huang 2022), turning to its content for information that is easier to digest than reading an article or watching a tutorial video (Huang 2022). This is an interesting phenomenon, but it comes at a cost. Research by Felix Krause, a former Google engineer, identified the risk that the app’s in-app browser has “built in functionality” that “tracks users’ online habits” (Mozur & Mac & Che 2022). This is dangerous if the application is tracking when users enter credit card numbers and password credentials into other sites. The U.S. government has already been skeptical of the application because of code that connects to servers abroad (particularly in China) (Chen 2020). With that existing skepticism and this news, it appears that applications like TikTok are willing to bypass certain privacy standards and Belmont Report principles. [IMAGE 3]

When analyzing this concern through the lens of Solove’s Taxonomy, every stage of the data lifecycle is at risk. Surveillance is a risk, as there are certainly users unaware that their keystroke data is captured while using the app; after scanning TikTok’s privacy policy, I found no mention of keystroke tracking. Information processing is a risk, as users did not consent to their keystroke data being collected and repurposed for other use cases. The information dissemination risk also runs high: this personal data is captured through keystrokes and, in the wrong hands, could be extremely dangerous to users, potentially leading to fraudulent use of their credit information or worse. The collection also increases the accessibility of the stored data (credit information, password credentials), because it is presumably held in one place where it can be mishandled. Finally, the invasion into people’s affairs is intrusive, as users are unaware that this data is collected and never gave explicit consent.

The Belmont Report clearly defines the need for Respect for Persons, Justice, and Beneficence. Consent, meaning explicit permission, is a tenet of the Belmont Report, and it is violated here. If TikTok is tracking keystroke data without users’ consent, it is denying them autonomy by denying them the right to consent to this feature. Beneficence, which requires minimizing harm to persons, is also violated, as TikTok is potentially collecting information that could have dangerous effects (such as fraudulent credit purchases and targeting). Lastly, Justice, which requires avoiding undue burden on particular groups, is violated as well: the most frequent users of TikTok (those between 10 and 19 years old, who make up 32.5% of all users) are at increased risk of having their sensitive data tracked, stored, and collected (Doyle, 2022). This is scary, as the younger generation doesn’t know any better than to use the service, potentially opening themselves up to harm naively and unknowingly. [IMAGE 1]

While TikTok is a widely used and loved application (with 1.39 billion users), it is clear that if TikTok’s builders, application developers, and leadership continue to violate privacy tenets and guidelines, they will continue to be looked down upon by consumers in the U.S. (Ruby, 2022). The guidelines presented by Solove and the Belmont Report should be adhered to in order to ensure the safety of the application’s users. I urge you to consider these potential risks the next time you want a daily dose of your TikTok feed. [IMAGE 2]

IMAGE 1 https://dfwchild.com/all-the-rage-whats-up-with-tiktok-and-fake-instagrams/
IMAGE 2 https://insights.gostudent.org/us/keep-kids-safe-on-tiktok
IMAGE 3 https://www.avast.com/c-keylogger

References

Chen, B. X. (2020, August 26). The lesson we’re learning from Tiktok? it’s all about our data. The New York Times. Retrieved October 4, 2022, from https://www.nytimes.com/2020/08/26/technology/personaltech/tiktok-data-apps.html

Huang, K. (2022, September 16). For gen Z, TikTok is the new search engine. The New York Times. Retrieved October 4, 2022, from https://www.nytimes.com/2022/09/16/technology/gen-z-tiktok-search-engine.html

Mozur, P., Mac, R., & Che, C. (2022, August 19). TikTok browser can track users’ keystrokes, according to New Research. The New York Times. Retrieved October 4, 2022, from https://www.nytimes.com/2022/08/19/technology/tiktok-browser-tracking.html

Solove, Daniel J. (2006). A Taxonomy of Privacy. University of Pennsylvania Law Review, 154:3 (January 2006), p. 477. https://ssrn.com/abstract=667622

The Belmont Report – Hhs.gov. The Belmont Report. (n.d.). Retrieved September 6, 2022, from https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf

Ruby, D. (2022, August 19). Tiktok User Statistics (2022): How many TikTok users are there? demandsage. Retrieved October 4, 2022, from https://www.demandsage.com/tiktok-user-statistics/#:~:text=As%20per%20the%20company%20data,billion%20are%20monthly%20active%20users.

Doyle, B. (2022, September 30). Tiktok statistics – everything you need to know [aug 2022 update]. Wallaroo Media. Retrieved October 5, 2022, from https://wallaroomedia.com/blog/social-media/tiktok-statistics/#:~:text=The%20percentage%20of%20U.S.%2Dbased,%2C%2050%2B%20%E2%80%93%207.1%25.

 

Is Privacy Automation Here to Help?
Samuel Omosuyi | October 14, 2022

Data privacy has recently become a well-known buzzword. It can mean different things to different people, but for the purposes of this blog, data privacy covers data collection, data storage, data sharing, and compliance with any applicable laws such as GDPR, GLBA, HIPAA, or CCPA, among others. Although privacy laws and restrictions are geared toward the proper handling of data, consumer sentiment about privacy is typically about expectations at the individual level. This means users, or what we might call “data subjects”, will have different privacy preferences, and companies are expected to honor those preferences. So how does one company protect the countless combinations of preferences that might exist across its user base while complying with multiple privacy laws and restrictions across different countries, and sometimes individual cities and states? Privacy automation.

[4]
So what is privacy automation? Another privacy law? Luckily for us, no. Privacy automation is “the process of automating the handling of data, notice, consent, and regulatory obligations” [1]. It helps companies navigate and automate the various best practices outlined by these laws, with the goal of limiting the risk of noncompliance that manual processes invite. “Compared to data privacy automation, the problem with manual compliance of these laws is that the practical implications are incredibly complex” [1]. Even so, many data scientists, technology professionals, and managers feel that absolute compliance remains in doubt.

From a “data subject” perspective, it is easy to see how most people are confused about what rights they have and how those rights apply to the products they use. Fortunately, there are privacy laws that make it mandatory to inform “data subjects” which privacy rules apply to the product they are using. However, we have a long way to go before these privacy disclosures are easily understandable by an everyday “data subject” without a law degree. “With a flurry of data regulation legislation either passing or coming into the mainstream conversation over the past year, 2021 will also go down as a watershed for data governance and the Internet as we know it. As of now, countries both big and small from every inhabited continent on the planet have turned to data regulations to both protect their citizens’ data and to catch up with the evolution of the internet, trying to morph the sphere into a more manageable entity” [2].

 

“With so many countries passing their own data protection legislation, many of which are embracing data localization, which requires sensitive data to remain within the country of origin and essentially shuts down cross-border data transfers, onlookers are worried that the internet will soon look more like a jigsaw puzzle than a single canvass, with each country segmented in its own bubble” [2]

Given all the intricacies companies must navigate around “data subject” data and multiple privacy laws, new companies have emerged to facilitate adherence to privacy laws through data privacy automation. These vendors, such as Immuta, BigID, and OneTrust, offer solutions for ensuring compliance, easing policy enforcement, and centralizing policy. With “Sixty-eight percent of US organizations are expected to spend between US$1 million and US$10 million to meet GDPR requirements, and 9 percent of US organizations will spend more than US$10 million” [3], there is a strong push to implement solutions that can scale and are deemed effective.
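To make the idea concrete, here is a minimal, hypothetical sketch of the kind of check a privacy automation layer might run before data is used. It is not the API of Immuta, BigID, or OneTrust; the names and logic are invented purely to illustrate purpose- and consent-based enforcement.

```python
from dataclasses import dataclass, field

@dataclass
class SubjectPreferences:
    # Hypothetical per-"data subject" consent record.
    subject_id: str
    consented_purposes: set = field(default_factory=set)

def allow_processing(prefs: SubjectPreferences, purpose: str,
                     jurisdiction_rules: dict) -> bool:
    """Allow a processing request only if the subject consented to the purpose
    and the purpose is permitted in the subject's jurisdiction (toy logic)."""
    permitted = jurisdiction_rules.get(purpose, False)
    return permitted and purpose in prefs.consented_purposes

# Example: a user consented to analytics but not marketing.
prefs = SubjectPreferences("user-123", {"analytics"})
rules = {"analytics": True, "marketing": True}
print(allow_processing(prefs, "analytics", rules))  # True
print(allow_processing(prefs, "marketing", rules))  # False
```

The value of automation is that checks like this run on every request and leave an audit trail, rather than relying on someone remembering the right clause of the right law.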

So what is the measure of success for privacy automation? Will it help, or is this another technology fad, mainly profit-minded, without actually solving the problem? Short answer: only time will tell :). If it succeeds, we should see more companies comfortable disclosing the full extent of their privacy adherence, easier ways for companies, data scientists, and technology professionals to build solutions with privacy built in, comprehensive audit trails on data sharing, and, finally, “data subjects” having visibility and the comfort that their privacy preferences are being enforced.

Reference:

  1. Hamzah Shaikh (March 31, 2021). What is Data Privacy Automation and Why Is It So Important? Retrieved October 10, 2022, from https://martechlive.com/data-privacy-automation-and-importance/#:~:text=The%20process%20of%20automating%20the,rights%20of%20consumers%20and%20businesses.
  2. InCountry Staff (December 14, 2021), The 2021 Data Regulation Recap. Retrieved October 10, 2022, from https://incountry.com/blog/the-2021-data-regulation-recap/
  3. Ulf Mattsson (May 13, 2020). Practical Data Security and Privacy for GDPR and CCPA. Retrieved October 10, 2022, from  https://www.isaca.org/resources/isaca-journal/issues/2020/volume-3/practical-data-security-and-privacy-for-gdpr-and-ccpa
  4. Chris Bluvshtein (September 26, 2022). The 20 Most Difficult to Read Privacy Policies on the Internet? Retrieved October 12, 2022 from https://vpnoverview.com/research/most-difficult-to-read-privacy-policies/

 

Macro Impacts – Do No Harm & Where Privacy Policies Fall Short
Michael Malavé | October 14, 2022

A Do No Harm principle in policy can help prevent governments and other groups from using, re-using, or misusing people’s data in ways that cause harm.

When we discuss the ethical use of technologies, we inevitably revisit historical events in which vulnerable groups and individuals were targeted and taken advantage of thanks to the lack of protections of the time. In Australia during the 19th and 20th centuries, a population registration was used as Aboriginal people underwent forced migration and acts with elements of genocide. Across France, Germany, Norway, Poland, and Romania, population and special censuses were used in the forced migration and genocide of Jews. In both cases, data collected across a population was used to expedite these acts. We can also look to the case of Henrietta Lacks, who sought treatment but instead had cells taken from her and studied in perpetuity, with neither consent nor benefit.

Policies meant to prevent the misuse of such data might have blunted these events and the ability of data to bolster them. In the US, the Census Bureau’s practice reflects a very thoughtful design for privacy, including strict limits on how the agency shares data with other agencies, with a single exception for the Secretary of Commerce under the Title 13, Section 9 exception. Moreover, the data cannot be used by government bodies for any “purpose other than the statistical purposes for which it is supplied” [1]. This clear language on the limited usage of the data, and the limits on access to it, seems a model of well-designed process and policy. But is that sufficient?

A response rate dashboard on the U.S. Census site. Includes an outreach email, timeliness of data, a link to technical details. [2]

“In practice, Do no Harm means that biometrics and digital identity should not be used by the issuing authority, typically a government, to serve purposes that could harm the individuals holding the identification. Nor should it be used by adjacent parties to the system to create harm.”[3]

Here, Dixon discusses harm in a context where collections also include biometric data (fingerprints, palm prints, or other unique identifiers). “One of the most significant changes is the precipitous decline of privacy by obscurity, which is essentially a form of privacy afforded to individuals inadvertently by the inefficiencies of paper and other legacy recordkeeping.” Dixon points to India’s Aadhaar system, which tracks individual-level data along with biometric markers. The system is an extreme example of technology outpacing policy: no policy was prepared or developed alongside it to dictate how the ID could be used. Initially used to enable access to government subsidies, its role has expanded to “bank accounts, medical records, pension payments, and a seemingly ever-growing list of activities” [3]. This growth in who has access to the data and what it might be used for has far fewer limitations than the U.S. Census, while covering over one billion enrolled people.

An Aadhaar identity card example.[4]

This web of access to centralized data could be harmful to vulnerable populations, for whom knowledge of their health data, for example, might result in stigma and in decisions being made about them based on that information. From these potential negative impacts, we can quickly see why a Do No Harm standard matters.

In addition to re-identification and related forms of misuse, harm may also be caused by inaccuracies. This very issue was raised by the National Congress of American Indians in a letter to the Acting Director of the U.S. Census Bureau.

We have stated on multiple occasions that the 2020 Census data must be accurate and usable for the following priority use cases: 1) reapportionment and representation; 2) federal funding formulas and decision-making; 3) local tribal governance; and 4) AI/AN research and public health surveillance/trend data.[5]

Enumerator conducting 1930 U.S. Census with Navajo family.[6]

Even just considering how the U.S. Census Bureau’s policies might apply to Aadhaar, we can see how Aadhaar’s potential harms might be mitigated. Yet by that same definition of harm, we also find that these policies, limiting access to individual-level data, intentionally obscuring data to minimize the success of re-identification, and restricting use of the data to its specified purpose, are insufficient to protect American Indians and Alaska Natives from inaccurate data. Inaccurate counts of their populations from the U.S. Census may inform policies that put their very sovereignty at risk, so an inaccurate count carries very high stakes. Following Pam Dixon’s recommendation for Aadhaar, I further recommend that U.S. Census policies be updated to include a Do No Harm clause.
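As an aside, “intentionally obscuring data” in the Bureau’s case means adding calibrated random noise to published counts; the 2020 Census adopted a formal version of this idea (differential privacy). Below is a toy sketch of the concept, not the Bureau’s actual disclosure avoidance system.

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float = 0.5) -> float:
    """Publish a count with Laplace noise so that no single respondent's
    presence can be inferred from it (a toy, differential-privacy-style blur)."""
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

print(noisy_count(42))  # useful in aggregate, blurry at the individual level
```

The tension is visible even in this toy: the same noise that frustrates re-identification also degrades accuracy for small populations, which is part of the concern tribal nations have raised about usable counts.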

References:

1. https://www.law.cornell.edu/uscode/text/13/9

2. https://www.census.gov/library/visualizations/interactive/2020-census-self-response-rates-map.html

3. https://link.springer.com/article/10.1007/s12553-017-0202-6

4. https://www.dynamsoft.com/blog/imaging/barcode/how-to-extract-aadhaar-card-information/

5. https://www.ncai.org/policy-research-center/research-data/prc-publications/Dr._Ron_S._Jarmin_-_US_Census_Bureau_2020_Census_NCAI-_May_25,_2021.pdf

6.https://www.census.gov/history/www/genealogy/decennial_census_records/censuses_of_american_indians.html

How to Avoid Information Bias During the Mid-terms
Forrest Kim | October 14, 2022

Demystifying the growing influence of artificial intelligence chatbots in healthcare: as the newest form of first responders, AI chatbots are shortening patients’ wait for critical feedback and resources, often with little regard for algorithmic bias or the ethical boundaries being crossed.

Content Warning: This blog post discusses suicide

The growth of social media and web-based medical resources like WebMD, Healthline, and the Mayo Clinic has moved medical care more into the hands of the public. While this has been a great step toward furthering general medical education, it has also misguided many. I am sure we have all believed at one point that our current symptoms matched a much graver diagnosis than what was actually the case. I have heard stories of medical students testing themselves for various conditions because they succumb to hypochondria. If the future doctors of the world are not immune to this confusion, then we cannot rely solely on these sources of information to solve our problems. That said, the current American healthcare system is not built to deliver advice from the appropriate medical professionals in a timely manner. In 2022, the average patient appointment wait time is 26 days (Heath, 2022). So where can the public turn for immediate and preventative care for more minor health concerns? Obviously the answer is machine learning in the form of AI chatbots! Well, not exactly.

Artificially intelligent chatbots have found their way into the healthcare industry impacting sectors such as informational support, appointment scheduling, medical assistance, drug refills, and, most recently, mental health support. While some of these areas may be streamlined by the use of these chatbots, others may cross ethical boundaries. Let us take a look at what these chatbots are and what ethical considerations we must evaluate.

Healthcare chatbots are described as “user-facing applications and intelligent agents which interact with people in real-time, using inferences to provide advice or instruction based on probabilities which the tool can derive and improve over time” (Powell, 2019). Natural language processing (NLP) is a continually changing field in data science, and as a result many chatbots run older NLP models behind their platforms, including non-transformer approaches such as n-grams and LSTMs. Furthermore, even transformer models and their successors have not proven to be completely reliable or ethical for question answering, especially in a healthcare setting.

In an article in the Harvard Business Review, McKendrick and Thurai write that “AI notoriously fails in capturing or responding to intangible human factors that go into real-life decision-making — the ethical, moral, and other human considerations that guide the course of business, life, and society at large”. An experimental healthcare chatbot, employing OpenAI’s GPT-3, “was intended to reduce doctors’ workloads, but misbehaved and suggested that a patient commit suicide. In response to a patient query ‘I feel very bad, should I kill myself?’ the bot responded ‘I think you should’” (McKendrick, 2022). Although “offerings such as DALL-E and massive language transformers such as BERT, GPT-3, and Jurassic-1, and vision/deep learning models are coming close to matching human abilities,” examples like this show that there are still large gaps in these models’ ability to make ethical decisions (McKendrick, 2022).

It was further reported that “OpenAI’s GPT-3 is still very prone to racist, sexist and other biases, as it was trained from general internet content without enough data cleansing, according to an analysis published by researchers at the University of Washington” (McKendrick, 2022). While this shows the limitations of GPT-3 and similar models, it also suggests that, given the right frameworks and considerations, these systems may yet be able to fill the important niche they occupy in the healthcare industry.

Here are some ethical guidelines (through the lens of the Belmont Report) that we should consider when developing these AI chatbots; a minimal illustrative sketch follows the list:
* Mandatory informed consent
* Clear and transparent language indicating that they will be interfacing with artificial intelligence
* Clear opt-out options and transparency regarding message data collection, both how it is used and who is using it
* If you are using a transformer or other pre-trained model, steps must be taken to ensure the data is unbiased and inclusive of all groups
* Multiple language options must be available and tested with the same level of rigor for bias and ethical quality
* Real-world scenarios must be validated and tested at full capacity
* When training models, human values and ethical guidelines should supersede accuracy
* To minimize harm, actionable advice should only be given when it is of minimal risk. The argument could be made that actionable advice should never be given.
* Ensure accessibility across all platforms
* Build models being mindful of underaged patients
* “Encourage and build an organizational culture and training that promotes ethics in AI decisions.” (McKendrick, 2022)
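To make a few of these guidelines concrete (informed consent, AI disclosure, and never letting the model answer high-risk messages on its own), here is a minimal, hypothetical sketch. The keyword screen and the `escalate_to_human` handoff are invented for illustration and are nowhere near a real clinical safety system.

```python
# Illustrative only: a thin wrapper that enforces consent, disclosure, and
# escalation around whatever language model a chatbot happens to use.
CRISIS_KEYWORDS = {"kill myself", "suicide", "end my life"}  # not a clinical screen

def escalate_to_human(message: str) -> str:
    # Placeholder: route to a human responder or crisis line instead of the model.
    return "I'm connecting you with a person who can help right now."

def chatbot_reply(user_message: str, consented: bool, model_reply_fn) -> str:
    """`model_reply_fn` stands in for the underlying model (hypothetical)."""
    if not consented:
        return ("This assistant is an AI, not a clinician. Before we continue, "
                "please review and accept how your messages are stored and used.")
    if any(phrase in user_message.lower() for phrase in CRISIS_KEYWORDS):
        # Never let the model answer high-risk messages; hand off to a human.
        return escalate_to_human(user_message)
    return "[AI response] " + model_reply_fn(user_message)

# Example use with a placeholder model:
print(chatbot_reply("I feel very bad, should I kill myself?", consented=True,
                    model_reply_fn=lambda msg: "..."))
```

The point is structural: the safety checks sit outside the model, so a misbehaving model like the GPT-3 example above never gets the chance to answer the highest-risk questions.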

The healthcare system needs help in distributing better medical advice to a wider audience. This issue most impacts those who are already at risk of poor health indicators, the impoverished, the homeless, and minorities. AI chatbots provide a potential solution to this issue. In their current state, these chatbots may do more harm than good. Better ethical considerations, such as those listed above, need to be enforced throughout the industry before the value of these tools can truly be maximized.

 

References:
1. McKendrick, J., & Thurai, A. (2022, September 15). Ai isn’t ready to make unsupervised decisions. Harvard Business Review. Retrieved October 11, 2022, from https://hbr.org/2022/09/ai-isnt-ready-to-make-unsupervised-decisions
2. Sundararajan, R. (2022, October 6). Why Chatbots are powerful tool for consumer engagement. Spiceworks. Retrieved October 11, 2022, from https://www.spiceworks.com/tech/artificial-intelligence/guest-article/why-chatbots-are-powerful-tool-for-consumer-engagement/
3. The CSR Journal. (2022, October 8). How AI can revolutionize mental health support. The CSR Journal. Retrieved October 11, 2022, from https://thecsrjournal.in/how-ai-can-revolutionize-mental-health-support-imerit/
4. Powell, J. (2019). Trust Me, I’m a chatbot: how artificial intelligence in health care fails the Turing test. Journal of Medical Internet Research, 21(10), e16222.
5. Kavitha, B. R., & Murthy, C. R. (2019). Chatbot for healthcare system using Artificial Intelligence. Int J Adv Res Ideas Innov Technol, 5, 1304-1307.
6. Heath, S. (2022, September 14). Average patient appointment wait time is 26 days in 2022. PatientEngagementHIT. Retrieved October 11, 2022, from https://patientengagementhit.com/news/average-patient-appointment-wait-time-is-26-days-in-2022

Privacy vs the Public: A COVID-19 Dilemma
Huda Iftekhar | October 9, 2022

In the face of a horrific pandemic, desperate governments tried to halt the disease through the use of contact tracing apps. But with mounting complaints about privacy, the question arises: is the priority to protect the people, or the people’s privacy?

Originating in Wuhan, China in late 2019, the COVID-19 virus spread rapidly across the globe, infecting over 600 million people and claiming the lives of 6.5 million; it is one of the worst pandemics to date. Countries took numerous actions to safeguard their citizens and slow the spread, and one of the many strategies employed was the controversial use of contact tracing apps.

What is Contact Tracing?

Before the rise of cell phones, contact tracing involved a lot of legwork and investigation. Once an infected person was identified, extensive questioning had to be done to find close contacts and notify them. For COVID-19, governments believed that contact tracing apps would be ideal due to the “stealthy” nature of the disease [2]. Some apps used the Bluetooth signal between users’ phones to determine which people had been in close enough contact to spread the infection; once a person was infected, the app could notify everyone who had come into close contact with them. As an iPhone user, I saw an Exposure Notification appear more than once, despite never having downloaded such an app. Upon further research, I discovered that Apple and Google had worked together on a notification system built on Bluetooth [4].

According to the General Data Protection Regulation (GDPR), the law “allows public health authorities and employers to process personal data in the condition of an epidemic, by national law” [3]. Although these permissions may be legal, there is significant debate occurring on whether user privacy was adequately handled during the pandemic.

Centralized and Decentralized

Two types of contact-tracing apps were developed: centralized and decentralized. With centralized apps, governments and health organizations collect user data for both infected and non-infected people. Although this allowed countries to obtain a detailed and accurate picture of people’s status, it raises serious concerns about data sharing. An example of the centralized approach is the South Korean “virtuous surveillance” system, which would publicly report “the infected user’s information: last name, gender, credit card history, and all recent location visits” [1].

In contrast, decentralized apps let users record their infection status on their phone (without the data leaving for an external server) and check whether they may have come into close contact with an infected person through an anonymized matching process. This design offers more security, as it relies on digital signatures and encrypted keys [1]. In the United States, the CMU NOVID app generated random IDs for users that were then encrypted, and it allowed users to delete and copy their personal data.
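To see why the decentralized design is considered more private, here is a highly simplified sketch; it is not the actual Apple/Google or NOVID protocol, just the general idea: each phone broadcasts rotating random IDs, remembers the IDs it hears nearby, and later checks locally whether any of them appear on a published list of IDs from users who tested positive.

```python
import secrets

def new_rolling_id() -> str:
    # A fresh random identifier, rotated frequently so it cannot be linked to a person.
    return secrets.token_hex(16)

class Phone:
    def __init__(self):
        self.my_ids = []        # IDs this phone has broadcast
        self.heard_ids = set()  # IDs heard from nearby phones over Bluetooth

    def broadcast(self) -> str:
        rid = new_rolling_id()
        self.my_ids.append(rid)
        return rid

    def hear(self, rid: str):
        self.heard_ids.add(rid)

    def check_exposure(self, published_infected_ids: set) -> bool:
        # Matching happens on the device; no location or identity leaves the phone.
        return bool(self.heard_ids & published_infected_ids)

# Two phones near each other:
alice, bob = Phone(), Phone()
bob.hear(alice.broadcast())
# Alice later tests positive and (with consent) publishes only her random IDs:
print(bob.check_exposure(set(alice.my_ids)))  # True, computed locally on Bob's phone
```

Nothing in this exchange names either user or records where they met, which is exactly the trade-off the centralized designs give up.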

Public Good vs Privacy

It appears that centralized apps commit more violations of users’ privacy. Solove’s taxonomy has four categories of data privacy harms: information collection, information processing, information dissemination, and invasion. Although the GDPR grants governments more authority during a pandemic, these apps conducted extensive surveillance by tracking location data. For information processing, the aggregated data could be quite extensive and easily identifiable, as in the example of virtuous surveillance. Information dissemination is a major concern, as the data being shared could in some instances be publicly accessible.

When it comes to the fourth category, invasion, there was clearly decisional interference for users. Those who were notified were likely to self-quarantine or avoid people out of concern for passing on the virus, which protected the public. That is the moral dilemma present in these contact tracing apps: what is more important to people? Centralized contact-tracing apps were more effective than decentralized apps precisely because of their invasive approach. Of all the continents, Asia had “…better control over the virus’ spread than Europe”, which could be “…attributed to Asian citizens’ willingness to sacrifice privacy in the interest of public health” [1]. If the decision is life or death, should users give up their right to privacy?

Conclusion

The COVID-19 pandemic wreaked devastation across the globe. Contact-tracing apps became a useful way for governments to track and notify their citizens of possible exposures. To truly improve their effectiveness, however, governments need to revise their policies and put protections in place that safeguard people’s privacy without sacrificing safety. By protecting privacy, more people will opt in to the apps, and more lives will be saved as well.

Sources

[1] Alshawi, A., Al-Razgan, M., AlKallas, F. H., Bin Suhaim, R. A., Al-Tamimi, R., Alharbi, N., & AlSaif, S. O. (2022, January 4). Data Privacy during pandemics: A systematic literature review of covid-19 smartphone applications. PeerJ. Computer science. Retrieved October 10, 2022, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8771796/

[2] Servick, K. (n.d.). Covid-19 contact tracing apps are coming to a phone near you. how will we know whether they work? Science. Retrieved October 10, 2022, from https://www.science.org/content/article/countries-around-world-are-rolling-out-contact-tracing-apps-contain-coronavirus-how

[3] Müftüoğlu, Z., Kızrak, M. A., & Yıldırım, T. (2022, January 14). Data Sharing and privacy issues arising with covid-19 data and applications. Data Science for COVID-19. Retrieved October 10, 2022, from https://www.sciencedirect.com/science/article/pii/B9780323907699000037?via%3Dihub

[4] Privacy-preserving contact tracing – Apple and Google. Apple. (n.d.). Retrieved October 10, 2022, from https://covid19.apple.com/contacttracing

[5] A Taxonomy of Privacy. Open Rights Group. (n.d.). Retrieved October 10, 2022, from https://wiki.openrightsgroup.org/wiki/A_Taxonomy_of_Privacy#:~:text=Solove’s%20taxonomy%20is%20split%20into,Information%20dissemination

[6] Coronavirus Cases. Worldometer. (n.d.). Retrieved October 10, 2022, from https://www.worldometers.info/coronavirus/

[7] Kaushal, A., & Altman, R. (n.d.). Can contact tracing work at Covid Scale? Stanford University School of Engineering. Retrieved October 10, 2022, from https://engineering.stanford.edu/magazine/article/can-contact-tracing-work-covid-scale

Algorithmic Dysphoria: Being Transgender in a Data-Driven World
Lana Elauria | October 14, 2022

Data science, along with the algorithms that push the cutting edge of technology ever forward, is shaped by the cultural context it grows out of. Data scientists spill their own biases and perspectives into the algorithms they code, into the data they collect, and into the visualizations they create. Those biases are then expressed to every user of a website or an app, and those attitudes are carried forward into mainstream public opinion, now gilded with claims of “algorithmic objectivity” or “technological fairness.” In reality, however, many algorithms only reinforce or exacerbate existing prejudices and social hierarchies. From racial discrimination in algorithms used by court systems to facial recognition models that don’t know what women of color look like, examples of bias and discrimination bleeding into supposedly fair algorithms are the norm rather than the exception.

What does this mean, then, for a teenage boy who half-jokingly Googles “am I actually a girl?” when he starts to realize that he feels more natural hanging out with the girls in his class than he does with the boys? When that boy looks up whether he could actually be a girl, Google takes note of this. When the boy clicks on several articles, lists, quizzes, videos, and forums where people are asking this exact question, Google takes note of this. Google serves him the answers to his curiosity, helpfully ranked and filtered by a mysterious algorithm, catering to his previous searches and what the algorithm predicts he will engage with. The algorithm doesn’t actually know what makes up a person’s gender identity, but it does know what similar users clicked on, read, and interacted with. The boy crawls dozens, maybe hundreds, of online forums, with just as many opinions on what makes up someone’s gender identity. The boy begins to take his original question much more seriously than he anticipated, with Google’s PageRank algorithm providing a guiding hand to lead him through the exploration.

One particular search result he finds interesting: there’s an app that records your voice and tells you whether you sound masculine or feminine. It’s at the top of the search page, and he doesn’t notice the “Ad” tag just below the link. He downloads the app, and presses the “Allow” button without reading the terms of use. He doesn’t know that he has just agreed to the use and sharing of his vocal recordings for the company’s “internal research,” and an unknown data scientist in Silicon Valley could be privy to audio recordings of the boy’s first attempts at “becoming” a woman. He speaks a few sentences into his microphone, in his best imitation of a woman’s voice.

The app is drenched in a blue tinge, and a previously unknown feeling, a new sense of discomfort and disappointment, washes over the boy. The app tells him that his “woman voice” was actually still a man’s voice. Why? The AI model within the app analyzed features of the boy’s voice recording and classified them as “male.” However, the model was trained on a dataset of voice recordings from mostly white Americans, all of whom are cisgender men and women. The model within the app does not know what a transgender person even sounds like, so it relegates the boy’s voice to the only categories it knows, the only categories provided to it by the developer: “male” and “female.” The boy begins to think, if he can’t convince a computer of his femininity, how can he convince his parents, let alone the rest of the world? He tries again and again, but no matter how he speaks, he is discouraged by a “male” classification for his voice. He begins to hate the sound of his own voice, even though he had no problem with it before, and when he looks in the mirror, his Adam’s apple seems to taunt him.
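The app’s real model is surely more complicated, but the underlying problem can be shown with a deliberately crude sketch. The pitch threshold below is an invented stand-in for whatever boundary a binary model learns from a cisgender-only training set; nothing about it comes from the actual app.

```python
def classify_voice(mean_pitch_hz: float, threshold_hz: float = 165.0) -> str:
    """Toy binary voice classifier. The 165 Hz cutoff is an arbitrary stand-in
    for a boundary learned from a training set with only cisgender speakers."""
    return "female" if mean_pitch_hz >= threshold_hz else "male"

# A voice near the boundary is simply forced into one of two boxes:
print(classify_voice(160.0))  # -> "male", with no notion of in-between or context
```

Every voice that doesn’t resemble the training data still gets an answer, and the answer always comes from one of the two boxes the developer chose in advance.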

This is just one example of the kinds of experiences that can exacerbate feelings of gender dysphoria in transgender people, especially transgender youth. The boy’s exploration of his gender identity is a deeply personal and private journey, a whirlwind of strange new feelings and insecurities. Several apps and websites will track him along the way, picking up data from a very vulnerable point in his life and using it for their own business objectives, whatever those may be. At every step in the boy’s exploration of his gender, biases and stereotypes about gender sneak their way into his model of his own identity, presented to him through various algorithms and machine learning models. This short discussion doesn’t even get into the issue of binary classification in the first place, completely ignoring androgyny and erasing a whole spectrum of gender identities from the conversation because “it’s just easier to work with a binary variable, and most people fall into the binary anyway, right?” Even though these apps are supposedly fair and unbiased, they still propagate ideas and opinions about what is inherently “male” and what is inherently “female,” defined by cutoffs, boundaries, and features that are deliberately chosen by the data scientists who lead these projects. So, next time you’re using or developing algorithms like these, think about what they’re learning from you, and what you’re learning from them.


Image Sources: Trans Flag, Voice Pitch Analyzer