You Wish You Were Being Recorded
By Joseph Wood | May 28, 2021

Full disclosure: I work on Siri

Have you ever been talking about a product with a friend and immediately received an ad for it? Microphones are everywhere, especially in the devices you interact with the most, from phones and computers to remotes and cars. These microphones are sold to consumers as accessibility tools that make device actions intuitive and easy to use. If you’ve ever suspected the microphones were recording more than just the requests directed at them, you wouldn’t be alone. About 43% of Americans believe these devices record conversations without their permission (Fowler, 2019). How else could the highly targeted ads be explained? Sadly, there is no credible evidence that these devices are constantly recording you. It’s much worse.

Audio is incredibly expensive to process from a computing and storage standpoint. It needs to be transcribed, and even the best transcription services in the world struggle to do this effectively in perfect environments, let alone in real-world homes. The amount of dialogue that would need to be collected, stored, correctly transcribed, and processed to reach a point where it might suggest something useful to buy is a huge hurdle, not to mention the monetary fines these companies would face if such an operation were ever revealed. But luckily for them, they don’t need to risk anything, because you’ve already agreed to an easier option.

There’s an endless amount of data that is cheaply and readily available to these companies: your digital footprint. Every time you land on a website, look at a new shirt, or spend a second on your favorite mind-numbing app, this information is collected, stored, and shared. Companies like Facebook and Google can assemble the crumbs you leave across the internet to reconstruct exactly the kinds of interests you have, regardless of whether you expressed them on their platforms. Facebook has a number of partner marketing firms it shares data with and even offers a tool called Facebook Pixel, which is embedded into third-party websites to track any number of user actions, how long users spend looking at certain components, and whether they revisit pages. Facebook can even track users’ web browsing habits when they surf in the same browser they’re logged into Facebook on. But logging out is not enough: Facebook can still identify users from IP addresses, from the “sign in with Facebook” option on other websites, or by comparing the email addresses and phone numbers users share across the internet.
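
To make this concrete, here is a small, purely illustrative Python sketch of the mechanic described above: events collected on different sites are joined on a shared identifier (a hashed email, a cookie ID, or an IP address) into one cross-site interest profile. The sites, fields, and schema are hypothetical, not Facebook’s actual pipeline.

    # Purely illustrative: browsing events gathered on unrelated sites are
    # joined on a shared identifier (here a hashed email; a cookie ID or an IP
    # address would work the same way) into one cross-site interest profile.
    from collections import defaultdict
    from hashlib import sha256

    events = [
        {"site": "shoes-shop.example", "page": "/running-shoes",      "email": "jane@example.com"},
        {"site": "news.example",       "page": "/marathon-training",  "email": "jane@example.com"},
        {"site": "shoes-shop.example", "page": "/checkout-abandoned", "email": "jane@example.com"},
    ]

    profiles = defaultdict(list)
    for event in events:
        user_key = sha256(event["email"].lower().encode()).hexdigest()  # same person on every site
        profiles[user_key].append((event["site"], event["page"]))

    for user_key, pages in profiles.items():
        print(user_key[:12], "->", pages)  # one cross-site interest profile per person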

A common tactic is steering consumers back to products they were interested in but eventually passed on. Competitors can also choose to place ads at these moments. If the algorithms notice consumers shopping for items at Target, for example, Facebook can give Walmart the option to place ads steering those consumers to its site. Because Facebook knows so much about its users, it can even tell when users of a certain type are buying a specific product and decide whether to show it to other, similar users. The company doesn’t just collect data from other websites; it uses those same websites to keep pummeling users with targeted ads. Even Facebook’s own tools, developed to counter the negative press it receives, mainly just remove the ways it uses data to suggest ads; they do not stop the collection process. All of this is allowed by a ridiculous Terms and Conditions notification you likely clicked through. And this is just one company.

Recently, more companies have tried to limit the sharing of users’ data. In 2020, Apple announced that it would give iOS users more control over data sharing and, ultimately, the ability to cut off any sharing within the apps they use. This was expected to be a huge blow to ad companies, yet both Google and Facebook stock prices are sitting at all-time highs. That hints at just how ingrained these companies have become in the experiences consumers use: the largest technology company in the world potentially cutting them off had no long-term effect on their business outlook.

So the next time you see a highly targeted ad, think about where the data is actually coming from.

My Data! On My Terms!
By Anonymous | May 28, 2021

Elevator Pitch: It’s time to reclaim user data with innovative storage and distribution models, and to provide more independent oversight of how enterprises use that data.

Don’t you hate it when you visit a website and are forced to accept its cookies in the name of personalization, and if you decline, you lose access to most of the site? The cost of ad-supported content is leaving our personal data in the hands of third-party website providers. Similarly, e-commerce websites force their users to hand over their data in the name of customization and personalization.

The scariest part of this transaction is its perpetual nature – there are no expiration dates when you accept those cookies. Yes, one could delete these cookies, but how many people know how, and how many actually take the time to do it? Companies can take the data and use it in any number of applications as part of their data science algorithms. Thanks to stringent privacy laws and regulations (such as the GDPR in the EU and the CCPA in California), companies anonymize your personal data, but the individual does not benefit from their personal data beyond a personalized experience.

User Leaves More Than Just Data

When a company captures user data, it is capturing more than just data or metrics.

  • User attention – user effort – user time – the amount of time users have spent on the site, which in turn generated the data the company captured. For example, the more time a user spends on a site like Facebook, the more user data is generated, which Facebook uses not only to customize that user’s experience but also to support every other aspect of its business, alongside the data of billions of other users.
  • User choices – user preferences – these tell the company not only about the user but also how similar or dissimilar the user is to the rest of the site’s visitors, and that knowledge is valuable to the site owners. For example, Netflix not only recognizes users based on their geolocation, address, or device, it also benefits from the movies a user watches by recommending them to other viewers with similar tastes or viewing habits, essentially crowdsourcing recommendations that feel personal to individual users.

So, in the end, users also create value for the company, on top of receiving the minor reward of personalization.

Just giving a personalized experience in return for user data is not enough. Users should get more value from this interaction and from their relationship with the website.

User Data Usage Needs Independent Auditing

Public companies have to hire independent auditors to make sure their accounting practices comply with IRS regulations, to establish credibility, to prevent fraud, and to improve processes. In this day and age, when data is considered the lifeblood of digital organizations, why isn’t there independent auditing of how user data is used? With such independent checks and balances, the industry could avoid situations like Cambridge Analytica influencing the 2016 U.S. presidential election using Facebook user data. Such independent oversight would help ensure that “The Belmont Principles” and “The Common Rule” are not violated by companies handling user data.

User Data Usage – Time to Hit The Reset Button

Users should have more control over their data, and it is time we hit the reset button, change the rules of the game, and, in fact, change the entire playing field.

It’s time to provide more options for users to store their data. Here are a few potential options for users to choose from:

  1. Personal Data Stores (PDS) – quite a few companies have entered this space, offering users more control over their data.
  2. Independent trusted custodians
  3. Identify storage providers by type or importance of the data – users don’t have to keep all their data in one place. For example, medical/health data could be stored with independent trusted custodians, whereas browsing data could be stored in a PDS.

With users getting more control over where and how they store their data, they should be able to “approve/authorize” companies to use their data for specific purposes and applications. Think of this as an “Airbnb of user data”; a rough sketch of what such a grant could look like follows the models below.

User Data “Approve/Authorize” Models (any of these could be paid or pro bono)

  1. Use the data for the user’s own applications and use cases (example: enhance my shopping experience, a.k.a. targeted ads)
  2. Use the data to help the community or society (example: research, medicine development, and collective wisdom)
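
To make the “approve/authorize” idea concrete, here is a minimal, hypothetical Python sketch of what a scoped, time-limited data-usage grant issued from a personal data store might look like. Every class and field name is invented for illustration; a real system would also need authentication, revocation, and audit logging.

    # Hypothetical sketch of a scoped, time-limited data-usage grant issued from
    # a personal data store. All names and fields are invented for illustration.
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class DataGrant:
        data_category: str    # e.g. "browsing" or "medical"
        grantee: str          # the company or researcher receiving access
        purpose: str          # the specific approved use
        expires_at: datetime  # grants expire instead of living forever

        def allows(self, requester: str, purpose: str, now: datetime) -> bool:
            return (requester == self.grantee
                    and purpose == self.purpose
                    and now < self.expires_at)

    grant = DataGrant("browsing", "acme-ads.example", "targeted ads",
                      expires_at=datetime.now() + timedelta(days=30))
    print(grant.allows("acme-ads.example", "targeted ads", datetime.now()))    # True
    print(grant.allows("acme-ads.example", "model training", datetime.now()))  # False: never approved

The point of the sketch is the shape of the grant: tied to a specific grantee, a specific purpose, and an expiration date, unlike today’s perpetual cookie consent.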

Reference Links

  1. Facebook and Cambridge Analytica Scandal – https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html
  2. The Belmont Principles – https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html
  3. The Common Rule – https://nij.ojp.gov/funding/common-rule
  4. Personal Data Store – https://medium.com/@shevski/are-personal-data-stores-about-to-become-the-next-big-thing-b767295ed842

How much of our secrets can machine learning models leak?
By Anonymous | May 28, 2021

The field of Machine Learning (ML) has achieved meaningful progress in the last decade, especially with the introduction of low-cost computing and storage by cloud providers. Its popularity stems mainly from successful current applications and the promise that it will bring numerous social benefits to humanity.

ML models learn patterns from data used during their training process. The ‘learning’ is stored as parameters represented according to the model’s architecture and algorithm.

After that, the models can be used to respond to a variety of queries. Moreover, it is common that those models are published as a service (via API), where an adversary may have black-box access to them without any knowledge of the model’s internal parameters.

Given that process, an important question arises: how much sensitive information existing in training data can be leaked through this type of access?

Flawed model design and training may cause ‘overfitting’, which happens when the model corresponds too closely to a particular set of data, in this case the training data. The more ‘overfitted’ the model, the easier it is for an adversary to perform attacks that seek to disclose sensitive information contained in the training data.

Among the existing attacks are training data extraction, membership inference, and attribute inference attacks [2]. These attacks have received limited scientific investigation, and their impact on individuals’ privacy is still not fully understood.

Training Data Extraction [1]

Language models (LMs) are trained on massive datasets – think a terabyte of English text – to generate responses that approximate fluent language. In short, those models ‘memorize’ training data, which may contain sensitive information, creating the opportunity for some of that information to be reflected in the model’s output. An attacker can exploit this opportunity by feeding hundreds, or millions, of queries to the model and examining the output.
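
As a rough illustration of the ‘sample and filter’ idea behind such attacks, the sketch below generates many continuations from a language model and flags outputs that look like contact information. It assumes the Hugging Face transformers library, with GPT-2 as a stand-in model; the prompts and the regex filter are arbitrary choices, and published extraction attacks use far more sophisticated sampling and ranking of candidate memorized sequences.

    # Rough sketch of 'sample and filter' training data extraction: draw many
    # samples from a language model and flag outputs that look like contact
    # information. GPT-2 here is only a stand-in model.
    import re
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    email_like = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    prompts = ["My email address is", "Please contact me at"]
    for prompt in prompts:
        samples = generator(prompt, max_new_tokens=20, num_return_sequences=5, do_sample=True)
        for sample in samples:
            text = sample["generated_text"]
            if email_like.search(text):
                print("looks like memorized contact info:", text)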

Membership Inference [7][8]

This threat occurs when, given a data record, an attacker can infer whether the record was part of the training dataset. One way this attack is performed is by exploiting the outputs (confidence scores) of the model, and its efficacy improves for models with ‘overfitting’ problems. Membership inference can have profound privacy implications, such as identifying a person as part of a group that has a particular disease or has been admitted to a hospital.
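
A minimal sketch of the confidence-score idea, under assumptions chosen purely for illustration (a deliberately overfitted scikit-learn classifier on synthetic data and an arbitrary threshold), might look like this; real membership inference attacks, such as shadow-model approaches, are considerably more elaborate.

    # Minimal confidence-threshold membership inference sketch against a
    # deliberately overfitted classifier. Model, data, and threshold are all
    # illustrative.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

    # Deep, un-pruned trees memorize the training set (overfitting on purpose).
    model = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
    model.fit(X_train, y_train)

    def confidence_on_true_label(model, X, y):
        probs = model.predict_proba(X)
        return probs[np.arange(len(y)), y]

    # Attacker's rule: guess "member" when the model is very confident.
    threshold = 0.9
    member_rate = (confidence_on_true_label(model, X_train, y_train) > threshold).mean()
    nonmember_rate = (confidence_on_true_label(model, X_test, y_test) > threshold).mean()
    print("flagged as members among true members:    ", member_rate)
    print("flagged as members among true non-members:", nonmember_rate)

Because the overfitted model is systematically more confident on records it has memorized, this simple threshold flags true members far more often than non-members.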

Attribute Inference

In this attack, the adversary tries to infer missing attributes of a partially known record used in the training dataset. Zhao et al. [4] experimentally concluded that “it is infeasible for an attacker to correctly infer missing attributes of a target individual whose data is used to train an ML model.” That does not mean the same model is not vulnerable to membership inference attacks.

Data Scientists are responsible for applying privacy protection techniques, such as differential privacy [6], to the training data to prevent later disclosure through adversaries’ attacks. Additionally, they must adopt industry best practices to prevent models from ‘overfitting’ their training data, which significantly increases vulnerability to attacks, especially when the models are meant to be accessible via APIs.
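
Differential privacy is a family of techniques (for model training, this usually means approaches such as DP-SGD). The sketch below shows only its simplest building block, the Laplace mechanism applied to a count query, to illustrate the basic idea of trading a little noise for a privacy guarantee; the data and the epsilon value are made up.

    # Illustrative sketch of the Laplace mechanism, the simplest building block
    # of differential privacy. A count query has sensitivity 1, so the noise
    # scale is 1 / epsilon.
    import numpy as np

    def noisy_count(data, predicate, epsilon):
        """Release a count of matching records with epsilon-DP Laplace noise."""
        true_count = sum(1 for x in data if predicate(x))
        noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
        return true_count + noise

    ages = [34, 29, 41, 52, 47, 38]  # toy 'training data'
    print(noisy_count(ages, lambda a: a > 40, epsilon=0.5))  # noisy answer near 3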

As crucial as exploring the technical details behind the attacks that can leak information from models, and how to avoid them, is investigating the privacy concerns that arise from this threat. Those concerns add even more complexity to Data Scientists’ responsibilities when using data, including adherence to fair information practices, such as those summarized by the Belmont Report [3], when collecting, handling, using, and sharing data.

The analysis of ML data extraction and inference attacks reveals a direct application of all three Belmont Report principles:

  •      Respect for Persons – despite the fact that Data Scientists are usually distant from the collection of data and, more importantly, from the individuals who contributed it, they must be sensitive to protecting those individuals’ autonomy and act according to their consent. Besides doing their best to avoid common vulnerabilities to attacks, they must ensure users are aware of, and consent to, the risk of being a member of the dataset.
  •      Beneficence – even when protected from attacks, machine learning models can learn bias, racism, and other social harms if not correctly designed and trained. That includes the exclusion or overrepresentation of particular groups. Therefore, it is critical that Data Scientists feel obligated to protect individuals from harm while maximizing benefits.
  •      Justice – finally, the benefits yielded by training ML models should be inclusive to all and fairly distributed.

Finally, the fact that ML models are vulnerable to disclosing sensitive information raises privacy protection and ethics concerns. More importantly, it intersects with data protection law: the information a model leaks risks being classified as personal data under the General Data Protection Regulation (GDPR).

There are important questions to be answered as machine learning becomes ubiquitous and new threats of sensitive information disclosure appear:

What is the comprehensive list of obligations Data Scientists must comply with to adhere to the Belmont Report principles and to regulations such as the GDPR? And how can we assess whether those obligations have been met?

Data Ethics In Launching McDonald’s All Day Breakfast
By Daphne Yang | May 28, 2021

Before anyone asks: no, McDonald’s All Day Breakfast isn’t back quite yet. But as I wait for All Day Breakfast to come back like other McMuffin enthusiasts, I’ve become interested in how and why McDonald’s decided to start it. It’s an interesting story, especially if you dig a little deeper, and a fun example for better understanding the principles of ethical research set forth by the Belmont Report.

At a Glance: The Belmont Report

To provide protections for human subjects in future healthcare research, the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research published the Belmont Report in 1979. The report, written in the wake of the tragic Tuskegee syphilis study, outlined three main principles of ethical research: respect for persons, beneficence, and justice.

  1. Respect for Persons

    To adhere to the principle of respect for persons, researchers must receive informed and voluntary consent from their subjects and give them the opportunity to choose what they will and will not allow to happen to them.

  2. Beneficence

    Beneficence is generally attributed to acts of kindness. As a principle of ethical research, beneficence requires researchers to respect a subject’s decisions, to do no harm, and to maximize possible benefits while minimizing potential harms.

  3. Justice

    Justice refers to the equal distribution of benefits and burden to all subjects of a study. This principle works to ensure that ethical research does not disproportionately impact disadvantaged communities and allows each subject an equal share of risk and reward.

The birth of All Day Breakfast isn’t research in the traditional sense, so how might these principles apply to the story of McDonald’s All Day Breakfast?

Beginnings of All Day Breakfast

On October 6, 2015, McDonald’s announced on Twitter that its fan-favorite breakfast items would be available all day at every one of its 14,000 locations in the United States. The announcement was an ode to the thousands of die-hard and Twitter-savvy McDonald’s breakfast fans, especially John Lee, who had championed the idea of all day breakfast as early as 2007.

What Does This Have To Do With Data Ethics?

While it may seem that McDonald’s decided to take John’s advice out of the blue, this was a data-driven and calculated endeavor. Using Twitter data, McDonald’s researched and studied a small sample of its customer base (334,000 tweets, to be exact) to identify All Day Breakfast as the best way to drive customer engagement with the brand and its national franchise stores. McDonald’s use of Twitter data and study of its customers is therefore reason enough to examine the work through the Belmont Report lens.
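
As a toy illustration of the kind of social-listening analysis this implies, counting topic mentions across a batch of tweets takes only a few lines; the tweets and keywords below are invented, and Sprinklr’s actual platform does far more (sentiment, trends, demographics) than this sketch.

    # Toy illustration of social listening: counting topic mentions in a batch
    # of tweets. The tweets and keywords are made up for this example.
    from collections import Counter

    tweets = [
        "why does mcdonald's breakfast end at 10:30, I need all day breakfast",
        "petition for mcdonald's all day breakfast",
        "just want a mcmuffin at 3pm",
    ]

    keywords = ["all day breakfast", "mcmuffin", "hotcakes"]
    mentions = Counter()
    for tweet in tweets:
        for keyword in keywords:
            if keyword in tweet.lower():
                mentions[keyword] += 1

    print(mentions.most_common())  # which menu topics customers talk about most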

Respect For Persons

McDonald’s primarily used a customer experience platform called Sprinklr to conduct its user behavior research. And while the study was benign in nature, no active consent was sought from the Twitter users whose data was collected. While use of their tweets and data may have received “consent” through Twitter’s blanket Terms of Service, this doesn’t mean that actual consent was given by the users. Even more so, McDonald’s should have received explicit consent to use John Lee’s original tweet to promote its products.

Grade: C

Beneficence

Beneficence in this case was met by McDonald’s. The aim of the research was to determine the next product to bring to market and which areas of opportunity were best for the brand, and it seems McDonald’s did not violate the principle of beneficence in this round of customer research.

Grade: A

Justice

While the benefits of All Day Breakfast were evenly distributed and accessible to all McDonald’s customers throughout the US (before the sudden halting of the program due to the coronavirus pandemic), it is critical to be cognizant of any potential burdens. In the months following the launch, customers flocked to McDonald’s for mid-day McGriddles and hotcakes. Was there an unequal distribution of benefits between McDonald’s and any subjects (i.e., anyone who had interacted with or written a tweet about All Day Breakfast)? Well, in John Lee’s case, there was some benefit for tweeting about all day breakfast all those years ago. Perhaps that’s a share of the benefit from McDonald’s to its loyal customers, but I’d argue he deserved much more for starting the McDonald’s All Day Breakfast trend.

Grade: B

Why Is Any of This Important?

Ethics is such an important part of our lives, and data ethics is especially important as our world becomes more data-dependent. Understanding and actively seeing our world through this lens (even through somewhat silly examples) is therefore an important part of being data literate. Hopefully, through this example, the fundamentals of ethics can be understood in a digestible and approachable way.

What We Talk About When We Talk About Data Bias and Feminism
By Julie Lai | May 28, 2021

Data is often referenced as if it were immune to bias and encompassed the ultimate objective truth. There is an ideology, otherwise known as positivism, holding that as long as enough data is collected, our understanding of the world can be explained from a value-neutral, transcendent view from nowhere. In fact, this theory fails to acknowledge the complexities and nuances of unequal access to data, which make data science a view from a very specific somewhere. Whom do we consider when we collect data? Who is excluded behind these numbers? Who is collecting the data? What are the consequences when people are excluded? Without the biases and limitations in data being understood, misinterpretation and reinforced biases are the result.

Admittedly, there was a point when I thought of data as inherently objective. However, author Caroline Criado Perez forced me to reconsider my biases after I read Invisible Women: Exposing Data Bias in a World Designed for Men. Criado Perez sheds light on the consequences that cost women their time, money, and even lives when we draw conclusions based on the biased data that exists and ignore the narratives behind the data that doesn’t.

Criado Perez underlines how gender blindness in tech leads to a “one-size-fits-men” approach for supposedly gender-neutral products and systems. This can mean anything from the average smartphone being too large to fit most women’s hands and pockets, to speech-recognition software trained on recordings of men’s voices, to restrooms and bus routes and office temperatures designed for men, to a higher percentage of health misdiagnoses for women, to cars designed around the body of a “Reference Man”.

At first glance, some of these biases might not seem to have explicitly severe consequences, such as gender-biased smartphone designs. The average smartphone is roughly 5.5 inches, which fits comfortably into the average man’s hand, whereas the average woman has to use two hands. While this design is at the very least extremely annoying, it also affects women’s health: women have been found to have a higher prevalence of musculoskeletal symptoms and disorders in studies that sex-disaggregate their data and adequately represent women. Similarly, the standard piano keyboard is designed for the average male hand, which affects both women’s level of acclaim and their health. Studies have shown that female musicians suffer disproportionately from work-related injuries, and keyboard players are among those most at risk.

Other biases, such as health misdiagnoses for women and cars designed around the body of a “Reference Man”, have much more explicit consequences. The term “Yentl Syndrome” describes how women are often misdiagnosed, mistreated, or told the pain is all in their heads when they present to their doctors with symptoms that differ from men’s. Not only are the consequences wildly frustrating, they can be lethal. In the UK, women are 50% more likely than men to be misdiagnosed after a heart attack, because the symptoms we know and recognize are ‘typical’ male symptoms. Misdiagnoses continue to happen in part because some doctors are still trained on medical textbooks and case studies whose trials typically used male participants. Furthermore, medications don’t always work the same way for women as they do for men. For example, one heart medication meant to prevent heart attacks actually became more likely to trigger one at a certain point in a woman’s menstrual cycle. Problems like these are overlooked because so little research tests drugs at different stages of the menstrual cycle.

Both drug dosing and car crash tests use a “Reference Man”, typically a white man in his 30s who represents the “standard” human. Because the “Reference Man” is used for all sorts of research involving drug doses, women are often overdosed on medication. For car crash tests, the “Reference Man” is who cars are designed for. This means seatbelts are not designed for the female form, so women have to sit further forward; as a result, women are 17% more likely than men to die if they’re in a car crash and 47% more likely to be seriously injured.

With biased data come biased algorithms and biased policies, both of which only reinforce the information they are given. As data scientists, we have to consider what biases our algorithms reinforce when we use data blindly. Furthermore, it is not enough to look at data through the lens of feminism alone. We must also look at data through the lens of race and, much more importantly, at how the intersection of gender and race biases the data we use.

How do you really feel about data privacy?
By Angela Gao | May 28, 2021

Thankfully, no: large corporations are not spying on your family members to find out what toothpaste they use so they can recommend it the next time you open your favorite social media apps. However, what they and data brokers do have access to is all of your purchases, browsing history, in-app activity, and any other data you willingly give up the instant you check off an application’s terms of service. This seemingly innocuous information can be used to make fairly accurate estimates of your preferences and to create product recommendations based on a constructed user profile. Nor does it stay isolated: these profiles can draw in the preferences of other users similar to you or likely to be in your circle. While you as an individual may not share very much information, at such a large scale and as a collective, this data holds great power to direct customer behavior. All these “free” social media applications really aren’t as free as we like to think they are.

The Market for Data

We know that our personal information is highly desirable, but how exactly can we evaluate the economic value of our data? High levels of competition give firms large incentives to invest in gathering consumer data, and a lack of privacy protection creates a stark imbalance of power between data subjects (users) and data holders (corporations).

Excess protection, meanwhile, creates market inefficiencies by preventing normal transaction behavior. For example, limiting the input of personal information would defeat the purpose of social media applications that seek to connect users. However, the same can be said for a total lack of privacy. Both extremes point to a fundamental economic principle: there exists some “optimal” level of protection. This notion comes with the assumption that consumers are informed, educated, long-term utility maximizers. The reality is that most users are short-sighted. We are highly likely to consent to secondary use of our data, especially as a prerequisite for access to products or services. This imbalance lets sellers hoard all of the positive surplus, or benefit, from these data transactions.

The Intangibles

Our privacy decision making is confusing and often paradoxical. While we value privacy and demand its protection, we also easily give it away for small perks. Weak privacy assurances give us the greatest satisfaction, but strong assurances somehow end up being worse than not having any at all. Privacy violations with small losses make us worry, but we overlook intrusions with significant impact.

Classical rational choice theory argues that individuals maximize utility over time, collect relevant information, have consistent preferences, and accurately assess probabilities (Bayesian updating). In practice, we face incomplete and asymmetric information. Data subjects know very little compared to data holders about the scale and use of collected personal data, and the lack of data literacy among regular consumers means a shallow understanding of the associated consequences. Our bounded rationality makes it difficult to digest the complexities of modern data pipelines, so we fall back on face-value cost-benefit judgments.

Trade-offs around privacy commonly involve complex bundles of goods with ambiguous valuations. Privacy-related benefits are rarely monetary, and their marginal value becomes harder for an individual to evaluate with each added layer of complexity. Many times, we may not even know what the possible outcomes of a privacy-sensitive scenario might be. For online purchases, we reveal our credit card details, which increases the risk of identity theft. When purchasing groceries, we may use a membership card for coupons and points, sharing our buying history and risking junk or targeted ads. Rarely do we stop to think about what information we are sharing and assess the privacy risks.

The field of behavioral economics has done a lot of work expanding the assumptions economic models make about consumer behavior by considering individual and social psychology. The interconnected effects of our bounded rationality, scenario framing, heuristics, and implicit biases all influence how we compare choices in our risk evaluations and time-value discounting. This is key to understanding our tendency to trade away privacy for immediate gratification, in ways that are inconsistent with and damaging to our initial or long-term plans.

References

https://twitter.com/RobertGReeve/status/1397032784703655938
https://www.heinz.cmu.edu/~acquisti/papers/Acquisti-Grossklags-Chapter-Etrics.pdf
https://www.behavioraleconomics.com/rationality-disclosure-and-the-privacy-paradox/

Accellion Data Breach: An Informed Consent Perspective
By Anonymous | May 28, 2021

As many UC students, alumni, and employees are aware, the cybersecurity attack on the Accellion file transfer appliance (FTA) has left many with compromising information sold on the dark web. The leaked information includes social security numbers, bank account information, addresses, and more. (See UC Berkeley’s statement on the attack here.)

With such sensitive information now putting many UC affiliates and their dependents at risk of identity theft, I find myself wondering what I could have done to prevent this. Could I have avoided sharing this data with UC Berkeley? Demanded that they use a different storage system just for me? The answer to each of these questions is, of course, no. Well, not necessarily no, but if I had done either of them I wouldn’t have had any kind of successful outcome.

A demand for a file storage or transfer system of my own choice is obviously not feasible. If each student had the option of selecting which storage system to use, there would be no consistency or efficiency in data storage. On the other hand, I was not given any real choice to opt out of this system storing or transferring my data at all.

If I had refused to share this data with UC Berkeley or with Accellion, I essentially would not have been able to attend Berkeley. UC Berkeley requires this information for enrollment, payment, and other mandatory attendance requirements.

The Belmont Report outlines principles for the ethical treatment of research subjects. The first principle describes treating people with respect; usually this manifests as requiring informed consent – ensuring people accept all of the terms of the use of their data and are fully informed about the extent of its use when they consent.

Given the consequences of denying consent for use of our data, can we really say that the first principle of the Belmont Report was followed?

There are four primary principles of informed consent:
1. You must be able to make the decision.
2. The doctor/researcher must disclose information about the relevant procedure.
3. You must understand that information.
4. Consent must be given without coercion.

This final principle is the one one could argue is violated in this situation. If the consequence of refusing to submit sensitive personal information like social security numbers is having to withdraw from the university, the stakes are far greater than they may appear at face value. In the modern day, a college degree is virtually essential to success in the post-grad world and is, for many families, a way to break the cycle of poverty. Many students may not feel that they realistically have the option to turn down their acceptance on the grounds of distrust of the data servers.

However, incidents like the Accellion data breach suggest that such distrust is not unfounded. If given the choice in the future, I would not trust my personal, sensitive data to systems like this. Given that responses to the Undergraduate Student Experience Survey were also included in the data breach, I have chosen not to respond to the Graduate Student Experience Survey, on the basis of my lack of trust that the responses will remain anonymized. I have no such freedom to withhold my consent for other data deemed essential to enrollment and payment.

European Union Addresses Potential Harmful AI
By Frank Bruni | May 28, 2021

The European Union has proposed restrictions on certain uses of artificial intelligence to protect against unethical algorithms. The rules would limit the use of artificial intelligence in everyday activities such as “self-driving cars, hiring decisions, bank lending, and scoring of exams.” The proposed rules are the first of their kind to slow the massive gains in tech in order to protect European citizens’ rights. Although banning certain technologies is not a permanent solution, it buys time to make sure artificial intelligence is used ethically in the long run.

To dive deeper into the ethical issues of artificial intelligence, I will discuss the Belmont Report, created by the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The report was the first of its kind to discuss the importance of ethics and human rights when experiments and technology are involved. The first principle proclaims that an individual has a right to consent. Artificial intelligence is so widespread that it is nearly impossible to ensure individuals have consented. In 2016, Detroit began using facial recognition cameras at gas stations in “Project Green Light,” and the technology has wrongfully accused innocent people because of their race. Citizens must be made aware of how artificial intelligence is being introduced into everyday life and, furthermore, must consent to its use. As shown in Detroit, artificial intelligence has the potential to do good, but until we consider the possible downfalls we’ll be left with wrongdoings like those in Detroit.

The Belmont Report also describes the principle of beneficence, which requires that the large companies using artificial intelligence do no harm to individuals. Currently, artificial intelligence has a lot of proposed benefits, including automating many processes. For example, Amazon previously built an algorithm to automate its hiring process. This algorithm solely benefited Amazon and, in fact, the system “discriminated against women.” This was not done with intent, and it shows that artificial intelligence can be harmful if not tested thoroughly before being implemented. It is crucial for artificial intelligence to do no harm to individuals, as outlined in the Belmont Report. The European Union’s proposed restrictions on artificial intelligence are necessary to push companies to ensure the ethical use of algorithms.

Another important point implied by the Belmont Report is that, to be ethical, the benefits of artificial intelligence should not be one-sided. Although companies should be held to this responsibility, it is hard to enforce, since companies are the ones utilizing artificial intelligence, not individuals. For example, Uber used artificial intelligence to increase its profits by increasing driver supply in specific locations. Uber did this by implementing goal-based incentives to motivate drivers to drive longer hours (Scheiber). While the drivers believed they were making more money and reaching goals, Uber was actually using artificial intelligence to siphon profits from drivers up to corporate. With the immense potential of artificial intelligence comes the possibility of harm. The European Union is making the safe decision by placing restrictions on the use of artificial intelligence, buying time to ensure algorithms are implemented ethically, in accordance with the Belmont Report.

The European Union’s proposed changes would address harmful artificial intelligence by requiring companies to inform users when artificial intelligence is being used on them, including attempts to detect emotion and to classify people based on biometric features. Other requirements include risk assessments and human oversight. These proposed changes are a step in the right direction to ensure artificial intelligence stays ethical and does no harm. Regardless, technology must find a way to grow and expand without harming the individual.

References

Satariano, Adam. “Europe Proposes Strict Rules for Artificial Intelligence.” The New York Times, The New York Times, 21 Apr. 2021, www.nytimes.com/2021/04/16/business/artificial-intelligence-regulation.html.

Hill, Kashmir. “Wrongfully Accused by an Algorithm.” The New York Times, The New York Times, 24 June 2020, www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html.

Hamilton, Isobel Asher. “Amazon Built an AI Tool to Hire People but Had to Shut It down Because It Was Discriminating against Women.” Business Insider, Business Insider, 10 Oct. 2018, www.businessinsider.com/amazon-built-ai-to-hire-people-discriminated-against-women-2018-10.

Scheiber, Noam. “How Uber Uses Psychological Tricks to Push Its Drivers’ Buttons.” The New York Times, The New York Times, 2 Apr. 2017, www.nytimes.com/interactive/2017/04/02/technology/uber-drivers-psychological-tricks.html.

Implications of Advances in Machine Translation
By Cathy Deng | April 2, 2021

On March 16, graduate student Han Gao wrote a two-star review of a new Chinese translation of the Uruguayan novel La tregua. Posted on the popular Chinese website Douban, her comments were brief, yet biting – she claimed that the translator, Ye Han, was unfit for the task, and that the final product showed “obvious signs of machine translation.” Eleven days later, Gao apologized and retracted her review. This development went viral because the apology had not exactly been voluntary – friends of the affronted translator had considered the review to be libel and reported it to Gao’s university, where officials counseled her into apologizing to avoid risking her own career prospects as a future translator.

Gao’s privacy was hotly discussed: netizens felt that though she’d posted under her real name, Gao should have been free to express her opinion without offended parties tracking down an organization with power over her offline identity. The translator and his friends had already voiced their disagreement and hurt; open discussion alone should have been sufficient, especially when no harm occurred beyond a level of emotional distress that is ostensibly par for the course for anyone who exposes their work to criticism by publishing it.

Another opinion, however, was that spreading misinformation should carry consequences, because by the time the defamed party can respond, the damage is often already done. Hence the next question: was Gao’s post libelous? Quality may be a matter of opinion, but machine translation comes down to integrity. To this end, another Douban user extracted snippets from the original novel and compared Han’s 2020 translation to a 1990 rendition by another translator, as well as to corresponding outputs from DeepL, a website providing free neural machine translation. The analysis led to two main conclusions: that Han’s work was often similar in syntax and diction to the machine translation, more so than its predecessor; and that observers agreed the machine translation was, in some cases, superior to its human competition. The former may seem incriminating, but the latter much less so: after all, if Han had seen the automated translation, wouldn’t she have made it better, not worse? Perhaps the similarities were caused merely by lack of training (Han was not formally educated in literary translation).

Researchers have developed methods to detect machine translation, such as assessing the similarity between the text in question and its back-translation (e.g., translated from Chinese to Spanish, then back to Chinese). But is this a meaningful task for the field of literary translation? Machine learning has evolved to the point that models can generate or translate text that is nearly indistinguishable from, or sometimes even more enjoyable than, the “real thing.” The argument that customers always “deserve” fully manual work is outdated. And relative to the detection of deepfakes, detecting machine translation is not as powerful a tool for combating misinformation.
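
As a sketch of how such a back-translation check might look in code, the snippet below round-trips a text through a pivot language and measures how much survives. The `translate` function is a placeholder for whatever machine-translation backend one might use, and the character-level ratio is a crude stand-in for the metrics used in the research.

    # Sketch of a back-translation similarity check. `translate` is a placeholder
    # for any machine-translation backend (an API call, a local model, etc.).
    from difflib import SequenceMatcher

    def translate(text: str, src: str, tgt: str) -> str:
        """Hypothetical wrapper around a machine-translation service."""
        raise NotImplementedError("plug in a real translation backend here")

    def back_translation_similarity(text: str, lang: str = "zh", pivot: str = "es") -> float:
        """Round-trip the text through a pivot language and compare (0.0 to 1.0)."""
        round_trip = translate(translate(text, src=lang, tgt=pivot), src=pivot, tgt=lang)
        return SequenceMatcher(None, text, round_trip).ratio()

    # Intuition: machine-translated text tends to survive the round trip largely
    # unchanged, so a suspiciously high similarity is a (noisy) warning sign.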

Yet I believe assessing similarity to machine translation remains a worthwhile pursuit. It may never be appropriate as a measure of professional integrity, because the days of being able to ascertain whether a translator relied on automated methods are likely behind us. Just as plagiarism detection tools are disproportionately harsh on international students, a machine-translation detection tool (currently only 75% accurate at best) may unfairly punish certain styles or decisions. Yet a low level of similarity may well be a fine indicator of quality if combined with other methods. If even professional literary translators might flock to a finite number of ever-advancing machine translation platforms, it is the labor-intensive act of delivering something different that reveals the talent and hard work of the translator. Historically, some of the best translators worked in pairs, with one providing a more literal interpretation that the other then enriched with artistic flair; perhaps algorithms could now play the former role, but the ability to produce meaningful literature in the latter may be the mark of a translator who has earned their pay. After all, a machine can be optimized for accuracy or popularity or controversy, but only a person can rejigger its outputs to reach instead for truth and beauty – the aspects about which Gao expressed disappointment in her review.

A final note on quality: the average star rating on Douban, as on other review sites, was meant to indicate quality. Yet angry netizens have flooded the works of Han and her friends with one-star reviews, a popular tactic that all but eliminates any relationship between quality and average rating.

How private are Zoom meetings?
by Gerardo Mejia | April 2, 2021

This topic caught my attention, especially after the pandemic, because I see people using Zoom to replace in-person interaction more and more every day. Zoom is used throughout the day for many things, including work, education, and personal meetings. At first, I thought that privacy issues were mostly limited to personal meetings, but I later learned that there are privacy concerns both in education and in the workplace.

Personal Meetings

My initial interest in the topic was due to my observations of people using Zoom for things like birthday parties, bridal showers, baby showers, and other non-traditional purposes. I became interested in whether Zoom itself monitors or listens in on those calls. I was convinced that somewhere in its privacy policy there would be some type of loophole allowing it to listen in on calls for the purposes of troubleshooting or ensuring the service was working. I was a bit disappointed, and relieved, when I read that meetings themselves are considered “Customer Content” and that the company does not monitor, sell, or use customer content for any purpose other than providing it to the customer.

However, there was a small, though not too obvious, loophole. Zoom considers this “Customer Content” to be under the user’s control, including its security, and thus cannot guarantee that unauthorized parties will not access it. I later found out that this is a major loophole that has been exploited in many instances. Although Zoom doesn’t take responsibility for this, many people blame the company for not upgrading its security features. All this means that somebody would have to hack their way into my family’s private meeting in order to listen in. I believe that for most family gatherings the risk of this happening is not very high, so it is safe to say that most family Zoom meetings are private as long as they are not the target of a hacker.

Education

I had initially thought that the education field was not heavily affected by Zoom’s privacy or security issues. After all, most educators have trouble getting all their students to attend, and who is going to want to hack into a class? I was wrong about that too. The most notorious example involved China, where Zoom assisted the Chinese government in censoring content it did not agree with. It is also important to note that in addition to class content, schools hold other types of meetings that are more private in nature and put sensitive information, like grades or school records, at risk. These could also become targets of malicious hackers. In conclusion, while censorship may not be a large issue in the United States, there are some countries where it is a real issue.

Workplace

I remembered that Zoom is on my company’s prohibited software list, and I learned that most tech companies have also banned their employees from using Zoom for work. I initially thought this was because Zoom’s privacy policy or terms of use allowed Zoom employees to listen in, making meetings insecure because a third party could be listening. It turns out that Zoom’s privacy policy states that it will not listen in on or monitor meetings. However, as with personal and education meetings, it is up to the company to secure its meetings, and Zoom cannot guarantee that unauthorized users will not access the content. These security issues mean that Zoom cannot be held responsible if a company’s meeting is hacked and accessed by an unauthorized user. Companies are targeted by hackers all the time, so the risk of their Zoom meetings being hacked, especially for high-profile companies, is large.

https://zoom.us/privacy
https://www.techradar.com/news/more-top-companies-ban-zoom-following-security-fears
https://www.cnbc.com/2019/07/14/zoom-bug-tapped-into-consumers-deepest-fears.html
https://www.insidehighered.com/news/2020/06/12/scholars-raise-concerns-about-using-zoom-teach-about-china
https://nypost.com/2020/04/10/chinese-spies-are-trying-to-snoop-on-zoom-chats-experts-say/