Data Privacy and Shopping

By Joseph Issa | February 23, 2022

Data plays an essential role in our daily lives in the digital age. People shop online and hand over personal information such as their email, name, and address. To stay competitive in the data science world, we need to take a hard look at users' data privacy; consider, for example, training a model on sensitive patient data to predict diabetes while keeping the patients anonymous. Online social media platforms (Facebook, Twitter, and others) routinely collect and share users' data. In 2018, the European Union introduced the General Data Protection Regulation (GDPR), a set of regulations to protect the data of European citizens. Any online service that handles EU residents' data must comply with it. Among GDPR's key requirements is designating a data protection officer in companies with more than 250 employees or those dealing with sensitive data. Facebook has faced massive penalties for not complying with GDPR.

Source: The Statesman

Everything about us, the users, is data: how we think, what we eat, what we wear, what we own. Data protection laws are not going anywhere, and we will see more of them in the coming years. The key is how to preserve users' privacy while training models on this sensitive data. Apple, for example, has rolled out privacy techniques in its operating systems that let it collect users' data anonymously and train models to improve the user experience. Google likewise collects anonymized data in Chrome and in Maps to help predict traffic jams. Numerai, to take another example, lets data scientists around the world train their models on encrypted financial data, keeping client data private.

There are different techniques for developing prediction models while preserving users' data privacy. But first, let's look at one of the most notorious examples of the potential of predictive analytics. It's well known that every time you go shopping, retailers take note of what you buy and when you buy it. Your shopping habits are tracked and analyzed: what time you shop, whether you use digital or paper coupons, whether you buy brand-name or generic, and much more. Your data is stored in internal databases, where it is picked apart to find trends between your demographics and your buying habits.
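One such privacy-preserving technique, of the kind Apple has deployed, is differential privacy: adding calibrated random noise to aggregate statistics so that no individual's record can be inferred from the result. The sketch below is illustrative only; the shopper dataset, function names, and epsilon value are all hypothetical, not taken from any real system.

```python
import numpy as np

def private_count(records, predicate, epsilon=1.0, seed=None):
    """Differentially private count of records matching a predicate.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so adding Laplace(1/epsilon) noise
    gives epsilon-differential privacy for that query.
    """
    rng = np.random.default_rng(seed)
    true_count = sum(1 for r in records if predicate(r))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical shopper records: did each shopper buy prenatal vitamins?
shoppers = [{"id": i, "bought_vitamins": i % 7 == 0} for i in range(1000)]
noisy = private_count(shoppers, lambda r: r["bought_vitamins"], epsilon=1.0, seed=42)
```

An analyst sees a count that is close to the truth in aggregate, while any single shopper can plausibly deny being in the data; smaller epsilon values mean more noise and stronger privacy.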

Stores keep data on everything you buy; that is how retailers know which coupons to send to which customers. The shopping cart keeps a record of every purchase made at a given shop. Target, famously, figured out that a teen was pregnant before her family knew. Target's sophisticated prediction algorithms could guess that a shopper was pregnant based on a selection of about 25 items that pregnant women tend to buy, among them vitamins such as zinc and magnesium, extra-large clothing, and others. Based on this data, Target could predict that a woman was pregnant before anyone close to her knew. Target began sending the young woman baby coupons at her home address, where she lived with her parents. Her father asked why his daughter was receiving baby coupons in the mail; it turned out she was pregnant and had told no one. Target's objective was to become expecting mothers' primary store, but in pursuing it, the company violated its customers' privacy.


The bottom line is that Target wants to figure out who is pregnant before they look pregnant, when it is still hard to distinguish them from customers who are not. The reason is that pregnant shoppers are potential goldmines: they start buying items they never bought before the pregnancy. It is terrifying that a company can know what is going on inside your body or your home without your telling it. After the story was covered across the news media, Target decided to shut down the program, including the pregnancy prediction algorithm.

Target could have camouflaged the baby coupons among ordinary ones, so that it would not be obvious to whoever opened the mail that someone in the house was pregnant. For instance, it could have mixed in coupons or ads for wine or other groceries, deliberately hiding the baby-related coupons so they slipped into people's homes without raising suspicion.

Another online shopping data incident happened at Amazon, where a technical error accidentally exposed users' data, including names, emails, addresses, and payment information. The company denied that the incident was a breach or a hack, even though the outcome for users was the same.

In a digital economy, data is of strategic importance. With many online activities such as social, governmental, economic, and shopping, the flow of personal data is expanding fast, raising the issue about data privacy and protection. Legal frameworks that include data protection, data gathering, and the use of data should be in place to protect users’ personal information and privacy.

Furthermore, companies should be held accountable when they fail to handle users' data confidentially.


Why Representation Matters in AI Development

By Mohamed Gesalla | February 23, 2022

A genocide recently took place against the Rohingya in Myanmar, the country formerly known as Burma. The Rohingya are a stateless Muslim minority in Myanmar's Rakhine state, recognized by Myanmar neither as citizens nor as one of the country's 135 recognized ethnic groups. The casualty toll of the humanitarian crisis reached a staggering 10,000 people, with more than 700,000 Rohingya fleeing to Bangladesh. The US Holocaust Memorial Museum found these numbers, along with other evidence, compelling enough to conclude that Myanmar's military committed ethnic cleansing, crimes against humanity, and genocide against the Rohingya. That the government in Myanmar partook in these crimes was not surprising: the succession of military regimes that ruled Myanmar failed for many years to address ethnic minority grievances or to provide security for these communities, which led to an arms race and created powerful non-state armed groups (Crisisgroup 2020).

In 2010, the new, predominantly Buddhist government introduced wide-ranging reforms toward political and economic liberalization but continued to discriminate against the Rohingya. Since the reforms, the country has seen a rise in Buddhist nationalism and anti-Muslim violence (Human Rights Council 2018). As a result of the economic liberalization reforms instituted by the government in 2011, the telecommunications sector in Myanmar saw an unprecedented drop in SIM-card prices, from $200 to $2. The resulting explosion of internet access brought more than 18 million of the country's 50 million people onto Facebook, compared with just 1.1% of the population having internet access in 2011.

Facebook became the main source of information in a country that was emerging from military dictatorship, riven by ethnic division, and whose population had no real education in how misinformation spreads through the internet and social media. All of this created a fertile environment for widespread hate speech against the Muslim minority, and especially the Rohingya. Facebook was the platform that exacerbated the crisis in Myanmar; it was used not only by Buddhist extremist groups but by the authorities fueling those violent groups, as in this post: "every citizen has the duty to safeguard race, religion, cultural identities and national interest" (Human Rights Council 2018).

Facebook and other social media platforms use sophisticated machine learning and AI-driven tools for hate speech detection; however, these systems are reviewed and improved by content reviewers with a deep understanding of particular cultures and languages. Even though Facebook, now known as Meta, was warned by human rights activists that its platform was being used to spread anti-Muslim hate speech, it took no action because the company could not deal with the Burmese language. As of 2014, the social media giant had only one content reviewer who spoke Burmese. Facebook's own statements revealed that its DeepText engine failed to detect the hate speech, that its workforce does not represent the people it serves, and, evidently, that it did not grasp the disastrous impact of its platform. The Facebook (Meta) – Myanmar incident is just one prime example of the catastrophic impacts that bias and lack of representation in technology development can cause.

In recent years, technology development across industries has been adopting AI techniques to make systems more optimized, advanced, and sophisticated. As this dependency on AI grows, every segment of society will use AI in one form or another. To develop technology with as few biases and defects as possible, there needs to be fair representation of all the sectors served. Even though there have been efforts at the federal level to push corporations to diversify their workforces, more thorough policies are needed. A tech company might have a diverse workforce overall, yet most of its minority employees work in the factory while only a few hold design or executive positions. From a policy perspective, the government needs to go beyond requiring corporations to meet aggregate diversity numbers and specify that those numbers must be met at the group level inside the organization.

Companies often define diversity as diversity of thought, which is valid, but it cannot be separated from diversity of religion, gender, race, socioeconomic status, and so on. The benefits of diversity and representation are not limited to meeting consumers' needs; they accrue directly to corporations as well. Research has found that companies with the most gender and ethnic/cultural diversity on their executive teams were 33% more likely to have industry-leading profitability (Medium).

There is no doubt that technology has made our lives better in many ways. While it may be a means for corporations to make profits, there remains a mutually beneficial incentive for consumers and companies to diversify the workforce. Diversity and representation ensure the development of technologies that serve all segments of society, minimize discrimination and bias, prevent tragedies, and increase corporate profitability. The members of a community know its problems and needs best, and they must be included in the conversation and the decision-making. Policymakers and businesses have an obligation to ensure the inclusion of the people they serve. In my opinion, representation is an effective way to avoid, or at least minimize, the risk of technologies contributing to another genocide.


Doctors See the Patient – AI Sees Everything Else

By Andi Morey Peterson | February 23, 2022

As a woman who has never fit the ideal mold of a physically healthy 30-something female, I am quite excited about what machine learning can bring to the healthcare industry. It took me years, nearly a decade, of searching and interviewing physicians to find one who would not just focus on the numbers and would take my concerns seriously as a woman.

I had all too often felt the gaslighting many women experience when seeking health care. It has been widely reported that women's complaints are more easily ignored or brushed off as "normal." We are seen as anxious and emotional, and it has been shown that we wait longer to receive relief when expressing pain[1]. Black women and other minorities have it even worse: black women, for example, die at three times the rate of white women during pregnancy[2]. They are recommended fewer screenings and prescribed less pain medication. Knowing this, the question now is: can machine learning help doctors correctly diagnose patients while ignoring their biases? Can it be more objective?

What we can look forward to:

Today, we are already seeing the results of machine learning in our health care systems. Some emergency rooms use AI to scan in paperwork, saving clerical time, and use NLP to document conversations between doctors and patients. Researchers are building models that use computer vision to better detect cancer cells[3]. While all of this is very exciting, will it truly improve patient care?

We want to fast forward to the days where social bias will decrease as more machine learning algorithms are used to help doctors make decisions and diagnoses. Especially as gender becomes more fluid, algorithms will be forced to look at more features than what the doctor sees in front of them. In a way, a doctor, with their bias, sees the patient and their demographics, but the algorithms can see everything. In addition, as more algorithms are released, the more doctors can streamline their work, thus decreasing their errors and reducing the amount of paperwork.

We must remain diligent:

We know that with these solutions we must be careful. Most solutions will not apply to all patients, and some simply don't work no matter how much training data we throw at them. IBM Watson's catastrophic failure to come anywhere close to real physician knowledge is a good example[4]. It saw only the symptoms; it didn't see the patient. Worse, unlike simpler, well-bounded problems such as Jeopardy! (which Watson dominated), what counts as "healthy" is often disputed even among the most senior doctors[5]. The industry is learning this and is now heavily focused on fixing these issues.

However, if one of the goals of AI in healthcare is to remove discrimination, we ought to tread lightly. We cannot just focus on improving the algorithms and fine-tuning the models. Human bias has a way of sneaking into our artificial intelligence systems even when we intend to make them blind. We have witnessed it with Amazon's recruiting system being biased against women and with facial recognition systems being biased against people of color. In fact, we are starting to see it in previously released models for predicting patient outcomes[5]. We must feed these models more accurate and unbiased data; that is the only way to get the best of both worlds. Otherwise, society will have to reckon with the idea that AI can make healthcare disparities worse, not better. Under the Belmont principle of beneficence, we must maximize benefits and minimize potential harms, and that should be at the forefront of our minds as we expand AI in healthcare[6].

My dream of an unbiased AI to handle my health care is not as close as I had hoped, so my search for a good doctor will continue. In the future, the best doctors will use AI as a tool in their arsenal to help decide what to do for a patient. It will be a practice of art: knowing what to use and when, and, more importantly, knowing when their own biases are coming into play, so that they can treat the patient in front of them and be sure the data fed into future models isn't contaminating the results. We need the doctor to see the patient and AI to see everything else. We cannot have one without the other.

[1] Northwell Health. (2020) Gaslighting in women’s health: No, it’s not just in your head
[2] CDC. (2021) Working Together to Reduce Black Maternal Mortality | Health Equity Features | CDC
[3] Forbes. (2022) AI For Health And Hope: How Machine Learning Is Being Used In Hospitals
[4] Goodwins, Rupert. The Register. (2022)
[5] Nadis, Steve. MIT. (2022) The downside of machine learning in health care | MIT News
[6] The Belmont Report. (1979)

Children’s Online Privacy: Parents’, Developers’ or Regulators’ Responsibility?

By Francisco Miguel Aguirre Villarreal | February 23, 2022

As a parent of four daughters under 10 and a user of social media and gaming platforms, I am in awe of the current trends, the viral videos, and what kids (not just minors) post nowadays. Looking at it all, I constantly ask myself what the girls will be exposed to, and socially pressured to do, see, or say, in their pre-teen and teen years.

This concern is not only mine, and it is not exclusive to this point in time. In the 1990s, legislators and social organizations drafted ideas to protect children from misuse and harm on websites. Those ideas culminated in Congress passing the Children's Online Privacy Protection Act (COPPA) of 1998. It states, in essence, that websites aimed at children under the age of 13 must obtain parental consent, along with upholding other rights that protect children's privacy and personally identifying information.

But because of the additional burden that COPPA compliance carries, most apps and sites not designed for children, with social media at the top of the list, prefer to direct their services to adults by adding clauses to their privacy statements declaring that the services are intended for users 13+, 16+, or 18+.

This creates two main problems. (a) For apps and sites intended for children, COPPA's parental consent doesn't always work: children will not wait for a parent's approval and can falsify it or seek out less restrictive sites, which opens the door to inappropriate content or to sharing data that can be used to advertise to them or target them in harmful ways. (b) Social media and other apps not intended for children are still used by them; children simply lie about their age to enroll, without parental consent and, in many cases, without their parents even knowing. Since the apps and sites assume they are not children, they are not treated as such, leaving them exposed to identity theft, predators, and a handful of other risks. Meanwhile, parents who don't know about the enrollment can do nothing to protect them, and the law, however progressive, will in most cases only react to an event that has already occurred.

Among the findings of the 2021 report "Responding to Online Threats: Minors' Perspectives on Disclosing, Reporting, and Blocking," conducted by Thorn and Benenson Strategy Group, was that 45% of kids under the age of 13 already use Facebook and that 36% of them reported experiencing a potentially harmful situation online. And this is just Facebook; the numbers do not include Snapchat, TikTok, Instagram, and the many other platforms on the market, let alone whatever appears in the future.

So, who is responsible for children's online privacy? First, parents have the responsibility to communicate with their children, make them aware of the risks they are exposed to, and monitor their online activity to detect potential harms early. This might sound easy, but finding the balance between surveillance for protection and spying on one's children does not come easily; still, it is a necessary, even mandatory, task for parents to protect and inform them. Second, legislators and regulators should maintain a database of complaints and confirmed cases so they can gradually classify them and incorporate them into the applicable laws, covering both developers and perpetrators. Such constant updates would modernize children's online privacy legislation and make it proactive rather than reactive. Third, developers must create minimum ethical standards within their companies, communicate possible harms in an easily readable format, and report cases that arise to children and parents in a way both can understand. If social organizations, developers, legislators, and regulators work together on regulations and principles to protect minors, the result will be more fluid, efficient and, above all, safe for children, with the understanding that the final responsibility will always rest with parents. Let's work together to help parents minimize that burden and better protect children.

Works Cited:
Children’s Online Privacy Protection Act (COPPA) of 1998

16 CFR Part 312, the FTC’s Children’s Online Privacy Protection Rule

Thorn, Benenson Strategy Group, Responding to Online Threats: Minors’ Perspectives on Disclosing, Reporting, and Blocking, 2021


Biometric Data: Don’t Get Third-Partied!

By Kayla Wopschall | February 23, 2022

In 2020, Pew Research estimated that one in five Americans regularly uses a Fitness Tracker, putting the market at $36.34 billion. Then you have Health Applications, where you can connect your trackers and get personalized insights into your health, progress toward goals, and general fitness.

With so much personal data held in one little application on your phone or computer, it is easy to feel like it is kept personal. But the Fitness Tracker and Health Applications have your data, and it is critically important to understand what you agreed to when you quickly clicked that "Accept User Agreement" button while setting up your account.

Biometric Data – How Personal Is it?

Fitness Trackers collect an incredible amount of Biometric Data that reveals very personal information about your health, your behaviors, and even your exact GPS coordinates with timestamps. From this it is entirely possible to analyze someone's health, lifestyle, and patterns of movement through space; in fact, many Health Applications that read in your Fitness Tracker data are designed to do just that: show you patterns in your behavior, let you share things like bike routes with friends, and suggest ways to improve your health.

But what do the companies do with this data? It can feel like it is used just for you and your fitness goals, the service they are providing. In reality, this data is used for much more, and it may be provided to third parties; in other words, to other companies you have not directly consented to.

What is required in a Privacy Policy?

In 2004, the State of California became the first to implement such a law, the California Online Privacy Protection Act (CalOPPA for short), requiring every commercial organization to post a Privacy Policy that is easily accessible online. Because online services routinely do business with individuals across state lines, its protections apply to all California residents and visitors.

CalOPPA highlights the following basic goals for organizations that collect personally identifying information (PII):

1. Readability – use common, understandable language in an easy-to-read format.
2. Do Not Track – include a clear statement about how and whether the device/app tracks you online, and state clearly whether other parties may be collecting PII while you use the service.
3. Data Use and Sharing – explain how your PII is used and, where possible, provide links to the third parties with whom it is shared.
4. Individual Choice and Action – make clear what choices you as a user have regarding the collection, use, and sharing of your PII.
5. Accountability – provide a clear point of contact for users with questions or concerns about privacy policies and practices.

The implementation of CalOPPA has greatly improved the accessibility and understandability of Privacy policies. However, improvements are needed for third-party data sharing.

Privacy Policies should clearly explain how, if, and when data is shared with or sold to a third party (e.g., another company). Some protections require companies to aggregate (combine) data and anonymize it so that no individual can be identified from the data that is shared. In practice, however, this can be extremely difficult to achieve.

For example, suppose a third party purchases data from a health application or fitness tracker that contains no personally identifying information such as a name or address. The same third party could then purchase the missing fields from a food-delivery service that does collect them, making it easy to determine identity.
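That linkage attack can be made concrete with a toy sketch. All of the records, field names, and people below are made up; the point is only that "anonymized" rows carrying quasi-identifiers (here, ZIP code and birthdate) can be joined to a second, identified dataset.

```python
# "Anonymized" fitness data: no names, but quasi-identifiers remain.
fitness_data = [
    {"zip": "94105", "birthdate": "1990-04-12", "resting_hr": 52},
    {"zip": "94110", "birthdate": "1985-11-30", "resting_hr": 71},
]

# Food-delivery data sold separately, with names attached.
delivery_data = [
    {"name": "Alice Smith", "zip": "94105", "birthdate": "1990-04-12"},
    {"name": "Bob Jones", "zip": "94110", "birthdate": "1985-11-30"},
]

def link(anonymous, identified, keys=("zip", "birthdate")):
    """Re-identify anonymous rows by joining on shared quasi-identifiers."""
    index = {tuple(row[k] for k in keys): row["name"] for row in identified}
    return [
        {**row, "name": index.get(tuple(row[k] for k in keys))}
        for row in anonymous
    ]

reidentified = link(fitness_data, delivery_data)
# Each "anonymous" heart-rate record now carries a name.
```

Real-world linkage works the same way, just with messier joins across many more purchased datasets, which is why stripping names alone is rarely enough to anonymize data.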

It is overwhelming to think of all the ways biometric data can travel throughout the web and how it might be used to market, discriminate, and/or monitor individuals.

The first step to keeping data safe is understanding the policies you’ve opted into regarding third-party sharing. Decide if this is a policy you feel comfortable with, and if not, take steps to request the removal of your data from the platform. You do have the right to both fully understand how companies use your data and make more informed choices when clicking that Accept User Agreement button.

Social Media Screening in Recruiting: Biased or Insightful?

By Anonymous | February 23, 2022

Did you know that your prospective employer may screen your social media content before initiating a conversation or extending a job offer to you? You may not know that a grammatical error in your social post could make your prospective employer question your communication skills, or an image of you drinking at a party could make them pass on your resume.

Social media does not work like a surveillance camera and it does not show a holistic view of someone’s life. People post content on social media selectively, which may not reflect who they are and how they behave in real life. Regardless of our views on the usage of social media for background screening, social intelligence is on the rise.

Social Intelligence in the Data Era
If you google the phrase “social intelligence”, the first definition you may see is the capacity to know oneself and to know others. If you keep browsing, you will eventually see something different stand out prominently:

Social intelligence is often used as part of the social media screening process. This automates the online screening process and gives a report on a candidate’s social behavior.

The internet holds no secrets. In this data era, the capacity to know oneself and others has expanded more than ever. According to a 2018 CareerBuilder survey, 70% of employers research job candidates as part of the screening process, and 57% have decided not to move forward with a candidate because of what they found. What's more surprising is that the monitoring does not stop once the hiring decision is made: some employers continue to monitor employees' social media presence even after they are hired. Almost half of employers indicated that they use social media sites to research current employees, and about 1 in 3 have terminated an employee based on content found online.

Biases from Manual Social Media Screenings
LinkedIn, Facebook, Twitter, and Instagram are the most commonly screened platforms. You may wonder whether it is legal to screen candidates' online presence in the hiring process. The short answer is yes, as long as employers and recruiting agencies comply with laws such as the Fair Credit Reporting Act and the Civil Rights Act throughout recruiting and hiring.

While there are rules in place, complying with them may not be easy. Employers and recruiting agencies are only supposed to flag inappropriate content such as crime, illegal activity, violence, and sexually explicit material, but social media profiles contain far more information than candidates' resumes. Federal laws prohibit discrimination based on protected characteristics such as age, gender, race, religion, sexual orientation, and pregnancy status; however, it is almost impossible to avoid seeing protected information when navigating someone's social media profile. In an ideal world, recruiters would ignore the protected information they have seen and make an unbiased decision based on work-relevant information alone, but is that even possible? New research revealed that seeing such information tends to affect recruiters' evaluations of a candidate's hireability. In the study, recruiters reviewed the Facebook profiles of 140 job seekers. While they clearly looked at work-related criteria such as education, their final assessments were also influenced by prohibited factors such as relationship status and religion: married and engaged candidates received higher ratings, while those who indicated their beliefs received lower ones.

Some people may consider deleting their social media accounts so nothing can be found online, but that may not help you get a better chance with your next career opportunity. According to the 2018 CareerBuilder survey, almost half of employers say that they are less likely to give a candidate a call if they can’t find the candidate online.

Social Intelligence: Mitigate the Biases or Make It Worse?
There has been heated debate over whether using social media in the hiring process is ethical. While it seems to increase screening efficiency and help employers understand candidates' personalities, it is hard to remain unbiased when exposed to such a wide variety of protected information.

Could social intelligence help mitigate the biases? From the description above, it automates the scanning process, which seems to reduce human bias. However, we don't know how the results are reported. Is protected information included? How does the social intelligence model interpret grammatical errors, or a picture of someone drinking? Are the results truly bias-free? Only those with access to those social intelligence reports know the answer.

Surveillance in Bulk

By Chandler Haukap | February 18, 2022

When is the government watching, and do you have a right to know?
In 2014, the ACLU documented 47 civilians injured during no-knock raids. Since then, the deaths of Breonna Taylor and Amir Locke have sparked mass protests against the issuance of no-knock warrants.

While the country debates the right of citizens to security in their own homes, the federal government is also extending its reach into our virtual spaces. In 2019 the ACLU learned that the NSA was reading and storing text messages and phone calls from United States citizens with no legal justification to do so. While the stakes of these virtual no-knock raids are much lower in terms of human life, the surveillance of our online communities could manifest a new form of oppression if left unchecked.

Privacy rights online

Online privacy is governed by the Stored Communications Act, part of the larger Electronic Communications Privacy Act. The policy was signed into law in 1986: three years before the World Wide Web, 19 years before Facebook. It is archaic and does not scale to a world where 3.6 billion people use social media platforms.

The Stored Communications Act distinguishes between data stored for 180 days or more and data stored for less. For data less than 180 days old, the government must obtain a warrant to view it; data stored for more than 180 days can be surveilled with a warrant, subpoena, or court order. There is, however, one loophole: any data held "solely for the purpose of providing storage or computer processing services" falls under the same weaker protections as data stored for more than 180 days. This means that once you've opened an email, it could be considered "in storage" and fair game for the government to read with a mere court order.
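The two-tier rule just described can be sketched as a decision function. This is a deliberate oversimplification for illustration; the actual statute has more conditions and exceptions than the two inputs modeled here.

```python
def legal_process_required(age_days: int, in_remote_storage: bool) -> str:
    """Simplified sketch of the Stored Communications Act's tiers.

    Data under 180 days old requires a warrant. Older data, or data held
    "solely for storage or processing" (e.g., an already-opened email),
    can also be reached with a subpoena or court order.
    """
    if age_days < 180 and not in_remote_storage:
        return "warrant"
    return "warrant, subpoena, or court order"

# A 30-day-old unopened email still enjoys the stronger protection;
# the same email, once opened, may fall through the loophole.
unopened = legal_process_required(30, in_remote_storage=False)
opened = legal_process_required(30, in_remote_storage=True)
```

The asymmetry is the point: the protection level turns on how the provider classifies the data, not on how private the user considers it.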

Furthermore, the court can also issue a gag order to the data provider that prevents them from informing you that the government is watching.

Three mechanisms of the current law raise concern:
1) With a search warrant, the government does not have to inform you that they are spying on you.
2) The government can gag private companies from informing you.
3) The government can request multiple accounts per warrant.

These three mechanisms are a recipe for systematic oppression. The Dakota Access Pipeline protest camp contained a few hundred participants at any given time. The only thing stopping the government from surveilling all of their online interactions is a warrant issued by a court that we cannot observe or contest.

Bulk Requests

How could we ever know if the government is reading protesters’ texts? With private companies gagged and warrants issued in bulk, we cannot know the extent of the surveillance until years later. Luckily, we can see some aggregated statistics: both Facebook and Twitter publish reports on government requests, aggregated every six months. The number of government requests for user information increases every year at both companies, but the truly disturbing statistic is the number of accounts per request.

The government is requesting more accounts per request from Facebook


By dividing the number of accounts requested by the number of requests made, we get the average number of accounts per request. Facebook’s data shows a steady increase in the number of accounts per request which suggests that the government is emboldened to issue more bulk requests.
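The aggregation above is a single division per reporting period. A minimal sketch in Python, where the report figures are illustrative placeholders rather than Facebook’s actual transparency-report numbers:

```python
# Accounts-per-request from semiannual transparency reports.
# The figures below are made up for illustration.
report_periods = {
    "2020 H1": {"requests": 1000, "accounts": 1600},
    "2020 H2": {"requests": 1100, "accounts": 1900},
    "2021 H1": {"requests": 1200, "accounts": 2300},
}

def accounts_per_request(period: dict) -> float:
    """Average number of accounts named in each government request."""
    return period["accounts"] / period["requests"]

for name, period in report_periods.items():
    print(f"{name}: {accounts_per_request(period):.2f} accounts/request")
```

A rising ratio across periods is what suggests the government is bundling more accounts into each warrant.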

The government is requesting more accounts per request from Twitter


Twitter’s data doesn’t trend as strongly upward, but the last half of 2021 is alarming: over nine accounts per request! When will the government simply request an entire protest movement, or has it already?

Every warrant chips away at our Fourth Amendment right to privacy; this new mechanism is an atomic bomb. It is a recipe for guilt by association and for the violation of entire online communities.

Privacy is a human right, and we deserve laws that respect it.

“If you’re doing nothing wrong, you have nothing to hide from the giant surveillance apparatus the government’s been hiding.” – Stephen Colbert

Lived Experts Are Essential Collaborators for Impactful Data Work
By Alissa Stover | February 18, 2022

Imagine you are a researcher working in a state agency that administers assistance programs (like Temporary Assistance for Needy Families, a.k.a. TANF, a.k.a. cash welfare). Let’s assume you are in an agency that values using data to guide program decision-making, so you have the support of leadership and access to the data you need. How might you go about your research?

If we consider a greatly simplified process, you would probably start by talking to agency decision-makers about their highest-priority questions. You’d figure out which ones might benefit from the help of a data-savvy researcher like you. You might then see what data is available to you, query it, and dive into your analyses. You’d discuss your findings with other analysts, agency leadership, and maybe other staff along the way. At the end, you might summarize your findings in a report, or just dive into another project. Throughout, you want to do good; you want to help the people the agency serves. And although this sketch captures little of the complexity of the real, iterative steps researchers go through, it is accurate in one respect: at no point did the researcher engage actual program participants in identifying, prioritizing, or solving problems. Although some researchers incorporate the perspectives of program participants by collecting additional qualitative data from them, the vast majority do not work with program participants as collaborators in research.

A growing number of researchers recognize the value of participatory research, or directly engaging with those most affected by the issue at hand (and who thus have lived expertise) [11]. Rather than only including the end-user as a research “subject”, lived experts are treated as co-owners of the research with real decision-making power. In our example, the researcher might work with program participants and front-line staff to figure out what problem to solve. What better way to do “good” by someone than to ask what they actually need, and then do it? How easily can we do harm if we ignore those most affected by a problem when trying to solve it? [2]

Doing research using data does not make you an unbiased actor; every researcher brings their own perspective into the work and, without including other points of view, will blindly replicate structural injustices [6]. Working with people who have direct experience with the issue at hand brings a contextual understanding that can improve the quality of research and is essential for understanding potential privacy harms [7]. For example, actually living the experience of being excluded can help uncover where a dataset might be biased by under- or over-representation [1]. Lived experts bring a lens that can improve practices around data collection and processing and ultimately result in higher quality information [8].

But does consultation amount to collaboration? Not really. What makes participatory research an act of co-creation is power sharing. As Sascha Costanza-Chock puts it, “Don’t start by building a new table; start by coming to the table.” Rather than going to the community to ask them to participate in pre-determined research activities, our researcher could go to the community and ask what they should focus on and how to go about doing the work [3].

Image from: Chicago Beyond, “Why Am I Always Being Researched?”

Doing community-led research is hard. It takes resources and a lot of self-reflection and inner work on the part of the researcher. Many individuals who might be potential change-makers in this research face barriers to engagement that stem from traumatic experiences with researchers in the past and feelings of vulnerability [5]. Revealingly, many researchers who publish about their participatory research findings don’t even credit the contributions of nonacademic collaborators [9].

Despite the challenges, community-led research could be a pathway to true change in social justice efforts. In our TANF agency example, the status quo is a system that serves a diminishing proportion of families in poverty and does not provide enough assistance to truly help families escape poverty, despite decades of research asking “What works?” for program participants [10]. Many efforts to improve such programs with data have upheld the status quo or even made the situation worse for families [4].

Image from: Center on Budget and Policy Priorities, “TANF Cash Assistance Helps Families, But Program Is Not the Success Some Claim”

Participatory research is not a panacea. Deep changes in our social fabric require cultural and policy change on a large scale. However, a commitment to holding oneself accountable to the end-user of a system, and choosing to co-create knowledge with them, could be a small way individual researchers create change in their own immediate context.


[1] Buolamwini, J. and Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research, 81:1-15. Accessed at:

[2] Chicago Beyond. (2019). Why Am I Always Being Researched?: A guidebook for community organizations, researchers, and funders to help us get from insufficient understanding to more authentic truth. Chicago Beyond. Accessed at:

[3] Costanza-Chock, S. (2020). Design Justice: Community-Led Practices to Build the Worlds We Need. MIT Press. Accessed at:

[4] Eubanks, V. (2018). Automating inequality: how high-tech tools profile, police, and punish the poor. First edition. New York, NY: St. Martin’s Press.

[5] Grayson, J., Doerr, M. and Yu, J. (2020). Developing pathways for community-led research with big data: a content analysis of stakeholder interviews. Health Research Policy and Systems, 18(76).

[6] Jurgenson, N. (2014). View From Nowhere. The New Inquiry. Accessed at:

[7] Nissenbaum, H.F. (2011). A Contextual Approach to Privacy Online. Daedalus 140:4 (Fall 2011), 32-48. Accessed at:

[8] Ruberg, B. and Ruelos, S. (2020). Data for Queer Lives: How LGBTQ Gender and Sexuality Identities Challenge Norms of Demographics. Big Data & Society.

[9] Sarna-Wojcicki, D., Perret, M., Eitzel, M.V., and Fortmann, L. (2017). Where Are the Missing Coauthors? Authorship Practices in Participatory Research. Rural Sociology, 82(4):713-746.

[10] Pavetti, L. and Safawi, A. (2021). TANF Cash Assistance Helps Families, But Program Is Not the Success Some Claim. Center on Budget and Policy Priorities. Accessed at:

[11] Vaughn, L. M., and Jacquez, F. (2020). Participatory Research Methods – Choice Points in the Research Process. Journal of Participatory Research Methods, 1(1).


Embedding Ethics in the Code we Write
By Allison Fox | February 18, 2022

In the last few years, several researchers and activists have pulled back the curtain on algorithmic bias, sharing glaring examples of how artificial intelligence (AI) models have the potential to discriminate based on age, sex, race, and other identities. A 2013 study by Latanya Sweeney revealed that if you have a name more often given to black babies than white babies, you are 80% more likely to have an ad suggestive of an arrest record displayed when your name is Googled (Sweeney 2013).

Joy Buolamwini, Coded Bias

Similar discrimination was presented in the Netflix documentary Coded Bias – MIT researcher Joy Buolamwini discovered that facial recognition technologies do not accurately classify women or detect darker-skinned faces (Coded Bias 2020). Another case of algorithmic bias surfaced recently when news articles revealed that an algorithmic tool used by the Justice Department to assess the risk of prisoners returning to crime generated inconsistent results based on race (Johnson 2022). As the use of AI decision-making continues to increase, and more decisions are made by algorithms instead of humans, these algorithmic biases are only going to be amplified. Data science practitioners can take steps to mitigate these biases and their impacts by embedding ethics in the code they write – both figuratively and literally.

To better integrate conversations about ethics into the actual process of doing data science, the company DrivenData developed Deon, a command line tool that provides developers with reminders about ethics throughout the entire lifecycle of their project (DrivenData 2020). 

Deon Checklist: Command Line Tool

The checklist is organized into five sections designed to mirror the stages of a data science project: data collection, data storage, analysis, modeling, and deployment. Each section includes several questions that aim to provoke discussion and ensure that important steps are not overlooked. DrivenData also put together a table of real-world ethical issues with AI that might have been avoided had the corresponding checklist questions been discussed during the project.

For example, during analysis it is important to examine the dataset for possible sources of bias, and then take steps to address them. If this step is skipped, unintended consequences can ensue: garbage in results in garbage out, meaning that a model trained on biased data is likely to produce outputs that reflect that bias. Female jobseekers, for instance, are more likely than male jobseekers to be shown Google ads for lower-paying jobs (Gibbs 2015). This discriminatory behavior could be the result of biased training data, and might have been avoided had that data been examined and corrected. By using Deon to embed ethics in the code we write, data scientists are reminded of these ethical risks while coding and can address biased data before a model is released into the wild.

Ethics are also relevant during the modeling stage of a data science project, where it is important to test model results for fairness across groups. The Deon checklist includes an item on this step, and several open-source, code-based toolkits like AI Fairness 360 and Fairlearn have been developed recently to help data scientists assess and improve fairness in AI models. If this step is ignored, models may treat people differently based on certain identities, as when Apple’s credit card first launched and offered women smaller lines of credit than men (Knight 2019).
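To give a sense of what these fairness checks compute, here is a plain-Python sketch of one common group metric, the demographic parity difference. This is a simplified illustration rather than the actual Fairlearn or AI Fairness 360 API, and the decision data is fabricated:

```python
# Demographic parity difference: the gap in positive-outcome rates (e.g.
# credit approvals) between demographic groups. A value near 0 suggests
# similar treatment on this one metric; it is not a full fairness audit.
def selection_rate(decisions: list) -> float:
    """Fraction of a group that received the positive outcome."""
    return sum(decisions) / len(decisions)

def demographic_parity_difference(decisions_by_group: dict) -> float:
    """Largest gap in selection rate between any two groups."""
    rates = [selection_rate(d) for d in decisions_by_group.values()]
    return max(rates) - min(rates)

# Fabricated credit decisions: 1 = approved, 0 = denied.
decisions = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1],  # 75% approved
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],  # 37.5% approved
}
gap = demographic_parity_difference(decisions)
print(f"demographic parity difference: {gap:.3f}")
```

Running a check like this across sensitive attributes before deployment is exactly the step the checklist item is prompting for; a large gap is a signal to investigate the data and model, not an automatic verdict.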

As the use of AI to make decisions that were previously made by humans becomes even more widespread, classification decisions will be made faster and at a larger scale, reaching more people than ever before. While this will have its benefits, in that new technologies such as the ones discussed in this blog can improve quality of life and access to opportunity, it will also have its consequences. Minority populations who already face discrimination have been shown to be the most susceptible to these consequences. Open-source tools that embed ethical considerations in the data science process, like Deon, AI Fairness 360, and Fairlearn, can all help to combat these consequences by encouraging data scientists to place ethics at the forefront during each stage of a data science project.


1. Coded Bias. (2020). About the Film. Coded Bias.

2. DrivenData. (2020). About – Deon. Deon. 

3. Gibbs, S. (2015, July 8). Women less likely to be shown ads for high-paid jobs on Google, study shows. The Guardian. 

4. Johnson, C. (2022, January 6). Flaws plague a tool meant to help low-risk federal prisoners win early release. NPR.

5. Knight, W. (2019, November 19). The Apple Card Didn’t “See” Gender—and That’s the Problem. Wired. 

6. Sweeney, L. (2013, January 28). Discrimination in Online Ad Delivery. Available at SSRN: