Archive for the ‘Uncategorized’ Category

Hello Berkeley!

December 4th, 2018

My name is Taylor and I’m an incoming student of the Master of Information and Data Science program. My current research interests are natural language understanding and generation. Reach me at ude.yelekreb.loohcsinull@rolyat!

This is a project update from a CTSP project from 2017: Assessing Race and Income Disparities in Crowdsourced Safety Data Collection (with Kate Beck, Aditya Medury, and Jesus M. Barajas)

Project Update

This work has led to the development of Street Story, a community engagement tool that collects street safety information from the public, through UC Berkeley SafeTREC.

The tool collects qualitative and quantitative information, and then creates maps and tables that can be publicly viewed and downloaded. The Street Story program aims to collect information that can create a fuller picture of transportation safety issues, and make community-provided information publicly accessible.


The Problem

Low-income groups, people with disabilities, seniors and racial minorities are at higher risk of being injured while walking and biking, but experts have limited information on what these groups need to reduce these disparities. Transportation agencies typically rely on statistics about transportation crashes aggregated from police reports to decide where to make safety improvements. However, police-reported data is limited in a number of ways. First, crashes involving pedestrians or cyclists are significantly under-reported to police, with reports finding that up to 60% of pedestrian and bicycle crashes go unreported. Second, some demographic groups, including low-income groups, people of color and undocumented immigrants, have histories of contentious relationships with police. Therefore, they may be less likely to report crashes to the police when they do occur. Third, crash data doesn’t include locations where near-misses have happened, or locations where individuals feel unsafe but no incident has yet occurred. In other words, the data allow professionals to react to safety issues, but don’t necessarily allow them to be proactive about them.

One solution to improve and augment the data agencies use to make decisions and allocate resources is to provide a way for people to report transportation safety issues themselves. Some public agencies and private firms are developing apps and websites where people can report issues for this purpose. But one concern is that the people who are likely to use these crowdsourcing platforms are those who have access to smartphones or the internet and who trust that government agencies will use the data to make changes, biasing the data toward the needs of these privileged groups.

Our Initial Research Plan

We chose to examine whether crowdsourced traffic safety data reflected similar patterns of underreporting and potential bias as police-reported safety data. To do this, we created an online mapping tool that people could use to report traffic crashes, near-misses and general safety issues. We planned to work with a city to release this tool to, and collect data from, the general public, then work directly with a historically marginalized community, under-represented in police-reported data, to target data collection in a high-need neighborhood. We planned to reduce barriers to entry for this community, including meeting the participants in person to explain the tool, providing them with in-person and online training, providing participants with cell phones, and covering the cost of their data plans for the month. By crowdsourcing data from the general public and from this specific community, we planned to analyze whether there were any differences in the types of information reported by different demographics.

This plan seemed to work well with the research question and with community engagement best practices. However, we came up against a number of challenges with our research plan. Although many municipal agencies and community organizations found the work we were doing interesting and were working to address transportation safety issues similar to those we were focusing on, many organizations and agencies seemed daunted by the prospect of using technology to address underlying issues of under-reporting. In addition, we found that a year was not enough time to build trusting relationships with the organizations and agencies we had hoped to work with. Nevertheless, we were able to release a web-based mapping tool to collect some crowdsourced safety data from the public.

Changing our Research Plan

To better understand how more well-integrated digital crowdsourcing platforms perform, we pivoted our research project to explore how different neighborhoods engage with government platforms to report non-emergency service needs. We assumed some of these non-emergency services would mirror the negative perceptions of bicycle and pedestrian safety we were interested in collecting via our crowdsourcing safety platform. The City of Oakland relies on SeeClickFix, a smartphone app, to allow residents to request service for several types of issues: infrastructure issues, such as potholes, damaged sidewalks, or malfunctioning traffic signals; and non-infrastructure issues such as illegal dumping or graffiti. The city also provides phone, web, and email-based platforms for reporting the same types of service requests. These alternative platforms are collectively known as 311 services. We looked at 45,744 SeeClickFix reports and 35,271 311 reports made between January 2013 and May 2016. We classified Oakland neighborhoods by their status as communities of concern. In the city of Oakland, 69 neighborhoods meet the definition for communities of concern, while 43 do not. Because we did not have data on the characteristics of each person reporting a service request, we assumed that people reporting requests also lived in the neighborhood where the service was needed.

How did communities of concern interact with the SeeClickFix and 311 platforms to report service needs? Our analysis highlighted two main takeaways. First, we found that communities of concern were more engaged in reporting than other communities, but had different reporting dynamics based on the type of issue they were reporting. About 70 percent of service issues came from communities of concern, even though they represent only about 60 percent of the communities in Oakland. They were nearly twice as likely to use SeeClickFix as to report via the 311 platforms overall, but only for non-infrastructure issues. Second, we found that even though communities of concern were more engaged, the level of engagement was not equal for everyone in those communities. For example, neighborhoods with higher proportions of limited-English proficient households were less likely to report any type of incident by 311 or SeeClickFix.
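
As a rough check on the figures above, the short Python sketch below recomputes the share of Oakland neighborhoods classified as communities of concern and compares it with the approximate share of reports. The counts come from the text; the 0.70 report share is the rounded "about 70 percent" figure, so the resulting ratio is only approximate, and the variable names are illustrative.

```python
# Counts taken from the text above.
COC_NEIGHBORHOODS = 69
NON_COC_NEIGHBORHOODS = 43
SEECLICKFIX_REPORTS = 45_744
REPORTS_311 = 35_271

# Rounded figure from the text ("about 70 percent"), so treat it as approximate.
APPROX_COC_REPORT_SHARE = 0.70

total_neighborhoods = COC_NEIGHBORHOODS + NON_COC_NEIGHBORHOODS
total_reports = SEECLICKFIX_REPORTS + REPORTS_311

# Share of neighborhoods that are communities of concern.
coc_neighborhood_share = COC_NEIGHBORHOODS / total_neighborhoods

# A ratio above 1 means communities of concern file more than their
# proportional share of reports.
representation_ratio = APPROX_COC_REPORT_SHARE / coc_neighborhood_share

print(f"Total reports analyzed: {total_reports:,}")
print(f"COC share of neighborhoods: {coc_neighborhood_share:.1%}")
print(f"Approximate representation ratio: {representation_ratio:.2f}")
```

The ratio is a per-neighborhood comparison only; it does not account for differences in population between neighborhoods.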

Preliminary Findings from Crowdsourcing Transportation Safety Data

We deployed the online tool in August 2017. The crowdsourcing platform was aimed at collecting transportation safety-related concerns pertaining to pedestrian and bicycle crashes, near misses, perceptions of safety, and incidents of crime while walking and bicycling in the Bay Area. We disseminated the link to the crowdsourcing platform primarily through Twitter and some email lists. Examples of organizations that were contacted through Twitter-based outreach and also subsequently interacted with the tweet (through likes and retweets) include Transform Oakland, Silicon Valley Bike Coalition, Walk Bike Livermore, California Walks, Streetsblog CA, and Oakland Built. By December 2017, we had received 290 responses from 105 respondents. Half of the responses corresponded to perceptions of traffic safety concerns (“I feel unsafe walking/cycling here”), while 34% corresponded to near misses (“I almost got into a crash but avoided it”). In comparison, 12% of responses reported an actual pedestrian or bicycle crash, and 4% of responses reported a crime while walking or bicycling. The sample size of the responses is too small to report any statistical differences.

Figure 1 shows the spatial patterns of the responses in the Bay Area aggregated to census tracts. Most of the responses were concentrated in Oakland and Berkeley. Oakland was specifically targeted as part of the outreach efforts since it has significant income and racial/ethnic diversity.

Figure 1 Spatial Distribution of the Crowdsourcing Survey Responses



In order to assess the disparities in the crowdsourced data collection, we compared responses between census tracts that are classified as communities of concern or not. A community of concern (COC), as defined by the Metropolitan Transportation Commission, a regional planning agency, is a census tract that ranks highly on several markers of marginalization, including proportion of racial minorities, low-income households, limited-English speakers, and households without vehicles, among others.

Table 1 shows the comparison between the census tracts that received at least one crowdsourcing survey response. The average number of responses received in COCs versus non-COCs across the entire Bay Area was similar and statistically indistinguishable. However, when focusing on Oakland-based tracts, the results reveal that the average number of crowdsourced responses in non-COCs was significantly higher. To see how the trends in self-reported pedestrian/cyclist concerns compare with police-reported crashes, we also assessed pedestrian and bicycle-related police-reported crashes from 2013 to 2016. On average, more police-reported pedestrian/bicycle crashes were observed in COCs, both across the Bay Area and in Oakland. The difference in trends observed in the crowdsourced concerns and police-reported crashes suggests that either walking/cycling concerns are greater in non-COCs (thus underrepresented in police crashes), or that participation from among COCs is relatively underrepresented.
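
The post does not say which statistical test was used for these comparisons. As one possible approach, the Python sketch below computes Welch's t statistic for the difference in mean responses per tract between two groups. The counts are made-up placeholders, not the study's data, and the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and approximate degrees of freedom.

    Welch's test compares two means without assuming equal variances.
    """
    # Squared standard error of each group's mean.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)

    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Placeholder responses-per-tract counts (NOT the study's data).
coc_tract_counts = [1, 2, 3, 4, 5]
non_coc_tract_counts = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(coc_tract_counts, non_coc_tract_counts)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

A negative t value here means the first group has the lower mean; turning it into a p-value requires a t distribution, which is left out to keep the sketch dependency-free.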

Table 1 Comparison of crowdsourced concerns and police-reported pedestrian/bicycle crashes in census tracts that received at least 1 response


Table 2 compares the self-reported income and race/ethnicity characteristics of the respondents with the locations where the responses were reported. For reference purposes, the Bay Area’s median household income in 2015 was estimated to be $85,000, and the Bay Area’s population was estimated to be 58% White, per the 2010 Census.

Table 2 Distribution of all Bay Area responses based on the location of response and the self-reported income and race/ethnicity of respondents

The results reveal that White, medium-to-high income respondents reported more walking/cycling-related safety issues in our survey, and more so in non-COCs. This trend is also consistent with the definition of COCs, which tend to have a higher representation of low-income people and people of color. However, if digital crowdsourcing without widespread community outreach is more likely to attract responses from medium-to-high income groups, and more importantly, if they only live, work, or play in a small portion of the region being investigated, the aggregated results will reflect a biased picture of a region’s transportation safety concerns. Thus, while the scalability of digital crowdsourcing provides an opportunity for capturing underrepresented transportation concerns, it may require greater collaboration with low-income, diverse neighborhoods to ensure uniform adoption of the platform.

Lessons Learned

From our attempts to work directly with community groups and agencies and our subsequent decision to change our research focus, we learned a number of lessons:

  1. Develop a research plan in partnership with communities and agencies. This would have allowed us to begin with a research plan that community groups and agencies were better able to partner with us on, and it would have ensured that the partners were on board with the topic of interest and the methods we hoped to use.
  2. Recognize the time it takes to build relationships. We found that building relationships with agencies and communities was more time intensive and took longer than we had hoped. These groups often have limitations on the time they can dedicate to unfunded projects. Next time, we should plan for this in our initial research plan.
  3. Use existing data sources to supplement research. We found that using SeeClickFix and 311 data was a way to collect and analyze information to add context to our research question. Although the data did not have all the demographic information we had hoped to analyze, this data source added context to the data we collected.
  4. Speak in a language that the general public understands. We found that when we used the term self-reporting, rather than crowdsourcing, when talking to potential partners and to members of the public, these individuals were more willing to consider the use of technology to collect information on safety issues from the public as legitimate. Using vocabulary and phrasing that people are familiar with is crucial when attempting to use technology to benefit the social good.

Halloween Cuckoo Clock

October 30th, 2018

Team: Azin Mirzaagha, Patrick Barin, Yunjie Yao



Create a “Cuckoo Clock” mechanism. A cuckoo clock typically has an automaton of a bird that appears through a small trap door while the clock is striking.


Components Used

  • Plywood
  • wood beams
  • wood pieces
  • rubber bands
  • Small hinge
  • wood squares
  • tape
  • Glue
  • Scissors




We created a Halloween-themed cuckoo clock because this Wednesday is Halloween!!

We made a house, and when the door of the house is opened, a pumpkin pops out with the message “trick or treat”!


CTSP Alumni Updates

September 27th, 2018

We’re thrilled to highlight some recent updates from our fellows:

Gracen Brilmyer, now a PhD student at UCLA, has published a single-authored work in one of the leading journals in archival studies, Archival Science: “Archival Assemblages: Applying Disability Studies’ Political/Relational Model to Archival Description.” They have presented their work on archives, disability, and justice at a number of events over the past two years, including The Archival Education and Research Initiative (AERI), the Allied Media Conference, the International Communications Association (ICA) Preconference, Disability as Spectacle. Their research will also be presented at the upcoming Community Informatics Research Network (CIRN).

CTSP Funded Project 2016: Vision Archive

Originating in the 2017 project “Assessing Race and Income Disparities in Crowdsourced Safety Data Collection” done by Fellows Kate Beck, Aditya Medury, and Jesus Barajas, the Safe Transportation Research and Education Center (SafeTREC) will launch a new project, Street Story, in October 2018. Street Story is an online platform that allows community groups and agencies to collect community input about transportation collisions, near-misses, general hazards and safe locations to travel. The platform will be available throughout California and is funded through the California Office of Traffic Safety.

CTSP Funded Project 2017: Assessing Race and Income Disparities in Crowdsourced Safety Data Collection

Fellow Roel Dobbe has begun a postdoctoral scholar position at the new AI Now Institute. Inspired by his 2018 CTSP project, he has co-authored a position paper with Sarah Dean, Tom Gilbert and Nitin Kohli titled A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics.

CTSP Funded Project 2018: Unpacking the Black Box of Machine Learning Processes

We are also looking forward to a CTSP Fellow-filled Computer Supported Cooperative Work conference in November this year! CTSP-affiliated papers include:

We also look forward to seeing CTSP affiliates presenting other work, including 2018 Fellows Richmond Wong, Noura Howell, Sarah Fox, and more!


Article published in Nature

September 22nd, 2018

Dr. Blumenstock’s article, “Don’t forget people in the use of big data for development,” was published in the journal Nature.

Blumenstock receives Hellman Award

September 20th, 2018

Prof. Blumenstock was named as a 2018 Hellman Fellow for his project, “Evaluating Community Cellular Networks: How Does Mobile Connectivity Affect Isolated Communities?”

Professor Deirdre K. Mulligan and PhD student (and CTSP Co-Director) Daniel Griffin have an op-ed in The Guardian considering how Google might consider its human rights obligations in the face of state censorship demands: If Google goes to China, will it tell the truth about Tiananmen Square?

The op-ed advances a line of argument developed in a recent article of theirs in the Georgetown Law Technology Review: “Rescripting Search to Respect the Right to Truth”

To view the full article I wrote about the Impact of Oracle v. Rimini on Data Professionals and the Public, please visit:

Of particular interest to Data Scientists was the question of whether using “bots and scrapers” for automated collection of data was deemed a violation of the law if it violated a Terms of Service.  An important tool in the Data Scientists’ and Data Engineers’ toolbox, automated scraping scripts provide for efficient accumulation of data.  Further, many individuals cite instances of Terms of Service being too broad or vague for interpretation.

Scraped data can subsequently be used for academic research or to develop novel products and services that connect disparate sets of information and reduce information asymmetries across consumer populations (for example, search engines or price tracking).  On the other hand, malicious bots can sometimes become burdensome to a company’s website and impact or impede its operations.
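
One common courtesy for automated collection is to honor a site's robots.txt file and any crawl delay it requests. The Python sketch below uses the standard library's `urllib.robotparser` to run that check offline against a sample file. The robots.txt contents, the bot name `research-bot`, and the example.com URLs are illustrative assumptions, and passing a robots.txt check is not the same as complying with a site's Terms of Service.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt contents (an assumption, not a real site's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical name our scraper would announce itself with.
USER_AGENT = "research-bot"


def build_parser(robots_txt):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether our bot may fetch each URL under the sample rules.
public_ok = parser.can_fetch(USER_AGENT, "https://example.com/public/page")
private_ok = parser.can_fetch(USER_AGENT, "https://example.com/private/data.csv")

# Seconds the site asks bots to wait between requests (None if unspecified).
delay = parser.crawl_delay(USER_AGENT)

print(f"public page allowed: {public_ok}")
print(f"private file allowed: {private_ok}")
print(f"requested crawl delay: {delay}")
```

A real scraper would sleep for the requested delay between requests, which keeps its load on the site's servers low.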

In June of 2018, the Algorithmic Fairness and Opacity Working Group (AFOG) held a summer workshop with the theme “Algorithms are Opaque and Unfair: Now What?” The event was organized by Berkeley I School Professors (and AFOG co-directors) Jenna Burrell and Deirdre Mulligan, postdoc Daniel Kluttz, and Allison Woodruff and Jen Gennai from Google. Our working group is generously sponsored by Google Trust and Safety and hosted at the UC Berkeley School of Information.

Inspired by questions that came up at our biweekly working group meetings during the 2017-2018 academic year, we organized four panels for the workshop. The panel topics raised issues that we felt required deeper consideration and debate. To make progress we brought together a diverse, interdisciplinary group of experts from academia, industry, and civil society in a workshop-style environment. In panel discussions, we considered potential ways of acting on algorithmic (un)fairness and opacity. We sought to consider the fullest possible range of ‘solutions,’ including technical implementations (algorithms, user-interface designs), law and policy, standard-setting, incentive programs, new organizational processes, labor organizing, and direct action.


Researchers (e.g., Barocas and Selbst 2016; Kleinberg et al. 2017), journalists (e.g., Miller 2015), and even the federal government (e.g., Executive Office of the President 2016) have become increasingly attuned to issues of algorithmic opacity, bias, and fairness, debating them across a range of applications, including criminal justice (Angwin et al. 2016, Chouldechova 2017, Berk et al. 2017), online advertising (Datta et al. 2018), natural language processing (Bolukbasi et al. 2016), consumer credit (Waddell 2016), and image recognition (Simonite 2017; Buolamwini and Gebru 2018).

There has been recent progress especially in understanding algorithmic fairness as a technical problem. Drawing from various formal definitions of fairness (see Narayanan 2018; Corbett-Davies and Goel 2018; Kleinberg et al. 2017), researchers have identified a range of techniques for addressing fairness in algorithm-driven classification and prediction. Some approaches focus on addressing allocative harms by fairly allocating opportunities or resources. These include fairness through awareness (Dwork et al. 2012), accuracy equity (Angwin et al. 2016; Dieterich et al. 2016), equality of opportunity (Hardt et al. 2016), and fairness constraints (Zafar et al. 2017). Other approaches tackle issues of representational harms which occur when a system diminishes specific groups or reinforces stereotypes based on identity (see Crawford 2017). Proposed solutions include corpus-level constraints to prevent the amplification of gender stereotypes in language corpora (Zhao et al. 2017), diversity algorithms (Drosou et al. 2017), causal reasoning to assess whether a protected attribute has an effect on a predictor (Kilbertus et al. 2017, Kusner et al. 2017), and inclusive benchmark datasets to address intersectional accuracy disparities (Buolamwini and Gebru 2018).

These new approaches are invaluable in motivating technical communities to think about the issues and make progress on addressing them. But the conversation neither starts nor ends there. Our interdisciplinary group sought to complement and challenge the technical framing of fairness and opacity issues. In our workshop, we considered the strengths and limitations of a technical approach and discussed where and when hand-offs, human augmentation, and oversight are valuable and necessary. We considered ways of engaging a wide-ranging set of perspectives and roles, including professionals with deep domain expertise, activists involved in reform efforts, financial auditors, scholars, as well as diverse system users and their allies. In doing so, we considered models that might be transferable, looking to various fields including network security, financial auditing, safety-critical systems, and civil rights campaigns.

The Panels

Below is a brief summary of the panel topics and general themes of the discussion. Full write-ups for each panel are linked. Our aim in these write-ups is not to simply report a chronological account of the panel, but to synthesize and extend the panel discussions. These panel reports take a position on the topic and offer a set of concrete proposals. We also seek to identify areas of limited knowledge, open questions, and research opportunities. We intend for these documents to inform an audience of researchers, implementers, practitioners, and policy-makers.

Panel 1 was entitled “What a technical ‘fix’ for fairness can and can’t accomplish.” Panelists and audience members discussed specific examples of problems of fairness (and justice), including cash bail in the criminal justice system, “bad faith” search phrases (e.g., the question, “Did the Holocaust happen?”), and representational harm in image-labeling. Panelists noted a key challenge that technology, on its own, is not good at explaining when it should not be used or when it has reached its limits. Panelists pointed out that understanding broader historical and sociological debates in the domain of application and investigating contemporary reform efforts, for example in criminal justice, can help to clarify the place of algorithmic prediction and classification tools in a given domain. Partnering with civil-society groups can ensure a sound basis for making tough decisions about when and how to intervene when a platform or software is found to be amplifying societal biases, is being gamed by “bad” actors, or otherwise facilitates harm to users. [READ REPORT]

Panelists for Panel 1: Lena Z. Gunn (Electronic Frontier Foundation), Moritz Hardt (UC Berkeley Department of Electrical Engineering and Computer Sciences), Abigail Jacobs (UC Berkeley Haas School of Business), Andy Schou (Google). Moderator: Sarah M. Brown (Brown University Division of Applied Mathematics).

Panel 2, entitled “Automated decision-making is imperfect, but it’s arguably an improvement over biased human decision-making,” describes a common rejoinder to criticism of automated decision-making. This panel sought to consider the assumptions of this comparison between humans and machine automation. There is a need to account for differences in the kinds of biases associated with human decision-making (including cognitive biases of all sorts) and those uniquely generated by machine reasoning. The panel discussed the ways that humans rely on or reject decision-support software. For example, work by one of the panelists, Professor Angèle Christin, shows how algorithmic tools deployed in professional environments may be contested or ignored. Guidelines directed at humans about how to use particular systems of algorithmic classification in low- as opposed to high-stakes domains can go unheeded. This seemed to be the case in at least one example of how Amazon’s facial recognition system has been applied in a law-enforcement context. Such cases underscore the point that humans aren’t generally eliminated when automated-decision systems are deployed; they still decide how they are to be configured and implemented, which may disrupt whatever gains in “fairness” might otherwise be realized. Rather than working to establish which is better–human or machine decision-making–we suggest developing research on the most effective ways to bring automated tools and humans together to form hybrid decision-making systems. [READ REPORT]

Panelists for Panel 2: Angèle Christin (Stanford University Department of Communication), Marion Fourçade (UC Berkeley Department of Sociology), M. Mitchell (Google), Josh Kroll (UC Berkeley School of Information). Moderator: Deirdre Mulligan (UC Berkeley School of Information).

Panel 3 on “Human Autonomy and Empowerment” examined how we can enhance the autonomy of humans who are subject to automated decision-making tools. Focusing on “fairness” as a resource allocation or algorithmic problem tends to assume it is something to be worked out by experts. Taking an alternative approach, we discussed how users and other ‘stakeholders’ can identify errors and unfairness and make other kinds of requests to influence and improve the platform or system in question. What is the best way to structure points of user feedback? Panelists pointed out that design possibilities range from lightweight feedback mechanisms to support for richer, agonistic debate. Not-for-profit models, such as Wikipedia, demonstrate the feasibility of high transparency and open debate about platform design. Yet participation on Wikipedia, while technically open to anyone, requires a high investment of time and energy to develop mastery of the platform and the norms of participation. “Flagging” functions, on the other hand, are pervasive, lightweight tools found on most mainstream platforms. However, they often serve primarily to shift governance work onto users without the potential to fundamentally influence platform policies or practices. Furthermore, limiting consideration to the autonomy of platform users misses the crucial fact that many automated decisions are imposed on people who never use the system directly. [READ REPORT]

Panelists for Panel 3: Stuart Geiger (UC Berkeley Institute for Data Science), Jen Gennai (Google), and Niloufar Salehi (Stanford University Department of Computer Science). Moderator: Jenna Burrell (UC Berkeley School of Information).

Panel 4 was entitled “Auditing Algorithms (from Within and from Without).” Probing issues of algorithmic accountability and oversight, panelists recognized that auditing (whether in finance or safety-critical industries) promotes a culture of “slow down and do a good job,” which runs counter to the “move fast and break things” mindset that has long defined the tech industry. Yet corporations, including those in the tech sector, do have in-house auditing teams (in particular, for financial auditing) whose expertise and practices could serve as models. Generally, internal audits concern the quality of a process rather than the validity of the “outputs.” Panelists pointed out that certain processes developed for traditional auditing might work for auditing “fairness,” as well. A “design history file,” for example, is required in the development of medical devices to provide transparency that facilitates FDA review. In the safety-critical arena, there are numerous techniques and approaches, including structured safety cases, hazard analysis, instrumentation and monitoring, and processes for accident investigation. But there are also particular challenges “fairness” presents to attempts to develop an audit process for algorithms and algorithmic systems. For one, and recalling Panel 1’s discussion, there are numerous valid definitions of fairness. In addition, problems of “fairness” are often not self-evident or exposed through discrete incidents (as accidents are in safety-critical industries). These observations suggest a need to innovate auditing procedures if they are to be applied to the specific challenges of algorithmic fairness. [READ REPORT]

Panelists for Panel 4: Chuck Howell (MITRE), Danie Theron (Google), Michael Tschantz (International Computer Science Institute). Moderator: Allison Woodruff (Google).

NOTE: This post reports on an AFOG-relevant I School final project for the Master of Information Management and Systems (MIMS) program. The project was developed by a student team composed of Samuel Meyer, Shrestha Mohanty, Sung Joo Son, and Monicah Wambugu. Students from the team participated in the AFOG lunch working group meetings.

by Samuel Meyer

Opaque: hard to understand; not clear or lucid; obscure. All algorithms are opaque to those who do not understand how they work, but machine learning can be opaque even to those who built it. As a result, much academic work on explainable machine learning focuses just on explaining how a model works to those who built it, rather than on opening it up to the general public. Unfortunately, many new machine-learning products sold to governments and large companies are only understandable by their developers. If only the developers understand the systems, how can the public make sure that government systems make fair decisions?

Not all is lost, however. Contrary to the narrative that all machine learning is completely a black box, talking to machine learning practitioners will reveal that some machine learning algorithms are generally considered to be interpretable (such as logistic regression and decision trees). They may not always be as accurate as neural nets or other advanced machine learning, but these algorithms are often used in real-world applications because they are easy to implement and easier for experts to interpret.

As a first step toward making general systems that will let the public look at machine learning, we designed and built a web application that allows arbitrary CSV-based datasets to be fit with logistic regression and viewed by non-experts. It allows machine-learning developers to share what a model is doing with outsiders. If the non-experts want to add comments about factors or weights in the model, they can.
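
To illustrate why logistic regression is considered interpretable, here is a minimal Python sketch that fits one with plain gradient descent and reports the learned weight as an odds ratio. It is not the project's code: the toy dataset, the learning rate, the epoch count, and the function names are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Predicted probability of the positive class for one row."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            error = predict(weights, bias, row) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy dataset (illustrative only): one feature, label flips between 2 and 3.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(rows, labels)

# exp(weight) is the multiplicative change in the odds of a positive
# outcome for a one-unit increase in the feature.
odds_ratio = math.exp(weights[0])
print(f"weight = {weights[0]:.3f}, bias = {bias:.3f}")
print(f"odds ratio per unit increase = {odds_ratio:.3f}")
```

Each weight has a direct reading as an odds ratio, which is the kind of quantity a non-expert can inspect and comment on.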

Also, the web app includes an implementation of equal opportunity, a mathematical definition of fairness created by Moritz Hardt, a member of AFOG. This allows users to see what effect the fairness requirement would have on their dataset.
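
Equal opportunity, as defined by Hardt et al. (2016), asks that people who truly belong to the positive class have the same chance of being predicted positive regardless of group, which means equal true positive rates across groups. The Python sketch below measures how far a set of predictions is from that condition. It is not the web app's implementation; the labels, predictions, group names, and function names are made up for illustration.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Return each group's true positive rate (recall on actual positives)."""
    positives = defaultdict(int)
    hits = defaultdict(int)
    for actual, predicted, group in zip(y_true, y_pred, groups):
        if actual == 1:
            positives[group] += 1
            if predicted == 1:
                hits[group] += 1
    # Groups with no actual positives are omitted to avoid dividing by zero.
    return {group: hits[group] / positives[group] for group in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Made-up example data: 1 = positive outcome, groups "a" and "b".
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]

rates = true_positive_rates(y_true, y_pred, groups)
gap = equal_opportunity_gap(y_true, y_pred, groups)
print(rates)
print(f"equal opportunity gap = {gap}")
```

A gap of zero satisfies equal opportunity exactly; in this toy data group "a" has a true positive rate of 0.5 and group "b" has 1.0, so the gap is 0.5.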

You can read a full report at our project page or download the code and run it yourself.

Definition of opaque from