Archive for September, 2017

Are you a local nonprofit or community organization that has a pressing challenge that you think technology might be able to address, but you don’t know where to start?

If so, join us and the UC Berkeley School of Information’s IMSA (Information Management Student Association) for Un-Pitch Day on October 27th from 4 – 7pm, where graduate students will offer their technical expertise to help address your organization’s pressing technology challenges. During the event, you will introduce your challenge(s) and desired impact, and we will partner you with grad students for activities that explore your challenge(s) and develop refined questions to push the conversation forward.

You’d then have the opportunity to pitch your challenge(s), with the goal of potentially matching with a student project group that adopts your project. By attending Un-Pitch Day, you would gain a more defined sense of how to address your technology challenge and, potentially, a team of students interested in working with your org to develop a prototype or a research project to address it.

Our goal is to help School of Information grad students (and other UCB grad students) identify potential projects they can adopt for the 2017-2018 academic year (ending in May). Working in collaboration with your organization, our students can help develop a technology-focused project or conduct technology-related research to aid your organization.

There is also the possibility of qualifying for funding ($2000 per project team member) for technology projects with distinct public interest/public policy goals through the Center for Technology, Society & Policy (funding requires submitting an application to the Center, due in late November). Please note that we cannot guarantee that each project presented at Un-Pitch Day will match with an interested team.

Event Agenda

Friday, October 27th from 4 – 7pm at South Hall on the UC Berkeley campus

Light food & drinks will be provided for registered attendees.

Registration is required for this event.

4:00 – 4:45pm Social impact organization introductions and un-pitches of challenges

4:45 – 5:00pm CTSP will present details about public interest project funding opportunities and deadlines.

5:00 – 6:00pm Team up with grad students through “speed dating” activities to break the ice, explore challenge definitions, and develop fruitful questions from a range of diverse perspectives.

6:00 – 7:00pm Open house for students and organizations to mingle and connect over potential projects. Appetizers and refreshments provided by CTSP.

Please join us for the next NLP Seminar on Monday, October 9, at 4:00pm in 202 South Hall.

Speaker: Siva Reddy (Stanford)

Title: Linguists-defined vs. Machine-induced Natural Language Structures for Executable Semantic Parsing


Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a computer to play a game are tasks requiring communication with machines in a language interpretable by them. Here we consider the task of converting human languages to a knowledge-base (KB) language for question-answering. While human languages have latent structures, machine interpretable languages have explicit formal structures. The computational linguistics community has created several treebanks to understand the formal structures of human languages, e.g., universal dependencies. But are these useful for deriving machine interpretable formal structures?

In the first part of the talk, I will discuss how to convert universal dependencies in multiple languages to both general-purpose and KB-executable logical forms. In the second part, I will present a neural model for inducing task-specific natural language structures. I will discuss the similarities and differences between linguists-defined and machine-induced structures, and the pros and cons of each.


Siva Reddy is a postdoc at the Stanford NLP group working with Chris Manning. His research focuses on finding fundamental representations of language, mostly interpretable, which are useful for NLP applications, especially machine understanding. In this direction, he is currently exploring whether linguistic representations are necessary or whether end-to-end learning is all we need. His postdoc is partly funded by a Facebook AI Research grant. Prior to the postdoc, he was a Google PhD Fellow at the University of Edinburgh under the supervision of Mirella Lapata and Mark Steedman. He worked with the Google Parsing team as an intern during his PhD, and as a full-time employee for Adam Kilgarriff’s Sketch Engine before his PhD. His team won first place in the SemEval 2011 Compositionality Detection task and a best paper award at IJCNLP 2011. Apart from language, he loves nature and badminton.

Please join us for our first NLP Seminar of the Fall semester on Monday, September 25, at 4:00pm in 202 South Hall.

Speaker: David Smith (Northeastern University)

Title: Modeling Text Dependencies: Information Cascades, Translations, and Multi-Input Encoders


Dependencies among texts arise when speakers and writers copy manuscripts, cite the scholarly literature, speak from talking points, repost content on social networking platforms, or in other ways transform earlier texts. While in some cases these dependencies are observable—e.g., by citations or other links—we often need to infer them from the text alone. In our Viral Texts project, for example, we have built models of reprinting for noisily-OCR’d nineteenth-century newspapers to trace the flow of news, literature, jokes, and anecdotes throughout the United States. Our Oceanic Exchanges project is now extending that work to information propagation across language boundaries. Other projects in our group involve inferring and exploiting text dependencies to model the writing of legislation, the impact of scientific press releases, and changes in the syntax of language.

In this talk, I will discuss methods both for inferring these dependency structures and for exploiting them to improve other tasks. First, I will describe a new directed spanning tree model of information cascades and a new contrastive training procedure that exploits partial temporal ordering in lieu of labeled link data. This model outperforms previous approaches to network inference on blog datasets and, unlike those approaches, can evaluate individual links and cascades. Then, I will describe methods for extracting parallel passages from large multilingual, but not parallel, corpora by performing efficient search in the continuous document-topic simplex of a polylingual topic model. These extracted bilingual passages are sufficient to train translation systems with greater accuracy than some standard, smaller clean datasets. Finally, I will describe methods for automatically detecting multiple transcriptions of the same passage in a large corpus of noisy OCR and for exploiting these multiple witnesses to correct noisy text. These multi-input encoders provide an efficient and effective approximation to the intractable multi-sequence alignment approach to collation and allow us to produce transcripts with error reductions of more than 75%.




September 2nd, 2017

Hello new MIDS students! This is where your blog posts will be listed. Currently your entries are set to be visible only to members of the I School Community, but we highly encourage you to post them publicly as well!