About

Course Information

  • Tues / Thurs 2:00-3:30
  • 210 South Hall
  • CCN: 42620, 42508
  • i290-001, i190-001, Fall 2012

Course Instructors

  • Prof. Marti Hearst, hearst@ischool.berkeley.edu (Office hours: Tues 9-11am, 307B South Hall)
  • Eunkwang Joo, jooddang@ischool.berkeley.edu (Office hours: Wed 11am-12pm, South Hall Atrium)
  • Shreyas, shreyas@ischool.berkeley.edu (Office hours: Mon 2-3pm, Room 202 South Hall)

Course Description

How to store, process, analyze, and make sense of Big Data is of increasing interest and importance to technology companies, a wide range of industries, and academic institutions. In this course, UC Berkeley professors and Twitter engineers will lecture on cutting-edge algorithms and software tools for data analytics as applied to Twitter microblog data. Topics will include applied natural language processing algorithms such as sentiment analysis, large-scale anomaly detection, real-time search, information diffusion and outbreak detection, trend detection in social streams, recommendation algorithms, and advanced frameworks for distributed computing. Social science perspectives on analyzing social media will also be covered.

This is a hands-on project course in which students are expected to form teams to complete intensive programming and analytics projects using the real-world example of Twitter data and code bases. Engineers from Twitter will help advise student projects, and students will have the option of giving their final project presentations to an audience of engineers at Twitter headquarters in San Francisco (in addition to on campus). Project topics include building on existing infrastructure tools, building Twitter apps, and analyzing Twitter data.

Intended Audience

This course is intended for students, both graduate and undergraduate, with programming skills and an interest in analyzing data and/or building software to do so.

Prerequisites

Undergraduates must be upper-division computer science or electrical engineering majors, or must have taken significant advanced programming courses including CS 162 and math courses including CS 70 or equivalent. Completion of a statistics course is also strongly recommended.

Graduate students must be comfortable with systems programming, able to pick up new software programming tools with little structured support, and comfortable with basic math topics such as graph theory, statistics, and probability theory.