Syllabus

Course Schedule and Readings

Date            Instructor(s)                                   Topic

8/23 (Thu)  Marti Hearst / Gilad Mishne,            Intro to course (pdf); Twitter basics (pdf)

       Read: MapReduce, Dean et al., OSDI ’04; Pig Latin, Olston et al., SIGMOD ’08

       Pre-Assignment: Install Pig and run some sample scripts

8/28 (Tue)  Othman Laraki / Raffi Krikorian,     Twitter Philosophy (pdf) / Twitter Software Ecosystem (pdf)

       Read: Programming Pig, Chs. 1-4

8/30 (Thu) Bill Graham,                                          Using Hadoop at Twitter (pdf)

       Read: Programming Pig, Ch. 5

       Assignment 1: Pig programming

9/4   (Tue) Jonathan Coveney,                                Using Pig at Twitter (pdf)

9/6   (Thu)  In-class assignment work (AWS How To)

9/11  (Tue) Rion Snow,                                             The Twitter API (pdf)

       Pre-Assignment: Download the Twitter4j Library

       Reading: Twitter4j Code examples

9/13  (Thu) Kostas Tsioutsiouliklis,                Trend Detection in Twitter’s Streams (pdf)

       Background reading: Minhash, Chi square test

9/18  (Tue) Brian Larson,                                         Real-time Twitter Search

       Background reading: the classic paper on web search engines; the paper on Earlybird that Brian will be covering

9/20  (Thu)  In-class assignment work
9/25  (Tue) Stephen Sorkin,                                     Correlating Twitter Data with Other Data Streams / Twitter applications (pdf)
9/27  (Thu) Aneesh Sharma,                                    Graph Algorithms for the Twitter Social Graph (pdf)

       Background reading for lectures by Aneesh, Joey, Delip, and Stan: Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Easley and Kleinberg. Cambridge University Press, 2010.

10/2  (Tue) Joey Gonzalez,                                       GraphLab: Big Learning with Graphs (pdf)
10/4  (Thu) Delip Rao,                                               Large-scale Anomaly Detection at Twitter
10/9  (Tue)  Alpa Jain,                                               Recommendation Algorithms at Twitter (pdf)
10/11 (Thu) In-class assignment work
10/16 (Tue) Project ideas discussion
10/18 (Thu) Project matchmaking
10/23 (Tue) Kurt Thomas,                                         Security at Twitter
10/25 (Thu) Stan Nikolov,                                         Information Diffusion and Outbreak Detection at Twitter
10/30 (Tue)  In-class use of Amazon EMR
11/1     (Thu) Oscar Boykin / Argyris Zymnis,       Using Scalding at Twitter
11/6    (Tue)  Special Election Day Activities
11/8    (Thu)  Matei Zaharia,                                     Spark

Background reading on Spark:

Long version: http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf 

Shorter: http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf

11/13   (Tue)  Project mid-way check-ins
11/15   (Thu)  Project mid-way check-ins
11/20  (Tue) Discuss Assignment 3
11/22   (Thu)  Thanksgiving Holiday
11/27   (Tue) Twitter Organization and Search (pdf)
11/29   (Thu) Course Wrap-up (pdf)
12/6     (Thu)  Project Presentations @ Twitter (6:00pm-8:30pm)
12/11    (Tue)  Project Presentations @ Berkeley (4:00pm-6:00pm)

Schedule Subject to Change