The Assignment
To prepare: study the Pig Tutorial on analyzing a query log.
Pig assignment (revised), posted on 9/10/2012. The revision makes question 3 an optional bonus question.
Due 9/14/2012 at 5pm PST.
Sample Solutions
Prof. Hearst: Sample Solutions for Q2 and Q4, along with grading guidelines.
Seongtaek Lim: Did a great job answering Q1 and wrote a very nice Java UDF for Q3; he also wrote Pig code specially designed to produce output that is easy for the graders to read. View Pig and Java Code.
Anupam Prakash: Another great solution to Q3 using a Java UDF. View Code and Writeup.
Eloi Pereira: For Q4, he took the innovative step of looking at context words in the query (such as the word “address”) to try to find locations, while filtering out email addresses, schools, etc. View Code and Writeup.
Yusef Shafi: Made a cool attempt to recognize bodies of water for Q4 and also had a nice way of automatically labeling output for Q2 and Q4. View Code and Output.
Arian Shamas: Made a mighty effort to implement the session recognition of Q3 in Pig! I’m still not entirely sure how it works, but it’s worth looking at if you’re interested. View the Pig Code.