Stat 252 and MS&E 238
Class time Spring 2007: Monday 3-6 pm
Class location Spring 2007: Gates B3
Data Mining and Electronic Business
This course is about People and Data: Measuring behavior on the web, in communication patterns, in social networks, on dating sites. Mining the data, building predictive models, creating (and discarding) hypotheses, designing cool experiments, and learning from them quickly. And figuring out what is similar and what is very different from before, and why.
We will discuss the impact this communication and data revolution is having on individuals, business, and society, and essentially on most aspects of the world we live in. Applications range from online marketing (behavioral targeting and situational targeting) to architectures leveraging collective intelligence. We are also fortunate to have some great guest speakers come to class. Previous years are on the web (2005, 2004), and this year's is kept current on the course wiki.
Students are expected to actively engage in class and discuss, to have their assumptions challenged, and to bring their various backgrounds to class to make it a great experience for themselves and everybody else in class.
Schedule: We meet once a week, on Monday afternoons, for 3 hours each time (Apr 9, 16, 23, 30, May 7, 14, 21, June 4, and once during exam week). The purpose is to make it really easy for everybody to physically come to class and participate. This is a lot more fun than just watching it on the internet, and you learn a lot more. Note that this explicitly includes SCPD students who only signed up for remote access; just don't tell anyone :)
Course wiki: All registered students have full read/write access to aweigend.wiskispaces.com. I encourage you to really actively contribute -- the class and you will benefit.
Grading: The main goal is for you to gain insights and ideas in the area of People and Data, to transfer that knowledge, and to come up with applications in your area of interest. To support this objective, the following elements enter your final grade:
Wiki: We will form 8 groups, each with around 5 students. Each group is responsible for creating the initial wiki page for one of the classes by Friday 6pm (i.e., 4 days after class). [30%]
Homeworks: There will be weekly assignments. The first 3 focus on hands-on experience with data (looking at web logs, using APIs, running an online advertising campaign); the remaining ones focus on the readings and concepts. Homeworks are due Sunday at 5pm, i.e., the day before the subsequent class, so that we can briefly discuss them during class. [50%]
Class participation [20%]
Project: If you have a good and solid idea for an interesting project, I am happy to give feedback and jointly decide on whether it makes sense to do the project. I encourage projects in small groups. [optional]
I am also able to create internships at some of the companies I advise, including summer abroad in Bangkok (Agoda, online travel), Cologne (Imageloop, online slideshows), as well as possibly China and Singapore.
Papers: Some articles as well as some chapters of the forthcoming O'Reilly book on Collective Intelligence by Toby Segaran, see course wiki.
Textbooks: Some of the material is so recent, and the scope of this class is so broad, that I do not know of any decent textbook for the class. Depending on your specific background and interests, the following might be useful:
P. Baldi, P. Frasconi, and P. Smyth: Modeling the Internet and the Web (2003) Background on Web technology, statistical modeling of behavior, information retrieval
C. Shapiro and H.R. Varian: Information Rules (1998) Short book with lots of important insights about the networked economy (network effects, economics of digital goods, pricing, etc.)
M.J.A. Berry and G.S. Linoff: Data Mining Techniques (pdf) (2nd edition, 2004) Applications of data mining in marketing and business
T. Hastie, R. Tibshirani, and J.H. Friedman: The Elements of Statistical Learning (2003) The classic for more theoretical aspects in data mining
C.M. Bishop: Pattern Recognition and Machine Learning (2006) Recent book on data mining from a Bayesian perspective
Room 237 Sequoia Hall
Room 204 Sequoia Hall
Note: The previous version of this page (addressing students considering taking the course) is here.
by | +1 (917) 697-3800 | www.weigend.com