Andreas Weigend
Stanford University
Stat 252 and MS&E 238
Class time Spring 2008: tentatively Monday 3:15 - 6:05 pm
Class location Spring 2008: tentatively Gates B03

Data Mining and Electronic Business

This course is about People and Data: Collecting data about behavior on the web, in communication patterns, in social networks, on dating sites, etc. Mining the data, building predictive models, creating (and rejecting) hypotheses, designing cool experiments, and learning from them quickly. And figuring out what is similar to the past and what has changed, what is coming on the horizon, and what the underlying drivers are.

Until about a decade ago, the differentiator between a good and a bad firm was often progress on algorithms. The last decade has been about progress on data. We will discuss the impact of this communication and data revolution on individuals, business, and society, essentially on most aspects of the world we live in. Applications range from online marketing (behavioral targeting and situational targeting) to architectures leveraging collective intelligence. We are also fortunate to have some great guest speakers come to class. The detailed write-up of each class is created by groups of students on the course wiki. The 2007 course wiki is on the web, and the 2005 and 2004 syllabi might also help with the decision of whether to take this course.

The first half of the quarter focuses on data: click data (what can be collected and what it is useful for), intention data (such as the queries from the searches you do; we will also discuss social search), attention data (such as tags on social bookmarking sites, with their important application for discovery), and interaction data (such as email headers and social networking sites). We will also discuss prediction markets as yet another way of gleaning rich data from people. The second half of the quarter focuses on models and on creating appropriate structures and incentives. We will discuss models for products (recommender systems), people (reputation systems), situation, and location.

Students are expected to actively engage in class discussions, to have their assumptions challenged, and to bring their various backgrounds to class in order to make it a great experience for themselves and everybody else.

Schedule : We meet once a week, Monday afternoon for 3 hours (in 2008, this is Apr 7, 14, 21, 28, May 5, 12, 19, [no class on May 26, Memorial Day] and June 2, and possibly during exam week). This schedule proved useful last year since it makes it as easy as possible for local students to physically come to class and participate. This is a lot more fun than just watching it on the internet, and you learn a lot more. Note that this explicitly includes SCPD students who only signed up for remote access, just don't tell anyone :)

Course wiki : All students have full read/write access to the course wiki. I encourage you to contribute actively -- the class and you will benefit.

Grading : The main goal is that you get insights in the area of People and Data, and that you transfer them to your area, hopefully coming up with some interesting ideas and applications. To support this objective, your grade will be determined by the following:

  • Class wiki: We will form 8 groups, each with around 5 students. Each group is responsible for creating the initial wiki page for one of the classes by Friday 6pm (i.e., 4 days after class). These pages are hyperlinked, emphasizing the key learnings of each class. [30%]

  • Homework: There will be weekly assignments. They are due the day before class at 5pm, so that we can look through them and give brief feedback in a timely manner. [50%]
    The first assignments focus on hands-on experience with data:
    - understanding your own data (your web logs),
    - getting data from other websites by modifying and running a simple spider (example code uses PHP), or using an API,
    - running an online advertising campaign using Google AdWords, Yahoo SM, or Microsoft adCenter,
    - measuring its effectiveness and, more broadly, understanding what can be tracked easily on your site, using Google Analytics.
    One assignment leaves lots of space for creativity but also needs to be coded up (there were some amazing entries in 2007):
    - write a recommendation system for
    We will also
    - mine the data of an online social network or a dating site,
    - run Cleverset's recommender system on the "network" of sites of students in the class.
    When appropriate, papers will be assigned to deepen your understanding.

  • Class participation. [20%]

  • Project: If you have a good and solid idea for an interesting project, I am happy to give feedback and jointly decide on whether it makes sense to do the project. I encourage projects in small groups. [optional]

There are also internship opportunities available. Last year's ranged from San Francisco (hitwise, web measurement) to Bangkok (agoda, online travel), China and Singapore.


Teaching Assistants:

  • Rudy Angeles
    Room 206 Sequoia Hall,
    Office hours: Fri 2:30 - 4:00 (also via Yahoo messenger: stat252spring2008)
    (650) 725-6148

  • Zehao Chen
    Room 238 Sequoia Hall
    Office hours: Mon 1:15 - 2:45
    (650) 725-5952

Note: The previous version of this page (addressing students considering taking the course) is here.
by | +1 (917) 697-3800 |