STAT 5500: STATISTICAL DATA MINING

2006 Winter



Instructor: Hong Gu

Office: Chase 101

Phone: 494-7161

E-mail: hgu@mathstat.dal.ca

Lectures: 9:35-10:55am on Mon. and Fri.

(From Jan. 19 to Feb. 6, this lecture time overlaps with Stat 1060 lectures, so lectures will instead be held Mon., Wed. and Fri., 9:35-10:25am.)

Place: Chase 107

Office Hours: Mon. Wed. Fri. 12:30pm-1:30pm

Course description: A variety of supervised and unsupervised learning methods are introduced, and the statistical insights behind them are discussed.
Topics for supervised learning include: linear methods for regression and classification, additive models and trees (GAM, CART,
PRIM and MARS), bagging and boosting, and neural networks (these correspond to Chapters 1, 2, 3, 4, 9, 10 and 11 of the textbook).
The unsupervised learning methods (cluster analysis) covered in Chapter 14 will also be introduced.



Prerequisite: Stat 3340.03, 4350.03, or instructor's consent

Textbook: Hastie, T., Tibshirani, R., Friedman, J. (2001),

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer

Marking Scheme:

40 % Homework assignments

30 % Project

15 % Paper reading report

15 % Presentation

Course Outline: (tentative)

week 1-week 3: Introduction, unsupervised learning methods, Chapter 14.

week 4: Chapter 1. Supervised learning, least squares and nearest neighbors, Sec. 2.1-2.3. Statistical decision theory, Sec. 2.4.

week 5: Curse of dimensionality and bias-variance tradeoff, Sec. 2.5-2.8. Linear methods for regression, least squares and subset selection, Sec. 3.2-3.4.

week 6: Shrinkage methods, Sec. 3.4. Linear methods for classification: linear discriminant analysis, Sec. 4.3.

week 7: Quadratic discriminant analysis and other regularized methods, Sec. 4.3. Logistic regression, Sec. 4.4.

week 8: Generalized additive models, Sec. 9.1. CART, Sec. 9.2.

week 9: CART (continued), Sec. 9.2. PRIM, Sec. 9.3.

week 10: MARS, Sec. 9.4. Bagging, Sec. 8.7. Boosting and additive trees, Sec. 10.1-10.4.

week 11: Boosting and additive trees, Sec. 10.5-10.10.

week 12: Neural networks, Sec. 11.1-11.3.

week 13: Neural networks.