The study of the fundamentals of data analysis and data modeling (also called "data mining" or "pattern recognition" in the computer science and engineering communities), from a decision-making perspective. The data analysis techniques will be applied to real-world projects, using statistical analysis/data mining software such as S-Plus, R, SAS, Weka or Matlab.
Main themes
* A review of the main subspace projection and feature extraction techniques for data analysis/modeling, and their interpretation:
- Categorical data: subspace projection and latent variable techniques, log-linear models, etc.
- Numerical data: subspace projection and latent variable techniques, clustering techniques, discriminant analysis, etc.
* Supervised classification: naïve Bayes, artificial neural networks, large margin classifiers, decision trees, combining classifiers, etc.
* Unsupervised classification (clustering) methods.
* Non-stationary data models: dynamic time warping, Markov models, hidden Markov models.
* Decision-making from data: a short introduction to Bayes decision theory, Bayesian networks, Markov decision processes, reinforcement learning, multicriteria decision analysis.
* Application to "information retrieval" and "web mining" (PageRank, HITS, collaborative recommendation, etc.).
* Projects (for instance, scoring) based on real data, using S-Plus, R, SAS, Weka or Matlab.
Content and teaching methods
See above.
Other information (prerequisites, evaluation/assessment methods, course materials, recommended readings, ...)
Prerequisites:
- A first course on probability theory
- A first course on mathematical statistics
- An undergraduate course on matrix algebra
- An undergraduate course on multivariate analysis
References:
Students will receive copies of research papers, as well as chapters from the following books:
- Alpaydin (2004), "Introduction to machine learning". MIT Press.
- Bardos (2001), "Analyse discriminante. Application au risque et scoring financier". Dunod.
- Bishop (1995), "Neural networks for pattern recognition". Clarendon Press.
- Bishop (2006), "Pattern recognition and machine learning". Springer-Verlag.
- Bouroche & Saporta (1983), "L'analyse des données". Que sais-je ?
- Cornuéjols & Miclet (2002), "Apprentissage artificiel. Concepts et algorithmes". Eyrolles.
- Duda, Hart & Stork (2001), "Pattern classification, 2nd ed.". John Wiley & Sons.
- Dunham (2003), "Data mining. Introductory and advanced topics". Prentice-Hall.
- Greenacre (1984), "Theory and applications of correspondence analysis". Academic Press.
- Han & Kamber (2005), "Data mining: Concepts and techniques, 2nd ed.". Morgan Kaufmann.
- Hand (1981), "Discrimination and classification". John Wiley & Sons.
- Härdle & Simar (2003), "Applied multivariate statistical analysis". Springer-Verlag. Available at http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtml.html
- Hastie, Tibshirani & Friedman (2001), "The elements of statistical learning". Springer-Verlag.
- Johnson & Wichern (2002), "Applied multivariate statistical analysis, 5th ed.". Prentice-Hall.
- Lebart, Morineau & Piron (1995), "Statistique exploratoire multidimensionnelle". Dunod.
- Mitchell (1997), "Machine learning". McGraw-Hill.
- Naim, Wuillemin, Leray, Pourret & Becker (2004), "Réseaux bayésiens". Editions Eyrolles.
- Nilsson (1998), "Artificial intelligence: A new synthesis". Morgan Kaufmann.
- Ripley (1996), "Pattern recognition and neural networks". Cambridge University Press.
- Rosner (1995), "Fundamentals of biostatistics, 4th ed.". Wadsworth Publishing Company.
- Saporta (1990), "Probabilités, analyse des données et statistique". Editions Technip.
- Tan, Steinbach & Kumar (2005), "Introduction to data mining". Pearson.
- Theodoridis & Koutroumbas (2003), "Pattern recognition, 3rd ed.". Academic Press.
- Therrien (1989), "Decision, estimation and classification". John Wiley & Sons.
- Venables & Ripley (2002), "Modern applied statistics with S". Springer-Verlag.
- Webb (2002), "Statistical pattern recognition, 2nd ed.". John Wiley & Sons.