- to understand quantitative and qualitative data mining methods and to
apply them to decision making
- to develop a critical view of data mining techniques in specific
application domains
- to master techniques for retrieving information from very large data collections,
possibly enriched with link structures
(the Web, social networks, ...)
- to apply information retrieval techniques in the context of search
engines and automated recommendation systems
- to implement data mining and information retrieval algorithms within
standard software environments such as S-Plus, R, SAS, Weka or Matlab
Main themes
. Further topics in data mining
Canonical correlation analysis
Correspondence analysis
Partial least squares regression
Log-linear models
Association rules
. Decision making
Markov decision processes and reinforcement learning
Exploration/exploitation and bandit problems
Utility theory
Multi-criteria preference modeling - the Promethee method
Probabilistic reasoning with Bayesian networks
Possibility theory
Two-player game theory
Collective decisions
. Information retrieval
The basic vector-space model
The probabilistic model
Ranking web pages: PageRank, HITS, etc.
Collaborative recommendation models (recommender systems)
. Link analysis and web/graph mining
Network community detection
Similarity measures between nodes
Spectral graph partitioning and mapping
Reputation models
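
As an illustration of the "Ranking web pages" theme, the sketch below runs a power iteration for PageRank on a toy link graph. It is a minimal example and not course material: the function name `pagerank`, the damping factor 0.85, the uniform treatment of dangling pages and the four-page graph `toy_web` are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a dict {page: [pages it links to]}.

    Illustrative sketch: damping value and dangling-node handling are
    assumptions, not a prescribed course implementation.
    """
    # Collect every page that appears as a source or as a target.
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}

        # Each page shares its rank equally among the pages it links to.
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)

        # Stop once the L1 change between two iterations is negligible.
        converged = sum(abs(new_rank[u] - rank[u]) for u in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical four-page web: D links to C but receives no links itself.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

On this toy graph, page C collects links from three pages and ends up ranked first, while page D, which nobody links to, keeps only the baseline share (1 - damping) / n.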
Content and teaching methods
see above
Other information (prerequisites, evaluation (assessment methods), course materials, recommended readings, ...)
Prerequisites:
- A first course on probability theory
- A first course on mathematical statistics
- An undergraduate course on matrix algebra
- An undergraduate course on multivariate analysis
References:
Students will receive copies of research papers and of book chapters from the following books:
- Alpaydin (2004), "Introduction to machine learning". MIT Press.
- Bardos (2001), "Analyse discriminante. Application au risque et scoring financier". Dunod.
- Bishop (1995), "Neural networks for pattern recognition". Clarendon Press.
- Bishop (2006), "Pattern recognition and machine learning". Springer-Verlag.
- Bouroche & Saporta (1983), "L'analyse des données". Que Sais-je.
- Cornuéjols & Miclet (2002), "Apprentissage artificiel. Concepts et algorithmes". Eyrolles.
- Duda, Hart & Stork (2001), "Pattern classification, 2nd ed". John Wiley & Sons.
- Dunham (2003), "Data mining. Introductory and advanced topics". Prentice-Hall.
- Greenacre (1984), "Theory and applications of correspondence analysis". Academic Press.
- Han & Kamber (2005), "Data mining: Concepts and techniques, 2nd ed.". Morgan Kaufmann.
- Hand (1981), "Discrimination and classification". John Wiley & Sons.
- Härdle & Simar (2003), "Applied multivariate statistical analysis". Springer-Verlag.
Available at http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtml.html
- Hastie, Tibshirani & Friedman (2001), "The elements of statistical learning". Springer-Verlag.
- Johnson & Wichern (2002), "Applied multivariate statistical analysis, 5th ed". Prentice-Hall.
- Lebart, Morineau & Piron (1995), "Statistique exploratoire multidimensionnelle". Dunod.
- Mitchell (1997), "Machine learning". McGraw-Hill.
- Naim, Wuillemin, Leray, Pourret & Becker (2004), "Réseaux bayesiens". Editions Eyrolles.
- Nilsson (1998), "Artificial intelligence: A new synthesis". Morgan Kaufmann.
- Ripley (1996), "Pattern recognition and neural networks". Cambridge University Press.
- Rosner (1995), "Fundamentals of biostatistics, 4th ed". Wadsworth Publishing Company.
- Saporta (1990), "Probabilités, analyse des données et statistique". Editions Technip.
- Tan, Steinbach & Kumar (2005), "Introduction to data mining". Pearson.
- Theodoridis & Koutroumbas (2003), "Pattern recognition, 3rd ed". Academic Press.
- Therrien (1989), "Decision, estimation and classification". Wiley & Sons.
- Venables & Ripley (2002), "Modern applied statistics with S". Springer-Verlag.
- Webb (2002), "Statistical pattern recognition, 2nd ed". John Wiley and Sons.