Aims
General objectives.
Presentation of modern techniques for the analysis of huge multivariate data sets. Development of the basic tools for "data mining".
Specific objectives.
At the end of this course, the students should be able to:
Manipulate and describe the information contained in huge data sets;
Understand why a given method is appropriate;
Correctly interpret the resulting graphics and the software output;
Solve problems with real data sets.
Main themes
Contents.
- Reminders of algebra and geometry useful for multivariate data analysis
- Basic principles of factorial methods
- Principal components analysis (PCA)
- Canonical correlation
- Factorial discriminant analysis (FDA)
- Cluster analysis
- Data analysis in practice
Other information (prerequisites, evaluation (assessment methods), course materials, recommended readings, ...)
Prerequisite:
The student should be able to:
Manipulate and read algebraic expressions (matrix calculus);
Master the basic tools of statistical analysis.
Evaluation.
Two parts:
1) A project with real data (see details below). The idea is to apply the methods of the course to a real problem chosen by the student. The course is concentrated over 10 weeks so that students can carry out this project under the best conditions. Students work in groups of two. The teaching assistant will help the students with software problems. This project should represent about 12 hours of work per student (24 hours for the team of two).
2) A closed-book written exam. The idea is to check whether the student masters all the techniques developed in the course (understanding of the techniques) and is also able to comment on software output (such as that presented in the manual).
Details about project:
For those who wish, 2 or 3 meetings will be organized by the teaching assistant to introduce the students to the SPADN software. The teaching assistant will also help the students with their project, but only with software issues.
This project is a piece of work on real data. The idea is to apply the techniques of the course to analyze a problem in a field chosen by the student. The project should contain at least a PCA and an FCA. It is better if the chosen data set allows for both approaches (remember that multiple FCA is possible for most data sets). Often, a cluster analysis sheds new light on the data that have been analyzed (detection of outliers, structure of different groups of individuals, etc.). In that case, a statistical description of the resulting groups is useful.
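To give an idea of what the PCA step of such a project involves, here is a minimal sketch in Python with NumPy. It is an illustration only: the course software is SPADN, the function name `standardized_pca` is hypothetical, and the data are synthetic placeholders rather than a real data set. The sketch assumes a PCA on standardized variables (that is, on the correlation matrix), computed through a singular value decomposition.

```python
import numpy as np

def standardized_pca(X):
    """PCA on standardized columns (correlation-matrix PCA) via SVD.

    Hypothetical helper written for illustration; not part of SPADN.
    """
    X = np.asarray(X, dtype=float)
    # Center and scale each variable so that measurement units do not
    # dominate the analysis.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; singular values carry the variances.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s**2 / (Z.shape[0] - 1)
    scores = U * s  # coordinates of the individuals on the principal axes
    explained = eigenvalues / eigenvalues.sum()
    return scores, Vt.T, eigenvalues, explained

# Synthetic placeholder data: 50 individuals, 4 variables (NOT real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 2.0 * X[:, 0]  # induce a correlation between two variables

scores, axes, eigenvalues, explained = standardized_pca(X)
print("share of variance per component:", np.round(explained, 3))
```

With standardized variables the eigenvalues sum to the number of variables, so each share in `explained` can be read directly as the proportion of total variance carried by that component. The first two columns of `scores` are what one would plot to look for outliers or groups of individuals.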
The project will be presented in a short report, summarizing:
1 The object of the analysis
2 Description of the data (units, etc.)
3 The analysis
4 Comments on the obtained results and conclusions.
The report should not exceed 7-10 pages (some details may be given in appendices). The evaluation will be based on the following criteria:
1 Appropriateness of the chosen methods.
2 Originality and interest of the chosen problem.
3 Depth of the analysis (beyond the minimum required).
4 Correctness of the comments.
5 Quality of the presentation of the report.
L. Simar (2004): "Multivariate Data Analysis", 256 pages, Institut de Statistique, UCL.
This manual is available at the DUC (students' bookstore).
Other credits in programs
BIR22/0A | Second year of the programme leading to the degree of bio-engineer: Agricultural sciences (Information technologies and management) | Mandatory
BIR22/0C | Second year of the programme leading to the degree of bio-engineer: Chemistry and bio-industries (Information technologies and management) | Mandatory
BIR22/0E | Second year of the programme leading to the degree of bio-engineer: Environmental sciences and technologies (Information technologies and management) | Mandatory
BIR22/4E | Second year of the programme leading to the degree of bio-engineer: Environmental science and technology (Environmental technologies: water, soil, air) | Mandatory
BIR22/5E | Second year of the programme leading to the degree of bio-engineer: Environmental science and technology (Land-use planning) | Mandatory
BIR22/6E | Second year of the programme leading to the degree of bio-engineer: Environmental science and technology (Nature, water and forests) | Mandatory
BIR22/7E | Second year of the programme leading to the degree of bio-engineer: Environmental science and technology (Water and soil resources) | Mandatory