UCL - Studies

Data Analysis [STAT2411]
[22.5h+7.5h exercises] 5 credits

This course is taught in the 1st semester

Teacher(s):

Isabelle De Macq (substituting for Léopold Simar), Léopold Simar

Language:

French

Level:

Second cycle

Aims

General objectives.
Presentation of modern techniques for the analysis of very large multivariate data sets. Development of the basic tools for "data mining".

Specific objectives.
At the end of this course, the students should be able to:
- Manipulate and describe the information contained in very large data sets;
- Understand why a given method is appropriate;
- Correctly interpret the resulting plots and the software output;
- Solve problems with real data sets.

Main themes

Contents:
- Reminders of algebra and geometry useful for multivariate data analysis
- Basic principles of factorial methods
- Principal components analysis (PCA)
- Canonical correlation
- Factorial discriminant analysis (FDA)
- Factorial correspondence analysis (simple and multiple FCA)
- Cluster analysis
- Data analysis in practice

Content and teaching methods

Contents:
- Reminders of algebra and geometry useful for multivariate data analysis
- Basic principles of factorial methods
- Principal components analysis (PCA)
- Canonical correlation
- Factorial discriminant analysis (FDA)
- Factorial correspondence analysis (simple and multiple FCA)
- Cluster analysis
- Data analysis in practice

Other information (prerequisite, evaluation (assessment methods), course materials recommended readings, ...)

Prerequisites:
The student should be able to:
Read and manipulate algebraic expressions (matrix calculus);
Master the basic tools of statistical analysis.

Evaluation, in two parts:
1) A project with real data (details below). The aim is to apply the methods of the course to a real problem chosen by the student. The course is concentrated into 10 weeks so that students can carry out the project under the best conditions. Students work in pairs. The teaching assistant will help with software problems. The project should represent about 12 hours of work per student (24 hours for a team of two).
2) A closed-book written exam. The aim is to check that the student has mastered and understood all the techniques developed in the course and is able to comment on software output (such as the examples presented in the manual).


Details about project:

For those who wish, the teaching assistant will organize 2 or 3 sessions to introduce students to the SPADN software. The teaching assistant will also help students with their project, but only with software issues.

The project is based on real data. The aim is to apply the techniques of the course to analyze a problem in a field chosen by the student. The project should contain at least a PCA and an FCA. It is preferable that the chosen data set allows for both approaches (remember that multiple FCA is possible for most data sets). Often, a cluster analysis sheds new light on the data already analyzed (detection of outliers, structure of different groups of individuals, etc.). In that case, a statistical description of the groups obtained is useful.
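As an illustration only, the first step of a normalised PCA on a small data table can be sketched as follows. This is a minimal sketch and not the course's method of work: the course software is SPADN, whereas Python is assumed here purely for illustration, the data table is hypothetical, and the function names (standardise, correlation_matrix, first_component) are invented for this example. The leading axis is obtained by power iteration on the correlation matrix, a simple stand-in for a full eigendecomposition.

```python
import math


def standardise(rows):
    """Centre each column and scale it to unit variance (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix computed from already standardised rows."""
    n, p = len(z), len(z[0])
    return [
        [sum(r[j] * r[k] for r in z) / (n - 1) for k in range(p)]
        for j in range(p)
    ]


def first_component(corr, iterations=500):
    """Leading eigenvalue and unit eigenvector by power iteration.

    Assumes the uniform starting vector is not orthogonal to the
    leading eigenvector (true for the toy data below).
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[j] * sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)
    )
    return eigenvalue, v


# Hypothetical toy table: 6 individuals (rows), 3 variables (columns).
data = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.7],
    [3.0, 6.2, 0.2],
    [4.0, 8.1, 0.9],
    [5.0, 9.8, 0.4],
    [6.0, 12.1, 0.6],
]

z = standardise(data)
corr = correlation_matrix(z)
eigenvalue, axis = first_component(corr)

# Coordinates of the individuals on the first principal axis.
scores = [sum(r[j] * axis[j] for j in range(len(axis))) for r in z]

# The trace of a correlation matrix equals the number of variables,
# so the share of total inertia carried by the first axis is eigenvalue / p.
share = eigenvalue / len(corr)

print("loadings:", [round(a, 3) for a in axis])
print("share of inertia on axis 1:", round(share, 3))
```

The variance of the scores on the first axis equals the leading eigenvalue, which is the quantity usually read off the eigenvalue table of a PCA output when deciding how many axes to interpret.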

The project will be presented in a short report, summarizing:

1 The object of the analysis
2 Description of the data (units, etc.)
3 The analysis
4 Comments on the obtained results and conclusions.

The report should not exceed 7 to 10 pages (some details may be given in appendices). The evaluation will be based on the following criteria:

1 Appropriateness of the chosen methods.
2 Originality and interest of the chosen problem.
3 Depth of the analysis (going beyond the minimum required).
4 Correctness of the comments.
5 Quality of the presentation of the report.



Manual:
Simar, L. (2004): "Multivariate Data Analysis", 256 pages, Institut de Statistique, UCL.

This manual is available at the DUC (students' bookstore).
Professor: Léopold Simar, tel.: 010/47 43 08, simar@stat.ucl.ac.be


References:
Lebart, L., Morineau, A. and Fenelon, J.P. (1982): Traitement des données statistiques. Dunod, Paris.
Saporta, G. (1990): Probabilités, analyse des données et statistiques. Ed. Technip, Paris.
Romeder, J.M. (1973): Méthodes et programmes d'analyse discriminante. Dunod, Paris.

For more information:

http://www.stat.ucl.ac.be/cours/stat2411/index.html

Other credits in programs

ACTU21MS

First year of the Master in Actuarial Science, specialised focus

(5 credits)

Mandatory

MATH22/G

Second year of the licence in mathematical sciences

(5 credits)

STAT21MS/DM

First year of the Master in Statistics, general orientation, specialised focus (data management and data mining)

(5 credits)

STAT21MS/EA

First year of the Master in Statistics, general orientation, specialised focus (economics and insurance)

(5 credits)

STAT21MS/MM

First year of the Master in Statistics, general orientation, specialised focus (mathematical methods)

(5 credits)

STAT21MS/MS

First year of the Master in Statistics, general orientation, specialised focus (marketing and surveys)

(5 credits)

STAT21MS/ST

First year of the Master in Statistics, general orientation, specialised focus (science and technology)

(5 credits)

STAT22MS/DM

Second year of the Master in Statistics, general orientation, specialised focus (data management and data mining)

(5 credits)

STAT22MS/EA

Second year of the Master in Statistics, general orientation, specialised focus (economics and insurance)

(5 credits)

STAT22MS/MM

Second year of the Master in Statistics, general orientation, specialised focus (mathematical methods)

(5 credits)

STAT22MS/MS

Second year of the Master in Statistics, general orientation, specialised focus (marketing and surveys)

(5 credits)

STAT22MS/ST

Second year of the Master in Statistics, general orientation, specialised focus (science and technology)

(5 credits)

STAT3DA/B

Diploma of advanced studies in statistics (biostatistics and epidemiology)

(5 credits)

STAT3DA/E

Diploma of advanced studies in statistics (statistics and econometrics)

(5 credits)

STAT3DA/P

Diploma of advanced studies in statistics (statistical practice)

(5 credits)



This site was created in collaboration with ADCP, ADEF, CIO and SGSI
Person in charge: Jean-Louis Marchand - Information: issec@stat.ucl.ac.be
Last update: 13/03/2007