Data Analysis

LSTAT2110  2016-2017  Louvain-la-Neuve

5.0 credits
22.5 h + 7.5 h
1q

Teacher(s)
Segers Johan
Language
French
Main themes
- Reminders of algebra and geometry useful for multivariate data analysis
- Basic principles of factorial methods
- Principal component analysis (PCA)
- Canonical correlation
- Factorial discriminant analysis (FDA)
- Factorial correspondence analysis (FCA, simple and multiple)
- Cluster analysis
- Data analysis in practice
Aims

General objectives. Presentation of modern techniques for the analysis of large multivariate data sets, and development of the basic tools for "data mining".

Specific objectives. At the end of this course, students should be able to:
- Manipulate and describe the information contained in large data sets;
- Understand why a given method is appropriate;
- Correctly interpret the resulting graphical displays and the software output;
- Solve problems with real data sets.

The contribution of this Teaching Unit to the development and command of the skills and learning outcomes of the programme(s) can be accessed at the end of this sheet, in the section entitled “Programmes/courses offering this Teaching Unit”.

Content
- Reminders of algebra and geometry useful for multivariate data analysis
- Basic principles of factorial methods
- Principal component analysis (PCA)
- Canonical correlation
- Factorial discriminant analysis (FDA)
- Factorial correspondence analysis (FCA, simple and multiple)
- Cluster analysis
- Data analysis in practice
Other information
Prerequisites: The student should be able to manipulate and read algebraic expressions (matrix calculus) and should master the basic tools of statistical analysis.

Evaluation: two parts.
1) A project with real data (see details below). The idea is to apply the methods of the course to a real problem chosen by the student. The course is concentrated over 10 weeks so that students can carry out this project in the best conditions. Students work in groups of two. The teaching assistant will help the students with software problems. This work should represent about 12 hours of work per student (24 hours for the team of two).
2) A written, closed-book exam. The idea is to check whether the student masters all the techniques developed in the course (understanding of the techniques) and is also able to comment on software output (such as that presented in the manual).

Details about the project: For those who wish, 2 or 3 sessions will be organized by the teaching assistant to introduce the students to the SPADN software. The teaching assistant will also help the students with their project, but only with software issues. The project is a piece of work on real data: the idea is to apply the techniques of the course to analyze a problem in a field chosen by the student. The project should contain at least a PCA and an FCA. It is better if the chosen data set allows for both approaches (remember that multiple FCA is possible for most data sets). Often, a cluster analysis sheds new light on the data that have been analyzed (detection of outliers, structure of different groups of individuals, etc.). In that case, a statistical description of the groups obtained is useful.

The project will be presented in a short report summarizing:
1. The object of the analysis
2. A description of the data (units, etc.)
3. The analysis
4. Comments on the results obtained and conclusions
The report should not be longer than 7-10 pages (some details may be given in appendices).

The evaluation criteria are:
1. Appropriateness of the chosen methods
2. Originality and interest of the chosen problem
3. Depth of the analysis (beyond the minimum required)
4. Correctness of the comments
5. Quality of the presentation of the report

Manual: L. Simar (2004), "Multivariate Data Analysis", 256 pages, Institut de Statistique, UCL. This manual is available at the DUC (students' bookstore).

Professor: Léopold Simar, tel.: 010/47 43 08, simar@stat.ucl.ac.be

References:
Lebart, L., Morineau, A. and Fenelon, J.P. (1982): Traitement des données statistiques. Dunod, Paris.
Saporta, G. (1990): Probabilités, analyse des données et statistiques. Ed. Technip, Paris.
Romeder, J.M. (1973): Méthodes et programmes d'analyse discriminante. Dunod, Paris.
Faculty or entity


Programmes/courses offering this Teaching Unit

Program title | Acronym | Credits | Prerequisites | Aims
Master [120] in Economics: General | | 5 | - |
Master [120] in Statistics: General | | 5 | - |
Master [120] in Biomedical Engineering | | 5 | - |
Master [120] in Mathematics | | 5 | - |
Master [120] in Statistics: Biostatistics | | 5 | - |
Master [120] in Mathematical Engineering | | 5 | - |

5
-