Mining Patterns in Data

lingi2364  2019-2020  Louvain-la-Neuve

Note from June 29, 2020
Although we do not yet know how long the social distancing related to the Covid-19 pandemic will last, and regardless of the changes that had to be made to the evaluation of the June 2020 session relative to what this learning unit description provides, new learning unit evaluation methods may still be adopted by the teachers; details of these methods have been, or will be, communicated to the students by the teachers as soon as possible.
5 credits
30.0 h + 15.0 h
Q2
Teacher(s)
Siegfried Nijssen
Language
English
Prerequisites
LINFO1121
Main themes
An important task in data mining is the discovery of patterns in data. Patterns are recurring structures in data; they can provide interpretable explanations for observations in data, can help to gain a better understanding of the structure of data, can be used to build better models, and can be used to solve other computational tasks (such as the construction of database indexes or data compression). Patterns can be found in many different forms of data, including data from supermarkets, insurance companies, scientific experiments, social networks, software projects, and so on.
This course provides an in-depth introduction to pattern mining. After covering the basics of pattern mining, it discusses a number of advanced pattern mining techniques in detail.
Topics that will be discussed are:
  • Categories of pattern mining tasks, including pattern and pattern set mining, supervised and unsupervised pattern mining, dataset types, and pattern scoring functions;
  • Algorithms for solving different pattern mining tasks;
  • Data structures for making pattern mining more efficient;
  • The implementation of pattern mining algorithms;
  • Mathematical foundations for the different categories of pattern mining tasks;
  • Complexity classes relevant to pattern mining;
  • Applications of pattern mining, with a special focus on the application of pattern mining techniques in software engineering.
Aims

At the end of this learning unit, the student is able to:

Given the learning outcomes of the "Master in Computer Science and Engineering" program, this course contributes to the development, acquisition and evaluation of the following learning outcomes:
  • INFO 1
  • INFO 2.1-4
  • INFO 4.2-4
  • INFO 5.5
  • INFO 6.4
Given the learning outcomes of the "Master [120] in Computer Science" program, this course contributes to the development, acquisition and evaluation of the following learning outcomes:
  • SINF 1.M4, 1.M3
  • SINF 2.1-4
  • SINF 4.2-4
  • SINF 5.5
  • SINF 6.4
Students completing this course successfully will be able to:
  • Identify the most appropriate pattern mining task for a given dataset;
  • Explain the advantages and disadvantages of pattern mining algorithms in relation to the problem to be solved;
  • Identify appropriate approaches for evaluating the quality of patterns and apply them in various situations;
  • Determine the computational complexity of pattern mining problems;
  • Develop new pattern mining algorithms for new applications.
 

The contribution of this Teaching Unit to the development and command of the skills and learning outcomes of the programme(s) can be accessed at the end of this sheet, in the section entitled “Programmes/courses offering this Teaching Unit”.
Content
  • Frequent itemset mining: algorithms, data structures;
  • Constraint-based itemset mining: algorithms, data structures;
  • Patterns in sequences, trees, graphs: algorithms, data structures, complexity classes;
  • Pattern mining in supervised data: scoring functions, algorithms;
  • Pattern set mining in supervised data: scoring functions, models (decision trees, boosting), algorithms;
  • Pattern set mining in unsupervised data: scoring functions (minimum description length principle, maximum entropy), algorithms;
  • Applications of pattern mining: software repositories, traces, log files, cheminformatics, bioinformatics, industrial applications.
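To give a flavour of the first content item, the following is a minimal sketch of level-wise (Apriori-style) frequent itemset mining. It is an illustration only, not the course's reference implementation; the function name `frequent_itemsets`, the parameter `min_support` (an absolute transaction count) and the supermarket-style example data are assumptions made for this sketch.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset contained in at
    least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep a
        # k-itemset only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one pass over the data per level.
        counts = {
            c: sum(1 for t in transactions if c <= t) for c in candidates
        }
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Illustrative (made-up) supermarket baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in sorted(
    frequent_itemsets(baskets, min_support=2).items(),
    key=lambda pair: (len(pair[0]), sorted(pair[0])),
):
    print(sorted(itemset), support)
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, which is what keeps the candidate set small at each level.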
Teaching methods
  • Lecture
  • 3 exercises
Evaluation methods
25% for the exercises, plus a written exam. The exercises only count when the grade for the written exam is at least 10. The same conditions apply in August.
Bibliography
Charu C. Aggarwal, Jiawei Han (Eds.), Frequent Pattern Mining, Springer 2014 (ISBN: 978-3-319-07820-5)
Chapters from Siegfried Nijssen, Albrecht Zimmermann and Luc De Raedt, Essentials of Pattern Mining.
Faculty or entity
INFO


Programmes/courses offering this Teaching Unit

Title of the programme
Acronym
Credits
Prerequisites
Aims
Master [120] in Data Science Engineering

Master [120] in Computer Science and Engineering

Master [120] in Mathematical Engineering

Master [120] in Computer Science

Master [120] in Data Science: Statistic

Master [120] in Data Science: Information Technology