- To master key research topics relating to the Internet, the web, and the processing of large databases of digital documents.
- To develop a "web mining" application for collaborative recommendation (PHP/MySQL).
Main themes
- Web mining
- Information retrieval
- Text mining
- Collaborative filtering and recommendation
- Search engine technology and algorithms
Content and teaching methods
Summary
This course introduces recent techniques used in digital and Internet information processing and retrieval, search engine technology, and web mining. These techniques are essentially algorithms and statistical models drawn from computer science, machine learning, and data mining.
Content
The course introduces major techniques in web mining, information retrieval, text mining, collaborative filtering, and search engine ranking. For instance, it will cover:
* Digital document retrieval;
* Web page ranking techniques (PageRank, HITS, etc.);
* Collaborative recommendation techniques;
* Etc.
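To give a flavour of the web page ranking techniques listed above, here is a minimal sketch of PageRank computed by power iteration. It is written in Python purely for illustration; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page toy graph are all assumptions made for this example, not course material.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    # Collect every page that appears either as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its damped rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B and D link to C, C links back to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page C receives links from three pages and ends up ranked first, while page D, which nobody links to, keeps only the baseline score.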
Some of the sessions will be organized as seminars.
The techniques will be illustrated through a programming project in which students develop a web-based collaborative recommendation system (web mining, PHP/MySQL).
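As an indication of what the core of such a recommendation system might look like, the following is a minimal sketch of user-based collaborative filtering with cosine similarity. The project itself is specified in PHP/MySQL; Python is used here only for brevity. The function names, the in-memory `ratings` dictionary (which stands in for a database table), and the sample users and items are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Predict ratings for items the user has not rated, best first."""
    weighted_sum, similarity_sum = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
                similarity_sum[item] = similarity_sum.get(item, 0.0) + sim
    # Predicted rating = similarity-weighted average of the neighbours' ratings.
    predicted = {i: weighted_sum[i] / similarity_sum[i] for i in weighted_sum}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"item1": 5, "item2": 4},
    "bob": {"item1": 5, "item2": 5, "item3": 4, "item4": 1},
    "carol": {"item1": 1, "item3": 2, "item4": 5},
}
recommendations = recommend(ratings, "alice")
print(recommendations)
```

Because alice's ratings resemble bob's far more than carol's, bob's opinions dominate the weighted average, so item3 is recommended ahead of item4.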
Methods
In-class activities
x Lectures
x Interactive seminar
x Problem-based learning
x Project-based learning
At-home activities
x Readings to prepare for the lecture
x Paper work
x Student presentations
Other information (prerequisites, evaluation (assessment methods), course materials, recommended readings, ...)
Prerequisites (ideally in terms of competencies)
Knowledge of information systems modelling (UML)
Knowledge of an object-oriented language (such as Java)
Knowledge of the fundamentals of mathematical and multivariate statistics
- Han J. & Kamber M. (2006) "Data mining: concepts and techniques". Morgan Kaufmann.
- Baldi P., Frasconi P. & Smyth P. (2003) "Modeling the internet and the web". Wiley.
- Chakrabarti S. (2003) "Mining the web". Morgan Kaufmann.
- Langville A. & Meyer C. (2006) "Google's PageRank and beyond: the science of search engine rankings". Princeton University Press.
- Baeza-Yates R. & Ribeiro-Neto B. (1999) "Modern information retrieval". Addison Wesley.
- Weiss S., Indurkhya N., Zhang T. & Damerau F. (2005) "Text mining". Springer.
Internationalisation
/
Corporate features
x conference
Skills
x presentation skills
x writing skills
x team work
x individual autonomy
x problem solving
x critical thinking
Techniques and tools for teaching and learning
x IT tools
x Internet work
x modelling
x quantitative methods
x mathematics
x technology and science