Due to the COVID-19 crisis, the information below is subject to change, in particular the teaching mode (in-person, distance, or comodal/hybrid).
5 credits
22.5 h
Q1
Teacher(s)
Fairon Cédrick; Tack Anaïs (substituting for Fairon Cédrick)
Language
French
Main themes
The course begins with an architectural study of a complex automatic natural language processing (ANLP) system (recognition, analysis, generation). It continues with the central linguistic theories and computational formalisms of ANLP. Special attention is given to presenting and analysing real-world applications.
Aims
At the end of this learning unit, the student is able to:
1. The course teaches students the basic theory needed to understand the current objectives and issues of automatic natural language processing (ANLP). Students also learn to analyse and explain the practical and technical limits that arise when building computer systems for language processing (ambiguity, the need to adapt linguistic resources, multilingualism, etc.). By the end of the course, students will have an overview of the state of the art in ANLP, be able to take a critical view of ANLP applications, and have a general knowledge of the main theories in the field.
Content
The course is given as interactive lectures. A reader of specialized articles helps students prepare for each lecture.
Course Outline:
- The domain of NLP (naming, historical overview, levels of analysis)
- Text encoding and preprocessing
- Formal languages (regular expressions, finite-state automata)
- Probabilistic language models (notions of probability, n-gram models)
- Lexical resources (electronic dictionaries, etc.)
- Lemmatization
- POS-tagging (rule-based approach, HMMs)
- Formal grammars (Chomsky hierarchy, context-free grammars)
- Syntactic parsing (general principles, alternatives)
- Lexical semantics (thesauri, ontologies, WordNet)
- Vector semantics (distributionalism, word embeddings)
Evaluation methods
Due to the COVID-19 crisis, the information in this section is particularly likely to change.
- Two practical assignments carried out during the semester [6 points]
- Written (or oral) examination covering mainly the lecture content and, to a lesser extent, key concepts from the required readings [14 points]
Other information
English-friendly course: taught in French, with support available in English.
Teaching materials
- Jurafsky & Martin, "Speech and Language Processing" (2nd edition)
Faculty or entity
FIAL
Programmes offering this learning unit
Master [120] in Data Science : Statistic
Master [120] in Linguistics
Master [120] in Ancient and Modern Languages and Literatures
Master [120] in French and Romance Languages and Literatures : French as a Foreign Language