Advance notice: ‘Corpus Linguistics with R’ and ‘Statistics for linguistics with R’ bootcamps by S.T. Gries

Louvain-la-Neuve, Belgium, August 2019

The Linguistics Research Unit of the Institute of Language and Communication (Université catholique de Louvain, Belgium) will host two 30-hour bootcamps by Stefan Gries in August 2019.

The ‘Corpus Linguistics with R’ bootcamp (12-16 Aug 2019) is a hands-on introduction to using the programming language R for the analysis of textual data (mostly corpora, but in principle also literary works, web data, etc.). It is based on the second edition (2016) of Gries’s textbook Quantitative corpus linguistics with R and introduces a variety of programming constructs required for text processing and corpus exploration, including:

  • building word frequency lists and computing type-token ratios;
  • computing dispersion and keyword statistics;
  • extracting concordance lines.

To that end, we will discuss the relevant functions and data structures, control flow structures such as loops and conditionals, and a sizable number of regular expressions; time permitting, we will also cover the basics of data visualization. The data used in this course come from a variety of differently formatted/annotated corpora and will also include one or two examples of literary works and/or XML processing.

The ‘Statistics for linguistics with R’ bootcamp (19-23 Aug 2019) is a hands-on introduction to statistical methods for both graduate students and seasoned researchers and is based on the second edition (2013) of Gries’s textbook Statistics for linguistics with R. The course is intended for linguists who already have a basic knowledge of statistics and some experience using R, and who wish to improve their proficiency in the statistical analysis of linguistic data. Using the open-source software and programming language R, we will:

  • briefly recap basic aspects of statistical evaluation as well as several descriptive statistics;
  • briefly discuss a selection of monofactorial statistical tests for frequencies, means, and correlations, and how they constitute special (limiting) cases of regression methods;
  • explore different kinds of multifactorial and multivariate methods, in particular different kinds of regression approaches (fixed-effects-only and mixed-effects modelling) as well as classification trees and random forests.

Details about the previous edition of the ‘Statistics for linguistics with R’ bootcamp in LLN are available at: For info about the prerequisites, visit

The website for the two events will be online in early 2019, and online registration will start on 1 March 2019. It will be possible to register for one event only, but priority will be given to people who register for both. The number of participants is limited. If you would like to participate, mark the dates in your diary!

Contact email:

Magali Paquot