AILA '96

11th World Congress of Applied Linguistics

Exploiting Computerized Learner Corpora (CLC)

Jyväskylä, Finland
4-9 August 1996

The recent proliferation of large computerised corpora has provided a huge stimulus for linguistic research. However, the majority of these corpora contain native-speaker productions, and researchers have not had access to a homogeneous corpus of learner language that could be compared with similar native-speaker language, or indeed used to compare one type of learner language with another.

The International Corpus of Learner English (ICLE) was compiled to fill this gap. Centralised at the University of Louvain, Belgium, it has been collected in collaboration with several universities. The corpus, which now stands at over 1 million words, is made up of argumentative essays written by university students of English from 11 mother-tongue backgrounds (Chinese, Czech, Dutch, Finnish, French-speaking, German, Japanese, Polish, Russian, Spanish, Swedish).

A homogeneous corpus of learner writing such as ICLE provides excellent material for SLA research in many different areas. The new insights it yields into learner language can be applied to ELT materials design and classroom methodology.

The symposium will cover all aspects of computerised learner corpora: compilation of learner data; computer processing of the data (concordancing, tagging, parsing, etc.); computer-aided studies of learner grammar, lexis and discourse; and pedagogical applications (ELT dictionaries, electronic writing aids, etc.).

The speakers, who will be mainly (but not exclusively) ICLE participants, will present the results of research thus far and discuss future research goals. Though the focus will be on English, the organisers would like to encourage people working on other languages to attend. As the main aim of the symposium is to generate discussion in this fast-developing field, the programme will build in ample time for audience feedback, both after each paper and at the end of the day.

Symposium Organisers

Speakers and titles of papers

Symposium Outline

Exploiting Computerised Learner Corpora (CLC)

Speakers at this CLC symposium will present the results of recent research in this new and fast-expanding field. Research areas include data compilation, processing and annotation; (semi-)automatic studies of learner grammar, lexis and discourse; and pedagogical applications. Audience feedback is actively sought.


A case study will be presented of two highly frequent polysemous verbs, 'find' and 'want', both of which show a wide variety of complementation patterns. Their occurrence in native and non-native English will be compared in terms of their 'bare' frequencies as well as their syntactic contexts and senses.

This paper examines the underuse and overuse of various lexical, grammatical and discoursal features in argumentative essays written by advanced Swedish learners of English. The Swedish learners are shown to deviate significantly from native British writers in several respects.

This general introduction to the field of computerised learner corpora (CLC) addresses three main questions: (1) How does one compile and process a CLC? (2) What insights into learner language do CLC provide? (3) What contribution can CLC data make to both SLA theory and classroom methodology?

This paper examines quantifiable patterns of linguistic overuse and underuse in written EFL discourse, focusing particularly on contrasts between Polish learner writing and native English writing. It is proposed that such findings could be used effectively to improve writing textbooks, particularly in the area of lexis (undue repetition, superfluous phraseology, etc.).

Adjective intensification is an important function of 'modal grammar'; it highlights the qualities we consider to be particularly relevant, extremely interesting or crucially important. This paper explores patterns of non-nativeness that are associated with advanced German learners' usage in argumentative essays, covering aspects of over- and underuse, collocability, delexicalisation and information structure.

Learner corpora present new issues and new applications for current language processing software. This contribution will survey the experience of the Universities of Lancaster and Louvain in using language processing techniques for machine-aided exploitation of learner corpora.

This paper demonstrates a prototype electronic language-learning and writing environment based on the analysis of a computer corpus of writing by learners of English in Hong Kong. L2 corpus analysis techniques and Computer-Assisted Language Learning (CALL) design issues are discussed.

Writer/reader visibility in discourse can be assessed through the presence or absence of certain linguistic features, among them personal pronouns, imperatives and direct questions. This paper investigates the distribution of these and other features of writer/reader involvement in the writing of learners from five language backgrounds and evaluates their impact on the effectiveness of the discourse.

This paper adopts a comparative, quantitative approach to the study of learner language, investigating differences in vocabulary frequencies (e.g. quantifiers, core adjectives and verbs, connectors, phrasal verbs) in learner and native English writing. These differences shed light on L1-induced and universal features of learner lexis and discourse.

The frequency of questions in the EFL writing of students from various L1 backgrounds has been found to vary. This paper explores the discourse functions of direct questions in advanced EFL writing, focusing primarily on argumentative strategies in the writing of Finnish-speaking EFL students. The findings will be compared with those for Anglo-American student writing.

Key Words

interlanguage, English as a Foreign Language (EFL), writing, computer, corpus
