Spoken Learner Corpus Colloquium

Centre for English Corpus Linguistics, University of Louvain, Belgium

24-25 January, 2008



Viktoria BÖRJESSON (Göteborg University, Sweden)

The use of reinforcing and attenuating modifiers of adjectives in Swedish advanced learners' English

Investigating advanced learner use of adjective modifiers (cf. e.g. Granger 1998 and Lorenz 1999) is of considerable interest. In particular, two phenomena can be studied in relation to overuse of these modifiers, viz. over-reinforcement and over-hedging of qualifications or evaluations. Apart from these two presumably sociolinguistic phenomena, possibly based on power relations between student and teacher or between non-native and native speaker, differences in the use of adjective modifiers may indicate L1 transfer or cultural differences with regard to rhetorical tradition or teachers' instructions. A third reason for frequency differences between the corpora investigated might be deficient vocabulary skills, leading to less variation and overrepresentation of some modifiers, and collocational misuse, in some cases due to restricted collocability of the modifier and the adjective modified.

By outlining the differences between NNS and NS spoken texts in the frequencies of modifiers of adjectives and adjectives modified, we may select overused modifiers for closer investigation, look at their distribution and context, and thus contribute to our knowledge of advanced students' interlanguage as well as transfer and power relations. This preliminary study of the use of adjective modifiers in the Swedish part of LINDSEI focuses on the frequencies of the adjective modifiers identified, such as adverbs (e.g. quite, really), NPs (e.g. a bit, a little bit), PPs (e.g. in a way) and sentences (e.g. I would say), in both pre- and postmodification. The frequencies have been normalised and compared with the spoken part of ICE-GB, the British part of the International Corpus of English. It was found that out of the 14 most frequent adjective modifiers in LINDSEI, nine were considerably overused (more than 86%), four were slightly overused and only one modifier was underused. Overused modifiers were analysed as regards distribution across the material and the adjectives modified. Furthermore, a general comparison was made with written learner texts from the SWICLE corpus, the outcome of which showed differences in vocabulary choices, frequency and style.
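As an illustration of the kind of comparison described above, the following minimal sketch normalises raw counts to a common base and expresses the learner rate relative to the native rate. The function names, the base of 10,000 tokens and all counts are hypothetical and are not taken from the study; reading "overuse" as a relative percentage difference is likewise an assumption.

```python
def per_10k(raw_count: int, corpus_tokens: int) -> float:
    """Normalise a raw count to a rate per 10,000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * 10_000 / corpus_tokens


def relative_difference(learner_rate: float, native_rate: float) -> float:
    """Percentage by which the learner rate exceeds (+) or falls below (-) the native rate."""
    if native_rate <= 0:
        raise ValueError("native_rate must be positive")
    return (learner_rate - native_rate) / native_rate * 100


# Hypothetical counts for one modifier (not figures from LINDSEI or ICE-GB).
learner_rate = per_10k(raw_count=93, corpus_tokens=100_000)
native_rate = per_10k(raw_count=250, corpus_tokens=500_000)
overuse = relative_difference(learner_rate, native_rate)

print(f"learner: {learner_rate:.1f} per 10,000 tokens")
print(f"native:  {native_rate:.1f} per 10,000 tokens")
print(f"relative difference: {overuse:+.0f}%")
```

Normalising before comparing matters because the two corpora differ in size; a statistical test such as log-likelihood would normally accompany a comparison of this kind, but is left out of this sketch.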


Xiao Sylvia CHEN (South China Normal University, Guangzhou, PR China – City University of Hong Kong, Hong Kong SAR)

The Chinese subcorpus of LINDSEI: Corpus compilation, preliminary studies and future development

The Chinese subcorpus was one of the earliest components of LINDSEI compiled and studied. This presentation first introduces background information on the data and issues relating to its collection and transcription. Some of the issues are specific to Chinese learners while others are more general in nature and may apply to other subcorpora. One of the issues is how to work out a set of transcription conventions that are consistent and yet flexible enough to represent L1-specific features appropriately. Another concerns the significance of directions and prompts. Subtle variations, whether linguistic or non-linguistic, may elicit very different data. The second part of the presentation gives a summary of the preliminary studies done so far by members of the Chinese team. They cover areas such as pronunciation, cohesive devices, epistemic markers, verb collocations and so forth. The author's previous work on person reference and her present PhD project on discourse competence are discussed in greater detail. The presentation concludes with a discussion of directions for future research and corpus development.


Sylvie DE COCK (Université catholique de Louvain, Belgium)

And yeah, it was really good! Positive stance in native and learner speech

This paper sets out to investigate attitudinal stance (Biber & Conrad 2000) in native and learner speech. The focus of the paper is on the adjectives that native speakers and French-speaking advanced EFL learners use to convey positive attitudinal stance in informal interviews and more specifically on the preferred patterns in which these adjectives are used. Positive attitudinal stance in native and learner speech will be analysed using LOCNEC (the Louvain Corpus of Native English Conversation) and the French subcorpus of the Louvain International Database of Spoken English Interlanguage (LINDSEI).


Ourania HATZIDAKI (Hellenic Air Force Academy, Greece)

‘I don't know I'm very negative to this whole thing’: Hedging in the speech of Greek learners of English

This paper presents the results of a contrastive analysis of hedging in the LOCNEC and Greek LINDSEI corpora. As a basis for the present investigation we use the phrase I don't know, which is by far the most frequent 3-word unit in both corpora. The various discourse functions of this multifaceted phrase are examined in the light of the two existing studies of I don't know in English native speech (Tsui 1991 and Diani 2004). The specialised character of the two corpora (asymmetry of interlocutors, fixed agenda, unidirectional interviewing; Pulcini & Furiassi 2004) is taken into consideration. The analysis of the present data shows that both English and Greek speakers employ this phrase to fulfill similar pragmatic needs (e.g. to express uncertainty about the truth value of a statement or the correctness of an interpretation on the basis of evidence, indecisiveness or insecurity about future prospects, doubt about their own abilities, ambivalence about their feelings; also to minimize the effect of a potentially face-threatening view, to build rapport with the interviewer and so on). Greek learners also use this phrase to express encoding difficulties (cf. de Cock 2004).

We also look into the collocational characteristics of I don't know in the speech of English native speakers and Greek learners, particularly with regard to the widely held view that hedging devices tend to co-occur (Aijmer 2002, Stenström 1994). It emerges from the data that, although a certain degree of hedge clustering is present in the Greek learner data, this occurs to a much lesser extent than in the English native speaker corpus and with a significantly narrower range of hesitation markers. Moreover, Greek learners systematically use a totally different technique rarely employed by native speakers: they juxtapose a single hesitation marker such as I don't know with an intensified or categorical statement, thus making their hedged utterances sound apologetic, defensive or insufficiently justified (Hatzidaki 2006).


Aijmer, K. (2002) English Discourse Particles: Evidence from a Corpus. Amsterdam: John Benjamins.

De Cock, S. (2004) “Preferred sequences of words in NS and NNS speech”. Belgian Journal of English Language and Literatures, New Series 2: 225-246.

Diani, G. (2004) “The discourse functions of I don't know in English conversation”. In K. Aijmer & A.-B. Stenström (eds) Discourse Patterns in Spoken and Written Corpora. Amsterdam: John Benjamins.

Hatzidaki, O. (2006) “Evaluation of the spoken lexicophraseological skills of Greek university students of English: A corpus-based approach”. In Proceedings of the International Conference “Foreign Language Teaching in Tertiary Education”, 9-10 June 2005, Epirus Institute of Technology, Dionikos Publications, Athens.

Pulcini, V. & C. Furiassi (2004) “Spoken interaction and discourse markers in a corpus of learner English”. In A. Partington, J. Morley & L. Haarman (eds) Corpora and Discourse (pp. 107-123). Bern: Peter Lang.

Stenström, A.-B. (1994) An Introduction to Spoken Interaction. London: Longman.

Tsui, A.B.M. (1991) “The pragmatic functions of ‘I don't know’”. Text 11(4): 607-622.


Claire HUGON (Université catholique de Louvain, Belgium)

Exploring register variation in learner lexis: The high-frequency verb make in native and learner speech and writing

High-frequency verbs are notoriously difficult for learners, even at the advanced level. They are extremely versatile, as illustrated by their high degree of polysemy and their high rate of phraseological uses, not to mention the uncommonly large number of phrasal and prepositional verbs into which they enter. To complicate matters further, the various meanings and patterns are distributed unevenly across registers (cf. McCarthy & Carter 1997, Biber et al. 1999). In the corpus linguistics literature, however, high-frequency verbs are often treated as though they were a unified phenomenon, irrespective of register distinctions or polysemy.

In this presentation, I will report on a corpus-based study of a representative of this group of verbs, viz. make, in native and learner speech and writing. The study is based on the French subcorpora of LINDSEI and ICLE, as well as their native counterparts, the LOCNEC and LOCNESS corpora. The speech-writing comparison in native English confirms that the profile of make varies considerably according to the register. As for the native-learner comparison, it reveals a scale of proficiency in learners' productive competence, going from highly limited to near-perfect knowledge of native-like use of make. Some pedagogical implications will be examined, and the study will be situated in the larger context of my research on the acquisition of high-frequency verbs.


Joanna JENDRYCZKA-WIERSZYCKA (Adam Mickiewicz University, Poland)

Sort of native speech: On the use of vagueness tags & its possible sources in the first language

Vague language is regarded as “one of the most important features of the vocabulary of informal conversation” (Crystal & Davy 1975). Vagueness tags (VTs) are understood here as natural speech phenomena occurring, among other situations, at moments of memory loss, or when the speaker lacks or is unaware of a precise equivalent of the word he or she wants to use (Crystal & Davy 1975). Sequences such as I mean I, sort of like, you know or stuff like that exemplify this phenomenon.

Since VTs are widely recognized to be underused in non-native English varieties, the need for a comparison of the data with the use of VTs in the learner's L1 follows as a matter of course. However, while the literature on VTs in English as both L1 and L2 is abundant, the topic appears to have been neglected in Polish linguistics. The present paper attempts to bridge this gap for Polish VTs.

In order to see whether Polglish users underuse VTs because VTs have a low incidence in the users' L1, an inventory of Polish VTs was compiled using the corpus of the Polish language of the younger generation of Poles (Otwinowska-Kasztelanic 2000). The hypothesis is that Polish speakers use a smaller percentage of VTs in their L1 than English speakers do in theirs. This paper presents the inventory of Polish VTs and investigates to what extent it influences the speakers' L2 performance.

The procedures of investigation, based on Biber et al.'s (1999) corpus-driven 'recurrent word combination' (RWC) method, will be briefly outlined. The function of the software that extracts RWCs and measures differences in frequency, thus allowing for IL comparisons, will also be explained.
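To make the idea of recurrent word combinations concrete, the sketch below counts contiguous word sequences of a fixed length and keeps those that recur at least a given number of times. It is only a simplified illustration: the function name, the frequency threshold and the sample utterance are hypothetical, and it is not the software or the exact procedure used in the study.

```python
from collections import Counter


def recurrent_combinations(tokens, n, min_freq=2):
    """Return contiguous n-word sequences occurring at least min_freq times."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of n tokens over the text and count each sequence.
    counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


# Invented sample utterance, tokenised on whitespace (not corpus data).
sample = "i don't know i mean i don't know sort of like you know".split()

print(recurrent_combinations(sample, n=3))
print(recurrent_combinations(sample, n=2))
```

In practice the resulting frequency lists for learner and native corpora would be normalised for corpus size before being compared.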


Aijmer, Karin (2002) English Discourse Particles: Evidence from a Corpus. Amsterdam: John Benjamins.

Altenberg, Bengt (1990) “Speech as linear composition”. In G. Caie, K. Haastrup, A.L. Jakobsen, J.E. Nielsen, J. Sevaldsen, H. Specht & A. Zettersten (eds) Proceedings from the Fourth Nordic Conference for English Studies, Helsingor, May 11-13 1989 (pp. 133-143). Copenhagen University, Department of English.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan (eds) (1999) Longman Grammar of Spoken and Written English. Harlow: Longman.

Channell, Joanna (1994) Vague Language. Oxford: Oxford University Press.

Crystal, David & Derek Davy (1975) Advanced Conversational English. London: Longman.

De Cock, Sylvie (1998) “A recurrent word combination approach to the study of formulae in the speech of native and non-native speakers of English”. International Journal of Corpus Linguistics 3(1): 59-80.

De Cock, Sylvie (2004) “Preferred sequences of words in NS and NNS speech”. Belgian Journal of English Language and Literatures (BELL), New Series 2: 225-246.

Granger, Sylviane (1996) “From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora”. In Karin Aijmer, Bengt Altenberg & Mats Johansson (eds) Languages in Contrast. Text-based Cross-linguistic Studies (pp. 37-51). Lund: Lund University Press.

Jendryczka-Wierszycka, Joanna (2006) Lexical bundles in Polish learner speech: a study based on the PLINDSEI corpus of spoken learner English. Unpublished M.A. thesis.

Kay, Paul (1997) Words and the Grammar of Context. Chapter 6: The kind of/sort of construction (pp. 145-158). Stanford, CA: CSLI Publications.

Otwinowska-Kasztelanic, Agnieszka (2000) A Study of the Lexico-Semantic and Grammatical Influence of English on the Polish of the Younger Generation of Poles (19-35 years of age). Warszawa: Wydawnictwo Akademickie Dialog.

Otwinowska-Kasztelanic, Agnieszka (2000) Korpus języka mówionego młodego pokolenia Polaków (19-35 lat). Warszawa: Wydawnictwo Akademickie Dialog.

Scott, Mike (1998) WordSmith Tools Manual, Version 3.0. Oxford: Oxford University Press. (http://www.lexically.net/wordsmith/) (date of access: 16 May 2007)

Usoniene, A. (ed.) (1998) Proceedings from the International Conference on Germanic and Baltic Linguistic Studies and Translation. University of Vilnius, 22-24 April 1998. Vilnius: Homo Liber.


Tomoko KANEKO & Takako KOBAYASHI (Showa Women's University, Japan)

Use of verbs by Japanese learners of English - Past tense and “feel”

In this paper, we will first briefly describe the activities previously conducted by the Japanese LINDSEI group.

Next, a report on a study of the use of the past tense forms in the LINDSEI Japanese Sub-corpus will follow. It was found that the learners correctly used various types of past tense forms only 50 to 66% of the time and they often failed to mark past tense forms of state verbs, especially those expressing “feeling”.

In the second study, English learner corpora from four different learner groups (Japanese, Chinese, Italian and French speakers) will be analyzed, together with two native speaker corpora for comparison with the learner data. The results revealed that the usage of “feel” shows distinct tendencies in each learner corpus, and that these do not necessarily conform to the tendencies in the native speaker data. This suggests, as Wierzbicka (1999) claimed, that expressions corresponding to the English word “feel” exist universally, expressing “physical internal sense” or “mental state, emotion, etc.”, but that the use of the verb “feel” is affected by the first languages of learners and their socio-cultural experiences.

Finally, we would like to conclude our presentation with two directions for future study: 1) to bring together the lexicogrammatical studies conducted so far by each member of the Japanese LINDSEI group, and 2) to extend our area of study into pragmatics.


Susanne KAEMMERER (Justus-Liebig-Universität Giessen, Germany)

Error-tagging the German component of LINDSEI: principles, problems, decisions

In this presentation we will sketch out some of the problems our team had with error tagging the German subcorpus of LINDSEI. For the error-tagging process, we used the UCL Error Editor. Some of the problems we will report on are related to the fact that the UCL Editor was designed primarily for written data.

The focus of this talk will be on the tagset. We suggest a number of new tags, which we have already implemented in the tag menu of the editor and which we use as normal categories in the German subcorpus. We will also make suggestions for potential changes within the error tagging manual for all LINDSEI teams.


Pascual PÉREZ-PAREDES (Universidad de Murcia, Spain)

A multidimensional analysis of spoken learner language: The LINDSEI register

Multidimensional Analysis (MA) of language offers researchers a framework for the interpretation of language use that incorporates the notion of register. Broadly speaking, MA shifts our attention from a comparison between two or more registers to the notion of language variation as a continuum which shows major trends of functionality that go beyond the individual analysis of particular linguistic features.

MA studies have explored linguistic variation using a corpus-based methodology that benefits from both quantitative and qualitative research methods. In this way, MA has played an important role in the description of the English language. Nevertheless, the relevance of MA may go beyond the purely linguistic boundaries of description and prove useful in the teaching and learning of foreign languages. Disappointingly, the MA quantitative approach adopted by Biber (1988, 2003) has not attracted the attention of the learner language research community so far.

In this paper, I will explore the insights that learner language researchers can gain by using MA. To do so, I will consider the register that emerges from the interview elicitation tasks that have been used for the compilation of LINDSEI. The corpus I am using here has been contributed by 59 Spanish native speakers, learners of English in their first year of the Degree in English Studies at UMU (Universidad de Murcia, Spain). The results here will be instrumental in discussing the range and scope of the researchers' debate on the way we interpret learner language and how we characterize it against the notion of language use in general and that of register in particular.


Virginia PULCINI (University of Turin, Italy)

Evaluation and point of view in the oral production of Italian learners of English

This paper is a contribution to the study of Italian learners' strategies in the management of spoken interaction in English, a research topic already framed, though tentatively, in previous research (Pulcini & Furiassi 2004). The focus is on the expression of evaluation and attitudinal stance, a topical issue in contemporary linguistics (Thompson & Hunston 2003, Anderson & Bamford 2004). The LINDSEI format lends itself to this type of pragmatic analysis, since the corpus consists of interviews in which learners are invited to discuss different topics (film, play, experience, countries, a story to retell) and express their opinions on everyday issues or personal events and ideas. Evaluation in spoken discourse may emerge in different ways and in various degrees of explicitness, usually through the use of evaluative language, markers of subjectivity (I think) and elements of interpersonal metadiscourse (maybe, you know) typical of the spoken mode. Parameters of evaluation may be expressed in terms of “good or bad”, “likelihood”, “expectedness” and “importance”, and are conveyed through conceptual and linguistic signals including lexical, grammatical and textual items of discourse. To this end, data have been extracted from the LINDSEI-It and analysed. The most common evaluative adjectives (good, different, important, interesting) extracted from the frequency list and their recurrent collocations with nouns or modifiers (very, really) have been analysed. Following a reverse path, we have searched for the evaluative adjectives associated with the “topic words” of the interviews (film, play, experience, countries, girl/woman/lady). In addition, other markers of evaluation have been considered, such as private verbs (I think, I believe, I know etc.), markers of subjectivity (in my opinion) and modal verbs indicating possibility (may), necessity (should) and prediction (will, may), the latter being an area of difficulty for learners of English.


Anderson, L. & J. Bamford (eds) (2004) Evaluation in Oral and Written Academic Discourse. Roma: Officina Edizioni.

Pulcini, V. (2004) “A corpus of ‘informal academic interviews’: the Italian Component of the LINDSEI project”. In M.T. Prat Zagrebelsky (ed.) Computer Learner Corpora: Theoretical Issues and Empirical Case Studies of Italian Advanced EFL Learners' Interlanguage (pp. 177-192). Alessandria: Edizioni dell'Orso.

Pulcini, V. & C. Furiassi (2004) “Spoken interaction and discourse markers in a corpus of learner English”. In A. Partington, J. Morley & L. Haarman (eds) Corpora and Discourse (pp. 107-123). Bern: Peter Lang.

Thompson, G. & S. Hunston (2003) “Evaluation: an Introduction”. In S. Hunston & G. Thompson (eds) Evaluation in Text. Authorial Stance and the Construction of Discourse (pp. 1-27). Oxford: Oxford University Press.