What do annotators annotate?
An analysis of language teachers’ corpus pedagogical annotation.

Presented at the 8th international Teaching and Language Corpora conference, Lisbon

The paper presented at TALC 2008 has several annexes that, for lack of space, could not be published along with the text. They can be accessed here:

1. Introduction to the rationale behind corpus linguistics, the role of annotation in corpus linguistics and the relevance of annotation in the context of pedagogically-relevant corpora.

2. A video tutorial of SACODEYL Annotator.

3. Transcript of the text annotated by informants.


The System Aided Compilation and Open Distribution of European Youth Language (SACODEYL) EU initiative attempts to bring corpus-based materials and language learning experiences into the spotlight by implementing pedagogical annotation on multimedia, multilingual corpora of young Europeans aged between 13 and 18 years. To explore the potential use of pedagogical annotation by FLT professionals outside the SACODEYL consortium, we conducted two case studies on the annotation of an interview that forms part of the English corpus of SACODEYL.
One of the most neglected areas of research in corpus-based language teaching is the pedagogical annotation of corpora. The studies that address this emerging area are scarce and deal mainly with theoretical aspects (Braun 2005, 2006, 2007) and practical implementation issues (Pérez-Paredes and Alcaraz 2007). The pedagogical annotation of corpora differs significantly from the traditional linguistic focus on POS tagging, which, in turn, has dominated the TALC community debate. The latter is more concerned with the linguistic interpretation of data in the light of sampling, representativeness and expert appreciation of segmented discourse in context (Braun, Berglund and Pérez-Paredes 2007). In FLT contexts, however, pedagogic annotation may well serve a role of mediation between learners and corpus-based materials.

The two case studies we report involved professionals with different backgrounds and training experience. Both took part in a 90-minute training session which included (1) a short introduction to the role of annotation in corpus linguistics; (2) a tutorial on the annotation tool used for the annotation of SACODEYL corpora; and, finally, (3) the underlying reasons why pedagogic annotation may be of interest in the context of FLT. Before the annotation task, the teachers watched the interview and were given a transcript of it. After that, they were given 90 minutes to annotate the interview, which forms part of the English component of SACODEYL.
Our research discusses quantitative as well as qualitative data that emerge from the annotations analyzed. The former include an analysis of the different levels of annotation relevant in pedagogical corpus annotation (corpus, text and section levels) and different measures concerning the categories and keywords annotated. The qualitative information we have considered incorporates background data on the annotators as well as insights into the mediation role underlying the annotating task of the teachers in the study. Despite the exploratory scope of our study, the results show that, while they differ significantly in the annotation items targeted, the annotations share a common understanding of the role and scope of pedagogic annotation in language teaching. These results suggest that pedagogical annotation may become an important tool in a situation where teachers organize language learning experiences around the use of ad hoc, teacher-led corpus-based materials that focus on the specific needs of learners, as opposed to situations which make use of either raw texts, where no annotation is available, or POS-tagged corpora. We suggest that further research will be necessary to establish more solid links between pedagogical annotation and needs-driven selection of corpus-based materials in the language classroom.