Machine learning for learner English

This paper discusses machine learning techniques for predicting Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task over the six CEFR reference levels, which describe linguistic competence in a foreign language. The goal of the competition was to produce a machine learning system that predicts learners’ competence levels from two inputs: written productions of between 20 and 300 words, and a set of characteristics computed for each text. The texts were extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Alongside the description of the competition, we analyse the results and the methods proposed by the participants, and we discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings concern the methods used and the lexical bias introduced by the task.
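As an illustration of the task setup only, the sketch below treats CEFR prediction as classification over per-text features. It is not the method of any participant and does not use the competition's feature set: the three features, the nearest-centroid classifier, the function names and the two toy texts are all assumptions made for this example. The toy texts are invented and shorter than real competition texts.

```python
import math
from collections import Counter

# The six CEFR reference levels, from lowest to highest.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def text_features(text):
    """Compute three illustrative features for one learner text.

    These are assumptions for the sketch, not the competition's features:
    log word count, mean word length, and type-token ratio.
    """
    words = text.lower().split()
    n_words = len(words)
    mean_word_length = sum(len(w) for w in words) / n_words
    type_token_ratio = len(set(words)) / n_words
    return [math.log(n_words), mean_word_length, type_token_ratio]


def fit_centroids(texts, labels):
    """Average the feature vectors of the training texts for each level."""
    sums = {}
    counts = Counter()
    for text, label in zip(texts, labels):
        feats = text_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] += 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(centroids, text):
    """Return the level whose centroid is closest in Euclidean distance."""
    feats = text_features(text)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Invented toy texts (hypothetical, not taken from EFCAMDAT).
TOY_TEXTS = [
    "i like my dog my dog is big i like my cat",
    "although participation remains voluntary numerous employees "
    "nevertheless consider additional training indispensable",
]
TOY_LABELS = ["A1", "C1"]

if __name__ == "__main__":
    model = fit_centroids(TOY_TEXTS, TOY_LABELS)
    print(predict(model, "my cat is big and my dog is big"))
```

A real system would be trained on many labelled texts per level and evaluated on held-out data. Any feature that depends directly on vocabulary, such as the type-token ratio here, is exposed to the kind of lexical bias the abstract mentions.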

Keywords

CEFR; EFCAMDAT corpus; Language proficiency; Learners of English; Machine learning; Natural language processing (NLP)

Domains

Linguistics; Computation and Language [cs.CL]; Machine Learning [cs.LG]

Main file

Lessons_from_the_CAp_2018_Competition-Final-Authors-version.pdf (633.32 KB)

Origin: Files produced by the author(s)

Contributor: Thomas Gaillat

https://univ-rennes2.hal.science/hal-02496670

Submitted on: Wednesday, 23 September 2020, 08:52:21

Last modified on: Friday, 22 December 2023, 15:16:05

Long-term archiving on: Thursday, 24 December 2020, 18:33:58

Dates and versions

hal-02496670, version 1 (23-09-2020)

Identifiers

HAL Id: hal-02496670, version 1
DOI: 10.1075/ijlcr.18012.bal

Cite

Nicolas Ballier, Stéphane Canu, Caroline Petitjean, Gilles Gasso, Carlos Balhana, et al. Machine learning for learner English: A plea for creating learner data challenges. International Journal of Learner Corpus Research, 2020, 6 (1), pp. 72-103. ⟨10.1075/ijlcr.18012.bal⟩. ⟨hal-02496670⟩


Collections

UR2-HB INSA-ROUEN LITIS COMUE-NORMANDIE UNIV-RENNES2 UNIV-RENNES UNIROUEN UNILEHAVRE CLILLAC-ARP INSA-GROUPE LIDILE UP-SOCIETES-HUMANITES
