Data Collection

The project draws on a series of developmental data collected from different sources, each assigned to one of the following «pools»:

  • Pool A: It comprises handwritten texts produced during the regular school schedule, in class or at home, but under the teachers’ supervision. All texts are digitized and stored chronologically in an electronic format that includes metadata such as the writing conditions, the target genre (+/- formal style, +/- recipient) and the use of reference materials (dictionaries, textbooks, etc.).
  • Pool B: It includes handwritten texts produced during the extra writing practice courses organized by the research team and offered free of charge to all project participants. Once a month (the extra writing practice courses usually open in the last week of the month), each learner writes a task suited to his/her language level within a fixed time of 30 minutes. The expected output over one academic year (October to May) is therefore about 8 tasks per learner. All productions are digitized and e-mailed back to their writers with personalized linguistic and metalinguistic feedback on all kinds of errors (grammatical, lexical, pragmatic). Moreover, all texts are edited and presented anew in an error-free form, so that each learner may benefit from a second draft of his/her initial production (Ellis & Barkhuizen, 2005).
  • Pool C: It comprises oral productions by the same learners on tasks similar to those described for the previous pool. Oral data elicitation takes the form of informal interaction between learner-native speaker pairs and occurs twice a year (1st collection: December to February; 2nd collection: April to May). Transcription of the spoken material is broad orthographic, marking basic features of spontaneous discourse such as overlap, pauses, interruption, lengthening, etc., and it allows us to compare the oral and written output of the same learner. A digital copy of all spoken texts allows for more detailed transcriptions should the need arise in the future. Furthermore, a direct feedback form assessing the learners’ discourse competence as well as their grammatical and lexical weaknesses is provided at the end of each spoken interaction.
  • Pool D: It serves as the final record of the learners’ performance and contains written and oral material drawn from the School achievement test, known as «Veveosi Elinomathias», which corresponds to the B2 level (CEF, 2001) and is held every May. Each subject participating in the project will thus be followed under different output conditions (+/- testing pressure, +/- teacher intervention) and may contribute to the database his/her final productive skill outcome, which will be assessed by external evaluators.
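
The metadata that accompany each stored production could be modelled roughly as follows. This is only an illustrative sketch: the actual storage format of the corpus is not described here, and every name in it (`Pool`, `TextRecord`, `chronological`, and all field names) is hypothetical, chosen to mirror the features mentioned in the pool descriptions above.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class Pool(Enum):
    """The four pools described above (labels are illustrative)."""
    A = "regular school schedule, written"
    B = "extra writing practice course, written"
    C = "informal interaction, oral"
    D = "achievement test, written and oral"


@dataclass
class TextRecord:
    """One digitized production with its metadata (hypothetical schema)."""
    learner_id: str                 # assumed anonymized learner code
    pool: Pool
    collected_on: date              # used for chronological storage
    writing_conditions: str         # e.g. "in class", "at home"
    formal_style: bool              # target genre: +/- formal style
    has_recipient: bool             # target genre: +/- recipient
    reference_materials: List[str] = field(default_factory=list)
    testing_pressure: bool = False      # output condition: +/- testing pressure
    teacher_intervention: bool = False  # output condition: +/- teacher intervention


def chronological(records: List[TextRecord], learner_id: str) -> List[TextRecord]:
    """Return one learner's records across all pools, oldest first."""
    own = [r for r in records if r.learner_id == learner_id]
    return sorted(own, key=lambda r: r.collected_on)
```

Keeping the pool and the output conditions as explicit fields on each record would make it straightforward to follow a single learner across pools over time, which is the kind of comparison the design of the four pools is meant to support.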