Evolution of automated feedback in SFL: Analysis of Claude Sonnet 3.7 and 4.0 as evaluators
DOI
https://doi.org/10.25267/Tavira.2025.i30.1107
Abstract
This research analyses the evolution of feedback capabilities across two versions of the Claude Sonnet language model, 3.7 and 4.0, used as correction tools for texts written by students of Spanish as a foreign language. Through a qualitative comparative analysis of 15 texts from the CEDEL2 corpus, the study evaluates accuracy in error detection, explanatory clarity, pedagogical appropriateness, and the problems detected. Claude 4.0 increases error detection by 17% (189 vs. 161 errors) and shows greater sophistication in adapting to proficiency level, focusing on fundamental errors for beginners while providing comprehensive analysis for advanced students. The newer version also improves structural organisation through a tripartite format: ‘error → correction → explanation’. However, it presents worrying pedagogical setbacks: it eliminates the complementary activities characteristic of Claude 3.7, reduces motivational feedback to generic third-person comments, and maintains biases towards Peninsular varieties and hypercorrection. More problematic is the interlinguistic interference generated by corrections that mix Spanish and English, resulting in inappropriate Spanglish. The analysis confirms that neither version can function autonomously without teacher mediation; their optimal role is as complementary tools under active pedagogical supervision. The findings show that technological evolution in educational AI does not constitute linear improvement, revealing complex trade-offs between technical sophistication and pedagogical adequacy.
License
Copyright (c) 2025 Antoni Brosa Rodríguez

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication of their work, which is simultaneously licensed under the Creative Commons Attribution License. The work may be copied, used, disseminated, transmitted, and publicly displayed, provided that the authorship, URL, and journal are cited and that the work is not used for commercial purposes. Derivative works are not permitted.
- Authors may adopt other non-exclusive licensing arrangements for distribution of the published version of the work (e.g. depositing it in an institutional repository or publishing it in a monographic volume), provided that the initial publication in this journal is indicated.
- Authors may disseminate their work on the Internet (e.g. in institutional repositories or on their own website), which can lead to productive exchanges and increase citations of the published work (see The Open Access Effect).

