MODEL OF AUTOMATED TEST EVALUATION USING GENERATIVE AI TOOLS
DOI:
https://doi.org/10.35120/sciencej0502227a
Keywords:
generative AI, automated evaluation, test assessment, prompt engineering, educational technology
Abstract
This paper presents a model for automated test evaluation based on the application of generative AI tools. The proposed approach is structured as a sequential evaluation pipeline that uses predefined prompts to process student responses through clearly defined phases. The evaluation process includes defining the answer key, applying scoring rules, analyzing responses at the question level, generating individual reports, and aggregating results. A key contribution of the paper is the definition of a controlled prompt protocol that decomposes a complex evaluation task into smaller, verifiable steps, improving transparency and reliability. The model also supports human-in-the-loop interaction, allowing instructors to intervene at each stage of the evaluation process and maintain pedagogical control. Unlike traditional approaches where AI is used as a general-purpose assistant, the proposed model defines a structured workflow that ensures consistent and repeatable evaluation. This design reduces the risk of uncontrolled AI behavior and enables precise identification of potential errors within individual phases of the pipeline. The model was applied in a real educational setting, where test evaluation was performed using the proposed pipeline. The results indicate a significant improvement in efficiency, with an approximate 95% reduction in evaluation time compared to manual grading. Additionally, the system demonstrates a high level of consistency in applying grading criteria, reducing subjectivity in assessment. It was observed that system reliability depends on the quality of input data, particularly in cases involving manually entered student identification data. The findings suggest that limitations are not solely related to the AI model itself, but also to data acquisition and preprocessing. The proposed model shows strong potential for application in various educational contexts where efficient and transparent test evaluation is required.
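The sequential, phase-by-phase structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the phase identifiers, the `{context}` placeholder, the `run_pipeline` function, the `review` callback, and `stub_model` are all hypothetical names chosen for illustration, and the generative AI call is replaced by a stand-in so that the example runs without any external service. It only shows the control flow the abstract names: predefined prompts executed in a fixed order, with an instructor checkpoint after each phase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical phase identifiers, in the order the abstract lists them.
PHASES: List[str] = [
    "define_answer_key",
    "apply_scoring_rules",
    "analyze_responses",
    "generate_reports",
    "aggregate_results",
]


@dataclass
class PhaseResult:
    phase: str       # which phase produced this record
    prompt: str      # the exact prompt sent, kept for later inspection
    output: str      # raw model output for this phase
    approved: bool   # instructor decision at the checkpoint


def run_pipeline(
    prompts: Dict[str, str],
    model: Callable[[str], str],
    review: Callable[[str, str], bool],
) -> List[PhaseResult]:
    """Run the phases in order and stop at the first one the instructor rejects.

    `prompts` maps each phase to a predefined template; the literal text
    "{context}" in a template is replaced by the previous phase's output.
    """
    results: List[PhaseResult] = []
    context = ""
    for phase in PHASES:
        prompt = prompts[phase].replace("{context}", context)
        output = model(prompt)
        approved = review(phase, output)  # human-in-the-loop checkpoint
        results.append(PhaseResult(phase, prompt, output, approved))
        if not approved:
            break  # the rejected phase is the last record in the log
        context = output
    return results


def stub_model(prompt: str) -> str:
    """Stand-in for a generative AI call; returns a placeholder, not a grade."""
    return f"<output for {len(prompt)}-character prompt>"
```

Because every phase is recorded together with its prompt and output, a rejected or faulty result can be traced to one specific step rather than to the evaluation as a whole, which is the property the abstract attributes to the controlled prompt protocol.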
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



