MODEL OF AUTOMATED TEST EVALUATION USING GENERATIVE AI TOOLS

Goran Aritonović

doi:10.35120/sciencej0502227a

Authors

Goran Aritonović, Belgrade Business and Arts Academy of Applied Studies, Belgrade, Serbia

Keywords:

generative AI, automated evaluation, test assessment, prompt engineering, educational technology

Abstract

This paper presents a model for automated test evaluation based on generative AI tools. The proposed approach is structured as a sequential evaluation pipeline that uses predefined prompts to process student responses through five clearly defined phases: defining the answer key, applying scoring rules, analyzing responses at the question level, generating individual reports, and aggregating results. A key contribution of the paper is a controlled prompt protocol that decomposes a complex evaluation task into smaller, verifiable steps, improving transparency and reliability. The model also supports human-in-the-loop interaction, allowing instructors to intervene at each stage of the evaluation process and maintain pedagogical control. Unlike traditional approaches in which AI is used as a general-purpose assistant, the proposed model defines a structured workflow that ensures consistent and repeatable evaluation. This design reduces the risk of uncontrolled AI behavior and makes it possible to locate potential errors within individual phases of the pipeline. The model was applied to test evaluation in a real educational setting. The results indicate a significant improvement in efficiency, with an approximately 95% reduction in evaluation time compared to manual grading. The system also applies grading criteria with a high level of consistency, reducing subjectivity in assessment. System reliability was observed to depend on the quality of input data, particularly where student identification data had been entered manually. The findings suggest that limitations stem not only from the AI model itself but also from data acquisition and preprocessing. The proposed model shows strong potential for application in educational contexts where efficient and transparent test evaluation is required.
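The phased structure described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not give the prompts, data formats, or scoring rules, so every name here (`evaluate`, `score_answer`, `Reviewer`, the phase labels) is a hypothetical stand-in, and the generative AI call is replaced by a deterministic exact-match comparison so the example runs without any external service. What it shows is the shape of the idea: each phase produces an output that an instructor-supplied reviewer function may inspect or correct before the next phase starts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical types: the paper's real report format is not given in the abstract.
@dataclass
class QuestionResult:
    question: str
    given: str
    expected: str
    points: float

@dataclass
class StudentReport:
    student_id: str
    results: List[QuestionResult] = field(default_factory=list)

    @property
    def total(self) -> float:
        return sum(r.points for r in self.results)

# A reviewer gets the phase name and that phase's output and returns the
# (possibly corrected) output. This models the human-in-the-loop step.
Reviewer = Callable[[str, Any], Any]

def accept_all(phase: str, output: Any) -> Any:
    """Default reviewer: approve every phase output unchanged."""
    return output

def score_answer(given: str, expected: str, points: float) -> float:
    """Stand-in for the model call (assumption): exact match after trimming
    whitespace and ignoring case."""
    return points if given.strip().lower() == expected.strip().lower() else 0.0

def evaluate(
    answer_key: Dict[str, str],
    submissions: Dict[str, Dict[str, str]],
    points_per_question: float = 1.0,
    review: Reviewer = accept_all,
) -> Dict[str, float]:
    # Phase 1: define the answer key.
    key = review("answer_key", dict(answer_key))
    # Phase 2: define the scoring rules.
    rules = review("scoring_rules", {"points_per_question": points_per_question})
    # Phases 3 and 4: question-level analysis, then one report per student.
    reports = []
    for student_id, answers in submissions.items():
        results = [
            QuestionResult(
                question=q,
                given=answers.get(q, ""),
                expected=expected,
                points=score_answer(
                    answers.get(q, ""), expected, rules["points_per_question"]
                ),
            )
            for q, expected in key.items()
        ]
        reports.append(review("individual_report", StudentReport(student_id, results)))
    # Phase 5: aggregate the per-student totals.
    return review("aggregate", {r.student_id: r.total for r in reports})
```

Passing a custom `review` function is how an instructor would step in under this sketch, for example to correct a wrong entry in the answer key before any response is scored. Because each phase is reviewed separately, an error can be traced to the phase that introduced it.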
