Automated Evaluation

16 papers

21 followers

About this topic

Automated evaluation refers to the use of algorithms and software tools to assess and score responses, performances, or outputs in various domains, such as education, natural language processing, and software development, without human intervention. It aims to enhance efficiency, consistency, and objectivity in the evaluation process.


Key research themes

1. How can automated evaluation enhance programming education through precise and fair assessment with feedback?

This research area investigates the development and implementation of automated tools to accurately assess programming assignments, aiming to reduce manual grading errors, improve efficiency, and standardize evaluation. It focuses on both correctness and qualitative assessment dimensions, such as code structure, style, and performance, and addresses the challenge of providing detailed, consistent feedback to learners.

Automatic assessment of Java code

by Adam Khalid

2022, Computer Languages, Systems & Structures

Key finding: This paper presents JavAssess, a Java library enabling both black-box and white-box assessment methods that automatically inspect, test, mark, and correct student Java code. The controlled university study with 535 students...


A system for automatic evaluation of programs for correctness and performance

by Marco Bettoni

2021

Key finding: The described system fully automates testing, grading, and feedback generation for student programming assignments (currently C language), incorporating multiple test dimensions including random and user-defined inputs,...


A Web-Based Automated Code Assessment and Testing System for ICT Teachers

by Kasun Jinasena

2023, EMSJ

Key finding: This research introduces a web-based system for automated grading of programming assignments with instructor-created questions and automatic generation of test cases. A survey with 30 teachers showed significant positive...


Bibliometric Analysis of Automated Assessment in Programming Education: A Deeper Insight into Feedback

by Álvaro Figueira, PhD

2023, Electronics

Key finding: Through a bibliometric study (2010–2022), this work identified that continuous, individualized, and timely feedback is critical in automated programming assessment to enhance learner progress. The analysis underscores recent...


Answer Sheet Evaluation using AI

by International Journal of Scientific Research in Science and Technology IJSRST

2021, International Journal of Scientific Research in Science and Technology

Key finding: This project develops a digital platform leveraging Optical Character Recognition (OCR) and keyword-based methods to automate handwritten answer evaluation, aiming to significantly reduce teacher workload and eliminate bias....



2. How can implicit user behavior and interaction data be utilized for automated evaluation of intelligent assistants' effectiveness?

This research theme examines methods for automatically evaluating voice-activated intelligent assistants by leveraging implicit user feedback such as interaction patterns, satisfaction metrics, and acoustic signals. The goal is to create consistent, scalable, and task-agnostic evaluation frameworks that overcome the challenges posed by diverse, evolving tasks and reduce reliance on costly, manual ground-truth annotations.

Automatic Online Evaluation of Intelligent Assistants

by Imed Zitouni

2021, Proceedings of the 24th International Conference on World Wide Web - WWW '15

Key finding: The paper introduces a novel model that categorizes user-system interactions into task-independent dialog actions and uses Markov models plus features from requests, responses, clicks, and acoustic signals to predict user...


3. What are the challenges and solutions in automating subjective evaluation and feedback for written responses and complex open-ended answers?

This theme focuses on automating the evaluation of subjective, open-ended responses—such as essays, summaries, and diagrams—using natural language processing, semantic similarity measures, and diagrammatic analyses. It addresses the tension between capturing writing quality, providing instructional feedback, and ensuring evaluation fairness, especially when using indirect measures and avoiding superficial text features.

Beyond Automated Essay Scoring

by David Mudou

2019

Key finding: The paper critically assesses early automated essay scoring methods that predominantly relied on surface textual features (e.g., essay length, word frequency) shown to correlate moderately with human scores (multiple R =...


An Expert-free evaluation of Scientific Summary through Diagram

by Quang Vu

2023

Key finding: This study proposes a fully automated evaluation framework that compares student-generated diagrammatic summaries against concept graphs extracted directly from the source scientific text, requiring no expert input. It uses...


Realization of an intelligent evaluation system

by Mohamed Biniz

2023, International Journal of Informatics and Communication Technology (IJ-ICT)

Key finding: The work develops an intelligent evaluation system (IES) that measures semantic and syntactic similarity between student answers and model answers using techniques like TF-IDF weighting and cosine similarity. It enables...
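
As a rough illustration of the TF-IDF weighting and cosine similarity mentioned in this key finding, the sketch below scores a student answer against a model answer. It is not the paper's IES implementation: the whitespace tokenizer, the smoothed IDF formula, and the names `tfidf_vectors`, `cosine`, and `score_answer` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system
    # would likely add stemming and stop-word removal.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed IDF (assumed variant) so that terms shared by every
    # document still receive a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_answer(model_answer, student_answer):
    """Similarity in [0, 1] between a model answer and a student answer."""
    model_vec, student_vec = tfidf_vectors([model_answer, student_answer])
    return cosine(model_vec, student_vec)
```

A score near 1 means the student answer uses nearly the same weighted vocabulary as the model answer; a score of 0 means the two share no terms. Mapping that score to a grade would need thresholds the source does not give.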


Online Examination and Evaluation System

by IRJET Journal

2022, IRJET

Key finding: This paper surveys existing AI-based technologies for automating evaluation of subjective answers in online exams, comparing approaches using keyword matching, cosine similarity, Jaccard similarity, and transformer-based...
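
Two of the simpler measures named in this survey, keyword matching and Jaccard similarity, can be sketched in a few lines. This is only an illustrative sketch, not code from the surveyed systems; the function names `jaccard` and `keyword_coverage` and the whitespace tokenization are assumptions.

```python
def jaccard(answer_a, answer_b):
    """Jaccard similarity: size of the shared word set over the union."""
    set_a = set(answer_a.lower().split())
    set_b = set(answer_b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def keyword_coverage(keywords, answer):
    """Fraction of the expected keywords that appear in the answer."""
    if not keywords:
        return 0.0
    words = set(answer.lower().split())
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits / len(keywords)
```

For example, `jaccard("the cat sat", "the cat ran")` shares two words out of four distinct ones and so evaluates to 0.5. Both measures ignore word order and synonyms, which is one reason such surveys also consider embedding-based approaches.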



All papers in Automated Evaluation

Design and Implementation of A Tool to Integrate Automated Test Data Generation and Automatic Programming Assessment

by Farhan Tajudin

2025

Nowadays, manually assessing students’ programming exercises has been identified as among the toughest tasks for lecturers of programming courses, on top of their high routine workloads. Thus, Automatic Programming Assessment (or APA) has...


Teacher vs Artificial Intelligence: a comparison of the quality of feedback provided by a teacher and generative artificial intelligence in assessing students' creative writing

by Publishing House "Scientific and Educational Initiative"

2024, Perspectives of Science & Education

Introduction. Writing foreign-language creative writing assignments is one of the goals of foreign language
teaching in higher education. Modern AI tools (ChatGPT 4.0) are able to provide learners with evaluative feedback
and recommendations for essay revision. However, the feedback quality from ChatGPT 4 and other AI tools is a
subject of discussion in the teaching community. The aim of this paper is to compare the feedback quality provided
by ChatGPT 4.0 and teachers in evaluating students' essays.
Materials and methods. A bank of English essays (N=350) written by linguistics students (A2-B1 level) was used
as the material. The participants of the research were 12 teachers of English at Derzhavin Tambov State University
(Russian Federation). For every essay, one teacher and ChatGPT gave evaluative feedback on the following criteria:
1) content of the essay; 2) organisation and structure of the essay; 3) supporting ideas and arguments; 4) language
of the essay (lexical aspect of speech, grammatical aspect of speech, syntax); and 5) originality of the idea. The
recommendations received from the teacher and ChatGPT 4.0 were evaluated on the basis of norm-referenced
testing. The data analysis was carried out using the Student’s t-test.
Research results. It was found that ChatGPT matched the teacher in terms of the quality of evaluative feedback
for the following criteria: ‘content of the paper’ (t=0.24; p>0.05), ‘organisation and structure’ (t=1; p>0.05), and
‘supporting ideas and arguments’ (t=1.43; p>0.05). Moreover, ChatGPT outperformed the teacher (not a native
speaker) for the criteria ‘language of the essay’ (t=1.67; p≤0.05) and ‘originality of the essay’ (t=1.78; p≤0.05), which
is explained by the fact that the GPT language model was developed based on large English textual data. This
allowed the AI tool to be more accurate in assessing specifically the linguistic correctness of a written expression.
Conclusion. The novelty of the study is in the confirmation of the ability of the AI tool ChatGPT 4.0 to provide
qualitative feedback in assessing creative writing at the teacher's level or even better. The results of the study
support more intensive implementation of ChatGPT 4.0 in the process of teaching foreign language and assessing
the development level of students' writing skills.


Method of teaching students’ foreign language creative writing based on evaluative feedback from artificial intelligence

by Publishing House "Scientific and Educational Initiative"

2024, Perspectives of Science & Education

Introduction. The development of students' foreign language creative writing skills is a component
of the goal of foreign language teaching in higher education. The effectiveness of the development of
students' writing skills is largely determined by the method and tools of teaching. Artificial intelligence
(AI) tools are able to provide learners with evaluative feedback that can be used to finalize written creative
works in a foreign language. The aim of the paper is to develop a method of teaching students to write
foreign language creative works on the basis of evaluative feedback from artificial intelligence, to test its
effectiveness in the course of experimental training.
Materials and methods. The study involved 1st year students (N=50) of Derzhavin Tambov State University
(Russian Federation), majoring in English as a Foreign Language. In order to test the effectiveness of the
authors’ method, stages of training were developed during which students received evaluative feedback
from the AI tool in order to further refine their papers. Students of the control group developed writing
skills using the traditional method. The objects of control were: a) content of the paper; b) organization
and structure; c) support of ideas and arguments; d) vocabulary; e) grammar. The data analysis was carried
out using the Student’s t-test.
Research results. It was found that the method of teaching students foreign language creative writing
on the basis of evaluative feedback from artificial intelligence is more effective in comparison with the
traditional method in terms of the following criteria: a) content of creative paper (t=2.75; p≤0.05), b)
organization and structure (t=3.05; p≤0.05), and c) argumentation (t=2.44; p≤0.05). At the same time,
statistics did not reveal any growth in the development of students' lexical (t=2.13; p>0.05) and grammar
(t=2.13; p>0.05) skills, which is explained by the individual nature of AI recommendations and the
objective lack of stability in the lexical component of the teaching content when studying different
topics within the framework of one academic course.
Conclusion. The novelty of the study includes the development of stages of teaching students foreign
language creative writing based on evaluative feedback from artificial intelligence. The results obtained
can be used in the development of methods of teaching foreign languages with the use of AI-technologies.



HCMN 2024 PUBLISHED

by Lyra Xyza Sinfuego

2024, HCMCOUJS-Social Sciences

Table 2. The difference in the measure of effectiveness of checking test papers manually and using AOMR in terms of efficiency.

In Table 2, a substantial effect size is unveiled concerning the difference in time spent between manual test paper evaluation and AOMR technology (t(199) = 1.15, p < .05, d = 1.00). This finding indicates that utilizing AOMR for assessing test papers has a large effect in reducing the time spent compared to the manual process. This study’s outcomes clearly indicate that the implementation of AOMR technology offers a pronounced enhancement in grading efficiency when compared to traditional manual methods. The capacity of AOMR to process answer sheets quickly by examining marked responses allows for efficient score calculations and rapid result production. The study of Calaguas and Consunji (2022) revealed the same result. The empirical data strongly supports the notion that integrating technology into educational assessment practices, exemplified by AOMR, termed by the author as mobile OMR, can yield substantial gains in efficiency.

Note: If p < 0.05, reject the null hypothesis; otherwise, fail to reject.

Table 4. The difference in the level of accuracy and reliability in checking test papers manually and using AOMR.

Consistent with the findings of Virtus (2019), this study underscored the reliability and accuracy of both AOMR and manual procedures in the context of test paper checking. The proximity of mean values suggests that these methods produce comparable outcomes in terms of accuracy and reliability. Moreover, the convergence of results signifies a consistent application of grading standards.


Enhancing face-to-face evaluation using alternative optical mark recognition: A case study from the University of Cabuyao's college of education

by Ronnel P . Cuerdo

2024, HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE

Technology is defined as the use of scientific knowledge to solve practical problems. However, educators’ initiatives to integrate technology have been mostly prohibitively expensive. In this context, researchers proposed the automation of one of the most important but highly repetitive tasks among educators, the processing of student test results. The aim was to determine the alignment with evaluation standards and the acceptability of a cost-effective alternative. This study utilized a mixed-method approach, specifically concurrent triangulation. Quantitative and qualitative data were gathered concurrently, and the results were then compared and combined to get a comprehensive understanding of the topic. Quantitatively, it involved the use of mean, standard deviation, t-test, and Cohen’s d to evaluate Alternative Optical Mark Recognition (AOMR) according to the required educators’ evaluation standards and its impact on reducing educators’ clerical workload. Qualitatively, semi-structured interviews and thematic analysis were employed to elucidate educators’ perspectives regarding the use of AOMR and the broader integration of technology as a whole. Results showed an efficiency one hundred thirty times that of the manual process without losing the accuracy and reliability of data. Participants underscored the positive effect of AOMR in diminishing the labor-intensive nature of a crucial yet arduous clerical task for educators. Additionally, participants emphasized unexpected benefits, including email results distribution, backup e-copies of sheets, ease of data management, class record integration, and automated student ranking. These findings offer valuable insights into the challenges surrounding the integration of technology in educational contexts in general, shedding light on the advantages of AOMR in the evaluation of student test results in particular.
Keywords: alternative optical mark recognition; Evalbee; integrating automation; student test evaluation; technology in education


A Web-Based Automated Code Assessment and Testing System for ICT Teachers

by Kasun Jinasena

2023, EMSJ

Computer programming captivates the attention of both professionals and young learners due to its multidisciplinary applications and high-paying jobs. However, mastering computer programming requires critical thinking and consistent...


An Expert-free evaluation of Scientific Summary through Diagram

by Quang Vu

2023

This paper proposes a solution to evaluate a summary of a scientific article through diagram analysis. The model diagram used for evaluation is constructed solely based on the reading text, and does not require extra input from human...


Effectiveness of Automation in Evaluating Test Results Using EvalBee as an Alternative Optical Mark Recognition (OMR): A Quantitative-Evaluative Approach from a Philippine Public School

by Jomar Ison

2023, International Journal of Theory and Application in Elementary and Secondary School Education

In this study, the authors address the problem of overworked teachers in Philippine schools, whose excessive clerical responsibilities could lead to teacher attrition. The authors propose to automate the...


Web Accessibility in Africa: A Study of Three African Domains

by Luís Carriço

2022, Human-Computer Interaction – INTERACT 2013

Because the Web is the most used method for dissemination of information, especially for public services, it is of paramount importance that it is made accessible so as to allow all its users to access the content of its pages. In this paper, we...


Effectiveness of Automation in Evaluating Test Results Using EvalBee as an Alternative Optical Mark Recognition (OMR): A Quantitative-Evaluative Approach from a Philippine Public School

by Ronnel P . Cuerdo

2022, International Journal of Theory and Application in Elementary and Secondary School Education

Table 1. Level of efficiency of using manual vis-a-vis alternative OMR procedure in checking test papers and analyzing its results.

As shown in Table 1, the alternative OMR procedure is more efficient in checking test papers and analyzing its results, with a mean score of 1.05 (SD = 0.21), compared to the manual procedure, which obtained a mean evaluation of 4.76 (SD = 0.72). It indicates that alternative OMR has a better ability to check and analyze test items with less time, resources, and effort consumed. Moreover, this table shows how extreme the difference is between alternative OMR and manual processes with respect to time and effort consumed: "very efficient" and "inefficient", respectively. According to Karunanayak (2015), the most evident benefit of employing OMR technology in the collection of data from documents is its efficiency when compared to the manual process, which is a time-consuming and tedious task [12].

Level of Accuracy of Data Output

As presented in Table 2, both the alternative OMR and manual procedures produce very accurate results in checking test papers, with close mean scores of 1.01-1.03 (SD = 0.03 - 0.13); both also produce very accurate results in analyzing test papers, with close mean scores of 1.00-1.03 (SD = 0.02 - 0.12). This suggests that alternative OMR and manual procedures both provide very precise and error-free test checking and analysis. However, the standard deviations of 0.03 and 0.02 for the alternative OMR process in checking and item analysis, respectively, against 0.13 and 0.12 for the manual process, show a consistent difference in how "very accurate" those processes are.

Table 4. Level of efficiency of using manual vis-a-vis alternative OMR procedure in checking test

As presented in Table 4, there is a significant difference between the mean evaluation of alternative OMR and manual procedures in terms of their level of efficiency in checking test papers and analyzing their results. The figures in the table suggest that alternative OMR and manual procedures differ by 3.71 points in favor of the alternative OMR. This mean difference is statistically significant when tested at the .01 level, and produces a Cohen’s d value of 7.00 that signifies a large effect size. It can be inferred from Table 4 that alternative OMR is a more efficient procedure for checking and analyzing tests compared to the manual process. According to Virtus (2019), efficiency is the most evident benefit of employing optical mark recognition technology to acquire data from papers. In alternative OMR, documents are scanned and typed at a rate that is several times faster than a human can [7].
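
The reported effect size can be sanity-checked from the summary statistics quoted in this abstract (means of 1.05 and 4.76, SDs of 0.21 and 0.72). The sketch below is a minimal check, assuming equal group sizes so that the pooled SD is the root mean of the two variances; the function name `cohens_d_equal_n` is hypothetical and the excerpt does not state which pooling formula the authors used.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two groups of equal size (assumed)."""
    # Pooled SD for equal n: square root of the mean of the two variances.
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Values quoted in the abstract: manual 4.76 (SD 0.72), alternative OMR 1.05 (SD 0.21).
mean_difference = 4.76 - 1.05
d = cohens_d_equal_n(4.76, 0.72, 1.05, 0.21)
print(round(mean_difference, 2), round(d, 2))
```

Under that assumption the mean difference is 3.71 and d comes out at roughly 7.0, in line with the reported 7.00.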

Based on the results in Table 5, there is no significant difference between the level of accuracy of using manual vis-a-vis alternative OMR procedure in checking test papers and analyzing its results [t(48) = .913; t(48) = 1.109; p > .05]. This result means that the two procedures showed a comparable level of accuracy when they were evaluated by the 25 teachers who served as the respondents of the study. In fact, they are both evaluated as “very accurate”. The data shows that using an alternative OMR provides a quick assessment of a student's test results. Teachers, in fact, find it useful, especially in terms of its ability to compute for item analysis, which was previously a time-consuming operation. Data on the results will almost always be readily and promptly available. In the absence of an alternative OMR, each document must be painstakingly checked manually before being properly entered into a form or computer system. Although skilled individuals could become proficient with evaluating and inputting data from forms, there seems to be a real limit towards how quickly an individual could do the task [7]. Table 5 reveals the test of significant difference on the level of accuracy between manual and

Table 6. Level of reliability of using manual vis-a-vis alternative OMR procedure in checking test papers and analyzing its results

As described in Table 6, there is no significant difference between the level of reliability of using manual vis-a-vis alternative OMR procedure in checking test papers and analyzing its results [t(48) = .825; t(48) = 1.107; p > .05]. This result means that the two procedures showed a similar level of reliability when they were evaluated by the 25 teachers who served as the respondents of the study. In fact, they are both evaluated as “very reliable”.

Ronnel P. Cuerdo, Michael Jomar B. Ison, Christian Diols T. Oñate. International Journal of Theory and Application in Elementary and Secondary School Education (IJTAESE), Vol. 3 (2), 61-75. Effectiveness of Automation in Evaluating Test Results Using EvalBee as an Alternative Optical Mark Recognition (OMR): A Quantitative-Evaluative Approach from a Philippine Public School


Automatic System for Grading Multiple Choice Questions and Feedback Analysis

by Aniket Ashok Pawar

2022

This paper proposes a new idea for grading multiple-choice tests: a method that uses a personal computer, a scanner, and a program-based application to grade a specially designed MCQ exam test and feedback...

The answer sheet is made up of two elements: a title section, for identification of the exam and the candidate, and a response grid, for the candidate's answers to the questions. The personal-identification section appears on the upper part of the answer sheet. It serves to identify the exam candidate. Different personal-identification templates can be used with the software. Depending on the template used, you can insert text boxes for surname and given name, date of birth, and a student ID code.


Effectiveness of Automation in Evaluating Test Results Using EvalBee as an Alternative Optical Mark Recognition (OMR): A Quantitative-Evaluative Approach from a Philippine Public School

by Christian Diols Oñate

2022, International Journal of Theory and Application in Elementary and Secondary School Education


Effectiveness of Automation in Evaluating Test Results Using EvalBee as an Alternative Optical Mark Recognition (OMR): A Quantitative-Evaluative Approach from a Philippine Public School

by Ronnel Cuerdo

2022, International Journal of Theory and Application in Elementary and Secondary School Education


Evaluation of a Platform for Automated, Remote, In-Situ User EXperience Measurement

by Joke Kort

2022, Citeseer

A preliminary version of a platform for automated, remote, in-situ user experience measurement called TUMCAT was evaluated. The use of peer-to-peer software was monitored with it during five weeks and subjective data were gathered with the...


Effectiveness of Automation in Evaluating Test Results Using EvalBee as an Alternative Optical Mark Recognition (OMR): A Quantitative-Evaluative Approach from a Philippine Public School

by Michael Jomar Ison

2022


Evaluation of a Platform for Automated, Remote, In-Situ User EXperience Measurement

by J. Fokker

2022, Citeseer


EVALUACIÓN AUTOMATIZADA DE NARRACIONES EN EL ÁREA DE LENGUA EXTRANJERA. AUTOMATED STORY SCORING IN FOREIGN LANGUAGE CLASS.

by PIXEL-BIT. Revista de Medios y Educación

2019, Pixel-BIt. Revista de Medios y Educación.

The EssA program was created in 2004 as an instrument to help assess narratives in the foreign language area. Multiple regression analysis, based on lexical-grammatical variables, determined a regression equation that...
