Automatic Evaluation

1,245 papers

47 followers

About this topic

Automatic evaluation refers to the use of algorithms and computational methods to assess the quality or performance of systems, models, or outputs, particularly in fields such as natural language processing, machine learning, and educational assessment, enabling objective and efficient measurement without human intervention.

Key research themes

1. How can automated methods accurately and robustly evaluate subjective and open-ended written responses?

This theme investigates computational approaches to automatically assess the quality of subjective textual answers, such as essays or short answers, focusing on techniques that handle the complexity and variability of natural language in educational contexts. It addresses challenges in modeling semantic similarity, handling rater bias, multilingual support, and feedback provision to enhance evaluation accuracy and instructional utility.
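As a rough illustration of what "modeling semantic similarity" can mean at its simplest, the sketch below scores a short answer by the cosine similarity between its bag-of-words vector and that of a reference answer. This is only a lexical-overlap baseline written for this page, not the method of any paper listed here; the function names (`tokenize`, `cosine_similarity`, `score_answer`) and the 5-point scale are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the tokens the two texts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


def score_answer(student: str, reference: str, max_points: float = 5.0) -> float:
    """Map similarity to a hypothetical point scale (assumed: 0 to max_points)."""
    return round(max_points * cosine_similarity(student, reference), 2)


if __name__ == "__main__":
    reference = "Photosynthesis converts light into chemical energy."
    student = "Plants use photosynthesis to turn light into energy."
    print(score_answer(student, reference))
```

A baseline like this rewards shared vocabulary only, so a correct paraphrase scores low and a wrong answer that reuses the reference wording scores high. That gap is one reason the work in this theme turns to semantic features and learned models.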

Beyond Automated Essay Scoring

by David Mudou

2019

Key finding: This work established that early automated essay scoring systems, such as Project Essay Grader (PEG), successfully correlated surface textual features (e.g., average word length, essay length) with human scores (up to R=.78),...

Automated Essay Scoring in the Presence of Biased Ratings

by Evelin Amorim

2021

Key finding: This study revealed that human rater bias, manifested in subjective comments, systematically affects automated essay scoring (AES) models trained to mimic such scores. Using lexicon-based analyses and subjectivity measures on...

GradeAid: a framework for automatic short answers grading in educational contexts—design, implementation and evaluation

by Emiliano del Gobbo

2023, Knowledge and Information Systems

Key finding: GradeAid integrates lexical and semantic features analyzed via state-of-the-art regression models to automatically score short student answers across multiple languages and heterogeneous datasets. Its robust validation,...

Subjective Answer Evaluator

by IJRASET Publication

2022, International Journal for Research in Applied Science and Engineering Technology (IJRASET)

Key finding: This paper proposed an unsupervised two-stage system combining text summarization to extract key information and advanced neural language models (BERT, XLNet) fine-tuned on challenging datasets to evaluate subjective answers....

A Multilingual Application for Automated Essay Scoring

by Celia María Pérez Marqués

2023, Lecture Notes in Computer Science

Key finding: Pioneering a bilingual AES system for Spanish and Basque, this work developed NLP pipelines integrating spell checking, lexical variability, and discourse features leveraging language-specific resources. A client-server...

2. How can implicit user interaction data and dialogue act modeling enable automatic evaluation of intelligent assistants across diverse tasks?

This research area focuses on developing scalable, consistent automatic evaluation frameworks for voice-activated intelligent assistants that perform multiple, heterogeneous tasks (e.g., voice commands, web search, chat). It leverages implicit user feedback derived from user-system interaction logging, and models dialog actions in a task-independent manner to predict user satisfaction and key system components' performance, enabling cost-effective and continuous quality assessment without human annotations.

Automatic Online Evaluation of Intelligent Assistants

by Imed Zitouni

2021, Proceedings of the 24th International Conference on World Wide Web - WWW '15

Key finding: This paper introduced a novel evaluation model that predicts user satisfaction with intelligent assistants by classifying user-system interactions into task-independent dialog actions using a Markov model over action...

3. What are the critical considerations for fairness, transparency, and interpretability in automatic evaluation metrics across AI systems?

This theme explores challenges in the representativeness, bias, and interpretability of automatic evaluation metrics in AI, including fairness concerns in scoring and evaluation transparency. Research addresses how aggregate metrics may mask critical performance disparities, the impact of biased training data on evaluation fairness, and proposes methodological innovations for transparent, interpretable reporting and fair scoring frameworks that consider social and ethical dimensions.

Fairly evaluating and scoring items in a data set

by Abolfazl Asudeh

2022, Proceedings of the VLDB Endowment

Key finding: This tutorial synthesized challenges and frameworks for responsible scoring of data items, highlighting that fairness is multi-faceted and context-dependent with competing definitions. It emphasized that biased social data...

Rethink reporting of evaluation results in AI

by Anthony Cohn

2024, Science

Key finding: The authors argued that prevailing AI evaluation practices relying on aggregate metrics impede nuanced understanding of system capabilities and failure modes. They demonstrated that lack of availability of instance-level...

Transparent Human Evaluation for Image Captioning

by Noah Smith

2023, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Key finding: This paper designed a rubric-based human evaluation protocol for image captioning that separately quantifies precision, recall, fluency, conciseness, and inclusiveness, revealing critical gaps in standard automatic metrics...

Computer-supported Techniques to Increase Students Engagement in Programming

by Paula Correia Tavares

2023, Proceedings of the 8th International Conference on Computer Supported Education

Key finding: While focused on pedagogy, this study underscored the importance of timely, automated feedback mechanisms to motivate learners and improve outcomes in programming education. It demonstrated that computer-supported tools...

All papers in Automatic Evaluation

A Matrix-Based Heuristic Algorithm for Extracting Multiword Expressions from a Corpus

by Orhan Bilgin

2025

This paper describes an algorithm for automatically extracting multiword expressions (MWEs) from a corpus. The algorithm is node-based, i.e. it extracts MWEs that contain the item specified by the user, using a fixed window-size around the...

Computer-supported Techniques to Increase Students Engagement in Programming

by Paula Correia Tavares

2025

One of the main reasons for student failure in (introductory) programming courses is a lack of motivation, which hampers the knowledge acquisition process and affects learning results. As soon as students face the...

EMOVAL : évaluation automatique de la valence et de l’activation émotionnelles des textes à l’aide d’une méta-norme de 5656 mots-racines

by Guy Denhiere

2025, Psychologie Française

) and 12 calibrated texts with positive emotional valence (joy and pleasant surprise) and negative emotional valence (fear, anger, disgust, sadness, and unpleasant surprise). The two types of tests carried out confirm the psychological relevance of EMOVAL....

Lexical cohesion for evaluation of machine translation at document level

by Jonathan Webster

2025, 2011 7th International Conference on Natural Language Processing and Knowledge Engineering

This paper studies how granularity of machine translation evaluation can be extended from sentence to document level. While most state-of-the-art evaluation metrics focus on the sentence level, we emphasize the importance of document...

Summary Street®: Computer Support for Comprehension and Writing

by Nina Johnson

2025, Journal of Educational Computing Research

Having students express their understanding of difficult, new material in their own words is an effective method to deepen their comprehension and learning. Summary Street® is a computer tutor that offers a supportive context for students...

The Question Answering Systems : A Survey

by Mohamed Haggag

2025

Question Answering (QA) is a specialized area in the field of Information Retrieval (IR). QA systems are concerned with providing relevant answers in response to questions posed in natural language. QA is therefore composed of...

DESIGN AND DEVELOPMENT OF SECTION 508 COMPLIANT CUSTOM UI COMPONENTS: A TECHNICAL PERSPECTIVE

by VENKATA PRASANNA KUMAR PENTAKOTA

2025, DESIGN AND DEVELOPMENT OF SECTION 508 COMPLIANT CUSTOM UI COMPONENTS

Building Accessible Custom UI Controls: A Comprehensive Guide for 508 Compliance addresses the critical need for developing inclusive web applications through customized user interface components. This technical article explores the...

A hybrid statistical/linguistic model for generating news story gists

by Joe Carthy

2025

In this paper, we describe a News Story Gisting system that generates a 10-word short summary of a news story. This system uses a machine learning technique to combine linguistic, statistical and positional information in order to...

Machine Learning Approach to Augmenting News Headline Generation

by Joe Carthy

2025

In this paper, we present the HybridTrim system which uses a machine learning technique to combine linguistic, statistical and positional information to identify topic labels for headlines in a text. We compare our system with the Topiary...

QT Dispersion and Dipyridamole-Induced Myocardial Ischemia

by Yoto Yotov

2025, Scripta scientifica medica

The relationship between QT interval dispersion and dipyridamole-induced, transient myocardial ischemia was assessed in 32 male patients with ischemic heart disease. A standardized, high dose dipyridamole-ECG stress test was used as...

Putting Human Assessments of Machine Translation Systems in Order

by Adam Lopez

2025

Human assessment is often considered the gold standard in evaluation of translation systems. But in order for the evaluation to be meaningful, the rankings obtained from human assessment must be consistent and repeatable. Recent analysis...

Open Source Graph Transducer Interpreter and Grammar Development Environment

by Bernd Bohnet

2025

Graph and tree transducers have been applied in many NLP areas, among them machine translation, summarization, parsing, and text generation. In particular, the successful use of tree rewriting transducers for the introduction of syntactic...

Significance tests of automatic machine translation evaluation metrics

by Ying Zhang

2025, Machine Translation

Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU, METEOR and the related NIST metric, are becoming increasingly important in MT research and development. This paper presents a significance test-driven...

शाबर मंत्र: उत्पत्ति, स्वरूप एवं लोकमान्यता का समाजशास्त्रीय अध्ययन

by Dheeraj Pratap Mitra

2025, International Journal of Arts, Humanities and Social Studies

Shabar mantras hold a distinctive place in Indian religious and spiritual traditions and are deeply rooted in folk practices and tantric beliefs. Unlike classical Sanskrit mantras, these mantras are composed in regional languages, which makes them easier for ordinary people to access and understand. This article analyzes the historical development, social acceptance, and cultural significance of Shabar mantras, and highlights their role in averting crises, protection, healing, and prosperity. It also sheds light on the contribution of Guru Gorakhnath and the Nath sect, which illustrates how Shabar mantras became integrated into the broader religious landscape. The efficacy of Shabar mantras nevertheless remains disputed, and they are viewed with skepticism because of their esoteric nature and the absence of scientific evidence. Some critics argue that the effectiveness of these mantras is mainly psychological, shaped by belief, autosuggestion, and the like rather than by any supernatural power; even so, with the spread of digital media, their continued relevance and popularity have grown in modern times. The article also considers the commercialization of Shabar mantras, their legal and ethical aspects, and their role in modern society. From a sociological perspective, studying these mantras is also necessary for understanding how traditional spiritual beliefs are being preserved and re-established in contemporary society.

Chapter AUTOMATING GUIDELINES INSPECTION From Web site Specification to Deployment

by Mouhamed Diouf

2025

Abstract: This work focuses on how we can improve automatic evaluation based on guidelines inspection throughout the life cycle of Web applications by mapping guideline concepts to different artifacts produced during the development...

Eigengestures for Natural Human Computer Interface

by Jaroslaw A Miszczak

2025, Advances in Intelligent and Soft Computing

We present the application of Principal Component Analysis for data acquired during the design of a natural gesture interface. We investigate the concept of an eigengesture for motion capture hand gesture data and present the...

Aging effects on query flow graphs for query suggestion

by CARLOS CASTILLO

2025, Proceedings of the 18th ACM conference on Information and knowledge management

World Wide Web content continuously grows in size and importance. Furthermore, users ask Web search engines to satisfy increasingly disparate information needs. New techniques and tools are constantly developed aimed at assisting users in...

Developing a RESTful API for a Web Accessibility Evaluation Tool

by Salvador Otón

2025

Nowadays, the evaluation of the accessibility of Enterprise Web Information Systems is based on the Web Content Accessibility Guidelines (WCAG 2.0), created by the World Wide Web Consortium (W3C) in 2008 and adopted in 2012 by the...

Wireless intelligent sensor network for autonomous structural health monitoring

by Kerop Janoyan

2025, SPIE Proceedings

Life cycle monitoring of civil infrastructure such as bridges and buildings is critical to the long-term operational cost and safety of aging structures. The widespread use of Structural Health Monitoring (SHM) systems is limited due to...

Microsoft Word - conscience.5.doc

by Tracy Finn

2025

Abstract: This paper uses a neural theory of emotional consciousness to develop a novel account of conscience and moral intuition. Emotions are both cognitive appraisals and somatic perceptions, performed simultaneously by interacting...

A Hybrid Approach for Multiword Expression Identification

by Profa. Helena Caseli

2025, Lecture Notes in Computer Science

Considerable attention has been given to the problem of Multiword Expression (MWE) identification and treatment, for NLP tasks like parsing and generation, to improve the quality of results. Statistical methods have often been employed...

Text Generation Models for Paraphrase on Kazakh Language

by Nurzhan Mukazhanov

2025, Vestnik KazUTB

This study delves into the relatively unexplored domain of natural language processing for the Kazakh language, a language with limited computational resources. The paper dissects the effectiveness of diffusion models and transformers in...

PREFER: Using a Graph-Based Approach to Generate Paraphrases for Language Learning

by Jason Chang

2025

Paraphrasing is an important aspect of language competence; however, EFL learners have long had difficulty paraphrasing in their writing owing to their limited language proficiency. Therefore, automatic paraphrase suggestion systems can...

Lexical cohesion for evaluation of machine translation at document level

by Cecilia Pun

2024, 2011 7th International Conference on Natural Language Processing and Knowledge Engineering

Towards Cross-Version Harmonic Analysis of Music

by Meinard Müller

2024, IEEE Transactions on Multimedia

For a given piece of music, there often exist multiple versions belonging to the symbolic (e.g., MIDI representations), acoustic (audio recordings), or visual (sheet music) domain. Each type of information allows for applying specialized,...

Advances in cancer tissue microarray technology: Towards improved understanding and diagnostics

by David Foran

2024, Analytica Chimica Acta

Over the past few years, tissue microarray (TMA) technology has been established as a standard method for assessing the expression of proteins or genes across large sets of tissue specimens. It is being adopted increasingly among leading...

Combining case-based reasoning systems and support vector regression to evaluate the atmosphere–ocean interaction

by Juan Antonio Bran De Paz

2024, Knowledge and Information Systems

This work presents a system for automatically evaluating the interaction that exists between the atmosphere and the ocean's surface. Monitoring and evaluating the ocean's carbon exchange process is a function that requires working with a...

An overview of Web search evaluation methods

by Rashid Ali

2024, Computers & Electrical Engineering

Web search evaluation is the process of measuring the effectiveness of a Web search system. Such an evaluation helps in identifying the most effective one and helps the users to find the required information with less effort. Web search...

Overview of the IWSLT 2005 Evaluation Campaign

by C. Hori

2024, Proc. of the International Workshop on Spoken …

This paper reports an overview of the evaluation campaign results of the IWSLT 2005 workshop. The BTEC corpus, which consists of typical travel domain phrases, was used. Data for the five language pairs Arabic/Chinese/Japanese/Korean to...

Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques

by Profa. Helena Caseli

2024

Automatic lexical alignment is a vital step for empirical machine translation, and although good results can be obtained with existing models (e.g. Giza++), more precise alignment is still needed for successfully handling complex...

Web Accessibility Evaluation Tools: A Survey and Some Improvements

by Luis Álvarez

2024, Electronic Notes in Theoretical Computer Science

Web Content Accessibility Guidelines (WCAG) from W3C consist of a set of 65 checkpoints or specifications that Web pages should satisfy in order to be accessible to people with disabilities or using alternative browsers. Many of these...

Kohonen and counterpropagation artificial neural networks in analytical chemistry

by Jure Zupan

2024, Chemometrics and Intelligent Laboratory Systems

The principles of the Kohonen and counterpropagation artificial neural network (K-ANN and CP-ANN) learning strategy are described. The use of both methods (with the emphasis on CP-ANNs) is explained using several examples from analytical...

Proceso metodológico para el análisis comparativo de validadores automáticos de accesibilidad Web

by valentina mosquera morales

2024, Informador Técnico

Web accessibility is the characteristic that allows anyone, regardless of their condition, to access the content of websites. The use of automatic validators makes it possible to carry out a first analysis of...

Latent Semantic Analysis Parameters for Essay Evaluation using Small-Scale Corpora*

by JOSE FREDDY VIDAL LEON

2024, Journal of Quantitative Linguistics

Some previous studies (e.g. that carried out by Van Bruggen et al. in 2004) have pointed to a need for additional research in order to firmly establish the usefulness of LSA (latent semantic analysis) parameters for automatic evaluation...

Extending the Entity-based Coherence Model with Multiple Ranks

by Graeme Hirst

2024, Conference of the European Chapter of the Association for Computational Linguistics

We extend the original entity-based coherence model (Barzilay and Lapata, 2008) by learning from more fine-grained coherence preferences in training data. We associate multiple ranks with the set of permutations originating from the same...

Reversing the affective congruency effect: The role of target word frequency of occurrence

by Oscar Ybarra

2024, Journal of Experimental Social Psychology

In this research the outcome of an affective priming experiment is shown to critically depend on the frequency of occurrence of the target words used. Low frequency target words (5.7 occurrences per million words) resulted in an affective...

Verification of Bangla Sentence Structure using N-Gram

by Nur Hossain Khan

2024, Global journal of computer science and technology

Statistical N-gram language modeling is used in many domains, such as spelling and syntactic verification, speech recognition, machine translation, character recognition, and others. This paper describes a system for sentence structure...

Sistema Colaborativo de Suporte à Aprendizagem em Grupo da Programação—SICAS-COL

by Antonio Carlos Neves Mendes

2024

Many students have difficulty understanding and developing algorithms. To try to help these students, the SICAS environment was created, an individual work environment based on the animation and simulation of algorithms. In this...

Peranan Ergonomik Dalam Rekabentuk Kerusi Sekolah : Kajian Kes di Sekitar Perlis, Kedah dan Pulau Pinang

by Noor Azlina Mohamed Khalid

2024

Ergonomics plays an important role in the design of school chairs. Awkward sitting posture resulting from unsuitable chair design can contribute to negative effects on children's health. This issue...

The Automatic Evaluation of Novel Stimuli

by MAGDA GARCIA

2024, Psychological Science

From classic theory and research in psychology, we distill a broad theoretical statement that evaluative responding can be immediate, unintentional, implicit, stimulus based, and linked directly to approach and avoidance motives. This...

Latent Semantic Analysis Parameters for Essay Evaluation using Small-Scale Corpora*

by JOSE EDUARDO SALAZAR LEON

2024, Journal of Quantitative Linguistics

Search right and thou shalt find ... Using Web Queries for Learner Error Detection

by Claudia Leacock

2024

We investigate the use of web search queries for detecting errors in non-native writing. Distinguishing a correct sequence of words from a sequence with a learner error is a baseline task that any error detection and correction system...

Using Statistical Techniques and Web Search to Correct ESL Errors

by Claudia Leacock

2024, CALICO Journal

In this paper we present a system for automatic correction of errors made by learners of English. The system has two novel aspects. First, machine-learned classifiers trained on large amounts of native data and a very large language model...

Latent Semantic Analysis Parameters for Essay Evaluation using Small-Scale Corpora*

by jose perez leon

2024, Journal of Quantitative Linguistics

Evaluation of a machine translation system for low resource languages: METIS-II

by Olga Yannoutsou

2024

In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is...

UPC-CORE: What Can Machine Translation Evaluation Metrics and Wikipedia Do for Estimating Semantic Textual Similarity?

by Jordi Turmo

2024

In this paper we discuss our participation in the 2013 SemEval Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation, originally intended to evaluate automatic against...

EMOVAL : évaluation automatique de la valence et de l’activation émotionnelles des textes à l’aide d’une méta-norme de 5656 mots-racines

by Nicolas Leveau

2024, Psychologie Française

An Analysis of Students Summaries using Summary Sentence Decomposition

by Norisma Idris

2024

Investigation of AM-FM methods for mammographic breast density classification

by Styliani Petroudi and

2024

Breasts are composed of a mixture of fibrous and glandular tissue as well as adipose tissue, and breast density describes the prevalence of fibroglandular tissue as it appears on a mammogram. Over the past few years, evaluation and...
