Key research themes
1. How can automated methods accurately and robustly evaluate subjective and open-ended written responses?
This theme investigates computational approaches for automatically assessing the quality of subjective textual answers, such as essays or short answers, focusing on techniques that handle the complexity and variability of natural language in educational contexts. It addresses challenges in modeling semantic similarity, handling rater bias, supporting multiple languages, and providing feedback, with the aim of improving evaluation accuracy and instructional utility.
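As a concrete illustration of the similarity-based scoring such work builds on, the minimal sketch below compares a student answer to a reference answer and maps the similarity to a rubric band. The TF-IDF representation, the cut-off thresholds, and the example texts are assumptions for illustration, not a specific published system.

```python
# Minimal sketch: scoring a short answer against a reference answer via
# (lexical) similarity. Thresholds and the rubric mapping are illustrative
# assumptions, not taken from any cited system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_answer(reference: str, answer: str) -> float:
    """Return a similarity score in [0, 1] between reference and answer."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([reference, answer])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def to_rubric_band(similarity: float) -> str:
    """Map a similarity score to a coarse rubric band (assumed cut-offs)."""
    if similarity >= 0.75:
        return "full credit"
    if similarity >= 0.40:
        return "partial credit"
    return "no credit"

reference = "Photosynthesis converts light energy into chemical energy stored in glucose."
answer = "Plants use light to make glucose, storing the energy chemically."
sim = score_answer(reference, answer)
print(sim, to_rubric_band(sim))
```

In practice, research in this theme replaces the lexical representation with learned semantic encoders and calibrates thresholds against human rater judgments.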
2. How can implicit user interaction data and dialogue act modeling enable automatic evaluation of intelligent assistants across diverse tasks?
This research area focuses on developing scalable, consistent automatic evaluation frameworks for voice-activated intelligent assistants that perform multiple heterogeneous tasks (e.g., voice commands, web search, chat). It leverages implicit user feedback derived from user-system interaction logs and models dialogue acts in a task-independent manner to predict user satisfaction and the performance of key system components, enabling cost-effective, continuous quality assessment without human annotation.
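A minimal sketch of satisfaction prediction from implicit interaction signals is shown below. The feature set (query reformulations, dwell time, abandonment), the toy training data, and the logistic-regression model are assumptions standing in for the richer log-derived and dialogue-act features used in actual systems.

```python
# Minimal sketch: predicting per-session user satisfaction from implicit
# interaction signals logged by an intelligent assistant. Features, toy data,
# and model choice are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [num_query_reformulations, dwell_time_seconds, task_abandoned (0/1)]
X_train = np.array([
    [0, 45.0, 0],
    [1, 30.0, 0],
    [4,  5.0, 1],
    [3,  8.0, 1],
    [0, 60.0, 0],
    [5,  3.0, 1],
])
# 1 = satisfied, 0 = dissatisfied (labels collected offline to bootstrap the model)
y_train = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression().fit(X_train, y_train)

# New session parsed from interaction logs: 2 reformulations, 12 s dwell, not abandoned
new_session = np.array([[2, 12.0, 0]])
print("P(satisfied) =", clf.predict_proba(new_session)[0, 1])
```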
3. What are the critical considerations for fairness, transparency, and interpretability in automatic evaluation metrics across AI systems?
This theme explores challenges in the representativeness, bias, and interpretability of automatic evaluation metrics in AI, including fairness concerns in scoring and transparency in evaluation. Research addresses how aggregate metrics can mask critical performance disparities and how biased training data undermines evaluation fairness, and it proposes methodological innovations for transparent, interpretable reporting and fair scoring frameworks that account for social and ethical dimensions.
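The short example below illustrates the central concern that an aggregate metric can mask a severe per-group disparity. The group sizes and error rates are synthetic, invented purely for illustration.

```python
# Minimal sketch: an aggregate accuracy that hides a large per-group disparity.
# The data is synthetic and purely illustrative.
import numpy as np

# Predictions and labels for two groups; group "B" is a small minority of sessions
groups = np.array(["A"] * 90 + ["B"] * 10)
labels = np.ones(100, dtype=int)
preds = np.concatenate([
    np.ones(90, dtype=int),                          # group A: all correct
    np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),        # group B: only 10% correct
])

print("Overall accuracy:", (preds == labels).mean())  # 0.91 looks acceptable
for g in ("A", "B"):
    mask = groups == g
    print(f"Group {g} accuracy:", (preds[mask] == labels[mask]).mean())
```

Disaggregated reporting of this kind is one of the simpler methodological remedies this theme advocates; fair scoring frameworks go further by adjusting the metric or the model itself.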