Academia.edu

Item Response Theory

7,777 papers
27,214 followers
About this topic
Item Response Theory (IRT) is a statistical framework used in psychometrics to model the relationship between individuals' latent traits and their item responses on assessments. It focuses on understanding how specific characteristics of test items influence the probability of a correct response, allowing for the evaluation of both item and test-taker abilities.
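As a rough illustration of how an item's characteristics and a test-taker's latent trait can jointly determine the probability of a correct response, here is a minimal sketch of the two-parameter logistic (2PL) model, one common IRT model. The choice of the 2PL and the names `prob_correct`, `item_information`, `theta`, `a`, and `b` are assumptions made for this example; they are not taken from the topic description.

```python
import math


def prob_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (illustrative sketch).

    theta: latent trait (ability) of the test-taker
    a:     item discrimination (hypothetical parameter name)
    b:     item difficulty (hypothetical parameter name)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Probability of a correct response rises with ability for a fixed item.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(prob_correct(theta, a=1.0, b=0.0), 3))
```

Under this model, a test-taker whose ability equals the item difficulty has a 50% chance of answering correctly, and the item is most informative at that point.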

Key research themes

1. How can Item Response Tree models capture complex response processes beyond traditional IRT outcomes?

This research theme explores advances in Item Response Theory (IRT) that model the internal cognitive or psychological decision processes influencing item response selection. Beyond assessing terminal item responses, item response tree models characterize sequential, nested, and multidimensional decision-making pathways. This detailed modeling offers nuanced insights into psychological assessments, response omissions, and the structure of Likert-type scale responses, addressing limitations of classical IRT models that treat responses as flat outcome categories.

Key finding: Jeon et al. (2015) introduce a generalized item response tree (IRT) model that flexibly incorporates node-specific parametric forms, dimensionality, and covariates, allowing models to capture complex decision processes in... Read more

2. What advantages does Item Response Theory offer over Classical Test Theory in psychological test development and measurement precision?

This research area investigates the methodological and practical benefits of Item Response Theory (IRT) compared to Classical Test Theory (CTT) in psychological and educational assessments. It focuses on how IRT models provide invariant item and person parameters, permit measurement precision to be quantified at varying trait levels, and support refined test development practices. These advantages are crucial for improving test validity, reliability, and interpretability, especially in scales with graded responses.

Key finding: This study applies the graded response model (GRM) of IRT to positive and negative affect scales, demonstrating that IRT estimates person abilities invariantly across different test forms and reveals item discrimination... Read more
Key finding: This paper articulates fundamental measurement issues in psychology and marketing research that IRT can address more effectively than CTT, such as balancing reliability and construct validity and handling item wording... Read more
Key finding: This paper outlines the advantages of IRT in educational and psychological test development, emphasizing its probabilistic modeling of item responses as functions of latent traits and its ability to address inherent... Read more

3. How can item-fit and model-data fit be accurately assessed in IRT to identify aberrant items and improve measurement validity?

This theme focuses on the development and evaluation of statistical methods for assessing the fit of IRT models at the item level, crucial for ensuring accurate parameter estimation and valid test scores. It compares chi-square and entropy-based techniques, investigates challenges with traditional fit statistics due to model dependency and sample-specific grouping, and explores computational innovations to provide more precise diagnostics of item misfit, enabling enhanced item selection and test calibration.

Key finding: This study compares several item-fit statistics including EMRj, traditional chi-square (X2), likelihood ratio (G2), S-X2, and PV-Q1 through Monte Carlo simulations mimicking item-level misfit scenarios under a 2PL IRT model.... Read more

4. What computational methods and software can enhance IRT parameter estimation for complex models and simulation studies?

This area addresses the challenges of flexible IRT model estimation using Bayesian methods and computational resources. It covers implementation strategies using BUGS-language software for various common and extended IRT models, enabling customization for longitudinal or multi-level data structures. It also examines automation with R scripting to conduct large-scale simulation studies using stand-alone software packages, streamlining iterative model fitting and fit metric extraction critical for psychometric research.

Key finding: The paper presents detailed Bayesian modeling code in BUGS language for fitting common IRT models including the 2PL, 3PL, graded response, generalized partial credit, testlet, and generalized testlet models. It highlights the... Read more
Key finding: Lee demonstrates methods to automate complex and large-scale IRT simulation studies by leveraging R's scripting capabilities to generate datasets, prepare software inputs, invoke stand-alone IRT estimation tools (like... Read more

5. How can IRT be extended and applied to continuous and polytomous response data while accounting for measurement constraints?

This theme investigates the modeling of non-dichotomous responses in IRT, including continuous measurements such as response times and Likert-scale data. It explores latent trait models that incorporate distributional restrictions (e.g., response boundedness), extend traditional discrete IRT to continuous domains, and develop threshold models suited for polytomous items. Addressing response scale properties improves model appropriateness, measurement validity, and the treatment of response patterns in diverse assessment contexts.

Key finding: The article establishes a general IRT framework for continuous response variables that models responses as functions of latent traits while explicitly accommodating restrictions such as bounded or positive supports. It shows... Read more
Key finding: This study applies Nonparametric Item Response Theory (NIRT) methods, specifically the Mokken and Dominance Models, to examine the one-dimensionality and invariant item ordering assumptions of the TOEFL iBT listening test.... Read more

All papers in Item Response Theory

An important feature of learning maps, such as Dynamic Learning Maps and Enhanced Learning Maps, is their ability to accommodate nation-wide specifications of standards, such as the Common Core State Standards, within the map nodes along... more
This study explores a new item-writing framework for improving the validity of math assessment items. The authors transfer insights from Cognitive Load Theory (CLT), traditionally used in instructional design, to educational measurement.... more
An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both item and item category... more
Cognitive complexity level is important for measuring both aptitude and achievement in large-scale testing. Tests for standards-based assessment of mathematics, for example, often include cognitive complexity level in the test blueprint.... more
Background: Self-reported depressive complaints among college students might indicate different degrees of severity of depressive states. Through the framework of item response theory, we aim to describe the pattern of responses to items... more
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression... more
This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback–Leibler (KL) information were proposed and compared... more
their willingness and support in supervising this thesis. To Dr. Ricardo Oliveros, for his valuable contribution to the clinical evaluation of the patients.
In psychiatry, the recovery paradigm is increasingly identified as the overarching framework for service provision. Currently, the Recovery Self-Assessment (RSA), a 36-item rating scale, is commonly used to assess the uptake of a recovery... more
Objective-To illustrate how measurement practices can be advanced using as an example the fatigue item bank (FIB) and its applications (short-forms and computerized adaptive test) that were developed via the NIH Patient Reported Outcomes... more
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We... more
Assessing the effectiveness of educational interventions relies on quantifying differences between intervention groups over time in a between-within design. Binary outcome variables (e.g., correct responses versus incorrect responses)... more
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years.... more
We studied rheumatoid arthritis (RA) in the North American Rheumatoid Arthritis Consortium (NARAC) data (1499 subjects; 757 families). Identical methods were applied for studying RA in the Genetic Analysis Workshop 15 (GAW15) simulated... more
The purpose of this research was to investigate the relation between measurement bias at the item level (differential item functioning, DIF) and predictive bias at the test score level. DIF was defined as a difference in the probability... more
In their recent paper and the associated Response to this Comment, Tuckerman et al. dispute the form of the Liouville equation, as proposed by Liouville in 1838. They go on to introduce a definition of the entropy which is at variance... more
We study the asymptotic response of polar ordered active fluids ("flocks") to small external aligning fields h. The longitudinal susceptibility χ diverges, in the thermodynamic limit, like h^(-ν) as h → 0. In finite systems of linear size... more
The existing minima for sample size and test length recommendations for DIMTEST (750 examinees and 25 items) are tied to features of the procedure that are no longer in use. The current version of DIMTEST uses a bootstrapping procedure to... more
The progress test is used to provide useful summative and formative judgments about medical students' knowledge without distorting learning. The test samples the complete knowledge domain expected of new graduates on completion of... more
ABSTRACT: The aim of this study was to describe, compare by gender, and relate sport psychological skills and mood in Peruvian Quadball (Quidditch) athletes. The sample consisted of 43... more
by Uduak Utibe and 1 more
In the study reported on here we assessed the dimensionalities and trends in psychometric qualities of the West African Senior School Certificate chemistry examination (WASSCCE) by applying a multidimensional 4-parameter logistic model of... more
The article presents, from a broad perspective, views on suicide, which is treated as the apogee of a hostile attitude toward oneself and thus becomes an example of extreme existential danger. The background... more
A good item that measures the intended domain is expected to be free of bias. However, several studies have confirmed that some test items show bias with respect to a group of test takers. A generally acceptable analytical technique that... more
Forgetting in long-term memory, as measured in a recall or a recognition test, is faster for items encoded more recently than for items encoded earlier. Data on forgetting curves fit a power function well. In contrast, many connectionist... more
Introduction: Ample evidence indicates that assessing children's early literacy skills is crucial for later academic success. This assessment enables the provision of necessary support and materials while engaging them in the culture of... more
Cognitive diagnostic models (CDMs) provide a fine-grained analysis of students' cognitive abilities by determining their mastery or non-mastery of specific attributes. CDMs have been retrofitted to existing non-diagnostic (inter)national... more
In 1960 Georg Rasch helped open the field of Item Response Theory by the model that bears his name, distinguished by the use of a single parameter to model the relationship between item difficulty and person ability. Various extensions of... more
Information Systems (IS) research frequently uses survey data to measure the interplay between technological systems and human beings. Researchers have developed sophisticated procedures to build and validate multi-item scales that... more
Her research focuses on the development of Bayesian nonparametric models for single density estimation and regression modeling on compact spaces, time dynamic point processes, and diagnostic test validation. She is also the main developer... more
One of the questionnaires that will be used to evaluate social learning environments such as Facebook is the Online Social Learning Environment Instrument (OSLEI). The aim of this study was to evaluate the OSLEI using alternative method... more
Concerns among students have increased due to the use of test scores in decision-making, leading them to question whether their results accurately reflect their abilities, especially when they perceive subjectivity in rater scoring. This... more
This study addresses the critical need for robust measurement tools in digital leadership (DL) within educational settings—a topic of increasing relevance but limited research. Using the Rasch model measurement analysis, the study aims to... more
Traditional methods of test parameterization have been found defective in terms of assuming one score and not providing information on skills mastery profile of the examinees, in addition to non-estimation of the fourth parameter-slipping... more
School examinations including Physics have been fraught with biased questions. Equality in the nature of examination questions is not attained between focal and reference test-takers. This makes assessment of the learners' knowledge of... more
The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses... more
The poor state of secondary school students' achievement relative to policy expectation in Physics triggered the study. At the level of instrument quality, the scoring pattern in West African Senior School Certificate Examination's... more
This study compared the effectiveness of Number Right method and the Corrected Method of scoring multiple choice items in social studies. The purpose of the study was to determine students' performance, gender differences and compare the... more
This Monte Carlo study examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure. Data were generated using a three-parameter logistic item response... more
The generalized graded unfolding model (J. Roberts, J. Donoghue, and J. Laughlin, 1998, 1999) is an item response theory model designed to unfold polytomous responses. The model is based on a proximity relation that postulates higher... more
The generalized graded unfolding model (GGUM) (J. Roberts, J. Donoghue, and J. Laughlin) is an item response theory model designed to analyze binary or graded responses that are based on a proximity relation. The purpose of this study was to assess... more
The validity of the assumptions underlying Cliff's (1989) ordinal true score theory (OTST) was investigated in a three-stage study. OTST makes only ordinal assumptions about the data, and provides a means of converting ordinal item... more
Analyses based on fitting item response models to data from the College Board's Advanced Placement exams in Chemistry and United States History indicated that the constructed-response portion of the tests yielded little information over... more
Using analyses based on fitting item response models to data from the College Board's Advanced Placement exams in chemistry and United States history, we found that the constructed response portion of the tests yielded little... more
Purpose: In this literature review we evaluated the feasibility and clinimetric quality of quality-of-life (QoL) measurement instruments suitable for use in palliative care. Methods: We conducted a systematic literature review to identify... more
The aim of this study was to determine the characteristics of the items and subtests of the Tes Potensi Akademik (TP) in the 2006 UGM college admissions exam (ujian masuk, UM), using unidimensional and multidimensional item response theory with 3... more
Educators in the United States continue to struggle with the disparity in academic achievement of their students and with the ever-increasing emphasis on meeting Adequate Yearly Progress under No Child Left Behind. Looking at data from the... more