The Use of Person Fit to Model Panelist Fit
2012
Abstract
This study demonstrates the use of person-fit methods to determine the extent to which a panelist's ratings fit the item response theory (IRT) models used in the National Assessment of Educational Progress (NAEP). Person-fit methods are statistical methods that allow the identification of nonfitting response vectors. To determine whether panelists' ratings fit the IRT models used in the NAEP, the l_z statistic (F. Drasgow, M. Levine, and E. Williams, 1985) was used. Rating data from the 1994 NAEP achievement level setting process were obtained for grade 12 geography, for which 29 panelists (primarily teachers) set levels. A response vector was created for each panelist for each achievement level using each of three p-value criteria, and simulated item score string estimation (ISSE) values were created. The l_z statistic was calculated for each of the 27 response vectors associated with each of the 29 panelists. Means and standard deviations of the l_z distributions were computed f...
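To make the person-fit idea concrete, the following is a minimal sketch of the standardized log-likelihood statistic l_z for dichotomous (0/1) response vectors. It is an illustration under stated assumptions, not the procedure used in the study: the NAEP scaling also involves polytomous items, and the function names `p_3pl` and `lz_statistic`, the scaling constant D = 1.7, and the example probabilities are all hypothetical choices made here. The statistic standardizes the observed log-likelihood of a response vector by its expectation and variance under the model, so large negative values suggest a nonfitting vector.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under a three-parameter logistic model.

    Assumes the logistic metric with scaling constant D = 1.7 (an assumption
    of this sketch).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    responses: iterable of 0/1 item scores.
    probs: model-implied probabilities of a score of 1, each strictly
           between 0 and 1 and not all equal to 0.5 (otherwise the
           variance term is zero).
    """
    log_lik = 0.0   # observed log-likelihood of the response vector
    expected = 0.0  # expectation of the log-likelihood under the model
    variance = 0.0  # variance of the log-likelihood under the model
    for u, p in zip(responses, probs):
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical model-implied probabilities for five items.
probs = [0.9, 0.8, 0.7, 0.3, 0.2]
consistent = [1, 1, 1, 0, 0]  # agrees with the model's ordering of items
aberrant = [0, 0, 0, 1, 1]    # reverses the model's ordering of items

lz_consistent = lz_statistic(consistent, probs)
lz_aberrant = lz_statistic(aberrant, probs)
```

In this sketch the consistent vector yields a positive value and the reversed vector a negative one, which is the direction of misfit that person-fit analyses typically look for.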
References (14)
- ACT (1997). Setting achievement levels on the 1996 National Assessment of Educational Progress in Science final report. Iowa City, IA: Author.
- ACT (1995a). NAEP reading revisit: An evaluation of the 1992 achievement levels descriptions. Iowa City, IA: Author.
- ACT (1995b). Results of the 1994 Geography NAEP Achievement Levels-Setting pilot study. Iowa City, IA: Author.
- ACT (1995c). Results of the 1994 U.S. History NAEP Achievement Levels-Setting pilot study. Iowa City, IA: Author.
- ACT (1994). Design document for setting achievement levels on the 1994 National Assessment of Educational Progress in Geography and U.S. History and the 1996 National Assessment of Educational Progress in Science. Iowa City, IA: Author.
- ACT (1993a). Setting achievement levels on the 1992 National Assessment of Educational Progress in Mathematics, Reading, and Writing: A technical report on reliability and validity. Iowa City, IA: Author.
- Allen, N.L., Johnson, E.G., Mislevy, R.J., & Thomas, N. (1996). Scaling procedures. In N.L. Allen, D.L. Kline, & C.A. Zelenak, The NAEP 1994 Technical Report, Washington, DC: National Center for Education Statistics.
- Johnson, E.G., Mislevy, R.J., & Thomas, N. (1994). Scaling procedures. In E.G. Johnson & J.E. Carlson, The NAEP 1992 Technical Report, Washington, DC: National Center for Education Statistics.
- Bay, L. (1997). Comparing student performance on different item formats relative to achievement levels cutpoints. In M.L. Bourque (Moderator), Setting Standards for NAEP. Related-papers session conducted at the meeting of the National Council on Educational Measurement, April, 1998, San Diego, CA.
- Bay, L., Chen, L., & Reckase, M. D. (1997). The grid: A possible rating method for the 1998 NAEP writing achievement levels-setting process. A report prepared for the meeting of the 1998 NAEP Achievement Levels-Setting Project Technical Advisory Committee for Standard Setting (TACSS), Oct. 2-3, 1997, St. Louis, MO.
- Bay, L., & Hanson, B. A. (1997). Computing achievement levels cutpoints from NAEP booklet classification studies: A secondary analysis. A report prepared for the meeting of the 1998 NAEP Achievement Levels-Setting Project Technical Advisory Committee for Standard Setting (TACSS), Oct. 2-3, 1997, St. Louis, MO.
- Chen, Wen-Hung (1997). Setting achievement levels for NAEP using item score string: A simulation study. A report prepared for the meeting of the 1998 NAEP Achievement Levels-Setting Project Technical Advisory Committee for Standard Setting (TACSS), Oct. 2-3, 1997, St. Louis, MO.
- Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67-86.
- Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269-290.