How Does “Sentence Structure and Vocabulary
2012
Abstract
Several studies have evaluated sentence structure and vocabulary (SSV) as a scoring criterion in writing assessment, but no consensus on its functionality has been reached. The present study offers evidence that this scoring criterion may not be appropriate. Scripts written by 182 ESL students at two language centers were analyzed with the Rasch partial credit model. Although the other scoring criteria functioned satisfactorily, SSV scores did not fit the Rasch model, and an analysis of residuals showed that SSV scoring on most test prompts loaded on a benign secondary dimension. The study proposes that a combined lexico-grammatical scoring criterion has potentially conflicting properties and therefore recommends considering separate vocabulary and grammar criteria in writing assessment.
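The misfit reported above is typically detected with infit and outfit mean-square statistics, which compare observed responses against Rasch model expectations (values near 1.0 indicate good fit). As a minimal illustrative sketch only — the study used the polytomous partial credit model, whereas this example uses the simpler dichotomous Rasch model, and the data are hypothetical — the computation looks like this:

```python
import math

def rasch_prob(theta, b):
    """P(correct) under the dichotomous Rasch model: person ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fit_mean_squares(responses, thetas, b):
    """Infit and outfit mean-squares for one item across a sample of persons.

    Outfit is the unweighted mean of squared standardized residuals;
    infit weights each squared residual by the response variance (information),
    making it less sensitive to off-target persons.
    """
    z2 = []      # squared standardized residuals
    sq = []      # squared raw residuals
    info = []    # response variances (information)
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sq.append((x - p) ** 2)
        z2.append((x - p) ** 2 / var)
        info.append(var)
    outfit = sum(z2) / len(z2)
    infit = sum(sq) / sum(info)
    return infit, outfit

# Hypothetical data: four persons whose abilities equal the item difficulty,
# responding in perfect accord with model expectation on average.
infit, outfit = fit_mean_squares([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
```

With these toy data both statistics equal exactly 1.0, the model-expected value; in practice, SSV-style misfit would surface as mean-squares well above 1.0.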