Rater Reliability in Evaluation of Essay and Oral Examinations
1970, Pedagogisk Forskning
https://doi.org/10.1080/0031383700140111…
27 pages
Abstract
This study analyzes the reliability of examination scoring at the Institute of Psychology in Oslo, covering both written and oral assessments across different examination periods. By evaluating grading consistency among multiple examiners, the research highlights discrepancies and potential biases in grading practices, particularly noting differences in how male and female candidates are assessed during oral examinations.
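The core of such a reliability analysis is how closely independent examiners agree when grading the same candidates. A minimal sketch of that computation is given below; the grade vectors are invented, and the Spearman-Brown projection to a two-examiner committee is only one common way of expressing committee reliability, not necessarily the procedure used in the original study.

```python
# Minimal sketch of an inter-rater reliability check for exam grades.
# All grade vectors below are hypothetical and only illustrate the computation.
import numpy as np

def interrater_correlation(grades_a, grades_b):
    """Pearson correlation between two examiners' grades for the same candidates."""
    return np.corrcoef(np.asarray(grades_a, float), np.asarray(grades_b, float))[0, 1]

def spearman_brown(r_single, n_raters):
    """Estimated reliability of the average grade from n_raters, given single-rater agreement r_single."""
    return n_raters * r_single / (1 + (n_raters - 1) * r_single)

# Hypothetical essay grades assigned independently by two examiners to ten candidates
examiner_a = [2.5, 3.0, 2.0, 3.5, 2.5, 1.5, 3.0, 2.0, 2.5, 3.5]
examiner_b = [2.0, 3.0, 2.5, 3.0, 2.0, 2.0, 3.5, 2.5, 2.5, 3.0]

r = interrater_correlation(examiner_a, examiner_b)
print(f"single-rater agreement: {r:.2f}")
print(f"reliability of a two-examiner committee (Spearman-Brown): {spearman_brown(r, 2):.2f}")
```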
Related papers
University News, 2018
Examination is a measure of students' progress. It is used as a means to organize and integrate knowledge, incorporating both scholastic and non-scholastic aspects of education. Maintaining fairness, confidentiality, security and timely execution during the examination process has become a serious challenge, especially in traditional affiliating universities, and keeping abreast of changing times has become a bigger challenge still. There is therefore a need to reform traditional examination and evaluation systems so as to achieve uniformity, reliability and validity. The contemporary concerns of higher education are mainly the processes and procedures associated with delivering the key services and activities of assessment reform, which include the coexisting practices of examination and evaluation.
Revista Canaria de Estudios Ingleses, 2011
The importance of the University Entrance Examination for students' academic future has fostered research on the characteristics of the exams that compose it. Among them, the English exam has been analysed with respect to crucial issues such as its validity and reliability, the students' written production in the foreign language, and the improvements that might be implemented in the exam. This paper reviews the results of the studies conducted so far on the English exam, so that they may be taken into account in forthcoming studies or when implementing changes to the exam.
University entry tests are a hybrid of level tests, survey tests, diagnostic tests and selective tests. Some of them (those relating to planned-access degree courses) determine a ranking which, within the limits of the available places, governs registration for the course. The others, even if compulsory, produce scores that do not affect enrolment but serve to highlight deficiencies to be remedied or, in some cases, to give a negative opinion about enrolment. The problem of the number of pre-registrations, which might be irrelevant for a planned-access degree, poses serious problems for an open-access degree. While the tests for the degree courses in Primary Education and in Childhood and Preadolescence Training can ensure objectivity and selectivity, for the degree course in Science of Education there is a vital need to construct a test which, in addition to the required reliability and validity, can detect deficiencies that, if not remedied, could undermine future studies (Marlow, 2000; Cheung, Bucat, 2002). In other words, high enrolment in open-access degree courses affects the quality of students' learning paths (Notti, 2010). If entry tests do not identify such deficiencies and provide no instrument for remedying them, the productivity of the degree course and of the faculty will suffer heavily. It is clearly necessary and useful to check the instruments that have been built, without any presumption of infallibility, especially since a bad test provides unreliable results (Steven et al., 1990; 1991). For this reason every effort should be made to build valid and reliable tests. The objectives of the research were, in summary, the following: to check the validity and reliability of the entry tests for the degree course in Science of Education; to report any problems emerging from the statistical analysis; and to suggest, where possible, solutions that would make the tests congruent with the purposes for which they were built. After obtaining the necessary authorizations, the tests administered to the students and the corresponding result records were acquired in electronic form. Statistical analyses were then carried out, in line with the objectives of the research, to check whether the entry tests are able to select students according to their level of preparation, and to check the ability of the test to measure the skills for which it was constructed and, consequently, its internal coherence. The test proved quite selective, not particularly difficult, and contained many unreliable items. Examination of the results reveals that the 1133 participating students had the greatest difficulty in the two test areas called "Linguistics and literature" and "Geography", in which there is a good level of selectivity. A strong criticism emerged concerning the quality of the distractors. Overall, the results show a sufficient quality of the test and a capacity to place students' results in a reliable ranking.
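The quantities this analysis refers to, item difficulty, selectivity (discrimination) and internal coherence, are standard classical test theory statistics. The sketch below shows how they might be computed from a dichotomously scored response matrix; the data and the model used to generate them are assumptions made for illustration, not the actual entry-test material.

```python
# Sketch of classical item analysis for a dichotomously scored entry test.
# The response matrix is simulated: rows = examinees, columns = items.
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items = 200, 30
ability = rng.normal(size=(n_examinees, 1))
item_location = rng.normal(size=(1, n_items))
prob_correct = 1 / (1 + np.exp(-(ability - item_location)))
responses = (rng.random((n_examinees, n_items)) < prob_correct).astype(int)

# Item difficulty: proportion of correct answers per item (higher = easier).
difficulty = responses.mean(axis=0)

# Item discrimination: correlation of each item with the rest-score.
total = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(n_items)
])

# Cronbach's alpha as an index of internal consistency.
item_var = responses.var(axis=0, ddof=1).sum()
total_var = total.var(ddof=1)
alpha = n_items / (n_items - 1) * (1 - item_var / total_var)

print(f"mean difficulty {difficulty.mean():.2f}, "
      f"mean discrimination {discrimination.mean():.2f}, alpha {alpha:.2f}")
```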
Social validation procedures were used to compare views held by professors and students concerning college examinations. In the first phase of this project, four groups of participants (21 professors and 126 students representing psychology and biology) listed characteristics of good and poor tests. The characteristics given by the groups were somewhat similar, although very few ideas were given about the results of examinations (e.g., range of scores). A socially valid questionnaire was constructed from the most commonly cited characteristics, and four additional groups of professors (n = 25) and students (n = 102) were asked to rate the importance of each. MANOVAs indicated that the professors and students gave significantly different ratings, but there were only slight differences between the two disciplines, and no differences between those who reported having taken measurement or testing classes and those who had not. The differences between professors and students were especially clear in ratings of the Instructions and Question characteristics of tests. In contrast, items from the questionnaire concerning the Coverage and Content of an examination were given similar ratings by the four groups. The results suggest a number of ways that professors can construct examinations which students will respect.
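The group comparison reported here rests on a MANOVA over several rating scales. The sketch below illustrates that kind of analysis with statsmodels on simulated data; the scale names, group sizes, and rating values are invented for illustration and are not the study's questionnaire or results.

```python
# Hypothetical sketch of a MANOVA comparing professors' and students' questionnaire
# ratings on several test-characteristic scales. All names and values are invented.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n_prof, n_stud = 25, 102

df = pd.DataFrame({
    "role": ["professor"] * n_prof + ["student"] * n_stud,
    # Simulated importance ratings on a 1-5 scale
    "instructions": np.r_[rng.normal(4.2, 0.5, n_prof), rng.normal(3.6, 0.6, n_stud)],
    "questions":    np.r_[rng.normal(4.0, 0.5, n_prof), rng.normal(3.4, 0.6, n_stud)],
    "coverage":     np.r_[rng.normal(4.1, 0.5, n_prof), rng.normal(4.0, 0.6, n_stud)],
    "content":      np.r_[rng.normal(4.3, 0.5, n_prof), rng.normal(4.2, 0.6, n_stud)],
})

# Multivariate test of the role effect (Wilks' lambda, Pillai's trace, etc.)
fit = MANOVA.from_formula("instructions + questions + coverage + content ~ role", data=df)
print(fit.mv_test())
```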
2013
This paper aims to analyse the assessment procedure for the speaking component of a high-stakes test of standard repute, the International English Language Testing System (IELTS). An attempt is made to describe the test tasks and test construct, the test procedure, its rater(s) and rating criteria, to weigh its strengths and, finally, to identify its weaknesses, if any.
Journal for educational research online, 2016
The article by Kupiainen, Marjanen and Hautamäki focuses on the centralized school-leaving examination of upper secondary education in Finland as both a school-leaving and a university-entrance examination. The study presented examines whether the increased freedom to choose among subject-specific examinations may affect the comparability of examination results and students' choices, not only in the examination itself but already during their school years. Reference is made to Finland's more than 160-year tradition of centralized school-leaving examinations at the transition from upper secondary education to higher education. The authors describe the Finnish system with regard to the introduction of a course-based (as opposed to class- or grade-based) curriculum for the three-year upper secondary level and with regard to the subsequent reforms of the centralized school-leaving examination, through which students' options for the subject-specific examina...
2014
Language testing professionals and teacher educators have articulated the need for a broad variety of stakeholders, including classroom teachers, to develop assessment literacy. In this paper, we argue that when teachers are involved in local assessment development projects, they can expand their assessment knowledge and skills beyond what is necessary for conducting principled classroom assessments. We further claim that a particular analytic approach, Rasch analysis, should be considered as one possible element of this expanded assessment literacy. To this end, we use placement exam data from one Colombian university to illustrate how analyses from item response theory perspectives (Rasch analysis) differ from, and can usefully complement, classical test theory.
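To make concrete how a Rasch analysis differs from classical test theory statistics, the sketch below fits a dichotomous Rasch model by joint maximum likelihood to simulated responses and sets the estimated item difficulties (in logits) beside the classical proportion-correct values. The data, estimation routine, and anchoring choice are assumptions made for illustration, not the placement-exam analysis reported in the paper.

```python
# Illustrative joint maximum likelihood (JML) estimation of a dichotomous Rasch model,
# compared with the classical proportion-correct item statistic. Simulated data only.
import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items = 300, 15
theta_true = rng.normal(0, 1, n_persons)          # person abilities
delta_true = np.linspace(-1.5, 1.5, n_items)      # item difficulties
p_true = 1 / (1 + np.exp(-(theta_true[:, None] - delta_true[None, :])))
X = (rng.random((n_persons, n_items)) < p_true).astype(int)

# Drop persons with perfect or zero scores (their JML estimates are infinite).
keep = (X.sum(1) > 0) & (X.sum(1) < n_items)
X = X[keep]

theta = np.zeros(X.shape[0])
delta = np.zeros(n_items)
for _ in range(50):                               # alternating Newton-Raphson updates
    p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
    info = p * (1 - p)
    theta += (X - p).sum(1) / info.sum(1)         # update person abilities
    p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
    info = p * (1 - p)
    delta -= (X - p).sum(0) / info.sum(0)         # update item difficulties
    delta -= delta.mean()                         # anchor the scale (mean difficulty 0)

ctt_difficulty = X.mean(0)                        # classical proportion correct
for j in range(n_items):
    print(f"item {j + 1:2d}: p = {ctt_difficulty[j]:.2f}  Rasch delta = {delta[j]:+.2f}")
```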
relabs.org
This study investigates the effect of reporting the unadjusted raw scores in a high-stakes language exam when raters differ significantly in severity and self-selected questions differ significantly in difficulty. More sophisticated models, introducing meaningful facets and parameters, are successively used to investigate the characteristics of the dataset. The application of the Rasch models to the data showed that examinees could benefit significantly from being marked by lenient raters and by responding to less demanding essay questions. It was also shown that the third rater failed to adjust the raw scores in a way similar to the statistical adjustment by the Rasch models. The study discusses the consequences of reporting unadjusted raw scores with particular emphasis on issues of fairness.
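The fairness problem described above arises because an examinee's expected raw score depends on which rater and which self-selected question they happen to draw. As a rough illustration under a simple dichotomous facet formulation (ability minus question difficulty minus rater severity, in logits) with invented parameter values, the sketch below shows how a lenient rater or an easy question inflates the expected raw score; it is not the polytomous many-facet models fitted in the study.

```python
# Illustration of how rater severity and question difficulty shift expected raw scores
# under a simple dichotomous facet formulation. All parameter values are invented.
import numpy as np

def expected_score(ability, question_difficulty, rater_severity, max_score=9):
    """Expected raw score out of max_score binary scoring opportunities."""
    logit = ability - question_difficulty - rater_severity
    p = 1 / (1 + np.exp(-logit))
    return max_score * p

ability = 0.5                                   # one examinee's ability in logits
questions = {"easy essay": -0.8, "hard essay": +0.8}
raters = {"lenient rater": -0.6, "severe rater": +0.6}

for q_name, q_diff in questions.items():
    for r_name, r_sev in raters.items():
        score = expected_score(ability, q_diff, r_sev)
        print(f"{q_name:10s} / {r_name:13s}: expected raw score {score:.1f} / 9")
```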
2014
This paper is aimed at foreign teaching staff in Denmark who are interested in gaining a better understanding of assessment methods and practices in Danish higher education. It addresses assessment practices and grading at Danish universities, with special attention to the use, preparation, conduct, and assessment of oral exams. It also examines the formal role of examiners and co-examiners, exploring possible differences between how international and Danish staff might approach the tasks of examining, co-examining, and grading. Finally, it considers some important issues raised by the increasing use of English as the examination language.

Assessment methods. Summative assessment is a core activity of any education system. Exams are important in different ways for students, for teachers, and for future employers. To ensure that students fulfil their degree requirements, universities must test them not only at the end of their degree programme but also during the course of it, to guarantee progression and partial competences. Exams inevitably structure and shape the work of students, who naturally want to pass their exams and succeed in their studies, and thus to reach high levels of competence and eventually find interesting and well-paying jobs. Exams are also high-stakes testing activities with important consequences for the test takers: passing has important advantages, and failing has important disadvantages. In addition, the actual exam results matter to the students, since all final semester grades, which typically reflect only the grades received on exams, appear on the diploma. Teachers may stimulate students to do their best on exams by informing them of the final exam requirements in a positive and constructive way, and sometimes by reminding them of the negative consequences if they do not take their exams seriously. Teachers also sometimes measure their own performance by the successful results of their students. Finally, future employers need exams to know the levels of knowledge, skills and competence of job candidates. BA, MA or PhD diplomas inspire confidence, but the actual grades can also be important in understanding a candidate's profile. Exams are the only summative form of assessment in Danish university education, since no official grading occurs during the semester. With grading and testing not usually being part of the daily culture and power relations in Danish …

Appeals. Danish students have extensive opportunities to appeal against examination results if they do not think that these are fair. The university receives and evaluates appeals, usually at the departmental level, and has a system of appeal bodies that handle such matters. If the student disagrees with the university's decision, the student can appeal it by contacting the Danish Ministry of Science, Innovation and Higher Education. Appealing can result not only in the same or a better grade but also in a lower grade.
