In a thought-provoking article, available online ahead of publication in the February 2012 edition of Assessment and Evaluation in Higher Education, Teresa McConlogue looks into the pedagogical benefits of peer assessment. Her paper “But is it fair? Developing students’ understanding of grading complex written work through peer assessment” focuses on work conducted with engineering students at Queen Mary University of London.
Two distinct cohorts of students were required to peer assess a piece of coursework, leading to the generation of a summative mark: a laboratory report (n=56, 10% of the mark for the module) and a literature review (n=26, 25%). Each piece of work was assessed by four or five peers, who were required to provide both a mark and comments on the work. The students were then awarded the mean mark.
Thus far there is nothing exceptional about this process – peer assessment is an established practice in Higher Education (see, for example, Paul Orsmond’s excellent guide on Self- and Peer-Assessment). The controversial element of McConlogue’s activity lies in the fact that the authors of the peer-assessed work were provided with all of the comments made by their contemporaries AND a full record of the range of marks awarded. This “warts and all” approach exposed the students to the mechanics of marking – showing them both the reasoning that went into a mark (some of which seemed poorly aligned with the mark awarded, or based on ‘trivialities’) and the fact that an individual “rogue” mark may have significantly influenced the mean. In some cases the individual marks awarded apparently spanned several grade boundaries.
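The influence of a single “rogue” mark on the mean can be illustrated with a quick calculation. The marks below are hypothetical (the paper does not publish individual scores), but they show how one outlier among five assessors can pull an otherwise consistent piece of work across a grade boundary:

```python
# Hypothetical illustration: five peer marks for one piece of work,
# four clustered in the low 60s plus one "rogue" low mark.
marks = [62, 64, 61, 63, 35]

mean_with_rogue = sum(marks) / len(marks)   # includes the outlier
mean_without = sum(marks[:4]) / 4           # the four consistent marks only

print(mean_with_rogue)  # 57.0 -- pulled below a 60% boundary
print(mean_without)     # 62.5 -- where the consistent markers placed it
```

A single low score here moves the awarded mean by 5.5 marks – enough, under a typical UK grading scheme, to drop the work into a lower classification band.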
Qualitative evaluation (questionnaires and focus groups) showed that the students frequently found this to have been an unsettling experience. Exposure to a divergent range of scores ran contrary to an ingrained expectation that there ought to be a “correct” mark for their work. The author expresses surprise that many students also thought the process was “unfair” (especially since those with grievances were allowed to ask for a reassessment of the work by staff, a process that generally ended with a mark very close to the originally-awarded mean).
Some of the benefits of involvement in peer assessment can be accrued from formative rather than summative tasks (as, for example, in my own work ‘You have 45 minutes, starting from now’: helping students develop their exam essay skills, in which students offer formative feedback on essays prior to summative marking by tutors). The benefits for students include opportunities to understand more fully the standard of work expected at university (clarifying ‘the rules of the game’; Carless, 2006) and to benchmark their own work against that of their contemporaries (in my exercise they also gain an appreciation of the difficulty markers have with the legibility of some students’ work).
Providing students with only the mean mark rather than the individual scores would have avoided some of the flak received in the feedback. For McConlogue, however, the exposure of students to the inherent subjectivity of marking represents one of the potential benefits of the exercise. Engagement with the nuances of marking can enhance students’ critical thinking skills and provides opportunity for “assessment dialogue”. There is educational merit in realising that diverse influences can shape the mark awarded for a piece of work and that “application of assessment criteria in HE is a matter of professional judgement not a matter of fact” (Bloxham, 2009). Despite this, comments from participating students give the impression that they would have preferred the work to have been marked by staff since they are perceived to bring (a) greater subject knowledge and (b) greater marking experience to the process and are therefore more likely to get the mark “right”.
Whilst the students may expect their tutors to do a better job than their peers, McConlogue’s review of the literature on consistency in marking by academics suggests that this faith may be unwarranted. In many ways this was the most interesting feature of the paper for me since it puts into the spotlight the issue that is so often the “elephant in the room”, the suspicion that marking – even marking by experienced tutors – is not as reliable as we would like to believe.
Drawing on work by Sue Bloxham (2009), Suellen Shay (2005), John Schacter (1999) and Susan Orr (2006), amongst others, McConlogue notes that the marking of anything more than trivial/fact-regurgitation tasks can be prone to inconsistency. The differences may stem from all manner of factors, including the time of day, the number of scripts and the order of marking (i.e. a mediocre piece of work may look poor when marked directly after a first-class essay, whereas it may stand out in a positive way if the preceding script was lousy).
To try to counter these effects, course directors organise training for markers and provide ever more prescriptive assessment guidelines, assessment criteria, and/or grade descriptors in advance of marking so as to standardise tutor responses. This is supplemented after the event by moderation across markers. Yet despite these interventions variability persists.
Should we worry? As someone who takes the moderation process seriously and spends many hours tweaking up or down the marks awarded by different tutors, I would like to think that students are usually getting as fair a mark as possible. It would, however, be naive to think that unwarranted differences never occur. On one level this is simply a manifestation of the broader injustices that exist in life. It could also be argued that a 5% or even 10% “error” in the marking of one piece of work that counts for 10% of the mark for a module that in turn represents 20 credits out of a total of 120 credits in a year that contributes 40% to an overall degree classification is not something about which we should be unduly concerned. Over the course of an entire degree programme, individual differences are going to even out.
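The back-of-the-envelope arithmetic in the previous paragraph can be made explicit. Multiplying the weightings together shows how heavily a single marking error is diluted by the time it reaches the degree classification:

```python
# Weightings taken from the scenario described above.
marking_error = 0.10       # a 10% error on one piece of work
piece_weight = 0.10        # the piece is 10% of the module mark
module_weight = 20 / 120   # the module is 20 of 120 credits in the year
year_weight = 0.40         # the year contributes 40% to the classification

impact = marking_error * piece_weight * module_weight * year_weight
print(round(impact * 100, 3))  # 0.067 -- percentage points on the final degree mark
```

Even a 10% marking error on such a piece of work shifts the overall degree mark by less than a tenth of a percentage point.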
Yet, at the same time, McConlogue’s paper is a timely reminder that we ought to be doing everything reasonable and practicable to ensure that fairness is achieved. Whether or not the benefits of peer assessment (of summative work) outweigh potential issues of consistency between student markers is an issue with which individual module convenors will have to grapple.