Marking (in)consistency – the elephant in the assessment room?

In September 2006 Banksy (briefly) included a painted "Elephant in the Room" in his LA show

In a thought-provoking article, available online ahead of publication in the February 2012 edition of Assessment & Evaluation in Higher Education, Teresa McConlogue looks into the pedagogical benefits of peer assessment. Her paper But is it fair? Developing students’ understanding of grading complex written work through peer assessment focuses on work conducted with engineering students at Queen Mary University of London.

Two distinct cohorts of students were required to peer assess a piece of coursework that generated a summative mark: a laboratory report in one case (n=56, worth 10% of the module mark) and a literature review in the other (n=26, worth 25%). Each piece of work was assessed by 4 or 5 peers, who were required to provide both a mark and comments on the work. The author was then awarded the mean of those marks.

Thus far there is nothing exceptional about this process – peer assessment is an established practice in Higher Education (see, for example, Paul Orsmond’s excellent guide on Self- and Peer-Assessment). The controversial element of McConlogue’s activity lies in the fact that the authors of the peer-assessed work were provided with all of the comments made by their contemporaries AND a full record of the range of marks awarded. This “warts and all” approach exposed the students to the mechanics of marking – showing them both the reasoning behind a mark (some of which seemed poorly aligned with the mark awarded, or based on ‘trivialities’) and the fact that an individual “rogue” mark may have significantly influenced the mean. In some cases the individual marks awarded apparently spanned several grade boundaries.
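
It is worth pausing on just how much leverage a single rogue mark has over a panel of four or five. The sketch below uses invented marks (not data from McConlogue’s study) to show one outlier dragging the mean across a grade boundary while the median barely moves:

    # Hypothetical marks for illustration only – not data from the paper.
    from statistics import mean, median

    agreed = [62, 63, 64, 65]       # four peers broadly agree: a solid 2:1
    panel  = agreed + [38]          # a fifth, "rogue" marker sees a fail

    print(mean(agreed))             # 63.5
    print(mean(panel))              # 58.4 – dragged below the 2:1 boundary
    print(median(panel))            # 63  – the median is barely disturbed

Averaging over a larger panel, or taking the median rather than the mean, would blunt this effect – though neither is what the study did, and the pedagogic point is precisely that students saw the raw spread.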

Qualitative evaluation (questionnaires and focus groups) showed that the students frequently found the experience unsettling. Exposure to a divergent range of scores ran contrary to an ingrained expectation that there ought to be a “correct” mark for their work. The author expresses surprise that many students also thought the process was “unfair” (especially since those with grievances were allowed to ask for a reassessment of the work by staff, a process that generally ended with a mark very close to the originally awarded mean).

Some of the benefits of involvement in peer assessment can be accrued from formative rather than summative tasks (as, for example, in my own work ‘You have 45 minutes, starting from now’: helping students develop their exam essay skills, in which students offer formative feedback on essays prior to summative marking by tutors). The benefits for students include opportunities to understand more fully the standard of work expected at university (clarifying ‘the rules of the game’; Carless, 2006) and to benchmark their own work against that of their contemporaries (in my exercise they also gain an appreciation of the difficulty markers have with the legibility of some students’ work).

Embracing subjectivity?

Providing students with only the mean mark, rather than the individual scores, would have avoided some of the flak received in the feedback. For McConlogue, however, exposing students to the inherent subjectivity of marking represents one of the potential benefits of the exercise. Engagement with the nuances of marking can enhance students’ critical thinking skills and provides an opportunity for “assessment dialogue”. There is educational merit in realising that diverse influences can shape the mark awarded for a piece of work and that “application of assessment criteria in HE is a matter of professional judgement not a matter of fact” (Bloxham, 2009). Despite this, comments from participating students give the impression that they would have preferred the work to have been marked by staff, who are perceived to bring (a) greater subject knowledge and (b) greater marking experience to the process, and are therefore more likely to get the mark “right”.

Whilst the students may expect their tutors to do a better job than their peers, McConlogue’s review of the literature on consistency in marking by academics suggests that this faith may be unwarranted. In many ways this was the most interesting feature of the paper for me since it puts into the spotlight the issue that is so often the “elephant in the room”, the suspicion that marking – even marking by experienced tutors – is not as reliable as we would like to believe.

Drawing on work by Sue Bloxham (2009), Suellen Shay (2005), John Schacter (1999) and Susan Orr (2006), amongst others, McConlogue notes that marking of anything more than trivial/fact-regurgitation tasks can be prone to inconsistency. The differences may stem from all manner of factors, including the time of day, the number of scripts and the order of marking (i.e. a mediocre piece of work may look poor when marked directly after a first-class essay, whereas it may stand out in a positive way if the preceding script was lousy).

To try and counter these effects, course directors organise training for markers and provide ever more prescriptive assessment guidelines, assessment criteria and/or grade descriptors in advance of marking, so as to standardise tutor responses. This is supplemented after the event by moderation across markers. Yet despite these interventions, variability persists.

Should we worry? As someone who takes the moderation process seriously and spends many hours tweaking up or down the marks awarded by different tutors, I would like to think that students are usually getting as fair a mark as possible. It would, however, be naive to think that unwarranted differences never occur. On one level this is simply a manifestation of the broader injustices that exist in life. It could also be argued that a 5% or even 10% “error” in the marking of one piece of work that counts for 10% of the mark for a module, which in turn represents 20 credits out of a total of 120 credits in a year that contributes 40% to an overall degree classification, is not something about which we should be unduly concerned. Over the course of an entire degree programme, individual differences are going to even out.
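
Multiplying out that chain of weightings makes the point starkly (a back-of-envelope sketch using the percentages quoted above):

    # Back-of-envelope arithmetic for the weightings quoted above.
    marking_error   = 0.10       # a 10% error on one piece of work
    piece_in_module = 0.10       # the work counts for 10% of the module
    module_in_year  = 20 / 120   # a 20-credit module in a 120-credit year
    year_in_degree  = 0.40       # the year gives 40% of the classification

    impact = marking_error * piece_in_module * module_in_year * year_in_degree
    print(f"{impact:.4%}")       # 0.0667% of the overall degree mark

Even the 10% worst case shifts the final classification by well under a tenth of a percentage point – though that is cold comfort to a student whose module mark sits right on a grade boundary.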

Yet, at the same time, McConlogue’s paper is a timely reminder that we ought to be doing everything reasonable and practicable to ensure that fairness is achieved. Whether or not the benefits of peer assessment (of summative work) outweigh potential issues of consistency between student markers is an issue with which individual module convenors will have to grapple.

1 Comment

  1. Just a quick and dirty reaction, as Graham Gibbs would say!

    I think this is a huge area, Chris. In my experience, one of the interesting facts which comes out of double marking, moderation etc. (e.g. for project work, portfolios) is that quite often, whilst marks for individual components may vary between markers, their overall assessment is often very similar – e.g. one comes up with a mark of 63% and one with 65% overall – and I think this is probably due to different people’s expectations and beliefs about what is important within an overall framework of “I know a 2i when I see it”. I might give more weight to accuracy in my specific area of expertise, which is missed by the second marker, for example, and I remember once having quite a heated debate over a presentation mark because I thought correct spelling was more important than did the other marker. It is, however, an area where I would say experience counts – those relatively new to marking are often harder markers than those with more experience, and PhD students (recent graduates) have often been the hardest of all!
    I do think it is important for students to see where their mark came from, but I am moving away from giving specific marks towards more general degree classification grades, as I think this gives students what they want to know but enables them to focus on what they need to do to maintain or move up a grade. It also (hopefully) removes those requests for remarks over a 1% difference from a colleague. Once students have engaged with the grade, % marks can be released if desired, or left for external moderation before finalisation.

