Anonymous data and educational research

Comparing 'before' and 'after' data needs some identification

When undertaking educational research you often want to know how an intervention has affected a cohort, and ideally to be able to drill down into the data to see the impact on individuals. In order to match pre- and post-activity surveys, some kind of identifier is required. You could ask the students to put their names on the forms, but they may have concerns that this will have ramifications for their coursework. What else could you do?

There is a range of semi-anonymised labels you could use. At various times in my own work I’ve used formal candidate number, email username and date of birth (the last of these often throws up more than one student with the same date, but handwriting can then distinguish them). In each of these cases, however, it remains a relatively trivial step for someone with access to the right databases to decode the label and convert it into a name. Of course there is generally no reason why a researcher would want to do this, and students trust that you are not going to waste your precious time doing so.

What else might you do? You could ask the students to pick a bogus name or their favourite superhero, but this runs several risks, including having surveys completed by multiple “lady gaga”s or “dr [insert your name here]”. The students might also forget the random name they picked between the first and the second test.

During a recent coffee-time conversation with a colleague from the social sciences, she recommended a system for developing an essentially unbreakable code for times when proper anonymity is necessary (or desirable) whilst retaining the potential to match ‘before’ and ‘after’ surveys.

The system generates a series of letters and numbers that is consistent and sufficiently detailed to be unique to an individual without allowing participant identification. The questions used to produce the code are an amalgamation of the sort of security questions banks and websites sometimes pose to authenticate ID. By combining components from a variety of these questions, each participant builds up a five-character code:

1. The first letter of the town or city where you were born (e.g. if born in Lincoln, put L).

2. The last letter of your mother’s maiden name (e.g. if her name was Carter, put R).

3. The first digit of your house number (e.g. if you live at number 43, put 4). If the house only has a name, put zero.

4. The second digit of the day of the month on which you were born (e.g. if born on the 27th September, put 7).

5. The second letter in the name of your oldest brother or sister (e.g. if your brother is called Austin, put U). If you are an only child put ‘Z’.

This string of five letters and numbers ought to be adequate to produce a unique code for each participant. For most pedagogic research this is over and above the necessary security, but for times when the data is particularly sensitive it might be useful.
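For anyone collecting the answers electronically, the five rules above could be sketched in Python roughly as follows. This is only an illustration, not a tested system: the function names (`build_code`, `match_surveys`) are hypothetical, and two behaviours are my own assumptions because the rules do not cover them. A single-digit birth day is zero-padded (the 5th becomes "05", giving 5), and a sibling name shorter than two letters falls back to Z.

```python
from typing import Dict, Optional, Tuple


def build_code(birth_town: str, maiden_name: str, house_number: Optional[int],
               birth_day: int, sibling_name: Optional[str] = None) -> str:
    """Assemble the five-character code from the five answers."""
    # 1. First letter of the town or city of birth.
    town_letter = birth_town.strip()[0].upper()
    # 2. Last letter of the mother's maiden name.
    maiden_letter = maiden_name.strip()[-1].upper()
    # 3. First digit of the house number; zero if the house only has a name.
    house_digit = str(house_number)[0] if house_number is not None else "0"
    # 4. Second digit of the day of the month of birth.
    #    Assumption: single-digit days are zero-padded, so the 5th gives "5".
    day_digit = f"{birth_day:02d}"[1]
    # 5. Second letter of the oldest sibling's name; Z for an only child.
    #    Assumption: a name shorter than two letters also falls back to Z.
    sibling = (sibling_name or "").strip()
    sibling_letter = sibling[1].upper() if len(sibling) >= 2 else "Z"
    return town_letter + maiden_letter + house_digit + day_digit + sibling_letter


def match_surveys(pre: Dict[str, float],
                  post: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Pair 'before' and 'after' scores that share the same code."""
    # Tidy up stray spaces and lower-case letters before comparing codes.
    pre_clean = {code.strip().upper(): score for code, score in pre.items()}
    post_clean = {code.strip().upper(): score for code, score in post.items()}
    return {code: (pre_clean[code], post_clean[code])
            for code in pre_clean if code in post_clean}


# The worked example from the five questions above.
example = build_code("Lincoln", "Carter", 43, 27, "Austin")
print(example)  # LR47U
```

Normalising case and whitespace in `match_surveys` only catches the most trivial slips. A participant who answers a question differently on the second survey would still go unmatched, so unmatched codes are worth checking by hand.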



  1. Really liked the idea Chris
    quite brilliant!
    I will definitely use something of the sort next time the need arises

  2. So the idea is that study participants identify themselves by generating these codes?
    Likelihood that they’ll get it right?

    • I haven’t tried it in earnest yet. I know some students fail to get their candidate number right, but the advantage here seems to be a short set of questions to which they know the answer, hence the advantage over “think of a number or name and remember it for 3 months”.

    • Even students can’t forget things like the day of the month they were born, the letter of their mother’s first name etc., can they?

      • But they can type badly, carelessly, or simply not bother.
        My point is how accurate will participant-generated codes be?

      • @Alan I think for most purposes simpler systems are sufficient, but if there is a need to offer greater reassurance that the data is anonymous then this would be worth considering. As I said, I’ve not tried it yet myself but there’s no reason to believe that the vast majority of students in a group wouldn’t record their code accurately.

  3. […] I realised that I have actually recommended a system before (I won’t repeat it here, follow this link to see the details). At the time my interest in the topic was largely theoretical, but several […]
