Alasdair J G Gray

Connecting the dots in the World's data

Will the real Kevin Macleod please line up?

Last week I attended the Digitising Scotland Project Colloquium at Raasay House (featured image above) on the Isle of Raasay. The colloquium was a gathering of historians and computer scientists to discuss the challenges of linking the vital records of the people of Scotland between 1851 and 1974.

The Digitising Scotland Project is having the birth, marriage, and death records of Scotland transcribed from scans of the original handwritten registration books. This process is not without its own challenges (try reading this birth record of a famous Scottish artist and architect), but the focus of the colloquium was on what happens after the records have been transcribed.

Each Scottish vital record identifies several individuals, e.g. on a birth record you will have the baby, their parents, the informant, and the registrar. The same individuals will appear on multiple records relating to events in their own life, e.g. an individual will have a birth record, potentially one or more marriage records, and a death record, assuming that they have not emigrated. They can also appear in the records of other individuals, e.g. as a mother on a birth record, the mother-of-the-bride on a marriage record, or the doctor on a death record. The challenge is how to identify the same individual across all the records, when all you have is a name (first and last) and potentially the age.

The problem is compounded in an area like Skye, which was one of the focus regions of the Digitising Scotland Project, because there is a relatively small pool of names to draw upon. For example, a name like Kevin Macleod will appear on multiple records. In some cases the name will correspond to a single Kevin Macleod, in other cases it will be a closely related Kevin Macleod, e.g. Kevin Macleod the father of Kevin Macleod, and in others the two Kevin Macleods will not be related at all. The challenge is how to develop a computer algorithm that is capable of making these distinctions.
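To make the difficulty concrete, here is a minimal sketch in Python of the kind of pairwise comparison a linkage algorithm might start from. It is not the project's method: the `Mention` schema, the thresholds, and the sample records are all made up for illustration, and the string comparison simply uses the standard library's `difflib`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """One appearance of a person on a transcribed record (hypothetical schema)."""
    record_id: str
    role: str                   # e.g. "baby", "father", "groom"
    forename: str
    surname: str
    event_year: int
    age: Optional[int] = None   # the age is not always available

    def birth_year(self) -> Optional[int]:
        # Estimated birth year; only known when an age was recorded.
        return None if self.age is None else self.event_year - self.age


def name_similarity(a: Mention, b: Mention) -> float:
    """Similarity in [0, 1] between full names, tolerant of spelling variants."""
    full_a = f"{a.forename} {a.surname}".lower()
    full_b = f"{b.forename} {b.surname}".lower()
    return SequenceMatcher(None, full_a, full_b).ratio()


def could_be_same_person(a: Mention, b: Mention,
                         name_threshold: float = 0.85,
                         year_tolerance: int = 2) -> bool:
    """Return False only when the evidence rules the pair out.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if name_similarity(a, b) < name_threshold:
        return False
    year_a, year_b = a.birth_year(), b.birth_year()
    if year_a is None or year_b is None:
        return True  # no age on one record, so the pair cannot be ruled out
    return abs(year_a - year_b) <= year_tolerance


# Made-up records: a birth in 1860 and a marriage in 1885.
baby = Mention("B1", "baby", "Kevin", "Macleod", 1860, age=0)
father = Mention("B1", "father", "Kevin", "Macleod", 1860, age=30)
groom = Mention("M7", "groom", "Kevin", "McLeod", 1885, age=25)

print(could_be_same_person(baby, groom))
print(could_be_same_person(father, groom))
```

In this made-up example the baby and the groom remain a candidate pair, while the father is ruled out by his estimated birth year. Note how weak the test is: it cannot tell the groom apart from an unrelated Kevin Macleod born around the same time, which is exactly why the other individuals named on each record matter.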

The colloquium was a great opportunity for historians and computer scientists to discuss the challenges and help each other to develop a solution. However, first we had to agree on a common understanding of terms such as “record” and “individual”.

Overall, we made great progress in exchanging ideas and techniques. We heard how similar challenges are being addressed in a related project focusing on North Orkney, how historians approach the record linkage challenge, and about work on automatically classifying causes of death to their ICD-10 codes and occupations to HISCO codes. There was also time to socialise and enjoy some of the scenery of Raasay, which is a beautiful island the size of Manhattan but with a population of only 160.

View from the meeting room

Sunset over Portree, Skye

    About Me

    I'm an Associate Professor in Computer Science at Heriot-Watt University. My research focuses on linking datasets.
