Seeing the whole archive

Yesterday I went to Dr Mitchell Whitelaw’s impressive presentation at the National Archives, about his Visible Archive project.

First, he gave a great introduction to why visualisations are important and how they can help you get a handle on a collection. In brief, search excels when you know which small piece you’re looking for. But if you want to explore the whole, you need another way in. Visualisations are great because by looking, we can find patterns and therefore intrinsic structures, which help us to make sense of and thereby navigate within large data sets.

Look at this beautiful visualisation of all the series in the Archives:

65k archival series: a big border means the series is physically large; a big interior square means it has a lot of registered items

In the interactive version, you can click on any series and see the agencies that created or controlled it, and the other series to which it relates – eg an index to the series, or a successive series:

Series A432, which agencies created and controlled it, and its related series

When you highlight one of the agencies, in this case CA5, the orange squares also indicate all the other series that that agency created.

Ah, the beauty of the series system! As Ross Gibbs, Director-General of the National Archives, said at the end of the presentation, Peter Scott would be elated.

But wait, there was more. Mitchell then showed us a deceptively simple visualisation of a single series, A1. It started with a tag cloud of the 150 most common words in the titles of items. ‘Naturalisation’ and ‘certificate’ were huge, and there were a lot of names: of places like Norfolk and Papua, but also of people.

On hover you could see the spread of each term in items over time, and on click you could see a list of items. Then, if the item was digitised you could also have a look at each folio. Nice!

But the zing was yet to come. You could also combine two terms, or exclude one (eg, what is there in that series apart from all those naturalisation files?). In this way you could start to make discoveries, just by playing around with the tag cloud – for example, that there was a major cyclone in Darwin in 1937.
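For the curious, here is a rough sketch of how a tag cloud with combine and exclude might work under the hood. It is only my guess at the general idea, not Mitchell's actual code: the titles are invented for illustration (they are not real A1 items), and the function names `top_terms` and `select` are hypothetical. A real version would presumably also drop filler words like 'for' and 'to'.

```python
import re
from collections import Counter

# Hypothetical sample titles, invented for illustration (not real A1 items).
TITLES = [
    "Naturalisation certificate for J Smith",
    "Naturalisation certificate for A Rossi",
    "Darwin cyclone relief fund",
    "Norfolk Island administration report",
    "Darwin cyclone damage to government buildings",
]


def words(title):
    """Lower-case a title and return the set of distinct words in it."""
    return set(re.findall(r"[a-z]+", title.lower()))


def top_terms(titles, n=150):
    """Count how many titles each word appears in; return the n most common."""
    counts = Counter()
    for title in titles:
        counts.update(words(title))
    return counts.most_common(n)


def select(titles, include=(), exclude=()):
    """Keep titles that contain every 'include' term and no 'exclude' term."""
    result = []
    for title in titles:
        found = words(title)
        has_all = all(term in found for term in include)
        has_none = not any(term in found for term in exclude)
        if has_all and has_none:
            result.append(title)
    return result
```

With this, `select(TITLES, include=("darwin", "cyclone"))` picks out the two cyclone titles, and `select(TITLES, exclude=("naturalisation",))` answers the "what else is in there?" question by returning the other three.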

In so many ways, visualisation works as a way in to the records. We can’t predict all the ways that it works until we see them working. But sure as eggs there will be ways, not least because the National Archives data has an in-built structure.

Thanks, Mitchell, for doing this great work and for making it look effortless. (I know it’s not!)
