Big Data generally refers to extremely large datasets that require demanding computational analysis to reveal patterns and trends, such as the map below generated from the data in millions of Twitter posts. We are producing reams of this data in the 21st century, but how do we analyze it from a humanities perspective? How do we perform these sorts of analyses if we are interested in periods before regular digital record keeping?
Enter digitization and citizen science initiatives. One of the major trends in Digital Humanities work is the digitization of old records or print books that are then made searchable and available online for analysis. Google Books is the most well-known project of this type, and we also read Tim Hitchcock’s article about his pioneering historical projects in this arena, e.g. the Old Bailey Online and London Lives. These projects took years to build and required the dedicated paid labor of a team of scholars and professionals. But there’s another model out there that relies on the unpaid labor of thousands of non-expert volunteers who collectively are able to do this work faster and more accurately than our current computers: crowdsourcing.
Today we had a guest lecture from Evan Roberts from the University of Minnesota’s Population Center, who has just launched a project on Zooniverse, a crowdsourcing initiative that bills itself as “the world’s largest and most popular platform for people-powered research.” This platform takes advantage of the fact that people can distinguish detailed differences between images that regularly trip up computers, and empowers non-experts to contribute to serious research by reducing complex problems to relatively straightforward decisions:
- is this galaxy a spiral or an ellipse?
- is this a lion or a zebra?
- is this the Greek letter tau or epsilon?
The project that Evan and his team just launched, Measuring the Anzacs, seeks to study demographic and health trends in the early 20th century by transcribing 4.5 million pages worth of service records from the Australian and New Zealand Army Corps during WWI. This data would take countless years to process with a small team of researchers, but as Evan told us, they hope to speed up this process tremendously by taking advantage of the fact that there are lots of people who have access to a computer, speak English and can read handwriting.
Tim Hitchcock ended his piece with a conundrum:
How to turn big data in to good history? How do we preserve the democratic and accessible character of the web, while using the tools of a technocratic science model in which popular engagement is generally an afterthought rather than the point.
The Zooniverse model has taken a major step towards resolving this tension and turning formerly restricted research practices into consciously public digital humanities work.
Explore the Measuring the Anzacs project and work your way through at least one document, marking and transcribing the text.
When you’re done, post a brief comment below giving some feedback on the process. Were the instructions easy to follow? Was the text easy to transcribe? Did you feel like you were making a real contribution to the project? What did you get out of the project, from a humanities perspective? Did you come away with a greater understanding of either the research process or the lived experience of the individual people whose records you were working with?