3A: Big Data and “Zoomanities”

Big Data generally refers to extremely large datasets that require demanding computational analysis to reveal patterns and trends, such as the map below generated from the data in millions of Twitter posts. We are producing reams of this data in the 21st century, but how do we analyze it from a humanities perspective?  How do we perform these sorts of analyses if we are interested in periods before regular digital record keeping?

World travel and communications recorded on Twitter

Enter digitization and citizen science initiatives.  One of the major trends in Digital Humanities work is the digitization of old records or print books that are then made searchable and available online for analysis.  Google Books is the best-known project of this type, and we also read Tim Hitchcock’s article about his pioneering historical projects in this arena, e.g. the Old Bailey Online and London Lives.  These projects took years to build and required the dedicated paid labor of a team of scholars and professionals.  But there’s another model out there, one that relies on the unpaid labor of thousands of non-expert volunteers who collectively are able to do this work faster and more accurately than our current computers: crowdsourcing.

Today we had a guest lecture from Evan Roberts of the University of Minnesota’s Population Center, who has just launched a project on Zooniverse, a crowdsourcing initiative that bills itself as “the world’s largest and most popular platform for people-powered research.”  This platform takes advantage of the fact that people can distinguish detailed differences between images that regularly trip up computers, and it empowers non-experts to contribute to serious research by reducing complex problems to relatively straightforward decisions:

  • is this galaxy a spiral or an ellipse?
  • is this a lion or a zebra?
  • is this the Greek letter tau or epsilon?
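
One way to picture how many simple answers can add up to a usable result is a majority vote over several volunteers’ responses to the same image. The sketch below is only an illustration: the `consensus` function, the 60% agreement threshold, and the sample votes are all assumptions made for this example, not a description of how Zooniverse actually combines classifications.

```python
from collections import Counter


def consensus(votes, min_agreement=0.6):
    """Return the most common answer if enough volunteers agree, else None.

    The threshold is a hypothetical choice for illustration only.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Made-up answers from several volunteers looking at the same images.
votes_by_image = {
    "image_001": ["spiral", "spiral", "ellipse", "spiral", "spiral"],
    "image_002": ["lion", "zebra", "zebra", "lion"],
}

results = {image: consensus(votes) for image, votes in votes_by_image.items()}
print(results)
```

In this sketch, an image where the volunteers split evenly gets no answer at all, so it could be shown to more people or passed to an expert rather than recorded as a guess.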

The project that Evan and his team just launched, Measuring the Anzacs, seeks to study demographic and health trends in the early 20th century by transcribing 4.5 million pages’ worth of service records from the Australian and New Zealand Army Corps during WWI.  This data would take countless years to process with a small team of researchers, but as Evan told us, they hope to speed up this process tremendously by taking advantage of the fact that there are lots of people who have access to a computer, speak English, and can read handwriting.

Tim Hitchcock ended his piece with a conundrum:

How to turn big data into good history?  How do we preserve the democratic and accessible character of the web, while using the tools of a technocratic science model in which popular engagement is generally an afterthought rather than the point?

The Zooniverse model has taken a major step towards resolving this tension and turning formerly restricted research practices into consciously public digital humanities work.



Explore the Measuring the Anzacs project and work your way through at least one document, marking and transcribing the text.

When you’re done, post a brief comment below giving some feedback on the process.  Were the instructions easy to follow?  Was the text easy to transcribe?  Did you feel like you were making a real contribution to the project?  What did you get out of the project, from a humanities perspective?  Did you come away with a greater understanding of either the research process or the lived experience of the individual people whose records you were working with?


  1. The instructions were fairly easy to follow, although it was a bit confusing at first to determine whether the ‘mark entries’ referred to the labels themselves on the document, the written entries, or both. Some of the text I transcribed was fairly easy, since it was typewritten. I think this is such a cool project! It allows you to look at records of real people and hold a tiny piece of history even from so far away. It’s astounding to be able to connect with all of these people throughout history. Even a century ago, keeping track of records like these and sharing them on such a large scale would have been pretty impossible. This project makes me really appreciate the field of digital humanities so much more and want to contribute to databases like this.

  2. I both marked and transcribed a little–I found it somewhat difficult at times, for the same reasons as boothb (whoever you may be!). Directions didn’t seem too clear at times with what I should mark (Should I write down the service time periods from the sticky note? Should I ignore them? If some are covered, should I only write those ones that I could see?), so a little clarification in the instructions could be nice to account for variability in the sheets themselves. While I was doing this, though, I did find it really interesting–and sobering, and somewhat sad at times–to be looking at a piece of paper spelling out someone’s life and, possibly, death.

  3. I really enjoyed exploring the Measuring the ANZACs project, although like Blossom and Pallav I found some aspects confusing, particularly when it came to sticky notes. I also was unsure of what to do when a word (specifically, names and places) was difficult to read but not illegible–should I write down what it looked like? Should I write the name it most resembled? Should I mark it as illegible? It occurred to me that people from New Zealand have an advantage in this case; the place names that were unfamiliar to me would be more easily recognized by them. But overall, it was fascinating to explore these records of history–and to not only view them, but to take part in the process of making them useful for research. I think making history (and other projects) available to people in a way like this is a really democratizing and empowering way of doing research.

  4. I enjoyed the experience of contributing to the digitization of historical records. While some of the directions were confusing, I think these are problems that can be fixed over time as the site receives more feedback. Some of the words were illegible, like Welnaa2 mentioned, but overall, being able to contribute to such an important project is really fascinating.

  5. I enjoyed the opportunity to explore this project on Zooniverse. I was intrigued by the different documents and their creators’ histories. Even though the instructions were sometimes confusing and the webpage was sometimes a bit difficult to navigate, as the page often did not load fully and I could not enter any option (maybe it is also my internet?), I still feel that this is very unique and that it can definitely be improved over time. Of course, overall, I am very honored to be able to contribute to this project.

  6. I was able to explore digitization, and it was truly new and interesting. Through the opportunity provided by Zooniverse, contributing to a history-based project was fun and meaningful. However, the directions were sometimes a little confusing.

  7. This is by far the most complex Zooniverse project I’ve played around with. Due to its complexity, I found it pretty nerve-wracking, especially at first. I kept wanting to hit a “back” button, even though none existed, and even though I knew my one incorrect response wouldn’t mess up all of the data. Otherwise, however, I found the interface pretty remarkable: for its complexity, it gets the job done very well. I also enjoyed the ability to look at the documents closely and personally: it certainly made me curious about the person’s life, which I’m sure is part of the whole goal.

  8. I didn’t have any problems with the instructions on the Measuring the Anzacs project, as they were very straightforward. However, I did have a hard time transcribing documents. Much of the writing that I looked at was at the borderline of legibility, which had me making multiple guesses during the transcription process. There were only a few instances when I clearly knew I had transcribed the content correctly (letter for letter). Thus, I didn’t feel like I provided much help with the transcription. The marking process was more straightforward and I definitely felt like I was providing a small contribution to the project every time I finished a document. The main thing I took away from the website was how convenient it all was. People from all over the world could’ve accessed this website and contributed to this piece of history, which is pretty remarkable. Just a generation ago, this whole project would’ve been unthinkable. Lastly, like Pallav, I also found it saddening at times when I casually browsed through the pages of someone’s life or death.

  9. The process was fairly simple and easy. I ran into trouble after realizing there’s no back button. When I was looking closer at a page during the transcription process I saw that there was a sticky note I’d failed to report in the marking step. I also think that the transcription process would be easier for people from older generations because of their familiarity with cursive handwriting.

  10. Going through this digitization process was really fun. I loved how the website was set up. It was really well done and pretty. It made me want to transcribe more pages. As I was transcribing, I felt like I was a part of something big. I felt like I was making history, so it was really nice.

  11. The website is very impressive, and on one level participating makes me feel like I’m really contributing to the project, but the handwriting is often so hard to transcribe that I feel very unsure of myself, and as I work through the documents I sometimes worry that I’m doing more harm than good. A few suggestions: (1) Perhaps users should be able to type “_” when they encounter a letter they can’t transcribe. That way, they could still enter the parts they can read and not have to skip the question entirely. (2) A back button would be nice! I sometimes entered an incomplete result by accident and felt guilty for skewing the data. (3) Clearer instructions / less flexibility. Rather than just providing a blank line, offering fewer options could help make the results more uniform and make the job easier for the transcribers. For example, if asking for a date, maybe ask for day, month, and year separately. If asking for a location, maybe offer multiple choice. Those are a few suggestions. Overall the project is very impressive and I wish the creators luck and success!
