Big Data and Digital Humanities

Last week we tore furiously through the front-end of web development — HTML, CSS and JavaScript.  But we are not learning coding here, we are doing Digital Humanities. Today let’s look at how we can put those skills into practice for humanities data to tell a story.

One of the longest-running types of applications is an interactive timeline.  We will do a class project to convert a flat timeline into an interactive one using an easy to use application, but there are other tools that require more coding and show you how these work.  Take a look at the SIMILE timeline tool for an example of one of those.

In-Class Project

Today, we are going to use the beautiful TimelineJS framework to explore gathering data in a flat spreadsheet and leveraging JavaScript libraries to make them interactive.  The humble spreadsheet or CSV, is one of the most common and versatile data formats for DH projects, as they allow you to quickly and collaboratively gather and sort large amounts of data without having to commit to a set format at the outset. Once the data are gathered, they can be cleaned and filtered using a number of easily accessible tools, and then imported into any number of databases, visualization engines, or file systems for long-term, stable storage.

Last time, we began sketching out our plans for our collective class project on The Defeat of Jesse James. We collectively decided on a project that includes some form of animated map with a timeline that would ideally enable the events of the day to be viewed from multiple perspectives, and analyzed in broader historical and cultural context.

Jesse James brainstorm 1Jesse James brainstorm 2

 

 

 

 

 

 

Today, we are going to explore the flat data structure of the CSV and begin with the timeline portion of the project, by reading and researching the resources linked from the MNopedia article on the Northfield Bank Raid.

  1. I have started a blank timeline for us by following the 4 step instructions to Make a New Timeline on the KnightLab TimelineJS page.
  2. Go to the Google Spreadsheet backend, which you should all be able to edit.
  3. Once you are setup, we need to delete the template data and start adding our own. There are much more sophisticated ways to do this, which we’ll touch on in future classes, but for now, we are going to do things the old fashioned way by reading, writing and maybe doing some copy/pasting.
  4. We’ll want to organize our efforts first and figure out who is going to gather what from which sources. As we go, things will be messy at first, but we can start to clean them up by keeping a few things in mind.
  5. All entries should have a brief headline that summarizes the text on that date’s card, which you’ll need to write.
  6. Add images and links to other resources, making sure to check permissions and include media credits.
  7. Finally, explore what happens to the timeline when you use tags to categorize events.  Can we use these to start imposing some structure on our messy collaborative data gathering?

 


Big Data

Big Data generally refers to extremely large datasets that require demanding computational analysis to reveal patterns and trends, such as the map below generated from the data in millions of Twitter posts. We are producing reams of this data in the 21st century, but how do we analyze it from a humanities perspective?  How do we perform these sorts of analyses if we are interested in periods before regular digital record keeping?

World travel and communications recorded on Twitter
World travel and communications recorded on Twitter

Enter digitization and citizen science initiatives.  One of the major trends in Digital Humanities work is the digitization of old records or print books that are then made searchable and available online for analysis.  Google Books is the most well-known project of this type, and we also read Tim Hitchcock’s article about his pioneering historical projects in this arena, e.g. the Old Bailey Online and London Lives.  These projects took years to build and required the dedicated paid labor of a team of scholars and professionals.  But there’s another model out there that relies on the unpaid labor of thousands of non-expert volunteers who collectively are able to do this work faster and more accurately than our current computers: crowdsourcing.

Zooniverse is a crowdsourcing initiative that bills itself as “the world’s largest and most popular platform for people-powered research.”  This platform takes advantage of the fact that people can distinguish detailed differences between images that regularly trip up computers, and empowers non-experts to contribute to serious research by reducing complex problems to relatively straightforward decisions:

  • is this galaxy a spiral or an ellipse?
  • is this a lion or a zebra?
  • is this the Greek letter tau or epsilon?

Measuring the Anzacs, seeks to study demographic and health trends in the early 20th century by transcribing 4.5 million pages worth of service records from the Australian and New Zealand Army Corps during WWI.  This data would take countless years to process with a small team of researchers, but as Evan told us, they hope to speed up this process tremendously by taking advantage of the fact that there are lots of people who have access to a computer, speak English and can read handwriting.

Tim Hitchcock ended his piece with a conundrum:

How to turn big data in to good history?  How do we preserve the democratic and accessible character of the web, while using the tools of a technocratic science model in which popular engagement is generally an afterthought rather than the point.

The Zooniverse model has taken a major step towards resolving this tension and turning formerly restricted research practices into consciously public digital humanities work.

 


Assignment

Explore the Measuring the Anzacs project and work your way through at least one document, marking and transcribing the text.

Screen Shot 2015-09-30 at 6.24.24 AM

When you’re done, post a brief comment below giving some feedback on the process.  Were the instructions easy to follow?  Was the text easy to transcribe?  Did you feel like you were making a real contribution to the project?  What did you get out of the project, from a humanities perspective?  Did you come away with a greater understanding of either the research process or the lived experience of the individual people whose records you were working with?

Author: Austin

13 comments

  1. The instructions were fairly easy to follow once I got the hang of it. Before that it was a little but challenging. Text was easy to transcribe in all but a few cases. In terms of my feeling of contribution, that felt limited, but it didn’t feel like useless work- it seemed like it would generally be helpful but each detail I added didn’t feel important.
    I got to mark a bunch of documents about one guy- from his enlistment to his discharging. I felt I got a little bit of a sense of how Peter Alexander McLaren’s short time in the military was, and especially, from his medical records, of how it hurt his body.
    Doing research in records like these without any digitized information must have been a real pain in butt, like the one Peter McLaren got when he was shot in the buttocks.

  2. For the most part transcribing and marking had fairly shallow learning curves, and I was able to get the hang of the processes fairly quickly. One thing that was frustrating, interface-wise, was transitioning from marking to transcribing on the same document. If the document was already marked, often times the transcribe button for a marking box would be covered up by the box below it, which made things more difficult than they needed to be.
    Collaborating on an online project like that was a cool experience. I didn’t necessarily feel as if I was making a significant contribution, but more as if I was a small cog among many on a large mechanism. I also think that the digitization of documents hadn’t quite hit home for me as a process until I worked on marking and transcribing myself. I have some trouble reading cursive, which definitely made it difficult for me to transcribe the handwritten documents, and I developed more depth in my understanding of the importance of digitization, both in terms of preserving these documents and making them accessible to a wider audience.

  3. I thought that the way this site was set up was very organized. Every step set up the user for the next step, the instructions depended on the choices that the user selected, and culminated in a tidy collection of information. It was fast going and satisfying to complete, as each part had a little done button. I felt like I was contributing to a conversation and somehow to history. By letting everyone participate in the act of transcribing, it made me far more interested in the histories of these men than if I was just reading reports. I was caught up in the life of the man whose document I was transcribing, and was taken aback by the death date at the bottom of the page. The interaction between user and the history I think makes the process far more valuable and the take away much more meaningful.

  4. I marked and transcribed service records for a couple people. One of them had detailed information about their unit, rank, and periods of service, while the other’s documents were mostly illegible. It was interesting learning about these people and the New Zealand arms corps and being able to interact with their official documents. The instructions were easy to follow and there were question marks on the pop boxes that explained anything I didn’t know how to do. I didn’t feel like I was making a meaningful contribution because the documents I looked at already had a few marks on them and I only looked at a few. I would have liked to mark up a document that had been completely untouched, but I was unsure of how to find one. When I look at the big picture, though, this process is really fascinating as it allows for anyone who is interested to work and interact with the project. From a humanities perspective, it was interesting to learn about these people like how long they served and where they were coming from. It was also cool to think about the number of people that are collaborating to transcribe and make these documents digital. There is so much more that can be learned about New Zealanders in the arms corps after all the documents are consolidated. I have greater appreciation for all the archives that are available online and for the people or algorithms that digitized them after going through this process.

  5. I went through the process of transcribing and marking a few ANZAC documents. While transcribing, I found some documents simple to read, while others were essentially illegible. There were some abbreviations I was unsure of, but I discovered a page on the website which explained what all the abbreviations were. I didn’t feel much of a connection with the people whose records I was working with. There wasn’t much information about them—they all seemed similar.
    I felt like I was not making a contribution to the project. It appears as if I’m a small portion of the project. In fact, I’ve never contributed to a crowdsourced project before, and I don’t think I have any reason to in the future. I don’t know why anyone would want to contribute to one… It’s free labor, and there is no entertainment value. Sorry for the negative outlook, just how I feel! Although, from the other side of the spectrum, I do understand why digital humanists crowdsource because there ARE people who contribute.

  6. Transcribing the text was pretty hard. Most of the documents had really bad handwriting and I could barely make out what it said. The instructions were definitely easy to follow and I felt like I was contributing to something much bigger than myself. Crowdsourcing, in general, is a very good way of getting things done, as people can even crowdsource in their free time if it’s something that’s easy. I definitely got a better sense of what these soldiers were going through and my understanding of the life of a soldier was also broadened. It was rather interesting to see a project like this, as it both gets something done, and spreads awareness of something. I think it was an enlightening experience, if not maybe slightly frustrating.

  7. As others have suggested, transcribing was indeed pretty hard. Most of the documents I transcribed had a really illegible handwriting. The documents also used a lot of abbreviations which made transcribing even more difficult. However, the webpage provided field guide which was really helpful in explaining what everything means. Although I did not feel that I made a significant contribution to the project, I don’t think it was all for nothing. Some of the documents I worked on, as it turns out, have been marked and transcribed previously. At first, I wouldn’t say I connected with any of the soldiers personally after marking and transcribing their documents. I more or less treated the documents as some sort of data which is probably due to my different academic background. However, as I continued, I became more aware of the life stories of the soldiers behind the documents and what they might have gone through during the war. Unfortunately the website crashed after I transcribed some documents.
    This was actually a really cool crowdsourcing project, and I believe this could accelerate the project, especially compared to if it would have been done manually.

  8. The instructions were pretty easy to follow along with. The text was not that hard to transcribe once I got the hang of it. It was confusing at first, but I got used to it and that allowed me to transcribe it fairly easily. Yeah, I sort of felt like I was making some contributions to to the project. Most of it was just step by step contributions and some of the work was already done, so I did not have the full experience of transcribing. To be honest, I did not have any real connection to the text because I saw the text as a piece of data rather than a text to read and get somewhat of an experience of people of New Zealand in WW1. Nonetheless, it provided me with good information about the conditions of the war and, to some extent, the lived experiences of the individual people.

  9. While I feel that I wasn’t making a significant contribution, I found this crowdsourcing project to be pretty cool as it was interesting to see some old documents that often caused me to think further about the personalities and stories of the men that these documents describe. Marking up was an easier way to have this learning experience of examining these documents, but transcribing was more challenging and took a bit more time to learn as there was less guidance through the process. As others have mentioned, I struggled with reading some handwriting and some documents had areas that were covered, upside down, or faded, making them challenging to read. I also struggled with the website as it took a while for them to find a subject that needs transcribing, and the website crashed a few times. Overall, the activity on this website is a cool example of the crowdsourcing process.

  10. I don’t feel as though I made a significant contribution to the project and I didn’t really learn from the data because I simply viewed the data as data. Transcribing started off pretty difficult because it was hard to read most of the handwriting. It got better the more I did it, but it was still fairly challenging. I didn’t like the transition from marking to transcribing the same text as boxes were in the way of one another. Overall I can see why it is important to help mark and transcribe the documents and it really can have a positive outlook on the person viewing the items already transcribed, but it seems dreadful for the person who has to do the transcribing.

  11. I felt that measuringtheanzacs.com did a great job of providing informative instructions for both the transcribing and marking sections. I generally found marking to be easier but more tedious than transcribing. Like many others, I found many of the transcribable documents to be illegible most of time. While on the other hand, marking was easier but took a lot more time.

    I think it would be easy for someone to feel like they weren’t contributing to this project (as is the way with many crowdsourcing projects in general.) However, I think if the people participating in the crowdsourcing were more invested in the research or were able to readily see the results of their work they would feel like they were making more a contribution.

  12. This website is actually pretty cool and fun if you’re into stuff like this and want to contribute to help mark or transcribe old documents. I feel like this is a good way for us to see how people wrote back in the days and how terrible their handwriting was and still is today. I felt like this was very easy to follow and not hard at all to figure out what to do. I do think I made a contribution to this project because I can help verify words that are hard to read and maybe I was able to read them better than some people and those kind of people are needed to help verify what some words may or may not be. I feel like the research going into this is harder than it I thought it would be because of the amount of people sometimes who see something like this are very low and their data can be very low or scarce sometimes and it can be hard to finish a project or any research without sufficient data. Overall, I think this is an awesome and cool project that more people should try out.

  13. The instructions were very easy to follow, although I wished the site told me that it would take a while to find a transcription before I clicked the TRANSCRIBE button.

    Some texts were easy to transcribe, but some pieces of information were blocked by attached notes, which happened frequently. Some documents have information with pressed ink, which is easy to read, but others have pencil or pen written, which is either easy or hard to read.

    I don’t really know if I was making a positive contribution to the project because I didn’t know if others transcribed before I did, although I am wondering if my input helps AIs distinguish information, but I definitely know the information is simply going into a directory for a specific document. It seems extremely simple to do from a intermediate programmer’s perspective.

    I think I came away with some idea about the research and its potential uses, such as finding where the most soldiers came from, looking at the percentages of wedded soldiers. Even small tidbits of information can be used to imagine the story of a soldier’s prior life.

Leave a Reply

Your email address will not be published. Required fields are marked *