Taking Zoobook (and stalking) to the next level

For our group’s final project, we decided to create a map based on Carleton’s digital database of Zoobooks from 1955 to 2015. A visual representation of this data will help us understand where Carleton students come from and how that has changed over the decades.

As a CS major, my biggest personal interest in this project is developing a data-scraping tool that will let us quickly transcribe the data from PDF-formatted Zoobooks into a comprehensive Excel sheet. I find this task challenging but interesting, and I am excited to start working on it.
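To give a rough idea of what the transcription step might look like, here is a minimal sketch, not the final tool. It assumes the text has already been pulled out of the PDF into plain lines by some other program, and it assumes a made-up entry layout of `Last, First; City, ST; Year`. The real Zoobook pages may be laid out quite differently, so the pattern and the names `ENTRY`, `FIELDS`, `parse_lines`, and `to_csv` are all placeholders.

```python
import csv
import io
import re

# Hypothetical entry layout: "Last, First; City, ST; Year".
# This pattern is an assumption for illustration, not the real Zoobook format.
ENTRY = re.compile(
    r"^(?P<last>[^,;]+),\s*(?P<first>[^;]+);\s*"
    r"(?P<city>[^,;]+),\s*(?P<state>[A-Z]{2});\s*(?P<year>\d{4})$"
)

FIELDS = ["last", "first", "city", "state", "year"]


def parse_lines(lines):
    """Split extracted text lines into parsed rows and lines that did not match."""
    rows, rejects = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        match = ENTRY.match(line)
        if match:
            rows.append({key: value.strip() for key, value in match.groupdict().items()})
        else:
            rejects.append(line)  # keep for manual review instead of dropping
    return rows, rejects


def to_csv(rows):
    """Render parsed rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `parse_lines` on the extracted lines returns the clean rows plus a list of rejects, so anything the pattern cannot read is set aside for checking by hand rather than silently lost.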

 

Quang Tran

3 Comments

  1. I like your idea of extracting information from such a unique source to create this interesting-sounding project. Based on your pitch, I believe your project is feasible and convincing. I also appreciate your passion for and interest in this DH project, and I can tell that you will achieve a lot through it. However, I am curious about what you hope to gain from this experience and whether you have a specific goal in mind.
    P.S. One trivial concern: you mentioned that you are focusing on the digital database of Zoobooks from 1955 to 2015, but aren’t we assigned to look into the period from 1916 to 1966?

    • Yeah, data has been my greatest concern ever since we drafted our proposal. We hope to get the data for 1955 to 1966 from the Zoobooks first and then search for the data from 1916 to 1955 elsewhere. Thanks for pointing it out!

  2. Quang, I agree that extracting and cleaning your data will be the first big challenge in your project. There are a number of different web scraping and data manipulation tools and techniques out there, ranging from off-the-shelf web forms to command-line programs and code libraries. A good place to start would be the list of lessons at The Programming Historian, which has lots of great tutorials along these lines.

    Let me know how you get on!
