Robots Reading the Carletonian

Members in Group: Oren and Chris

Project Definition: We will use the digital records of the Carletonian 1965-2015 to track the interests of Carleton students over time as well as the evolution of the newspaper itself.

 

Proposed Methodology

Data: For our project we hope to be able to use Carletonian issues from 1965 to 2015.

Tools and Techniques: Each issue is available through the Carleton Digital Archives as a PDF. We plan to use OCR.Space’s free API and a Python script to convert each PDF into plaintext. At the moment this requires downloading a PDF of each newspaper page and running the Python program on it. We expect this to take a long time and are in communication with the digital archivist to see if there is a more efficient solution. Ideally we would be able to find downloadable PDFs of entire issues or be able to use the PDF’s URL instead.
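As a rough illustration of the PDF-to-plaintext step, here is a minimal sketch of what the Python script might look like if we can pass a PDF's URL to the OCR service. The endpoint and the field names (apikey, url, filetype, ParsedResults, ParsedText, IsErroredOnProcessing) reflect our reading of OCR.Space's public API documentation and should be checked against the current docs. The function names are hypothetical, and the free tier's size and rate limits are not handled here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint and field names as described in OCR.Space's public
# API documentation; verify them before relying on this sketch.
OCR_ENDPOINT = "https://api.ocr.space/parse/image"


def build_payload(pdf_url, api_key):
    """Form fields for an OCR request that points at a hosted PDF."""
    return {
        "apikey": api_key,
        "url": pdf_url,
        "language": "eng",
        "filetype": "PDF",
        "isOverlayRequired": "false",
    }


def extract_text(response):
    """Join the text of every parsed page in a decoded JSON response."""
    if response.get("IsErroredOnProcessing"):
        raise RuntimeError(str(response.get("ErrorMessage")))
    pages = response.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "") for page in pages)


def ocr_pdf_url(pdf_url, api_key):
    """Send one PDF URL to the OCR service and return its plaintext."""
    data = urllib.parse.urlencode(build_payload(pdf_url, api_key)).encode()
    with urllib.request.urlopen(OCR_ENDPOINT, data=data, timeout=120) as reply:
        return extract_text(json.load(reply))
```

Keeping the request building and the response parsing in separate functions means the parsing can be checked against a saved response without calling the API each time, which matters when the free tier limits how many requests we can make.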

Analyses and Transformations: We will use text analysis to find patterns of word choice in the issues over time. We can also run the analysis on just the headlines to narrow down the data. Beyond word choice, we can look at other data such as pages per issue over time, the number of authors featured, or the length of feature articles.
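To make the word-choice analysis concrete, here is a small sketch of one way to track a term over time: count how often it appears per 1,000 words in each year's issues. This is only an illustration and not necessarily how VoyantTools computes its trends. The function names and the year-to-texts dictionary layout are our own assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(issues_by_year, term):
    """Return {year: occurrences of term per 1,000 words}.

    issues_by_year is assumed to map a year to a list of plaintext issues.
    """
    term = term.lower()
    result = {}
    for year, texts in sorted(issues_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(tokenize(text))
        total = sum(counts.values())
        # Normalize by total words so longer issues do not dominate.
        result[year] = 1000 * counts[term] / total if total else 0.0
    return result
```

Normalizing by the total word count matters because the number and length of issues will likely vary from year to year, so raw counts alone could reflect the size of the paper more than student interest.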

Presentation: We will link to the VoyantTools page for our project on the website, and we will also show screenshots of relevant graphs and visualizations, as well as written analysis and explanation.

 

Proposed Timeline

Sprint 1 (2/9 – 2/16): Gather and clean data from all Carletonian issues from 1965 to 2000

Sprint 2 (2/17 – 2/21): Gather and clean data from Carletonian issues for the remaining 15 years. Begin text analysis using VoyantTools

Sprint 3 (2/22 – 2/25): Complete text analysis and begin creating the website

Sprint 4 (2/26 – 3/2): Complete website and populate it with interactive displays and narrative

Sprint 5 (3/2 – 3/6): Create and prepare presentation for 3/7

 

Current Similar Project

Robots Reading Vogue

Author: lieberkotzo

http://orenlieberkotz.org/
