Academic Catalog Analysis: Max and Cole
Progress
We spoke with both Nat Wilson and Hsianghui Liu-Spencer from the Carleton archives to learn more about the documents, including how they were OCR'ed and what content is available. Unfortunately, the information they had was not helpful for our project specifically.
So far, we have gathered PDFs and OCR-processed text files of every academic catalog from 1900 to 2000. Downloading them individually took a long time. We have not yet built anything, but we have a plan in place.
Problems
Our initial issue is the variance in format and OCR quality across the catalogs. This has forced us to take a different route: our new plan is to look at overall trends in word use and frequency.
Tools and techniques
We are not yet sure exactly what software we will use to analyze the catalogs. We have considered Voyant Tools, but we would like to look into other software that can analyze several files at once. We will also need to add several stop words specific to the catalogs, such as "course," "credit," and "professor."
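Whatever tool we settle on, the core idea is simple: count word frequencies across many files while skipping a custom stop-word list. A minimal Python sketch of that idea (the stop words and sample snippets below are placeholders, not our final list):

```python
from collections import Counter
import re

# Placeholder stop-word list: a few catalog-specific terms plus common function words.
STOP_WORDS = {"course", "credit", "professor", "the", "and", "of", "a", "in"}

def word_frequencies(texts):
    """Count word frequencies across several documents, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Stand-in snippets; in practice these would be the OCR'ed catalog files.
catalogs = [
    "The course in Latin carries one credit. Professor Smith teaches Latin.",
    "A course in Chemistry: laboratory credit required.",
]
freqs = word_frequencies(catalogs)
print(freqs.most_common(3))
```

Running this kind of count per decade would let us chart how vocabulary shifts across the century without depending on any one catalog's layout.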
Timeline
We are on track for delivery.
Understand and evaluate the best tools and data-storage options: by Feb. 16 (met)
Clean up data and start working on stop-words list: Monday, Feb. 26
Website started: Friday, March 2
Presentation: Friday, March 9
Author: Max Goldberg
http://goldbergmax.com/about/
Max and Cole,
As we discussed in class, there are many more or less intensive approaches to text analysis beyond Voyant, which usually just serves as a first taste of what a text's possibilities are. One of the tools more commonly used for DH projects is AntConc, and there is a good tutorial on corpus analysis with AntConc on the Programming Historian website.
Once you find some useful comparisons, we can explore further visualization options.