The Newton Project – Hacking the Humanities

Starting in 1998, a team of scholars in the UK set out to create a digital record of all of Newton’s writings, academic or not. Seventeen years, 6.4 million words, and countless texts later, The Newton Project is mostly complete. It contains transcriptions and translations of hundreds of writings by Sir Isaac Newton, each cataloged with its very own Newton Catalogue ID. This project represents years of work and collaboration with numerous universities and non-profits.

The homepage of The Newton Project

Breaking down this project is relatively simple. The Newton Project is, obviously, created from the original works of Sir Isaac Newton. His papers, letters, and notebooks, of theological, political, historical, and academic significance are all available on the project website. In processing the raw sources, these documents were translated and transcribed–simply presenting the raw images, say the project’s creators, would not serve The Newton Project’s goal of creating a digital archive. Tagging each work, once transcribed, with metadata, a title, and a unique ID allow these works to be presented in a searchable database for easy access by users; the project follows guidelines based on recommendations given by the Text Encoding Initiative. In order for the data to be made readily available and usable on the project website, a great deal of collaboration among programmers, encoders, transcribers and editors is required. The Newton Project uses an xml-based website, containing categorized, archived data. Each document is presented relatively simply, with text alongside any images included in the original document.

The amount of work that goes into creating an archive such as this is incredible. It has taken this team years to build what at first glance seems to be a very basic archive of Newton’s work–the time investment involved in creating similar, interactive projects must be even more incredible!

2 Comments


September 23, 2015
Miriam Posner

Hi Pallav,

Nice job breaking the project down! I agree, it’s a staggering amount of work to encode and edit documents to this level of detail. What do you think this offers to the project that wouldn’t be available if the authors had, say, just transcribed the documents and thrown them up on a content-management system? Is it worth it?

Another thing you might think about if you were really seriously analyzing this project is the actual TEI technical specifications for this project. Why did the project participants decide to mark up the specific things that they did? What did they think was important, and what did they ignore? Do you agree with their choices?

There’s also the larger question of the interface. Looking at the interface, do you have a clear sense of, say, with whom Newton corresponded, and how frequently? Can you imagine another way to show this information?

Again, nice work!

— Miriam Posner
- Reply
  
  September 24, 2015
  pallavkumar
  
  Those are good questions! I would assume that with the level of tagging and encoding they did, it’s much easier to search the database and do things like word frequency analyses. In terms of the technical specifications, thanks for linking to that! I don’t think I would have found it otherwise. I think their guidelines were reasonable for the most part–they obviously made a huge effort to keep transcriptions as true to the text as possible, which I applaud. Everything from misspelllings to bold/italic to changes in writers was kept track of as to keep the digital version of each document as true to the original as possible. I honestly can’t find anything that they could have forgotten.
  
  In terms of the interface, I think theirs leaves much to be desired. Although it’s very easy to navigate through the database as it stands, I would compare it to a library rather than a museum; it’s only easy to find documents that you’re looking for, and visualizations are entirely lacking. I myself imagine some kind of statistical display which could be linked to the passing of time. With each year, say, differently-colored points could appear for each different type of work added, along with counters to mark how much correspondence was exchanged with who. It might almost work better as a video–then again, I have no idea how to implement that, so that’s quite a different problem altogether. Thank you for your comment!

pallavkumar

2 Comments

Leave a Reply Cancel reply