Extracting Text from PDF There are many ways to get OCR’d text out of a pdf, from APIs to python utilities to copy/paste. If you are on a mac, one of the easiest is to set up an automator script to generate output from any PDF you drop on top
Continue readingCategory: Week 6: Text Analysis + XML
Text Analysis 101: XML, TEI, and VoyantTools
Over the past few weeks we have discussed and seen how the modern dynamic web—and the digital humanities projects it hosts—comprise structured data (usually residing in a relational database) that is served to the browser based on a user request where it is rendered in HTML markup. This week we are exploring how
Continue reading