
Rebuild FabricSpace

I discovered a really cool digital humanities project called FabricSpace. I found it as an article on the Digital Humanities page of the Yale University Library. Here’s a link to it: http://dh.library.yale.edu/projects/vogue/fabricspace/.

The main goal of this project is to generate clusters of fabrics based on the context in which each fabric appears in Vogue magazine, and to see whether the resulting clusters are interpretable to humans.
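To make “context of appearance” a bit more concrete, here is a minimal Python sketch of one simple way to represent it: counting the words that appear within a few tokens of each fabric name. This is my own simplified stand-in, not necessarily how FabricSpace builds its representations (the post’s pipeline uses Ben Schmidt’s nearest_to instead). The function name, window size, and toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict

def context_vectors(sentences, targets, window=3):
    """Count the words appearing within `window` tokens of each target word.

    sentences: list of token lists; targets: set of fabric names.
    Returns {target: Counter(context_word -> count)}.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in targets:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the fabric name itself
                    vectors[tok][tokens[j]] += 1
    return dict(vectors)

# Hypothetical toy sentences standing in for OCR'd Vogue text.
sentences = [
    "a light chiffon blouse for summer evenings".split(),
    "a sheer chiffon scarf in pale pink".split(),
    "a heavy tweed coat for winter".split(),
]
vecs = context_vectors(sentences, {"chiffon", "tweed"}, window=2)
print(vecs["chiffon"].most_common(3))
```

Raw counts like these are dominated by filler words such as “a”, which is one reason real pipelines tend to weight the counts or learn dense word vectors instead.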

Source. The corpus consists of 91,880 articles drawn from 124 years of Vogue magazine. All of these articles were originally printed and had not been digitized prior to this project. Fortunately, the authors found a smart way to accomplish this task, one that can be applied to any other text corpus that is well formatted and well preserved.

Processes. First, all the articles were scanned and parsed using optical character recognition (OCR). Then, a list of fabric names was compiled from the authors’ own vocabulary together with related terms found in the resulting text using the nearest_to function from Ben Schmidt’s wordVectors R package. This list was later shortened by removing fabrics that are less commonly used or have multiple meanings. Clustering was performed first on the longer list and then on the shorter one.
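The list-building step can be approximated with a nearest-neighbor search. The sketch below assumes every word already has an embedding vector (tiny hand-made 3-dimensional vectors here, not trained ones) and ranks words by cosine similarity to a seed, roughly the kind of query nearest_to answers. It is not the R function’s code or interface: `nearest_words`, `expand_and_prune`, the `min_count` threshold, and the hand-labelled `ambiguous` set are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_words(vectors, query, n=3):
    """Return the n words (excluding `query`) most cosine-similar to `query`."""
    qv = vectors[query]
    scored = [(w, cosine(qv, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:n]]

def expand_and_prune(vectors, seeds, counts, ambiguous, min_count=50, n=3):
    """Grow the seed list with each seed's neighbors, then drop rare or ambiguous words."""
    candidates = set(seeds)
    for seed in seeds:
        candidates.update(nearest_words(vectors, seed, n))
    return sorted(w for w in candidates
                  if counts.get(w, 0) >= min_count and w not in ambiguous)

# Toy, hand-made vectors and counts (hypothetical, not from the Vogue corpus).
vectors = {
    "silk": [0.9, 0.1, 0.0], "satin": [0.85, 0.15, 0.05],
    "chiffon": [0.8, 0.2, 0.1], "voile": [0.88, 0.12, 0.02],
    "tweed": [0.1, 0.9, 0.0], "wool": [0.15, 0.85, 0.1],
    "jersey": [0.5, 0.5, 0.4],
}
counts = {"silk": 900, "satin": 400, "chiffon": 300, "voile": 10,
          "tweed": 250, "wool": 800, "jersey": 600}
# "jersey" is flagged as ambiguous purely for illustration; "voile" is too rare.
fabrics = expand_and_prune(vectors, ["silk", "tweed"], counts, ambiguous={"jersey"})
print(fabrics)  # ['chiffon', 'satin', 'silk', 'tweed', 'wool']
```

In practice the expanded list would still need a human pass, since a word’s nearest neighbors are not guaranteed to be fabrics at all.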

Presentation. A clustering dendrogram is produced for each list of fabric names. The authors put a lot of effort into explaining why the first list produced poor clusters. They plotted the word frequency over time of a fabric name with ambiguous or multiple meanings, showing that its frequency, meaning, and surrounding context in the articles all vary over time. In the end, they showed a successful clustering dendrogram built without such names:

A screenshot from http://dh.library.yale.edu/projects/vogue/fabricspace/.
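I have not reproduced the authors’ code, but the dendrogram step can be sketched as naive agglomerative clustering: start with every fabric in its own cluster and repeatedly merge the two closest clusters, where each recorded merge corresponds to one join in the dendrogram. The average linkage, the cosine distance, and the toy vectors below are my assumptions. The second helper computes a word’s relative frequency per year, the kind of series behind a frequency-over-time plot.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity; treats an all-zero vector as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def average_linkage(vectors):
    """Naive agglomerative clustering over {word: vector}.

    Returns the merges a dendrogram would draw, as (left, right, distance) tuples,
    where distance is the mean pairwise cosine distance between the two clusters.
    """
    clusters = [(w,) for w in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
            d = sum(cosine_distance(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters = rest + [clusters[a] + clusters[b]]
    return merges

def relative_frequency_by_year(docs, word):
    """docs: iterable of (year, tokens). Returns {year: count of word / total tokens}."""
    hits, totals = {}, {}
    for year, tokens in docs:
        totals[year] = totals.get(year, 0) + len(tokens)
        hits[year] = hits.get(year, 0) + tokens.count(word)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Toy vectors (hypothetical): three light fabrics and two heavy ones.
vectors = {
    "silk": [0.9, 0.1, 0.0], "satin": [0.85, 0.15, 0.05],
    "chiffon": [0.8, 0.2, 0.1], "tweed": [0.1, 0.9, 0.0],
    "wool": [0.15, 0.85, 0.1],
}
merges = average_linkage(vectors)
for left, right, dist in merges:
    print(left, "+", right, f"at {dist:.3f}")

# Hypothetical dated snippets for the frequency-over-time helper.
docs = [(1900, "a jersey of wool".split()), (1900, "new york".split()),
        (1990, "jersey jersey dress".split())]
freq = relative_frequency_by_year(docs, "jersey")
print(freq)
```

This naive version recomputes every pairwise distance at each step, which is fine for a few dozen fabric names; for larger inputs, R’s hclust() or SciPy’s scipy.cluster.hierarchy.linkage does the same job efficiently.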

I really liked this project because it has the potential to inspire work in other fields like fashion design and studio arts: by combining fabrics that sit in distant clusters, designers could create new styles. Another reason is that, although the article centers on the clustering algorithm, it explains all the relevant math concepts in plain English, which can help other Digital Humanities practitioners apply the technique in new contexts. In addition, all the code is open source: each snippet is included right above the plot it produced.

3 replies on “Rebuild FabricSpace”

I really like this project! What a cool concept. I enjoyed your last paragraph about the usefulness of the project; inspiration for design is just one of the many ways this data can be used. The accessibility was nice to note too, because explaining concepts in plain English is so important for helping students and other audiences understand what is happening.

I really appreciated how detailed this post was in investigating one part of the Robots Reading Vogue project, which I wrote about and found too vast to properly describe as a whole. I am still unsure whether these clusters are interpretable to humans, as you mentioned!

To be honest, I enjoyed reading and understanding your blog post more than the actual article. I also became really intrigued by the algorithm used for this. I can see so many places where it could be applied, for instance in preserving and reviving languages.
