Digital Humanities Projects with Small and Unusual Data: Some Experiences from the Trenches

Update March 15 2016: This content was selected for Digital Humanities Now by Editor-in-Chief Joshua Catalano based on nominations by Editors-at-Large: Ann Hanlon, Harika Kottakota, Heather Hill, Heriberto Sierra, Jill Buban, Marisha Caswell, Nuria Rodriguez Ortega. The slides have now been integrated, and they can also be seen in a reveal.js presentation here.

This is an edited version of a talk I gave at UC Irvine on February 5, at a Symposium on Data Science and Digital Humanities organized by Peter Krapp and Geoffrey Bowker.

I’ve made the focus of my talk Digital Humanities projects involving small and unusual data. What constitutes small and unusual will mean different things to different people, so keep in mind that I’ll always be speaking in relative terms. My working definition of small and unusual data will be texts and languages that are typically not used for developing and testing the tools, methods, and techniques used for Big Data analysis. I’ll be using “Big Data” as my straw man, even though most data sets in the Humanities are much smaller than those for which the term is typically used in other fields. But I want to distinguish the types of data I will be discussing from the large corpora of hundreds or thousands of novels in Modern English which are the basis of important Digital Humanities work.…



NEH Funds the Archive of Early Middle English

I’m excited to announce that I have received an NEH Scholarly Editions and Translations grant, which I will co-direct with Dorothy Kim from Vassar College. The grant will help create an Archive of Early Middle English (AEME). We’ll start with a full digital edition of Oxford, Bodleian Library, Laud Misc 108, with a complete set of images. Other manuscripts will follow, and by the end of the grant we expect to have a full set of metadata and editorial conventions for others to submit materials. AEME will be designed to be flexible. Not everybody can afford to photograph full manuscripts, so we’ll be working to accommodate images as they become available in the public domain (and perhaps license a few that are not). We’ll also take individual texts, in addition to whole manuscripts. And, all ye multilingual enthusiasts, we haven’t forgotten one of the most important developments in recent scholarship. Although we want Early Middle English (about 1066-1350) to be at the centre, we’ll take any materials in any language that is also found in manuscripts containing Early Middle English.

I will write much more once the project gets started. For now, you can learn a little more on the AEME web site.
