How to Create and Cluster Topic Files in Lexos

This post is a follow-up to last year’s How to Create Topic Clouds with Lexos, where I showed how Lexos can be used to visualise topic models produced by Mallet. From time to time, colleagues have wondered whether it would be possible to use Lexos to perform cluster analysis on the topics Mallet produces. The motivation for doing this is simple enough: topics are often very similar, and it would be useful to have some statistical measure of this similarity to help us decide whether groups of topics should really be interpreted under some meta-class. Some added urgency has arisen in discussions for the 4Humanities WhatEvery1Says Project, which is topic modelling a large collection of public discourse about the Humanities. We’ve begun considering whether doing cluster analysis on topic models can help us to refine our experiments.

The first step is finding a way to massage the Mallet data into a form we can submit to clustering algorithms. Lexos already transforms Mallet output into a topic-term matrix, which it then uses to generate word clouds from the top 100 words in each topic. Essentially, the topics are treated just like text documents (or at least slices of them).…
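To make the "topics as documents" idea concrete, here is a minimal sketch of building a topic-term matrix and measuring how similar two topics are. It is not the Lexos code. It assumes a Mallet topic-word weights file with one tab-separated `topic word weight` entry per line, and the function names (`read_topic_word_weights`, `topic_term_matrix`, `cosine`) are hypothetical. Cosine similarity is my choice for illustration, not necessarily the measure Lexos uses.

```python
import math
from collections import defaultdict


def read_topic_word_weights(lines):
    """Parse lines assumed to have the form 'topic<TAB>word<TAB>weight'."""
    weights = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        topic, word, weight = line.split("\t")
        weights[int(topic)][word] = float(weight)
    return weights


def topic_term_matrix(weights, top_n=100):
    """Keep each topic's top_n words, then align all topics on a shared vocabulary.

    Each row is a topic, treated like a document; each column is a term.
    Terms outside a topic's top_n get a weight of 0.0 in that row.
    """
    top = {
        t: dict(sorted(ws.items(), key=lambda kv: kv[1], reverse=True)[:top_n])
        for t, ws in weights.items()
    }
    vocab = sorted({w for ws in top.values() for w in ws})
    topics = sorted(top)
    matrix = [[top[t].get(w, 0.0) for w in vocab] for t in topics]
    return topics, vocab, matrix


def cosine(u, v):
    """Cosine similarity of two rows: 1.0 = same direction, 0.0 = no shared terms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With toy data in which topics 0 and 1 share words like "king" and "knight" while topic 2 is about "bread" and "oven", `cosine` gives a high score for the first pair and 0.0 for either of them against topic 2. A matrix of such pairwise scores is the kind of input a clustering algorithm can work from.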



Play as Process and Product: On Making Serendip-o-matic

I’m at the DH 2014 conference in Lausanne, Switzerland, and enjoying it immensely, despite cold and rainy weather which should be impossible in July. I’ve just delivered my paper “Play as Process and Product: On Making Serendip-o-matic” (abstract here), along with colleagues Mia Ridge and Brian Croxall (co-author Amy Papaelias couldn’t make it but contributed remotely). I’ll blog more on the conference itself in a separate post, but for now I thought I’d put my portion of the presentation online. Here’s Mia’s portion, and here is Brian’s.

Play as Process and Product: On Making Serendip-o-matic

Hi, I’m Scott Kleinman, and my job is to introduce you to the One Week | One Tool experience which led to the creation of Serendip-o-matic. One Week | One Tool was a summer institute sponsored by the National Endowment for the Humanities. It was organised by Tom Scheinfeldt and Patrick Murray-John, and hosted by the Roy Rosenzweig Center for History and New Media at George Mason University. The idea for One Week | One Tool was inspired by models of rapid community development and advertised as a digital “barn-raising”, in which a diverse group of twelve DH practitioners would gather “to produce something useful for humanities work and to help balance learning and doing in digital humanities training.”…



NEH Funds the Archive of Early Middle English

I’m excited to announce that I have received an NEH Scholarly Editions and Translations grant, which I will co-direct with Dorothy Kim from Vassar College. The grant will help create an Archive of Early Middle English (AEME). We’ll start with a full digital edition of Oxford, Bodleian Library, Laud Misc 108, with a complete set of images. Other manuscripts will follow, and by the end of the grant we expect to have a full set of metadata and editorial conventions for others to use when submitting materials. AEME will be designed to be flexible. Not everybody can afford to photograph full manuscripts, so we’ll be working to accommodate images as they become available in the public domain (and perhaps license a few that are not). We’ll also take individual texts, in addition to whole manuscripts. And, all ye multilingual enthusiasts, we haven’t forgotten one of the most important developments in recent scholarship. Although we want Early Middle English (about 1066–1350) to be at the centre, we’ll take materials in any language, provided they are found in manuscripts that also contain Early Middle English.

I will write much more once the project gets started. For now, you can learn a little more on the AEME web site.…



Introducing Serendip-o-matic

I’m proud to introduce the online search tool Serendip-o-matic. From July 28 to August 3, I worked with a fabulous group of digital humanists to produce this tool from scratch as part of the One Week | One Tool project.

Serendip-o-matic connects your sources to digital materials located in libraries, museums, and archives around the world. By first examining your research interests, and then identifying related content in locations such as the Digital Public Library of America (DPLA), Europeana, Trove Australia, and Flickr Commons, Serendip-o-matic’s serendipity engine helps you discover photographs, documents, maps and other primary sources.

Whether you begin with text from an article, a Wikipedia page, or a full Zotero collection, Serendip-o-matic’s special algorithm extracts key terms and returns a surprising reflection of your interests. Because the tool is designed mostly for inspiration, search results aren’t meant to be exhaustive, but rather suggestive, pointing you to materials you might not have discovered. At the very least, the magical input-output process helps you step back and look at your work from a new perspective.

At some point, I will blog about the experience, but that will have to wait a little, since the project coincided with the news that I have received an NEH award to create an Archive of Early Middle English.…
