How to Create Topic Clouds with Lexos

Some Background

Topic modelling is gaining momentum as a research method in Digital Humanities, with MALLET as the general tool of choice. However, many would-be topic modellers have struggled to make effective use of MALLET’s output, which is raw data. In fact, there has been a growing movement to devise methods of visualising topic modelling data generally. A while back, Elijah Meeks had an idea for generating topic clouds: separate word clouds for each topic in the model. [I can’t seem to access his original blog post, but here is his code on GitHub.] Although word clouds have their problems as visualisations, Meeks speculated that they were particularly effective for examining topics in a topic model. Indeed, others have used word clouds to visualise topic modelling results, most notably Matt Jockers in the digital supplement to his Macroanalysis. One of the things I liked about Meeks’ implementation using d3.js was that it placed the clouds next to each other so that they could be compared.

I quickly transferred this idea to our work on the Lexomics project, and our software Lexos. In Lexomics, we frequently cut texts into chunks or segments, which can then be clustered to measure similarities and differences.…



NEH Funds the Archive of Early Middle English

I’m excited to announce that I have received an NEH Scholarly Editions and Translations grant, which I will co-direct with Dorothy Kim from Vassar College. The grant will help create an Archive of Early Middle English (AEME). We’ll start with a full digital edition of Oxford, Bodleian Library, Laud Misc 108, with a complete set of images. Other manuscripts will follow, and by the end of the grant we expect to have a full set of metadata and editorial conventions for others to submit materials. AEME will be designed to be flexible. Not everybody can afford to photograph full manuscripts, so we’ll be working to accommodate images as they become available in the public domain (and perhaps license a few that are not). We’ll also take individual texts, in addition to whole manuscripts. And, all ye multilingual enthusiasts, we haven’t forgotten one of the most important developments in recent scholarship. Although we want Early Middle English (about 1066-1350) to be at the centre, we’ll take materials in any language found in manuscripts that also contain Early Middle English.

I will write much more once the project gets started. For now, you can learn a little more on the AEME web site.…



Introducing Serendip-o-matic

I’m proud to introduce the online search tool Serendip-o-matic. From July 28 to August 3, I worked with a fabulous group of digital humanists to produce this tool from scratch as part of the One Week | One Tool project.

Serendip-o-matic connects your sources to digital materials located in libraries, museums, and archives around the world. By first examining your research interests, and then identifying related content in locations such as the Digital Public Library of America (DPLA), Europeana, Trove Australia, and Flickr Commons, Serendip-o-matic’s serendipity engine helps you discover photographs, documents, maps and other primary sources.

Whether you begin with text from an article, a Wikipedia page, or a full Zotero collection, Serendip-o-matic’s special algorithm extracts key terms and returns a surprising reflection of your interests. Because the tool is designed mostly for inspiration, search results aren’t meant to be exhaustive, but rather suggestive, pointing you to materials you might not have discovered. At the very least, the magical input-output process helps you step back and look at your work from a new perspective.

At some point, I will blog about the experience, but that will have to wait a little because the project has coincided with activity related to the news that I have received an NEH award to create an Archive of Early Middle English.…
