Topic modeling made just simple enough: Ted Underwood’s very coherent account of LDA and some best practices for humanists.
What kinds of “topics” does topic modeling actually produce: This post, also by Ted Underwood, also addresses best practices, including issues of topic coherence and the scale of the corpus.
The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors: Matt Jockers’ allegory of the LDA Buffet.
Topic Modeling and Network Analysis: Scott Weingart’s description of topic modelling and its relationship with network analysis provides links to the major bibliography for topic modelling.
Topic Modeling in the Humanities: An overview: Clay Templeton’s introduction.
Why use visualizations to study poetry?: Lisa Rhody’s account of her experiments on ekphrasis (scroll to the bottom to see my comments).
Reading Tea Leaves: How Humans Interpret Topic Models: An interesting study of the way humans interpret topic models. A video presentation of the project is also available.
Topic Modeling for Humanists: A Guided Tour: Scott Weingart’s version of this collection of links. He provides slightly more explanation and a few more links (in addition to the ones here).
GUI Topic Modeling Tool (uses MALLET as its back end)
For convenience, here are the commands for operating the command-line version of MALLET. The first command imports the data and the second generates the topics:
bin\mallet import-dir --input data --output filename.mallet
bin\mallet train-topics --input filename.mallet --num-topics 20 --output-state topic-state.gz --output-topic-keys filename.txt --output-doc-topics filename_composition
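Once the topic keys file (filename.txt above) exists, it can be turned into a CSV that opens cleanly in a spreadsheet. The following is a minimal Python sketch, not part of MALLET itself. It assumes the keys file is tab-separated with three columns (topic number, topic weight, space-separated top words); check this against the output of your MALLET version. The function names and the output name topic_keys.csv are made up for illustration.

```python
import csv
import os


def read_topic_keys(path):
    """Parse a MALLET topic keys file into a list of dicts.

    Assumes each non-empty line looks like:
    <topic number> TAB <weight> TAB <word1 word2 word3 ...>
    """
    topics = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            topic_id, weight, words = line.split("\t", 2)
            topics.append(
                {
                    "topic": int(topic_id),
                    "weight": float(weight),
                    "words": words.split(),
                }
            )
    return topics


def write_topic_keys_csv(topics, csv_path):
    """Write the parsed topics to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["topic", "weight", "words"])
        for topic in topics:
            writer.writerow(
                [topic["topic"], topic["weight"], " ".join(topic["words"])]
            )


if __name__ == "__main__":
    # "filename.txt" is the --output-topic-keys file from the command above;
    # "topic_keys.csv" is a hypothetical output name.
    if os.path.exists("filename.txt"):
        write_topic_keys_csv(read_topic_keys("filename.txt"), "topic_keys.csv")
```

Run from the folder that holds filename.txt, this writes one row per topic, which can then be sorted or filtered by weight in a spreadsheet.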
Visualising topic models is still a challenge. Right now, the best option seems to be opening the CSV data for a topic model in Excel and generating graphs there.
That said, I’m really impressed with Matt Jockers’ theme viewer, presented in anticipation of the publication of his book Macroanalysis: Digital Methods and Literary History (UIUC Press, 2013). It’s really just a set of individually generated bar and line graphs, paired with word clouds and stored in a database, but it’s effective.