Thanksgiving 2012

This year’s Thanksgiving word cloud includes both Facebook and my Twitter feed.

[Image: word cloud of Facebook and Twitter feeds on Thanksgiving 2012]

Noticeably absent this year is the word “bacon”.
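For the curious, a cloud of this sort takes only a few lines of Python with the wordcloud package. A minimal sketch, assuming the combined posts have been dumped into a plain-text file ("feeds.txt" is just a placeholder name), and not necessarily the tool behind the image above:

    # Minimal word-cloud sketch using the Python `wordcloud` package.
    # Assumes the combined Facebook and Twitter posts have been saved
    # to a plain-text file; "feeds.txt" is a placeholder name.
    from wordcloud import WordCloud, STOPWORDS

    with open("feeds.txt", encoding="utf-8") as f:
        text = f.read()

    cloud = WordCloud(
        width=800,
        height=400,
        background_color="white",
        stopwords=STOPWORDS,  # filter common function words
    ).generate(text)

    cloud.to_file("thanksgiving_2012_cloud.png")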

And, for good measure, here is a topic model of the tweets and posts:

  1. I’m canyon things man cuteness blog ol drop teens gourd
  2. you’re lovely hike home family kill center orangutanes morning daily
  3. thanksgiving love turkey join audience kennedys infotech anonymous open driving
  4. kill history started jugar marylebone brightest uk leave topic twitter
  5. happy monrovia post reading move pie cat puddytat beagles
  6. day 1st work hiking eating dogs participation gustar espa obama
  7. thankful friends house make grading awesome head miss family dpla
  8. today beautiful turkeys year media con project triumphs articles tea
  9. la weather london dinner survived grateful marblehead hackfest workshop en
  10. park gobble beginning good kittens feasting te cnn touch wouldn’t

Suggestions for topic labels welcome.
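For those curious about the mechanics: a ten-topic model like the one above can be built in a few lines with gensim's LDA implementation (I make no claim that this is the exact tool behind the list above). A minimal sketch, with placeholder documents standing in for the real tweets and posts:

    # Sketch of a ten-topic LDA model with gensim; the placeholder
    # documents stand in for the real corpus of tweets and posts.
    from gensim import corpora, models

    docs = [
        ["happy", "thanksgiving", "turkey", "family"],
        ["thankful", "friends", "pie", "dinner"],
        # ... the real corpus would contain hundreds of posts
    ]

    dictionary = corpora.Dictionary(docs)            # token -> integer id
    corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words vectors

    lda = models.LdaModel(corpus, num_topics=10, id2word=dictionary, passes=20)
    for topic_id, words in lda.print_topics(num_topics=10, num_words=10):
        print(topic_id + 1, words)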

Update: I was surprised to find that this post appeared on DH Now Unfiltered, so, given the increased traffic, I thought I’d better provide some explanation. I am trying to make this a regular tradition, having produced a word cloud of last year’s Facebook posts on Thanksgiving. The Facebook posts are only those by me or my friends, and this year’s Twitter feed includes only tweets by me or by those I follow (plus the sponsored posts).…



Computational Approaches to “Small Data” in the Digital Humanities

Taking a break from my re-working of past posts on computational approaches to text analysis, I want to make a plea for attention to be paid to small data and for small data to form part of the larger conversation about quantitative analysis in the Digital Humanities. As the range of digitised materials expands through the proliferation of digitised print and born-digital texts, big data has come to represent the opportunity for innovation in the Digital Humanities. For those of us who work in areas with corpora of restricted size (medieval English literature, in my case), this presents a problem. How can those of us working in fields that employ such corpora participate in the growing conversation? Are the Digital Humanities doomed to a divide between those who do distant reading on “big data” and those who do close reading through the markup and editing of smaller numbers of texts? One way the latter group can bridge the divide is by generating lots of metadata. But that only goes so far, and it doesn’t address the analysis of language being done through techniques such as topic modelling.

Small data is useful precisely because it throws into relief some of the same theoretical questions that (in my view) have yet to be sufficiently addressed in approaches to “big data”.…



A Change of Direction

Note: This post is part of a series of reworkings of materials originally written between 2009 and 2012. A description of the nature and purpose of this project can be found here.

Still in September of 2009, I was speculating that comparing manuscript variations would be one of the most popular uses of Lexomics. I was definitely wrong. To date, the Anglo-Saxon Penitentials are the only texts we have subjected to this kind of analysis, and the verdict is not yet in. The process of (digital) manuscript collation is a real challenge, particularly collating in such a way that one can extract aligned token lists for word-frequency analysis. I haven’t seen what the latest version of Juxta can do; I probably need to make time for an experiment in the future. Sadly, I am not currently working on any texts that survive in more than two manuscripts, so this may have to wait.
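To make “aligned token lists” concrete, here is a toy sketch in Python that lines up the token streams of two invented witness readings with difflib and tabulates word frequencies for each; real collation would of course have to cope with normalisation, abbreviation, and lineation:

    # Toy alignment of two witness token streams with difflib; the
    # texts are illustrative placeholders, not real collation data.
    from collections import Counter
    from difflib import SequenceMatcher

    witness_a = "nu we sculon herian heofonrices weard".split()
    witness_b = "nu sculon herigean heofonrices weard".split()

    matcher = SequenceMatcher(None, witness_a, witness_b)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op != "equal":  # report only points of divergence
            print(op, witness_a[a1:a2], "<->", witness_b[b1:b2])

    # Per-witness word frequencies for downstream analysis.
    freq_a, freq_b = Counter(witness_a), Counter(witness_b)
    for w in sorted(set(witness_a) | set(witness_b)):
        print(w, freq_a[w], freq_b[w])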

As my speculation about collation more or less ground to a halt, I continued to think about the problem of representing meaning through cluster analysis. In addition to the problem of editorial influence, there is the question of what textual phenomena are actually being measured by the algorithm.…
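By way of illustration of what such an algorithm literally measures, here is a bare-bones sketch: hierarchical clustering of relative word frequencies across text segments, with Euclidean distance and average linkage. The segments are invented placeholders, and the actual Lexomics pipeline differs in its details:

    # Bare-bones hierarchical clustering of relative word frequencies.
    # Segment texts are invented placeholders; the Lexomics pipeline
    # (chunking, culling, distance metric) differs in its details.
    from collections import Counter

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    segments = {
        "A1": "se cyning het faran ofer tha ea".split(),
        "A2": "tha com se here to thaere byrig".split(),
        "B1": "and he genam tha burh and thaer saet".split(),
    }

    vocab = sorted({w for toks in segments.values() for w in toks})
    freqs = np.array([
        [Counter(toks)[w] / len(toks) for w in vocab]  # relative frequency
        for toks in segments.values()
    ])

    Z = linkage(freqs, method="average", metric="euclidean")
    dendrogram(Z, labels=list(segments))
    plt.show()

What the dendrogram groups together, in other words, is nothing more than proximity between frequency vectors; whether that proximity reflects authorship, genre, dialect, or scribal habit is exactly the open question.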
