Note: This post is part of a series of reworkings of materials originally written between 2009 and 2012. A description of the nature and purpose of this project can be found here.
Having marked up the diplomatic and edited forms of the Old English poem Azarias (as discussed here), I turned my attention to the much longer poem Daniel. A look at this poem provides some specific examples of the kinds of issues that arise when using markup to prepare materials for quantitative analysis. The edition of Daniel in the Anglo-Saxon Poetic Records (ASPR) contains some 80 editorial emendations, or an average of one every 9½ lines. That is denser than I was expecting, enough to make me wonder whether the emendations would affect the results of the Daniel/Azarias study initially performed by the Lexomics team.
A closer look at the emendations raises some important points. There are a fair number of editorially supplied words, like to in line 25 or hæfdon in line 56. There are also many examples of late Old English forms changed to classical West Saxon: e.g. MS þeoden, changed to þeodum in line 34; MS metod, changed to meted in line 129. And there are clear uncorrected errors: mægen hwyrfe, changed from MS mæ gehwurfe in line 221; MS cynig, changed to cyning in line 268; or MS hlfode, changed to hlifode in line 500. These emendations remove evidence of scribal influence on the text. Sometimes this evidence is linguistic in nature; elsewhere it is mechanical (e.g. the sloppy copying of hlifode). If we actually cared to measure the latter, we could only quantify it if we also marked up all the forms the scribe corrected himself (about 30).
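To give a concrete sense of how such markup can be turned into counts, here is a minimal sketch, assuming a TEI-style encoding in which each corrected form sits in a corr element whose resp attribute names the responsible agent. The element names, attribute values, and file name are illustrative assumptions on my part, not the encoding actually used for the Lexomics texts.

```python
# Hypothetical sketch: tally corrections in a TEI-style encoding of Daniel,
# distinguishing editorial emendations from scribal self-corrections by a
# resp attribute. The markup scheme assumed here is illustrative only.
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

def count_corrections(path):
    """Return a dict mapping responsible agent (e.g. '#editor') to a count."""
    counts = {}
    for corr in ET.parse(path).iter(TEI_NS + "corr"):
        resp = corr.get("resp", "unknown")
        counts[resp] = counts.get(resp, 0) + 1
    return counts

if __name__ == "__main__":
    counts = count_corrections("daniel.xml")   # hypothetical file name
    lines_in_daniel = 764                      # length of Daniel in the ASPR
    for resp, n in sorted(counts.items()):
        print(f"{resp}: {n} corrections, one every {lines_in_daniel / n:.1f} lines")
```

With roughly 80 editorial and 30 scribal corrections in play, a tally of this sort would at least make it easy to see how the two kinds of intervention are distributed across the poem.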
In other words, we have to account for another intermediary stage in the development of the text. I think it is really important that quantitative research be backed by a carefully articulated theory of the dynamism of texts and of how that dynamism is accounted for in the design of experiments. It is also important that this theory be in dialogue with non-quantitative research. For instance, the “material turn” in philology, which privileges manuscript forms over critical editions, may not sit well with quantitative analysis. Feeding manuscript forms into analytical tools, even where those forms are not nonsensical, may not yield results as meaningful as using edited forms, purely because of the way they affect the statistics. But using edited forms leaves us uncertain about the influence of editors on those statistics, which is the very criticism that prompted the move to privilege manuscript forms in the first place.
Back in 2009, I speculated that we might compare two copies of the same work, breaking them up into small chunks. Naturally, they would produce exactly the same graphs in a cluster analysis. We would then start changing the vocabulary in each chunk to random unique numbers in order to create noise. When the dendrograms no longer showed equivalent line groups in the same clade, we would calculate the average number of changed words per line to arrive at a “noise ratio”. If the number of editorial emendations in a text falls below this noise ratio, chances are that they are not affecting the results. Such a procedure would be hard to implement, as I noted then, and we never followed up. Nevertheless, it remains important to have some sense of the validity of cluster analyses: such validation is implemented in Gabmap, and the Lexomics team are working on a related tool called TrueTree. But neither provides a way to assess editorial influence specifically, and some work still needs to be done on developing a systematic technique for addressing this issue.
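For what it is worth, here is a minimal sketch of how the noise-ratio experiment might be implemented. The chunking by raw word count, the relative word-frequency vectors, the Manhattan distance, and the average-linkage clustering are all assumptions made for the sake of the example, not the settings the Lexomics tools use, and the function names are my own.

```python
# Sketch of the proposed "noise ratio" experiment: cluster two identical
# copies of a text, then replace randomly chosen words in one copy with
# unique nonsense tokens until corresponding chunks no longer pair up.
# All parameters here (chunk size, distance, linkage) are illustrative.
import random
from collections import Counter

from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def chunk(words, size=50):
    """Split a word list into consecutive chunks of roughly equal size."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def frequency_matrix(chunks):
    """Relative word-frequency vector for each chunk over the shared vocabulary."""
    vocab = sorted({w for c in chunks for w in c})
    rows = []
    for c in chunks:
        counts = Counter(c)
        rows.append([counts[w] / len(c) for w in vocab])
    return rows


def twins_intact(chunks_a, chunks_b):
    """True if every chunk of copy A still clusters with its twin in copy B."""
    Z = linkage(pdist(frequency_matrix(chunks_a + chunks_b), "cityblock"), "average")
    labels = fcluster(Z, t=len(chunks_a), criterion="maxclust")
    n = len(chunks_a)
    return all(labels[i] == labels[n + i] for i in range(n))


def noise_ratio(words, n_verse_lines, chunk_size=50, step=5, seed=0):
    """Average changed words per verse line at the point where twins separate."""
    rng = random.Random(seed)
    clean = chunk(list(words), chunk_size)
    noisy = list(words)
    order = list(range(len(noisy)))
    rng.shuffle(order)
    for changed, pos in enumerate(order, start=1):
        noisy[pos] = f"NOISE{changed}"    # unique token: pure noise
        if changed % step == 0 and not twins_intact(clean, chunk(noisy, chunk_size)):
            return changed / n_verse_lines
    return None  # the copies never separated (unlikely for a real text)
```

Running something like this on a duplicated text would yield a rough threshold against which the density of roughly one emendation every 9½ lines in Daniel could then be compared.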