The ballads are a rich source of textual information from Early Modern England. The DataLab has developed several methods for examining the text of individual ballads as well as the corpus as a whole.
For individual ballads, the DataLab has produced tools to examine and visualize word frequency within a ballad, as well as the weighted importance of key terms in a ballad relative to the corpus (using term frequency-inverse document frequency, or TF-IDF). In addition to word counts and importance, each ballad now includes a bigram (two-word) network that maps which terms are frequently paired. By looking at how often word X follows word Y, we can start to understand the relationships between them, as in the case of refrains, place names, and more. More advanced techniques, such as latent Dirichlet allocation (LDA) topic modeling, allow each ballad to be ranked by its relevance to each of the 160 topics identified in the corpus.
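To make these per-ballad measures concrete, here is a minimal sketch of word frequency, bigram counts, and TF-IDF. It is not the DataLab's implementation: the tokenizer, the function names (`tokenize`, `bigram_counts`, `tf_idf`), the particular TF-IDF variant, and the three toy texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(tokens):
    """Count how often each word is immediately followed by each other word."""
    return Counter(zip(tokens, tokens[1:]))


def tf_idf(corpus_tokens):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    one of several common TF-IDF variants.
    """
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in corpus_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy stand-ins for ballad texts (invented for illustration, not EBBA data).
ballads = [
    "fair maid of london town fair maid adieu",
    "the king rode out of london town",
    "a sailor bold went to the sea",
]
corpus = [tokenize(b) for b in ballads]

word_freq = Counter(corpus[0])     # word frequency within the first text
bigrams = bigram_counts(corpus[0]) # edges of a bigram network, with counts
weights = tf_idf(corpus)           # term importance relative to the toy corpus
```

In this toy corpus, "fair" occurs only in the first text, so it receives a higher TF-IDF weight there than "london", which also occurs in the second. The bigram counts can serve as weighted edges in a network like the one described above.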
Across the ballad corpus, an LDA topic model generated 160 prevalent topics. Using the visualization tools provided, visitors can look at a topic distance map to see which topics are related and which terms are most central to those topics. Together with the specific metrics EBBA provides for each ballad, this bird’s-eye view can help place individual ballads in context among other ballads in the archive.
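As a rough illustration of how a topic distance map can be built, the sketch below measures how far apart two topics are by comparing their word distributions. The source does not say which distance the EBBA map uses; Jensen-Shannon distance is assumed here because it is a common choice in LDA visualization tools. The function name `js_distance`, the four-word vocabulary, and the topic probabilities are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two distributions over one vocabulary."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # midpoint distribution

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms of x
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Hypothetical topic-word probabilities over the vocabulary
# ["love", "maid", "king", "war"]; not taken from the EBBA model.
topic_a = [0.50, 0.40, 0.05, 0.05]
topic_b = [0.45, 0.45, 0.05, 0.05]
topic_c = [0.05, 0.05, 0.40, 0.50]

near = js_distance(topic_a, topic_b)  # topics sharing their central terms
far = js_distance(topic_a, topic_c)   # topics with different central terms
```

Topics whose probability mass falls on the same terms get a small distance and would sit close together on a map. Projecting the full matrix of pairwise distances into two dimensions is a separate step not shown here.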