English Broadside Ballad Archive
Overview and Context
The English Broadside Ballad Archive (EBBA) was created to catalog and showcase all surviving ballads from 17th-century England, currently around 10,000 unique ballads. EBBA was started in 2003 by Dr. Patricia Fumerton at the University of California, Santa Barbara, its institutional home, and she continues to serve as the director of the Archive. DataLab’s Executive Director, Carl Stahmer, has served as the archive’s Associate Director since 2008 and is responsible for overseeing the archive’s technical development. As EBBA’s collection of ballads has grown, the DataLab has worked to expand the capabilities of the archive by adding functionality that lets users apply computational methods to the materials in the collection. To date, the DataLab has added numerous text mining, image recognition, music playback, and visualization capabilities. EBBA has been supported since its inception through a series of seven major grants from the National Endowment for the Humanities.
Broadside ballads were inexpensive, single-sided prints common in Early Modern England. Ballads often used familiar tunes and conveyed information about current events along with entertainment. Because they were cheap and publicly performed, broadsides were a staple of daily life for people from all walks of life. However, this accessibility also meant they were printed on relatively cheap materials and were rarely archived. The preservation of these materials provides a look into the daily goings-on of the time, as well as the ability to follow the development of the medium and culture over time.
To learn more about the archive, visit the site at: http://ebba.english.ucsb.edu/
The ballads are a rich source of textual information from Early Modern England. The DataLab has developed several methods to examine the textual data of individual ballads, as well as the entire corpus of ballads.
For individual ballads, DataLab efforts have produced tools to examine and visualize the word frequency within a ballad, as well as the weighted importance of key terms in a ballad compared to the corpus (using term frequency-inverse document frequency, or TF-IDF). In addition to word counts and importance, each ballad now includes a bigram (or two-word) network, mapping which terms are frequently paired. By looking at how often word X follows word Y, we can start to understand the relationships between them, as in the case of refrains, place names, and more. More advanced techniques such as latent Dirichlet allocation (LDA) topic modeling allow each ballad to be ranked for its relevance to the 160 topics common in the corpus.
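The word-level measures described above can be illustrated with a minimal sketch. It is not EBBA's implementation: the tokenizer, the function names, the exact TF-IDF weighting (relative term frequency times the natural log of inverse document frequency), and the three toy lines of text are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(corpus):
    """Return one {term: weight} dict per document in the corpus."""
    docs = [Counter(tokenize(text)) for text in corpus]
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in doc)
    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in doc.items()
        })
    return weights


def bigram_counts(text):
    """Count how often each word is immediately followed by another word."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


# Invented toy texts, not quotations from the archive.
corpus = [
    "hey down down hey down down the maid of london town",
    "the king rode out of london town",
    "a sailor came home from the sea",
]

weights = tf_idf(corpus)
# Terms ranked by how distinctive they are for the first text.
print(sorted(weights[0], key=weights[0].get, reverse=True)[:3])
# The most frequent word pairs in the first text, e.g. a refrain.
print(bigram_counts(corpus[0]).most_common(2))
```

A word that appears in every text, such as "the" here, receives a weight of zero, while a refrain word that is frequent in one text and absent elsewhere ranks highly. The bigram counts can serve as edge weights in a network whose nodes are words.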
Across the ballad corpus, an LDA topic model generated 160 prevalent topics. Using the visualization tools provided, visitors can look at a topic distance map to see which topics are related, and what terms are most central to those topics. Together with the specific metrics EBBA provides for each ballad, this bird’s eye view can help place individual ballads in context among other ballads in the archive.
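One way a distance between two topics can be computed is sketched below. The source does not say which measure EBBA's topic distance map uses; Jensen-Shannon distance is assumed here only because it is a common choice for comparing topic-word distributions, and the three tiny topics are invented.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance between two distributions over the same vocabulary."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


# Invented topics over a shared four-word vocabulary: love, heart, ship, sea.
courtship = [0.5, 0.4, 0.05, 0.05]
lament = [0.4, 0.5, 0.05, 0.05]
seafaring = [0.05, 0.05, 0.4, 0.5]

print(js_distance(courtship, lament))
print(js_distance(courtship, seafaring))
```

With base-2 logarithms the distance ranges from 0 for identical topics to 1 for topics that share no vocabulary, so the pairwise distances can be projected onto a two-dimensional map.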
Archive View (Arch-V) Image Recognition
Woodblock prints are an essential component of broadside ballads. Reliefs were carved into blocks of wood, which were then used as stamps to transfer images onto a printed piece. The DataLab has developed the Arch-V tool to search through these prints and find ballads using similar prints or print elements. Understanding these prints helps us understand the ballads themselves, the time period in which they were produced, and the advancement of printing technology.
For each ballad, the woodblock print is isolated and compared to every other print in the EBBA database. A network of similar prints is then created to show which other prints use the same images. Arch-V also allows users to upload their own print to compare against the database. This can help users identify unknown prints, or provide them with information from similar images. Once a print is identified, this graph visualization can be used to place it in context among the universe of related woodblock prints in EBBA.
From a technical standpoint, Arch-V works by breaking down the image into its most distinctive components, then comparing the content and location of those points to other prints in the database. These comparisons take into account what the components look like, where on the print they are located, and where they sit in relation to other key points. By considering location as well as appearance, Arch-V outperforms similar software. Taking location into account is especially important when looking at historical prints given the possibility for damage, fading, or other transformation of the images.
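The idea of matching on both appearance and position can be sketched as follows. This is a toy illustration and not Arch-V's algorithm: the Keypoint class, the thresholds, and the two-number descriptors are assumptions, and the sketch compares only absolute positions, leaving out the relationships between key points that Arch-V also considers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Keypoint:
    x: float           # horizontal position, normalised to the range 0..1
    y: float           # vertical position, normalised to the range 0..1
    descriptor: tuple  # numbers summarising what the image looks like here


def is_match(a, b, max_descriptor_gap, max_shift):
    """Two key points match only if they look alike AND sit in a similar place."""
    looks_alike = math.dist(a.descriptor, b.descriptor) <= max_descriptor_gap
    same_place = math.hypot(a.x - b.x, a.y - b.y) <= max_shift
    return looks_alike and same_place


def similarity(print_a, print_b, max_descriptor_gap=0.5, max_shift=0.1):
    """Fraction of key points in print_a that have a match in print_b."""
    if not print_a:
        return 0.0
    matched = sum(
        any(is_match(kp, other, max_descriptor_gap, max_shift) for other in print_b)
        for kp in print_a
    )
    return matched / len(print_a)


# Hypothetical key points for three woodcut impressions.
original = [Keypoint(0.2, 0.3, (1.0, 0.0)), Keypoint(0.7, 0.6, (0.0, 1.0))]
worn_copy = [Keypoint(0.22, 0.31, (0.9, 0.1)), Keypoint(0.69, 0.62, (0.1, 0.9))]
rearranged = [Keypoint(0.7, 0.6, (1.0, 0.0)), Keypoint(0.2, 0.3, (0.0, 1.0))]

print(similarity(original, worn_copy))   # slightly faded, same layout
print(similarity(original, rearranged))  # same components, different places
```

In this sketch a worn impression of the same block still scores highly, while an image that merely contains similar-looking components in different places does not. Pairwise scores of this kind could then be used as edges in a network of related prints.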
Ballads were also sung to popular tunes in the period. EBBA’s music specialists have sung and transcribed these tunes, and the DataLab has produced a tool to convert those transcriptions into a playable format. For selected ballads, the lyrics and musical notation from the first stanza are converted into a playable MIDI format by a team at UC Santa Barbara. An embedded player on the site then shows this rendition of the ballad, complete with sheet music, lyrics, and proper tempo. The sing-along format lets listeners follow the ballad themselves and hear how it might have sounded when it was in print!
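To give a sense of what converting a transcription into MIDI involves, here is a minimal sketch. The input format (note names paired with lengths in beats), the function names, and the sample melody are assumptions and do not reflect EBBA's transcription format or tooling. The pitch mapping follows the common convention in which middle C (C4) is MIDI note 60.

```python
# Semitone offsets of the natural notes within an octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'C4', 'F#3', or 'Bb3' to a MIDI note number."""
    letter, rest = name[0].upper(), name[1:]
    offset = 0
    # Each sharp raises the pitch by one semitone; each flat lowers it by one.
    while rest and rest[0] in "#b":
        offset += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + SEMITONES[letter] + offset


def melody_to_events(melody, tempo_bpm=90):
    """Turn (note name, beats) pairs into (MIDI number, start, duration) in seconds."""
    seconds_per_beat = 60.0 / tempo_bpm
    events, start = [], 0.0
    for name, beats in melody:
        duration = beats * seconds_per_beat
        events.append((note_to_midi(name), start, duration))
        start += duration
    return events


# An invented four-note phrase, not a transcription from the archive.
phrase = [("D4", 1), ("F4", 1), ("A4", 2), ("G4", 1)]
for event in melody_to_events(phrase, tempo_bpm=60):
    print(event)
```

A list of timed events like this is the kind of information a MIDI file stores; writing an actual file would additionally require a MIDI library or encoder, which is left out of the sketch.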
The DataLab is currently expanding EBBA’s ability to link with other databases, and to extract the named entities within the ballads. By incorporating information from other sources, we hope to give further context to the time and place these ballads were created.
One exciting example is the Map of Early Modern London. By extracting the names of print shops from ballads, we can show on the map where ballads were printed! Similar named entity recognition can help highlight which ballads mention notable figures and places, which will allow EBBA to provide context for these individuals and places by partnering with other archives.
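A very simple form of this extraction is a gazetteer lookup, sketched below. It is not the DataLab's method, which the text does not describe: the place names, coordinates, and imprint text are all hypothetical placeholders, and a real system would more likely use a trained named entity recognition model and an authoritative gazetteer.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude).
# The names and coordinates are placeholders, not real locations.
GAZETTEER = {
    "example lane": (51.50, -0.10),
    "sample bridge": (51.51, -0.09),
}


def find_places(imprint, gazetteer=GAZETTEER):
    """Return (name, coordinates) for every gazetteer entry named in the imprint."""
    # Lowercase and collapse whitespace so line breaks do not hide a name.
    text = re.sub(r"\s+", " ", imprint.lower())
    return [
        (name, coords)
        for name, coords in gazetteer.items()
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    ]


# An invented imprint line in the style of a printer's statement.
imprint = "Printed for A. Printer, at the Sign of the Lamb in Example  Lane."
print(find_places(imprint))
```

Each matched name comes back with coordinates that could be plotted as a point on a map, and the same lookup pattern extends to lists of notable people or places supplied by partner archives.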