Mining in the Old Bailey Project

Visualization from With Criminal Intent

The Old Bailey Project is a Goliath in the world of historic big data sets. Composed of around 125 million words from 197,000 trial records of London’s central court, ranging in date from 1674 to 1913, this project offers great potential for textual analysis of historic court records. Though this collection was previously available in hard copy or on microfilm, the digitization project not only expands access to the documents but also provides a resource for exploring how digital humanities can alter the questions that historians ask. As Gibbs and Owens explain, large data sets give historians an opportunity to interact with their research materials differently. “Data in a variety of forms,” they explain, “can provoke new questions and explorations,” pushing historians toward different queries from the vast sets of information available to them.[1]

The Old Bailey Project provides one such rich data set, which, when combined with key tools, offers a remarkable lens for viewing nearly 240 years of everyday London life through legal documents. A major strength of the project is its infrastructure, which is built on a backbone of XML markup. This marked and tagged structure strengthens the searching capabilities within the project: it connects phrases with like meanings, expands the categories of crimes, punishments, and verdicts, and links like names together. Further, this rich XML markup creates a powerful data set, which, when paired with digital tools, allows historians to create intriguing visualizations and find new interpretations. With the ability to search through thousands of records, research is pushed out of a linear fashion, and connections can be made through key words and phrases across the whole corpus. Looking for trends, or ‘messing around’ with the data, may yield new questions for historians or reveal unexpected patterns and variances. One set of researchers found, for instance, that most poisoners used coffee as the vehicle to administer their crime.[2] Though this example may seem trivial, it illustrates how the Old Bailey Project can bring about new questions for historians. Within the ‘traditional’ hard-copy texts, it would be time-consuming and difficult to find instances of coffee and poison appearing together in the court records. Further, historians would need the forethought to look for these occurrences in the first place. As William Turkel explains, previous scholars “tended to cherry-pick anecdotes without having a sense that it was possible to measure all of that text and treat the whole archive as a single unit.”[3] Mining this data set reveals new contexts, trends, and patterns easily, which historians can then examine closely to find connections previously unseen.
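To make the coffee-and-poison example concrete, here is a minimal sketch of what a keyword co-occurrence search over tagged trial records might look like. The element and attribute names (`trial`, `text`, `id`, `year`, `offence`) and the three sample records are invented for illustration; the project's actual XML schema is richer and may look quite different.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified records. The real Old Bailey XML is richer and
# its element and attribute names may differ from the ones assumed here.
SAMPLE = """
<trials>
  <trial id="t1" year="1790" offence="killing">
    <text>The prisoner put poison in her coffee that morning.</text>
  </trial>
  <trial id="t2" year="1801" offence="theft">
    <text>He stole a coffee pot from the shop.</text>
  </trial>
  <trial id="t3" year="1850" offence="killing">
    <text>Arsenic, a poison, was found in the tea.</text>
  </trial>
</trials>
"""


def trials_mentioning(xml_text, terms):
    """Return the ids of trials whose text contains every term as a whole word."""
    root = ET.fromstring(xml_text)
    hits = []
    for trial in root.iter("trial"):
        # Lowercase the trial text and split it into a set of distinct words.
        words = set(re.findall(r"[a-z]+", (trial.findtext("text") or "").lower()))
        if all(term in words for term in terms):
            hits.append(trial.get("id"))
    return hits


print(trials_mentioning(SAMPLE, ["poison", "coffee"]))
```

Run across a whole corpus rather than three made-up records, a query of this kind is what turns an anecdote into something a historian can count, and then read closely.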

The project does have its limitations. Though the structured data set is powerful, it is not ideal for every scholar or discipline. Linguist Magnus Huber, for instance, found the search engine limiting in his quest to analyze the speech patterns of 18th-century London.[4] Realistically, though, I think the shortcomings of this project are mostly resolved by using digital tools in conjunction with the data. Take the example of poisoned coffee used above. The researchers at With Criminal Intent, a website dedicated to analyzing the Old Bailey Project, used a combination of Zotero and Voyeur to visualize the results of their searches on the Old Bailey Project website.[5] It was through these tools that the trends were revealed and the patterns could be assessed. Huber was able to tailor the project to his research queries by creating his own altered data corpus and metadata structure.[6] The Locating London’s Past project has used geo-referencing software to map the proceedings and crimes within the Old Bailey records. Each of these tools, paired with the powerful data set from the project, allows historians to better understand, analyze, and ask new questions about London’s ordinary criminals and, by proxy, its ordinary citizens. The Old Bailey Project is a powerful big data project, but to me, it is only as powerful as the tools used to analyze it.

[1] Frederick Gibbs and Trevor Owens, “The Hermeneutics of Data and Historical Writing (Spring 2012 version),” Writing History in the Digital Age, Jack Dougherty and Kristen Nawrotzki, eds. (Ann Arbor, MI: University of Michigan Digital Culture Books, 2012).

[2] “Beware the Coffee,” With Criminal Intent, March 29, 2011.

[3] Patricia Cohen, “As the Gavels Fell: 240 Years at Old Bailey,” The New York Times, August 17, 2011.

[4] Magnus Huber, “The Old Bailey Proceedings, 1674-1834: Evaluating and Annotating a Corpus of 18th and 19th Century Spoken English,” Varieng 1: Studies in Variation, Contacts, and Change in English, 2007.

[6] Huber, “Annotating a Corpus.”


One thought on “Mining in the Old Bailey Project”

  1. Callie,

    You made an interesting point about projects like Old Bailey pushing research out of its typically linear fashion. I do wonder, though, if stretching the bounds of historical research gets stuck here. Like Modupe Labode suggested in class last week, it seems that historians are using digital tools only to reinforce the same linear, black and white narratives that have been dominant for so long. Perhaps it’s time to start experimenting with different types of story-telling, and to start constructing non-linear narratives. Easier said than done.
