The Old Bailey Project is a Goliath in the world of historical big data sets. Composed of around 125 million words from 197,000 trial records of London’s central court, ranging in date from 1674 to 1913, this project offers enormous potential for textual analysis of historical court records. Though this collection was previously available in hard copy or on microfilm, the digitization project not only expands access to the documents but also provides a resource for exploring how digital humanities can alter the questions that historians ask. As Gibbs and Owens explain, large data sets give historians an opportunity to interact with their research materials differently. “Data in a variety of forms,” they write, “can provoke new questions and explorations,” pushing historians toward different queries drawn from the vast sets of information available to them.
The Old Bailey Project provides one such rich data set, which, when combined with the right tools, offers a remarkable lens on some 240 years of everyday London life as seen through legal documents. A major strength of the project is its infrastructure, built on a backbone of XML markup. This tagged structure strengthens the project’s search capabilities: it connects phrases with similar meanings; expands categories of crimes, punishments, and verdicts; and links similar names together. Further, this rich XML markup creates a powerful data set which, when paired with digital tools, allows historians to create intriguing visualizations and find new interpretations. With the ability to search through thousands of records, research no longer has to proceed linearly; it can make connections through key words and phrases across the whole corpus. Looking for trends, or ‘messing around’ with the data, may raise new questions for historians or expose unexpected patterns and variances. One set of researchers found, for instance, that in most poisoning cases coffee was the vehicle used to administer the poison. Though this example may seem trivial, it illustrates how the Old Bailey Project can bring about new questions for historians. Within the ‘traditional’ hard-copy texts, it would be time-consuming and difficult to discern instances of coffee and poison appearing together in the court records. Further, historians would need the forethought to look for these occurrences in the first place. As William Turkel explains, previous scholars “tended to cherry-pick anecdotes without having a sense that it was possible to measure all of that text and treat the whole archive as a single unit.” Mining this data set readily reveals new contexts, trends, and patterns, which historians can then examine closely to find connections previously unseen.
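To make the coffee-and-poison example concrete, here is a minimal sketch of how tagged records might be filtered by offence and then searched for co-occurring words. It is only an illustration: the `<trial>` element, its `offence` attribute, the sample records, and the `cooccurrence_counts` function are all invented for this example and are not the project’s actual schema, data, or method.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified records. The real Old Bailey markup is far
# richer; these element and attribute names are assumptions.
SAMPLE = """
<trials>
  <trial id="t1" offence="poisoning">
    <text>She put the powder in his coffee that morning.</text>
  </trial>
  <trial id="t2" offence="poisoning">
    <text>The arsenic was mixed into the tea.</text>
  </trial>
  <trial id="t3" offence="theft">
    <text>He stole a pound of coffee from the shop.</text>
  </trial>
  <trial id="t4" offence="poisoning">
    <text>The prisoner gave her coffee with a white sediment in it.</text>
  </trial>
</trials>
"""

def cooccurrence_counts(xml_text, offence, terms):
    """Count how many trials tagged with `offence` mention each term."""
    root = ET.fromstring(xml_text.strip())
    counts = {term: 0 for term in terms}
    total = 0
    for trial in root.iter("trial"):
        # The tag, not the free text, decides which trials are in scope.
        if trial.get("offence") != offence:
            continue
        total += 1
        body = (trial.findtext("text") or "").lower()
        words = set(re.findall(r"[a-z]+", body))
        for term in terms:
            if term in words:
                counts[term] += 1
    return total, counts

total, counts = cooccurrence_counts(SAMPLE, "poisoning", ["coffee", "tea", "broth"])
print(total, counts)
```

The point of the sketch is that the offence tag narrows the corpus before any word search happens, so a trial for theft that happens to mention coffee is never counted among the poisonings.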
The project does have its limitations. Though the structured data set is powerful, it is not ideal for every scholar or discipline. Linguist Magnus Huber, for instance, found the search engine limiting in his quest to analyze the speech patterns of 18th-century London. Realistically, though, I think most of the project’s shortcomings can be resolved by using digital tools in conjunction with the data. Take the example of poisoned coffee used above. The researchers at Criminal Intent, a website dedicated to analyzing the Old Bailey Project, used a combination of Zotero and Voyeur to visualize the results of their searches on the Old Bailey Project website. It was through these tools that the trends were revealed and the patterns could be assessed. Huber was able to tailor the project to his research questions by creating his own altered data corpus and metadata structure. The Locating London’s Past project has used geo-referencing software to map the proceedings and crimes within the Old Bailey records. Each of these tools, paired with the project’s powerful data set, allows historians to better understand, analyze, and ask new questions about London’s ordinary criminals and, by proxy, its ordinary citizens. The Old Bailey Project is a powerful big data project, but to me, it is only as powerful as the tools used to analyze it.
 Fredrick Gibbs and Trevor Owens, “The Hermeneutics of Data and Historical Writing (Spring 2012 version),” Writing History in the Digital Age, Jack Dougherty and Kristen Nawrotzki, eds., (Ann Arbor, MI: University of Michigan Digital Culture Books, 2012).
 Magnus Huber, “The Old Bailey Proceedings, 1674-1834: Evaluating and Annotating a Corpus of 18th and 19th Century Spoken English,” Varieng 1: Studies in Variation, Contacts, and Change in English, 2007.