The Old Bailey: Contextual Challenges of Big Data Analysis

25 Feb
October 14, 1842 trial at the Old Bailey. Courtesy of Victorian Calendar blog.

For the historian, text is our bread and butter. We are constantly encouraged to understand the past through close, critical reading of primary and secondary sources. In some ways, this is only natural; after all, we see the past as a series of stories, and, as storytellers, we know the public will be most invested (and perhaps most entertained) if we focus on individual human experiences to paint a sweeping narrative. In a previous post, I discussed the work of Franco Moretti and his insistence on the historian’s need to embrace what he calls “distant reading” in order to see things from a fuller perspective.[1] Although difficult to grasp at first, his central argument, that we must step outside the box and analyze data in new, more detached ways, has grown on me over the semester. Thus, we are posed with an interesting question: can history in the digital realm help us achieve the depth traditionally expected of textual analysis while remaining distant enough to observe patterns otherwise missed?

Part of the answer lies in seeing what we have produced so far. Although we still have a long way to go, sites like The Old Bailey suggest that combining distant and close reading of sources is both possible and profitable. The site is dedicated to allowing users to search through a digitized collection of over 100,000 court proceedings that took place in England’s Old Bailey courthouse from 1674 to 1913. Results can be searched and sorted according to the gender of the defendant or victim, offense, verdict and punishment, and displayed in a variety of ways, from tables to graphs.[2] Of course, extracting meaning from this abstracted cache depends on context. Thankfully, the site provides a very helpful historical background tab that gives new and returning users the lowdown on where these documents come from and what they mean. This critical addition opens the door to the site’s true potential. One can not only perform the usual keyword searches to dig for specific trials, but also trace how often particular words appear in court cases over the years. Even as someone familiar with only the broad strokes of British history, I can easily see the potential for pursuing questions about the country’s shifting political and social character. For example, it would be quite interesting to see which crimes were most commonly committed while the country was at war, as this might tell us something about how an event like the American Revolutionary War rippled through Britain.
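To make the idea of tracing word frequency more concrete, here is a minimal Python sketch of what such a count involves behind the scenes. It assumes the trial texts have already been extracted into (year, text) pairs. The function name term_frequency_by_year is hypothetical and is not part of the Old Bailey site or any API it offers. The sketch normalizes by the total number of words recorded each year because the number of trials varies from year to year. Raw counts alone would largely track how busy the court was.

```python
import re
from collections import Counter

# Lowercase words, keeping apostrophes (e.g. "prisoner's") as part of a word.
WORD = re.compile(r"[a-z']+")


def term_frequency_by_year(trials, term):
    """Return {year: occurrences of `term` per 10,000 words}.

    `trials` is an iterable of (year, trial_text) pairs (a hypothetical,
    pre-extracted format). Dividing by the total words recorded for each
    year keeps busy years from dominating the trend.
    """
    term = term.lower()
    hits = Counter()
    totals = Counter()
    for year, text in trials:
        words = WORD.findall(text.lower())
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if w == term)
    return {
        year: 10_000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }
```

The resulting per-year series is the kind of thing the site draws as a line graph. Deciding what any rise or fall actually means still depends on the contextual reading discussed below.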

For me, digital history’s most valuable assets are flexibility, transparency and a sense of playfulness, and the Old Bailey has all three. Part of this lies in the site’s use of XML (Extensible Markup Language) formatting and open-source access. These not only let users explore the proceedings but also give them a means to craft their own customizable tool belt of search terms, provided they have the know-how to use XML in such a playful manner. This makes the site adaptable in a way few others are, an important factor given the increasing flexibility we demand from our technology. This flexibility also extends to the visual representations of data the site offers in the form of pie charts, bar charts and line graphs. Textual analysis has great potential to help us weave new webs of connectivity. It lets us step back and view the “big picture” while still giving us the option of occasionally giving in to “descriptive reading.” Even so, it can be difficult for those of us who have never used it before. While I am still learning the ins and outs of XML formatting, even I can see that this site is laudable for its attempt to make the methodological approaches of digital history more transparent. Frederick Gibbs and Trevor Owens correctly note that digital history is a field concerned with engaging broader audiences in the joys of historical discovery. For that reason, our emerging methodological techniques must be accessible to a larger public audience rather than remain “an impenetrable and mysterious black box.”[3] The Old Bailey’s use of XML and open-source formatting is certainly a step in the right direction in this regard. All of this allows visitors to engage these documents on their own terms, approaching history with the same playfulness so often expressed by blossoming professionals.
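As a rough illustration of the kind of home-made tool belt that XML markup makes possible, the Python sketch below cross-tabulates offenses against verdicts. It can optionally be restricted to a range of years, such as the years of a war. The markup it reads is a simplified, hypothetical format invented for this example: trial elements with a year attribute, and offence and verdict children carrying a category attribute. The real Old Bailey XML is richer and uses different names, so the tags and attributes would need adjusting against the project’s own documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tabulate(xml_text, row_tag="offence", col_tag="verdict", years=None):
    """Count (offence category, verdict category) pairs across <trial> elements.

    Assumes a simplified, hypothetical markup such as:
        <trial year="1777"><offence category="theft"/><verdict category="guilty"/></trial>
    `years`, if given, is any container of ints, e.g. range(1775, 1784).
    """
    root = ET.fromstring(xml_text)
    table = Counter()
    for trial in root.iter("trial"):
        year = trial.get("year")
        if years is not None and (year is None or int(year) not in years):
            continue  # outside the requested period, or undated
        row = trial.find(row_tag)
        col = trial.find(col_tag)
        if row is None or col is None:
            continue  # skip incompletely tagged trials rather than guessing
        table[(row.get("category"), col.get("category"))] += 1
    return table
```

The function skips trials that lack a tagged offense or verdict instead of guessing at them. This keeps the counts tied to what the markup actually records.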

While these possibilities are exciting, the site’s limitations raise a number of interesting questions about what the future holds for contextually meaningful textual analysis. My colleague Nicholas Sacco rightly points out that The Old Bailey’s techniques for managing “big data” are useful for quantitative analysis. Even so, it is still very difficult to extract qualitative meaning, on either an individual or a group level, from such a large cache of text.[4] In the end, our current coding parameters still force our machines to think more like machines and less like people. I believe David Brooks best expresses this in a recent New York Times piece, writing: “Data struggles with context… People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel.”[5]

I began this post by asking whether we can effectively mix quantitative with qualitative analysis without sacrificing emotional and contextual depth for the sake of numerical breadth. My experiences with The Old Bailey have pushed me toward the affirmative. Still, I believe part of the answer lies in tackling a more pressing question: are we ready to embrace a change in our method of historical storytelling in order to produce qualitatively meaningful quantitative studies? And, more importantly, can we afford not to be? Gibbs and Owens again warn us that, “This may mean de-emphasizing narrative in favor of illustrating the rich complexities between an argument and the data that supports it. It may mean calling attention to productive failure–when a certain methodology or technique proved ineffective or had to be abandoned. [But] these are precisely the kinds of lessons historians need to learn as they grapple with new approaches to making sense of the historical record.”[6] The Old Bailey is far from perfect. Even so, the fact that we can create a site of this caliber, one that embraces such imaginative new forms of thought, gives me hope for the future. It also reinforces the need for historians to broaden our scope of narrative analysis.

[1] Franco Moretti, Graphs, Maps, Trees: Abstract Models for Literary History (Brooklyn: Verso Books, 2005), 3.

[2] Tim Hitchcock, Sharon Howard and Robert Shoemaker, “Research and Study Guides – Doing Statistics,” Old Bailey Proceedings Online (version 7.0, 09 December 2012), accessed February 23, 2013.

[3] Frederick W. Gibbs and Trevor J. Owens, “The Hermeneutics of Data and Historical Writing” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (Spring 2012 version).

[4] Nicholas Sacco, “The Old Bailey Proceedings: Big Data and the Expansion of Research Methods,” Exploring the Past (February 23, 2013), accessed February 24, 2013.

[5] David Brooks, “What Data Can’t Do,” New York Times (February 13, 2013), accessed February 25, 2013. Thanks again to Nick for the illuminating article.

[6] Gibbs and Owens.


2 Responses to “The Old Bailey: Contextual Challenges of Big Data Analysis”

  1. jkalvait February 25, 2013 at 9:37 pm

    I like your use of the word “playfulness.” I see it as a pun, as I hope you intended it to be. As a nerd, I can literally play in a database and the database can be playful in that the data is easily manipulated. I am going to focus on the former. I think that the data in the Old Bailey Online goes a long way in making me feel playful while interacting with it. I am fascinated by the documents. I also am intrigued by connecting this notion of play with your last (and first) paragraph. We can play with the way historians ask questions, do their research, and come to some conclusions, but I wonder how we keep our analysis playful enough that people will want to read it. As you noted, a lot of us (and our audience members) are attracted to history because of the story being told/analyzed. If we take the narrative away, or bury it, what do we stand to lose? Can a historian write a book using distance reading and still have the arc be playful enough to attract an audience to read it? Audience is not something we have really considered in our discussions. Obviously, these are not questions any of us can answer, but I certainly found your post playfully thought-provoking.

    • Tim Rainesalo February 25, 2013 at 10:41 pm


      Thanks for your feedback. I’m glad you caught the double meaning of my use of the term “playfulness,” as I feel it nicely describes the feelings many of us hope to evoke in people engaging with the past in the digital world. Your comment about our lack of discussion regarding who exactly our audience is speaks to a difficulty many historians encounter when transitioning from paper media to fluid, digital forms of communication. Books, as you noted, are typically written with a specific audience in mind. By putting things online, however, we broaden our audience to potentially include the entire world! Even if this scope is not entirely realistic, many historians may have trouble grasping it, since we are so often encouraged to compose our ‘traditional’ academic work for a very specific audience. Overcoming this restriction is one of digital history’s greatest strengths. Our interactions with The Old Bailey have shown this: we’ve all managed to get something out of it, even though British history falls outside our individual areas of historical expertise and interest.

      Another interesting question is whether the sense of playfulness we experience in a uniquely fluid digital medium can even be transmitted to a traditional, static printed book. As you noted, none of us has the answer, but sometimes half the fun is in pondering the question!
