Archive | February, 2013

Cross-Platform Concordances, anyone?

25 Feb

The Old Bailey Online offers a range of tools for statistical analysis and visual representation of large sets of data organized around categories (criminal accusations, verdicts and punishments), which makes the site particularly useful for researchers hoping to employ quantitative social scientific methodologies to enrich their qualitative analyses. As such, the site functions not merely as a searchable catalog of digitized collections but as a toolkit for understanding and interpreting data that has long been available, but never so accessible to and manipulable by so many.

In digging into the Old Bailey site, and the associated methodological literature, I have been particularly interested in how the site architecture facilitates the identification of individuals who appear in a variety of contexts in the Old Bailey archives. From what I can tell, the sociobiographical speaker data that is assigned to individuals within each discrete record (as illustrated in Magnus Huber’s Figure 15 below) is only valid within that record. [1]

[Figure 15 from Huber, “The Old Bailey Proceedings”]

In other words, an individual whose name appears in two records from the same day will still be assigned two (or more) separate sociobiographical ID numbers in the metadata. For example, Ann Jackson, sentenced to death on January 13, 1790 after having been convicted of breaking and entering and theft, is given the following IDs in the record of the court proceedings:

<persName id="t17900113-2-defend63" type="defendantName"> ANN JACKSON
<persName id="t17900113-2-person75"> Ann Jackson
<persName id="t17900113-2-person79"> Ann Jackson
<persName id="t17900113-2-person87"> ANN JACKSON

In the metadata associated with the “punishment summary” document of the same day, Jackson is identified as:

<persName id="s17900113-1-person1025"> ANN Jackson

Mary Talbot, who had previously been sentenced to transportation, was convicted of “feloniously returning from transportation” and sentenced to death on January 13, 1790. In the metadata of the proceedings, Talbot is identified as:

<persName id="t17900113-95-defend894" type="defendantName"> MARY TALBOT

and in the supplementary materials, in which the condemned pleads for leniency on the grounds that she is pregnant, she is identified as:

<persName id="o17900113-1-defend1024" type="defendantName"> Mary Talbot

According to that same document, her pregnancy was established by a Jury of Matrons, after which her execution was stayed.

But another version of this record is also included in the Old Bailey database, in which Talbot is identified once as:

<persName id="s17900113-1-person1037"> Mary Talbot

and twice as:

<persName id="s17900113-1-defend1038" type="defendantName"> Mary Talbot

On December 8th of the same year, both Talbot and Jackson appear again in the records, having been “offered His Majesty’s Pardon on Condition of being transported for the Term of their natural Lives.” This time, Jackson asks the court to allow her child to accompany her, and the “Court referred her to the Secretary of State.” In the metadata to this document, Jackson appears as:

<persName id="o17901208-2-defend655" type="defendantName"> Ann Jackson

Mary Talbot, in the same record, is:

<persName id="o17901208-2-defend662" type="defendantName"> Mary Talbot

An additional record, which includes a verbatim copy of the above record, identifies Jackson and Talbot as:

<persName id="s17901208-1-defend724" type="defendantName"> Ann Jackson
<persName id="s17901208-1-defend731" type="defendantName"> Mary Talbot
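The IDs quoted above appear to follow a regular pattern: a one-letter prefix, an eight-digit session date, a record number, a role label and a running number. Here is a minimal Python sketch, assuming that pattern holds (it is inferred only from the thirteen examples in this post, not from the Old Bailey documentation). It splits each ID into its parts and groups the IDs by normalized name. The names PersonRef, parse_person_id and group_by_name are my own illustrations, not part of any Old Bailey tool.

```python
import re
from collections import defaultdict
from datetime import date
from typing import NamedTuple

# Assumed layout, inferred from the examples above:
# <prefix letter><YYYYMMDD>-<record number>-<role><running number>
ID_PATTERN = re.compile(r"^([a-z])(\d{4})(\d{2})(\d{2})-(\d+)-([a-z]+)(\d+)$")


class PersonRef(NamedTuple):
    section: str    # "t", "s" or "o" in the examples above
    session: date   # date encoded in the ID
    record: int     # record number within that section and date
    role: str       # "defend" or "person" in the examples above
    sequence: int   # running number at the end of the ID


def parse_person_id(raw_id: str) -> PersonRef:
    """Split one persName id into its apparent components."""
    match = ID_PATTERN.match(raw_id)
    if match is None:
        raise ValueError(f"unexpected id format: {raw_id!r}")
    section, year, month, day, record, role, sequence = match.groups()
    session = date(int(year), int(month), int(day))
    return PersonRef(section, session, int(record), role, int(sequence))


# The (id, name) pairs quoted in this post.
MENTIONS = [
    ("t17900113-2-defend63", "ANN JACKSON"),
    ("t17900113-2-person75", "Ann Jackson"),
    ("t17900113-2-person79", "Ann Jackson"),
    ("t17900113-2-person87", "ANN JACKSON"),
    ("s17900113-1-person1025", "ANN Jackson"),
    ("t17900113-95-defend894", "MARY TALBOT"),
    ("o17900113-1-defend1024", "Mary Talbot"),
    ("s17900113-1-person1037", "Mary Talbot"),
    ("s17900113-1-defend1038", "Mary Talbot"),
    ("o17901208-2-defend655", "Ann Jackson"),
    ("o17901208-2-defend662", "Mary Talbot"),
    ("s17901208-1-defend724", "Ann Jackson"),
    ("s17901208-1-defend731", "Mary Talbot"),
]


def group_by_name(mentions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group ids under a case- and whitespace-insensitive form of the name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for raw_id, name in mentions:
        key = " ".join(name.split()).casefold()
        groups[key].append(raw_id)
    return dict(groups)


if __name__ == "__main__":
    for name, ids in group_by_name(MENTIONS).items():
        sessions = sorted({parse_person_id(i).session.isoformat() for i in ids})
        print(name, len(ids), "ids across sessions", sessions)
```

Grouping on the name alone is only a first pass: it produces candidates for a human to confirm, since two different women named Ann Jackson would land in the same group.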

My questions about the future of this project (and others like it) have to do with how we might collapse these various IDs as we confirm the identities of the individuals and how we can go about annotating the collections of the Old Bailey Online and other similar projects in a way that allows for the creation of cross-platform concordances. I chose Mary Talbot and Ann Jackson because I had recently encountered the following newspaper article, printed in March of 1791 in Poughkeepsie, New York about them:

Jackson Talbot [2]

I want to know how we go about assigning unique identifiers to individuals such as Mary Talbot and Ann Jackson that would allow a researcher who encounters their stories in either of the source collections to find the materials in additional collections where they are identified as the same individuals we encounter in the Old Bailey archives.
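One way to picture such a concordance is as a two-way index: a shared, project-independent key for each person on one side, and the collection-specific references on the other. The sketch below is only an illustration under stated assumptions. The person keys ("person:ann-jackson"), the collection labels ("oldbailey", "newsbank") and the Concordance class are hypothetical. The newspaper reference is the rft_dat value from the URL in note [2], used here as if it were a stable document identifier, which I have not verified.

```python
from collections import defaultdict


class Concordance:
    """Two-way index between shared person keys and collection-local references."""

    def __init__(self) -> None:
        self._refs_by_person: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self._persons_by_ref: dict[tuple[str, str], set[str]] = defaultdict(set)

    def link(self, person_key: str, collection: str, local_id: str) -> None:
        """Record that a local reference in one collection concerns this person."""
        ref = (collection, local_id)
        self._refs_by_person[person_key].add(ref)
        self._persons_by_ref[ref].add(person_key)

    def persons_for(self, collection: str, local_id: str) -> list[str]:
        """Shared person keys attached to one local reference."""
        return sorted(self._persons_by_ref.get((collection, local_id), ()))

    def elsewhere(self, collection: str, local_id: str) -> list[tuple[str, str]]:
        """References in OTHER collections for the people behind this reference."""
        found: set[tuple[str, str]] = set()
        for person in self._persons_by_ref.get((collection, local_id), ()):
            found.update(
                ref for ref in self._refs_by_person[person] if ref[0] != collection
            )
        return sorted(found)


# Hypothetical keys and collection labels; the Old Bailey ids are the ones quoted above.
concordance = Concordance()
concordance.link("person:ann-jackson", "oldbailey", "t17900113-2-defend63")
concordance.link("person:ann-jackson", "oldbailey", "o17901208-2-defend655")
concordance.link("person:mary-talbot", "oldbailey", "t17900113-95-defend894")
# The newspaper article mentions both women, so both keys point at it.
for person in ("person:ann-jackson", "person:mary-talbot"):
    concordance.link(person, "newsbank", "13F1F193D587BED0")

if __name__ == "__main__":
    print(concordance.elsewhere("oldbailey", "t17900113-95-defend894"))
```

A researcher starting from Talbot's trial record would be pointed to the newspaper article, and one starting from the article would be pointed back to every linked Old Bailey record. The hard part, deciding that two references really concern the same woman, stays with the historian. The index only records that decision once it is made.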
[1] Magnus Huber, “The Old Bailey Proceedings, 1674-1834: Evaluating and annotating a corpus of 18th- and 19th-century spoken English,” Varieng 1 (2007).

[2] “London, Jan. 21,” Poughkeepsie Journal, March 16, 1791. Accessed February 23, 2013. http://docs.newsbank.com/openurl?ctx_ver=z39.88-2004&rft_id=info:sid/iw.newsbank.com:EANX&rft_val_format=info:ofi/fmt:kev:mtx:ctx&rft_dat=13F1F193D587BED0&svc_dat=HistArchive:ahnpdoc&req_dat=0D10997327EA07D5


The Old Bailey: Contextual Challenges of Big Data Analysis

25 Feb
October 14, 1842 trial at the Old Bailey. Courtesy of Victorian Calendar blog.

For historians, text is our bread and butter. We are constantly encouraged to understand the past through close, critical reading of primary and secondary sources. In some ways, this is only natural; after all, we see the past as a series of stories, and, as storytellers, we know the public will be more invested (and perhaps more entertained) if we focus on individual human experiences to paint a sweeping narrative. In a previous post, I discussed the work of Franco Moretti and his insistence on the historian’s need to embrace what he calls “distant reading” in order to see things from a fuller perspective.[1] Although difficult to comprehend at first, this central argument for the need to step outside the box and analyze data in new, more detached ways is one that has grown on me over the semester. Thus, we are faced with an interesting question: can history in the digital realm help us achieve the depth traditionally expected of textual analysis while remaining distant enough to observe patterns otherwise missed?

Part of the answer lies in seeing what we have produced so far. Although we still have a long way to go, sites like The Old Bailey suggest that combining distant and close reading of sources is certainly possible and profitable. The site is dedicated to allowing users to search through a digitized collection of some 197,000 trials held at London’s Old Bailey courthouse from 1674 to 1913. Results can be searched and sorted according to gender of defendant/victim, offenses, verdicts and punishments, and displayed in a variety of ways, from tables to graphs.[2] Of course, extracting meaning from this abstracted cache depends on context, and thankfully the site provides a very helpful historical background tab that gives new and returning users the lowdown on where these documents come from and what they mean. This critical addition opens the door to the site’s true potential. One can not only perform the usual keyword searches to dig for specific trials, but also trace the shifting frequency of words used in court cases over the years. Even as someone familiar with only the broad strokes of British history, I can easily see the potential for pursuing questions about the shifting political and social nature of the country over the years. For example, it would be quite interesting to see which crimes were most commonly committed when the country was at war, as this might tell us something about the ripple effect of an event like the American Revolutionary War throughout Britain.

For me, digital history’s most valuable assets are flexibility, transparency and a sense of playfulness, and the Old Bailey has all three. Part of this lies in the site’s use of XML (Extensible Markup Language) formatting and open source access, which offers users the chance not only to explore the world of the court but also to craft their own customizable tool belt of search terms, if one has the know-how to use XML in such a playful manner. This makes the site highly adaptable in a way few others are, an important factor given the increasing flexibility we demand from our technology. This flexibility also applies to the visual representations of data the site offers, in the form of pie charts and line graphs. Although textual analysis has great potential to help us weave new webs of connectivity by allowing us to step back and view the “big picture” while giving us the option of occasionally giving in to “descriptive reading,” it can be difficult for those of us who have never used it before. Although I am still learning the ins and outs of XML formatting, even I can see that this site is laudable for its attempt to make methodological approaches to digital history more transparent. Frederick Gibbs and Trevor Owens correctly note that, as digital history is a field concerned with engaging broader audiences in the joys of historical discovery, our emerging methodological techniques must be accessible to a larger public audience rather than remain “an impenetrable and mysterious black box.”[3] The Old Bailey’s use of XML and open source formatting is certainly a step in the right direction in this regard, for it allows visitors to engage these documents on their own terms, approaching history with the same playfulness so often expressed by blossoming professionals.

While these possibilities are exciting, the site’s limitations raise a number of interesting questions about what the future holds for contextually meaningful textual analysis. My colleague Nicholas Sacco rightly points out that, while The Old Bailey’s analytical techniques for managing “big data” are useful for quantitative analysis, it is still very difficult to extract qualitative meaning (on both an individual and group level) from such a large cache of text.[4] In the end, our current coding parameters still force our machines to think more like machines and less like people. I believe David Brooks best expresses this in a recent New York Times piece: “Data struggles with context… People are really good at telling stories that weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel.”[5]

I began this post by asking whether we can effectively mix quantitative with qualitative analysis without sacrificing emotional and contextual depth for the sake of numerical spread. While my experiences with The Old Bailey have pushed me toward the affirmative, I believe part of the answer lies in tackling a more pressing question: are we ready to embrace a change in our method of historical storytelling in order to produce qualitatively meaningful quantitative studies? And, more importantly, can we afford not to be? Gibbs and Owens again warn us that “This may mean de-emphasizing narrative in favor of illustrating the rich complexities between an argument and the data that supports it. It may mean calling attention to productive failure–when a certain methodology or technique proved ineffective or had to be abandoned. [But] these are precisely the kinds of lessons historians need to learn as they grapple with new approaches to making sense of the historical record.”[6] Even though The Old Bailey is far from perfect, the fact that we are able to create a site of this caliber that embraces such imaginative new forms of thought gives me hope for the future, and reinforces the need for historians to broaden our scope of narrative analysis.


[1] Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (Brooklyn: Verso Books, 2005), 3.

[2] Tim Hitchcock, Sharon Howard and Robert Shoemaker, “Research and Study Guides – Doing Statistics,” Old Bailey Proceedings Online (www.oldbaileyonline.org, version 7.0, 09 December 2012), accessed February 23, 2013.

[3] Frederick W. Gibbs and Trevor J. Owens, “The Hermeneutics of Data and Historical Writing” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (Spring 2012 version).

[4] Nicholas Sacco, “The Old Baily Proceedings: Big Data and the Expansion of Research Methods,” Exploring the Past (February 23, 2013), accessed February 24, 2013.

[5] David Brooks, “What Data Can’t Do,” nytimes.com (February 2013), accessed February 25, 2013. Thanks again to Nick for the illuminating article.

[6] Gibbs and Owens.

25 Feb

ngoodlin

The Old Bailey Online is a resource that really helps to explore the potential of the digital realm. The dataset includes about 197,000 trials conducted in England from the 17th to the early 20th century.[1] Clearly, these records have been a valuable source of information for researchers for many years; in the past, however, it could take months to glean even a basic fact from such a large mass of data, such as how many women were charged as burglars in the 19th century, or how many children were involved in crimes. The Old Bailey Online dramatically simplifies the answering of these questions, providing a new way to look at and analyze large datasets.

The Old Bailey Online, then, is an excellent example of a digital tool that showcases how historical methodologies can be changed with the advent of technology.  Frederick W. Gibbs and Trevor…


Mining in the Old Bailey Project

25 Feb
Visualization from website, "With Criminal Intent"


The Old Bailey Project is a Goliath in the world of historic big data sets. Composed of around 125 million words from 197,000 trial records of London’s central court, ranging in date from 1674 to 1913, this project offers amazing potential for textual analysis of historic court records. Though this collection was previously available in hard copy or on microfilm, the digitization project not only expands access to the documents but also provides a resource for exploring how digital humanities can alter the questions that historians ask. As Gibbs and Owens explain, large data sets give historians an opportunity to interact with their research materials differently. “Data in a variety of forms,” they write, “can provoke new questions and explorations,” pushing historians to different queries from the vast sets of information available to them.[1]

The Old Bailey Project provides one such rich data set, which, when combined with key tools, offers a remarkable lens through which to view 250 years of everyday London life through legal documents. A major strength of the project is its infrastructure, built on a backbone of XML. This marked and tagged structure strengthens the searching capabilities within the project: it makes it possible to connect phrases with like meanings, expands the categories of crimes, punishments and verdicts, and can link like names together. Further, this rich XML markup creates a powerful data set which, when paired with digital tools, allows historians to create intriguing visualizations and find new interpretations. With the ability to search through thousands of records, research is pushed out of a linear fashion, making connections through key words and phrases throughout the corpus of the project. Finding different patterns through trends, or ‘messing around’ with data, may yield new questions for historians or identify different patterns or variances. One set of researchers found, for instance, that most poisonings used coffee as the vehicle for administering the poison.[2] Though this example may seem trivial, it illustrates how the Old Bailey project can bring about new questions for historians. In the ‘traditional’ hard-copy texts, it would be time consuming and difficult to find instances of coffee and poison appearing together in the court records. Further, historians would need the forethought to look for these occurrences. As William Turkel explains, previous scholars “tended to cherry-pick anecdotes without having a sense that it was possible to measure all of that text and treat the whole archive as a single unit.”[3] Mining this data set readily reveals new contexts, trends, and patterns, which historians can then closely examine to find connections previously unseen.

The project does have its limitations. Though the structured data set is powerful, it is not ideal for every scholar or discipline. Linguist Magnus Huber, for instance, found the search engine limited in his quest to analyze speech patterns of 18th century London.[4] Realistically though, I think the shortcomings of this project are mostly resolved by using digital tools in conjunction with the data. Take the example of poisoned coffee used above. The researchers at Criminal Intent, a website dedicated to analyzing the Old Bailey Project, used a combination of Zotero and Voyeur to visualize and more easily see the results of their search from the Old Bailey Project website.[5] It was through these tools that the trends were revealed and patterns could be assessed. Huber was able to tailor the project to his research queries by creating his own altered data corpus and metadata structure.[6] The Locating London’s Past project has used geo-referencing software to map the proceedings and crimes within the Old Bailey records. Each of these tools, paired with the powerful data set from the project, allows historians to better understand, analyze, and ask new questions about London’s ordinary criminals, and by proxy ordinary citizens. The Old Bailey project is a powerful big data project, but to me, it is only as powerful as the tools used to analyze it.


[1] Frederick Gibbs and Trevor Owens, “The Hermeneutics of Data and Historical Writing (Spring 2012 version),” Writing History in the Digital Age, Jack Dougherty and Kristen Nawrotzki, eds., (Ann Arbor, MI: University of Michigan Digital Culture Books, 2012).

[2] “Beware the Coffee,” With Criminal Intent, March 29, 2011.

[3] Patricia Cohen, “As the Gavels Fell: 240 Years at Old Bailey,” The New York Times, August 17, 2011.

[4] Magnus Huber, “The Old Bailey Proceedings, 1674-1834: Evaluating and Annotating a Corpus of 18th and 19th Century Spoken English,” Varieng 1: Studies in Variation, Contacts, and Change in English, 2007.

[6] Huber, “Annotating a Corpus.”

Digital “Archive Fever”

24 Feb

Old Bailey Online provides a model for the creation of big data and its use in textual analysis, as well as offering a glimpse into future uses for big data. This online project presents not only the tagged data of over 197,000 trials, but also search and statistical programs for the data set. The project is an excellent example of the strength of combining a public history approach, one that embraces shared authority and wide audiences, with digital history, which promotes open access and scholarly community. This project demonstrates, following Jacques Derrida in Archive Fever, that the “process of ‘archivization’ produces the very conditions for history and memory, scholars who use archival materials must be actively involved in these processes of digitization.”[1] Because it was conceived and implemented by historians, the project is richer in the types of questions it can answer and wider in its reach. The website has background and teaching materials as well as additional functionality through the digital data it provides. There are a wide variety of questions to answer and new questions continually emerging, though there are limits to what can be learned. The availability of the digital source data allows for innovative combinations with data from other sources. One promise of these articles was the research design information they offer, which suggests ways that I can explore big data sets. This project shows how important it is for public historians to overcome the limits of the American archival landscape to better facilitate the creation and analysis of big data, possibly through public history.

Historians are comfortable with “sources,” and therefore with analyzing sources, even in a digital environment. They are generally less familiar with thinking of sources as data, or of their subsequent analysis as textual rather than historical. Gibbs and Owens define “data” as computer-processable information, and “big data,” by extension, is a large collection of such data.[2] How big is big? The answer can vary. Tim Hitchcock, one of the Old Bailey Online developers, writes in his recent article “Textmining British Studies: an Overview of Recent Developments” that there is an “almost ‘infinite’ archive” of digitized printed primary sources of the “traditional canon of western historiography.” It is difficult to imagine the scope of this archive; Hitchcock estimates that more than 60 percent of all the words published in English between the 15th century and 1923 are available online.[3] This “data” is presented in the form of “electronic texts” that digitally represent oral or written language from any number of different types of sources. Textual analysis, often referred to as text mining when done digitally, can be understood as a research method that describes and interprets the characteristics of any text.[4]

How can historians carry the traditional research process of “immersing” ourselves in the sources into the world of the “infinite archive”? The key difference in this world of big data and textual analysis is not necessarily new data, although its scope may change, but a new relationship with the sources brought on by new tools, research methods, questions and products. Tara McPherson, in her blog post “Sharing the Archives,” analyzes these relationships: “In the past, humanities scholars have raided archives in order to capture their treasures for our books and articles. This relationship has often been uni-directional and vampiric, giving little back to the archive.” The infinite archive poses a new kind of limit on historical research: not the problem of scarcity but of abundance. To varying degrees, historians must practice Moretti’s “distant reading,” whether skimming online search results for the most promising leads or developing sophisticated digital analytical tools. Moretti’s answer was “Graphs, Maps, Trees,” or more generally visualizing relationships from a distance to reveal patterns and help understand their meaning.[5] The keys to understanding these new research paradigms are angle of view, visualization strategy, and the relationship between the visualizations, or other products of analysis, and historical analysis.[6]

The Old Bailey Online allows us to look at an archive of big data, digitized with the historical profession in mind, as well as at the tools and products of a textual analysis of that data. The Old Bailey Online offers a window not only into the London crime underworld of the 18th century, but also into the brave new world of historical possibilities. On one hand, these records have long been available and used by historians for social and political histories, work that can now be done better and faster. With new questions and methods, these “inherited texts” can answer questions about how “text evidences a more or less unknowable past.”[7] McPherson explains that “our interpretations might live within the archive, curating pathways of analysis through its datasets or reframing the archive via new interfaces and multiple points of view.”[8] Specifically for the Old Bailey Online, we can evaluate what it is, what questions it can and cannot answer, the types of visualization strategies possible within the site and in connection with other data, and the relationships of these products and strategies to the craft of history and public history.

The Old Bailey Online creates “multiple pathways of analysis” through a user-friendly search interface, statistical programs, bibliographies and secondary contextual articles, scholarly interaction and additional functionality through the source digital data. This is a large data set, with over 197,000 trials from 1674 to 1913. The data is richly tagged to allow for enhanced searches by defendant, victim, type of crime, verdict and punishment.[9] The creators have demonstrated a commitment to continual updates of the database to increase functionality.[10] There is a combined quantitative and qualitative multi-faceted analysis tool built into the website with a variety of display and export options. Overall, the site has a user-friendly interface and is accessible to non-academic and non-specialist audiences, providing additional contextual information and teaching materials.

Historians must remember that text mining is a way to analyze language rather than meaning.[11] Historians must use other tools for moving from language to meaning, whether simple analysis of results, close reading, word triangulation or other methods. Once the text has been mined to create data, the analysis of meaning can take semantic, geospatial or chronological form, to name just a few. Hitchcock advocates the use of big data projects to make use of the explosion of available digital material and analytical tools.[12] These areas of analysis are more comfortable for historians because the analytical process is more similar to traditional historical analysis. One simple tool is Ngram, which spawned the science of culturomics, a quantitative study of history using Google Books. This positivist endeavor is similar to the quantitative history of the 1970s. Many historians reject the idea that these proprietary Google algorithms are capturing some objective truth, but the tool remains useful for developing historical questions.[13] The triangulation of data, or establishing relationships among three terms, is more like historical study after the cultural turn.

The Old Bailey Online has limits based upon the inherent textual content of the source, but the availability of the digital source material to researchers allows for innovative combinations of data and thus new points of view. While any number of reports can be run, their results are not interpretation but themselves data, a departure point for analysis. Of particular interest to me in the Old Bailey Online is how the interface allows the researcher to move back and forth between close reading and distant reading. This constant shifting of lenses allows the distant reading to add texture to a close reading, thereby not negating the narrative potential but increasing it. A researcher is able to drill down to see the individuals behind the statistics, and to add texture to a narrative account in areas not filled in by specific individual examples. She is also able to overlay the textual data onto built and natural environments, directly with the mapping function and indirectly through a material culture analysis.

“Curating pathways of analysis” by making them visible is a strength of the Old Bailey Online and of related efforts by Hitchcock. When scholars detail their research designs, they allow others to better understand their processes and think about ways to profit from their experiences. I was particularly interested in Gibbs and Owens’s contention that even flawed research designs can offer important information. I was impressed and inspired by the variety of research designs and products outlined by Hitchcock.[14] The use of online sites with criticism and editing features, such as “History Working Papers,” and of data embedded in journal articles, as in the Hitchcock article, offers tremendous possibilities for enhancing the didactic features of these articles.[15]

Magnus Huber’s “The Old Bailey Proceedings, 1674-1834: Evaluating and annotating a corpus of 18th- and 19th-century spoken English” was a fascinating example of how all of the pieces can come together in ways that the project designers would not have foreseen. Huber takes a linguistic approach, asking how the data can be used to capture spoken English from the period. He specifically outlines his research design, how he structures the data, and how he tested and overcame the limitations of the source.[16]

Across the pond, public history has the possibility to assist in overcoming the topography of the American archival landscape to better facilitate the creation and analysis of big data. The examples we looked at this week were based on Canadian and British models, which have long experience in creating corpora of historical material and a strongly governmental archival culture. Historians can no longer afford to be vampiric and raid the archives of their treasures with impunity. In the United States, the archival landscape is bifurcated, with a strong focus on localism and competition. In this brave new world, curators and historians can most profitably work together in the creation of this big data. One of the largest big data projects, Ancestry.com, is of limited utility to scholars for data mining, as its creators are not usually willing to let scholars have access to the source information. In the process, I hope that we heed Tim Hitchcock’s warnings and do not simply use these digital tools to enforce the hegemonic narrative, but recover missing stories and places that we might never even have thought to look for. His work on color suggests that the past may not always need to be presented in black and white.[17]


[1] Jacques Derrida, Archive Fever: A Freudian Impression (Chicago: University of Chicago Press, 1998). Quoted in Tara McPherson, “Sharing the Archive.”

[2] Frederick W. Gibbs and Trevor J. Owens, “The Hermeneutics of Data and Historical Writing” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (Spring 2012 version).

[3] Tim Hitchcock, “Textmining British Studies: an Overview of Recent Developments,” History Working Papers (2012).

[4] Geoffrey Rockwell and Ian Lancashire, “What is Textual Analysis?” http://tapor.ualberta.ca/Resources/TAIntro/

[5] Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (Verso, 2007). It was interesting that many of the projects in Hitchcock’s paper took one of these general formats.

[9] Tim Hitchcock, Sharon Howard and Robert Shoemaker, “Research and Study Guides – Doing Statistics,” Old Bailey Proceedings Online (www.oldbaileyonline.org, version 7.0, 09 December 2012). http://www.oldbaileyonline.org/static/Proceedings.jsp

Exploring Old Bailey Online

24 Feb

            Taking digital history this semester has opened my eyes to ways in which history can interact with other disciplines. Specifically, for the past couple of weeks, our class has been exploring how textual analysis and the creation of “long data” fields can reveal new historical questions. The Old Bailey project makes the proceedings of London’s central criminal court from the seventeenth century through the twentieth century available in an open access format. Ultimately, there is quite a bit that Old Bailey can do from a textual analysis perspective that is of use to historians. Perhaps most importantly, it allows oscillation between distant readings (e.g. looking at “long data” fields) and close readings (e.g. looking at individual court records). Despite its numerous advantages, Old Bailey has a great deal of unrealized potential in terms of linking textual data. Further work in this area could make Old Bailey and other similar projects revolutionary in their ability to facilitate primary source research.

            What makes Old Bailey’s interface distinct and extremely valuable is flexibility. Researchers can use its search tools to locate specific court records (limiting results with fields such as gender, verdict, crime, and punishment). This facilitates close readings of court records such as this one, where you can read a witness’s account of Ann Curtin stealing flannel fabric, and Curtin’s statement of defense. Old Bailey also allows more distant readings of the data in the court records. The statistics search once again lets researchers use specific search criteria, but produces results in the form of charts and graphs. Here, you can see that I wanted to find out how many of the women tried for simple larceny, like Curtin, were found guilty vs. not guilty. It seems that most of these women did not get off easily. The ability to switch between close readings of data and more distant readings is useful in a most rudimentary way because it allows researchers to look at the same data from different points of view. In terms of textual analysis, if someone didn’t have the time to read through every single court record in which a woman was found guilty of simple larceny, he/she could do a statistics search to take a more distant look at that data set. What is nice is that one can click on the data displayed in the charts and it will link back to all of the relevant court records. Now that allows EASY oscillation between close and distant readings of data!

            Though I haven’t experimented with it very much, another positive aspect of the Old Bailey project is that once a researcher has found a record that he/she is interested in, he/she can search through the “associated records” feature for related documents. These documents are linked through the individual trial record pages, and the individual trial records can likewise be accessed through a link in the associated record. This would probably be most useful in cases where researchers are focusing in on close readings of certain types of court cases.  

            The Old Bailey project is definitely a treasure trove of information for researchers of British law and life. However, it does have some shortcomings. It seems that the project is still in the early phases of linking data, meaning that it does not yet allow for deep levels of historical and textual analysis. For instance, if I wanted to find more court cases involving Ann Curtin, I would likely have to conduct a specific search using her name as my main search criterion. If I found other Ann Curtins in the database, it might not be clear whether any of them was the same person. There might have been multiple Ann Curtins tried for different crimes in the Old Bailey court. Increased levels of linked data might allow researchers to track individuals throughout the records.

             The fact that the Old Bailey project openly displays its XML data (here is the XML code for Ann Curtin’s trial) means that other data analysts might be able to build on what Old Bailey has done thus far. Open access platforms such as Old Bailey, which promote transparency, open up opportunities to link data across other similar databases and websites. It would be amazing, for instance, to be able to link census records to the Old Bailey proceedings to provide more information about where defendants and victims lived. Of course, there would be limitations to this based on how much information is available about each individual (in other words, one would have to make sure that the Ann Curtin found in the court proceedings matches the one found in census records). All of the possibilities involving linked data rely on textual analysis.
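To make the linking problem a little more concrete, here is a minimal Python sketch. It assumes that a trial's XML marks each person with a persName element carrying an id that is only valid within that record. The XML fragment, its wrapper element, the id values, and the helper name group_by_name are all hypothetical, invented for illustration. Grouping on a normalized name only proposes candidate matches; it cannot show that two Ann Curtins are the same woman.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment modeled on Old Bailey-style markup; not a real record.
SAMPLE = """
<records>
  <persName id="t1-defend1" type="defendantName">ANN CURTIN</persName>
  <persName id="t1-person2">Ann Curtin</persName>
  <persName id="t2-defend9" type="defendantName">Ann  Curtin</persName>
  <persName id="t2-person3">John Smith</persName>
</records>
"""

def group_by_name(xml_text):
    """Map each normalized name to the record-local ids that carry it."""
    groups = defaultdict(list)
    for el in ET.fromstring(xml_text).iter("persName"):
        # Collapse whitespace and case so "ANN CURTIN" and "Ann  Curtin" agree.
        name = " ".join("".join(el.itertext()).split()).lower()
        groups[name].append(el.get("id"))
    return dict(groups)

candidates = group_by_name(SAMPLE)
```

Each list in candidates is a set of ids a researcher would still have to check by hand, for example against ages, occupations, or census entries, before treating them as one individual.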

             I also noticed while exploring the Old Bailey project that the recording of court proceedings was largely a commercial enterprise in the seventeenth century, and later came under the control of the city of London. This was very interesting to me, and prompted me to ask how the language in the proceedings might have changed between 1679 and 1778 (when the city of London gained control of the recording of proceedings). I would like to see Old Bailey launch small-scale textual analysis projects that might show users how the project’s data can be used in different ways, and I think an exploration of change in language between 1679 and 1778 might be a great start. Such projects would link contextual historical information with textual analysis. As a historian who is not quite sure how, or whether, textual analysis can enhance my research, I would like to see Old Bailey demonstrate how historical data and textual analysis can go hand in hand.[1]


[1] All information for this post was taken from Old Bailey Online: The Proceedings of the Old Bailey, 1674-1913, http://www.oldbaileyonline.org//forms/formMain.jsp.

 

 

The Old Bailey Online: a step toward changing how historians do history

23 Feb

Murder, robbery, scandal! Who would not be fascinated by some aspect of the proceedings of London’s central criminal court (Old Bailey) from 1674 to 1913? This source is a treasure trove not only for historians, but also for linguists, sociologists, and members of a host of other disciplines. However, without access to these sources, they are of little use to most of the people who could learn from the recordings. The Old Bailey Online seeks to make this valuable source accessible to a large number of people by digitizing and offering the proceedings online. But what makes the website really useful is the fact that it is “fully searchable” and provides a variety of tools and resources to help any scholar wishing to explore the material. Although there is a lot to discuss about this amazing project, my focus in this blog post will be the tools and resources that the Old Bailey Online offers.  Having digital textual analysis tools built into the website allows this project to be more useful than a typical digitization project would be. Big data sets that come with digitization projects, like the Old Bailey Online, allow historians to reconsider their methodology and explore new research questions.

            Digitization and digital searches allow historians almost instantaneous access to more sources, and data from those sources, than they once could have dreamed of processing in a lifetime. In many ways the traditional historical methodology hinders and limits the interpretations we can discover in these big data sets. How this methodology should change is still being debated.[i] Tim Hitchcock, one of the project directors, acknowledges that big data sets, such as the Old Bailey Online, require different strategies: “we can’t even begin to read all the material one would want to consult in a classic immersive fashion.”[ii] The Old Bailey Online website provides ways of manipulating the data to present findings that would not be possible with the close reading methods historians typically use to analyze their texts. The “Statistics” search function of this website allows users to “count” trials by specific criteria (such as age, punishment, or offense) and produce tables, graphs, and pie charts based on the data.

This statistical tool allows you to see big picture trends over a specific time period and find patterns that would not be evident if traditional methodology were used. Using this tool allows the user to manipulate variables of time, gender, age, punishment, and crime and analyze them in relation to each other.  Old Bailey Online goes a step further and also allows the user to combine this “distant reading” with a historian’s traditional way of analyzing text at a micro level (i.e. one case and all of its circumstances at a time). In their piece arguing for the need for more methodological transparency in history writing, Frederick Gibbs and Trevor Owens posit that, “As historical data become more ubiquitous, humanists will find it useful to pivot between distant and close readings.”[iii] When I generate a table (such as this random one) I can click on individual intersections of data and find specific cases that fall under the category (for example all the cases of 10 year olds in the 1790s who were accused of animal theft).  By providing this feature Old Bailey Online makes it easier for a scholar to combine both distant and close textual analysis and understand their topic on more than one level.
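The pivot between a table cell and the cases behind it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how the site itself is built: the trial summaries, their ids, the field names, and the helper cross_tabulate are all hypothetical.

```python
from collections import defaultdict

# Hypothetical trial summaries; ids and values are invented for illustration.
TRIALS = [
    {"id": "trial-001", "decade": 1790, "offence": "animal theft", "verdict": "guilty"},
    {"id": "trial-002", "decade": 1790, "offence": "animal theft", "verdict": "not guilty"},
    {"id": "trial-003", "decade": 1790, "offence": "burglary", "verdict": "guilty"},
    {"id": "trial-004", "decade": 1800, "offence": "animal theft", "verdict": "guilty"},
]

def cross_tabulate(trials, row_key, col_key):
    """Return {(row, col): [trial ids]} so each cell keeps its source records."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t[row_key], t[col_key])].append(t["id"])
    return dict(cells)

table = cross_tabulate(TRIALS, "offence", "verdict")
# Distant reading: how many trials fall in each cell.
counts = {cell: len(ids) for cell, ids in table.items()}
# Close reading: the individual trials behind one cell.
to_read = table[("animal theft", "guilty")]
```

Keeping the ids in each cell, rather than only a number, is what makes it possible to move from the count back to the individual records.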

Although helpful, there is a lot more that the Old Bailey database could be manipulated to do. The tools currently available on the website mainly help scholars who are interested in questions about criminals and court proceedings. However, the Old Bailey proceedings can be useful for an unimaginable number of other research interests. For example, Magnus Huber writes about a project that used the data of Old Bailey and made it more useful for linguists who are interested in answering questions about the specific words and syntax that were used in the proceedings.[iv] Textual analysis tools that were able to track the month and offence over time might also provide interesting insights into when crime happens. The important point here is that scholars should not limit themselves based on what the Old Bailey Online project has offered for use. The project directors have been very open with their data, and for every case, an XML version is easy to access. Other questions could be explored by developing more digital tools to analyze this data, and this should be encouraged.

Old Bailey Online is not the perfect digital source. However, I believe it to be a great example of how a digitization project can move beyond providing documents to scholars online. Data can change the way historians view and interpret the past, but only if there are ways to analyze big data sets beyond the traditional historical methodology. For me, the fact that the Old Bailey Online provides digital textual analysis tools is a step toward a digital project revolutionizing how historians do history.


[i] My recent exploration of Moretti’s Graphs, Maps, and Trees was the beginning of my attempt to understand what new tools and techniques are needed as historians engage with digital sources.

[ii] Tim Hitchcock, “Textmining British Studies: an Overview of Recent Developments,” History Working Papers (2012).

[iii] Frederick W. Gibbs and Trevor J. Owens, “The Hermeneutics of Data and Historical Writing” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (Spring 2012 version).

Questions and Searches on Old Bailey Online

23 Feb

The digitized records of the Old Bailey proceedings are extremely large in scope.  This collection of over 100,000 trials allows for the examination of various questions stemming from the British justice system over centuries.  Instead of taking a micro view, historians can step back to look for more gradual changes over time and identify patterns and deviancies through statistical analysis.  Other digital tools can also aid historians in making connections within the text itself.

The proceedings data set would be unwieldy to work with without the help of various tools.  Luckily for the researcher, Old Bailey Online provides many tools for filtering and organizing its data.[1]  Besides basic searches by categories including name, gender, offense, and time period, the web site also allows for statistical analysis, where one can chart, for example, the number of deception allegations over the decades, or even narrow it further to show only those cases where the defendant was female and/or found guilty.  For this type of analysis, specifics are reduced to mere numbers and it is the amount that is important.  Historians begin by asking “how much/many…?” and then move on to other questions based on what they discover.

The graphing of data is important in encouraging those new questions, which again deal with amounts and change. How does this amount compare to that amount?  How does the number of x change?  Are these numbers reliable? Mapping also adds a “where” to the equation.[2]  Statistical questions yield statistical answers, yet historians know that that is only the beginning of an exploration.  Franco Moretti addressed this when he wrote that graphs do not provide interpretation and that maps often do not provide explanations, but “at least [show] us that there is something that needs to be explained.”[3]  It is the “why” that interests historians, and while these tools can address some of the basic questions and provide a framing context, they cannot go much deeper.

The usefulness of these search tools also depends on what focus a historian takes.  Looking at statistical information related to gender or a certain type of crime is simple because those are default search categories. However, looking for other types of text-based information can be more difficult.  For example, motives and evidence are not easily reduced to keywords, and you have to know exactly what you’re searching for (and all possible spellings) to receive results.
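One way to cope with the spelling problem, once the text is in hand, is to search for a whole list of variants at once. The Python sketch below is only an illustration: the variant list is hypothetical (a real one would come from the researcher's own reading of the sources), and the helper names variant_pattern and find_variants are made up for this example.

```python
import re

# Hypothetical variant list; a real one would come from the researcher's own reading.
VARIANTS = ["handkerchief", "handkerchiefs", "handkercher", "hankerchief"]

def variant_pattern(variants):
    """Compile one case-insensitive, whole-word pattern covering every variant."""
    # Longest first so the alternation tries the fullest spelling first.
    ordered = sorted(variants, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def find_variants(text, variants):
    """Return every matched variant, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in variant_pattern(variants).finditer(text)]

sample = "She took a Hankerchief and two handkerchiefs from the shop."
hits = find_variants(sample, VARIANTS)
```

This still finds only the spellings someone thought to list, which is exactly the limitation described above.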

Looking at the text itself is a whole different approach.  Magnus Huber’s research into the spoken language of the proceedings shows a way to also statistically analyze language usage.[4]  He used the search function to look for various appearances of certain words. However, for a linguist, the search function also had limitations because there was no way to restrict a search to one type of text, such as the parts that were transcribed speech.  This is where the concepts of text markup (using XML) and linked data come into play.
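As a rough illustration of what markup makes possible, the Python sketch below counts a word only inside passages tagged as speech. The element name u, its role attribute, the sample trial text, and the helper count_in_speech are all assumptions invented for this example; they are not taken from the Old Bailey files or from Huber's annotation scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <u> wraps transcribed speech; everything else is narrative.
SAMPLE = """
<trial>
  <p>The prisoner was indicted for stealing a watch.
     <u role="witness">I saw him take the watch.</u>
     The watch was produced in court.
     <u role="defendant">I never had the watch.</u>
  </p>
</trial>
"""

def count_in_speech(xml_text, word, role=None):
    """Count occurrences of `word` inside <u> elements, optionally for one role."""
    total = 0
    for u in ET.fromstring(xml_text).iter("u"):
        if role is not None and u.get("role") != role:
            continue
        # itertext() covers only the text inside <u>, not the narrative after it.
        tokens = "".join(u.itertext()).lower().split()
        total += sum(1 for tok in tokens if tok.strip(".,;:!?") == word.lower())
    return total

all_speech = count_in_speech(SAMPLE, "watch")
witness_only = count_in_speech(SAMPLE, "watch", role="witness")
```

A plain keyword search would also count the two narrative uses of "watch"; the markup is what lets the search separate speech from everything else.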

We talked about this briefly in class, but I think it merits further consideration to think about all of the connections one can add to the Old Bailey proceedings through XML. Marking parts of speech could be useful to a linguist, while for historians, topic modeling could make it possible to search by topic rather than keyword and to more easily discover information and even conventions of the time.[5] Also important is what Huber is doing: linking the speech to the sociobiographical information of the speaker.  For historians, it would be useful to have a database with more information about the people who appear in the proceedings as well.  While details like age and occupation are encoded, it should be clearer who each individual person is, which can be established through some type of name authority file. Linking to data from other sources (like census records) for the individuals listed is one way to provide historians with information about the actors in these court proceedings.  I’m interested to hear other ideas as well.

While Old Bailey Online is well set up for statistical analysis, implementation of other digital tools can help add more context and the ability to search and analyze content in an effort to explore the “why” questions.


[1] Old Bailey Online: The Proceedings of the Old Bailey, 1674-1913, http://www.oldbaileyonline.org//forms/formMain.jsp.

[2] Locating London’s Past, http://www.locatinglondon.org/. On this site, the trial data can be overlaid with historical maps to show spatial patterns.

[3] Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (London: Verso, 2005), 9, 39.

[4] Magnus Huber, “The Old Bailey Proceedings, 1674-1834. Evaluating and annotating a corpus of 18th- and 19th-century spoken English,” Varieng 1 (2007), http://www.helsinki.fi/varieng/journal/volumes/01/huber/.

[5] Tim Hitchcock, “Textmining British Studies: an Overview of Recent Developments,” History Working Papers (2012), http://www.historyworkingpapers.org/?page_id=266.

Navigating the Old Bailey Online

19 Feb

I do not know much about British history, but one does not have to be a British historian to get excited about the cache of documents available through the Old Bailey Online. Luckily, there is a historical background tab that thrilled my social historian heart and provided me with context about where these documents come from and what they contain. Given my own research interests in social, gender, and legal history, I would ask questions about how women were treated in the courts; what kinds of cases women were most often involved in; and whether women were sentenced less harshly than men. While these questions could be answered with the search boxes the site provides, it is noteworthy that the Old Bailey Online is coded in XML and is open access. This means that the data can be manipulated more easily by historians who have the knowledge needed to combine data sets and create their own search standards in XML. The sources available in the Old Bailey Online and the way it is coded make this database revolutionary.

The textual analysis conducted using the Old Bailey Online demonstrates how remarkable the site is. Textual analysis traces the changes in a language by analyzing the usage of words and their frequency. Most often these kinds of studies can only be done on published works, which do not accurately depict a language as the masses would have used it. With the Old Bailey Online, historians can analyze proceedings from 1674 to 1834. This amounts to over 100,000 trials with about 52 million words and passages. Computers have always been instrumental to the study of text analysis given the massive data sets, but the Old Bailey Online goes beyond what was traditionally done.[1] As Magnus Huber wrote about the site and its materials, the Old Bailey Online “thus offers the rare opportunity of analyzing everyday language in a period that has been neglected both with regard to the compilation of primary linguistic data and the description of the structure, variability, and change of English.” [2] The Old Bailey Online provides easy access to, and manipulability of, one of the only sources that give historians a look at the speech patterns of commoners.
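For readers who, like me, are new to this, the basic move of counting how often a word is used in different periods can be sketched in Python. Everything here is an assumption made for illustration: the two short texts are invented stand-ins for proceedings from different periods, and tokenize and per_thousand are hypothetical helper names. Frequencies are expressed per 1,000 words so that periods with different amounts of text can be compared.

```python
from collections import Counter
import re

# Hypothetical snippets standing in for proceedings text from two periods.
TEXTS = {
    "1700s": "I did not see him. I cannot say who took it.",
    "1800s": "I didn't see him. I can't say who took it. I didn't ask.",
}

def tokenize(text):
    """Lowercase word tokens; an internal apostrophe keeps contractions whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def per_thousand(text, word):
    """Occurrences of `word` per 1,000 tokens, so unequal periods can be compared."""
    tokens = tokenize(text)
    return 1000 * Counter(tokens)[word] / len(tokens) if tokens else 0.0

rates = {period: per_thousand(text, "didn't") for period, text in TEXTS.items()}
```

The tokenizer only recognizes the plain ASCII apostrophe, so text that uses typographic apostrophes would need to be normalized first.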

The searching features of the Old Bailey Online are likewise pretty amazing, but not perfect. Textual analysts, for example, are not completely happy because they cannot search for contractions.[3] However, because the database is open access, if textual analysts had a lot of time and money, they could go back and fix that for their own purposes.[4] Personally, I never would have thought twice about using anything beyond the search features until we discussed in class that visitors can access the XML code. By manipulating this code, comparing the data with other large databases and cross-referencing people become significantly easier. XML is still a fairly new tool in the digital history world. However, a lot of digital historians prefer XML for the possibilities it offers for combining and efficiently searching large data sets.[5]

The Old Bailey Online launched in 2003 and has inspired a reviewer to write, “as someone who probably visits the site two or three times a week, I am bound to wonder at how we all managed before then.”[6] This reviewer’s main criticism was that the Old Bailey’s papers are not complete. Responding to this critique, the creators of the site plugged their newer database, London Lives, which attempts to provide a fuller picture of London crime in the same accessible format.[7]

Perhaps in a few years, with the help of open access classes, I may become more proficient at recognizing the possibilities and shortcomings of XML and sites like the Old Bailey Online.[8] Until then, I cannot pretend my few weeks of studying digital history provide me with a full understanding of the significance of such databases. Even a novice can see the site is still encountering some technical issues. The scan of the handwritten original is useful, but I have been unable to open one. As this site was launched ten years ago, I suspect updates are needed. Even with technical glitches, this website is clearly a standout and trendsetter in the field of digital history.


[1] Father Roberto Busa started a project in 1940 which he eventually transitioned onto a computer. The final product was published in 1970. Geoffrey Rockwell and Ian Lancashire, “Electronic Texts and Text Analysis,” TAPoR, http://tapor.ualberta.ca/Resources/TAIntro/ (accessed February 14, 2013).

[3] Ibid.

[4] My knowledge of this kind of thing is limited. But, the access to the code and a digitized picture of the original document leads me to believe that anything is possible if one has the time and money.

[5] As Daniel Cohen and Roy Rosenzweig pointed out, Cornell, the University of Michigan, and the State and University Library of Göttingen used XML in order to combine their historical math collections.  Daniel Cohen and Roy Rosenzweig, “Appendix” in Digital History (Philadelphia: University of Pennsylvania Press, 2005), 249-260.

[6] Dr. Drew D. Gray, review of “The Old Bailey Proceedings Online,” Reviews in History, http://www.history.ac.uk/reviews/review/897 (accessed February 15, 2013).

[7] Clive Emsley, Tim Hitchcock, Robert B. Shoemaker, “Author’s Response,” Reviews in History, http://www.history.ac.uk/reviews/review/897 (accessed February 15, 2013).

[8] The Programming Historian is a site run out of the Center for History and New Media. “The Programming Historian,” Roy Rosenzweig Center for History and New Media, http://programminghistorian.org/ (accessed February 14, 2013).