Digital “Archive Fever”

24 Feb

Old Bailey Online provides a model for creating big data and using it for textual analysis, and it offers a glimpse of future uses for big data. This online project presents not only the tagged data of over 197,000 trials but also search and statistical tools for working with the data set. It is an excellent example of the strengths of two approaches: public history, which embraces shared authority and wide audiences, and digital history, which promotes open access and scholarly community. The project demonstrates what Tara McPherson, following Jacques Derrida in Archive Fever, argues: if the “process of ‘archivization’ produces the very conditions for history and memory, scholars who use archival materials must be actively involved in these processes of digitization.”[1] Because it was conceived and implemented by historians, the project can answer a richer range of questions and reach a wider audience. The website offers background and teaching materials, and it adds functionality by making the digital source data available. There is a wide variety of questions to answer and new questions continually emerge, though there are limits to what can be learned. Access to the digital source data also allows for innovative combinations with data from other sources. One promise of these sources and articles is the research design information they provide, which suggests ways I can explore big data sets. Finally, this project shows how important it is for public historians to overcome the limits of the American archival landscape in order to better facilitate the creation and analysis of big data, possibly through public history.

Historians are comfortable with “sources,” and therefore with analyzing sources, even in a digital environment. They are generally less familiar with thinking of sources as data, or of their analysis as textual rather than historical. Gibbs and Owens define “data” as computer-processable information; “big data,” by extension, is a large collection of such data.[2] How big is big? The answer varies. Tim Hitchcock, one of the Old Bailey Online developers, writes in his recent article “Textmining British Studies: an Overview of Recent Developments” that there is an “almost ‘infinite’ archive” of digitized printed primary sources from the “traditional canon of western historiography.” It is difficult to imagine the scope of this archive: Hitchcock estimates that more than 60 percent of every word published in English between the fifteenth century and 1923 is available online.[3] This “data” takes the form of “electronic texts,” digital representations of oral or written language drawn from any number of different types of sources. Textual analysis, often called text mining when done digitally, can be understood as a research method that describes and interprets the characteristics of any text.[4]

How can we, as historians, practice the traditional research method of “immersing” ourselves in the sources within the “infinite archive”? The key difference in this world of big data and textual analysis is not necessarily new data, although its scope may change, but a new relationship with the sources, brought about by new tools, research methods, questions and products. Tara McPherson analyzes these relationships in her blog post “Sharing the Archives”: “In the past, humanities scholars have raided archives in order to capture their treasures for our books and articles. This relationship has often been uni-directional and vampiric, giving little back to the archive.” The infinite archive poses a new kind of limit on historical research: not scarcity but abundance. To varying degrees, historians must practice Moretti’s “distant reading,” whether skimming online search results for the most promising leads or developing sophisticated digital analytical tools. Moretti’s answer was “graphs, maps, trees,” or more generally, visualizing relationships from a distance to reveal patterns and meaning.[5] The keys to understanding these new research paradigms are the angle of view, the visualization strategy, and the relationship of the visualizations, or other products of analysis, to historical analysis.[6]

The Old Bailey Online allows us to examine both an archive of big data, digitized with the historical profession in mind, and the tools and products of a textual analysis of that data. It offers a window not only into the London criminal underworld of the eighteenth century but also into a brave new world of historical possibilities. On one hand, these records have long been available and used by historians for social and political histories, which can now be done quicker, better and faster. On the other hand, with new questions and methods, these “inherited texts” can answer questions about how “text evidences a more or less unknowable past.”[7] McPherson explains that “our interpretations might live within the archive, curating pathways of analysis through its datasets or reframing the archive via new interfaces and multiple points of view.”[8] Specifically for the Old Bailey Online, we can evaluate what it is; which questions it can and cannot answer; the visualization strategies possible within the site and in connection with other data; and how these products and strategies relate to the craft of history and public history.

The Old Bailey Online creates “multiple pathways of analysis” through a user-friendly search interface, statistical tools, bibliographies and contextual secondary articles, scholarly interaction, and the additional functionality that the digital source data provides. This is a large data set, with over 197,000 trials from 1674 to 1834. The data is richly tagged to allow enhanced searches by defendant, victim, type of crime, verdict and punishment.[9] The creators have demonstrated a commitment to continually updating the database to increase its functionality.[10] The website includes a built-in, multi-faceted tool for combined qualitative and quantitative analysis, with a variety of display and export options. Overall, the site has a user-friendly interface and is accessible to non-academic and non-specialist audiences, providing additional contextual information and teaching materials.

Historians must remember that text mining is a way to analyze language rather than meaning.[11] To move from language to meaning, historians must use other tools, whether simple analysis of results, close reading, word triangulation or other methods. Once the text has been mined to create data, meaning can be sought along semantic, geospatial or chronological lines, to name just a few. Hitchcock advocates big data projects that make use of the explosion of available digital material and analytical tools.[12] These areas of analysis are more comfortable for historians because the analytical process is closer to traditional historical analysis. One simple tool is the Google Books Ngram Viewer, which spawned “culturomics,” the quantitative study of history and culture using Google Books. This positivist endeavor resembles the quantitative history of the 1970s. Many historians reject the idea that these proprietary Google algorithms capture some objective truth, but the tool remains useful for developing historical questions.[13] The triangulation of data, or establishing relationships among three terms, is closer to historical studies after the cultural turn.
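To make Ngram-style counting and three-term “triangulation” a little more concrete, here is a minimal Python sketch. It assumes a corpus already reduced to (year, text) pairs; the sample sentences and the function names are hypothetical, and this is not how the Ngram Viewer or the Old Bailey Online actually compute their results. Treating triangulation as counting documents in which three terms co-occur is likewise a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (year, text) pairs: invented sentences,
# not real Old Bailey transcripts.
SAMPLE_DOCS = [
    (1750, "The prisoner stole a silver watch"),
    (1750, "He was indicted for stealing a handkerchief"),
    (1800, "The watch was found upon the prisoner"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(docs, term):
    """Ngram-style measure: the share of all tokens in each year equal to `term`."""
    totals = Counter()
    hits = Counter()
    for year, text in docs:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


def co_occurrence(docs, terms):
    """Count, per year, the documents in which every one of `terms` appears.

    This is only one possible (assumed) reading of three-term "triangulation".
    """
    counts = {}
    for year, text in docs:
        vocabulary = set(tokenize(text))
        found = all(term in vocabulary for term in terms)
        counts[year] = counts.get(year, 0) + int(found)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(relative_frequency(SAMPLE_DOCS, "watch"))
    print(co_occurrence(SAMPLE_DOCS, ("prisoner", "watch", "silver")))
```

Either function returns numbers, not meaning: a rise in the share of “watch” describes language, and moving from that pattern to historical interpretation still requires close reading of the trials themselves.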

The Old Bailey Online has limits based on the inherent textual content of the source, but making the digital source material available to researchers allows for innovative combinations of data that open up new points of view. While any number of reports can be run, their results are not interpretation; they are themselves data and a point of departure for analysis. Of particular interest to me in the Old Bailey Online is how the interface allows the researcher to move back and forth between close reading and distant reading. This constant shifting of lenses allows distant reading to add texture to close reading, not negating the narrative potential but increasing it. A researcher can drill down to see the individuals behind the statistics, and can add texture to a narrative account in areas not filled in by specific individual examples. She can also overlay the textual data onto built and natural environments, directly with the mapping function and indirectly through material culture analysis.

“Curating pathways of analysis” by making research processes visible is a strength of the Old Bailey Online and of related efforts by Hitchcock. When scholars detail their research designs, others can better understand their processes and think about ways to profit from their experiences. I was particularly interested in Gibbs and Owens’s contention that even flawed research designs can offer important information. I was impressed and inspired by the variety of research designs and products Hitchcock outlines.[14] Online sites with critique and editing features, such as History Working Papers, and data embedded in journal articles, as in Hitchcock’s article, offer tremendous possibilities for enhancing the didactic value of these articles.[15]

Magnus Huber’s “The Old Bailey Proceedings, 1674-1834: Evaluating and annotating a corpus of 18th- and 19th-century spoken English” was a fascinating example of how all of the pieces can come together in ways the project designers might not have anticipated. Huber takes a linguistic approach, examining how the data can be used to capture the spoken English of the period. He specifically outlines his research design, how he structures the data, and how he tested and overcame the limitations of the source.[16]

Across the pond, public history has the potential to help overcome the topography of the American archival landscape and better facilitate the creation and analysis of big data. The examples we looked at this week were based on Canadian and British models, which have long experience in creating corpora of historical material and a strong governmental archival culture. Historians can no longer afford to be vampiric, raiding the archives of their treasures with impunity. In the United States, the archival landscape is bifurcated, with a strong focus on localisms and completion. In this brave new world, curators and historians can work together most profitably in creating big data. One of the largest big data projects, Ancestry.com, is of limited utility to scholars for data mining because its creators are usually unwilling to give scholars access to the source data. In the process, I hope that we heed Timothy Hitchcock’s warnings and do not simply use these digital tools to reinforce the hegemonic narrative, but instead use them to recover missing stories and places we may never have thought to look for. His work on color suggests that the past need not always be presented in black and white.[17]


[1] Jacques Derrida, Archive Fever: A Freudian Impression (Chicago: University of Chicago Press, 1998). Quoted in Tara McPherson, “Sharing the Archive.”

[2] Frederick W. Gibbs and Trevor J. Owens, “The Hermeneutics of Data and Historical Writing” in Writing History in the Digital Age, ed. Jack Dougherty and Kristen Nawrotzki (Spring 2012 version).

[3] Tim Hitchcock, “Textmining British Studies: an Overview of Recent Developments,” History Working Papers (2012).

[4] Geoffrey Rockwell and Ian Lancashire, “What is Textual Analysis?” http://tapor.ualberta.ca/Resources/TAIntro/

[5] Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (Verso, 2007). It was interesting that many of the projects in Hitchcock’s paper took one of these general formats.

[9] Tim Hitchcock, Sharon Howard and Robert Shoemaker, “Research and Study Guides – Doing Statistics,” Old Bailey Proceedings Online (www.oldbaileyonline.org, version 7.0, 09 December 2012). http://www.oldbaileyonline.org/static/Proceedings.jsp

3 Responses to “Digital “Archive Fever””

  1. Nick Sacco February 25, 2013 at 5:18 am

    Wow, what a comprehensive essay! I like a lot of what you are saying here and wholeheartedly agree that we find ourselves dealing with an abundance of sources when looking at any event after the American Civil War, around the time the field of history became professionalized. I find the term “vampiric” rather odd when describing the “traditional” way of using archival repositories, but McPherson nonetheless provides some exciting ideas for creating new relationships with archival material. In the future we will need people who specialize in “distant reading” to complement and enhance the many studies concentrating on “close reading.”

  2. angelabpotter February 25, 2013 at 4:23 pm

    Nick,
    I took vampiric to mean coming in and draining the “blood,” or knowledge, from the archives without giving back to them or establishing a relationship with them. I have to say that this was the prevailing attitude, at least where I have worked in the past.
    Now there are two conflicting trends: 1) people do not even have to go to the archives, since they are “sucking” the content from online, and 2) historians are in a position of shared authority with the rise of collaborative projects and public history. People have used the “prison” metaphor, alluding to the material being locked away, but I thought the vampire image was graphic and thought-provoking.
    Should be an interesting class. Thanks for the comment.
    Angie

  3. Nick Sacco February 25, 2013 at 9:36 pm

    Angela,

    It wasn’t you but McPherson who left me wondering what it meant to have a “vampiric” relationship. I’m still not sure I understand how historians of the past could have drained the “blood” of archives without establishing a relationship with them. Did archival institutions actively attempt to establish a relationship with historians, which the historians rebuffed? Or was this a two-way street? What did it mean to have “a relationship” between historians and archivists in the 19th and 20th centuries? If a historian uses archival resources and then turns those resources into a great study of the past–one that captivates its audience and gets them into history–is that not in some way sharing the power of archives with the broader public and at least a partial sharing of authority with archivists? Archival institutions have always gotten credit and are cited in historical studies, so it seems to me that at least some sort of reciprocal relationship between historians and archivists has existed.

    I’m just throwing out some ideas, not concrete answers. Regardless, we can all agree that that relationship is changing.
