Week 8: Big Data: Mining and Visualization

Summary

This week focuses on data mining and visualization. Data mining is the process of extracting information from “big data,” often by using a computer algorithm to look for patterns in the data sets. Visualization is the process of transforming “big data” into visually comprehensible images: graphs, maps, charts, tables, etc. Both data mining and visualization are forms of interpretation that rely on theoretical abstractions and methodological constructs to define the processes of analysis and representation.

Outline

I. Discussion: What is data mining? (1 hr)
II. Break (5 min)
III. What is data visualization? (1 hr)
IV. Break (5 min)
V. How can data mining and visualization be applied to historical analysis? (30 min)

 

Assignments (due before class)

  • Blog Post (500 words): How have historians used data mining and/or visualization effectively to interpret the past?
  • Weekly Twitter Assignment
  • Weekly WordPress Comment Assignment

 

Required Reading

Data mining and visualization are huge topics, and there is no way that we can adequately survey them in a single week. So, keep in mind that this week’s readings just skim the surface. The first group of readings focuses on “big data” and its interpretation, sometimes referred to as data mining. Data mining is an umbrella term for a host of approaches that examine data sets too large to be analyzed or quantified by an individual researcher, or even a small group of researchers. Take, for example, Early English Books Online (EEBO). EEBO includes a huge number of books in English published between 1450 and 1700. It would be impossible for any person to read all of these books, let alone analyze them. So, humans write algorithms that ask specific questions of the texts (a.k.a. the “corpus”). By probing these texts using a computer algorithm, the researcher is, in effect, “mining” the corpus for data. This is text mining.
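To make the idea of “asking a specific question of the corpus” concrete, here is a minimal sketch in Python. It is only an illustration: the two sample texts are invented stand-ins rather than anything drawn from EEBO, and the names `corpus`, `tokenize`, and `term_counts` are hypothetical. The question it asks is simple: how often does a given word appear in each text?

```python
import re
from collections import Counter

# Hypothetical mini-corpus: invented sentences standing in for a large
# collection of digitized books (these are NOT taken from EEBO).
corpus = {
    "text_a": "The king and the parliament met. The king spoke.",
    "text_b": "Parliament answered the king with a petition.",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(corpus, term):
    """Return how many times `term` occurs in each document of the corpus."""
    return {name: Counter(tokenize(text))[term] for name, text in corpus.items()}


counts = term_counts(corpus, "king")
print(counts)  # {'text_a': 2, 'text_b': 1}
```

A real text-mining project would work with thousands of documents and would have to make interpretive choices that this sketch ignores, such as how to handle variant spellings. The basic pattern is the same, though: a researcher frames a question, and an algorithm applies it to every text in the corpus.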

There are many ways to analyze “big data.” A consortium of research centers and funders has encouraged scholars to experiment with new ways of examining data sets under the rubric of the “Digging into Data Challenge.” Read through the Digging into Data Challenge website as well as the First Monday report on Digging into Data to get a sense of what the project seeks to accomplish.

Big data poses a challenge not just in the interpretation of large data sets but in the representation of data sets. “Visualization” is the umbrella term for the process of representing data in a visual format. Read the three articles below and consider what role visualization might play in uncovering new historical information. Once you have finished, have a look at the “Periodic Table of Visualization Methods” and consider the strengths and weaknesses of different visualization approaches.

Projects to Investigate

There are lots of data mining projects that you can investigate online. The Digging into Data Challenge website alone has plenty to keep you occupied. Have a look at them and consider which approaches work well for academics and which work well for the public. I want you to focus on the two projects below, which we will discuss in class.
