Preservation and Access: Research and Development

Period of Performance

6/1/2008 - 5/31/2011

Funding Totals (outright + matching)

$320,000.00 (approved)
$300,000.00 (awarded)

Text-Mining and Analysis Tools for Historical Research

FAIN: PR-50019-08

George Mason University (Fairfax, VA 22030-4444)
Roy Rosenzweig (Project Director: July 2007 to January 2008)
Daniel J. Cohen (Project Director: January 2008 to September 2011)

Research, development, and testing of tools designed to locate documents in large digital corpora, extract information, and analyze large-scale patterns across texts.

In the last decade, the library community and other providers of digital collections have created an incredibly rich digital archive of historical and cultural materials. Yet most scholars have not figured out how to take full advantage of the digitized riches suddenly available on their computers. Indeed, the abundance of digital documents has exacerbated the problems of some researchers, who now find themselves overwhelmed by the sheer quantity of available material. Meanwhile, some of the most profound insights lurking in these digital corpora remain locked up. We believe the absence of appropriate methods and interfaces is largely to blame: digital content providers have not yet developed the kind of sophisticated and flexible search, extraction, and analysis tools capable of capitalizing on this vast investment in a digitized cultural heritage.

Media Coverage

Analyzing Literature by Words and Numbers (Media Coverage)
Author(s): Patricia Cohen
Publication: The New York Times
Date: 12/3/2010
Abstract: Victorians were enamored of the new science of statistics, so it seems fitting that these pioneering data hounds are now the subject of an unusual experiment in statistical analysis.

Associated Products

Next History (Web Resource)
Title: Next History
Author: Roy Rosenzweig Center for History and New Media
Abstract: Website for our text mining for history focus group, with pointers to exemplary projects and commentary.
Year: 2009
Primary URL:

Victorians Institute Conference Keynote: Searching for the Victorians (Conference Paper/Presentation)
Title: Victorians Institute Conference Keynote: Searching for the Victorians
Author: Dan Cohen
Abstract: Why did the Victorians look to mathematics to achieve certainty, and how might we understand the Victorians better with the mathematical methods they bequeathed to us? I want to relate the Victorian debate about the foundations of our knowledge to a debate that we are likely to have in the coming decade, a debate about how we know the past and how we look at the written record that I suspect will be of interest to literary scholars and historians alike. It is a philosophical debate about idealism, empiricism, induction, and deduction, but also a practical discussion about the methodologies we have used for generations in the academy.
Date: 10/2/2010
Primary URL:
Conference Name: Victorians Institute Conference