NEH Funded Projects Query Form

Drexel University (Philadelphia, PA 19104-2875)
Robert B. Allen (Project Director: November 2006 to December 2009)

HD-50099-07
Digital Humanities Start-Up Grants
Digital Humanities

Totals:
$30,000 (approved)
$30,000 (awarded)

Grant period:
4/1/2007 – 4/30/2009

Automatic Extraction of Article Metadata from Digitized Historical Newspapers

The development of a programming tool for automatically identifying, categorizing, and describing newspaper articles from digital files produced by the National Digital Newspaper Program (NDNP).

In the next few years, several hundred thousand pages will be digitized and made available online through the National Digital Newspaper Program. While the digitization process typically includes identifying the words in the text using basic optical character recognition (OCR), the project awardees are not required to identify or index articles. Articles are the natural unit for interacting with the news. Identifying the articles can improve search accuracy and support user-friendly interaction, and it should increase the value of the material for historians, teachers of history, and members of the public who are interested in history. We will develop automated methods for such article-level processing. Specifically, we will build a set of Java programs that will take the image files and the OCR files as input and will identify, categorize, and extract descriptions from articles.