Award Number: HD-50099-07
Grant Program: Digital Humanities: Digital Humanities Start-Up Grants
Award Recipient: Drexel University
Project Title: Automatic Extraction of Article Metadata from Digitized Historical Newspapers
Award Period: 4/1/2007 - 4/30/2009
Approved Award Total: $30,000.00

Robert B. Allen
Drexel University, Philadelphia, PA 19104-2875, USA
2007, Library Science, Digital Humanities Start-Up Grants, Digital Humanities

The development of a programming tool for automatically identifying, categorizing, and describing newspaper articles from digital files produced by the National Digital Newspaper Program (NDNP).

In the next few years, several hundred thousand newspaper pages will be digitized and made available online through the National Digital Newspaper Program. While the digitization process typically includes identification of the words in the text using basic optical character recognition (OCR), project awardees are not required to identify or index individual articles. Yet articles are the natural unit for interacting with the news. Knowing the article boundaries can improve search accuracy and support user-friendly interaction, and it should increase the value of the material for historians, teachers of history, and members of the public who are interested in history. We will develop automated methods for such article-level processing. Specifically, we will build a set of Java programs that take the image files and the OCR files as input and identify, categorize, and extract descriptions from articles.
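
To make the idea of article-level processing concrete, the following is a minimal Java sketch and not the project's actual software. It assumes the OCR output has already been parsed into text lines with pixel positions, treats unusually tall lines as headlines, attaches the lines that follow to that headline, and then assigns a rough category by keyword. The class and method names (ArticleSegmenter, OcrLine, Article, segment, categorize), the height threshold, and the keyword lists are all hypothetical choices made for illustration. The sketch ignores multi-column layouts and uses no image data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only: groups OCR text lines into articles by headline height. */
class ArticleSegmenter {

    /** One OCR text line with its position on the page image, in pixels (hypothetical structure). */
    static final class OcrLine {
        final String text;
        final int x;
        final int y;
        final int height;

        OcrLine(String text, int x, int y, int height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.height = height;
        }
    }

    /** A detected article: one headline plus the body lines that follow it. */
    static final class Article {
        final String headline;
        final List<String> body = new ArrayList<>();

        Article(String headline) {
            this.headline = headline;
        }
    }

    /**
     * Splits a page into articles. A line at least headlineMinHeight pixels tall
     * starts a new article (an assumed heuristic); shorter lines join the current one.
     */
    static List<Article> segment(List<OcrLine> lines, int headlineMinHeight) {
        List<OcrLine> sorted = new ArrayList<>(lines);
        // Reading order for a single column: top to bottom, then left to right.
        sorted.sort(Comparator.comparingInt((OcrLine l) -> l.y).thenComparingInt(l -> l.x));

        List<Article> articles = new ArrayList<>();
        Article current = null;
        for (OcrLine line : sorted) {
            if (line.height >= headlineMinHeight) {
                current = new Article(line.text);
                articles.add(current);
            } else if (current != null) {
                current.body.add(line.text);
            }
            // Lines that appear before the first headline are skipped in this sketch.
        }
        return articles;
    }

    /** Assigns a coarse category from a tiny, made-up keyword list. */
    static String categorize(Article article) {
        String text = (article.headline + " " + String.join(" ", article.body))
                .toLowerCase(Locale.ROOT);
        if (text.contains("election") || text.contains("senate")) {
            return "politics";
        }
        if (text.contains("for sale") || text.contains("price")) {
            return "advertisement";
        }
        return "other";
    }

    public static void main(String[] args) {
        // Invented sample lines; real input would come from the OCR files.
        List<OcrLine> page = List.of(
                new OcrLine("SENATE PASSES TARIFF BILL", 40, 100, 48),
                new OcrLine("The Senate voted late yesterday.", 40, 160, 18),
                new OcrLine("HORSES FOR SALE", 40, 400, 44),
                new OcrLine("Two bay mares, price on request.", 40, 450, 18));
        for (Article a : segment(page, 40)) {
            System.out.println(categorize(a) + ": " + a.headline
                    + " (" + a.body.size() + " body lines)");
        }
    }
}
```

A realistic pipeline would likely combine such layout cues with column detection on the page images and a trained classifier in place of the keyword lists. The sketch only shows how OCR lines might flow into article records.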