Search Results

Grant number like: HD-50099-07


Award Number: HD-50099-07
Grant Program: Digital Humanities: Digital Humanities Start-Up Grants
Award Recipient: Drexel University
Project Title: Automatic Extraction of Article Metadata from Digitized Historical Newspapers
Award Period: 4/1/2007 - 4/30/2009
Approved Award Total: $30,000.00

Robert B. Allen, Drexel University, Philadelphia, PA 19104-2875, USA
2007, Library Science, Digital Humanities Start-Up Grants, Digital Humanities

The development of a programming tool for automatically identifying, categorizing, and describing newspaper articles from digital files produced by the National Digital Newspaper Program (NDNP).

In the next few years, several hundred thousand newspaper pages will be digitized and made available online through the National Digital Newspaper Program. While the digitization process typically includes identifying the words in the text using basic optical character recognition (OCR), project awardees are not required to identify or index individual articles. Articles are the natural unit for interacting with the news. Knowing which text belongs to which article can improve search accuracy and support user-friendly interaction, which should increase the value of the material for historians, teachers of history, and members of the public who are interested in history. We will develop automated methods for such article-level processing. Specifically, we will build a set of Java programs that take the image files and the OCR files as input and identify, categorize, and extract descriptions from articles.
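
As a rough illustration of what article-level processing over OCR output might involve, the following is a minimal Java sketch, not the project's actual method. It assumes the OCR stage yields text blocks that each carry a column index, a vertical position, and an estimated font size, and it treats any block at or above a font-size threshold as a headline that opens a new article. All class, field, and method names here (ArticleSegmenter, TextBlock, Article, segment) are hypothetical, and the font-size heuristic is an assumption chosen for simplicity. Categorization is not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: these names and the heuristic are illustrative assumptions,
// not part of the NDNP specifications or the funded project.
class ArticleSegmenter {

    /** One OCR text block and its (assumed) layout attributes. */
    static final class TextBlock {
        final String text;
        final int column;      // assumed column index, counted left to right
        final int top;         // y coordinate of the block's top edge, in pixels
        final double fontSize; // assumed to be estimated from OCR line height

        TextBlock(String text, int column, int top, double fontSize) {
            this.text = text;
            this.column = column;
            this.top = top;
            this.fontSize = fontSize;
        }
    }

    /** A headline plus the body text gathered beneath it. */
    static final class Article {
        final String headline;
        final StringBuilder body = new StringBuilder();

        Article(String headline) {
            this.headline = headline;
        }

        /** A very small "description": the number of words in the body. */
        int wordCount() {
            String s = body.toString().trim();
            return s.isEmpty() ? 0 : s.split("\\s+").length;
        }
    }

    /**
     * Orders blocks in reading order (column, then top edge) and starts a new
     * article at every block whose font size reaches the headline threshold.
     */
    static List<Article> segment(List<TextBlock> blocks, double headlineMinFontSize) {
        List<TextBlock> ordered = new ArrayList<>(blocks);
        ordered.sort(Comparator.comparingInt((TextBlock b) -> b.column)
                               .thenComparingInt(b -> b.top));

        List<Article> articles = new ArrayList<>();
        Article current = null;
        for (TextBlock b : ordered) {
            if (b.fontSize >= headlineMinFontSize) {
                current = new Article(b.text);
                articles.add(current);
            } else if (current != null) {
                if (current.body.length() > 0) {
                    current.body.append(' ');
                }
                current.body.append(b.text);
            }
            // Body text that appears before any headline is skipped in this sketch.
        }
        return articles;
    }

    public static void main(String[] args) {
        // Invented sample data, deliberately out of reading order.
        List<TextBlock> page = Arrays.asList(
            new TextBlock("Members voted on the budget.", 0, 80, 9.0),
            new TextBlock("HARVEST REPORT", 1, 10, 18.0),
            new TextBlock("CITY COUNCIL MEETS", 0, 10, 18.0),
            new TextBlock("Wheat yields rose.", 1, 40, 9.0),
            new TextBlock("The council met on Tuesday.", 0, 40, 9.0));

        for (Article a : segment(page, 14.0)) {
            System.out.println(a.headline + " (" + a.wordCount() + " words)");
        }
    }
}
```

A single font-size threshold keeps the sketch short. Real historical newspapers have articles that continue across columns and pages, and their OCR is noisy, so a practical tool would likely need to combine several layout and textual cues.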