Search Results

Grant number like: HD-51568-12

Award Number: HD-51568-12
Grant Program: Digital Humanities: Digital Humanities Start-Up Grants
Award Recipient: University of Maryland, College Park
Project Title: Active OCR: Tightening the Loop in Human Computing for OCR Correction
Award Period: 6/1/2012 - 5/31/2014
Approved Award Total: $41,906.00

Travis Robert Brown, University of Maryland, College Park, College Park, MD 20742-5141, USA
2012; Interdisciplinary Studies, General; Digital Humanities Start-Up Grants; Digital Humanities

The development of a proof-of-concept correction tool to improve optical character recognition in humanities text collections.

We propose a proof-of-concept application that will experiment with active learning and other iterative techniques for the correction of eighteenth-century texts provided by the HathiTrust Digital Library and the 2,231 ECCO text transcriptions released into the public domain by Gale and distributed by the Text Creation Partnership (TCP) and 18thConnect. In an application based on active learning or a similar approach, the user could identify dozens or hundreds of difficult characters that appear in the articles from that same time period, and the system would use this new knowledge to improve optical character recognition (OCR) across the entire corpus. A portion of our efforts will focus on the need to incentivize engagement in tasks of this type, whether they are carried out through traditional crowdsourcing or through a more active, iterative process like the one we propose. We intend to examine how exploring a user's preferences can improve their engagement with corpora of materials.
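
The correction loop described in the abstract can be illustrated with a toy sketch. This is not the project's implementation: the Glyph record, the "shape" cluster label, the confidence scores, and the oracle function standing in for the human user are all hypothetical names introduced for illustration. The sketch assumes the OCR engine reports a confidence per character and that visually similar glyphs have already been grouped, so that one user correction can be propagated to every matching glyph in the corpus.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    shape: str         # hypothetical cluster label for visually similar glyph images
    guess: str         # character the OCR engine currently outputs
    confidence: float  # engine's confidence in the guess, from 0.0 to 1.0


def select_queries(glyphs, budget):
    """Uncertainty sampling: keep the least-confident glyph per shape, lowest first."""
    worst = {}
    for g in glyphs:
        if g.shape not in worst or g.confidence < worst[g.shape].confidence:
            worst[g.shape] = g
    return sorted(worst.values(), key=lambda g: g.confidence)[:budget]


def apply_corrections(corpus, corrections):
    """Propagate each user correction to every glyph that shares the same shape."""
    for g in corpus:
        if g.shape in corrections:
            g.guess = corrections[g.shape]
            g.confidence = 1.0


def active_correct(corpus, oracle, budget=2, rounds=3, threshold=0.9):
    """Repeatedly ask the user about uncertain glyphs and update the whole corpus."""
    for _ in range(rounds):
        uncertain = [g for g in corpus if g.confidence < threshold]
        if not uncertain:
            break
        queries = select_queries(uncertain, budget)
        corrections = {g.shape: oracle(g) for g in queries}
        apply_corrections(corpus, corrections)
    return "".join(g.guess for g in corpus)


# Toy data: the long s in "bleſſed" is misread as "f" with low confidence.
corpus = [
    Glyph("b", "b", 0.95), Glyph("l", "l", 0.97), Glyph("e", "e", 0.96),
    Glyph("long_s", "f", 0.55), Glyph("long_s", "f", 0.60),
    Glyph("e", "e", 0.96), Glyph("d", "d", 0.98),
]

questions_asked = []


def oracle(glyph):
    """Stand-in for the human user, who knows the long s should read as "s"."""
    questions_asked.append(glyph.shape)
    return {"long_s": "s"}[glyph.shape]


result = active_correct(corpus, oracle)
print(result)
```

In this toy run the user answers a single question about the long s, and both occurrences are corrected. The point of the design is that a small number of targeted corrections is reused across the corpus instead of each error being fixed by hand.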