Search Results

Grant number like: HAA-263837-19

Award Number: HAA-263837-19
Grant Program: Digital Humanities: Digital Humanities Advancement Grants
Award Recipient: Northeastern University
Project Title: Improving Optical Character Recognition and Tracking Reader Annotations in Printed Books by Collating and Transcribing Multiple Exemplars
Award Period: 1/1/2019 - 6/30/2021
Approved Award Total: $100,000.00
Project Director: David Smith
Institution: Northeastern University, Boston, MA 02115-5005, USA
Year Awarded: 2018
Primary Discipline: Computational Linguistics
Approved Outright: $100,000.00
Approved Matching: $0.00
Awarded Outright: $99,223.60

Further research on enhanced optical character recognition (OCR) techniques for historical printed books and on the automatic discovery of handwritten marginalia, drawing on the collections of the Internet Archive.

Most past digitization projects have focused on transcribing documents individually. With the availability of library-scale digital collections, we propose a Digital Humanities Advancement Grant (Level II) to develop computational image and language models that discover multiple copies and editions of similar texts and correct each text using these comparable witnesses. We present evidence that this collational transcription system can significantly improve optical character recognition (OCR) on historical books. We also propose to use these collated editions to discover annotated passages in large digitized book collections. This approach will therefore not only mitigate the errors that reader annotations introduce into the OCR process but also produce the first automatically generated database of handwritten annotations, Ichneumon. The methods and software developed by this project will benefit future research on automatic collation, book history, and historical reading practices.
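The idea of correcting one transcript against "comparable witnesses" can be illustrated with a small character-level sketch. This is not the project's system, which the abstract describes as using computational image and language models. The sketch assumes the witnesses have already been found and transcribed by OCR. It aligns each witness to the target transcript with the standard-library `difflib.SequenceMatcher` and then corrects the target by majority vote at each character position. The function names `witness_votes` and `collate` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from difflib import SequenceMatcher


def witness_votes(target: str, witness: str) -> list[str | None]:
    """Align `witness` to `target` and return, for each target character,
    the witness character aligned to it, "" if the witness has a gap there,
    or None if the alignment gives no clear one-to-one answer."""
    votes: list[str | None] = [None] * len(target)
    # autojunk=False keeps common characters such as spaces from being
    # treated as junk, which difflib otherwise does for long inputs.
    matcher = SequenceMatcher(None, target, witness, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            for k in range(i2 - i1):
                votes[i1 + k] = witness[j1 + k]
        elif tag == "delete":
            for i in range(i1, i2):
                votes[i] = ""  # the witness has no text here: a vote to delete
        # "insert" spans and unequal-length "replace" spans are abstentions.
    return votes


def collate(target: str, witnesses: list[str]) -> tuple[str, list[int]]:
    """Correct `target` by per-character majority vote over the witnesses.

    Returns the corrected text and the target positions that were overruled
    (candidate OCR errors or marks present only in this copy)."""
    all_votes = [witness_votes(target, w) for w in witnesses]
    corrected: list[str] = []
    overruled: list[int] = []
    for i, ch in enumerate(target):
        ballot = Counter({ch: 1})  # the target's own reading is one vote
        for votes in all_votes:
            if votes[i] is not None:
                ballot[votes[i]] += 1
        best, best_count = ballot.most_common(1)[0]
        # Overrule the target only when another reading strictly outvotes it.
        if best != ch and best_count > ballot[ch]:
            corrected.append(best)  # appending "" deletes the character
            overruled.append(i)
        else:
            corrected.append(ch)
    return "".join(corrected), overruled


text, flagged = collate("The qnick *brown fox",
                        ["The quick brown fox", "The quick brown fox"])
print(text)     # The quick brown fox
print(flagged)  # [5, 10]
```

Here two clean copies outvote one noisy copy. The overruled positions (a misread `u` and a stray `*` that might stand in for a reader's mark) are the kind of copy-specific divergence the abstract proposes to mine for annotations. The sketch cannot restore characters missing from the target, because it ignores `insert` spans. A practical system might also weight votes, for example by OCR confidence, rather than counting them equally.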