Developing automated text-image alignment to enhance access to heritage manuscript images
FAIN: PR-50178-13
Sanskrit Library (Providence, RI 02906-4629)
Peter M. Scharf (Project Director: May 2012 to October 2016)
Development of software to produce partial transcriptions of Sanskrit manuscripts for human validation. The project would also integrate the manuscripts into a digital library to extend the use of lexical resources and linguistic tools for full-text searching and analysis.
The proposed project aims to enhance access to primary cultural heritage materials of India by developing human-validated, automated text-image alignment techniques that provide access to digital images through related machine-readable texts, lexical resources, linguistic software, and a sophisticated search interface. Digital images of manuscripts written in Sanskrit, one of the world's richest culture-bearing languages, will be integrated into a digital library of Sanskrit. This integration will allow generalized information extraction and search techniques to reach enormous reservoirs of Sanskrit manuscripts. Integrating primary cultural materials with the Sanskrit Library will thus enable broad use of Indic collections in research and education, where Indic materials are grossly underrepresented. The results will be extensible to collections of Sanskrit manuscripts housed in American libraries and throughout the world, and to archives of scanned Sanskrit books.