Board of Trustees of the University of Illinois (Champaign, IL 61801-3620)
William Underwood (Project Director: May 2019 to October 2022)
PR-268817-20
Research and Development
Preservation and Access
Totals:
$73,122 (approved) $73,122 (awarded)
Grant period:
3/1/2020 – 5/31/2021
Broadening Access to Text Analysis by Describing Uncertainty
A Tier I project to study errors and paratextual noise in optically
transcribed digital library texts, and the consequences of these errors for
historical and humanistic conclusions about trends across time.
The noise associated with
digital transcription has become an important obstacle to humanistic research.
While the errors in digital texts are easily observed, the downstream effects
of error on scholarship are far from clear. Consequential problems for the
humanities often spring less from the average level of error in a collection
than from the uneven distribution of noise across different periods, genres,
and social strata. Uncertainty about this problem undermines confidence in
research and discourages some scholars from using digital libraries at all. To
address these problems, we will 1) create paired libraries of clean, manually
transcribed volumes and optically transcribed versions of the same volumes,
with or without paratext; 2) conduct parallel experiments in these corpora to
empirically measure the distortions affecting scholarship; and 3) construct a
map of error and share resources that help scholars estimate levels of
uncertainty in their work.
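As a rough illustration of the paired-corpus comparison described above, the sketch below computes a word-level error rate between a clean transcription and its optically transcribed counterpart, then averages it per group (for example, per period or genre) to expose uneven noise. This is not the project's method: the function names, the choice of word-level Levenshtein distance, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(clean, ocr):
    """Word edits needed to turn `ocr` into `clean`, per word of `clean`."""
    ref = clean.split()
    hyp = ocr.split()
    if not ref:
        raise ValueError("clean text must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def error_by_group(pairs):
    """Average word error rate per group.

    `pairs` is an iterable of (group, clean_text, ocr_text) tuples; the
    group label (a period, a genre) is a hypothetical field for this sketch.
    """
    rates = defaultdict(list)
    for group, clean, ocr in pairs:
        rates[group].append(word_error_rate(clean, ocr))
    return {group: sum(v) / len(v) for group, v in rates.items()}


# Toy data, invented for illustration only.
sample = [
    ("1800s", "the quick brown fox", "the qnick brown fox"),
    ("1900s", "the quick brown fox", "the quick brown fox"),
]
print(error_by_group(sample))
```

Comparing the per-group averages, rather than a single collection-wide figure, reflects the point made above that the distribution of noise matters more than its average level.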