Broadening Access to Text Analysis by Describing Uncertainty
FAIN: PR-268817-20
Board of Trustees of the University of Illinois (Champaign, IL 61801-3620)
William Ted Underwood (Project Director: May 2019 to October 2022)
A Tier I project to study errors and paratextual
noise in optically transcribed digital library texts, and the consequences of
these errors for historical and humanistic conclusions about trends across
time.
The noise associated with
digital transcription has become an important obstacle to humanistic research.
While the errors in digital texts are easily observed, the downstream effects
of error on scholarship are far from clear. Consequential problems for the
humanities often spring less from the average level of error in a collection
than from the uneven distribution of noise across different periods, genres,
and social strata. Uncertainty about this problem undermines confidence in
research and discourages some scholars from using digital libraries at all. To
address these problems, we will 1) create paired libraries of clean, manually
transcribed volumes and optically transcribed versions of the same volumes,
with or without paratext; 2) conduct parallel experiments in these corpora to
empirically measure the distortions affecting scholarship; and 3) construct a
map of error and share resources that help scholars estimate levels of
uncertainty in their work.
Associated Products
Code used in "Broadening Access To Text Analysis by Describing Uncertainty" (Database/Archive/Digital Edition)
Title: Code used in "Broadening Access To Text Analysis by Describing Uncertainty"
Author: Underwood, Ted
Author: Lundy, Morgan
Author: Shang, Wenyi
Abstract: This project maps the consequences of error and paratextual noise for real-world humanistic questions across different periods (1700 to the present) and analytical methods (including emerging neural approaches).
Year: 2021
Primary URL: https://github.com/tedunderwood/nehuncertainty
Primary URL Description: This is a GitHub repository. A DOI will be created within two weeks, when the project is in a final stable form.
Access Model: open access