Program

Preservation and Access: Research and Development

Period of Performance

3/1/2020 - 5/31/2021

Funding Totals

$73,122.00 (approved)
$73,122.00 (awarded)


Broadening Access to Text Analysis by Describing Uncertainty

FAIN: PR-268817-20

Board of Trustees of the University of Illinois (Champaign, IL 61801-3620)
William Underwood (Project Director: May 2019 to October 2022)

A Tier I project to study errors and paratextual noise in optically transcribed digital library texts, and the consequences of these errors for historical and humanistic conclusions about trends across time.

The noise associated with digital transcription has become an important obstacle to humanistic research. While the errors in digital texts are easily observed, the downstream effects of error on scholarship are far from clear. Consequential problems for the humanities often spring less from the average level of error in a collection than from the uneven distribution of noise across different periods, genres, and social strata. Uncertainty about this problem undermines confidence in research and discourages some scholars from using digital libraries at all. To address these problems, we will (1) create paired libraries of clean, manually transcribed volumes and optically transcribed versions of the same volumes, with or without paratext; (2) conduct parallel experiments in these corpora to empirically measure the distortions affecting scholarship; and (3) construct a map of error and share resources that help scholars estimate levels of uncertainty in their work.

Associated Products

Code used in "Broadening Access to Text Analysis by Describing Uncertainty" (Database/Archive/Digital Edition)
Title: Code used in "Broadening Access to Text Analysis by Describing Uncertainty"
Author: Underwood, Ted
Author: Lundy, Morgan
Author: Shang, Wenyi
Abstract: This project maps the consequences of error and paratextual noise for real-world humanistic questions across different periods (1700 to the present) and analytical methods (including emerging neural approaches).
Year: 2021
Primary URL: https://github.com/tedunderwood/nehuncertainty
Primary URL Description: This is a GitHub repository. A DOI will be created within two weeks, once the project reaches a final, stable form.
Access Model: open access