Program

Digital Humanities: Digital Humanities Advancement Grants

Period of Performance

9/1/2018 - 2/28/2021

Funding Totals

$111,597.00 (approved)
$111,597.00 (awarded)


Developing the Data Set of Nineteenth-Century Knowledge

FAIN: HAA-261228-18

Temple University (Philadelphia, PA 19122-6003)
Peter Logan (Project Director: January 2018 to May 2022)
Jane Greenberg (Co Project Director: June 2018 to May 2022)

Participating institutions:
Temple University (Philadelphia, PA) - Applicant/Recipient
Drexel University (Philadelphia, PA) - Participating Institution

A project to study the structure and transformation of nineteenth-century knowledge via computational analysis of several editions of the Encyclopedia Britannica from 1788 to 1911.

This project draws on historic editions of the Encyclopedia Britannica, a vital resource of knowledge, to build one of the most extensive, open, digital collections available today for studying the structure of nineteenth-century knowledge and its transformation. The most comprehensive extant representation of what constituted official knowledge at the time, these editions also demonstrate changes in the nature of knowledge in the English-speaking world. The project creates the first accurate textual data for this corpus and extends its usability by applying innovative methods to automatically generate metadata for each of the 100,000 entries. Each entry will be tagged with both current and historical subject categories. At the end of the grant period, all of the data will be made freely available, and a series of experiments will be conducted to assess the feasibility of tracking concept drift across time within the corpus.





Associated Products

Nineteenth-Century Knowledge Project (Conference Paper/Presentation)
Title: Nineteenth-Century Knowledge Project
Author: Peter M. Logan
Abstract: This talk outlines the progress and problems of a three-year-old project to build an extensive, open, digital collection for studying the structure of nineteenth-century knowledge, based on historic editions of the Encyclopedia Britannica. Today, we readily recognize a pervasive Eurocentrism in these entries, among other flaws. But at the time, the Britannica editions were the most authoritative comprehensive representation in the English-speaking world of knowledge as a whole. Knowledge has changed since that time, and it changed during the publication of this material, from 1790 to 1911. This data set documents those changes. The goal of this project is to identify patterns in the transformation of knowledge by mining the final data set. All of these works are available on the web, but their textual data is too inaccurate for valid text mining. This project thus creates the first accurate TEI edition of this valuable resource. The full corpus consists of 100,000 articles derived from 80,000 print pages. The TEI will be supplemented with metadata using Named Entity Recognition. The Metadata Research Center at Drexel University will further enrich the data by adding subject metadata from both current and historical vocabularies, using an automated recognition program they developed. When complete, all individual entries will be made freely available through the Oxford Text Archive. The data will also be uploaded in bulk form for other researchers to the CORE Repository of the Humanities Commons.
Date: 9/10/2018
Primary URL: https://tei2018.dhii.asia/AbstractsBook_TEI_0907.pdf
Conference Name: TEI2018 (Text Encoding Initiative Consortium)

Nineteenth-Century Knowledge Project (Web Resource)
Title: Nineteenth-Century Knowledge Project
Author: Peter Melville Logan
Abstract: Informational guide to the Nineteenth-Century Knowledge Project.
Year: 2019
Primary URL: https://tu-plogan.github.io/
Primary URL Description: Website

Knowledge Representation, History, and Historical Metadata (Conference Paper/Presentation)
Title: Knowledge Representation, History, and Historical Metadata
Author: Logan, Peter Melville
Author: Greenberg, Jane
Author: Grabus, Samantha
Abstract: While historical documents are common sources of digital humanities collections, many digital projects still do not use controlled terminologies to represent their content and aid search, discoverability, and use. There are often pragmatic reasons for this. The art of identifying appropriate descriptive terms is a valuable skill. Unfortunately, too few DH projects have access to information specialists who can index their documents for them. We are addressing these challenges with the Nineteenth-Century Knowledge Project, an ongoing initiative to create the first standards-compliant digital version of historical editions of the Encyclopedia Britannica. The sheer scale of the project precludes human indexing, because it would take an estimated six to eight years to read through all of the entries. Instead, we use an innovative method to add automatically generated content metadata using linked open terminologies and the HIVE approach. This method has allowed us to experiment on the optimal controlled vocabulary to use for indexing historical documents. Our presentation will focus on the results of this experiment.
Date: 7/11/2019
Primary URL: https://dev.clariah.nl/files/dh2019/boa/0889.html
Primary URL Description: Longer abstract in the Book of Abstracts produced for the conference.
Conference Name: Digital Humanities 2019, sponsored by the Alliance for Digital Humanities Organizations

Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries (Article)
Title: Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries
Author: Grabus, Sam
Author: Greenberg, Jane
Author: Logan, Peter
Author: Boone, Joan
Abstract: Representing aboutness is a challenge for humanities documents, given the linguistic indeterminacy of the text. The challenge is even greater when applying automatic indexing to historical documents for a multidisciplinary collection, such as encyclopedias. The research presented in this paper explores this challenge with an automatic indexing comparative study examining topic relevance. The setting is the NEH-funded 19th-Century Knowledge Project, where researchers in the Digital Scholarship Center, Temple University, and the Metadata Research Center, Drexel University, are investigating the best way to index entries across four historical editions of the Encyclopedia Britannica (3rd, 7th, 9th, and 11th editions). Individual encyclopedia entries were processed using the Helping Interdisciplinary Vocabulary Engineering (HIVE) system, a linked-data, automatic indexing terminology application that uses controlled vocabularies. Comparative topic relevance evaluation was performed for three separate keyword extraction algorithms: RAKE, Maui, and Kea++. Results show that RAKE performed the best, with an average of 67% precision for RAKE, and 28% precision for both Maui and Kea++. Additionally, the highest-ranked HIVE results with both RAKE and Kea++ demonstrated relevance across all sample entries, while Maui’s highest-ranked results returned zero relevant terms. This paper reports on background information, research objectives and methods, results, and future research prospects for further optimization of RAKE’s algorithm parameters to accommodate encyclopedia entries of different lengths, and evaluating the indexing impact of correcting the historical Long S.
Year: 2019
Primary URL: https://journals.lib.washington.edu/index.php/nasko/article/view/15635/13017
Primary URL Description: PDF file of the article, Sam Grabus, Jane Greenberg, Peter Logan, and Joan Boone. 2019. Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries. NASKO, Vol. 7. pp. 138-148.
Access Model: Open access
Format: Journal
Periodical Title: Proceedings from North American Symposium on Knowledge Organization
Publisher: NASKO

Indexing the Data Set of 19th-Century Knowledge (Conference Paper/Presentation)
Title: Indexing the Data Set of 19th-Century Knowledge
Author: Pascua, Sonia
Author: Grabus, Sam
Abstract: Presentation on automatically generating subject metadata for entries in the data set of nineteenth-century knowledge and showing the importance of using historically-specific controlled vocabularies.
Date: 01/24/2020
Conference Name: LEADS-4-NDP Forum at Drexel University

Knowledge Representation, History, and Historical Metadata (Conference Paper/Presentation)
Title: Knowledge Representation, History, and Historical Metadata
Author: Logan, Peter M.
Author: Greenberg, Jane
Abstract: Describes the conceptual difficulties in tracking knowledge over time, beginning with the selection of a data source, and concluding with the selection of an appropriate controlled vocabulary to index the data set.
Date: 10/11/2019
Conference Name: Digital Hermeneutics Conference, German Historical Institute

Transforming Nineteenth-Century Knowledge (Conference Paper/Presentation)
Title: Transforming Nineteenth-Century Knowledge
Author: Logan, Peter M.
Abstract: What are the theoretical difficulties involved in analyzing how knowledge changed during the nineteenth century? And how is the Nineteenth-Century Knowledge Project addressing them? This talk explores the problems in identifying a source text, processing the materials, and the built-in biases of different controlled vocabularies used to index the data set.
Date: 11/15/2020
Conference Name: Victorian Data, University of Virginia

Toward A Metadata Activity Matrix: Conceptualizing and Grounding the Research Life-cycle and Metadata Connections (Conference Paper/Presentation)
Title: Toward A Metadata Activity Matrix: Conceptualizing and Grounding the Research Life-cycle and Metadata Connections
Author: Greenberg, Jane
Author: Pascua, Sonia
Author: Li, Kai
Abstract: How metadata is involved in the data-driven scientific practice is an important means to evaluate its values, the goal underlying the concept of metadata capital. In this work-in-progress paper, we propose a research project aiming to examine how metadata activities are embedded in research and data activities, as represented in research and data lifecycle models. As a first step of this project, we identify research and data lifecycle models that best fit the scope of this project and offer some higher-level mapping among research activities, data processes, and metadata activities. This work offers a solid framework for the next step of this project to better understand the real-world values of metadata works and outputs.
Date: 09/23/2019
Primary URL: https://www.dublincore.org/conferences/2019/abstracts/#13
Primary URL Description: abstract
Conference Name: International Conference on Dublin Core and Metadata Applications, in Seoul, South Korea

Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results (Article)
Title: Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results
Author: Grabus, Sam
Abstract: This research compares automatic subject metadata generation when the pre-1800s Long-S character is corrected to a standard "s". The test environment includes entries from the third edition of the Encyclopedia Britannica, and the HIVE automatic subject indexing tool. A comparative study of metadata generated before and after correction of the Long-S demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the Long-S is not corrected. Results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. A relationship is also demonstrated between shorter entries and an increase in omitted terms when the Long-S is not corrected.
Year: 2020
Primary URL: https://ejournals.bc.edu/index.php/ital/article/view/12235/10093
Primary URL Description: Downloadable PDF on the journal's website
Access Model: Open access
Format: Journal
Periodical Title: Information Technology and Libraries (ITAL)
Publisher: American Library Association

1910 Library of Congress Subject Headings (Database/Archive/Digital Edition)
Title: 1910 Library of Congress Subject Headings
Author: Jane Greenberg
Author: Joan Boone
Author: Sam Grabus
Author: Peter M Logan
Abstract: A SKOS-enabled database version of the first edition of the Library of Congress Subject Headings (1910-1914)
Year: 2020
Primary URL: https://hive2.cci.drexel.edu/
Primary URL Description: HIVE Vocabulary Server
Access Model: Open access

Ephraim Chambers Cyclopaedia (Database/Archive/Digital Edition)
Title: Ephraim Chambers Cyclopaedia
Author: Jane Greenberg
Author: Joan Boone
Author: Sam Grabus
Author: Peter M Logan
Abstract: A SKOS-enabled database of subject terms used in Ephraim Chambers's Cyclopaedia (1728).
Year: 2020
Primary URL: https://hive2.cci.drexel.edu/
Primary URL Description: HIVE vocabulary server at Drexel University
Access Model: Open access