Computational tools for diachronic and cross-cultural study of literature: multilingual stylometry and phylogenetic profiling
FAIN: HAA-271822-20
University of Texas at Austin (Austin, TX 78712-0100)
Pramit Chaudhuri (Project Director: January 2020 to present)
Joseph Dexter (Co Project Director: July 2020 to present)
Extension of a textual analysis toolkit for stylistic and authorship studies, originally developed for Latin and ancient Greek, to support Old English and Bengali texts.
This project, for which we are seeking a Level III Digital Advancement Grant, will expand a suite of tools with which traditionally trained humanists can analyze literary texts quantitatively. The tools are designed with an important class of literary problems in mind, exemplified by the identification of stylistic effects and the individuation of works within generic traditions. We tackle these problems using two complementary approaches: stylometry augmented by machine learning, and phylogenetic profiling. We will build on our previous research in literary stylistics to create a user-friendly multilingual stylometry toolkit, and we will enhance our existing methods for evolutionary analysis of literature, including by automating key steps. The tools will be tested on a set of problems at the intersection of literary criticism and big data across multiple languages, including Latin, ancient Greek, Old English, and Bengali.
Associated Products
Profiling of Intertextuality in Latin Literature Using Word Embeddings (Article)
Title: Profiling of Intertextuality in Latin Literature Using Word Embeddings
Author: Patrick Burns
Author: James Brofos
Author: Kyle Li
Author: Pramit Chaudhuri
Author: Joseph Dexter
Abstract: Identifying intertextual relationships between authors is of central importance to the study of literature. We report an empirical analysis of intertextuality in classical Latin literature using word embedding models. To enable quantitative evaluation of intertextual search methods, we curate a new dataset of 945 known parallels drawn from traditional scholarship on Latin epic poetry. We train an optimized word2vec model on a large corpus of lemmatized Latin, which achieves state-of-the-art performance for synonym detection and outperforms a widely used lexical method for intertextual search. We then demonstrate that training embeddings on very small corpora can capture salient aspects of literary style and apply this approach to replicate a previous intertextual study of the Roman historian Livy, which relied on hand-crafted stylometric features. Our results advance the development of core computational resources for a major premodern language and highlight a productive avenue for cross-disciplinary collaboration between the study of literature and NLP.
Year: 2021
Primary URL: https://aclanthology.org/2021.naacl-main.389/
Access Model: Open Access
Format: Journal
Periodical Title: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Publisher: Association for Computational Linguistics
Semantic Intertextual Search with Latin Word-Embedding Models (Public Lecture or Presentation)
Title: Semantic Intertextual Search with Latin Word-Embedding Models
Abstract: This paper describes optimization of a computational method for representing semantic information in Latin texts and application of the method to identifying intertextual relationships of literary significance. The distributional hypothesis in linguistics holds that the meaning of a word can be inferred from the contexts in which it is used (Firth); the development of effective methods for computing distributional representations known as word embeddings has revolutionized natural language processing research over the past decade (Mikolov et al., Devlin et al.). We optimize a word embedding model for Latin and use that model to improve existing methods for intertextual search through incorporation of semantic matching...
Author: Joseph Dexter
Author: Pramit Chaudhuri
Date: 01/10/2021
Location: 152nd Annual Meeting of the Society for Classical Studies
Primary URL: https://classicalstudies.org/annual-meeting/152/abstract/semantic-intertextual-search-latin-word-embedding-models
Senecan Trimeter and Humanist Tragedy (Article)
Title: Senecan Trimeter and Humanist Tragedy
Author: A. Fedchin
Author: P. Burns
Author: P. Chaudhuri
Author: J. Dexter
Abstract: The lack of extant contemporary comparanda obscures the workings of iambic trimeter in Senecan tragedy. This article offers a quantitative analysis of the reception of Senecan trimeter in four early works of Italian Humanist Tragedy, which illuminates the creative possibilities afforded by the basic structure of the meter and identifies specific features important to questions of style and semantics. Our analysis demonstrates, among other things, that both Seneca and the Humanist tragedians use clusters of resolution in conjunction with antilabe as a literary device to convey high emotion.
Year: 2022
Format: Journal
Periodical Title: American Journal of Philology