Program

Digital Humanities: Digital Humanities Start-Up Grants

Period of Performance

4/1/2012 - 2/28/2014

Funding Totals

$49,835.00 (approved)
$49,835.00 (awarded)


Tesserae: A Search Engine for Allusion

FAIN: HD-51570-12

SUNY Research Foundation, Buffalo State College (Buffalo, NY 14222-1004)
Neil Coffee (Project Director: October 2011 to June 2014)

The early-stage development of a computational tool to detect and analyze literary allusions, with an initial focus on Latin and ancient Greek.

The Tesserae Project is an interdisciplinary research effort employing computational methods to detect and analyze literary allusion (a form of text reuse), currently focusing on Latin and ancient Greek. The Project seeks funding to create a fully functional, publicly available tool to detect similar phrases in two texts at rates that approach those of literary commentators. To this end, funding will support adding sensitivity to word meaning, phrase context, and sound similarity. Detection rate improvements will be measured against a set of 3,000 parallel phrases previously graded for literary significance. A revised website will inform researchers of research results and new functions of the tool. The project team will give presentations and produce publications explaining the function, results, and theoretical consequences of the fully operational tool. This work is preliminary to an out-year Implementation Phase that will see the addition of English, French, Italian, and Spanish.

Associated Products

Intertextuality in the Digital Age (Article)
Title: Intertextuality in the Digital Age
Author: Coffee, Neil
Author: Koenig, J.-P.
Author: Poornima, Shakthi
Author: Forstall, Christopher
Author: Ossewaarde, Roelant
Author: Jacobson, Sarah
Abstract: This paper describes a new digital approach to intertextual study involving the creation of a free online tool for the automatic detection of parallel phrases. A test comparison of Vergil’s Aeneid and Lucan’s Civil War shows that the tool can identify a substantial number of meaningful intertexts, both previously recorded and unrecorded. Analysis of these results demonstrates how automatic detection can provide more comprehensive and accessible perspectives on intertextuality as an aggregate phenomenon. Identification of the language features necessary to detect intertexts also provides a path toward improved automatic detection and more precise definitions of intertextuality.
Year: 2012
Access Model: subscription only
Format: Journal
Publisher: Transactions of the American Philological Association

The Tesserae Project: Intertextual Analysis of Latin Poetry (Article)
Title: The Tesserae Project: Intertextual Analysis of Latin Poetry
Author: Coffee, Neil
Author: Koenig, J.-P.
Author: Poornima, Shakthi
Author: Forstall, Christopher W.
Author: Ossewaarde, Roelant
Author: Jacobson, Sarah L.
Abstract: Tesserae is a web-based tool for automatically detecting allusions in Latin poetry. Although still in the start-up phase, it already is capable of identifying significant numbers of known allusions, as well as similar numbers of allusions previously unnoticed by scholars. In this article, we use the tool to examine allusions to Vergil’s Aeneid in the first book of Lucan’s Civil War. Approximately 3,000 linguistic parallels returned by the program were compared with a list of known allusions drawn from commentaries. Each was examined individually and graded for its literary significance, in order to benchmark the program’s performance. All allusions from the program and commentaries were then pooled in order to examine broad patterns in Lucan’s allusive techniques which were largely unapproachable without digital methods. Although Lucan draws relatively constantly from Vergil’s generic language in order to maintain the epic idiom, this baseline is punctuated by clusters of pointed allusions, in which Lucan frequently subverts Vergil’s original meaning. These clusters not only attend the most significant characters and events but also play a role in structuring scene transitions. Work is under way to incorporate the ability to match on word meaning, phrase context, as well as metrical and phonological features into future versions of the program.
Year: 2012
Primary URL: http://llc.oxfordjournals.org/content/early/2012/07/20/llc.fqs033.abstract http://llc.oxfordjournals.org/content/28/2/221
Access Model: subscription only
Format: Journal
Publisher: Literary and Linguistic Computing

Modeling the Scholars: Detecting Intertextuality through Enhanced Word-Level N-Gram Matching (Article)
Title: Modeling the Scholars: Detecting Intertextuality through Enhanced Word-Level N-Gram Matching
Author: Forstall, Christopher W.
Author: Coffee, Neil
Author: Buck, Thomas
Author: Roache, Katherine
Author: Jacobson, Sarah
Abstract: The study of intertextuality, or how authors make artistic use of other texts in their works, has a long tradition, and has in recent years benefited from a variety of applications of digital methods. This article describes an approach for detecting the sorts of intertexts that literary scholars have found most meaningful, as embodied in the free Tesserae website http://tesserae.caset.buffalo.edu/. Tests of Tesserae Versions 1 and 2 showed that word-level n-gram matching could recall a majority of parallels identified by scholarly commentators in a benchmark set. But these versions lacked precision, so that the meaningful parallels could be found only among long lists of those that were not meaningful. The Version 3 search described here adds a second stage scoring system that sorts the found parallels by a formula accounting for word frequency and phrase density. Testing against a benchmark set of intertexts in Latin epic poetry shows that the scoring system overall succeeds in ranking parallels of greater significance more highly, allowing site users to find meaningful parallels more quickly. Users can also choose to adjust both recall and precision by focusing only on results above given score levels. As a theoretical matter, these tests establish that lemma identity, word frequency, and phrase density are important constituents of what make a phrase parallel a meaningful intertext.
Year: 2014
Access Model: subscription
Format: Journal
Periodical Title: Literary and Linguistic Computing
Publisher: Literary and Linguistic Computing
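
Illustrative note: the two-stage search summarized in the abstract above (lemma matching, then scoring by word frequency and phrase density) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the Tesserae implementation: the function names are hypothetical, the input is assumed to be already split into phrases and lemmatized, and the scoring formula (log of summed inverse lemma frequencies divided by the combined span of the matched words) is one plausible reading of "word frequency and phrase density" rather than a formula given in this record.

```python
import math
from collections import Counter
from itertools import product


def frequencies(phrases):
    """Relative frequency of each lemma within one text (a list of phrases)."""
    counts = Counter(lemma for phrase in phrases for lemma in phrase)
    total = sum(counts.values())
    return {lemma: n / total for lemma, n in counts.items()}


def span(phrase, shared):
    """Inclusive word distance between the first and last shared lemma."""
    positions = [i for i, lemma in enumerate(phrase) if lemma in shared]
    return positions[-1] - positions[0] + 1


def find_parallels(source, target, min_shared=2):
    """Stage 1: keep phrase pairs sharing at least min_shared distinct lemmas.
    Stage 2: score each pair (assumed formula) and sort, highest score first."""
    freq_s, freq_t = frequencies(source), frequencies(target)
    results = []
    for (i, s), (j, t) in product(enumerate(source), enumerate(target)):
        shared = set(s) & set(t)
        if len(shared) < min_shared:
            continue
        # Rarer shared lemmas contribute more to the score.
        rarity = sum(1 / freq_s[l] + 1 / freq_t[l] for l in shared)
        # Tightly packed matches (small spans) score higher than loose ones.
        distance = span(s, shared) + span(t, shared)
        results.append((math.log(rarity / distance), i, j, sorted(shared)))
    return sorted(results, reverse=True)
```

Each result is a tuple of (score, source phrase index, target phrase index, shared lemmas). Filtering the returned list by a minimum score corresponds to the trade-off between recall and precision that the abstract mentions.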