Large-Scale Learning and the Automatic Analysis of Historical Texts
FAIN: HH-50001-09
Tufts University (Somerville, MA 02144-2401)
Gregory R. Crane (Project Director: July 2008 to September 2011)
Peter Losin (Co Project Director: November 2020 to November 2020)
Consultation with staff from the National Energy Research Scientific Computing Center to investigate the development of dynamic lexica for Latin and ancient Greek.
The Perseus Project recently received funding from the National Endowment for the Humanities to investigate the automatic construction of "dynamic lexica" for historical languages (specifically Latin and Greek) as the output of automatic processes based on both supervised and unsupervised learning techniques. We are seeking NEH/NERSC supercomputing support and training for two reasons: (1) to significantly reduce the training time for two automatic processes already under development (automatic parsing and parallel text alignment), so that we can be more agile in future development and optimization; and (2) to begin experimenting with approaches that are not available to us without such resources (such as a hybrid approach to word sense disambiguation involving labeled sense induction and clustering). In this way we hope not only to improve upon our existing methods but also to investigate the possibility of innovative new work.
Associated Products
The Dynamic Lexicon (Database/Archive/Digital Edition)
Title: The Dynamic Lexicon
Author: David Bamman
Abstract: The published form of the Dynamic Lexicon includes automatically generated lexical entries along with the underlying intermediate analysis used to generate them (including word-level alignments between source texts and their translations, and automatic morphological tagging and syntactic analysis for the Greek and Latin originals).
Year: 2011
Primary URL: http://nlp.perseus.tufts.edu/lexicon/
Primary URL Description: A link to a description of the lexicon and downloads of the data
Access Model: All data is licensed under a Creative Commons Attribution-ShareAlike license.