Program

Preservation and Access: Advancing Knowledge: The IMLS/NEH Digital Partnership

Period of Performance

10/1/2007 - 8/31/2011

Funding Totals

$349,939.00 (approved)
$349,939.00 (awarded)


Scalable Named Entity Identification in Classical Studies

FAIN: PK-50022-07

Tufts University (Somerville, MA 02144-2401)
Gregory R. Crane (Project Director: March 2007 to August 2012)

Construction of a testbed of scholarly and cultural documents on the ancient world and the development of digital, open-source tools to enable researchers and librarians to utilize contextual materials available in text-based collections.

The Perseus Project and the Collections and Archives of Tufts University propose to develop infrastructure for finding references to particular people and places from classical antiquity, in several ancient and modern languages, in primary and secondary source collections. We will offer and publish open-source, standalone services and Fedora repository disseminators for searching, browsing, and visualizing entities within the Tufts Digital Library. Under a Creative Commons license, we will publish knowledge sources such as: linguistic data to identify forms of the 60,000 most common classical proper names in seven languages; a knowledge base of the 30,000 people and places most prominent in texts; indices associating c. 200,000 passages with particular entities, and an association network of 500,000 tagged names for named entity identification systems; and an automatically generated index of classical people and places identified in a 1 billion-word testbed of both scholarly and general cultural documents.





Associated Products

Rethinking Critical Editions Of Fragmentary Texts By Ontologies (Conference Paper/Presentation)
Title: Rethinking Critical Editions Of Fragmentary Texts By Ontologies
Author: Matteo Romanello
Author: Monica Berti
Author: Federico Boschetti
Author: Alison Babeu
Author: Gregory Crane
Abstract: This paper discusses the main issues encountered in the design of a domain ontology to represent ancient literary texts that survive only in fragments, i.e. through quotations embedded in other texts. The design approach presented in the paper combines a knowledge domain analysis conducted through semantic spaces with the integration of well-established ontologies and the application of ontology design patterns. After briefly describing the specific meaning of “fragment” in a literary context, the paper gives insights into the main conceptual issues of the ontology design process. Lastly, it outlines the overall architecture of protocols, services, and data repositories required to implement a digital edition of fragments based on the proposed ontology.
Date: 07/01/2009
Primary URL: http://elpub.scix.net/data/works/att/158_elpub2009.content.pdf
Primary URL Description: Link to the full text of the published paper.
Secondary URL: http://hdl.handle.net/10427/70403
Secondary URL Description: Link to a deposited version of this paper in the Tufts Digital Library.
Conference Name: ELPUB 2009: 13th International Conference on Electronic Publishing: Rethinking Electronic Publishing: Innovation in Communication Paradigms and Technologies

When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices (Conference Paper/Presentation)
Title: When Printed Hypertexts Go Digital: Information Extraction from the Parsing of Indices
Author: Matteo Romanello
Author: Monica Berti
Author: Alison Babeu
Author: Gregory Crane
Abstract: Modern critical editions of ancient works generally include manually created indices of other sources quoted in the text. Since indices can be considered a form of domain-specific language, the paper presents a parsing-based approach to the problem of extracting information from them to support the creation of a collection of fragmentary texts. The paper first considers the characteristics and structure of quotation indices and their importance when dealing with fragmentary texts. It then presents the results of applying a fuzzy parser to the OCR transcription of an index of quotations to extract information from potentially noisy input.
Date: 07/01/2009
Primary URL: http://dx.doi.org/10.1145/1557914.1557987
Primary URL Description: A link to the final published version of this paper in the ACM Digital Library.
Secondary URL: http://hdl.handle.net/10427/70404
Secondary URL Description: A link to an archived version of this paper in the Tufts Digital Library.
Conference Name: HT '09 Proceedings of the 20th ACM conference on Hypertext and hypermedia