Program

Preservation and Access: Humanities Collections and Reference Resources

Period of Performance

5/1/2014 - 8/31/2016

Funding Totals

$40,000.00 (approved)
$39,985.59 (awarded)


Coptic SCRIPTORIUM: Digitizing a Corpus for Interdisciplinary Research in Ancient Egyptian

FAIN: PW-51672-14

University of the Pacific (Stockton, CA 95211-0110)
Caroline T. Schroeder (Project Director: July 2013 to December 2016)
Amir Zeldes (Co Project Director: June 2015 to December 2016)

Planning for the creation of a digitized corpus of Coptic texts of importance to scholarship in biblical studies, early Christian history, and linguistics. The project would develop a pilot text corpus and establish technical standards to ensure interoperability of the corpus with other digital projects on the ancient world.

Coptic, having evolved from the language of the hieroglyphs of the pharaonic era, represents the last phase of the Egyptian language and is pivotal for a wide range of disciplines, such as linguistics, biblical studies, the history of Christianity, Egyptology, and ancient history. The Coptic language has proven essential for the decipherment and continued study of Ancient Egyptian and is of major interest for Afro-Asiatic linguistics and Coptic linguistics in its own right. Coptic manuscripts are sources for biblical and extra-biblical texts and document ancient and Christian history. Coptic SCRIPTORIUM will advance knowledge in these fields by increasing access to now largely inaccessible texts of historical, religious, and linguistic significance. The project designs digital tools and methodologies and applies them to literary texts, creating a rich open-access corpus.





Associated Products

Raiders of the Lost Corpus (Article)
Title: Raiders of the Lost Corpus
Author: Caroline T. Schroeder
Author: Amir Zeldes
Abstract: Coptic represents the last phase of the Egyptian language and is pivotal for a wide range of disciplines, such as linguistics, biblical studies, the history of Christianity, Egyptology, and ancient history. It was also essential for "cracking the code" of the Egyptian hieroglyphs. Although digital humanities has been hailed as distinctly interdisciplinary, enabling new forms of knowledge by combining multiple forms of disciplinary investigation, technical obstacles exist for creating a resource useful to both linguists and historians, for example. The nature of the language (outside of the Indo-European family) also requires its own approach. This paper will present some of the challenges -- both digital and material -- in creating an online, open source platform with a database and tools for digital research in Coptic. It will also propose standards and methodologies to move forward through those challenges. This paper should be of interest not only to scholars in Coptic but also others working on what are traditionally considered more "marginal" language groups in the pre-modern world, and researchers working with corpora that have been removed from their original ancient or medieval repositories and fragmented or dispersed.
Year: 2016
Primary URL: http://digitalhumanities.org/dhq/vol/10/2/000247/000247.html
Access Model: Open Access
Format: Journal
Periodical Title: Digital Humanities Quarterly

Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities (Article)
Title: Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities
Author: Amir Zeldes
Author: Caroline T. Schroeder
Abstract: This article motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evaluate our tag set in an inter-annotator agreement experiment and examine some of the difficulties in tagging Coptic data. Using an existing digital lexicon and a small training corpus taken from several genres of literary Sahidic Coptic in the first half of the first millennium, we evaluate the performance of a stochastic tagger applying a fine-grained and coarse-grained set of tags within and outside the domain of literary texts. Our results show that a relatively high accuracy of 94–95% correct automatic tag assignment can be reached for literary texts, with substantially worse performance on documentary papyrus data. We also present some preliminary applications of natural language processing to the study of genre, style, and authorship attribution in Coptic and discuss future directions in applying computational linguistics methods to the analysis of Coptic texts.
Year: 2016
Primary URL: http://dsh.oxfordjournals.org/content/30/suppl_1/i164
Format: Journal
Periodical Title: Digital Scholarship in the Humanities
Publisher: Oxford University Press Journals

Web service for reading, citing, accessing Coptic SCRIPTORIUM text corpora (Web Resource)
Title: Web service for reading, citing, accessing Coptic SCRIPTORIUM text corpora
Author: Coptic SCRIPTORIUM
Abstract: Coptic SCRIPTORIUM provides Coptic texts for reading, analysis, and complex searches. The texts are citable and accessible through stable URNs, such as urn:cts:copticLit:shenoute.fox for Shenoute's work Not Because a Fox Barks. This application will provide the most recent version of our documents in the formats currently available for each text. If you know the URN for the material you seek, please enter it in the box. You may also find documents by using the filters on the menu. If you wish to read Coptic texts, you can view individual documents online (in HTML) in various visualizations. We also provide links to our corpora in our search tool ANNIS, as well as data files in TEI XML, PAULA XML, and ANNIS formats. This web application provides the most recent version of the data. Previous versions are available on GitHub.
Year: 2015
Primary URL: http://data.copticscriptorium.org
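
As a rough illustration of how a citation URN like the one quoted in the abstract above breaks down, the following Python sketch splits a CTS URN into its components. It assumes the general CTS URN layout (urn:cts:namespace:textgroup.work, optionally followed by a version and a passage reference). The names CtsUrn and parse_cts_urn are hypothetical and are not part of any Coptic SCRIPTORIUM tool.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    """Components of a CTS URN (hypothetical helper type for illustration)."""
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, textgroup, work, version, and passage.

    Simplification (assumption): anything after the work identifier in the
    dotted work component is kept together as the "version" string.
    """
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")

    namespace, work_component = parts[2], parts[3]
    # The optional fifth field is the passage reference.
    passage = parts[4] if len(parts) == 5 and parts[4] else None

    levels = work_component.split(".")
    if not namespace or any(not level for level in levels):
        raise ValueError(f"empty component in CTS URN: {urn!r}")

    textgroup = levels[0]
    work = levels[1] if len(levels) > 1 else None
    version = ".".join(levels[2:]) if len(levels) > 2 else None
    return CtsUrn(namespace, textgroup, work, version, passage)


if __name__ == "__main__":
    # The example URN quoted in the abstract above.
    print(parse_cts_urn("urn:cts:copticLit:shenoute.fox"))
```

For the URN quoted in the abstract, this yields the namespace copticLit, the text group shenoute, and the work fox, with no version or passage component.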

Coptic SCRIPTORIUM instance of ANNIS database (Database/Archive/Digital Edition)
Title: Coptic SCRIPTORIUM instance of ANNIS database
Author: Coptic SCRIPTORIUM
Abstract: Coptic SCRIPTORIUM text corpora published in the ANNIS search and visualization tool, which supports both simple and complex queries of the corpora. This database is regularly updated.
Year: 2014
Primary URL: https://corpling.uis.georgetown.edu/annis/scriptorium
Access Model: Open Access

Duplicitous Diabolos: Parallel witness encoding in quantitative studies of Coptic manuscripts (Conference Paper/Presentation)
Title: Duplicitous Diabolos: Parallel witness encoding in quantitative studies of Coptic manuscripts
Author: Amir Zeldes
Abstract: This paper briefly discusses markup, metadata and evaluation issues that arise when projects do not include a critical edition adjudicating different variants, but instead incorporate multiple, full diplomatic transcriptions. When used naively, such corpora will cause duplicate results that are hard to discern in quantitative studies, and in cases of incomplete, inexact or fragmentary parallel witnesses, substantially complicate the decision about what users actually want to have. Using a case study on Coptic manuscripts, the paper suggests that as a provisional strategy, documents should be partitioned as finely as necessary such that each section's parallel witness status is encoded, and that for each parallel set, it can be useful to define a redundancy metadatum which identifies the 'best' candidate for quantitative study among the available choices.
Date: 8/10/2015
Primary URL: http://www.balisage.net/Proceedings/vol16/print/Zeldes01/BalisageVol16-Zeldes01.html
Conference Name: Symposium on Cultural Heritage Markup. Balisage Series on Markup Technologies

Shenoute in Code: Digitizing Coptic Cultural Heritage for Collaborative Online Research and Study (Article)
Title: Shenoute in Code: Digitizing Coptic Cultural Heritage for Collaborative Online Research and Study
Author: Caroline T. Schroeder
Abstract: The preservation of the cultural heritage of the Coptic Orthodox Church is, to be completely blunt and honest, fragile. And this is due not only to the political instability in Egypt over the past few years, but to a very long and complicated history involving everything from linguistic drift and religious assimilation to colonialism and monetary greed. Increasingly, cultural heritage groups have been examining how digitization projects can contribute to cultural repatriation efforts. Collectively and collaboratively, the Coptic Studies scholarly community and Coptic Christians must engage the question of how and whether the digitization of cultural heritage material can address both contemporary and historical issues concerning Coptic Christian access to their own heritage. This essay is divided into three parts. First, a presentation of my own Digital Humanities project, Coptic SCRIPTORIUM, provides a case study for the ways digitization can increase access to Coptic language and literature. Second, I explore some of the tensions between the digital and the material, the tensions involved in producing a digital collection based on material archives created in the context of the colonial acquisition of Egyptian antiquities and their subsequent dispersal outside of Egypt. Finally, the third section poses questions and issues for further consideration, drawing on insights from work on indigenous North American cultural heritage collections.
Year: 2015
Primary URL: https://www.academia.edu/27963729/Shenoute_in_Code_Digitizing_Coptic_Cultural_Heritage_for_Collaborative_Online_Research_and_Study
Access Model: Subscription, available Open Access on author's site
Format: Journal
Periodical Title: Coptica

Digital Coptic 2 Symposium and Workshop (Conference/Institute/Seminar)
Title: Digital Coptic 2 Symposium and Workshop
Abstract: Coptic SCRIPTORIUM hosted a second workshop and symposium on Digital Humanities and Coptic Studies. It took place March 12-13, 2015 at Georgetown University in Washington, DC. This event followed on the workshop in May 2013 at Humboldt University, Berlin. There were no registration fees to attend. Day 1 was a public symposium on Digital Humanities and Coptic Studies. Day 2 was a workshop on Coptic SCRIPTORIUM.
Date Range: March 2015
Location: Georgetown University
Primary URL: http://copticscriptorium.org/workshop2015/index.html