Coptic SCRIPTORIUM: A Corpus, Tools, and Methods for Corpus Linguistics and Computational Historical Research in Ancient Egypt
FAIN: HD-51907-14
University of the Pacific (Stockton, CA 95211-0110)
Caroline T. Schroeder (Project Director: September 2013 to May 2017)
Amir Zeldes (Co Project Director: June 2015 to May 2017)
The development of a user interface and language analysis tools to facilitate interdisciplinary, collaborative research and annotation of digitized Coptic texts.
Coptic, having evolved from the language of the hieroglyphs of the pharaonic era, represents the last phase of the Egyptian language and is pivotal for a wide range of disciplines, such as linguistics, biblical studies, the history of Christianity, Egyptology, and ancient history. Coptic SCRIPTORIUM provides the first open-source technologies for computational and digital research across the disciplines as applied to Egyptian texts. The project is developing a digitized corpus of Coptic texts available in multiple formats and visualizations (including TEI XML), tools to analyze and process the language (e.g., the first Coptic part-of-speech tagger), a database with search and visualization capabilities, and a collaborative platform for scholars to contribute texts and annotations and to conduct research. The technologies and corpus will function as a collaborative environment for digital research by any scholars working in Coptic.
Associated Products
Raiders of the Lost Corpus (Article)
Title: Raiders of the Lost Corpus
Author: Caroline T. Schroeder
Author: Amir Zeldes
Abstract: Coptic represents the last phase of the Egyptian language and is pivotal for a wide range of disciplines, such as linguistics, biblical studies, the history of Christianity, Egyptology, and ancient history. It was also essential for "cracking the code" of the Egyptian hieroglyphs. Although digital humanities has been hailed as distinctly interdisciplinary, enabling new forms of knowledge by combining multiple forms of disciplinary investigation, technical obstacles exist for creating a resource useful to both linguists and historians, for example. The nature of the language (outside of the Indo-European family) also requires its own approach. This paper will present some of the challenges -- both digital and material -- in creating an online, open source platform with a database and tools for digital research in Coptic. It will also propose standards and methodologies to move forward through those challenges. This paper should be of interest not only to scholars in Coptic but also others working on what are traditionally considered more "marginal" language groups in the pre-modern world, and researchers working with corpora that have been removed from their original ancient or medieval repositories and fragmented or dispersed.
Year: 2016
Primary URL:
http://digitalhumanities.org/dhq/vol/10/2/000247/000247.html
Access Model: Open Access
Format: Journal
Periodical Title: Digital Humanities Quarterly
Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities (Article)
Title: Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities
Author: Amir Zeldes
Author: Caroline T. Schroeder
Abstract: This article motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evaluate our tag set in an inter-annotator agreement experiment and examine some of the difficulties in tagging Coptic data. Using an existing digital lexicon and a small training corpus taken from several genres of literary Sahidic Coptic in the first half of the first millennium, we evaluate the performance of a stochastic tagger applying a fine-grained and coarse-grained set of tags within and outside the domain of literary texts. Our results show that a relatively high accuracy of 94–95% correct automatic tag assignment can be reached for literary texts, with substantially worse performance on documentary papyrus data. We also present some preliminary applications of natural language processing to the study of genre, style, and authorship attribution in Coptic and discuss future directions in applying computational linguistics methods to the analysis of Coptic texts.
Year: 2016
Primary URL:
http://dsh.oxfordjournals.org/content/30/suppl_1/i164
Access Model: Open Access
Format: Journal
Periodical Title: Digital Scholarship in the Humanities
Publisher: Oxford University Press Journals
Web service for reading, citing, accessing Coptic SCRIPTORIUM text corpora (Web Resources) (Web Resource)
Title: Web service for reading, citing, accessing Coptic SCRIPTORIUM text corpora (Web Resources)
Author: Coptic SCRIPTORIUM
Abstract: Coptic SCRIPTORIUM provides Coptic texts for reading, analysis, and complex searches. The texts are citable and accessible through stable URNs, such as urn:cts:copticLit:shenoute.fox for Shenoute's work Not Because a Fox Barks. This application will provide the most recent version of our documents in the formats currently available for each text. If you know the URN for the material you seek, please enter it in the box. You may also find documents by using the filters on the menu.
If you wish to read Coptic texts, you can view individual documents online (in HTML) in various visualizations. We also provide links to our corpora in our search tool ANNIS, as well as data files in TEI XML, PAULA XML, and ANNIS formats. This web application provides the most recent version of the data. Previous versions are available on GitHub.
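As an illustration of how a stable identifier such as urn:cts:copticLit:shenoute.fox can be handled in software, the following is a minimal sketch, assuming the general CTS URN layout of colon-separated fields (urn, cts, namespace, work component, optional passage) with a dot-separated work hierarchy. The names CtsUrn and parse_cts_urn are hypothetical and are not part of any Coptic SCRIPTORIUM tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CtsUrn:
    """Hypothetical container for the parts of a CTS URN."""
    namespace: str                 # e.g. "copticLit"
    work_levels: Tuple[str, ...]   # dot-separated work hierarchy, e.g. ("shenoute", "fox")
    passage: Optional[str] = None  # optional passage reference after a fifth colon


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, work hierarchy, and optional passage."""
    parts = urn.strip().split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    work_levels = tuple(work_component.split("."))
    if not namespace or not all(work_levels):
        raise ValueError(f"empty namespace or work component in: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, work_levels, passage)


urn = parse_cts_urn("urn:cts:copticLit:shenoute.fox")
print(urn.namespace)    # copticLit
print(urn.work_levels)  # ('shenoute', 'fox')
```

Keeping the work hierarchy as a tuple rather than fixed fields lets the same sketch accept URNs that add further dot-separated levels after the work.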
Year: 2015
Primary URL:
http://data.copticscriptorium.org
Coptic SCRIPTORIUM instance of ANNIS database (Database/Archive/Digital Edition)
Title: Coptic SCRIPTORIUM instance of ANNIS database
Author: Coptic SCRIPTORIUM
Abstract: Coptic SCRIPTORIUM text corpora published in the ANNIS search and visualization tool, which supports both simple and complex queries over the corpora. This database is regularly updated.
Year: 2014
Primary URL:
https://corpling.uis.georgetown.edu/annis/scriptorium
Access Model: Open Access
Duplicitous Diabolos: Parallel witness encoding in quantitative studies of Coptic manuscripts (Conference Paper/Presentation)
Title: Duplicitous Diabolos: Parallel witness encoding in quantitative studies of Coptic manuscripts
Author: Amir Zeldes
Abstract: This paper briefly discusses markup, metadata and evaluation issues that arise when projects do not include a critical edition adjudicating different variants, but instead incorporate multiple, full diplomatic transcriptions. When used naively, such corpora will cause duplicate results that are hard to discern in quantitative studies, and in cases of incomplete, inexact or fragmentary parallel witnesses, substantially complicate the decision about what users actually want to have. Using a case study on Coptic manuscripts, the paper suggests that as a provisional strategy, documents should be partitioned as finely grained as necessary such that each section's parallel witness status is encoded, and that for each parallel set, it can be useful to define a redundancy metadatum which identifies the 'best' candidate for quantitative study among the available choices.
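The effect of a redundancy metadatum on a frequency count can be sketched as follows. This is only an illustration under stated assumptions, not the paper's encoding: the Section class, its field names (witness, parallel_set, redundant), the witness labels, and the transliterated placeholder tokens are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Section:
    """Hypothetical document section carrying parallel-witness metadata."""
    witness: str                 # manuscript witness the section comes from
    parallel_set: Optional[str]  # shared ID for overlapping sections, None if unparalleled
    redundant: bool              # True when another witness is the preferred candidate
    tokens: List[str]


def count_tokens(sections: Iterable[Section], skip_redundant: bool = True) -> Counter:
    """Count tokens, optionally skipping sections flagged as redundant parallels."""
    counts: Counter = Counter()
    for section in sections:
        if skip_redundant and section.parallel_set is not None and section.redundant:
            continue  # a preferred witness in the same parallel set covers this text
        counts.update(section.tokens)
    return counts


# Two witnesses transmit the same passage; witness B is marked redundant.
sections = [
    Section("A", "set1", False, ["diabolos", "pe"]),
    Section("B", "set1", True, ["diabolos", "pe"]),
    Section("A", None, False, ["rome"]),
]

naive = count_tokens(sections, skip_redundant=False)
deduplicated = count_tokens(sections)
print(naive["diabolos"], deduplicated["diabolos"])
```

In the naive count the shared passage is tallied once per witness, while the filtered count keeps only the preferred candidate from each parallel set and leaves unparalleled sections untouched.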
Date: 8/10/2016
Primary URL:
http://www.balisage.net/Proceedings/vol16/print/Zeldes01/BalisageVol16-Zeldes01.html
Conference Name: Symposium on Cultural Heritage Markup. Balisage Series on Markup Technologies
Shenoute in Code: Digitizing Coptic Cultural Heritage for Collaborative Online Research and Study (Article)
Title: Shenoute in Code: Digitizing Coptic Cultural Heritage for Collaborative Online Research and Study
Author: Caroline T. Schroeder
Abstract: The preservation of the cultural heritage of the Coptic Orthodox Church is, to be completely blunt and honest, fragile. And this is due not only to the political instability in Egypt over the past few years, but to a very long and complicated history involving everything from linguistic drift and religious assimilation to colonialism and monetary greed. Increasingly, cultural heritage groups have been examining how digitization projects can contribute to cultural repatriation efforts. Collectively and collaboratively, the Coptic Studies scholarly community and Coptic Christians must engage the question of how and whether the digitization of cultural heritage material can address both contemporary and historical issues concerning Coptic Christian access to their own heritage. This essay is divided into three parts. First, a presentation of my own Digital Humanities project, Coptic SCRIPTORIUM, provides a case study for the ways digitization can increase access to Coptic language and literature. Second, I explore some of the tensions between the digital and the material, the tensions involved in producing a digital collection based on material archives created in the context of the colonial acquisition of Egyptian antiquities and their subsequent dispersal outside of Egypt. Finally, the third section poses questions and issues for further consideration, drawing on insights from work on indigenous North American cultural heritage collections.
Year: 2015
Primary URL:
https://www.academia.edu/27963729/Shenoute_in_Code_Digitizing_Coptic_Cultural_Heritage_for_Collaborative_Online_Research_and_Study
Access Model: Subscription, available Open Access on author's site
Format: Journal
Periodical Title: Coptica
Digital Coptic 2 Symposium and Workshop (Conference/Institute/Seminar)
Title: Digital Coptic 2 Symposium and Workshop
Abstract: Coptic SCRIPTORIUM hosted a second workshop and symposium on Digital Humanities and Coptic Studies. It took place March 12-13, 2015 at Georgetown University in Washington, DC. This event followed on the workshop in May 2013 at Humboldt University, Berlin. There were no registration fees to attend. Day 1 was a public symposium on Digital Humanities and Coptic Studies. Day 2 was a workshop on Coptic SCRIPTORIUM.
Date Range: March 2015
Location: Georgetown University
Primary URL:
http://copticscriptorium.org/workshop2015/index.html