Expanding Coptic Digital Online Collections
FAIN: PW-290519-23
University of Oklahoma, Norman (Norman, OK 73019-3003)
Caroline T. Schroeder (Project Director: July 2022 to present)
The expansion of Coptic Scriptorium, an open-access database of Coptic language texts, and the development of digitized corpora, data models, and natural language tools for several Coptic dialects.
The Coptic language is the last phase of the Egyptian language family, a direct descendant of the hieroglyphs of ancient Egypt. Coptic texts and the study of Coptic linguistics are important for multiple academic disciplines and for the heritage community of Coptic Orthodox Christians who use the language for liturgy and their cultural identity. In this grant, the Coptic Scriptorium project (https://copticscriptorium.org) will expand our digital database of richly annotated texts to enable more comprehensive research and expand access to more heritage sources. We will also enable wider access to our resources by building more user-friendly interfaces.
Associated Products
Release 5.0.0 of Coptic Scriptorium Richly Annotated Corpora (Database/Archive/Digital Edition)
Title: Release 5.0.0 of Coptic Scriptorium Richly Annotated Corpora
Author: Over 20 authors
Abstract: Release 5.0.0 of Coptic Scriptorium data now includes over 1,288,229 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works.
This release also marks the introduction of Bohairic Coptic data to our corpus holdings: the repository now contains Bohairic Bible materials, covering Mark 1-16 and 1 Cor. 1-16, with manually reviewed segmentation for the entire corpus, and manual tagging and treebanking for chapters 1-5 in each book. Segmentation and tagging were reviewed in collaboration with Nicholas Wagner, and treebanking was done in collaboration with Nina Speranskaja. As a result of this work, we are in the process of compiling new NLP tools and guidelines specifically for Bohairic.
In addition, the release includes corrections and updates to existing corpora as well as the addition of several new Sahidic works and documents:
A. Sections of five works by Shenoute of Atripe:
My Heart is Crushed
So Listen
Who But God is the Witness
This Great House
If Everyone Errs
B. New documents were added to existing works:
Acephalous Work 22
Apophthegmata Patrum
C. Newly added translation spans for Pistis Sophia, aligned by Randy Komforty
These join the newly treebanked and tagged Bohairic data (Bohairic 1 Corinthians, Bohairic Gospel of Mark).
We are very grateful to all of our collaborators and contributors, without whom this project could not function.
As with all our releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking (currently only for Sahidic), is available in a variety of popular formats in this GitHub repository:
https://github.com/CopticScriptorium/corpora
You can also search for complex linguistic annotations in the data using our ANNIS server - please see our tutorial here to get started with some query tips and a helpful cheat sheet:
https://copticscriptorium.org/ANNIS_tutorial
Year: 2024
Primary URL:
https://github.com/CopticScriptorium/corpora/releases/tag/v5.0.0
Secondary URL:
https://data.copticscriptorium.org
Access Model: Open access either CC-BY 4.0 or CC-BY-SA 4.0
Release 4.5.0 of Coptic Scriptorium Richly Annotated Corpora (Database/Archive/Digital Edition)
Title: Release 4.5.0 of Coptic Scriptorium Richly Annotated Corpora
Author: Over 20 editors and translators
Abstract: Release 4.5.0 of Coptic Scriptorium data now includes an additional 11,500 tokens of richly annotated Coptic text beyond the text available in the previous release.
This release corrects a large number of consistency errors identified in our existing data, and also adds some new documents:
Revisions to five works of Besa:
On Vigilance
Exhortations
To Aphthonia
On Lack of Food
To Thieving Nuns
Sections of three works by Shenoute of Atripe:
My Heart is Crushed
God Who Alone is True
I Have Been Considering
New documents were added to existing works:
Acephalous Work 22
Apophthegmata Patrum
Newly treebanked data with syntactic gold standard annotations for 1 Corinthians 7
We are very grateful to all of our collaborators and contributors, without whom this project could not function.
As with all releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking, is available in a variety of popular formats in our GitHub repository (https://github.com/CopticScriptorium/corpora).
You can also search for complex linguistic annotations in the data using our ANNIS server - please see our new tutorial here to get started with some query tips and a helpful cheat sheet:
https://copticscriptorium.org/ANNIS_tutorial
Year: 2023
Primary URL:
https://github.com/CopticScriptorium/corpora/releases/tag/v4.5.0
Primary URL Description: GitHub release site
Secondary URL:
https://data.copticscriptorium.org
Access Model: Open access either CC-BY 4.0 or CC-BY-SA 4.0