Transcribing AILLA: Increasing Collection Access and Reusability through Crowdsourced Transcription
FAIN: PW-259116-18
University of Texas at Austin (Austin, TX 78712-0100)
Virginia Garrard Burnett (Project Director: July 2017 to July 2023)
A Foundations pilot project to transcribe materials in Mixtec, a pre-Columbian language spoken in south-central Mexico, that are housed at the Archive of the Indigenous Languages of Latin America. Working with undergraduate linguistics students and Mixtec community members who have migrated to southern California, the applicant would undertake transcription of handwritten documents and audio recordings to make them searchable, thereby improving access and reuse.
The Archive of the Indigenous Languages of Latin America (AILLA) at the University of Texas at Austin (UT) has thousands of images of handwritten manuscripts whose text cannot be searched, making them difficult for users to discover and access. Since many of these documents are transcriptions or translations of recordings in AILLA's collections, improving access to a manuscript increases access to other media. This project pilots a low-cost process to improve access to resources in AILLA's collections by crowdsourcing the transcription of select handwritten documents written in Mixtec languages using open-source software. It will be implemented in an undergraduate linguistics course at UT, for which lesson plans will be developed, and within a Mixtec speech community in California, in an effort to develop a community of practice. More broadly, investigators will disseminate findings among other digital archives so they may adapt the approach to any language.
Associated Products
Transcribe AILLA: Making Digital Texts and Datasets from Mixtec Manuscripts (Conference Paper/Presentation)
Title: Transcribe AILLA: Making Digital Texts and Datasets from Mixtec Manuscripts
Author: Ryan Sullivant
Abstract: Brief presentation on challenges to supporting Mixtec-language transcription activities during the organized panel “Challenges to Transcription in Languages other than English”.
Date: 2019-09-27
Primary URL: https://docs.google.com/presentation/d/1gWnRQ4euQO_xWLGWfwlAwT7YTlINCOpkcQ-rhoInDHs/edit?usp=sharing
Primary URL Description: Public Google Slides link
Conference Name: Digital Frontiers
A sketch of Tututepec Mixtec based on 20th century historical sources (Conference Paper/Presentation)
Title: A sketch of Tututepec Mixtec based on 20th century historical sources
Author: Ryan Sullivant
Abstract: Some Mixtec varieties of the Coastal dialect group (Josserand 1983) are relatively vital and have been documented and described, whereas others are not. One of these, Tu’un Savi, or Tututepec Mixtec, is spoken by relatively few people and has not been well-studied, even though history suggests it may have been influential in the past. The town of San Pedro Tututepec is built on the former capital of the Mixtec-speaking Yuku Dzaa empire that controlled much of the coastal region of what is now Oaxaca from the Late Post-Classic period until after the Conquest (Spores 1993).
This paper presents a brief sketch of Tututepec Mixtec as represented in historical and archival sources with an eye to identifying traits that distinguish it from other Coastal Mixtec languages that are more prominent in the literature. In particular, evidence will be shown pointing to Tututepec Mixtec sharing more features with Chayuco Mixtec (Pensinger 1974) than with other nearby Mixtec languages such as Jamiltepec and San Juan Colorado Mixtec (Johnson 1988; Stark, Johnson & Lorenzo 1986). These sources include a wordlist (Belmar 1902; Sullivant 2015) and responses to linguistic surveys undertaken in the 1970s (Josserand n.d.). This sketch of Tututepec Mixtec will identify phonological, morphological, and lexical traits of the language that will be useful for the study of the historical development of the Mixtec languages and variation across the Coastal Mixtec languages.
It is hoped that this initial sketch, made possible by studies of previously collected but underanalyzed data, will draw attention to this language, its position in the constellation of Mixtec languages of the Coastal group, and its possible historical influence across Oaxaca’s Pacific coast. This work has the potential to bring attention to the Tu’un Savi speech community of Tututepec and may spur further research on this language or on other languages it may have been in contact with historically.
Date: 2020-01-03
Primary URL: https://docs.google.com/presentation/d/1cWn3jHA6cMFjONgfuVmQVpIzDVjl6JpCb_LOUIC3nQ0/edit?usp=sharing
Primary URL Description: Public Google Slides link
Conference Name: Society for the Study of the Indigenous Languages of the Americas
Transcriptions of Mixtec Language Surveys (Database/Archive/Digital Edition)
Title: Transcriptions of Mixtec Language Surveys
Author: Ryan Sullivant
Author: Kathryn Josserand
Abstract: This collection contains digital transcriptions of originally handwritten transcriptions of audio recordings of language variation surveys in Mixtec (and other closely related) languages. The audio recordings that correspond to these transcriptions can be found in the MesoAmerican Languages Collection of Kathryn Josserand. This collection contains three kinds of documents:
Texts with the digital transcriptions in PDF format,
Images and Texts, which are PDF documents showing the image of each manuscript alongside its transcription, and
Datasets that present survey data in tabular CSV formats.
These materials were produced by volunteer transcribers as part of an NEH-funded pilot project to crowdsource the transcription of the handwritten documents. This project was supervised by Ryan Sullivant with technical assistance by May Helena Plumb.
Year: 2020
Primary URL: https://ailla.utexas.org/islandora/object/ailla:271571
Primary URL Description: Link to AILLA collection
Access Model: Materials are public. A free account must be created to access files.
MesoAmerican Languages Collection of Kathryn Josserand (Database/Archive/Digital Edition)
Title: MesoAmerican Languages Collection of Kathryn Josserand
Author: Kathryn Josserand
Abstract: Digital collection of materials in AILLA that was expanded as part of this grant.
Year: 2020
Primary URL: https://ailla.utexas.org/islandora/object/ailla:124466
Primary URL Description: Link to AILLA collection
Access Model: Materials are public. A free account registration is required to access files.
The Kathryn Josserand Mixtec Language Surveys (Web Resource)
Title: The Kathryn Josserand Mixtec Language Surveys
Author: TranscribeAILLA volunteers
Author: Kathryn Josserand
Abstract: A collection on the University of Texas Libraries' instance of FromThePage, the website where this project's volunteer transcription was performed.
Year: 2017
Primary URL: https://fromthepage.lib.utexas.edu/sullivant/the-kathryn-josserand-mixtec-language-surveys
Primary URL Description: FromThePage collection link
Variation in Mixtec Languages (Course or Curricular Material)
Title: Variation in Mixtec Languages
Author: Ryan Sullivant
Abstract: Activities for teaching some concepts in linguistics using transcribed Mixtec surveys as primary language data.
Year: 2018
Primary URL: https://curriculum.llilasbenson.utexas.edu/lesson/lesson-variation-mixtec-languages
Primary URL Description: Specific URL pending.
Audience: Undergraduate