Program

Preservation and Access: Humanities Collections and Reference Resources

Period of Performance

6/1/2020 - 8/31/2021

Funding Totals

$59,975.00 (approved)
$59,975.00 (awarded)


Bilingual Voices in the U.S./Mexico Borderlands: Technology-Enhanced Transcription and Community Engaged Scholarship

FAIN: PW-269430-20

University of Texas Rio Grande Valley (Edinburg, TX 78539-2909)
Katherine O'Donnell Christoffersen (Project Director: July 2019 to present)

A project to evaluate transcription tools and methods and develop a preservation plan for two sociolinguistic corpora documenting contemporary language practices of Spanish/English bilingual speakers in South Texas and southern Arizona.

Linguists at the University of Texas Rio Grande Valley (UTRGV) and the University of Arizona (UA) have collected over 157 hours of audio-recorded interviews with Spanish/English bilinguals documenting language varieties along the U.S./Mexico border. However, due to the time-consuming nature of manual transcription, many of these interviews have not yet been transcribed, limiting access to this valuable collection. This project pilots technologically enhanced transcription methodologies, such as speech recognition and time alignment, to speed up and streamline the transcription process. It also pilots a sustainable, community-based approach to the transcription of interviews by undergraduate and graduate students in research internship courses. The assessment, outcomes, and findings of this project will guide other scholars seeking to develop their own community-based sociolinguistic corpora.



Media Coverage

Bilingual voices provide insight into community languages (Media Coverage)
Publication: University of Arizona College of Humanities Press Release
Date: 5/5/2020
Abstract: This is the press release by the University of Arizona College of Humanities on the grant project.
URL: https://humanities.arizona.edu/news/bilingual-voices-provide-insight-community-languages

UTRGV linguists awarded federal grant to document the language of the RGV (Media Coverage)
Author(s): Victoria Brito
Publication: UTRGV Newsroom
Date: 11/10/2021
Abstract: This is the press release by UTRGV Newsroom on the grant project.
URL: https://www.utrgv.edu/newsroom/2020/05/21-utrgv-linguists-awarded-federal-grant-to-document-the-language-of-the-rgv.htm

Code switch: Professors work to document blend of languages (Media Coverage)
Author(s):
Publication: Brownsville Herald
Date: 6/7/2020
Abstract: This is a front page newspaper article about the grant-funded project from the Brownsville Herald newspaper.
URL: https://www.utrgv.edu/cobiva/blog/2020/06/brownsville-herald,-code-switch-article.pdf

Researching the language at the U.S.-Mexican border (Media Coverage)
Author(s): Steven Hughes
Publication: UTRGV Pulse Magazine
Date: 9/21/2020
Abstract: This is a magazine article by a UTRGV student on the project written in the student magazine.
URL: https://utrgvpulse.com/2020/09/21/researching-the-language-at-the-u-s-mexican-border/



Associated Products

Instructions for Using R to Revise Autogenerated Stream Transcription (Script)
Title: Instructions for Using R to Revise Autogenerated Stream Transcription
Writer: Jessica Draper
Abstract: This is a PDF of instructions for using Script 1 and Script 2 in RStudio to revise Stream autogenerated transcripts.
Year: 2021
Primary URL: https://www.dropbox.com/s/8uzb1of5uc5t9e3/Instructions to Clean up Transcripts with R%2C Step 1 %26 2.pdf?dl=0
Primary URL Description: Instructions for using Script 1 and Script 2 for revising Stream autogenerated transcription

Step 1 Script (Script)
Title: Step 1 Script
Writer: Jessica Draper
Abstract: This Step 1 Script runs in RStudio and removes extraneous lines and symbols from the original Stream autogenerated transcript.
Year: 2021
Primary URL: https://www.dropbox.com/s/1hlhxpu76d5yc9t/Step 1.R?dl=0
Primary URL Description: This Step 1 Script removes extraneous lines and symbols from the original Stream autogenerated transcript.
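
The kind of cleanup described for Step 1 can be illustrated with a minimal sketch. This is not the project's R script. It is a Python illustration that assumes the autogenerated transcript is a WebVTT file in which the header, NOTE comment blocks, and cue identifier lines count as extraneous. The function name clean_vtt and the sample text are hypothetical.

```python
import re

# A WebVTT cue timing line looks like "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")


def clean_vtt(text):
    """Return (timing, caption) pairs from WebVTT text.

    Assumption for this sketch: everything outside a cue (the WEBVTT header,
    NOTE comments, cue identifiers, blank lines) is extraneous and dropped.
    """
    cues = []
    timing = None   # timing line of the cue being read, if any
    caption = []    # caption lines collected for that cue
    for raw in text.splitlines():
        line = raw.strip()
        if TIMING.match(line):
            # A timing line starts a new cue.
            if timing is not None:
                cues.append((timing, " ".join(caption)))
            timing, caption = line, []
        elif not line:
            # A blank line closes the current cue.
            if timing is not None:
                cues.append((timing, " ".join(caption)))
                timing, caption = None, []
        elif timing is not None:
            # Text inside a cue is caption content.
            caption.append(line)
        # Any other line (header, NOTE, identifier) is skipped.
    if timing is not None:
        cues.append((timing, " ".join(caption)))
    return cues


SAMPLE = """WEBVTT

NOTE Confidence: 0.91

a1b2c3
00:00:01.000 --> 00:00:04.000
hola buenos días
how are you

NOTE Confidence: 0.80

d4e5f6
00:00:04.500 --> 00:00:06.000
muy bien
"""

for timing, caption in clean_vtt(SAMPLE):
    print(timing, caption)
```

Keeping the timing line alongside each caption preserves the time alignment, so speaker codes can be added afterward without losing the link to the audio.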

Step 2 Script (Script)
Title: Step 2 Script
Writer: Jessica Draper
Abstract: After individuals have entered the speaker codes, this second step R script allows individuals to revise
Year: 2021
Primary URL: https://www.dropbox.com/s/u7loy74k2babalb/Step 2.R?dl=0

Building Sociolinguistic Corpora: CESA & CoBiVa (Conference Paper/Presentation)
Title: Building Sociolinguistic Corpora: CESA & CoBiVa
Author: Katherine Christoffersen
Author: Ryan Bessett
Author: Ana Carvalho
Abstract: The importance of sharing sociolinguistic data has been the subject of workshops (LSA 2012, 2016), included in publications (Mallinson, 2013) and encouraged by funding agencies (NSF 2016). In line with these initiatives is the creation of student-based corpora of U.S. Spanish (e.g., Corpus of Mexican Spanish in Salinas, Spanish in Texas, Corpus de Español en el Sur de Arizona, Corpus Bilingüe del Valle), in which students take part in building sociolinguistic corpora. Student-based US Spanish corpora not only provide important data for variationist research including the opportunity to analyze changes in progress (Torres Cacoullos & Berry, forthcoming), but also provide students with training in sociolinguistic methods. This hands-on approach in turn raises sociolinguistic awareness among students, especially among minority language speakers whose native dialects are often seen as mixed hybrids. In this paper, we illustrate such a project by explaining step-by-step the involvement of students in the creation and maintenance of the Corpus del Español en el Sur de Arizona (CESA) and the Corpus Bilingüe del Valle (CoBiVa), the protocols followed to facilitate the sharing of data, and the development of technologically-aided transcription methods.
Date: 9/26/2020
Primary URL: https://www.dropbox.com/s/ij11q1wfdowdnsn/Christoffersen%2C Bessett%2C %26 Carvalho%2C 2020%2C LASSO presentation.pdf?dl=0
Conference Name: Linguistic Association of the Southwest, Fall 2020

Testing technologically-aided transcription methods for the development of bilingual sociolinguistic corpora (Conference Paper/Presentation)
Title: Testing technologically-aided transcription methods for the development of bilingual sociolinguistic corpora
Author: Katherine Christoffersen
Author: Ryan Bessett
Author: Ana Carvalho
Abstract: In this study, we describe the Corpus del Español en el Sur de Arizona (CESA) and the Corpus Bilingüe del Valle (CoBiVa) and compare technologically-aided transcription methods for corpora development. Currently, the corpora are involved in a research project analyzing and testing various technologically aided transcription methods. Research assistants and students in the internship classes are testing the speed and ease of use of auto-generated transcription using Microsoft Stream, voice recognition using SpeechNotes, and manual transcription with ExpressScribe. The presentation will describe the results from testing with regard to ease of use, speed, and accuracy. Preliminary results show that Stream may be best suited for monolingual data, while SpeechNotes may be best suited for bilingual data. The researchers hope that these findings will encourage further development of local community-based, community-driven corpora.
Date: 3/20/2021
Primary URL: https://www.dropbox.com/s/c0rkt7lrl9iwlzw/Christoffersen%2C Bessett%2C %26 Carvalho%2C 2021%2C AAAL Presentation.pdf?dl=0
Conference Name: American Association for Applied Linguistics

Speed, accuracy, and ease of use of technologically-aided transcription methods for bilingual sociolinguistic corpora (Conference Paper/Presentation)
Title: Speed, accuracy, and ease of use of technologically-aided transcription methods for bilingual sociolinguistic corpora
Author: Katherine Christoffersen
Author: Ryan Bessett
Author: Ana Carvalho
Author: Mayte Vega Mudy
Author: Isabella Calafate de Barros
Abstract: For this study, we first compared three distinct open source transcription methods: 1) a manual transcription program (ExpressScribe), 2) a real-time voice recognition software where the user can edit and switch between languages (SpeechNotes), and 3) a captioning software that generates a transcript with timestamps, but only in one language (Microsoft Stream). The researchers compared baseline transcripts from the two voice recognition methods to a corrected version of the transcript for each audio file to derive measures of accuracy. Averaged across interviews, the accuracy rate was 26% for Audio from Speaker (SpeechNotes) and 65% for auto-generated (Stream), leading us to discard SpeechNotes as a viable option. In a follow-up study during Spring 2021, 33 students participated in a sociolinguistic field methods course that focused on the development of sociolinguistic corpora at the two universities. Overall, students reported that transcribing 10 minutes of audio took them approximately 3 hours and 24 minutes with the manual method (ExpressScribe) and 2 hours with auto-generated transcript revision (Stream). Students also widely preferred the auto-generated transcription method for its speed and ease of use. The results show that the captioning software (Microsoft Stream) performs best in terms of speed, accuracy, and ease of use. This project contributes to a growing trend in data sharing and open access to sociolinguistic corpora (e.g., Bullock & Toribio, 2013; Brown, 2019), but it furthers these practices by exploring, applying, and evaluating advanced technological tools, thereby expanding their availability to the larger community of scholars interested in the expansion and diffusion of bilingual data.
Date: 9/25/2021
Primary URL: https://www.dropbox.com/s/rcvyjjoxen59j59/Christoffersen%2C Bessett%2C %26 Carvalho%2C 2021%2C LASSO presentation.pdf?dl=0
Conference Name: Linguistic Association of the Southwest, Fall 2021
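
The abstract above does not say how accuracy was computed when baseline transcripts were compared with corrected ones. One common approach, shown here only as an assumed illustration, scores accuracy as 1 minus the word error rate, where errors are counted as the word-level edit distance between the corrected (reference) transcript and the baseline. The function names and example sentences are hypothetical and are not drawn from the corpora.

```python
def word_edit_distance(ref, hyp):
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the list `hyp` into the list `ref` (Levenshtein distance)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # word missing from hyp
                           cur[j - 1] + 1,       # extra word in hyp
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def word_accuracy(corrected, baseline):
    """Accuracy as 1 - word error rate, floored at 0.

    Assumption for this sketch: case is ignored and words are split on
    whitespace; punctuation handling is left out.
    """
    ref = corrected.lower().split()
    hyp = baseline.lower().split()
    if not ref:
        raise ValueError("corrected transcript is empty")
    return max(0.0, 1 - word_edit_distance(ref, hyp) / len(ref))


# Hypothetical bilingual example: one misrecognized word out of six.
corrected = "yo fui a la store yesterday"
baseline = "yo fui a la story yesterday"
print(f"{word_accuracy(corrected, baseline):.0%}")
```

Per-file scores computed this way could then be averaged across interviews, which is the form in which the abstract reports its results.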

Memorandum of Understanding with UA Libraries (Report)
Title: Memorandum of Understanding with UA Libraries
Author: Veronica Reyes Escudero
Author: Ana Carvalho
Abstract: This is the memorandum of understanding between UA Libraries and Ana Carvalho for the preservation of the CESA corpus.
Date: 8/2/2021
Primary URL: https://www.dropbox.com/s/w8kjevj60co1r20/MOU%2C CESA %26 University of Arizona Library Special Collections.pdf?dl=0

User Agreement with UTRGV Libraries (Report)
Title: User Agreement with UTRGV Libraries
Author: Justin White
Author: Katherine Christoffersen
Author: Ryan Bessett
Abstract: This is the User Agreement created for the preservation of CoBiVa by the UTRGV Library.
Date: 8/24/2021
Primary URL: https://www.dropbox.com/s/659nctr667raf8r/User Agreement%2C CoBiVa %26 UTRGV Libraries Digital Collection.pdf?dl=0

UTRGV Preservation Policies (Report)
Title: UTRGV Preservation Policies
Author: UTRGV Library
Abstract: This document gives further details on UTRGV's preservation policies as they relate to the process for preserving CoBiVa.
Date: 1/1/2018
Primary URL: https://www.dropbox.com/scl/fi/o5lwtkvreb94fxi22rfe3/UNIVER-1.DOC?dl=0&rlkey=l0198dkm0lrh9wco8ww3pde63

UA Digital Preservation Policy (Report)
Title: UA Digital Preservation Policy
Author: UA Library
Abstract: This is the UA Library Digital Preservation Policy which relates to how CESA will be preserved.
Date: 6/6/2019
Primary URL: https://www.dropbox.com/s/e2ncian13wwg36a/UA%20DigitalPreservationPolicy_1-1-7_Final-Public.pdf?dl=0

DOI and digital repository for CoBiVa (Web Resource)
Title: DOI and digital repository for CoBiVa
Author: Katherine Christoffersen
Author: Ryan Bessett
Abstract: Together with librarian Justin White, we created the following DOI and digital repository for CoBiVa. This points to the website so as not to divert traffic from utrgv.edu/cobiva, and it also will be the source of a first level of backups for the data.
Year: 2021
Primary URL: http://doi.org/10.51734/DB0001

DOI and Digital Repository for CESA (Web Resource)
Title: DOI and Digital Repository for CESA
Author: Ana Carvalho
Abstract: Together with Fernando Rios from the UA Library, this DOI and digital repository was created to refer individuals to the main CESA website (cesa.arizona.edu) so as not to divert traffic. It also references and provides a link to CoBiVa as a sister corpus.
Year: 2021
Primary URL: https://doi.org/10.25422/azu.data.15070800

NEH Updates Blog section (Blog Post)
Title: NEH Updates Blog section
Author: Katherine Christoffersen
Abstract: This section of the CoBiVa blog contains all the blog posts related to the progress and work of this NEH grant-funded project. It was created to provide details and updates on the project, process, and dissemination of findings.
Date: 6/1/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/category/technologically-aided-transcription.htm

CESA/CoBiVa Handbook & Resources (Web Resource)
Title: CESA/CoBiVa Handbook & Resources
Author: Ryan Bessett
Author: Ana Carvalho
Author: Katherine Christoffersen
Abstract: This Dropbox folder contains the CESA/CoBiVa Handbook as well as resources such as intake forms and the interview protocol. It is linked on both corpora websites.
Year: 2021
Primary URL: https://bit.ly/CESA_CoBiVa_Handbook

CESA & CoBiVa Training Handbook (Web Resource)
Title: CESA & CoBiVa Training Handbook
Author: Ryan Bessett
Author: Ana Carvalho
Author: Katherine Christoffersen
Abstract: This PDF provides detailed instructions on the interview and transcription process. An earlier version was created for training students on CESA. This updated version includes CoBiVa protocols, and it could also be helpful for researchers/scholars interested in creating sociolinguistic methods or using technologically-aided transcription methods.
Year: 2021
Primary URL: https://www.dropbox.com/s/v2prsb5f48oy0fz/0.%20CorpusTrainingHandbook.pdf?dl=0

CoBiVa Awarded NEH Grant (Blog Post)
Title: CoBiVa Awarded NEH Grant
Author: Katherine Christoffersen
Abstract: This is a blog post on the CoBiVa blog announcing the NEH award.
Date: 5/20/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/2020/05/neh-grant-awarded-for-cobiva.htm
Website: CoBiVa Blog

Bilingual Technologically Aided Transcription Methods, Spanish and English (Blog Post)
Title: Bilingual Technologically Aided Transcription Methods, Spanish and English
Author: Katherine Christoffersen
Abstract: This blog post presents a preliminary overview of technologically-aided transcription methods by the PI and co-PI.
Date: 6/26/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/2020/06/bilingual-technologically-aided-transcription-methods-spanish-and-english.htm
Website: CoBiVa Blog

Stream, SpeechNotes, and ExpressScribe (Blog Post)
Title: Stream, SpeechNotes, and ExpressScribe
Author: Katherine Christoffersen
Abstract: This blog post reviews the three technologically-aided transcription methods chosen for further trials: Microsoft Stream, SpeechNotes and ExpressScribe.
Date: 7/10/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/2020/07/stream-speechnotes-and-expressscribe.htm
Website: CoBiVa Blog

Testing Technologically Aided Transcription- Training & Methods (Blog Post)
Title: Testing Technologically Aided Transcription- Training & Methods
Author: Katherine Christoffersen
Abstract: This blog post reports on the process for the initial trials of the 3 technologically-aided transcription methods including 6 research assistants (3 from UA and 3 from UTRGV).
Date: 8/21/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/2020/08/blog-post.htm
Website: CoBiVa Blog

Preliminary Analysis of Speed & Ease of Use of Technologically-Aided Transcription Methods (Blog Post)
Title: Preliminary Analysis of Speed & Ease of Use of Technologically-Aided Transcription Methods
Author: Katherine Christoffersen
Abstract: This blog post reports the preliminary analyses of speed and ease of use from the Summer 2020 trials with 6 RAs.
Date: 9/25/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/2020/09/analysis%20of%20speed%20and%20ease%20of%20use%20of%20technologically-aided%20transcription%20methods.htm
Website: CoBiVa Blog

UTRGV at LASSO 2020 Virtual Conference (Blog Post)
Title: UTRGV at LASSO 2020 Virtual Conference
Author: Katherine Christoffersen
Abstract: This blog post provides an update on the presentation at LASSO 2020.
Date: 9/28/2020
Primary URL: https://www.utrgv.edu/cobiva/blog/2020/09/blog-post1.htm
Website: CoBiVa Blog

Accuracy of Technologically-Aided Transcription Methods (Blog Post)
Title: Accuracy of Technologically-Aided Transcription Methods
Author: Katherine Christoffersen
Abstract: This blog post reports the analysis of accuracy for Microsoft Stream and SpeechNotes.
Date: 3/15/2021
Primary URL: https://www.utrgv.edu/cobiva/blog/2021/03/accuracy-of-technologically-aided-transcription-methods.htm
Website: CoBiVa Blog

CoBiVa at Virtual AAAL 2021 (Blog Post)
Title: CoBiVa at Virtual AAAL 2021
Author: Katherine Christoffersen
Abstract: This blog post reports on the presentation of the study's findings at AAAL 2021.
Date: 4/1/2021
Primary URL: https://www.utrgv.edu/cobiva/blog/2021/04/blog-post.htm
Website: CoBiVa Blog

Revising Stream Transcripts to WEBVTT with R (Blog Post)
Title: Revising Stream Transcripts to WEBVTT with R
Author: Katherine Christoffersen
Abstract: This blog post provides instructions for running 2 scripts in RStudio to revise autogenerated transcripts from Stream. It also links to GitHub.
Date: 5/1/2021
Primary URL: https://www.utrgv.edu/cobiva/blog/2021/05/revising-stream-transcripts-to-webvtt-with-r.htm
Website: CoBiVa Blog

Summer 2021 CoBiVa Team (Blog Post)
Title: Summer 2021 CoBiVa Team
Author: Katherine Christoffersen
Abstract: This blog post introduces the RAs hired to work on CoBiVa, funded in part through NEH.
Date: 6/1/2021
Primary URL: https://www.utrgv.edu/cobiva/blog/2021/06/summer-2021-cobiva-team.htm
Website: CoBiVa Blog

Speed and Ease of Use for Technologically Aided Transcription Methods (Blog Post)
Title: Speed and Ease of Use for Technologically Aided Transcription Methods
Author: Katherine Christoffersen
Abstract: This blog post reports the speed and ease of use of technologically-aided transcription methods through comparison of Microsoft Stream and ExpressScribe among students at UA and UTRGV during internship courses in Spring 2021.
Date: 9/10/2021
Primary URL: https://www.utrgv.edu/cobiva/blog/2021/09/speed-and-ease-of-use-for-technologically-aided-transcription-methods.htm
Website: CoBiVa Blog

CoBiVa at LASSO 2021 (Blog Post)
Title: CoBiVa at LASSO 2021
Author: Katherine Christoffersen
Abstract: This blog post reports on the presentation of the project's findings at LASSO 2021.
Date: 8/1/2021
Primary URL: https://www.utrgv.edu/cobiva/blog/2021/10/cobiva%20at%20lasso%202021.htm
Website: CoBiVa Blog