Associated Products
Instructions for Using R to Revise Autogenerated Stream Transcription (Script)
Title: Instructions for Using R to Revise Autogenerated Stream Transcription
Writer: Jessica Draper
Abstract: This PDF gives instructions for using Script 1 and Script 2 in RStudio to revise Stream autogenerated transcripts.
Year: 2021
Primary URL: to Clean up Transcripts with R%2C Step 1 %26 2.pdf?dl=0
Primary URL Description: Instructions for using Script 1 and Script 2 for revising Stream autogenerated transcription
Step 1 Script (Script)
Title: Step 1 Script
Writer: Jessica Draper
Abstract: This Step 1 Script runs in RStudio to remove extraneous lines and symbols from the original Stream autogenerated transcript.
Year: 2021
Primary URL: 1.R?dl=0
Primary URL Description: This Step 1 Script removes extraneous lines and symbols from the original Stream autogenerated transcript.
Step 2 Script (Script)
Title: Step 2 Script
Writer: Jessica Draper
Abstract: After individuals have entered the speaker codes, this second step R script allows individuals to revise
Year: 2021
Primary URL: 2.R?dl=0
Building Sociolinguistic Corpora: CESA & CoBiVa (Conference Paper/Presentation)
Title: Building Sociolinguistic Corpora: CESA & CoBiVa
Author: Katherine Christoffersen
Author: Ryan Bessett
Author: Ana Carvalho
Abstract: The importance of sharing sociolinguistic data has been the subject of workshops (LSA 2012, 2016), included in publications (Mallinson, 2013) and encouraged by funding agencies (NSF 2016). In line with these initiatives is the creation of student-based corpora of U.S. Spanish (e.g., Corpus of Mexican Spanish in Salinas, Spanish in Texas, Corpus de Español en el Sur de Arizona, Corpus Bilingüe del Valle), in which students take part in building sociolinguistic corpora. Student-based US Spanish corpora not only provide important data for variationist research including the opportunity to analyze changes in progress (Torres Cacoullos & Berry, forthcoming), but also provide students with training in sociolinguistic methods. This hands-on approach in turn raises sociolinguistic awareness among students, especially among minority language speakers whose native dialects are often seen as mixed hybrids. In this paper, we illustrate such a project by explaining step-by-step the involvement of students in the creation and maintenance of the Corpus del Español en el Sur de Arizona (CESA) and the Corpus Bilingüe del Valle (CoBiVa), the protocols followed to facilitate the sharing of data, and the development of technologically-aided transcription methods.
Date: 9/26/2020
Primary URL: Bessett%2C %26 Carvalho%2C 2020%2C LASSO presentation.pdf?dl=0
Conference Name: Linguistics Association of the Southwest Fall 2020
Testing technologically-aided transcription methods for the development of bilingual sociolinguistic corpora (Conference Paper/Presentation)
Title: Testing technologically-aided transcription methods for the development of bilingual sociolinguistic corpora
Author: Katherine Christoffersen
Author: Ryan Bessett
Author: Ana Carvalho
Abstract: In this study, we describe the Corpus del Español en el Sur de Arizona (CESA) and the Corpus Bilingüe del Valle (CoBiVa) and compare technologically-aided transcription methods for corpora development. Currently, the corpora are involved in a research project analyzing and testing various technologically aided transcription methods. Research assistants and students in the internship classes are testing the speed and ease of use of auto-generated transcription using Microsoft Stream, voice recognition using SpeechNotes, and manual transcription with ExpressScribe. The presentation will describe the results from testing with regard to ease of use, speed, and accuracy. Preliminary results show that Stream may be best suited for monolingual data, while SpeechNotes may be best suited for bilingual data. The researchers hope that these findings will encourage further development of local community-based, community-driven corpora.
Date: 3/20/2021
Primary URL: Bessett%2C %26 Carvalho%2C 2021%2C AAAL Presentation.pdf?dl=0
Conference Name: American Association of Applied Linguistics
Speed, accuracy, and ease of use of technologically-aided transcription methods for bilingual sociolinguistic corpora (Conference Paper/Presentation)
Title: Speed, accuracy, and ease of use of technologically-aided transcription methods for bilingual sociolinguistic corpora
Author: Katherine Christoffersen
Author: Ryan Bessett
Author: Ana Carvalho
Author: Mayte Vega Mudy
Author: Isabella Calafate de Barros
Abstract: For this study, we first compared three distinct open source transcription methods: 1) a manual transcription program (ExpressScribe), 2) a real-time voice recognition software where the user can edit and switch between languages (SpeechNotes), and 3) a captioning software that generates a transcript with timestamps, but only in one language (Microsoft Stream). The researchers compared baseline transcripts from the two voice recognition software methods to a corrected version of the transcript for each audio file to derive measures of accuracy. Averaged across interviews, the accuracy rate was 26% for Audio from Speaker (SpeechNotes) and 65% for the auto-generated transcript (Stream), leading us to discard SpeechNotes as a viable option. In a follow-up study during Spring 2021, 33 students participated in a sociolinguistic field methods course that focused on the development of sociolinguistic corpora at the two universities. Overall, students reported that 10 minutes of transcription took them approximately 3 hours and 24 minutes with the manual method (ExpressScribe) and 2 hours with auto-generated transcript revision (Stream). Students also widely preferred the auto-generated transcription method for its speed and ease of use. The results show that the captioning software (Microsoft Stream) performs best in terms of speed, accuracy, and ease of use. This project contributes to a growing trend in data sharing and open access to sociolinguistic corpora (e.g., Bullock & Toribio, 2013; Brown, 2019), but it furthers these practices by exploring, applying, and evaluating advanced technological tools, thereby expanding their availability to the larger community of scholars interested in the expansion and diffusion of bilingual data.
Date: 9/25/2021
Primary URL: Bessett%2C %26 Carvalho%2C 2021%2C LASSO presentation.pdf?dl=0
Conference Name: Linguistics Association of the Southwest, Fall 2021
Memorandum of Understanding with UA Libraries (Report)
Title: Memorandum of Understanding with UA Libraries
Author: Veronica Reyes Escudero
Author: Ana Carvalho
Abstract: This is the memorandum of understanding between UA Libraries and Ana Carvalho for the preservation of the CESA corpus.
Date: 8/2/2021
Primary URL: CESA %26 University of Arizona Library Special Collections.pdf?dl=0
User Agreement with UTRGV Libraries (Report)
Title: User Agreement with UTRGV Libraries
Author: Justin White
Author: Katherine Christoffersen
Author: Ryan Bessett
Abstract: This is the User Agreement created for the preservation of the CoBiVa by the UTRGV library.
Date: 8/24/2021
Primary URL: Agreement%2C CoBiVa %26 UTRGV Libraries Digital Collection.pdf?dl=0
UTRGV Preservation Policies (Report)
Title: UTRGV Preservation Policies
Author: UTRGV Library
Abstract: Here are more details about the preservation policies of UTRGV related to the process for preserving CoBiVa.
Date: 1/1/2018
Primary URL:
UA Digital Preservation Policy (Report)
Title: UA Digital Preservation Policy
Author: UA Library
Abstract: This is the UA Library Digital Preservation Policy which relates to how CESA will be preserved.
Date: 6/6/2019
Primary URL:
DOI and digital repository for CoBiVa (Web Resource)
Title: DOI and digital repository for CoBiVa
Author: Katherine Christoffersen
Author: Ryan Bessett
Abstract: Together with librarian Justin White, we created the following DOI and digital repository for CoBiVa. This points to the website so as not to divert traffic from it, and it will also serve as a first level of backup for the data.
Year: 2021
Primary URL:
DOI and Digital Repository for CESA (Web Resource)
Title: DOI and Digital Repository for CESA
Author: Ana Carvalho
Abstract: Together with Fernando Rios from the UA Library, this DOI and digital repository were created to refer individuals to the main CESA website so as not to divert traffic. It also references and provides a link to CoBiVa as a sister corpus.
Year: 2021
Primary URL:
NEH Updates Blog section (Blog Post)
Title: NEH Updates Blog section
Author: Katherine Christoffersen
Abstract: This section of the CoBiVa blog contains all the blog posts related to the progress and work of this NEH grant-funded project. It was created to provide details and updates on the project, process, and dissemination of findings.
Date: 6/1/2020
Primary URL:
CESA/CoBiVa Handbook & Resources (Web Resource)
Title: CESA/CoBiVa Handbook & Resources
Author: Ryan Bessett
Author: Ana Carvalho
Author: Katherine Christoffersen
Abstract: This Dropbox folder contains the CESA/CoBiVa Handbook as well as resources such as intake forms and the interview protocol. It is linked on both corpora websites.
Year: 2021
Primary URL:
CESA & CoBiVa Training Handbook (Web Resource)
Title: CESA & CoBiVa Training Handbook
Author: Ryan Bessett
Author: Ana Carvalho
Author: Katherine Christoffersen
Abstract: This PDF provides detailed instructions on the interview and transcription process. An earlier version was created for training students on CESA. This updated version includes CoBiVa protocols, and it could also be helpful for researchers/scholars interested in creating sociolinguistic methods or using technologically-aided transcription methods.
Year: 2021
Primary URL:
CoBiVa Awarded NEH Grant (Blog Post)
Title: CoBiVa Awarded NEH Grant
Author: Katherine Christoffersen
Abstract: This is a blog post on the CoBiVa blog announcing the NEH award.
Date: 5/20/2020
Primary URL: CoBiVa Blog
Bilingual Technologically Aided Transcription Methods, Spanish and English (Blog Post)
Title: Bilingual Technologically Aided Transcription Methods, Spanish and English
Author: Katherine Christoffersen
Abstract: This blog post reports the preliminary overview of technologically-aided transcription methods by the PI and co-PI.
Date: 6/26/2020
Primary URL: CoBiVa Blog
Stream, SpeechNotes, and ExpressScribe (Blog Post)
Title: Stream, SpeechNotes, and ExpressScribe
Author: Katherine Christoffersen
Abstract: This blog post reviews the three technologically-aided transcription methods chosen for further trials: Microsoft Stream, SpeechNotes and ExpressScribe.
Date: 7/10/2020
Primary URL: CoBiVa Blog
Testing Technologically Aided Transcription- Training & Methods (Blog Post)
Title: Testing Technologically Aided Transcription- Training & Methods
Author: Katherine Christoffersen
Abstract: This blog post reports on the process for the initial trials of the 3 technologically-aided transcription methods including 6 research assistants (3 from UA and 3 from UTRGV).
Date: 8/21/2020
Primary URL: CoBiVa Blog
Preliminary Analysis of Speed & Ease of Use of Technologically-Aided Transcription Methods (Blog Post)
Title: Preliminary Analysis of Speed & Ease of Use of Technologically-Aided Transcription Methods
Author: Katherine Christoffersen
Abstract: This blog post reports the preliminary analyses of speed and ease of use from the Summer 2020 trials with 6 RAs.
Date: 9/25/2020
Primary URL: CoBiVa Blog
UTRGV at LASSO 2020 Virtual Conference (Blog Post)
Title: UTRGV at LASSO 2020 Virtual Conference
Author: Katherine Christoffersen
Abstract: This blog post provides an update on the presentation at LASSO 2020.
Date: 9/28/2020
Primary URL: CoBiVa Blog
Accuracy of Technologically-Aided Transcription Methods (Blog Post)
Title: Accuracy of Technologically-Aided Transcription Methods
Author: Katherine Christoffersen
Abstract: This blog post reports the analysis of accuracy for Microsoft Stream and SpeechNotes.
Date: 3/15/2021
Primary URL: CoBiVa Blog
CoBiVa at Virtual AAAL 2021 (Blog Post)
Title: CoBiVa at Virtual AAAL 2021
Author: Katherine Christoffersen
Abstract: This blog post reports on the presentation of the study's findings at AAAL 2021.
Date: 4/1/2021
Primary URL: CoBiVa Blog
Revising Stream Transcripts to WEBVTT with R (Blog Post)
Title: Revising Stream Transcripts to WEBVTT with R
Author: Katherine Christoffersen
Abstract: This blog post provides instructions for running the 2 scripts in RStudio to revise autogenerated transcripts from Stream. It also links to GitHub.
Date: 5/1/2021
Primary URL: CoBiVa Blog
Summer 2021 CoBiVa Team (Blog Post)
Title: Summer 2021 CoBiVa Team
Author: Katherine Christoffersen
Abstract: This blog post introduces the RAs who were hired to work on CoBiVa, funded in part through NEH.
Date: 6/1/2021
Primary URL: CoBiVa Blog
Speed and Ease of Use for Technologically Aided Transcription Methods (Blog Post)
Title: Speed and Ease of Use for Technologically Aided Transcription Methods
Author: Katherine Christoffersen
Abstract: This blog post reports the speed and ease of use of technologically-aided transcription methods through comparison of Microsoft Stream and ExpressScribe among students at UA and UTRGV during internship courses in Spring 2021.
Date: 9/10/2021
Primary URL: CoBiVa Blog
CoBiVa at LASSO 2021 (Blog Post)
Title: CoBiVa at LASSO 2021
Author: Katherine Christoffersen
Abstract: This blog post reports on the presentation of the project's findings at LASSO 2021.
Date: 8/1/2021
Primary URL: CoBiVa Blog