Program

Research Programs: Dynamic Language Infrastructure-Documenting Endangered Languages - Fellowships

Period of Performance

10/1/2020 - 9/30/2021

Funding Totals

$60,000.00 (approved)
$60,000.00 (awarded)


Documentation and speech corpus development for Gitksan [git]

FAIN: FN-271111-20

Clarissa Forbes
Arizona Board of Regents (Tucson, AZ 85721-0073)

Translation and annotation of audio recordings for a community- and scholar-accessible online repository of the Native American Gitksan language spoken in Alaska and British Columbia.

Gitksan is the traditional language of the Gitxsan people of Alaska and the northern interior of British Columbia. It is the easternmost member of the Tsimshianic family and is highly endangered, with an estimated 300-500 native speakers, the youngest of whom are in their late 50s. Language shift toward English is well underway, in large part due to the effects of the Canadian residential school policy of the 20th century, making the need for language documentation increasingly urgent. Essentially all documentary work on Gitksan has been conducted in the last 40 years; existing resources include an unpublished grammar, a few short lessons and stories, and several wordlists and phrasebooks of varying levels of detail. Many areas remain undocumented.
The project's primary goal is the development of an online text repository with several functions:
1) a community-accessible body of narratives and conversations,
2) a base of sample sentences for an existing community-accessible online dictionary in active development at the University of British Columbia, and
3) a corpus for linguists working on Gitksan to view long-form narrative or conversational data. (Edited by staff)



Media Coverage

Visiting Scholar Receives $60,000 NEH Fellowship Award to Document Endangered Language (Media Coverage)
Author(s): Kristina Makansi
Publication: University of Arizona News Releases
Date: 8/13/2020
Abstract: Clarissa Forbes’ work on Gitksan embodies University of Arizona’s dedication to preserving Indigenous languages.
URL: https://sbs.arizona.edu/news/visiting-scholar-receives-60000-neh-fellowship-award-document-endangered-language



Associated Products

Gitksan FST (Computer Program)
Title: Gitksan FST
Author: Clarissa Forbes
Abstract: A finite-state morphological transducer for the Gitksan language (Tsimshianic, BC) implemented in foma. Lexical items are primarily drawn from Hindle & Rigsby (1975).
Year: 2021
Primary URL: https://github.com/caforbes/git_fst
Primary URL Description: GitHub source code
Access Model: open source program
Programming Language/Platform: foma, python
Source Available?: Yes

An FST morphological analyzer for the Gitksan language (Conference Paper/Presentation)
Title: An FST morphological analyzer for the Gitksan language
Author: Clarissa Forbes
Author: Garrett Nicolai
Author: Miikka Silfverberg
Abstract: This paper presents a finite-state morphological analyzer for the Gitksan language. The analyzer draws from a 1250-token Eastern dialect wordlist. It is based on finite-state technology and additionally includes two extensions which can provide analyses for out-of-vocabulary words: rules for generating predictable dialect variants, and a neural guesser component. The pre-neural analyzer, tested against interlinear-annotated texts from multiple dialects, achieves coverage of 75-81% and maintains high precision (95-100%). The neural extension improves coverage at the cost of lowered precision.
Date: 08/15/2021
Primary URL: https://sigmorphon.github.io/workshops/2021/2021_SIGMORPHON_Proceedings.pdf
Primary URL Description: Proceedings of SIGMORPHON 2021
Conference Name: SIGMORPHON

An FST morphological analyzer for the Gitksan language. (Conference Paper/Presentation)
Title: An FST morphological analyzer for the Gitksan language.
Author: Clarissa Forbes
Author: Garrett Nicolai
Author: Miikka Silfverberg
Abstract: This paper provides a case study of the ongoing development of a finite-state morphological analyzer for the Gitksan language. The analyzer draws from a 1250-token Eastern dialect wordlist, and was tested against interlinear-annotated texts from multiple dialects. Analyzer coverage varies across dialects (50-70%), but it maintains high precision (95-100%).
Date: 06/11/2021
Conference Name: AmericasNLP
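
Illustrative note: the abstracts above report analyzer coverage and precision measured against annotated texts. The sketch below shows one common way such figures could be computed. It is not taken from the papers or from the git_fst repository. The definitions are assumptions (coverage as the share of tokens that receive at least one analysis, precision as the share of proposed analyses that match the gold annotation), and the function names, token labels, and tags are hypothetical placeholders, not real Gitksan data.

```python
from typing import Dict, List, Set, Tuple


def coverage(analyses: Dict[str, Set[str]], tokens: List[str]) -> float:
    """Share of tokens for which the analyzer proposes at least one analysis."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if analyses.get(token))
    return covered / len(tokens)


def precision(analyses: Dict[str, Set[str]], gold: List[Tuple[str, str]]) -> float:
    """Share of proposed analyses that match the gold annotation of their token."""
    proposed = 0
    correct = 0
    for token, gold_analysis in gold:
        for analysis in analyses.get(token, set()):
            proposed += 1
            if analysis == gold_analysis:
                correct += 1
    return correct / proposed if proposed else 0.0


# Hypothetical analyzer output: token -> set of proposed analyses.
toy_analyses = {
    "w1": {"ROOT-PL"},
    "w2": {"ROOT-3SG", "ROOT-TR"},  # ambiguous: two candidate analyses
    "w3": set(),                    # known token, but no analysis produced
}

# Hypothetical gold annotations: (token, correct analysis).
toy_gold = [
    ("w1", "ROOT-PL"),
    ("w2", "ROOT-3SG"),
    ("w3", "ROOT"),
    ("w4", "ROOT-PL"),  # out-of-vocabulary token
]

toy_tokens = [token for token, _ in toy_gold]

print(f"coverage:  {coverage(toy_analyses, toy_tokens):.2f}")
print(f"precision: {precision(toy_analyses, toy_gold):.2f}")
```

Under these assumed definitions, adding a guesser for unanalyzed tokens would raise coverage while risking more incorrect analyses, which is consistent with the trade-off the SIGMORPHON abstract describes for the neural extension.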