Documentation and dictionary of Oro Win (orw)
FAIN: FN-266285-19
Joshua Birchall
Unaffiliated independent scholar
Video recordings and preparation of a multimedia dictionary and associated Android app for Oro Win, an indigenous Amazonian language with currently only six fully fluent speakers
Oro Win is a member of the Chapacuran language family spoken along the
headwaters of the Pacaás Novos River in the Brazilian state of Rondônia in southwestern
Amazonia. There are currently six elderly native speakers of the Oro Win language and another
twelve community members who can be considered semi-speakers from an ethnic population
of approximately 120 individuals. There are currently no published dictionaries of any Chapacuran
language, and the need for this type of work to be carried out with the community
is especially urgent.
This project has three primary objectives: (1) to train indigenous researchers so that they
have the knowledge and skills to document and study their own language; (2) to develop an
extensive and multifaceted documentary corpus of the Oro Win language in close collaboration
with native researchers through a participatory community-based model of language
documentation; (3) to use this corpus to produce a multimedia dictionary for the indigenous
and academic communities that includes examples for lexical entries from actual language
use. All materials will be archived at the Museu Paraense Emílio Goeldi, a Brazilian federal
research institute, with a copy deposited at the Archive for Indigenous Languages of Latin
America at the University of Texas (AILLA).
This project will produce the first published dictionary of a Chapacuran
language. Oro Win retains
a number of conservative grammatical and phonological features not found in Wari', the last
Chapacuran language still being learned by children as a first language. This project is an
opportunity to document the natural speech and lexical knowledge of the last generation of
Oro Win who learned the language as children and still use it in their daily lives. Increased documentation of the Oro Win language and
culture can help expand our knowledge about the regional ethnolinguistic landscape. (Edited by staff)
Associated Products
OroWin-ORW (Database/Archive/Digital Edition)
Title: OroWin-ORW
Author: Joshua Birchall
Abstract: Multimedia digital language archive at the Acervo de Línguas Indígenas do MPEG. Museu Paraense Emílio Goeldi: Belém.
Year: 2022
Primary URL:
https://arqling.museu-goeldi.br/
Access Model: Metadata is public but access to resources is by request, approval and agreeing to terms and conditions of use.
Oro Win Dictionary Collection of Joshua Birchall (Database/Archive/Digital Edition)
Title: Oro Win Dictionary Collection of Joshua Birchall
Author: Joshua Birchall
Abstract: Multimedia language documentation archive collection in The Archive of the Indigenous Languages of Latin America at the University of Texas in Austin.
Year: 2022
Primary URL:
https://www.ailla.utexas.org/islandora/object/ailla%3A285356
Primary URL Description: Direct link to collection within https://www.ailla.utexas.org.
Access Model: Metadata is public, but project resources are currently under a one-year embargo; access must be requested while a subsequent version of the dictionary is being developed.
csv2rmd: Um programa python para produzir dicionários multimídia com Markdown (Computer Program)
Title: csv2rmd: Um programa python para produzir dicionários multimídia com Markdown
Author: Saulo Brito
Author: Joshua Birchall
Abstract: csv2rmd: a Python program to produce multimedia dictionaries with Markdown.
Year: 2022
Primary URL:
https://doi.org/10.5281/zenodo.6642438
Primary URL Description: Alpha public release (version 0.1)
Secondary URL:
https://github.com/SauloRTB/Csv2Rmd
Secondary URL Description: GitHub repository
Access Model: open access
Programming Language/Platform: Python
Source Available?: Yes
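Illustrative sketch: the general idea of producing dictionary entries in Markdown from tabular data can be shown in a few lines of Python. This is a minimal sketch under stated assumptions, not the csv2rmd implementation: the column names (headword, pos, gloss, audio), both function names and the sample rows are hypothetical and chosen only for illustration.

```python
import csv
import io


def entry_to_markdown(row):
    """Render one lexical entry (a dict of CSV fields) as a Markdown block.

    The field names used here are assumptions for this sketch only.
    """
    lines = [f"## {row['headword']}", ""]
    if row.get("pos"):
        lines.append(f"*{row['pos']}*")
        lines.append("")
    lines.append(row["gloss"])
    if row.get("audio"):
        # Plain Markdown link to the recording; a real tool might embed a player.
        lines.append("")
        lines.append(f"[audio]({row['audio']})")
    return "\n".join(lines)


def csv_to_markdown(csv_text):
    """Convert CSV text with a header row into a single Markdown document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n\n".join(entry_to_markdown(row) for row in reader) + "\n"


# Placeholder rows, not actual Oro Win vocabulary.
SAMPLE = (
    "headword,pos,gloss,audio\n"
    "word1,n,first gloss,audio/word1.wav\n"
    "word2,,second gloss,\n"
)

if __name__ == "__main__":
    print(csv_to_markdown(SAMPLE))
```

Keeping the lexical data in a plain CSV file and generating the Markdown from it means the same source can be re-rendered whenever entries change, which fits the portability and modularity goals described in the associated presentations.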
Dicionário Oro Win (Computer Program)
Title: Dicionário Oro Win
Author: Joshua Birchall
Author: Silvania Oro Eo' Cabixi
Author: Luciano Oro Win
Abstract: Multimedia dictionary app developed with the Oro Win community of Brazil.
Year: 2022
Access Model: Currently only available through physical transfer until a revised version is released.
Programming Language/Platform: APK
Source Available?: No
Developing workflows for community-based lexicography (Conference Paper/Presentation)
Title: Developing workflows for community-based lexicography
Author: Joshua Birchall
Abstract: In classic lexicography, dictionaries are primarily static documents printed on paper and compiled by a single researcher. Modern community-based lexicography, on the other hand, is generally team-based, cross-platform, dynamic and multimedia. Multimedia dictionaries are often developed as a component of projects on the documentation of language use within its cultural context, usually combined with judicious elicitation of missing lexical items and examples of their usage. However, to produce multimedia dictionaries, one has often needed to rely on either advanced computer programming skills or the use of a suite of software packages where it is time-consuming to insert the data and difficult to get it out in a form that is usable for other aspects of research. This leads to the following question: how can we develop a workflow for community-based lexicography that can integrate materials from a variety of sources while still maintaining the integrity of the original data in a way that is accessible to most linguists?
In this talk, I show how to apply the data science principles of portability, transparency, modularity and version control to modern lexicography so that it can move beyond this labor-intensive or programming-intensive dichotomy. Drawing on examples from three ongoing research projects with different indigenous communities in Brazilian Amazonia, namely Oro Win (Chapacuran), Sakurabiat (Tupian) and Kanoê (isolate), I outline a workflow based on these principles with three main advantages: (i) it requires only basic knowledge of data science practices instead of advanced programming knowledge; (ii) it relies on free and open-source software and data standards; and (iii) it easily fits into the context of a larger documentation project by taking advantage of other common aspects of the work, such as transcription in ELAN. I also showcase a few new tools currently being developed to further aid in this process.
Integrating data science principles into workf
Date: 11/11/2022
Conference Name: 15th High Desert Linguistics Society
Incorporating data science principles into modern lexicography (Conference Paper/Presentation)
Title: Incorporating data science principles into modern lexicography
Author: Joshua Birchall
Abstract: When developing multimedia dictionaries, one has often needed to rely on either advanced computer programming skills or the use of software where it is time-consuming to insert the data and difficult to get it out in a form that is usable for other aspects of research. In this talk, I show how the data science principles of portability, transparency and modularity can be applied to modern lexicography to move beyond this labor-intensive or programming-intensive dichotomy. Drawing on examples from three ongoing projects with different communities in Brazilian Amazonia, namely Oro Win (Chapacuran), Sakurabiat (Tupian) and Kanoê (isolate), I outline a workflow based on these principles that uses free open-source software and can easily fit into the context of a larger documentation project by taking advantage of other common aspects of the work, such as transcription in ELAN. A data science approach to modern lexicography can help to streamline the production of multimedia dictionaries and ensure that lexicographers are able to make their research more readily available to prospective users.
Keywords: Lexicography, Indigenous languages, Language documentation
Date: 10/25/2022
Conference Name: University of Oregon Linguistics Department Colloquium Series
Producing community-oriented language materials with Markdown (Conference Paper/Presentation)
Title: Producing community-oriented language materials with Markdown
Author: Joshua Birchall
Abstract: The last few decades have seen an enormous increase in the amount of ethnolinguistic documentation being produced with language communities all over the globe. Besides using this documentation for linguistic research, there is a huge opportunity to transform these new resources into products that are immediately useful and accessible to the communities themselves through local electronic publishing.
In this talk I discuss the preliminary results of two ongoing projects in Brazil to develop community-oriented language materials through incorporation of archived multimedia: a dictionary with the Oro Win (Chapacuran) and a cultural encyclopedia with the Aikanã (isolate). Both of these projects are being prepared using Markdown, a text-based markup language that is quickly becoming the standard for internet publishing. This talk outlines an open-source workflow using Markdown that can immediately produce HTML and (non-multimedia) PDF versions of these materials and discusses the possible role of local publishing in the toolkit of documentary linguists.
Date: 3/27/2022
Conference Name: UC Berkeley Fieldwork Forum
Dicionários multimídia, planejamento linguístico e práticas de revitalização de línguas (Conference Paper/Presentation)
Title: Dicionários multimídia, planejamento linguístico e práticas de revitalização de línguas
Author: Ivan Rocha
Author: Joshua Birchall
Author: Ana Vilacy Galucio
Abstract: This work discusses the role of multimedia dictionaries in language planning within revitalization practices for endangered indigenous languages. Of the 150 languages in Brazil, around 40% to 50% are endangered and the rest are in a vulnerable situation. Language planning includes the assessment and description of status (situation), corpus (existing, documented and described material) and acquisition (mode of language transmission and available products) (cf. Amaral, 2020). In these planning stages, multimedia dictionaries emerge as a tool to aid in the revitalization of a language, and can be useful both in the process of language transmission and for documentation and description.
The main objective of this work is to address the importance of multimedia dictionaries as a multifunctional, multipurpose product, and to present the methodology and technology involved in their preparation. We also aim to describe the macro- and microstructures adopted in these works, which are decided according to the needs of speakers or potential speakers of the language.
The use of technology such as smartphones, tablets or PCs, although still incipient, is a reality in many indigenous communities, which favors the implementation of dictionaries in digital formats whose lexical entries can be illustrated with multimedia resources (in addition to conceptual information, definitions, transcription, semantic fields, usage examples, etc.). These digital formats have advantages over printed formats, such as the distribution and easy access of apps among smartphone or tablet users, reduced costs and flexibility in organizing the macrostructure.
The computational tools used in the process of preparing these multimedia resources are software for linguistic transcription and annotation (ELAN) and for the management, analysis and storage of lexical, textual and multimedia data (FLEx), the use of Python scripts for extracting data from FLEx and the automat
Date: 7/5/2021
Conference Name: 68º Seminário do Grupo de Estudos Linguísticos
Construyendo una nueva lexicografía en las tierras bajas (Conference Paper/Presentation)
Title: Construyendo una nueva lexicografía en las tierras bajas
Author: Joshua Birchall
Abstract: Invited plenary lecture of the I Congreso Nacional de Lexicología y Lexicografía de Bolivia
Date: 12/4/2021
Conference Name: I Congreso Nacional de Lexicología y Lexicografía de Bolivia
Multipurpose lexical databases for lexicography and historical linguistics (Conference Paper/Presentation)
Title: Multipurpose lexical databases for lexicography and historical linguistics
Author: Ana Vilacy Galucio
Author: Joshua Birchall
Abstract: As home to one of the largest digital language archives in Latin America, the Museu Goeldi has been developing an open-source workflow to produce multimedia dictionaries from the collections as community reference materials. We also have a long tradition of historical linguistics, both computational and traditional. Both of these endeavors rest crucially on the development of lexical databases. In this talk we present the current digital infrastructure we have developed and discuss issues in data handling and digital publication, concluding with a wishlist for suggestions and possible collaborations from the workshop participants.
Date: 9/23/2020
Conference Name: Untangling the linguistic past of the Americas: Collaborative efforts and interdisciplinary approaches in an open science framework