New Languages for NLP Course Materials (Course or Curricular Material)
Title: New Languages for NLP Course Materials
Author: Andrew Janco
Author: Natalia Ermolaev
Author: Toma Tasovac
Author: David Lassner
Author: Quinn Dombrowski
Author: Anubhav Sharma
Abstract: This site provides an open reference resource for participants during the workshops and serves as the first draft of materials for the online course. The course materials site has sections that present prerequisite skills and knowledge. It has an entry for each workshop session, with supporting information and instructions. The overall goal of the course materials site is to provide an ongoing reference work to support participants’ work and asynchronous learning.
Year: 2021
Primary URL: https://new-languages-for-nlp.github.io/course-materials/intro.html
Primary URL Description: This is the URL for the course materials.
Audience: Graduate
New Languages for NLP project website (Web Resource)
Title: New Languages for NLP project website
Author: Andrew Janco
Author: Natalia Ermolaev
Abstract: The project website serves as the public-facing source of information for the project. It articulates our aims and goals, as well as the significance of the project. One page describes our languages, team members, and research goals. The full schedules for our workshops are posted publicly on the site.
Year: 2021
Primary URL: https://newnlp.princeton.edu/
Primary URL Description: Project website URL
Cadet (Computer Program)
Title: Cadet
Author: Andrew Janco
Abstract: Cadet is an open-source Python web application that was created in 2021 by Andrew Janco to facilitate participants’ work and will be shared with the general public following the grant. The application facilitates the customization of language defaults for tokenization and lookups data. Cadet also uses token frequency to bulk annotate frequent unambiguous terms and to shorten the time needed for annotation.
Year: 2021
Primary URL: https://github.com/New-Languages-for-NLP/cadet
Primary URL Description: Source code for Cadet
Access Model: Open-source
Programming Language/Platform: Python
Source Available?: Yes
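The frequency-based bulk annotation described in the Cadet abstract can be illustrated with a minimal sketch. This is not Cadet's actual code: the function name bulk_annotate, the min_count threshold, and the lookup table format are all hypothetical, chosen only to show the idea of pre-filling labels for tokens that are both frequent and unambiguous.

```python
from collections import Counter


def bulk_annotate(tokens, lookup, min_count=2):
    """Pre-fill a label for tokens that are frequent and unambiguous.

    Hypothetical helper, not part of Cadet. `lookup` maps a token to the
    set of labels it may take; a token is treated as unambiguous when
    that set contains exactly one label.
    """
    counts = Counter(tokens)  # token frequency across the corpus
    annotations = []
    for tok in tokens:
        candidates = lookup.get(tok, set())
        if counts[tok] >= min_count and len(candidates) == 1:
            label = next(iter(candidates))  # the single possible label
        else:
            label = None  # left for a human annotator
        annotations.append((tok, label))
    return annotations


# Toy data, assumed for illustration only.
tokens = "the cat saw the dog and the cat ran".split()
lookup = {
    "the": {"DET"},
    "cat": {"NOUN"},
    "saw": {"VERB", "NOUN"},  # ambiguous, so never auto-labeled
    "dog": {"NOUN"},          # unambiguous but too rare at min_count=2
}
result = bulk_annotate(tokens, lookup)
```

In this sketch, tokens left as None are the ones an annotator still has to review by hand, so raising or lowering min_count trades annotation time against the risk of propagating a wrong label.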
Eisenstein (Computer Program)
Title: Eisenstein
Author: Andrew Janco
Abstract: “Eisenstein” is an open-source Python web application that was built in 2021 by Andrew Janco for participants who needed optical character recognition using Tesseract. This web application simplifies Tesseract text extraction in over one hundred languages.
Year: 2021
Primary URL: https://eisenstein.apjan.co/
Primary URL Description: User-facing website
Secondary URL: https://github.com/apjanco/eisenstein
Secondary URL Description: Source code
Access Model: Open-source
Programming Language/Platform: Python
Source Available?: Yes
Permalink: https://apps.neh.gov/publicquery/products.aspx?gn=HT-272570-20