HT-272570-20 | Digital Humanities: Institutes for Advanced Topics in the Digital Humanities | Princeton University | New Languages for NLP: Building Linguistic Diversity in the Digital Humanities | 9/1/2020 - 8/31/2024 | $239,983.00 | Natalia | | Ermolaev | Andrew | | Janco | Princeton University | Princeton | NJ | 08540-5228 | USA | 2020 | Computational Linguistics | Institutes for Advanced Topics in the Digital Humanities | Digital Humanities | 239983 | 0 | 229080.63 | 0 | An institute to help humanities scholars learn how to create linguistic data and apply statistical models to new languages.
Natural Language Processing (NLP) has revolutionized our ability to interpret texts at scale and is an essential tool for scholars in the digital humanities. However, only a small percentage of the world’s languages are supported by the major NLP libraries. The New Languages for NLP Institute will help scholars with expertise in less-resourced languages create linguistic data and train NLP models for their languages. In three workshops, held at the Center for Digital Humanities at Princeton University in 2021-2022, participants will create linguistic data, train statistical language models for new languages, and learn best practices in project and research data management. As an outcome of the project, participants will publish an open dataset in the standard Conference on Computational Natural Language Learning (CoNLL) format, as well as a trained language model that can be used for computational text analysis. |