Program

Digital Humanities: Institutes for Advanced Topics in the Digital Humanities

Period of Performance

10/1/2018 - 9/30/2022

Funding Totals

$197,385.00 (approved)
$197,385.00 (awarded)


Word Vectors for the Thoughtful Humanist: Institutes on Critical Teaching and Research with Vector Space Models

FAIN: HT-261812-18

Northeastern University (Boston, MA 02115-5005)
Julia Hammond Flanders (Project Director: March 2018 to present)
Sarah Connell (Co-Project Director: June 2018 to present)

A series of four three-day institutes, hosted by Northeastern University, for a total of 72 participants on the use of word embedding models for textual analysis.

The Northeastern University Women Writers Project seeks funding for a three-year institute series on word embedding models, to overcome barriers to entry for humanist researchers and teachers. We plan four institutes in all: two aimed at teachers and two aimed at researchers, with a novice and intermediate event for each audience. Each event will be followed by a three-month period of virtual discussion and consultation with WWP staff and fellow participants, and sharing of research and teaching outcomes.





Associated Products

Women Writers Vector Toolkit (Web Resource)
Title: Women Writers Vector Toolkit
Author: Ashley Clark
Author: Parth Tandel
Author: Parmeet Singh Saluja
Author: Syd Bauman
Author: Sarah Connell
Author: Molly Nebiolo
Author: Cara Messina
Author: William Reed Quinn
Abstract: The Women Writers Vector Toolkit (WWVT) is an online laboratory for learning about and experimenting with word embedding models, housed at Northeastern University and part of the Women Writers Project’s WWP Lab. This site is an accessible and transparent space for users with many levels of experience in text analysis. The site includes detailed documentation in order to: describe the processes and choices behind the regularization of the texts used to train the models in the Word Vector Interface, publish code samples, and share assignments and case studies that demonstrate teaching and research applications for word embedding models.
Year: 2018
Primary URL: https://www.wwp.northeastern.edu/lab/wwvt/index.html
Primary URL Description: URL for the WWVT site.

Women Writers Vector Toolkit (Web Resource)
Title: Women Writers Vector Toolkit
Author: Sarah Connell
Author: Ashley Clark
Author: Parth Tandel
Author: Parmeet Singh Saluja
Author: Molly Nebiolo
Author: William Reed Quinn
Author: Cara Messina
Author: Syd Bauman
Abstract: The Women Writers Vector Toolkit (WWVT) is an online laboratory for learning about and experimenting with word embedding models, housed at Northeastern University and part of the Women Writers Project’s WWP Lab. The site is an accessible and transparent space for users with many levels of experience in text analysis. The site includes detailed documentation in order to: describe the processes and choices behind the regularization of the texts used to train the models in the Word Vector Interface, publish code samples, and share assignments and case studies that demonstrate teaching and research applications for word embedding models.
Year: 2018
Primary URL: https://www.wwp.northeastern.edu/lab/wwvt/index.html
Primary URL Description: Main WWVT site

Textual Re-Modeling: TEI Transformation for Word Embedding Models (Conference Paper/Presentation)
Title: Textual Re-Modeling: TEI Transformation for Word Embedding Models
Author: Sarah Connell
Abstract: This paper presented research on methods for transforming TEI documents for use with word embedding models, and offered a case study involving a sub-corpus of the Women Writers Project textbase.
Date: 11/10/2019
Primary URL: https://voices.uchicago.edu/dhcs2019/program/
Primary URL Description: Conference program
Conference Name: Chicago Colloquium on Digital Humanities and Computer Science

Encoding the Archive: Building Collaborative Digital Editions in an Early Modern Classroom (Conference Paper/Presentation)
Title: Encoding the Archive: Building Collaborative Digital Editions in an Early Modern Classroom
Author: Connell, Sarah
Abstract: This paper will discuss some lessons and challenges from the “Encoding the Archive” project, in which students work in teams to publish archival documents as part of a class on gender and early modern bodies at Northeastern University, taught by Professor Marina Leslie. Each student encodes an individual document according to the standards of the Text Encoding Initiative, but the teams make collective decisions about how to handle regularization, annotation, and representation of document structures and features. Student teams also decide how to display their documents online and author a group editorial declaration. This model allows students to contextualize their individual documents and encourages them to become invested in the editorial process, as they must reach consensus about many potential approaches to modeling and publishing their texts. This paper will share the project’s design considerations, adjustments over several iterations, student feedback and outcomes, and potential adaptations for other contexts.
Date: 04/20/2021
Primary URL: https://www.rsa.org/page/Virtual2021
Primary URL Description: Web site for the virtual RSA 2021 conference
Conference Name: Renaissance Society of America

Reading Models, Modelling Reading: Digital Texts and Human Readers (Conference Paper/Presentation)
Title: Reading Models, Modelling Reading: Digital Texts and Human Readers
Author: Flanders, Julia
Abstract: Drawing on the work of the Women Writers Project (http://www.wwp.northeastern.edu), this presentation will explore the ways in which digital models and representation systems serve as proxies for human reading, or as intermediaries in that process. How do models such as text markup and algorithmic analysis such as word embedding models operate in dialogue with the written "source" text? How do we read these models, and how do these models themselves operate as readers? (And what is a "model"?) We'll examine the Restoration-era women's texts from Women Writers Online and also data about the materials read by these authors, building on the WWP's current "Word Vectors for the Thoughtful Humanist" Institutes and recently completed "Intertextual Networks" grant that explores how women cite and make use of what they read.
Date: 04/15/2021
Primary URL: https://www.huntington.org/restoration-women
Primary URL Description: Main URL for the virtual conference
Conference Name: “This Reading of Books Is a Pernicious Thing”: Restoration Women Writers and Their Readers

Experiments in Tokenization for Word Embedding Models (Blog Post)
Title: Experiments in Tokenization for Word Embedding Models
Author: Juniper Johnson
Abstract: The previous models developed for the Women Writers Vector Toolkit (WWVT) allow users to interact with a corpus only through single-word queries. While useful, this approach does have limitations: for instance, in querying multi-word terms such as place names (“New York”) or concepts (“women’s rights”). When a word embedding model is made using the wordVectors package, a textual corpus is tokenized into individual word-level units. Therefore, when the word embedding model is created, it forms a relational model between the words of a corpus, not phrases or concepts. After the first Word Vectors for the Thoughtful Humanist institute in 2019, the WWVT team sought to create expanded resources for scholars interested in word embedding models. One part of this project was to create a workflow that could provide opportunities to tokenize multi-word strings in the word embedding models.
Date: 4/21/2021
Primary URL: https://wwp.northeastern.edu/blog/tokenization-word-embedding-models/
Blog Title: Women Writers Project Blog
Website: Women Writers Project
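
Illustrative sketch (not taken from the blog post): one common way to let a word-level tokenizer treat a multi-word term as a single unit is to join its words with an underscore before tokenizing. The phrase list, function names, and tokenization rule below are hypothetical and chosen only for illustration; the WWVT team's actual workflow, built around the wordVectors package, may differ.

```python
import re

# Hypothetical phrase list; the post's actual phrase choices are not shown here.
PHRASES = ["New York", "women's rights"]


def join_phrases(text, phrases):
    """Replace each listed phrase with a single underscore-joined token."""
    # Handle longer phrases first so a shorter phrase cannot split a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        # Allow any run of whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        text = re.sub(pattern, "_".join(words), text, flags=re.IGNORECASE)
    return text


def tokenize(text):
    """Lowercase and split into word-level tokens, keeping underscores and apostrophes."""
    return re.findall(r"[a-z0-9_']+", text.lower())


sentence = "She moved to New York to argue for women's rights."
tokens = tokenize(join_phrases(sentence, PHRASES))
print(tokens)
```

Under these assumptions, "New York" reaches the model as the single token new_york, so it can be queried like any other word.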

Are the romantics to blame? Exploring the WWP WordVectors Code as a Word Vectors Novice (Blog Post)
Title: Are the romantics to blame? Exploring the WWP WordVectors Code as a Word Vectors Novice
Author: Emily Miller
Abstract: This blog post explores what it was like, as someone brand new to the toolset, to use the Women Writers Project's code for word-embedding models and to begin training and querying models on one's own corpus.
Date: 12/3/2021
Primary URL: https://wwp.northeastern.edu/blog/wordvectors-code-as-novice/
Blog Title: Women Writers Project Blog
Website: Women Writers Project

Exploring English Translations of the Passover Haggadah in Word2Vec (Blog Post)
Title: Exploring English Translations of the Passover Haggadah in Word2Vec
Author: Avraham Roos
Abstract: This blog post was excerpted from Avraham Roos’s dissertation, “Why is This Translation Different from All Other Translations? A Linguistic and Cultural-Historical Analysis of English Translations of the Passover Haggadah from 1770 to Now.” It uses several word2vec operations to experiment with language use in English translations of Hebrew texts for celebrating the Passover Seder.
Date: 5/3/2022
Primary URL: https://wwp.northeastern.edu/blog/exploring-english-translations-of-the-passover-haggadah-in-word2vec/
Primary URL Description: WWP Blog
Blog Title: Women Writers Project: The Blog
Website: Women Writers Project

Struggling to Teach with Word Vectors (Blog Post)
Title: Struggling to Teach with Word Vectors
Author: Hayley Stefan
Abstract: This post first outlines several successful research outcomes using word2vec to examine young adult novels related to school shootings, and then provides a critical discussion of the continuing barriers to teaching with computational methods in small institutions.
Date: 8/29/2022
Primary URL: https://wwp.northeastern.edu/blog/stefan-word-vectors/
Primary URL Description: WWP Blog
Blog Title: Women Writers Project: The Blog
Website: Women Writers Project

TEI, Transformation, and Text Analysis: Building a Markup-based Toolkit for Word Embedding Models (Conference Paper/Presentation)
Title: TEI, Transformation, and Text Analysis: Building a Markup-based Toolkit for Word Embedding Models
Author: Sarah Connell
Abstract: This paper will share insights gained from building a toolkit that uses text encoding to improve corpus creation for text analysis, with a web interface that is designed for theoretically-grounded experimentation in algorithmic text analysis. The Women Writers Project is currently developing the Women Writers Vector Toolkit (beta link at https://wwp.northeastern.edu/wwo/lab/wwvt/, final version to be published in December 2018), an interface that will allow users to explore several different word embedding models trained on texts from Text Encoding Initiative (TEI) corpora that include Women Writers Online, the Victorian Women Writers Project, and Early English Books Online. Word embedding models are a powerful method for studying relationships between words in large corpora, but training and querying them requires knowledge of a computer programming language, such as Python or R.
Date: 07/25/2019
Primary URL: https://drive.google.com/drive/folders/1RjjVyAEeZi-7IfH0vSECYHT6KsO4o8GI
Primary URL Description: Public-facing slides and notes
Conference Name: Association for Computers and the Humanities