Program

Digital Humanities: Digital Humanities Start-Up Grants

Period of Performance

5/1/2015 - 6/30/2017

Funding Totals

$60,000.00 (approved)
$59,788.71 (awarded)


Easing Entry and Improving Access to Computer-Assisted Text Analysis for the Humanities

FAIN: HD-228732-15

Wheaton College (Norton, MA 02766-2322)
Mark David LeBlanc (Project Director: September 2014 to June 2019)
Scott Kleinman (Co Project Director: March 2015 to June 2019)
Michael D. C. Drout (Co Project Director: March 2015 to June 2019)

The addition of several features to the Lexos software, including a set of instructional resources to help scholars and students understand the most appropriate uses for computational methods for text analysis.

The rapid digitization of texts presents both new opportunities and real barriers of entry to computer-assisted explorations of texts. The Lexos software developed by the Lexomics Project provides a simple, web-based workflow for text processing, statistical analysis, and visualization designed to address these barriers. The project will support Lexos' core strength as an entry-level tool while seeking to position it as an innovative intervention in Digital Humanities conversations about the interplay of machine learning and text analysis. The project will embed dialogue about the use of computational methods to study humanities data in the tool itself through our "In the Margins" feature to collect and disseminate discussions of the problems, solutions, and best practices for using computational methods for text analysis.



Media Coverage

Breaking down the bard: Lexomics project tackles question of collaboration in Shakespeare’s ‘Henry VIII,’ ‘Two Noble Kinsmen’ (Media Coverage)
Author(s): Communications Department
Publication: Wheaton College News and Events
Date: 12/11/2014
Abstract: Ann Marie Brasacchio ’16 and Elizabeth Peterson ’15, under the guidance of Professor of English Michael Drout, have been working to unravel that mystery using Wheaton’s Lexomics program to analyze how words and phrases are used in these plays.
URL: http://wheatoncollege.edu/news/2014/12/11/breaking-down-the-bard/?utm_source=WheatonWeekDec11&utm_medium=email&utm_content=bard&utm_campaign=WheatonWeek

Developing scholarly tools: Wheaton research team wins grant from the NEH (Media Coverage)
Author(s): Communications Department
Publication: Wheaton College (News and Events)
Date: 4/16/2015
Abstract: A Wheaton College-based research team plans to make computer text analysis more accessible to humanities scholars and students with a grant from the National Endowment for the Humanities (NEH).
URL: http://wheatoncollege.edu/news/2015/04/16/developing-scholarly-tools/?utm_source=20150416&utm_medium=email&utm_content=lexos&utm_campaign=WheatonWeek

Resource: How to Create and Cluster Topic Files in Lexos (Review)
Author(s): Editors (Digital Humanities Now)
Publication: Digital Humanities Now
Date: 9/10/2015
Abstract: This post is a follow-up to last year’s How to Create Topic Clouds with Lexos, where I showed how Lexos can be used to visualise topic models produced by Mallet. From time to time, colleagues have wondered whether it would be possible to use Lexos to perform cluster analysis on the topics Mallet produces. The motivation for doing this is simple enough; topics are often very similar, and it would be useful to have some statistical measure of this similarity to help us decide where groups of topics really should be interpreted under some meta-class.
URL: http://digitalhumanitiesnow.org/2015/09/resource-how-to-create-and-cluster-topic-files-in-lexos/

CSUN Faculty Member Awarded Grant to Develop Literature Analysis Software (Media Coverage)
Author(s): Hansook Oh
Publication: CSUN Today
Date: 8/19/2015
Abstract: California State University, Northridge English professor Scott Kleinman is writing a tale of literature and computer science. Awarded a $60,000 grant from the National Endowment for the Humanities, Kleinman is working to further develop Lexos, a software tool that aids in the analysis and interpretation of literature. Lexos makes computational text analysis more easily accessible to scholars and students in the humanities, who may not have the time or resources to learn sophisticated computerized coding techniques.
URL: http://csunshinetoday.csun.edu/education/csun-faculty-member-awarded-grant-to-develop-literature-analysis-software/

Developing Scholarly Tools (Review)
Author(s): Communications Department
Publication: Wheaton Quarterly
Date: 9/15/2015
Abstract: The NEH SUG grant will provide funding for the Lexomics team, which includes Wheaton professors Michael Drout and Mark LeBlanc as well as California State University Northridge professor Scott Kleinman, to continue its efforts over the next two summers. It marks the third grant the team has received from the NEH. The team also has received support from the Mellon Foundation and from the college’s endowed funds.
URL: http://wheatoncollege.edu/quarterly/2015/09/15/developing-scholarly-tools/

Computing and the Digital Humanities (Review)
Author(s): LeBlanc, M.D.
Publication: NCWIT Teaching Paper
Date: 5/17/2016
Abstract: This paper introduces three assignments—each with their own “starter kits” for students—for those with a love of the written (and digital) word. Programming on and with digitized texts introduces students to rich new areas of scholarship including stylometry (i.e., the statistical analysis of variations in literary style between one writer or genre and another), applied to, for example, authorship attribution.
URL: https://www.engage-csedu.org/find-resources/computing-and-digital-humanities



Associated Products

DNA and Mandarin: Bringing introductory programming to the Life Sciences and Digital Humanities (Conference Paper/Presentation)
Title: DNA and Mandarin: Bringing introductory programming to the Life Sciences and Digital Humanities
Author: LeBlanc, M.D. and Drout, M.D.C.
Abstract: The ability to write software (to script, to program, to code) is a vital skill for students and their future data-centric, multidisciplinary careers. We present a ten-year effort to teach introductory programming skills in domain-focused courses to students across divisions in our liberal arts college. By creatively working with colleagues in Biology, Statistics, and now English, we have designed, modified, and offered six iterations of two courses: “DNA” and “Computing for Poets”. Larger percentages of women have consistently enrolled in these two courses vs. the traditional first course in the major. We share our open source course materials and present here our use of a blended learning classroom that leverages the increasing quality of online video lectures and programming practice sites in an attempt to maximize faculty-student interactions in class.
Date: 06/02/2015
Primary URL: http://www.sciencedirect.com/science/article/pii/S1877050915012661
Primary URL Description: Procedia Computer Science Volume 51, 2015, Pages 1937–1946
Conference Name: International Conference On Computational Science, ICCS 2015 — Computational Science at the Gates of Nature

Exploring Digitized Texts: The Digital Humanities as Makers (Public Lecture or Presentation)
Title: Exploring Digitized Texts: The Digital Humanities as Makers
Abstract: Post-graduation, students are faced with two questions: (1) What do you know? and (2) What can you make? Liberal Arts Colleges, in particular, do an excellent job of helping students to answer the first; however, we struggle with the second. This talk provides examples of how the "maker movement" on our campus intersects with our on-going work in the digital humanities. The rapid digitization of texts presents both new opportunities and real barriers of entry to computer-assisted explorations of texts. We present our open-source Lexos software that provides a simple, web-based workflow for text processing, statistical analysis, and visualization. Lexos was created for use with small to medium-sized collections of texts (rather than large text corpora or “big data”) and expands the range of statistical and visualization methods within reach of Humanities students and scholars, particularly those who are just learning to employ computational techniques in their work. In addition to examples using the software to explore texts from Old English to classical Chinese, the talk will share how research with undergraduates fuels our interdisciplinary teaching and how our teaching generates new avenues for exploration.
Author: LeBlanc, M.D.
Date: 09/10/2015
Location: Denison University, Granville, OH

Lexomic Analysis of Beowulf: Part 1, Cluster Analysis. (Book)
Title: Lexomic Analysis of Beowulf: Part 1, Cluster Analysis.
Author: Michael D.C. Drout, Yvette Kisor, Leah Smith, Allison Dennett and Natasha Piirainen.
Abstract: Michael D.C. Drout, Yvette Kisor, Leah Smith, Allison Dennett and Natasha Piirainen (in press, 2016). Lexomic Analysis of Beowulf: Part 1, Cluster Analysis. New York: Palgrave Pivot, 2016. Application of cluster analysis to Beowulf.
Year: 2016
Type: Multi-author monograph
Copy sent to NEH?: No

Tracking the Moving Ratio of þ to ð in Anglo-Saxon Texts: A New Method, and Evidence for a lost Old English version of the ‘Song of the Three Youths.’ (Article)
Title: Tracking the Moving Ratio of þ to ð in Anglo-Saxon Texts: A New Method, and Evidence for a lost Old English version of the ‘Song of the Three Youths.’
Author: Michael D.C. Drout and Elie Chauvet
Abstract: This paper demonstrates that a plot of the continuously rolling ratio of the characters þ to ð can be used to identify sections of Anglo-Saxon poems whose transmission histories differ from each other. Substantial differences in values of the function φ = þ/(þ+ð) are correlated both with the division between Genesis A and B and with the boundaries of a section of Genesis A that has a source other than the Latin Bible, thus validating the approach. The φ-plot of Daniel contains an anomaly that begins at lines 362–364, precisely those lines of the poem that are paralleled in the runic inscription on the newly discovered Honington Clip, described by John Hines in this issue of Anglia (pp. 257–277). The confluence of the evidence of the archeological find, the φ-analysis, previous traditional and computer-assisted analysis, and recent art historical work by Phyllis Portnoy leads to the conclusion that both the runic inscription and lines 362–408 of Daniel derive from a lost Old English “Song of the Three Youths”. Further investigation shows that the methods can be applied successfully to the Exeter Book poem Azarias, which shares a common ancestor with Daniel. The approach can also be extended by plotting the correlation between φ and the frequency of dental fricatives in final position in Old English texts. The methods therefore not only identify a specific lost Old English source but also possess a potential general utility for reconstructing the histories of Anglo-Saxon texts.
Year: 2015
Primary URL: http://www.degruyter.com/view/j/angl.2015.133.issue-2/ang-2015-0024/ang-2015-0024.xml
Format: Journal
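
Illustration (not part of the article): the rolling ratio þ/(þ+ð) named in the abstract above could be computed roughly as in the Python sketch below. The function name, the fixed-width character window, the one-character step, and the choice to report None where a window contains neither letter are all assumptions made for this example; the article's actual window definition is not stated in this record.

```python
from typing import List, Optional


def rolling_thorn_ratio(text: str, window: int) -> List[Optional[float]]:
    """Return thorn / (thorn + eth) for every window of `window` characters.

    Hypothetical helper: slides a fixed-width character window one
    position at a time. A window with neither letter yields None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    lowered = text.lower()  # fold capital Þ and Ð into þ and ð
    ratios: List[Optional[float]] = []
    for start in range(max(len(lowered) - window + 1, 0)):
        chunk = lowered[start:start + window]
        thorn = chunk.count("þ")
        eth = chunk.count("ð")
        total = thorn + eth
        ratios.append(thorn / total if total else None)
    return ratios
```

Plotting the returned values against window position would give a curve of the kind the abstract describes; a real study would also need to decide how to treat gaps and manuscript line boundaries.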

Lexos: a workflow for text analysis (Computer Program)
Title: Lexos: a workflow for text analysis
Author: LeBlanc, M.D.
Author: Kleinman, Scott
Abstract: Lexos is an integrated workflow of tools to facilitate the computational analysis of texts, presented in a web-based interface. Functionality includes the ability to "scrub" texts (remove punctuation, lemmatize, consolidate characters, remove stopwords, etc.), to cut or segment texts, and a suite of options for analysis and visualization, including creating and downloading Document Term Matrices (DTM) of token counts (both word- and character-ngrams or tf-idf); cluster analysis (hierarchical or k-means, with silhouette scores); rolling-window analyses of substring, word, or regex-pattern occurrences; bubble visualizations (of term frequencies); and word clouds (of term frequencies or MALLET-produced topic modelling results). More functionality is being added on an ongoing basis.
Year: 2015
Primary URL: https://github.com/WheatonCS/Lexos
Programming Language/Platform: Python (with JavaScript and HTML/CSS). Runs in the browser via the web or as a local installation on macOS, Windows, or Linux.
Source Available?: Yes
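
Illustration (not Lexos source code): the "scrub" and Document Term Matrix steps named in the abstract above can be sketched with the Python standard library alone. The function names scrub and document_term_matrix, the regex-based tokenizer, and the sample stopword handling are assumptions for this example and do not reflect how Lexos itself implements these features.

```python
import re
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def scrub(text: str, stopwords: FrozenSet[str] = frozenset()) -> List[str]:
    """Lowercase the text, drop punctuation, and remove stopwords."""
    tokens = re.findall(r"\w+", text.lower())  # runs of word characters
    return [t for t in tokens if t not in stopwords]


def document_term_matrix(
    docs: Dict[str, str], stopwords: FrozenSet[str] = frozenset()
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (sorted vocabulary, {document name: raw counts per term})."""
    counts = {name: Counter(scrub(body, stopwords)) for name, body in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    matrix = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, matrix
```

Each row of the resulting matrix holds raw token counts for one document; weighting such as tf-idf or the clustering mentioned in the abstract would be applied on top of a matrix like this.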

Course Materials for Teaching Introductory Programming for Text Analysis (Course or Curricular Material)
Title: Course Materials for Teaching Introductory Programming for Text Analysis
Author: LeBlanc, M.D.
Abstract: Five assignments in a semester-long CS-1-like course named Computing for Poets to introduce students to programming within one area of the digital humanities: the application of computing to the study of digitized texts. The course exposes students to leading markup languages (HTML, CSS, XML) and teaches computer programming (Python) as a vehicle to explore and “data mine” digitized texts. Programming facilitates top-down thinking and practice with computational thinking skills such as problem decomposition, algorithmic thinking, and experimental design - topics that humanities students in our experience rarely see. A learning objective for students in this course is to articulate how computational analyses of digitized texts enable both a “close reading” of a single text and a “distant reading” of many texts across time. The goal for each student is to master enough programming to modify digitized texts to help in a computational experiment that explores a question of a text or set of texts.
Year: 2015
Primary URL: https://www.engage-csedu.org/search/site/LeBlanc
Audience: Undergraduate

Beowulf: An Intensive 5-Day Seminar (Conference/Institute/Seminar)
Title: Beowulf: An Intensive 5-Day Seminar
Author: Michael D.C. Drout.
Abstract: An Intensive 5-Day Seminar, highlighting the application of Lexos to digitized texts.
Date Range: December 28, 2014-January 2, 2015 and May 13-15, 2016
Location: Schooling for Life, Los Angeles, CA.

A Reconsideration of the Relationship Between Víga-Glúms Saga and Reykdœla Saga: New Evidence from Lexomic Analysis (Article)
Title: A Reconsideration of the Relationship Between Víga-Glúms Saga and Reykdœla Saga: New Evidence from Lexomic Analysis
Author: Rosetta Berger and Michael D.C. Drout
Abstract: Reconsideration of the Relationship Between Víga-Glúms Saga and Reykdœla Saga: New Evidence from Lexomic Analysis.
Year: 2015
Format: Journal
Periodical Title: Viking and Medieval Scandinavia

Topic Modeling Ancient Chinese Texts: Knowledge Discovery in Databases for Humanists (Article)
Title: Topic Modeling Ancient Chinese Texts: Knowledge Discovery in Databases for Humanists
Author: Scott Kleinman, R. Nichols, K. Nielbo, E. Slingerland, U. Bergeton, and C. Logan
Abstract: tbd
Year: 2016
Format: Journal
Periodical Title: Journal of Asian Studies

Lexomics Across the Academy (Conference/Institute/Seminar)
Title: Lexomics Across the Academy
Author: Mike Drout, Mark LeBlanc, and three Wheaton undergraduates
Abstract: Hands-on sessions for faculty at Johnson C. Smith University (JCSU, Charlotte, NC) over a two-day period.
Date Range: Oct. 5-7, 2016
Location: JCSU, Charlotte, NC

Text Analysis with Lexos (Conference/Institute/Seminar)
Title: Text Analysis with Lexos
Author: Scott Kleinman
Abstract: Hands-on Lexos workshop in Nepal.
Date Range: June 2017
Location: Institute of Advanced Communication, Education and Research, Kathmandu, Nepal

Bringing Computational Thinking to the Digital Humanities: Introducing Students to Explorations of Digitized Texts (Conference/Institute/Seminar)
Title: Bringing Computational Thinking to the Digital Humanities: Introducing Students to Explorations of Digitized Texts
Author: Mark D. LeBlanc
Abstract: Our work on Lexos as an entry-level tool for scholars of digitized texts is presently funded by the National Endowment for the Humanities (NEH) and reflects six years of development and testing, including use in our interdisciplinary undergraduate courses. More information on our successful use in the classroom and in our own research (Beowulf, classical Chinese, Tolkien, Poe, etc) is available at our website: http://lexomics.wheatoncollege.edu. The software for Lexos is available on our github repo: https://github.com/WheatonCS/Lexos. Introductions and Agenda (which we will modify as we go :) I. Lexos play (50 minutes) II. Review of computational techniques and “teachable moments” (30 minutes) III. Discussion of outreach to the Digital Humanities on our campuses (20 minutes) IV. Pair-programming – probing a “fav” set of texts (50 minutes)
Date Range: April 7, 2017
Location: CCSCNE 2017 - The College of Saint Rose, Albany, NY
Primary URL: http://lexos.wheatoncollege.edu

Lexos: Easing Entry to Computational Studies with Digitized Texts (Conference/Institute/Seminar)
Title: Lexos: Easing Entry to Computational Studies with Digitized Texts
Author: Mark D. LeBlanc, Kate Boylan, Mike Drout
Abstract: In our experience, scholars who might like to perform computational analysis in their areas of expertise and/or wish to teach their students how to do so become discouraged too early in the game. This workshop will provide hands-on exposure to and practice with the free, open-source, web-based tool Lexos, including course materials that we have used in our interdisciplinary courses; our software is available at our GitHub repo. The workshop goal is to lower the barriers to computer-assisted text analysis over a broad range of texts, including pre-modern and non-Western languages. Lexos requires no prerequisites to use; in fact, one takeaway from the workshop is to stimulate your ideas for the many different ways to introduce students to computational analyses of texts. Participants are encouraged to arrive with a folder of text files of interest (raw text files, .txt, HTML, or XML required; .pdf and .docx formats are not handled). Our work on Lexos as an entry-level tool for scholars of digitized texts is presently funded by the National Endowment for the Humanities (NEH), Wheaton College (Norton, MA), and the Center for the Digital Humanities at California State University, Northridge, and reflects six years of development and testing, including use in our undergraduate classrooms. More information on our successful use in the classroom and in our own research (presently in Beowulf, classical Chinese, Tolkien, Poe, etc.) is available at our website. This session will be taught by Wheaton College’s Michael D.C. Drout, Professor of English, Director of the Center for the Study of the Medieval; Mark D. LeBlanc, Prof. of Computer Science; and Kate Boylan, Digital Initiatives Librarian.
Date Range: March 20, 2017
Location: Boston College Libraries Coffee & Code series, O’Neill Library, Digital Studio
Primary URL: https://ds.bc.edu/2017-spring-events/

Toward Reproducibility in DH Experiments: A Case Study in Search of Edgar Allan Poe’s First Published Work (Conference Paper/Presentation)
Title: Toward Reproducibility in DH Experiments: A Case Study in Search of Edgar Allan Poe’s First Published Work
Author: Mark D. LeBlanc
Abstract: Reproducing experimental results is a hallmark of empirical investigation and serves both to verify and inspire. This paper is a call for more systematic documentation of computational stylistic experiments. Publishing only summaries of the methods and results of empirical work is an artifact of traditional print media. To facilitate experimental reproducibility and to help the growing community who wish to learn how to apply computational methods and subsequently teach the next generation of scholars, the publication of results must include (i) access to the digitized texts, (ii) a clear workflow and most essentially (iii) the source code that led to each and all of the experimental results. By way of example, we present the steps and process in a GitHub repository for computationally probing the unknown and contested authorship of an 1831 short story entitled “A Dream” as we seek evidence if this work is similar to other attributed works by Edgar Allan Poe. The entire framework is intended as a pedagogical jumpstart for others, especially those new to computational stylometry. If Poe did write the story, it would be his first published work.
Date: 08/10/2017
Primary URL: https://dh2017.adho.org/abstracts/027/027.pdf
Conference Name: Digital Humanities (DH) 2017

Lexos 2017: building reliable software in python (Article)
Title: Lexos 2017: building reliable software in python
Author: Mark D. LeBlanc
Author: Cheng Zhang
Author: Weiqi Feng
Author: Emma Steffens
Author: Alvaro de Landaluce
Author: Scott Kleinman
Abstract: Refactoring software is challenging, but necessary to ensure software correctness and extensibility. We present a plan that blends automated tools and human reviews when refactoring the back-end of a web-based application. The Lexos software, developed by the NEH-funded Lexomics Project, provides a simple, web-based workflow for text processing, statistical analysis, and visualization of results when exploring digitized texts. The development of Lexos spans six years and includes over fifty undergraduate developers, many who assumed leadership roles in architectural design and systems engineering over three software releases. This paper shares our current refactoring effort on the Python backend to produce Lexos v3.2, an effort that includes a transition from Python v2.7 to Python v3.6. Good software engineering practices guide the effort, including the use of type hinting, a Model-View-Control pattern, PEP 8 code and PEP 257 documentation styles, unit testing, and continuous integration.
Year: 2018
Primary URL: https://dl.acm.org/citation.cfm?id=3205205&dl=ACM&coll=DL
Access Model: Subscription only
Format: Journal
Periodical Title: Journal of Computing Sciences in Colleges
Publisher: Consortium for Computing Sciences in Colleges

Digital Humanities Projects with Small and Unusual Data: Some Experiences from (Conference Paper/Presentation)
Title: Digital Humanities Projects with Small and Unusual Data: Some Experiences from
Author: Scott Kleinman
Abstract: What constitutes small and unusual will mean different things to different people, so keep in mind that [the author will] always be speaking in relative terms. [The author's] working definition of small and unusual data will be texts and languages that are typically not used for developing and testing the tools, methods, and techniques used for Big Data analysis. [The author will] be using “Big Data” as [a] straw man, even though most data sets in the Humanities are much smaller than those for which the term is typically used in other fields. But [the author] want[s] to distinguish the types of data [the author] will be discussing from the large corpora of hundreds or thousands of novels in Modern English which are the basis of important Digital Humanities work. [The author will] also be primarily concerned with the application of machine-learning, statistical, and quantitative approaches to the analysis of unstructured texts, which forms one part of what might be called the core of activity in the Digital Humanities. But the issues [the author will] be addressing overlap significantly with other DH activities such as the application of linked data and digital editing.
Date: 2/5/2016
Primary URL: http://scottkleinman.net/blog/2016/03/15/digital-humanities-projects-with-small-and-unusual-data/
Primary URL Description: Modified transcript of the talk.
Conference Name: Symposium on Data Science and Digital Humanities

Lexomics (Web Resource)
Title: Lexomics
Author: Mark LeBlanc
Author: Scott Kleinman
Author: Michael Drout
Abstract: Website for the Lexomics project
Year: 2015
Primary URL: https://wheatoncollege.edu/academics/special-projects-initiatives/lexomics/

Lexos (Web Resource)
Title: Lexos
Author: Mark LeBlanc
Author: Scott Kleinman
Author: Michael Drout
Abstract: Website for the Lexos software project
Year: 2015
Primary URL: http://lexos.wheatoncollege.edu
Secondary URL: https://github.com/WheatonCS/Lexos