Program

Digital Humanities: Digital Humanities Implementation Grants

Period of Performance

9/1/2012 - 5/31/2015

Funding Totals

$228,546.00 (approved)
$227,509.39 (awarded)


WordSeer: A Text Analysis Environment for Literature Study

FAIN: HK-50011-12

University of California, Berkeley (Berkeley, CA 94704-5940)
Marti Hearst (Project Director: January 2012 to May 2016)

Further development of the WordSeer platform, which provides computational analysis and visualization tools for literary researchers. The platform will be available for general use and will also include three new case studies based on three different text collections: interviews and writings of North American slaves (University of California, Berkeley); the works of Stephen Crane (Emory University); and the complete works of Shakespeare (University of Calgary).

This project will build on the success of a Digital Humanities Startup grant (HD-51244-11) to produce a software environment for literary text analysis. Literature study is a cycle of reading, interpretation, exploration, and understanding. The software system, called WordSeer, integrates tools for automated processing of text with interaction techniques that support the interpretive, exploratory, and note-taking aspects of scholarship. Development of the tool follows best practices in user-centered design and evaluation. At present, the system supports grammatical search, contextual similarity determination, and visualization of patterns of word context. This implementation grant will allow the project to incorporate additional tools to aid comparison, exploration, grouping, and hypothesis formation, and to make the software more robust, and therefore shareable and usable by a wide community of scholars.

Associated Products

WordSeer Website (Web Resource)
Title: WordSeer Website
Author: Aditi Muralidharan
Abstract: A Text Analysis Environment for Humanities Scholars. WordSeer is a text analysis environment that combines visualization, information retrieval, sensemaking and natural language processing to make the contents of text navigable, accessible, and useful.
Year: 2011
Primary URL: http://wordseer.berkeley.edu/
Primary URL Description: Website for the WordSeer Project

WordSeer Open Source Software (Computer Program)
Title: WordSeer Open Source Software
Author: Aditi Muralidharan
Author: Ian MacFarland
Author: Hassan Jannah
Author: David Bendebury
Author: Raymon Sutedjo-The
Abstract: WordSeer is a text analysis environment that combines visualization, information retrieval, sensemaking and natural language processing to make the contents of text navigable, accessible, and useful. You can run WordSeer on your local machine or on a shared server, and you access it through the modern web browser of your choice (though Google Chrome works best). If you're a scholar interested in using WordSeer for your research, read on. If you want to learn more about how to contribute to WordSeer's ongoing design and development, take a look at our Guidelines for Contributors. You can find much more information on WordSeer, including demo videos, use case studies, and background research, at wordseer.berkeley.edu.
Year: 2016
Primary URL: https://github.com/Wordseer/wordseer/
Primary URL Description: Location of the open source code of WordSeer 4.0 on GitHub.
Secondary URL: http://wordseer.berkeley.edu/wordseer-4-0/
Secondary URL Description: Location of code for WordSeer 4.0 from the WordSeer site.
Programming Language/Platform: Python, JavaScript
Source Available?: Yes

WordSeer 4.0 Demonstration Videos (Blog Post)
Title: WordSeer 4.0 Demonstration Videos
Author: Marti Hearst
Abstract: WordSeer 4.0 How-To Videos
Date: 2/24/2016
Primary URL: http://wordseer.berkeley.edu/wordseer-4-0-how-to-videos/
Primary URL Description: Location of the how-to videos on the WordSeer site.
Blog Title: WordSeer 4.0 How-To Videos
Website: WordSeer

Improving the Recognizability of Syntactic Relations Using Contextualized Examples (Conference Paper/Presentation)
Title: Improving the Recognizability of Syntactic Relations Using Contextualized Examples
Author: Aditi Muralidharan
Author: Marti Hearst
Abstract: A common task in qualitative data analysis is to characterize the usage of a linguistic entity by issuing queries over syntactic relations between words. Previous interfaces for searching over syntactic structures require programming-style queries. User interface research suggests that it is easier to recognize a pattern than to compose it from scratch; therefore, interfaces for non-experts should show previews of syntactic relations. What these previews should look like is an open question, which we explored with a 400-participant Mechanical Turk experiment. We found that syntactic relations are recognized with 34% higher accuracy when contextual examples are shown than under a baseline that names the relations alone. This suggests that user interfaces should display contextual examples of syntactic relations to help users choose between different relations.
Date: 6/22/2014
Primary URL: http://people.ischool.berkeley.edu/~hearst/papers/acl14grammatical.pdf
Primary URL Description: URL for paper
Conference Name: ACL

Supporting Exploratory Text Analysis in Literature Study (Article)
Title: Supporting Exploratory Text Analysis in Literature Study
Author: Aditi Muralidharan
Author: Marti Hearst
Abstract: We present WordSeer, an exploratory analysis environment for literary text. Literature study is a cycle of reading, interpretation, exploration, and understanding. While there is now abundant technological support for reading and interpreting literary text in new ways through text-processing algorithms, the other parts of the cycle—exploration and understanding—have been relatively neglected. We are motivated by the literature on sensemaking, an area of computer science devoted to supporting open-ended analysis on large collections of data. Our software system integrates tools for algorithmic processing of text with interaction techniques that support the interpretive, exploratory, and note-taking aspects of scholarship. At present, the system supports grammatical search and contextual similarity determination, visualization of patterns of word context, and examination and organization of the source material for comparison and hypothesis building. This article illustrates its capabilities by analyzing language-use differences between male and female characters in Shakespeare’s plays. We find that when love is a major plot point, the language Shakespeare uses to refer to women becomes more physical, and the language referring to men becomes more sentimental. Future work will incorporate additional sensemaking tools to aid comparison, exploration, grouping, and pattern recognition.
Year: 2012
Primary URL: http://llc.oxfordjournals.org/content/28/2/283.full?keytype=ref&%2520ijkey=nzuBUietose8TgW
Primary URL Description: URL for paper
Access Model: subscription only
Format: Journal
Periodical Title: Literary and Linguistic Computing
Publisher: Oxford Journals

WordSeer: A Knowledge Synthesis Environment for Textual Data (Conference Paper/Presentation)
Title: WordSeer: A Knowledge Synthesis Environment for Textual Data
Author: Aditi Muralidharan
Author: Marti Hearst
Author: Christopher Fan
Abstract: We describe WordSeer, a tool whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional “close read” and the new “distant read” or “culturomics” approach. To this end, WordSeer allows for highly flexible “slicing and dicing” (hence “sliding”) across a text collection. The tool allows users to view text from different angles by selecting subsets of data, viewing those as visualizations, moving laterally to view other subsets of data, slicing into another view, expanding the viewed data by relaxing constraints, and so on. We illustrate the text sliding capabilities of the tool with examples from a case study in the field of humanities and social sciences – an analysis of how U.S. perceptions of China and Japan changed over the last 30 years.
Date: 10/27/2013
Primary URL: http://people.ischool.berkeley.edu/~hearst/papers/cikmdemo2013.pdf
Primary URL Description: URL for paper
Conference Name: CIKM