A Text Analysis Tool for Examining Stylistic Similarities in Narrative Collections
FAIN: HD-51244-11
University of California, Berkeley (Berkeley, CA 94704-5940)
Bryan E. Wagner (Project Director: October 2010 to February 2013)
Development of a text analysis tool for examining and visualizing grammatical and stylistic features to assist authorship identification.
Increasing numbers of primary and secondary source texts have been digitized in recent years. Scholars who want to study these new collections in depth need computational assistance because of their large scale. The non-programmer tools for text analysis currently available operate at the word level, and they show tables of counts and lists of occurrences, but rarely interactive visualizations. We propose to build a text analysis tool that includes visualizations and works on the grammatical structure and stylistic features of text, applying highly accurate technology from computational linguistics and authorship identification to extract this information. We will develop our tool for a collection of slave narratives whose authorship is ambiguous. In doing so, we will find out whether visualizations of grammatical and stylistic features are useful to literary scholars, and whether this information allows them to make satisfying large-scale analyses of their text.
Media Coverage
Trimming time in the stacks (Media Coverage)
Author(s): Nicole Freeling
Publication: University of California | Research | Explore stories
Date: 12/20/2011
Abstract: A sophisticated text-analyzing tool developed by a UC Berkeley graduate student could speed literary searches for humanities scholars and other researchers.
URL: http://research.universityofcalifornia.edu/stories/2011/12/wordseer.html
"We need tools that can help people have their ideas faster" (Media Coverage)
Author(s): Mac Slocum
Publication: O'Reilly Radar
Date: 2/17/2011
Abstract: Aditi Muralidharan on improving discovery and building intuition into search.
URL: http://radar.oreilly.com/2011/02/aditi-muralidharan-wordseer.html
Review of WordSeer, produced by Aditi Muralidharan, Marti Hearst, and Bryan Wagner (Review)
Author(s): Amy Earhart
Publication: Journal of Digital Humanities
Date: 4/10/2012
Abstract: The WordSeer tool, developed at the University of California, Berkeley by Aditi Muralidharan and Marti Hearst with research partner Bryan Wagner, is an exploratory analysis or “sensemaking” environment for literary texts. The tool is based on an understanding of literary analysis as a cyclical, rather than a linear, process, a notion that has been underemphasized in tool development where visualizations and datamining have generally been seen as exposing the text for scholarly treatment. WordSeer allows you to read a text, search for relationships between words and phrases, examine grammatical relationships, and examine the heat map and tree visualizations it produces.
URL: http://journalofdigitalhumanities.org/1-1/wordseer/
Associated Products
WordSeer (Computer Program)
Title: WordSeer
Author: Aditi Muralidharan
Author: Marti A. Hearst
Author: Bryan Wagner
Abstract: WordSeer is an exploratory text analysis environment for literature study. It is currently available for three collections of text, and will become generally available in late 2012.
Year: 2011
Primary URL: http://wordseer.berkeley.edu
Primary URL Description: UC Berkeley web application
Access Model: Open source
Source Available?: Yes
A visual interface for exploring language use in slave narratives (Conference Paper/Presentation)
Title: A visual interface for exploring language use in slave narratives
Author: Aditi Muralidharan
Abstract: The increasing prevalence of digitized source material in the humanities has led to uncertainty about how this suddenly available information will change scholars' research methods. What balance will scholars strike between in-depth examination of a few sources, and a more "distant reading" (Moretti 2005) of a large number of them? Our focus is specifically on text collections: comparing texts, and identifying and tracing patterns of language use. These tasks are not widely supported by any current software, but if humanities researchers want to use digitized text collections on a larger scale, they will need to do exactly such things.
We restrict ourselves to a particular collection: the North American antebellum slave narratives, written by fugitive slaves in the decades before the Civil War with the support of abolitionist sponsors. Scholars agree about the slave narrative's most basic conventions, but these narratives, with their extreme repetitiveness, likely manifest other regular features that scholars have yet to detect. This project aims to assist literary scholars in uncovering these patterns with computational techniques.
In collaboration with English scholars, we have built WordSeer, a system that can compare two or more narratives' grammatical features and analyze the distribution of textual patterns throughout an entire collection. Our goal is for English scholars to be able to use our system to gather accurate information about language use patterns in a way that is intuitive and natural to them.
We will present the system currently under development, and share the lessons we have learned while building a text exploration interface for use in the humanities.
Date: 2011-06-21
Primary URL: http://dh2011abstracts.stanford.edu/xtf/view?docId=tei/ab-324.xml;query=;brand=default
Primary URL Description: Full text at conference website.
Conference Name: Digital Humanities 2011
WordSeer: Exploring Language Use in Literary Text (Conference Paper/Presentation)
Title: WordSeer: Exploring Language Use in Literary Text
Author: Aditi Muralidharan
Author: Marti A. Hearst
Abstract: Increasing numbers of primary and secondary source texts in the
humanities have been digitized in recent years. Humanities
scholars who want to study these new collections in depth need
computational assistance because of their large scale. We have
built WordSeer, a text analysis tool that includes visualizations
and works on the grammatical structure of text extracted using
highly accurate off-the-shelf natural language processing tools. We
have focused on the task of exploring language use patterns in a
collection of North American slave narratives, but the technique is
applicable to any text collection. Our preliminary user studies
with humanities scholars show that WordSeer makes it easier for
them to translate their questions into queries and find answers to
their questions compared to a standard keyword-based search
interface. In this paper, we present the system currently under
development and describe text analysis features we plan to
include in the next iteration.
Date: 2011-10-20
Primary URL: https://docs.google.com/a/kent.edu/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NmU4YWM1ODc5NGE0MGY
Primary URL Description: PDF at the conference website
Conference Name: HCIR 2011 Conference on Human-Computer Interaction and Information Retrieval
Large Scale Text Analysis in the Humanities: Methods and Challenges (Conference/Institute/Seminar)
Title: Large Scale Text Analysis in the Humanities: Methods and Challenges
Author: Aditi Muralidharan
Abstract: To tackle increasingly large digitized archives of text, the digital humanities community has responded with an avid interest in text mining and visualization. Everywhere one looks these days, computer scientists are bringing text analysis to humanities scholars with tutorials, workshops, and toolkits. Nevertheless, crucial information is being lost in translation. If text analysis toolkits are to be truly successful, information needs to start flowing the other way, and computer scientists must learn from humanities scholars what humanistic text analysis really means. If not, they will continue making "natural" assumptions that do not always translate into the humanities: for example, that concepts like "question", "hypothesis", "data", and "evidence" are always well-defined in scholars' minds and are universal to all analysis. In the extreme case, this misalignment of basic assumptions could lead to fleets of powerful text analysis tools that nobody knows how to actually apply to humanistic analysis.
Date Range: 2011-09-20
Location: Maryland Institute for Technology in the Humanities, B0131 McKeldin Library, College Park, MD
Primary URL: http://mith.umd.edu/podcast/?podcast=112
Primary URL Description: Podcast and Slides on the Institute's web page.
Building Intuition Into Search (Radio/Audio Broadcast or Recording)
Title: Building Intuition Into Search
Writer: Aditi Muralidharan
Abstract: If the semantic web becomes an everyday reality, we’d have a web that would produce better answers for us. But what if you’re not sure of the question in the first place? Aditi Muralidharan wanted to know if it was possible to create a search engine that will intuitively know as it’s searching what you actually need. Nora speaks to Aditi about her project WordSeer – a search tool that analyzes language patterns in an effort to build intuition into the search engine. (Runs 5:57)
Date: 2011-06-26
Primary URL: http://www.cbc.ca/spark/2011/06/spark-153%E2%80%93-june-26-29-2011/
Primary URL Description: Description of segment on Canadian science radio show Spark's website.
Format: Radio
Men and Women in Shakespeare (Blog Post)
Title: Men and Women in Shakespeare
Author: Aditi Muralidharan
Abstract: In previous posts, I’ve shown how WordSeer can be used to explore small, well-defined questions: what word did Shakespeare use for ‘beautiful’? Is the occurrence of the word ‘love’ the same in the comedies and tragedies? This post is different. WordSeer has now developed enough to support a simple, but complete, exploratory analysis.
The question we’ll think about is this:
“How does the portrayal of men and women in Shakespeare’s plays change under different circumstances?”
As one answer, we’ll see how WordSeer suggests that when love is a major plot point, the language referring to women changes to become more physical, and the language referring to men becomes more sentimental.
Date: 2012-01-24
Primary URL: http://mininghumanities.com/2012/01/24/men-and-women-in-shakespeare/
Primary URL Description: Blog post at mininghumanities.com
Blog Title: Text Mining and the Digital Humanities
Website: http://mininghumanities.com
WordSeer: “love” in Shakespeare’s tragedies and comedies (Blog Post)
Title: WordSeer: “love” in Shakespeare’s tragedies and comedies
Author: Aditi Muralidharan
Abstract: When scholars try to make sense out of large collections of text, they frequently do two things: compare, and collect. They collect samples of “interesting” things, and compare them with each other along various relevant dimensions.
In this post, I demonstrate the collection and comparison features of WordSeer by using it to compare the usage of the word “love” in Shakespeare’s comedies and tragedies. You can watch the screencast, or simply read on.
Date: 2011-12-15
Primary URL: http://mininghumanities.com/2011/12/15/wordseer-compares-love-in-tragedies-and-comedies/
Primary URL Description: Blog post at mininghumanities.com
Blog Title: Text Mining and the Digital Humanities
Website: http://mininghumanities.com
“Beautiful” in Shakespeare (Blog Post)
Title: “Beautiful” in Shakespeare
Author: Aditi Muralidharan
Abstract: A common problem in search and exploration interfaces is the vocabulary problem. This refers to the great variety of words that different people can use to describe the same concept. For people exploring a text collection, this makes search difficult. There are only a limited number of different queries they can think of to describe that concept, but they may be missing many other instances that use different words. This is an important issue for humanities scholars. Often, the very first step of a literature analysis is to comb through text, trying to find thought-provoking examples to study later.
In this post, I give an example of how our project WordSeer, a text analysis environment for humanities scholars, can be used to overcome this problem. In this example, I’ll be using an instance of WordSeer running on the complete works of Shakespeare from the Internet Shakespeare Editions. It’s live, so you can follow along with this example on the web at wordseer.berkeley.edu/shakespeare.
Date: 2011-12-07
Primary URL: http://mininghumanities.com/2011/12/07/beautiful-in-shakespeare/
Primary URL Description: Blog post at mininghumanities.com
Blog Title: Text Mining and the Digital Humanities
Website: http://mininghumanities.com
Sensemaking for Literature Study (Conference Paper/Presentation)
Title: Sensemaking for Literature Study
Author: Aditi Muralidharan
Abstract: We present WordSeer, an exploratory analysis environment for literary text. Literature study is a cycle of reading, interpretation, exploration, and understanding. While there is now abundant technological support for reading and interpreting literary text in new ways through text-processing algorithms, the other parts of the cycle -- exploration and understanding -- have been relatively neglected. We are motivated by the literature on sensemaking, an area of computer science devoted to supporting open-ended analysis on large collections of data.
Date: 04/06/2012
Primary URL: http://www.eecs.berkeley.edu/~aditi/presentations/sensemaking-for-literature-study.pdf
Primary URL Description: Presentation PDF at http://eecs.berkeley.edu
Conference Name: Stanford University Weekly HCI Lunch
(to appear) Supporting Exploratory Text Analysis in Literature Study (Article)
Title: (to appear) Supporting Exploratory Text Analysis in Literature Study
Author: Aditi Muralidharan
Author: Marti A. Hearst
Abstract: We present WordSeer, an exploratory analysis environment for literary text. Literature study is
a cycle of reading, interpretation, exploration, and understanding. While there is now abundant technological
support for reading and interpreting literary text in new ways through text-processing algorithms, the other
parts of the cycle -- exploration and understanding -- have been relatively neglected. We are motivated
by the literature on sensemaking, an area of computer science devoted to supporting open-ended analysis
on large collections of data. Our software system integrates tools for algorithmic processing of text
with interaction techniques that support the interpretive, exploratory, and note-taking aspects of scholarship.
At present, the system supports grammatical search and contextual similarity determination, visualization of
patterns of word context, and examination and organization of the source material for
comparison and hypothesis-building. This article illustrates its capabilities by analyzing
language use differences between male and female characters in Shakespeare's plays. We find that when
love is a major plot point, the language Shakespeare uses to refer to women becomes more physical,
and the language referring to men becomes more sentimental. Future work will incorporate additional
sensemaking tools to aid comparison, exploration, grouping, and pattern recognition.
Year: 2012
Primary URL: http://www.eecs.berkeley.edu/~aditi/papers/llc-sensemaking.pdf
Primary URL Description: Draft of PDF at eecs.berkeley.edu
Access Model: Subscription
Format: Journal
Periodical Title: Literary and Linguistic Computing
Publisher: Oxford Journals