Program

Digital Humanities: Digital Humanities Implementation Grants

Period of Performance

9/1/2014 - 8/31/2017

Funding Totals

$324,841.00 (approved)
$324,841.00 (awarded)


Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm Project

FAIN: HK-50176-14

Board of Trustees of the University of Illinois (Champaign, IL 61801-3620)
J. Stephen Downie (Project Director: February 2014 to present)
Erez Lieberman-Aiden (Co Project Director: July 2014 to present)

The enhancement and integration of the Bookworm analytical tool with the HathiTrust Digital Library, which holds 3.9 billion pages of digitized materials. Scholars would be able to build individual collections of materials to be studied and to discover new textual use patterns across the corpus.

The HathiTrust + Bookworm (HT+BW) Project provides scholars with new ways to explore trends within the massive HathiTrust corpus. Detailed exploration of metadata facets adds analytic value beyond tools such as the Google Ngram Viewer. It enables scholars to explore personal worksets and aids the discovery of new works. It will help the HathiTrust Research Center provide computational access to the HathiTrust corpus. Open-source improvements to the Bookworm code will increase its value to other large text projects.





Associated Products

HathiTrust + Bookworm (HT+BW): A Collaboration between University of Illinois at Urbana-Champaign, Indiana University, Northeastern University, Baylor College of Medicine, and Rice University (Conference Paper/Presentation)
Title: HathiTrust + Bookworm (HT+BW): A Collaboration between University of Illinois at Urbana-Champaign, Indiana University, Northeastern University, Baylor College of Medicine, and Rice University
Author: J. Stephen Downie
Author: Erez Lieberman Aiden
Author: Benjamin Schmidt
Author: Robert McDonald
Author: Loretta Auvil
Author: Sayan Bhattacharyya
Author: Colleen Fallaw
Author: Peter Organisciak
Author: Muhammad Shamim
Abstract: Bookworm, developed by the Cultural Observatory, is a tool that visualizes chronological trends in lexical usage within collections of digitized texts with metadata facets. The goal of this collaboration is to implement an enhanced open-source version of Bookworm that will enable culturomic exploration of the corpus of the HathiTrust Digital Library (HTDL), which represents a significant part of mankind's cultural legacy. Project participants include developers of the Google Ngram Viewer. The HT+BW project applies a similar idea to the vast collection of digitized text in the HTDL. The HTDL’s metadata is being leveraged for sophisticated trend analysis at a fine level of granularity. The HTDL consists of materials from the holdings of some of the pre-eminent research libraries in the world, and currently consists of over 13 million volumes of digitized text, of which over 4.6 million are in the public domain.
Date: 03/30/2015
Primary URL: https://drive.google.com/file/d/0ByTkFSkY8lF7TkRlNjdQc1o1VGc/view
Primary URL Description: Poster in .pdf format
Conference Name: 2015 HTRC UnCamp

Panel Discussion: The HathiTrust Research Center (Conference Paper/Presentation)
Title: Panel Discussion: The HathiTrust Research Center
Author: J. Stephen Downie
Author: Beth Plale
Author: Loretta Auvil
Author: Peter Organisciak
Abstract: While large corpora of digitized books provide unprecedented opportunities for novel modes of scholarly inquiry, through computational analysis, into the cultural legacy of mankind as preserved in print media, corpora that include books that are under copyright, and therefore cannot be made available to scholars in bulk form for computational analysis, present a special challenge. A second challenge for inquiry in the humanities is that the interface should provide scholars with tools for both distant and close reading. This panel will consist of four papers that present a complementary set of approaches to this problem, which individually provide scholarly users with interfaces to the text data along collection-centric, document-centric, and word/phrase-centric threads of inquiry into the corpus. The first paper will present the underlying cyberinfrastructural approach to this problem as embodied in the non-consumptive approach to scholarly research. The second paper will describe the collection-centric interface to the corpus via the notion of the workset, the customized subset of the corpus that a scholar gathers, curates, and sustains through version management, and will conceptually describe the data model that can be operationalized into an architecture and interface to support worksets. The third and fourth papers will describe the HTRC-Bookworm and HTRC-Feature-Extraction initiatives, which represent word/phrase-centric and document-centric approaches, respectively, to interfacing with the corpus.
Date: 10/23/2014
Primary URL: http://d2i.indiana.edu/node/16544
Primary URL Description: Event listing with full-length abstract.
Conference Name: 2014 Chicago Colloquium on Digital Humanities and Computer Science (DHCS)

The HathiTrust + Bookworm Project: Exploring Cultural and Literary Trends in Millions of Scanned Books (Conference Paper/Presentation)
Title: The HathiTrust + Bookworm Project: Exploring Cultural and Literary Trends in Millions of Scanned Books
Author: Peter Organisciak
Author: Loretta Auvil
Author: Benjamin Schmidt
Author: Sayan Bhattacharyya
Author: Colleen Fallaw
Author: Muhammad Saad Shamim
Author: Robert McDonald
Author: J. Stephen Downie
Author: Erez Lieberman Aiden
Abstract: Poster presentation introducing the HT+Bookworm project and the work completed up to that date.
Date: 4/3/15
Conference Name: Graduate School of Library and Information Science Research Showcase

Exploration of Billions of Words of the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm Project (Conference Paper/Presentation)
Title: Exploration of Billions of Words of the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm Project
Author: Loretta Auvil
Author: Erez Lieberman Aiden
Author: J. Stephen Downie
Author: Benjamin Schmidt
Author: Sayan Bhattacharyya
Author: Peter Organisciak
Abstract: An introduction to Bookworm and its scholarly applications, and a summary of next steps for the project.
Date: 7/1/15
Conference Name: Digital Humanities 2015

The Once and Future Past: Where Historical Record Digitization has been, and Where it is Going (Conference Paper/Presentation)
Title: The Once and Future Past: Where Historical Record Digitization has been, and Where it is Going
Author: Erez Lieberman Aiden
Abstract: Keynote address at HTRC UnCamp 2015 covering applications of Bookworm, and state of the art in the area of digital text visualization and analysis tools.
Date: 3/31/15
Conference Name: HTRC UnCamp 2015

HT+BW: HathiTrust+Bookworm (Conference Paper/Presentation)
Title: HT+BW: HathiTrust+Bookworm
Author: Loretta Auvil
Abstract: This session will discuss HT+BW, a project to integrate the HTDL corpus, processed at the HathiTrust Research Center (HTRC), with the Bookworm platform for text analysis, developed at the Cultural Observatory. Bookworm greatly extends the type of analysis that was popularized by the Google Ngram Viewer, making it possible to “slice and dice” the data in an arbitrary corpus, in real time, using a greatly enhanced set of content-based and metadata-based features. HT+BW will greatly increase the value of the HTRC because it will assist humanities scholars and students in their efforts to delve deeper into the HathiTrust corpus and to explore more complex, multi-faceted research questions.
Date: 4/18/15
Primary URL: http://dplafest2015.sched.org/event/a1cfbaca67fd71a2409d28d9b27b1351#.VgxDMXur3RI
Primary URL Description: Link to event entry on DPLAfest website.
Conference Name: DPLAfest 2015

The HathiTrust+Bookworm tool for lexical trend discovery (Conference/Institute/Seminar)
Title: The HathiTrust+Bookworm tool for lexical trend discovery
Author: Sayan Bhattacharyya
Author: Harriett Green
Abstract: The HathiTrust+Bookworm tool for discovering and plotting lexical trends: Scholarly research using the power of data and metadata. Bookworm is a tool for visualization and analysis. It is useful for plotting usage trends in collections of texts. The newly developed HathiTrust + Bookworm tool enables you to explore the texts from the HathiTrust Digital Library. The HathiTrust Digital Library consists of materials from the digitized holdings of some of the most important research libraries in the world, and currently consists of approximately twelve million physical volumes of text in digitized form. This workshop will teach attendees how to use the HathiTrust + Bookworm tool to discover word usage trends across English-language texts from HathiTrust. The attendees will learn how to create custom subsets of texts from the HathiTrust collection, and how to plot word trends with the Bookworm tool.
Date Range: 4/29/15
Location: University Library, University of Illinois at Urbana-Champaign
Primary URL: http://illinois.edu/calendar/detail/4068?eventId=32632315&calMin=201504&cal=20150408&skinId=7198
Primary URL Description: Link to event entry.

The HathiTrust+Bookworm tool for lexical trend discovery (Conference/Institute/Seminar)
Title: The HathiTrust+Bookworm tool for lexical trend discovery
Author: Sayan Bhattacharyya
Abstract: The HathiTrust+Bookworm tool for discovering and plotting lexical trends: Scholarly research using the power of data and metadata. Bookworm is a tool for visualization and analysis. It is useful for plotting usage trends in collections of texts. The newly developed HathiTrust + Bookworm tool enables you to explore the texts from the HathiTrust Digital Library. The HathiTrust Digital Library consists of materials from the digitized holdings of some of the most important research libraries in the world, and currently consists of approximately twelve million physical volumes of text in digitized form. This workshop will teach attendees how to use the HathiTrust + Bookworm tool to discover word usage trends across English-language texts from HathiTrust. The attendees will learn how to create custom subsets of texts from the HathiTrust collection, and how to plot word trends with the Bookworm tool.
Date Range: 4/30/15
Location: Virtual workshop

Text mining with the HathiTrust Research Center: An introduction to working with digitized text corpora and metadata (Conference/Institute/Seminar)
Title: Text mining with the HathiTrust Research Center: An introduction to working with digitized text corpora and metadata
Author: Sayan Bhattacharyya
Abstract: The workshop will provide a hands-on introduction to the HTDL collection and its metadata, and to the tools and functionalities developed by the HTRC that leverage these resources. Through the concrete instances of the HTRC tools, the workshop will orient attendees to the new challenges and opportunities that the ability to carry out algorithmic text analysis at such a large scale presents to researchers. The workshop will cover the Secure Hathi Analytics Research Commons (SHARC), the HathiTrust+Bookworm (HT+BW) tool and the HTRC Extracted Features Dataset. Attendees will be shown how to build their own worksets (small, customized subcorpora from the HathiTrust Digital Library corpus) and how to conduct analyses on worksets. There will also be a group discussion involving all attendees about the emerging questions that these novel developments are likely to raise in their own fields and about how these developments can affirm or disrupt (or both affirm and disrupt simultaneously) established practices of inquiry.
Date Range: 5/30/15
Location: HASTAC 16, Michigan State University
Primary URL: http://www.hastac2015.org/schedule/
Primary URL Description: Event listing for workshop

The HathiTrust Research Center: Large-scale Computational Analysis with the World’s First Massive Digital Library (Conference/Institute/Seminar)
Title: The HathiTrust Research Center: Large-scale Computational Analysis with the World’s First Massive Digital Library
Author: Sayan Bhattacharyya
Abstract: The HathiTrust (HT) is a research consortium and digital library consisting of more than 13 million volumes of digitized text, mostly from the world's foremost research libraries. A large part of this material is under copyright and hence not directly downloadable. The HathiTrust Research Center (HTRC) has started developing facilities for non-consumptive access to the text data by providing innovative means of analytical access to it without allowing downloads. The workshop is intended for students and researchers interested in textual analytics using the HTRC corpus. Sayan Bhattacharyya, Postdoctoral Research Fellow at the HTRC, will showcase different approaches and mechanisms for non-consumptive analysis of the HT corpus, such as a portal, an HTRC Data Capsule, an Extracted Features (EF) dataset based on a subset of the corpus, and an HT+Bookworm tool for trend analysis on the corpus.
Date Range: 7/13/15
Location: LSA's Biennial Linguistic Institute, The University of Chicago
Primary URL: https://lsa2015.uchicago.edu/events/hathitrust-research-center-large-scale-computational-analysis-world-s-first-massive-digital
Primary URL Description: Event listing on website.

Introduction to the HathiTrust Research Center (HTRC): Teaching and research using the power of data and metadata in large text corpora (Conference/Institute/Seminar)
Title: Introduction to the HathiTrust Research Center (HTRC): Teaching and research using the power of data and metadata in large text corpora
Author: Sayan Bhattacharyya
Author: Eleanor Dickson
Abstract: Introduction to using HTRC services and tools, including Bookworm, for the classroom.
Date Range: 7/28/15
Location: Humanities Intensive Learning and Teaching (HILT) 2015

HTRC Bookworm launched! (Blog Post)
Title: HTRC Bookworm launched!
Author: HT+Bookworm team
Abstract: Explore the demo from UnCamp 2015 over millions of texts!
Date: 3/30/15
Primary URL: https://htrcbookworm.wordpress.com/2015/03/30/htrc-bookworm-launched/
Blog Title: HT+BW
Website: HT+BW blog

LibGuide for the HT+Bookworm prototype is now available (Blog Post)
Title: LibGuide for the HT+Bookworm prototype is now available
Author: Sayan Bhattacharyya
Abstract: A LibGuide for the HT+Bookworm prototype has now been published. LibGuides are sets of web-based help pages compiled by library personnel. This particular LibGuide has been created by the Scholarly Commons of the Library of the University of Illinois, Urbana-Champaign.
Date: 3/20/15
Primary URL: https://htrcbookworm.wordpress.com/2015/03/20/libguide-for-the-htbookworm-prototype-is-now-available/
Blog Title: HT+BW

First workshop for HT+BW held at University of Illinois, Urbana-Champaign Scholarly Commons (Blog Post)
Title: First workshop for HT+BW held at University of Illinois, Urbana-Champaign Scholarly Commons
Author: Sayan Bhattacharyya
Abstract: A workshop on HT+BW was held for interested faculty and students at the University of Illinois, Urbana-Champaign Scholarly Commons, on April 29, 2015.
Date: 5/18/15
Primary URL: https://htrcbookworm.wordpress.com/2015/05/18/first-workshop-for-htbw-held-at-university-of-illinois-urbana-champaign-scholarly-commons/
Blog Title: HT+BW

Several HT+BW examples in expository HTRC workshop in July at LSA-SI (Blog Post)
Title: Several HT+BW examples in expository HTRC workshop in July at LSA-SI
Author: Sayan Bhattacharyya
Abstract: The HathiTrust Research Center (HTRC) will be doing a two-hour expository workshop at the Linguistic Society of America (LSA)’s Biennial Linguistic Institute, at the University of Chicago, on July 13, 2015: “The HathiTrust Research Center: Large-scale Computational Analysis with the World’s First Massive Digital Library.”
Date: 6/24/15
Primary URL: https://htrcbookworm.wordpress.com/2015/06/24/several-htbw-examples-in-expository-htrc-workshop-in-july-at-lsa-si/
Blog Title: HT+BW

The HathiTrust Research Center's Tools for Text Analysis with Digitized Text from the HathiTrust Digital Library (Conference Paper/Presentation)
Title: The HathiTrust Research Center's Tools for Text Analysis with Digitized Text from the HathiTrust Digital Library
Author: Eleanor Dickson
Author: Sayan Bhattacharyya
Abstract: The HathiTrust Research Center (HTRC) provides research support for the growing corpus of over fourteen million volumes in the HathiTrust Digital Library (HTDL) through a suite of text analysis tools. The size of the HTDL affords scholars the opportunity to increase the scale of their inquiry and to ask new kinds of research questions. The HTRC tools create avenues for scholars to pursue these new modes of research by allowing for “non-consumptive” text analysis with the HTDL corpus. Through demonstrations, hands-on exercises, and discussion, workshop attendees will learn about the suite of HTRC tools and how they can be used to support research and teaching. Attendees will come away with an understanding of: HTRC tools that allow researchers to build custom subcollections of items from the HTDL, run HTRC-provided, off-the-shelf algorithms against them, and interpret the results; HathiTrust+Bookworm, an interactive visualization for studying lexical trends within material from the HTDL; and the Extracted Features (EF) Dataset, which provides page-level metadata and data derived from the items in a subcollection that a researcher can download and analyze on his or her own computer. The workshop will present scenarios and example use cases in which HTRC tools shine, and will demonstrate the ways in which they can complement each other. Attendees will learn new strategies and approaches for the complex research questions that digitized text corpora are uniquely poised to help answer across academic disciplines.
Date: 10/27/2015
Primary URL: http://dlfforum2015.sched.org/event/be23d227e002af769e83f79ea39db842
Conference Name: DLF Forum 2015

New tools from the HathiTrust Research Center for digitized text analysis at scale: The HathiTrust+Bookworm tool and the Extracted Features dataset (Public Lecture or Presentation)
Title: New tools from the HathiTrust Research Center for digitized text analysis at scale: The HathiTrust+Bookworm tool and the Extracted Features dataset
Abstract: As library digitization efforts produce large quantities of digitized textual content, they create the conditions of possibility for novel inferencing techniques at scale, raising tantalizing possibilities for producing new knowledge about history, linguistics, literary studies and other related fields. However, this possibility will be realized only if tools and infrastructure to explore and analyze textual data can rise to the challenge posed by the data’s scale and access restrictions. To this end, the HathiTrust Research Center (HTRC), based jointly at the University of Illinois and Indiana University, is creating novel capabilities to enable scholars to have access, for research purposes, to the millions of works that constitute the content of the HathiTrust Digital Library. We will discuss two such capabilities: (1) the HathiTrust+Bookworm (HT+BW) tool; and (2) the HTRC Extracted Features (EF) dataset. The first, HT+BW, is an NEH-funded multi-university initiative for visualizing language usage trends; the current prototype supports nearly five million books. The second, the HTRC Extracted Features Dataset, makes available, for the same works, certain kinds of extracted quantitative data at the page level. In this talk, we will describe (1) the technical challenges posed by the massive scale of the HathiTrust Digital Library content, and how HT+BW and the HTRC EF dataset are meeting some of these challenges; and (2) the epistemic issues foregrounded by pedagogical uses of HT+BW and the HTRC EF dataset.
Author: Peter Organisciak
Author: Sayan Bhattacharyya
Date: 2/10/16
Location: Champaign, Illinois
Primary URL: http://cirss.lis.illinois.edu/Events/eventDetails.php?id=269
Primary URL Description: Event listing page.

HathiTrust Research Center Tools and Services. (Public Lecture or Presentation)
Title: HathiTrust Research Center Tools and Services.
Abstract: On Monday, November 2 from 12:30 to 1:30 in the Murray Room, the Library will be hosting an information session with staff and researchers from the HathiTrust Research Center for Georgetown faculty (and students!) to learn more about the HTRC. Come hear about the text mining and other tools the HTRC is developing and how you can use them, and the HathiTrust Digital Library, in your own research!
Author: Eleanor Dickson
Author: Miao Chen
Date: 11/02/2015
Location: Georgetown University Library

Workshop with HT+Bookworm for student teams for 4humanities.org Student Prize Contest "Why is studying the humanities important?" (Public Lecture or Presentation)
Title: Workshop with HT+Bookworm for student teams for 4humanities.org Student Prize Contest "Why is studying the humanities important?"
Abstract: At this workshop, Sayan Bhattacharyya, Postdoctoral Research Associate at the Graduate School of Library and Information Science, will show interested students how the HathiTrust+Bookworm tool (partly developed here at UIUC), and a Word Similarity Tool (built at Cornell University) can help students interested in participating in the contest to construct philological arguments using text-mining of the digitized contents of the world's great research libraries, supporting their entries. Students will be encouraged to form student teams from among other interested attendees at the workshop. Open to both graduate and undergraduate students.
Author: Sayan Bhattacharyya
Date: 2/4/16
Location: Champaign, Illinois
Primary URL: http://illinois.edu/calendar/detail/4068?eventId=33077902&calMin=201601&cal=20160131&skinId=7198
Primary URL Description: Event listing page.

Class session in Prof. Christi Merrill's Winter 2016 undergraduate Asian Cultures class at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm (Public Lecture or Presentation)
Title: Class session in Prof. Christi Merrill's Winter 2016 undergraduate Asian Cultures class at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm
Abstract: Class session in Prof. Merrill's undergraduate Asian Cultures course at UM. This session showcased using Bookworm in a classroom setting.
Author: Sayan Bhattacharyya
Date: 2/2/16
Location: Virtual

Class session in Prof. Christi Merrill's Winter 2016 undergraduate Comparative Literature class at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm (Public Lecture or Presentation)
Title: Class session in Prof. Christi Merrill's Winter 2016 undergraduate Comparative Literature class at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm
Abstract: Class session in Prof. Christi Merrill's Winter 2016 undergraduate Comparative Literature class at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm.
Author: Sayan Bhattacharyya
Date: 2/1/16
Location: Virtual

The HathiTrust Research Center's Extracted Features Dataset: An Opportunity for "Distant" Reading of Millions of Books from the World's Great Research Libraries. (Conference Paper/Presentation)
Title: The HathiTrust Research Center's Extracted Features Dataset: An Opportunity for "Distant" Reading of Millions of Books from the World's Great Research Libraries.
Author: Sayan Bhattacharyya
Abstract: Part of "Big Data Case Studies" panel, Big Data Summit 2015.
Date: 11/11/2015
Conference Name: Big Data Summit 2015

HT+Bookworm to Stanford University Library subject specialist librarians (Public Lecture or Presentation)
Title: HT+Bookworm to Stanford University Library subject specialist librarians
Abstract: Presentation demonstrating Bookworm and its uses to Stanford University Library subject specialist librarians.
Author: Sayan Bhattacharyya
Date: 1/20/16
Location: Stanford University

Using the HathiTrust Research Center’s Tools for Text Analysis. (Conference Paper/Presentation)
Title: Using the HathiTrust Research Center’s Tools for Text Analysis.
Author: Sayan Bhattacharyya
Author: Eleanor Dickson
Abstract: We are delighted to announce that the HathiTrust Research Center will be conducting a workshop on the tools it has developed for the HathiTrust Digital Library.
Date: 11/15/2015
Primary URL: https://lucian.uchicago.edu/blogs/dhcs/2015/10/09/hathi-trust-workshop-at-dhcs-2015/
Conference Name: Chicago Colloquium on Digital Humanities & Computer Science (DHCS 2015)

The HathiTrust+Bookworm Project as a Model for Collaborative Research at Large Scale (Conference Paper/Presentation)
Title: The HathiTrust+Bookworm Project as a Model for Collaborative Research at Large Scale
Author: Sayan Bhattacharyya
Author: Muhammad Saad Shamim
Abstract: Panelists share examples of four collaborative projects involving research by a team of two or more scholars from literary studies and computer science or other disciplines. Discussion focuses on best practices, lessons learned, communication strategies, what challenges to anticipate, and methods, tools, and outcomes.
Date: 1/8/16
Primary URL: https://apps.mla.org/conv_listings_detail?prog_id=406&year=2016
Primary URL Description: Event listing page.
Conference Name: MLA 2016

Doing Text Analysis with the HathiTrust Research Center's Tools (Public Lecture or Presentation)
Title: Doing Text Analysis with the HathiTrust Research Center's Tools
Abstract: Presentation to librarians and researchers at UT on using HTRC tools for text analysis of the HT corpus.
Author: Sayan Bhattacharyya
Author: Eleanor Dickson
Date: 1/4/16
Location: University of Texas, Austin

Class session in Prof. Christi Merrill's class 'Comparative Literature 322: Writing World Literatures' at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm (Public Lecture or Presentation)
Title: Class session in Prof. Christi Merrill's class 'Comparative Literature 322: Writing World Literatures' at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm
Abstract: Class session in Prof. Christi Merrill's class 'Comparative Literature 322: Writing World Literatures' at the University of Michigan, Ann Arbor, showcasing the use of HT+Bookworm.
Author: Sayan Bhattacharyya
Date: 11/19/2015
Location: University of Michigan

Text Analysis with the HathiTrust Research Center (Public Lecture or Presentation)
Title: Text Analysis with the HathiTrust Research Center
Abstract: Workshop on using HTRC tools for text analysis of the HT corpus. Part of the University of Michigan Digital Scholarship Workshop Series.
Author: Sayan Bhattacharyya
Date: 11/20/16
Location: University of Michigan, Ann Arbor

HTRC+Bookworm in an Undergraduate Classroom (Conference Paper/Presentation)
Title: HTRC+Bookworm in an Undergraduate Classroom
Author: Sayan Bhattacharyya
Abstract: Lightning talk
Date: 4/8/2016
Conference Name: Global Digital Humanities Symposium

"Big Data" textual y América Latina: cómo el HathiTrust+Bookworm permite investigaciones de formas nuevas a través de libros digitalizados a gran escala (Article)
Title: "Big Data" textual y América Latina: cómo el HathiTrust+Bookworm permite investigaciones de formas nuevas a través de libros digitalizados a gran escala
Author: Sayan Bhattacharyya
Author: Peter Organisciak
Author: Loretta Auvil
Author: Leena Unnikrishnan
Author: Benjamin Schmidt
Author: Muhammad Saad Shamim
Author: Robert McDonald
Author: J. Stephen Downie
Abstract: Textual Big Data and Latin America: How the HathiTrust+Bookworm enables new forms of inquiry by means of digitized books at large scale.
Year: 2016
Access Model: Open Access (URL not yet published)
Format: Journal
Periodical Title: Dossier MAPFRE
Publisher: MAPFRE

HathiTrust Research Center: Exploring New Collaboration Opportunities [HTRC Lecture Series] (Public Lecture or Presentation)
Title: HathiTrust Research Center: Exploring New Collaboration Opportunities [HTRC Lecture Series]
Abstract: This was a series of invited lectures, given to digital humanists at the following Chinese universities, that described the HTRC project in general and showcased Bookworm in particular: Peking University (21 May 2016), Wuhan University (23 May 2016), Nanjing University (25 May 2016), and Jiangsu University (30 May 2016).
Author: J. Stephen Downie
Date: 5/21/2016
Location: China

Adding flexibility to large-scale text visualization with HathiTrust+Bookworm (Conference Paper/Presentation)
Title: Adding flexibility to large-scale text visualization with HathiTrust+Bookworm
Author: Peter Organisciak
Author: Sayan Bhattacharyya
Author: Loretta Auvil
Author: Leena Unnikrishnan
Author: Benjamin Schmidt
Author: Muhammad Saad Shamim
Author: Robert McDonald
Author: J. Stephen Downie
Author: Erez Lieberman Aiden
Abstract: The HathiTrust holds one of the largest collections of digitized published work in the world. With nearly 14 million volumes ("Currently Digitized", HathiTrust.org), it is a challenge to create mechanisms to understand and interpret a collection of this size. The HathiTrust+Bookworm (HT+BW) project presents ways to behold that textual content through interactive visualization. In this poster, we present new work from HT+BW in applying visualization as a complementary, rather than central, tool for supporting better comprehension of large collections. Whereas HT+BW has previously been used in standalone contexts with pre-determined metadata, we present work in two new areas: 1) allowing scholars to analyze custom personal collections from within the larger corpus; and 2) using HT+BW as a supplement to other uses of the HathiTrust Research Center.
Date: 7/11/2016
Primary URL: http://dh2016.adho.org/abstracts/179
Primary URL Description: A link to the poster abstract through the DH 2016 proceedings.
Conference Name: Digital Humanities 2016

Classification and the library (Public Lecture or Presentation)
Title: Classification and the library
Abstract: Presentation given to the History Lab Working Group at Columbia University.
Author: Benjamin Schmidt
Date: 4/19/2016
Location: Columbia University, New York, NY

Text analytics for medical history (Public Lecture or Presentation)
Title: Text analytics for medical history
Abstract: Presentation given at the National Institutes of Health.
Author: Benjamin Schmidt
Date: 4/11/2016
Location: Bethesda, MD

Literary dopplegängers and interestingness (Blog Post)
Title: Literary dopplegängers and interestingness
Author: Benjamin Schmidt
Abstract: "I started this post with a few digital-humanities posturing paragraphs: if you want to read them, you'll encounter them eventually. But instead let me just get the point: here's a trite new category of analysis that wouldn't be possible without distant reading techniques that produces sometimes charmingly serendipitous results." - from the blog post.
Date: 5/30/2016
Primary URL: http://sappingattention.blogspot.com/2016/05/literary-dopplegangers-and.html
Blog Title: Sapping Attention
Website: Blogspot

Data visualization for the humanities (Public Lecture or Presentation)
Title: Data visualization for the humanities
Abstract: Presentation given at the Digital Futures Initiative at Grinnell College.
Author: Benjamin Schmidt
Date: 6/7/2016
Location: Grinnell, IA

Visualizing and classifying large digital libraries (Public Lecture or Presentation)
Title: Visualizing and classifying large digital libraries
Abstract: Presentation given at the New York Public Library.
Author: Benjamin Schmidt
Date: 7/29/2016
Location: New York, NY

Trends in Centuries of Words: Progress on the HathiTrust+Bookworm Project (Conference Paper/Presentation)
Title: Trends in Centuries of Words: Progress on the HathiTrust+Bookworm Project
Author: Peter Organisciak
Author: J. Stephen Downie
Abstract: The HathiTrust+Bookworm (HT+BW) project is providing quantitative access to the millions of works in the HathiTrust Digital Library. Through a tool called Bookworm, digital humanities scholars can use out-of-the-box exploratory visualization tools to compare trends in all or parts of the collection, or use the API directly to query for more advanced questions. In this poster, we present the progress of the HT+BW project and discuss both its potential value to digital humanities scholars and its current limitations.
Date: 9/13/2016
Primary URL: http://conf2016.jadh.org/proceedings-JADH2016-online.pdf
Primary URL Description: Conference proceedings; the poster abstract appears on pp. 41-42.

Reading 15 million books - Tooling the future of massive-scale research data (Public Lecture or Presentation)
Title: Reading 15 million books - Tooling the future of massive-scale research data
Abstract: Bookworm + HTRC talk given at the University of Denver.
Author: Peter Organisciak
Date: 2/28/2017
Location: Denver, CO

Reading 15 million books - Tooling the future of massive-scale research data (Public Lecture or Presentation)
Title: Reading 15 million books - Tooling the future of massive-scale research data
Abstract: Bookworm + HTRC talk given at the University of Colorado Denver.
Author: Peter Organisciak
Date: 3/8/2017
Location: Denver, CO

Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center (Public Lecture or Presentation)
Title: Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center
Abstract: The HathiTrust Digital Library (HTDL) contains 14.7 million volumes (over 5 billion pages). Unfortunately, roughly 9 million HTDL volumes are under copyright restrictions and cannot be shared with users. To overcome this problem, the HathiTrust Research Center (HTRC) is creating a set of "non-consumptive research" services to make these closed materials more open and thus more useful to scholars. This talk introduces such non-consumptive services as "Data Capsules," "Extracted Features," and the "Bookworm + HathiTrust" tool. Each HTRC service is designed to open new points of access to otherwise closed data while still respecting all copyright limitations.
Author: J. Stephen Downie
Author: Peter Organisciak
Date: 1/24/2017
Location: Center for Open Data in the Humanities, National Institute of Informatics, Tokyo, Japan

Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center [repeated talk previously given in Japan] (Public Lecture or Presentation)
Title: Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center [repeated talk previously given in Japan]
Abstract: The HathiTrust Digital Library (HTDL) contains 14.7 million volumes (over 5 billion pages). Unfortunately, roughly 9 million HTDL volumes are under copyright restrictions and cannot be shared with users. To overcome this problem, the HathiTrust Research Center (HTRC) is creating a set of "non-consumptive research" services to make these closed materials more open and thus more useful to scholars. This talk introduces such non-consumptive services as "Data Capsules," "Extracted Features," and the "Bookworm + HathiTrust" tool. Each HTRC service is designed to open new points of access to otherwise closed data while still respecting all copyright limitations.
Author: J. Stephen Downie
Author: Peter Organisciak
Date: 3/20/2017
Location: Nanjing University, Nanjing, China

Words in a world of scaling-up: Epistemic normativity and text as data (Article)
Title: Words in a world of scaling-up: Epistemic normativity and text as data
Author: Sayan Bhattacharyya
Abstract: Cultural and literary studies have long been cognizant that apparatuses for knowledge production can render certain kinds of texts “illegible.” The relationships between knowledge, power and episteme that produce this occlusion have traditionally been explored and analyzed at the level of engagement with specific social and literary texts. This paper describes how a similar problem can arise in the analysis of large-scale bodies of text. Our example is an analytical tool intended for discovering trends and patterns in large text corpora. By describing what happens when the tool is applied to a large, heterogeneous and diverse textual corpus, we show how textual inscriptions that stand in a relationship of subalternity to the structuring normativities of the text corpus could become invisible unless they already conform to the epistemic assumptions underlying those normativities. We conclude by discussing how our observations relate, by analogy and by allegory, to some issues of interest in discussions of world literature.
Year: 2017
Primary URL: http://sanglap-journal.in/index.php/sanglap/article/view/157
Primary URL Description: Link to article
Access Model: open access
Format: Journal
Periodical Title: Sanglap: Journal of Literary and Cultural Inquiry
Publisher: Sanglap