Program

Digital Humanities: Digital Humanities Start-Up Grants

Period of Performance

6/1/2014 - 6/30/2016

Funding Totals

$60,000.00 (approved)
$59,696.55 (awarded)


Image Analysis for Archival Discovery (Aida)

FAIN: HD-51897-14

University of Nebraska, Lincoln (Lincoln, NE 68503-2427)
Elizabeth Lorang (Project Director: September 2013 to May 2017)
Leen-Kiat Soh (Co-Project Director: September 2003 to May 2017)

The development of a prototype tool that would allow scholars and students to apply image processing and machine learning techniques to identify specific visual elements within digitized collections. The project would start with an attempt to identify poetry found in the Chronicling America collection of historic newspapers.

Images created in the digitization of primary materials contain a wealth of machine-processable information for data mining and large-scale analysis, and this information should be leveraged both to connect researchers with the resources they need and to augment interpretation of human culture, as a complement to and extension of text-based approaches. The proposed project, "Image Analysis for Archival Discovery" (Aida), applies image processing and machine learning techniques from computer science to digitized materials to facilitate and promote archival discovery. Beginning with the automatic detection of poetic content in historic newspapers, this project will develop image processing as a methodology for humanities research and analysis. In doing so, it will advance work on two fronts: 1) it will contribute to the reevaluation of newspaper verse in American literary history; 2) it will assess the application of image analysis as a method for discovery in archival collections.



Media Coverage

How to find a poem in 200-year-old newspapers (Media Coverage)
Author(s): Sojico, Jackie
Publication: NET Radio
Date: 11/1/2014
Abstract: Liz Lorang is on the hunt for poetry. Not poetry from today, but from 200 years ago. And she's looking for it where it was published most in the 19th century: in American newspapers. She's hoping a team of computer scientists can help her find all of it.
URL: http://netnebraska.org/article/culture/943643/how-find-poem-200-year-old-newspapers

Mining Newspapers for Poetry (Media Coverage)
Author(s): Lieberman, Michael
Publication: Book Patrol
Date: 10/14/2014
Abstract: What do you get when you partner a digital humanities projects librarian with an associate professor of computer science and engineering? Answer: Something good.
URL: http://blog.seattlepi.com/bookpatrol/2014/10/17/mining-newspapers-for-poetry/

Project mines 8 million news pages for poetry (Media Coverage)
Author(s): Gayman, Deann
Publication: UNL Today
Date: 10/13/2014
Abstract: What differentiates a line of text from a news story and a line of text from a poem? Not much, and that’s a problem for researchers of American poetry.
URL: http://news.unl.edu/newsrooms/unltoday/article/project-mines-8-million-news-pages-for-poetry/



Associated Products

Leveraging Visual Information for Discovery and Analysis of Digital Collections. (Conference Paper/Presentation)
Title: Leveraging Visual Information for Discovery and Analysis of Digital Collections.
Author: Soh, Leen-Kiat
Author: Lorang, Elizabeth
Abstract: Proceeding from the research in progress of the Image Analysis for Archival Discovery project, this presentation will consider emerging possibilities of visual analytics for discovery in digital collections. It will describe a methodology for identifying poetic content in digitized newspapers (through the extraction and categorization of visual cues using image processing and machine learning techniques) and discuss its future applications for the digital library community.
Date: 10/29/2015
Conference Name: Digital Library Federation Forum

Digital Approaches to American Periodicals (Roundtable Discussion) (Conference Paper/Presentation)
Title: Digital Approaches to American Periodicals (Roundtable Discussion)
Author: Lorang, Elizabeth
Abstract: The Image Analysis for Archival Discovery (Aida) project team is investigating the use of image analysis to identify poetic content in historic newspapers. The project has both a broad goal, to explore new strategies for identifying materials relevant to researchers within large digital collections, and a very specific goal within literary studies.
Date: 05/20/2015
Conference Name: American Literature Association Annual Conference

Detection of Poetic Content in Historic Newspapers through Image Analysis (Conference Paper/Presentation)
Title: Detection of Poetic Content in Historic Newspapers through Image Analysis
Author: Lorang, Elizabeth
Author: Soh, Leen-Kiat
Author: Lunde, Joseph
Author: Thomas, Grace
Abstract: By conservative estimates, several hundred thousand poems appeared in early American and U.S. newspapers from the eighteenth through the early twentieth centuries. Counting snippets of verse that appeared in death notices, advertisements, and articles makes the presence of poetry in historic newspapers even more pervasive. Feminist scholars and others performing recovery work routinely resurrect authors and works from newspaper pages, but until recently this rich trove of newspaper verse, as a corpus in its own right, has been outside the scope of literary study and a footnote in histories of American newspapers. In the last decade, however, scholars have made significant inroads in studying the importance of newspaper verse as a form and the public role of poetry in American culture. Underpinning this scholarship is a growing recognition that the evaluation and history of American poetry should not be based on less than one percent of the poetic record. In addition, this new scholarship values and explores the role of poetry in people's daily lives, including in making sense of what it means to be human and in processing national, social, and individual experiences. To the extent that these new histories depend on traditional methods of archival discovery and analysis, however, they will remain anecdotal: individual narratives extrapolated from a minuscule subset of the whole, with limited means of situating the anecdote as either representative or idiosyncratic. In short, the magnitude of the corpus requires new modes of discovery and analysis.
Date: 07/12/2014
Conference Name: Digital Humanities 2014

Image Analysis for Archival Discovery (Aida) (Web Resource)
Title: Image Analysis for Archival Discovery (Aida)
Author: Lorang, Elizabeth
Author: Soh, Leen-Kiat
Abstract: Project website for "Image Analysis for Archival Discovery (Aida)". Libraries, archives, museums, and other groups are creating millions of digital images as we digitize the cultural record. For the most part, though, these digital images are underutilized, and we leverage little of their information potential. At the same time, locating relevant materials in digital collections is often already difficult and will become increasingly so as more content is digitized. The Aida team is exploring what more we can do with the millions of images that represent the digitized cultural record, particularly digital images of textual materials, and we are interested in the types of discovery that serious attention to digital images might yield.
Year: 2014
Primary URL: http://projectaida.org/

Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections (Article)
Title: Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections
Author: Lorang, Elizabeth
Author: Soh, Leen-Kiat
Author: Datla, Maanas Varma
Author: Kulwicki, Spencer
Abstract: The Image Analysis for Archival Discovery (Aida) project team is investigating the use of image analysis to identify poetic content in historic newspapers. The project seeks both to augment the study of literary history, by drawing attention to the magnitude of poetry published in newspapers and by making that poetry more readily available for study, and to advance work on the use of digital images in facilitating discovery in digital libraries and other digitized collections. We have recently completed training our classifier for identifying poetic content, and as we prepare to move to the deployment stage, we are making our methods for classification and testing available in order to promote further research and discussion. The precision and recall values achieved during the training (precision 90.58%, recall 79.4%) and testing (precision 74.92%, recall 61.84%) stages are encouraging. In addition to discussing why such an approach is needed and relevant and situating our project alongside related work, this paper analyzes preliminary results, which support the feasibility and viability of our approach to detecting poetic content in historic newspaper collections.
Year: 2015
Primary URL: http://www.dlib.org/dlib/july15/lorang/07lorang.html
Access Model: Open Access (the journal ceased publication in 2017)
Format: Journal
Periodical Title: D-Lib Magazine
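The D-Lib abstract reports its results as precision and recall. As a hedged illustration, and not the Aida team's own evaluation code, the sketch below shows the standard definitions of these metrics in terms of true positives (tp), false positives (fp), and false negatives (fn) for a binary "poetry / not poetry" classifier. It also derives an F1 score (the harmonic mean of precision and recall) from the published figures; F1 is not reported in the abstract and is computed here only for illustration. The function names are hypothetical.

```python
"""Precision, recall, and F1 for a binary "poetic content" classifier.

Hypothetical helpers for illustration only; not the Aida project's code.
"""


def precision(tp: int, fp: int) -> float:
    """Share of items flagged as poetry that really are poetry."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of the actual poetry that the classifier flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Precision/recall as reported in the D-Lib abstract, as fractions.
    reported = {"training": (0.9058, 0.794), "testing": (0.7492, 0.6184)}
    for stage, (p, r) in reported.items():
        print(f"{stage}: precision={p:.2%} recall={r:.2%} F1={f1(p, r):.2%}")
```

Read this way, the testing figures mean that roughly three of every four snippets flagged as poetry were poetry, while roughly six of every ten poems present were found. The harmonic mean is used because it stays low unless both metrics are high.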