Program

Digital Humanities: Cooperative Agreements and Special Projects (Digital Humanities)

Period of Performance

2/1/2021 - 10/31/2023

Funding Totals

$149,650.00 (approved)
$135,691.00 (awarded)


Machines Reading Maps: Finding and Understanding Text on Maps

FAIN: HC-278125-21

University of Southern California (Los Angeles, CA 90089-0012)
Deborah Ann Holmes-Wong (Project Director: August 2020 to present)
Yao-Yi Chiang (Co Project Director: October 2020 to present)

The development of a workflow that would use advanced machine learning and annotation tools to extract and annotate text on maps across large historic map collections. The UK partner, The Alan Turing Institute, is requesting £199,942 from the Arts and Humanities Research Council.

Machines Reading Maps aims to transform how humanities scholars and cultural heritage professionals interact with map images. Maps constitute a vast body of global cultural heritage, and only a very small portion has been brought into digital platforms for meaningful search, investigation, and discovery at scale. Our project will create open-source tools and methods that employ machine learning to enable researchers and cultural institutions to identify text on scanned maps and make that text meaningful via metadata creation and linking to historical gazetteers and other resources. Working with partners at the Library of Congress, British Library, and National Library of Scotland, we will generate and share data and methods from Sanborn, Goad, and OS historical maps and link map text to resources for understanding US and UK social history. Our project will enrich spatial explorations of history and help cultural institutions share map collections more effectively with the public.





Associated Products

Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection (Article)
Title: Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection
Author: Li, Zekun
Author: Guan, Runyu
Author: Yu, Qianmu
Author: Chiang, Yao-Yi
Author: Knoblock, Craig
Abstract: Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up map interpretation and help generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-of-domain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-intensive and time-consuming. On the other hand, existing geographic data sources, such as OpenStreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles of OSM map tiles and historical maps are significantly different. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that the state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection.
Year: 2021
Primary URL: https://doi.org/10.1145/3486635.3491070
Primary URL Description: ACM Digital Library DOI
Secondary URL: https://arxiv.org/abs/2112.06104
Secondary URL Description: Open access available through arxiv.org
Access Model: Subscription access with copy at arxiv.org
Format: Journal
Format: Other
Periodical Title: GEOAI '21: Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
Publisher: Association for Computing Machinery (ACM) GEOAI-SIGSPATIAL

Maps with a Sense of the Past: What Are Synthetic Maps, and Why Do We Love Them? (Blog Post)
Title: Maps with a Sense of the Past: What Are Synthetic Maps, and Why Do We Love Them?
Author: Vitale, Valeria
Author: McDonough, Katherine
Author: Fleet, Christopher
Author: Li, Zekun
Abstract: Maps are excellent documentary sources for understanding the history of the landscape, including past human activities and former physical environments. Many organizations have scanned tens of thousands of historical maps and shared them online. For example, the Sanborn Fire Insurance Map Collection scanned by the Library of Congress shows detailed changes in the built environment of North American cities during the 19th and early 20th centuries. USGS topographic maps illustrate the past landscape of the whole of the United States during the 19th century. Meanwhile, on this side of the pond, the Ordnance Survey’s large-scale maps, many of which have been digitised by the National Library of Scotland, cover Great Britain from the 19th century to the present day, and provide a wealth of information for researching urban and rural built and natural environments as they change over time.
Date: 10/04/21
Primary URL: https://blog.nls.uk/maps-with-a-sense-of-the-past/
Blog Title: National Library of Scotland Blog
Website: National Library of Scotland Blog Site

Machines Reading Maps: From Text on Maps to Linked Spatial Data (Film/TV/Video Broadcast or Recording)
Title: Machines Reading Maps: From Text on Maps to Linked Spatial Data
Writer: McDonough, Katherine
Writer: Vitale, Valeria
Director: Vitale, Valeria
Director: McDonough, Katherine
Producer: Stanford Geospatial Center
Abstract: Part of the Geo4LibCamp 2022 Program
Year: 2022
Primary URL: https://www.youtube.com/watch?v=ACWZMEKWDls
Access Model: YouTube, open access
Format: Video

Machine Reading Maps (Web Resource)
Title: Machine Reading Maps
Author: McDonough, Katherine
Author: Vitale, Valeria
Abstract: Creating a generalisable machine learning pipeline to process text on maps and catalysing humanities, scientific, and cultural heritage communities to use map text as data.
Year: 2021
Primary URL: https://www.turing.ac.uk/research/research-projects/machines-reading-maps
Primary URL Description: Project web site.

Machine Reading Maps (Project Management Site) (Web Resource)
Title: Machine Reading Maps (Project Management Site)
Author: McDonough, Katherine
Author: Vitale, Valeria
Abstract: This site contains internal project documents and progress reports.
Year: 2021
Primary URL: https://machines-reading-maps.github.io/

UCGIS Machine Reading Maps (Film/TV/Video Broadcast or Recording)
Title: UCGIS Machine Reading Maps
Writer: Chiang, Yao-Yi
Director: Chiang, Yao-Yi
Producer: Chiang, Yao-Yi
Abstract: Historical maps contain detailed geographic information difficult to find elsewhere, but they typically exist as scanned images without searchable metadata. Existing approaches for making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate their metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., “Black” and “Mountain” vs. “Black Mountain”). Also, these recognized words are plain text and do not have semantic labels (e.g., place vs. road names). In an ongoing project, collaborating with the Alan Turing Institute, the British Library, the Library of Congress, and the University of Southern California Digital Library, we are developing a machine-learning map processing pipeline that automatically reads text from thousands of scanned historical maps and makes that text meaningful. The machine learning pipeline will transform how cultural heritage institutions can enrich and expose metadata about their digitized map collections. As preliminary work for this project, this talk will present an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata linked to large external geospatial knowledge bases. The linked metadata support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods.
The code has been made publicly available as modules of the Google
Year: 2021
Primary URL: https://www.youtube.com/watch?v=F45jgjbhoIY
Access Model: Open access
Format: Video

Machine Reading Maps Project Newsletter 1 (Web Resource)
Title: Machine Reading Maps Project Newsletter 1
Author: Vitale, Valeria
Abstract: Newsletter for advisory board that highlights accomplishments of project team as well as questions requiring advisory board feedback.
Year: 2021
Primary URL: https://github.com/machines-reading-maps/Tutorials-Newsletters/blob/9567dc3638edc4c77ccc82caa790b10414dee4bb/Newsletter_2021_10.pdf

MapKurator (Computer Program)
Title: MapKurator
Author: Li, Zekun
Abstract: Wrapper around Zekun Li's model that detects map labels and generates annotations around them.
Year: 2021
Primary URL: https://github.com/machines-reading-maps/map-kurator
Access Model: open access
Programming Language/Platform: Python
Source Available?: Yes

Entity recommendation API (Computer Program)
Title: Entity recommendation API
Author: Kim, Jina
Abstract: API code for entity recommendation in MapKurator/Recogito.
Year: 2021
Primary URL: https://github.com/machines-reading-maps/entity-recommendation-api
Access Model: open access
Programming Language/Platform: Python, Dockerfile
Source Available?: Yes