A Knowledge Graph for Managing and Analyzing Spanish American Notary Records
FAIN: HAA-287903-22
University of Missouri, Kansas City (Kansas City, MO 64110-2235)
Viviana L. Grieco (Project Director: January 2022 to present)
Praveen Rao (Co-Project Director: May 2022 to present)
The continued development of computational methods to analyze and process handwritten scripts from seventeenth-century documents.
We seek NEH funding to complete the development of software that will enable twenty-first-century scholars to read and analyze seventeenth-century Spanish American notary records quickly and to find relevant content in these documentary collections efficiently. Using recent advances in deep learning and knowledge management, we will develop a tool to manage and analyze about 220,000 pages of digital images of seventeenth-century manuscripts held at the Archivo General de la República Argentina in Buenos Aires. This collection contains a wide variety of handwritten scripts. Although it is built on this distinctive collection, the proposed tool will also be able to process manuscripts held at other archives and will create research and collaboration opportunities elsewhere in Latin America.
Associated Products
Few-Shot Learning for Word Recognition in Handwritten Seventeenth-Century Spanish American Notary Records (Article)
Title: Few-Shot Learning for Word Recognition in Handwritten Seventeenth-Century Spanish American Notary Records
Author: Nouf Alrasheed
Author: Shraboni Sarker
Author: Praveen Rao
Author: Viviana Grieco
Abstract: Historical records are invaluable sources of information that provide insights into multiple aspects of past events and societies. The analysis of historical records using deep learning poses critical challenges such as the lack of sufficient labeled data and at times the poor quality of scanned images. In this paper, we propose SpanishFSL, a few-shot learning (FSL) approach for word recognition in 17th-century handwritten Spanish American notary records. SpanishFSL draws inspiration from a zero-shot learning approach developed for image classification. It leverages an autoencoder to construct class-attribute signatures to effectively bridge the gap between seen and unseen classes. This enables SpanishFSL to generalize and accurately recognize words not present in the training set. Our labeled dataset was prepared by paleography experts using a subset of the notary records drafted by two notaries. Through experimental evaluation, we observed that SpanishFSL can outperform other FSL classifiers in terms of word recognition accuracy.
Year: 2024
Primary URL: https://dl.acm.org/doi/pdf/10.1145/3595916.3626365
Primary URL Description: This URL links to the PDF version of this article.
Access Model: Open Access
Format: Journal
Periodical Title: MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
Publisher: ACM Digital Library
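The abstract describes SpanishFSL only at a high level: an autoencoder builds class-attribute signatures so that words absent from training can still be recognized. The sketch below illustrates that general idea and is not the authors' implementation. It assumes the zero-shot method being drawn on is a linear semantic autoencoder (the abstract does not name it), uses random vectors as hypothetical class-attribute signatures, and uses synthetic features in place of real word-image features. The function names are hypothetical. An encoder learned from seen words projects a feature vector into attribute space, where it is matched to the closest signature of a word never seen in training.

```python
import numpy as np


def train_semantic_autoencoder(X, S, lam):
    """Hypothetical closed-form linear semantic autoencoder (illustration only).

    X: (d, N) features of seen-word samples, one column per sample.
    S: (k, N) class-attribute signatures of those samples' word classes.
    Minimizes ||X - W.T @ S||^2 + lam * ||W @ X - S||^2 over the encoder W (k, d).
    Setting the gradient to zero yields the Sylvester equation
        (S S^T) W + W (lam X X^T) = (1 + lam) S X^T.
    """
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    # Column-major vectorization: vec(A W + W B) = (I_d kron A + B^T kron I_k) vec(W).
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.flatten(order="F"))
    return w.reshape((k, d), order="F")


def classify(W, x, signatures):
    """Encode x into attribute space and return the index of the signature with
    the highest cosine similarity; the candidates may be classes unseen in training."""
    s_hat = W @ x
    sims = signatures @ s_hat / (
        np.linalg.norm(signatures, axis=1) * np.linalg.norm(s_hat) + 1e-12
    )
    return int(np.argmax(sims))


# Synthetic demo (all data hypothetical): features are a fixed linear map of signatures.
rng = np.random.default_rng(0)
k, d = 8, 16                                        # attribute and feature dimensions
M, _ = np.linalg.qr(rng.standard_normal((d, k)))    # feature generator with orthonormal columns
seen = rng.standard_normal((20, k))                 # signatures of 20 seen words
unseen = rng.standard_normal((5, k))                # signatures of 5 words absent from training
X = M @ seen.T                                      # one synthetic feature column per seen word
W = train_semantic_autoencoder(X, seen.T, lam=0.5)
preds = [classify(W, M @ s, unseen) for s in unseen]
print(preds)
```

The dense Kronecker solve is chosen only to keep the sketch dependent on NumPy alone. For realistic feature sizes, a dedicated Sylvester solver such as `scipy.linalg.solve_sylvester` would scale far better.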
Prizes
UMKC MIDE Student Spotlight Award
Date: 4/28/2023
Organization: UMKC's Missouri Institute for Defense and Energy
Abstract: The award highlights five core values: Gritty Innovation, Research Integrity, Teamwork, Confident Humility, and Passion.
Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models (Article)
Title: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models
Author: Shraboni Sarker
Author: Ahmad Tamim Hamad
Author: Hulayyil Alshammari
Author: Viviana Grieco
Author: Praveen Rao
Abstract: Large language models have gained tremendous popularity in domains such as e-commerce, finance, healthcare, and education. Fine-tuning is a common approach to customize an LLM on a domain-specific dataset for a desired downstream task. In this paper, we present a valuable resource for fine-tuning LLMs developed for the Spanish language to perform a variety of tasks such as classification, masked language modeling, clustering, and others. Our resource is a collection of handwritten notary records from the seventeenth century obtained from the National Archives of Argentina. This collection contains a combination of original images and transcribed text (and metadata) of 160+ pages that were handwritten by two notaries, namely, Esteban Agreda de Vergara and Nicolas de Valdivia y Brisuela, nearly 400 years ago. Through empirical evaluation, we demonstrate that our collection can be used to fine-tune Spanish LLMs for tasks such as classification and masked language modeling, and that the fine-tuned models can outperform pretrained Spanish models and ChatGPT-3.5/ChatGPT-4o. The resource is publicly available on GitHub and will be invaluable for historical text analysis.
Year: 2024
Primary URL: https://arxiv.org/pdf/2406.05812
Primary URL Description: This URL links to the PDF version of the article.
Access Model: Open Access
Format: Journal
Periodical Title: ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)
Publisher: arXiv
Area de Historia Digital. Instituto de Historia Latinoamericana y Argentina Dr. Emilio Ravignani (Web Resource)
Title: Area de Historia Digital. Instituto de Historia Latinoamericana y Argentina Dr. Emilio Ravignani
Author: Martin L.E. Wasserman
Abstract: This website features all the programs that are being developed under the Digital History section of the Instituto de Historia Latinoamericana y Argentina Dr. Emilio Ravignani (Buenos Aires, Argentina).
Year: 2020
Primary URL: http://ravignani.institutos.filo.uba.ar/area-de-historia-digital
A Knowledge Graph for Managing and Analyzing Spanish American Notary Records (Web Resource)
Title: A Knowledge Graph for Managing and Analyzing Spanish American Notary Records
Author: Viviana Grieco
Author: Praveen Rao
Abstract: This website summarizes the progress of our work. It features team members, presentations, and publications.
Year: 2020
Primary URL: https://www.umkc.edu/mide/NEH-Project/
Few-Shot Learning for Word Recognition in Handwritten Seventeenth-Century Spanish American Notary Records (Conference Paper/Presentation)
Title: Few-Shot Learning for Word Recognition in Handwritten Seventeenth-Century Spanish American Notary Records
Author: Nouf Alrasheed
Author: Shraboni Sarker
Author: Viviana Grieco
Author: Praveen Rao
Abstract: Historical records are invaluable sources of information that provide insights into multiple aspects of past events and societies. The analysis of historical records using deep learning poses critical challenges such as the lack of sufficient labeled data and at times the poor quality of scanned images. In this paper, we propose SpanishFSL, a few-shot learning (FSL) approach for word recognition in 17th-century handwritten Spanish American notary records. SpanishFSL draws inspiration from a zero-shot learning approach developed for image classification. It leverages an autoencoder to construct class-attribute signatures to effectively bridge the gap between seen and unseen classes. This enables SpanishFSL to generalize and accurately recognize words not present in the training set. Our labeled dataset was prepared by paleography experts using a subset of the notary records drafted by two notaries. Through experimental evaluation, we observed that SpanishFSL can outperform other FSL classifiers in terms of word recognition accuracy.
Date: 12/06/2023
Primary URL: http://www.mmasia2023.org/program.html
Conference Name: ACM Multimedia Asia 2023