Program

Digital Humanities: Digging into Data

Period of Performance

2/1/2014 - 1/31/2017

Funding Totals

$125,000.00 (approved)
$125,000.00 (awarded)


Automating Data Extraction from Chinese Texts

FAIN: HJ-50173-14

President and Fellows of Harvard College (Cambridge, MA 02138-3800)
Peter K. Bol (Project Director: May 2013 to May 2017)

This project develops the Automating Data Extraction from Chinese Texts platform, which allows scholars to transform texts written in classical Chinese into highly structured data suitable for text mining. The project is led by humanities scholars and computer scientists from Harvard University (US) and King's College London (UK), with additional expertise provided by scholars from National Taiwan University and Academia Sinica, Taiwan. The UK partner is requesting £125,000 from the UK funding consortium.

The Automating Data Extraction from Chinese Texts Project aims to provide humanists and social scientists with a means of transforming 2200 years of Chinese texts into structured data. The project will fully develop an open-source platform that allows its users to apply sophisticated text-mining techniques, hitherto the domain of information scientists, to a wide variety of historical and literary texts. Users interested in biographical data, for example, will be able to tag and extract personal names, dates, place names, official titles and postings, kinship ties, and other social relationships. The platform will be tested against 2000 local histories spanning an 800-year period and 19,000 letters and 500 notebooks dating from the seventh through the thirteenth century. Data extracted from the sample repositories will be used to enrich text-mining applications and will also be made available in English and Chinese for research through open-access online databases and data archives.





Associated Products

MARKUS (Web Resource)
Title: MARKUS
Author: Ho Hou Ieong
Author: Hilde De Weerdt
Abstract: MARKUS is a semi-automatic markup platform that is designed to automate the markup process of specific types of named entities (personal names, place names, temporal references, and official titles) in classical Chinese. Users can also tag their texts automatically by uploading their own lists of words or regular expressions, as well as edit or add any markup manually after the automatic markup process. A range of online reference tools have been integrated in the platform. Users can consult these reference sources while reading texts or verifying markup. The final results can be exported into tabular format for further statistical, temporal, spatial, and social network analysis.
Year: 2014
Primary URL: http://dh.chinese-empires.eu/beta/index.html
Primary URL Description: This URL gives access to the online text markup system

Transforming historical texts into data for computerized analysis (Book Section)
Title: Transforming historical texts into data for computerized analysis
Author: Shih-pei Chen
Author: Ho Hou Ieong
Editor: Chu-Ren Huang
Abstract: forthcoming
Year: 2015
Publisher: Springer
Book Title: Digital Humanities: Bridging the Divide, Series on Humanities in Asia

Chinese Empires in Comparative Perspective: A Digital Approach (Article)
Title: Chinese Empires in Comparative Perspective: A Digital Approach
Author: Ho Hou Ieong
Author: Hilde De Weerdt
Abstract: In the special issue Studies in Global Asias, Verge Vol 4.
Year: 2015
Primary URL: https://www.upress.umn.edu/journal-division/Journals/verge-studies-in-global-asias
Access Model: print
Format: Journal
Periodical Title: Verge: Studies in Global Asias
Publisher: University of Minnesota Press

Introduction to MARKUS (Film/TV/Video Broadcast or Recording)
Title: Introduction to MARKUS
Writer: Ho Hou Ieong
Abstract: Introduction to the MARKUS system, currently in Chinese.
Year: 2015
Primary URL: https://youtu.be/NltG3EjC9_A
Secondary URL: https://www.youtube.com/watch?v=jGfnlhywztY&feature=youtu.be
Access Model: open access
Format: Video

MARKUS update and new tools (Blog Post)
Title: MARKUS update and new tools
Author: Ho Hou Ieong
Author: Hilde De Weerdt
Abstract: updates on tools and system development
Date: 6/1/2015
Primary URL: http://chinese-empires.eu/blog/markus-update-and-new-tools/
Secondary URL: http://chinese-empires.eu
Website: Chinese Empires

Textual Markup and Historical Research (Public Lecture or Presentation)
Title: Textual Markup and Historical Research
Abstract: How the use of text markup enables Chinese historical research.
Author: Hilde De Weerdt
Date: 6/29/2015
Location: Academia Sinica, The Institute of History and Philology, Taipei

Textual Markup and Historical Research (Public Lecture or Presentation)
Title: Textual Markup and Historical Research
Abstract: How text markup can enable historical research
Author: Hilde De Weerdt
Date: 6/27/2015
Location: National Taiwan University, History Department, Taipei

Integrating Research, Teaching, and Learning in the Digital Era (Public Lecture or Presentation)
Title: Integrating Research, Teaching, and Learning in the Digital Era
Abstract: Overview of digital methods in study of Chinese history
Author: Hilde De Weerdt
Date: 2/5/2015
Location: the Irish College in Leuven, University of Leuven

MARKUS and the New History of Collective Action (Public Lecture or Presentation)
Title: MARKUS and the New History of Collective Action
Abstract: Using the MARKUS markup system for historical research on collective action
Author: Hilde De Weerdt
Date: 12/1/2014
Location: International Conference on Digital Archives and Digital Humanities, Academia Sinica, Taipei

Why Teach Chinese Text Analysis and Research Skills Online? (Public Lecture or Presentation)
Title: Why Teach Chinese Text Analysis and Research Skills Online?
Abstract: overview of digital text analysis methods
Author: Hilde De Weerdt
Date: 6/24/2014
Location: Leiden University

MARKUS: An Infrastructural Semi-automatic Markup Platform for Classical Chinese (Public Lecture or Presentation)
Title: MARKUS: An Infrastructural Semi-automatic Markup Platform for Classical Chinese
Abstract: introduction to MARKUS
Author: Ho Hou Ieong
Date: 7/3/2015
Location: Digital Humanities 2015

MARKUS: a Semi-automatic Markup Platform for Classical Chinese (Public Lecture or Presentation)
Title: MARKUS: a Semi-automatic Markup Platform for Classical Chinese
Abstract: Introduction to MARKUS for the conference on Chinese Local Gazetteers: Historical Method and Computerized Data Collection and Analysis
Author: Ho Hou Ieong
Date: 4/24/2015
Location: Max Planck Institute for the History of Science Dept. III, Berlin

MARKUS: A Semi-automatic Markup Platform for Classical Chinese (Public Lecture or Presentation)
Title: MARKUS: A Semi-automatic Markup Platform for Classical Chinese
Abstract: Introduction to MARKUS, in Chinese.
Author: Ho Hou Ieong
Date: 12/1/2014
Location: The 5th International Conference of Digital Archives and Digital Humanities, Taipei
Primary URL: https://www.academia.edu/11078612

Automating Data Extraction from Chinese Texts Project (Public Lecture or Presentation)
Title: Automating Data Extraction from Chinese Texts Project
Abstract: Introduction to the MARKUS system
Author: Ho Hou Ieong
Date: 10/24/2014
Location: Digital Sinology and Taiwan Studies, Oriental Institute, Czech Academy of Sciences, Prague

Digital humanities and western Sinology (Public Lecture or Presentation)
Title: Digital humanities and western Sinology
Abstract: Role of digital humanities in western Sinology, for the Digital Humanities in Taiwan Conference. In Chinese.
Author: Ho Hou Ieong
Date: 03/12/2014
Location: College of Humanities and Social Sciences, National Tsing Hua University, Hsinchu, Taiwan

Textual Markup and Humanities Research (Conference/Institute/Seminar)
Title: Textual Markup and Humanities Research
Author: Hilde De Weerdt
Abstract: For the workshop session “Markup, Statistics and Analysis”
Date Range: 3/20/2015
Location: National Chung Hsing University, Taichung

Textual Markup and Humanities Research (Conference/Institute/Seminar)
Title: Textual Markup and Humanities Research
Author: Hilde De Weerdt
Abstract: Workshop session “Markup, Statistics and Analysis”
Date Range: 3/6/2015
Location: National Cheng Kung University, Tainan

MARKUS: Reading and Analyzing Classical Chinese Texts Digitally (Conference/Institute/Seminar)
Title: MARKUS: Reading and Analyzing Classical Chinese Texts Digitally
Author: Hilde De Weerdt
Abstract: for the Workshops on Digital Resources in Chinese Historical, Geographical, and Literary Studies, at AAS-in-Asia Conference, Singapore.
Date Range: 7/18/2014
Location: AAS-in-Asia Conference, Singapore

MARKUS: Reading and Analyzing Classical Chinese Texts Digitally (Conference/Institute/Seminar)
Title: MARKUS: Reading and Analyzing Classical Chinese Texts Digitally
Author: Hilde De Weerdt
Abstract: Introduction to MARKUS
Date Range: 6/6/2014
Location: International Interdisciplinary Conference on Middle Period China, 800-1400, Harvard University, Cambridge, MA

Mining and Discovering Biographical Information in Difangzhi with a Language-Model-based Approach (Conference Paper/Presentation)
Title: Mining and Discovering Biographical Information in Difangzhi with a Language-Model-based Approach
Author: Hongsu Wang
Author: Chao-lin Liu
Author: Peter K. Bol
Abstract: Describes a language-model-based approach to extracting biographical data from Chinese local histories
Date: 7/3/2015
Conference Name: Digital Humanities 2015, Sydney

Local Gazetteers as Databases: Joining the Geographical and Biographical (Conference Paper/Presentation)
Title: Local Gazetteers as Databases: Joining the Geographical and Biographical
Author: Peter K. Bol
Abstract: Integrating biographical and spatial data found in Chinese local gazetteers
Date: 04/27/2015
Conference Name: International Symposium on “Chinese Local Gazetteers: Historical Method and Computerized Data Collection and Analysis”, Max Planck Institute, Berlin

The Big Data of Chinese Biography (Public Lecture or Presentation)
Title: The Big Data of Chinese Biography
Abstract: Introduction to the China Biographical Database
Author: Hongsu Wang
Author: Peter K. Bol
Date: 03/22/2015
Location: Yale University

Humanities, digital resources, and digital tools: two examples (Public Lecture or Presentation)
Title: Humanities, digital resources, and digital tools: two examples
Abstract: On the China Biographical Database (CBDB) and the online MOOC "ChinaX" as examples of using digital tools and resources for the study and teaching of Chinese history
Author: Wen Yu
Author: Peter K. Bol
Date: 03/11/2014
Location: National Library, Taibei

Computational Methodologies for Chinese History (Public Lecture or Presentation)
Title: Computational Methodologies for Chinese History
Abstract: The development of the China Biographical Database
Author: Peter K. Bol
Date: 04/08/2014
Location: Chinese University Hong Kong

China Biographical Database (Database/Archive/Digital Edition)
Title: China Biographical Database
Author: Song Chen
Author: Peter K. Bol
Author: Hongsu Wang
Author: Shih-pei Chen
Author: Ho Hou Ieong
Author: Michael A. Fuller
Abstract: The China Biographical Database is an online relational database with biographical information about approximately 360,000 individuals as of April 2015, primarily from the 7th through 19th centuries. The data is meant to be useful for statistical, social network, and spatial analysis as well as serving as a kind of biographical reference.
Year: 2014
Primary URL: http://isites.harvard.edu/icb/icb.do?keyword=k16229