
Automating Historical Source Transcription

Electronic Scientific Archive of UrFU (Ural Federal University)

 
 
Field Value
 
Title Automating Historical Source Transcription
 
Author Thorvaldsen, G.
 
Subject CENSUS
MACHINE LEARNING
POPULATION REGISTER
TRANSCRIPTION
 
Description Transcribing the 1950 Norwegian census, with its 3.3 million person records, and linking it to the Central Population Register (CPR) provides longitudinal information about significant population groups during the understudied period of the mid-20th century. Because this source is closed to the public, we cannot rely on help from genealogists and instead use machine learning techniques to semi-automate the transcription. First, the scanned manuscripts are split into individual cells, and multiple names are divided. After the birthdates have been transcribed manually in India, a lookup routine searches for families with matching sets of birthdates in the 1960 census and the CPR. After manual checks with GUI routines, the names are copied to the text version of the 1950 census, and the links to the CPR are stored as well. Other fields, such as occupation or gender, contain numeric or letter codes and are transcribed wholesale by routines that interpret the layout of the graphical images. Work employing these methods has also started on the 1930 census, which is the last of the Norwegian censuses still to be transcribed. © 2021, Thorvaldsen.
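The family lookup step in the description can be illustrated with a minimal sketch. It assumes each household is represented as a list of birthdates. Candidate 1960 census or CPR households are scored by how many birthdates they share with the 1950 household, which tolerates members born, deceased or moved after 1950. The function name `match_families`, the `min_shared` threshold and the overlap-scoring rule are hypothetical and are not taken from the article, whose actual lookup routine may differ. Weak or ambiguous matches are simply skipped, on the assumption that they would go to the manual GUI check the abstract mentions.

```python
from collections import Counter, defaultdict
from datetime import date


def match_families(source, target, min_shared=2):
    """Hypothetical sketch: link 1950 households to 1960/CPR households by shared birthdates.

    source, target: dict mapping a household id to a list of birthdates (datetime.date).
    Returns a dict {source_id: target_id} containing only unambiguous matches.
    """
    # Inverted index: birthdate -> set of target households that contain it.
    index = defaultdict(set)
    for tid, dates in target.items():
        for d in dates:
            index[d].add(tid)

    links = {}
    for sid, dates in source.items():
        wanted = Counter(dates)
        # Candidates are target households sharing at least one birthdate.
        candidates = set().union(*(index.get(d, ()) for d in wanted))
        scored = []
        for tid in candidates:
            # Multiset intersection, so twins (identical birthdates) are counted correctly.
            shared = sum((wanted & Counter(target[tid])).values())
            scored.append((shared, tid))
        scored.sort(reverse=True)
        if not scored or scored[0][0] < min_shared:
            continue  # too little evidence: left for manual review
        if len(scored) > 1 and scored[1][0] == scored[0][0]:
            continue  # tie between candidates: ambiguous, left for manual review
        links[sid] = scored[0][1]
    return links


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    census_1950 = {
        "H1950-1": [date(1910, 3, 2), date(1912, 7, 15), date(1940, 1, 9)],
        "H1950-2": [date(1925, 5, 5), date(1948, 2, 2)],
    }
    census_1960 = {
        "H1960-A": [date(1910, 3, 2), date(1912, 7, 15), date(1940, 1, 9), date(1952, 6, 30)],
        "H1960-B": [date(1925, 5, 5), date(1930, 1, 1)],
        "H1960-C": [date(1925, 5, 5), date(1948, 2, 2), date(1955, 9, 9)],
    }
    print(match_families(census_1950, census_1960))
    # {'H1950-1': 'H1960-A', 'H1950-2': 'H1960-C'}
```

Requiring at least two shared birthdates and a unique best candidate is one way to trade recall for precision. In this sketch, single-person households and tied candidates are never linked automatically.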
International Institute of Social History Amsterdam
Lars Ailo Ballo
Scientific Research Network of Historical Demography
National Institutes of Health, NIH
Universitetet i Tromsø, UiT
European Science Foundation, ESF
International Institute of Social History, IISH
Fonds Wetenschappelijk Onderzoek, FWO
Norges Forskningsråd, (225950)
Funding text 1: Historical Life Course Studies is a no-fee double-blind, peer-reviewed open-access journal supported by the European Science Foundation (ESF, http://www.esf.org), the Scientific Research Network of Historical Demography (FWO Flanders, http://www.historicaldemography.be) and the International Institute of Social History Amsterdam (IISH,
Funding text 2: This paper is written with input from Kåre Bævre (National Institute of Health), Lars Holden (Norwegian Computing Center), Trygve Andersen (UiT) and Lars Ailo Ballo (UiT). Supported financially by the Norwegian Research Council (Project # 225950).
 
Date 2024-04-08T11:05:43Z
2022
 
Type Article
Journal article (info:eu-repo/semantics/article)
Published version (info:eu-repo/semantics/publishedVersion)
 
Identifier Thorvaldsen, G 2021, 'Automating Historical Source Transcription', Historical Life Course Studies, vol. 10, pp. 59-63. https://doi.org/10.51964/hlcs9568
Thorvaldsen, G. (2021). Automating Historical Source Transcription. Historical Life Course Studies, 10, 59-63. https://doi.org/10.51964/hlcs9568
2352-6343
Final
All Open Access; Gold Open Access; Green Open Access
https://hlcs.nl/article/download/9568/10093
http://elar.urfu.ru/handle/10995/131208
10.51964/hlcs9568
85170375792
 
Language en
 
Rights Open access (info:eu-repo/semantics/openAccess)
cc-by
https://creativecommons.org/licenses/by/4.0/
 
Format application/pdf
 
Publisher European Historical Population Samples Network (EHPS-Net)
 
Source Historical Life Course Studies