
Description

OCR Ground truth package for Swedish Fraktur.

The package contains XX page images and an ALTO XML file for each page, with proofreading done by native Finnish speakers. The Fraktur pages range from 1771 to 1915. The package can help in developing your own post-correction algorithms for OCR text.

 

NB! The ground truth package does not contain data from 1919 or later, for copyright reasons.
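
As a starting point for post-correction experiments, the proofread text can be read out of an ALTO XML file line by line. The following is a minimal sketch in Python, not an official tool for this package. It assumes only the standard ALTO layout, in which each TextLine element holds String elements whose CONTENT attribute carries one word. The helper names (local_name, alto_lines) and the sample fragment are hypothetical and made up for illustration; hyphenation (HYP) elements are ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}String'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the text of each TextLine in an ALTO document as a list of strings."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Each word is a String child; its text is in the CONTENT attribute.
        words = [
            child.attrib.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Tiny made-up ALTO fragment, for illustration only (not taken from the package).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Helsingfors"/><SP/><String CONTENT="Tidningar"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_lines(SAMPLE))
```

Matching on the local element name rather than a fixed namespace keeps the sketch independent of the ALTO schema version used in the files. Running the same function on the raw OCR output and on the ground truth file for a page would give aligned line lists to compare.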

User interface
 
Data downloads
http://digi.kansalliskirjasto.fi/opendata  
API
-
License
Terms of use

Content type
Page images, ALTO XML, metadata
Language
Swedish
Data status
Primary source
Size
xx pages
Update frequency
-
Relationships
-
External information
Article: Creating and using ground truth OCR sample data for Finnish historical newspapers and journals (PDF file)
Acknowledgements
 
Contact information
[email protected]
Star rating
-