Description

OCR Groundtruth package for Finnish Fraktur.

The package contains 450 page images and an ALTO XML file for each page, proofread by native Finnish speakers. The Fraktur pages date from 1836 to 1910. The package can help you develop your own post-correction algorithms for OCR text recognition.
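As a starting point for working with the ALTO XML files, the following minimal Python sketch collects the recognised words from one page. It relies only on the ALTO convention that each word is a String element with a CONTENT attribute. The function names and the sample snippet are illustrative and not taken from the package, and the namespace in the sample is an assumption: the ALTO version used in the package may differ, which is why the code matches element names without their namespace.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}String'."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text: str) -> List[str]:
    """Return the CONTENT of every String element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        el.attrib["CONTENT"]
        for el in root.iter()
        if local_name(el.tag) == "String" and "CONTENT" in el.attrib
    ]


# Tiny hand-made sample for illustration; it is NOT a page from the package.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Suomen"/><SP/><String CONTENT="kansa"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_words(SAMPLE))
```

For a file on disk, read its contents as text and pass them to alto_words; the resulting word list can then be compared against the output of your own OCR or post-correction step.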

There is also an Excel file covering all 471 903 words, which contains the result given to each word by Tesseract and FineReader. If a tool did not find a corresponding word, the cell for that tool is empty, so filter the Excel file to select the words you need.
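The empty-cell convention matters when comparing the tools, because a word that a tool did not find has to be skipped rather than counted as an error. The sketch below shows one way to do this, assuming the spreadsheet has been exported to CSV. The column names groundtruth, tesseract and finereader are hypothetical, as is the sample data; check the actual headers in the Excel file before using it.

```python
import csv
import io
from typing import Dict, Iterable


def word_accuracy(rows: Iterable[Dict[str, str]], truth_col: str, tool_col: str) -> float:
    """Share of exact matches among the rows where the tool produced a word.

    Rows with an empty cell for the tool are skipped, mirroring the
    convention that an empty cell means no corresponding word was found.
    """
    compared = 0
    correct = 0
    for row in rows:
        guess = (row.get(tool_col) or "").strip()
        if not guess:
            continue  # the tool found no corresponding word
        compared += 1
        if guess == (row.get(truth_col) or "").strip():
            correct += 1
    return correct / compared if compared else 0.0


# Hand-made sample with HYPOTHETICAL column names; not data from the package.
SAMPLE_CSV = (
    "groundtruth,tesseract,finereader\n"
    "Suomen,Suomen,Suomcn\n"
    "kansa,,kansa\n"
    "on,on,on\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print("tesseract:", word_accuracy(rows, "groundtruth", "tesseract"))
print("finereader:", word_accuracy(rows, "groundtruth", "finereader"))
```

Skipping empty cells measures accuracy only on the words each tool actually produced. To penalise missed words instead, count the empty cells as errors by removing the early continue.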

User interface
 
Data downloads
http://digi.kansalliskirjasto.fi/opendata  
API
-
License
Terms of use (in Finnish).

Content type
Page images, ALTO XML, metadata
Language
Finnish
Data status
Primary source
Size
450 pages
Update frequency
-
Relationships
-
External information
 
Acknowledgements

Digitalia project.

Leverage from the EU 2014-2020, European Union.
Contact information
kk-tutkijapalvelut@helsinki.fi
Star rating
-