OCR Ground truth package for Swedish Fraktur.

Excerpt

...

The package contains page images and an ALTO XML file for each page. The Swedish text was proofread by native Finnish speakers. The Fraktur pages date from 1771 to 1915. The package can support the development of post-correction algorithms for OCR output.
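As a starting point for post-correction work, the proofread text has to be read out of the ALTO XML files. The following is a minimal sketch, assuming the files follow the usual ALTO layout in which each TextLine element holds String elements with a CONTENT attribute. The function names and the SAMPLE document are made up for illustration and are not taken from the package.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix so the code does not depend on the ALTO schema version."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the text of every TextLine as a list of strings, words joined by spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only String children carry text; SP (space) and HYP (hyphen) elements are skipped.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Hypothetical sample document, invented for this sketch.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="exempel" HPOS="10" VPOS="10" WIDTH="70" HEIGHT="12"/>
      <SP/>
      <String CONTENT="text" HPOS="85" VPOS="10" WIDTH="40" HEIGHT="12"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_lines(SAMPLE))
```

The lines returned this way can be compared against raw OCR output of the same page to collect error pairs for a correction model.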

...

Description

Note 1. The EXIF metadata of the TIFF files lacks resolution information, so if the ALTO coordinates do not match the images, be aware that the images were scanned at either 200 or 300 dpi.
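Because the resolution is missing from the images, it can be estimated by checking which of the two candidate values makes the ALTO page width agree with the image width in pixels. The following is a sketch, assuming the ALTO files declare a physical MeasurementUnit (inch1200 or mm10, both defined by the ALTO schema). Whether the files in this package use such a unit is not stated here, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def alto_to_pixels(value, unit, dpi):
    """Convert an ALTO coordinate to pixels for a given scan resolution."""
    if unit == "pixel":
        return float(value)
    if unit == "inch1200":
        # inch1200 means 1/1200 of an inch.
        return value * dpi / 1200.0
    if unit == "mm10":
        # mm10 means 1/10 of a millimetre.
        return value / 10.0 / MM_PER_INCH * dpi
    raise ValueError(f"unknown ALTO MeasurementUnit: {unit!r}")


def guess_dpi(image_width_px, page_width_alto, unit, candidates=(200, 300)):
    """Pick the candidate dpi whose converted page width is closest to the image width.

    With unit == "pixel" the conversion ignores dpi, so the first candidate is
    returned and the result carries no information.
    """
    return min(
        candidates,
        key=lambda dpi: abs(alto_to_pixels(page_width_alto, unit, dpi) - image_width_px),
    )


if __name__ == "__main__":
    # Hypothetical values: a page 9924/1200 inches wide and an image 1654 pixels wide.
    print(guess_dpi(1654, 9924, "inch1200"))
```

The image width can be read with any image library, and the page width comes from the WIDTH attribute of the ALTO Page element.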

...

Note 2. The ground truth package does not contain data from 1918 or later, for copyright reasons.

User interface

...

 

 





Content type: Page images, ALTO XML, metadata
Language: Swedish
Data status: Primary source
Size: xx pages
Update frequency: -
Relationships: -
External information: Article: Creating and using ground truth OCR sample data for Finnish historical newspapers and journals (PDF file)
Acknowledgements:
Contact information: [email protected]
Star rating: -


...