    usage: extract.py [-h] [-v] [-i] [-u] [-s]

    extract words from ALTO

    optional arguments:
      -h, --help      show this help message and exit
      -v, --verbose   increase output verbosity
      -i, --initial   use initial revision (0) instead of latest
      -u, --unlocked  include unlocked documents
      -s, --sanitise  strip non-unicode characters from words
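
The option set above could be declared with Python's standard `argparse` module roughly as follows. This is only a sketch reconstructed from the help text: the function name `build_parser` and the choice of `store_true` for every flag are assumptions, not the actual source of `extract.py`.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="extract.py",
        description="extract words from ALTO",
    )
    # All four options are assumed to be simple on/off switches.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="increase output verbosity")
    parser.add_argument("-i", "--initial", action="store_true",
                        help="use initial revision (0) instead of latest")
    parser.add_argument("-u", "--unlocked", action="store_true",
                        help="include unlocked documents")
    parser.add_argument("-s", "--sanitise", action="store_true",
                        help="strip non-unicode characters from words")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["-i", "-s"])
```

Flags that are not given default to `False`, so here `args.initial` and `args.sanitise` are set while `args.verbose` and `args.unlocked` are not.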
The `extract` tool extracts words from the ALTO XML files in the Revizor collection and stores them in an SQLite3 database, so that the data in the collection can be listed and explored with SQL queries.
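
To illustrate the general idea, here is a minimal sketch of pulling words out of an ALTO document and storing them in SQLite. It assumes that words are the `CONTENT` attributes of ALTO `String` elements; the sample document, the `words` table layout, and the helper names `alto_words` and `store_words` are hypothetical and do not reflect the schema that `extract` actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO fragment, for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hello"/><SP/><String CONTENT="world"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def alto_words(xml_text):
    """Return the CONTENT of every String element, in document order."""
    root = ET.fromstring(xml_text)
    words = []
    for element in root.iter():
        # Drop the "{namespace}" prefix so any ALTO version matches.
        local_name = element.tag.rsplit("}", 1)[-1]
        content = element.get("CONTENT")
        if local_name == "String" and content:
            words.append(content)
    return words


def store_words(connection, document, words):
    """Insert the words into a hypothetical `words` table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS words "
        "(document TEXT, position INTEGER, word TEXT)"
    )
    connection.executemany(
        "INSERT INTO words VALUES (?, ?, ?)",
        [(document, index, word) for index, word in enumerate(words)],
    )
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory database for the demo
store_words(connection, "sample.xml", alto_words(SAMPLE_ALTO))

# Once stored, the words can be queried with plain SQL.
rows = connection.execute(
    "SELECT word FROM words WHERE document = ? ORDER BY position",
    ("sample.xml",),
).fetchall()
```

Matching on the local tag name rather than a fixed namespace keeps the sketch independent of the ALTO schema version a given file declares.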
...