usage: import.py [-h] [-v] [-d] [-a AUTHOR] [-t TITLE] [-i ISO] [-l DSHANDLE]
                 [-m META] [-u UID] [-e EMAIL] [-c]
                 document [parent]

import documents into application


positional arguments:
  document              document folder
  parent                UUID of parent collection

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         show informational messages
  -d, --debug           show debugging messages
  -a AUTHOR, --author AUTHOR
                        set author
  -t TITLE, --title TITLE
                        set title
  -i ISO, --iso ISO     set language (iso code, 3 characters)
  -l DSHANDLE, --dshandle DSHANDLE
                        set DSpace handle link
  -m META, --meta META  set metadata fields (field=value)
  -u UID, --uid UID     uid of importing user
  -e EMAIL, --email EMAIL
                        email address of importing user
  -c, --continue        continue even if there are errors (dangerous!)

 

Only one of UID or email address is needed to specify the importing user.

 

 


 

The import tool adds a document to the editor. The importer creates a UUID for the new document, copies the document's folders into the application's file storage, and inserts revision and page entries into the database.

 

The UUID of the parent collection should be provided on the command line if you want the document to show up in the interface.

 

You can identify the user doing the import with the --uid option, or with --email if you only know the user's email address.

 

You can add metadata with the --title, --author, --iso (language) and --dshandle (DSpace link) options, or add arbitrary metadata with --meta field=value pairs.
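As an illustration of how these options combine, the following sketch assembles an argument list for the importer. It uses only the options listed in the usage text above. The helper name `build_import_command`, the `python` interpreter name and the script location are assumptions made for this example, and so is the idea that --meta may be given more than once (the help text does not say).

```python
from typing import Dict, List, Optional


def build_import_command(
    document: str,
    parent: Optional[str] = None,
    *,
    uid: Optional[str] = None,
    email: Optional[str] = None,
    title: Optional[str] = None,
    author: Optional[str] = None,
    iso: Optional[str] = None,
    dshandle: Optional[str] = None,
    meta: Optional[Dict[str, str]] = None,
    script: str = "import.py",  # assumed location of the importer
) -> List[str]:
    """Return an argument list for import.py using only documented options."""
    if iso is not None and len(iso) != 3:
        raise ValueError("--iso expects a 3-character language code")

    cmd = ["python", script]

    # Only one of --uid / --email is needed; the UID wins if both are given.
    if uid is not None:
        cmd += ["--uid", str(uid)]
    elif email is not None:
        cmd += ["--email", email]

    for flag, value in (("--title", title), ("--author", author),
                        ("--iso", iso), ("--dshandle", dshandle)):
        if value is not None:
            cmd += [flag, value]

    # Assumption: --meta can be repeated, once per field=value pair.
    for field, value in (meta or {}).items():
        cmd += ["--meta", f"{field}={value}"]

    # Positional arguments come last: the document folder, then the
    # optional UUID of the parent collection.
    cmd.append(document)
    if parent is not None:
        cmd.append(parent)
    return cmd
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with titles or author names that contain spaces.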

 

The document's directory structure must follow the expected layout so that the importer can find the ALTO XML and image files. If you have to import many documents, consider writing a short shell script to automate the process. Several working examples from previous imports can be found in the application's path.

 

/document
  /pages/1/5eb63bbbe01eeed093cb22bb8f5acdc3
  /pages/2/2fbb46d103e517b49f2acffd9d51b4b6
  /thumbs/00001.jpg
  /thumbs/00002.jpg
  /images/00001.jpg
  /images/00002.jpg
  document.pdf
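Before running an import it can help to check a folder against this layout. The sketch below is a hypothetical pre-flight check, not part of the importer: the function name `check_layout` is made up, and the rules it applies (three sub-folders, one thumbnail per image with the same file name) are inferred only from the example tree above. The importer itself may check more, or less.

```python
from pathlib import Path
from typing import List, Set


def _jpg_names(folder: Path) -> Set[str]:
    """Return the names of the .jpg files in folder, or an empty set."""
    if not folder.is_dir():
        return set()
    return {p.name for p in folder.glob("*.jpg")}


def check_layout(document_dir: str) -> List[str]:
    """Return a list of layout problems; an empty list means none were found."""
    root = Path(document_dir)
    problems = []

    # The example tree shows these three sub-folders.
    for name in ("pages", "thumbs", "images"):
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}/")

    # Assumption: every image has a thumbnail with the same file name.
    images = _jpg_names(root / "images")
    thumbs = _jpg_names(root / "thumbs")
    for name in sorted(images - thumbs):
        problems.append(f"no thumbnail for image: {name}")
    for name in sorted(thumbs - images):
        problems.append(f"no image for thumbnail: {name}")

    return problems
```

Calling `check_layout("/path/to/document")` and printing the returned messages gives a quick overview of what is missing before the importer touches the database.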

 

For the sake of completeness you can include the PDF in the import path, but the interface currently has no direct link for downloading it. Use the DSpace link field (--dshandle) instead.
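For importing many documents, the automation suggested earlier can be written in Python as well as in shell. The sketch below is one possible approach, not an existing script: it assumes every sub-folder of a root directory is one document in the layout shown above, that all documents go into the same parent collection, and that the importer exits with a non-zero status on failure. The function name `batch_import`, the `python` interpreter name and the script path are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def batch_import(
    root: str,
    parent_uuid: str,
    uid: str,
    script: str = "import.py",  # assumed location of the importer
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Import every sub-folder of root; return the names of failed folders."""
    failed = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files next to the document folders
        cmd = ["python", script, "--uid", str(uid), str(folder), parent_uuid]
        # Assumption: a non-zero exit status means the import failed.
        if run(cmd) != 0:
            failed.append(folder.name)
    return failed
```

The `run` parameter exists so the loop can be tried out with a stand-in function before it is pointed at the real importer. The sketch deliberately does not pass --continue, so each failed folder is reported rather than pushed through.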

 

You can update the metadata of an imported document after the fact. Make sure the document has a handle link to DSpace (you can add this link from the web interface if you did not do so at import time), then run the `dscacher` tool, which retrieves the JSON metadata and caches it locally in the database.

 

See also:

    dscacher.py: get/sync Dublin Core metadata from DSpace

 

 
