Finnish version: OAI-PMH -haravointirajapinta Finnan indeksiin

History

v1.0 Initial version

General Information

OAI-PMH is a widely used protocol that is particularly well suited to harvesting metadata. Finna's OAI-PMH interface allows all freely available material to be harvested. Harvesting requires an application that supports the protocol, i.e. a harvester. Libraries are available for many programming languages, as are ready-made applications that support OAI-PMH. One of these is RecordManager, which was developed by the National Library of Finland and is also used in Finna. openarchives.org lists several other options.

The principal idea of OAI-PMH is that harvesting is done in batches, e.g. 1000 records at a time. This ensures that both the harvester and the provider can handle each request in a reasonable time. Each response to an incomplete list contains a resumptionToken that is used to request the next batch, and this is repeated until no more records are available. We recommend using a harvester that can retry a failed request, e.g. after a network error or other transient error. Note also that a single response can be relatively large: parsers based on libxml2, for example, may need the XML_PARSE_HUGE option to parse the responses successfully.


Finna's OAI-PMH provider can be found at https://api.finna.fi/OAI/Server.
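
The batch-and-resume cycle described above can be sketched in Python using only the standard library. This is a minimal illustration, not a replacement for a full harvester such as RecordManager. The function names, the retry count and delay, and the User-Agent string are assumptions made for the example, and the User-Agent has not been checked against Finna's bot detection.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://api.finna.fi/OAI/Server"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
# Hypothetical User-Agent; replace it with a string identifying your own service.
USER_AGENT = "ExampleLibraryService/0.1 (admin@example.org)"


def fetch(url, retries=3, delay=5.0):
    """Fetch a URL, retrying on transient network or server errors."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=120) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Client errors such as 403 will not go away on retry.
            if err.code < 500 or attempt == retries:
                raise
        except OSError:
            # Covers URLError, timeouts and connection resets.
            if attempt == retries:
                raise
        time.sleep(delay * attempt)  # wait a little longer after each failure


def harvest(metadata_prefix, set_spec=None, fetch_fn=fetch):
    """Yield <record> elements batch by batch until the list is exhausted."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        url = BASE_URL + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch_fn(url))
        error = root.find(OAI_NS + "error")
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iter(OAI_NS + "record")
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # a missing or empty token means the list is complete
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A caller would iterate over the generator, for example `for record in harvest("oai_dc"): ...`. The standard-library parser used here is based on expat, so the libxml2 option does not apply to it; with lxml, the corresponding setting is `huge_tree=True` on `XMLParser`.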

Restrictions

  • The records to be harvested can only be selected by using a set defined by the provider. OAI-PMH does not allow search terms or other filtering options.
  • Finna does not support deleted records, so any deletions are not indicated in the responses.
  • The harvester must use a User-Agent that is not detected as a bot; otherwise the request results in HTTP status code 403 (Forbidden). Finna uses the Crawler-Detect library for the bot check. We recommend a User-Agent string that identifies the service using the interface.
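
As an illustration of the User-Agent requirement, the following Python sketch sets an explicit header and turns a 403 response into a more descriptive error. The User-Agent value and the helper names are hypothetical, and whether a given string passes Crawler-Detect has to be verified against the live service.

```python
import urllib.error
import urllib.request

# Hypothetical identifier; use a name and contact that describe your own service.
USER_AGENT = "ExampleLibraryService/0.1 (admin@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def open_checked(request, opener=urllib.request.urlopen):
    """Open the request and explain a 403 response instead of failing silently."""
    try:
        return opener(request, timeout=60)
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise PermissionError(
                "403 Forbidden: the User-Agent was probably classified as a bot"
            ) from err
        raise
```

For example, `open_checked(build_request("https://api.finna.fi/OAI/Server?verb=Identify"))` would send an Identify request with the header set.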

Metadata Formats in Finna

Finna has metadata in several different formats. OAI-PMH allows one to harvest all records in a specific source format, basic information in Dublin Core format or a special combination format that contains metadata fields commonly used in several formats. Collection list links in the table lead to Finna's search API.


Format | Coverage | Description
oai_dc (Dublin Core) | All content | Basic format that supports all records regardless of their original metadata format (collection list)
oai_vufind_json (Finna-specific) | All content | A combination of a Dublin Core base record and additional metadata in JSON format in the oai_vufind_json:metadata element. The available fields are described in the documentation for Finna's Search API, but only a subset of the fields provided by the search interface is available. Content depends on the source format, and not all records have content in all fields.
marc21 (MARCXML) | Mostly library catalogs | A common format in library catalogs (collection list)
oai_ead (EAD) | Archival material | A format for describing archival material, old version (collection list)
oai_ead3 (EAD3) | Archival material | A format for describing archival material, new version (collection list)
oai_forward (FORWARD) | Material of the National Audiovisual Institute | A format for audiovisual material based on the EN15907 standard (collection list)
oai_lido (LIDO) | Museums | A format commonly used in museums (collection list)
oai_qdc (Qualified Dublin Core) | Repositories, theses, library material | An extended version of Dublin Core (collection list)
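
As an example of reading one of these formats, the sketch below extracts the identifier and the Dublin Core fields from an oai_dc ListRecords response. The namespaces are the standard OAI-PMH and Dublin Core ones; the function name and the shape of the returned data are choices made for this illustration, and the fields actually present depend on the record.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


def parse_dc_records(xml_bytes):
    """Return one dict per record with its identifier and Dublin Core fields."""
    root = ET.fromstring(xml_bytes)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                # Keep only Dublin Core elements that have text content.
                if element.tag.startswith(DC_PREFIX) and element.text:
                    name = element.tag[len(DC_PREFIX):]
                    # Fields such as creator may repeat, so collect lists.
                    fields.setdefault(name, []).append(element.text.strip())
        results.append({"identifier": identifier, "fields": fields})
    return results
```

Each field maps to a list because Dublin Core elements are repeatable.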

Sets in Finna

Without a set definition one can harvest a representative set of deduplicated records. The following sets can also be used:

Set | Description
non_dedup | Non-deduplicated set of records. Larger than the default set, contains all duplicate records.
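
To show how the set choice appears in a request, the following sketch builds the initial ListRecords URL with and without the `set` argument. The helper name is hypothetical; `verb`, `metadataPrefix` and `set` are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.finna.fi/OAI/Server"


def list_records_url(metadata_prefix, set_spec=None):
    """Build the first ListRecords request URL, optionally limited to a set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{BASE_URL}?{urlencode(params)}"


# Default (deduplicated) harvest:
print(list_records_url("marc21"))
# Harvest that includes all duplicate records:
print(list_records_url("marc21", "non_dedup"))
```

Only the first request carries these arguments; follow-up requests carry the resumptionToken instead.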