Finnish version: OAI-PMH -haravointirajapinta Finnan indeksiin
History
v1.0 Initial version
General Information
OAI-PMH is a widely used protocol that is particularly suited to harvesting metadata. Finna's OAI-PMH interface allows all freely available material to be harvested. Harvesting requires an application that supports the protocol, i.e. a harvester. Libraries are available for different programming languages, and there are also ready-made applications that support OAI-PMH. One of them is RecordManager, which has been developed by the National Library of Finland and is also used in Finna. openarchives.org lists several other options.
The principal idea of OAI-PMH is that harvesting is done in batches, e.g. 1000 records at a time. This ensures that both the harvester and the provider can handle each request in a reasonable time. Each response contains a resumptionToken that is used to request the next batch, until no more records are available. We recommend using a harvester that can retry a request if it fails, e.g. due to a network error or another transient error. It's also worth noting that a single response can be relatively large, and parsers based on libxml2, for example, may need the XML_PARSE_HUGE option to parse the responses successfully.
Finna's OAI-PMH provider can be found at https://api.finna.fi/OAI/Server.
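The batch-and-resume cycle described above can be sketched as follows. This is a minimal illustration, not a complete harvester: the function names, the retry count and the delay are hypothetical choices, and `fetch` is assumed to be any callable that takes a URL and returns the response body. The sketch uses Python's standard-library XML parser, which is not based on libxml2, so the XML_PARSE_HUGE option does not apply to it.

```python
import time
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://api.finna.fi/OAI/Server"


def fetch_with_retry(fetch, url, attempts=3, delay=5.0):
    """Call fetch(url), retrying on OSError (covers urllib's URLError and timeouts)."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)


def harvest(fetch, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Yield <record> elements batch by batch until no resumptionToken remains."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch_with_retry(fetch, url))
        yield from root.iter(OAI_NS + "record")
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because `harvest` is a generator, records can be processed as they arrive without holding the whole result set in memory. The sketch does not inspect OAI-PMH error elements in the response; a real harvester would need to.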
Restrictions
- The records to be harvested can only be selected by using a set defined by the provider. OAI-PMH does not allow search terms or other filtering options.
- Finna does not support deleted records, so any deletions are not indicated in the responses.
- The harvester must use a User-Agent that is not detected as a bot; otherwise the request results in HTTP status code 403 (Forbidden). Finna uses the Crawler-Detect library for the bot check. We recommend a User-Agent string that identifies the service using the interface.
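As an illustration of the User-Agent requirement, the following sketch sets the header explicitly with Python's standard library. The User-Agent value and the function names are hypothetical placeholders; whether a particular string passes the bot check has not been verified here, and the default User-Agent of an HTTP library may well be rejected.

```python
import urllib.request

# Hypothetical identifier; replace it with your own service name and contact.
USER_AGENT = "ExampleService/1.0 (+https://example.org/contact)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=60):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

With `urllib`, a 403 response is raised as `urllib.error.HTTPError`, so a rejected User-Agent surfaces as an exception rather than as an empty result.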
Metadata Formats in Finna
Finna has metadata in several different formats. OAI-PMH allows one to harvest all records in a specific source format, basic information in Dublin Core format or a special combination format that contains metadata fields commonly used in several formats. Collection list links in the table lead to Finna's search API.
Prefix | Format | Coverage | Description |
---|---|---|---|
oai_dc | Dublin Core | All content | Basic format that supports all records regardless of their original metadata format (collection list) |
oai_vufind_json | Finna's own format | All content | A combination of a Dublin Core base record and additional metadata in JSON format in the oai_vufind_json:metadata element. The available fields are described in the documentation for Finna's Search API. However, only a subset of the fields provided by the search interface is available. Content depends on the source format, and not all records have content in all fields. |
marc21 | MARCXML | Mostly library catalogs | A common format in library catalogs (collection list) |
oai_ead | EAD | Archival material | A format for describing archival material, old version (collection list) |
oai_ead3 | EAD3 | Archival material | A format for describing archival material, new version (collection list) |
oai_forward | FORWARD | Material of the National Audiovisual Institute | A format for audiovisual material based on the EN15907 standard (collection list) |
oai_lido | LIDO | Museums | A format commonly used in museums (collection list) |
oai_qdc | Qualified Dublin Core | Repositories, theses, library material | An extended version of Dublin Core (collection list) |
Sets in Finna
Without a set definition one can harvest a representative set of deduplicated records. The following sets can also be used:
Set | Description |
---|---|
non_dedup | Non-deduplicated set of records. Larger than the default set, contains all duplicate records. |
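The format and set are passed to the provider as the standard OAI-PMH request arguments `metadataPrefix` and `set`. A minimal sketch of building the initial ListRecords URL follows; the helper name `list_records_url` is hypothetical, and the sketch does not check whether a given format and set combination actually returns records.

```python
import urllib.parse

BASE_URL = "https://api.finna.fi/OAI/Server"


def list_records_url(metadata_prefix, set_spec=None, base_url=BASE_URL):
    """Build the first ListRecords request URL for a format and an optional set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Omitting the set argument requests the default, deduplicated set.
        params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


default_url = list_records_url("oai_dc")
non_dedup_url = list_records_url("oai_dc", set_spec="non_dedup")
```

Only the first request takes these arguments; in OAI-PMH, follow-up requests carry the verb and the resumptionToken alone.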