Finto AI is a service for automated subject indexing. It can be used to suggest subjects for text in Finnish, Swedish and English. It currently gives suggestions based on concepts of the General Finnish Ontology (YSO).
Finto AI can be used via the form at ai.finto.fi. To use the form, paste text into the large text field and click the button "Get subject suggestions". In the drop-down menu you can choose the language of the text. You can also set the maximum number of suggestions you would like to receive.
Finto AI also has an API, which makes it easy to integrate with other systems. More information about the API can be found in this wiki on the page Finto AI open API service. Detailed OpenAPI/Swagger technical documentation is available at https://ai.finto.fi/v1/ui/.
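As a rough illustration of such an integration, the sketch below requests suggestions over HTTP using only the Python standard library. The base URL is derived from the documentation address above. The project identifiers (`yso-fi`, `yso-sv`, `yso-en`), the `/projects/{id}/suggest` path, the `text` and `limit` form fields and the `results` list with `label` and `score` fields are assumptions based on the Annif REST API and should be checked against the OpenAPI documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Base URL derived from the documented https://ai.finto.fi/v1/ui/ address.
BASE_URL = "https://ai.finto.fi/v1"

# Assumed project identifiers, one per supported language (verify in the API docs).
PROJECTS = {"fi": "yso-fi", "sv": "yso-sv", "en": "yso-en"}


def build_suggest_request(text, language="en", limit=10):
    """Build (but do not send) a POST request asking for subject suggestions."""
    url = f"{BASE_URL}/projects/{PROJECTS[language]}/suggest"
    # Assumed form fields: the document text and the maximum number of suggestions.
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Extract (label, score) pairs from a JSON response body (assumed shape)."""
    data = json.loads(payload)
    return [(item["label"], item["score"]) for item in data.get("results", [])]


def suggest(text, language="en", limit=10):
    """Send the request and return the parsed suggestions."""
    request = build_suggest_request(text, language, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_suggestions(response.read().decode("utf-8"))
```

Calling `suggest("Some abstract text", language="en", limit=5)` would then return a list of label and score pairs, provided the service is reachable and the assumed request and response shapes hold.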
An API integration is already in place at the University of Jyväskylä: students submitting their Master's thesis to the JYX repository get suggestions from Annif that they can use or discard, after which a librarian/informatician does a final check. A similar workflow is being piloted in the Osuva repository of the University of Vaasa.
Vocabularies and Languages
Finto AI currently uses the latest version of the General Finnish Ontology (2020.4 Diotima), including place names (YSO Places). It supports three languages (Finnish, Swedish and English) and gives subject terms in the same language as the text it is given. We are planning to expand the choice of subject vocabularies and languages in the future.
In the development process we have explored and tested several algorithms, and selected the combination that currently performs best for Finto AI (Maui, Omikuji and a TensorFlow-based NN ensemble). The algorithms have been trained primarily with metadata from the Finna discovery service, but full-text documents have also been used for fine-tuning the models. The development work is ongoing and we will offer updates and improvements to Finto AI accordingly.
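To give a feel for what combining algorithms means, the sketch below merges per-subject scores from several backends with a weighted average. This is a deliberately simpler technique than the neural network ensemble mentioned above, and it is not the Finto AI implementation; the function name, backend names and scores are all hypothetical.

```python
from collections import defaultdict


def combine_scores(backend_scores, weights=None):
    """Weighted average of per-subject scores from several backends.

    backend_scores maps a backend name to a {subject: score} dict.
    A subject missing from a backend counts as score 0 for that backend.
    """
    if weights is None:
        # Default: every backend contributes equally.
        weights = {name: 1.0 for name in backend_scores}
    total_weight = sum(weights[name] for name in backend_scores)

    combined = defaultdict(float)
    for name, scores in backend_scores.items():
        for subject, score in scores.items():
            combined[subject] += weights[name] * score / total_weight

    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from two backends for the same document.
example = {
    "lexical": {"cats": 0.5, "dogs": 0.25},
    "associative": {"cats": 1.0},
}
ranked = combine_scores(example)
```

A learned ensemble replaces the fixed weights with a model trained on existing metadata, which lets it adjust how much each backend is trusted for different subjects.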
From Annif API to Finto AI, the Production Version
Finto AI is based on Annif, a tool for automated subject indexing. You can read more about using Annif in its GitHub Wiki. To work, Annif needs a controlled vocabulary (subject headings, thesaurus or classification) and existing metadata; it can then be used to assign subjects to new documents. The tool is built upon a combination of existing natural language processing and machine learning tools, including Maui, Omikuji, fastText and Gensim. It is designed to be multilingual and it can support any subject vocabulary (in SKOS or a simple TSV format). It can be used either via a command-line interface or a microservice-style REST API. In fact, the demo API at api.annif.org and the demo form at annif.org are the basis of Finto AI. As a development tool, Annif offers more methods than Finto AI; successful features will be integrated into Finto AI in time.
Finto AI returns subject terms in the same language as the input text. In the near future we are going to make it possible to select the term language independently of the text language, so that, for example, English-language documents can be given subject keywords in Finnish or vice versa.
In the near future, after some further research and development, we aim to offer a learn method via the API, so that human-corrected results can be used to teach Annif and improve its suggestions.