...

During development we have explored and tested several algorithms and selected the combination that currently performs best for Finto AI (MauiMLLM, Omikuji and a TensorFlow-based NN ensemble). The algorithms have been trained primarily on metadata from the Finna discovery service, but full-text documents have also been used to fine-tune the models. Development work is ongoing, and we will offer updates and improvements to Finto AI accordingly.

...

Finto AI is based on Annif, a tool for automated subject indexing. You can read more about using Annif in its GitHub wiki. To work, Annif needs a controlled vocabulary (subject headings, thesaurus or classification) and existing metadata; it can then be used to assign subjects to new documents. The tool is built on a combination of existing natural language processing and machine learning tools, including Omikuji, fastText and Gensim. It is designed to be multilingual and can support any subject vocabulary (in SKOS or a simple TSV format). It can be used either via a command-line interface or via a microservice-style REST API. In fact, the demo API at api.annif.org and the demo form at annif.org are the basis of Finto AI. As a development tool, Annif offers more methods than Finto AI; successful features will be integrated into Finto AI over time.
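As a rough illustration of the REST API usage described above, here is a minimal Python sketch of a client that asks for subject suggestions. It is not an official client. The endpoint path (/projects/<project id>/suggest), the form fields (text, limit) and the response fields (results, label, score) are assumptions based on Annif's published REST API and should be checked against the API documentation of the service you are calling. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_suggest_request(base_url, project_id, text, limit=10):
    """Build (but do not send) a POST request for subject suggestions.

    Assumed endpoint layout: <base_url>/projects/<project_id>/suggest
    """
    url = f"{base_url.rstrip('/')}/projects/{urllib.parse.quote(project_id)}/suggest"
    # Form-encoded body; "text" and "limit" are assumed parameter names.
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Return (label, score) pairs sorted by descending score.

    Assumes a JSON object with a "results" list whose items carry
    "label" and "score" fields.
    """
    pairs = [(item["label"], item["score"]) for item in payload.get("results", [])]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def suggest(base_url, project_id, text, limit=10):
    """Send the request and return the parsed suggestions."""
    request = build_suggest_request(base_url, project_id, text, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_suggestions(json.load(response))
```

A call could then look like suggest("https://api.annif.org/v1", "yso-en", "Some English text about libraries"), where both the /v1 base path and the project identifier "yso-en" are placeholders that may not match the projects actually available on the server.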

Future developments

Finto AI returns subject terms in the same language as the input text. In the near future we plan to make it possible to select the term language independently of the text language, so that, for example, English-language documents can be given subject keywords in Finnish, or vice versa.

After some further research and development, we also aim to offer a learn method via the API in the near future, so that human-corrected results can be used to teach Annif and improve its suggestions.