Date: Wednesday, March 5, 2014

Time: 12:00 - 14:00 (CET), that is 13:00 - 15:00 (EET, Helsinki), or 15:00 - 17:00 (MSK)

Place: National Library of Finland, Vallila office. Street address: Teollisuuskatu 23, 4th floor, room A414 (door at the lobby by the lifts)

Remote Access by Adobe Connect: Please check the connection in good time before the start of the webinar. Your computer may need to download components.


  • 13:00-13:15 Jussi-Pekka Hakkarainen (National Library of Finland): An introduction to the OCR Webinar
  • 13:15-13:30 Wouter van Hemel & Anis Moubarik (National Library of Finland): What the OCR editor does and how it works
  • 13:30-13:45 Jack Rueter (University of Helsinki): The Language Access and Open Choice
  • 13:45-14:00 Minna Vanhasalo (University of Tampere): The Kotus Experience: Crowdsourcing the Old Literary Finnish for the research's benefit
  • 14:00-14:15 Crowdsourcing session (available for the registered users only)
  • 14:15-14:30 Feedback session

Each presentation will be followed by a brief interval for questions. Questions to the speakers can be sent via chat in Adobe Connect. Comments can also be posted on the Kiwi page of the project.

If you have any questions regarding this information or the webinar, please don’t hesitate to contact us by e-mail ([email protected]) or by phone (+358 50 363 9223).



  1. Dear colleagues, please give your feedback here by writing a comment

  2. Anonymous

    The conference room in Adobe Connect will be opened to the public about 15 minutes before the start

  3. Reply to Jeremy Bradley,

    thank you for participating in our webinar and asking your questions. I am sorry that I haven't managed to reply to you any earlier, although I promised to do so asap.

    You asked what would be the most appropriate way to store the Mari-El newspapers and make the texts available to the public.

    Your question was: "Is there *any* way to legally and ethically make these texts available to a wider audience? Possibly with restrictions, only allowing access to single sentences at a time, or only allowing access to registered users, only handing out IDs to certain university departments? Obviously, I'm not going to hurt anyone's business interests by publishing these materials myself, and I would give full attribution, but: this knowledge does not give me the right to simply ignore copyright whole-sale."

    Here's my answer: Frankly speaking, there's no way round this other than to deal with the copyright holders over publishing copies of the Mari-El journals/newspapers. Have a look at the comments of a Russian solicitor on the copyrights here: the "Legal opinion" begins on page 45. It may give you some guidelines for tackling the matter.

    My personal opinion is that all material should be available to the public without any geographical or IP restrictions etc., but the matter needs to be solved with the Mari-El newspaper, naturally. Only they could allow you to publish the material online, but I am afraid that there is a good reason why this material was taken off the servers earlier, and I am convinced that this is neither a commercial nor a political issue, but a legal one.

    The word lists made out of these newspapers are, however, another question, in my opinion. I am not sure whether the solicitors would agree with me, but you could publish the word lists through Clarin or an equivalent service. For instance, once the OCR text of our material in Fenno-Ugrica has been corrected, we will donate the word lists to Fin-Clarin. I suggest you consult the Fin-Clarin and/or Literature Bank staff for detailed answers and assistance. We at the National Library of Finland are not specialists when it comes to word lists and corpora, I am sorry.

    A platform that is browsable and shows the public only a sentence or two could be beneficial for linguistic purposes, but it doesn't really support the cause of openness and availability. You see, I am an all-or-nothing man with this sort of approach. However, I won't object if you want to create a corpus that shows a line or two, but that leads us once again to the problem of copyright. I support the idea that the copyrights should be transferred as completely as possible in order to enable further (scientific) use of the text on other platforms and in other systems too.

    I will post this reply to our KIWI page too.

    Yours &c.,