4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22)

The 4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22) will be held in person from May 30th to June 3rd 2022 at Residencia Lucas Olazábal of Universidad Politécnica de Madrid, Cercedilla, Madrid (arrival is expected on the evening of May 29th).

The main goal of the SD-LLOD-22 datathon is to give people from industry and academia practical knowledge in the field of Linked Data applied to Linguistics. The final aim is to allow participants to migrate their own (or others’) linguistic data and publish them as Linked Data on the Web and/or develop applications on top of Linguistic Linked Data. This datathon series is unique in its topic worldwide and builds on the success of the previous editions in 2015 and 2017 in Cercedilla (Spain), and in 2019 in Dagstuhl (Germany). This edition is sponsored and organised by COST (European Cooperation in Science and Technology) through NexusLinguarum, the “European network for Web-centred linguistic data science” COST Action (CA18209, https://nexuslinguarum.eu/) and funded by the European Union.


During the datathon, participants will:

  • Generate and publish their own Linguistic Linked Data from existing data sources.
  • Apply Linked Data principles and Semantic Web technologies (Ontologies, RDF, Linked Data) to the field of language resources.
  • Use the principal models for representing Linguistic Linked Data, in particular OntoLex-Lemon.
  • Learn about Linked Data-based NLP workflows and applications.
  • Learn about potential benefits and applications of Linguistic Linked Data for specific use cases.


During the datathon, seminars will be organised to cover topics such as:

  • Ontologies and Linked Data
  • The Lexicon Model for Ontologies (OntoLex-Lemon)
  • Integrating documents, annotations and NLP tools with Linked Data and RDF using Web Annotation and NIF
  • Guidelines for RDF generation and publication of Language Resources
  • Linked Data in lexicography and terminology
  • Use and Applications of Linguistic Linked Data
  • Metadata and Licenses for Linguistic Linked Data
  • Linked Data-aware NLP workflows

The program of the summer datathon will contain three types of sessions:

      1. Seminars to show novel aspects and discuss selected topics.
      2. Hands-on sessions to introduce the foundations, methods and technologies of each topic, in which participants will perform different tasks using the methods and technologies presented.
      3. Datathon sessions, where participants will work in groups of 4-5 on miniprojects that involve the generation and/or use of Linguistic Linked Data, applying what they have learned.

Participants are invited to propose a “miniproject” related to the topics of the datathon, which might include some datasets for their conversion into linked data. In this edition, we particularly encourage miniprojects that involve under-resourced languages. A selection of proposals will form the basis for the miniprojects which the participants will work on during the datathon sessions. Participants who do not propose a miniproject, or whose miniproject is not selected, will be able to join another miniproject. There will be an award for the best miniproject.

Participants should bring their own laptops to follow the hacking sessions. They will be provided with digital copies of all the material used during the course and will receive assistance in installing all the required software.


Sun 29/5 | Mon 30/5 | Tue 31/5 | Wed 1/6 | Thu 2/6 | Fri 3/6
09:00 - 09:30 | Opening and Ontology/Linked Data basics | Presentation of participant groups | Parallel Seminars: Linguistic Annotations; Terminology and Lexicography | Seminar: Metadata | Seminar: LD-aware NLP workflows (Teanga)
09:30 - 10:00 | Parallel Hands-on: Corpora Annotation; Lexicographical Data
10:00 - 11:00 | Seminar: Linguistic Linked Data and Ontolex | Seminar: SPARQL | Hands-on: Linking (NAISC and VocBench) | Datathon: Results Presentations
11:00 - 11:30 | Coffee Break | Coffee Break | Coffee Break | Coffee Break | Coffee Break
11:30 - 12:00 | Introduction to VocBench and LiLa | Parallel Hands-on: Basic SPARQL Querying; Intermediate SPARQL Querying | Parallel Hands-on: Corpora; Terminology and Lexicography | Datathon | Datathon: Results Presentations
12:00 - 12:30 | Invited Talk: Artem Revenko
13:00 - 14:30 | Lunch | Lunch | Lunch | Lunch | Lunch
14:30 - 15:00 | Hands-on Ontolex Lexicon Building | Parallel Hands-on: Basic LLOD Generation; Intermediate LLOD Generation | Daily Report (tutors only) | Daily Report (tutors only) | Conclusions and Awards
15:00 - 15:30 | Datathon | Datathon
16:00 - 16:30 | Coffee Break | Coffee Break | Coffee Break | Coffee Break
16:30 - 19:30 | Arrival, Registration and Installfest | Minute Madness, Projects and Groups Selection, Datathon | Datathon | Excursion to Segovia | Datathon
19:30 - 20:30 | Hiking around Cercedilla
20:30 - 22:00 | Dinner and Icebreaking Session | Dinner | Dinner | Dinner | Dinner
Please find the PDF version of the schedule here.


Jorge Gracia

University of Zaragoza

Patricia Martín-Chozas

Ontology Engineering Group, Universidad Politécnica de Madrid

Anas Fahad Khan

Institute for Computational Linguistics «A. Zampolli». CLARIN-IT, Italy

Christian Chiarcos

Institute for Digital Humanities, University of Cologne, Germany

Local Organisers

Elena Montiel Ponsoda

Ontology Engineering Group, Universidad Politécnica de Madrid


The datathon is a sponsored event, and it has no registration fee, but participants are expected to cover the cost of their meals and accommodation at the Residence for the whole duration of the datathon. The general price is 575 euros. As part of the registration process, applicants are invited to submit a short abstract of their ideas for the datathon (miniproject proposal, e.g., description of possible resources to be converted, linked or reused during the datathon, ideas for use cases, etc.).

Registration will close on 18/04/2022. A minimum of fifteen travel grants will be provided by NexusLinguarum (covering accommodation, meals and travel expenses). To apply for one, the participant must be affiliated with a legal entity in a COST Full/Cooperating Member country or a COST Near Neighbour Country. To register for SD-LLOD-22 and to find out more about the travel grants, please fill in this form.

COVID statement

The datathon is planned as a physical event. The local organisation is committed to guaranteeing a safe event. Note that there might be some COVID rules to comply with at the time of the event. These will be announced in due course.

Important Dates (tentative)


Sina Ahmadi

NUI Galway, Ireland

Thierry Declerck

DFKI, Germany

Milan Dojchinovski

CTU in Prague, Czech Republic

Christian Fäth

Goethe-University Frankfurt am Main, Germany

Dagmar Gromann

University of Vienna, Austria

Max Ionov

Institute for Digital Humanities, University of Cologne, Germany

Gilles Sérasset

Université Grenoble Alpes, France

Andon Tchechmedjiev

IMT École des Mines d’Alès

Invited lecturers

Manuel Fiorelli

University of Rome Tor Vergata, Italy

David Lindemann

UPV/EHU University of the Basque Country, Spain

Francesco Mambrini

Università Cattolica del Sacro Cuore, Italy

Bernardo Stearns

NUI Galway, Ireland

Armando Stellato

University of Rome Tor Vergata, Italy


Artem Revenko

Semantic Web Company


The 4th Summer Datathon on Linguistic Linked Open Data (SD-LLOD-22) will be held in Cercedilla (Madrid), a small village in the mountains near Madrid.

The event will take place at the Residencia Lucas Olazábal of Universidad Politécnica de Madrid, located in the forest of the Sierra de Guadarrama in a place known as Las Dehesas de Cercedilla, 50 km from Madrid and 15 km from Navacerrada.

Going to Cercedilla from Madrid-Barajas airport

The nearest airport is Adolfo Suárez Madrid-Barajas. Once you are in Madrid, the best way to get to Cercedilla is by train. The whole trip (from Adolfo Suárez Madrid-Barajas airport to the Residence in Cercedilla) takes around 2 hours.

Taking the train to Cercedilla

If you arrive at Terminals 1, 2 or 3 of Adolfo Suárez Madrid-Barajas Airport
Go to the metro station and take line 8 (the pink line on the map) to Nuevos Ministerios station. Leave the metro station and go to the train station (Cercanías Renfe). Then take line C8B to Cercedilla.

If you arrive at Terminal 4 of Adolfo Suárez Madrid-Barajas Airport
Go to the train station and take line C1 to Chamartín. Then take line C8B to Cercedilla.

Getting to the Residence from Cercedilla

The train station in Cercedilla is 4 km away from the Residence. You should take a taxi to get to the Residence. The taxi stop is in front of the train station.
If you arrive late and there is nobody waiting for you at the Cercedilla train station, please phone the following number:
(+34) 91 852 15 68

Going by car to the Residence

From Madrid, take the A-6 motorway to the El Escorial, Guadarrama exit. Head towards Guadarrama until you reach the crossroads with the old N-6 (a crossroads with traffic lights where you can see the El Piquio Hotel). Turn left and drive through Guadarrama village until you see the sign for Cercedilla (a road on the right, next to a headquarters). Follow that road to Cercedilla. After passing the Cercedilla train station, go straight on at the next crossroads in the direction of Las Dehesas – La Fuenfría. When you reach the forest information point, turn right, following the signs for Residencia Lucas Olazábal UPM.

Residencia Lucas Olazábal

About LLOD and the SD-LLOD datathon series

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data cloud was conceived and is maintained by the Open Linguistics Working Group (OWLG) of Open Knowledge International, and has since been a focal point of activity for several W3C community groups, research projects and infrastructure efforts.

To a large extent, LLOD development has been driven forward by international workshops and accompanying hackathons, as organized, for example, in the context of the workshops on Multilingual Linked Open Data for Enterprises in 2012 and 2014 in Leipzig, Germany. Since 2015, these have been organized in the form of biennial summer schools: the first Summer Datathon on Linguistic Linked Open Data (SD-LLOD’15) was held in June 2015 in Cercedilla, Madrid, Spain, as was the second Summer Datathon on Linguistic Linked Open Data (SD-LLOD’17) in July 2017. The 2019 edition was organized in conjunction with and held before the 2nd International Conference on Language, Data and Knowledge (LDK-2019, May 20th-22nd, Leipzig, Germany).

Notable outcomes of earlier datathon editions include the first installment of the LLOD cloud and the LLOD cloud diagram (as a result of MLODE-2012), a large number of converted resources, and numerous scientific publications and thesis projects that build on successful mini-projects, experiments or case studies conducted at or initiated during previous SD-LLOD datathons.

SD-LLOD-22 Slides and Materials