Computational Linguistics Videos

Simone Bocca – Data quality & Interoperability – The DataScientia approach (55:33)
Data quality and interoperability play a crucial role in data generation and exploitation. The cost of “noisy” (or “dirty”) data, as well as of a low level of data interoperability, is paid at different stages of the data lifecycle. It affects data retrieval, the interpretation of the data, and data adaptation and integration, and therefore the capacity to exploit the data: in other words, the value of the data itself. To limit such costs, quality and interoperability principles have been defined, such as the 5-star Open Data scheme and the FAIR principles. Nevertheless, one more step can be taken towards data resources that can be smartly exploited in different contexts. This seminar summarizes the current approaches to data quality and interoperability, then describes the approach defined by the DataScientia foundation to take this additional step toward higher-quality and more interoperable data.

Gabor Bella – Language Resources for South Africa (01:12:09)
This presentation is held with colleagues from the Tshwane University of Technology.

Alessandra Morellini – The Datascientia portal: access and navigation of Open data via CKAN (52:24)
This presentation will discuss data management in our Datascientia portal. First, we will discuss our context and how the issue of open data management is currently being addressed. Next, we will introduce the problem of data publishing and data composition, focusing on stratified data. In the third part of the talk, we will discuss our implementation of the proposed solution and the related technologies, in particular CKAN, a widely used open data portal platform. Finally, we will talk about possible improvements and problems that we plan to address in the future.

Nandu Chandran Nair – IndoUKC: a Concept-Centered Indian Multilingual Lexical Resource
The presentation was given by Nandu Chandran Nair at the LREC 2022 conference. (16:27)

Temuulen Khishigsuren – Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship
The presentation was given by Temuulen Khishigsuren at the LREC 2022 conference. (12:46)
11/05/2022 Find the paper

Gábor Bella – Language Diversity: Visible to Humans, Exploitable by Machines
We present the Universal Knowledge Core (UKC), a lexical database covering more than 2,000 languages. The UKC focuses on linguistic diversity, representing cross-lingual phenomena such as lexical gaps, cognates, and lexical similarity. (08:00)
02/05/2022 Browse the data online
Download related datasets

Gábor Bella – Language Diversity: Visible to Humans, Exploitable by Machines (short version)
Short demo of the Universal Knowledge Core, a large multilingual lexical database with a focus on language diversity. (02:20)
25/01/2022 Browse the data online
Download related datasets

Khuyagbaatar Batsuren and Gábor Bella – MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology
2021 SIGMORPHON workshop (15:25)
MorphyNet is a large multilingual database of derivational and inflectional morphology, intended for a wide range of computational applications. It contains millions of entries with morpheme segmentation and covers 15 languages (and counting).
Find the paper | More information about the 2021 SIGMORPHON workshop

Yamini Chandrashekar – New UKC and UKC catalogue
KnowDive Seminars (50:10)
Visit KnowDive here

Fausto Giunchiglia – From language diversity to Bias (what about under-resourced languages)
Digital Language Divide (58:58)
Cyprus Center for Algorithmic Transparency