This course is concerned with one of the five issues (the 5 V's) that are core to big data, namely "variety". The goal of the course is to provide students with the tools, methodologies, and technologies needed to successfully analyse, model and integrate data coming from multiple heterogeneous sources.

COURSE DESCRIPTION
This course will cover the following topics:
  • a general methodology for knowledge and data analysis, modeling and integration;
  • an analysis of the state-of-the-art tools and methodologies for data analysis, modeling and integration;
  • an introduction to ontologies, Extended ER models and linguistic resources.

This is a hands-on course based on lab work and experiments. Students will be given a data analysis, modelling, or integration problem to solve, possibly while the class is still running. During the experiment, students will apply the notions introduced in class to that problem.

EXAM
The exam consists of addressing one of the item slots listed under "KDI project", where:
  • Each title needs to be instantiated for a specific domain of knowledge - e.g., "Healthcare Domain Knowledge (DK) Analysis" or "Food Domain Knowledge (DK) Evolution"
  • The final output consists of a written "Technical Report" (.doc) and, where requested, an oral "Presentation" (.ppt)
  • Each course project can be extended with a "Research project" and, possibly, with a "Thesis project", as described in the table below
  • The items of a slot can be extended and/or combined with the items of another slot (in agreement with the teacher) - e.g., "DK Analysis" can be extended with "Create the OWL Model", which is present in "DK Generation"
  • A more fine-grained description of the "Research project" and the "Thesis project" will be produced in agreement with the teacher

Title | KDI project (6 credits) | Research project (12 credits; 300 hours) | Thesis (24 credits; 600 hours)
Domain Knowledge (DK) Analysis
  • Take an existing DK as input
  • Make the scenario explicit
  • Make the personas explicit
  • Make the storytelling explicit
  • List and formalize the addressed queries
  • Design the EER Model (logical and physical views)
  • Make an example set from the given DK
  • Use the generated example set for analyzing the given DK
  • Generate a final presentation
Research project (12 credits; 300 hours): gDrive folder population. The research project consists of creating new documentation for the given DK and/or updating the existing one.
Thesis (24 credits; 600 hours): Create a new RapidMiner extension for DK importing and analysis. The thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc extension of RapidMiner for KR analysis and management.
Domain Knowledge (DK) Evolution
  • Take an existing DK as input
  • Take new DC inputs
  • Update the scenario
  • Update the personas
  • Update the storytelling
  • Update the addressed queries
  • Update the EER Model (logical view)
  • Update the OWL Model
  • Update the mapping with the CSK
  • Visualize the final output
  • Generate a final presentation
Research project (12 credits; 300 hours): GitLab Project population. The research project consists of creating new DC files, i.e., input and output files (along with the corresponding recipes) for a given DK.
Thesis (24 credits; 600 hours): Create a new dump for an already existing DK with an example of application. The thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc version of an already existing domain knowledge representation, which is able to address novel application requirements.
Domain Knowledge (DK) Generation
  • Take new DC inputs
  • Describe the scenario
  • Describe the personas
  • Create the storytelling
  • List and formalize the addressed queries
  • Create the EER Model (logical view)
  • Create the OWL Model
  • Map the OWL Model to the CSK
  • Visualize the final output
  • Generate a final presentation
Research project (12 credits; 300 hours): GitLab Project creation. The research project consists of creating a new GitLab project and populating it with subprojects and files (along with the corresponding recipes) related to the newly generated DK.
Thesis (24 credits; 600 hours): Create the first dump of a new DK with an example of application. The thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc domain knowledge representation, which is able to address novel application requirements.
Domain Knowledge (DK) Integration
  • Take an existing DK as input
  • Take new data related to the given DK
  • Describe and analyze the data
  • Import the DK OWL into KARMA
  • Import the collected data into KARMA
  • Address the integration task
  • Describe and analyze the results
  • Generate a final presentation
Research project (12 credits; 300 hours): Using a DK for data integration. The research project consists of importing and modelling new data instances through KARMA, updating and populating the GitLab "entity level" folder for the given domain, and possibly generating new DCs.
Thesis (24 credits; 600 hours): Create a new dump for an already existing DK with an example of application. The thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc version of an already existing domain knowledge representation, which is able to address novel application requirements by using new imported data instances.

MATERIAL
KDI previous editions:

REPOSITORY OF WORK FROM PREVIOUS YEARS. This page contains results from projects developed in previous years. Students are welcome to reuse any of these results in the development of their current assignment.

COURSE DESIGNERS
  • Fausto Giunchiglia, Knowdive Group Founder
  • Mattia Fumagalli
  • Subhashis Das
  • Alessio Zamboni