This course is concerned with one of the five issues (the 5 V's) that are core to big data, namely "variety". The goal of the course is to provide students with the tools, methodologies and technologies needed to successfully analyse, model and integrate data coming from multiple heterogeneous sources.
COURSE DESCRIPTION
This course will cover the following topics:
- a general methodology for knowledge and data analysis, modeling and integration;
- an analysis of state-of-the-art tools and methodologies for data analysis, modeling and integration;
- an introduction to ontologies, Extended ER models and linguistic resources.
This is a hands-on, lab- and experiment-based course. Students will be given a data analysis/modelling/integration problem that they will have to solve, possibly while still attending the class. During the experiment, students will have to apply the notions introduced in class to the problem.
EXAM
The exam consists of addressing one of the slots listed under "KDI project" in the table below, where:
- Each title must be instantiated with a specific knowledge domain, e.g., "Healthcare Domain Knowledge (DK) Analysis" or "Food Domain Knowledge (DK) Evolution"
- The final output consists of a written "Technical Report" (.doc) and, where requested, an oral "Presentation" (.ppt)
- Each course project can be extended with a "Research project" and, optionally, with a "Thesis project", as described in the table below
- The items of a slot can be extended and/or combined with the items of another slot (in agreement with the teacher); e.g., "DK Analysis" can be extended with "Create the OWL model", which belongs to "DK Generation"
- A more detailed description of the "Research project" and the "Thesis project" will be defined in agreement with the teacher
Title | KDI project (6 credits) | Research project (12 credits; 300 hours) | Thesis (24 credits; 600 hours) |
Domain Knowledge (DK) Analysis | | gDrive folder population: the research project consists of creating new documentation for the given DK and/or updating the existing documentation | Create a new RapidMiner extension for DK importing and analysis: the thesis consists of providing a concrete and novel contribution in the field of knowledge representation (KR) by developing a new, ad hoc RapidMiner extension for KR analysis and management |
Domain Knowledge (DK) Evolution | | GitLab project population: the research project consists of creating new DC files, i.e., input and output files (along with the corresponding recipes), for the given DK | Create a new dump for an already existing DK with an example of application: the thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc version of an already existing domain knowledge representation that is able to address novel application requirements |
Domain Knowledge (DK) Generation | | GitLab project creation: the research project consists of creating a new GitLab project and populating it with subprojects and files (along with the corresponding recipes) related to the newly generated DK | Create the first dump of a new DK with an example of application: the thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc domain knowledge representation that is able to address novel application requirements |
Domain Knowledge (DK) Integration | | Using a DK for data integration: the research project consists of importing and modelling new data instances through KARMA, updating and populating the GitLab "entity level" folder for the given domain, and possibly generating new DCs | Create a new dump for an already existing DK with an example of application: the thesis consists of providing a concrete and novel contribution in the field of KR by developing a new, ad hoc version of an already existing domain knowledge representation that is able to address novel application requirements using newly imported data instances |
REPOSITORY OF WORK FROM PREVIOUS YEARS
This page contains results from projects developed in previous years. Students are welcome to reuse any of these results when developing their current assignment.