Goal:
The aim of the course is to give students a thorough understanding of the purpose, methods, and underlying rationale of data preparation, preprocessing, and transformation. These tasks are essential for data models and data analysis workflows to function properly. The course also examines key issues in data collection, storage, and loading.
The lectures present the theory behind each method and explain its practical importance. The practical sessions then give students hands-on experience implementing these methods in Python.
Course description:
The course syllabus is structured around the four main stages of data preprocessing (data cleaning, data transformation, data integration, and data reduction). It covers the following topics: CRISP-DM; data loading; data exploration; handling missing data; outlier detection and treatment; data manipulation; data transformation; normalization; aggregation; dimensionality reduction and expansion; data compression; discretization; data quality assurance; and an overview of the tools and techniques used to implement these tasks.