About

The summer school is a joint effort by ERS-IASC (European Regional Section of the International Association for Statistical Computing), ECAS (European Courses in Advanced Statistics), and CLADAG (Classification and Data Analysis Group of the Italian Statistical Society).

The course is intended to provide postgraduate training in special areas of statistics for both researchers and professional data analysts. The focus is on classification and clustering methods, with particular emphasis on modern high-dimensional data sets (MHDS). MHDS have recently emerged because of rapid improvements in data acquisition, storage, and processing. The availability of massive data sets is also of great interest in machine learning, data science, and computer science. Large data sets arise in many contexts, such as biological experiments, financial markets, and astronomy. Classification and clustering play a key role in this new paradigm by uncovering the inhomogeneous structure that often underlies these data. Starting from basic concepts, the course will introduce the audience to novel techniques and software through extensive applications to real data. Numerical applications will be performed with a variety of software, including some R packages and some cloud-computing platforms (SaaS, Software as a Service) that originate from research but target a wide range of practitioners.

Program

Day 1 // Introduction to Cluster Analysis and Classification
Multivariate data formats. Multivariate data and their visualization. Linear spaces, distances, dissimilarities, and geometric structures in several dimensions. Multivariate location-scale models. Clustering and classification. Types of clustering. Centroid-based clustering methods. Agglomerative hierarchical methods. Spectral clustering. Density-based methods.

Day 2 // Mixture Models, Model-based Clustering and Algorithms
Mixture models. Sampling from mixture models and clustered populations. Elliptically shaped clusters and the Gaussian model. Finite Gaussian mixture models (GMM) and model-based clustering. Maximum likelihood estimation (MLE) for GMM. EM algorithm and its variants. Computational aspects of MLE for GMM: scale restrictions and cluster initialization. Clusterwise linear regression and cluster-weighted models.

Day 3 // Model Selection, Variable Selection and Cluster Validation
Model-based clustering and model selection criteria: AIC, BIC, ICL. Strategies for model specification for the GMM and its variants. High-dimensional data and variable selection. Dimension reduction for clustering and classification. Estimating the number of clusters. Cluster validation and cluster stability. Criteria for comparing clusterings.

Day 4 // Further Topics in Cluster Analysis and Classification
Robustness and clustered data. Robust methods for cluster analysis. Clustering with categorical variables and mixed-type data. Network data clustering. Clustering strategies and method selection.

Day 5 // Issues in Clustering Big Data
Three lectures on emerging fields in clustering big data: co-clustering, clustering of high-dimensional data, and clustering of time series.



Schedule

Monday, May 21, 2018
Introduction to cluster analysis and classification

Time Topic Lecturer
09.00-09.30 Introduction
09.30-11.00 Lecture 1 (Topic 1) C. Biernacki
11.00-11.30 Coffee break
11.30-13.30 Lecture 2 (Topic 2) S. Ingrassia
13.30-15.00 Lunch
15.00-16.00 Lecture 3 (Topic 1) C. Biernacki
16.00-17.00 Practical lab session on lectures 1-3 TBA


Tuesday, May 22, 2018
Mixture models, model-based clustering and algorithms

Time Topic Lecturer
08.30-10.30 Lecture 4 (Topic 1) C. Biernacki
10.30-11.00 Coffee break
11.00-13.00 Lecture 5 (Topic 2) S. Ingrassia
13.00-14.30 Lunch
14.30-15.30 Practical lab session on lectures 1-3 TBA
15.30-16.00 Coffee break
16.00-17.00 Practical lab session on lectures 4-5 TBA


Wednesday, May 23, 2018
Model selection, variable selection and cluster validation

Time Topic Lecturer
08.30-10.30 Lecture 6 (Topic 3) S. Ingrassia
10.30-11.00 Coffee break
11.00-13.00 Lecture 7 (Topic 3) P. Coretto
13.00-14.30 Lunch
14.30-15.30 Practical lab session on lectures 6-7 TBA
15.30-18.30 Social Event


Thursday, May 24, 2018
Further topics in cluster analysis and classification

Time Topic Lecturer
08.30-10.30 Lecture 8 (Topic 4) P. Coretto
10.30-11.00 Coffee break
11.00-13.00 Lecture 9 (Topic 4) P. Coretto
13.00-14.30 Lunch
14.30-15.30 Practical lab session on lectures 8-9 TBA
15.30-16.00 Coffee break
16.00-17.30 Discussion on future advances related to lectures 8-9 TBA


Friday, May 25, 2018
Three topics in clustering big data

Time Topic Lecturer
08.30-10.30 Co-clustering C. Biernacki
10.30-11.00 Coffee break
11.00-13.00 Clustering of high-dimensional data C. Bouveyron
13.00-14.30 Lunch
14.30-16.30 Clustering of time series S. Frühwirth-Schnatter
16.30-16.45 Closing

Lecturers

Christophe Biernacki
UFR de Mathématiques
Université Lille 1
FRANCE




Charles Bouveyron
Laboratoire J.A. Dieudonné, UMR CNRS 7531,
and Equipe Asclepios, INRIA Sophia-Antipolis
Université Nice Côte d’Azur
FRANCE




Pietro Coretto
Department of Economics and Statistics
University of Salerno
ITALY




Sylvia Frühwirth-Schnatter
Institute for Statistics and Mathematics
Vienna University of Economics and Business
AUSTRIA




Salvatore Ingrassia
Department of Economics and Business
University of Catania
ITALY

 
 

Venue

TBA

Accommodation

TBA

Registration and Deadlines

TBA