Abstract

We introduce a robust and sparse clustering procedure for high-dimensional data. Robustness is achieved by incorporating a weighting function into the k-means procedure, which automatically assigns a weight to each observation. Sparsity is induced by a lasso-type penalty on the weighted between-cluster sum of squares. We additionally propose a framework for determining the optimal number of clusters and the optimal number of variables that contribute to cluster separation.
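
To make the two ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the observation weights enter the between-cluster sum of squares as simple multiplicative weights, and that the lasso-type penalty is handled by the soft-thresholding update commonly used in sparse k-means (unit L2 norm, L1 norm bounded by a tuning value `s`). The function names `weighted_bcss` and `sparse_variable_weights`, and the parameter `s`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def weighted_bcss(X, labels, obs_w):
    """Per-variable between-cluster sum of squares with observation weights.

    Assumption: an observation with weight 0 (e.g. a flagged outlier)
    contributes nothing to any mean or sum of squares.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    obs_w = np.asarray(obs_w, dtype=float)
    grand_mean = np.average(X, axis=0, weights=obs_w)
    bcss = np.zeros(X.shape[1])
    for k in np.unique(labels):
        mask = labels == k
        w_k = obs_w[mask]
        if w_k.sum() == 0:
            continue  # cluster made up entirely of zero-weight observations
        center = np.average(X[mask], axis=0, weights=w_k)
        bcss += w_k.sum() * (center - grand_mean) ** 2
    return bcss


def sparse_variable_weights(bcss, s, n_iter=60):
    """Variable weights w >= 0 with ||w||_2 = 1 and ||w||_1 <= s.

    Assumption: soft-thresholding of the per-variable BCSS, with the
    threshold found by bisection, as in standard sparse k-means.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    a = np.maximum(np.asarray(bcss, dtype=float), 0.0)
    if not np.any(a > 0):
        raise ValueError("no variable separates the clusters")
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w  # constraint already satisfied, no thresholding needed
    # Fallback: all weight on the single most separating variable.
    best = np.zeros_like(a)
    best[np.argmax(a)] = 1.0
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        st = np.maximum(a - mid, 0.0)  # soft-threshold (a is non-negative)
        w = st / np.linalg.norm(st)
        if w.sum() > s:
            lo = mid
        else:
            hi = mid
            best = w
    return best
```

In an iterative scheme of this kind, one would alternate between updating cluster assignments, observation weights, and variable weights; variables whose weight is driven to zero do not contribute to the cluster separation. How the observation weights themselves are computed is not specified in the abstract and is left out of the sketch.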

Reference

Brodinova, S., Filzmoser, P., Ortner, T., Zaharieva, M., & Breiteneder, C. (2017). Robust and sparse clustering for high-dimensional data. In CLADAG 2017 Book of Short Papers. Conference of the CLAssification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS), Milan, Italy, EU. http://hdl.handle.net/20.500.12708/57014

Projects