K-means and Anomalous Clustering

Prof. Boris Mirkin
National Research University Higher School of Economics, Moscow, Russia; Birkbeck, University of London, UK
I first consider a rather simple, intuitive criterion for finding an individual cluster: the product of the average within-cluster similarity and the number of elements in the cluster, to be maximized. I bring forth its mathematical properties, which relate the criterion to high-density subgraphs and to the spectral clustering approach. I then present a simple approximation model of an anomalous cluster that leads to this criterion, together with two families of very effective clustering methods: the crisp ADDI methods (Mirkin, 1987) and the fuzzy FADDIS methods (Mirkin, Nascimento, 2012). The latter reveal some mysteries in the popular Laplace normalization of similarity data.
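As a rough illustration of the criterion described above, the following Python sketch computes the average within-cluster similarity times the cluster size for a candidate cluster. It rests on assumptions not stated in the abstract: the average is taken over all ordered pairs, diagonal included, so the value equals the sum of within-cluster similarities divided by the cluster size. Under that assumption the criterion coincides with the Rayleigh quotient of the similarity matrix at the binary membership vector, which is one way to see the link to spectral methods. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def cluster_criterion(A, members):
    """Average within-cluster similarity times the cluster size.

    Assumption: the average runs over all |S|^2 ordered pairs,
    diagonal included, so the value equals sum(A[S, S]) / |S|.
    """
    idx = np.asarray(members)
    sub = A[np.ix_(idx, idx)]          # within-cluster similarity block
    return sub.mean() * len(idx)

def rayleigh_quotient(A, members):
    """s^T A s / s^T s for the 0/1 membership vector s of the cluster."""
    s = np.zeros(A.shape[0])
    s[np.asarray(members)] = 1.0
    return (s @ A @ s) / (s @ s)

# Hypothetical toy similarity matrix with two visible groups.
A = np.array([[1.0, 0.8, 0.1, 0.0],
              [0.8, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.7],
              [0.0, 0.1, 0.7, 1.0]])

print(cluster_criterion(A, [0, 1]))
print(rayleigh_quotient(A, [0, 1]))
```

Relaxing the 0/1 constraint on the membership vector turns the maximization into an eigenvector problem, which is a plausible reading of the spectral connection; the talk may define the average differently, for example by excluding the diagonal.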
Then I show that the celebrated square-error k-means clustering criterion can be equivalently reformulated as the problem of finding a partition that consists of anomalous clusters. I will finish with a problem in consensus clustering, show that it is equivalent to anomalous similarity clustering, and present experimental results demonstrating the superiority of this approach over its competitors.
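The k-means reformulation mentioned above is consistent with a standard identity, checked numerically in the sketch below. It is not necessarily the derivation given in the talk. The assumption is that similarities are inner products between data rows. The square-error criterion plus the sum of the per-cluster criteria (within-cluster similarity sum divided by cluster size) then equals the total data scatter, so minimizing the former is the same as maximizing the latter. The function names, random data and labels are all hypothetical.

```python
import numpy as np

def square_error(Y, labels):
    """k-means criterion: squared distances of rows to their cluster means."""
    total = 0.0
    for k in np.unique(labels):
        block = Y[labels == k]
        total += ((block - block.mean(axis=0)) ** 2).sum()
    return total

def summed_cluster_criterion(Y, labels):
    """Sum over clusters of sum(A[S, S]) / |S|, with A = Y Y^T (assumed)."""
    A = Y @ Y.T                        # inner-product similarities
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        total += A[np.ix_(idx, idx)].sum() / len(idx)
    return total

rng = np.random.default_rng(0)
Y = rng.normal(size=(12, 3))           # hypothetical data matrix
labels = np.repeat([0, 1, 2], 4)       # an arbitrary 3-cluster partition
data_scatter = (Y ** 2).sum()          # constant for any partition

print(square_error(Y, labels) + summed_cluster_criterion(Y, labels))
print(data_scatter)
```

Because the data scatter does not depend on the partition, the two printed values agree for any choice of labels. Each per-cluster term equals the cluster size times the squared norm of the cluster mean, so clusters whose means lie far from the origin contribute most.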