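K-means assigns each point to the nearest of K cluster means, and what it minimizes is a measure of dissimilarity: the total squared distance of the points from the mean of their own cluster. One of the problems it faces is that this dissimilarity keeps falling as we increase K, so minimizing it alone could push K all the way up to the number of points/nodes in the entire problem, where every point is its own cluster and the dissimilarity is zero. One solution is to work on a sample instead of the whole data set (for example, instead of all 3 trillion books), run the clustering on the sample with different values of K, and use the best one. The other solution is to add a penalty that grows with the number of clusters used.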
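
To make the penalty idea concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not a reference implementation: the function names, the toy points, the deterministic farthest-point seeding, and the penalty of 5.0 per cluster are all made up for the example. In practice the penalty weight has to be tuned to the data, and `points` could be a random sample of a much larger data set.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centers(points, k):
    """Deterministic farthest-point seeding (an assumption made for
    reproducibility; standard K-means usually seeds at random)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans_dissimilarity(points, k, iterations=50):
    """Run Lloyd's iterations and return the total within-cluster
    squared distance for k clusters."""
    centers = initial_centers(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda i: squared_distance(p, centers[i])
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members
            else center
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def choose_k(points, candidate_ks, penalty_per_cluster):
    """Pick the k that minimizes dissimilarity + penalty_per_cluster * k."""
    costs = {
        k: kmeans_dissimilarity(points, k) + penalty_per_cluster * k
        for k in candidate_ks
    }
    return min(costs, key=costs.get), costs


# Toy data: three tight, well-separated groups of three points each.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

all_ks = range(1, len(points) + 1)

# Unpenalized dissimilarity: reaches zero when k equals the number of points.
raw = {k: kmeans_dissimilarity(points, k) for k in all_ks}

# Penalized choice (the penalty weight 5.0 is an arbitrary assumption).
best_k, costs = choose_k(points, all_ks, penalty_per_cluster=5.0)
print(raw)
print(best_k)
```

On this toy data the unpenalized dissimilarity drops to zero at K = 9, one cluster per point, while the penalized cost is lowest at K = 3, the number of groups the data was built with. A different penalty weight can shift that choice, which is the price of this approach.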
--
Ivan Zhou
Graduate Student
Graduate Professional Student Association (GPSA) Assembly Member
School of Computing, Informatics and Decision Systems Engineering
Ira A. Fulton School of Engineering
Arizona State University