k-means is to look for kinks in the tightness vs. k curve. However, it
might not be practical to run k-means for so many different values of k
on a large dataset. One way around this is to take a random sample of
the dataset beforehand and run k-means on that sample with a range of
values of k to find the "best" k.
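
A minimal sketch of the sampling idea, assuming "tightness" means the total
squared distance from each point to its nearest center, and using a plain
Lloyd's-style k-means written with the standard library only. The names
sq_dist, mean, kmeans_tightness, tightness_curve and demo_data are
hypothetical helpers made up for this illustration, and the three-blob data
is synthetic:

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def kmeans_tightness(points, k, iters=20, seed=0):
    """Run a basic k-means and return the total squared distance
    from each point to its nearest center (smaller means tighter)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: init from k random points
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def tightness_curve(data, ks, sample_size, seed=0):
    """Draw one random sample, then compute tightness on it for each k."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    return {k: kmeans_tightness(sample, k, seed=seed) for k in ks}

def demo_data(n_per_blob=1000, seed=0):
    """Synthetic 2-D data: three well-separated Gaussian blobs."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in blob_centers for _ in range(n_per_blob)]

if __name__ == "__main__":
    data = demo_data()
    # Only 300 of the 3000 points are clustered for each candidate k.
    curve = tightness_curve(data, ks=range(1, 7), sample_size=300)
    for k, t in curve.items():
        print(k, round(t, 1))
```

On data like this, one would look for the k after which the printed values
stop dropping sharply, and then run k-means once on the full dataset with
that k. The sample size and the range of k are judgment calls; a sample that
is too small may miss small clusters.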
- Stephen Booher