Thursday, October 20, 2011

10/20/2011

One way to address the problem of not knowing k before running
k-means is to look for a kink (an "elbow") in the tightness vs. k
curve. However, it might not be practical to run k-means for that many
values of k on a large dataset. One way around this is to take a
random sample of the dataset first and run k-means on the sample with
various values of k to find the "best" k.
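
As a rough illustration of the idea, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than something from the lecture: tightness is taken to be the sum of squared distances from each point to its nearest centroid, the kink is picked with a simple second-difference heuristic, the helper names (`kmeans_tightness`, `tightness_curve`, `pick_kink`) are hypothetical, and the data is synthetic.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_tightness(points, k, rng, iters=20):
    """Run plain Lloyd's k-means once and return the tightness, taken here
    (an assumption) to be the sum of squared distances to the nearest centroid."""
    centroids = rng.sample(points, k)  # initialize from k random data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old
        # centroid if a cluster happens to be empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def tightness_curve(points, ks, sample_size, seed=0, restarts=5):
    """Draw one random sample, then record the best tightness found for each k."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return {k: min(kmeans_tightness(sample, k, rng) for _ in range(restarts)) for k in ks}

def pick_kink(curve):
    """Heuristic (an assumption): choose the k where the curve bends most,
    i.e. where the second difference of the tightness is largest.
    Expects at least three values of k."""
    ks = sorted(curve)
    def bend(i):
        return curve[ks[i - 1]] - 2 * curve[ks[i]] + curve[ks[i + 1]]
    return ks[max(range(1, len(ks) - 1), key=bend)]

# Synthetic stand-in for a large dataset: three well-separated 2-D blobs.
data_rng = random.Random(1)
centers = [(0, 0), (10, 10), (20, 0)]
data = [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
        for cx, cy in centers for _ in range(1000)]

# k-means only ever runs on the 300-point sample, not on all 3000 points.
curve = tightness_curve(data, ks=range(1, 7), sample_size=300)
best_k = pick_kink(curve)
print(curve)
print("suggested k:", best_k)
```

Once a k has been chosen on the sample, k-means would be run a single time on the full dataset with that k. The sample has to be large enough to contain points from every cluster, otherwise the kink can show up at too small a k.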

- Stephen Booher