Saturday, October 15, 2011

10/13/2011

Purity is an external validation measure used to compare different clusterings of the same set of points. It is "external" because it scores a clustering against the true class labels of the points.

Pure size of a cluster = No. of elements in the cluster that belong to its majority class

Purity of a clustering = Sum of the pure sizes of all clusters / Total number of elements across all clusters
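A minimal Python sketch of the formula, assuming each point comes with a cluster id (from the clustering) and a true class label. The function name `purity` and the toy labels are illustrative only.

```python
from collections import Counter

def purity(cluster_ids, class_labels):
    """Purity of a clustering: sum of the pure sizes of all clusters / total points."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("need at least one point")
    # Group the true class labels of the points by the cluster they were assigned to.
    members = {}
    for c, y in zip(cluster_ids, class_labels):
        members.setdefault(c, []).append(y)
    # Pure size of a cluster = number of its points that belong to the majority class.
    pure_sizes = [Counter(labels).most_common(1)[0][1] for labels in members.values()]
    return sum(pure_sizes) / len(class_labels)

# Cluster 0 holds a, a, b (pure size 2); cluster 1 holds b, b, a (pure size 2).
print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"]))  # 4/6 ≈ 0.667
```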

Looking at the formula, it is clear that purity tends to increase as the number of clusters increases. In the extreme case where every point forms its own cluster, every cluster is trivially pure and the purity is 1. So if we compare two clusterings containing different numbers of clusters, the one with more clusters will generally look more pure, regardless of whether it is actually better. Therefore this validation measure should preferably be used to compare clusterings containing the same number of clusters.
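The bias can be seen on a toy example. The sketch below (the labels and the split points are made up for illustration) scores the same six points grouped into 1, 2 and 6 clusters. Splitting a cluster can never lower the sum of pure sizes, because the old majority class is still counted in each piece, so finer clusterings of the same points never score worse.

```python
from collections import Counter

def purity(clusters):
    # clusters: list of clusters, each given as the list of true class labels of its points
    total = sum(len(c) for c in clusters)
    # Pure size of each cluster = count of its most common class.
    return sum(Counter(c).most_common(1)[0][1] for c in clusters) / total

labels = ["a", "a", "b", "b", "b", "a"]
one_cluster = [labels]                    # k = 1
two_clusters = [labels[:3], labels[3:]]   # k = 2
singletons = [[y] for y in labels]        # k = 6, every point alone

for name, clustering in [("k=1", one_cluster), ("k=2", two_clusters), ("k=6", singletons)]:
    print(name, round(purity(clustering), 3))
# k=1 0.5
# k=2 0.667
# k=6 1.0
```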

-Apurva Nair