Tuesday, October 25, 2011

10/20/11

Clustering high-dimensional data such as
documents is tricky, because most document
pairs have a similarity approaching 0.
Cosine similarity on the raw term vectors is
not really a good measure, so use the costly
LSI (latent semantic indexing) to reduce the
dimensions. The true distance is then better
represented in the reduced-dimension space.
Manjara did this, but it's not practical yet.
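
To make the idea concrete, here is a minimal sketch of LSI on a toy
corpus. It assumes NumPy is available; the term-document matrix, the
helper names (cosine, lsi_doc_vectors) and the choice of k=2 are all
made up for illustration and are not taken from Manjara or any
particular system. Documents 0 and 1 share no terms, so their raw
cosine similarity is 0, yet both overlap with document 2.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [1, 0, 1, 0, 0],  # engine
    [0, 1, 1, 0, 0],  # automobile
    [0, 1, 1, 0, 0],  # motor
    [0, 0, 0, 1, 1],  # banana
    [0, 0, 0, 1, 1],  # fruit
    [0, 0, 0, 0, 1],  # apple
], dtype=float)


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either one is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def lsi_doc_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (s[:k, None] * Vt[:k, :]).T


docs_k = lsi_doc_vectors(A, k=2)

raw_sim = cosine(A[:, 0], A[:, 1])          # no shared terms, so exactly 0
lsi_sim = cosine(docs_k[0], docs_k[1])      # same topic, close to 1 after LSI
lsi_cross = cosine(docs_k[0], docs_k[3])    # different topics, close to 0

print(f"raw: {raw_sim:.3f}  lsi same-topic: {lsi_sim:.3f}  lsi cross-topic: {lsi_cross:.3f}")
```

The full SVD is the costly step: it is fine for a 7x5 matrix, but on
a real document collection it is what makes the approach expensive.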

M.