Wednesday, September 21, 2011

9/20/2011

Documents and terms are both represented as vectors in the space of factors. We use LSI to capture the variance in the data.
The document-term matrix is decomposed as d*t = (d*f) * (f*f) * (f*t). The rank of the (f*f) matrix determines the number of dimensions. The dimensions can be reduced further by keeping only the most significant factors, which gives better insight into the variance of the data and helps us understand the similarities. LSI captures the clusters in the data. It can capture intuitions that cos(theta) similarity on the original term vectors would never capture, such as synonymy and polysemy.
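
The decomposition above can be sketched with NumPy as a rough illustration. Everything in it is an assumption made for the example and not something from class: the six toy documents, the five terms, the helper names `cosine` and `lsi_doc_vectors`, and the choice of k = 2 factors. It only illustrates the synonymy case, not polysemy.

```python
import numpy as np

def cosine(u, v):
    """cos(theta) between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def lsi_doc_vectors(A, k):
    """Document coordinates in the top-k factor space.

    A is the d*t matrix. The SVD gives A = U * diag(s) * Vt, where
    U plays the role of (d*f), diag(s) of (f*f), and Vt of (f*t).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]  # keep only the k largest factors

# Hypothetical toy corpus (rows = documents, columns = terms).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # doc 0: car engine
    [0, 1, 1, 0, 0],  # doc 1: automobile engine
    [1, 0, 0, 0, 0],  # doc 2: car
    [0, 1, 0, 0, 0],  # doc 3: automobile
    [0, 0, 0, 1, 1],  # doc 4: flower petal
    [0, 0, 0, 1, 0],  # doc 5: flower
], dtype=float)

docs_k = lsi_doc_vectors(A, k=2)

# Doc 2 ("car") and doc 3 ("automobile") share no terms.
raw_sim = cosine(A[2], A[3])
# In the reduced factor space they fall on the same "vehicle" factor.
lsi_sim = cosine(docs_k[2], docs_k[3])
# Doc 2 ("car") and doc 4 ("flower petal") stay unrelated.
lsi_cross = cosine(docs_k[2], docs_k[4])

print("raw cosine, car vs automobile:", round(raw_sim, 3))
print("LSI cosine, car vs automobile:", round(lsi_sim, 3))
print("LSI cosine, car vs flower:    ", round(lsi_cross, 3))
```

In this toy case the raw cosine between the "car" and "automobile" documents is zero because they have no term in common, while after reducing to two factors the two documents land close together, since both terms co-occur with "engine" elsewhere in the corpus. The two retained factors roughly correspond to the vehicle cluster and the flower cluster.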

Srividya