Tweet Notes (CSE 494/598 F11): 9/20/2011

Wednesday, September 21, 2011

9/20/2011

The documents and terms are vectors in space of factors.We use LSI to capture variance.

d*t = (d*f) * (f*f) * (f*t). The Rank of the matrix of (f*f) determines the dimensions.The dimensions can be further reduced which gives us a better insight of data variance and also understand the similarities . LSI captures the clusters over data.It can capture the intuitions which the cos(theta) similarity would never capture i.e..like synonymy and polysemy.

Srividya