Tuesday, September 20, 2011

9/15/2011

A document is a vector in the space of terms, and a term is a vector in the space of documents. If you compute the term-term correlation matrix, it should be a diagonal matrix if the terms are in fact truly independent.

Alternatively, we can also do it with query logs (people who type this word may also type that word, so maybe it is relevant for you too).
Two terms are related if they have high co-occurrence in the documents.
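
To make the term-term idea concrete, here is a minimal sketch in Python. It assumes a tiny made-up term-document count matrix (rows are terms, columns are documents) and uses the raw dot product between term rows as an unnormalized co-occurrence score, not a true statistical correlation. The names term_doc, term_term_matrix and is_diagonal are hypothetical, and "independent" is simplified here to mean "the terms never share a document".

```python
# Hypothetical toy data: rows are terms, columns are documents.
term_doc = [
    [1, 0, 0],  # "apple" appears only in document 0
    [0, 1, 0],  # "banana" appears only in document 1
    [0, 0, 2],  # "car" appears twice in document 2
]


def term_term_matrix(matrix):
    """Entry (i, j) is the dot product of term row i and term row j."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
        for row_i in matrix
    ]


def is_diagonal(matrix):
    """True when every off-diagonal entry is zero."""
    return all(
        value == 0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


# Terms that never share a document give a diagonal matrix.
independent = term_term_matrix(term_doc)

# Here the first two terms appear in the same documents,
# so off-diagonal entries show up.
overlapping = term_term_matrix([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 2],
])

print(independent, is_diagonal(independent))
print(overlapping, is_diagonal(overlapping))
```

The off-diagonal entries are where the relatedness between terms shows up; a real system would likely normalize them (for example by term frequency) before comparing.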

Email messages are bags of addresses, so we can compute address correlation.

Amazon users are bags of purchases, so Amazon can compute correlation between purchases: people who buy these things often also buy those things, so would you like to buy them as well?
The benefit of term-term correlation is that if the terms are not independent, we can exploit their correlation.
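
The "bag of purchases" view can be sketched the same way. The snippet below is only an illustration under assumed data: the baskets are invented, and co_purchase_counts and also_bought are hypothetical helper names, not anything Amazon actually uses. It counts how often two items land in the same user's bag and then lists the items most often bought together with a given one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each user is a bag (here a set) of purchases.
baskets = [
    {"book", "lamp"},
    {"book", "lamp", "pen"},
    {"book", "pen"},
    {"lamp", "desk"},
]


def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one canonical order.
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


def also_bought(item, counts):
    """Items bought together with `item`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == item:
            related[b] += n
        elif b == item:
            related[a] += n
    return related.most_common()


counts = co_purchase_counts(baskets)
print(also_bought("lamp", counts))
```

The same counting works for email addresses in a message or terms in a query log; only the contents of the bag change.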

- Shu