Tuesday, September 13, 2011

09/08/2011

Because document vectors are sparse, inverted indexing becomes especially useful.
Naive retrieval touches every document, so it is inefficient. With an inverted index, documents whose similarity to the query would be zero are never touched, because only the postings lists of the query terms are visited. The key point is to use the inverted index to compute the similarity metric.
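
To make the idea concrete, here is a minimal sketch of an inverted index with term-at-a-time scoring. It assumes whitespace tokenization and raw term counts as weights, so the score is a plain dot product; the names build_inverted_index and score, and the sample documents, are made up for illustration and are not from the lecture.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def score(query, index):
    """Accumulate dot-product scores, visiting only the postings of query terms."""
    scores = defaultdict(int)
    for term in query.lower().split():
        # Documents that do not contain this term are never looked at.
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return dict(scores)

# Hypothetical toy collection.
docs = {
    "d1": "inverted index speeds up retrieval",
    "d2": "sparse vectors have many zeros",
    "d3": "an index over sparse vectors",
    "d4": "stop words are common",
}
index = build_inverted_index(docs)
result = score("sparse index", index)
```

Document d4 shares no term with the query, so it never appears in result; only the postings lists for "sparse" and "index" are read.
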
In traditional IR, both the index and the lexicon are built from keywords. Modern search engines index the full text.
If you construct the index yourself, pay attention to stemming and stop-word removal, which can not only reduce the index size but also improve answer relevance.
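
As a rough sketch of that preprocessing step, the snippet below drops stop words and applies a toy suffix stripper before terms would go into the index. The stop-word list and the crude_stem function are illustrative assumptions only; a real system would use a fuller list and a proper stemmer such as Porter's.

```python
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "and", "to", "in"}

def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters of the stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Lowercase, split on whitespace, drop stop words, stem what is left."""
    return [crude_stem(w) for w in text.lower().split() if w not in STOP_WORDS]

terms = normalize("The indexing of documents is an index of the document")
```

Ten input words collapse to two distinct index terms, "index" and "document", which is where the reduction in index size comes from. Matching "indexing" against "index" is also why stemming can help relevance.
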

--  Shu Wang