Tuesday, September 13, 2011

09/08/2011

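As we know, IDF is a global feature: it depends on the whole collection rather than on any single document or query. It can therefore be computed ahead of time and stored along with the inverted index, which saves computation time during search.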
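
To make the idea concrete, here is a minimal sketch of an inverted index that keeps each term's IDF next to its postings list. The function name `build_index`, the dictionary layout, the whitespace tokenizer, and the formula log(N / df) are assumptions for illustration, not something fixed by the class.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Return {term: {"idf": float, "postings": {doc_id: term_frequency}}}."""
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1

    n_docs = len(docs)
    # IDF is computed once at index time, so search only has to look it up.
    return {
        term: {"idf": math.log(n_docs / len(plist)), "postings": plist}
        for term, plist in postings.items()
    }


docs = [
    "the computer is fast",
    "the computer is old",
    "the network is slow",
]
index = build_index(docs)
print(index["the"]["idf"], index["computer"]["idf"])
```

Since 'the' occurs in all three toy documents, its stored IDF is log(3/3) = 0, while 'computer' gets log(3/2).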
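Not all words are equally important. For example, the word 'computer' in a repository of computer documents has no role to play, since it appears in nearly every document and does not help tell one document from another.
It is therefore useless to index words like 'the', which have an IDF of zero.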
One point to ponder is that a word with an IDF of exactly 0 is not the same as a word with an IDF of 0.0000001. The second word still carries a tiny weight, and this changes the ranking of the pages after the initial 3000 search results.
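
A small sketch of why the tiny value matters: when two pages tie on the informative terms, a term with an IDF of 0.0000001 breaks the tie, while a term with an IDF of 0 does not. The `score` function (a plain sum of tf times IDF), the term names, and all the numbers below are made up for illustration.

```python
def score(tf_by_term, idf_by_term):
    """Sum of tf * idf over the terms of a document (simplified scoring)."""
    return sum(tf * idf_by_term.get(term, 0.0) for term, tf in tf_by_term.items())


# Two hypothetical pages that differ only in how often a very common word occurs.
doc_a = {"index": 2, "common": 5}
doc_b = {"index": 2, "common": 1}

idf_zero = {"index": 1.2, "common": 0.0}
idf_tiny = {"index": 1.2, "common": 0.0000001}

tie_with_zero = score(doc_a, idf_zero) == score(doc_b, idf_zero)
a_wins_with_tiny = score(doc_a, idf_tiny) > score(doc_b, idf_tiny)
print(tie_with_zero, a_wins_with_tiny)
```

With an IDF of 0 the two pages are indistinguishable, so their order is arbitrary. With the tiny IDF, doc_a comes out ahead. Differences this small only matter among pages whose scores are already nearly equal, which tends to happen far down the result list.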
 
Aneeth