Thursday, September 29, 2011

To incorporate importance as well as query similarity when ranking a document, the best approach is to cluster the documents in the corpus by the topic/class they belong to, and then find the importance of each document with respect to its cluster. This importance can be precomputed and stored for later use.

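When a query arrives, assign a topic to the query, find documents based on their similarity to the query, and calculate the rank score (PR) as:
PR = alpha * SimilarityValue + (1 - alpha) * Importance
where alpha is any value between 0 and 1. An alpha close to 1 favors similarity to the query; an alpha close to 0 favors the precomputed importance.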
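
The query-time step above can be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not part of the post: documents and queries are represented as sparse term-weight dictionaries, similarity is cosine similarity, importance has already been precomputed per document and scaled to the range 0 to 1, and only documents in the query's topic are scored. The names cosine_similarity, combined_score, and rank_documents are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed representation)."""
    dot = sum(a.get(term, 0.0) * weight for term, weight in b.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(similarity, importance, alpha):
    """PR = alpha * SimilarityValue + (1 - alpha) * Importance, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * similarity + (1.0 - alpha) * importance


def rank_documents(query_vector, query_topic, documents, alpha=0.5):
    """Score documents in the query's topic and return (id, PR) pairs, best first.

    Each document is a dict with keys "id", "topic", "vector", and "importance";
    "importance" is the precomputed, cluster-relative value (assumed to be in [0, 1]).
    Restricting to the query's topic is an assumption of this sketch.
    """
    scored = []
    for doc in documents:
        if doc["topic"] != query_topic:
            continue
        similarity = cosine_similarity(query_vector, doc["vector"])
        scored.append((doc["id"], combined_score(similarity, doc["importance"], alpha)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With alpha = 1 the ordering depends only on similarity to the query, and with alpha = 0 only on the stored importance, so alpha can be tuned to trade one off against the other.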

-Rashmi Dubey