Wednesday, September 14, 2011

8/13/11

A power law distribution, when applied to IR, can be used to show that
search engines without query logs have a harder time gathering them:
without previous logs, an engine is less able to improve its search
results.

Edit Distance - The distance between two words
determined by the number of insertions, deletions, replacements, or
transpositions of letters necessary to make the two words the same.
This process, like many others discussed in class thus far, can be
weighted to give differing values to each type of change, and even
different weights to different variations of each type of change
(such as some letter replacements being more or less heavily weighted
than others). When using this process, proper alignment of characters
is necessary to keep the edit distance computation from becoming a
high-complexity problem.

Correlation and Co-occurrence Analysis -
Terms that are related may be added to the results using a
thesaurus-like method.
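
As a rough illustration of the weighted edit distance described above,
here is a minimal Python sketch. It is not code from class: the function
name, the parameter names (ins, dele, sub, trans), and the default
weight of 1 for every change are my own assumptions. It fills in a table
that keeps the characters of the two words aligned, so the work stays
proportional to len(a) * len(b), and it only allows a transposition of
two adjacent letters.

```python
def edit_distance(a, b, ins=1, dele=1, sub=None, trans=1):
    """Weighted edit distance between words a and b (illustrative sketch).

    ins, dele, trans: assumed costs for an insertion, a deletion, and a
    swap of two adjacent letters.
    sub: optional function (x, y) -> cost of replacing letter x with y,
    so that some replacements can weigh more or less than others.
    """
    if sub is None:
        sub = lambda x, y: 1  # assumed default: every replacement costs 1

    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = cheapest way to turn the first i letters of a
    # into the first j letters of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + ins

    for i in range(1, rows):
        for j in range(1, cols):
            x, y = a[i - 1], b[j - 1]
            best = min(
                d[i - 1][j] + dele,                              # delete x
                d[i][j - 1] + ins,                               # insert y
                d[i - 1][j - 1] + (0 if x == y else sub(x, y)),  # match or replace
            )
            # Adjacent transposition, e.g. "or" <-> "ro".
            if i > 1 and j > 1 and x == b[j - 2] and a[i - 2] == y:
                best = min(best, d[i - 2][j - 2] + trans)
            d[i][j] = best
    return d[-1][-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("form", "from"))       # 1 (one transposition)
```

Passing a sub function is one way to weight particular letter
replacements differently, for example charging less for swapping one
vowel for another.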
-Thomas Hayden