Tuesday, September 6, 2011

9/6/11

When we normalize the frequency of a word, we are asking the
question "how indicative is this word with respect to the corpus?"
Intuitively, we think about the answer to the question as:
* If the word appears in most or all other documents, that word isn't
very indicative.
Conversely, we also say that:
* If the word doesn't appear in other documents, that word is very indicative.
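
One common way to turn this intuition into a number is an inverse document frequency (IDF) score. The post doesn't give a formula, so the sketch below is an assumption: it uses a smoothed variant, log((1 + N) / (1 + df)), where N is the number of documents and df is the number of documents containing the word. The function name and the toy corpus are made up for illustration.

```python
import math


def inverse_document_frequency(word, documents):
    """Score how indicative `word` is, given a corpus of tokenized documents.

    `documents` is a list of token lists. This is a smoothed IDF variant
    (an assumption, not a formula from the post): adding 1 to both counts
    avoids division by zero for words that appear in no document.
    """
    total = len(documents)
    # Number of documents that contain the word at least once.
    containing = sum(1 for doc in documents if word in doc)
    return math.log((1 + total) / (1 + containing))


# Hypothetical toy corpus: "the" is in every document, "cat" in only one.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "bird", "flew"],
]

common_score = inverse_document_frequency("the", docs)
rare_score = inverse_document_frequency("cat", docs)
print(common_score, rare_score)
```

Under this choice, a word that appears in every document scores exactly zero, and the score grows as the word shows up in fewer documents, which matches the two bullets above.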

Andree