-Boolean output is out (not just 0 or 1); we need a measure that accepts and produces in-between values too.
-The importance of a word is measured by how common the word is across the documents: the more common the word, the less it matters. Think of why a cow or a dog is not kept in a zoo.
-The similarity level should stay the same whenever the same proportion of words is duplicated/copied/concatenated within the same document.
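
The three requirements above can be tried out with a small sketch. The list does not name a particular measure; cosine similarity over TF-IDF weights is my assumption of one candidate that appears to fit. The function names (idf_weights, tfidf_vector, cosine) and the smoothed IDF formula are illustrative choices, not something taken from the list.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency: rarer words get larger weights."""
    n = len(docs)
    # Count in how many documents each word occurs (not how often).
    df = Counter(word for doc in docs for word in set(doc.split()))
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def tfidf_vector(doc, idf):
    """Weight each word's raw count by its IDF; unknown words get weight 0."""
    counts = Counter(doc.split())
    return {word: count * idf.get(word, 0.0) for word, count in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; in [0, 1] for non-negative weights."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["the cow eats grass", "the dog eats meat", "the tiger eats meat"]
idf = idf_weights(docs)

a = tfidf_vector(docs[0], idf)
b = tfidf_vector(docs[1], idf)
# Same document concatenated with itself: every count doubles.
a_doubled = tfidf_vector(docs[0] + " " + docs[0], idf)

print(cosine(a, b))          # graded score, not just 0 or 1
print(cosine(a_doubled, b))  # same score up to floating-point rounding
```

In this sketch the score is graded rather than Boolean, "the" (present in every document) gets a lower weight than "tiger" (present in one), and concatenating a document with itself only scales its vector, so the cosine does not change. The IDF weights are computed once from the original corpus and held fixed when the doubled document is scored.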
--
Ivan Zhou
Graduate Student
Graduate Professional Student Association (GPSA) Assembly Member
School of Computing, Informatics and Decision Systems Engineering
Ira A. Fulton School of Engineering
Arizona State University