Monday, October 31, 2011

10/27/2011

Naive Bayes is one solution to the problem of classification: given some data (for example, emails) and an independence assumption about the features, we apply Bayes' Theorem to estimate the probability that an item belongs to a given class (e.g. spam or not spam).
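As a minimal sketch of the idea, here is a tiny Naive Bayes spam classifier in plain Python. The training examples and words are made up for illustration; it applies Bayes' Theorem per class with add-one smoothing over the vocabulary.

```python
from collections import Counter
from math import log

# Toy training data: (words, label) pairs. All words and labels are made up.
train = [
    (["win", "cash", "now"], "spam"),
    (["cheap", "cash", "offer"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["lunch", "tomorrow"], "ham"),
]

# Count word frequencies per class and how often each class occurs.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for words, label in train:
    class_counts[label] += 1
    word_counts[label].update(words)

vocab = {w for words, _ in train for w in words}

def classify(words):
    """Return the class with the highest posterior (in log space),
    using Laplace (add-one) smoothing for unseen words."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # log P(class)
        score = log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in words:
            # log P(word | class) with add-one smoothing
            score += log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify(["cash", "offer"]))    # spam-like words
print(classify(["meeting", "lunch"]))  # ham-like words
```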

Andree

Sunday, October 30, 2011

27/09/2011

Interesting point noticed while analyzing project part 2 results.

Pages are ranked by the combined score W*PageRank + (1-W)*TF/IDF Similarity, where PageRank measures the importance of a page over the entire document corpus (including irrelevant documents), and the TF/IDF similarity measures how close a page is to the cluster formed by relevant documents. If W is not chosen carefully, this combination can rank some irrelevant pages above relevant ones.
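The effect is easy to demonstrate with made-up scores: a page that is globally popular but off-topic can outrank a relevant page whenever W leans too heavily on PageRank.

```python
# Hypothetical scores illustrating how the weight W trades off global
# importance (PageRank) against query relevance (TF/IDF similarity).
# Page names and numbers are invented for illustration.
pages = {
    # page: (pagerank, tfidf_similarity)
    "popular_but_offtopic": (0.9, 0.1),
    "relevant_but_obscure": (0.1, 0.9),
}

def combined_score(pagerank, tfidf_sim, w):
    return w * pagerank + (1 - w) * tfidf_sim

# With a large W the popular-but-irrelevant page wins the ranking;
# with a small W the relevant page wins.
for w in (0.8, 0.2):
    ranking = sorted(pages, key=lambda p: combined_score(*pages[p], w),
                     reverse=True)
    print(w, ranking)
```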

--bhaskar

Thursday, October 27, 2011

10/27/2011

Neural nets consume a lot of training time, i.e. the time required to go from training data to learned expressions.

Preethi

10/27/2011

Parametric learners have a model whose size is not proportional to the training data: the number of parameters does not depend on how much data there is, and once the parameters are learned the data can be discarded. Most learners are parametric.
Non-parametric learners keep the entire training data around, e.g. k-NN.
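A short sketch of k-NN (with made-up 2-D points) makes the distinction concrete: there is no training step at all, because the "model" is just the stored data.

```python
from math import dist  # Python 3.8+

# Toy 2-D points with class labels (made-up data).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]

def knn_predict(point, k=3):
    """Classify by majority vote among the k nearest stored examples.
    Note there is no learned parameter vector: prediction scans the
    full training set, which is why the data must be kept around."""
    neighbors = sorted(train, key=lambda ex: dist(point, ex[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

print(knn_predict((1.1, 0.9)))  # nearest stored points are class "a"
```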

Preethi

10/27/2011

Support Vector Machines provide the best possible split between classes, in the sense that among all separating boundaries they choose the one with the maximum margin to the nearest training points.
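A small illustration of the max-margin idea, using invented 2-D points and two hand-picked candidate lines rather than an actual SVM solver: both lines separate the classes, but the one sitting midway between them has the larger margin, and that is the one an SVM would prefer.

```python
from math import sqrt

# Toy linearly separable 2-D data (made up): class +1 vs class -1.
points = [((1, 1), +1), ((1, 2), +1), ((4, 4), -1), ((5, 4), -1)]

def margin(w, b):
    """Smallest signed distance from any point to the line
    w[0]*x1 + w[1]*x2 + b = 0 (positive only if the line
    actually separates the two classes)."""
    norm = sqrt(w[0] ** 2 + w[1] ** 2)
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm
               for (x1, x2), y in points)

# Two candidate separators: one hugging the +1 class, one midway.
print(margin((-1, -1), 4))    # boundary close to the +1 points
print(margin((-1, -1), 5.5))  # boundary midway: larger margin
```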

-----Abilash

10/25/2011

A general assumption while clustering is that clusters are spherical in shape, but in reality that is often not the case. To deal with irregular clusters, or clusters within clusters, the best bet is to perform clustering using a nearest-neighbor algorithm, which links points through chains of close neighbors rather than distance to a single center.
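As a minimal sketch (a naive single-link merge, with made-up data), the nearest-neighbor approach below recovers two elongated "strands" that a spherical-cluster assumption would likely break apart:

```python
from math import dist

# Toy data: two elongated, non-spherical "strands" (made up).
points = [(0, 0), (1, 0.1), (2, 0.2), (3, 0.3),   # strand 1
          (0, 5), (1, 5.1), (2, 5.2), (3, 5.3)]   # strand 2

def single_link_clusters(points, threshold):
    """Nearest-neighbor (single-link) clustering: two points end up in
    the same cluster whenever some chain of hops, each shorter than
    `threshold`, connects them. This follows elongated shapes instead
    of assuming a spherical cluster around a centroid."""
    clusters = [[p] for p in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(dist(p, q) < threshold
                       for p in clusters[i] for q in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

print(len(single_link_clusters(points, threshold=1.5)))  # two strands
```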

----Abilash

10/25/2011

Getting feedback from users on the search results proves very helpful
when computing the next set of results, and it can be a way to
personalize them. A problem we sometimes face with clustering, though,
is labeling the clusters.
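One classic way to fold user feedback into the next round of results (not named in the post) is the Rocchio update; the sketch below uses conventional default weights and invented term vectors purely for illustration.

```python
# A minimal sketch of relevance feedback via the Rocchio update.
# alpha/beta/gamma are conventional defaults, chosen for illustration.
def rocchio(query, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward the centroid of relevant documents
    and away from the centroid of non-relevant ones. All vectors are
    equal-length lists of term weights."""
    new_q = []
    for i in range(len(query)):
        rel = sum(d[i] for d in relevant) / len(relevant) if relevant else 0.0
        non = (sum(d[i] for d in nonrelevant) / len(nonrelevant)
               if nonrelevant else 0.0)
        # Negative term weights are usually clipped to zero.
        new_q.append(max(0.0, alpha * query[i] + beta * rel - gamma * non))
    return new_q

# Hypothetical term axes: [python, snake, programming].
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 0.0, 0.8]]      # user liked a programming page
nonrelevant = [[0.5, 0.9, 0.0]]   # user rejected a snake page
print(rocchio(query, relevant, nonrelevant))
```

After the update, the query gains weight on "programming" and keeps zero weight on "snake", so the next round of results is pulled toward what the user actually wanted.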

-Ivan Zhou