Tweet Notes (CSE 494/598 F11): 10/27/2011

Wednesday, November 16, 2011

10/27/2011

Parametric vs. non-parametric learning:

parametric:
the model's size is fixed by a set of parameters, no matter how much
training data there is

non-parametric:
the model's size grows with the size of the training data
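
A rough sketch of the difference, not taken from the lecture: a least-squares line stands in for a parametric learner and 1-nearest-neighbor for a non-parametric one. Both choices, and all function names, are assumptions made for illustration.

```python
def fit_parametric(xs, ys):
    """Least-squares line: always exactly two parameters (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_nonparametric(xs, ys):
    """1-nearest-neighbor: the 'model' is the stored training set itself."""
    return list(zip(xs, ys))


def predict_nn(model, x):
    """Return the label of the stored example whose x is closest."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
line = fit_parametric(xs, ys)       # two numbers, however long xs is
memory = fit_nonparametric(xs, ys)  # one entry per training example
```

Doubling the training set leaves `line` at two numbers but doubles the length of `memory`.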

Generally, training has two costs: the number of examples required and
the processing time.

There are a number of good text-based machine learning techniques, but
naive Bayes nets are a good starting point.

Naive Bayes assumes that all attributes are independent of one another
given the class. Information is lost, but this form is much faster and
still quite good.
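
A minimal sketch of what the independence assumption buys, assuming a bag-of-words text classifier: the score for a class is just the class prior times one per-word probability for each word. The toy documents, labels, and function names are made up for illustration, and no smoothing is applied here.

```python
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (list_of_words, label). Returns class and word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
    return class_counts, word_counts


def score(words, label, class_counts, word_counts):
    """Unnormalized P(label) times the product of P(word | label)."""
    total_docs = sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    p = class_counts[label] / total_docs
    for w in words:
        # Independence assumption: each word contributes its own factor.
        p *= word_counts[label][w] / total_words
    return p


# Hypothetical toy training set.
docs = [
    (["cheap", "pills", "cheap"], "spam"),
    (["meeting", "notes"], "ham"),
    (["cheap", "meeting"], "ham"),
]
class_counts, word_counts = train(docs)
spam_score = score(["cheap"], "spam", class_counts, word_counts)
ham_score = score(["cheap"], "ham", class_counts, word_counts)
```

Training is a single counting pass over the data, which is where the speed comes from.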

Some algorithms are good at dealing with missing data and incremental
additions of data; Bayesian methods are among these.
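
One way to see why, sketched under the assumption that the model is nothing more than a table of counts: a new example only bumps a few counters, and a missing attribute is simply left out of both the counting and the product. The class name `CountModel` and the weather-style attributes are hypothetical.

```python
from collections import Counter, defaultdict


class CountModel:
    def __init__(self):
        self.class_counts = Counter()
        # attr_counts[label][(name, value)]: times value was seen with label
        self.attr_counts = defaultdict(Counter)
        # attr_seen[label][name]: times the attribute was present at all
        self.attr_seen = defaultdict(Counter)

    def add_example(self, attributes, label):
        """Incremental update: bump counts, no retraining from scratch."""
        self.class_counts[label] += 1
        for name, value in attributes.items():
            if value is None:  # missing attribute: skip it
                continue
            self.attr_counts[label][(name, value)] += 1
            self.attr_seen[label][name] += 1

    def score(self, attributes, label):
        """Unnormalized score; missing attributes drop out of the product."""
        p = self.class_counts[label] / sum(self.class_counts.values())
        for name, value in attributes.items():
            seen = self.attr_seen[label][name]
            if value is None or seen == 0:
                continue
            p *= self.attr_counts[label][(name, value)] / seen
        return p


model = CountModel()
model.add_example({"outlook": "sunny", "windy": "no"}, "play")
model.add_example({"outlook": "rainy", "windy": None}, "stay")
model.add_example({"outlook": "sunny", "windy": "yes"}, "play")

query = {"outlook": "sunny", "windy": None}
before_stay = model.score(query, "stay")

# A later example arrives; only the counters change.
model.add_example({"outlook": "sunny", "windy": "no"}, "stay")
after_stay = model.score(query, "stay")
```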

Smoothing can be done to prevent a few erroneous training examples from
pushing the model to irrational conclusions. This could be done by
priming the training data with virtual examples at a uniform rate, such
as 50 out of 100.
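
A small sketch of that idea, reading "50/100" as 100 virtual examples of which 50 are positive (an assumption about what the note means). The function name and default values are illustrative only.

```python
def smoothed_estimate(successes, trials, virtual_trials=100, prior=0.5):
    """Estimate a probability after mixing in virtual examples.

    virtual_trials * prior virtual successes are added to the real counts,
    so 100 and 0.5 correspond to priming with 50 out of 100.
    """
    return (successes + virtual_trials * prior) / (trials + virtual_trials)


raw = 1 / 1                           # one lone example claims certainty
smoothed = smoothed_estimate(1, 1)    # (1 + 50) / (1 + 100)
large = smoothed_estimate(900, 1000)  # (900 + 50) / (1000 + 100)
```

With a single real example the estimate stays near 0.5; with a thousand real examples the real data dominates the virtual ones.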

-Thomas Hayden