Nikhil Pratap
Graduate student
Department of Computer Science
(Arizona State University)
Sunday, December 4, 2011
11/03/2011
In content-based recommending, recommendations are based on information about the content of items rather than on other users' opinions.
Advantages:
We can recommend items to users with unique tastes.
We can recommend new and unpopular items.
Nikhil Pratap
11/1/2011
Collaborative filtering follows these steps:
Weight all users with respect to their similarity to the active user.
Select a subset of the users (neighbors) to use as predictors.
Normalize the ratings and compute a prediction from a weighted combination of the selected neighbors' ratings.
Present the items with the highest predicted ratings as recommendations.
Nikhil Pratap
11/1/11
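The steps above can be sketched in Python. This is a minimal illustration with invented ratings; cosine similarity for the weighting and a mean-centered weighted average for the prediction are one common instantiation, not the only one:

```python
import math

# Toy ratings (invented): user -> {item: rating on a 1-5 scale}.
ratings = {
    "active": {"A": 5, "B": 3, "C": 4},
    "u1": {"A": 4, "B": 2, "C": 5, "D": 4},
    "u2": {"A": 1, "B": 5, "C": 2, "D": 1},
}

def similarity(u, v):
    """Step 1: weight users by cosine similarity over co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def mean_rating(u):
    return sum(ratings[u].values()) / len(ratings[u])

def predict(active, item, k=2):
    """Steps 2-3: select the k most similar users who rated the item,
    then combine their mean-centered ratings weighted by similarity."""
    neighbors = sorted(
        (u for u in ratings if u != active and item in ratings[u]),
        key=lambda u: similarity(active, u), reverse=True)[:k]
    if not neighbors:
        return None
    num = sum(similarity(active, u) * (ratings[u][item] - mean_rating(u))
              for u in neighbors)
    den = sum(abs(similarity(active, u)) for u in neighbors)
    return mean_rating(active) + num / den

# Step 4: the items with the highest predicted ratings would be recommended.
print(round(predict("active", "D"), 2))
```

The normalization step matters because users rate on different personal scales; subtracting each neighbor's mean compares deviations rather than raw scores.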
Figuring out this structure and extracting information from it is highly useful for relating things such as documents that might reference each other but do not actually contain a hyperlink.
Again, we are not trying to fully understand what the documents mean; we just want to obtain a limited amount of information.
An extraction model could do things such as segmentation, classification, clustering, and association.
Information extraction falls somewhere between IR and NLP. In a sense, IE is a limited, subject-specific version of NLP.
-Thomas Hayden
11/17/2011
Also, SPARQL is used to query RDF data.
Thanks
Sandeep Gautham
11/15/2011
RDF is a standard for writing base facts in XML syntax, whereas OWL and RDF Schema are standards for writing domain knowledge in XML syntax.
11/17/2011
to retrieve data. These databases might have different schemas; one way to retrieve the data would be to write two queries, but the better way is to use a schema mapping (an integrator) and a single query.
This is where OWL comes into play.
-Ivan Zhou
Thursday, November 17, 2011
11/17/2011
- In RDF, semantic meaning can be formed by considering the connections between proposition or predicate symbols.
- Deductive databases contain tables together with some background knowledge.
- OWL provides the background knowledge in the case of RDF.
- An OWL file is an XML file with its own schema, as an RDF file is.
- With respect to the Semantic Web, people are normally unhappy with both the input and the output: as query complexity increases, even the output returned will not be as perfect as desired, since the query itself will be imperfect.
- To integrate structured data across many databases, a schema mapping is made between the relationships between attributes in the databases. OWL is used for this schema mapping.
Preethi
11/15/2011
phrase even though they don't know the meaning. But they are able to tell that the syntax is fine.
-Ivan Zhou
11/15/2011
XQuery allows changing tags, for example from <writer> to <author>.
-Sandeep Gautham
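XQuery would do this transformation natively; as a stand-in, the same <writer> to <author> renaming can be sketched with Python's standard ElementTree library (the XML snippet is invented for illustration):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<books><book><writer>Knuth</writer></book></books>")

# Rename every <writer> element to <author>, keeping its text and children.
for elem in doc.iter("writer"):
    elem.tag = "author"

print(ET.tostring(doc, encoding="unicode"))
# <books><book><author>Knuth</author></book></books>
```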
11/15/11
- XML tags have no meaning.
- Meaning comes from inter-relation.
- Machines don't have any background knowledge the way humans do.
11/15/11
- A DTD or XML Schema can only validate an XML document syntactically, i.e. check that it follows the declared structure.
- They can't validate an XML document semantically.
- DTD is not in XML format.
11/18/2011
Unlike IR, with XML in a database we have to worry about queries.
XQuery is a standard query language for XML; more information is available from the W3C.
A DTD can be used to enforce a schema.
XQuery is "similar" to SQL and does many, if not all, of the things SQL can.
An XML schema only provides loose meaning; the schema's meaning has to be agreed upon.
-Thomas Hayden
11/08/2011
Competitive ratio = accuracy of the online computation / optimal (offline) accuracy.
Expected return = click value * likelihood of a click.
Diversity becomes important when showing more than one ad; for instance, you don't want to show competitors' ads together.
It might be good to bias toward high-CTR ads first when showing multiple ads, so that users aren't turned off by lower-CTR (offensive) ads.
Vickrey auction: the winner pays only the price of the second-highest bidder. In practice this produces truthful bidding strategies.
-Thomas Hayden
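The Vickrey auction mentioned above can be sketched as follows (a minimal single-slot version; the advertiser names and bid values are invented):

```python
def vickrey_winner(bids):
    """Single-slot Vickrey (second-price) auction: the highest bidder wins
    but pays only the second-highest bid, which makes truthful bidding
    the dominant strategy."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

winner, price = vickrey_winner({"adv1": 3.00, "adv2": 2.50, "adv3": 1.00})
print(winner, price)  # adv1 2.5
```

The intuition for truthfulness: since the price paid never depends on the winner's own bid, shading your bid can only lose you auctions you would have profitably won.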
11/03/2011
the document-words relation used previously.
The user-items relation allows us to do things like user-user vector-space comparisons.
Cold start: the issue of starting with no data.
Just because two users have everything in common doesn't mean anything if that "everything" is little; hence significance weighting.
Content-boosted: the concept of using learned info to compute info about new content.
-Thomas Hayden
11/01/2011
expression of its dependents.
These expressions can be very large. Overflow can be avoided by working with the log of the value instead of the original.
Sample bias and zero errors can be prevented by multiplying each probability by 1/V, where V is some "virtual" document count.
Feature selection is important for both performance and accuracy reasons. Including too much information can actually lower the "correctness" of the results.
Diversity of features can be just as important as similarity.
-Thomas Hayden
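The underflow point about working in log space can be demonstrated directly (toy numbers; a real Naive Bayes product over thousands of word probabilities fails the same way):

```python
import math

# 100 per-feature probabilities of 1e-5 each (invented toy numbers).
probs = [1e-5] * 100

direct = 1.0
for p in probs:
    direct *= p          # the true product is 1e-500, below double range
print(direct)            # 0.0: the product has underflowed

log_sum = sum(math.log(p) for p in probs)
print(log_sum)           # about -1151.3 = 100 * log(1e-5), still representable
```

Since log is monotonic, comparing the log-probabilities of candidate classes gives the same winner as comparing the (unrepresentable) products.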
10/27/2011
Parametric: the model size is fixed by a set of parameters.
Non-parametric: the model size grows with the size of the training data.
Generally, training has two costs: the number of examples and the processing time.
There are a number of good text-based machine learning techniques, but naive Bayes nets are a good starting point.
Naive Bayes assumes that all attributes are independent. Information is lost, but this form is much faster and still quite good.
Some algorithms are good at dealing with missing data and incremental additions of data; Bayesian methods are one of these.
Smoothing can be done to prevent erroneous training examples from jumping to irrational conclusions. This can be done by prepping the training data with a virtual uniform value such as 50/100.
-Thomas Hayden
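A minimal Naive Bayes text classifier along these lines, with the virtual-sample smoothing described above (the tiny training set is invented; spreading M virtual samples uniformly over the vocabulary is one simple instantiation of the idea):

```python
import math
from collections import Counter

# Invented toy training data: (text, class).
train = [
    ("buy cheap pills now", "spam"),
    ("cheap pills cheap", "spam"),
    ("meeting notes for project", "ham"),
    ("project meeting tomorrow", "ham"),
]

classes = sorted({c for _, c in train})
vocab = sorted({w for text, _ in train for w in text.split()})
word_counts = {c: Counter() for c in classes}
class_counts = Counter()
for text, c in train:
    class_counts[c] += 1
    word_counts[c].update(text.split())

M = len(vocab)  # virtual samples: pretend each word was seen once per class

def log_posterior(text, c):
    """log P(c) + sum of log P(w|c), with M virtual samples spread
    uniformly over the vocabulary so unseen words never give zero."""
    total = sum(word_counts[c].values())
    lp = math.log(class_counts[c] / len(train))
    for w in text.split():
        lp += math.log((word_counts[c][w] + 1) / (total + M))
    return lp

def classify(text):
    return max(classes, key=lambda c: log_posterior(text, c))

print(classify("cheap pills"))      # spam
print(classify("project meeting"))  # ham
```

With few real samples the virtual counts dominate; as the empirical counts grow, they outweigh the uniform prior, matching the "trust" behavior the notes describe.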
10/25/2011
Unsupervised learning (clustering) and supervised learning (classification) are two extremes, but techniques often exist in between that can work with both labeled and unlabeled data.
Relevance feedback: define relevance based on user feedback, either direct or indirect, such as clicking a link in the query results.
RF can be used to rewrite the original query to improve precision and recall.
-Thomas Hayden
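The query-rewriting use of relevance feedback can be sketched with the classic Rocchio update, one common instantiation (the term-weight vectors and the alpha/beta/gamma values here are invented):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """q' = alpha*q + beta*mean(relevant docs) - gamma*mean(nonrelevant docs),
    over term-weight vectors represented as dicts."""
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    mean = lambda docs, t: (sum(d.get(t, 0.0) for d in docs) / len(docs)) if docs else 0.0
    new_q = {}
    for t in terms:
        w = (alpha * query.get(t, 0.0)
             + beta * mean(relevant, t)
             - gamma * mean(nonrelevant, t))
        if w > 0:  # negative weights are usually clipped to zero
            new_q[t] = w
    return new_q

q = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 0.8}]      # e.g. results the user clicked
nonrelevant = [{"jaguar": 1.0, "car": 0.9}]   # e.g. results the user skipped
print(rocchio(q, relevant, nonrelevant))
```

The rewritten query gains terms from clicked documents ("cat") and suppresses terms from skipped ones ("car"), pulling results toward the user's intent.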
Tuesday, November 15, 2011
11/10/2011
XML is a standard semi-structured data model.
It is a bridge between structured and unstructured data,
covering the continuous spectrum from unstructured documents to structured data.
Nikhil
11/10/2011
Example: path queries use the tree structure of the XML file.
- Sandeep Gautham
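A path query in this spirit can be sketched with Python's ElementTree, which supports a limited XPath-style path syntax (the XML document is invented):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <book><title>AI</title><author>Russell</author></book>
  <book><title>IR</title><author>Manning</author></book>
</library>""")

# The path query walks the tree structure: library -> book -> author.
authors = [a.text for a in doc.findall("./book/author")]
print(authors)  # ['Russell', 'Manning']
```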
11/8/2011
search keywords. But later on it realized that the better approach was to also include the click-through rate in its calculations in order to increase its profits. In short, displaying a higher-priced ad might not mean more profit.
-Ivan Zhou
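The shift described above, from ranking by bid alone to ranking by expected revenue, can be illustrated with invented numbers: an ad with a lower bid but a much higher click-through rate is worth more per impression.

```python
ads = [
    # (name, bid per click in dollars, click-through rate) - invented values
    ("high_bid", 2.00, 0.01),
    ("high_ctr", 0.80, 0.05),
]

def by_bid(ad):
    return ad[1]

def by_expected_revenue(ad):
    """Expected revenue per impression = bid * CTR."""
    return ad[1] * ad[2]

print(max(ads, key=by_bid)[0])               # high_bid
print(max(ads, key=by_expected_revenue)[0])  # high_ctr: 0.80*0.05 > 2.00*0.01
```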
11/08/2011
Search engine
User
Advertiser
The overall utility is a function of the utility matrices of all three of these. The objective is to optimize this complex function such that the user's utility and the advertiser's ROI do not fall below certain values.
Optimizing this function is the search engine's responsibility.
-Rashmi Dubey.
11/8/2011
For instance if an offline computation for choosing which ads to display generates an optimal income of $100, and an online computation generates $50, then the competitive ratio is 50/100=1/2.
--James Cotter
Tuesday, November 8, 2011
11/03/2011
collaborative filtering. In search advertising, when the user clicks on the ad, Google gets paid. And until 2002, Google based its ads purely on keyword searches.
-Ivan Zhou
10/27/2011
The Naive Bayes classifier is based on Bayes networks; it often works better with probability smoothing, and we can use m-estimates to improve the probability estimates.
10/20/2011
Hierarchical clustering methods: divisive (e.g., bisecting k-means) and agglomerative. Buckshot clustering combines HAC and k-means clustering.
Clustering on text: use LSI to reduce dimensions before clustering.
10/13/2011
(Inter-cluster distance) while the distances between different clusters are maximized.
Purity of clustering = sum of the pure sizes of the clusters / total number of elements across clusters.
The k-means method uses an iterative procedure to improve the clustering.
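The purity formula can be computed directly (cluster assignments are invented; the "pure size" of a cluster is taken as the count of its majority label):

```python
from collections import Counter

# Each cluster is the list of true labels of the points assigned to it.
clusters = [
    ["x", "x", "x", "y"],   # majority label x -> pure size 3
    ["y", "y", "z"],        # majority label y -> pure size 2
]

def purity(clusters):
    """Sum of majority-label counts divided by the total element count."""
    pure = sum(Counter(c).most_common(1)[0][1] for c in clusters)
    total = sum(len(c) for c in clusters)
    return pure / total

print(purity(clusters))  # 5/7
```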
Monday, November 7, 2011
11/1/2011
Feature selection plays a pivotal role in the Naive Bayes classifier. Including too many features might lower performance and might also slow down the learning process.
------Abilash
11/04/2011
c(a,u) = Covar(r_a, r_u) / (StdDev(r_a) * StdDev(r_u))
It is similar to the dot-product formula.
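The formula above is the Pearson correlation between two users' rating vectors; a minimal computation (the rating vectors are invented):

```python
import math

def pearson(ra, ru):
    """c(a,u) = Covar(r_a, r_u) / (StdDev(r_a) * StdDev(r_u))."""
    n = len(ra)
    ma, mu = sum(ra) / n, sum(ru) / n
    cov = sum((x - ma) * (y - mu) for x, y in zip(ra, ru)) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra) / n)
    su = math.sqrt(sum((y - mu) ** 2 for y in ru) / n)
    return cov / (sa * su)

# Like a normalized dot product of the mean-centered rating vectors.
print(pearson([1, 2, 3], [2, 4, 6]))  # close to 1.0 (perfectly correlated)
print(pearson([1, 2, 3], [3, 2, 1]))  # close to -1.0 (anti-correlated)
```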
11/03/2011
They help only when they are taken from the same distribution as the labeled examples.
11/03/2011
-Sekhar
11/01/2011
Intrusive detection, in a way, explicitly asks the user to rate the items.
In non-intrusive detection, on the other hand, we follow the user's actions.
-Sandeep Gautham
11/1/2011
This reduces the computation of probabilities from n*d^k to n*d*k.
-Sandeep Gautham
10/27/2011
each object should have a uniform chance of being in each category. As we gather more and more samples, we must be sure to understand that this is not always the case. We handle this by adding "virtual samples", which means that we pretend we have received M samples that are uniformly distributed among the categories. As the empirical sample size approaches and passes M, the model begins to "trust" its empirical samples more than the virtual samples.
~Kalin Jonas
09/27/2011
Pages are expected to be ranked by the equation W*PageRank + (1-W)*TF-IDF similarity, where PageRank is the probability that a page is important relative to the entire document corpus (including irrelevant documents), and TF-IDF similarity measures how similar a page is to the cluster formed by the relevant documents. If W is not carefully chosen, this combination can result in some irrelevant pages being ranked above relevant ones.
--bhaskar
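The ranking equation above can be sketched as follows (the scores and the value of W are invented; the point is how W trades off global importance against query similarity):

```python
def combined_score(pagerank, similarity, w):
    """W*PageRank + (1-W)*TF-IDF similarity."""
    return w * pagerank + (1 - w) * similarity

# An important-but-irrelevant page vs. a relevant-but-obscure page.
popular = combined_score(pagerank=0.9, similarity=0.1, w=0.8)
relevant = combined_score(pagerank=0.1, similarity=0.9, w=0.8)
print(popular > relevant)  # True: with W set too high, the irrelevant page wins
```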