Tweet Notes (CSE 494/598 F11): August 2011

Wednesday, August 31, 2011

8/30/2011 class 2nd tweet

Some of the Hard Parts of IR:

• Simply matching on words is a very brittle approach.

• One word can have many different meanings.

For example, consider the word "take":

– "take a place at the table"

– "take money to the bank"

– "take a picture"

– "take a lot of time"

– "take drugs"

8/30/2011 class tweet

Evaluating IR:

1. Recall is the fraction of all relevant documents in the collection that are retrieved.

2. Precision is the fraction of retrieved documents that are relevant.

3. An IR system ranks documents by a similarity coefficient (SC), allowing the user to trade off between precision and recall.
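
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name and the document ids are made up for the example, and relevance judgments are assumed to be given as a set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids the system returned
    relevant:  iterable of document ids judged relevant collection-wide
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d1", "d3", "d5", "d6", "d7"])
print(p, r)  # 2 of 4 retrieved are relevant; 2 of 5 relevant were found
```

Returning more documents can only keep recall the same or raise it, but it usually lowers precision, which is the trade-off mentioned in point 3.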

The difference between information retrieval and database retrieval:
* In a DB, the query result must have 100% precision and 100% recall, whereas in IR a balance of the two is acceptable.
* In a DB, adding or removing any entry makes the result set incorrect, whereas an IR system compensates for errors in the result set by ranking each entry by its relevance.

8/30/2011

Database vs. Search Engine

When you compare databases on a performance metric, you rate them by the time (speed) taken to answer a particular query. A search engine, on the other hand, is evaluated by how relevant the information it retrieves is to the user's query.

-------Abilash

08/25/2011

Involve Humans

It is possible to get by with techniques blithely ignorant of semantics when you have humans in the loop.

-- Abhishek

08/25/2011

It's a jungle out there.

Anybody can put up information on the web. Nobody authenticates or approves that information.

We can gain correct knowledge only if enough people put up correct information on the web.

-- Abhishek

08/30/2011

Evaluation of IR engine

The area under the precision-recall curve is a good single measure of both precision and recall.

For an ideal system, the area under the curve is maximal (precision = 1 for all values of recall, so the area is 1 * 1 = 1).

- Abhishek
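
As a rough illustration of the area under the precision-recall curve, the sketch below builds the curve from a ranked result list and sums the area step by step. The function names, the step-wise way of computing the area, and the example rankings are assumptions made for this example, not something specified in class.

```python
def pr_curve(ranked_relevance, total_relevant):
    """Return (recall, precision) points, one per rank cutoff.

    ranked_relevance: list of booleans, True if the document at that rank is relevant
    total_relevant:   number of relevant documents in the whole collection
    """
    points, hits = [], 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        hits += is_relevant
        points.append((hits / total_relevant, hits / k))
    return points


def area_under_curve(points):
    """Step-wise area: precision at each point times the recall gained there."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


# Hypothetical rankings: an ideal one and one with misses at ranks 2 and 4.
ideal = pr_curve([True, True, True], total_relevant=3)
mixed = pr_curve([True, False, True, False, True], total_relevant=3)
print(area_under_curve(ideal), area_under_curve(mixed))
```

The ideal ranking keeps precision at 1 for every recall level, so its area comes out as 1 (up to floating-point rounding); the mixed ranking loses area at each rank where an irrelevant document appears before a relevant one.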

08/30/2011

---> F-measure is defined as the harmonic mean of precision and recall.
---> The harmonic mean is more reasonable than the arithmetic mean when computing a mean of ratios.
---> All search algorithms have to make a tradeoff between recall and precision.
---> Typical web surfers would like every result on the first page to be relevant (high precision).

---> In contrast, various professional searchers such as paralegals and intelligence analysts are very concerned with trying to get as high recall as possible.

-Aamir

08/30/2011

Things that make IR Harder:

-The IR system doesn't do NLP.

-The user knows that the system doesn't do NLP and gives malformed queries.

--Avinash Bhashyam

Tuesday, August 30, 2011

8/30/2011

Precision and Recall are measures to evaluate the effectiveness of an Information Retrieval System.

Precision = tp/(tp+fp)

Recall = tp/(tp+fn)

tp - Relevant documents retrieved by the system (true positives)

fp - Non-relevant documents retrieved by the system (false positives), so tp + fp is everything retrieved for the user query

fn - Relevant documents the system did not retrieve (false negatives), so tp + fn is all the relevant documents present

Precision/recall values can be plotted, and the area under the precision/recall curve is a measure of effectiveness.
In practice, 100% effectiveness is almost impossible to achieve, because relevance depends entirely on the user.

Ramya

08/30/2011

Relevance judgments can be learned, specified, made up, or obtained by
some combination of these. True relevance is ultimately decided by
the consumer/user/evaluator; however, this can be extremely difficult
to establish. Unlike information retrieval, which returns imperfect
results, data retrieval should always return perfect results.
Additional environmental information, such as previous queries,
previous results, and information about the user, can all be used to
improve relevance. -Thomas Hayden

8/30/2011

Keywords are a weak projection of what users want, yet the search engine is expected to return the right data. Imprecise queries are IR's hardest problem. The true relevance of the result set is decided by the user.

Archana

8/30/2011

IR systems are evaluated by measuring precision and recall.

Precision = tp/(tp+fp)

Recall = tp/(tp+fn)

where,

tp = true positives

fp = false positives

fn = false negatives

-Bharath

8/30/2011

Precision = tp/(tp+fp)

Recall = tp/(tp+fn)

tp - documents the user wants that the system returned

fp - documents the user does not need but the system returned

fn - documents the user wants but the system did not return

Unlike a loss of recall, a loss of precision is something we can notice: irrelevant results are visible in what was returned, whereas missed documents are not.

Srividya

8/30/2011

Data Retrieval (say, a database query):

* Well-defined semantics.

* Even a single error is considered a failure.

Info Retrieval (say, a normal text query):

* Semantics are frequently loose.

* Small errors can be tolerated.

Sandeep Gautham

8/30/2011

->Data retrieval consists of determining which documents (in the context of IR) contain the keywords present in the user query.
->Information retrieval consists of (implicitly) interpreting the documents and ranking them according to their relevance to the user query.

-Dinu

08/30/2011

The harmonic mean of precision and recall, with the two weighted equally, is called the F-measure or balanced F-score and is given by:

$F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$

Sekhar
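
The balanced F-score formula above translates directly into code. The sketch below is illustrative only; the function name and the sample values are assumptions, and returning 0 when both inputs are 0 is a convention chosen here to avoid dividing by zero.

```python
def f_measure(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: a balanced system and a very lopsided one.
print(f_measure(0.5, 0.4))
print(f_measure(1.0, 0.01), (1.0 + 0.01) / 2)  # harmonic vs. arithmetic mean
```

In the lopsided case the arithmetic mean is still about 0.5, while the harmonic mean stays close to the weaker of the two values, so a system cannot score well by maximizing only precision or only recall.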

8/30/2011

IR systems are evaluated using precision and recall. The relevance function depends on the user, the documents, the queries, the whole document universe, and other documents already shown.

Preethi Satishchandra

8/30/2011

The F-measure combines precision and recall into a single value.

Mathematically, it is the harmonic mean of precision and recall.

-Arjun

8/30/2011

In an ideal system, one that is truly sound and complete, precision and recall are both 1.0 (100%).

--
-Dinu John

August 30th, 2011

A precision-recall curve graphs the relationship between the soundness and completeness of an information retrieval system. In essence, these curves measure the quality of the algorithm.

I love starting at 100% precision! Let's hope it's not all downhill from there, as the optimal IR system has 100% precision (1.0) and 100% recall (1.0).

8/25/2011

-Relevance is hard to compute in the traditional model. In practice it is approximated by a similarity measure.

Bharath

Thursday, August 25, 2011

8/25/2011

There are three forms of learning in pattern recognition: supervised, unsupervised, and partly supervised learning. We would like to exploit massive amounts of unlabeled data (real data that has not been labeled) by learning from it with only partial supervision.

Archana

8/25/2011

Collaborative Filtering:

Automatically predicts a user's choices.

Example: Amazon.com suggests products that the user could buy (based on the items recently bought by others).

Srividya
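
A very small sketch of the "bought by others" idea: count which items appear in the same purchase as a given item and suggest the most frequent ones. This is a toy co-occurrence count written for illustration, not a description of how Amazon.com actually works; the function name and the purchase data are hypothetical.

```python
from collections import Counter


def also_bought(baskets, item, top_n=3):
    """Suggest items that most often appear in the same basket as `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(other for other in basket if other != item)
    return [other for other, _ in counts.most_common(top_n)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["camera", "tripod", "sd_card"],
    ["camera", "sd_card"],
    ["camera", "sd_card", "bag"],
    ["tripod", "bag"],
]
print(also_bought(baskets, "camera"))
```

Here "sd_card" is suggested first because it appears in every basket that contains "camera"; no knowledge of what the items mean is needed, only the behavior of other users.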

8/25/2011

The web can be thought of as a collective unconscious, in the sense of Carl Jung, which is why you can extract common-sense knowledge from the web.

rao