## Wednesday, August 31, 2011

### 8/30/2011 class 2nd tweet

Some of the Hard Parts of IR:

• Simply matching on words is a very brittle approach.

• One word can have a zillion different semantic meanings. Consider "take":
– "take a place at the table"
– "take money to the bank"
– "take a picture"
– "take a lot of time"
– "take drugs"

### 8/30/2011 class tweet

Evaluating IR:

1. Recall is the fraction of relevant documents retrieved out of all relevant documents collection-wide.
2. Precision is the fraction of relevant documents retrieved out of the total number retrieved.
3. An IR system ranks documents by SC (similarity coefficient), allowing the user to trade off between precision and recall.
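The ranking-based tradeoff mentioned above can be sketched by sweeping a rank cutoff k over a ranked result list. The document IDs and the relevant set below are made up for illustration:

```python
# Sketch: precision/recall at a rank cutoff k over a ranked result list.
# Raising k tends to increase recall at the cost of precision.

def precision_recall_at_k(ranked, relevant, k):
    """Return (precision, recall) considering only the top-k results."""
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant)
    return precision, recall

ranked = ["d3", "d1", "d7", "d2", "d9"]   # system's ranking (hypothetical)
relevant = {"d1", "d2", "d4"}             # ground-truth relevant set

for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(k, round(p, 2), round(r, 2))
```

Deepening the cutoff from k=1 to k=5 here raises recall from 0 to 2/3 while precision stays low, which is exactly the tradeoff a user navigates by scanning further down the ranking.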

### 8/30/2011

The difference between Information Retrieval and Database Retrieval:
* A DB query result must have 100% precision and 100% recall, whereas in IR an acceptable result balances the two.
* For a DB query, adding or removing any entry from the result set would make the result incorrect; IR instead copes with errors in the result set by ranking the entries according to their relevance.

### 8/30/2011

Database vs. Search Engine
Databases are rated on a performance metric: the time (speed) taken to answer a particular query. A search engine, on the other hand, is evaluated on how relevant the information it retrieves is for a particular user query.

-------Abilash

### 08/25/2011

Involve Humans

It is possible to get by with techniques that are blithely ignorant of semantics when you have humans in the loop.

--  Abhishek

### 08/25/2011

It's a jungle out there.

•   Anybody can put up information on the web; nobody authenticates or approves that information.
•   We can extract correct knowledge only if enough people put up correct information on the web.

-- Abhishek

### 08/30/2011

Evaluation of IR engine

•     The area under the precision-recall curve is a good single measure combining precision and recall.
•    For an ideal system, the area under the curve is maximal (precision = 1 for all values of recall, so area under the curve = 1 × 1 = 1).
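
A minimal sketch of that area computation, using trapezoidal integration over hypothetical (recall, precision) points:

```python
# Sketch: area under an (interpolated) precision-recall curve.
# A perfect system holds precision = 1.0 at every recall level,
# so its area is 1.0; real systems fall below that.

def pr_curve_area(points):
    """Trapezoidal area under a PR curve, given (recall, precision)
    points sorted by recall and starting at recall 0."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area

ideal = [(0.0, 1.0), (1.0, 1.0)]
typical = [(0.0, 1.0), (0.5, 0.7), (1.0, 0.3)]

print(pr_curve_area(ideal))    # 1.0 for the ideal system
print(pr_curve_area(typical))  # smaller area for a real system
```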

- Abhishek

### 08/30/2011

---> F-measure is defined as the harmonic mean of precision and recall.
---> The harmonic mean is more reasonable than the arithmetic mean when computing a mean of ratios.
---> All search algorithms have to make a tradeoff between recall and precision.
---> Typical web surfers would like every result on the first page to be relevant (high precision).
---> In contrast, professional searchers such as paralegals and intelligence analysts are concerned with getting as high a recall as possible.
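
The point about means of ratios can be checked numerically. The precision/recall values below are made up: a degenerate system that returns everything gets recall = 1.0 with tiny precision, and the arithmetic mean still looks respectable while the harmonic mean stays near the worse value.

```python
# Sketch: harmonic mean (F-measure) vs. arithmetic mean of two ratios.

def f_measure(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if p + r else 0.0

p, r = 0.02, 1.0                    # hypothetical "return everything" system
print((p + r) / 2)                  # arithmetic mean ≈ 0.51, looks fine
print(round(f_measure(p, r), 4))    # harmonic mean ≈ 0.039, exposes the flaw
```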

-Aamir

### 08/30/2011

Things that make IR Harder:
-The IR system doesn't do NLP.
-The user knows the system doesn't do NLP and gives malformed queries.

--Avinash


## Tuesday, August 30, 2011

### 8/30/2011

Precision and Recall are measures to evaluate the effectiveness of an Information Retrieval System.

Precision = tp/(tp+fp)

Recall = tp/(tp+fn)

tp – relevant documents that the system retrieved

fp – non-relevant documents that the system retrieved

fn – relevant documents that the system failed to retrieve

Precision/recall values can be plotted, and the area under the precision/recall curve is a measure of the system's effectiveness.
Practically, 100% effectiveness is almost impossible to achieve, because relevance ultimately depends on the user.
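
The definitions above can be expressed directly with set arithmetic; the document IDs below are made up for illustration:

```python
# Sketch: precision and recall from retrieved/relevant document sets.

retrieved = {"d1", "d3", "d5", "d8"}   # what the system returned
relevant = {"d1", "d2", "d5"}          # what the user actually wanted

tp = retrieved & relevant              # relevant docs that were retrieved
fp = retrieved - relevant              # retrieved but not relevant
fn = relevant - retrieved              # relevant but missed

precision = len(tp) / (len(tp) + len(fp))   # = len(tp) / len(retrieved)
recall = len(tp) / (len(tp) + len(fn))      # = len(tp) / len(relevant)
print(precision, recall)
```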

Ramya

### 08/30/2011

Relevance can be learned, specified, made up, or derived from some
combination of the above. True relevance is ultimately decided by the
consumer/user/evaluator, but this can be extremely difficult to
establish. Unlike information retrieval, which returns imperfect
results, data retrieval should always return perfect results.
Additional environmental information such as previous queries,
previous results, and information about the user can all be used to
improve relevance. -Thomas Hayden

### 8/30/2011

Keywords are a weak projection of what users want, yet the search engine is expected to return the right data. Imprecise queries are IR's hardest problem. The true relevance of the result set is decided by the user.

Archana

### 8/30/2011

IR systems are evaluated by finding the Precision and Recall.

Precision = tp/(tp+fp)
Recall = tp/(tp+fn)

where,
tp = true positives
fp = false positives
fn = false negatives

-Bharath

### 8/30/2011

Precision = tp/(tp+fp)

Recall = tp/(tp+fn)

tp – documents the user wants that the system returned

fp – documents the user does not need but that were returned anyway

fn – documents the user wants but that the system did not return

Unlike recall, precision is something we can measure directly: we can see the false positives in the result set, but not the relevant documents we missed.

Srividya

### 8/30/2011

Data Retrieval (say database query) :
* Well defined semantics.
* Even a single error is considered failure.

Info Retrieval (say normal text query) :
* Semantics are frequently loose.
* Small errors can be tolerated.

Sandeep Gautham

### 8/30/2011

->Data retrieval consists of determining which documents contain the keywords present in the user query.
->Information retrieval consists of interpreting the (implicit) content of the documents and ranking them according to their relevance to the user query.

-Dinu

### 08/30/2011

The weighted harmonic mean of precision and recall is called  the  F-measure or balanced F-score and  is given by:

$F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$

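Since the post calls this the *weighted* harmonic mean, a sketch of the formula is given below, including the more general F_beta weighting (beta > 1 favors recall, beta < 1 favors precision); the function name and the sample values are illustrative:

```python
# Sketch: balanced F-score and its weighted generalization F_beta.

def f_beta(precision, recall, beta=1.0):
    """F_beta score; beta = 1 gives the balanced F-score (F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.8, 0.5
print(round(f_beta(p, r), 4))           # balanced F1 ≈ 0.615
print(round(f_beta(p, r, beta=2), 4))   # recall-weighted F2 ≈ 0.541
```
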
Sekhar

### Fwd: 8/30/2011

IR systems are evaluated using precision and recall. The relevance function depends on the user, the documents, the queries, the whole document universe, and other documents already shown.

Preethi Satishchandra

### 8/30/2011

The F-measure combines precision and recall into a single value.
Mathematically, it is the harmonic mean of precision and recall.

-Arjun

### 8/30/2011

In an ideal, truly sound system, precision and recall are both 1.0, i.e., 100%.

--
-Dinu John

### August 30th, 2011

A precision-recall curve graphs the relationship between the soundness and completeness of
an information retrieval system. In essence, these curves measure the quality of the algorithm.
I love starting at 100% precision! Let's hope it's not all downhill from there, as the optimal
IR system has 100% precision (1.0) and 100% recall (1.0).

M.

### 8/25/2011

Dear Professor,

Tweet notes:

-Relevance is hard to compute in the traditional model; eventually it is approximated by a similarity measure.

Regards
Bharath

## Thursday, August 25, 2011

### 8/25/2011

There are three forms of learning in pattern recognition: supervised, unsupervised, and semi-supervised learning. We would like to exploit massive amounts of unlabeled data (real data that is not labeled) by combining it with a small amount of supervision.

Archana

### 8/25/2011

Collaborative Filtering:
Automatically infers the choices of the user.
Example: Amazon.com suggests products that the user could buy, depending on the items recently bought by other users.
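
A toy sketch of the Amazon-style idea, recommending items that co-occur in other users' purchase histories; all user and item names are made up for illustration:

```python
# Sketch: item recommendations from overlapping purchase histories.
from collections import Counter

purchases = {
    "alice": {"book", "camera"},
    "bob":   {"book", "tripod", "camera"},
    "carol": {"book", "tripod"},
}

def recommend(user):
    """Suggest items bought by users who share a purchase with `user`,
    ranked by how many such users bought them."""
    mine = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other != user and items & mine:     # overlapping taste
            for item in items - mine:          # items the user lacks
                scores[item] += 1
    return [item for item, _ in scores.most_common()]

print(recommend("alice"))  # ['tripod']
```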

Srividya

### 8/25/2011

The web can be thought of as the collective unconscious in the sense of Carl Jung, which is why
you can extract common-sense knowledge from the web.

rao