Tweet Notes (CSE 494/598 F11)

Wednesday, November 30, 2011

11/29/2011

A wrapper class is used for extraction when pages are generated from an identical schema. Besides generating the wrapper class, maintaining it is also essential.

Preethi

11/29/2011

DOM trees are used to write patterns that describe the extractable data in web pages. Many web pages are regular enough that information can be extracted from them easily.

Preethi
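
To illustrate the two notes above, here is a minimal sketch of a wrapper that follows a fixed DOM pattern. The page, the class names ("item", "name", "price") and the RowWrapper class are all invented for the example; only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser

# Hypothetical page: every row is generated from the same template.
PAGE = """
<table>
  <tr class="item"><td class="name">Widget</td><td class="price">$4.50</td></tr>
  <tr class="item"><td class="name">Gadget</td><td class="price">$12.00</td></tr>
</table>
"""

class RowWrapper(HTMLParser):
    """Collects name/price cells found inside <tr class="item"> rows."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per extracted row
        self._row = None     # row currently being filled, if any
        self._field = None   # field of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "tr" and cls == "item":
            self._row = {}
        elif tag == "td" and self._row is not None and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Text is kept only while a matching cell is open.
        if self._field is not None:
            self._row[self._field] = self._row.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.records.append(self._row)
            self._row = None

def extract(html):
    parser = RowWrapper()
    parser.feed(html)
    parser.close()
    return parser.records

records = extract(PAGE)
```

If the site renamed one of these classes, extract would silently return fewer fields or no rows at all, which is one way to see why a wrapper has to be maintained and not only generated.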

11/15/2011

XQuery: An XQuery query may not correspond to a unique SQL query. It may also contain queries over metadata. Converting it to SQL is non-trivial.

It may be converted into a single SQL query or into multiple SQL queries.

-Rashmi Dubey

11/17/2011

A deductive database combines a database of tables with some additional background knowledge.

-Rashmi Dubey
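
A small sketch of the deductive database idea above, assuming the background knowledge takes the form of rules applied to a stored table. The parent table, the names in it, and the two ancestor rules are invented for illustration; the evaluation is a naive bottom-up loop, not an optimized engine.

```python
# Stored table (facts): parent(parent_name, child_name). Hypothetical data.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def derive_ancestors(parent_facts):
    """Apply the two rules repeatedly until no new ancestor facts appear."""
    # Rule 1: ancestor(X, Y) :- parent(X, Y)
    ancestor = set(parent_facts)
    while True:
        # Rule 2: ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z)
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if new <= ancestor:   # nothing new was derived, so stop
            return ancestor
        ancestor |= new

ancestor = derive_ancestors(parent)
```

The table stores only three parent facts, yet a query such as ("alice", "dave") in ancestor can be answered because the rules derive it.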

11/22/2011

Information Extraction (IE) lies between the Information Retrieval and NLP approaches. It aims to extract information from semi-structured data such as referenced papers, Wikipedia, etc.

The regularity of web pages enables IE without NLP.

-Rashmi Dubey
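
As a rough sketch of IE without NLP, the example below pulls fields out of reference strings using nothing but their regular layout. The two reference strings are made up, and the "Authors (Year). Title. Venue." layout is an assumption of the example, not a real citation standard.

```python
import re

# Invented reference strings that share one layout: Authors (Year). Title. Venue.
REFERENCES = [
    "Smith, J. and Lee, K. (2004). Learning wrappers for web sources. Journal of Examples.",
    "Garcia, M. (2009). Extracting records from lists. Proceedings of the Sample Workshop.",
]

PATTERN = re.compile(
    r"^(?P<authors>.+?) \((?P<year>\d{4})\)\. "
    r"(?P<title>[^.]+)\. (?P<venue>[^.]+)\.$"
)

def extract_reference(line):
    """Return a dict of fields, or None when the line does not fit the layout."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None

parsed = [extract_reference(ref) for ref in REFERENCES]
```

The pattern knows nothing about language; it relies only on the parentheses and periods. A line of free text that lacks this layout returns None, which is where NLP techniques would have to take over.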

11/29/2011

In Collective Classification, the neighbours of a node define constraints on it.

-Rashmi Dubey
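
One simple way to picture neighbours constraining a node is an iterative scheme where each unlabeled node repeatedly takes the most common label among its neighbours. This is only a sketch of that one scheme, with a made-up graph, made-up labels, and a hypothetical function name; it is not presented as the method covered in class.

```python
from collections import Counter

# Hypothetical undirected graph as an adjacency list.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d", "f", "g"],
    "f": ["e", "g"],
    "g": ["e", "f"],
}
# Labels that are given up front; the rest must be inferred.
KNOWN = {"a": "sports", "b": "sports", "f": "politics", "g": "politics"}

def iterative_classification(graph, known, max_rounds=10):
    labels = dict(known)
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if node in known:
                continue  # given labels are never overwritten
            votes = Counter(labels[n] for n in graph[node] if n in labels)
            if not votes:
                continue  # no labeled neighbour yet
            # Highest vote count wins; ties are broken alphabetically.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break  # labels are stable
    return labels

labels = iterative_classification(GRAPH, KNOWN)
```

Here nodes c and d end up with the label of the cluster they are tied to most strongly, and e follows its two politics neighbours.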

Tuesday, November 29, 2011

We can assume that a Hidden Markov Model exists behind every possible grammatically correct sentence. If we can develop a rudimentary HMM for a language like English, we can use it to predict the likelihood of a sentence being said. We can take imperfect speech recognition software and use the sounds the program hears to generate a list of possible sentences that sound similar. We then use the HMM to determine how likely each of these sentences is relative to the others. This gives us a good idea of what the human actually said, rather than picking one possibility at random, since real sentences are far more likely than nonsense that simply rhymes with the sentence.
~Kalin Jonas
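
The ranking step described above can be sketched with the forward algorithm, which sums over all hidden state sequences to get the likelihood of a word sequence. Everything numeric here is invented: the three part-of-speech states, the probabilities, the tiny vocabulary, and the UNKNOWN floor for unseen words. A usable model would be estimated from data.

```python
STATES = ["DET", "NOUN", "VERB"]

# Toy parameters, chosen by hand for illustration only.
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5,  "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "bark": 0.1, "door": 0.1},
    "VERB": {"barks": 0.5, "sleeps": 0.5},
}
UNKNOWN = 1e-6  # assumed floor probability for a word a state never emits

def sentence_likelihood(sentence):
    """Forward algorithm: P(words) summed over all hidden state sequences."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    # alpha[s] = probability of the words so far, ending in state s
    alpha = {s: START[s] * EMIT[s].get(words[0], UNKNOWN) for s in STATES}
    for word in words[1:]:
        alpha = {
            s: EMIT[s].get(word, UNKNOWN)
            * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())

def rank_candidates(candidates):
    """Most likely sentence first."""
    return sorted(candidates, key=sentence_likelihood, reverse=True)

# Hypothetical output of an imperfect recognizer: similar-sounding guesses.
candidates = ["thee dog barks", "the dog bark", "the dog barks"]
ranked = rank_candidates(candidates)
```

With these toy numbers the grammatical candidate "the dog barks" scores highest, the determiner-noun-noun reading "the dog bark" scores lower, and the candidate with an out-of-vocabulary word scores lowest.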