Monday, December 5, 2011

09/27/2011

Hubs point to Authority. Hubs value is calculated by the sum of all the Authorities points they're pointed at, 
and Authority's value is calculated by the sum of the hubs' points that are pointing at them.

In page rank / auth and hub power Iteration is the fastest way of calculating primary eigen vectors
 and the rate of convergence depends on the eigen gap.

--
Nikhil Pratap

12/1/2011

The challenge with data integration is that, the data is spread over different sources with different schema and lot of time is consumed cleaning, migrating these data and integrating into one unit.

-- Dinu John


12/1/2011

Information integration is the process of combining data from difference source and giving one view of the entire data in a federated manner.

-- Dinu John

11/29/2011

Extraction pattern can be specified as path from the root of the DOM tree using Xquery or we can write regular expression to extract particular portion of the web page.

-- Dinu John


11/29/2011

The process of extracting data from webpages is referred as screen scraping.

-- Dinu John

11/29/2011

Extractors for a semi-structure web site are referred as wrappers. These wrappers are custom wrappers and have to be modified if the structure of web page changes.

-- Dinu John


11/29/2011

HTML structure of the web pages belonging to a web site is specific and regular. They all follow a specific pattern and have a template.

-- Dinu John