Wednesday, November 23, 2011


The web is not random at all. Structure exists; it just isn't well defined.

Figuring out this structure and extracting information from it is
highly useful for relating things, such as documents that might
reference each other but do not actually contain a hyperlink.
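
As a rough illustration of relating documents without hyperlinks, the
sketch below treats a mention of another document's title in the body
text as an implicit reference. The DOCS data, the field names, and the
implicit_references helper are all made up for this example, and exact
title matching is only one simple heuristic among many.

```python
# Hypothetical mini-corpus: no document contains a hyperlink.
DOCS = {
    "post_1": {
        "title": "Hidden Markov Models",
        "body": "A short note on sequence labeling.",
    },
    "post_2": {
        "title": "Tagging Notes",
        "body": "This builds on Hidden Markov Models from last week.",
    },
}


def implicit_references(docs):
    """Return (source, target) pairs where source's body mentions target's title."""
    links = []
    for src, doc in docs.items():
        body = doc["body"].lower()
        for dst, other in docs.items():
            # Case-insensitive substring match on the other document's title.
            if src != dst and other["title"].lower() in body:
                links.append((src, dst))
    return links


print(implicit_references(DOCS))
```

Here post_2 is related to post_1 even though it never links to it.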

Again, we are not trying to fully understand what the documents mean;
we just want to obtain a limited amount of information.

An extraction model could do things such as segmentation,
classification, clustering, and association.
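
To make the four tasks concrete, here is a toy sketch on a two-sentence
string. Everything in it is an assumption for illustration: the sample
text, the capitalized-word regex, the "ends with Corp" rule, and the
function names. A real extraction model would learn these decisions
rather than hard-code them.

```python
import re

TEXT = "Alice Smith works at Acme Corp. Bob Jones works at Acme Corp too."

# Toy rule: a mention is a run of capitalized words separated by spaces.
MENTION = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def segment(text):
    """Segmentation: find where each mention starts and ends."""
    return MENTION.findall(text)


def classify(mention):
    """Classification: assign each mention a type (toy rule)."""
    return "ORG" if mention.endswith("Corp") else "PERSON"


def cluster(mentions):
    """Clustering: group mentions that refer to the same thing."""
    groups = {}
    for m in mentions:
        groups.setdefault(m.lower(), []).append(m)
    return groups


def associate(text):
    """Association: relate a person and an organization in the same sentence."""
    pairs = []
    for sentence in text.split("."):
        found = segment(sentence)
        people = [m for m in found if classify(m) == "PERSON"]
        orgs = [m for m in found if classify(m) == "ORG"]
        pairs.extend((p, o) for p in people for o in orgs)
    return pairs


mentions = segment(TEXT)
print(mentions)
print([classify(m) for m in mentions])
print(cluster(mentions))
print(associate(TEXT))
```

Note that none of these steps requires understanding the sentences;
each one pulls out only a limited piece of information.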

Information extraction (IE) falls somewhere between information
retrieval (IR) and natural language processing (NLP). In a sense,
IE is a limited, subject-specific version of NLP.

-Thomas Hayden