Identifying this structure and extracting information from it is
useful for relating documents that reference each other but do not
contain an actual hyperlink.
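As a minimal sketch of relating documents without hyperlinks, the example below links two documents whenever one document's text mentions the other's title. The function name `find_implicit_links`, the `(title, text)` layout, and the sample documents are hypothetical. Exact title matching is a deliberately simple stand-in for real extraction, not a recommended method.

```python
import re
from itertools import permutations


def find_implicit_links(docs):
    """Return (source, target) pairs where the source's text mentions the
    target's title, even though no hyperlink connects them.

    docs maps a (hypothetical) document id to a (title, text) tuple.
    """
    links = []
    # Every ordered pair of distinct documents; a document never links to itself.
    for src, dst in permutations(docs, 2):
        title = docs[dst][0]
        text = docs[src][1]
        # Whole-phrase, case-insensitive match on the target's title.
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            links.append((src, dst))
    return links


docs = {
    "d1": ("Vector Space Model", "Ranking builds on term weights."),
    "d2": ("Inverted Index", "See the notes on the vector space model."),
    "d3": ("Stemming", "Stemming is applied before building an inverted index."),
}
print(find_implicit_links(docs))  # [('d2', 'd1'), ('d3', 'd2')]
```

Exact matching misses paraphrases and can over-match short or generic titles. A more realistic extractor would probably use entity recognition or fuzzy matching in place of `re.search`.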
Again, we are not trying to fully understand what the documents mean;
we only want to obtain a limited amount of information.
An extraction model could perform tasks such as segmentation,
classification, clustering, and association.
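A toy sketch of how the four tasks can fit together, assuming one common reading of them: segmentation finds the text spans that might fill a field, classification decides which field each span belongs to, association groups the fields from one sentence into a record, and clustering merges records that describe the same entity. The regular expressions, the field names (`name`, `email`, `date`), and the rule "same email means same entity" are all hypothetical stand-ins for what a trained model would learn.

```python
import re

# Hypothetical hand-written rules standing in for a learned extractor.
CANDIDATE = re.compile(
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"  # email-like strings
    r"|\b\d{4}-\d{2}-\d{2}\b"        # ISO dates
    r"|\b[A-Z][a-z]+ [A-Z][a-z]+\b"  # two capitalised words, treated as a name
)


def segment(sentence):
    """Segmentation: find the spans of text that may fill a field."""
    return [m.group() for m in CANDIDATE.finditer(sentence)]


def classify(span):
    """Classification: decide which field a span belongs to."""
    if "@" in span:
        return "email"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", span):
        return "date"
    return "name"


def associate(text):
    """Association: group the fields found in one sentence into a record."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        record = {classify(span): span for span in segment(sentence)}
        if record:
            records.append(record)
    return records


def cluster(records):
    """Clustering: merge records sharing a lower-cased email address."""
    clusters = {}
    for i, record in enumerate(records):
        # Records without an email get a unique key and stay on their own.
        key = record.get("email", "").lower() or f"record-{i}"
        entity = clusters.setdefault(key, {})
        for field, value in record.items():
            values = entity.setdefault(field, [])
            if value not in values:
                values.append(value)
    return list(clusters.values())


text = (
    "Alice Smith (alice@example.com) joined on 2024-03-01. "
    "Bob Jones (bob@example.com) joined on 2024-03-02. "
    "Alice Smith updated ALICE@example.com on 2024-05-10."
)
records = associate(text)   # three records, one per sentence
entities = cluster(records)  # two entities: Alice's two records are merged
```

Keying clusters on a normalised email is the simplest rule that shows the idea. Real systems usually compare several fields with a similarity measure instead.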
Information extraction (IE) falls somewhere between IR and NLP. In a
sense, IE is a limited, subject-specific version of NLP.