An interesting point to consider in Information Extraction is that web pages are not random: they are written by humans, which makes it possible to understand their structure and content.
Information extraction from the web doesn't require full NLP because there is significant regularity on the web. Template-driven pages expose some structure, and wrappers can use that structure to extract information.
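
To make the idea of a wrapper concrete, here is a minimal sketch using Python's standard `html.parser` module. The page markup, the `product`, `name`, and `price` class names, and the `ProductWrapper` and `extract_products` helpers are all hypothetical, made up for illustration. The sketch assumes every record on the page follows the same template, which is exactly the regularity a wrapper relies on.

```python
from html.parser import HTMLParser

# Hypothetical template-generated page: every record uses the same markup.
PAGE = """
<html><body>
<div class="product"><span class="name">Widget</span><span class="price">9.99</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">24.50</span></div>
</body></html>
"""


class ProductWrapper(HTMLParser):
    """Builds one dict per <div class="product">, using only tag structure."""

    FIELDS = {"name", "price"}  # assumed class names of the fields we want

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled, or None outside a product
        self._field = None    # field whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "product":
            self._current = {}
        elif tag == "span" and self._current is not None and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Text is kept only while we are inside a known field.
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract_products(html):
    """Run the wrapper over an HTML string and return the extracted records."""
    parser = ProductWrapper()
    parser.feed(html)
    parser.close()
    return parser.records


print(extract_products(PAGE))
```

No language understanding is involved here: the wrapper never interprets what "Widget" means, it only follows the repeated tag pattern. The trade-off is brittleness, since a change to the template (for example, renaming a class) would break the extraction.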