A robots.txt file placed at the root of a web server lists a set of rules for crawlers, which they may or may not choose to respect. It can tell crawlers whether they are allowed to crawl and index the site, restrict specific crawlers, or limit crawling to certain directories only.
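For example, Python's standard library ships a robots.txt parser. Here's a minimal sketch of checking the rules before fetching a page; the example.com URLs and the "MyCrawler" user-agent name are just placeholders:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # robots.txt lives at the site root
    rp.read()                                      # fetch and parse the rules

    # Ask whether a given crawler may fetch a given URL.
    print(rp.can_fetch("MyCrawler", "https://example.com/private/page"))
    print(rp.can_fetch("*", "https://example.com/public/page"))

A well-behaved crawler would run a check like this before every request, but nothing in the protocol forces it to.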
--
Ivan Zhou
Graduate Student
Graduate Professional Student Association (GPSA) Assembly Member
School of Computing, Informatics and Decision Systems Engineering
Ira A. Fulton Schools of Engineering
Arizona State University