is a plain-text file that sits at the root of a website and tells
search engine spiders (or robots) which sections of the site they
may access and which they must stay out of. There
are a number of reasons to use a robots.txt file, but the most
important is to keep personal or sensitive information out of
search results. For example, search engine spiders/robots can be
prevented from crawling and indexing a customer's credit card
information on an ecommerce website. Keep in mind that robots.txt
is only a request that well-behaved robots honor, so it should
always be used together with other forms of security, such as
firewalls and passwords, to keep anyone from reaching this
sensitive information through a search engine.
To write a robots.txt file, open a text editor and type
"User-agent: *" on the first line. Save this file as "robots.txt";
it is important that the name is entirely in lower-case letters.
The * after "User-agent:" applies the rules to all robots. To aim
a robots.txt rule at one specific search engine, type in the name
of that engine's robot instead; for example:
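As an illustration, Googlebot is the name Google's crawler uses to
identify itself, so a file whose rules apply only to Google would
begin:

```
User-agent: Googlebot
```

Rules placed under this line are read only by robots identifying
themselves as Googlebot; other crawlers ignore them.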
Next, the sections of the site to be excluded can be added. A
separate "Disallow" line must be used for every URL prefix to be
excluded; prefixes cannot be combined on a single line. Here is an
example:

Disallow: /docs/
Disallow: /bin/
This stops robots from crawling those folders and everything
inside them. To keep robots away from specific pages or files,
rather than a whole folder, list the path of each page instead.
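As a sketch (these page names are invented for illustration),
blocking two individual files while leaving the rest of the /docs/
folder crawlable would look like:

```
User-agent: *
Disallow: /docs/private.html
Disallow: /docs/pricelist.pdf
```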
Once all of the pages to be disallowed have been added to
robots.txt, save it and place the file at the root of the website.
Keep in mind that everything on the website that is NOT disallowed
in this file may be crawled and indexed.
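Putting the steps above together, a complete robots.txt (the
folder and page names are made up for illustration) might read:

```
User-agent: *
Disallow: /docs/
Disallow: /bin/
Disallow: /account/orders.html
```

Saved as robots.txt at the site root, this asks every robot to
skip the /docs/ and /bin/ folders and the single orders page,
while the rest of the site remains open to crawling.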
For more detailed information on robots.txt, view these links: