What is robots.txt?

Robots.txt is a plain-text file that sits at the root of a website and tells search engine spiders (also called robots or crawlers) which sections of the site they may access and which they should stay out of. There are a number of reasons to use a robots.txt file, but the most important is to protect personal or sensitive information. For example, search engine spiders can be told not to crawl and index pages on an ecommerce site that display a customer's payment details. Keep in mind that robots.txt is only a request to well-behaved crawlers, not an enforcement mechanism, so it should be used in combination with real security measures such as firewalls and passwords. Together, these keep anyone from reaching sensitive information through a search engine or otherwise.

How to write a robots.txt

Open a text editor and type:

User-agent: *

Now save this as “robots.txt”. It is important that the filename is all lower-case.
Note: the * after "User-agent:" applies the rules that follow to all robots. To target a specific search engine, use the name of its crawler; for example:

User-agent: Googlebot
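Putting these pieces together, a robots.txt file can contain several groups, each starting with its own "User-agent:" line. The sketch below is a hypothetical example: Googlebot (the name of Google's crawler) is kept out of one folder, while an empty "Disallow:" line leaves everything open to all other robots. The /drafts/ path is made up for illustration.

User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow:

Each crawler obeys only the group that matches its name, falling back to the * group if no specific match exists.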

Now sections of the site can be added. A separate “Disallow” line must be used for every URL prefix to be excluded; listing several prefixes on one line does not work. For example, this is incorrect:

User-agent: *
Disallow: /docs/, /bin/

It should instead be written as:

User-agent: *
Disallow: /docs/
Disallow: /bin/

This tells robots not to crawl these folders or anything inside them. To block specific pages or files rather than a whole folder, list each page individually:

User-agent: *
Disallow: /example.html
Disallow: /another-example.html
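A minimal sketch of how a well-behaved crawler checks these rules, using Python's standard urllib.robotparser module. The rules are the example lines from above; the /index.html path is a made-up example of a page the file does not mention.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as they would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /example.html
Disallow: /another-example.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The two listed pages are refused; an unlisted page is allowed.
print(parser.can_fetch("*", "/example.html"))          # False
print(parser.can_fetch("*", "/another-example.html"))  # False
print(parser.can_fetch("*", "/index.html"))            # True
```

A polite crawler runs a check like this before every request; note that nothing forces a badly behaved robot to do the same, which is why robots.txt is not a security feature.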

After all of the pages to be disallowed have been added to robots.txt, save the file and place it in the root of the website.

Remember, anything on the website that is NOT disallowed in this file may be crawled and indexed.
