

How to detect and correct errors in the robots.txt file

Every self-respecting webmaster should have at least a general idea of how to manage the indexing of a site's pages and files by search engines.
We will not beat around the bush: to find a common language with search robots, it is enough to configure robots.txt correctly. The key word is "correctly". If you make mistakes in robots.txt, the consequences can be quite unpleasant.
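
For reference, a minimal, correctly configured robots.txt might look like the sketch below (example.com, the /admin/ section and the sitemap location are illustrative assumptions, not values from a real project):

  User-agent: *
  Disallow: /admin/

  Sitemap: https://www.example.com/sitemap.xml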


The most common errors in the robots.txt file

  1. Mixed-up directive values (out of ignorance or by oversight), for example putting the site path in User-agent and the robot name in Disallow.
  2. Listing several URLs in a single Disallow directive: each path needs its own Disallow line.
  3. Typos in the file name itself (for example, robot.txt instead of robots.txt).
  4. Capital letters in the file name (Robots.txt or ROBOTS.TXT instead of robots.txt).
  5. An empty User-agent field: the directive must name a specific robot or contain the "*" symbol.
  6. A missing Disallow directive: even when nothing needs to be blocked, an empty Disallow: line should be present.

  7. An incorrect URL format in the Disallow directive. A rule like Disallow: /admin closes from indexing every page and file whose URL begins with "/admin". To mark the boundary of the path precisely, use the "/" and "$" characters, as shown in the examples after this list.
  8. Listing every file of a directory one by one. Strictly speaking, this is not a mistake; it is simply more rational to close the whole directory from indexing with a single rule (also shown below).
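
As a hedged illustration of points 7 and 8 (the /admin and /uploads/ paths here are invented examples, not taken from a real site), compare the problematic rules with the corrected ones:

  # Too broad: also blocks /administration/, /admin-panel.html and so on
  User-agent: *
  Disallow: /admin

  # Precise: blocks only /admin itself and everything inside /admin/
  User-agent: *
  Disallow: /admin$
  Disallow: /admin/

  # Redundant: every file of the directory is listed separately
  User-agent: *
  Disallow: /uploads/file-1.pdf
  Disallow: /uploads/file-2.pdf

  # More rational: the whole directory is closed with one rule
  User-agent: *
  Disallow: /uploads/

Note that the "$" end-of-URL anchor is an extension supported by the major search engines rather than part of the original robots.txt standard.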

 
Principles to follow to prevent errors in the robots.txt file

  1. The directives in robots.txt are only recommendations, and mainly the robots of major search engines follow them. Third-party bots usually ignore them, so it is better to block such bots by IP address.
     
  2. Pages blocked in robots.txt remain accessible to internet users. So if the goal is to hide a page not only from robots but also from users, protect it with a password.
     
  3. Subdomains are treated by search engines as separate sites. Therefore, indexing recommendations for them should be written in a separate robots.txt at the root of each subdomain.
     
  4. Directive names in robots.txt are not case sensitive, so they can be written in lowercase or uppercase. File and directory names, however, must be written exactly as they appear in the browser address bar.
     
  5. A User-agent directive applies to all the directives listed below it, up to the next User-agent line. So do not expect the rules written under the second User-agent to be followed by the robot named in the first one: the instructions have to be duplicated for each robot, as in the sketch after this list.
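
As a minimal sketch of principle 5 (the /private/ and /search/ paths are illustrative assumptions), the same rules are repeated for each named robot instead of being inherited from one block to the next:

  User-agent: Googlebot
  Disallow: /private/
  Disallow: /search/

  User-agent: Yandex
  Disallow: /private/
  Disallow: /search/

  # All other robots
  User-agent: *
  Disallow: /private/
  Disallow: /search/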
     

Checking robots.txt for errors

The easiest way to check robots.txt for errors is to use the tools that search engines provide for this.

In the case of Google, go to the Crawl section of Search Console and open the robots.txt file validation tool.


Below the window with the contents of the checked file, you can see the number of errors and warnings.

Yandex.Webmaster has similar functionality (Tools / Robots.txt analysis).


Here, too, you can see how many errors the checked robots.txt contains.

That said, if both checks show no errors, it is not yet a reason to rejoice: it only means that the instructions in the file are syntactically valid.

The file may still contain many of the errors described above, which will lead to problems with indexing the site. So when checking robots.txt for errors, do not rely only on such automated tools: review everything carefully yourself, as the example below shows.
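
For instance (an invented sketch, not one of the article's screenshots), the following file passes any syntax check, yet it closes the entire site from indexing, because Disallow: / matches every URL; if the intention was to block only the internal search section, the rule has to be narrowed:

  # Passes validation, but blocks the whole site
  User-agent: *
  Disallow: /

  # What was probably intended
  User-agent: *
  Disallow: /search/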