Google: Don’t Use 403/404 Error Responses For Rate Limiting Googlebot

Google has published instructions on how to properly reduce Googlebot’s crawl rate due to increased misuse of 403/404 response codes, which can have a negative impact on websites.

The guidance stated that abuse of response codes was on the rise from web publishers and content delivery networks.

Googlebot rate limit

Googlebot is an automated program from Google that visits (crawls) websites and downloads content.

Limiting Googlebot rate means slowing down the speed at which Google crawls a website.

The phrase “Google crawl rate” refers to the number of requests for web pages that Googlebot makes per second.

There are times when a publisher may want to slow down Googlebot, for example if it causes the server to become overloaded.

Google recommends several ways to limit Googlebot’s crawl rate, the main one being through the use of Google Search Console.

Rate limiting through Search Console slows down the crawl rate for a period of 90 days.

Another way to influence Google’s crawl rate is to use a robots.txt file to prevent Googlebot from crawling individual pages, directories (categories), or the entire website.

The good thing about robots.txt is that it only asks Google to refrain from crawling and does not ask Google to remove a site from the index.

However, the use of robots.txt can have “long-term effects” on Google’s crawling patterns.

Perhaps that is why the ideal solution is to use Search Console.
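As a rough illustration of the robots.txt approach described above, the sketch below uses Python's standard `urllib.robotparser` module to check how a rule aimed at Googlebot would be read. The robots.txt content, the `/private/` directory, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen only for the example. Python's parser is also not Google's own, so treat the result as an approximation of how the rule is interpreted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks Googlebot not to crawl one directory.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/page.html",
        "https://example.com/blog/post.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A rule like this only asks crawlers to stay away from the listed paths. It does not slow down crawling of the rest of the site.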

Google: Stop rate limiting with 403/404

Google has published guidelines on its Search Central blog advising publishers not to use 4XX response codes (except for response code 429) for rate limiting.

The blog post specifically mentioned the misuse of 403 and 404 error response codes for rate limiting, but the guidelines apply to all 4XX response codes except for the 429 response.

The recommendation is necessary because Google has seen an increase in publishers using these error response codes to reduce Googlebot’s crawl rate.

A 403 response code means that the visitor (Googlebot in this case) is prohibited from visiting the web page.

The 404 response code tells Googlebot that the web page cannot be found.

The 429 response code means “too many requests” and is a valid response to use for rate limiting.

Over time, Google may eventually drop web pages from its search index if publishers continue to serve 403 or 404 responses for them.

This means that those pages will no longer be considered for ranking in search results.

Google wrote:

“Over the past few months, we’ve seen an uptick in website owners and some CDNs trying to use 404 and other 4xx client errors (but not 429) to try to reduce Googlebot’s crawl rate.

The short version of this blog post is: Please don’t do that…”

Ultimately, Google recommends using 500, 503, or 429 error response codes.

A 500 response code means that there was an internal server error. A 503 response means that the server is temporarily unable to handle the request.

Google treats these two types of responses as temporary errors, so Googlebot will return later to check whether the pages are available again.

The 429 error response tells the bot that it is making too many requests and can also ask it to wait a specified amount of time before re-crawling.
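To make the recommended response concrete, here is a minimal sketch of a Python WSGI application that answers with 429 when the server considers itself overloaded. It assumes the "wait a specified amount of time" hint is sent with the standard HTTP `Retry-After` header. The article does not name the header or say how Googlebot treats it, so that part is an assumption. The `make_app` factory, the `is_overloaded` callback, and the 120-second default are hypothetical names and values used only for illustration.

```python
from http import HTTPStatus


def make_app(is_overloaded, retry_after_seconds=120):
    """Build a WSGI app that returns 429 while is_overloaded() is true.

    is_overloaded is a hypothetical zero-argument callable supplied by the
    site owner, for example a check on current server load.
    """

    def app(environ, start_response):
        if is_overloaded():
            status = HTTPStatus.TOO_MANY_REQUESTS  # 429, not 403 or 404
            headers = [
                ("Content-Type", "text/plain"),
                # Assumption: Retry-After carries the suggested wait time.
                ("Retry-After", str(retry_after_seconds)),
            ]
            start_response(f"{status.value} {status.phrase}", headers)
            return [b"Too many requests, please retry later.\n"]

        # Normal case: serve the page as usual.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Page content\n"]

    return app
```

The same structure would work with `HTTPStatus.SERVICE_UNAVAILABLE` (503) in place of 429. The point is that the overload signal is a temporary error rather than a 403 or 404.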

Google recommends that you consult its developer page about rate limiting Googlebot.

Read the Google blog post:
Don’t use 403s or 404s for rate limiting

