What is the purpose of serpstatbot?
Our bot constantly crawls the web to add new links and track changes in our link database. We provide our users with access to one of the largest backlink databases on the market for planning and monitoring marketing campaigns.
What happens to the crawled data?
Crawled data is added to the backlink index, which you can access in Serpstat.
How do you handle 404 or 301 pages?
We keep historical data so that temporary changes do not affect the data about your site. If links to pages that return a 404 or 301 response code still exist, serpstatbot will continue to find and follow them. For more information, see Google’s 404 policy.
Does the bot crawl links with the rel=nofollow attribute?
Although Google states that rel=nofollow links don’t influence PageRank, serpstatbot still crawls the target page. If you don’t want serpstatbot to crawl such links, disallow the target page in your robots.txt file. For more information, see the Wikipedia article on nofollow.
How can I block serpstatbot?
Serpstatbot follows the robots.txt standard. If you don’t want the bot to crawl your website, add the following lines to your robots.txt:
User-agent: serpstatbot
Disallow: /
Make sure the bot can always retrieve the robots.txt file itself. If it can’t, it will crawl your site by default.
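The standard way to block one bot is a group with User-agent: serpstatbot and Disallow: /. To check how a standards-compliant parser reads such a group, here is a minimal sketch using Python’s built-in urllib.robotparser. It is a generic parser, not serpstatbot’s own code, and it assumes the bot matches the User-agent token "serpstatbot". The rules are passed to parse() as text; read() is avoided because it fetches over the network and treats a 401 or 403 response as "disallow everything", the opposite of how serpstatbot handles an unreadable robots.txt.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt group asking serpstatbot to stay away from the whole site.
BLOCK_SERPSTATBOT = """\
User-agent: serpstatbot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(BLOCK_SERPSTATBOT)
print(parser.can_fetch("serpstatbot", "https://example.com/"))  # False
print(parser.can_fetch("serpstatbot", "https://example.com/blog/post"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/"))  # True
```

Other bots are unaffected because the group names only serpstatbot and there is no User-agent: * group.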
If you think that serpstatbot doesn’t follow your robots.txt rules, please contact us at email@example.com. Include your website’s URL and the log entries that show the bot trying to retrieve pages it was not supposed to.
What commands in robots.txt does serpstatbot support?
Serpstatbot supports the following additions to robots.txt:
- Crawl-delay of up to 20 seconds (higher values are capped at 20)
- Redirects within the same site when fetching robots.txt
- Simple pattern matching in Disallow directives, consistent with Yahoo’s wildcard specification
- Allow directives take precedence over Disallow directives if they are longer
- A failure to retrieve robots.txt (for example, a 403 Forbidden response) is treated as the absence of any restrictions. In this case, the bot will crawl all accessible pages.
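The wildcard and Allow/Disallow precedence rules in this list can be sketched in Python. This is a hypothetical illustration, not serpstatbot’s implementation: rule_to_regex and is_allowed are made-up names, "*" matches any run of characters, a trailing "$" anchors the end of the path, and the longest matching pattern wins. On a tie the sketch lets Disallow win, which is one reading of "Allow prevails if it is longer".

```python
from __future__ import annotations

import re


def rule_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply (directive, pattern) rules: the longest matching pattern wins.

    On equal length Disallow wins, since Allow only prevails when it is longer.
    """
    best_len = -1
    allowed = True  # no matching rule means no restriction
    for directive, pattern in rules:
        if not pattern:  # empty patterns (e.g. "Disallow:") restrict nothing
            continue
        if rule_to_regex(pattern).match(path) is None:  # match() anchors at the start
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and not is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed


RULES = [
    ("Disallow", "/private/"),
    ("Allow", "/private/public-page"),
    ("Disallow", "/*.pdf$"),
]
print(is_allowed("/private/secret", RULES))  # False
print(is_allowed("/private/public-page", RULES))  # True
print(is_allowed("/docs/report.pdf", RULES))  # False
print(is_allowed("/docs/report.pdf?v=2", RULES))  # True
```

The longer Allow rule re-opens one page inside a disallowed directory, and the "$" anchor means the PDF rule no longer matches once a query string follows ".pdf".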
Why doesn’t my robots.txt block work on serpstatbot?
There are several possible reasons:
- Off-site redirects when requesting robots.txt: serpstatbot only follows redirects within the same domain.
- Shared access logs: if several domains run on the same server, some servers log requests for all of them to a single file without recording the domain name, so requests for another domain can look like violations on yours. Add the domain name to the access log or split access logs per domain.
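The robots.txt fetch rules described in this FAQ (only same-site redirects are followed, and any failed fetch is treated as no restrictions) can be summarized as a small decision function. This is a hypothetical sketch: robots_outcome and its return labels are invented for illustration, "same site" is assumed to mean the same hostname, and treating an off-site redirect like a failed fetch is an assumption drawn from the troubleshooting answer above.

```python
from __future__ import annotations

from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def robots_outcome(status: int, robots_url: str, location: str | None = None) -> str:
    """Classify a robots.txt response (hypothetical helper, not serpstatbot's code)."""
    if status in REDIRECT_CODES and location:
        target = urljoin(robots_url, location)  # resolve relative Location headers
        if urlsplit(target).hostname == urlsplit(robots_url).hostname:
            return "follow-redirect"  # same site: fetch robots.txt from the new URL
        # Assumption: an off-site redirect is not followed, so its rules are never read.
        return "no-restrictions"
    if 200 <= status < 300:
        return "parse-rules"
    # Any other failure, e.g. 403 Forbidden, means no prohibitions.
    return "no-restrictions"


print(robots_outcome(200, "https://example.com/robots.txt"))  # parse-rules
print(robots_outcome(301, "http://example.com/robots.txt", "https://example.com/robots.txt"))  # follow-redirect
print(robots_outcome(301, "https://example.com/robots.txt", "https://other.example.net/robots.txt"))  # no-restrictions
print(robots_outcome(403, "https://example.com/robots.txt"))  # no-restrictions
```

An HTTP-to-HTTPS redirect on the same host is still followed, while a redirect to a different host leaves the site effectively unblocked.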
How can I slow down serpstatbot?
You can slow down the bot by adding a Crawl-Delay directive under User-agent: serpstatbot in your robots.txt file.
Crawl-Delay must be an integer: the number of seconds to wait between requests. Serpstatbot accepts a delay of up to 20 seconds between requests to your site; a higher Crawl-Delay reduces the load the bot puts on your site. The directive is also honored if it is set for User-Agent: *.
If the bot finds that your robots.txt sets a Crawl-Delay for any other bot, it will automatically slow down its crawling as well.
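The following sketch shows how these Crawl-Delay rules might combine. It is hypothetical: parse_crawl_delays and effective_delay are made-up names, the serpstatbot group is assumed to take priority over User-Agent: *, and because the FAQ does not say how much the bot slows down when only other bots have a Crawl-Delay, reusing the largest such value is purely an assumption.

```python
from __future__ import annotations

MAX_CRAWL_DELAY = 20  # seconds; larger values are capped, per the FAQ


def parse_crawl_delays(robots_txt: str) -> dict[str, int]:
    """Map each lowercased User-agent token to its integer Crawl-Delay."""
    delays: dict[str, int] = {}
    agents: list[str] = []
    in_rules = False  # True once the current group has a non-User-agent line
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        else:
            in_rules = True
            if field == "crawl-delay" and value.isdigit():  # integers only
                for agent in agents:
                    delays[agent] = int(value)
    return delays


def effective_delay(robots_txt: str, bot: str = "serpstatbot") -> int | None:
    """Seconds to wait between requests, or None if no Crawl-Delay applies."""
    delays = parse_crawl_delays(robots_txt)
    bot = bot.lower()
    if bot in delays:
        delay = delays[bot]
    elif "*" in delays:
        delay = delays["*"]
    elif delays:
        # Hypothetical: slow down using the largest delay set for any other bot.
        delay = max(delays.values())
    else:
        return None
    return min(delay, MAX_CRAWL_DELAY)


print(effective_delay("User-agent: serpstatbot\nCrawl-Delay: 60\n"))  # 20
print(effective_delay("User-agent: *\nCrawl-Delay: 5\n"))  # 5
print(effective_delay("User-agent: Googlebot\nCrawl-Delay: 3\n"))  # 3
```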
How do I verify that requests are from you?
Unfortunately, we can’t restrict our bots to a fixed set of static IP addresses. However, we can send an agreed identifier string with every request to your site, either in a CRAWLER-IDENT HTTP(S) header or as part of the User-Agent line. Serpstat won’t send this string to any other domain or subdomain, so you will know that requests containing it come from our network. To set this up, please contact firstname.lastname@example.org with details about your website and the identifier you want us to send. We can also generate a random identifier string for you.
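On your side, checking requests for the agreed identifier could look like the sketch below. The identifier value and the function name is_serpstat_request are hypothetical placeholders. Header names are compared case-insensitively, as HTTP requires, and the CRAWLER-IDENT value is compared with hmac.compare_digest so the comparison time does not depend on how much of the value matches.

```python
from __future__ import annotations

import hmac

# Hypothetical placeholder: use the identifier agreed with Serpstat.
EXPECTED_IDENT = "replace-with-your-agreed-identifier"


def is_serpstat_request(headers: dict[str, str], expected: str = EXPECTED_IDENT) -> bool:
    """True if the agreed identifier is in CRAWLER-IDENT or the User-Agent line."""
    if not expected:
        return False  # an empty identifier would match every request
    lowered = {name.lower(): value for name, value in headers.items()}
    ident = lowered.get("crawler-ident", "").strip()
    if ident and hmac.compare_digest(ident.encode(), expected.encode()):
        return True
    # Substring check for an identifier embedded in the User-Agent line.
    return expected in lowered.get("user-agent", "")


print(is_serpstat_request({"Crawler-Ident": "abc123"}, "abc123"))  # True
print(is_serpstat_request({"User-Agent": "serpstatbot abc123"}, "abc123"))  # True
print(is_serpstat_request({"User-Agent": "serpstatbot"}, "abc123"))  # False
```

A request that merely claims to be serpstatbot in its User-Agent, without the identifier, is rejected.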