What is the purpose of serpstatbot?
Our bot constantly crawls the web to add new links to our database and to monitor the links it found earlier. This is why we can offer our users the most comprehensive data. The data collected by serpstatbot helps digital marketers plan and monitor their online marketing campaigns.
What happens to the crawled data?
Crawled data is added to the public backlinks index that we maintain.
How do you handle 404 or 301 pages?
We keep historical data to make sure that no temporary change affects the data we report for your site. If links to these pages still exist, serpstatbot will continue to find and follow them. You can check Google's 404 policy for more information.
The bot crawls links with rel=nofollow
Google states that rel=nofollow links don’t influence PageRank, but the attribute does not prevent crawling, so the crawler still visits the target page. If you don’t want serpstatbot to crawl such links, use the robots.txt file to disallow the target page, as shown below. You can find more information in the Wikipedia article on nofollow.
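For example, if a nofollow link points to a page you don't want crawled (the path below is a hypothetical placeholder), you could disallow that specific target page:

User-agent: serpstatbot
Disallow: /landing/partner-offer.html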
How can I block serpstatbot?
Serpstatbot adheres to the robots.txt standard. If you don’t want the bot to crawl your website, then add the following text to your robots.txt (a minimal sketch, assuming serpstatbot is the user-agent token the crawler matches on):
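User-agent: serpstatbot
Disallow: /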
Please always ensure that the bot can retrieve the robots.txt file itself. If it can't, it will crawl your site by default.
If you think that serpstatbot doesn’t follow your robots.txt directives, please get in touch with us via email: firstname.lastname@example.org. Please include the URL of your website and log entries that show the bot trying to retrieve pages it was not supposed to.
What commands in robots.txt does serpstatbot support?
Serpstatbot supports the following additions to robots.txt (combined in the example after this list):
- Crawl-delay of up to 20 seconds (higher values are capped at 20)
- Redirects within the same site when retrieving robots.txt
- Simple pattern matching in Disallow directives consistent with Yahoo’s wildcard specification
- Allow directives prevail over Disallow directives when they are longer (more specific)
- Failures to retrieve robots.txt, such as 403 Forbidden, are treated as a blanket Disallow directive
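A sketch that combines these features (the paths are hypothetical placeholders):

User-agent: serpstatbot
# Wait 10 seconds between requests (values above 20 are capped)
Crawl-delay: 10
# Wildcard pattern: block every URL containing /private/
Disallow: /*/private/
# This Allow is longer than the Disallow above, so it prevails for this page
Allow: /*/private/public-report.html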
Why doesn’t my robots.txt block work on serpstatbot?
There are several reasons:
- Off-site redirects when requesting robots.txt: serpstatbot only follows redirects within the same domain.
- Several domains run on the same server. Some servers log accesses to multiple domains in one file, which makes it hard to tell which domain a request was for. You should add domain information to the access log, or split access logs on a per-domain basis.
- The live robots.txt is out of sync with the developer copy. Serpstatbot may appear to ignore rules that only exist in the copy the developer was testing against a development server.
How can I slow down serpstatbot?
You can slow down the bot by adding the following settings to your robots.txt file (a sketch, again assuming the serpstatbot user-agent token):
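User-agent: serpstatbot
# Number of seconds to wait between requests; values above 20 are capped
Crawl-Delay: 20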
Crawl-Delay must be an integer: it specifies the number of seconds to wait between requests. Serpstatbot honors a delay of up to 20 seconds between requests to your site. Please note that your website can be scanned by several serpstatbot instances at the same time, so a high Crawl-Delay helps minimize the impact on your site. This parameter is also honored if it has been set for the * wildcard user agent.
If our bot finds that you have set Crawl-Delay for any other bot, it will automatically slow down its own crawling as well.
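For example, if your robots.txt only sets a delay for some other crawler (the othercrawler token below is a hypothetical placeholder), serpstatbot will still slow itself down accordingly:

User-agent: othercrawler
Crawl-Delay: 15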
How do I verify that requests are from you?
Unfortunately, we can’t restrict our bots to a limited number of IP addresses. However, we can send a pre-arranged identifier string with all requests to your site, either as an HTTP header in the CRAWLER-IDENT field or as part of the User-Agent line. We won’t share this string with anyone else or send it to any other domain or subdomain, so you will know that requests containing this string come from our network. To take advantage of this option, please contact email@example.com with information about your website and the identifier you want us to send. We can also generate a random identifier string for you.