If you run a business, the first things you get to grips with are time management and budget control. Google can sometimes seem unlike anyone else, but it works the same way. Google has finite resources, and the share it is willing to spend on crawling your website is your crawl budget. I will discuss what a crawl budget is, some of the factors surrounding it, and some advice on how to manage yours for the best results.
How is a crawl budget allocated?
The amount of resources that Google is willing to spend crawling your website is determined by a number of factors. Google looks at your website’s:
- Update rate
- Number of pages
- Capacity to handle crawling
Although the algorithm oozes sophistication, you have the ability to intervene and sway the way that Google actually crawls your website.
How important is a crawl budget?
To put this in simple terms: very important. Crawl budget is a major factor in how quickly your pages appear in the search results. Google sets it partly by how important it considers your website, and the things that typically lower that importance are a poor user experience, spammy content or a combination of both. If Google doesn’t consider your website to be important enough, it will set a low crawl budget. If this is the case, the best thing to do is to improve your content, and you should start to see results.
Crawlers are automated, and with automation you need to follow the guidelines or you will begin to run into problems. One such problem is crawling traps. Crawlers can get stuck in a loop, fail to find your pages, stumble across faulty relative links, or be served a soft 404 page (a ‘not found’ page that returns a success status) instead of an actual 404 page, and that’s the trap set in motion.
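To make the faulty relative link problem concrete, here is a minimal Python sketch. The example.com URLs and the `follow_relative_link` helper are made up for illustration. It assumes every page in a section carries the same link written without its leading slash, and that the server answers each new URL with a page rather than a real 404.

```python
from urllib.parse import urljoin


def follow_relative_link(start_url: str, href: str, hops: int) -> list[str]:
    """Resolve the same href again and again, as a crawler would on each new page."""
    urls = [start_url]
    for _ in range(hops):
        # A relative href is resolved against the page it was found on.
        urls.append(urljoin(urls[-1], href))
    return urls


# Hypothetical link written as "blog/" (no leading slash): the path keeps growing.
trap = follow_relative_link("https://www.example.com/blog/", "blog/", 3)

# The same link written as "/blog/" always resolves to the same page.
fixed = follow_relative_link("https://www.example.com/blog/", "/blog/", 3)

for url in trap:
    print(url)
```

Each hop in `trap` is a brand new URL as far as a crawler is concerned, so it can keep spending budget on pages that don't really exist.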
How much time should be spent on crawl budgets?
If you are running a medium to large website that is updated anywhere from once a day to once a week, then a focus on the crawl budget is essential to ensure that a bottleneck or indexing lag doesn’t occur. As with anything in life, it is better to look for issues before they become problems. Audit your website for issues, no matter its size.
6 Top tips to optimize your crawl budget
1. Use a sitemap
Chances are that you have heard about sitemaps, the document that lists all of the pages that you would like to be crawled and indexed in search. Whilst sitemaps are a recommendation rather than compulsory, they are a recommendation for a reason. Without a sitemap, Google has to discover pages blindly by following internal links on your site. With a sitemap, Google knows your website, its size, and which pages are supposed to be indexed.
Some platforms have an auto-generated sitemap option, or there are plugins and sites that will help you on your quest to inform Google. If you have a large website, you might require multiple sitemaps, as the limit is 50,000 URLs per sitemap.
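If your platform doesn't generate a sitemap for you, the idea can be sketched in a few lines of Python. This is only an illustration: the `build_sitemaps` helper and the example.com URLs are made up, and a real sitemap would usually come from your CMS or a plugin. It shows the basic XML layout and how a long list of URLs could be split into several files to stay under the per-file limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit mentioned above


def build_sitemaps(urls: list[str], max_urls: int = MAX_URLS_PER_SITEMAP) -> list[str]:
    """Return one sitemap XML document (as a string) per chunk of URLs."""
    documents = []
    for start in range(0, len(urls), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
        body = ET.tostring(urlset, encoding="unicode")
        documents.append('<?xml version="1.0" encoding="UTF-8"?>\n' + body)
    return documents


# Hypothetical pages, with a tiny limit of 2 so the splitting is easy to see.
pages = [f"https://www.example.com/page-{i}" for i in range(5)]
docs = build_sitemaps(pages, max_urls=2)
print(len(docs))
print(docs[0])
```

With the real limit, a site with 120,000 pages would end up with three files using this approach.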
Crawling Conflicts and How to Resolve Them
Crawling conflicts send Google mixed signals and eat into your crawl budget unnecessarily. Google Search Console comes to the rescue here: check your coverage report. There is a dedicated ‘Error’ tab for crawling conflicts, which details the number of errors, the type of errors and the pages that are affected.
Common crawling issues:
- Accidental submission of a page (Requires it to be unsubmitted)
- Access is being denied through a technical issue (Use your tools)
- Pages that should be hidden are crawled
Pages that should be hidden often end up exposed when the wrong method is used to block them, such as blocking with a robots.txt file. Robots.txt rules are only recommendations about crawling, so it is not uncommon for Google to see the rule and show the blocked page in the search results regardless, meaning something you might want private is publicly available.
2. Non-essential resources
If you look at the makeup of your page, there is a good chance that you have a lot of images, video content and potentially GIFs that are ‘decoration’ rather than vital to the understanding of the page. Although you might consider these as non-essential, Google doesn’t know this and these files use up your crawl budget.
Thankfully, you can disallow resources individually by name in your robots.txt file.
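As a rough sketch of what such rules might look like, the Python snippet below holds a made-up robots.txt as a string and checks it with the standard library's `urllib.robotparser`. The file paths and the example.com domain are assumptions for illustration only, and Python's parser is a stand-in here, not the way Google itself evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two decorative files by name.
ROBOTS_TXT = """
User-agent: *
Disallow: /images/decorative-banner.gif
Disallow: /videos/background-loop.mp4
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.strip().splitlines())

# The named decorative file is blocked, other resources are still crawlable.
banner_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/images/decorative-banner.gif"
)
photo_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/images/product-photo.jpg"
)
print(banner_allowed, photo_allowed)
```

Checking a rule like this before you publish it is a cheap way to make sure you haven't blocked something important by accident.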
You can also disallow a file type in general, which is handy if you have a lot of files of the same type.
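Google's crawler understands two special characters in robots.txt paths: `*` matches any run of characters and a trailing `$` marks the end of the URL. A rule such as `Disallow: /*.gif$` under `User-agent: *` would therefore cover every GIF on the site. The sketch below is a simplified matcher written for this article to show how such a pattern behaves; `rule_matches` and the sample paths are made up, and this is not Google's actual implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Simplified robots.txt path matching: '*' is a wildcard and a
    trailing '$' anchors the end; otherwise the pattern is a prefix."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


GIF_RULE = "/*.gif$"  # hypothetical rule: block every URL ending in .gif

print(rule_matches(GIF_RULE, "/images/spinner.gif"))      # blocked
print(rule_matches(GIF_RULE, "/images/product.jpg"))      # not blocked
print(rule_matches(GIF_RULE, "/images/spinner.gif?v=2"))  # not blocked, '$' requires .gif at the end
```

Dropping the `$` would make the rule match any URL that contains `.gif` anywhere after the first slash, which is usually broader than you intend.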
3. Optimize the site’s structure
Internal linking does not have a direct impact on your crawl budget, but the general rule is that important content on your site should never be more than three clicks away from the homepage. Google says that the pages linked directly from your homepage may be considered more important and crawled more often.
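If you have a list of your internal links (most site audit tools can export one), the three-click rule is easy to check. The sketch below is a minimal example using a breadth-first search from the homepage; the `SITE_LINKS` map and the `click_depths` helper are made up for illustration.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage: depth = fewest clicks needed."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
SITE_LINKS = {
    "/": ["/blog/", "/products/"],
    "/blog/": ["/blog/archive/"],
    "/blog/archive/": ["/blog/archive/2019/"],
    "/blog/archive/2019/": ["/blog/archive/2019/old-post/"],
    "/products/": ["/products/widget/"],
}

depths = click_depths(SITE_LINKS)
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
print(too_deep)
```

Any page that shows up in `too_deep` is a candidate for a link from the homepage or from a page closer to it.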
4. Avoid long redirect chains
You can think of your crawl budget as a number of ‘units’, where each task the crawler carries out takes one of your units. Every redirect in a chain is another task, and if you have long redirect chains, the search engine will only follow for so long and then stop, resulting in the destination page not getting crawled. The general rule of thumb is to use no more than two redirects in a row.
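Here is a small Python sketch of how you might spot a chain that breaks the two-redirect rule. It assumes you already have your redirects as a simple source-to-target map (for example, exported from your server config or an audit tool); the `REDIRECTS` data and the `redirect_chain` helper are made up for illustration.

```python
def redirect_chain(start: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if chain.count(target) > 1:  # we have been here before: redirect loop
            break
    return chain


# Hypothetical redirect map: old URL -> where it currently points.
REDIRECTS = {
    "/old-page": "/old-page-2",
    "/old-page-2": "/new-page-draft",
    "/new-page-draft": "/new-page",
}

chain = redirect_chain("/old-page", REDIRECTS)
hops = len(chain) - 1
print(chain, hops)

# One way to fix it: point every old URL straight at the final destination.
flattened = {source: redirect_chain(source, REDIRECTS)[-1] for source in REDIRECTS}
print(flattened)
```

In this example the chain takes three hops, one more than the rule of thumb allows, while the flattened map gets every old URL to the destination in a single redirect.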
5. Resolve duplicate content issues
Duplicate content means having two or more pages with largely similar content. This can be caused by dynamic URLs, A/B testing or content syndication, and is sometimes down to the CMS platform that is in use. Duplicate content uses up your crawl budget, because the same piece of content gets crawled two or three times.
A website audit will tell you where to find the duplicate pages: use your tools and search for duplicate titles and meta descriptions. Once you have found the duplicate content, you can let Google know to ignore the duplicate and focus on crawling the main page. This is called ‘canonicalization’.
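The two steps, finding duplicate titles and pointing the duplicates at the main page, can be sketched in Python as follows. The `PAGES` data, the helper names and the rule for picking the main page (the shortest URL) are all assumptions for illustration; in practice you would choose the canonical page yourself.

```python
import html
from collections import defaultdict


def find_duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs by normalised title and keep only titles used more than once."""
    by_title = defaultdict(list)
    for url, title in pages.items():
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


def canonical_tag(main_url: str) -> str:
    """Build the tag to place in the <head> of each duplicate page."""
    return f'<link rel="canonical" href="{html.escape(main_url, quote=True)}">'


# Hypothetical crawl export: URL -> page title.
PAGES = {
    "https://www.example.com/shoes": "Running Shoes",
    "https://www.example.com/shoes?sort=price": "Running Shoes ",
    "https://www.example.com/about": "About Us",
}

duplicates = find_duplicate_titles(PAGES)
for title, urls in duplicates.items():
    main_url = min(urls, key=len)  # placeholder rule: treat the shortest URL as the main page
    print(title, "->", canonical_tag(main_url))
```

The resulting tag goes on the sorted or filtered versions of the page, so that Google knows which URL is the one worth crawling.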
Using these steps will help to get the most out of your crawl budget, but if you have just published a page and don’t wish to play the waiting game, remember that Google’s Search Console has a request indexing feature. Paste your URL, press Enter, and Google will see the request and come to crawl the page as soon as possible. While backlinks and content are generally the things that are focused on, sometimes SEO attention needs to be given to crawlers, improving your site’s performance so that your valuable content can be found, indexed and made available to your audience.