Crawl Budget: How Many Pages Search Engines Will Crawl on Your Site & How to Optimize It
We know that Google has massive crawling, rendering, and indexing capabilities, and great big banks of servers all over the world. That doesn’t mean the search engine’s resources are unlimited. Vast, but not unlimited.
When you have a finite resource – like time, money, or bandwidth – you budget it. When the SEO world talks about “crawl budget,” they’re talking about the resources that Google will spend on crawling, rendering, and indexing a specific website.
From an SEO perspective, the real-world impact of crawl budget shows up in how many of your pages get indexed and how quickly updates are reflected in the search results and cache.
While crawl budget isn’t a huge concern for smaller websites, large websites (especially ones that are JavaScript-heavy and change often) may run into issues.
What metrics measure crawl budget?
The following metrics can help you learn more about your site’s crawl budget:
- Pages crawled by Googlebot per day (see the log-parsing sketch after this list)
- The number of pages reported as “Discovered – currently not indexed” in Google Search Console Coverage reports
- The delay between a URL’s submission via XML sitemap and Google’s indexing that URL
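The first metric on that list is one you can approximate yourself from your server’s access logs. Below is a minimal sketch, assuming a standard combined-format Apache/Nginx log at the hypothetical path access.log; it counts requests per day whose user agent claims to be Googlebot. For production use you would also want to verify the requesting IPs (the user agent string can be spoofed), and Google Search Console’s Crawl Stats report gives you the official numbers.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log path; assumes combined log format with the timestamp
# in square brackets, e.g. [10/Oct/2023:13:55:36 +0000]
LOG_PATH = "access.log"
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")

crawls_per_day = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        # Cheap user-agent check; spoofable, so verify IPs separately.
        if "Googlebot" not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            crawls_per_day[day] += 1

for day, hits in sorted(crawls_per_day.items()):
    print(f"{day}  {hits} Googlebot requests")
```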
Crawl budget also has lagging indicators. If a page isn’t crawled, for example, it won’t be indexed. If it isn’t indexed, it won’t rank or earn any organic search traffic. On the flip side, crawl budget improvements can improve not only your leading metrics (e.g. total number of pages crawled), but also your lagging indicators (e.g. rankings and traffic).
If you’re not sure where to start, take a look at the deep pages on your site — the pages that take 5+ clicks to get to from the home page. Typically, the deeper a page is on the site, the less frequently search engine bots will crawl it.
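Click depth is straightforward to compute if you can export your internal links as from-URL/to-URL pairs (most desktop crawlers can do this). The sketch below is a minimal example, assuming a hypothetical links.csv with "source" and "target" columns; it runs a breadth-first search from the home page and flags anything 5+ clicks deep.

```python
import csv
from collections import defaultdict, deque

# Hypothetical inputs: an internal-link export and the home page URL.
LINKS_CSV = "links.csv"
HOME = "https://www.example.com/"

# Build an adjacency list of internal links.
graph = defaultdict(set)
with open(LINKS_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        graph[row["source"]].add(row["target"])

# Breadth-first search from the home page gives each URL's minimum click depth.
depth = {HOME: 0}
queue = deque([HOME])
while queue:
    url = queue.popleft()
    for linked_url in graph[url]:
        if linked_url not in depth:
            depth[linked_url] = depth[url] + 1
            queue.append(linked_url)

# Pages 5+ clicks from the home page are the likely crawl-budget laggards.
for d, url in sorted((d, u) for u, d in depth.items() if d >= 5):
    print(f"{d} clicks deep: {url}")
```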
How to improve a site’s crawl budget
Google is budgeting its resources to determine how frequently to return to your site, how much time to spend on it, and how deep to go into it. If you want to improve those searchbot behaviors, you have to influence the factors Google uses to make those decisions.
According to Google, crawl budget is determined by two things: crawl rate limit and crawl demand.
Improving crawl rate limit
Crawl rate limit is a combination of factors that, when taken together, determine when Google will stop crawling your website. Factors that influence crawl rate limit include:
- Page speed (e.g. How fast/slow are the pages loading?)
- Error codes (e.g. Is the crawler running into 5xx errors?)
- GSC crawl limit (e.g. Have you manually set a limit in Google Search Console?)
This is where collaborating with your development team is crucial, since they are often the ones who can make page speed improvements, server improvements, and other technical changes to the website.
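One way to keep an eye on the first two factors is to spot-check a sample of URLs for server errors and slow responses before Googlebot runs into them. Here’s a rough sketch using only the Python standard library; the URL list is hypothetical (in practice, pull it from your XML sitemap or a crawl export), and response time measured this way is only a proxy for what Googlebot actually experiences.

```python
import time
import urllib.error
import urllib.request

# Hypothetical sample of URLs to spot-check.
URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets",
]

for url in URLS:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            status = response.status
            response.read()  # include the body download in the timing
    except urllib.error.HTTPError as e:
        status = e.code  # 5xx (and other error) responses land here
    except urllib.error.URLError as e:
        print(f"{url}\tFAILED ({e.reason})")
        continue
    elapsed = time.perf_counter() - start
    flag = " <-- 5xx error" if 500 <= status < 600 else ""
    print(f"{url}\t{status}\t{elapsed:.2f}s{flag}")
```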
Improving crawl demand
The other factor Google looks at to determine crawl budget is crawl demand. Factors that influence crawl demand include:
- Popularity
- Content freshness
The concept of “crawl budget” has been described as an allotment of time the bot spends on a website. If your pages are too slow, the bot necessarily has to spend more time loading and rendering them, rather than skipping merrily along to the next page. And if the landing page it finds when it gets there is old, repetitive, or low quality, it will keep going for now, but it will eventually decide to spend less time on the site the next time it visits.
Common problem areas for crawl budget
There are a few top contenders for problem areas when it comes to crawl budget:
- Faceted navigation
- Mobile link structure
- Page bloat
Faceted navigation is usually the worst culprit for crawl budget issues because it is easy to create without noticing. When search filters, parameters, and on-site searches create new URLs that are accessible to Googlebot, hundreds of thousands of URLs can end up being created and indexed. Some of these pages may be valuable to your potential search users, but certainly not all of them. The key with faceted navigation is to be deliberate and strategic about it: choose which pages to link to, optimize them, and make sure they are differentiated from everything else on your site. The remainder can be marked noindex, or better yet, you can suppress the generation of new unique parameterized URLs entirely.
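A quick way to see whether faceted or parameterized URLs are eating crawl budget is to count how many distinct URLs each query parameter generates among the URLs Googlebot actually requests. A minimal sketch, assuming a plain text file of crawled URLs (e.g. extracted from your access logs) at the hypothetical path googlebot_urls.txt:

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Hypothetical input: one URL per line, e.g. the URLs Googlebot requested
# according to your access logs.
URL_FILE = "googlebot_urls.txt"

# For each query parameter, collect the distinct URLs it appears in.
urls_per_param = defaultdict(set)

with open(URL_FILE, encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        for param, _value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            urls_per_param[param].add(url)

# Parameters that explode into huge numbers of unique URLs are the usual
# faceted-navigation suspects (filters, sorts, session IDs, etc.).
for param, urls in sorted(urls_per_param.items(), key=lambda kv: -len(kv[1])):
    print(f"{param}: {len(urls)} unique URLs")
```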
Most of the time, the Googlebot that crawls and indexes your pages first is the smartphone (mobile) bot. This matters when the internal link structure and navigation on your mobile site are different from the desktop site. If the mobile bot can’t discover and crawl pages, they are likely to go a long while before being discovered at all.
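If you suspect your mobile pages expose fewer internal links than desktop, you can diff the anchors served to a mobile user agent against those served to a desktop one. A rough sketch with the standard library only; the user-agent strings are simplified for illustration (not Google’s exact ones), and this only compares server-rendered HTML, so it won’t catch links injected by JavaScript.

```python
import urllib.request
from html.parser import HTMLParser

# Simplified, illustrative user-agent strings.
MOBILE_UA = "Mozilla/5.0 (Linux; Android 10) Mobile (compatible; Googlebot/2.1)"
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1)"

URL = "https://www.example.com/"  # hypothetical page to compare


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def links_for(user_agent):
    request = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


desktop_links = links_for(DESKTOP_UA)
mobile_links = links_for(MOBILE_UA)

# Links the mobile version never exposes are candidates for discovery problems.
for href in sorted(desktop_links - mobile_links):
    print(f"Missing on mobile: {href}")
```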
As noted above, the quality and value of your pages for search users do impact the amount of time Google will spend on a site. Strategically pruning, redirecting, and noindexing low-quality landing pages takes time, but it can pay off in terms of crawl budget, not to mention overall site quality.