Saying "crawl budget is important" is like saying "summer is over" - both are true and sobering statements.
Luckily, we can do something about it: we can still enjoy the fall season and optimize crawl budget with the 7 actionable steps below!
1. Take a closer look at your user profile pages.
In the age of the sharing economy and online communities, many sites offer us the ability to create profiles, post comments, sell things, publish content, or curate and save collections of items. For instance, publishers and media sites often host user forums, marketplaces offer seller and buyer profiles, and recipe sites invite food lovers to post recipes and curate recipe collections.
This presents a great experience for users: they are encouraged to create and consume more content and engage with fellow readers. However, from the bots' point of view, user profile pages offer little to no value since they are templated, numerous, and don't include unique or useful information.
With Botify's SiteCrawler, we often see that user profile pages take up a good chunk of crawl budget. Bots spend time crawling them before getting to your more strategic, "money" pages.
One example we came across was a software provider whose users had to create profiles in order to download distributions. The site ended up with hundreds of thousands of profile pages where the only unique fields were the username and location. These pages offered no value to search engines - why index them if nobody would search for a username/location combo?
Marketplaces often see this issue too: some sellers have many listings on their seller profile page, while others have only a handful, and buyers have none. Exposing profile pages to bots is only helpful for the most active, content-rich profiles.
Example: bots are spending a lot of time crawling "users" pages, but those pages don't drive organic traffic, resulting in wasted crawl budget.
With Botify's SiteCrawler and LogAnalyzer, you can check whether bots are spending too much time crawling user profile pages that contain thin or templated content. Then decide whether to allow crawling for all, some, or none of them, based on their value for searchers and bots.
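Before deciding, it helps to quantify the problem. Below is a minimal Python sketch that tallies Googlebot hits per top-level path section in a raw access log; the combined log format, the access.log filename, and the assumption that profiles live under a path like /users/ are all placeholders to adapt to your own setup. A quick script like this is only a sanity check alongside proper log analysis, but it shows the idea:

import re
from collections import Counter

# Count Googlebot hits per top-level path section from an access log.
# Assumes a common/combined log format; adjust the regex for your servers.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

def googlebot_hits_by_section(log_path):
    hits = Counter()
    with open(log_path) as f:
        for line in f:
            m = LOG_LINE.search(line)
            if not m or "Googlebot" not in m.group("agent"):
                continue
            path = m.group("path").split("?")[0]
            section = "/" + path.strip("/").split("/")[0] if path != "/" else "/"
            hits[section] += 1
    return hits

if __name__ == "__main__":
    hits = googlebot_hits_by_section("access.log")  # hypothetical log file
    total = sum(hits.values()) or 1
    for section, count in hits.most_common(10):
        print(f"{section:<20} {count:>8} ({count / total:.1%})")

If a section like /users/ shows up near the top of that list but drives next to no organic visits, that's your candidate for blocking or noindexing.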
2. Parameter URLs: how to handle them
URL parameters (sometimes called query strings) are a part of the URL that follows the "?" symbol. They can be used for identifying information, sorting, filtering, paginating, or tracking.
Parameters are also a very common and obvious crawl-budget "eater." Luckily, with parameter URLs, a quick fix can make a big difference.
Here are some ways to optimize parameter URLs:
- Remove empty parameters and stick to the same order of keys. Example:
store.com/shoes?category=womens&type=boots
store.com/shoes?type=boots&category=womens
These two URLs return the same content, but bots treat them as two separate pages. Choose one way to order the keys (for example, alphabetized: "category", then "type") and stick to it - see the sketch after this list.
- Combine multiple parameters into one and let the backend handle functionality. Example:
store.com/shoes?selector=category-womens|type-boots
Here, instead of dealing with two parameters, Google only needs to handle one - named "selector".
- Use Google Search Console to let bots know what parameters to ignore. Example:
In this screenshot, we are telling Googlebot that the "url" parameter doesn't change the page's content, so it can be ignored and Google doesn't need to crawl every URL that carries it. The "sort" parameter only changes the order of the content, not the content itself, which is why it is set to "Sorts". The "size" parameter filters the content, meaning fewer results are shown on the page, which is why it is set to "Narrows".
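Normalizing parameters doesn't require anything exotic on the backend or in a redirect layer. Here is a minimal Python sketch using the standard library's urllib.parse; the store.com URLs are the same hypothetical examples as above, and the exact rules (which keys to keep, how to order them) are yours to define:

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_query(url):
    """Drop empty parameters and sort keys alphabetically so each
    combination of filters maps to exactly one URL."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if v]
    params.sort(key=lambda kv: kv[0])  # one fixed ordering, e.g. alphabetical
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

# Both variants collapse to the same canonical URL:
print(normalize_query("https://store.com/shoes?type=boots&category=womens&promo="))
print(normalize_query("https://store.com/shoes?category=womens&type=boots"))
# -> https://store.com/shoes?category=womens&type=boots

Apply the same rule everywhere parameter URLs are generated (internal links, redirects, canonicals) so bots only ever see one version.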
3. Faceted navigation: how to check if it's a problem
Faceted navigation can manifest as a variation of issue #2 above, with parameters in the query string (store.com/shoes?type=womens&size=7&color=red), or it can be implemented as clean URLs (store.com/womens-size7-red-shoes.html).
Regardless of the implementation, the problem with faceted navigation is page bloat: the combinations of facets and filters multiply the number of URLs by orders of magnitude! Some of them will have search demand, but most will not.
With one shoe retailer, we spotted that faceted category pages made up almost 90% of the site, but less than 1% of them received organic visits: the facets that did perform were unusual sizes (either very small or very large), extra-wide or extra-narrow shoes, and shoes with arch support. The rest were just wasting crawl budget!
Example: Out of 427k facets, only 403 were getting visits:
One marketplace took strict measures, cut its URL count in half by removing facets, and saw crawl activity on its strategic pages increase 19x in 6 weeks.
Combining your crawl, logs, visits, and keywords data will show you exactly which facets you should keep, and which ones should be removed, blocked, or set to "noindex."
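As a rough illustration of that kind of join, here is a minimal Python sketch that flags facet URLs bots crawl but searchers never visit. The crawl.csv and visits.csv files, their column names, and the "facets" segment label are hypothetical placeholders for whatever crawl, log, and analytics exports you actually have:

import csv

def zero_visit_facets(crawl_csv, visits_csv, facet_segment="facets"):
    # URLs that received at least one organic visit
    with open(visits_csv, newline="") as f:
        visited = {row["url"] for row in csv.DictReader(f) if int(row["organic_visits"]) > 0}
    # Facet URLs that bots crawl but searchers never reach
    with open(crawl_csv, newline="") as f:
        return [
            row["url"]
            for row in csv.DictReader(f)
            if row["segment"] == facet_segment
            and int(row["googlebot_hits"]) > 0
            and row["url"] not in visited
        ]

candidates = zero_visit_facets("crawl.csv", "visits.csv")
print(f"{len(candidates)} crawled facet URLs with zero organic visits")

The output is a shortlist to review against keyword demand before you remove, block, or noindex anything.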
4. Search pages
Similar to the faceted navigation issue, exposing all of your internal search result pages to bots can be problematic. Quite often, those search URLs don't drive any visits either.
To tackle this, you can use insights from Botify's SiteCrawler, RealKeywords, and EngagementAnalytics as well as your internal search tool to see what terms you are likely to rank for and whether you need to have those specific search pages open to bots.
Instead of opening search URLs, consider creating landing pages for your most popular or targeted keyword terms to capture top-funnel traffic.
Then, focus on optimizing your product pages for long-tail terms.
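If you want a rough starting list of those landing pages, here is a minimal sketch that counts internal search terms from a hypothetical site_search.csv export and surfaces the ones popular enough to deserve a dedicated page. The file name, the "query" column, the threshold, and the existing-slug set are all illustrative assumptions:

import csv
from collections import Counter

EXISTING_LANDING_SLUGS = {"womens-boots", "running-shoes", "sandals"}  # illustrative

def landing_page_candidates(search_csv, min_searches=100):
    counts = Counter()
    with open(search_csv, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["query"].strip().lower()] += 1
    for query, n in counts.most_common():
        slug = query.replace(" ", "-")  # naive slug, just for the comparison
        if n >= min_searches and slug not in EXISTING_LANDING_SLUGS:
            yield query, n

for query, n in landing_page_candidates("site_search.csv"):
    print(f"{n:>6} searches  ->  consider a landing page for '{query}'")

Cross-check the shortlist against RealKeywords data before building anything: internal demand doesn't always match external search demand.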
5. Pagination: many websites get it wrong
This is another issue we see a lot, because pagination is very easy to get wrong.
A couple of rules of thumb for pagination:
- If a page in a sequence shows unique content, it should be canonical to itself (not to page 1). Otherwise, bots get mixed signals - different content, same canonical - which can throw off your canonical/duplicate content strategy. A quick way to verify this is sketched after this list.
- Check the correlation between page depth and crawl rates: to avoid deep pages, consider only allowing bots to go N levels deep
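Here is a minimal sketch of the first check: fetch a few paginated URLs and verify that each one declares itself as canonical. The store.com?page=N URLs are hypothetical, the regex assumes rel appears before href in the link tag, and the comparison is a strict string match - adjust all three for your own markup:

import re
import urllib.request

# Naive canonical extraction; assumes rel comes before href in the <link> tag.
CANONICAL = re.compile(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I)

def check_self_canonical(url):
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    match = CANONICAL.search(html)
    canonical = match.group(1) if match else None
    return canonical == url, canonical

for page in range(2, 6):
    url = f"https://store.com/shoes?page={page}"  # hypothetical paginated listing
    ok, canonical = check_self_canonical(url)
    if not ok:
        print(f"{url} -> canonical is {canonical!r} (expected self-referencing)")

At scale, the same check is easier to run against a crawl export than against live fetches, but the logic is identical.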
6. Tag pages: beware of thin content
Tags are a useful feature for site visitors, who can quickly navigate to related content, and tag pages can also serve as hubs of thematic content for publishers. However, sites that allow user-generated or user-submitted posts with tags often see tags overused, resulting in too many tag pages with thin or duplicate content.
That, again, drains crawl budget. To optimize your tag pages, use segmentation in SiteCrawler and the Content Quality feature to detect thin or redundant content.
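As a quick first pass outside of any tool, here is a minimal sketch that counts how many posts each tag has, using a hypothetical posts.csv export with a pipe-separated "tags" column, and flags tags that fall below a threshold; the file, column, and threshold are all assumptions to adapt to your CMS:

import csv
from collections import Counter

def thin_tags(posts_csv, min_posts=3):
    counts = Counter()
    with open(posts_csv, newline="") as f:
        for row in csv.DictReader(f):
            for tag in filter(None, row["tags"].split("|")):  # e.g. "recipes|vegan|dinner"
                counts[tag.strip().lower()] += 1
    return sorted(tag for tag, n in counts.items() if n < min_posts)

for tag in thin_tags("posts.csv"):
    print(f"/tag/{tag} has fewer than 3 posts -- candidate for noindex or removal")

Tags that never accumulate enough content are the ones to noindex, consolidate, or stop linking to.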
7. Page load time
This may or may not be news to you, but page speed is an important factor in managing crawl budget.
Google's crawl rate is one of the two components (the other is crawl demand) that determine your crawl budget. So, the faster your site, the more pages bots can crawl in the allocated time frame.
Even though this idea is easy to understand, page speed is a hard problem to fix. With lots of content, interactive elements, and third-party plugins (like related content and reviews), sites become heavy and slow. Add scaling issues to the mix, and you have a real puzzle: how do I allocate dev resources, manage business goals, and improve user experience - all while keeping load speed in mind?
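Before diving into fixes, it's worth knowing which templates are slow in the first place. The sketch below does a crude timing pass over a few representative pages; the store.com URLs are placeholders, and a couple of ad-hoc requests are no substitute for proper lab and field measurements, but it's a quick way to spot an obvious outlier:

import time
import urllib.request

# Crude timing check on a few representative templates (hypothetical URLs).
TEMPLATES = {
    "home": "https://store.com/",
    "category": "https://store.com/shoes",
    "product": "https://store.com/shoes/classic-leather-boot",
}

for name, url in TEMPLATES.items():
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    elapsed = time.perf_counter() - start
    print(f"{name:<10} {elapsed * 1000:7.0f} ms  {len(body) / 1024:8.1f} KB")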
To address all of this, Botify released a product suite called Botify Activation. SpeedWorkers, part of Botify Activation, is designed to optimize for speed and render important content for bots. SpeedWorkers:
- Makes sites 7-10x faster right out of the box
- Provides a click-of-a-button option to remove tracking parameters, minimizing the URL parameter issues
- Pre-renders content that typically relies on JavaScript, improving content quality and rendering speed
- Can perform quality checks for thin content, for example, count the number of articles on tag pages and raise a warning
Faster pages for bots mean more of your content is crawled, indexed, and driving value for your business.
To summarize: we know that managing crawl budget is hard. Search bots have their own agenda: to crawl as many pages as possible and to get quality content to rank. Meanwhile, we should be guiding bots to the pages that matter most to our business goals. Managing crawl budget gives you back the power to do that.
I hope this article gave you ideas and actions to add to your SEO roadmap, whether that's performing a technical cleanup, doing a content review, or optimizing page speed to gain back crawl budget.
Which of these 7 strategies would be most valuable for your website right now?