Some empty pages may be lurking on your website, unnoticed, making robots' job harder. Being empty, they don't generate any organic visits. The main issue here is crawl waste: while Googlebot is exploring these pages, it's not crawling your actual content. That's why we should track these pages and prevent them from being crawled.
Beware of systematic, crawlable empty pages
A few empty pages don't seem like a red alert. But some empty pages are generated systematically by content pages, and then the problem reaches very serious proportions.

This is potentially critical, for instance, for websites with large amounts of daily new content, such as forums. Google could be spending a significant share of its daily crawl budget on new empty pages and failing to discover some of the new content (while finding that half of the new pages are not interesting, which is not an incentive to come back and get more as fast as possible). Perhaps the URL patterns are clear enough to give Google a hint, but perhaps they aren't, and Google may not go out of its way to figure it out. We shouldn't take that chance.

The bulk of empty pages are usually created by user action links, when the action is managed through its own URL in an <a> tag (as opposed to JavaScript). These account for 2 of the top 3 causes listed below.

Top causes of empty pages

1) Links meant for registered users

Typically, any link that allows a user to act on a particular piece of content and requires them to be logged in (a quick sketch follows the list below), for instance:
- Write a review about this product
- Reply to this post / comment
- Report abuse for this post / comment
- Manage this ad (classifieds websites)
- Contact us about this
- Email this (with email form hosted on the website)
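Here is a minimal sketch of the difference, with made-up URLs and markup: when the action is exposed as a plain link, robots can follow it and land on a page that only contains a login prompt; when it is triggered by JavaScript, no crawlable URL is created.

```html
<!-- Hypothetical "write a review" action exposed as a plain link:
     Googlebot follows it and gets an empty "please log in" page -->
<a href="/product/1234/review/new">Write a review</a>

<!-- The same action triggered by JavaScript (function name is made up):
     no dedicated URL, so no empty page for robots to crawl -->
<button type="button" onclick="openReviewForm(1234)">Write a review</button>
```

The second version keeps the feature available to users while leaving no extra URL for robots to waste crawl on.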
In Botify Analytics' URL Explorer, useful indicators for spotting these empty pages and their causes include:
- Number of pages with the same Title
- Number of pages with the same H1
- Page code size
- A page that links to the page we are looking at
In most cases, the title and/or H1 will be identical on all empty pages with the same cause. Page size can also give a hint, if there are a number of pages with a similar size (the size alone is not enough, as there may be a heavy template).

Here are the corresponding filters and displayed fields settings (remove unwanted settings by clicking on the cross; make your selections from the drop-down lists; for displayed fields, start typing part of the field name in the "fields to display" area to narrow down the selection).

In this example (ordered by highest number of URLs per H1 - click twice on the H1 column header to sort), there is a meta-redirect on several thousand pages.

You can easily see more information about an empty page, or about the page it is linked from, simply by clicking on the URL. To go directly to the page on your website, click on the blue arrow on the right of the URL.

Let's say there are a number of pages with the same H1, and you would like to see only one example URL for each distinct H1. Add "First duplicate H1 found" to the filter rules and click on "Apply".

Among these pages, we can also find out which ones may already be identified as useless for SEO and therefore carry a noindex meta tag - but still create Google crawl waste, because the search engine has to request the page to find out that it should not be indexed. Add a "has robots anchors as 'noindex'" filter set to "true" and click on "Apply".
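For reference, here is what these two kinds of tags look like in a page's source code (the redirect target below is a hypothetical URL): a meta-redirect sends visitors and robots to another URL after the page has been requested, and a meta robots noindex keeps the page out of the index but still costs Googlebot a request to discover it.

```html
<!-- Meta-redirect: the page still has to be fetched, then immediately redirects
     (the target URL here is hypothetical) -->
<meta http-equiv="refresh" content="0; url=https://www.example.com/some-other-page">

<!-- Meta robots noindex: the page is kept out of the index,
     but Googlebot still has to request it to see this tag -->
<meta name="robots" content="noindex">
```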
See what happens when navigating the website through a robot's eyes
As the causes are very specific, we can also approach the problem the other way around, and check what a robot gets while navigating the website. If we find empty pages, we can then search for URLs with the same pattern in Botify Analytics' URL Explorer.

Go to your website after disabling all JavaScript, cookies and meta-redirects in your web browser using a developer add-on (for instance, this Firefox extension). Click on all the user action links you can find to see if you get a new page (a new URL).

This will also allow you to find empty pages that are disallowed to robots (a disallow directive in the website's robots.txt file, or a nofollow on the link to the page). This is not the worst-case scenario, but it's not ideal either. Although Google won't waste any crawl on these pages, they still cause significant link juice waste if the links to them are coded with <a> tags: these links are assigned a portion of the PageRank of the page they are on, but don't transmit it to any page (it falls into a PageRank "black hole"). We won't be able to query these pages in a standard Botify Analytics report, as the Botify robot follows the same rules as Googlebot. But we will still get information about links to disallowed / nofollow pages in the "Outlinks" section of the report (from the perspective of the crawlable page they are linked from).

And if you want to know more about empty pages that are currently disallowed in the robots.txt file, you can still run another Botify Analytics crawl using the Virtual Robots.txt functionality: paste your robots.txt file's content and remove the line that disallows these pages before starting the crawl, as sketched below.
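For instance, assuming the empty pages all live under a hypothetical /report-abuse/ path, the content pasted into the Virtual Robots.txt would simply omit that one Disallow line:

```
# Original robots.txt (paths are hypothetical)
User-agent: *
Disallow: /admin/
Disallow: /report-abuse/

# Virtual Robots.txt used for the Botify Analytics crawl:
# same content, minus the line blocking the empty pages
User-agent: *
Disallow: /admin/
```

What's your experience? Do you see other causes of empty pages spoiling your SEO? We'd love to hear about it, don't hesitate to drop a comment on this post!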