Watch out for sneaky 404s!
Removing links to 404 "Not Found" pages from a website is a basic, necessary and recurring SEO task. Most of these errors are straightforward to identify and correct: list the pages that contain a link to a 404 (many SEO tools can do this), then replace or remove the link. But this may not be enough, as some 404s are sneakier than others: those reached through redirections will remain. These dead ends can be identified with Botify Analytics.
Identifying Redirects to 404s
Botify Analytics allows you to identify redirects to 404s easily, using the URL Explorer.
In the report's HTTP codes tab, click on the client errors block:
This displays a sample list of 404 (Not Found) and other client errors (such as 410 - Gone, 401 - Unauthorized, 403 - Forbidden), along with pages that link to them (their referrers).
Click on "Explore all URLs" to see the full list and explore the results.
Let's select only URLs which are the target of a redirection (add a filter requiring that the "redirected from" flag exists), and display the redirection source URLs (start typing "redirected from" in the fields to display and select it from the drop-down list; remove some of the preselected display fields to get a leaner results table).
If there are different types of client errors, it's also a good idea to filter on "HTTP 404" first, then repeat with the other error types, as they will most probably have different causes.
Click on "Apply".
In this example, we can see that some 404s are only found through redirects: they don't have any internal incoming links.
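The same selection can be reproduced offline on an exported list of client errors. The sketch below is only an illustration: the column names (`url`, `http_code`, `redirected_from`, `internal_inlinks`) and the sample rows are assumptions, not the actual export headers, so adjust them to match your file.

```python
import csv
import io

# Hypothetical export; column names and rows are assumptions for illustration.
SAMPLE_EXPORT = """url,http_code,redirected_from,internal_inlinks
https://example.com/old-page,404,https://example.com/promo,0
https://example.com/missing,404,,3
https://example.com/private,403,https://example.com/account,1
"""

def redirect_only_404s(csv_text):
    """Return 404 URLs that are redirect targets and have no internal inlinks."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["url"]
        for row in rows
        if row["http_code"] == "404"
        and row["redirected_from"]              # a "redirected from" value exists
        and int(row["internal_inlinks"]) == 0   # only reachable through a redirect
    ]

print(redirect_only_404s(SAMPLE_EXPORT))
```

Only the first sample row qualifies: the second 404 has regular inlinks and no redirect source, and the third row is a 403.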
Redirect Chains
In some cases, it can make sense to also check if there are redirect chains to 404s. This typically happens when:
- The server makes several unsuccessful attempts at correcting the URL (adding or removing a trailing slash, changing the subdomain...) before ending up with a 404
- There are old redirected links on a website that migrated to HTTPS, and HTTP to HTTPS redirects are added to the mix
We'll need to first identify all redirect chains, and then match the results to the 404s above.
Going back to our example, if we change the selected HTTP status codes to 3XX, we'll get all redirect chains (URLs that return a redirection, and are themselves the target of a redirection). Let's also add "redirects to" to the fields to display and click on "Apply".
As all information in the URL Explorer is URL-centric (related to the URL in the first column), the first line in this example means:
[URL from "redirected from" column] -- 302 redirect --> [URL from "URL" column] -- 301 redirect --> [URL from "redirects to" column].
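To make this reading concrete, here is a small sketch that renders one row as a chain. The dictionary keys (`redirected_from`, `redirected_from_status`, `url`, `status`, `redirects_to`) and the sample URLs are hypothetical names chosen for the illustration, not actual field names.

```python
def describe_chain(row):
    """Render one URL-centric row as a two-hop redirect chain."""
    return (
        f"{row['redirected_from']} -- {row['redirected_from_status']} redirect --> "
        f"{row['url']} -- {row['status']} redirect --> "
        f"{row['redirects_to']}"
    )

# Hypothetical row: the middle URL is the one the row is "about".
row = {
    "redirected_from": "https://example.com/a",
    "redirected_from_status": 302,
    "url": "https://example.com/b",
    "status": 301,
    "redirects_to": "https://example.com/c",
}

print(describe_chain(row))
```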
These results cover all redirect chains, whatever the HTTP status code of the redirection target ("redirects to" column).
If, as in our example, there are many redirect chains, we can narrow the results down by adding filters to:
- Select only one type of redirect (301 vs 302)
- Exclude obvious types we know return HTTP 200 (click on any URL from the "redirects to" column to find out from the URL's detailed information)
In our example, it makes sense to:
- Select 301 only (it appears that most 302 are related to user login pages)
- Exclude URLs related to RSS feeds, by filtering out URLs containing "/rss/" (a quick check shows that their redirection targets return HTTP 200)
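Applied to exported rows, these two filters could look like the sketch below. The field names (`url`, `http_code`) and the sample data are assumptions made for the illustration.

```python
def narrow_chains(rows):
    """Keep 301 redirects only and drop RSS-related URLs."""
    return [
        row for row in rows
        if row["http_code"] == 301 and "/rss/" not in row["url"]
    ]

# Hypothetical redirect-chain rows.
chain_rows = [
    {"url": "https://example.com/login", "http_code": 302},
    {"url": "https://example.com/rss/news", "http_code": 301},
    {"url": "https://example.com/old-category", "http_code": 301},
]

print(narrow_chains(chain_rows))
```

Here only the last row is kept: the first is a 302 and the second contains "/rss/".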
If there are only a few results, comparing them with the 404s from redirects will be enough to understand the situation. If there are many, we'll have to export the redirect chains, export the 404s from redirects, and consolidate the data with spreadsheet software to get the redirect chains ending with 404s.
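As an alternative to a spreadsheet, the consolidation can be sketched in a few lines of Python. This assumes the two exports have already been loaded, with hypothetical field names (`redirected_from`, `url`, `redirects_to`) for the redirect chains and a plain list of URLs for the 404s from redirects.

```python
def chains_ending_in_404(chain_rows, not_found_urls):
    """Keep redirect chains whose final target is a known 404."""
    not_found = set(not_found_urls)  # set membership keeps the lookup fast
    return [row for row in chain_rows if row["redirects_to"] in not_found]

# Hypothetical exports.
chains = [
    {"redirected_from": "http://example.com/a", "url": "https://example.com/a",
     "redirects_to": "https://example.com/gone"},
    {"redirected_from": "http://example.com/b", "url": "https://example.com/b",
     "redirects_to": "https://example.com/ok"},
]
not_found_from_redirects = ["https://example.com/gone"]

for row in chains_ending_in_404(chains, not_found_from_redirects):
    print(row["redirected_from"], "->", row["url"], "->", row["redirects_to"])
```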
Wondering About Longer Redirection Chains?
What if the redirect chain is longer than 2 redirects? This is less likely, but it can happen. We can detect it because longer redirect chains will appear split across several lines.
For instance, A → B → C → D will appear as:
- B is redirected from A and redirects to C
- C is redirected from B and redirects to D
So to check for these, all we need is an additional consolidation step: build full redirect chains first, then consolidate with 404s from redirects.
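This extra consolidation step can be sketched as follows. It assumes each exported row has been reduced to a hypothetical tuple of (redirected from, URL, redirects to), and that each URL redirects to a single target.

```python
def build_chains(rows):
    """Stitch (redirected_from, url, redirects_to) rows into full chains."""
    next_hop = {}
    for previous, url, target in rows:
        next_hop[previous] = url
        next_hop[url] = target

    # A chain starts at a URL that is never the target of another redirect.
    # Pure redirect loops have no such start and are not reported here.
    targets = set(next_hop.values())
    starts = [url for url in next_hop if url not in targets]

    chains = []
    for start in starts:
        chain, seen = [start], {start}
        while chain[-1] in next_hop:
            following = next_hop[chain[-1]]
            if following in seen:  # guard against a loop further down the chain
                break
            chain.append(following)
            seen.add(following)
        chains.append(chain)
    return chains

def chains_to_404(chains, not_found_urls):
    """Keep only the full chains whose last URL is a known 404."""
    not_found = set(not_found_urls)
    return [chain for chain in chains if chain[-1] in not_found]

rows = [("A", "B", "C"), ("B", "C", "D")]
full_chains = build_chains(rows)
print(full_chains)
print(chains_to_404(full_chains, ["D"]))
```

With the two rows from the example above, the sketch rebuilds the single chain A, B, C, D, which is kept when D is in the list of 404s from redirects.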