"What does it take to rank well in Google?"
That question has been the catalyst for multiple ranking factor studies over the years, but is that type of study still relevant?
Many people find ranking factor studies a helpful guide to understanding what Google cares about; others think they're misleading.
At Botify, we're of the opinion that ranking factor studies can be a good thing, but there's a better way. To understand why the studies need improvement, let's take a step back to define "ranking factors" and how they've been studied over the years.
What are ranking factors?
Ranking factors are signals that Google considers when determining how relevant a page is for a given query. The page's relevance to the query will then determine in what position it will rank in Google's search engine results page (SERP).
Ranking factors vs. SEO best practices
Ranking factors are not the same as SEO best practices. Best practices include things like keeping noindex tags off pages you want people to find in search results, or using your robots.txt file to block Google from crawling pages you don't want in the index. The difference is that ranking factors help Google determine a page's relevance to a query, while best practices help Google find and understand the pages on your site.
Google-confirmed ranking factors
Google has confirmed many ranking factors, such as page speed and HTTPS. They've also clarified factors that aren't used in web ranking, such as the keywords meta tag.
How many ranking factors does Google have?
Back in 2006, Google was quoted as saying that they have "over 200 ranking factors." Matt Cutts, who at the time was Google's head of web spam, later clarified that each factor has multiple variations.
We now also know that Google's algorithm learns, thanks to machine learning technologies like RankBrain and BERT, so it's changing all the time as it figures out how to provide more relevant results to our queries.
What are ranking factor studies?
Ranking factor studies are an attempt to reverse engineer Google's algorithm. Typically, they fall into two categories:
- Ranking factor correlation studies: When you hear people refer to ranking factor studies, they're most often referring to a study that uses Spearman's rank correlation coefficient to measure the correlation between certain on-page factors and rank position (we sketch what this looks like in code just after this list).
- Ranking factor expert surveys: Some ranking factor lists are created by surveying SEO industry experts to get their opinion of which factors most heavily contribute to ranking well in Google's search results.
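To make the correlation-study approach concrete, here's a minimal sketch using scipy's Spearman rank correlation. The numbers are made up purely for illustration -- real studies run this across thousands of SERPs, not ten results from one query.

```python
# A minimal sketch of the math behind a ranking factor correlation
# study, using scipy's Spearman rank correlation. The data below is
# invented for illustration only.
from scipy.stats import spearmanr

# Hypothetical sample: rank position (1 = top) and word count
# for the top 10 results of a single query.
rank_positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
word_counts = [2100, 1850, 1900, 1400, 1600, 900, 1100, 800, 950, 700]

rho, p_value = spearmanr(rank_positions, word_counts)
print(f"Spearman's rho: {rho:.2f} (p = {p_value:.3f})")
# A strongly negative rho means longer content tends to sit in
# numerically lower (i.e. better) positions -- a correlation, not a cause.
```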
The factors from these studies become SEO truisms (e.g. "long content ranks better") that are then often used as checkpoints in SEO audits. We're told we need to fix things "because they're SEO best practice" without knowing whether or not it'll actually help our performance, and to what degree.
People love correlation studies because they give us the guidelines and certainty we crave in an industry where those are few and far between. However, there are three big problems with many ranking factor studies today.
1. Third-party data
Many ranking factor studies rely on third-party data. This is because the people performing these studies typically don't directly manage or own the sites they're evaluating in their study.
In the context of ranking factor studies, third-party data is information that's collected by a source that doesn't have direct access to the site, like Alexa's page speed data.
The danger of third-party data is that it may not be representative of a site's (or group of sites') actual data.
2. Correlation vs. causation
Correlation studies cannot show us causation. In other words, just because something like direct traffic corresponds with higher rankings, it does not mean that direct traffic leads to higher rankings. It may be that the opposite is true (higher rankings lead to more direct traffic), or the correlation could be completely spurious.
When studying correlation, it's easy to fall into "chicken or egg" debates and mistakenly infer causation from correlation.
3. Selective queries/SERPs
Many ranking factor studies phrase their findings this way: "We found a positive correlation between X and ranking on the first page of Google."
The problem is that this framing lacks nuance.
SEOs know well that what it takes to rank for one query is not representative of what it takes to rank for all queries.
Let's take the two queries below, for example.
On the left, we have the query "red running shoes," which is most likely a transactional query -- most people typing this in want to buy red running shoes. The #1 ranking result is a product category page. It's just a feed. It doesn't have any "SEO content" up at the top. Google has determined that this page is a sufficient answer to the searcher's request for "red running shoes" even without a paragraph about red running shoes at the top.
Then look at the query on the right. "Is the keto diet effective" -- this is an informational query. Someone is doing research, and health/medical-related research at that. The #1 ranking result for that query is an article that's nearly 2,000 words long. Google has determined that, for this particular query, someone is seeking information on a topic that can affect the reader's health, so it makes sense that the top-ranking page -- the most relevant answer to this query -- is a fairly long article.
So, when you see ranking factor studies say that "long content ranks better," take that with a grain of salt.
How to conduct your own ranking factor studies
If ranking factor correlation studies lack the nuance and first-party data we need to make informed decisions, what's our alternative?
We recommend using your own site's data. Here's how.
Have the right tools in your toolkit
In order to determine what correlates with higher or lower rankings on your unique site, you'll want the following tools in your toolkit (after the list, we sketch how these sources can be joined together):
- Crawl Data: You're going to want to know all the data points about your site, which you can get with a crawl -- everything from the length of your non-template content, to page depth, to internal link counts, to JavaScript elements, and much more. In Botify, you get over 1,000 of these data points.
- Log File Data: You're also going to want to know which pages in your site structure (which you can find via a site crawl) search engines are crawling, and which they aren't, which is why you need a log file analysis tool.
- Keyword Data: You're also going to want keyword data, such as what keywords you're getting impressions for, how often they're being clicked, what the CTR is, what device those keywords are showing up on, etc.
- On-Site Analytics Data: And finally, to close the loop, you'll want to integrate your analytics data (typically Google or Adobe Analytics). This'll help you see which of your pages is getting organic search traffic, what the bounce rate is, what the time on page is, etc.
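Here's a rough sketch of what joining those four sources might look like, using pandas and keying everything on URL. All of the file names and column names are hypothetical -- substitute whatever your own exports contain.

```python
# A rough sketch of joining crawl, log, keyword, and analytics data
# by URL so correlations can be computed across them. File and column
# names are assumptions, not a specific tool's export format.
import pandas as pd

crawl = pd.read_csv("crawl_export.csv")          # e.g. url, depth, word_count, inlinks
logs = pd.read_csv("log_file_export.csv")        # e.g. url, googlebot_hits
keywords = pd.read_csv("search_console.csv")     # e.g. url, impressions, clicks, avg_position
analytics = pd.read_csv("analytics_export.csv")  # e.g. url, organic_visits, bounce_rate

# Left-join everything onto the crawl, which should be the most
# complete list of URLs; pages missing from the logs were never crawled.
df = (crawl.merge(logs, on="url", how="left")
           .merge(keywords, on="url", how="left")
           .merge(analytics, on="url", how="left"))

# Now any factor can be correlated against any outcome, e.g.:
print(df[["word_count", "avg_position"]].corr(method="spearman"))
```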
Once you're armed with this data, how do you use it?
We've put together four example use cases for finding your site's true ranking factors. If you're a Botify customer, you can repeat the same tests or use these as inspiration to come up with your own tests.
Example test #1: Is content freshness a ranking factor?
Conventional SEO wisdom tells us that refreshing your content is best practice because you don't want it to get stale. But because we want to operate on a "trust, but verify" policy, we'll want to check that for ourselves -- does updating content actually help it rank better?
We wanted to find out. But "fresh" is a relatively ambiguous term -- people use it to describe content that's been updated recently, is updated frequently, or was published recently -- so how do you measure it? We opted to use Botify's "% content change" metric as our proxy for content freshness. This metric shows how much your content has changed since the previous crawl, so if you rewrote half a page, it would show up as a large content change.
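If you don't have that metric handy, you can approximate the same idea yourself by diffing two crawl snapshots of a page. Here's a simple sketch using Python's difflib -- our own rough proxy, not Botify's internal implementation:

```python
# Approximate "% content change" between two crawl snapshots of the
# same page using difflib's similarity ratio. This is a rough proxy,
# not how Botify computes the metric internally.
from difflib import SequenceMatcher

def percent_content_change(old_text: str, new_text: str) -> float:
    """Return how much the text changed, from 0 to 100."""
    similarity = SequenceMatcher(None, old_text, new_text).ratio()
    return (1 - similarity) * 100

old = "Our red running shoes are light, durable, and breathable."
new = "Our red running shoes are light and durable, with new colors for spring."
print(f"{percent_content_change(old, new):.0f}% content change")
```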
Keeping in mind the SEO funnel, we know that in order to rank you first need to be crawled, so we first looked to see how % of content change influences crawl metrics. On the site in question, we did see a correlation. Content that changed the most was crawled more frequently.
But what we really want to know is, does that also translate into higher rankings?
To find out, we layered ranking data onto that log data. Turns out, pages that were changed more were also ranking for more keywords.
They also ranked in higher positions!
This is interesting, but we also need to be careful. The correlation we see here may not be causation, and there are likely many other factors at play -- such as how you change the content, which likely matters even more than how much you changed.
However, it's still enough to prompt us to keep testing this theory. Try testing it out on a small subset of pages and see if you get similar results. Try segmenting your pages to see if changing content works better on some page types than others (you can even segment your pages using existing site breadcrumbs!).
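Segmenting doesn't have to be complicated. One hypothetical starting point is to group URLs by their first path directory, which often mirrors the site's breadcrumbs:

```python
# A hypothetical sketch of segmenting URLs by their first path
# component (which often mirrors site breadcrumbs), so each segment's
# rankings can be analyzed separately.
from urllib.parse import urlparse

def segment(url: str) -> str:
    """Use the first path directory as the page's segment."""
    parts = urlparse(url).path.strip("/").split("/")
    return parts[0] if parts[0] else "home"

urls = [
    "https://example.com/",
    "https://example.com/shoes/red-running-shoes",
    "https://example.com/blog/is-the-keto-diet-effective",
]
for u in urls:
    print(segment(u), "<-", u)
```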
Speaking of page types, we also created this same report for an e-commerce website and found that the most refreshed content there actually ranked the worst.
It goes to show that you can't really be sure if an "SEO truism" is true for your site until you see the data for yourself.
Want to learn more? Check out:
- Is content freshness a ranking factor? (to read the full study)
- How to evaluate the quality of your content (to learn more about % content change)
- How to create custom metrics tables (to set up your own ranking factor report)
Example test #2: Is page depth a ranking factor?
For our second test, we wanted to see if page depth (or "click depth" as it's often called) is a ranking factor.
Conventional SEO wisdom tells us that the deeper a page is on your site, the worse it will rank. So the question we used as the basis for our test was: "What impact does a page's depth actually have on SEO performance?"
To do that, we looked at the page depth metric.
In Botify, Page Depth is the number of clicks it takes to get to a page from the home page. The home page, as the starting point, has a depth of 0. If the home page links directly to the pages Clothing, Shoes, Accessories, and Sale, those pages all have a depth of 1. The pages they link to have a depth of 2, and so on.
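Mechanically, this is just a breadth-first search over the internal link graph starting from the home page. Here's a minimal sketch with a toy link graph (the graph itself is invented for illustration):

```python
# A minimal sketch of how page depth is computed: a breadth-first
# search over the internal link graph, starting from the home page.
# The link graph below is a toy example.
from collections import deque

links = {
    "home": ["clothing", "shoes", "accessories", "sale"],
    "clothing": ["t-shirts", "jeans"],
    "shoes": ["red-running-shoes"],
    "accessories": [],
    "sale": [],
    "t-shirts": [], "jeans": [], "red-running-shoes": [],
}

depth = {"home": 0}
queue = deque(["home"])
while queue:
    page = queue.popleft()
    for linked in links.get(page, []):
        if linked not in depth:  # first discovery = shortest click path
            depth[linked] = depth[page] + 1
            queue.append(linked)

print(depth)  # home: 0, clothing/shoes/...: 1, t-shirts/...: 2
```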
We found that, on most large sites, the deeper the page, the less likely Google will crawl it, and uncrawled pages won't be added to the index and won't be able to rank.
Deep pages, even when crawled, tend to rank poorly because they have little or no internal pagerank (in other words, they aren't linked to a lot). This isn't always the case though -- some pages that are far from the home page in terms of clicks are still linked to a lot and therefore have good pagerank.
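You can estimate internal pagerank yourself from the same kind of link graph. Here's a sketch using networkx; the graph is invented, and it illustrates how a deep but heavily linked page (a hypothetical "size-guide") can still accumulate link equity:

```python
# A sketch of estimating internal PageRank from an internal link
# graph using networkx. This approximates how much link equity flows
# to a page internally, independent of its click depth.
import networkx as nx

graph = nx.DiGraph()
graph.add_edges_from([
    ("home", "clothing"), ("home", "shoes"),
    ("clothing", "red-running-shoes"),
    ("shoes", "red-running-shoes"),
    # A deep page that is heavily linked sitewide can still score well:
    ("clothing", "size-guide"), ("shoes", "size-guide"),
    ("red-running-shoes", "size-guide"),
])

for page, score in sorted(nx.pagerank(graph).items(), key=lambda x: -x[1]):
    print(f"{page}: {score:.3f}")
```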
Often though, this is what we see: the deeper the page, the worse it ranks.
And when they rank poorly, they get less organic search traffic.
Again, while this is true of a lot of websites we looked at, it may or may not be true on yours, so definitely test this for yourself! And, when you find deep pages, ask yourself "are these even strategic?" For example, some deep/poorly ranking pages may be pages you care little about. Instead, try viewing a segment of pages you know are important and evaluating how their depth correlates with rankings.
Want to learn more? Check out:
- Is page depth a ranking factor? (for the full study)
- Visit Search Engines > Top Charts (for page depth reports)
- Visit Distribution > Average Depth (for a list of your pages by depth)
Example test #3: Is content age a ranking factor?
The third test we wanted to run was finding out whether content age is a ranking factor. Conventional SEO wisdom tells us that new content performs better than old or stale content. To find out for ourselves, we used metrics like first crawl and active pages.
First crawl shows us how long Google has known about a URL, which is what we really mean when we're asking whether age is a Google ranking factor. This also helps us find a page's true age (at least in Google's eyes), as opposed to publish date, which is easily modified and isn't always a true representation of how long that URL has existed.
We're also looking at Active Pages, which is a Botify metric for whether a URL has received any organic search traffic in a given period. If it has, it's active.
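Here's a hypothetical sketch of how those two metrics come together: derive each page's age from its first-crawl date, flag pages with organic traffic as active, and bucket the active pages by age. The dates, URLs, and column names are all assumptions for illustration:

```python
# A hypothetical sketch of combining "first crawl" dates with traffic
# data to bucket active pages by age. All values and column names are
# invented -- not a Botify export format.
import pandas as pd

df = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "first_crawl": pd.to_datetime(["2019-01-10", "2020-06-01", "2020-11-20", "2018-03-05"]),
    "organic_visits": [120, 0, 45, 300],
})

today = pd.Timestamp("2021-01-01")
df["age_months"] = (today - df["first_crawl"]).dt.days // 30
df["active"] = df["organic_visits"] > 0  # Botify's "active page" idea

# Count active pages per age bucket (months since first crawl):
active = df[df["active"]]
buckets = pd.cut(active["age_months"], bins=[0, 3, 12, 24, 120])
print(active.groupby(buckets)["url"].count())
```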
So, what did we find?
Here, we're looking at the volume of active pages compared to how long ago Google first discovered the page. We did this across a few different types of pages, and you can see the variance between all four, especially classifieds pages! Note that for articles, we're talking about traffic on universal search and not Google News.
In general, a big chunk of active pages are pages that are 1-2 years old. For ad pages on classified sites, though, active pages are mostly 0, 1, or 3 months old.
So from this study, we can see that there's a lot of variance between different sites as well as different page types. That again underscores the importance of not only looking at your own site's data, but also bucketing your site into smaller, logical groups (e.g. if you have product pages as well as article pages).
Want to learn more? Check out:
- Myth buster: is organic traffic generated by new or older pages? (for the full study)
Example test #4: Is E-A-T a ranking factor?
For our fourth and final ranking factor study example, let's take a look at E-A-T and whether or not it's a ranking factor. But first, some background on E-A-T, which stands for expertise, authoritativeness, and trustworthiness.
- E-A-T comes from Google's quality rater guidelines documentation, and Google has made it clear that human quality raters don't directly influence rankings.
- However, these guidelines are a good indicator of what the algorithm is striving to reward.
- In order for something to be a ranking factor, you have to be able to measure it, so there are likely quantitative measures of more qualitative characteristics like trust.
Conventional SEO wisdom says that E-A-T is a ranking factor. So for this study, the question we asked was, "Does content that's written or reviewed by a subject matter expert rank better?"
Measuring this is a bit more manual, but we were able to do it by creating and saving a filter in Botify that we labeled "medically reviewed" (content that was reviewed and edited by a medical expert) -- we did this using custom extracts.
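As a simplified stand-in for a custom extract, you could flag pages whose HTML contains a reviewer byline. The pattern and markup below are purely illustrative -- a real site would need a rule matched to its own templates:

```python
# A simplified stand-in for a custom extract: flag pages whose HTML
# contains a "Medically reviewed by" byline. Pattern and HTML are
# illustrative; real sites need their own markup-specific rule.
import re

BYLINE = re.compile(r"Medically reviewed by\s+([^<]+)", re.IGNORECASE)

html = '<p class="byline">Medically reviewed by Dr. Jane Doe, MD</p>'
match = BYLINE.search(html)
print("medically reviewed" if match else "not reviewed",
      "-", match.group(1).strip() if match else "n/a")
```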
Below, you can see the report we created for a site in the health vertical. We have two rows, one that groups all the medically reviewed content and one that groups the pages that haven't been medically reviewed. There are far more pages that haven't been medically reviewed, and yet the smaller group of medically reviewed pages is ranking for more keywords, getting more clicks, and getting more impressions.
We definitely think this would be a good test to run in other YMYL industries, whether you're in the health, legal, or finance space, or otherwise. YMYL, which also comes from Google's quality rater guidelines, stands for "your money or your life" and describes queries that can substantially impact the reader's money, health, or livelihood. Google holds these types of pages to a much higher standard, which is why we see these types of sites hit so often when algorithm updates roll out.
Want to learn more about using custom extracts to create a similar report? Check out:
So, what's the value of ranking correlation studies?
When your job is optimizing for a complex, intelligent algorithm (which is the case for SEOs), there are always multiple factors at play, and those are changing all the time. This makes it really difficult to draw conclusions.
However, finding correlations in our own data (used in conjunction with our experience and common sense) can definitely point us in the right direction and help us test and explore things we may not have seen otherwise.
So, is there value? When you rely on your own site's data, test, and remember that correlation isn't causation, we think the answer is… yes!