It is well known, and almost universally accepted, that Google will reward your content for something called “Readability”. As a result, many off-the-shelf content marketing tools (think Grammarly, Hemingway, readable.com, etc.) have attempted to help people publish content with optimal Readability scores. The thinking is that content published according to the recommendations of these tools will be more likely to rank on Google Search.
But what EXACTLY is Readability and how has it traditionally been measured? This case study takes a deep dive into the data science around Readability scores. We recently examined the top 20 webpages for 1,000 popular keywords to find out if Readability Score influences Google Search rank.
We used Originality.AI's Text Readability Analyzer to score content readability according to 10 popular readability systems used in the most common Readability tools. Our study revealed some interesting findings that we are eager to share with you!
A Sneak Peek Of Our Key Findings
The highest-ranking pages on Google (those in the top 20) have similar readability.
Four readability systems (FORCAST, Gunning-Fog, Flesch Reading Ease, and Dale-Chall) offer more accurate predictions of top-ranked content and produce normally distributed data with narrow standard deviations.
What does that mean? For optimal results aim for these readability scores using these Readability systems:
Avoid using Readability systems such as SMOG, Coleman-Liau, Flesch-Kincaid Grade, and Automated Readability.
Why? Because they lump college-level and higher scores into grade 12, which creates highly skewed data!
Although no significant correlation of readability to Google rank was found, it is clear that top ranked pages have tightly clustered readability scores.
Most online text content is tailored to people with a high school education level.
Quantifying (Measuring) Readability
A readability score is simply a number assigned to a piece of content that reflects how difficult it is to understand. Various systems have been developed for this purpose, and the ones we used for this study include:
Each of these systems uses an empirical approach to calculate a score, incorporating one or more of these metrics:
Word complexity (syllables per word)
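To make these metrics concrete, here is a minimal sketch of how two of the systems combine sentence length and syllables per word. The constants are the commonly published Flesch Reading Ease and Flesch-Kincaid Grade formulas; the syllable counter is a rough vowel-group heuristic, and the helper names are illustrative assumptions rather than anything taken from Originality.AI's analyzer.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Higher score means easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Approximate US school grade; lower means easier text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Calling `flesch_reading_ease(*text_stats(some_text))` scores a passage. Production tools typically use dictionary-based syllable counts, so results from this heuristic will differ somewhat.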
In general, a readability score can be understood as the number of years of education required to comprehend a piece of written content. However, this does not apply to the Dale-Chall and Flesch Reading Ease methods. According to Wikipedia, scores from these two systems can be converted to grade equivalents, as demonstrated in Table 1 and Table 2.
The first four readability systems employ more sophisticated approaches than the latter four, which make no exceptions for common “complex” words. A complex word typically consists of three or more syllables (except in FORCAST, which also counts two-syllable words).
Why is this important? Counting common words as complex words can adversely affect score accuracy, so making exceptions for them is crucial. Words that should be excluded from the category of complex words include:
Words ending in common suffixes (such as -es, -ed, or -ing)
Furthermore, the constants used in the formulas of the latter four systems result in scores being effectively capped at 12, even though they should theoretically range up to 20. These two factors fundamentally influence the performance of the two groups, which is why we have chosen to categorize them separately.
Origin of the Dataset (Genesis)
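The complex-word exclusion described above can be sketched with the Gunning Fog formula, 0.4 × (words per sentence + 100 × complex words / words). This is one possible interpretation, not the analyzer's actual implementation: the suffix is stripped before syllables are counted, the syllable counter is a crude vowel-group heuristic, and all function names are hypothetical.

```python
import re

# Suffixes named in the article; stripping them before counting is an assumption.
COMMON_SUFFIXES = ("es", "ed", "ing")


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def is_complex(word: str) -> bool:
    """A word is complex if its stem (minus a common suffix) has 3+ syllables."""
    stem = word.lower()
    for suffix in COMMON_SUFFIXES:
        if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
            stem = stem[: -len(suffix)]
            break
    return count_syllables(stem) >= 3


def gunning_fog(text: str) -> float:
    """Gunning Fog index from sentence length and share of complex words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = sum(is_complex(w) for w in words)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))
```

With this rule, a word such as "expanding" is not counted as complex because its stem "expand" has only two syllables, while "interesting" still is.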
Still with us? Let’s dig into the origin of our dataset.
Our dataset comprises 13,582 entries categorized by keyword, Google rank, URL, and scores calculated using the eight readability methods mentioned above. We collected keyword, rank, and URL data using popular organic research tools, and analyzed the content from each website with Originality.AI's Text Readability Analyzer. We reduced our initial 20,000 entries to 13,582 because some website content was too brief for accurate evaluation.
Digging Into the Data
Interpreting our data posed several challenges, such as the varying scales of different readability metrics, which range from 0 to 10, 20, or even 100.
We encountered a mix of normally distributed data and highly skewed data. Furthermore, while a high score for Flesch Reading Ease is preferred, lower values indicate easier comprehension for all other methods.
In this study, we divided the eight readability systems into two groups of four. Figures 1 and 2 demonstrate the rationale behind this division: one group exhibits normal distributions, while the other group's distributions are highly skewed.
The FORCAST, Gunning-Fog, Flesch Reading Ease, and Dale-Chall methods exhibit normal distributions, low skewness, and narrow standard deviations. Although these readability methods provide high-quality data, we found no significant correlation to Google rank (as seen in Figure 3). Our attempts to divide the data at the mean to determine if a correlation towards the mean exists yielded inconclusive results.
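For readers who want to run a similar check on their own data, here is a minimal sketch of a correlation test between readability score and Google rank. The study does not state which statistic was used, so this assumes Spearman's rank correlation, which suits ordinal data such as search positions. The function names and the sample values are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    google_rank = [1, 2, 3, 4, 5, 6]
    fog_score = [11.8, 12.4, 12.1, 11.9, 12.3, 12.0]
    print(spearman(google_rank, fog_score))
```

A coefficient near zero would match the "no significant correlation" result reported here, while values near +1 or -1 would indicate that readability rises or falls steadily with rank.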
What does that mean? Even though inconclusive, this data was noteworthy in our analysis and forms the foundation for our most crucial finding: that the highest-ranking pages on Google share similar readability levels.
The other four readability methods (SMOG, Coleman-Liau, Flesch-Kincaid Grade, and Automated Readability), depicted in Figure 2, exhibit distinctly different characteristics.
Each has a highly skewed data distribution, and their standard deviations are even narrower than the first group's. While these data also revealed no significant correlation to Google rank, they strongly suggest that the majority of the content we analyzed corresponds to a 12th-grade reading level.
As mentioned earlier, there appears to be a mathematical artifact affecting the maximum scale of these four methods.
We believe that the way their formulas are constructed effectively caps their maximum range, grouping difficult-to-read material into the 12th-grade level, which prevents these methods from generating normally distributed data.
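The effect of such a cap can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of our dataset: it draws synthetic grade-level scores from a normal distribution (the mean of 12 and spread of 2.5 are arbitrary choices), applies a hypothetical ceiling of 12, and compares skewness and standard deviation before and after.

```python
import random
from statistics import mean, pstdev


def skewness(values: list[float]) -> float:
    """Population skewness: mean of cubed z-scores (0 for symmetric data)."""
    m = mean(values)
    sd = pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def simulate(cap: float = 12.0, n: int = 5000, seed: int = 0):
    """Return synthetic scores and the same scores with a ceiling applied."""
    rng = random.Random(seed)
    raw = [rng.gauss(12.0, 2.5) for _ in range(n)]  # synthetic, not study data
    capped = [min(v, cap) for v in raw]
    return raw, capped


if __name__ == "__main__":
    raw, capped = simulate()
    print("raw:    skew", skewness(raw), "stdev", pstdev(raw))
    print("capped: skew", skewness(capped), "stdev", pstdev(capped))
```

In this simulation the capped scores pile up at the ceiling, which produces a strongly negative skew and a narrower standard deviation than the uncapped scores, the same pattern described for the second group of methods.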
The “Moneyball” Approach
When we couldn't find any significant correlations in our data, we decided to adopt a "Moneyball" strategy. Instead of attempting to identify the reasons behind the results, we focused on analyzing the patterns we observed. In the end, it is more important to recognize the characteristics of an MVP than it is to determine how each contributes to their MVP status. After all, if you can spot the winner, does it really matter why they win?
To enable accurate comparisons, we normalized each readability method to its grade-level equivalent, as demonstrated in Figure 3. Except for the Dale-Chall method, the scores from the other methods concur that content should be aimed at a 12th-grade or college reading comprehension level. Of course, it is essential to tailor your content to your target audience, but our data suggests that this level is the optimal target unless you are focusing on a specific demographic.
Suggested readability scores from various sources and tools consistently agree on easier readability levels than this study has found.
We advise aiming for the scores supported by the hard data presented in this study (a grade 12 reading level) rather than adhering to the conventional wisdom suggested by popular tools.
Our findings deviate considerably from the recommendations of numerous readability methods that have been adopted by some of the most popular Readability tools.
Although it is commonly accepted that content should target readers at a 10th-grade education level, our data consistently reveals that the mean readability grade is 12 or higher in nearly every case.
How can these well-established and widely accepted readability metrics be so far off in determining the ideal scores for online content when it comes to ranking?
One possible explanation is that these readability systems, originally developed between the 1940s and 1970s, are outdated and may no longer accurately represent current literacy levels.
Alternatively, it could be that the internet audience is not as diverse as the global population and may self-select readers with above-average reading abilities.
Walk Away With These Key Takeaways
Top 20 ranking pages on Google have similar readability.
Use these readability systems and score targets for optimal results:
Avoid SMOG, Coleman-Liau, Flesch-Kincaid Grade, and Automated Readability
Top ranked pages have tightly clustered readability scores.
The majority of online text content caters to people with a grade 12 level of education.
Releasing Originality.AI’s Readability Tool
Originality.AI’s Readability Feature was developed to equip both writers and publishers with a modern, up-to-date tool that helps them create and publish content that scores optimally against the readability criteria supported by this data analysis study.
With its new technology, the Originality.AI tool is able to analyze a piece of content and provide a content score as well as suggested guidelines regarding its true ability and likelihood of ranking on Google Search.
Originality.AI’s Readability Tool Interface
Highlights sections and makes recommendations to improve Readability
Scans for AI, Plagiarism and Readability with one click
Allows user to navigate through each feature to access results and recommendations
Originality.AI’s Readability Recommendations by Readability System
A key feature of the Readability Tool is its ability to make recommendations based on various readability systems:
The Gunning Fog Index is a readability metric that accounts for sentence length and the number of complex words.
The Flesch Reading Ease formula is designed to assess the readability of a text by examining the average sentence length and syllables per word. Higher scores indicate easier-to-read text.
The FORCAST grade level is designed for technical documents. The formula measures text readability based on the frequency of single-syllable words.
The Dale-Chall Readability Grade is a test that measures readability using words familiar to a fourth grader. The lower the score, the more readable the text.
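As an illustration of how simple one of these systems is, here is a minimal sketch of the commonly published FORCAST formula: grade level = 20 - N / 10, where N is the number of single-syllable words in a 150-word sample. The syllable counter is a rough heuristic and the function names are hypothetical, so this is not the tool's actual implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def forcast_grade(text: str, sample_size: int = 150) -> float:
    """FORCAST grade level computed on the first `sample_size` words of the text."""
    words = re.findall(r"[A-Za-z']+", text)[:sample_size]
    if len(words) < sample_size:
        raise ValueError(f"FORCAST needs at least {sample_size} words")
    single_syllable = sum(1 for w in words if count_syllables(w) == 1)
    return 20 - single_syllable / 10
```

Because the formula ignores sentence length entirely, it can be applied to forms, lists, and other technical material that is not written in full sentences.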
We hope that this feature will help inform your publishing decisions around the readability of your content and that you will find the recommendations for each Readability system useful. This feature is now available on the Core APP and the API and is complimentary for those who are scanning content for plagiarism.
Founder / CEO of Originality.AI
I have been involved in the SEO and Content Marketing world for over a decade. My career started with a portfolio of content sites. More recently I sold 2 content marketing agencies, and I am the Co-Founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone's content strategy. However, I believe you as the publisher should be the one deciding when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!