But what EXACTLY is Readability and how has it traditionally been measured? This case study takes a deep dive into the data science around Readability scores. We recently examined the top 20 webpages for 1,000 popular keywords to find out if Readability Score influences Google Search rank.
We used Originality.AI's Text Readability Analyzer to score content readability according to 10 readability systems used in the most popular readability tools. Our study revealed some interesting findings that we are eager to share with you!
What does that mean? For optimal results aim for these readability scores using these Readability systems:
Flesch Kincaid Grade Level
Why? Because they lump college-level and higher scores into grade 12 which creates highly skewed data!
A readability score is simply a number assigned to a piece of content that reflects how difficult it is to understand. Various systems have been developed for this purpose, and the ones we used for this study include:
Each of these systems uses an empirical approach to calculate a score, incorporating one or more of these metrics:
In general, a readability score can be understood as the number of years of education required to comprehend a piece of written content. However, this does not apply to the Dale-Chall and Flesch Reading Ease methods. According to Wikipedia, scores from these two systems can be converted to grade equivalents, as demonstrated in Table 1 and Table 2.
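For illustration, the conversion can be sketched in Python. The bands below are the commonly cited ones for these two systems and are an assumption here, so they may not match Table 1 and Table 2 line for line. The function names are hypothetical.

```python
# Commonly cited Flesch Reading Ease bands (assumed): (lowest score in band, label).
FLESCH_BANDS = [
    (90, "5th grade"),
    (80, "6th grade"),
    (70, "7th grade"),
    (60, "8th-9th grade"),
    (50, "10th-12th grade"),
    (30, "college"),
    (10, "college graduate"),
]

# Commonly cited Dale-Chall bands (assumed): (upper bound, exclusive, label).
DALE_CHALL_BANDS = [
    (5.0, "4th grade or below"),
    (6.0, "5th-6th grade"),
    (7.0, "7th-8th grade"),
    (8.0, "9th-10th grade"),
    (9.0, "11th-12th grade"),
]


def flesch_to_grade(score: float) -> str:
    """Higher Flesch Reading Ease means easier text, so walk from the top band down."""
    for lowest, label in FLESCH_BANDS:
        if score >= lowest:
            return label
    return "professional"


def dale_chall_to_grade(score: float) -> str:
    """Lower Dale-Chall means easier text, so walk from the bottom band up."""
    for upper, label in DALE_CHALL_BANDS:
        if score < upper:
            return label
    return "college"


print(flesch_to_grade(65))       # falls in the 60-70 band
print(dale_chall_to_grade(8.5))  # falls in the 8.0-8.9 band
```

Note the opposite directions of the two scales: a higher Flesch Reading Ease score maps to a lower grade, while a higher Dale-Chall score maps to a higher grade.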
The first four readability systems employ more sophisticated approaches than the latter four, which do not take into account exceptions for common “complex” words. A complex word typically consists of three or more syllables (except for FORCAST, which treats any word of two or more syllables as complex).
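To make the idea of a "complex" word concrete, here is a minimal Python sketch. The syllable counter is a crude vowel-group heuristic that we assume for illustration, not the method any particular tool uses, and the `exclude` set is a hypothetical stand-in for a list of common words that should not count as complex.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count runs of vowels, dropping a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def complex_words(text: str, min_syllables: int = 3, exclude=frozenset()) -> list:
    """Return words with at least `min_syllables` syllables, skipping excluded ones."""
    words = re.findall(r"[A-Za-z]+", text)
    return [
        w for w in words
        if w.lower() not in exclude and count_syllables(w) >= min_syllables
    ]


sample = "Simple words beat complicated vocabulary."
print(complex_words(sample))                          # three-syllable threshold
print(complex_words(sample, min_syllables=2))         # two-syllable threshold
print(complex_words(sample, exclude={"vocabulary"}))  # with an exception list
```

Lowering `min_syllables` to 2 mimics the stricter threshold described for FORCAST, and passing an `exclude` set shows how an exception list changes the count.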
Why is this important? Counting common words as complex words can adversely affect score accuracy, so excluding them is crucial. Words that should be excluded from the category of complex words include:
Furthermore, the constants used in the formulas of these systems result in scores being effectively capped at 12, even though they should theoretically range up to 20. These two factors fundamentally influence the performance of the two groups, which is why we have chosen to categorize them separately.
Still with us? Let’s dig into the origin of our dataset.
Our dataset comprises 13,582 entries categorized by keyword, Google rank, URL, and scores calculated using the 8 different readability methods mentioned above. We collected keyword, rank, and URL data using popular organic research tools, and analyzed the content from each website with Originality.AI's Text Readability Analyzer. We reduced our initial 20,000 entries to 13,582, as some website content was too brief for accurate evaluation.
Interpreting our data posed several challenges, such as the varying scales of different readability metrics, which range from 0 to 10, 20, or even 100.
We encountered a mix of normally distributed data and highly skewed data. Furthermore, while a high score for Flesch Reading Ease is preferred, lower values indicate easier comprehension for all other methods.
In this study, we divided the eight readability systems into two groups of four. Figures 1 and 2 demonstrate the rationale behind this division: one group exhibits normal distributions, while the other group's distributions are highly skewed.
The FORCAST, Gunning-Fog, Flesch Reading Ease, and Dale-Chall methods exhibit normal distributions, low skewness, and narrow standard deviations. Although these readability methods provide high-quality data, we found no significant correlation to Google rank (as seen in Figure 3). Our attempts to divide the data at the mean to determine if a correlation towards the mean exists yielded inconclusive results.
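The article does not say which correlation statistic was used. One reasonable way to run this kind of check is a Spearman rank correlation between readability score and Google rank, sketched below in plain Python. The function names and the sample numbers are made up for illustration and are not taken from our dataset.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example: Google rank versus a grade-level score for ten pages.
google_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
grade_score = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0]
print(round(spearman(google_rank, grade_score), 3))
```

A coefficient near 0 means the scores carry little information about rank, while values near 1 or -1 indicate a strong monotonic relationship.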
What does that mean? Even though inconclusive, this data was noteworthy in our analysis and forms the foundation for our most crucial finding: that the highest-ranking pages on Google share similar readability levels.
The following four readability methods—SMOG, Coleman-Liau, Flesch-Kincaid Grade, and Automated Readability—depicted in Figure 2, exhibit distinctly different characteristics.
Each has a highly skewed data distribution, and their standard deviations are even narrower than the first group's. While these data also revealed no significant correlation to Google rank, they strongly suggest that the majority of the content we analyzed corresponds to a 12th-grade reading level.
As mentioned earlier, there appears to be a mathematical artifact affecting the maximum scale of these four methods.
We believe that the way their formulas are constructed effectively caps their maximum range, grouping difficult-to-read material into the 12th-grade level, which prevents these methods from generating normally distributed data.
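A small Python sketch shows why a cap would produce this kind of skew. The numbers are invented for illustration, and the hard cap at 12 is a simplifying assumption standing in for whatever the formulas' constants actually do.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Invented grade-level scores, symmetric around grade 12.
raw = [8, 10, 11, 12, 13, 14, 16]

# The same scores with everything above grade 12 lumped into grade 12.
capped = [min(v, 12) for v in raw]

print(skewness(raw))     # approximately zero: symmetric distribution
print(skewness(capped))  # negative: values pile up at the cap with a tail below it
```

Once every score above 12 collapses onto 12, the distribution loses its upper tail and can no longer look normal, which matches the pattern described for this group.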
When we couldn't find any significant correlations in our data, we decided to adopt a "Moneyball" strategy. Instead of attempting to identify the reasons behind the results, we focused on analyzing the patterns we observed. In the end, it is more important to recognize the characteristics of an MVP than it is to determine how each characteristic contributes to MVP status. After all, if you can spot the winner, does it really matter why they win?
To enable accurate comparisons, we normalized each readability method to its grade-level equivalent, as demonstrated in Figure 3. Except for the Dale-Chall method, the scores from the other methods concur that content should be aimed at a 12th-grade or college reading comprehension level. Of course, it is essential to tailor your content to your target audience, but our data suggests that this level is the optimal target unless you are focusing on a specific demographic.
Suggested readability scores from various sources and tools consistently agree on easier readability levels than this study has found.
For example, readable.com recommends these readability scores: a FORCAST grade of 9-10, a Gunning Fog score of 8, and a Flesch Reading Ease of 60-70. These scores are significantly lower than those we observed in online content that ranks well on Google Search.
We advise aiming for the scores supported by the hard data presented in this study (a grade 12 reading level) rather than adhering to the conventional wisdom suggested by popular tools.
The findings from our data analysis deviate considerably from the recommendations of the readability methods adopted by some of the most popular readability tools.
Although it is commonly accepted that content should target readers at a 10th-grade education level, our data consistently reveals that the mean readability grade is 12 or higher in nearly every case.
How can these well-established and widely accepted readability metrics be so far off in determining the ideal scores for online content that ranks well?
One possible explanation is that these readability systems, originally developed between the 1940s and 1970s, may no longer accurately represent current literacy levels. In other words, they may simply be outdated.
Alternatively, it could be that the internet audience is not as diverse as the global population and may self-select readers with above-average reading abilities.
Walk Away With These Key Takeaways
Originality.AI’s Readability Feature was developed to equip both writers and publishers with a modern, up-to-date tool that helps them create and publish content that scores optimally against the readability criteria supported by our data analysis.
With its new technology, the Originality.AI tool is able to analyze a piece of content and provide a content score as well as suggested guidelines that reflect its true ability and likelihood of ranking on Google Search.
Notable Features
A key feature of the Readability Tool is its ability to make recommendations based on various readability systems:
The Gunning Fog Index is a readability metric that accounts for sentence length and the number of complex words.
The Flesch Reading Ease formula is designed to assess the readability of a text by examining the average sentence length and syllables per word. Higher scores indicate easier-to-read text.
The FORCAST grade level is intended for technical documents. The formula measures text readability based on the frequency of single-syllable words.
The Dale-Chall Readability Grade measures readability using a list of words familiar to a fourth grader. The lower the score, the more readable the text.
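For readers who want to see the arithmetic, the four systems above can be sketched in Python using their commonly published formulas. This is an illustration under that assumption, not a description of how the Originality.AI tool computes its scores. The functions take pre-computed counts, and their names and parameters are hypothetical.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog: average sentence length plus percentage of complex words."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def forcast_grade(single_syllable_words: int, words: int = 150) -> float:
    """FORCAST: based on single-syllable words per 150-word sample.

    Counts from samples of other sizes are scaled to 150 words (an assumption).
    """
    n = single_syllable_words * 150 / words
    return 20 - n / 10


def dale_chall(words: int, sentences: int, unfamiliar_words: int) -> float:
    """Dale-Chall: percentage of words outside the familiar-word list plus
    average sentence length, with an adjustment above 5% unfamiliar words."""
    pct_unfamiliar = 100 * unfamiliar_words / words
    score = 0.1579 * pct_unfamiliar + 0.0496 * (words / sentences)
    if pct_unfamiliar > 5:
        score += 3.6365
    return score


# Made-up counts for a 100-word passage with 5 sentences.
print(gunning_fog(words=100, sentences=5, complex_words=10))
print(flesch_reading_ease(words=100, sentences=5, syllables=150))
print(forcast_grade(single_syllable_words=100))
print(dale_chall(words=100, sentences=5, unfamiliar_words=10))
```

Taking counts rather than raw text keeps the formulas separate from the harder problems of splitting sentences, counting syllables, and maintaining the familiar-word list, which is where tools tend to differ.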
We hope that this feature will help inform your publishing decisions around the readability of your content and that you will find the recommendations for each readability system useful. This feature is now available in the Core APP and the API and is complimentary for those who are scanning content for plagiarism.