At Originality.ai we believe in the ethical development of generative AI (see our study looking at how to block AI bots) and in its ethical use, supported by our AI detector, which can identify whether text was created by ChatGPT or another AI writing tool.
We are reporting on how AI is impacting the online world and tracking how much AI content is in Google's search results. The study below does a deep dive into the March 5th, 2024 update.
On March 5th Google announced a massive update that aims to reduce unhelpful content by 40%. The update started with a significant round of manual actions that completely deindexed the affected sites. These manual actions occurred in conjunction with the update but are separate from the algorithm update, which is to be rolled out over 2 to 4 weeks.
We have completed 2 significant studies in the last 24 hours to help people understand these updates, and both are included here.
In this post we are going to cover…
The methodology for each study is described in its own section below.
If you have any questions, want access to the data, or are looking to extend this research, reach out to jon@originality.ai
This update from Google, announced on March 5th, aims to penalize…
Google has communicated more about this update in its first days than it has for most other similar updates.
It is important to note again that manual actions are not the same as the algorithm updates that will also be occurring.
If Google identifies a site that does not meet its guidelines, it can apply a “manual action” and completely remove the site from its search results (aka deindex the website).
Sites started receiving an increasing number of these notifications on March 5th in their Google Search Console Manual Action dashboard…
The consequence of these manual actions appears to be a complete removal from Google's search results.
Now what do we know about the sites that were impacted…
In an attempt to better understand the extent of this manual action effort, we completed a study to identify content websites that have been deindexed by Google but that had Google organic traffic until very recently.
This study focuses on content-first websites, not ecommerce or other types of sites.
Here is how we completed this study.
Summary: We identified a list of 79k websites that are a better-than-average reflection of the internet and checked whether each is currently indexed. For those that were not indexed, we checked 2 sources (Ahrefs and Similarweb) to verify whether they recently had organic traffic.
If a website recently had organic traffic (February) but is now not indexed in Google, we assume a March 5th Update Manual Action was applied.
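The rule above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the pipeline used in the study: the `SiteRecord` fields, the `min_traffic` cutoff, the example domains, and the choice to accept traffic from either source (rather than requiring both) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    domain: str
    indexed_now: bool            # result of a current Google index check
    ahrefs_feb_traffic: int      # hypothetical field: February organic traffic estimate
    similarweb_feb_traffic: int  # hypothetical field: same estimate from a second source


def likely_manual_action(site: SiteRecord, min_traffic: int = 1) -> bool:
    """Return True if the site is not indexed now but recently had organic traffic."""
    if site.indexed_now:
        return False
    # Assumption: traffic reported by either source counts as "recently had traffic".
    return (site.ahrefs_feb_traffic >= min_traffic
            or site.similarweb_feb_traffic >= min_traffic)


# Made-up records, for illustration only.
sites = [
    SiteRecord("example-a.test", True, 12000, 9000),   # still indexed
    SiteRecord("example-b.test", False, 8000, 0),      # deindexed, had traffic
    SiteRecord("example-c.test", False, 0, 0),         # deindexed, no recent traffic
]

flagged = [s.domain for s in sites if likely_manual_action(s)]
print(flagged)
```

Only the second record is flagged: it is not indexed and one source shows February traffic. A site that was never indexed or never had traffic is excluded, which is what keeps dead or brand-new domains out of the count.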
If you want access to the dataset of the deindexed websites, including Ahrefs and Similarweb data, please reach out to Jon@originality.ai
Many media outlets were quick to jump to the conclusion that this update is aimed at squashing AI spam in Google's search results…
Many SEOs on X agree…
But using our AI Checker we wanted to do a more rigorous analysis.
We looked at 100 recent articles from each of the deindexed sites that have already been publicly shared, to see how prevalent AI content was on the sites that received a manual penalty.
The short answer is yes… after analyzing 200 sites and over 40k URLs with our AI checker, it is clear that the vast majority of sites that received a manual action were likely using AI content.
With the March 5th update Google deindexed almost 2% of all sites on popular advertising platforms like MediaVine, Ezoic and Raptive. Some of these platforms like MediaVine have taken a proactive No AI Content stance immediately after these manual actions (source).
https://www.mediavine.com/ai-and-our-commitment-to-a-creator-first-future/
At the time of the update we analyzed the 14 publicly revealed websites and identified that all of them had some amount of AI content on them. This got a lot of discussion going…
But… we wanted to do a more thorough analysis and go WAY deeper than just the 14 sites that had already been revealed at the time.
The vast majority of the deindexed sites appear to have had some AI content published on them, using a conservative AI threshold of 5% (the false positive rate is < 3% in almost all datasets tested).
Some sites seem to have taken a mixed approach, publishing both AI and human-written content, while others were clearly showing nothing but AI-generated content.
51 of the sites, or ~30%, were pure AI-generated content.
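As a rough illustration of how a site could be bucketed from per-article detector results, here is a minimal Python sketch. It reads the 5% threshold as the share of a site's sampled articles flagged as AI, which is an assumption (the exact definition is not spelled out here), and the `classify_site` function and its labels are hypothetical names rather than part of the Originality.ai product.

```python
from typing import Sequence


def classify_site(ai_flags: Sequence[bool], threshold: float = 0.05) -> str:
    """Bucket a site from per-article AI flags (True = article flagged as AI)."""
    if not ai_flags:
        raise ValueError("need at least one analyzed article")
    ai_share = sum(ai_flags) / len(ai_flags)
    if ai_share == 1.0:
        return "pure AI"          # every sampled article was flagged
    if ai_share > threshold:
        return "mixed"            # above the assumed 5% threshold, but not all AI
    return "little or no AI"      # at or below the threshold


# Made-up samples of 100 articles per site, for illustration only.
print(classify_site([True] * 100))                 # all flagged
print(classify_site([True] * 40 + [False] * 60))   # 40% flagged
print(classify_site([True] * 3 + [False] * 97))    # 3% flagged
```

Setting the threshold above the detector's stated false positive rate (< 3%) means a site with only a handful of flagged articles is not counted as publishing AI content.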
But these findings could reflect sampling bias, since the list of sites was taken from MediaVine, Raptive and Ezoic (all ad networks popular among web publishers).
These findings would not lead us to blame AI content for the sites being deindexed IF this amount of AI content was consistent across the rest of Google’s SERPs.
We have an ongoing study that examines the content of webpages in the top 20 search results for 500 different keywords, dating back 60+ months.
The risk of AI-generated spam overwhelming Google's search results is an existential threat to Google. This seems to be a clear attempt by Google not just to punish but also to make a statement about its view on AI-generated spam.