We looked at 26,000 Amazon product reviews to answer the question: ‘Did an AI write that review?’
Over 26,000 Amazon product reviews were randomly selected to build the source dataset. These records were cleaned using standard processes, which included removing all customer-sensitive and identifying information. A statistically representative subset of 2,000 records was then processed with Originality.AI’s AI Detection API, which analyzes text for the likelihood of AI content and returns a probability score from 0 to 1.
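For readers who want to reproduce the scoring step, here is a minimal Python sketch of how a single review could be sent to the detection API. The endpoint path, header name, and response fields below are assumptions made for illustration; check Originality.AI’s current API documentation for the exact interface.

```python
import requests

# NOTE: endpoint path, header name, and response shape are assumptions for
# illustration only -- consult Originality.AI's current API reference.
API_URL = "https://api.originality.ai/api/v1/scan/ai"  # assumed endpoint
API_KEY = "your-api-key-here"

def score_review(text: str) -> float:
    """Return the detector's AI-likelihood score (0 = human, 1 = AI) for one review."""
    response = requests.post(
        API_URL,
        headers={"X-OAI-API-KEY": API_KEY, "Content-Type": "application/json"},
        json={"content": text},
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape: {"score": {"ai": 0.87, "original": 0.13}, ...}
    return response.json()["score"]["ai"]

# Example: score the 2,000-review sample and attach the results
# import pandas as pd
# sample = pd.read_csv("amazon_reviews_sample.csv")
# sample["ai_score"] = sample["review_text"].apply(score_review)
```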
AI-enhanced text generators pre-existed ChatGPT. GPT-3 was released in June 2020, quickly followed by writing-aid tools that leveraged it. However, since the launch of ChatGPT, text generators have progressed from limited, niche applications to widespread usage. Trend analysis of the dataset indicates that the share of product reviews scoring 50% or more likely AI-generated has grown from an annual average of 0.02 in 2022. As of the publishing date, this value has increased by approximately 400%, and there are no signs of it slowing down.
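As a rough illustration of that trend calculation, the sketch below computes the yearly share of reviews at or above the 0.5 detector threshold. The column names are placeholders; the exact preprocessing lives in the repository linked at the end of this post.

```python
import pandas as pd

# Assumed columns: 'review_date' (parseable date) and 'ai_score' (0-1 detector output).
reviews = pd.read_csv("amazon_reviews_scored.csv", parse_dates=["review_date"])

# Flag reviews the detector scores at 50% or more likely AI-generated,
# then compute the yearly share of such reviews to see the trend.
reviews["likely_ai"] = reviews["ai_score"] >= 0.5
yearly_share = reviews.groupby(reviews["review_date"].dt.year)["likely_ai"].mean()
print(yearly_share)  # e.g. 2022 -> 0.02, rising in later years
```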
There are two kinds of reviews: the extreme 1s and 5s, written by people who passionately love or hate a product and want everyone to know it, and the chill trio of 2s, 3s, and 4s. As it turns out, the trio is chill in more ways than one: these moderate reviews appear less likely to rely on AI text generators. According to our findings, AI content is approximately 1.3 times more likely to be detected in extreme reviews than in moderate reviews.
There are some bright spots. Analysis shows that reviews from Verified Purchase buyers are roughly 1.4 times less likely to be flagged as AI-generated than reviews from non-verified reviewers. Amazon.com values the integrity of the review process and goes to great lengths to safeguard it. However, not all ecommerce websites have Amazon’s resources.
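Both of these group comparisons boil down to a ratio of average detector scores. The sketch below shows one way to compute them, using placeholder column names for the star rating and the Verified Purchase flag; the repository linked below has the actual methodology.

```python
import pandas as pd

# Assumed columns: 'star_rating' (1-5), 'verified_purchase' (bool), 'ai_score' (0-1).
reviews = pd.read_csv("amazon_reviews_scored.csv")

# Extreme reviews are 1- and 5-star; everything else is moderate.
reviews["extreme"] = reviews["star_rating"].isin([1, 5])

extreme_rate = reviews.loc[reviews["extreme"], "ai_score"].mean()
moderate_rate = reviews.loc[~reviews["extreme"], "ai_score"].mean()
print(f"Extreme vs moderate ratio: {extreme_rate / moderate_rate:.2f}")  # roughly 1.3x

verified_rate = reviews.loc[reviews["verified_purchase"], "ai_score"].mean()
unverified_rate = reviews.loc[~reviews["verified_purchase"], "ai_score"].mean()
print(f"Unverified vs verified ratio: {unverified_rate / verified_rate:.2f}")  # roughly 1.4x
```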
Another optimistic discovery came from analyzing the helpfulness score of a review (e.g. ‘20 people found this review helpful’), which is used to rank reviews. Readers appear to be instinctively biased against AI content. To be precise, there is a -0.06 correlation on average between standardized helpfulness and the likelihood of AI content. And yes, that’s supposed to be the good news. The bad news is that this figure doesn’t account for how potential bot voting may skew the statistic, or for the fact that older reviews have had more time to accumulate votes.
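One plausible reading of ‘standardized helpfulness’ is a z-scored vote count; the sketch below shows that version of the correlation, again with placeholder column names. The exact computation used in the study is in the linked repository.

```python
import pandas as pd

# Assumed columns: 'helpful_votes' (raw vote count) and 'ai_score' (0-1 detector output).
reviews = pd.read_csv("amazon_reviews_scored.csv")

# Standardize helpfulness (z-score) so reviews with very different vote counts
# are comparable, then correlate it with the detector's AI-likelihood score.
helpful = reviews["helpful_votes"]
reviews["helpfulness_std"] = (helpful - helpful.mean()) / helpful.std()

corr = reviews["helpfulness_std"].corr(reviews["ai_score"])  # Pearson by default
print(f"Correlation: {corr:.2f}")  # the study reports roughly -0.06 on average
```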
Further analysis is still being done. It will take some time to build a large enough dataset to answer all the burning questions with statistical confidence. But this is enough to get us thinking. There’s nothing wrong with using AI content, especially for a free public service like sharing your genuine opinion on a product. But that assumes all those reviews really are free and genuine, and not… 😷 motivated by vested interests, competition, and paid reviews 😷.
Finally, ask yourself: would it make a difference to your purchasing decisions if you knew that the review that persuaded you to click on ‘Add to Cart’ came from a heartfelt human anecdote… or from the articulate logic of a well-trained machine?
The GitHub repository with the notebooks, source code, and cleaned datasets for these experiments is available here.
If you have any questions regarding this study, please contact us.