We analyzed 187,000 reviews of 15,231 companies on G2, a review site for B2B software. Our AI detector examined the text of each review to estimate how much of the content was likely written by humans and how much by AI.
More Than 26% of Reviews on G2 Since the Launch of ChatGPT Are Suspected of Being AI-Generated
We calculated what percentage of the reviews posted each month were AI-generated and tracked how that percentage changed over time. The share of AI-generated reviews grew year after year as GPT-2 and GPT-3 were introduced. Then there was an immediate spike when ChatGPT became available on November 30, 2022, much like what we found in our study of Capterra, a similar software-review site.
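The monthly calculation described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name `monthly_ai_share`, the `(date, verdict)` row layout, and the sample rows are all assumptions, and the detector's verdict is simplified here to a true/false flag.

```python
from collections import defaultdict

def monthly_ai_share(reviews):
    """Map each 'YYYY-MM' month to the percent of its reviews flagged as AI."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for posted_on, is_ai in reviews:
        month = posted_on[:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[month] += 1
        flagged[month] += int(is_ai)
    return {m: 100 * flagged[m] / totals[m] for m in sorted(totals)}

# Made-up sample rows (ISO date posted, detector verdict). Not study data.
sample = [
    ("2022-11-03", False),
    ("2022-11-17", False),
    ("2022-11-21", True),
    ("2022-11-29", False),
    ("2022-12-02", True),
    ("2022-12-15", False),
]

shares = monthly_ai_share(sample)
for month, pct in shares.items():
    print(month, f"{pct:.1f}%")
```

With these six sample rows, November comes out at 25.0% and December at 50.0%. Plotting the same per-month values over several years would give the kind of trend line discussed in this section.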
In the first 11 months of 2022, an average of 15.2% of reviews were AI-generated, while December 2022 through September 2023 averaged 25.6%, an increase of 68%.
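The 68% figure is a relative increase, not a difference in percentage points. A minimal sketch of the arithmetic, using the two averages reported above (the function name is only illustrative):

```python
def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percent of `before`."""
    return (after - before) / before * 100

# Averages reported in this study (percent of reviews flagged as AI-generated).
pre_chatgpt = 15.2   # first 11 months of 2022
post_chatgpt = 25.6  # December 2022 through September 2023

point_gap = post_chatgpt - pre_chatgpt  # difference in percentage points
increase = relative_increase(pre_chatgpt, post_chatgpt)

print(f"{point_gap:.1f} percentage points, {increase:.0f}% relative increase")
# prints: 10.4 percentage points, 68% relative increase
```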
Note that the ~5% of reviews identified as AI-generated before 2020 represent the false positives that occur with AI detectors. See our accuracy study for more details: https://originality.ai/ai-content-detection-accuracy/
High Ratings of 4 to 5 Were 1.7 Times More Likely to Be AI-Generated
The rate of AI-generated reviews differed by the rating given to the product. G2 ratings are on a scale of 0 to 5. Among reviews posted after the launch of ChatGPT, we found that 26.93% of those with a rating of 4, 4.5, or 5 were likely AI-generated, compared with 15.98% of those with a rating of 0, 0.5, or 1. This difference made higher ratings 1.7 times more likely to have AI-generated reviews than lower ratings.
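The "1.7 times" comparison is the ratio of the two rates. A short sketch using the figures reported above (the variable names are illustrative):

```python
# Percent of reviews flagged as AI-generated, as reported in this study.
high_rating_rate = 26.93  # ratings of 4, 4.5, or 5
low_rating_rate = 15.98   # ratings of 0, 0.5, or 1

# How many times more likely a high-rated review is to be flagged.
ratio = high_rating_rate / low_rating_rate

print(f"{ratio:.1f}x")
# prints: 1.7x
```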
Anonymous, “Verified User” Reviews Were 20% Less Likely to Be AI-Generated
Along with the difference by product rating, the percentage of reviews that appear to be AI-generated also differed by how the author was identified. G2 allows reviews to be posted with the user’s name visible or anonymously as a “Verified User” with only the author’s job field indicated, such as marketing or computer software. Over the past six years, the anonymous “Verified User” reviews were less likely to be AI-generated than those with names attached, at 13.1% vs. 16.3%, and the gap widened to 21.8% vs. 27.5% once ChatGPT launched. The anonymous reviews also had slightly lower ratings, averaging 3.99 out of 5 compared to 4.34 for named reviewers.
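The "20% less likely" figure can be read as the relative gap between the anonymous and named rates. A quick sketch with the percentages reported above (the function name is illustrative, and which period the headline figure refers to is our assumption, since both come out near 20%):

```python
def relative_reduction(named_rate: float, anonymous_rate: float) -> float:
    """Return how much lower the anonymous rate is, as a percent of the named rate."""
    return (named_rate - anonymous_rate) / named_rate * 100

# Percent of reviews flagged as AI-generated, as reported in this study.
six_year = relative_reduction(named_rate=16.3, anonymous_rate=13.1)
post_chatgpt = relative_reduction(named_rate=27.5, anonymous_rate=21.8)

print(f"past six years: {six_year:.1f}% less likely")
print(f"after ChatGPT:  {post_chatgpt:.1f}% less likely")
```

Both periods give a relative gap of roughly 20%, even though the gap in percentage points grows from 3.2 to 5.7.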
Overall, 2023 has seen a rapid increase in the share of G2 reviews that are AI-generated, averaging 26% for the year so far and peaking at 34.6% in June. Since August, however, the monthly averages have hovered around 19%. It remains to be seen whether this lower average will become the norm. Possible explanations are that fewer AI-generated reviews are being submitted to the site, or that the site has begun using AI-detection tools or other procedures to curb how many get published, which would mirror what we found in our study of another software-review site, TrustRadius.
Using AI to generate text for reviews presents challenges for everyone trying to use those reviews to make decisions, whether personal or business. How do you feel about a review that was AI-generated? There may be times when it is appropriate to use ChatGPT to improve readability, but is that what 26% of G2 reviewers have been doing since generative AI became publicly available? Do you want to feel like you’re in a Turing test every time you read a review? The findings in this study show how detecting AI can change how we use sites like G2 and rate their usefulness.
If you have any questions about this study, please contact us.