Recently, author Janelle Shane published a case study on AI Weirdness highlighting how AI detectors like Originality.AI misclassify human-written text as AI-written, particularly when the text is written by non-native English speakers. The case study was based on a data study conducted by university students.
First off, we welcome any and all AI detection accuracy tests that help educate users on the limitations of AI detectors and how to use them properly.
This article digs into the study and updates it by running the very same data through Originality.AI’s latest version (1.4). The results are clear: Originality 1.4 is by far the most accurate detector and produced the fewest false positives of all the AI content detectors tested.
Taking a step back, this article is meant to take a deeper look at false positives, their potential bias against English-as-a-second-language students, and the academic integrity issues they raise. False positives do happen, and the claim that they may affect some students more than others is alarming and deserves investigation.
In her case study, the author compares her own writing, along with flowery Seuss-style poetry, against a study that ran standard U.S. 8th-grade essays and TOEFL (Test of English as a Foreign Language) essays through a variety of AI detection platforms, including Originality.AI.
She mentioned that AI detectors had flagged an excerpt from her own book as “very likely to be written by AI,” but when we ran that same excerpt through Originality.AI’s platform, it came back as 99% Original. You can see the scan with the excerpt here: https://app.originality.ai/share/ciodqj5pt6g2uyz0
Next, she took her own writing and had it paraphrased by an AI tool. She claimed that when she ran it through an AI content detector, it “was now rated likely to be written entirely by a human.” Again, we took that excerpt and ran it through Originality.AI’s platform, and it was correctly flagged as 100% AI. See the report here: https://app.originality.ai/share/s0xdvuajpqcnw2fi
Lastly, she said she also received a “likely to be written entirely by a human” rating when she used paraphrasing tools to rewrite the paragraphs in the metered rhyme of Dr. Seuss, in Old English, or as a startup pitch. We tested this against Originality.AI as well, and it again classified the text correctly as AI-generated. See the report here: https://app.originality.ai/share/zy3jw4rfkqeg8txa
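For readers who want to reproduce these scans on their own excerpts programmatically, here is a minimal sketch. The endpoint URL, header name, and response fields below are assumptions about the Originality.AI REST API rather than details confirmed in this article, so check the current API documentation before relying on them.

```python
# Minimal sketch of scanning a text excerpt for AI-detection scoring.
# NOTE: the endpoint, header name, and response fields are assumptions about
# the Originality.AI REST API; consult the current API docs before use.
import requests

API_KEY = "your-api-key-here"  # placeholder; issued from your Originality.AI account

def scan_text(content: str) -> dict:
    """Submit a block of text for AI-detection scoring and return the raw JSON response."""
    response = requests.post(
        "https://api.originality.ai/api/v1/scan/ai",   # assumed endpoint
        headers={"X-OAI-API-KEY": API_KEY, "Content-Type": "application/json"},
        json={"content": content},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

result = scan_text("Paste the excerpt you want to check here.")
# The response is assumed to include original/AI likelihood scores, e.g.:
print(result.get("score"))
```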
These reports show that the updated version of Originality.AI (1.4) accurately classified the very sections of text that had previously been misclassified for the author, demonstrating a large improvement.
AI content generation is, without a doubt, disrupting and revolutionizing entire industries online. Many experts, professors, and publishers rely on AI content detectors to distinguish human-written content from AI-written content, both online and in academia.
With the launch and ongoing development of groundbreaking tools like Originality.AI, there are going to be numerous case studies conducted to evaluate the efficacy of these types of tools – as there should be.
We will be showcasing our tool’s capabilities on publicly available data sets in the weeks to come.
However, the “Don’t Use AI Detectors for Anything Important” case study was based on version 1.1 of Originality.AI. We have made significant changes and are currently on version 1.4 of our tool.
We ran our tool against this exact data set and feel that the case study should be updated to reflect that the latest Originality.AI model (1.4) detected AI-written content at 100% across all AI data sets.
Had our latest model been used for the case study, the charts would have looked a bit (a LOT) different:
It’s also important to note how false positives differ between marketing and academic use. We have repeatedly emphasized that Originality.AI is not for academic use.
The data our AI has been trained on is closely tied to online content written for the purpose of ranking on search engines, NOT academic papers. Our signup page specifically says the tool is built for publishers, agencies, and writers, not for students. If we deploy an academic-focused solution, we will train our AI detection on more TOEFL essays to avoid this problem. But for now, our stance is that we are not for academic use; we are built specifically for serious content marketers and SEOs.
As one of the most popular AI writing detection services, we are continually working to achieve the lowest false positive rate of any AI content detector. With our most recent update, that number is now below 2.5%. See the study and comparison with other tools, along with our latest GPT-4-trained detection model, by clicking here.
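To make that figure concrete, here is a small illustrative calculation. The counts are hypothetical, not drawn from our study: a false positive rate below 2.5% means that out of every 1,000 genuinely human-written documents scanned, fewer than 25 would be incorrectly flagged as AI.

```python
# Illustrative only: hypothetical counts showing what a <2.5% false positive rate means.
human_written_docs = 1000      # documents known to be written by humans
incorrectly_flagged = 25       # of those, how many the detector labels as AI

false_positive_rate = incorrectly_flagged / human_written_docs
print(f"False positive rate: {false_positive_rate:.1%}")   # -> 2.5%
```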
When it comes to text from non-native English speakers being erroneously flagged, we are digging into the data to find the underlying causes and correlations. One of the hypotheses we are investigating is what we call “cyborg writing.”
Cyborg writing happens when a writer leans on too many writing assistant tools, many of which are powered by AI. For example, a writer might use autocorrect, rely extensively on a grammar tool, and run their content through an outlining or content optimization tool. All of these tools leverage AI to some extent, and that cumulative assistance could be the underlying reason for these false positives.
Even if the student or content creator writes the words themselves, passing the text through these layers of filtering and assistance can leave tell-tale AI “tracks” that systems trained to detect them (like Originality.AI) will pick up.
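One way to probe this hypothesis is to score the same passage at each stage of tool assistance and see whether the AI likelihood climbs. The sketch below is a hypothetical experiment outline, not our detection methodology: `detector_score` stands in for whatever AI-detection scan you use (for example, the API sketch above), and the draft texts are placeholders.

```python
# Hypothetical "cyborg writing" check: score the same passage after each layer
# of AI-powered assistance and see whether the detector's score climbs.
from typing import Callable, Dict

def cyborg_writing_check(drafts: Dict[str, str],
                         detector_score: Callable[[str], float]) -> None:
    """Print the detector's AI-likelihood score for each revision stage of a draft."""
    for stage, text in drafts.items():
        print(f"{stage}: AI likelihood = {detector_score(text):.0%}")

# Placeholder revision stages; fill in with the same passage after each tool pass.
drafts = {
    "raw draft (typed by hand)": "...",
    "after autocorrect and a grammar tool": "...",
    "after a content-optimization rewrite": "...",
}
# cyborg_writing_check(drafts, detector_score=my_scan_function)  # my_scan_function is hypothetical
```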
But is this more of an issue among non-native English speakers? Or is it simply a more nuanced question of “What level of computer-aided assistance is allowed before something is no longer a writer’s original work?” We believe there is no easy, one-size-fits-all answer, and that the answer will depend on the situation.
For this reason, even over the long term, AI detectors will never be able to provide a perfect track record with zero false positives. However, combining AI detection with the ability to visualize the content creation process is one of the main reasons we built our free Google Chrome AI Detection Extension. It allows someone to see how a Google Doc was created and prove that the writer did indeed produce the content, rather than copying and pasting it from ChatGPT or another AI writing service.
With all of these points in mind, and considering that version 1.4 of Originality.AI accurately identified AI-written (and human-written) content at nearly 100% across all data sets, we invite the creators of this case study to rerun their data on our updated version and see the results for themselves.