We Have 99% Accuracy in Detecting AI: Originality.ai Study

We believe that it is crucial for AI content detectors reported accuracy to be open, transparent, and accountable. The reality is, each person seeking AI-detection services deserves to know which detector is the most accurate for their specific use case.

December 7, 2025

The world needs reliable AI detection tools; however, no AI detection tool is ever going to be 100% perfect.

It’s important to understand their limitations so that you can use them responsibly.

What does this mean for developers of AI detectors? They should be as transparent as possible about the capabilities and limitations of their detectors.

At Originality.ai, we believe that transparency is a top priority.

So, below we’ve included our analysis of Originality.ai’s AI detector efficacy, including accuracy data and false positive rates.

Then, to review third-party data on Originality.ai's AI detector accuracy, see this meta-analysis of multiple academic studies on AI text detection.

Try the patented Originality.ai AI Detector for free today!

AI Detection Accuracy Testing Standard

We are proposing a standard for testing AI detector effectiveness and AI detector accuracy, along with the release of an Open Source tool to help increase the transparency and accountability of all AI content detectors.

We hope to achieve this idealistic goal by…

Open-sourcing a research tool we developed to assist anyone (researcher, journalist, customer, or other AI detector) in testing multiple AI detectors. An even easier-to-use open-source AI detector efficacy tool here.
Providing detailed instructions (including the calculation in the tool) to help identify the most important AI vs Original Human classifier efficacy metrics.
Transparently reporting our own tool’s accuracy on multiple publicly available datasets.

Interested in evaluating an AI content detector's potential use case for your organization? Then, this article is for you.

In This AI Detector Accuracy Guide:

How AI detectors work
How to calculate an AI detector's effectiveness
How to complete your own tests (using one of the open-sourced tools we provide)
What we think should and should not be considered AI content
How accurate our AI content detector is based on the testing we have done
If you can trust our AI detector's effectiveness
An overview of 3rd party studies on AI detection accuracy

Have a question, suggestion, research question, or commercial use cases? Please contact us.

Key Takeaways (TL;DR)

Originality.ai Launches the Next Generation of AI Detection Models in September 2025:
- Lite Version 1.0.2: Notable 99% accuracy on leading flagship AI models (OpenAI, Gemini, Claude, and DeepSeek). Continues to maintain low false positive rates of 0.5%.
- Turbo 3.0.2: Exceptional 99%+ accuracy on leading flagship AI models (OpenAI, Gemini, Claude, and DeepSeek). Robust capabilities in identifying humanized content, up to 97% accuracy. Reduced false positive rates to 1.5%.
- Academic 0.0.5 (New): All-new academic model with 99%+ accuracy and <1% false positives. Ideal for teachers and students.
Originality.ai Launches Lite Version 1.0.1 in June 2025.
- Improved accuracy on leading flagship AI models.
- New, robust capabilities in identifying content from the latest AI Humanizer tools.
- Maintains an exceptionally low false positive rate.
- Allows for light AI editing. We define light AI editing as around 5% of characters changed by an AI editor.
Originality expands our Multilingual AI Detector to 30 Languages in May 2025! Learn more about our Multilingual AI Detector.
Originality.ai Launches Version 3.0.1 Turbo in October 2024, resulting in an improvement on the most challenging dataset created from the newest LLMs.
Originality.ai Lite vs. Turbo vs. Academic Use Cases - Quick Overview:
- Turbo 3.0.2: If your risk tolerance for AI is ZERO! It is designed to identify any use of AI, even light AI editing.
- Lite 1.0.2: If you want to minimize false positives and are okay with light AI editing.
- Academic 0.0.5: Ideal for detecting AI-generated text in academic writing with low false positives.

Across all tests, Originality.ai has increased its accuracy, further establishing Originality.ai as the most accurate AI checker.

Why Are AI Detectors Essential in 2025?

Originality.ai offers the most accurate AI detector — so what?

Before diving into our accuracy rates, let’s first review why AI detectors are important — or rather essential — in 2025, starting with a challenge to OpenAI’s stance on AI detection.

Do AI Detectors Work? OpenAI Says No???

‍In July 2023, OpenAI released an announcement that suggested AI detectors don’t work when it shut down its own detection tool.

So, do AI detectors work? OpenAI Says No.

However, oversimplistic views that “AI detectors are perfect” or “AI detectors don't work” are equally problematic.

We still have an offer to OpenAI (or anyone willing to take us up on it) to back up their claim that AI detectors don't work with proceeds sent to charity. Learn more here.

Societal Impacts of Undetectable AI-Generated Content Are Real

AI Content Detectors need to be a part of the solution to undetectable AI-generated content.

The current unsupported AI detection accuracy claims and research papers that have tackled this problem are simply not good enough in the face of the societal risks LLM-generated content poses.

Here are some real-life scenarios when AI can pose significant problems:

Mass Propaganda
Fake News
- AI-Generated Book List Scandal
Toxic Spam
Academic Dishonesty / AI Plagiarism
Hallucinations
- AI Hallucinations or Factual Errors Cause Serious Problems
Cheating Writers
Cheating Agencies
Fake Product Reviews
- See a full list of our AI Studies and coverage of fake AI reviews
Fake Job Applications
Fake University Application Essays
Fake Scholarship Applications

Societal Impacts of Undetectable AI-Generated Content Are Real

Not to mention that multiple third-party studies have found that humans struggle to identify AI-generated content.

SEO Implications of AI Content in Google

Then, there are also implications for SEOs and marketers.

AI Content is rising in Google, which presents a number of challenges. So, we created a Live Dashboard to monitor AI in Google Search Results.

Google can detect and does penalize AI content, and it's already happening via manual updates and Google Algorithm updates. Check out our study on Google AI Penalties.

Not to mention that in 2025, Google released updated Search Quality Rater Guidelines stating:

“The Lowest rating applies if all or almost all of the MC on the page (including text, images, audio, videos, etc) is copied, paraphrased, embedded, auto or AI generated, or reposted from other sources with little to no effort, little to no originality, and little to no added value for visitors to the website.” - Source: Google

FTC Warns Against Unsupported AI Content Detection Accuracy Claims

Claimed accuracy rates with no supporting studies are clearly a problem.

We hope the days of AI detection tools claiming 99%+ accuracy with no data to support it are over. A single number is not good enough in the face of the societal problems AI content can produce, and the important role AI content detectors have to play.

The FTC has come out on multiple occasions to warn against tools claiming AI detection accuracy or unsubstantiated AI efficacy.

In 2025, the FTC addressed misleading accuracy claims from one company offering AI detection without the data to back it up:

“The order settles allegations that Workado [Content at Scale now BrandWell] promoted its AI Content Detector as “98 percent” accurate in detecting whether text was written by AI or human. But independent testing showed the accuracy rate on general-purpose content was just 53 percent, according to the FTC’s administrative complaint. The FTC alleges that Workado violated the FTC Act because the “98 percent” claim was false, misleading, or non-substantiated.” - Source: FTC

“If you’re selling a tool that purports to detect generative AI content, make sure that your claims accurately reflect the tool’s abilities and limitations.” source (page since removed from the FTC)

“you can’t assume perfection from automated detection tools. Please keep that principle in mind when making or seeing claims that a tool can reliably detect if content is AI-generated.” source (page since removed from the FTC)

“Marketers should know that — for FTC enforcement purposes — false or unsubstantiated claims about a product’s efficacy are our bread and butter” source (page since removed from the FTC)

We fully agree with the FTC on this and have provided the tool needed for others to replicate similar accuracy studies for themselves.

The misunderstanding of how to detect AI-generated content has already caused a significant amount of pain, including a professor who incorrectly failed an entire class.

So, we created this guide and tools, because we believe…

In the transparent and accountable development and use of AI.
That AI detectors have a role to play in mitigating the potential negative societal impacts of generative AI.

AI detection tools' “accuracy” should be communicated with the same transparency and accountability that we want to see in AI’s development and use. Our hope is that this study will move us all closer to that ideal.

At Originality.ai, we aren’t for or against AI-generated content… but believe in transparency and accountability in its development, use, and detection.

Originality.ai helps ensure there is trust in the originality of the content being produced by writers, students, job applicants or journalists.

Pro Tip: Scanning high volumes of content for AI? Check out our Bulk Scan feature.

Originality.ai Version History:

Along with this study, we are releasing the latest version of our AI content detector. Below is our release history.

1.1 – Nov 2022 BETA (released before Chat-GPT)

GPT-2, GPT-NEO, GPT-J, and GPT-3 accurate detection. But was able to be “tricked” with Paraphrasing
First GPT-3 trained detector
First commercially available AI detector

1.4 – Apr 2023

Improved ChatGPT detection
Accurate on GPT4-generated content
Only tool capable of accurately detecting paraphrased content.
Reduced the number of false positives with increased training on human-generated content

2.0 Standard — Aug 2023

Reduced False Positives
Improved Accuracy on the Hardest to Detect AI Content (GPT4, ChatGPT & Paraphrased)
Release of Open Source Benchmark Dataset.
Release of Open Source AI Detection Efficacy Testing Tool(s).
Between 1.4 and 2.0 there were many models that our team built, which slightly increased AI detection capabilities, but we were not going to release a model until it materially reduced false positives.
September 2024 Update: This model has been retired from our platform. Sign up to try out Lite, our latest model and your top pick!

3.0 Turbo — Feb 2024

Trained on the newest LLMs (Grok, Mixtral, GPT-4 Turbo, Gemini, Claude 2)
Accuracy increased on our toughest testing dataset from 90.2% to 98.8%
False Positives have been slightly improved from 2.9% to 2.8%

Even easier-to-use Open Source AI detection efficacy research tool released.

2.0.1 Standard (BETA) — July 2024

Improved version of the flagship 2.0.0 Standard model.
We’re releasing this version in BETA testing.
September 2024 Update: Thank you for your feedback! BETA testing has now concluded, and with Lite being your top choice, this model has been retired.

1.0.0 Lite — July 2024

Highly accurate with 98% accuracy in detecting AI content.
An under 1% false positive rate.
Allows for lightly AI-edited content (like Grammarly’s grammar and spelling suggestions) while still differentiating between light AI editing and fully generated AI content.

3.0.1 Turbo — October 2024

Highly accurate with 99%+ accuracy in detecting AI content.
An under 3% false positive rate.
Best for use cases where there’s a 0 tolerance policy for AI content.
Robust against bypassing methods — extremely challenging to bypass.

Multilingual 2.0.0 — May 2025

The Originality.ai Multi Language 2.0.0 model now supports 30 languages!
Notable improvements with an overall accuracy of 97.8%
Reduced false negatives to 1.99% and lowered false positive rate to 2.4%
Multilingual Accuracy Study

1.0.1 Lite — June 2025

Improved 99%+ accuracy on all leading flagship AI models from OpenAI, Gemini, Claude and Deepseek.
New, more robust capabilities in identifying content from the latest AI Humanizer tools.
Maintains an exceptionally low false positive rate, now 0.5%.

1.0.2 Lite — September 2025

Exceptional 99% accuracy on leading flagship AI models from OpenAI, Anthropic/Claude, Gemini, and DeepSeek.
Continues to maintain low false positive rates of 0.5% to recognize human-written content as human.

3.0.2 Turbo — September 2025

Notable 99%+ accuracy on leading flagship AI models from OpenAI, Anthropic/Claude, Gemini, and DeepSeek.
Improved resistance, with up to 97% accuracy on the latest AI humanizers and AI bypassers.
Even lower false positive rate of 1.5%.

0.0.5 Academic — September 2025

New, highly accurate model with 99%+ accuracy, built for educators.
Low false positive rates of <1% to support students and teachers.
A best-in-class academic AI detector. Use with our Free Chrome extension to prove your writing is yours.

Basic Explanation of How Our AI Detector Works

Our AI detector works by leveraging supervised learning of a carefully fine-tuned large AI language model.

We use a large language model (LLM) and then feed this model millions of carefully selected records of known AI and known human content. It has learned to recognize patterns between the two.

More details on our AI content detection.

How AI Content Detectors Work:

Below is a brief summary of the 3 general approaches that an AI detector (or called in Machine Learning speak a “classifier”) can use to distinguish between AI-generated and human-generated text.

1. Feature-Based Approach:

The feature-based approach uses the fact that there can potentially be consistently identifiable and known differences that exist in all text generated by an LLM like ChatGPT when compared to human text. Some of these features that tools look to use are explained below.

Burstiness

Burstiness in text refers to the tendency of certain words to appear in clusters or "bursts" rather than being evenly distributed throughout a document.

AI-generated text can potentially have more predictability (less burstiness) since AI models tend to reuse certain words or phrases more often than a human writer would.

Some tools attempt to identify AI text using burstiness (more burstiness = human, less burstiness = AI).

‍Perplexity

Perplexity is a measure of how well a probability model predicts the next word. In the context of text analysis, it quantifies the uncertainty of a language model by calculating the likelihood of the model producing a given text.

Lower perplexity means that the model is less surprised by the text, indicating the text was more likely AI-generated. High perplexity scores can indicate human-generated text.

Frequency Features

Frequency features refer to the count of how often certain words, phrases, or types of words (like nouns, verbs, etc.) appear in a text. For example, AI generation might overuse certain words, underuse others, or use certain types of words at rates that are inconsistent with human writing. These features might be able to help detect AI-generated text.

Learn about the most commonly used ChatGPT words and phrases, as well as obvious ChatGPT sayings.

Readability or Fluency Features

Studies have shown that earlier (ie 2019) LLMs would generate text that has similar readability scores.

Punctuation

This pertains to the use and distribution of various punctuation marks in a text. AI-generated text often exhibits correct and potentially predictable use of punctuation.

For instance, it might use certain types of punctuation more often than a human writer would, or it might use punctuation in ways that are grammatically correct but stylistically unusual. By analyzing punctuation patterns, someone might attempt to create a detector that can predict AI-generated content.

Advantages vs. Disadvantages

Advantages: Once patterns are identified, they can be repeatedly identified in a very cost-effective and fast manner.
Disadvantages: Modern LLMs such as ChatGPT4 and Bard can produce varied enough content that these detectors can be bypassed with clever ChatGPT prompts.
Examples: GPTZero, Winston AI

2. Zero-Shot Approach:

A zero-shot approach uses a pre-trained language model to identify text generated by a model similar to itself. Basically, asking itself how likely the content the AI is seeing was generated by a similar version of itself (note: don’t try asking ChatGPT… it doesn’t work like that).

Advantages: Easier to build and does not require supervised training
Disadvantages: Susceptible to bypassing with paraphrasing
Examples: GPTZero, ZeroGPT

3. Fine-Tuning AI Model Approach:

A fine-tuning AI model approach uses a large language model such as BERT or RoBERTa and trains on a set of human and AI-generated text. It learns to identify the differences between the two in order to predict if the content is AI or Original.

Advantages: Can produce the most effective detection
Disadvantages: These can be more expensive to train and operate. They can also lag behind in detection capabilities for the newest AI tools until their training is updated.
Examples: Originality.ai AI Detector, OpenAI Text Classifier (taken offline)

The test below looks at the performance of multiple detectors using all of the strategies identified above.

AI Detector Accuracy Testing Plan:

This post covers the main and supporting tests that were all completed on the latest versions of the Originality.ai AI Content Detector.

What Is the Best Test? Use Your Own Data!

The dataset(s) provided might be applicable for your use case or potentially if you are evaluating AI detection tools' effectiveness for another type of content you will need to produce your own dataset.

Use our Open-Source Tool to make running your data and evaluating detectors' performance much easier.

Testing Method & New Open-Source Testing Tools:

To make the running of tests easy, repeatable and accurate, we created and decided to open-source our tools to help others do the same. The main tool allows you to enter the API key for multiple AI content detectors and plug in your own data to then receive not just the results from the tool but also a complete statistical analysis of the detection effectiveness calculations.

This tool makes it incredibly easy for you to run your test content against all AI content detectors that have an available API.

The reason we built and open-sourced this tool to run tests is so that we can increase the transparency into tests by…

Running all tests at basically the same time on the same day
Ensuring the exact same text with no difference in formatting is sent to each tool
Quickly testing datasets as they become available
Providing an opportunity for potential customers or researchers to test their own data and make an informed decision about which AI detector is ideal for their use case.

The speed at which new LLMs are launching and the speed AI detection is evolving means that accuracy studies, which take 4 months from test to publication, are hopelessly outdated.

Features of This Tool:

Free & Open Sourced
Able to Scan A Text Dataset With Multiple AI Detectors
Quickly Provides Results
Automatically Calculates Detector Efficacy Metrics (confusion matrix, accuracy, false positive rates, etc.)

Link to GitHub: https://github.com/OriginalityAI/AI-detector-research-tool

In addition to the tool mentioned above, we have provided three additional ways to easily run a dataset through our tool…

Our View On: AI Detectors Within Academia & False Positives in General

We do not believe that AI detection scores alone should be used for academic honesty purposes and disciplinary action.

The rate of false positives (even if low) is still too high to be relied upon for disciplinary action.

Here is a guide we created to help writers or students reduce false positives in AI content detector usage.

Plus, we created a free AI detector Chrome extension to help writers, editors, students, and teachers visualize the creation process and prove originality.

Our newly released Academic model is best for educators and academic settings, as it allows for light AI editing with popular tools like Grammarly (grammar and spelling suggestions) and is designed to be accurate on STEM-related content.

Learn more about Originality.ai for Education.

‍

How To Evaluate AI Detectors “Accuracy”:

Below are the best practices and methods used to evaluate the effectiveness of AI classifiers (i.e., AI content detectors). There is some nerdy data below, but if you are looking for even more info, here is a good primer for evaluating the performance of a classifier.

One single number related to a detector's effectiveness without additional context is useless!

Don’t trust a SINGLE “accuracy” number without additional context.

Here are the metrics we look at to evaluate a detector's efficacy…

Confusion Matrix

The confusion matrix and the F1 (more on it later) together are the most important measures we look at. In one image, you can quickly see the ability of an AI model to correctly identify both Original and AI-generated content.

True Positive (TP) – AI detector correctly identified content as AI.
False Negative (FN) – AI detector incorrectly identified AI content as Human.
False Positive (FP) – AI detector incorrectly identified human content as AI.
True Negative (TN) – AI detector correctly identified human content as AI.

*Version 1.4 Confusion Matrix on a GPT-4 & Human Dataset Test*

True Positive Rate — AI Text Detection Capabilities

Identifies AI content correctly x% of the time. True Positive Rate TPR (also known as sensitivity, hit rate or recall).

True Positive Rate TPR = TP / (TP & FN)

True Negative Rate — Human-Text Detection Capabilities:

Identifies human content correctly x% of the time. True Negative Rate TNR (also known as specificity or selectivity).

True Negative Rate TNR = TN / (TN & FP) = 1- FPR

Accuracy:

What % of your predictions were correct? Accuracy alone can provide a misleading number. This is in part why you should be skeptical of AI detectors' claimed “accuracy” numbers if they do not provide additional details for their accuracy numbers. The following metric is what we use, along with our open source tool to measure accuracy.

Accuracy = True / (True + False) = (TP + TN) / (TP + TN + FP + FN)

F1:

Combines Recall and Precision to create one measure to rank all detectors, often used when ranking multiple models. It calculates the harmonic mean of precision and sensitivity.

F1 = 2 x (PPV x TPR) / (PPV + TPR) where Precision (PPV) = TP / (TP + FP)

Metrics Considered but Not Used:

ROC & AUROC: Not used since we can't adjust the sensitivity of other tools and some tools do not provide a percentage.
Precision: PPV = TP / (TP + FP) - Not used

But… What Should Be Considered AI Content?

So, what should and should not be considered AI content? As “cyborg” writing combining humans and AI assistants rises, what should and shouldn’t be considered AI content is tricky!

Some studies have made some really strange decisions on what to claim as “ground truth” human or AI-generated content.

In fact, there was one study that used human-written text in multiple languages that was then translated (using AI tools) to English and called it “ground truth” Human content.

Source…

Description of Dataset:

Classifying the AI Translated Dataset (02-MT) as Human-written???

https://arxiv.org/pdf/2306.15666.pdf

We think this approach is crazy!

Our position is that if the effect of putting content into a machine is that the output from that machine is unrecognizable when comparing the two documents, then it should be the aim of an AI detector to identify the output text as AI-generated.

The alternative is that any content could be translated and presented as Original work since it would pass both AI and plagiarism detection.

What Does Light AI Editing Mean?

As the way people write evolves, there is an increased use of AI tools in research and editing.

First, let’s answer the question: “What is AI Editing?”

AI editing is the process of using an AI-powered tool as support to correct grammar, punctuation and spelling.

At Originality.ai, we offer an AI Grammar Checker to help you catch common errors like spelling mistakes, comma splices, or grammatical errors (like confusing when to use they’re vs. their).

However, where things get tricky with AI editing is when a tool offers AI-powered rephrasing features that effectively rewrite sentences for you. For instance, Grammarly, a popular writing tool, offers AI rephrasing that can trigger AI detection.

Here is how we classify light AI editing at Originality.ai:

As AI editing tools become increasingly popular, we’ve set targets to define AI editing:

5% AI editing, we aim to call it human.
- However, we have a 5% false positive rate at 5% AI editing
What if 10% of the text is AI-edited? We aim to call it human.
- However, we have a 10% false positive rate at 10% AI editing
Above 10% AI-editing? We aim to call the text AI.
Beyond 20% AI editing, our AI detector is highly accurate at identifying it as Likely AI.

What Should be Considered AI Writing?

Here is what we think should and should not be classified as AI-generated content:

AI-Generated and Not Edited = AI-Generated Text
AI-Generated and Human Edited = AI-Generated Text
Human Written and Heavily AI Edited = AI-Generated Text
AI Outline*, Human Written, and Heavily AI Edited = AI-Generated Text
Human Written and Lightly Edited with AI (~5% change) = Original Human-Generated
AI Research and Human Written = Original Human-Generated
Human Written and Human Edited = Original Human-Generated

*AI Outline is defined as using AI (an LLM) to create a content idea, do some research, and/or create an outline. The level at which AI is used during this process may vary and could potentially affect the likelihood the text is detected as AI or human.

Some journalists, such as Kristi Hines, have done a great job at trying to evaluate what AI content is and whether AI content detectors should be trusted by reviewing several studies - https://www.searchenginejournal.com/should-you-trust-an-ai-detector/491949/.

Review a meta-analysis of AI-detector accuracy studies for further insight into the efficacy of AI-detectors.

Originality.ai AI Detector Accuracy Tests

Lite 1.0.2 Tests

In September 2025, we released the updated, more robust Lite 1.0.2 AI detection model.

Why?

The rapid evolution of advanced AI language models, such as OpenAI's GPT series, Anthropic's Claude, and Google's Gemini, can produce increasingly human-like text.

At the same time, "AI Humanizer" tools — designed specifically to obfuscate AI-generated content and evade detection systems — are also increasing in popularity.

In response to these developments, we developed Lite 1.0.2, designed to accurately identify content generated by the latest AI models and humanizer tools, while maintaining a low false positive rate (ensuring that human-written content is not incorrectly flagged as AI-generated).

Lite AI Detection Accuracy Testing

99% accuracy across all leading flagship AI models from OpenAI, Gemini, Claude, and Deepseek‍
Maintains a lower 0.5% false positive rate (in comparison to Lite 1.0.0 = 0.73% FPR)

Lite AI Detection Accuracy Internal Benchmark
Dataset Name	TPR	FPR	FNR
Internal Benchmark: Recently Released Models	98.91%	0.52%	1.09%

Lite AI Detection Accuracy per Model

We evaluate our AI detection model on outputs from various state-of-the-art and widely used language models to assess its robustness across generations of AI systems.

This testing included accuracy evaluations with some of the latest AI models. Here’s a quick overview:

Lite 1.0.2 AI Detection Accuracy on the Latest AI Models
Model	Accuracy
GPT-5	99.6%
GPT-4.1-Nano	100%
GPT-4o	99.2%
GPT-4.1	99.8%
Claude Haiku	99.2%
Gemini 2.0 Flash	100%
DeepSeek V3.1 Chat	99.68%

‍

Lite AI Humanizer Testing

To further test the robustness of our detection model, we evaluate its performance against AI-generated content that has been deliberately modified using AI humanizer tools.

Most AI humanizer tools are designed to paraphrase or rephrase AI-written text in ways that make it more difficult for detection systems to identify. While some AI Humanizer tools like ours are designed to have the content sound more natural, but not to bypass detection.

Lite 1.0.2 AI Detection Accuracy on Popular AI Humanizers
AI Humanizer	Accuracy
Undetectable.ai	80.3%
Phrasly.ai	86.1%
Stealthwriter.ai	87.1%
Netus.ai	86.1%
Humbot.ai	81.2%

Turbo 3.0.2 Tests:

AI content continues to evolve rapidly, especially with the release of the GPT-5 family and the new DeepSeek reasoning models.

Academic and enterprise contexts require detectors that minimize false negatives (FN) to avoid missing AI-generated content, while keeping false positives (FP) low to protect human authors.

Turbo 3.0.2 is an upgraded model that significantly improves detection accuracy. It introduces enhanced defences against AI humanizer tools and demonstrates consistent robustness across public human-written datasets.

Turbo AI Detection Accuracy Testing

99%+ accuracy across all leading flagship AI models from OpenAI, Gemini, Claude, and Deepseek
Low 1.5% false positive rate
Robust against AI humanizer and bypasser tools with up to 97% accuracy.

Turbo AI Detection Accuracy Internal Benchmark
Dataset Name	TPR	FPR	FNR
Internal Benchmark: Recently Released Models	99.11%	1.1%	0.89%

Turbo AI Detection Accuracy per Model

As with our Lite model, we evaluate our Turbo AI detection model on outputs from the latest flagship LLMs.

This testing included accuracy evaluations with some of the latest AI models. Here’s a quick overview:

Turbo AI Detection Accuracy on the Latest AI Models
Model	Accuracy
GPT-5	99.8%
GPT-4.1-Nano	100%
GPT-4o	99.5%
GPT-4.1	99.9%
Claude Haiku	99.7%
Gemini 2.0 Flash	100%
DeepSeek V3.1 Chat	99.9%

Turbo 3.0.2 AI Humanizer Testing

Turbo 3.0.2 is robust in real-world adversarial conditions, delivering up to 97% accuracy against top AI humanizers and AI bypassers.

Turbo AI Detection Accuracy on Popular AI Humanizers
AI Humanizer	Accuracy
Netus.ai	97%
Undetectable.ai	92%
Phrasly.ai	92%
Stealthwriter.ai	94%
Humbot.ai	90%

‍

‍

Turbo 3.0.1: A Historical Look at Past Testing & Studies

See past studies on Turbo 3.0.1, as a historical look at some of our previous testing conducted as the latest AI models were released:

Turbo 3.0.1 Accuracy on the Latest AI Models
AI Model	Turbo 3.0.1 Accuracy
Claude 4 Sonnet and Opus	Claude 4 Sonnet = 98.4% accuracy Claude 4 Opus = 98.4% accuracy
GPT-4.5	GPT-4.5 = 95.5% accuracy
GPT-4.1	GPT-4.1 = 97.9% accuracy
Qwen2.5-Turbo	Qwen2.5-Turbo = 99.3% accuracy
Qwen2.5-Max	Qwen2.5-Max = 99.8% accuracy
DeepSeek	DeepSeek = 99.3% accuracy

‍

‍

Our newly released 3.0.2 Turbo model has continued to show exceptional accuracy and improvements, reflecting the dedicated work of our machine learning engineers to train our AI detection models and improve accuracy as new LLMs continue to be released.

Academic 0.0.5 Tests:

Academic AI Detection Accuracy Testing

99%+ accuracy across leading flagship AI models, OpenAI (GPT-5) and DeepSeek.
Low <1% false positive rate.
Robust against AI humanizer and bypasser tools with up to 92% accuracy.
Focused on academic content (STEM answers, including code and formulas)

Academic AI Detection Accuracy Internal Benchmark
Dataset Name	TPR	FPR	FNR
Internal Benchmark: (GPT-5 family, DeepSeek, Gemini-2.5)	99.77%	0.15%	0.23%

*Originality.ai* *Academic: 0.0.5 Confusion Matrix*

Academic AI Detection Accuracy per Model

As with our Lite and Turbo models, we evaluate our Academic AI detection model on outputs from the latest flagship LLMs.

This testing included accuracy evaluations with some of the latest AI models. Here’s a quick overview:

Academic AI Detection Accuracy on the Latest AI Models
Model	Accuracy
GPT-5	99.6%
DeepSeek V3.1 Chat	99.8%

‍

‍

Academic 0.0.5 AI Humanizer Testing

Despite the strong paraphrasing capabilities of AI humanizers, the Originality.ai Academic 0.0.5 Model delivered strong accuracy, up to 92% with top AI humanizers and AI bypassers.

Academic AI Detection Accuracy on Popular AI Humanizers
AI Humanizer	Accuracy
Netus.ai	92.1%
Undetectable.ai	82.4%
Phrasly.ai	89%
Stealthwriter.ai	89.1%
Humbot.ai	83.3%

‍

‍

Third-Party AI Detection Studies

Here are additional studies completed by 3rd parties and their findings showing Originality to be the most accurate…

Summaries of these studies: Meta-Analysis of AI Detection Accuracy

3rd Party AI Detection Studies
Study	Originality.ai’s Accuracy	Performance Highlights	Key Competitors
An Empirical Study of AI-Generated Text Detection Tools	97%	Highest true positives, Lowest false negatives	GPTZero, Writer
The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors	97%	100% accuracy on GPT-3.5 and GPT-4 papers	Copyleaks, TurnItIn
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors	85%	Most accurate across base and adversarial datasets, Exceptional performance on paraphrased content	Binoculars, FastDetectGPT
The great detectives: humans versus AI detectors in catching large language model-generated medical writing	100%	100% accuracy on ChatGPT-generated and AI-rephrased articles	ZeroGPT, GPT-2 Output Detector
Characterizing the Increase in AI Content Detection in Oncology Scientific Abstracts	96%	96% Accuracy for AI-generated (GPT-3.5, GPT-4) abstracts with over 95% sensitivity	GPTZero, Sapling
Students are using large language models and AI detectors can often detect their use	91%	Highest accuracy of 91% for Human vs AI and 82% for Human vs Disguised text	GPTZero, ZeroGPT, Winston
Exploring the Consequences of AI-Driven Academic Writing on Scholarly Practices	96.6%	Highest Mean Prediction Score of 96.5% for ChatGPT generated content and 96.7% for ChatGPT Revision of Human-authored content	ContentDetector.AI, ZeroGPT, GPTZero, Winston.ai
Recent Trend in Artificial Intelligence-Assisted Biomedical Publishing: A Quantitative Bibliometric Analysis	97.6% AUC	Excellent overall accuracy with an area under the receiver operating curve (AUC) of 97.6%.	Originality.ai, Copyleaks, Crossplag, GPT-2 Output Detector, GPT Zero, and Writer.
Comparative accuracy of AI-based plagiarism detection tools: an enhanced systematic review	98-100%	Near-perfect accuracy, demonstrating the highest overall accuracy of detectors studied.	Originality.ai, Turnitin AI, Sapling, and Winston AI (as well as: GPTZero, Copyleaks, ZeroGPT, Content at Scale, and GPT-2 Output Detector).
Using aggregated AI detector outcomes to eliminate false-positives in STEM-student writing	98%	Remarkable precision. Only 2% false positives and 2% false negatives, highlighting its superior reliability.	Originality.ai, Copyleaks, GPTZero, DetectGPT
The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text	Perfect (100%) or near-perfect accuracy on AI academic abstract datasets	Outperformed the research’s own fine-tuned detectors. For OpenAI social posts, Originality.ai reached F1-scores over 99%.	XLM-RoBERTa (fine-tuned multilingual model, used in the research as the detection baseline)
The accuracy-bias trade-offs in AI text detection tools and their impact on fairness in scholarly publication.	Lite: Highest overall accuracy of 98.61%	Lite and Turbo each achieved 99.07% accuracy with a 0% False Negative Rate, for samples from non-native English-speaking authors.	GPTZero, ZeroGPT , and DetectGPT

‍

The end result?

Across both internal testing and third-party studies, we continue to outperform competitors as the Most Accurate AI Detector.

Complete List of All AI Content Detectors:

Below is a list of all AI content detectors and a link to a review of each. For a more thorough comparison of all AI detectors and their features, have a look at this post: Best AI Content Detection Tools

List of Tools:

As these tests have shown, not all tools are created equal! There have been many quickly created tools that simply use a popular Open Source GPT-2 detector.

Why Is Our Model More Accurate?

Below are a few of the main reasons we suspect Originality.ai’s AI detection performance and overall AI detector accuracy are significantly better than alternatives…

Larger Model: We suspect (can’t confirm) that we use a much larger model… there is no way we could offer a free or ad-supported option given our models' compute cost per scan.
‍
Train on the Latest LLMs: New AI models are continuously being released. We train our AI detector to identify the latest LLMs.
‍‍
Train on Harder Datasets: The datasets we continue to create and train our AI on focus on increasingly adversarial detection bypassing methods. The better our AI gets, the more clever the prompt engineering or playground settings need to be to bypass us, and then we train on that new, more challenging dataset.

Final Thoughts

The AI/ML team and core product team at Originality.ai have worked relentlessly to build and improve on the most effective AI content detector!

The Results…

Originality.ai Launches Lite 1.0.2

Lite 1.0.2 has high accuracy at 99%, with a 0.5% False Positive Rate
Allows for light AI editing (such as Grammarly suggestions)
Resistant to popular bypassers and humanizers

Originality.ai Launches Version 3.0.2 Turbo

Turbo 3.0.2 has 99%+ accuracy, with a 1.5% false positive rate.
When your tolerance for AI in writing is near 0, then use Version 3.0.2 Turbo
Highly resistant to popular AI humanizing tools with up to 97% accuracy
If you allow for some minor edits with AI, then use Lite

Originality.ai Launches Version 0.0.5 Academic

Academic 0.0.5 has exceptional 99%+ accuracy and a low <1% false positive rate
Robust against popular AI humanizers and AI bypassers
Ideal for teachers and students accurate for STEM answers (Code and Formulas)

We hope this post will help you understand more about AI detectors, AI detector accuracy, and give you the tools to complete your own analysis if you want to.

We believe…

In transparent and accountable development and use of AI.
AI detectors have a role to play in mitigating some of the potential negative societal impacts of generative AI.
AI detection tool’s “accuracy” should be communicated with the same transparency and accountability that we want to see in AI’s development and use.

Our hope is that this study has moved us closer to achieving this and that our open-source initiatives will help others to be able to do the same.

If you have any questions on whether Originality.ai would be the right solution for your organization, please contact us.

If you are looking to run your own tests, please contact us. We are always happy to support any study (academic, journalist, or curious mind).

Additionally, to learn more about how Originality.ai performs in third-party academic research and studies, review our meta-analysis of accuracy studies.

Try our AI detector for yourself.

Jonathan Gillham

View All Posts By Author

Founder / CEO of Originality.ai I have been involved in the SEO and Content Marketing world for over a decade. My career started with a portfolio of content sites, recently I sold 2 content marketing agencies and I am the Co-Founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying content is original. I am not For or Against AI content, I think it has a place in everyones content strategy. However, I believe you as the publisher should be the one making the decision on when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!

We Have 99% Accuracy in Detecting AI: Originality.ai Study

We believe that it is crucial for AI content detectors reported accuracy to be open, transparent, and accountable. The reality is, each person seeking AI-detection services deserves to know which detector is the most accurate for their specific use case.

AI Detection Accuracy Testing Standard

In This AI Detector Accuracy Guide:

Key Takeaways (TL;DR)

Why Are AI Detectors Essential in 2025?

Do AI Detectors Work? OpenAI Says No???

Societal Impacts of Undetectable AI-Generated Content Are Real

SEO Implications of AI Content in Google

FTC Warns Against Unsupported AI Content Detection Accuracy Claims

So, we created this guide and tools, because we believe…

Originality.ai Version History:

Basic Explanation of How Our AI Detector Works

How AI Content Detectors Work:

1. Feature-Based Approach:

Burstiness

‍Perplexity

Frequency Features

Readability or Fluency Features

Punctuation

Advantages vs. Disadvantages

2. Zero-Shot Approach:

3. Fine-Tuning AI Model Approach:

AI Detector Accuracy Testing Plan:

What Is the Best Test? Use Your Own Data!

Testing Method & New Open-Source Testing Tools:

Our View On: AI Detectors Within Academia & False Positives in General

How To Evaluate AI Detectors “Accuracy”:

Confusion Matrix

True Positive Rate — AI Text Detection Capabilities

True Negative Rate — Human-Text Detection Capabilities:

Accuracy:

F1:

Metrics Considered but Not Used:

But… What Should Be Considered AI Content?

What Does Light AI Editing Mean?

First, let’s answer the question: “What is AI Editing?”

Here is how we classify light AI editing at Originality.ai:

What Should be Considered AI Writing?

Originality.ai AI Detector Accuracy Tests

Lite 1.0.2 Tests

Lite AI Detection Accuracy Testing

Lite AI Detection Accuracy per Model

Lite AI Humanizer Testing

Turbo 3.0.2 Tests:

Turbo AI Detection Accuracy Testing

Turbo AI Detection Accuracy per Model

Turbo 3.0.2 AI Humanizer Testing

Turbo 3.0.1: A Historical Look at Past Testing & Studies

Academic 0.0.5 Tests:

Academic AI Detection Accuracy Testing

Academic AI Detection Accuracy per Model

Academic 0.0.5 AI Humanizer Testing

Third-Party AI Detection Studies

Complete List of All AI Content Detectors:

Why Is Our Model More Accurate?

Final Thoughts

The Results…

We believe…

Jonathan Gillham

More From The Blog

AI Studies

Over 95% of Christmas & Holiday Recipes are Human-Written

AI Studies

Is GPT-5.2 Content Detectable?

AI Studies

10.9% of Reviews About Temu on Trustpilot are Likely AI in 2025

Al Content Detector & Plagiarism Checker for Marketers and Writers

Use our leading tools to ensure you can hit publish with integrity!

Tools

KNOWLEDGE BASE

Company