
AI Detection Accuracy Studies — Meta-Analysis of 8 Studies

A comprehensive overview and meta-analysis of academic research and studies that demonstrate the exceptional performance of Originality.ai in detecting AI-generated text.

Across the studies below comparing AI detector accuracy, Originality.ai has consistently emerged as the most accurate AI text detector, outperforming a range of competing tools.

This article provides a meta-analysis of multiple research studies that showcase Originality.ai’s superior detection capabilities. These findings validate Originality.ai’s own AI detector accuracy study, showing outstanding performance in distinguishing AI-generated content from human-written text and providing reliable third-party evidence of our efficacy.

Key Findings (TL;DR)

Originality.ai’s AI Detector was identified as the most effective in all 8 published third-party studies below.

Originality.ai stands out as the most accurate tool for AI-generated text detection across multiple studies with high precision, recall, and overall accuracy. Originality.ai’s AI Content Checker has consistently outperformed other tools in detecting AI content and ensuring the authenticity of human-written text.

The following studies were analyzed to assess the accuracy of AI-generated text detection tools.

An Empirical Study of AI-Generated Text Detection Tools

The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors

RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

The Great Detectives: Humans vs. AI Detectors in Catching Large Language Model-Generated Medical Writing

Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023

Students are using large language models and AI detectors can often detect their use

Exploring the Consequences of AI-Driven Academic Writing on Scholarly Practices

Recent Trend in Artificial Intelligence-Assisted Biomedical Publishing: A Quantitative Bibliometric Analysis


Rankings

| Study Title | Originality.ai’s Accuracy | Performance Highlights | Key Competitors |
| --- | --- | --- | --- |
| An Empirical Study of AI-Generated Text Detection Tools | 97% | Highest true positives, lowest false negatives | GPTZero, Writer |
| The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors | 97% | 100% accuracy on GPT-3.5 and GPT-4 papers | Copyleaks, TurnItIn |
| RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors | 85% | Most accurate across base and adversarial datasets; exceptional performance on paraphrased content | Binoculars, FastDetectGPT |
| The Great Detectives: Humans vs. AI Detectors in Catching Large Language Model-Generated Medical Writing | 100% | 100% accuracy on ChatGPT-generated and AI-rephrased articles | ZeroGPT, GPT-2 Output Detector |
| Characterizing the Increase in AI Content Detection in Oncology Scientific Abstracts | 96% | 96% accuracy for AI-generated (GPT-3.5, GPT-4) abstracts with over 95% sensitivity | GPTZero, Sapling |
| Students are using large language models and AI detectors can often detect their use | 91% | Highest accuracy of 91% for human vs. AI and 82% for human vs. disguised text | GPTZero, ZeroGPT, Winston |
| Exploring the Consequences of AI-Driven Academic Writing on Scholarly Practices | 96.6% | Highest mean prediction scores of 96.5% for ChatGPT-generated content and 96.7% for ChatGPT revisions of human-authored content | ContentDetector.AI, ZeroGPT, GPTZero, Winston.ai |
| Recent Trend in Artificial Intelligence-Assisted Biomedical Publishing: A Quantitative Bibliometric Analysis | 97.6% AUC | Excellent overall accuracy with an area under the receiver operating characteristic curve (AUC) of 97.6% | Copyleaks, Crossplag, GPT-2 Output Detector, GPTZero, Writer |

Study Summaries

Study 1: An Empirical Study of AI-Generated Text Detection Tools

Based on An Empirical Study of AI-Generated Text Detection Tools, Originality.ai is the leading tool in detecting AI-generated text, achieving the highest accuracy rate of 97%, outperforming five other tools in identifying human-written content.

(Accuracy Comparison of AI Text Detection Tools on AH&AITD)

Key Findings

  • Accuracy: 97%
  • Precision: 98%
  • Recall: 96%
  • F1-score: 97%

Study Details

  • Tools Evaluated: Originality.ai, Zylalab, GPTKIT, GPTZero, Sapling, Writer
  • Dataset: 11,580 samples from AH&AITD dataset
  • Evaluation Criteria: Accuracy, Precision, Recall, F1 score, ROC curve, Confusion Matrix

Performance Highlights

  • Highest True Positives: 5,547
  • Lowest False Negatives: 243
  • Second Lowest False Positives: 94
  • Second Highest True Negatives: 5,696
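The reported metrics follow directly from these confusion-matrix counts. A minimal sketch that recomputes them (the rounding to whole percentages is ours):

```python
# Confusion-matrix counts reported for Originality.ai on the AH&AITD dataset.
tp, fn, fp, tn = 5547, 243, 94, 5696

total = tp + fn + fp + tn          # 11,580 — matches the dataset size
accuracy = (tp + tn) / total       # share of all samples classified correctly
precision = tp / (tp + fp)         # share of AI flags that were truly AI
recall = tp / (tp + fn)            # share of AI samples that were caught
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
# → 0.97 0.98 0.96 0.97
```

These reproduce the study’s 97% accuracy, 98% precision, 96% recall, and 97% F1-score.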

Source

https://www.opastpublishers.com/peer-review/an-empirical-study-of-aigenerated-text-detection-tools-6354.html

Study 2: The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors

According to this comprehensive study on “The Effectiveness of Software Designed to Detect AI-Generated Writing,” which evaluated 16 AI text detectors, Originality.ai demonstrated remarkable accuracy in identifying AI-generated content. It ranked as a top performer across GPT-3.5, GPT-4, and human-written papers with an overall accuracy of 97%.

(% of all 126 documents for which each detector gave correct, uncertain, or incorrect responses)

Key Findings

  • Overall Accuracy: 97%
  • GPT-3.5 Accuracy: 100%
  • GPT-4 Accuracy: 100%
  • Human Papers Accuracy: 95%

Study Details

  • Tools Evaluated: Originality.ai, Copyleaks, TurnItIn, Scribbr, ZeroGPT, Grammica, GPTZero, Crossplag, OpenAI, IvyPanda, GPT Radar, SEO.ai, Content at Scale, Writer, Sapling, ContentDetector.ai
  • Top Performers: Originality.ai, Copyleaks, TurnItIn
  • Dataset: 126 short papers/essays that were generated by AI or first-year college students.
  • Evaluation Criteria: Overall accuracy, accuracy with each type of document, decisiveness, the number of false positives, and the number of false negatives.

Performance Highlights

  • Overall Accuracy: Very high
  • Accuracy, GPT-3.5: Very high
  • Accuracy, GPT-4: Very high
  • Decisiveness: High
  • False Positives: Few
  • False Negatives: Few

Source

https://www.degruyter.com/document/doi/10.1515/opis-2022-0158/html

Study 3: RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

In the largest and most comprehensive study to date, RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors, Originality.ai outperformed 11 leading AI detectors in identifying AI-generated content, achieving a remarkable 85% accuracy on the base dataset and 96.7% accuracy on paraphrased content.

Key Findings

  • Base Dataset Accuracy: 85%
  • Adversarial Techniques: 1st in 9 out of 11 tests
  • Content Domains: 1st in 5 out of 8 domains
  • Paraphrased Content Accuracy: 96.7%

Study Details

  • Tools Evaluated:
    • Commercial: Originality.ai, GPTZero, Winston, ZeroGPT 
    • Metric-Based: GLTR, Binoculars, Fast DetectGPT, LLMDet 
    • Neural: RoBERTa-Base (GPT2), RoBERTaLarge (GPT2), RoBERTa-Base (ChatGPT), RADAR 
  • Dataset: 6,287,820 texts
  • Evaluation Criteria
    • 11 Types of Adversarial attacks (strategies to make text undetectable)
    • Accuracy at a fixed 5% false positive rate for all tests
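The 5% false-positive criterion works by calibrating each detector’s decision cutoff on human-written text first, then measuring how much machine-generated text still scores above that cutoff. A minimal sketch of that procedure, using synthetic scores rather than the RAID data:

```python
import random

def accuracy_at_fpr(human_scores, machine_scores, target_fpr=0.05):
    # Choose the decision threshold so that at most `target_fpr` of
    # human-written texts are (falsely) flagged as AI-generated.
    ranked = sorted(human_scores)
    cutoff_index = int(len(ranked) * (1 - target_fpr))
    threshold = ranked[min(cutoff_index, len(ranked) - 1)]
    # Detection accuracy = share of machine texts scoring above the threshold.
    detected = sum(1 for s in machine_scores if s > threshold)
    return detected / len(machine_scores)

random.seed(0)
# Synthetic detector scores: human text clusters low, machine text clusters high.
human = [random.gauss(0.2, 0.1) for _ in range(1000)]
machine = [random.gauss(0.8, 0.15) for _ in range(1000)]
acc = accuracy_at_fpr(human, machine, target_fpr=0.05)
```

Fixing the false-positive rate makes scores comparable across detectors that otherwise use very different internal thresholds.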

Performance Highlights

  • Most Accurate AI Detector on Base Dataset
  • Most Accurate AI Detector on Adversarial Datasets
  • The Most Accurate AI Detector Across All Domains
  • Exceptional Performance on Paraphrased Content

Source

https://arxiv.org/abs/2405.07940

Study 4: The Great Detectives: Humans versus AI Detectors in Catching Large Language Model-generated Medical Writing

The study, The Great Detectives: Humans versus AI Detectors in Catching Large Language Model-generated Medical Writing, directly compares the accuracy of advanced AI detectors and human reviewers in detecting AI-generated medical writing after paraphrasing.

Six common AI content detectors and four human reviewers were tasked with differentiating between the original and AI-generated articles. Originality.ai emerged as the most sensitive and accurate platform for detecting AI-generated (including paraphrased) content.

(Accuracy of six AI content detectors in identifying AI-generated articles)

Key Findings

  • ChatGPT-Generated Articles Accuracy: 100%
  • AI-Rephrased Articles Accuracy: 100% 
  • Human evaluators performed worse than AI detectors

Study Details

  • Tools Evaluated
    • Six AI detectors: Originality.ai, TurnItIn, GPTZero, ZeroGPT, Content at Scale, GPT-2 Output Detector
    • Four Human Reviewers: two student reviewers and two professorial reviewers
  • Dataset: 150 texts (academic papers) 
  • Evaluation Criteria: AI score or Perplexity score

Performance Highlights

  • Only AI detector to identify 100% of AI Content
  • Only AI detector to identify 100% AI-Rephrased Content

Source

https://link.springer.com/article/10.1007/s40979-024-00155-6

Study 5: Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023

The study Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023 examines the effectiveness of three AI-content detectors (Originality.ai, GPTZero, and Sapling) in identifying AI-generated content in scientific abstracts submitted to the ASCO Annual Meetings from 2021 to 2023.

(Accuracy of AI content detectors in classifying human-written and AI-generated content)

Key Findings

  • Perfect AUROC scores of 1.00 for GPT-3.5 and nearly perfect for GPT-4
  • High AUPRC for distinguishing AI-generated from human-written abstracts

Study Details

  • Three Tools Evaluated: Originality.ai, GPTZero, Sapling
  • Dataset: 15,553 oncology scientific abstracts from ASCO Annual Meetings (2021-2023) 
  • Evaluation Criteria: AUPRC, AUROC, Brier Score

Performance Highlights

  • GPT-3.5 vs. Human: 99.7%
  • GPT-4 vs. Human: 98.7%
  • Mixed GPT-3.5 vs. Human: 87.8%
  • Mixed GPT-4 vs. Human: 81.5%

Source

https://ascopubs.org/doi/pdfdirect/10.1200/CCI.24.00077

Study 6: Students Are Using Large Language Models and AI Detectors Can Often Detect Their Use

The study Students are using large language models and AI detectors can often detect their use, aimed to explore how students use LLMs in their college work at the University of Wisconsin-Madison and evaluate the effectiveness of AI Detectors in identifying AI-generated text. 

They evaluated five AI detectors (Content at Scale, GPTZero, ZeroGPT, Winston, and Originality.ai); however, due to poor performance, Content at Scale was excluded from further analysis.

(Accuracy of AI content detectors)

Key Findings

  • Highest Accuracy of 91% for Human vs. AI and 82% for Human vs Disguised Text
  • Top F1 Score of 92% for Human vs. AI and a near-top score of 80% for Human vs. Disguised Text

Study Details

  • Four Tools Evaluated: Originality.ai, GPTZero, Winston, ZeroGPT
  • Dataset: 459 unique essays on the regulation of the tryptophan operon (human-written, AI-generated, disguised AI-generated) 
  • Evaluation Criteria: Accuracy, Precision, Recall, F1 score

Performance Highlights

  • Accuracy (Human vs. AI): 0.91
  • Precision  (Human vs. AI): 0.85
  • Recall (Human vs. AI): 1.0
  • F1 Score (Human vs. AI): 0.92
  • F1 Score (Human vs. Disguised): 0.80

Source

https://www.frontiersin.org/articles/10.3389/feduc.2024.1374889/full

Study 7: Exploring the Consequences of AI-Driven Academic Writing on Scholarly Practices

The University of Florida study on AI detection in Academic Writing, “This Paper Was Written with the Help of ChatGPT: Exploring the Consequences of AI-Driven Academic Writing on Scholarly Practices,” aimed to explore the effectiveness of various AI content detection tools in differentiating between AI-generated content and human-written copy in academic writing. 

They evaluated five AI detectors: Originality.ai, ContentDetector.AI, ZeroGPT, GPTZero, and Winston.ai.

Key Findings

  • Highest mean prediction scores in 4 out of 5 categories across two datasets, with GPTR (ChatGPT revision of human-authored content) peaking at 99.3% in the EDM dataset and 94.1% in the LAK dataset
  • Lowest Error Rate of 3.8% for EDM Dataset and 17.7% for LAK Dataset

Study Details

  • The Tools Evaluated: Originality.ai, ContentDetector.AI, ZeroGPT, GPTZero, and Winston.ai.
  • Dataset: Titles and Abstracts from the LAK22 and EDM2022 conference proceedings (Human-authored, ChatGPT (GPT-4-Turbo Model), ChatGPT Revision of Human-authored, 50% ChatGPT + 50% Human-authored, 50% Human-authored + 50% ChatGPT)
  • Evaluation Criteria: Mean Prediction Scores, Root Mean Square Error (RMSE), Area Under the Curve (AUC)
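The RMSE criterion measures how far a detector’s prediction scores fall from the ideal score for a category (100 for AI-generated text, 0 for human-written text); lower is better. A minimal illustration with hypothetical scores, not the study’s data:

```python
import math

def rmse(predictions, target):
    # Root Mean Square Error between detector scores (0-100 scale) and the
    # ideal score for the category (100 for AI text, 0 for human text).
    return math.sqrt(sum((p - target) ** 2 for p in predictions) / len(predictions))

# Hypothetical detector scores for ChatGPT-generated abstracts:
scores = [99.0, 97.0, 94.0, 100.0]
error = rmse(scores, 100)
# → sqrt((1 + 9 + 36 + 0) / 4) = sqrt(11.5) ≈ 3.39
```

A low RMSE on AI-generated text means the detector scored that text close to 100% AI, which is what the study’s mean prediction scores and error rates capture.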

Performance Highlights

  • Mean Prediction Score (EDM Dataset): GPTR - 99.30%, GPT - 97.50%
  • Mean Prediction Score (LAK Dataset): GPTR - 94.10%, GPT - 95.50%
  • RMSE (EDM Dataset): GPTR - 3.80%, GPT - 10.10%
  • RMSE (LAK Dataset): GPTR - 17.70%, GPT - 17.20%

Source

https://educationaldatamining.org/edm2024/proceedings/2024.EDM-short-papers.55/2024.EDM-short-papers.55.pdf

Study 8: Recent Trend in Artificial Intelligence-Assisted Biomedical Publishing

The rise of AI-generated content in biomedical publishing has created a demand for reliable AI text detection tools. 

A recent bibliometric study, “Recent Trend in Artificial Intelligence-Assisted Biomedical Publishing: A Quantitative Bibliometric Analysis,” analyzed trends in AI-assisted content within peer-reviewed biomedical literature and compared the performance of various AI-detection tools. 

Originality.ai showed impressive results in this study, standing out with its superior accuracy and effectiveness compared to other AI detectors.

(Trends in published abstracts by the predicted probability of AI-generated text)

Key Findings

  • Originality.ai achieved 100% sensitivity and 95% specificity in detecting AI-generated content.
  • Originality.ai demonstrated excellent overall accuracy with an area under the receiver operating characteristic curve (AUC) of 97.6%.
  • AI-generated content in biomedical literature increased from 21.7% to 36.7% between 2020 and 2023, as detected by Originality.ai.

Study Details

  • Six Tools Evaluated: Originality.ai, Copyleaks, Crossplag, GPT-2 Output Detector, GPTZero, and Writer
  • Dataset: Abstracts from peer-reviewed journals indexed in MEDLINE between 2020 and 2023.
    • 390 randomized controlled trial abstracts from MEDLINE — randomly (30 abstracts per quarter) selected between January 2020 and March 2023. 
    • 60 abstracts — generated using ChatGPT to test the sensitivity of the AI detectors.
    • 60 abstracts — selected from the 1980s, when AI usage was minimal, were used to test specificity.
  • Evaluation Criteria
    • Sensitivity (the ability to correctly detect AI-generated text).
    • Specificity (the ability to correctly identify human-generated text).
    • Overall accuracy (represented by the AUC).
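The AUC here can be read as the probability that a randomly chosen AI-generated abstract receives a higher detector score than a randomly chosen human-written one. A small rank-based sketch of that interpretation, using hypothetical scores rather than the study’s data:

```python
def auc(ai_scores, human_scores):
    # Rank-based AUC: probability that a randomly chosen AI-generated
    # abstract scores higher than a randomly chosen human-written one
    # (ties count as half a win).
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))

# Toy detector scores (hypothetical values, not the study's data):
ai = [0.95, 0.88, 0.99, 0.70]
human = [0.10, 0.35, 0.05, 0.80]
print(auc(ai, human))  # → 0.9375
```

An AUC of 97.6%, as reported for Originality.ai, means its scores ranked AI-generated abstracts above human-written ones in nearly every pairing.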

Performance Highlights

  • Finding 1: Originality.ai achieved 100% sensitivity in detecting AI-generated abstracts.
  • Finding 2: Originality.ai demonstrated 95% specificity, correctly identifying human-written abstracts with minimal false positives.
  • Finding 3: Originality.ai showed strong discriminatory ability with an AUC of 97.6%.

Source

https://assets.cureus.com/uploads/review_article/pdf/158398/20230618-14395-7fhu27.pdf

Further Reading

We conducted an analysis based on the third-party study “ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination,” accessible through Cornell University. While the authors didn't include Originality.ai in the original study, we ran a comparative analysis using the study's dataset to evaluate the robustness of the Originality.ai AI detector.

Our analysis with the ESPERANTO dataset found that Originality.ai demonstrated robust performance and strong resilience to back-translation. Read the full results of our Originality.ai analysis with the ESPERANTO dataset here.

Jonathan Gillham

Founder / CEO of Originality.ai. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites; I recently sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences, I understand what web publishers need when it comes to verifying that content is original. I am neither for nor against AI content; I think it has a place in everyone’s content strategy. However, I believe you, as the publisher, should be the one deciding when to use AI content. Our originality checking tool has been built with serious web publishers in mind!
