AI Detection Accuracy Studies — Meta-Analysis of 6 Studies

A comprehensive overview and meta-analysis of academic research and studies that demonstrate the exceptional performance of Originality.ai in detecting AI-generated text.

Across the studies below, each of which compares the accuracy of AI detectors, Originality.ai has consistently emerged as the most accurate AI text detector, outperforming a wide range of other tools.

This article provides a meta-analysis of multiple research studies that showcase Originality.ai’s superior detection capabilities. These findings validate Originality.ai’s own AI detector accuracy study and provide reliable third-party evidence that Originality.ai can distinguish AI-generated content from human-written text.

Key Findings (TL;DR)

The Originality.ai AI Detector was identified as the most effective tool in all six of the published third-party studies below.

Originality.ai stands out as the most accurate tool for AI-generated text detection across multiple studies with high precision, recall, and overall accuracy. Originality.ai’s AI Content Checker has consistently outperformed other tools in detecting AI content and ensuring the authenticity of human-written text.

The following studies have been analyzed to assess the accuracy of AI-generated Text Detection Tools.

An Empirical Study of AI-Generated Text Detection Tools

The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors

RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

The Great Detectives: Humans vs. AI Detectors in Catching Large Language Model-Generated Medical Writing

Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023

Students are using large language models and AI detectors can often detect their use

Rankings

Study Title | Originality.ai’s Accuracy | Performance Highlights | Key Competitors
An Empirical Study of AI-Generated Text Detection Tools | 97% | Highest true positives, lowest false negatives | GPTZero, Writer
The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors | 97% | 100% accuracy on GPT-3.5 and GPT-4 papers | Copyleaks, TurnItIn
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors | 85% | Most accurate across base and adversarial datasets; exceptional performance on paraphrased content | Binoculars, FastDetectGPT
The Great Detectives: Humans vs. AI Detectors in Catching Large Language Model-Generated Medical Writing | 100% | 100% accuracy on ChatGPT-generated and AI-rephrased articles | ZeroGPT, GPT-2 Output Detector
Characterizing the Increase in AI Content Detection in Oncology Scientific Abstracts | 96% | 96% accuracy for AI-generated (GPT-3.5, GPT-4) abstracts with over 95% sensitivity | GPTZero, Sapling
Students are using large language models and AI detectors can often detect their use | 91% | Highest accuracy of 91% for human vs. AI and 82% for human vs. disguised text | GPTZero, ZeroGPT, Winston

Study Summaries

Study 1: An Empirical Study of AI-Generated Text Detection Tools

Based on An Empirical Study of AI-Generated Text Detection Tools, Originality.ai is the leading tool for detecting AI-generated text, achieving the highest accuracy of 97% and outperforming five other tools at distinguishing AI-generated from human-written content.

(Accuracy Comparison of AI Text Detection Tools on AH&AITD)

Key Findings

  • Accuracy: 97%
  • Precision: 98%
  • Recall: 96%
  • F1-score: 97%

Study Details

  • Tools Evaluated: Originality.ai, Zylalab, GPTKIT, GPTZero, Sapling, Writer
  • Dataset: 11,580 samples from AH&AITD dataset
  • Evaluation Criteria: Accuracy, Precision, Recall, F1 score, ROC curve, Confusion Matrix

Performance Highlights

  • Highest True Positives: 5,547
  • Lowest False Negatives: 243
  • Second Lowest False Positives: 94
  • Second Highest True Negatives: 5,696
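
These counts cover all 11,580 samples in the dataset, and the study’s headline metrics follow directly from them. The short sketch below (plain Python; the variable names are ours, not the study’s) reproduces the reported accuracy, precision, recall, and F1 score from the counts above:

```python
# Confusion-matrix counts reported for Originality.ai on the 11,580-sample AH&AITD dataset
tp = 5547  # AI-generated texts correctly flagged as AI
fn = 243   # AI-generated texts missed (labelled human)
fp = 94    # human-written texts wrongly flagged as AI
tn = 5696  # human-written texts correctly labelled human

accuracy = (tp + tn) / (tp + tn + fp + fn)                 # ~0.97
precision = tp / (tp + fp)                                 # ~0.98
recall = tp / (tp + fn)                                    # ~0.96
f1 = 2 * precision * recall / (precision + recall)         # ~0.97

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 score:  {f1:.2%}")
```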

Source

https://www.opastpublishers.com/peer-review/an-empirical-study-of-aigenerated-text-detection-tools-6354.html

Study 2: The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors

According to this comprehensive study on “The Effectiveness of Software Designed to Detect AI-Generated Writing,” in which 16 AI text detectors were evaluated, Originality.ai demonstrated remarkable accuracy in identifying AI-generated content. It ranked as a top performer across GPT-3.5, GPT-4, and human-written papers with an overall accuracy of 97%.

(% of all 126 documents for which each detector gave correct, uncertain, or incorrect responses)

Key Findings

  • Overall Accuracy: 97%
  • GPT-3.5 Accuracy: 100%
  • GPT-4 Accuracy: 100%
  • Human Papers Accuracy: 95%

Study Details

  • Tools Evaluated: Originality.ai, Copyleaks, TurnItIn, Scribbr, ZeroGPT, Grammica, GPTZero, Crossplag, OpenAI, IvyPanda, GPT Radar, SEO.ai, Content at Scale, Writer, Sapling, ContentDetector.ai
  • Top Performers: Originality.ai, Copyleaks, TurnItIn
  • Dataset: 126 short papers/essays that were generated by AI or first-year college students.
  • Evaluation Criteria: Overall accuracy, accuracy with each type of document, decisiveness, the number of false positives, and the number of false negatives.

Performance Highlights

  • Overall Accuracy: 97% (among the highest of the 16 detectors)
  • GPT-3.5 Accuracy: 100%
  • GPT-4 Accuracy: 100%
  • Decisiveness: High (few “uncertain” responses)
  • False Positives: Few
  • False Negatives: Few

Source

https://www.degruyter.com/document/doi/10.1515/opis-2022-0158/html

Study 3: RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

In the largest and most comprehensive study to date, RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors, Originality.ai outperformed 11 other leading AI detectors at identifying AI-generated content, achieving an accuracy of 85% on the base dataset and 96.7% on paraphrased content.

Key Findings

  • Base Dataset Accuracy: 85%
  • Adversarial Techniques: 1st in 9 out of 11 tests
  • Content Domains: 1st in 5 out of 8 domains
  • Paraphrased Content Accuracy: 96.7%

Study Details

  • Tools Evaluated:
    • Commercial: Originality.ai, GPTZero, Winston, ZeroGPT 
    • Metric-Based: GLTR, Binoculars, Fast DetectGPT, LLMDet 
    • Neural: RoBERTa-Base (GPT2), RoBERTaLarge (GPT2), RoBERTa-Base (ChatGPT), RADAR 
  • Dataset: 6,287,820 texts
  • Evaluation Criteria:
    • 11 Types of Adversarial attacks (strategies to make text undetectable)
    • Accuracy at 5% False Positive Threshold for all tests 
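
As the evaluation criteria above indicate, RAID measures accuracy at a 5% false positive threshold: a score cutoff is chosen so that no more than 5% of human-written texts are flagged, and accuracy is then the share of machine-generated texts caught at that cutoff. The sketch below (plain Python with NumPy; the score arrays are placeholders, not RAID data) illustrates the calibration step:

```python
import numpy as np

def accuracy_at_fpr(human_scores, machine_scores, target_fpr=0.05):
    """Calibrate a detection threshold on human-written texts so the false
    positive rate is at most `target_fpr`, then report the share of
    machine-generated texts detected at that threshold."""
    human_scores = np.sort(np.asarray(human_scores))
    # Threshold = the (1 - target_fpr) quantile of human scores:
    # only ~5% of human-written texts score above it.
    threshold = np.quantile(human_scores, 1 - target_fpr)
    detection_rate = float(np.mean(np.asarray(machine_scores) > threshold))
    return threshold, detection_rate

# Placeholder scores (0 = confidently human, 1 = confidently AI)
human_scores = np.random.beta(2, 8, size=1000)    # mostly low scores
machine_scores = np.random.beta(8, 2, size=1000)  # mostly high scores
thr, acc = accuracy_at_fpr(human_scores, machine_scores)
print(f"Threshold at 5% FPR: {thr:.3f}, detection accuracy: {acc:.1%}")
```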

Performance Highlights

  • Most Accurate AI Detector on the Base Dataset
  • Most Accurate AI Detector on 9 of 11 Adversarial Attack Types
  • Most Accurate AI Detector in 5 of 8 Content Domains
  • Exceptional Performance on Paraphrased Content (96.7%)

Source

https://arxiv.org/abs/2405.07940

Study 4: The Great Detectives: Humans versus AI Detectors in Catching Large Language Model-generated Medical Writing

The study, The Great Detectives: Humans versus AI Detectors in Catching Large Language Model-generated Medical Writing, directly compares the accuracy of advanced AI detectors and human reviewers in detecting AI-generated medical writing after paraphrasing.

Six common AI content detectors and four human reviewers were employed to differentiate between the original and AI-generated articles. Originality.ai emerged as the most sensitive and accurate platform for detecting AI-generated (including paraphrased) content.

(Accuracy of six AI content detectors in identifying AI-generated articles)

Key Findings

  • ChatGPT-Generated Articles Accuracy: 100%
  • AI-Rephrased Articles Accuracy: 100% 
  • Human evaluators performed worse than AI detectors

Study Details

  • Tools Evaluated:
    • Six AI detectors: Originality.ai, TurnItIn, GPTZero, ZeroGPT, Content at Scale, GPT-2 Output Detector
    • Four human reviewers: two student reviewers and two professorial reviewers
  • Dataset: 150 texts (academic papers) 
  • Evaluation Criteria: AI score or Perplexity score
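
Several detectors in this space score text by its perplexity: how predictable a passage is to a language model, with unusually low perplexity treated as a signal of machine generation. The sketch below is a minimal, hypothetical illustration of computing such a score with the Hugging Face transformers library and GPT-2; it is not the scoring method of any specific tool in the study.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: exp of the mean token-level loss."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("The mitochondria is the powerhouse of the cell."))
```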

Performance Highlights

  • Only AI detector to identify 100% of AI-generated content
  • Only AI detector to identify 100% of AI-rephrased content

Source

https://link.springer.com/article/10.1007/s40979-024-00155-6

Study 5: Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023

The study Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023 examines the effectiveness of three AI-content detectors (Originality.ai, GPTZero, and Sapling) in identifying AI-generated content in scientific abstracts submitted to the ASCO Annual Meetings from 2021 to 2023.

(Accuracy of AI content detectors in classifying human-written and AI-generated content)

Key Findings

  • Perfect AUROC scores of 1.00 for GPT-3.5 and nearly perfect for GPT-4
  • High AUPRC for distinguishing AI-generated from human-written abstracts

Study Details

  • Three Tools Evaluated: Originality.ai, GPTZero, Sapling
  • Dataset: 15,553 oncology scientific abstracts from ASCO Annual Meetings (2021-2023) 
  • Evaluation Criteria: AUPRC, AUROC, Brier Score
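
For readers unfamiliar with these metrics, the sketch below shows one standard way to compute them with scikit-learn; the labels and scores are placeholders rather than the study’s data.

```python
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss

# Placeholder data: 1 = AI-generated abstract, 0 = human-written abstract;
# scores = a detector's probability that each abstract is AI-generated.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.96, 0.88, 0.91, 0.12, 0.35, 0.07, 0.79, 0.22]

auroc = roc_auc_score(y_true, scores)            # area under the ROC curve
auprc = average_precision_score(y_true, scores)  # area under the precision-recall curve
brier = brier_score_loss(y_true, scores)         # mean squared error of the probabilities

print(f"AUROC: {auroc:.2f}  AUPRC: {auprc:.2f}  Brier: {brier:.3f}")
```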

Performance Highlights

  • GPT-3.5 vs. Human: 99.7%
  • GPT-4 vs. Human: 98.7%
  • Mixed GPT-3.5 vs. Human: 87.8%
  • Mixed GPT-4 vs. Human: 81.5%

Source

https://ascopubs.org/doi/pdfdirect/10.1200/CCI.24.00077

Study 6: Students Are Using Large Language Models and AI Detectors Can Often Detect Their Use

The study Students are using large language models and AI detectors can often detect their use explored how students use LLMs in their college work at the University of Wisconsin-Madison and evaluated the effectiveness of AI detectors at identifying AI-generated text.

The researchers evaluated five AI detectors (Content at Scale, GPTZero, ZeroGPT, Winston, and Originality.ai); however, due to poor performance, Content at Scale was excluded from further analysis.

(Accuracy of AI content detectors)

Key Findings

  • Highest Accuracy of 91% for Human vs. AI and 82% for Human vs. Disguised Text
  • Top F1 Score of 92% for Human vs. AI and a near-top score of 80% for Human vs. Disguised Text

Study Details

  • Tools Analyzed: Originality.ai, GPTZero, Winston, ZeroGPT
  • Dataset: 459 unique essays on the regulation of the tryptophan operon (human-written, AI-generated, disguised AI-generated) 
  • Evaluation Criteria: Accuracy, Precision, Recall, F1 score

Performance Highlights

  • Accuracy (Human vs. AI): 0.91
  • Precision (Human vs. AI): 0.85
  • Recall (Human vs. AI): 1.0
  • F1 Score (Human vs. AI): 0.92
  • F1 Score (Human vs. Disguised): 0.80
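
As a quick consistency check, the reported F1 score follows from the precision and recall above: F1 = 2 × (0.85 × 1.0) / (0.85 + 1.0) ≈ 0.92.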

Source

https://www.frontiersin.org/articles/10.3389/feduc.2024.1374889/full

Jonathan Gillham

Founder / CEO of Originality.ai. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites; recently, I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences, I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone’s content strategy. However, I believe you, as the publisher, should be the one deciding when to use AI content. Our originality checking tool has been built with serious web publishers in mind!
