Empirical Study of AI-Generated Text Detection: Results

Originality exhibited outstanding AI detection capabilities, according to the 2023 paper An Empirical Study of AI-Generated Text Detection Tools.

According to An Empirical Study of AI-Generated Text Detection Tools by Arslan Akram (Department of Computer Science, Faculty of Computer Science and Information Technology, The Superior University, Pakistan), Originality outperforms the other detection tools tested at distinguishing AI-generated content from human-written content. The study found it to be the most reliable option for AI text identification.

Key Findings (TL;DR)

  • Originality achieved the highest accuracy of all the tools in the study, at 97%.
  • It demonstrated outstanding precision (98%), recall (96%), and F1-score (97%).
  • It stood out in the confusion matrix with the highest true positives and lowest false negatives, showing a strong ability to tell AI-generated content apart from human-written content.

Study Details

The study evaluated six AI text detection tools (Zylalab, GPTKIT, GPTZero, Sapling, Originality, and Writer), with a particular emphasis on their accuracy, precision, recall, and F1-score. Although all the tools fared well in the evaluations, Originality was the most effective. It demonstrated an exceptional capability to distinguish AI-generated text from human-written text.

AI Text Detection Tools

  • Six AI text detection tools (Zylalab, GPTKIT, GPTZero, Sapling, Originality, and Writer)


The total number of samples in the dataset is 11,580. 

The dataset, named AH&AITD (Arslan’s Human and AI Text Database), includes:

  • Human-written text samples from academic databases (Google Scholar and ResearchGate), content producer and blogger databases (Wikipedia), and other knowledge aggregators.
  • AI-generated text samples produced by models such as ChatGPT, GPT-4, GPT-3.5, GPT-3, and GPT-2.
Testing Dataset Samples (AH&AITD)

Evaluation Criteria

  • Accuracy, precision, recall, F1-score, ROC curve, and confusion matrix

Finding 1: Originality achieved the highest accuracy rate, 97.0%, among all evaluated tools

Accuracy Comparison of AI Text Detection Tools on AH&AITD

Finding 2: Originality demonstrated outstanding precision, recall, and F1-score

For AI-generated content, Originality showed:

  • Precision: 98%. Of the text it flagged as AI-generated, 98% really was AI-generated.
  • Recall: 96%. It caught 96% of the AI-generated text in the dataset.
  • F1-score: 97%. This is the harmonic mean of precision and recall, so a high value indicates a good balance between the two.
Precision, recall, and F1-score of AI Text Detection Tools for AI-Generated Content
Precision, recall, and F1-score of AI Text Detection Tools for Human-Written Content
Comparative Results of AI Text Detection Tools for Human and AI-Generated Text on AH&AITD
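
For readers who want the arithmetic behind these metrics, the standard textbook definitions are sketched below. They are the usual formulas, not quoted from the study, and they assume that "positive" means AI-generated text: TP and FN are AI samples that were caught or missed, and TN and FP are human samples that were cleared or wrongly flagged.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
\text{Precision} &= \frac{TP}{TP + FP} \\[4pt]
\text{Recall}    &= \frac{TP}{TP + FN} \\[4pt]
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
                  = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```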

Finding 3: Originality achieved the highest AUC among all tested tools

Originality produced the strongest ROC curve in the study, with an AUC score of 0.97, the highest of the tools compared. This means that Originality’s AI Checker has an excellent ability to distinguish between AI-generated and human-written text.

ROC Curves of AI Text Detection Tools on AH&AITD (Testing)
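
One way to read an AUC value is as the probability that a randomly chosen AI-generated sample receives a higher "AI" score than a randomly chosen human-written sample. The minimal sketch below illustrates that interpretation. The function name `roc_auc` and the toy scores are hypothetical, chosen only for illustration, and are not data or code from the study.

```python
from typing import Sequence


def roc_auc(ai_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Pairwise (rank-based) AUC: the share of AI/human pairs in which
    the AI sample gets the higher detector score. Ties count as half."""
    if not ai_scores or not human_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Hypothetical detector scores (higher = "more likely AI"); NOT from the study.
toy_ai_scores = [0.95, 0.80, 0.70, 0.40]
toy_human_scores = [0.60, 0.30, 0.20, 0.10]

toy_auc = roc_auc(toy_ai_scores, toy_human_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

In this toy case one AI sample (0.40) scores below one human sample (0.60), so 15 of the 16 pairs are ordered correctly. An AUC of 1.0 would mean every AI sample outscores every human sample, and 0.5 would be no better than chance.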

Finding 4: Originality stands out in the confusion matrix results

Originality showed exceptional performance with:

  • The highest true positives = 5,547 (Correctly identified AI text).
  • The lowest false negatives = 243 (AI text incorrectly identified as human).
  • The second lowest false positives = 94 (Human text incorrectly identified as AI).
  • The second highest true negatives = 5,696 (Correctly identified human text).

On the last two measures (false positives and true negatives), Originality ranked second, within a very close margin of 17 samples of the other tools.

Testing confusion matrices of AI text detection tools on AH&AITD
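
As a sanity check, the headline metrics can be recomputed from the four confusion matrix counts reported under Finding 4. The sketch below applies the standard definitions and treats AI-generated text as the positive class. The variable and function names are illustrative only and are not taken from the study.

```python
# Confusion matrix counts as reported under Finding 4 (positive class = AI text).
TP = 5547  # AI text correctly identified as AI
FN = 243   # AI text incorrectly identified as human
FP = 94    # human text incorrectly identified as AI
TN = 5696  # human text correctly identified as human


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp: int, fp: int) -> float:
    """Share of samples flagged as AI that really were AI."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of AI samples that the detector caught."""
    return tp / (tp + fn)


def f1_score(tp: int, fn: int, fp: int) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


total = TP + FN + FP + TN
print(f"samples   = {total}")
print(f"accuracy  = {accuracy(TP, FN, FP, TN):.4f}")
print(f"precision = {precision(TP, FP):.4f}")
print(f"recall    = {recall(TP, FN):.4f}")
print(f"f1        = {f1_score(TP, FN, FP):.4f}")
```

The four counts sum to the 11,580 samples in the dataset, and the recomputed values round to the 97% accuracy, 98% precision, 96% recall, and 97% F1-score quoted in the findings.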

Final Thoughts

In this study, Originality was the most reliable and effective tool for AI-generated text detection, with high accuracy, precision, and recall. The author’s in-depth research and fair evaluation provide a clear comparison of the performance of AI text detection tools. The results also support Originality as a strong choice for anyone who wants to ensure the authenticity of their work.

Jonathan Gillham

Founder / CEO of Originality. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites. More recently I sold two content marketing agencies, and I am the co-founder of a leading marketplace for buying and selling content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone's content strategy. However, I believe you as the publisher should be the one deciding when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
