Is GPT-4o Content Detectable?

Is GPT-4o fooling AI detectors? Our study reveals 96% accuracy in spotting GPT-4o content. Test your own content and see if GPT-4o flies under the radar.

With the release of OpenAI’s latest AI model, the GPT-4o “omnimodel”, there is a need to understand how accurately our AI detector identifies its output.

This quick study looks at 1,000 GPT-4o-generated text samples to determine whether GPT-4o content can be detected.

Is GPT-4o AI Content Detectable?

  1. Yes. GPT-4o text is detectable, with 96.3% accuracy for our Model 2.0 Standard and 97.8% accuracy for our Model 3.0 Turbo.
  2. This is a slight drop from our GPT-4 performance of over 99%, and we expect the gap to close as we train our AI detector on GPT-4o content.

Try our AI Detector here.

Dataset

In order to evaluate the detectability of GPT-4o, we prepared a dataset of 1000 GPT-4o-generated text samples.

AI Generated Text Data

We generated the AI text with GPT-4o using the three approaches below:

  1. Rewrite prompts: We gave the model a customized prompt along with some reference articles (probably generated by LLMs) to rewrite. (450 samples)
  2. Rewrite human-written text: We prompted the model to rewrite human-written text in a way intended to bypass AI detection. The human-written text came from an open-source dataset. (325 samples)
    1. One-Class Learning for AI-Generated Essay Detection
      1. Paper: https://www.mdpi.com/2076-3417/13/13/7901
      2. Dataset: https://github.com/rcorizzo/one-class-essay-detection
  3. Write articles from scratch: We had the model write articles from scratch on given topics, both fictional and non-fictional, across diverse domains such as history, medicine, mental health, content marketing, social media, literature, robots, and the future. (225 samples)

Evaluation

To evaluate efficacy, we used the open-source AI Detection Efficacy tool that we have released.

Originality.ai has two AI text detection models: Model 3.0 Turbo and Model 2.0 Standard.

  • Use Model 3.0 Turbo if your risk tolerance for AI is zero. It is designed to identify any use of AI, even light use.
  • Use Model 2.0 Standard if you are okay with slight use of AI (i.e. AI editing).

The open-source testing tool returns a variety of metrics for each detector you test, each of which reports on a different aspect of that detector's performance, including:

  • Sensitivity (True Positive Rate, also called Recall): The percentage of AI-written samples that the detector correctly identifies as AI.
  • Specificity (True Negative Rate): The percentage of human-written samples that the detector correctly identifies as human.
  • Accuracy: The percentage of all the detector's predictions that were correct.
  • F1: The harmonic mean of Precision and Recall (Sensitivity), often used as an aggregate metric when ranking the performance of multiple detectors.
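As a rough illustration of how these metrics relate to a confusion matrix, here is a minimal Python sketch. It is not the code of the open-source testing tool; the `ConfusionCounts` class, the `detector_metrics` function, and the example counts are all hypothetical and exist only for this illustration. It uses the standard definition of F1 as the harmonic mean of precision and recall, with "AI" as the positive class.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with "AI" as the positive class."""
    tp: int  # AI text flagged as AI
    fn: int  # AI text flagged as human
    tn: int  # human text flagged as human
    fp: int  # human text flagged as AI


def _ratio(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def detector_metrics(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Compute the metrics described above from raw counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall / true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)   # true negative rate
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp)
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Made-up counts for illustration only; these are NOT results from this study.
example = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
metrics = detector_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Returning `None` for an undefined ratio is a design choice in this sketch: it keeps a metric such as specificity from silently reading as zero when a dataset contains no human-written samples.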

If you'd like a detailed discussion of these metrics, what they mean, how they're calculated, and why we chose them, check out our blog post on AI detector evaluation. For a succinct upshot, though, we think the confusion matrix is an excellent representation of a model's performance.

Below is an evaluation of both models on the dataset described above.

Confusion Matrix:

Figure 1. Confusion Matrix on AI-only dataset with Model 2.0 Standard
Figure 2. Confusion Matrix on AI-only dataset with Model 3.0 Turbo

Evaluation Metrics:

Because this smaller test is meant to measure how well Originality.ai’s AI detector identifies GPT-4o content, we look at the True Positive Rate: the percentage of the 1,000 GPT-4o samples that the model correctly identified as AI.

Model 2.0 Standard:

  • Recall (True Positive Rate) = 96.4%

Model 3.0 Turbo:

  • Recall (True Positive Rate) = 97.8%
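Because every sample in this dataset is AI-generated, there are no human-written samples to score, so specificity cannot be computed, and accuracy on this dataset is the same number as recall. The sketch below back-calculates the approximate sample counts implied by the recall figures reported in this section. The `implied_counts` helper is hypothetical, and the counts are derived from the rounded percentages rather than read from the confusion matrices, so treat them as approximations.

```python
from typing import Dict, Tuple

N_SAMPLES = 1000  # size of the GPT-4o dataset described above

# Recall percentages as reported in this section.
reported_recall_pct: Dict[str, float] = {
    "Model 2.0 Standard": 96.4,
    "Model 3.0 Turbo": 97.8,
}


def implied_counts(recall_pct: float, n_samples: int) -> Tuple[int, int]:
    """Return (detected, missed) sample counts implied by a recall percentage."""
    detected = round(recall_pct / 100 * n_samples)
    return detected, n_samples - detected


implied = {
    name: implied_counts(pct, N_SAMPLES)
    for name, pct in reported_recall_pct.items()
}

for name, (detected, missed) in implied.items():
    print(f"{name}: about {detected} flagged as AI, about {missed} missed")
```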

Jonathan Gillham

Founder / CEO of Originality.ai. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites. Recently I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone's content strategy. However, I believe you, as the publisher, should be the one deciding when to use AI content. Our originality-checking tool has been built with serious web publishers in mind!
