With the release of OpenAI’s latest model, the GPT-4o “omnimodel,” there is a need to understand our AI detector’s accuracy on its output.
This quick study looks at 1,000 GPT-4o-generated text samples to answer whether GPT-4o can be detected.
Try our AI Detector here.
To evaluate the detectability of GPT-4o, we prepared a dataset of 1,000 GPT-4o-generated text samples.
For AI text generation, we used GPT-4o with the three approaches given below:
To evaluate detection efficacy, we used the open-source AI Detection Efficacy tool that we have released:
Originality.ai has two models for AI text detection: Model 3.0 Turbo and Model 2.0 Standard.
The open-source testing tool returns a variety of metrics for each detector you test, each of which reports on a different aspect of that detector’s performance, including:
If you'd like a detailed discussion of these metrics, what they mean, how they're calculated, and why we chose them, check out our blog post on AI detector evaluation. For a succinct upshot, though, we think the confusion matrix is an excellent representation of a model's performance.
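To make the confusion matrix concrete, here is a minimal sketch (not the Originality.ai tool itself) of how a 2x2 confusion matrix can be tallied from true labels and detector predictions, treating "ai" as the positive class. The labels below are hypothetical, for illustration only.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, positive="ai"):
    """Return (TP, FP, FN, TN) counts for a binary classification task."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(positive, positive)]
    fn = sum(v for (t, p), v in counts.items() if t == positive and p != positive)
    fp = sum(v for (t, p), v in counts.items() if t != positive and p == positive)
    tn = sum(v for (t, p), v in counts.items() if t != positive and p != positive)
    return tp, fp, fn, tn

# Hypothetical ground-truth labels and detector predictions
y_true = ["ai", "ai", "human", "human", "ai"]
y_pred = ["ai", "human", "human", "ai", "ai"]
print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 1)
```

Every metric the tool reports (accuracy, precision, recall, false positive rate, and so on) can be derived from these four counts, which is why the confusion matrix is such a compact summary of performance.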
Below is an evaluation of both models on the above dataset.
For this smaller test, to assess Originality.ai’s AI detector’s ability to identify GPT-4o content, we look at the True Positive Rate: the percentage of the time the model correctly identified AI text as AI across the 1,000-sample GPT-4o dataset.
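Since every sample in this dataset is AI-generated, the True Positive Rate reduces to correctly flagged samples divided by total samples. A minimal sketch, with hypothetical counts for illustration (not the study’s actual results):

```python
def true_positive_rate(tp: int, fn: int) -> float:
    """TPR (recall on the AI class) = TP / (TP + FN)."""
    return tp / (tp + fn)

# Hypothetical: 986 of 1,000 AI samples correctly flagged as AI
print(f"{true_positive_rate(986, 14):.1%}")  # 98.6%
```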
Model 2.0 Standard:
Model 3.0 Turbo:
We believe it is crucial for AI content detectors’ reported accuracy to be open, transparent, and accountable. Everyone seeking AI-detection services deserves to know which detector is the most accurate for their specific use case.