With the release of OpenAI's latest model, the GPT-4o "omnimodel," there is a need to understand our AI detector's accuracy on its output.
This quick study looks at 1,000 GPT-4o-generated text samples to answer whether GPT-4o can be detected.
Try our AI Detector here.
To evaluate the detectability of GPT-4o, we prepared a dataset of 1,000 GPT-4o-generated text samples.
For AI text generation, we used GPT-4o with the three approaches given below:
To evaluate detection efficacy, we used the open-source AI Detection Efficacy tool that we have released:
Originality.ai offers two AI text detection models: Model 3.0 Turbo and Model 2.0 Standard.
The open-source testing tool returns a variety of metrics for each detector you test, each of which reports on a different aspect of that detector's performance, including:
If you'd like a detailed discussion of these metrics, what they mean, how they're calculated, and why we chose them, check out our blog post on AI detector evaluation. As a succinct summary, though, we think the confusion matrix is an excellent representation of a model's performance.
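To make the confusion matrix concrete, here is a minimal sketch of how one can be computed for a binary AI-text detector. The function and the sample labels below are illustrative assumptions, not part of the tool or the study's data.

```python
# Build a 2x2 confusion matrix for a binary AI-text detector.
# Convention: 1 = AI-generated, 0 = human-written.

def confusion_matrix(y_true, y_pred):
    """Count true positives, false negatives, false positives, and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Illustrative ground-truth labels and detector predictions:
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 1}
```

All four headline metrics (accuracy, true/false positive rates, and so on) can be derived from these four counts, which is why the matrix is such a compact summary of performance.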
Below is an evaluation of both models on the dataset described above.
For this smaller test, which measures the ability of Originality.ai's AI detector to identify GPT-4o content, we look at the True Positive Rate: the percentage of the time the model correctly identified AI text as AI across the 1,000-sample GPT-4o dataset.
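Since every sample in this dataset is AI-generated, the True Positive Rate reduces to TP / (TP + FN). The sketch below shows the calculation; the counts used are purely illustrative and are not the study's actual results.

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN): the fraction of AI-generated texts correctly flagged as AI."""
    return tp / (tp + fn)

# Illustrative only: if a detector flagged 985 of 1,000 AI samples as AI,
# then TP = 985 and FN = 15, giving a TPR of 98.5%.
print(true_positive_rate(985, 15))  # 0.985
```

Note that on an all-AI dataset like this one, TPR alone says nothing about false positives on human writing; that is why the broader evaluation tool also reports metrics computed on human-written samples.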
Model 2.0 Standard:
Model 3.0 Turbo:
We studied how Originality.ai's multilingual AI detector stacked up against state-of-the-art AI content detectors across a range of Arabic datasets, following the study "The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text." These are our findings.
In an extension of the peer-reviewed study, “The accuracy-bias trade-offs in AI text detection tools and their impact on fairness in scholarly publication,” Originality.ai demonstrated exceptional results.