OpenAI has released GPT-o1-preview, part of a new series of AI models designed to spend more time thinking before they respond. In its release announcement, OpenAI states that the model can reason through complex tasks and solve harder problems in science, coding, and math than previous models could.
Because the new model was trained with reasoning traces and can spend time deliberating before it answers, it outperforms earlier models in some domains. To maintain the authenticity and integrity of written content online, AI content detection needs to keep pace.
This brief study examines 1,000 GPT-o1-preview-generated text samples to determine whether the Originality.ai AI Detector can detect GPT-o1-preview content.
Try the Originality.ai AI Detector. Then, learn about AI content detection accuracy and Originality’s strong performance in a meta-analysis of eight third-party studies.
To evaluate how detectable GPT-o1-preview is, we prepared a dataset of 1,000 GPT-o1-preview-generated text samples.
To generate the AI text, we prompted GPT-o1-preview using three approaches:
To evaluate detection efficacy, we used the Open Source AI Detection Efficacy tool that we released.
Originality.ai offers three AI text detection models: 3.0.0 Turbo, 2.0.1 Standard, and 1.0.0 Lite.
For additional information on each of these models, check out our AI detector and read our AI detection accuracy guide.
The open-source testing tool returns a variety of metrics for each detector you test, each of which reports on a different aspect of that detector’s performance, including:
For a detailed discussion of these metrics, what they mean, how they're calculated, and why we chose them, check out our blog post on AI detector evaluation. For a succinct snapshot, the confusion matrix is an excellent representation of a model's performance.
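To illustrate how a confusion matrix summarizes a binary AI-vs-human detector, the sketch below tallies true/false positives and negatives from paired labels and derives common metrics from them. It is not the open-source tool's code: the helper names `confusion_matrix` and `metrics` and the sample labels are hypothetical, and "ai" is assumed to be the positive class.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="ai"):
    """Count TP, FP, TN, FN for a binary classifier (hypothetical helper)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            counts["tp"] += 1  # AI text correctly flagged as AI
        elif truth != positive and pred == positive:
            counts["fp"] += 1  # human text wrongly flagged as AI
        elif truth != positive and pred != positive:
            counts["tn"] += 1  # human text correctly passed as human
        else:
            counts["fn"] += 1  # AI text the detector missed
    return {k: counts[k] for k in ("tp", "fp", "tn", "fn")}


def metrics(cm):
    """Derive common metrics, guarding against division by zero."""
    tp, fp, tn, fn = cm["tp"], cm["fp"], cm["tn"], cm["fn"]
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall, i.e. true positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"tpr": tpr, "precision": precision, "f1": f1, "accuracy": accuracy}


# Hypothetical ground truth and detector predictions for six documents
y_true = ["ai", "ai", "ai", "human", "human", "ai"]
y_pred = ["ai", "ai", "human", "human", "ai", "ai"]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(metrics(cm))
```

Each metric is a different ratio over the same four cells, which is why the confusion matrix works as a compact snapshot of a detector's behavior.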
Below is an evaluation of all these models on the above dataset.
In this smaller test of how well Originality.ai’s AI detector identifies GPT-o1-preview content, we focused on the True Positive Rate (TPR): the percentage (%) of the 1,000 GPT-o1-preview samples that the model correctly identified as AI-generated.
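Because every sample in this dataset is AI-generated, there are no human texts that could yield false positives or true negatives, so the TPR is simply the share of samples flagged as AI. A minimal sketch, assuming the detector returns a per-document AI-likelihood score between 0 and 1 and that a score at or above a hypothetical 0.5 threshold counts as an "AI" prediction; the function name and scores are made up for illustration:

```python
def true_positive_rate(ai_scores, threshold=0.5):
    """Share of known AI-generated samples that the detector labels as AI.

    Assumes a score >= threshold means the detector called the text AI
    (an assumed convention, not a documented Originality.ai setting).
    """
    if not ai_scores:
        raise ValueError("need at least one sample")
    detected = sum(1 for s in ai_scores if s >= threshold)
    return detected / len(ai_scores)


# Hypothetical scores for ten AI-generated samples
scores = [0.99, 0.97, 0.42, 0.88, 1.0, 0.93, 0.51, 0.12, 0.95, 0.76]
tpr = true_positive_rate(scores)
print(f"TPR: {tpr:.2%}")
```

With an AI-only dataset, accuracy and TPR coincide, since the (TP + TN) / total numerator and denominator reduce to TP / (TP + FN).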
1.0.0 Lite:
2.0.1 Standard:
3.0.0 Turbo:
To compare AI detectors, we also evaluated GPTZero on the same dataset. Its performance is shown below:
Based on these results, GPTZero struggled significantly to detect the samples generated by rewriting human-written text, whereas Originality.ai continued to perform strongly.
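A weakness like this only shows up when results are broken down by generation approach: an overall TPR can look acceptable while one approach lags far behind. A minimal sketch, assuming each result is stored as an (approach, detected) pair; the label "rewrite" stands in for the rewritten-human-text samples, while "approach_a" and all values are hypothetical:

```python
from collections import defaultdict


def tpr_by_approach(results):
    """Group (approach, detected) pairs and compute the TPR for each group.

    Every entry is an AI-generated sample; `detected` is True when the
    detector labelled that sample as AI.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for approach, detected in results:
        totals[approach] += 1
        hits[approach] += int(detected)
    return {a: hits[a] / totals[a] for a in totals}


# Hypothetical per-sample results for two generation approaches
results = [
    ("approach_a", True), ("approach_a", True), ("approach_a", True),
    ("rewrite", True), ("rewrite", False), ("rewrite", False), ("rewrite", False),
]
for approach, rate in sorted(tpr_by_approach(results).items()):
    print(f"{approach}: {rate:.0%}")
```

Here the pooled TPR would be 4/7 (about 57%), which hides the gap between the two groups that the per-approach view makes obvious.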
Overall, Originality.ai continues to demonstrate an outstanding ability to identify AI-generated content, including content from the latest AI models such as OpenAI’s GPT-o1, GPT-4o, and GPT-4o-mini.
Each of Originality.ai’s AI detection models detected GPT-o1-preview with a high degree of accuracy (equivalent here to the true positive rate, since every sample is AI-generated), ranging from 91.66% for 1.0.0 Lite to 94.47% for 2.0.1 Standard, with 3.0.0 Turbo at 93.47%. With the latest updates to our AI detection models, 3.0.1 Turbo can now detect GPT-o1-preview with 98% accuracy. Our machine learning engineers continue to work on raising accuracy to 99%+, as with most new models released by OpenAI.