OpenAI has unveiled GPT-4.1, a powerful new AI model optimized for coding and instruction following.
According to OpenAI, this model “outperforms GPT‑4o and GPT‑4o mini across the board.”
With support for a context window of up to one million tokens, GPT-4.1 marks a major leap in processing capacity, far beyond the 128,000-token limit of the previous GPT‑4o models.
In light of this release, we tested how accurately our AI Detector identifies text written by GPT-4.1.
This quick study looks at 1000 GPT-4.1-generated text samples to answer whether GPT-4.1 output can be detected.
To evaluate the detectability of GPT-4.1, we prepared a dataset of 1000 GPT-4.1-generated text samples.
To generate the text, we prompted GPT-4.1 using the three approaches given below:
To evaluate the efficacy, we used our Open Source AI Detection Efficacy tool:
Originality.ai has three models for AI text detection: Model 3.0.1 Turbo, Model 1.0.0 Lite, and Multi Language.
Learn more about which AI detection model is best for you and your use case.
The open-source testing tool returns a variety of metrics for each detector tested, each of which reports on a different aspect of that detector’s performance, including:
If you'd like a detailed discussion of these metrics, what they mean, how they're calculated, and why we chose them, check out our blog post on AI detector evaluation. For a succinct snapshot, the confusion matrix is an excellent representation of a model's performance.
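As a rough illustration of what a confusion matrix records, here is a minimal Python sketch. It is not the code of the open-source efficacy tool. The function name `confusion_matrix`, the `"ai"` and `"human"` label strings, and the toy data are all assumptions made for this example.

```python
def confusion_matrix(labels, predictions, positive="ai"):
    """Count the four confusion-matrix cells, treating `positive` as the AI class.

    Hypothetical helper for illustration only.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    tp = fn = fp = tn = 0
    for actual, predicted in zip(labels, predictions):
        if actual == positive:
            if predicted == positive:
                tp += 1  # AI text correctly flagged as AI
            else:
                fn += 1  # AI text missed (labeled human)
        else:
            if predicted == positive:
                fp += 1  # human text wrongly flagged as AI
            else:
                tn += 1  # human text correctly labeled human
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


# Toy data, invented for this example.
labels = ["ai", "ai", "ai", "human", "human"]
predictions = ["ai", "ai", "human", "human", "ai"]

matrix = confusion_matrix(labels, predictions)
print(matrix)
```

Note that on a dataset made up only of AI-generated samples, the `fp` and `tn` cells are always zero, so the matrix reduces to true positives and false negatives.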
Below is an evaluation of two of these models, Model 3.0.1 Turbo and Model 1.0.0 Lite, on the above dataset.
For this small test to reflect the Originality.ai AI Detector’s ability to identify GPT-4.1 content, we looked at the True Positive Rate: the percentage of the 1000 GPT-4.1 samples that the model correctly identified as AI-generated.
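The True Positive Rate calculation can be sketched in a few lines of Python. This is a simplified illustration, not the detector's or the testing tool's actual code. The function name `true_positive_rate` and the label strings are assumptions, and the prediction list is constructed to match a 97.9% rate on 1000 samples rather than taken from real per-sample output.

```python
def true_positive_rate(predictions, positive="ai"):
    """Fraction of samples flagged as AI, assuming every sample is AI-generated.

    Hypothetical helper for illustration only.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    true_positives = sum(1 for p in predictions if p == positive)
    # With an all-AI dataset, every non-"ai" prediction is a false negative,
    # so TPR = TP / (TP + FN) = TP / total samples.
    return true_positives / len(predictions)


# Illustrative data: 979 of 1000 AI samples flagged as AI, 21 missed.
predictions = ["ai"] * 979 + ["human"] * 21

tpr = true_positive_rate(predictions)
print(f"True Positive Rate: {tpr:.1%}")  # prints "True Positive Rate: 97.9%"
```

Because every sample in this kind of dataset is AI-generated, accuracy and True Positive Rate are the same number here. The test says nothing about false positives on human-written text.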
Model 3.0.1 Turbo:
Model 1.0.0 Lite:
Our study confirms that text generated by GPT-4.1 is highly detectable with our AI Detector. Model 3.0.1 Turbo exhibited strong performance with 97.9% accuracy, while Model 1.0.0 Lite followed closely with 94.5%.
These results highlight the effectiveness of the Originality.ai AI Detector in identifying AI-generated content, even from the latest releases of popular AI models like GPT-4.1, across the text generation approaches we tested.
Interested in learning more about AI detection? Check out our guides:
Originality.ai is exceptional at identifying AI-generated medical articles, according to the study “The Great Detectives: Humans vs. AI Detectors in Catching Large Language Model-Generated Medical Writing,” 2024.