On May 22, 2025, Anthropic announced Claude 4, including the Claude Sonnet 4 and Claude Opus 4 models.
Claude Sonnet 4 is the latest upgrade in Anthropic’s Sonnet series, offering a significant boost in coding accuracy, reasoning, and instruction following.
Anthropic describes Claude Opus 4 as “the world’s best coding model,” designed to excel in complex coding and agentic workflows. Opus 4 also delivers sustained performance on long-running tasks and leads industry benchmarks such as SWE-bench and Terminal-bench.
Following this release, we analyzed Claude Sonnet 4 and Claude Opus 4 to assess how the advancements of these models impact the performance of our AI detector.
To test how well AI-generated text can be detected, we ran two tests:
The study’s tests aimed to check how accurate our tool is at detecting AI-written content.
Notably, despite being different models, Claude 4 Sonnet and Claude 4 Opus proved identically detectable.
Try our AI Detector.
To evaluate the detectability of Claude 4 Sonnet, we prepared a dataset of 1000 Claude 4 Sonnet-generated text samples.
For AI-text generation, we used Claude 4 Sonnet with the three approaches given below:
To evaluate the detectability of Claude 4 Opus, we prepared a dataset of 1000 Claude 4 Opus-generated text samples.
For AI-text generation, we used Claude 4 Opus with the three approaches given below:
To evaluate detection efficacy, we used our open-source AI Detection Efficacy tool:
Originality.ai offers three models for AI text detection: Model 3.0.1 Turbo, Model 1.0.0 Lite, and Multi Language.
For this test, we evaluated Claude 4 Sonnet and Claude 4 Opus with the Model 3.0.1 Turbo.
Learn more about which AI detection model is best for you and your use case.
The open-source testing tool returns a variety of metrics, each of which reports on a different aspect of performance, including:
If you'd like a detailed discussion of these metrics, what they mean, how they're calculated, and why we chose them, check out our blog post on AI detector evaluation. For a succinct snapshot, the confusion matrix is an excellent representation of a model's performance.
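As an illustrative sketch only (not the actual Originality.ai evaluation code; the function name and label convention are assumptions), the confusion-matrix metrics such a tool reports can be derived from paired labels and predictions like this:

```python
# Hypothetical sketch of confusion-matrix metrics for an AI detector.
# Convention (assumed): 1 = AI-generated, 0 = human-written.

def confusion_matrix_metrics(y_true, y_pred):
    # Count each cell of the 2x2 confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fn + fp + tn
    return {
        # Share of AI samples correctly flagged as AI.
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of human samples wrongly flagged as AI.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Overall share of correct calls.
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

On a dataset that contains only AI-generated samples, as in this study, the true positive rate and the reported accuracy coincide.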
Below is an evaluation of the Model 3.0.1 Turbo on the above datasets.

For this small test to reflect the Originality.ai AI detector’s ability to identify Claude 4 Sonnet content, we looked at the True Positive Rate: the percentage of the 1,000 Claude 4 Sonnet samples that the model correctly identified as AI-generated.
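Since every sample in this dataset is AI-generated, the True Positive Rate reduces to a simple fraction. A minimal sketch (the function name is illustrative, not from the actual tooling):

```python
# Hypothetical sketch: on an all-AI dataset, the True Positive Rate
# is just the fraction of samples the detector flagged as AI.
def true_positive_rate(predictions):
    # predictions: 1 if the detector flagged the sample as AI, else 0
    return sum(predictions) / len(predictions)

# e.g., flagging 984 of 1,000 AI samples yields a TPR of 98.4%
```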
Model 3.0.1 Turbo:

For this small test to reflect the Originality.ai detector’s ability to identify Claude 4 Opus content, we looked at the True Positive Rate: the percentage of the 1,000 Claude 4 Opus samples that the model correctly identified as AI-generated.
Model 3.0.1 Turbo:
Our study confirms that the content generated by Claude 4 Sonnet and Opus is highly detectable with our AI detector.
The Originality.ai model 3.0.1 Turbo excelled with 98.4% accuracy in detecting both Claude 4 Sonnet and Claude 4 Opus-generated text in our tests.
These results highlight the effectiveness of the Originality.ai AI detector in identifying AI-generated content, ensuring reliable detection across various text generation approaches.
Interested in learning more about AI detection? Check out our guides:
