Can Anthropic AI Claude 3 Be Detected?

Comparative analysis of how well Originality AI and other leading AI detectors identify Claude 3 content as AI-generated. Insights, dataset, and tools available for exploration.

With the release of Anthropic's Claude 3, we wanted to test how well the Originality AI detector and other leading alternatives could detect its content as AI-generated.

Below, you can find a summary of the key findings, how these findings were discovered through our testing, and a short analysis of the results.

You can also access the dataset and the open-source AI detection testing and statistical analysis tool used for this study.

Key Findings:

Anthropic's Claude 3 content can be identified at accuracies similar to those for existing LLMs:

  1. Originality AI Standard 2.0 - 94.3% True Positive Rate
  2. GPTZero - 82.4% True Positive Rate
  3. Originality AI Turbo 3.0 - 98.1% True Positive Rate
  4. Sapling - 81.3% True Positive Rate
  5. Copyleaks - 91.9% True Positive Rate

Method:

  • We created a 100-sample text dataset using Claude 3
  • We used a 100-sample known human dataset from this benchmark dataset
  • Using each detector's API, we tested the dataset against multiple AI detectors:
    • Originality AI Standard 2.0
    • GPTZero
    • Originality AI Turbo 3.0
    • Sapling
    • Copyleaks
  • This open-source AI detector accuracy tool was used to ensure the same content was fed into each API. The tool also automatically calculates each detector's efficacy and a complete statistical analysis of the results, including F1, TPR, and FPR.
  • Results are presented below, and the test data is available here.
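The open-source tool's own code is not reproduced here. As a minimal sketch, assuming AI-generated text is treated as the positive class, the metrics it reports can be computed from confusion-matrix counts as follows. The function and field names are hypothetical. Specificity and FPR both divide by the number of human-written samples (TN + FP), so they are undefined when no human samples are scored; this sketch returns None in that case instead of 0.0.

```python
from dataclasses import dataclass
from typing import Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


@dataclass
class Metrics:
    tpr: Optional[float]          # recall: share of AI texts flagged as AI
    fpr: Optional[float]          # share of human texts wrongly flagged as AI
    specificity: Optional[float]  # share of human texts correctly kept as human
    precision: Optional[float]    # share of AI flags that really were AI text
    f1: Optional[float]           # harmonic mean of precision and recall
    accuracy: Optional[float]     # share of all texts labeled correctly


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Metrics:
    """Compute detector metrics with 'AI-generated' as the positive class (assumption)."""
    tpr = safe_ratio(tp, tp + fn)
    fpr = safe_ratio(fp, fp + tn)
    specificity = safe_ratio(tn, tn + fp)
    precision = safe_ratio(tp, tp + fp)
    if precision is None or tpr is None or precision + tpr == 0:
        f1 = None
    else:
        f1 = 2 * precision * tpr / (precision + tpr)
    accuracy = safe_ratio(tp + tn, tp + fn + fp + tn)
    return Metrics(tpr, fpr, specificity, precision, f1, accuracy)


# Hypothetical counts for illustration only (not this study's data):
# 100 AI-generated texts, 94 flagged as AI, and no human-written texts.
m = classification_metrics(tp=94, fn=6, fp=0, tn=0)
print(m.tpr, m.precision, m.accuracy, m.specificity, m.fpr)
```

Whenever human-written samples are present, specificity = 1 − FPR, so the two cannot both be 0. The results below list Specificity 0.0 and False Positive Rate 0.0 together, with Precision 1.0 and Accuracy matching recall; that combination is consistent with runs in which only AI-generated samples were scored.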

Analysis:

To get the most accurate results, we evaluated each detector using standard machine learning metrics for classifier efficacy. If you want to learn more about these measures and AI detector accuracy, check out this detailed guide.

Claude 3 AI Detection Results:

Originality AI Standard 2.0

F1 score: 0.971

Precision: 1.0

Recall (True Positive Rate): 0.943

Specificity (True Negative Rate): 0.0

False Positive Rate: 0.0

Accuracy: 0.94

Originality AI Standard 2.0 correctly identified 94.3% of the content as AI-generated, with a 0% false positive rate.

Originality AI Turbo 3.0

F1 score: 0.99

Precision: 1.0

Recall (True Positive Rate): 0.981

Specificity (True Negative Rate): 0.0

False Positive Rate: 0.0

Accuracy: 0.981

Originality AI Turbo 3.0 provided the best results in this test, correctly identifying 98.1% of the content as AI-generated and misclassifying only 1.9% as human-written.

GPTZero

F1 score: 0.903

Precision: 1.0

Recall (True Positive Rate): 0.824

Specificity (True Negative Rate): 0.0

False Positive Rate: 0.0

Accuracy: 0.824

GPTZero also performed solidly on this test, correctly identifying 82.4% of the AI-generated articles as AI-written and misclassifying the remaining 17.6% as human-written.

Sapling

F1 score: 0.90

Precision: 1.0

Recall (True Positive Rate): 0.813

Specificity (True Negative Rate): 0.0

False Positive Rate: 0.0

Accuracy: 0.813

Sapling performed the worst out of all the tools tested but still managed to identify 81.3% of the content as AI-generated, incorrectly identifying 18.7% as human-written.

Copyleaks

F1 score: 0.958

Precision: 1.0

Recall (True Positive Rate): 0.919

Specificity (True Negative Rate): 0.0

False Positive Rate: 0.0

Accuracy: 0.919

Copyleaks also did well when testing Claude 3, correctly identifying 91.9% of the AI content as AI-generated and misclassifying 8.1% as human-written.
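As a quick consistency check, a short Python sketch can recompute F1 from each detector's reported precision and recall (F1 = 2PR / (P + R), which reduces to 2R / (1 + R) when precision is 1.0) and derive the share of AI text each detector missed (1 − TPR). The figures are copied from the results above; the function names are illustrative, not part of the open-source tool.

```python
# Precision, recall (TPR), and F1 exactly as reported in the results above.
REPORTED = {
    "Originality AI Standard 2.0": (1.0, 0.943, 0.971),
    "Originality AI Turbo 3.0": (1.0, 0.981, 0.99),
    "GPTZero": (1.0, 0.824, 0.903),
    "Sapling": (1.0, 0.813, 0.90),
    "Copyleaks": (1.0, 0.919, 0.958),
}


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def summarize(reported):
    """Return (name, recall, recomputed F1, F1 consistent?, miss rate %) rows, best TPR first."""
    rows = []
    for name, (precision, recall, f1_reported) in reported.items():
        f1 = f1_score(precision, recall)
        # Reported F1 values are rounded to 2-3 decimals, so allow rounding error.
        consistent = abs(f1 - f1_reported) < 0.005
        # Share of AI-generated texts that the detector labeled human-written.
        miss_rate_pct = round((1 - recall) * 100, 1)
        rows.append((name, recall, f1, consistent, miss_rate_pct))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, recall, f1, consistent, miss in summarize(REPORTED):
    print(f"{name}: TPR={recall:.3f} F1={f1:.4f} consistent={consistent} missed={miss}%")
```

Every reported F1 agrees with its precision and recall within rounding, and the miss rates match the percentages quoted in each detector's summary.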

In Summary

From the results of this study, Anthropic's Claude 3 appears to be about as detectable as other LLMs, such as ChatGPT (GPT-3.5 and GPT-4).

Both Originality AI Standard 2.0 and Turbo 3.0 outperformed GPTZero, Sapling, and Copyleaks at detecting Claude 3 content, with our Turbo 3.0 model performing particularly well.

Always remember that AI detectors are not perfect and do produce false positives. However, they do work, and we are still looking for a participant in our "Do AI detectors work?" challenge for charity.

If you are interested in running your own study, please reach out, as we are happy to offer research credits.

Jonathan Gillham

Founder / CEO of Originality.AI

I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites; more recently, I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences, I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone's content strategy. However, I believe you, as the publisher, should be the one deciding when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
