
Can Grok AI Content Be Detected?

Grok AI: Undetectable by AI content checkers? New study reveals shocking results! Can YOU spot the AI-written content?

Given the popularity of our recent studies into Google Bard content and Mixtral AI content, we decided to conduct another small study into whether Grok AI content can be detected by some of the most popular AI content detection tools, including Originality.ai.

Here, we look at the Grok AI model in greater detail and test each AI content detection tool’s ability to detect that the generated content is AI-written.

Read on to learn about the top-level results from this study, alongside a short analysis of how each tool performed.

You can also access the dataset here or the open-source AI detection testing/statistical analysis tool used for this study. 

We have made and open-sourced an AI detector efficacy research tool that is even easier for researchers to use to test AI detector efficacy against a dataset. 

Note: this is ONLY a 200-sample test, which is far too small for any conclusive answers. For a more complete AI detection accuracy study, see this study.

Short on time? Here are the key findings:

  1. Grok AI content can be identified at accuracies similar to those for other LLMs, such as Google Bard and ChatGPT.
  2. True positive rates by detector:
       • Originality.ai - 90% True Positive Rate
       • GPTZero - 68.6% True Positive Rate
       • CopyLeaks - 67.5% True Positive Rate
       • Sapling - 71% True Positive Rate

Method:

  • We created a 200-sample text dataset using Grok on Jan 17
  • Using the APIs for AI detectors, we tested the dataset against multiple detectors: Originality.ai, GPTZero, CopyLeaks, and Sapling.
  • To keep the test as fair and impartial as possible, we ran it with this open-source AI detector accuracy tool. The tool sends the exact same articles to each API, so no detector is tested on an easier segment of the data than another.
  • Results are presented below, and the test data is available here.
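The idea of giving every detector the exact same articles can be sketched roughly as follows. This is a minimal illustration, not the open-source tool itself: the `Detector` interface, the 0.5 threshold, and the two stub detectors are hypothetical stand-ins for real API calls.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a detector takes text and returns an
# "AI-written" score between 0 and 1. Real detector APIs differ.
Detector = Callable[[str], float]


def true_positive_rates(
    texts: List[str],
    detectors: Dict[str, Detector],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Dict[str, float]:
    """Score the same AI-written texts with every detector.

    Because every text is AI-written, the share flagged as AI is the
    detector's true positive rate.
    """
    rates = {}
    for name, detect in detectors.items():
        flagged = sum(1 for text in texts if detect(text) >= threshold)
        rates[name] = flagged / len(texts)
    return rates


# Stub detectors used only to make the sketch runnable.
def always_ai(text: str) -> float:
    return 1.0


def length_based(text: str) -> float:
    return 0.9 if len(text) > 20 else 0.1


samples = ["short text", "a considerably longer piece of sample text"]
rates = true_positive_rates(
    samples, {"always_ai": always_ai, "length_based": length_based}
)
print(rates)
```

Passing one shared list of texts to every detector is what keeps the comparison like-for-like.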

Analysis:

To see how well each of these tools worked, we followed machine learning best practices and tested a wide variety of AI-generated content.

If you want to learn a little bit more about that process, check out this detailed guide. You can also learn more about the accuracy of AI detectors as a whole.

When trying to determine how effective an AI detector is, the easiest and most consistent method is to focus on the confusion matrix (which you will see outlined for each detector in this article) and the F1 score. The F1 score is frequently used to summarize the overall confusion matrix in a single figure.
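As a rough illustration of how a confusion matrix collapses into these figures, the sketch below computes precision, recall, accuracy, and F1 from raw counts. The counts in the example (180 flagged, 20 missed) are an assumption derived from a 90% detection rate on exactly 200 AI-written samples; the function name `detection_metrics` is ours and is not part of the study's tooling.

```python
def detection_metrics(tp: int, fn: int, fp: int = 0, tn: int = 0) -> dict:
    """Return precision, recall, accuracy, and F1 from confusion-matrix counts.

    tp: AI-written texts flagged as AI      fn: AI-written texts flagged as human
    fp: human-written texts flagged as AI   tn: human-written texts flagged as human
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Assumed counts: 90% of 200 AI-written samples flagged as AI.
# The dataset has no human-written texts, so fp and tn are both 0.
metrics = detection_metrics(tp=180, fn=20)
print({name: round(value, 2) for name, value in metrics.items()})
```

Under that assumption, the dataset contains only AI-written text, so there are no false positives: precision is 1, accuracy equals recall, and the F1 score comes out higher than recall. This is the pattern in the results reported for each detector.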

Grok AI Detection Results:

Originality.ai

Originality.ai correctly identified 90 Percent of the content as AI-written

F1 score: 0.95

Recall: 0.9

Accuracy: 0.9

From our 200-sample set of AI-generated articles, Originality.ai correctly identified 90% of the content as AI-written while incorrectly classifying 10% as human-written.

GPTZero

GPTZero detecting 68.6 Percent of the content as AI-generated

F1 score: 0.81

Recall: 0.69

Accuracy: 0.69

GPTZero performed worse than Originality.ai in this test, detecting 68.6% of the content as AI-generated and incorrectly classifying 31.4% of the content as human-written.

CopyLeaks

CopyLeaks detecting 67.5 Percent of the content as AI-generated

F1 score: 0.81

Recall: 0.68

Accuracy: 0.68

CopyLeaks performed similarly to GPTZero, also significantly underperforming compared to Originality.ai, identifying 67.5% of the content as AI-generated and incorrectly claiming that 32.5% of the content is human-written.

Sapling

Sapling correctly identified 71 Percent of the content as AI-written

F1 score: 0.83

Recall: 0.71

Accuracy: 0.71

Sapling fared a little better than both CopyLeaks and GPTZero, but worse than Originality.ai, detecting 71% of the content as AI-generated and incorrectly classifying 29% as human-written.

Summary

As you can see from the results of this test, Grok AI’s detectability across all tools is very similar to what we found in our earlier tests for Google Bard, ChatGPT, and Mixtral.

Originality.ai performed the strongest, with a 90% success rate compared to Sapling’s 71%, GPTZero’s 68.6%, and CopyLeaks’ 67.5%.
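Because the test uses only 200 samples, each rate above carries some sampling uncertainty. One common way to express that is a Wilson score interval. The sketch below assumes every detector scored exactly 200 samples, which the study does not state for each tool; the helper `wilson_interval` is illustrative and not part of the study's tooling.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for an observed proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


# Reported true positive rates; n = 200 per detector is an assumption.
reported = {
    "Originality.ai": 0.90,
    "Sapling": 0.71,
    "GPTZero": 0.686,
    "CopyLeaks": 0.675,
}
intervals = {name: wilson_interval(rate, 200) for name, rate in reported.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.3f} to {high:.3f}")
```

Under these assumptions, overlapping intervals suggest that a gap between two detectors may not be meaningful at this sample size, while clearly separated intervals suggest that it is.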

As you can see from this small dataset, even the best AI detectors have flaws, and that must be taken into account. However, it is clear from this study that the Originality.ai tool continues to lead the way with the most accurate AI content detection software.

It also highlights why these types of studies are so important as we continue to push the conversation on AI transparency forward, allowing all of us to keep learning, growing, and improving together.

With that in mind, if you are interested in running your own study, please reach out, as we are happy to offer research credits.

We are also still looking for a participant in our challenge for charity (do AI detectors work). If you'd like to get involved, please get in touch.

Jonathan Gillham

Founder / CEO of Originality.ai. I have been involved in the SEO and Content Marketing world for over a decade. My career started with a portfolio of content sites. Recently I sold 2 content marketing agencies, and I am the Co-Founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not For or Against AI content; I think it has a place in everyone's content strategy. However, I believe you as the publisher should be the one making the decision on when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
