This quick study examines whether text generated by Grok 4 can be detected by leading AI detectors, or whether it is undetectable.
Yes, Grok 4 AI content can be detected by the most accurate Originality.ai AI Checker at a 100% accuracy rate based on initial tests of Grok 4 text samples.
Here, we’ll look at the Grok AI model in greater detail and test how well AI content detection tools identify its output as AI-written.
First, Grok 4 wrote 10 prompts, each designed to produce a text sample that might bypass AI detectors (shown in the image below).
Next, Grok 4 generated a text sample from each of those 10 prompts.
All 10 of those text samples were tested against 4 AI detectors: Originality.ai, GPTZero, ZeroGPT, and Grammarly’s AI detector.
Each response from Grok was run through the 4 AI detectors, and their AI score was recorded.
In the image above, the incorrect predictions are highlighted in red.
Originality.ai demonstrated exceptional performance and correctly identified all 10 samples of Grok content as AI.
Overall, 100% of Originality.ai’s predictions in identifying Grok 4 content as AI were correct.
Results:
Like Originality.ai, GPTZero was able to successfully identify all 10 samples as AI-generated.
See an in-depth review of GPTZero here.
Results:
ZeroGPT performed poorly, correctly identifying only 40% (4 of 10) of the Grok 4 samples as AI-generated.
Read a review of ZeroGPT here.
Results:
Grammarly’s AI detector also performed poorly, correctly identifying only 40% (4 of 10) of the Grok 4 samples as AI-generated.
Read a review of the Grammarly AI Detector.
Results:
Based on a robust AI detection accuracy study for Originality.ai’s latest AI detection model, below is the accuracy for detecting Grok 3:
We have also built and open-sourced an AI detector efficacy research tool that makes it even easier for researchers to test AI detector efficacy against a dataset.
As you can see from the results of this test, Grok 3 and Grok 4 AI text can be identified by leading AI detectors like Originality.ai’s AI Checker.
However, some other AI detectors, such as ZeroGPT and Grammarly’s AI detector, struggle to identify it.
Whenever a new LLM is released (like Grok 4), we test it to evaluate our AI detector's efficacy. There is typically a slight drop in accuracy, which closes once the AI detection models have been trained on the new LLM's content.
This was a simple test of accuracy. It’s important to remember that when determining how effective an AI detector is, the most consistent method to use is a confusion matrix and the F1 score (learn more about comprehensive AI detection accuracy tests in our study).
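To make the confusion matrix and F1 score concrete, here is a minimal Python sketch. It is an illustration only, not the tooling used in this study: the `ConfusionMatrix` class and `tally` function are hypothetical names introduced for this example. The input mirrors the 40% result reported above (10 AI samples, 4 flagged as AI); which of the 10 samples were flagged is not stated, so the ordering of the predictions is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # AI text correctly flagged as AI
    fn: int  # AI text missed (flagged as human)
    fp: int  # human text wrongly flagged as AI
    tn: int  # human text correctly flagged as human

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        actual_ai = self.tp + self.fn
        return self.tp / actual_ai if actual_ai else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def tally(labels, predictions) -> ConfusionMatrix:
    """Count outcomes from parallel lists of booleans (True means AI)."""
    pairs = list(zip(labels, predictions))
    return ConfusionMatrix(
        tp=sum(1 for y, p in pairs if y and p),
        fn=sum(1 for y, p in pairs if y and not p),
        fp=sum(1 for y, p in pairs if not y and p),
        tn=sum(1 for y, p in pairs if not y and not p),
    )


# All 10 samples in this test were AI-generated.
labels = [True] * 10
# A detector that flags 4 of 10 as AI (ordering is arbitrary here).
predictions = [True] * 4 + [False] * 6

cm = tally(labels, predictions)
print(cm)
print("precision:", cm.precision())
print("recall:", cm.recall())
print("F1:", round(cm.f1(), 3))
```

Because this test contained only AI-generated text, the false positive and true negative counts are both zero, so precision is trivially 1.0 and the false positive rate cannot be measured. A fuller evaluation would add human-written samples so that all four cells of the confusion matrix are populated.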
This test shows why these types of studies matter as we continue to push the conversation on AI transparency forward, allowing all of us to keep learning, growing, and improving together.
With that in mind, if you are interested in running your own study, please reach out, as we are happy to offer research credits.
We are also still looking for participants in our challenge for charity (do AI detectors work). If you'd like to get involved, please get in touch.
Learn more about AI detection and AI detection accuracy: