
Can Google Bard Gemini Pro Content Be Detected?

In this quick study, we look at the Gemini Pro model and the ability of the Originality AI detector to detect its content effectively.

Below is a summary of the key findings, method of testing, analysis of results and access to the dataset plus the open-source AI detection testing/statistical analysis tool used for this study.

Note: this is ONLY a 200-sample test, which is far too small to support any conclusive answers! For a more complete AI detection accuracy study, see this study.

Key Findings:

  1. Google Bard Gemini Pro content can be identified at accuracies similar to those seen for existing LLMs:
     1. Originality - 99.0% True Positive Rate
     2. GPTZero - 93.1% True Positive Rate
     3. Other tools: testing is in progress


  1. We created a 100-sample text dataset using Google Bard Gemini Pro on Dec 7
  2. We used a 100-sample known human dataset from this benchmark dataset
  3. Using the APIs for AI detectors, we tested against multiple detectors:
     1. Originality
     2. GPTZero
     3. More in progress (API issues)
  4. This open-source AI detector accuracy tool was used to ensure the same content was fed into each API. The tool also automatically calculates the efficacy and a complete statistical analysis of the results, including F1, TPR, FPR, etc.
  5. Results are presented below and the data of the test is available here
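The testing steps above can be sketched roughly as follows. This is not the code of the open-source tool; it is a minimal illustration of the idea of feeding identical samples to every detector and tallying the outcomes. The `Detector` type, the `evaluate` and `compare` helpers, and the toy detectors are all hypothetical stand-ins for real API clients.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a detector is any callable that returns True when it
# judges the text to be AI-generated. A real one would wrap an API call.
Detector = Callable[[str], bool]
Sample = Tuple[str, bool]  # (text, is_ai_generated)


def evaluate(detector: Detector, samples: List[Sample]) -> Dict[str, int]:
    """Tally confusion-matrix counts for one detector over labeled samples."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for text, is_ai in samples:
        flagged = detector(text)
        if is_ai:
            counts["tp" if flagged else "fn"] += 1
        else:
            counts["fp" if flagged else "tn"] += 1
    return counts


def compare(detectors: Dict[str, Detector],
            samples: List[Sample]) -> Dict[str, Dict[str, int]]:
    """Run every detector on exactly the same samples."""
    return {name: evaluate(fn, samples) for name, fn in detectors.items()}


# Toy data and toy detectors, purely for illustration.
samples = [
    ("generated text a", True),
    ("generated text b", True),
    ("human text a", False),
    ("human text b", False),
]
detectors = {
    "always_ai": lambda text: True,
    "keyword": lambda text: "generated" in text,
}
results = compare(detectors, samples)
for name, counts in results.items():
    print(name, counts)
```

Because every detector sees the same list of samples, any difference in the counts comes from the detectors themselves and not from the inputs.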


We use machine learning best practices to evaluate a classifier's efficacy. Here is a guide that goes in depth into each of the measures used and into AI detector accuracy more broadly.

The two most important things to understand when evaluating whether an AI detector works are the confusion matrix (example shown below) and the F1 score.

Confusion Matrix (example for performance on GPT-4) 

The F1 score is a commonly used metric that condenses the overall confusion matrix into a single number.
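As a rough illustration of how the measures reported in this study follow from a confusion matrix, here is a minimal sketch using the standard definitions. The `ConfusionMatrix` class is a hypothetical helper, and the counts are assumed values for a 100 AI / 100 human test; they are not taken from the study's dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # AI text correctly flagged as AI
    fn: int  # AI text wrongly passed as human
    fp: int  # human text wrongly flagged as AI
    tn: int  # human text correctly passed as human

    def recall(self) -> float:
        """True Positive Rate: share of AI text that was caught."""
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        """Share of AI flags that were actually AI text."""
        return self.tp / (self.tp + self.fp)

    def specificity(self) -> float:
        """True Negative Rate: share of human text passed as human."""
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        """Share of human text wrongly flagged as AI."""
        return self.fp / (self.fp + self.tn)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    def f1(self) -> float:
        """Harmonic mean of precision and recall."""
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Assumed counts, for illustration only.
cm = ConfusionMatrix(tp=99, fn=1, fp=2, tn=98)
print(f"F1 score: {cm.f1():.3f}")
print(f"Precision: {cm.precision():.3f}")
print(f"Recall (True Positive Rate): {cm.recall():.3f}")
print(f"Specificity (True Negative Rate): {cm.specificity():.3f}")
print(f"False Positive Rate: {cm.false_positive_rate():.3f}")
print(f"Accuracy: {cm.accuracy():.3f}")
```

F1 is the harmonic mean of precision and recall, so it only stays high when a detector both catches AI text and avoids flagging human text.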

Gemini AI Detection Results:

Originality:

F1 score: 0.985
Precision: 0.981
Recall (True Positive Rate): 0.990
Specificity (True Negative Rate): 0.980
False Positive Rate: 0.020
Accuracy: 0.985

Originality correctly identified 99% of the AI-generated articles, with a 2% false positive rate.
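To put the small sample size in perspective, one common way to express the uncertainty around a rate such as 99% is a Wilson score interval. The sketch below is our own illustration and is not part of the testing tool; the `wilson_interval` helper is hypothetical, and the 99-out-of-100 input is an assumed count based on the sample size described above.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Assumed: 99 of 100 AI samples detected.
low, high = wilson_interval(99, 100)
print(f"approx. 95% interval for the true positive rate: {low:.3f} to {high:.3f}")
```

With only 100 samples per class, the lower bound comes out near 0.95, so the measured rate is compatible with a noticeably lower true rate. This is one reason a 200-sample test should be read as indicative only.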


GPTZero:

F1 score: 0.960
Precision: 0.990
Recall (True Positive Rate): 0.931
Specificity (True Negative Rate): 0.990
False Positive Rate: 0.010
Accuracy: 0.961

GPTZero performed well on this test, correctly identifying 93% of the AI-generated articles and 99% of the human-written articles.


Based on this study, it appears that the detectability of Google Bard Gemini Pro aligns with that of other LLMs such as ChatGPT (GPT-3.5, GPT-4).

Originality outperformed GPTZero on AI detection and slightly underperformed on Human Content detection. 

Always remember that AI detectors are not perfect and do produce false positives. However, they do work and we are still looking for a participant in our challenge for charity (do AI detectors work). 

If you are interested in running your own study please reach out as we are happy to offer research credits.

Jonathan Gillham

Founder / CEO of I have been involved in the SEO and Content Marketing world for over a decade. My career started with a portfolio of content sites; recently I sold 2 content marketing agencies, and I am the Co-Founder of, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not For or Against AI content; I think it has a place in everyone's content strategy. However, I believe you as the publisher should be the one making the decision on when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
