In this quick study, we look at Google's Gemini Pro model and how effectively the Originality.ai detector identifies its content.
Below is a summary of the key findings, the testing method, an analysis of the results, and access to the dataset plus the open-source AI detection testing/statistical analysis tool used for this study.
Note: this is ONLY a 200-sample test, which is far too small to support conclusive answers! For a more complete AI detection accuracy study, see this study.
We use machine learning best practices to evaluate a classifier's efficacy. Here is a guide that goes in depth into each of the measures used and AI detector accuracy more broadly.
The two most important things to understand when evaluating whether an AI detector works are the confusion matrix (example shown below) and the F1 score.
Confusion Matrix (example for Originality.ai performance on GPT-4)
The F1 score is a commonly used metric that condenses the overall confusion matrix into a single number.
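To make the relationship between the confusion matrix and the F1 score concrete, here is a minimal sketch in Python. The counts below are illustrative placeholders, not results from this study: they assume 100 AI-generated and 100 human samples.

```python
# Hypothetical confusion-matrix counts for an AI detector
# (illustrative numbers only, not results from this study).
tp = 99   # AI-generated text correctly flagged as AI
fn = 1    # AI-generated text missed (labelled human)
fp = 2    # human text wrongly flagged as AI (false positive)
tn = 98   # human text correctly labelled human

# Precision: of everything flagged as AI, how much really was AI?
precision = tp / (tp + fp)
# Recall: of all the AI text, how much did the detector catch?
recall = tp / (tp + fn)
# F1: harmonic mean of precision and recall, a single summary number.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it punishes an imbalance between precision and recall: a detector that catches everything but flags lots of human text (or vice versa) scores noticeably lower than one that balances both.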
Originality correctly identified the AI-generated articles 99% of the time, with a 2% false positive rate.
GPTZero also performed well on this test, correctly identifying 93% of the AI-generated articles and 99% of the human-written articles.
Based on this study, it appears that the detectability of Google Bard Gemini Pro aligns with other LLMs such as ChatGPT (GPT-3.5, GPT-4).
Originality outperformed GPTZero on AI detection and slightly underperformed on human content detection.
Always remember that AI detectors are not perfect and do produce false positives. However, they do work, and we are still looking for a participant in our charity challenge (do AI detectors work?).
If you are interested in running your own study, please reach out; we are happy to offer research credits.