In this quick study, we look at Google's Gemini Pro model and how effectively the Originality.ai AI detector identifies its content.
Below is a summary of the key findings, the testing method, an analysis of the results, and access to the dataset and the open-source AI detection testing and statistical analysis tool used for this study.
Note: this is ONLY a 200-sample test, which is far too small to support any conclusive answers! For a more complete AI detection accuracy study, see this study.
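To give a feel for why 200 samples is too small, the sketch below computes a 95% Wilson score interval for a detection rate. This is an illustration we added, not part of the study's tooling: the Wilson interval is one standard choice among several, and the study does not state how the 200 samples were split, so the example assumes a hypothetical 100 samples in one class.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a binomial proportion (Wilson score).

    successes: number of correctly classified samples
    n:         total samples in that class
    z:         normal quantile; 1.96 gives roughly a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Centre is pulled slightly toward 0.5 compared with the raw rate p_hat.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Hypothetical: 93 of 100 AI-generated articles flagged correctly.
low, high = wilson_interval(93, 100)
print(f"93/100 correct -> 95% interval roughly {low:.1%} to {high:.1%}")
```

With only about 100 samples per class, an observed 93% is compatible with a true rate anywhere from roughly the mid-80s to the high-90s, which is why small differences between detectors in a test of this size should not be over-interpreted.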
We use machine learning best practices to evaluate a classifier's efficacy. Here is a guide that goes in depth into each of the measures used and into AI detector accuracy more broadly.
The two most important things to understand when evaluating whether an AI detector works are the confusion matrix (example shown below) and the F1 score.
Confusion Matrix (example for Originality.ai performance on GPT-4)
A commonly used metric for turning the overall confusion matrix into a single number is the F1 score.
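The sketch below shows how a confusion matrix and the F1 score can be computed from true and predicted labels. It is a minimal illustration, not the open-source tool used in this study; it assumes label 1 means AI-generated (the positive class) and label 0 means human-written, and the example labels are made up rather than taken from the dataset.

```python
from typing import Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, int]:
    """Count TP, FP, TN, FN, treating 1 = AI-generated (positive), 0 = human-written."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 (human) or 1 (AI)")
        if truth == 1 and pred == 1:
            counts["tp"] += 1  # AI text correctly flagged as AI
        elif truth == 0 and pred == 1:
            counts["fp"] += 1  # human text wrongly flagged as AI (false positive)
        elif truth == 0 and pred == 0:
            counts["tn"] += 1  # human text correctly passed as human
        else:
            counts["fn"] += 1  # AI text missed by the detector
    return counts


def f1_score(cm: dict[str, int]) -> float:
    """Harmonic mean of precision and recall for the AI (positive) class."""
    tp, fp, fn = cm["tp"], cm["fp"], cm["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: four AI-generated and four human-written articles.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm, f1_score(cm))
```

Because F1 combines precision (how many flagged articles were really AI) and recall (how many AI articles were caught), a detector cannot score well on it just by flagging everything or nothing.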
Originality.ai correctly identified 99% of the AI-generated articles with a 2% false positive rate (that is, 98% of the human-written articles were correctly classified as human).
GPTZero also performed well on this test, correctly identifying 93% of the AI-generated articles and 99% of the human-written articles.
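The percentages above map onto the confusion matrix as follows, using the same convention as the rest of this study (AI-generated is the positive class). These are the standard definitions; the last line shows why a 2% false positive rate is the same thing as correctly classifying 98% of human-written articles.

```latex
\begin{aligned}
\text{TPR (AI correctly detected)} &= \frac{TP}{TP + FN} \\
\text{TNR (human correctly detected)} &= \frac{TN}{TN + FP} \\
\text{FPR (human flagged as AI)} &= \frac{FP}{FP + TN} = 1 - \text{TNR} \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} \\
\text{FPR} = 0.02 &\;\Rightarrow\; \text{TNR} = 1 - 0.02 = 0.98
\end{aligned}
```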
Based on this study, it appears that the detectability of Google Bard's Gemini Pro is in line with that of other LLMs such as ChatGPT (GPT-3.5, GPT-4).
Originality.ai outperformed GPTZero on AI content detection (99% vs. 93%) and slightly underperformed on human content detection (98% vs. 99%).
Always remember that AI detectors are not perfect and do produce false positives. However, they do work, and we are still looking for a participant in our charity challenge on whether AI detectors work.
If you are interested in running your own study, please reach out; we are happy to offer research credits.
We believe it is crucial for AI content detectors' reported accuracy to be open, transparent, and accountable. Each person seeking AI detection services deserves to know which detector is the most accurate for their specific use case.