A recent study, “ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination,” accessible via Cornell University, highlights the vulnerabilities of existing detectors to evasion techniques such as back-translation and provides a dataset for evaluating their robustness.
In this context, we tested Originality.ai’s AI Detector using the same dataset and methods to validate its capabilities. Here's how it stands out as a leader in the domain.
Key Findings (TL;DR)
Learn more about Originality.ai’s efficacy in AI detection in our AI Detection Accuracy Study and a Meta-Analysis of Third-Party AI Detection Studies.
The study introduces back-translation as a method of manipulating AI-generated text: the text is translated through one or more other languages and then back into the original language. This process largely preserves the original meaning but can alter the wording enough to evade detection.
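To illustrate the idea, here is a minimal back-translation sketch using publicly available translation models. The specific models and the English-French pivot are our own assumptions for the example, not the pipeline used in the study.

```python
from transformers import pipeline

# Two MarianMT translation models from the Hugging Face Hub serve as an
# example pivot; the study's actual language pairs and tooling may differ.
en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def back_translate(text: str) -> str:
    """Translate text into a pivot language and back to produce a paraphrase."""
    pivot = en_to_fr(text, max_length=512)[0]["translation_text"]
    return fr_to_en(pivot, max_length=512)[0]["translation_text"]

sample = "Large language models can produce remarkably fluent, human-like prose."
print(back_translate(sample))  # Same meaning, often different wording.
```

Because the paraphrased output keeps the meaning while changing surface features, detectors that rely on those surface features can be fooled.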
Nine detection tools were evaluated on a newly built dataset of 720k texts:
Open-source tools: RADAR, LLMDet, Likelihood, Rank, Log-Rank, ESAS
Commercial tools: Pangram, GPTZero, ZeroGPT
The results revealed significant performance gaps across many of these tools, especially when they were tested against back-translated texts.
The dataset used in the study, named ESPERANTO, comprises 720k texts spanning domains such as news, Reddit QA, and scientific abstracts, including back-translated versions of the AI-generated samples used to test detector robustness.
True Positive Rate (TPR)
Originality.ai consistently delivered near-perfect TPR scores (an average of 99.7%) for detecting AI-generated texts across all categories, catching virtually every AI-generated sample rather than misreading it as human-written.
Despite the challenge posed by manipulated, back-translated texts, Originality.ai maintained an average TPR of 85.6%, outperforming competitors such as GPTZero and ZeroGPT.
Across categories such as news, Reddit QA, and scientific abstracts, it demonstrated superior performance, maintaining high accuracy even on nuanced or paraphrased text.
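For context, TPR here is the fraction of AI-generated samples that a detector correctly flags as AI. A minimal sketch of the calculation follows; the example predictions are illustrative and not taken from the study.

```python
def true_positive_rate(predictions, labels):
    """TPR = AI-generated texts correctly flagged / all AI-generated texts.

    predictions: 1 if the detector labels a text as AI-generated, else 0.
    labels:      1 if the text really is AI-generated, else 0.
    """
    ai_total = sum(1 for y in labels if y == 1)
    ai_caught = sum(1 for p, y in zip(predictions, labels) if y == 1 and p == 1)
    return ai_caught / ai_total if ai_total else 0.0

# Illustrative only: 9 of 10 AI-generated texts flagged -> TPR of 0.9 (90%).
preds = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
truth = [1] * 10
print(true_positive_rate(preds, truth))  # 0.9
```

A high TPR on back-translated texts therefore means the detector still catches most AI-generated content even after the wording has been paraphrased.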
In a world where detecting AI-generated content is becoming more critical than ever, Originality.ai is paving the way for reliable and effective solutions. Even under the challenging conditions of back-translation manipulation, its consistently high TPR, adaptability across various domains, and strong robustness highlight its superiority over existing tools.