Artificial intelligence (AI) has rapidly become integral to educational settings, particularly in writing assignments. However, distinguishing between human-written and AI-generated content remains challenging. A recent study, "Using aggregated AI detector outcomes to eliminate false-positives in STEM-student writing," conducted at Arizona State University, evaluated four AI detection tools on their ability to distinguish AI-generated from human-written essays in a STEM educational environment.
Originality.ai correctly classified 98% of AI-generated essays.
Originality.ai exhibited one of the lowest false-positive rates (2%) among the evaluated tools.
Originality.ai offers the best balance between safety (a low false-positive rate) and strength (a high true-positive rate).
Study Details
This research involved undergraduate students enrolled in an anatomy and physiology course. The objective was to evaluate the performance of multiple AI detection tools, focusing on their accuracy in distinguishing human-written essays from AI-generated ones.
In total, 174 students submitted both a human-written and an AI-generated essay. However, for AI detector evaluation, a subset of 99 essays (50 human-written and 49 AI-generated) was used due to one corrupted AI-generated file.
AI Detection Tools Evaluated
Originality.ai
GPTZero
Copyleaks
DetectGPT
Dataset Information
The total dataset included 348 essays (174 human + 174 AI-generated). However, AI detectors were tested on a sample of 99 essays (50 human-written and 49 AI-generated). These essays responded to a prompt about plasma membrane anatomy and physiology. Each essay was approximately 150 words long.
Evaluation Criteria
Essays assessed by the AI detectors were evaluated on their false-positive and false-negative rates. To provide further insight into accuracy, we calculated the F1 score and true positive rate (TPR) for each tool based on the study's reported data and the number of samples the study tested.
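For reference, these derived figures use the standard definitions (the study itself reports the resulting false-positive and false-negative rates rather than these intermediate values):

Precision = TP / (TP + FP)
Recall (TPR) = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)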
Originality.ai's Performance Highlights
| AI Detector Tool | Originality.ai | Copyleaks | GPTZero | DetectGPT |
| --- | --- | --- | --- | --- |
| False positive (human-written marked as AI) | 2.0% | 0% | 2.0% | 18% |
| False negative (AI-written marked as human) | 2.0% | 6.2% | 10.2% | 42.8% |
| F1 score | 98% | ~96.9% | ~93.6% | ~65.3% |
| True positive rate, TPR = TP / (TP + FN) | 98% | 94% | 89.8% | 57.2% |
| False positive rate, FPR = FP / (FP + TN) | 2.0% | 0% | 2.0% | 18% |
| Strengths | Best balance and low error on both sides; highly consistent | Good at detecting human content, though some AI essays were missed (6.2% FN) | Very low FP; minimizes false flags on human writing | None notable |
| Weaknesses | None significant | Slightly higher FN | Misses more AI content (10.2% FN) | Extremely high FN (misses most AI essays); high FP |
| Overall reliability | Very High | High | Medium-High | Very Low |
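To make the derived columns reproducible, here is a minimal Python sketch that recomputes TPR, FPR, and F1 from confusion-matrix counts. The counts are our inference from the published rates and the 49 AI-generated / 50 human-written sample split (the study reports rates, not raw counts, for most tools), so small rounding differences from the table are expected.

```python
# Recompute TPR, FPR, and F1 from approximate confusion-matrix counts.
# Counts are inferred from the published rates and the 49 AI / 50 human
# sample split; treat them as approximations, not the study's raw data.

def metrics(tp, fp, fn, tn):
    """Return (TPR, FPR, F1) for one detector."""
    tpr = tp / (tp + fn)           # recall: AI essays correctly flagged
    fpr = fp / (fp + tn)           # human essays incorrectly flagged
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return tpr, fpr, f1

# (TP, FP, FN, TN) per detector, out of 49 AI-generated and 50 human-written essays
detectors = {
    "Originality.ai": (48, 1, 1, 49),
    "Copyleaks":      (46, 0, 3, 50),
    "GPTZero":        (44, 1, 5, 49),
    "DetectGPT":      (28, 9, 21, 41),
}

for name, counts in detectors.items():
    tpr, fpr, f1 = metrics(*counts)
    print(f"{name:15} TPR={tpr:6.1%}  FPR={fpr:5.1%}  F1={f1:6.1%}")
```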
1. Near-Perfect AI Detection Accuracy: 98% True Positive Rate
Originality.ai correctly identified 48 out of 49 AI-generated essays, giving it a True Positive (TP) rate of 98%.
2. Excellent False Positive Rate: Just 2%
Originality.ai flagged only 1 out of 50 human-written essays as AI — a false-positive rate of 2%.
Originality.ai is highly reliable with a low false positive rate, ensuring academic fairness.
3. Balanced Performance Across Both Sides (Human + AI Essays)
Originality.ai leads with a 98.0% F1 score, indicating an excellent balance between precision and recall.
With a 98% true-positive rate and a 98% true-negative rate, Originality.ai is one of the few detectors that maintains consistently high performance on both human-written and AI-generated essays.
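Working through the arithmetic with the counts above (48 of 49 AI-generated essays flagged, 1 of 50 human-written essays misflagged): precision = 48 / (48 + 1) ≈ 0.98, recall = 48 / 49 ≈ 0.98, so F1 = 2 × (0.98 × 0.98) / (0.98 + 0.98) ≈ 0.98, i.e. 98%.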
Additional Analysis
Human evaluators had an average false-positive rate of 5% and a true-positive rate of 85%, indicating that Originality.ai surpassed human evaluators in both accuracy and consistency. This builds on previous studies showing that humans struggle to identify AI-generated content.
Student surveys revealed that 63.2% of students considered the AI-generated essay superior to their own in quality, underscoring why a reliable detector like Originality.ai matters for instructors trying to maintain fairness.
Final Thoughts
Originality.ai offers the best balance and accuracy among all the evaluated AI detectors. With only 2% false positives and 2% false negatives, it demonstrated superior reliability in distinguishing AI-generated from human-written content.
This consistent performance not only surpasses other AI tools like Copyleaks and GPTZero but also outperforms human evaluators, including faculty and teaching assistants.
Given the growing concern around AI-generated submissions, Originality.ai stands out as an essential tool for educators and institutions committed to maintaining academic integrity. Its ability to minimize both false flags and missed detections makes it an invaluable asset in today's educational landscape.