Artificial intelligence (AI) has rapidly become integral to educational settings, particularly in writing assignments. However, distinguishing between human-written and AI-generated content remains challenging. A recent study, "Using aggregated AI detector outcomes to eliminate false-positives in STEM-student writing," conducted at Arizona State University, evaluated four AI detection tools on their ability to distinguish AI-generated from human-written essays in a STEM educational environment.
Originality.ai correctly classified 98% of AI-generated essays.
Originality.ai exhibited one of the lowest false-positive rates (2%) among the evaluated tools.
Originality.ai offers the best balance between safety (a low false-positive rate) and strength (a high true-positive rate).
Study Details
This research involved undergraduate students enrolled in an anatomy and physiology course. The objective was to evaluate the performance of multiple AI detection tools, focusing on their accuracy in distinguishing human-written essays from AI-generated ones.
In total, 174 students submitted both a human-written and an AI-generated essay. However, for AI detector evaluation, a subset of 99 essays (50 human-written and 49 AI-generated) was used due to one corrupted AI-generated file.
AI Detection Tools Evaluated
Originality.ai
GPTZero
Copyleaks
DetectGPT
Dataset Information
The total dataset included 348 essays (174 human + 174 AI-generated). However, AI detectors were tested on a sample of 99 essays (50 human-written and 49 AI-generated). These essays responded to a prompt about plasma membrane anatomy and physiology. Each essay was approximately 150 words long.
Evaluation Criteria
Essays assessed by the AI detectors were evaluated on their false-positive and false-negative rates. To provide further insight into accuracy, we calculated the F1 score and true positive rate (TPR) for each tool based on the study's reported data and the number of samples the study tested.
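For reference, these derived figures use the standard definitions (the study itself reports the resulting false-positive and false-negative rates rather than these intermediate values):

Precision = TP / (TP + FP)
Recall (TPR) = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)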
Originality.ai's Performance Highlights
| AI Detector Tool | Originality.ai | Copyleaks | GPTZero | DetectGPT |
| --- | --- | --- | --- | --- |
| False positive (human-written marked as AI) | 2.0% | 0% | 2.0% | 18% |
| False negative (AI-written marked as human) | 2.0% | 6.2% | 10.2% | 42.8% |
| F1 score | 98% | ~96.9% | ~93.6% | ~65.3% |
| True positive rate, TPR = TP / (TP + FN) | 98% | 94% | 89.8% | 57.2% |
| False positive rate, FPR = FP / (FP + TN) | 2.0% | 0% | 2.0% | 18% |
| Strengths | Best balance and low error on both sides; highly consistent | Good at detecting human content, though some AI essays were missed (6.2% FN) | Very low FP; minimizes false flags on human writing | None notable |
| Weaknesses | None significant | Slightly higher FN | Misses more AI content (10.2% FN) | Extremely high FN (misses most AI essays); high FP |
| Overall reliability | Very High | High | Medium-High | Very Low |
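To make the derived columns reproducible, here is a minimal Python sketch that recomputes TPR, FPR, and F1 from confusion-matrix counts. The counts are our inference from the published rates and the 49 AI-generated / 50 human-written sample split (the study reports rates, not raw counts, for most tools), so small rounding differences from the table are expected.

```python
# Recompute TPR, FPR, and F1 from approximate confusion-matrix counts.
# Counts are inferred from the published rates and the 49 AI / 50 human
# sample split; treat them as approximations, not the study's raw data.

def metrics(tp, fp, fn, tn):
    """Return (TPR, FPR, F1) for one detector."""
    tpr = tp / (tp + fn)           # recall: AI essays correctly flagged
    fpr = fp / (fp + tn)           # human essays incorrectly flagged
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return tpr, fpr, f1

# (TP, FP, FN, TN) per detector, out of 49 AI-generated and 50 human-written essays
detectors = {
    "Originality.ai": (48, 1, 1, 49),
    "Copyleaks":      (46, 0, 3, 50),
    "GPTZero":        (44, 1, 5, 49),
    "DetectGPT":      (28, 9, 21, 41),
}

for name, counts in detectors.items():
    tpr, fpr, f1 = metrics(*counts)
    print(f"{name:15} TPR={tpr:6.1%}  FPR={fpr:5.1%}  F1={f1:6.1%}")
```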
1. Near-Perfect AI Detection Accuracy: 98% True Positive Rate
Originality.ai correctly identified 48 out of 49 AI-generated essays, giving it a True Positive (TP) rate of 98%.
2. Excellent False Positive Rate: Just 2%
Originality.ai flagged only 1 out of 50 human-written essays as AI — a false-positive rate of 2%.
Originality.ai is highly reliable with a low false positive rate, ensuring academic fairness.
3. Balanced Performance Across Both Sides (Human + AI Essays)
Originality.ai leads with a 98.0% F1 score, indicating an excellent balance between precision and recall.
With a 98% true-positive rate and a 98% true-negative rate, Originality.ai is one of the few detectors that maintains consistently high performance on both human-written and AI-generated essays.
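Working through the arithmetic with the counts above (48 of 49 AI-generated essays flagged, 1 of 50 human-written essays misflagged): precision = 48 / (48 + 1) ≈ 0.98, recall = 48 / 49 ≈ 0.98, so F1 = 2 × (0.98 × 0.98) / (0.98 + 0.98) ≈ 0.98, i.e. 98%.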
Additional Analysis
Human evaluators had an average false-positive rate of 5% and a true-positive rate of 85%, indicating that Originality.ai surpassed human evaluators in both accuracy and consistency. This builds on previous studies showing that humans struggle to identify AI-generated content.
Student surveys revealed that 63.2% of students considered the AI-generated essay superior to their own in quality, underscoring why a reliable detector like Originality.ai matters for instructors trying to maintain fairness.
Final Thoughts
Originality.ai offers the best balance and accuracy among all the evaluated AI detectors. With only 2% false positives and 2% false negatives, it demonstrated superior reliability in distinguishing AI-generated from human-written content.
This consistent performance not only surpasses other AI tools like Copyleaks and GPTZero but also outperforms human evaluators, including faculty and teaching assistants.
Given the growing concern around AI-generated submissions, Originality.ai stands out as an essential tool for educators and institutions committed to maintaining academic integrity. Its ability to minimize both false flags and missed detections makes it an invaluable asset in today's educational landscape.