Can humans detect AI-generated text?
Some argue they can, while others rely on high-performing AI detection tools such as Originality.ai.
With the high impact of identifying (or misidentifying) AI-generated content, it’s essential to establish whether humans can detect AI-generated text.
Despite high confidence, people typically struggle to detect AI-generated text. Our study on the most commonly used ChatGPT words found that ChatGPT is harder to identify than most think.
To evaluate this further, below we summarize and review six third-party studies on humans’ ability to detect AI-generated text.
The first study is available through ScienceDirect and investigates the ability of teachers to detect AI-generated content.
As AI continues to advance, detecting AI-generated content in education is becoming increasingly important. So, the researchers in this study conducted a series of experiments to see whether teachers could differentiate between AI-generated and human-written text.
The study incorporated two groups of participants:
Study 1 (participants were pre-service teachers)
“Pre-service teachers identified only 45.1 percent of AI-generated texts correctly and 53.7 percent of student-written texts.”
Study 2 (participants were experienced teachers)
“They identified only 37.8 percent of AI-generated texts correctly, but 73.0 percent of student-written texts.”
In both groups, accuracy in identifying AI-generated texts was at or below chance. The findings show that teachers struggle to distinguish AI-generated content from human-written work without support from leading AI detection tools.
Study: Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays
Source: ScienceDirect
In this study, accessible through Cornell University, researchers again assessed participants' ability to differentiate between content produced by AI or written by humans.
The content researched in this study included “stories, news articles, and recipes.”
The findings further show that humans struggle to identify AI-generated text, even if they have received some form of training prior to the task.
Find out about the accuracy of AI detectors according to third-party researchers in a meta-analysis of AI-detection accuracy studies.
Study: All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Source: Cornell University
This study, also accessible through Cornell University, offers additional insight into the ability of humans to detect AI-generated content produced by three AI systems: GPT-4, GPT-3.5, and the classic rule-based chatbot ELIZA.
The study included 500 participants and evaluated their AI-detection efficacy using a ‘game’ that resembled a messaging app: after a short conversation, each participant judged whether they had been talking to a human or an AI.
Further, the study evaluated whether prior knowledge of or familiarity with AI and large language models (LLMs) impacted the results.
This study further highlights that humans struggle to identify AI-generated text consistently, even if they are familiar with LLM subject matter.
Interested in learning more about AI detection and using AI detectors? Read the Originality AI Content Detector Accuracy Review for more information.
Study: People cannot distinguish GPT-4 from a human in a Turing test
Source: Cornell University
A detailed study published in the International Journal for Educational Integrity also examined AI detection, focusing specifically on academic writing. The aim was to see whether participants could distinguish AI-generated essays from human-written ones.
The study included 140 college instructors and 145 college students as participants. Researchers provided each participant with one human-written student essay and one essay generated by AI, then asked them to identify which was AI-generated and which was student-written.
The results demonstrate that both educators and students have difficulty identifying AI usage in academic work.
Study: Can you spot the bot? Identifying AI-generated writing in college essays
Source: International Journal for Educational Integrity via BioMed Central Ltd
The fifth study on this list is available in Arthroscopy: The Journal of Arthroscopic & Related Surgery and focuses on reviewers’ ability to identify ChatGPT-generated abstracts.
As with other studies, participants were shown a mixture of AI-generated and human-written content, this time in the format of scientific abstracts, and were asked to select which ones they believed were AI-generated.
The results further showed that humans struggle to detect AI-generated content.
Study: Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers
Source: Arthroscopy: The Journal of Arthroscopic & Related Surgery
Lastly, we have the study from the Canadian Society of Plastic Surgeons, focusing on content in the highly specialized context of medical residency applications.
The participants were two recently retired surgeons with 20 years of experience in the CaRMS (Canadian Residency Matching Service). They were asked to evaluate 11 AI-generated and 11 human-written applications and identify whether each was written by a human or generated by ChatGPT-4.
The findings once again show that identifying AI content is challenging for humans, even for professionals with two decades of experience reviewing medical residency applications.
Study: Can ChatGPT Fool the Match? Artificial Intelligence Personal Statements for Plastic Surgery Residency Applications: A Comparative Study
Source: Canadian Society of Plastic Surgeons
So, can humans detect AI-generated text? As these studies demonstrate, the answer is largely no.
While humans can correctly identify AI text in some instances, they still struggle overall to differentiate AI content from human-written content. As a result, an accurate AI content detector is highly beneficial for providing additional clarity.
In general, humans struggle to detect AI-generated content. Studies show that human accuracy can vary significantly.
The efficacy of AI detection tools varies depending on the tool. The Originality.ai Lite AI Checker exhibits 98% accuracy in AI detection. Additionally, the Turbo model demonstrates 99%+ AI detection accuracy.
To compare the efficacy of popular AI tools, check out our AI detection review series.
Failing to identify AI-generated text can have significant implications, including the spread of misinformation, academic dishonesty, and a lack of authenticity in online information.
Have you seen a thought leadership LinkedIn post and wondered if it was AI-generated or human-written? In this study, we looked at the impact of ChatGPT and generative AI tools on the volume of AI content that is being published on LinkedIn. These are our findings.
We believe it is crucial for AI content detectors’ reported accuracy to be open, transparent, and accountable. The reality is that each person seeking AI-detection services deserves to know which detector is the most accurate for their specific use case.