
Can Humans Detect AI-Generated Text? 6 Studies Would Suggest They Can’t

As AI continues to advance, it's increasingly important to establish whether humans can reliably identify AI-generated text.

Can humans detect AI-generated text?

Some argue they can, while others rely on high-performing AI detection tools such as Originality.ai.

Given the high impact of identifying (or misidentifying) AI-generated content, we reviewed six independent studies to find out whether humans can detect AI-generated text.

Key Takeaways (TL;DR)

  • Despite confidence in their own ability, humans tend to struggle to spot AI-generated content.
  • AI content detectors are significantly more effective than humans at identifying AI content.
  • Failing to detect AI content has significant implications, such as the spread of misinformation, academic dishonesty, and a lack of authenticity online.

Can Humans Detect AI-Generated Text? 6 Studies Explained 

Despite high confidence, people typically struggle to detect AI-generated text. Our study on the most commonly used ChatGPT words found that ChatGPT is harder to identify than most think. 

Below, we summarize and review six third-party studies on the subject.

1. Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays

Study overview

The first study is available through ScienceDirect and investigates the ability of teachers to detect AI-generated content.

As AI continues to advance, detecting AI-generated content in education is becoming increasingly important. The researchers in this study therefore conducted a series of experiments to see whether teachers could differentiate between AI-generated and human-written text.

The study incorporated two groups of participants:

Study 1 (participants were pre-service teachers)

 “Pre-service teachers identified only 45.1 percent of AI-generated texts correctly and 53.7 percent of student-written texts.”

Study 2 (participants were experienced teachers)

 “They identified only 37.8 percent of AI-generated texts correctly, but 73.0 percent of student-written texts.”

Key findings

  • The participants were unable to consistently identify AI-generated and human-written content (a quick back-of-the-envelope calculation follows this list).
    • Pre-service teachers accurately identified only 45.1% of AI texts.
    • Experienced teachers accurately identified only 37.8% of AI texts.
  • Even when participants felt confident, their judgments were often incorrect, highlighting an overestimation of the human ability to detect AI.
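To put those percentages in perspective, here is a rough, hypothetical calculation (not taken from the paper itself): assuming an even split of AI-generated and student-written texts, we can average each group's two reported rates into a single overall accuracy figure and compare it to the 50% expected from random guessing.

```python
def balanced_accuracy(ai_correct_pct: float, human_correct_pct: float) -> float:
    """Average of the AI-detection rate and the human-recognition rate,
    assuming equal numbers of AI-generated and human-written texts."""
    return (ai_correct_pct + human_correct_pct) / 2

# Reported rates from the study: (% of AI texts correctly flagged,
# % of student-written texts correctly recognized).
groups = {
    "Pre-service teachers": (45.1, 53.7),
    "Experienced teachers": (37.8, 73.0),
}

for group, (ai_pct, human_pct) in groups.items():
    print(f"{group}: ~{balanced_accuracy(ai_pct, human_pct):.1f}% overall (chance = 50%)")

# Approximate output:
# Pre-service teachers: ~49.4% overall (chance = 50%)
# Experienced teachers: ~55.4% overall (chance = 50%)
```

On that assumption, pre-service teachers perform almost exactly at chance, and experienced teachers only modestly above it.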

Conclusion

The findings show that teachers struggle to distinguish AI-generated content from human-written work without support from leading AI detection tools.

Study: Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays

Source: ScienceDirect

2. All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text

Study overview

In this study, accessible through Cornell University, researchers again assessed participants' ability to differentiate between content produced by AI and content written by humans.

The content researched in this study included “stories, news articles, and recipes.” 

  • 255 stories were reviewed for the study (sourced from social media posts).
    • 50 of the 255 human-written stories were chosen.
    • 50 additional stories were generated from prompts.
  • 2,111 news articles were reviewed for the study (sourced from 15 news publications).
    • 50 of the 2,111 articles were chosen.
    • 50 additional articles were generated from prompts.
  • 50 human-written recipes from the RecipeNLG dataset were included.
    • 50 additional recipes were generated from prompts.

Key findings

  • Again, the results showed that participants had a poor success rate in identifying AI-generated content.
  • Participants were 57.9% accurate at distinguishing GPT2 text from human-written text (rounded to 58% in the published table).
  • Participants were 49.9% accurate at distinguishing GPT3 text from human-written text (rounded to 50% in the published table), essentially chance level.
  • When participants received training in AI recognition before the test, their scores improved slightly, but not substantially.

Conclusion

The findings further show that humans struggle to identify AI-generated text, even if they have received some form of training prior to the task.

Find out about the accuracy of AI detectors according to third-party researchers in a meta-analysis of AI-detection accuracy studies.

Study: All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

Source: Cornell University

3. People cannot distinguish GPT-4 from a human in a Turing test

Study overview

This study, also accessible through Cornell University, offers additional insight into the ability of humans to detect AI-generated content produced by three AI models: GPT-4, GPT-3.5, and ELIZA.

The study focused on how successful humans are at identifying AI-generated text. It included 500 participants and evaluated their AI-detection efficacy using a ‘game’ that resembled a messaging app.  

Further, the study evaluated whether or not prior knowledge or familiarity with AI and large language models (LLMs) could impact the results.

Key findings

  • As in the previous studies, participants were largely unable to distinguish AI-generated text from human-written text.
    • Human text was correctly identified as human-written 67% of the time.
    • GPT-4 was incorrectly identified as human 54% of the time.
    • ELIZA was incorrectly identified as human 22% of the time.
  • Familiarity with LLMs, a factor unique to this study, had only a marginal impact on accuracy.
  • Interestingly, age was the most influential factor: younger age groups generally identified AI models more accurately than older age groups.

Conclusion

This study further highlights that humans struggle to identify AI-generated text consistently, even if they are familiar with LLM subject matter.

Interested in learning more about AI detection and using AI detectors? Read the Originality AI Content Detector Accuracy Review for more information.

Study: People cannot distinguish GPT-4 from a human in a Turing test

Source: Cornell University

4. Can you spot the bot? Identifying AI-generated writing in college essays

Study overview

A study published in the International Journal for Educational Integrity also examined AI detection in detail, focusing specifically on academic writing. The aim was to see whether participants could distinguish AI-generated essays from human-written ones.

The study included 140 college instructors and 145 college students as participants. Researchers provided each participant with one human-written student essay and one essay generated by AI, and then asked them to judge which was AI-generated and which was student-written.

Key findings

  • The study showed that participants struggled to accurately identify AI-generated essays.
  • College instructors correctly identified ChatGPT 70% of the time.
  • College students correctly identified ChatGPT 60% of the time.

Conclusion

The results demonstrate that both educators and students have difficulty identifying AI usage in academic work.

Study: Can you spot the bot? Identifying AI-generated writing in college essays

Source: International Journal for Educational Integrity via BioMed Central Ltd  

5. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers

Study overview

The fifth study on this list, published in Arthroscopy: The Journal of Arthroscopic & Related Surgery, focuses on reviewers' ability to identify ChatGPT-generated abstracts.

As with other studies, participants were shown a mixture of AI-generated and human-written content, this time in the format of scientific abstracts, and were asked to select which ones they believed were AI-generated.

Key findings

  • Reviewers correctly spotted AI-generated abstracts 62% of the time.
  • 38% of the time, original human-written abstracts were incorrectly identified as AI-generated.
  • Given the difficulties reviewers faced, the study concluded that AI detection methods beyond human evaluation are needed.

Conclusion

As with the other studies, this further demonstrated that humans struggle to detect AI-generated content.

Study: Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers

Source: Arthroscopy: The Journal of Arthroscopic & Related Surgery

6. Can ChatGPT Fool the Match? Artificial Intelligence Personal Statements for Plastic Surgery Residency Applications: A Comparative Study

Study overview

Lastly, we have the study from the Canadian Society of Plastic Surgeons, focusing on content in the highly specialized context of medical residency applications.

The participants were two recently retired surgeons with 20 years of experience in the CaRMS (Canadian Residency Matching Service). They were asked to evaluate 11 AI and 11 human applications and identify whether they were human-written or generated by ChatGPT-4.

Key findings

  • The study showed that evaluators successfully differentiated between AI and human applications 65.9% of the time.
  • The paper acknowledged the small sample size as a limitation of the study, while noting the evaluators' expertise as a notable strength.

Conclusion

The findings once again show that identifying AI content is challenging for humans, even for professionals with two decades of experience reviewing medical residency applications.

Study: Can ChatGPT Fool the Match? Artificial Intelligence Personal Statements for Plastic Surgery Residency Applications: A Comparative Study

Source: Canadian Society of Plastic Surgeons

Final Thoughts

So, can humans detect AI-generated text? As these studies demonstrate, the answer is largely no.

While humans can correctly identify AI text in some instances, they still struggle overall to differentiate AI content from human-written content. As a result, for additional clarity, an accurate AI content detector is highly beneficial.

FAQs About AI Detection

Can humans detect AI-generated content?

In general, humans struggle to detect AI-generated content. Studies show that human accuracy varies significantly and is often not much better than chance.

How accurate are AI detection tools?

The efficacy of AI detection tools varies depending on the tool. The Originality.ai Standard 2.0.1 AI Checker exhibits 99%+ accuracy in AI detection.

To compare the efficacy of popular AI detection tools, check out our AI detection review series.

What are the implications of not being able to detect AI content?

There can be significant implications, including the spread of misinformation, academic dishonesty, and a lack of authenticity in online information.

Jonathan Gillham

Founder / CEO of Originality.ai. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites; more recently, I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences, I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone's content strategy. However, I believe you, as the publisher, should be the one making the decision on when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
