Can Originality.AI Detect GPT 3, GPT 3.5 And ChatGPT Generated Text?

With the release of OpenAI’s new model for AI-generated text, we needed to re-check Originality.AI’s accuracy. We ran a study to determine whether the AI developed at Originality.AI can detect content produced by ChatGPT and GPT-3.5 (DaVinci-003) with the same accuracy it achieves for GPT-3, which is 94%.

TLDR:

We generated 20 articles with each of GPT-3, GPT-3.5, and ChatGPT and ran every article through Originality.AI to determine how effectively it identifies AI content.

The results show that both ChatGPT and GPT-3.5 can be successfully identified by the existing AI, although both are slightly harder to detect than GPT-3.

Further training of the Originality.AI model should improve its detection ability for GPT-3.5 and ChatGPT.

Overview

OpenAI just released a new model for AI-generated text, GPT-3.5. To test the reliability of Originality.AI, several tests were conducted. Each test entered an article generated by one of three OpenAI models (GPT-3, GPT-3.5, and ChatGPT) into Originality.AI.

Originality.AI can detect whether a text is AI-generated. Its performance is measured here by the AI detection score and the plagiarism check. The aim of this testing is to find out whether Originality.AI needs to be retrained.

Data Input

The test used 20 articles generated by each of three OpenAI models: GPT-3, GPT-3.5, and ChatGPT, which are the most advanced AI text-generation models on the market. The articles were approximately 500 words long and covered various topics: fact-based news, sports articles, how-to articles, and fun-fact articles.

Prompt

To generate the AI-written text, various prompts were tested. OpenAI’s GPT models lead the way in bringing personalization to written text. Most AI text-generation platforms, such as Jasper AI and Copy AI, are built on GPT-based models with extra prompt engineering.

To test Originality.AI’s performance, I used various keywords to generate the articles, randomizing the article type (blog, news, article, tutorial, magazine article, and Instagram caption) and adding personalization to the prompt.

An example of a prompt would be “Write a news article about Who will win the world cup in Qatar by BBC News Journalist”. Various article types and lengths were tested on the OpenAI models.

I also added a persona to the prompt to see whether Originality.AI can still detect the AI in the text despite the added persona. The personas tested were Bloomberg Journalist, Ariana Grande, Jimmy Fallon, Nelson Mandela, Gumball, Elon Musk, best magazine journalist, BBC News Journalist, Donald Trump, Jack Ma, Gordon Ramsay, Jennifer Lawrence, Los Angeles Times Journalist, Michio Kaku, Rachael Ray, mythology, William Shakespeare, Sebastian Stan, Kevin Hart, Anthony Mackie, Vogue Journalist, Neil deGrasse Tyson, Joko Widodo (Indonesian President), New York Times Journalist, Elizabeth Olsen, Jeff Bezos, Guido van Rossum, IKEA, NASA Scientist, CNN News Journalist, Carl Sagan, and CBC Journalist.
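The prompt pattern described above (article type, topic, optional persona) can be sketched in a few lines of Python. This is only an illustration: the function names and the short lists below are hypothetical, the lists are small subsets of the types, topics, and personas quoted in this article, and the actual tooling used for the test is not described.

```python
import random

# Hypothetical subsets of the article types, topics, and personas quoted in the article.
ARTICLE_TYPES = ["blog", "news article", "tutorial", "magazine article", "Instagram caption"]
TOPICS = [
    "Who will win the world cup in Qatar",
    "Best Dog Training Leash",
    "How to Potty Train a Puppy",
]
PERSONAS = ["BBC News Journalist", "Gordon Ramsay", "NASA Scientist", None]  # None = no persona


def build_prompt(article_type, topic, persona=None):
    """Assemble a prompt in the style of the example quoted in the article."""
    prompt = f"Write a {article_type} about {topic}"
    if persona:
        # The persona is appended as the supposed author of the text.
        prompt += f" by {persona}"
    return prompt


def random_prompt(rng):
    """Pick a random article type, topic, and persona (assumed sampling scheme)."""
    return build_prompt(
        rng.choice(ARTICLE_TYPES), rng.choice(TOPICS), rng.choice(PERSONAS)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    for _ in range(3):
        print(random_prompt(rng))
```

Calling `build_prompt("news article", "Who will win the world cup in Qatar", "BBC News Journalist")` reproduces the example prompt quoted above.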

Performance Result

After testing the three OpenAI models, I found that Originality.AI detects the text generated by GPT-3, GPT-3.5, and ChatGPT as AI with an average score of 99.41%. GPT-3 has the highest average score at 99.95%, followed by GPT-3.5 at 99.65% and ChatGPT at 98.65%. This ordering makes sense because GPT-3.5 is a more advanced model than GPT-3. ChatGPT text is scored as non-AI by 1.35% on average because it has a better personalization feature. Based on this performance test, I conclude that Originality.AI performs very well: across the different models and the different combinations of prompt and personalization, its average detection score is 99.41%.
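As a quick check of the arithmetic, the overall figure can be recomputed from the three per-model averages. The sketch below assumes the overall score is a plain mean of the per-model averages (equivalent to the mean over all tests when each model has the same number of samples). That mean comes out near 99.42%, so the 99.41% quoted above is presumably truncated rather than rounded.

```python
from statistics import mean

# Per-model average AI detection scores (%) as quoted in the article.
model_avg = {"GPT-3": 99.95, "GPT-3.5": 99.65, "ChatGPT": 98.65}

# Assumption: the overall score is the unweighted mean of the three averages.
overall = mean(list(model_avg.values()))

# Share of the ChatGPT score attributed to "non-AI" on average.
chatgpt_non_ai = 100 - model_avg["ChatGPT"]

print(f"Overall average detection score: {overall:.4f}%")
print(f"ChatGPT average non-AI share: {chatgpt_non_ai:.2f}%")
```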

  • AI Detection Score Result

The graph and table below show the average, min, and max of GPT-3, GPT-3.5, and ChatGPT score results.

According to the graph below, ChatGPT-based text has the lowest AI Detection score, followed by GPT3.5 and GPT3.

I compared the AI detection score for prompts with and without a persona. Adding a persona to the prompt tends to lower the AI detection score slightly. The OpenAI model might pick up some of the persona’s style from the internet, which lets it trick the AI detector slightly, although the effect is very small: less than 1%.

  • Plagiarism Detector

Originality.AI can detect plagiarism in the text very well. Most of the texts tested contained plagiarism. A sample of the plagiarism detected is attached.

Only about 15-30% of the texts generated by GPT-3, GPT-3.5, and ChatGPT passed the plagiarism check. Originality.AI detects plagiarism in these texts because GPT was trained on various text sources, including news sources and online articles. I found that 6 of the 20 ChatGPT-based samples passed the plagiarism check, while only 3 of the 20 samples passed for each of GPT-3 and GPT-3.5.
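The pass counts quoted above translate directly into pass rates. A minimal sketch, assuming 20 samples per model as stated; the variable names are illustrative only:

```python
# Number of samples (out of 20 per model) that passed the plagiarism check,
# as quoted in the article.
SAMPLES_PER_MODEL = 20
passed = {"GPT-3": 3, "GPT-3.5": 3, "ChatGPT": 6}

# Pass rate per model, in percent.
pass_rate = {model: 100 * n / SAMPLES_PER_MODEL for model, n in passed.items()}

# Pass rate over all 60 samples combined.
overall_rate = 100 * sum(passed.values()) / (SAMPLES_PER_MODEL * len(passed))

for model, rate in pass_rate.items():
    print(f"{model}: {rate:.0f}% passed")
print(f"All models: {overall_rate:.0f}% passed")
```

On these counts the per-model pass rates are 15%, 15%, and 30%, and 20% of all 60 samples passed.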

  1. Originality.AI Performance Analysis on GPT-3 Generated Text
    According to the Sheets, most of the tests have a 100% detection score, and only one test scored 99%. Originality.AI detects GPT-3-based text very well: on average the detection score is 99.95%. This is no surprise, because Originality.AI is trained based on the GPT-3 model, so it detects text generated by GPT-3 well. Originality.AI also detects plagiarism in the text very well. Most of the texts tested contained plagiarism, because the GPT-3 model is trained on a huge amount of text data from multiple sources, mostly from the Internet, and it is less able than GPT-3.5 to compose words independently (in most cases, GPT-3 generates fewer words than GPT-3.5 and ChatGPT).
  2. Originality.AI Performance Analysis on GPT-3.5 Generated Text
    Originality.AI detects GPT-3.5-based text well: on average the detection score is 99.65%. Its performance is slightly lower than in the previous test on GPT-3-based text, as the range of the AI detection scores shows. The lowest score among the tests is 98%. The GPT-3.5 model is an advanced version of GPT-3, so it is better at composing words from the prompt. However, the performance is still quite good. Although Originality.AI is trained based on GPT-3, it still performs well when detecting text generated by GPT-3.5. Another thing to note is that less plagiarism was detected in these tests than in the GPT-3 tests.
  3. Originality.AI Performance Analysis on ChatGPT Generated Text
    Originality.AI had its lowest performance here compared to the previous 2 tests, although it still detects ChatGPT-based text well: on average the detection score is 98.65%. The drop can be seen in the individual test scores. Out of 20 tests, 2 have a score below 95%: the fifth and the sixth. The fifth test scored 93%.

    The prompt used in the fifth test is “write blog with 500 words about Best Dog Training Leash” with 1 request. The sixth test scored 90%. The prompt used in that test is “write news with 500 words about How to Potty Train a Puppy written by Donald Trump” with 1 request. Neither prompt seems complicated. I assume the lower scores arise because the model used to generate the text, ChatGPT, is a fine-tuned version of GPT-3.5 that interacts in a conversational way. The text it generates therefore has a different structure from GPT-3 and GPT-3.5 text. In addition, Originality.AI is trained using GPT-3. Its performance is slightly reduced when detecting text generated by GPT-3.5, let alone text generated by a fine-tuned GPT-3.5. As with GPT-3.5, less plagiarism was detected in these tests than in the GPT-3 tests.
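The ChatGPT figures above also pin down how the other 18 tests must have scored. The sketch below is a worked piece of arithmetic, not data from the Sheets: it assumes the 98.65% average is a plain arithmetic mean over the 20 tests and ignores any rounding in the reported numbers.

```python
# Figures quoted in the article for the 20 ChatGPT tests.
N_TESTS = 20
REPORTED_MEAN = 98.65             # average detection score (%)
LOW_SCORES = {5: 93.0, 6: 90.0}   # test number -> score (%), the two tests below 95%


def implied_mean_of_rest(n_tests, reported_mean, known_scores):
    """Mean score the remaining tests need for the reported mean to hold."""
    total = reported_mean * n_tests
    rest_total = total - sum(known_scores.values())
    return rest_total / (n_tests - len(known_scores))


rest_mean = implied_mean_of_rest(N_TESTS, REPORTED_MEAN, LOW_SCORES)
print(f"Implied mean of the other 18 tests: {rest_mean:.2f}%")
```

Under these assumptions the remaining 18 tests average about 99.44%, so the lower ChatGPT average is driven largely by the two low-scoring prompts.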

Summary

From the results of this testing, which can be found in the attached Sheets, I conclude that Originality.AI has high performance for AI-generated text detection. Even though Originality.AI is trained based on the GPT-3 model, it is also able to detect text generated by GPT-3.5 and ChatGPT, as shown by its average detection scores for those two models. However, its performance is slightly reduced when it is used on text generated by GPT-3.5 and ChatGPT.

According to these results, retraining Originality.AI based on GPT-3.5 is, for now, a nice-to-have, because the tool still performs quite well. However, GPT-3.5 and ChatGPT are based on reinforcement learning, which means they keep being trained on various corpora and new data, so over time the AI detection performance of Originality.AI might be slightly reduced (by approximately 3-10%). To maintain the reliability and robustness of Originality.AI in that case, retraining its model based on GPT-3.5 might increase its performance in detecting text generated by GPT-3.5 and ChatGPT.

Jonathan Gillham

Founder / CEO of Originality.AI. I have been involved in the SEO and Content Marketing world for over a decade. My career started with a portfolio of content sites. Recently I sold 2 content marketing agencies, and I am the Co-Founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone’s content strategy. However, I believe you as the publisher should be the one deciding when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!
