For updated studies on the ability of AI detectors like Originality.ai's free ChatGPT text detector to identify the latest models from OpenAI, see:
For a complete study on the current state of AI detection accuracy, see this AI detection efficacy study.
The study below, from 2023, shows the ability to detect GPT-3, GPT-3.5, and ChatGPT.
With the release of OpenAI’s new model for AI-generated text, we needed to re-check Originality.AI’s accuracy. We completed a study to determine whether the AI developed at Originality.AI can detect content produced by ChatGPT and GPT-3.5 (DaVinci-003) with the same accuracy it achieves for GPT-3, which is 94%.
TLDR:
We tested 20 articles across GPT-3, GPT-3.5, and ChatGPT and ran each result through Originality.AI to determine the effectiveness of identifying AI content.
The results are that both ChatGPT and GPT-3.5 can be successfully identified by the existing AI, even though they are more advanced than GPT-3.
Additional training of the Originality.AI model will further improve its detection ability for GPT-3.5 and ChatGPT.
OpenAI just released a new model for AI-generated text, GPT-3.5. To test the reliability of Originality.AI, several tests were conducted by entering articles generated by three different OpenAI models (GPT-3, GPT-3.5, and ChatGPT) into Originality.AI.
Originality.AI can detect whether a text is AI-generated. Its performance is measured here by the AI detection score and the plagiarism check. The aim of this testing is to find out whether Originality.AI needs to be retrained.
The test covers 20 articles generated by each of three different OpenAI models (GPT-3, GPT-3.5, and ChatGPT), which are the most advanced AI text-generation models on the market. The articles were approximately 500 characters long and covered various topics: fact-based news, sports articles, how-to articles, and fun-fact articles.
To generate the AI-generated text, various prompts were tested. OpenAI's GPT models lead in bringing personalization to written text. Most AI text-generation platforms, such as Jasper AI and Copy AI, are built on GPT-based models with extra prompt engineering.
To test Originality.AI's performance, I tested various keywords to generate the articles, randomizing the article type (blog, news, article, tutorial, magazine article, and Instagram caption) and adding personalization to the prompt.
An example prompt would be “Write a news article about Who will win the world cup in Qatar by BBC News Journalist”. These are the article types and lengths that were tested on the OpenAI models.
I also added a persona to the prompt to see whether Originality.AI can still detect AI-generated text when the output is written in a particular voice. The personas tested are Bloomberg Journalist, Ariana Grande, Jimmy Fallon, Nelson Mandela, Gumball, Elon Musk, best magazine journalist, BBC News Journalist, Donald Trump, Jack Ma, Gordon Ramsay, Jennifer Lawrence, Los Angeles Times Journalist, Michio Kaku, Rachael Ray, mythology, William Shakespeare, Sebastian Stan, Kevin Hart, Anthony Mackie, Vogue Journalist, Neil deGrasse Tyson, Joko Widodo (Indonesian President), New York Times Journalist, Elizabeth Olsen, Jeff Bezos, Guido van Rossum, IKEA, NASA Scientist, CNN News Journalist, Carl Sagan, and CBC Journalist.
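The prompt pattern described above (article type, topic, optional persona) can be sketched in a few lines of Python. This is only an illustration of the pattern, not the script used in the study: the function names, the template wording, and the single-entry topic list are assumptions, and only a handful of the personas are included.

```python
import itertools
import random

# Article types and personas come from the text above; the topic list
# and the exact template wording are illustrative assumptions.
ARTICLE_TYPES = ["blog", "news article", "tutorial", "magazine article", "Instagram caption"]
PERSONAS = ["BBC News Journalist", "Bloomberg Journalist", "NASA Scientist", "Gordon Ramsay"]
TOPICS = ["who will win the World Cup in Qatar"]  # hypothetical topic list


def build_prompt(article_type, topic, persona=None):
    """Return a prompt string, optionally attributed to a persona."""
    article = "an" if article_type[0].lower() in "aeiou" else "a"
    prompt = f"Write {article} {article_type} about {topic}"
    if persona:
        prompt += f" by {persona}"
    return prompt


def sample_prompts(n, seed=0):
    """Randomly pick n distinct (type, topic, persona) combinations."""
    rng = random.Random(seed)
    combos = list(itertools.product(ARTICLE_TYPES, TOPICS, PERSONAS))
    return [build_prompt(*combo) for combo in rng.sample(combos, n)]


print(build_prompt("news article", "who will win the World Cup in Qatar", "BBC News Journalist"))
```

Calling `build_prompt` without a persona gives the baseline prompt, which is what makes a with-persona versus without-persona comparison possible.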
After running the test on the three OpenAI models, I found that Originality.AI detects AI in the text generated by GPT-3, GPT-3.5, and ChatGPT with an average score of 99.41%. GPT-3 has the highest average score at 99.95%, followed by GPT-3.5 at 99.65% and ChatGPT at 98.65%. This makes sense because GPT-3.5 is a more advanced version of GPT-3. ChatGPT text is scored as non-AI by 1.35% on average, possibly because it has better personalization. Based on this performance test, I can conclude that Originality.AI performs very well: it detects AI-generated text from various models, with different combinations of prompt and persona, at an average score of 99.41%.
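As a quick check of the arithmetic, the overall figure is consistent with an unweighted mean of the three per-model averages. The sketch below assumes each model contributed the same number of samples; the variable names are illustrative.

```python
# Per-model average detection scores as reported in the text (percent).
reported_averages = {"GPT-3": 99.95, "GPT-3.5": 99.65, "ChatGPT": 98.65}

# Unweighted mean, assuming an equal number of samples per model.
overall = sum(reported_averages.values()) / len(reported_averages)

# Share of the ChatGPT score attributed to "non-AI".
non_ai_share_chatgpt = 100 - reported_averages["ChatGPT"]

print(f"overall average: {overall:.4f}%")
print(f"ChatGPT scored as non-AI: {non_ai_share_chatgpt:.2f}%")
```

The mean works out to roughly 99.4167%, which matches the reported 99.41% when truncated to two decimals.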
The graph and table below show the average, min, and max of GPT-3, GPT-3.5, and ChatGPT score results.
According to the graph below, ChatGPT-based text has the lowest AI Detection score, followed by GPT3.5 and GPT3.
I compared the AI detection score for text generated with a persona in the prompt against text generated without one. Adding a persona to the prompt tends to lower the AI detection score slightly. The OpenAI model may pick up some of the persona's style from the internet, so it is able to trick the AI detector a little, but the effect is very small (less than 1%).
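A comparison like this amounts to grouping the scores by whether a persona was used and comparing the group means. The sketch below uses made-up placeholder scores purely to show the calculation; they are not the study's data, and the names are hypothetical.

```python
from statistics import mean

# HYPOTHETICAL (score, has_persona) pairs for illustration only;
# these are NOT the scores measured in the study.
samples = [
    (99.9, False), (99.8, False), (99.7, False),
    (99.5, True), (99.1, True), (98.9, True),
]


def mean_score(rows, has_persona):
    """Average detection score for rows with or without a persona."""
    return mean(score for score, flag in rows if flag == has_persona)


# Positive gap means the persona prompts scored lower on average.
gap = mean_score(samples, False) - mean_score(samples, True)
print(f"persona lowers the mean score by {gap:.2f} points")
```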
Originality.AI can detect plagiarism in the text very well. Most of the tested samples contain plagiarism. A sample of the detected plagiarism is attached.
Approximately 10-25% of the text generated by GPT-3, GPT-3.5, and ChatGPT passed the plagiarism check. Originality.AI flags plagiarism in these texts because GPT was trained on various text sources, including news sources and online articles. I found that 6 out of 20 ChatGPT-based samples passed the plagiarism check, while only 3 out of 20 samples each for GPT-3 and GPT-3.5 passed.
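The per-model pass rates follow directly from the counts above. The short calculation below uses only those reported counts; the variable names are illustrative.

```python
# Samples that passed the plagiarism check, as reported in the text.
passed = {"ChatGPT": 6, "GPT-3": 3, "GPT-3.5": 3}
SAMPLES_PER_MODEL = 20

# Pass rate per model, in percent.
pass_rates = {model: 100 * n / SAMPLES_PER_MODEL for model, n in passed.items()}

# Pass rate across all 60 samples.
overall_rate = 100 * sum(passed.values()) / (SAMPLES_PER_MODEL * len(passed))

for model, rate in pass_rates.items():
    print(f"{model}: {rate:.0f}% passed")
print(f"overall: {overall_rate:.0f}% passed")
```

This gives 30% for ChatGPT and 15% each for GPT-3 and GPT-3.5, or 20% overall. The ChatGPT rate sits slightly above the approximate 10-25% range quoted above, so that range is best read as a rough overall estimate.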
From the results of this testing, I can conclude that Originality.AI performs well overall at detecting AI-generated text. The full results can be found in the attached Sheets. Even though Originality.AI was trained on the GPT-3 model, it is also able to detect text generated by GPT-3.5 and ChatGPT, as shown by its average detection scores for those models. However, its performance is slightly lower on text generated by GPT-3.5 and ChatGPT.
Based on these results, retraining Originality.AI on GPT-3.5 is, for now, nice to have, as it still performs quite well. However, GPT-3.5 and ChatGPT are based on reinforcement learning, which means they keep being trained on various corpora and new data, so over time the AI detection performance of Originality.AI might decline slightly (approximately 3-10%). In that case, retraining the model used in Originality.AI on GPT-3.5 output might improve its performance in detecting text generated by GPT-3.5 and ChatGPT, and help maintain the reliability and robustness of Originality.AI.