AI Writing

How Does AI Content Detection Work?

A lot of questions have come up about how our AI content detection works and achieves the 94%+ accuracy for all modern AI text generation tools. Here is the nitty gritty details provided by our engineering team communicating how they trained the most accurate AI content detection tool.

A lot of questions have come up about how our AI content detection works and achieves the 94%+ accuracy for all modern AI text generation tools.

Here is the nitty-gritty details provided by our engineering team communicating how they trained the most accurate AI content detection tool.

1. How the tool works?

The Problem

  • The tool's core is based on a modified version of the BERT model. Also known as Bidirectional Encoder Representations from Transformers, BERT is a Google-developed artificial intelligence (AI) system for processing natural language.
  • A lot of research has shown that it is increasingly difficult for humans to identify if a piece of text was created by a writer or AI-generated. As the generating models get bigger and bigger, the quality of the generated texts get better and better. Therefore, the use of a Machine Learning (ML) model for detection (ML-based detection) is an urgent problem.
  • However, detecting the synthesis document is still a difficult task. The adversary can use datasets outside the training set distribution or use different sampling techniques to generate text and many other techniques. We did a lot of research and did our best to deliver an optimal model, in terms of architecture, data, and other techniques.


  • We used a modified version of the BERT model for classification. We tested on multiple model architectures and concluded that with the same capacity, the discriminate model with having a more flexible architecture than the generative model (e.g bidirectional) makes it more powerful in detection. Another thing we also learned is that the larger the model used, the more accurate and robust the prediction will be. The usage model is large enough but still meets the response time, thanks to the most modern techniques in deployment.
  • Of course, we did not use BERT to train the model but we trained a pre-training language model with a completely new architecture, and of course, it is still a kind of architecture based on Transformer.
  • We first trained a new language model that had a completely new architecture based on 160GB of text data. One technique that we apply to train the model is similar to ELECTRA. The model is trained by two models: generator and discriminator. After training, discriminator is used as our language model.
  • Second, fine-tune the model on the training dataset that we built up to millions of samples.

Training data

  • It is obvious that the quality of the model depends a lot on the input data. With the collection and processing of a large amount of actual text from many sources, through the clean process, we have a large amount of real text to train.
  • With generated text, In addition to using the best pre-trained generated models available today, we also build on that, fine-tuning with the existing real text dataset to make the generated text more natural, close to the real distribution. text as much as possible, so that the model can learn a good enough boundary for future inference connections.
  • Our training dataset is carefully generated using various sampling methods. It is manually reviewed by humans a lot.

Other techniques

  • To generate text, there are many different sampling techniques, like temperature, Top-K, and nucleus sampling. We expect more advanced generation strategies (such as nucleus sampling) could make detection more difficult than generations produced via Top-K truncation. Using a mixed dataset from different sampling techniques also helps to increase the accuracy of the model. In our training data generation, we try to generate in different ways and on many different models so that the data is diversified and the model understands the types of text generated by the AI in a better way.
  • One of the challenges of the detection model is detecting difficult short texts, although it is very short texts, we have also improved the accuracy of the model in these cases by using our different techniques. “after using different methods and testing, our model…” can improve accuracy by more than 15% compared to the previous model. Therefore, the current model has improved accuracy on much shorter texts, but according to our experiments, with lengths of 50 tokens or longer, the model has acceptable reliability.

Future work

  • Given the release of new complex text-generation models, this problem requires regular updating. The model needs to be retested and evaluated before we decide whether to use its results or retrain it using the data from those new text-generation models. However, with tests using the most recent SOTA models, the model we created can do its job very well
  • Taking our proven model training method and applying it to other languages is on our roadmap.
  • One of our future research topics is that we want the model to understand deeply each sentence and paragraph of the text and show which sentence/paragraph model is written by AI and which sentence/paragraph is written by AI/Human. We call this model the AI Highlight Detector.

Simplified Version of What is Under Development:

2. How accurate is Originality.AI?

  • The model is trained on a million pieces of textual data labeled “human-generated” or “AI-generated”. Then during testing, we test the trained model on documents generated by artificial intelligence models, including GPT-3, GPT-J, and GPT-NEO (20 thousand data each). And the result is that our model successfully identified 94.06% of the text created by GPT-3, 94.14% of text written by GPT-J, and 95.64% of text generated by GPT-Neo.
  • The results show that the more powerful the models like GPT-J/3, the harder it is for the model to recognize that the human or AI is writing.

3. What does the score mean?

  • This is a binary classification problem because the objective is to determine whether the sentence was produced by AI. After training, the model takes text as input and determines whether it is likely to have been generated by AI or not. In order to determine the final outcome, we must first select a threshold; if the probability result of the AI-generated input sentence exceeds that threshold, the output is fake.
  • For instance, the model estimates a 0.6 probability of artificial intelligence (AI) generation and a 0.4 probability of human creation. If we use a threshold of 0.5, the output will be an AI-generated input text because 0.6 > 0.5.

To evaluate the model, we use the accuracy measure of the label it predicts compared to the true label of the data. Informally, accuracy is the fraction of predictions our model got right.

  • Many other metrics are also taken to test and evaluate the model more specifically, in detail, and more accurately.

Jonathan Gillham

Founder / CEO of Originality.AI I have been involved in the SEO and Content Marketing world for over a decade. My career started with a portfolio of content sites, recently I sold 2 content marketing agencies and I am the Co-Founder of, the leading place to buy and sell content websites. Through these experiences I understand what web publishers need when it comes to verifying content is original. I am not For or Against AI content, I think it has a place in everyones content strategy. However, I believe you as the publisher should be the one making the decision on when to use AI content. Our Originality checking tool has been built with serious web publishers in mind!

More From The Blog

AI Content Detector & Plagiarism Checker for Serious Content Publishers

Improve your content quality by accurately detecting duplicate content and artificially generated text.