AI Fact Checking Accuracy Study

In the constantly evolving realm of AI-generated content, the veracity of information is of the utmost importance. With a number of fact-checking solutions available, discerning their efficacy becomes crucial. Originality.ai, known for its transparency and accuracy in AI content detection, has recently ventured into the domain of fact checking. But how does our solution stack up against well-established giants like ChatGPT or emerging contenders like Llama-2? This study aims to answer that question.

Summary of Findings:

  • The article compares 6 AI models for fact checking: Originality.ai, GPT-4, GPT-3.5, CodeLlama-34b, Llama-2-13b, and Llama-2-70b.
  • A dataset of 120 recent facts (60 true, 60 false) was created and fed to each model to test accuracy.
  • Originality.ai’s Fact Checking Aid achieved the highest accuracy at 72.3%; GPT-4 was second at 64.9%.
  • GPT-4 and CodeLlama-34b had high rates of returning "unknown", making them unreliable for fact checking.
  • Originality.ai stood out with 0% "unknowns" and low error rate, indicating reliability.
  • When fact checking a generated article, Originality.ai caught more false facts than GPT-4 and Llama-2-70b.
  • Originality.ai's strength is fact checking for content creators, providing sources and explanations.
  • Originality.ai performed well on recent general knowledge facts and aims to continue improving its fact checking capabilities.

What is Fact-checking?

Fact-checking is the rigorous process of verifying the accuracy and authenticity of information presented in various forms of content, whether that be news articles, blogs, speeches, or even social media posts. In an age where information can be shared at lightning speed, the spread of misinformation can have profound consequences, from influencing public opinion to endangering public health and safety.

The importance of fact-checking is manifold:

  • Upholding trust: Reliable information forms the foundation of trust between content creators, news organizations, and their audiences. Fact-checking ensures that the information shared is accurate, maintaining the credibility of these entities.
  • Informed decision making: Accurate and up-to-date information enables individuals to make informed decisions based on the content they find across the media.
  • Preventing misinformation spread: Fact-checking acts as a filter, preventing the spread of false narratives and misconceptions that can have an overwhelmingly negative impact on the public.
  • Promoting accountability: It holds public figures, journalists, and content creators accountable for their statements, ensuring they are responsible in their communications and discouraging the spread of false information.

LLM Hallucinations

When we discuss fact checking using LLMs, we would be remiss not to discuss hallucinations, which are defined as “a confident response by an AI that does not seem to be justified by its training data.”1

Here are the factual-accuracy rates for some popular LLMs, as reported by Anyscale (a higher accuracy corresponds to a lower hallucination rate):

Model           Accuracy (%)
Llama-2-13b     58.7
Llama-2-70b     81.8
GPT-3.5-turbo   67.0
GPT-4           85.5

The Methodology

Crafting the dataset: At the heart of this study lies the dataset, a carefully selected collection of 120 facts. Let’s break down its composition:

  • True facts (60 items): These statements, grounded in reality, were gathered from trustworthy and reliable sources.
  • False facts (60 items): These statements were crafted to oppose the true facts; while seeming credible, they are fundamentally flawed.

It’s worth noting that all of the facts contained within this dataset are from 2022 onwards.

Link to the full data and results: Dataset

Examples of the true and false facts can be found in the linked dataset.
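As a rough illustration of the dataset’s shape, the sketch below writes two example rows to a CSV file. The field names (statement, label) and the example facts are our own illustrative assumptions; the actual column names and entries may differ in the linked dataset.

```python
import csv

# Hypothetical schema for the 120-fact dataset (field names are illustrative,
# not the actual column names in the published dataset).
rows = [
    {"statement": "Argentina won the 2022 FIFA World Cup.", "label": "true"},
    {"statement": "France won the 2022 FIFA World Cup.", "label": "false"},
]

with open("facts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["statement", "label"])
    writer.writeheader()
    writer.writerows(rows)
```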

The Testing Blueprint

Each of these 120 facts was fed into the AI-powered fact-checking solutions: Originality.ai, GPT-3.5-turbo, GPT-4, Llama-2-13b, CodeLlama-34b, and Llama-2-70b.

The main goal? 

To measure the accuracy of each tool in distinguishing fact from fiction.
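To make the testing blueprint concrete, here is a minimal sketch of how a single fact could be submitted to one of the chat-based models using the OpenAI Python client. The prompt wording and the check_fact helper are our illustrative assumptions, not the exact prompt used in the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are a fact checker. Reply with exactly one word - true, false, "
    "or unknown - for the following claim:\n\n{claim}"
)

def check_fact(claim: str, model: str = "gpt-4") -> str:
    """Ask a chat model to classify a single claim."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(claim=claim)}],
        temperature=0,  # deterministic output is preferable for benchmarking
    )
    return response.choices[0].message.content.strip().lower()

print(check_fact("Argentina won the 2022 FIFA World Cup."))
```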

Gathering the Responses

For each fact checked, we logged the response from the tool. The responses fell into one of four categories:

  • True: The tool marked the fact as true.
  • False: The tool marked the fact as false.
  • Unknown: The tool was unable to verify the fact as true or false.
  • Error: The tool did not provide a response.

In the scenario involving Originality.ai, when the provided explanation was at odds with the isTrueClaim value, the explanation was regarded as the accurate source of information.
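A minimal sketch of how raw model replies could be normalized into these categories is shown below. The isTrueClaim field name comes from the study’s description of Originality.ai’s output; the surrounding parsing logic is our assumption.

```python
def categorize(raw: str | None) -> str:
    """Map a raw model reply onto True / False / Unknown / Error."""
    if raw is None:
        return "Error"    # the tool did not provide a response
    text = raw.strip().lower()
    if text.startswith("true"):
        return "True"
    if text.startswith("false"):
        return "False"
    return "Unknown"      # anything else counts as unverified

def categorize_originality(is_true_claim: bool, explanation_verdict: str | None) -> str:
    """Apply the study's rule for Originality.ai responses."""
    flag = "True" if is_true_claim else "False"
    # When the (manually read) explanation verdict disagrees with the
    # isTrueClaim flag, the explanation is regarded as authoritative.
    if explanation_verdict is not None and explanation_verdict != flag:
        return explanation_verdict
    return flag
```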

Analysis

After collecting the data, the primary metrics under the microscope were:

  • Accuracy: The ratio of facts correctly identified as true or false.
  • False positives: Occasions when false facts were incorrectly identified as true.
  • False negatives: Situations where true facts were incorrectly identified as false.

Results & diagrams:

Model            Accuracy (%)   Unknowns   Errors
Originality.ai   72.3           0          2
GPT-4            64.9           43         0
GPT-3.5          58.6           4          0
CodeLlama-34b    58.6           33         0
Llama-2-70b      55.2           0          4
Llama-2-13b      55.0           11         0

Accuracy = (Number of correct predictions/Total number of predictions) x 100

Unknown = The model returned the result ‘unknown’

Error = The model returned an error
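Putting these definitions together, the metrics reported below can be computed as in the following sketch (our own illustrative code, following the accuracy formula above and dividing by the total number of predictions).

```python
def evaluate(predictions: list[str], labels: list[str]) -> dict[str, float]:
    """Compute accuracy, false positives, false negatives, unknowns, and errors.

    predictions: normalized outputs ("True", "False", "Unknown", "Error")
    labels:      ground truth ("True" or "False")
    """
    total = len(labels)
    correct = sum(p == l for p, l in zip(predictions, labels))
    false_pos = sum(p == "True" and l == "False" for p, l in zip(predictions, labels))
    false_neg = sum(p == "False" and l == "True" for p, l in zip(predictions, labels))
    return {
        "accuracy_pct": 100 * correct / total,  # matches the formula above
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "unknowns": predictions.count("Unknown"),
        "errors": predictions.count("Error"),
    }
```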

Analysis of results:

In an analysis of the 6 AI models on a dataset of 120 facts, Originality.ai achieved the highest accuracy at 72.3%. GPT-4 followed with a respectable 64.9%. 

GPT-4 had the highest unknown rate at 35.8% (43 of 120 facts), followed by CodeLlama-34b at 27.5%, making both unreliable for the purposes of fact checking. Llama-2-70b and Originality.ai had the lowest unknown rate at 0%.

When it comes to error rates, Llama-2-70b had the highest at 3.3%, while Originality.ai had a modest 1.7%. Notably, the rest of the models did not return any errors.

Lower unknown rates are preferable for reliable fact checking, and Originality.ai and Llama-2-70b stand out in this regard with 0% unknowns. High error or unknown rates, such as those of Llama-2-70b and GPT-4, could pose a challenge in the real world, as those tools cannot be relied upon to return a usable verdict.

Use Originality.ai’s Fact Checking Aid to Help Fact Check AI or Human Text

This testing, however, is not where Originality.ai shines most. Our fact-checking tool was built with content editors in mind, which is why we include sources and an explanation in our graphical interface.

We used GPT-3.5 to generate a short article about a very recent news event; we then passed that information to Originality.ai’s tool.

Originality.ai’s fact-checking feature in action

Conclusion:

The Originality.ai model performed well, achieving a 72.3% accuracy rate. Impressively, it did not produce any ‘unknown’ outcomes, suggesting a level of reliability in its ability to always produce a result. Because all of the facts in this dataset are recent, the results indicate that the Originality.ai model is particularly strong at handling recent general-knowledge questions. Our fact-checking tool was built with content editors in mind, which is reflected in the inclusion of sources and explanations in our graphical interface. This thoughtful feature empowers content editors, enriching the fact-checking process and underscoring the unique value proposition of Originality.ai.

Originality.ai Fact Checking Aid Beta Version Released

Originality.ai is deeply committed to enhancing this fact-checking feature to deliver even more accurate and reliable results. Recognizing the critical role that factual accuracy plays in today’s information-rich landscape, the team is continuously refining its algorithms. Our initial version already performs commendably in the “Recent General Knowledge” category, but the pursuit of excellence is an ongoing journey, and Originality.ai aims to set new benchmarks in AI-assisted fact checking, ensuring users can rely on the platform for precise and trustworthy information.

Jonathan Gillham

Founder / CEO of Originality.AI. I have been involved in the SEO and content marketing world for over a decade. My career started with a portfolio of content sites; recently I sold two content marketing agencies, and I am the co-founder of MotionInvest.com, the leading place to buy and sell content websites. Through these experiences, I understand what web publishers need when it comes to verifying that content is original. I am not for or against AI content; I think it has a place in everyone’s content strategy. However, I believe you as the publisher should be the one making the decision on when to use AI content. Our originality checking tool has been built with serious web publishers in mind!
