Keyword density helper – This tool comes with a built-in keyword density helper, in some ways similar to SurferSEO or MarketMuse; the difference is that ours is free! This feature shows the frequency of single- or two-word keywords in a document, so you can easily compare an article you have written against a competitor's to see the major differences in keyword density. This is especially useful for SEOs who are looking to optimize their blog content for search engines and improve the blog's visibility.
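As a rough illustration of what a keyword density check does (this is a simplified sketch, not the tool's actual implementation), the idea is to tokenize the text and tally one- and two-word phrases:

```python
import re
from collections import Counter

def keyword_density(text: str, top_n: int = 5):
    """Tally single-word and two-word keyword frequencies in a text.

    A minimal sketch of a keyword density check: tokenize, then
    count 1-grams and 2-grams with collections.Counter.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    # Density = occurrences / total words, expressed as a percentage.
    return {
        "unigrams": [(w, c, round(100 * c / total, 2))
                     for w, c in unigrams.most_common(top_n)],
        "bigrams": [(" ".join(b), c) for b, c in bigrams.most_common(top_n)],
    }
```

Running this on your article and a competitor's article, then comparing the two result dictionaries, is essentially the comparison the helper surfaces for you.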
File compare – Text comparison between files is a breeze with our tool. Simply select the files you would like to compare, hit "Upload", and our tool will automatically insert the content into the text area; then hit "Compare" and let our tool show you where the differences in the text are. Files you upload can also be run through the keyword density helper.
Comparing text between URLs is effortless with our tool. Simply paste the URL you would like to get the content from (in our example we use a fantastic blog post by Sherice Jacob, found here), hit "Submit URL", and our tool will automatically retrieve the contents of the page and paste it into the text area; then click "Compare" and let our tool highlight the differences between the pages. This feature is especially useful for checking keyword density between pages!
You can also easily compare text by copying and pasting it into each field, as demonstrated below.
Ease of use
Our text compare tool is created with the user in mind and is designed to be accessible to everyone. It allows users to upload files or enter a URL to extract text, which, along with the lightweight design, ensures a seamless experience. The interface is simple and straightforward, making it easy for users to compare text and spot the diff.
Multiple text file format support
Our tool supports a variety of text and Microsoft Word formats, including .pdf, .docx, .odt, .doc, and .txt, giving users the ability to compare text from different sources with ease. This makes it a great solution for students, bloggers, and publishers who need file comparison across formats.
Protects intellectual property
Our text comparison tool helps you protect your intellectual property and helps prevent plagiarism. This tool provides an accurate comparison of texts, making it easy to ensure that your work is original and not copied from other sources. Our tool is a valuable resource for anyone looking to maintain the originality of their content.
User Data Privacy
Our text compare tool is secure and protects user data privacy. No data is ever saved by the tool; the user's text is only scanned and pasted into the tool's text area. This ensures that users can use our tool with confidence, knowing their data is safe and secure.
Compatibility
Our text comparison tool is designed to work seamlessly across devices of all sizes, ensuring maximum compatibility no matter your screen size. Whether you are using a large desktop monitor, a small laptop, a tablet, or a smartphone, the tool adjusts to your screen. This means users can compare texts and spot the diff anywhere, without the need for specialized hardware or software. This level of accessibility makes it an ideal solution for students or bloggers who value the originality of their work and need to compare text online anywhere, at any time.
Researchers at UPenn, University College London, King’s College London, and Carnegie Mellon University recently completed the most complete study yet to evaluate AI detector efficacy - RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors (study).
Originality.ai’s AI detector was the most accurate detector in the study!
The study looked at 12 AI detectors, 11 different text-generating LLMs (like ChatGPT), and 11 types of adversarial attacks (like paraphrasing), resulting in a dataset of over 6 million text records. This is the most robust evaluation of AI text detectors to date, and our own AI detection accuracy study is likely the second most in-depth study.
Below is a summary of the study, including the key findings related to Originality.ai’s industry-leading performance and a couple of our weaknesses identified in the study that we are excited to address.
Study Details:
Study: https://arxiv.org/abs/2405.07940
It is important to note that these accuracy scores reflect a 5% false positive threshold; more on what that means below.
Originality.ai's model 2.0 Standard was used for these results; however, we would expect model 3.0 Turbo (released one month after the authors used Originality.ai) to outperform 2.0.
Note: In October 2024, we released an updated Turbo 3.0.1 model, visit our AI detection accuracy post for details on the latest models.
11 AI models were used to generate a non-adversarial dataset, which was evaluated against all AI detectors. These included the most popular AI models, such as ChatGPT, Llama, Mistral, and GPT-4.
Originality.ai was the most accurate across the test, achieving 98.2% accuracy on ChatGPT content and an average of 85% across all 11 AI models.
Of the 11 adversarial techniques used (adversarial meaning attempts to make AI content undetectable), Originality.ai was the most accurate of the 12 detectors in 9 of the 11 tests, second in 1, and performed poorly on 2 rarely used bypassing techniques.
“Originality.ai Rank” shows how we performed vs the 12 AI detectors for each of the Adversarial Bypassing Techniques.
Clearly, Originality.ai's AI detection needs to improve on the Homoglyph and Zero-Width bypassing strategies, but it was the leading AI detector on the other, more common bypassing techniques.
The authors looked at how AI detectors performed on different types of content (news articles vs poems etc).
Originality.ai was the most accurate across 5 of the 8 types of content “domains” and the 2nd most accurate in the other 3 domains.
It is important to note that the study did not include the domain/type of content we at Originality.ai focus on most, i.e., web content/marketing content.
The most common method used in trying to make AI content undetectable is “Paraphrase Plagiarism”. The strategy uses a paraphrasing AI like Quillbot to change words in an attempt to bypass AI detection tools. This is the strategy used by common tools such as Undetectable.AI.
Originality.ai performed uniquely well on this common adversarial bypassing strategy achieving 96.7% accuracy while the average accuracy from other detectors was 59%.
In the context of text detection, a false positive occurs when a detector incorrectly labels a piece of human-written text as machine-generated. False positives can occur for several reasons listed here. The False Positive Rate (FPR) is the percentage of these incorrect labels out of all the human-written texts evaluated. For instance, if a detector examines 100 human-written texts and incorrectly labels 5 of them as machine-generated, the FPR would be 5%.
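The worked example in the paragraph above can be sketched in a few lines of Python (an illustrative sketch; the label names `"human"` and `"ai"` are our own convention, not from the study):

```python
def false_positive_rate(labels, predictions):
    """FPR: the fraction of human-written texts that the detector
    incorrectly flagged as machine-generated."""
    # Keep only the detector's predictions for the human-written texts.
    human_preds = [p for label, p in zip(labels, predictions) if label == "human"]
    if not human_preds:
        return 0.0
    false_positives = sum(1 for p in human_preds if p == "ai")
    return false_positives / len(human_preds)

# 100 human-written texts, 5 of which the detector mislabels as AI:
labels = ["human"] * 100
predictions = ["ai"] * 5 + ["human"] * 95
# false_positive_rate(labels, predictions) -> 0.05, i.e. a 5% FPR
```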
A 5% FPR threshold is a benchmark used to ensure a balanced evaluation of text detectors. It was selected by the authors of this study.
Here’s why it matters for this study:
This is by far the best study to date addressing whether AI detectors work and what their limitations are.
I would have several comments for the authors if the study is continued, or for future studies:
Cover More Relevant Domains:
Most users of AI detectors fall into two content categories: web content and academia.
The domains that were selected for this study included essentially no web content (Abstracts, Books, News, Poetry, Recipes, Reddit, Reviews and Wiki). Our detector is specifically trained on web content and it would have been great to see that domain included. The societal importance of being able to detect AI-generated poems is lower than the importance of being able to detect AI-generated news.
False Positive Rate Threshold Makes Sense But Incomplete for Users:
This study standardizes on an FPR threshold of 5%. However, there are use cases where a very low FPR is required, and use cases where a higher false positive rate is acceptable if it means very little AI can get past the detector. Ultimately, users of AI detectors will want to decide on the right trade-off between correctly identifying AI as AI and correctly identifying human as human. We ask the authors to consider other methods of displaying detector efficacy beyond accuracy at an FPR threshold, including a confusion matrix and the corresponding efficacy scores (F1, TPR, FPR, etc.).
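For readers unfamiliar with these metrics, here is a minimal sketch of how a confusion matrix and the scores mentioned (TPR, FPR, F1) are derived, treating machine-generated text as the positive class (an illustration only, not the study's evaluation code):

```python
def detector_metrics(labels, predictions, positive="ai"):
    """Build a confusion matrix for a detector and derive
    TPR (recall on AI text), FPR (human text wrongly flagged), and F1."""
    tp = fp = tn = fn = 0
    for label, pred in zip(labels, predictions):
        if label == positive and pred == positive:
            tp += 1          # AI text correctly flagged
        elif label != positive and pred == positive:
            fp += 1          # human text wrongly flagged
        elif label != positive and pred != positive:
            tn += 1          # human text correctly passed
        else:
            fn += 1          # AI text missed
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "tpr": tpr, "fpr": fpr, "f1": f1}
```

Reporting the full matrix rather than a single accuracy number lets each user judge the trade-off that matters for their own use case.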
AI Edited Dataset:
One dataset that is missing, and that will become more prevalent, is AI-edited content (whether lightly or heavily AI-edited). With AI editing tools increasing the amount of AI they inject into an edited article, the ability to identify lightly edited AI content is important. For a small analysis, see the impact of Grammarly editing on AI detection.
Our AI detector performed poorly on 2 bypassing strategies: Homoglyph and Zero-Width Space attacks. These two strategies work by creating text that seems readable to humans but is less readable to machines. We strongly recommend against using these strategies to try to trick Google's AI detection.
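To make these two attacks concrete, here is a small illustration: a homoglyph attack swaps letters for visually identical characters from another script, while a zero-width attack injects invisible characters between letters. The strings look the same on screen but are different byte sequences:

```python
latin = "original"
# Homoglyph attack: Cyrillic 'о' (U+043E) and 'а' (U+0430) replace Latin o/a.
homoglyph = "\u043erigin\u0430l"
# Zero-width attack: U+200B ZERO WIDTH SPACE is invisible when rendered.
zero_width = "o\u200brigi\u200bnal"

# The strings render almost identically but compare as different:
print(latin == homoglyph)   # False
print(len(latin))           # 8
print(len(zero_width))      # 10 -- two extra invisible characters
```

This is why such text is "less readable to machines": any exact-match or tokenizer-based pipeline sees entirely different characters than a human reader does.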
Originality.ai focuses on providing an accurate AI detector for web publishers, so focusing on Homoglyph bypassing and Zero-Width Space had not been a priority for us (these aren’t machine-readable and therefore not a good strategy for web publishing). However, in order to ensure we are the most accurate across all bypassing strategies, we look forward to having our AI research team close these 2 smaller weaknesses in our detector!
We will shortly have 3 models for users to choose from depending on their desired tradeoff between false positives (calling human text AI) and false negatives (calling AI content human).
This is well represented in Figure 4 from the study, overlaid with our target ranges for our 3 AI detection models.
Our model showed leading performance for a given false positive rate until it approached 1%. Our Lite model will have industry-leading low false positive rates.
It is great to see a high-quality efficacy study after some flawed early studies provided misinformation about AI detectors and their efficacy. Thank you, researchers - this study will help people!
It is most exciting to see our approach with our Blue Team (building detectors) and Red Team (trying to beat our detector) is working. Our entire AI research team should be very proud of the industry leading results they are producing.
If anyone is interested in running a study that includes Originality.ai, we are happy to make credits available and have our own open source tool available for evaluating detector efficacy for free.
No, that's one of the benefits: only fill out the areas you think will be relevant to the prompts you require.
When making the tool, we had to make each prompt as general as possible so it could cover every kind of input. Not to worry, though: ChatGPT is smart and will still understand the prompt.
Originality.ai did a fantastic job on all three prompts, precisely detecting them as AI-written. Additionally, after I checked with actual human-written textual content, it did determine it as 100% human-generated, which is important.
Vahan Petrosyan
searchenginejournal.com
I use this tool most frequently to check for AI content personally. My most frequent use-case is checking content submitted by freelance writers we work with for AI and plagiarism.
Tom Demers
searchengineland.com
After extensive research and testing, we determined Originality.ai to be the most accurate technology.
Rock Content Team
rockcontent.com
Jon Gillham, Founder of Originality.ai came up with a tool to detect whether the content is written by humans or AI tools. It’s built on such technology that can specifically detect content by ChatGPT-3 — by giving you a spam score of 0-100, with an accuracy of 94%.
Felix Rose-Collins
ranktracker.com
ChatGPT lacks empathy and originality. It’s also recognized as AI-generated content most of the time by plagiarism and AI detectors like Originality.ai
Ashley Stahl
forbes.com
Originality.ai Do give them a shot!
Sri Krishna
venturebeat.com
For web publishers, Originality.ai will enable you to scan your content seamlessly, see who has checked it previously, and detect if an AI-powered tool was implored.
Industry Trends
analyticsinsight.net
Tools for conducting a plagiarism check between two documents online are important because they help ensure the originality and authenticity of written work. Plagiarism undermines the value of professional and educational institutions, as well as the integrity of the authors who write articles. By checking for plagiarism, you can ensure the work you produce is original or properly attributed to the original author. This helps prevent the distribution of copied and misrepresented information.
Text comparison is the process of taking two or more pieces of text and comparing them to see if there are any similarities, differences and/or plagiarism. The objective of a text comparison is to see if one of the texts has been copied or paraphrased from another text. This text compare tool for plagiarism check between two documents has been built to help you streamline that process by finding the discrepancies with ease.
Text comparison tools work by analyzing and comparing the contents of two or more text documents to find similarities and differences between them. This is typically done by breaking the texts down into smaller units such as sentences or phrases, and then calculating a similarity score based on the number of identical or nearly identical units. The comparison may be based on the exact wording of the text, or it may take into account synonyms and other variations in language. The results of the comparison are usually presented in the form of a report or visual representation, highlighting the similarities and differences between the texts.
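The process described above can be sketched with Python's standard-library `difflib` module (a simplified illustration, not our tool's actual implementation): the texts are split into lines, the differing lines are reported, and an overall similarity score is computed.

```python
import difflib

def compare_texts(a: str, b: str):
    """Report line-level differences between two texts plus an
    overall similarity ratio in [0, 1], in the spirit of the
    report a text compare tool produces."""
    diff = list(difflib.unified_diff(
        a.splitlines(), b.splitlines(), lineterm=""))
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    return diff, ratio

diff, ratio = compare_texts("the quick brown fox", "the quick red fox")
# Lines prefixed with '-' come only from the first text,
# lines prefixed with '+' only from the second.
```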
String comparison is a fundamental operation in text comparison tools that involves comparing two sequences of characters to determine if they are identical or not. This comparison can be done at the character level or at a higher level, such as the word or sentence level.
The most basic form of string comparison is the equality test, where the two strings are compared character by character and a Boolean result indicating whether they are equal or not is returned. More sophisticated string comparison algorithms use heuristics and statistical models to determine the similarity between two strings, even if they are not exactly the same. These algorithms often use techniques such as edit distance, which measures the minimum number of operations (such as insertions, deletions, and substitutions) required to transform one string into another.
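Edit distance as described above can be implemented with a short dynamic-programming routine (a textbook Levenshtein sketch, not a claim about any particular tool's internals):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions needed to turn a into b."""
    # prev holds the DP row for the previous character of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[len(b)]
```

For example, turning "kitten" into "sitting" requires three operations (two substitutions and one insertion), so the distance is 3.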
Another common technique for string comparison is n-gram analysis, where the strings are divided into overlapping sequences of characters (n-grams) and the frequency of each n-gram is compared between the two strings. This allows for a more nuanced comparison that takes into account partial similarities, rather than just exact matches.
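A minimal set-based variant of n-gram comparison (Jaccard similarity over character trigrams; a frequency-weighted version would compare n-gram counts instead of just presence) looks like this:

```python
def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity over overlapping character n-grams.
    Rewards partial overlap rather than only exact matches."""
    def grams(s):
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    if not ga and not gb:
        return 1.0  # two strings too short to form n-grams
    return len(ga & gb) / len(ga | gb)
```

Because "original" and "originally" share six of their eight distinct trigrams, they score 0.75 here even though an exact-match test would simply report them as different.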
String comparison is a crucial component of text comparison tools, as it forms the basis for determining the similarities and differences between texts. The results of the string comparison can then be used to generate a report or visual representation of the similarities and differences between the texts.
Syntax highlighting is a feature of text editors and integrated development environments (IDEs) that helps to visually distinguish different elements of a code or markup language. It does this by coloring different elements of the code, such as keywords, variables, functions, and operators, based on a predefined set of rules.
The purpose of syntax highlighting is to make the code easier to read and understand, by drawing attention to the different elements and their structure. For example, keywords may be colored in a different hue to emphasize their importance, while comments or strings may be colored differently to distinguish them from the code itself. This helps to make the code more readable, reducing the cognitive load of the reader and making it easier to identify potential syntax errors.
With our tool it's easy: just enter or upload some text, click the "Compare text" button, and the tool will automatically display the diff between the two texts.
Using text comparison tools is much easier, more efficient, and more reliable than proofreading a piece of text by hand. Eliminate the risk of human error by using a tool to detect and display the text difference within seconds.
We have support for the file extensions .pdf, .docx, .odt, .doc and .txt. You can also enter your text or copy and paste text to compare.
No data is ever saved by the tool. When you hit "Upload", we simply scan the text and paste it into our text area, so with our text compare tool, no data ever enters our servers.
Copyright © 2023, Originality.ai
All rights reserved.
The table below shows a heat map of features on other sites compared to ours; as you can see, we have greens almost across the board!
Have you seen a thought leadership LinkedIn post and wondered if it was AI-generated or human-written? In this study, we looked at the impact of ChatGPT and generative AI tools on the volume of AI content that is being published on LinkedIn. These are our findings.