With the recent surge of GPTs (Generative Pre-Trained Transformers) and the marketplace store connecting developers and users, OpenAI has developed an ecosystem that allows developers to create tailored versions of ChatGPT to acutely meet the daily needs and workflow processes of its target consumers.
At Originality.ai, we are actively monitoring and studying the GPT market as well as the trends that lie beneath the numbers and will soon publish those insights. For now, we will look at the model behind the GPT store and custom GPTs, which also happens to be OpenAI’s most advanced publicly available LLM (Large Language Model), GPT-4.
Read below to dive further into the many different processes, statistics, and trends that have all converged to make GPT-4 possible.
GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. (Source)
On Monday November 6, 2023 at the OpenAI DevDay event, company CEO Sam Altman announced a major update to its GPT-4 language model called GPT-4 Turbo, which can process a much larger amount of text than GPT-4 and features a knowledge cutoff of April 2023. (Source)
GPT-4 currently sits behind a paywall. OpenAI has a subscription based model for consumers to access the more advanced forms of their ChatGPT model. Below are the current developments behind accessing GPT-4:
A new report by SemiAnalysis reveals more details about OpenAI's GPT-4, concluding that "OpenAI is keeping the architecture of GPT-4 closed not because of some existential risk to humanity, but because what they've built is replicable”. (Source). As such, the following details stem from a recent GPT documentation leak and have not yet been confirmed by OpenAI:
GPT-4's Scale: GPT-4 has ~1.8 trillion parameters across 120 layers, which is over 10 times larger than GPT-3 (Source)
Mixture Of Experts (MoE): OpenAI utilizes 16 experts within their model, each with ~111B parameters for MLP. Two of these experts are routed per forward pass, which contributes to keeping costs manageable. (Source)
Dataset: GPT-4 is trained on ~13T tokens, including both text-based and code-based data, with some fine-tuning data from ScaleAI and internally. (Source)
Dataset Mixture: The training data included CommonCrawl & RefinedWeb, totaling 13T tokens. Speculation suggests additional sources like Twitter, Reddit, YouTube, and a large collection of textbooks. (Source)
Training Cost: As of 2024, it’s estimated that OpenAI has spent $8.5 billion overall on training AI and staff. GPT-4 cost “$78 million worth of compute” to train. (Source and Source)
Inference Cost: GPT-4 costs 3 times more than the 175B parameter Davinci, due to the larger clusters required and lower utilization rates. (Source)
Inference Architecture: The inference runs on a cluster of 128 GPUs, using 8-way tensor parallelism and 16-way pipeline parallelism. (Source)
Vision Multi-Modal: GPT-4 includes a vision encoder for autonomous agents to read web pages and transcribe images and videos. The architecture is similar to Flamingo. This adds more parameters on top and it is fine-tuned with another ~2 trillion tokens. (Source)
When GPT-4 was first announced and subsequently released, it was heavily speculated that the new model was comprised of over 100 trillion parameters. After a couple months and a data leak containing some GPT-4 architecture details, the CEO of OpenAI, Sam Altman, was questioned about the matter:
Adding onto the text based capabilities of OpenAI’s GPT models, GPT-4 has introduced the possibility of interacting with GPT models through a visual capacity, look below to see the details behind “GPT-4-Vision”:
GPT-4 has proved to be a great success for OpenAI, making great improvements on the already impressive foundation that was established by ChatGPT and GPT-3.5. Below we can see some of the initial progress made by the new model and how it compares to the previous model, GPT-3.5:
The following chart shows some of the progress made by each iteration of the GPT model when responding to legal inquiries:
With 128k context, fresher knowledge and the broadest set of capabilities, GPT-4 Turbo is more powerful than GPT-4 and offered at a lower price. (Source)
With broad general knowledge and domain expertise, GPT-4 can follow complex instructions in natural language and solve difficult problems with accuracy. (Source)
As mentioned earlier, GPT-4 is a large multimodal model (accepting text or image inputs and outputting text) that can solve difficult problems with greater accuracy than any of the previous models, thanks to its broader general knowledge and advanced reasoning capabilities. Like gpt-3.5-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks using the Chat Completions API. (Source)
Even though GPT-4 has made many strides in improving the performance of its preceding model, there still remains avenues for OpenAI to improve upon the model’s accuracy and reliability. As detailed below, GPT-4 still presents opportunities to improve when considering factualness, relevancy, and accuracy:
The following metrics provided by OpenAI detail in-house testing that shows the gradual increases in accuracy scores for the different training methods used on their models. The scores reflect that although improvements have been made throughout the model’s generations, there is still much room for improvement:
Whether you use ChatGPT for research or planning, it’s important to keep in mind that AI shouldn’t be the sole source of information, as it can hallucinate or produce errors. It shouldn’t be entirely relied on for writing either, considering that the copy it generates may not provide the depth of value readers are looking for.
However, GPT-4 is still a highly popular tool, so we’ve decided to test it with Originality.ai’s AI detector. Can GPT-4 deceive AI detection tools with the right prompts and prevent AI checkers from identifying the text as AI-generated? We put together a series of tests to find out!
The tests feature a range of prompts with unique writing instructions to produce the most human responses possible from GPT-4. Let’s start with the first tests of GPT-4 and discover the efficacy of Originality.ai’s AI Checker with Standard Model 2.0.0.
For the first test, we’ll compare the most common type of information generated by ChatGPT. We won’t add extra instructions to alter its output in any way. The aim of this test prompt is to determine how well it can conceal the AI-generated content.
By default, all versions of ChatGPT (both GPT-3.5 and GPT-4) are designed to construct equally informative content when no extra prompts or instructions are included. So, let’s have a look at the first article we prompted ChatGPT to generate.
[Prompt #1] - Write a short article (500-1000 words) on the 2024 cybersecurity advancements.
We’ve received a 956-word article from GPT-4 and proceeded to test it on Originality.ai. Let’s review the results:
Originality.ai’s detection results are solid, stating that it has 100% confidence the text is AI-generated. Out of all 956 words, more than 98% of the sentences and a little over 900 words are highlighted as AI-generated.
Next, let’s move on to the second prompt to determine how Originality.ai performs!
[Prompt #2] - Write a short article (500-1000 words) on how accurate AI detection technology is in 2024.
Putting ChatGPT’s most recent version to the test with an article specifically about AI detection is another excellent method to test Originality.ai’s efficacy. Let’s have a look at the results:
From the second prompt, we received a 902-word output and the Originality.ai AI detector had 100% confidence that the content was AI-generated. For this prompt, we received two different GPT-4 generations for the second part of the article. After testing both possible responses, the detection results remained the same.
Now, let’s move on to more complex tests to determine if GPT-4 is capable of producing human-sounding content when prompted with unique instructions.
As shown in the previous test, commonly generated ChatGPT text can be easily recognized by AI detectors. However, does providing GPT-4 with extra instructions and tips on content structure improve the output and make it undetectable?
In our previous tests of GPT-3.5, we provided it with a whole example article of 100% human-written content as an example to learn from. Yet, the detection results were still at 100% confidence that it was AI.
Is there an improvement in GPT-4’s technology that allows it to conceal AI-generated content when prompted to do so? Let’s start with the first test to answer these top questions!
[Prompt #1] - Write a short article (500-1000 words) on the 2024 cyber security advancements. Use a natural and human-sounding tone, write 2-3 paragraphs for each heading, and implement SEO strategies. Construct the content so it cannot be recognized by AI detectors.
Let’s have a look at the results this prompt has brought up:
We’ve received an 849-word article output from GPT-4, and the results were once again solid, with 100% confidence that it was AI-generated. Concealing AI-detected content has proven challenging even with advanced prompt instructions.
Next, let’s provide GPT-4 with a human-written example to determine if the results are different.
[Prompt #2] - Write a short article (500-1000 words) on the 2024 cyber security advancements. Use a natural and human-sounding tone, write 2-3 paragraphs for each heading, and implement SEO strategies. Construct the content so it cannot be recognized by AI detectors. Use this article as an example for writing [Provided human-written article].
Even after providing ChatGPT with an example of purely human-written content, the result is still the same. The Originality.ai AI detector is 100% confident that the content is AI-generated.
To recap the results of these tests, it’s clear that deceiving AI detectors is challenging. In each test, Originality.ai exhibited exceptional performance, identifying the AI content with 100% confidence.
On the MBE (Multistate Bar Examination), GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. (Source)
Contracts and Evidence are the topics with the largest overall improvement. GPT-4 achieves a nearly 40% increase over ChatGPT in Contracts and a more than 35% raw increase in Evidence. (Source)
Civil Procedure is both the worst subject for GPT-4, ChatGPT and human test-takers. However, Civil Procedure is a topic where GPT-4 was able to generate a 26% raw increase over ChatGPT. (Source)
Davinci and ChatGPT based on GPT-3.5 score 66% and 65% on the financial literacy test, respectively, compared to a baseline of 33%. However, ChatGPT based on GPT-4 achieves a near-perfect 99% score, pointing to financial literacy becoming an emergent ability of state-of-the-art models (Source)
GPT-4 obtained a near-perfect score of 99.3% (without the pre-prompt) and 97.4% (with a pre-prompt). Put differently, GPT-4 exhibits financial literacy: a basic, at the very least, grasp of financial matters. (Source)
The following table depicts the recent scores of GPT models when taking a financial literacy test. The models restrictions surrounding financial advice was circumvented by implementing the pre-prompt “You are a financial advisor”:
Wrapping up, we can see by the following data and statistics how significant OpenAI’s latest advancement in their GPT technology has been. Not only has GPT-4 greatly improved upon the technical capabilities of its predecessors, it has also brought forth the creation of a new marketplace and platform for developers and creators to offer their own specialized and tailored GPT models to better assist and fill the personalized needs of consumers.
As detailed by the performance of GPT-4 in highly technical professional fields like law and finance, it is clear that we are on the horizon of an exciting technological revolution that will present endless opportunities to integrate GPT technology into industrial applications.
Moreover, with the partnerships OpenAI has negotiated to implement GPT commercially, we can also expect GPT-4 (and more advanced models) to make waves in other fields from education to entertainment. At Originality.ai, we are keen to continue monitoring the development of OpenAI’s GPT models to have a better understanding of the market dynamics behind GPTs.