The generative large language model (LLM) space is constantly evolving. Renowned companies like OpenAI, Google, Meta, Nvidia, and Microsoft are all competing to build the best foundation LLMs.
At the same time, new startups are emerging, creating new models or fine-tuning existing ones. Naturally, everyone wants to know how these players stack up: who's leading, who's innovating, and who's growing? However, establishing an industry benchmark for LLMs can be challenging.
We used the Elo rating system from the Large Model Systems Organization (LMSYS) to rank 35 organizations by the performance of their top model in the Chatbot Arena.
We also collected data on:
- Total models and total votes in the Chatbot Arena
- Total funding raised and estimated valuation
- Employee count and monthly website traffic
Then, we added the organization's website as a comparison and performance reference.
The data is presented in an Airtable, displaying organization rankings and top metrics. Click on each organization to explore the metrics in detail.
Note: This table will be updated regularly as the Chatbot Arena model rankings and the organizations' standings change.
Large Model Systems Organization (LMSYS Org) is an open research organization founded by students and faculty from UC Berkeley in collaboration with UCSD and CMU. They adopted a pairwise comparison ranking system known as the Elo rating system, also officially used by the US Chess Federation to rank chess players.
The LMSYS Chatbot Arena Leaderboard rates and ranks LLMs.
The system is gaining traction in the industry by providing a transparent, competitive framework for model evaluation.
Note: For the latest information on the ranking system that LMSYS uses, visit their blog post, ‘Chatbot Arena: New models & Elo system update.’
Here's a quick overview of the features included in our table.
Rankings are determined by the Top Model Elo score. The organization with the highest Top Model Elo is ranked first, and so on. This provides a clear view of which organizations have leading models based on the Elo rating system.
This score represents the Elo rating of the highest-rated model for each organization. Elo ratings are a method for calculating the relative skill levels of players (or, in this case, models) in head-to-head competition. The higher the Elo score, the stronger the model is considered to be.
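To make the Elo mechanics concrete, here is a minimal sketch of a standard Elo update in Python. This is an illustration of the general rating formula, not LMSYS's exact implementation; the K-factor of 32 is a common textbook default, not a parameter taken from the Chatbot Arena.

```python
def expected_score(r_a, r_b):
    """Probability that player A beats player B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated ratings after one pairwise comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k (the K-factor) controls how quickly ratings move; 32 is an
    illustrative default, not LMSYS's actual setting.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b
```

For example, if two models both rated 1000 face off and the first wins the pairwise vote, its rating rises to 1016 while the loser's falls to 984; an upset win against a much higher-rated model moves the ratings further, since the expected score was lower.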
Sourced from SEMrush via Crunchbase, this metric shows how many visits the organization's website receives each month, offering insight into the organization's online presence and reach.
This is the total number of models an organization has in the Chatbot Arena. It gives an idea of the organization’s presence and diversity in the model landscape.
This column aggregates the total number of pairwise comparison votes received by all of an organization's models in the Chatbot Arena. Each vote represents a user's preference between two models and contributes to the Elo rating.
This column shows the total amount of funding an organization has raised, sourced from Crunchbase. It reflects the financial backing and investor confidence in the organization's projects and potential.
This column provides the estimated market value of the organization. For public companies, this data is straightforward; for private companies, it is often estimated from funding rounds and press releases.
This data comes from Crunchbase and is typically presented as a range, such as ‘251-500.’ It's an approximation: the data is aggregated from various sources and cross-referenced to produce a close estimate, rather than an exact figure like those public companies report in their quarterly filings.
This feature indicates the last time the Airtable data was updated.
In brief, Chatbot Arena by LMSYS Org is an open research platform from UC Berkeley in collaboration with UCSD and CMU. It uses the Elo rating system to rank LLMs through over 1,000,000 human pairwise comparisons. It’s a transparent and competitive framework for model evaluation, making it an emerging benchmark in the industry.
Link: Chatbot Arena
Crunchbase provides comprehensive data on private companies, including financial information, the number of employees, and web traffic. This data helps contextualize the business environment and operational scale of each organization in our ranking.
Link: Crunchbase
To compile the rankings and data for the top organizations developing LLMs, we followed these steps.