
LLMs.txt Tracking Study and Live Dashboard

They may not have the same name recognition as robots.txt just yet, but new AI web standards like llms.txt, llms-full.txt, and ai.txt are quietly emerging to help site owners define how AI interacts with their content.

Like it or not, AI bots, including GPTBot, CCBot, and ClaudeBot, are now actively visiting websites and collecting data to train large language models (LLMs), generate responses, and pull summaries for AI tools. 

Although robots.txt has been the standard for controlling how search engine crawlers interact with sites for decades, it was never meant to handle these AI bots. Therefore, some site owners have been adopting the new AI web standards to take back control of their web content.

In addition to tracking their adoption over time, this article will explore the new AI web standards, how to implement them, and their benefits and limitations.

Why robots.txt Doesn’t Reliably Control AI Crawlers

Robots.txt doesn’t reliably control AI crawlers for several reasons:

  • Built for search, not AI. Robots.txt was originally designed to manage search engine crawlers and was never meant to address issues like AI data collection.
  • Voluntary. As with search bots, there are no technical or legal requirements for AI bots to obey the robots.txt protocol.
  • Relies on agents to self-identify. Although robots.txt works by specifying rules for user agents, AI crawlers can easily ignore them, spoof their identity, or otherwise avoid detection.
  • Can’t distinguish between tasks. Robots.txt treats all crawlers the same and can’t recognize or specify rules around why a crawler is accessing web content. For example, it can’t distinguish between access for search indexing vs AI training.
  • Lacks granularity. The robots.txt standard takes a sort of “all or nothing” approach by only specifying simple allow/disallow rules, so it can’t tell AI crawlers to index a page but not train models based on its data. 
  • No support for licensing or contact information. There is no way for robots.txt to specify how content can be used, or even who to ask for permission. 

Overall, the problem isn’t just a lack of features. Robots.txt is based on an outdated idea of what a crawler is, and it simply can’t handle the new challenges of how today’s AI models access and interpret web content.
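To see the all-or-nothing problem in practice, here’s what a standard robots.txt rule for OpenAI’s GPTBot crawler looks like:

User-agent: GPTBot
Disallow: /

This blocks GPTBot from the entire site, but there’s no way to express something like “index this page, but don’t train on it,” and no place to state licensing terms or a contact for permissions.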

The Emerging AI Web Standards: llms.txt, llms-full.txt, and ai.txt

To address the limitations of robots.txt, new AI web standards are emerging to help give site owners more control over how AI models interact with their content: llms.txt, llms-full.txt, and ai.txt.

Llms.txt

What is llms.txt? Proposed by data scientist Jeremy Howard in 2024, llms.txt is a markdown file that allows site owners to help LLMs better understand their content. 

Unlike robots.txt, which only allows or blocks crawlers, llms.txt is all about guidance, giving AI models context and clarity about what’s at a given URL. Whether an LLM takes that guidance, though, is up for debate — using llms.txt is completely voluntary, so AI models can choose to ignore it, even if it’s for their benefit.

Placed at a site’s root, llms.txt can include both human- and machine-readable information like:

  • A brief description or summary of the site and its purpose
  • Explanations of and links to pages that best represent the site that LLMs can follow for more details
  • Notes about how AI models should prioritize and interpret content

Because LLMs have limited context windows (meaning they can only process so much information at a time), this kind of high-level guidance can help them focus on analyzing or summarizing the most relevant parts of a site. Without it, they’re left grabbing text more or less at random, which can completely misrepresent what a site is all about.

Although the adoption of new AI web standards is still fairly low and slow across the board, llms.txt has emerged as the clear frontrunner, suggesting that it may just become the go-to option for guiding how LLMs interact with websites in the future.

Llms-full.txt

Llms-full.txt takes things a step further than llms.txt by offering LLMs access to all relevant content in one place, not just summaries or an index of links. Basically, it makes it even easier for AI models to understand how all the different content pieces of a site come together.

See, llms-full.txt provides LLMs with a full, flattened map of a website. Unlike a traditional sitemap such as sitemap.xml, it includes the actual content, laying out every page and section in a single, readable format so AI models can access and understand everything at once. As long as they choose to use it, that is — as with llms.txt, LLM compliance with llms-full.txt is voluntary.

Also like llms.txt, llms-full.txt is placed at the root of a website, and its content is readable to both humans and machines. However, it includes slightly different information:

  • The full text of every important page and section on the site in a single markdown file
  • Headings and structure that provide context for the site’s organization
  • Guidance for AI models on how to interpret, relate to, or prioritize different parts of the content

Since it contains so much text and LLMs have limited context windows, llms-full.txt is best suited for smaller sites or sets of specific, self-contained information, such as a product manual or knowledge base for a single topic. As long as everything in the file can fit into an AI model’s context window, it should be good to go.
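For a quick way to estimate whether a given llms-full.txt file fits, here’s a minimal Python sketch. It relies on the rough rule of thumb of about four characters per token for English text and uses 128,000 tokens as an example window size; both are assumptions, and actual tokenization varies by model:

# Rough estimate of whether llms-full.txt fits a model's context window.
# Assumes ~4 characters per token (a common heuristic for English text)
# and an example window of 128,000 tokens; adjust for the target model.
CHARS_PER_TOKEN = 4
EXAMPLE_WINDOW = 128_000

with open("llms-full.txt", encoding="utf-8") as f:
    text = f.read()

approx_tokens = len(text) / CHARS_PER_TOKEN
print(f"Approximate tokens: {approx_tokens:,.0f}")
if approx_tokens < EXAMPLE_WINDOW:
    print("Likely fits within the example 128k window")
else:
    print("May exceed the window; consider trimming the file")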

Ai.txt

Ai.txt is a little different from both llms.txt and llms-full.txt. Instead of guiding AI models on how to interpret a site’s content, ai.txt provides AI crawlers with instructions about what they can and cannot do with it.

Whenever AI bots download content from a website, they’re supposed to read ai.txt files to learn whether site owners permit their work to be used for AI-specific purposes (like training or summarizing). However, the key term here is “supposed to”.

Like robots.txt, llms.txt, and llms-full.txt, compliance with ai.txt is up to the bots and the companies behind them. Spawning, a company actively promoting the standard, notes that AI bots using its API will respect ai.txt, but most do not. At least, not yet. 

Until adoption increases or another option gains momentum, though, ai.txt is still one of the most effective ways to communicate a site owner’s AI usage preferences.

Simply place the human- and machine-readable ai.txt at the root of a website, and include the following information to specify what AI can do with its content:

  • File-type restrictions indicating what content shouldn’t be used, like .jpg or .pdf
  • Terms or conditions for how AI models can use content
  • Contact information for questions surrounding licensing or usage permissions
  • Notes detailing the site’s expectations around AI training or reuse
  • Allow/Disallow directives specifying which AI crawlers can or cannot access your site (optional)

Now, don’t let the allow/disallow directives fool you — ai.txt isn’t just about controlling bot access like robots.txt. It builds on that idea for the AI era, giving site owners much more granular control over how AI systems can use, train on, and reuse content.

How to Implement llms.txt

One of the best things about the new AI web standards is that they’re relatively easy to set up: all you really need is a plain text editor, and no special tools are required.

Here’s how to create an llms.txt file and get started:

Step 1: Create and structure the llms.txt file

To create an llms.txt file, build it with the following elements (in order), using markdown-style formatting in plain text:

  • H1 title. The first line is a top-level heading (#) with the site or project name. Note that this H1 title is the only required element of an llms.txt file.

All of the following elements are optional:

  • Blockquote summary. The second line can be a blockquote (>), with a brief summary explaining what the site is about. This line is strongly recommended because it provides context, helping LLMs better understand the rest of the file.
  • Additional details without headers. Add any paragraphs or lists to give even more context or instructions.
  • Additional context with headers. Add sections with H2 headers (##) describing the site’s most important content categories. Under each header, list key links in markdown (denoted by [link title](link_url): Optional link description). AI models can choose to follow these links for more details. 
  • Optional secondary information. Add a section under an ## Optional header; AI models may skip the links in this section when a shorter context is necessary.

With both required and optional elements included, the llms.txt file should look something like this:

# Website name

> Website name is an educational resource that provides guides, tutorials, and reference materials on a range of technical topics.

This site is maintained by a small team of independent contractors, and focuses on accuracy, clarity, and accessibility. We prioritize evergreen content and aim to support both beginners and professionals.

## Docs

- [Getting Started](https://websitename.com/getting-started): Introductory resources for new users.

## Optional

- [Team](https://websitename.com/team): Contributor bios and site background

Alternatively, tools like the llms.txt and llms-full.txt generator WordPress plugin or a dedicated llms.txt generator can also be used to generate llms.txt files.

Step 2: Place the llms.txt file in the website’s root directory, and test accessibility

Once complete, place the new llms.txt file in the website’s root directory. Then, ensure it's accessible at websitename.com/llms.txt.

Step 3: Add HTTP header (recommended, but optional)

Configure your server to add the following HTTP header for the llms.txt file: 

X-Robots-Tag: llms-txt

It isn’t necessary, but it does help AI systems identify the purpose of the file and distinguish it from unrelated text files.
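How the header gets added depends on the server; here’s a minimal sketch for two common setups, assuming Apache with mod_headers enabled or Nginx. On Apache, via .htaccess:

<Files "llms.txt">
  Header set X-Robots-Tag "llms-txt"
</Files>

On Nginx, inside the site’s server block:

location = /llms.txt {
    add_header X-Robots-Tag "llms-txt";
}

The same pattern applies to llms-full.txt later on; just swap in that filename and the llms-full-txt value.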

Step 4: Verify implementation

Verifying the llms.txt implementation involves three important steps (the first two can also be scripted, as shown after this list):

  • Testing accessibility by opening websitename.com/llms.txt in a browser to confirm it loads correctly
  • Checking for the X-Robots-Tag: llms-txt header (if added)
  • Validating that the links work and the markdown structure in general is correct
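The first two checks can be scripted with a minimal Python sketch like the one below, assuming the third-party requests library is installed and using websitename.com as a placeholder domain:

import requests

# Check 1: confirm the file loads correctly (expect status 200)
response = requests.get("https://websitename.com/llms.txt", timeout=10)
print("Status code:", response.status_code)

# Check 2: look for the optional X-Robots-Tag header
print("X-Robots-Tag:", response.headers.get("X-Robots-Tag", "not set"))

# Preview the start of the file to eyeball the markdown structure
print(response.text[:300])

The same script works for llms-full.txt and ai.txt by swapping in the corresponding URL.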

If everything checks out, the llms.txt implementation is complete. Just remember to update it periodically to reflect any changes to the website structure or content, and keep an eye on best practices for implementing llms.txt. Since it’s still a new standard, the formatting and content could change over time.

How to Implement llms-full.txt

Although it shares similarities with llms.txt, the llms-full.txt standard requires a slightly different approach, as it includes the full content of a site in one flattened file. 

So, to give bots an even better understanding of a site, here’s how to get started with llms-full.txt.

Step 1: Create and structure the llms-full.txt file

To create an llms-full.txt file, build it with the following elements (in order), using markdown-style formatting in plain text:

  • H1 title. Start with a top-level heading (#) with the project or site name. As with llms.txt, note that the H1 title is the only required element of an llms-full.txt file.

The following sections are technically optional, but recommended for a comprehensive llms-full.txt file:

  • Blockquote summary. Add a blockquote (denoted by >) with a short paragraph summarizing the site or project. Though optional, this step is strongly recommended to give AI models immediate context.
  • Additional details without headers. Paragraphs or lists that give AI models more context about the site’s purpose, focus, or audience.
  • Table of contents. Since it's such a long and detailed file, a table of contents using anchor-style references (like [Docs](#docs)) can help organize information.
  • Flattened site content. Include the full text of each important section or page, formatted with markdown H2 headings (##) to reflect the site’s structure. Feel free to exclude any irrelevant information that doesn’t need to be accessible to AI systems.
  • Optional secondary information. Include an Optional (##) section for AI models to skip if context length is an issue — these files can get quite big.

Since providing a comprehensive file with all of the above elements is beyond the scope of this article (it could go on for pages and pages), here is a brief example of an llms-full.txt file:

# Website name

> Website name is an educational resource that provides guides, tutorials, and reference materials on a range of technical topics.

This site is maintained by a small team of independent contractors, and focuses on accuracy, clarity, and accessibility. We prioritize evergreen content and aim to support both beginners and professionals.

## Table of Contents

- [Docs](#docs)
- [Optional](#optional)

## Docs

The Getting Started guide introduces new users to the basics of our platform, including account setup, key features, and navigation tips. It also includes frequently asked questions to help users troubleshoot common issues.

This section provides full installation steps for both Windows and macOS users, covering dependencies, configuration options, and known compatibility notes.

## Optional

We’re a distributed team of educators, developers, and technical writers. Learn more about our background and approach to content creation.

Llms-full.txt files can also be generated with the same llms.txt and llms-full.txt generator WordPress plugin or a dedicated llms-full.txt generator.

Step 2: Place the llms-full.txt file in the website’s root directory, and test accessibility

Next, place the completed llms-full.txt file in your website’s root directory so it can be accessed at websitename.com/llms-full.txt.

Step 3: Add HTTP header (recommended, but optional)

In the website’s server configuration, add the following HTTP header for the llms-full.txt file (the same Apache and Nginx approach sketched earlier works here):

X-Robots-Tag: llms-full-txt

As with llms.txt, this header is an optional addition. However, it can help LLMs figure out why the file is there and what it’s for.

Step 4: Verify implementation

Finally, see if the llms-full.txt implementation works by doing the following:

  • Navigating to websitename.com/llms-full.txt to ensure it displays properly
  • Checking that the HTTP header is there, if relevant
  • Confirming that the formatting structure is correct and that all necessary content is accounted for

The llms-full.txt implementation should now be complete — at least, mostly. The only thing left to do would be to update it periodically to reflect any major site organization and content changes, or new best practices that may emerge involving llms-full.txt.

How to Implement ai.txt

Although llms.txt and llms-full.txt contain lots of optional add-ons, these files tend to follow the same general format. However, this isn’t necessarily the case for ai.txt, at least as of this writing.

So, for simplicity, the following guide will be based strictly on the structure promoted by Spawning AI, one of the few companies actively promoting the standard.

Here’s how to get started with ai.txt.

Step 1: Create and structure the ai.txt file

To create an ai.txt file, the key is to use plain text formatting with directive-based syntax. The structure is generally similar to robots.txt but with added support for file-type wildcards and more granular control of what can and cannot be used for AI training.

Technically, there are no required elements for an ai.txt file. As long as the ai.txt file exists in the website’s root directory, it could be completely empty. However, this would have no effect and defeat the purpose of putting it there in the first place.

 So, to create an effective ai.txt file, consider including the following elements:

  • User-Agent. Start the file with User-Agent: * to indicate that the following rules apply to all AI agents. 
  • Disallow rules. Use Disallow: followed by a directory, file path, or wildcard pattern to specify which content should be blocked from AI training datasets.
  • Allow rules. Use Allow: to give AI bots explicit permission to access specific paths or content types.
  • Comments. For human readability only, consider starting the file with one or more comments (#) explaining the file’s purpose.

Using the Spawning ai.txt generator, an example ai.txt file should look like this:

# Spawning AI
# Prevent datasets from using the following file types

User-Agent: *
Disallow: *.aac
Disallow: *.aiff
Allow: *.txt
Allow: *.pdf

Note that Spawning’s tool tends to block or allow entire categories, such as video or coding files, so feel free to remove file types as necessary to customize targeting.

Step 2: Place the ai.txt file in the website’s root directory and test accessibility

After fine-tuning the file (if needed), upload it to the root directory of the website so it’s accessible at websitename.com/ai.txt.

Step 3: Verify implementation

To verify that the ai.txt file is working:

  • Open websitename.com/ai.txt in a browser to make sure it loads correctly
  • Ensure there are no formatting issues or redirects
  • Confirm the path is exactly /ai.txt, as it won’t work from a subdirectory

Once everything is verified, the ai.txt implementation is complete. As with the other web standards, remember to update it as necessary if targeted file preferences change or new best practices emerge.

How to Use AI Web Standards with AI Systems

Note that most AI systems don’t automatically detect these files — at least, not yet. Since these new web standards are still emerging and voluntary, manual input is often required.

So, to help AI tools recognize and apply files like llms.txt, llms-full.txt, and ai.txt, try one of the following:

  • Provide it with a direct link to the corresponding .txt file
  • Copy and paste the contents of the file directly into its prompt field
  • If available, upload the .txt file using its file upload feature

Typically, doing just one of the above is enough for the tool to discover the file. However, if one method confuses the model or otherwise doesn’t seem to work, try combining two to improve results.
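As a purely illustrative example, a prompt combining the first two methods might look something like this:

“Before answering, read my site’s llms.txt (contents pasted below) and use its links to find the most relevant pages: [paste the contents of llms.txt here]”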

Benefits of Using Emerging AI Web Standards

Creating some of these files can be a lot of work (a comprehensive llms-full.txt file can run to hundreds of thousands of words), and with no guarantee that AI will comply, implementing them may seem like a waste of time.

However, there are some real benefits to implementing these emerging AI web standards:

  • Help AI find the content that actually matters. Llms.txt and llms-full.txt highlight a website’s most useful pages (including guides, documentation, reference materials, etc.) and leave out irrelevant or outdated content.
  • Get supported tools to summarize content more accurately. Some AI tools, like Claude or Perplexity, support llms.txt or llms-full.txt directly. Providing one of these structured files can make it easier for these tools to summarize or quote content correctly.
  • Set clear boundaries around what AI can use. Files like ai.txt allow website owners to specify which content can and cannot be used for AI training. It’s not enforceable, of course, but some tools will respect it.
  • Send a visible opt-out signal. Even if a crawler ignores an ai.txt file, it still shows up. Including one demonstrates a site’s position on AI usage, and could matter more if adoption picks up or new policies emerge.
  • Reduce the need for full-site scraping. Since llms-full.txt provides a flattened, structured version of a website’s content, it can help LLMs access context without crawling every single site page.

Overall, files such as llms.txt, llms-full.txt, and ai.txt have the potential to provide site owners with more clarity and control over how AI interacts with their sites. 

Don’t expect them to solve all AI-related scenarios, though. It’s easy to overestimate how much these files can really accomplish.

What These Files Do Not Do

Although they may have their benefits, the new AI web standards also have their limits. Sure, they can help shape how AI systems interpret a website, but ultimately, they offer no guarantees.

Here’s what files like llms.txt, llms-full.txt, and ai.txt do not do:

  • Prevent AI from using content. Since compliance is voluntary, AI bots and systems can simply ignore these files without repercussion.
  • Ensure instructions are seen by AI. There’s no way to tell if a model or bot skips over one of these files.
  • Block scraping, archiving, or reusing content. If bots ignore the rules, content can still be scraped, archived, or reused by AI. 
  • Improve SEO or search rankings. These standards aren’t SEO tools. They don’t improve visibility, help with indexing, or do anything to impact a website’s performance on Google.
  • Behave the same way with all AI tools. Some bots may follow all or parts of the file, while others may parse things differently or ignore sections entirely. There’s no shared standard for interpretation.

Now, these limitations shouldn’t necessarily be dealbreakers. The new standards are, after all, new, so they’re not going to be perfect right out of the gate.

However, even with all the controversy and concern surrounding AI scraping and training (just look at the long list of OpenAI lawsuits), adoption has still been slow.

And interestingly, it’s not just due to technical gaps.

Why Adoption of AI Web Standards is Low: Challenges and Barriers

Although the limitations offer some insight into why it’s taking so long for people to adopt the new AI web standards, they’re really only one piece of the puzzle. 

From confusion around what the files actually do to questions about whether they even work in the first place, here’s a look at why the adoption of llms.txt, llms-full.txt, and ai.txt may be so slow:

  • Many people still haven’t heard of them. The first comment on a Reddit post about an llms.txt directory from December 2024 asks, “why isn’t this post getting any traction?” with someone commenting a few months later, “it’s too cutting edge”. This suggests that awareness is low but perhaps picking up.
  • No sign that major AI companies are using them. Even Google Search Advocate and former Senior Webmaster Trends Analyst John Mueller mentioned on Bluesky that “FWIW no AI system currently uses llms.txt” in June 2025.
  • There’s confusion over how they work. In another Reddit post, one user seems to think it’s designed to act like a robots.txt file, which is not the case.
  • No enforcement. Reddit is full of doubters, with another user commenting, “won’t they just ignore it anyway? I don’t get why people are talking about this…” Multiple users say it’s pointless or a bad idea in the same thread.
  • No obvious payoff. Another Reddit thread has comments including “I think it’s a waste of time and resources”, “Check server logs. No requests anywhere for it.”, “I’m using it, but still no visible effects”. Until users start seeing a clear impact, it may be hard to justify putting in the time and effort to implement the new standards.

Ultimately, adoption of these new AI web standards is slow because there’s no proven payoff yet. If early adopters had been reporting clear, measurable results, these files would likely have gained serious traction by now.

That said, momentum is building, as seen in our tracker. However, with such slow adoption, the question remains: will these truly become the new AI web standards?

Tracking the Adoption of AI Web Standards

Files like llms.txt, llms-full.txt, and ai.txt may not have hit the mainstream just yet, but they clearly have people talking. Although it’s based on only a large subset of the web, our tracker does indicate that adoption is slow, though not stagnant.

Whether or not these files end up becoming as commonplace as robots.txt will depend on several factors, with one of the most important being whether the companies behind AI bots and models start to take them seriously. After all, if all the major players are going to ignore them, then what’s the point?

For now, though, they are a step in the right direction. They may not be perfect, and they may not be enforceable, but they’re something. They offer site owners a way to tell AI companies how they want their content to be treated.

And in a world where AI is starting to play a role in how everyone is learning, searching, creating, and communicating, even the smallest tools for taking back some of that control are worth paying attention to.

