llms.txt is a Markdown file at the root of a domain that lists the pages a site considers authoritative, intended to be parsed by [AI search](/blog/ai-search-optimization-how-to-get-cited-by-chatgpt-claude-and-perplexity) engines. It serves as a curated index, telling AI retrievers which pages to prioritize when looking for sources.

llms.txt explained: what it is, why it matters, how to write one

llms.txt is a small file at the root of your domain that tells AI engines which pages on your site are worth ingesting and how to interpret them. Adoption grew through 2025, and most major AI crawlers now expect the file. This is the working guide: what it is, why it matters, and how to write one that actually shapes what gets cited.

Key takeaways

llms.txt is a Markdown file at /llms.txt that tells AI engines which pages on your site to ingest as authoritative sources.
It's not a robots.txt replacement. robots.txt controls crawling; llms.txt curates what gets cited.
The format is intentionally simple: an H1 title, a one-paragraph description, and sectioned link lists.
Two common mistakes: dumping the entire sitemap (provides no curation signal) and writing summaries that don't match the linked pages (degrades trust).
A well-curated llms.txt can shift AI citation share measurably toward the pages you want surfaced.

What llms.txt is in 2026

llms.txt is a proposed standard introduced in 2024 and adopted broadly through 2025. It's a Markdown file at the root of your domain (https://example.com/llms.txt) that lists the pages you want AI engines to treat as primary sources.

The file is structured like a small sitemap written to be readable by both humans and language models: an H1 with the site name, an introductory paragraph, then H2 sections, each containing a curated list of links with one-line descriptions.

It's not a directive standard. AI engines don't strictly obey it the way crawlers obey robots.txt. They use it as a signal of what the publisher considers authoritative, which influences what gets surfaced when their retrievers look for sources.

Why llms.txt matters for AI search visibility

The retriever inside an AI engine has two choices when it indexes your site: crawl the whole thing and infer importance from internal structure, or check for a curated index that tells it what's important. llms.txt is the curated index.

Sites with a clean, well-curated llms.txt get their preferred pages cited at higher rates than sites that leave the retriever to guess. The effect is most visible for content sites with hundreds of pages where a few are particularly authoritative and the rest are supporting material.

Adoption has accelerated. As of 2026, most major AI engines respect llms.txt. The cost of shipping one is low. The cost of not shipping one when your competitors do is measurable.

The llms.txt file format

The format is intentionally simple. Markdown, no XML, no JSON.

# Site name

> One-paragraph description of what the site does and who it serves.

Optional follow-up paragraph with context the retriever should weight.

## Docs

- [Page title](https://example.com/docs/page): One-line description of why this page matters.
- [Another page](https://example.com/docs/another): Same pattern.

## Blog

- [Article title](https://example.com/blog/article): One-line summary.

## Optional

- [Less critical pages](https://example.com/other): Lower priority.

The structure that actually parses

H1: the site name.

Blockquote (>): the primary description. AI engines can display this text when they cite your site as a source.

H2 sections: thematic groupings. Common ones include "Docs," "Blog," "Products," "Optional."

Bulleted links: each link points to a real URL with a one-line description after the colon. The description is what the retriever uses to decide whether to fetch that page for a given query.

Keep descriptions concrete. "Pricing plans and feature comparison" beats "everything about our pricing."
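To make the structure concrete, here is a minimal Python sketch of how a consumer might read the four elements above. It is an illustration only, not how any particular AI engine parses the file; the names `LlmsTxt`, `LINK_RE`, and `parse_llms_txt` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://url): description"; the description is optional.
LINK_RE = re.compile(
    r"^[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    description: str = ""
    # section name -> list of (link title, url, one-line description)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Read the H1, the blockquote, the H2 sections, and their bulleted links."""
    doc = LlmsTxt()
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and current is None and not doc.description:
            doc.description = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return doc
```

Calling `parse_llms_txt` on the example file from the previous section would give a title of "Site name" and three sections: "Docs", "Blog", and "Optional". Links that appear before the first H2 are ignored in this sketch, which is a simplification.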

llms-full.txt: when to ship both

Some publishers also ship /llms-full.txt, which contains the full rendered text of every page listed in llms.txt. The use case: AI engines that prefer to ingest a single file rather than crawling individual URLs.

Ship llms-full.txt if you have under 50 pages and want to make ingestion as easy as possible. Skip it if you have hundreds of pages, because the file gets too large to be useful and the standard crawl path works fine.

For most sites, llms.txt alone is sufficient. llms-full.txt is a nice-to-have, not a requirement.
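If you do decide to ship both files, one way to keep them in sync is to generate llms-full.txt from llms.txt. The sketch below assumes you already have the rendered text of each page in a dictionary keyed by URL; the function name `build_llms_full` and the per-page layout (a heading, a "Source:" line, then the text) are assumptions for illustration, not a prescribed format.

```python
import re

# Finds every Markdown link "[Title](url)" in the llms.txt source.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def build_llms_full(llms_txt: str, page_texts: dict[str, str]) -> str:
    """Concatenate the text of every page linked from llms.txt, in listing order."""
    parts = []
    seen = set()
    for title, url in LINK_RE.findall(llms_txt):
        if url in seen or url not in page_texts:
            continue  # skip duplicates and pages we have no text for
        seen.add(url)
        parts.append(f"# {title}\n\nSource: {url}\n\n{page_texts[url].strip()}\n")
    return "\n".join(parts)
```

Because the output follows the order of llms.txt, reordering or pruning the index automatically reorders or prunes the full file.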

Two mistakes that make llms.txt useless

Both come from misunderstanding the file's purpose.

Mistake one: dumping the whole sitemap

The first mistake is listing every URL on the site. That gives the retriever no curation signal. If everything is authoritative, nothing is. Limit the file to the pages you would proudly point a journalist at.

Most sites should have 20-100 entries in llms.txt, not 5,000.

Mistake two: writing summaries that don't match the pages

The second mistake is writing descriptions that promise more than the page delivers. The retriever fetches the page, compares it against the description, and downweights mismatched entries. Worse, the description can show up in AI citations as the source caption, embarrassing the publisher if it overpromises.

Write descriptions that accurately summarize the page in one line. The retriever rewards honesty.
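A rough way to catch overpromising descriptions before you ship is to check how many of a description's content words actually appear on the page. The sketch below is a deliberately crude heuristic, and it is not a model of how any retriever scores entries; the stopword list and the function names are assumptions.

```python
import re

# A tiny stopword list, just enough for one-line descriptions (an assumption).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "our", "with", "is", "are"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def description_coverage(description: str, page_text: str) -> float:
    """Fraction of the description's content words that appear in the page text."""
    desc = tokens(description)
    if not desc:
        return 0.0
    return len(desc & tokens(page_text)) / len(desc)
```

A low score does not prove a description is wrong (synonyms will fool it), but it is a cheap flag for entries worth rereading by hand.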

How to validate your llms.txt

Three checks before shipping.

Fetch /llms.txt with curl. Confirm it returns status 200 with a Content-Type of text/plain or text/markdown, and that the body looks correct.

Parse it as Markdown in any reader. Confirm the H1, blockquote, and bulleted lists render cleanly.

Visit two or three of the listed URLs. Confirm they're alive, return 200, and roughly match their descriptions.
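The three checks can also be scripted. The following Python sketch uses only the standard library; the function names are hypothetical, and the structural rules are a loose reading of the format described earlier rather than an official validator. `check_site` makes real network requests, so run it only against your own domain.

```python
import re
import urllib.request

LINK_RE = re.compile(r"^[-*]\s*\[[^\]]+\]\(([^)\s]+)\)")


def structure_problems(text: str) -> list[str]:
    """Return structural problems found; an empty list means the checks passed."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    if not any(ln.startswith("> ") for ln in lines):
        problems.append("no blockquote description")
    if not any(ln.startswith("## ") for ln in lines):
        problems.append("no H2 sections")
    if not any(LINK_RE.match(ln) for ln in lines):
        problems.append("no bulleted links")
    return problems


def listed_urls(text: str) -> list[str]:
    """URLs of all bulleted links, in file order."""
    return [m.group(1) for ln in text.splitlines() if (m := LINK_RE.match(ln.strip()))]


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Return (status, content type, body). urlopen raises on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        content_type = resp.headers.get_content_type()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, content_type, body


def check_site(base_url: str, sample_size: int = 3) -> list[str]:
    """Run the three checks against a live site. Makes network requests."""
    status, content_type, body = fetch(base_url.rstrip("/") + "/llms.txt")
    problems = []
    if status != 200:
        problems.append(f"/llms.txt returned {status}")
    if content_type not in ("text/plain", "text/markdown"):
        problems.append(f"unexpected content type {content_type}")
    problems += structure_problems(body)
    for url in listed_urls(body)[:sample_size]:
        try:
            fetch(url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            problems.append(f"{url} failed: {exc}")
    return problems
```

Calling `check_site("https://example.com")` returns a list of human-readable problems. Whether each listed page matches its description still needs a human read.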

After shipping, monitor your AI citation footprint over the next two weeks. Changes in which pages get cited will tell you whether the file is shaping what gets surfaced.

Your next move this week

Draft a minimal llms.txt. List 10-30 of your highest-authority pages with one-line descriptions. Ship at /llms.txt. Validate. Re-check your AI citation footprint two weeks later.
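If your page list already lives in a spreadsheet or CMS export, a few lines of Python can render the draft for you. This is a minimal sketch: `render_llms_txt` is a hypothetical helper, and the site name, URLs, and descriptions in the usage note are placeholders.

```python
def render_llms_txt(
    title: str,
    description: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: H1, blockquote, then one H2 per section of links."""
    lines = [f"# {title}", "", f"> {description}", ""]
    for name, links in sections.items():
        lines += [f"## {name}", ""]
        # Each entry is (link title, url, one-line description).
        lines += [f"- [{t}]({u}): {d}" for t, u, d in links]
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"
```

For example, `render_llms_txt("Example Co", "What the site does and who it serves.", {"Docs": [("Pricing", "https://example.com/pricing", "Pricing plans and feature comparison.")]})` returns text you can save as llms.txt and serve from the root of the domain.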

FAQ

What is llms.txt?

llms.txt is a Markdown file at the root of a domain that lists the pages a site considers authoritative, intended to be parsed by AI search engines. It serves as a curated index, telling AI retrievers which pages to prioritize when looking for sources.

How does llms.txt work in 2026?

AI engines fetch /llms.txt when they index a site. They parse the H1, blockquote description, and bulleted link lists, treating the listed pages as primary sources. The descriptions help the retriever decide which specific pages to fetch for a given query.

Why does llms.txt matter for SEO?

llms.txt shapes which pages get surfaced in AI search engines. Sites with a clean curated file get their preferred pages cited at higher rates than sites that leave the retriever to discover authority through crawling alone. It's not a Google ranking signal, but it's a meaningful AI-search-visibility signal.

Is llms.txt the same as robots.txt?

No. robots.txt is a directive standard that tells crawlers what they may and may not crawl. llms.txt is a curation hint that tells AI engines what content the publisher considers authoritative. They serve different purposes and live in different files.

Do I need llms-full.txt as well?

Only if you have a small site (under 50 pages) and want to make AI ingestion as friction-free as possible. For most sites, llms.txt alone is sufficient. llms-full.txt is a complement, not a replacement.

Which AI crawlers respect llms.txt?

As of 2026, most major AI search engines respect llms.txt, including engines from OpenAI, Anthropic, Google, and Perplexity. Adoption isn't universal, but it's high enough that the file is worth shipping.

Can llms.txt block AI scrapers I don't want?

No. llms.txt is a curation hint, not a directive. To block AI scrapers, use robots.txt with the appropriate user-agent rules. llms.txt is the inverse signal: it tells AI engines what to prioritize, not what to avoid.
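To see what a blocking rule looks like in practice, the sketch below feeds a small robots.txt to Python's standard-library `urllib.robotparser` and asks whether two crawlers may fetch a page. GPTBot is used as the example user-agent token; check each vendor's documentation for the tokens their crawlers actually send. The second bot name and the URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/docs/page"
print(parser.can_fetch("GPTBot", page))        # False
print(parser.can_fetch("SomeOtherBot", page))  # True
```

The rules only bind crawlers that choose to honor robots.txt, which is the same kind of voluntary compliance llms.txt relies on, but for the opposite purpose.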
