Written by Giorgio Volpe 12 min read

Markdown for AI agents: the web’s new language

The web has always been built for human beings. HTML pages loaded with menus, scripts, advertising banners, tracking tags: everything designed for a browser that knows how to render each element visually. For years SEO worked to solve or work around content accessibility problems, many of which were progressively overcome thanks to the evolution of crawling techniques, Google’s in particular. With the arrival of AI crawlers, autonomous agents and conversational assistants the problem has resurfaced in a different form: for them, that structure of content drowned in code is pure noise.

In this scenario a concept is emerging that anyone working in SEO, content and digital marketing can no longer ignore: markdown for agents. It is not an entirely new format, but the way it is used, proposed as a standard and served by the web infrastructure makes it one of the potential key elements of the shift from the traditional web to the agentic web. A shift that, as we will see, is already underway – but far more uneven than the enthusiastic headlines of recent months suggest.

What markdown is (and why it matters to AI agents)

Markdown is a text formatting language invented in 2004 by John Gruber. Its logic is simple: use ASCII characters to indicate the structure and hierarchy of a text, without the need for complex tags. A ## indicates a second-level heading, an asterisk creates a list item, two asterisks make text bold. Readable by a human even without rendering, interpretable by a machine with extreme efficiency.

For a long time it remained a developers’ tool: READMEs on GitHub, technical documentation, personal notes. Then LLMs – the language models behind ChatGPT, Claude, Gemini and the other AI assistants – began using it as their preferred format for their own communication. This is no accident: markdown is structured just enough to give the text semantic context, yet light enough not to waste tokens – the minimal units of processing on which a language model’s work is based, and which have a real cost at every call.

Feeding an AI agent a raw HTML page is a bit like giving it a manual to read with half the words replaced by bureaucratic code: it understands anyway, but at a much higher cost. The numbers confirm it clearly: this Cloudflare blog post weighs 16,180 tokens in HTML and only 3,150 tokens once converted to markdown – an 80% reduction. Multiplied by millions of requests a day, the computational – and therefore economic – saving becomes enormous.

This could have direct consequences for those who work in SEO and digital content. Traditional ranking on Google remains important, but alongside it a new type of visibility is emerging: visibility towards the AI agents that synthesize content, answer users’ questions, or feed generative search systems.

Concrete applications: not just a format, an ecosystem

The adoption of markdown in the agentic web is taking shape along at least three distinct strands, with very different levels of maturity and impact.

1. The llms.txt file: an interesting proposal, with real limits

The llms.txt proposal, put forward by Jeremy Howard in September 2024, suggests adding a /llms.txt file in markdown format to the root of websites, to give LLMs a structured and curated access point to the site’s information. The llms.txt file is a document that usually presents an introductory section describing the site, followed by links to the main internal pages accompanied by a brief description of their content.

The analogy with robots.txt is suggestive: that too is a text file placed in the site root, designed to guide robots. But while robots.txt is a well-established standard respected by all the main crawlers – at least the “good” ones – llms.txt is still a proposal that, at least for now, does not appear to be used by the main AI systems. Moreover, the purpose of llms.txt is not to restrict more or less limited areas of the site from crawling, but rather to provide a map for accessing the site’s most important resources, a bit like sitemap.xml does.

In mid-2025, John Mueller of Google explicitly stated that Google’s AI systems do not use llms.txt – and the same applies to the crawlers of OpenAI and Anthropic, which do not request it in significant volumes. The most rigorous research available is that of SE Ranking (November 2025), conducted on around 300,000 domains: 10.13% had an llms.txt file, with a surprisingly flat distribution across traffic tiers – small sites at 9.88%, medium at 10.54%, large at 8.27%. There is therefore no correlation between a site’s authority and adoption of the file. Above all, removing llms.txt as a variable from the predictive model of AI citations improved the model’s accuracy: the file seems to introduce noise rather than a signal. Among the 50 most cited domains in AI search results, only one had an llms.txt file.

An independent test by OtterlyAI found that, out of 62,100 total AI crawler visits over 90 days, only 84 were direct requests to the llms.txt file – roughly 0.1% of the traffic. An Ahrefs analysis (May 2026) of over 137,000 domains confirms the picture: 97% of sites with a valid llms.txt received no request for the file in the observed month.

It is worth remembering, however, that an endorsement of the llms.txt file comes from Addy Osmani, Director of Engineering at Google Cloud AI, who in an article last April on Agentic Engine Optimization (not to be confused with Answer Engine Optimization, with which it shares the acronym AEO) advocates its adoption, also explaining how it should be built. Osmani himself, however, narrows the advice: his audience is developers and the technical documentation consumed by coding agents, and he specifies that Google Search does not officially recommend llms.txt as a standard.

Moreover, its presence and formal correctness is taken into account by Lighthouse’s new Agentic Browsing category (introduced in version 13.3), which assesses a site’s readiness to interact with bots. The weight of this check should be put in perspective, though: the category is explicitly experimental, it does not produce a 0-100 score but a simple report of checks passed, and the absence of the file is not treated as an error – it is marked as Not Applicable, because the file remains optional. In the same period, moreover, Google Search published guidance that lists llms.txt among the things not needed to appear in generative AI features: two Google teams, two divergent signals – the clearest sign of how fluid the scenario still is.

2. The AGENTS.md file: instructions for coding agents

In the world of software development, another parallel standard is taking hold: the AGENTS.md file. It is a simple, open format for guiding coding agents – a README dedicated to agents, a clear and predictable place to provide the context and instructions AI tools need to work on a project.

The distinction from the traditional README.md is precise: the README is for humans – project descriptions, installation instructions, contributor guides. AGENTS.md contains the more detailed context that coding agents need: build steps, tests, code conventions, which might weigh down a README or not be relevant to human contributors.

The format – proposed by OpenAI in August 2025 and, since December 2025, managed by the Agentic AI Foundation of the Linux Foundation – is adopted, according to the data published on agents.md, by over 60,000 open source projects, and is natively supported by tools such as Codex, GitHub Copilot, Cursor, Devin, Windsurf and many others. The notable exception is Claude Code, which reads only its own CLAUDE.md file: Anthropic’s own documentation suggests, for repositories that use AGENTS.md, creating a CLAUDE.md that imports it. The direction is clear nonetheless: every project will soon have its “for humans” documentation and its “for agents” documentation, and the two will not coincide.

3. On-the-fly conversion: Cloudflare’s approach

The third strand is the most interesting from a web infrastructure standpoint, because it does not require webmasters to do anything new: it is Cloudflare that does it for them.

In February 2026, in fact, Cloudflare announced a beta feature called Markdown for Agents, which can be enabled from its dashboard for the Pro, Business and Enterprise plans (as well as for SSL for SaaS customers), at no extra cost.

The mechanism looks elegant. When an AI system requests a page from a site using Cloudflare with this feature enabled, it can express its preference for text/markdown through the HTTP Accept header. The Cloudflare network detects this preference, retrieves the original HTML version from the origin server and converts it to markdown in real time, at the CDN level – without the origin server having to do anything special.

Does Cloudflare’s feature really matter?

Below are the main strengths of this feature:

Reduced computational cost the saving on tokens is real and measurable – up to 80% on pages heavy with markup, with values around 30% on typical content pages. Cloudflare also exposes the x-markdown-tokens and x-original-tokens headers, which allow it to be estimated page by page. For those running AI pipelines that crawl and massively analyze web content, the economic difference is immediate.
Improved reading quality an HTML page inevitably contains navigation, sidebars, footers, third-party scripts. Converting to markdown removes all this noise, leaving only the semantically relevant content. An agent reading the markdown version could produce more accurate answers.
Scalability with no editorial burden Cloudflare’s automatic conversion works across the whole site with no intervention. It is an infrastructural solution, not an editorial one.
Signaling intended use the converted responses include a Content-Signal header indicating whether the content can be used for AI training, search or as input for agents – a first step towards a system in which publishers can express preferences on the use of their content. A caveat about the default, though: if the origin does not specify its own policy, the header is set to ai-train=yes, search=yes, ai-input=yes. Those with restrictive choices about the AI use of their content should review these values before enabling the feature.

Some potential weaknesses also need to be considered:

Conversion quality depends on HTML quality a well-structured site, with a coherent heading hierarchy and semantically clean content, converts into good-quality markdown. A site with messy HTML – headings used for graphic rather than semantic purposes, content scattered across anonymous divs, etc. – will produce lower-quality markdown. In essence, the feature translates the existing content, it does not optimize it.
Agent behavior is not yet standardized very few AI agents send the Accept: text/markdown header. Many crawlers continue to work with raw HTML. As things stand (July 2026), according to Cloudflare Radar – a figure confirmed by our Fortop observatory – the share of AI-bot requests served with markdown is below 0.1%. The feature is useful today for those who want to be ready, not a universal solution.
Technical limits of the beta granular control, initially absent, has been introduced – it is now possible to limit conversion to specific subdomains or paths via configuration rule – but documented operational constraints remain: conversion applies only to HTML documents, does not occur if the origin’s response does not include a Content-Length header (a frequent case with the chunked transfer encoding of some managed hosting) and not even if the page exceeds 1MB. If enabled, it must therefore be verified page by page, not taken for granted.
The cloaking question is not yet settled serving bots content different from what is shown to users inevitably evokes the cloaking penalized by Google. John Mueller’s criticisms, however, concerned separate markdown pages, not the standard content negotiation used by Cloudflare, on which Google has not yet ruled. The informational content stays identical and only the delivery format changes – a solid argument in favor of its legitimacy.

An evolving scenario for GEO

LLMs do not “read” the web the way we read it. When an agent navigates a page, every HTML element must be turned into tokens, processed, contextualized. Noise – everything that is not informational content – has a real cost. Markdown answers this need better than HTML because it was designed to be written and read by human beings, yet it has enough structure to be processed efficiently by machines.

The initiatives described in this article are not isolated events but signals of a direction: the web is preparing – unevenly and still without a single standard – to serve two types of reader at once. And here is the thesis worth stating clearly: markdown for agents is not yet a measurable SEO factor, but it could represent a first building block on which the next phase will be founded. Those who build well-structured, semantically coherent content today, as readable by a machine as by a human, are not chasing a technical fad: they are investing in a potential competitive advantage with a deferred maturity.

Visibility in generative search systems depends on the quality with which content is “understood” by an LLM. Cleanly structured content has a far greater chance of being correctly synthesized – and therefore cited – by an AI assistant than equivalent content buried under layers of decorative HTML. It is not about abandoning optimization for traditional search engines: it is about adding a layer that today is still optional, but that may not remain so for long.

Where to start, in practice? With monitoring the access logs of AI crawlers, to gauge the incidence of requests that accept responses in markdown format. Based on those measurements, we can evaluate whether and how to implement a markdown-conversion system in specific areas of the site. With what criteria? That is another story.

Markdown for AI agents: the web’s new language

What markdown is (and why it matters to AI agents)

Concrete applications: not just a format, an ecosystem

1. The llms.txt file: an interesting proposal, with real limits

2. The AGENTS.md file: instructions for coding agents

3. On-the-fly conversion: Cloudflare’s approach

Does Cloudflare’s feature really matter?

An evolving scenario for GEO

Other categories

Market Insights

The Builder’s Perspective

Blog

Markdown for AI agents: the web’s new language

What markdown is (and why it matters to AI agents)

Concrete applications: not just a format, an ecosystem

1. The llms.txt file: an interesting proposal, with real limits

2. The AGENTS.md file: instructions for coding agents

3. On-the-fly conversion: Cloudflare’s approach

Does Cloudflare’s feature really matter?

An evolving scenario for GEO

Prompt Analysis: a new world to explore

The impact of AI Overview on People Also Ask answers

Google, Bing and GEO: who is right?

Technical GEO: Discoverability, Accessibility and Readability

Google Search Console and Bing Webmaster Tools: Governing Access to Your Data

From Keywords to Prompts and Back: Bing’s AI Performance Panel

Market Insights

The Builder’s Perspective

Blog