
An experimental guide to Answer Engine Optimization

Ross Hill · April 1, 2026

Nearly 60% of Google searches now end without a click. AI referral traffic to major websites grew 357% year over year in 2025. When someone asks ChatGPT or Perplexity "what's the best Canadian hosting platform," the answer doesn't come from a ranked list of blue links. It comes from whatever the model already knows, or can fetch in real time.

Being the best result on a search engine results page may no longer be sufficient. You might also need to be the source an AI model can most easily understand and cite. That's the premise behind Answer Engine Optimization (AEO): making your content legible to the systems that are increasingly mediating how people find information.

Here's the problem AEO tries to solve. A browser downloads your JavaScript, hydrates your React tree, renders your components, and gives the user a fully interactive page. An AI answer engine downloads your HTML and tries to extract meaning from it. What it gets is something like this: a <div> with a class name that's a hash, containing another <div>, containing a <section> with Tailwind utility classes, wrapping an <h2> that finally has the text it's looking for. If the model is doing real-time retrieval (like Perplexity or ChatGPT with browsing), it has a time budget, and the harder you make it to find the content, the less likely you are to get cited.

So I rebuilt my content pipeline to fix this: moved everything into markdown, added middleware to serve it directly to AI agents, and layered in the metadata they need to cite you accurately. This post walks through each step, in case you want to try it too.

Fair warning: the standards here are drafts or proposals (llms.txt, Content-Signal), nobody knows which techniques will matter long term, and I can't yet measure direct impact on AI citations. I tried them anyway because the underlying trend feels real, it was a fun engineering challenge, and the results are useful regardless.

Step 1: put all your content in markdown

Move every page on your marketing site from JSX components into markdown files. That includes rich landing pages with hero sections, feature grids, comparison tables, and FAQ sections. The content directory becomes the single source of truth for all page content.

Markdown is the right foundation because it's the format AI models already understand best. They're trained on enormous amounts of it (docs, READMEs, blog posts, wikis), so the structure is already familiar to them. Compare that to HTML where the same information is buried in nested <div>s, CSS class hashes, and wrapper elements that exist only for styling.

Markdoc is Stripe's content authoring system, essentially markdown with a tag syntax for custom components. MDX solves a similar problem by letting you embed JSX directly in markdown, and either works well for this approach since LLMs handle both formats fine. If you're on Next.js App Router, use the core @markdoc/markdoc library rather than the @markdoc/next.js plugin, which targets Pages Router.

A landing page that used to be a React component full of hardcoded strings becomes something like this:

---
title: Canadian hosting
description: Deploy on Canadian infrastructure with git push.
---

{% hero-section %}
Deploy on Canadian infrastructure
{% /hero-section %}

{% feature-grid-section %}
{% feature-card title="Git push deploys" %}
Push to main. Your app is live in minutes.
{% /feature-card %}
{% feature-card title="Managed databases" %}
PostgreSQL, MySQL, Redis. One-click provisioning.
{% /feature-card %}
{% /feature-grid-section %}

Custom tags like hero-section and feature-card map to React components at render time, so the browser still gets the same interactive page. But the source file is clean markdown that's directly readable by any system that understands text. That's the key insight: your Markdoc source files are now context you can serve directly to AI systems without any transformation.

Step 2: add llms.txt

The llms.txt spec is a simple convention: put a text file at /llms.txt that gives AI models a curated index of your site, essentially sitemap.xml for language models.

A sitemap is a flat list of URLs for crawlers that visit each page individually. An llms.txt file is a curated, annotated map: the table of contents you'd write if you were turning your entire site into a document. It tells the model who you are, what your key pages cover, and where to go for depth.

AI agents with tool use are increasingly capable of multi-step research: fetch an index, decide which pages are relevant, fetch those pages, synthesize an answer. A well-structured llms.txt gives those agents exactly the entry point they need to navigate your site efficiently.

Generate yours dynamically from the content directory:

# MapleDeploy

> Canadian-first managed Coolify hosting.

## Key pages
- [Home](https://mapledeploy.ca): MapleDeploy homepage...
- [Canadian Hosting](https://mapledeploy.ca/canadian-hosting): Deploy on Canadian infrastructure...

## Comparisons
- [vs Railway](https://mapledeploy.ca/compare/railway): Compare MapleDeploy with Railway...

## Blog
- [Why We Chose Canadian-First Infrastructure](https://mapledeploy.ca/blog/why-we-chose-canadian-first-infrastructure): ...

## Full content
For complete page content, see [/llms-full.txt](https://mapledeploy.ca/llms-full.txt)

Also serve /llms-full.txt, which concatenates the full markdown body of every published page into a single file. An AI agent that wants comprehensive context about your product can fetch the entire site in one request.

Filter out unpublished content (future-dated posts) from both routes and cache them for 24 hours.
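The generation step is straightforward. Here's a minimal sketch, assuming a Page shape and section grouping that are my own invention (the post doesn't prescribe a schema), with future-dated pages filtered out as described above:

```typescript
// Sketch: build llms.txt from an in-memory page list.
// The Page interface and its fields are assumptions for illustration.
interface Page {
  title: string;
  description: string;
  url: string;
  section: string; // e.g. "Key pages", "Comparisons", "Blog"
  publishedAt: Date;
}

function generateLlmsTxt(pages: Page[], now: Date = new Date()): string {
  // Skip unpublished (future-dated) pages.
  const published = pages.filter((p) => p.publishedAt <= now);

  // Group by section, preserving first-seen order.
  const sections = new Map<string, Page[]>();
  for (const page of published) {
    const group = sections.get(page.section) ?? [];
    group.push(page);
    sections.set(page.section, group);
  }

  const lines: string[] = [
    "# MapleDeploy",
    "",
    "> Canadian-first managed Coolify hosting.",
    "",
  ];
  for (const [section, group] of sections) {
    lines.push(`## ${section}`);
    for (const p of group) {
      lines.push(`- [${p.title}](${p.url}): ${p.description}`);
    }
    lines.push("");
  }
  lines.push("## Full content");
  lines.push(
    "For complete page content, see [/llms-full.txt](https://mapledeploy.ca/llms-full.txt)"
  );
  return lines.join("\n");
}
```

Wire this into a route handler for /llms.txt with a 24-hour cache header, and /llms-full.txt is the same loop with full page bodies instead of descriptions.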

Making llms.txt discoverable

The well-known /llms.txt path and the llms_txt field in the enriched frontmatter (Step 4) already make the index findable. But you can reinforce discovery with HTTP response headers on every page:

Link: <https://yoursite.com/llms.txt>; rel="llms-txt"
X-Llms-Txt: https://yoursite.com/llms.txt

The Link header follows the proposed llms-txt link relation. X-Llms-Txt is a simpler alternative for agents that don't parse Link syntax. Set both globally in your framework's headers config and on markdown API responses. In the HTML <head>, add the equivalent <link> tag:

<link rel="llms-txt" href="https://yoursite.com/llms.txt" />
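In Next.js, both headers can be set site-wide in the config file. A sketch (swap in your own origin):

```typescript
// next.config.ts — site-wide llms.txt discovery headers (illustrative).
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  async headers() {
    return [
      {
        // Apply to every path.
        source: "/:path*",
        headers: [
          { key: "Link", value: '<https://yoursite.com/llms.txt>; rel="llms-txt"' },
          { key: "X-Llms-Txt", value: "https://yoursite.com/llms.txt" },
        ],
      },
    ];
  },
};

export default nextConfig;
```

Other frameworks have equivalent hooks (Astro middleware, Nuxt route rules, a CDN edge rule); the headers themselves are framework-agnostic.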

Step 3: serve markdown to AI agents

With content in markdown, add middleware that intercepts requests from AI agents and serves the raw markdown instead of rendered HTML. This involves detecting the right requests, giving humans and agents an explicit way to request markdown via the .md extension, and making sure the markdown URLs don't create duplicate content problems with search engines.

Detecting AI agents

Check the User-Agent header against a list of known AI bots. The ai-robots-txt repo maintains a comprehensive, regularly updated list. In practice you're matching against roughly 60 patterns covering OpenAI (ChatGPT-User, OAI-SearchBot), Anthropic (Claude-SearchBot, Claude-User), Perplexity (PerplexityBot), Google (Gemini-Deep-Research), and others.

Also respond to Accept: text/markdown, so any client can request the markdown version explicitly. When a match is found, rewrite the request to an API route that reads the Markdoc source file and returns it with text/markdown as the content type:

export function middleware(request: NextRequest) {
  const userAgent = request.headers.get("user-agent") || "";
  if (!isAICrawler(userAgent) && !wantsMarkdown(request)) {
    return NextResponse.next();
  }

  const slug = pathToSlug(request.nextUrl.pathname);
  const url = request.nextUrl.clone();
  url.pathname = "/api/markdoc-source";
  url.searchParams.set("slug", slug);
  return NextResponse.rewrite(url);
}

The examples here use Next.js middleware, but the pattern works in any framework that lets you intercept requests before they reach your route handlers. Astro, Nuxt, SvelteKit, and Remix all have equivalent middleware or server hook layers. The logic is the same: check the user-agent or Accept header, and rewrite to a route that serves markdown.
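For completeness, here's one plausible shape for the helpers the middleware calls. The pattern list is a small excerpt (generate the real one from the ai-robots-txt repo), the structural request type stands in for NextRequest to keep the sketch self-contained, and mapping "/" to a "home" slug is my assumption:

```typescript
// Illustrative helpers for the middleware above; not the post's exact code.
const AI_BOT_PATTERNS = [
  "ChatGPT-User",
  "OAI-SearchBot",
  "Claude-SearchBot",
  "Claude-User",
  "PerplexityBot",
  "Gemini-Deep-Research",
];

function isAICrawler(userAgent: string): boolean {
  // Case-insensitive substring match against known AI bot tokens.
  const ua = userAgent.toLowerCase();
  return AI_BOT_PATTERNS.some((p) => ua.includes(p.toLowerCase()));
}

function wantsMarkdown(request: {
  headers: { get(name: string): string | null };
}): boolean {
  // Content negotiation: the client explicitly asked for markdown.
  const accept = request.headers.get("accept") || "";
  return accept.includes("text/markdown");
}

function pathToSlug(pathname: string): string {
  // "/" → "home" (assumed convention); "/compare/railway" → "compare/railway"
  const trimmed = pathname.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "home" : trimmed;
}
```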

Cover all content paths in the matcher and exclude only API routes, framework internals, and static files. New content pages get covered automatically without editing the middleware.

Supporting the .md extension

There's a third trigger worth adding: the .md file extension. If someone appends .md to any URL on your site, serve the markdown with full metadata directly. This gives both people and AI agents an explicit, unambiguous way to request markdown without relying on user-agent detection or content negotiation. Add a check at the top of the middleware, before the AI agent logic. Strip the .md suffix, resolve the slug, and rewrite to the same markdown API route.
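The suffix check itself is a few lines. A sketch, reusing the "/" → "home" slug convention assumed earlier:

```typescript
// Sketch: detect a trailing ".md" and resolve the slug to serve as markdown.
// Returns null when the request isn't a .md request.
function markdownSlugFromPath(pathname: string): string | null {
  if (!pathname.endsWith(".md")) return null;
  // "/canadian-hosting.md" → "/canadian-hosting"
  const bare = pathname.slice(0, -".md".length);
  const trimmed = bare.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "home" : trimmed;
}
```

In the middleware, run this first: when it returns a slug, rewrite straight to the markdown API route and skip the user-agent and Accept checks entirely.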

Try it on any page on this site: append .md to the URL (e.g., /blog/answer-engine-optimization-guide.md) and you'll see the markdown with full business metadata in the frontmatter. Every page has a "This page as markdown" link in the footer that links to its .md version.

Handling duplicate content

The .md URLs create duplicate content from Google's perspective. The fix is an HTTP Link header with rel="canonical" on the markdown response, pointing back to the HTML version. This is the standard way to declare a canonical URL when the response isn't HTML (where you'd use a <link> tag instead). Google explicitly supports it.

return new Response(markdown, {
  headers: {
    "Content-Type": "text/markdown; charset=utf-8",
    Link: `<https://example.com${canonicalPath}>; rel="canonical"`,
  },
});

This tells search engines that /canadian-hosting is the authoritative version of /canadian-hosting.md. Make sure your HTML pages also have the standard <link rel="canonical"> tag pointing to themselves, so both versions agree on which URL is canonical.

Verifying it works

Once the middleware is deployed, you'll want to confirm AI agents are actually getting markdown. The simplest test is to curl your own site with a spoofed user-agent, the Accept: text/markdown header, or the .md extension:

curl -H "User-Agent: ChatGPT-User" https://yoursite.com/some-page
curl -H "Accept: text/markdown" https://yoursite.com/some-page
curl https://yoursite.com/some-page.md

All three should return the same markdown response. Check that the Content-Type header is text/markdown and that the Link canonical header is present.

Beyond spot-checking, you'll want ongoing visibility. The AEO tooling landscape is still young and changing fast, so the most reliable approach is monitoring your own server logs: filter for AI user-agent strings and track how often they're hitting your pages, which content gets the most attention, and how that volume trends over time. If you're behind a CDN like Cloudflare, their analytics dashboards will surface bot traffic as a distinct category, making it easier to spot which pages get visited most and whether new bots are showing up that your list doesn't cover.
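If you'd rather script the log analysis than eyeball a dashboard, a small tally works. This sketch assumes combined-log-style lines with the user agent in the final quoted field; the bot list is an excerpt:

```typescript
// Sketch: count AI-bot hits per (bot, path) from access-log lines.
const BOT_NAMES = ["ChatGPT-User", "OAI-SearchBot", "Claude-User", "PerplexityBot"];

function tallyBotHits(logLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    const bot = BOT_NAMES.find((b) => line.includes(b));
    if (!bot) continue;
    // Path is the second token of the quoted request: "GET /pricing HTTP/1.1".
    const match = line.match(/"(?:GET|HEAD) (\S+) HTTP/);
    if (!match) continue;
    const key = `${bot} ${match[1]}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Run it over a day of logs and sort by count to see which pages AI agents actually fetch.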

Step 4: enrich the markdown with metadata

Serving raw markdown is a good start, but AI models lose context when they only see one page in isolation. On a normal HTML page, models can pull identity and context from meta tags, OpenGraph properties, and JSON-LD structured data. In a markdown response, all of that is gone unless you put it back explicitly.

Solve this by injecting business metadata into the YAML frontmatter before serving the response. When ChatGPT fetches /canadian-hosting, it should get back the original frontmatter plus the same kind of context that HTML meta tags and structured data would normally provide:

---
# ... original title, description, etc.
type: article
author: Ross Hill
locale: en_CA
site_name: MapleDeploy
slogan: Powerful hosting on Canadian soil
founding_date: 2026-01-13
email: hello@mapledeploy.ca
geo_region: CA-ON
geo_placename: Toronto
address_country: CA
area_served: Canada
offers: Starter $45/mo, Pro $95/mo, Ultra $195/mo CAD
canonical_url: https://mapledeploy.ca/canadian-hosting
llms_txt: https://mapledeploy.ca/llms.txt
---

This metadata serves a specific purpose: it grounds the model's response in verifiable facts about your business. Without it, a model answering "what does MapleDeploy cost?" has to infer pricing from whatever it finds in the page body, if it finds it at all. With global metadata injected into the frontmatter, that information is structured, unambiguous, and immediately available. The llms_txt field is especially useful because it gives the model a pointer to your full site index, enabling it to fetch additional context if the current page doesn't answer the user's question completely.

Keep these fields only in the AI-served version, not in the raw .md files in the repo. If you've built anything with Next.js layouts or template inheritance, the mental model is the same: global metadata defines site-wide defaults, page-specific frontmatter overrides them. The new part is pulling from all the places this information normally lives in HTML (meta tags, OpenGraph properties, JSON-LD structured data) and merging them into a single YAML frontmatter block.
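The merge itself can be as simple as object spread with the page winning on conflict. A sketch, using the field names from the example above; the serializer here only handles flat string values, which is all this metadata needs:

```typescript
// Sketch: layer page frontmatter over site-wide defaults, page wins on conflict,
// then re-serialize as a YAML block. Flat string values only (illustrative).
const GLOBAL_METADATA: Record<string, string> = {
  site_name: "MapleDeploy",
  locale: "en_CA",
  geo_region: "CA-ON",
  llms_txt: "https://mapledeploy.ca/llms.txt",
};

function enrichFrontmatter(
  pageFrontmatter: Record<string, string>,
  canonicalUrl: string
): string {
  // Page-specific keys take precedence over global defaults.
  const merged = { ...GLOBAL_METADATA, ...pageFrontmatter, canonical_url: canonicalUrl };
  const yaml = Object.entries(merged)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
  return `---\n${yaml}\n---`;
}
```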

On the HTML side, add JSON-LD structured data to your pages. This helps both traditional search engines and AI systems understand your content, and the same types pull double duty for AEO. Choose the types that match what's on each page: Organization on your homepage, FAQPage on pages with FAQ sections, Article and BreadcrumbList on blog posts, product-specific types like SoftwareApplication or WebApplication on pricing pages. The full list of schema.org types is large, but most sites only need a handful.
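As a concrete example, here's a minimal Organization object for the homepage, built from the business facts in the frontmatter example above; render the string into a <script type="application/ld+json"> tag in the page head:

```typescript
// Sketch: minimal schema.org Organization JSON-LD for the homepage.
function organizationJsonLd(): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "Organization",
    name: "MapleDeploy",
    url: "https://mapledeploy.ca",
    email: "hello@mapledeploy.ca",
    slogan: "Powerful hosting on Canadian soil",
    foundingDate: "2026-01-13",
    areaServed: "Canada",
  });
}
```

The other types (FAQPage, Article, BreadcrumbList) follow the same shape with different properties; validate them with Google's Rich Results Test.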

Step 5: set permissions

Steps 1 through 4 make your content easy for AI systems to find and understand. This step tells them what they're allowed to do with it.

This is worth getting right because the permissions landscape for AI is fragmented and evolving. Different systems respect different mechanisms, and the legal and ethical norms around AI training data are still being established. Being explicit about your permissions, even if not every system reads them yet, sets a clear intent that benefits you regardless of how the ecosystem matures.

robots.txt is the oldest mechanism. If you want every crawler to have access, a simple User-agent: * / Allow: / is sufficient; a wildcard already grants access to everything, so there's no need to list AI bots individually. The only reason to list specific user agents is to block them with Disallow. If you do want to block certain crawlers (for example, allowing search retrieval but disallowing training crawlers), robots.txt is the place to do it, though it requires knowing the specific user-agent strings for each crawler category.

If you want more granular control (for example, allowing citation but blocking training), the Content-Signal IETF draft header lets you express that per-response:

Content-Signal: intent="allow", use="search,assist", block="train"

This tells AI systems they can use your content for search results and assistant responses, but not for model training. You can set this header globally in your middleware or CDN config. The spec is still a draft and adoption is early, so treat it as a forward-looking signal rather than an enforcement mechanism. If you're permissive across the board, a wildcard robots.txt is all you need.
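If you build the header value in code rather than hardcoding it, a small helper keeps the policy in one place. This sketch follows the header shape shown above; since the draft is still evolving, treat the exact grammar as illustrative:

```typescript
// Sketch: build a Content-Signal header value from a simple policy object.
// The header grammar mirrors the example above and may change as the draft evolves.
interface ContentPolicy {
  allowedUses: string[]; // e.g. ["search", "assist"]
  blockedUses: string[]; // e.g. ["train"]
}

function contentSignalHeader(policy: ContentPolicy): string {
  return [
    'intent="allow"',
    `use="${policy.allowedUses.join(",")}"`,
    `block="${policy.blockedUses.join(",")}"`,
  ].join(", ");
}
```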

Caveats

User-agent sniffing can backfire. An AI agent's fetch tool may expect HTML and instead get raw Markdoc with custom tag syntax. Most LLMs handle this fine, but a data ingestion pipeline expecting HTML might choke. Similarly, a client requesting Accept: text/markdown probably expects CommonMark, not Markdoc's custom tags. If this becomes a problem, strip the custom tags and serve only the prose content.

The bot list is a moving target. New AI user agents appear regularly. Check the ai-robots-txt repo periodically and update your list.

Budget time for metadata plumbing. Merging global and page-level frontmatter is the most tedious part. Flattening multiple metadata sources into a single coherent frontmatter block, with correct precedence rules, takes more effort than you'd expect.

Conclusion

That's the full pipeline: markdown source, smart routing, layered metadata, and clear permissions. Every step builds on the one before it, and the end result is a site that's as easy for AI systems to read as it is for browsers to render.

Is all of this necessary? The line between "your website" and "a document an AI can read" is blurring, but whether you need to serve markdown to bots today is genuinely unclear. Having all your content in markdown with clean metadata is a good foundation regardless of what happens with AI search. And if AI-mediated discovery does become the norm, you'll already be set up for it.


MapleDeploy is a managed Coolify hosting platform on Canadian infrastructure, with plans starting at $45 CAD/month and a 14-day free trial. If you're building something that needs Canadian data residency, take a look.
