Primer

Agentic commerce, in plain language

Lantern Team · 12 min read

Six months ago, the buying question still ended at a search bar. Today, increasingly, it ends at a sentence. A shopper types "best subscription coffee 2026" into ChatGPT, Claude, Perplexity, Gemini, or Amazon Rufus — and a model picks the shortlist for them, names brands, gives reasons. The shelf you have been optimising for is being assembled before the click ever lands on your site.

This is agentic commerce. It is not a future tense.

What an AI shopping agent actually is

A shopping agent is a model — and increasingly, a model plus a tool layer — that handles the deciding step of a purchase the way Google once handled the finding step. Five consumer-facing surfaces dominate the marketing pitch today: ChatGPT, Claude, Perplexity, Gemini, and Amazon Rufus. Each takes a buyer-intent query in natural language, reads the open web (plus, in some cases, structured product feeds and partner integrations), and produces an answer that names a handful of brands — almost always three to five — with reasons attached.

The scale is real. Amazon's AI shopping layer, which powers Rufus, was credited with roughly $12B in incremental annual sales and 3× the conversion rate of non-AI sessions (Tinuiti). On the open web, figures from Criteo and AirOps make the channel signal starker: about 37% of consumers now begin digital purchase research with an AI tool, and ChatGPT-referred sessions convert at roughly twice the rate of organic visits.

Two things matter about the shape of that answer. First, it is curated, not ranked: a model picks a handful of brands rather than presenting ten in order. Second, the citations underneath it are categorical: a piece of evidence is Own (your domain), Earned (a third-party reviewer or trade title), Competitor (one of the brands beating you), or Social (Reddit, YouTube, a community thread). Lantern's Citations surface tags every cited URL in exactly those four buckets, because they correspond to four very different kinds of effort to influence.
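
As a rough illustration of the four-bucket tagging, here is a minimal sketch in Python. The domain lists, the function name `classify_citation`, and the rule that any unlisted third-party domain counts as Earned are all assumptions made for this example, not a description of how Lantern actually tags URLs.

```python
from urllib.parse import urlparse

# Hypothetical domain lists for illustration; a real setup would maintain its own.
OWN_DOMAINS = {"wildgrainroasters.example"}
COMPETITOR_DOMAINS = {"rivalroast.example"}
SOCIAL_DOMAINS = {"reddit.com", "youtube.com"}


def _matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_citation(url: str) -> str:
    """Return Own, Competitor, Social, or Earned for a cited URL."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, OWN_DOMAINS):
        return "Own"
    if _matches(host, COMPETITOR_DOMAINS):
        return "Competitor"
    if _matches(host, SOCIAL_DOMAINS):
        return "Social"
    # Assumption: any other third-party source counts as Earned.
    return "Earned"
```

The order of the checks matters only if a domain appears in more than one list; keeping the lists disjoint avoids that question.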

Why this isn't just SEO 2.0

The instinct is to call this "AI SEO" and reach for the existing playbook. That instinct misleads in three concrete ways.

The output is unstable in a way the SERP was not. Reddit's share of ChatGPT citations swung from roughly 60% to 10% over two weeks in September 2025. When Reddit sued Perplexity in October 2025, Perplexity's Reddit cite share dropped 86% almost overnight, with YouTube filling the gap (Tinuiti Q1 2026 Citations Report). A SERP that moves five places is a story; an AI answer landscape that swaps a third of its citation sources in a fortnight is normal.

Cross-provider overlap is much weaker than you would expect. Only about 11% of the domains cited by ChatGPT also show up in Perplexity, and only 13.7% overlap between Google's AI Overviews and AI Mode (DailyGeoInsights, 2026). The evidence Wildgrain Roasters needs to win on ChatGPT is meaningfully different from the evidence it needs to win on Gemini.
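
To make the overlap figure concrete, the sketch below computes the share of one provider's cited domains that a second provider also cites. The function name and the sample domains are made up for illustration, and the cited reports may define overlap differently.

```python
def overlap_share(cited_by_a: set[str], cited_by_b: set[str]) -> float:
    """Share of A's cited domains that B also cites (0.0 to 1.0)."""
    if not cited_by_a:
        return 0.0
    return len(cited_by_a & cited_by_b) / len(cited_by_a)


# Made-up domains for illustration only.
chatgpt_domains = {"a.example", "b.example", "c.example", "d.example"}
perplexity_domains = {"d.example", "e.example"}
print(f"{overlap_share(chatgpt_domains, perplexity_domains):.0%}")  # prints 25%
```

Note that this measure is asymmetric: the share of A's domains found in B is generally not the share of B's domains found in A.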

Position in the SERP is replaced by position in a paragraph. BrightEdge found that 44.2% of all LLM citations come from the first 30% of a page's text. The first 200 words of /collections/whole-bean now do the work that the title tag did in 2018. The Content agent's edits start there for a reason.
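
One way to act on the first-30% finding is to check whether a key claim actually sits in the opening share of a page. The sketch below measures that share by word count, which is an assumption; the source does not say how BrightEdge measured it. The function name `in_leading_share` is hypothetical.

```python
def in_leading_share(page_text: str, phrase: str, share: float = 0.30) -> bool:
    """True if phrase appears within the first `share` of the page's words."""
    words = page_text.split()
    cutoff = max(1, int(len(words) * share))
    lead = " ".join(words[:cutoff]).lower()
    # Normalise whitespace in the phrase so it matches the re-joined lead text.
    return " ".join(phrase.split()).lower() in lead
```

The match is a plain case-insensitive substring test, so it will miss paraphrases of the claim.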

Together, those three facts argue against the SEO mental model and toward a monitoring mental model — one that measures the answer landscape every day, classifies what changed by family, and never assumes last week's snapshot is still true.

The four stable signals

Despite the noisy citation surface, the user-visible outputs of an AI buying answer reduce to four numbers that are surprisingly stable: one composite and the three drivers that feed it. The composite is Brand AI Health, scored 0 to 100 and built from three structural dimensions and three measurable drivers.

The three dimensions are Website Signals (what your owned pages tell a model), Brand Reputation (what the rest of the internet says about you), and Content Strategy (whether your prose answers the buyer questions models route on). Each dimension fails differently and is fixed by a different team — which is why the composite is more useful than any single underlying score.

The three drivers underneath the dimensions are the metrics most operators end up watching directly.

  • Visibility — your percentage share of AI answers for the prompts you track. Wildgrain Roasters tracking 45 prompts and showing up named in 23 of them sits at 51% Visibility. This is the closest analog to share of voice.
  • Favorability — a 0–100 score derived from per-provider sentiment, measured on a −1.0 to +1.0 scale with a sample count n= attached to every per-provider read. Favorability also pulls from the descriptor cloud, the actual words a model uses about your brand, each tagged positive, neutral, watch, or negative.
  • Citation Score — a per-URL 0–100 score that tells you which of your pages are actually pulling weight. The Citation Score methodology piece in this collection unpacks it in full; the short version is provider count × query count × position in answer × recency, classified by the four citation types above.
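
The arithmetic behind the first two drivers can be sketched as follows. The Visibility calculation follows the worked example above. The Favorability mapping (a sample-weighted mean of per-provider sentiment, rescaled linearly to 0–100) is an assumption for illustration, and it leaves out the descriptor cloud entirely. Function names are hypothetical.

```python
def visibility(named_in: int, tracked: int) -> float:
    """Percentage of tracked prompts whose answer names the brand."""
    if tracked <= 0:
        raise ValueError("tracked must be positive")
    return 100.0 * named_in / tracked


def favorability_from_sentiment(reads: dict[str, tuple[float, int]]) -> float:
    """Map per-provider (sentiment, n) reads to a 0-100 score.

    Assumption: a sample-weighted mean of sentiment, rescaled linearly
    from [-1.0, +1.0] to [0, 100]. The descriptor cloud is ignored here.
    """
    total_n = sum(n for _, n in reads.values())
    if total_n == 0:
        raise ValueError("no samples")
    mean = sum(s * n for s, n in reads.values()) / total_n
    return (mean + 1.0) * 50.0


print(round(visibility(23, 45)))  # prints 51
```

Weighting by n keeps a provider with a handful of samples from swinging the score as much as one with hundreds.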

All three drivers feed Brand AI Health. Brand AI Health is the one number to put on the operator's wall.

What the four agents actually do

Inside Lantern, the work gets done by four named agents — each owning one slice of the loop.

Query strengthens the prompts AI users actually ask. It pulls from the live prompt feed (which buyer queries are routing demand into the category), proposes new prompts you should be tracking, and prunes prompts where coverage has fallen permanently. Query is upstream of everything else: if you are tracking the wrong prompts, the rest of the system optimises for the wrong shelf.

Catalog keeps product data clean, complete, and citable. Product schema, feed attributes, FAQ schema, variant rollups. When a Catalog recommendation lands as an Engine bundle it ships now — the schema patch goes in, the FAQ block goes up, Lantern measures the move on a 7-day window.
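
For readers who have not shipped FAQ schema before, the sketch below builds a schema.org FAQPage block as JSON-LD. The `FAQPage`, `Question`, and `Answer` types and the `mainEntity` and `acceptedAnswer` properties are standard schema.org vocabulary. The helper name `faq_jsonld` is hypothetical, and the output is not a reproduction of any patch Lantern generates.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialise question/answer pairs as schema.org FAQPage JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2)
```

The resulting string would typically be embedded in the page inside a `<script type="application/ld+json">` element.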

Content tunes on-page content for AI scannability. Hero-copy edits, claim-proof updates, the first 200 words of cluster pages, comparison blocks on /collections/ pages. Content is the agent the Visibility driver responds to first.

Offsite builds external trust signals — Wikipedia presence, YouTube channel, G2 and Trustpilot reviews, LinkedIn organisation profile, the press and partner placements that make the model's external evidence file fatter. Offsite work is slower (the External Signals 30-day plan in this collection has the week-by-week) but it is where Brand Reputation and Favorability ultimately move.

How a finding becomes a shipped change

A Lantern recommendation lives in one of four buckets. The taxonomy matters because the same finding can route to very different work.

Engine bundles are the recommendations you ship now: Product schema, FAQ schema, hero-copy edits, claim-proof updates. Each Engine bundle carries a predicted lift in points (e.g. "+4 Visibility" for a Catalog bundle) and a Trust Phase autonomy setting. Trust Phase is the per-brand, per-family lever that decides whether the agent applies the change directly or sends it for approval. Set FAQ schema to auto-apply if you trust the agent's output for that family; set hero copy to send-for-review if you don't.
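
The per-brand, per-family lever can be pictured as a simple lookup. Everything in this sketch is hypothetical: the enum, the keys, and the choice to default unknown pairs to review are illustrative assumptions, not Lantern's configuration format.

```python
from enum import Enum


class TrustPhase(Enum):
    AUTO_APPLY = "auto-apply"
    SEND_FOR_REVIEW = "send-for-review"


# Hypothetical settings keyed by (brand, recommendation family).
TRUST_PHASE = {
    ("wildgrain-roasters", "faq_schema"): TrustPhase.AUTO_APPLY,
    ("wildgrain-roasters", "hero_copy"): TrustPhase.SEND_FOR_REVIEW,
}


def route(brand: str, family: str) -> TrustPhase:
    """Look up the autonomy setting; unknown pairs default to review."""
    return TRUST_PHASE.get((brand, family), TrustPhase.SEND_FOR_REVIEW)
```

Defaulting to review is the conservative choice: a family nobody has configured never ships without a human seeing it.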

Scanner bundles are site-infrastructure recommendations: llms.txt, robots.txt, sitemap fixes. Think of llms.txt as the AI-era equivalent of the sitemap and robots.txt pair — by April 2026, ChatGPT, Perplexity, Claude, and Gemini all actively read it (Google declined; see the methodology piece for the operational consequences).
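
A basic version of the robots.txt side of this check can be done with Python's standard library. The sketch below reports which AI crawler tokens a given robots.txt blocks from a URL. The token list is illustrative and may be incomplete or out of date, so each provider's documentation is the place to confirm it. The sketch does not inspect llms.txt at all, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens; confirm the current list with each provider.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the AI user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, url)]


# Sample file that blocks one crawler and allows everything else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_agents(sample, "https://example.com/collections/whole-bean"))
```

Parsing the file from a string keeps the check offline; fetching the live file is a separate step.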

Long-term recs are operator-led work that compounds over months: origin-stories hubs, comparison-content libraries, Sprudge-class press placements. Each Long-term rec carries a confidence tier — single-event, low (3–7 supporting citations), medium (8–24), or high (25+). Lantern surfaces signals at medium and above by default; single-event findings are visible but never auto-actioned.
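
The tier thresholds translate directly into a small function. One assumption is flagged in the code: the text gives no range for counts below 3, so this sketch treats all of them as single-event. The function names are hypothetical.

```python
def confidence_tier(supporting_citations: int) -> str:
    """Map a supporting-citation count to a confidence tier."""
    if supporting_citations >= 25:
        return "high"
    if supporting_citations >= 8:
        return "medium"
    if supporting_citations >= 3:
        return "low"
    # Assumption: the text gives no range below 3, so treat it as single-event.
    return "single-event"


def surfaced_by_default(tier: str) -> bool:
    """Medium and high tiers surface by default."""
    return tier in {"medium", "high"}
```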

Applied is the audit log. Every shipped recommendation lands here, paired with its predicted versus measured lift on a 7-day window. If a Catalog Engine bundle promised +4 Visibility and delivered +2.7, that gap is the input to the next confidence calibration.
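
The predicted-versus-measured comparison is simple arithmetic, sketched here with the numbers from the example above. The `AppliedRec` record and its field names are hypothetical, not Lantern's data model.

```python
from dataclasses import dataclass


@dataclass
class AppliedRec:
    name: str
    predicted_lift: float  # points
    measured_lift: float   # points, over the 7-day window

    @property
    def gap(self) -> float:
        """Predicted minus measured; positive means the prediction overshot."""
        return self.predicted_lift - self.measured_lift


rec = AppliedRec("Catalog Engine bundle", predicted_lift=4.0, measured_lift=2.7)
print(f"{rec.gap:.1f}")  # prints 1.3
```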

Above all four sits the Impact Runway — the projected lift across every open recommendation right now. A typical board-ready Runway reads "+14.6 pts across 8 recs"; it is the closest thing the channel has to a sales pipeline.
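
A Runway headline of that shape can be produced by adding up the predicted lift of every open recommendation. Treating the Runway as a plain sum is an assumption, and the individual lifts below are invented so that they total the example figure.

```python
def impact_runway(predicted_lifts: list[float]) -> str:
    """Summarise open recommendations as a Runway headline."""
    total = sum(predicted_lifts)
    return f"{total:+.1f} pts across {len(predicted_lifts)} recs"


# Made-up predicted lifts that happen to total the example headline.
open_recs = [4.0, 3.1, 2.0, 1.5, 1.5, 1.0, 1.0, 0.5]
print(impact_runway(open_recs))  # prints +14.6 pts across 8 recs
```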

Who needs to care

For founders and operators, the agentic shelf is the dashboard question — Brand AI Health on the wall, Industry Changes in the weekly review. The point is not to add another tool; it is to know which buying questions route demand into your category, and which of those route it past you.

For growth and acquisition leads, the AI channel is now a measurable acquisition surface. AI-engaged visitors convert at roughly 12.3% versus 3.1% for self-serve browsing on the same site (Immerss 2026 ecommerce benchmarks). That delta is the largest available conversion opportunity in most DTC stacks today.

For content and SEO teams, the work changes shape, not direction. Comparison content, evidence pages, FAQ schema, the descriptor cloud: the Content agent's queue is the new editorial calendar. The "Writing comparison content that wins AI citations" guide in this collection has the format details.

For ecommerce and merchandising teams, the Catalog agent is where the work lives: Product schema, feed attributes, PDP hero copy. The "Make product pages legible to AI shopping agents" guide walks through every recommendation type.

What changes versus SEO and SEM

Versus SEO: the output is a curated shortlist, not a ranked list. Position in the answer matters more than position in the SERP. The 44.2% / first-30% rule applies. Citations are categorical (Own / Earned / Competitor / Social), not a gradient. The work compounds along the same axes (authority, structure, freshness), but the measurement moves to a Brand AI Health composite, not blue-link rankings.

Versus SEM: there is no auction. Visibility is earned through evidence, not paid for. The closest paid-channel analog — sponsored AI placements — exists in beta on a couple of providers but is not yet a primary lever. Until that changes, the operating logic is earned media at scale: build the evidence base, ship the schema, claim the off-site footprint.

The short glossary

  • Brand AI Health: the composite.
  • Visibility, Favorability, Citation Score: the three drivers.
  • Query, Catalog, Content, Offsite: the four agents.
  • Engine, Scanner, Long-term, Applied: the four queues recommendations route into.
  • Confidence tiers: single-event, low, medium, high.
  • Trust Phase: the per-brand, per-family autonomy lever.
  • Impact Runway: the projected lift across open recs.
  • Helping and hurting families: Brand Authority, External Consensus, Structured Data, Query Relevance, Freshness, Entity Clarity, Crawl Accessibility, Content Quality. These are the eight evidence categories every model weights, in different proportions, on every answer.

That is the vocabulary. The rest of this collection is what to do with it.