
The Next Layer of the Web: How FetchRight Connects Publishers and AI for Smarter, Cheaper Knowledge

Part 2 of 2 -- How publishers and LLMs share structured context to save compute and preserve authority

Gary Newcomb

CTO & Co-Founder, FetchRight

AI Infrastructure · Retrieval · Peek-Then-Pay · Publishing

The Problem Beneath the Problem

This article builds on Part 1: The Cost of Context, where I explained why modern LLMs burn enormous compute reconstructing context they've already seen.

If you haven't read it yet, that piece provides the economic foundation for what follows.

Large language models don't browse the web the way humans do.

When you ask an AI assistant a question, it doesn't "go online". It interprets your prompt, retrieves text from cached indexes or APIs, and burns compute tokens to rebuild context each time. Every token it reprocesses costs money and time.

Meanwhile, publishers (the specialists of the web) already have structured, vetted content. But the AI systems that depend on it rarely contact them directly. Instead, they route through general-purpose search engines, re-embedding or scraping pages, losing brand attribution and adding massive redundant compute.

That's the disconnect FetchRight and the open Peek-Then-Pay standard are designed to fix.

How Retrieval Really Works (Today)

Here's the mechanical chain inside an LLM "web search":

  • The model writes a search query ("best 4K monitor 2025")
  • A helper agent calls a commercial search API (Google, Bing)
  • It receives a ranked list of titles, snippets, and URLs (sometimes including ads)
  • The agent fetches a few pages, strips HTML, chops them into chunks, embeds them, and ranks them again by vector similarity
  • The highest-scoring chunks, just a few kilobytes of text, are injected into the model's prompt for reasoning
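
To make the redundant work concrete, here is a minimal Python sketch of that chain. The search endpoint is hypothetical, and a toy bag-of-words vector stands in for a real embedding model; what matters is the shape of the pipeline, not the details.

import math
import re
import requests

def strip_html(html):
    # Crude tag-stripping pass; real pipelines use a proper HTML parser.
    return re.sub(r"<[^>]+>", " ", html)

def split_into_chunks(text, size=512):
    words = text.split()
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])

def embed(text):
    # Toy bag-of-words vector standing in for a real embedding model.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def llm_web_search(query, top_pages=3, top_chunks=5):
    # 1-2. The helper agent calls a commercial search API (hypothetical endpoint).
    hits = requests.get("https://search.example/v1/search",
                        params={"q": query}).json()["results"]
    # 3-4. Fetch a few pages, strip HTML, chunk, embed, and re-rank by similarity.
    q_vec = embed(query)
    scored = []
    for hit in hits[:top_pages]:
        text = strip_html(requests.get(hit["url"]).text)
        for chunk in split_into_chunks(text):
            scored.append((cosine(q_vec, embed(chunk)), chunk, hit["url"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    # 5. Only the top few kilobytes survive into the model's prompt.
    return [(chunk, url) for _, chunk, url in scored[:top_chunks]]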

Every one of those steps costs CPU/GPU time and discards most of the data fetched. And every LLM provider on Earth repeats this work independently.

The Better Way: Ask the Specialists Directly

Search engines are still the best tool for one crucial job:
identifying who the experts are.

LLMs should continue using Google or Bing for discovery:
"Who are the authoritative sources for this topic?"

But once those experts are known, agents shouldn't need to scrape them, re-embed them, or repeatedly process full HTML pages on every query.

A better pattern emerges:

Search engines identify the specialists.
Publishers answer agentic questions directly.

Under Peek-Then-Pay, participating sites expose two lightweight capabilities:

1. Publisher Search (Cross-Resource Discovery)

GET /.well-known/peek/search?q=best+4K+monitor+2025

Returns a ranked list of canonical URLs, along with content/media types and scoring metadata (keyword, vector, or hybrid).

This helps the agent understand which specific pages are authoritative for the query — without relying solely on third-party search snippets.
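
As a rough sketch, the client side might look like this in Python. The response fields are assumptions based on the description above, not a published schema:

import requests

def publisher_search(site, query):
    # Ask the publisher's own Peek-Then-Pay search endpoint.
    resp = requests.get(f"https://{site}/.well-known/peek/search",
                        params={"q": query})
    resp.raise_for_status()
    # Assumed shape: ranked entries carrying a canonical URL,
    # a media type, and scoring metadata (keyword, vector, or hybrid).
    return resp.json()["results"]

for hit in publisher_search("monitors.example", "best 4K monitor 2025"):
    print(hit["url"], hit["media_type"], hit["score"])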

2. Chunk Retrieval (Per-Resource Evidence Extraction)

After search identifies relevant URLs, the agent selects one and requests semantically relevant evidence from that specific page:

GET /products/best-4k-monitors?intent=chunk
    &embedding=[...]
    &top_k=5
    &license=...

The enforcer (the publisher-side edge layer described below) then:

  • Normalizes the page content
  • Uses the publisher's own embeddings/model to identify the most relevant spans
  • Returns short, anchored text chunks with provenance
  • Optionally uses cached or precomputed chunk indexes for speed

The result:
LLMs receive only the passages that matter, without scraping, without full-page re-embedding, and without losing the publisher's attribution or voice.
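
On the agent side, a chunk request might look like the sketch below. How the embedding is serialized on the wire isn't pinned down here, so the comma-joined floats and the response shape are assumptions:

import requests

def fetch_chunks(url, query_embedding, license_token, top_k=5):
    # Request semantically relevant spans from one specific page.
    resp = requests.get(url, params={
        "intent": "chunk",
        "embedding": ",".join(f"{x:.6f}" for x in query_embedding),  # serialization assumed
        "top_k": top_k,
        "license": license_token,
    })
    resp.raise_for_status()
    # Each chunk is assumed to carry its text, an anchor into the page,
    # and provenance metadata naming the publisher and license.
    return resp.json()["chunks"]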

How the model uses it

Once this pattern is in place, an AI agent can:

  • Generate a query embedding once
  • Use Publisher Search to locate relevant URLs
  • Request chunk retrieval for each URL of interest
  • Use the returned spans as the grounded evidence

No scraping.
No redundant re-embedding.
No lost attribution.
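
Put together, and reusing the two sketches above, the whole flow is only a few lines. embed_query is a hypothetical helper for the single real embedding call:

query = "best 4K monitor 2025"
q_vec = embed_query(query)  # hypothetical: one embedding call, made once

evidence = []
for hit in publisher_search("monitors.example", query):
    for chunk in fetch_chunks(hit["url"], q_vec, license_token="..."):
        evidence.append({
            "text": chunk["text"],          # the grounded span itself
            "source": hit["url"],           # attribution preserved
            "anchor": chunk.get("anchor"),  # position within the page
        })

# The spans, not the pages, become the model's context.
prompt_context = "\n\n".join(e["text"] for e in evidence)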

Where FetchRight Fits

Peek-Then-Pay defines how those endpoints behave.
FetchRight operationalizes them.

FetchRight sits between AI crawlers and publishers, providing:

  • License management – time-limited, intent-specific tokens
  • Audit & attribution – signed requests and response metadata so both sides can trace usage
  • Edge enforcement – Cloudflare Worker layer with caching, budgets, and bot management
  • Transformation services – publisher-controlled summarization, embedding, and analysis endpoints

To the AI agent, it looks like a single, clean API for structured context. To the publisher, it's a protective gateway that maintains brand authority and monetizes access.
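
As a purely hypothetical sketch (the licensing API's actual paths and fields aren't documented here), acquiring one of those intent-specific tokens might look like:

import requests

def request_license(agent_id, site, intent="chunk"):
    # Hypothetical endpoint and fields; shown only to make the flow concrete.
    resp = requests.post("https://api.fetchright.ai/v1/licenses",
                         json={"agent": agent_id, "site": site, "intent": intent})
    resp.raise_for_status()
    # The returned token is time-limited and intent-specific, and travels
    # with each peek/chunk request as the license=... parameter shown earlier.
    return resp.json()["token"]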

Developer Note: Full MCP Support

Both the FetchRight Licensing API (api.fetchright.ai) and the Cloudflare enforcer support the Model Context Protocol (MCP).

This allows agents to discover publishers, request licenses, and invoke search or chunk retrieval directly as MCP tool calls, with no custom client code. It makes publishers first-class participants in agentic ecosystems.
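
Because MCP is JSON-RPC 2.0 under the hood, a search invocation reduces to a single tools/call message. The tool name and arguments below are illustrative; an agent would discover the real ones from the server's tool listing:

import json

# "tools/call" is MCP's standard tool-invocation method.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "publisher_search",            # illustrative tool name
        "arguments": {"q": "best 4K monitor 2025"},
    },
}
print(json.dumps(call, indent=2))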

The Benefits, Quantified

For Publishers:

  • Preserve brand voice — AI answers use snippets drawn directly from the source
  • Maintain attribution — Each returned chunk carries publisher metadata and license ID
  • Generate new revenue — License API enables pricing for deep or high-volume access
  • Reduce server load — Controlled peek/chunk requests replace uncontrolled bot scraping
  • Demonstrate expertise — Being the "specialist endpoint" reinforces E-E-A-T and brand trust

For LLM Operators:

  • Cut compute costs — Pre-embedded, pre-filtered spans eliminate 90% of redundant token processing
  • Improve accuracy — Fewer irrelevant snippets, higher semantic relevance per token
  • Lower latency — Smaller context windows → faster inference
  • Simplify compliance — Responses arrive with explicit, machine-readable rights
  • Boost user trust — Citations point to authoritative sources, not random blogs

Both sides save money. Both sides gain clarity and provenance.
And the web itself becomes semantically structured instead of being endlessly re-scraped.

A Shared Future of Context

The future web isn't about who owns data; it's about who provides the best structured access to it. FetchRight turns publishers into first-class participants in the AI economy, and gives LLMs a cheaper, faster, auditable way to think.

It starts with better context.
And that context already exists: on the publisher's side of the glass.

---

Missed Part 1? Start here: The Cost of Context
It explains why context reconstruction is the real cost center of modern AI.