
The Cost of Context: How FetchRight and Peek-Then-Pay Give LLMs a Smarter Web

Part 1 of 2 - Why Context Reconstruction Is the Real Cost Center of Modern AI

Gary Newcomb

CTO & Co-Founder, FetchRight

AI Economics · LLM Engineering · Peek-Then-Pay · Context Optimization

The Invisible Price of Understanding

Every large-language-model query comes with an invisible price tag: context reconstruction.

In this first article of a two-part series, I'll unpack why today's LLMs waste so much compute rebuilding knowledge they've already seen, and how that inefficiency reshapes the economics of AI.

In Part 2, I'll move from diagnosis to design, showing how publishers and model operators can share structured, licensed context through FetchRight and the open Peek-Then-Pay standard.

The Hidden Cost in Every AI Query

Every time a large language model answers a question about the world - whether it's "what's the best router for gaming?" or "summarize this article from PCMag" - it has to reacquire and reprocess context.

That means:

  • Crawling or embedding raw web text,
  • Tokenizing it all again,
  • And then discarding it after a single inference.

For a model like GPT-4, that process burns thousands of tokens just to get to the starting line, before even generating a response. Multiply that across billions of daily queries and you begin to see it: context is the real cost center of modern AI.
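
To see the scale, here's a back-of-envelope sketch in TypeScript. Every number below is an illustrative assumption, not a measured figure:

```ts
// Back-of-envelope cost of rebuilding context on every query.
// All numbers are illustrative assumptions, not measured figures.

const CONTEXT_TOKENS_PER_QUERY = 8_000; // crawled + re-tokenized page text
const QUERIES_PER_DAY = 1_000_000_000;  // "billions of daily queries"
const USD_PER_1K_INPUT_TOKENS = 0.0025; // assumed input-token price

const dailyContextCost =
  (CONTEXT_TOKENS_PER_QUERY / 1_000) * USD_PER_1K_INPUT_TOKENS * QUERIES_PER_DAY;

console.log(`~$${dailyContextCost.toLocaleString()} per day on context alone`);
// With these assumptions: ~$20,000,000 per day before a single token of reasoning.
```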

Publishers Already Have the Context

Many publishers already manage structured, high-quality content with canonical URLs, metadata, topics, and sometimes even embeddings for internal search or personalization.

Yet few have exposed that structure in ways AI agents can directly query or understand - capabilities that increasingly belong on publishers' own domains.

As a result, today's AI systems treat publishers as if they were flat text files.

They scrape, strip, and rebuild knowledge from scratch, spending enormous compute to recreate structure that often already exists in higher fidelity within publishers' own systems - all while offering publishers no visibility or value in return.

This mismatch is not just inefficient. It's unsustainable for both sides.

Enter Peek-Then-Pay: The Context Protocol for AI

Peek-Then-Pay is an open standard that defines how AI crawlers can discover, preview, and license structured content from the web.

Think of it as robots.txt for AI - but enforceable and auditable.

A publisher hosts a simple manifest file - peek.json - that declares (sketched in code below):

  • What types of transformed content can be served,
  • How to obtain full content or transformed data,
  • And what licensing terms apply (via the linked API).
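
The published schema isn't reproduced here, so treat this TypeScript sketch as a hypothetical illustration of what such a manifest might contain - every field name is an assumption, not the Peek-Then-Pay spec:

```ts
// Hypothetical shape of a peek.json manifest. Field names are
// illustrative assumptions, not the published Peek-Then-Pay schema.
interface PeekManifest {
  version: string;
  transforms: string[];     // what transformed content can be served
  contentEndpoint: string;  // how to obtain full or transformed data
  licenseApi: string;       // where licensing terms are negotiated
}

const manifest: PeekManifest = {
  version: "1.0",
  transforms: ["summary", "embedding", "qa"],
  contentEndpoint: "https://publisher.example/ptp/content",
  licenseApi: "https://publisher.example/ptp/license",
};
```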

LLMs can "peek" to see what's available and relevant without violating terms, using HTTP 203 (Non-Authoritative Information) responses for previews instead of blind 402 (Payment Required) rejections.

When deeper access is needed, the model requests a license through the publisher's chosen provider.

That provider is FetchRight.ai.
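
Put together, the peek-then-license flow might look something like this. The endpoint paths, headers, and response shapes are assumptions for illustration, not the protocol's wire format:

```ts
// Sketch of the peek-then-license flow. Endpoint paths, headers, and
// response shapes are assumptions for illustration only.
async function peekThenPay(articleUrl: string): Promise<string> {
  // 1. Peek: ask for a preview; 203 signals a non-authoritative preview.
  const peek = await fetch(articleUrl, {
    headers: { "X-PTP-Intent": "summarization" }, // hypothetical header
  });

  if (peek.status === 203) {
    const preview = await peek.text();
    if (!worthLicensing(preview)) return preview; // the free peek was enough
  }

  // 2. License: request deeper access through the publisher's provider.
  const license = await fetch("https://publisher.example/ptp/license", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: articleUrl, intent: "summarization" }),
  }).then((r) => r.json());

  // 3. Fetch full content under the issued license token.
  const full = await fetch(articleUrl, {
    headers: { Authorization: `Bearer ${license.token}` },
  });
  return full.text();
}

function worthLicensing(preview: string): boolean {
  return preview.length > 0; // placeholder relevance check
}
```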

FetchRight: Licensing Infrastructure for the AI Web

FetchRight operationalizes the Peek-Then-Pay protocol.

It gives publishers, AI agents, and model operators a common language for (see the sketch after the list):

  • Declaring intent (e.g., "summarization", "embedding", "training")
  • Issuing time-limited, budgeted licenses
  • Verifying provenance via signed tokens (DPoP / JWS)
  • Auditing access across CDNs and transforms
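
As a rough sketch of what verifying one of those signed tokens could involve - using the real jose library for JWS verification, but with an assumed payload shape:

```ts
import { jwtVerify, importSPKI } from "jose"; // real library; payload shape below is assumed

// Hypothetical license payload - field names are illustrative.
interface PtpLicense {
  intent: "summarization" | "embedding" | "training"; // declared intent
  budgetTokens: number; // budgeted usage allowed under this license
  exp: number;          // time-limited: standard JWT expiry
  aud: string;          // publisher domain the license applies to
}

async function verifyLicense(token: string, publisherPem: string) {
  const key = await importSPKI(publisherPem, "ES256");
  // Signature + expiry check; proves provenance of the signed token.
  const { payload } = await jwtVerify(token, key);
  return payload as unknown as PtpLicense;
}
```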

It's built to work transparently with Cloudflare, Fastly, or other edge providers, enforcing rules at the perimeter and caching peeks for efficiency.

For LLM builders, FetchRight isn't another gatekeeper - it's the missing layer of clarity.

Instead of guessing what's allowed, you get a contractually clear, machine-readable pathway to authorized, structured context.

Why This Matters for LLM Engineers

If you're building or maintaining retrieval pipelines, you already know the economics:

  • Embedding a document costs tokens.
  • Fetching raw HTML costs bandwidth.
  • Re-embedding every crawl cycle costs more GPU time.

Now imagine a world (sketched in code after this list) where:

  • The publisher provides pre-computed embeddings (in OpenAI, Cohere, or HuggingFace formats).
  • You retrieve those vectors through a standardized interface instead of recomputing them.
  • You only pay to reason over context - not regenerate it.
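
A minimal sketch of that retrieval pattern, assuming a hypothetical publisher endpoint that serves pre-computed vectors:

```ts
// Sketch: rank documents using publisher-provided embeddings instead of
// recomputing them. The endpoint and response shape are assumptions.
type Vec = number[];

async function fetchPublisherEmbedding(url: string): Promise<Vec> {
  // Hypothetical endpoint serving a pre-computed vector for a page.
  const res = await fetch(
    `https://publisher.example/ptp/embedding?url=${encodeURIComponent(url)}`,
  );
  const { vector } = await res.json();
  return vector as Vec;
}

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Usage: score candidate pages against a query vector you already have,
// paying only to reason over the winners - not to re-embed the web.
async function rank(queryVec: Vec, urls: string[]) {
  const scored = await Promise.all(
    urls.map(async (u) => ({
      u,
      score: cosine(queryVec, await fetchPublisherEmbedding(u)),
    })),
  );
  return scored.sort((a, b) => b.score - a.score);
}
```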

That's the FetchRight promise: shared efficiency without shared exposure.

It's cheaper for the LLM, safer for the publisher, and traceable for everyone.

The Economic Math

For a typical web-scale LLM, the split looks roughly like this (a worked example follows the list):

  • 90% of token costs go to context reconstruction
  • 10% go to inference
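
The arithmetic behind that split is simple; with assumed token counts:

```ts
// Illustrative arithmetic behind the 90/10 split - all numbers assumed.
const contextTokens = 9_000; // re-crawled, re-tokenized context
const answerTokens = 1_000;  // actual inference output
const contextShare = contextTokens / (contextTokens + answerTokens);
console.log(contextShare); // 0.9 - context dominates the token bill
```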

By replacing raw text crawling with structured peek.json access, models can cut that 90% dramatically - while publishers finally get paid for the structured data already in their CMS.

It's a rare equilibrium where both compute and compensation improve.

Built for the Edge

FetchRight isn't theory - it's live infrastructure.

It runs natively today on Cloudflare Workers, combining low-latency edge compute with licensing intelligence.

Today, the enforcer supports (a minimal Worker sketch follows the list):

  • KV storage for cached peeks and pricing data,
  • Durable Objects for per-license budget ledgers,
  • Bot Management for identity gating,
  • Configurable transform services for publisher-controlled summaries and embeddings, and
  • Integrated search and QA endpoints for semantic lookups over publisher content.
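
Here's what a stripped-down enforcer might look like as a Cloudflare Worker. The binding names and routing logic are illustrative assumptions, not FetchRight's actual implementation:

```ts
// Minimal sketch of an edge enforcer on Cloudflare Workers. Binding names
// (PEEK_CACHE, LEDGERS) and routing logic are illustrative assumptions.
// KVNamespace / DurableObjectNamespace come from @cloudflare/workers-types.
export interface Env {
  PEEK_CACHE: KVNamespace;         // KV: cached peeks and pricing data
  LEDGERS: DurableObjectNamespace; // Durable Objects: per-license budgets
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const license = request.headers.get("Authorization");

    // Unlicensed agents get a cached peek (HTTP 203) instead of a 402.
    if (!license) {
      const peek = await env.PEEK_CACHE.get(new URL(request.url).pathname);
      return new Response(peek ?? "", { status: 203 });
    }

    // Licensed agents: debit the per-license budget ledger before serving.
    const id = env.LEDGERS.idFromName(license);
    const ledger = env.LEDGERS.get(id);
    const ok = await ledger.fetch("https://ledger/debit", { method: "POST" });
    if (!ok.ok) return new Response("budget exhausted", { status: 402 });

    return fetch(request); // pass through to origin content
  },
};
```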

In production, that means:

  • Human visitors: +15–60 ms latency (imperceptible)
  • Search-engine bots: immediate pass-through via robots.txt Allow
  • Licensed AI agents: +80–460 ms typical (hundreds more if transformations are requested)

Performance scales with caching and warmup; subsequent licensed requests typically resolve in under 120 ms.

The result is edge-native licensing and retrieval - fast enough for real-time AI agents, lightweight enough for global publishers, and flexible enough to support live semantic search, QA, and transformation directly at the perimeter.

The Future of the Licensed Web

The web's next protocol war isn't about privacy or SEO.

It's about how AI learns - and who gets a voice in that process.

Peek-Then-Pay gives the web a shared grammar for structured, licensed knowledge exchange.

FetchRight turns that grammar into a real economy.

If you're building AI agents, RAG systems, or model APIs, the path to better answers, faster responses, and lower costs doesn't start with another GPU.

It starts with better context.

And that context already exists - on the publisher's side of the glass.

---

Continue to Part 2: The Next Layer of the Web

A detailed look at how publishers and LLMs share structured context - with FetchRight as the bridge.