A Difficult Position
The rapid rise of AI agents and large language models has forced publishers and platform operators into a difficult position.
On one side, publishers want their content to be understood by AI systems. They want their brand, expertise, and editorial voice represented accurately in AI answers and recommendations.
On the other side, they want control over how their intellectual property is accessed, transformed, and monetized.
Meanwhile, AI platforms face the opposite challenge: they need high-quality knowledge sources that are structured, reliable, and legally accessible. But today they are often forced to scrape raw HTML and reverse-engineer meaning from it.
The current ecosystem is messy because three fundamentally different responsibilities are being blended together:
- Licensing and policy enforcement
- Usage accounting and metering
- Content transformation and representation
These concerns are often implemented in a single system: a CDN feature, a crawler policy, or an ad-hoc licensing API.
But when these responsibilities are combined, both publishers and AI systems suffer.
The FetchRight / Peek-Then-Pay model proposes something different:
A clean separation of concerns, with each responsibility handled in the layer where it belongs.
This architecture benefits both publishers and AI platforms.
The Three Responsibilities in AI Content Access
To understand why this matters, it's helpful to look at the three core functions separately.
1. Licensing: Who Is Allowed to Access What?
Licensing answers questions like:
- Which organizations can access this content?
- Under what conditions?
- For which purposes?
- At what price?
Licensing should be centralized.
Why?
Because licensing is fundamentally a business relationship, not a runtime infrastructure concern.
A publisher may negotiate agreements with multiple AI companies, each with different policies, pricing models, or permitted uses.
Centralized licensing systems allow:
- Publishers to manage agreements in one place
- AI companies to manage credentials across multiple publishers
- Policy changes without infrastructure redeployments
In the FetchRight architecture, licensing is handled by a central license authority, but runtime systems do not depend on it for every request.
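To make the licensing questions above concrete, here is a minimal sketch of how a central license authority might represent an agreement. All field names, the `registry` contents, and the `permitted` helper are illustrative assumptions, not part of any published FetchRight specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LicenseAgreement:
    """One negotiated agreement between a publisher and an AI platform."""
    licensee: str                  # which organization can access the content
    scopes: tuple                  # which content paths are covered
    purposes: tuple                # permitted uses, e.g. "rag", "answers"
    price_per_1k_requests: float   # at what price
    max_requests_per_day: int      # usage ceiling, enforced downstream

# The central authority holds many such agreements, keyed by licensee.
registry = {
    "example-ai-co": LicenseAgreement(
        licensee="example-ai-co",
        scopes=("/articles/", "/products/"),
        purposes=("rag", "answers"),
        price_per_1k_requests=2.50,
        max_requests_per_day=100_000,
    )
}

def permitted(agreement: LicenseAgreement, path: str, purpose: str) -> bool:
    """Policy check the authority runs once, when issuing a permission."""
    return purpose in agreement.purposes and any(
        path.startswith(scope) for scope in agreement.scopes
    )
```

The point of the sketch: the policy check runs centrally when a permission is granted, not on every content request.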
That's where the second concern comes in.
2. Usage Accounting: How Much Has Been Consumed?
Traditional API systems rely on centralized accounting; every request must contact a license server.
At internet scale, this creates problems:
- It adds latency.
- It introduces a central bottleneck.
- It increases failure risk.
FetchRight uses a different model:
Assertion-only licenses with local bookkeeping.
This means:
- A license server issues signed permissions
- Crawlers or agents present those assertions when requesting content
- Enforcement systems validate the assertion locally
- Usage is tracked locally at the edge
- No round-trip to the licensing server is required
This approach offers several advantages:
- Edge-level scalability
- Lower latency
- Resilience to network failures
- Easier integration with CDNs and proxies
Local metering systems (whether Redis, Durable Objects, or other storage) track usage in real time.
Licensing remains centralized. Accounting remains distributed.
Each system does the job it is best suited for.
3. Content Transformation: How Content Is Represented
The third concern is often overlooked but may be the most important.
AI systems rarely consume raw HTML.
They transform content into:
- Summaries
- Embeddings
- Structured knowledge
- RAG-ready chunks
- Question-answer pairs
Today, these transformations are typically performed by:
- The AI platform scraping the content
- A generic CDN feature
- A third-party intermediary
But this introduces a critical problem.
The entity transforming the content is often not the publisher.
Which means:
- The publisher's narrative can be lost
- Brand messaging may be distorted
- Commercial relationships (like affiliate attribution) disappear
- Structured knowledge embedded in the page may be ignored
This is why FetchRight treats content transformation as a publisher-controlled responsibility.
The publisher decides:
- How content is represented
- How it is summarized
- How it is chunked
- What metadata accompanies it
- What semantic signals should be preserved
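As a sketch of publisher-controlled chunking, the example below splits an article on paragraph boundaries and attaches the metadata the publisher wants every consumer to see. The `Chunk` fields, the `max_chars` budget, and the `affiliate_id` field are assumptions chosen for illustration; real chunking policies would be tuned per publisher.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Chunk:
    text: str
    source_url: str
    taxonomy: List[str]            # the publisher's own category path
    affiliate_id: Optional[str]    # commercial relationship, preserved

def chunk_article(paragraphs: List[str], source_url: str,
                  taxonomy: List[str], affiliate_id: Optional[str] = None,
                  max_chars: int = 400) -> List[Chunk]:
    """Publisher-side chunking: split on paragraph boundaries, never
    mid-thought, and carry publisher metadata on every chunk."""
    chunks, buf = [], ""
    for p in paragraphs:
        if buf and len(buf) + len(p) + 1 > max_chars:
            chunks.append(Chunk(buf, source_url, taxonomy, affiliate_id))
            buf = p
        else:
            buf = (buf + "\n" + p) if buf else p
    if buf:
        chunks.append(Chunk(buf, source_url, taxonomy, affiliate_id))
    return chunks
```

Because the publisher emits the chunks, the affiliate attribution and taxonomy travel with the content instead of being stripped by a third-party scraper.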
This is where modern AI tooling gives publishers real leverage.
Publisher-Controlled Transformations Are the Missing Layer
Publishers already understand their content better than anyone else.
They know:
- The meaning of their taxonomy
- How their editorial voice should sound
- Which relationships matter commercially
- How products, brands, and categories connect
Using modern AI models internally, publishers can generate representations that preserve this knowledge.
For example, a publisher could generate embeddings using systems like Gemini Embeddings 2.
These embeddings might incorporate:
- Editorial text
- Product information
- Images
- Taxonomy metadata
- Internal linking structure
Instead of exposing raw HTML to be scraped, the publisher exposes a canonical semantic representation of their content.
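One way to picture this canonical representation: before calling any embedding model, the publisher composes a single text document that carries the signals listed above, so nothing has to be reverse-engineered from HTML. The function below is a hypothetical sketch of that composition step only; it deliberately stops short of calling a real embedding API.

```python
from typing import List

def embedding_input(title: str, body: str, taxonomy: List[str],
                    products: List[str], linked_titles: List[str]) -> str:
    """Compose the canonical text a publisher would hand to an embedding
    model, so taxonomy and commerce signals are preserved explicitly."""
    parts = [
        "Title: " + title,
        "Category: " + " > ".join(taxonomy),
        "Products mentioned: " + ", ".join(products) if products else "",
        "Related articles: " + "; ".join(linked_titles) if linked_titles else "",
        body,
    ]
    # Drop empty sections so optional signals never add noise.
    return "\n".join(p for p in parts if p)
```

The resulting string, not the raw page, is what gets embedded and licensed.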
This benefits everyone.
Why Publisher-Generated Transformations Are Better for LLMs
From the perspective of an AI platform, publisher-generated transformations offer several advantages.
Higher Semantic Accuracy
Generic crawlers must infer meaning from messy web pages.
Publishers can provide clean semantic representations built from the original source.
Better Context Preservation
Important signals like:
- Brand identity
- Editorial standards
- Product relationships
- Commercial disclosures
can be preserved intentionally.
Reduced Ingestion Cost
Instead of scraping, cleaning, parsing, chunking, and embedding content themselves, AI platforms can consume ready-to-use knowledge artifacts.
Provenance
Publisher-provided transformations come with clear attribution and traceability.
This is increasingly important as AI answers are scrutinized for accuracy and fairness.
Why This Model Is Better for Publishers
Publishers also benefit significantly.
Control Over Narrative and Representation
Instead of allowing third parties to summarize or reinterpret content, publishers define the canonical representation.
Protection of Commercial Relationships
Affiliate links, product partnerships, and editorial positioning can be preserved in the knowledge representation.
Monetization
Transformations themselves become licensable assets.
Examples include:
- Semantic embeddings
- Curated RAG chunks
- Structured commerce knowledge
- Brand grounding datasets
Reduced Scraping Pressure
If AI systems can access clean semantic representations through a licensed interface, there is less incentive to crawl raw pages aggressively.
Why Generic Transformations Fall Short
Some platforms attempt to solve the AI access problem by offering generic transformation services.
Examples include:
- CDN-based transformations
- Intermediary licensing platforms
- Automated scraping pipelines
These solutions can be useful for basic access control.
But they suffer from a fundamental limitation.
They do not understand the publisher's content model.
Generic transformation systems cannot easily capture:
- Editorial nuance
- Brand positioning
- Internal taxonomy
- Commerce relationships
- Structured product knowledge
As a result, the outputs they generate are often of lower quality.
AI systems must still reconstruct meaning from incomplete representations.
The Peek-Then-Pay Model
Peek-Then-Pay ties these ideas together.
It introduces a simple concept:
AI agents should be able to preview what a resource contains before licensing full access.
This preview might include:
- Topic metadata
- Semantic summaries
- Structural descriptions
- Resource quality indicators
If the resource is valuable, the agent can request the full licensed representation.
That representation might include:
- Structured text
- Publisher-generated summaries
- Embeddings
- Metadata
- Transformation outputs
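The two-step flow described above can be sketched as a pair of endpoints plus an agent-side decision. Everything here is illustrative: the catalog contents, the `quality` score, and the boolean license check stand in for a real licensed-access handshake.

```python
from typing import Optional

# Hypothetical in-memory catalog standing in for a publisher's endpoint.
CATALOG = {
    "/articles/best-headphones": {
        "preview": {"topics": ["audio", "reviews"], "quality": 0.92},
        "full": {"summary": "Three editor-tested picks...",
                 "metadata": {"taxonomy": ["home", "audio"]}},
    }
}

def peek(path: str) -> dict:
    """Step 1: free, low-cost preview of what the resource contains."""
    return CATALOG[path]["preview"]

def fetch_full(path: str, has_valid_license: bool) -> dict:
    """Step 2: licensed access to the publisher-generated representation."""
    if not has_valid_license:
        raise PermissionError("license assertion required")
    return CATALOG[path]["full"]

def agent_decide(path: str, min_quality: float = 0.8) -> Optional[dict]:
    """An agent peeks first and only licenses resources worth the cost."""
    if peek(path)["quality"] >= min_quality:
        return fetch_full(path, has_valid_license=True)
    return None
```

The economic point is in `agent_decide`: the agent spends licensing budget only after the cheap preview shows the resource is relevant.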
The result is a marketplace where:
- Publishers expose high-quality semantic assets
- AI systems consume clean, licensed knowledge
- Infrastructure providers enforce policies at scale
Each participant focuses on what they do best.
Separation of Concerns Is the Key
The architecture that emerges looks like this:
Licensing → centralized policy and agreements
Usage accounting → distributed assertion validation and local metering
Content transformation → publisher-controlled semantic representation
This separation of concerns enables:
- Scalability
- Flexibility
- Higher content quality
- Stronger publisher control
- Better outcomes for AI systems
Most importantly, it aligns incentives.
Publishers want their knowledge represented accurately.
AI platforms want reliable, high-quality sources.
Peek-Then-Pay provides a framework where both sides win.
The Future of AI Knowledge Access
As AI systems become the primary interface to information, the way knowledge is accessed will continue to evolve.
The web was built for humans reading HTML.
AI systems need something different.
They need:
- Structured knowledge
- Semantic representations
- Licensing clarity
- Provenance
The FetchRight / Peek-Then-Pay model enables this future by recognizing that licensing, metering, and transformation are separate problems that deserve separate solutions.
When these concerns are cleanly separated, the ecosystem becomes healthier for everyone involved.
Publishers retain control of their voice and intellectual property.
AI platforms gain access to higher-quality knowledge.
And the web evolves into a more sustainable knowledge infrastructure for the AI era.