01
The Problem
Take any JSON response from a REST API and count the curly braces, the brackets, the quoted key names, the colons, and the commas. That structural scaffolding typically accounts for 40-50% of the tokens an LLM processes. The model cannot skip those tokens: every one occupies space in the context window, consumes attention, and costs money while carrying zero semantic information.
This was never a problem when the consumers of your APIs were browsers and microservices. A JsonSerializer.Deserialize() call costs microseconds. But the fastest-growing class of API consumer today is an LLM-backed agent, and LLMs pay for structure in a currency that matters: context window space.
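You can sanity-check that 40-50% figure yourself by counting structural characters directly. Below is an illustrative Python sketch; character share is only a rough proxy for token share, since tokenizers merge characters into multi-character tokens:

```python
import json

STRUCTURAL = set('{}[]:,"')

def structural_share(payload: str) -> float:
    """Fraction of non-whitespace characters that are JSON scaffolding
    ({ } [ ] : , and quotes) rather than actual data."""
    json.loads(payload)  # ensure the payload is valid JSON before measuring
    chars = [ch for ch in payload if not ch.isspace()]
    return sum(ch in STRUCTURAL for ch in chars) / len(chars)

sample = '{"sku": "MWC-BLK-S", "color": "black", "size": "S", "stock": 24}'
print(f"{structural_share(sample):.0%} structural")  # → 40% structural
```

Even on this tiny object, roughly four in ten characters are pure scaffolding, before any tokenizer effects are taken into account.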
02
Enter TOON
TOON — Token-Oriented Object Notation — encodes the same JSON data model but strips the structural noise. Hierarchy is expressed through indentation. Arrays of uniform objects collapse into compact tables with a field header ([count]{fields}:) followed by CSV-style rows. No braces, no brackets, no quoted keys, no commas. The result tokenizes roughly 50% more efficiently while remaining trivially readable.
The key insight
You do not need to rip out JSON. A single middleware layer lets AI agents opt into TOON via content negotiation while every other client continues receiving JSON. Additive, not disruptive.
03
Counting What Matters
Let us start with a concrete example. Here is a product record — the kind of thing a typical e-commerce API returns a thousand times a day:
{
"product": {
"id": "prd_8kx92m",
"name": "Merino Wool Crew",
"brand": "Outlier",
"category": "apparel",
"price": 89.00,
"currency": "USD",
"in_stock": true,
"rating": 4.7,
"variants": [
{
"sku": "MWC-BLK-S",
"color": "black",
"size": "S",
"stock_count": 24
},
{
"sku": "MWC-BLK-M",
"color": "black",
"size": "M",
"stock_count": 18
},
{
"sku": "MWC-NAV-L",
"color": "navy",
"size": "L",
"stock_count": 7
}
]
}
}
product:
id: prd_8kx92m
name: Merino Wool Crew
brand: Outlier
category: apparel
price: 89
currency: USD
in_stock: true
rating: 4.7
variants[3]{sku,color,size,stock_count}:
MWC-BLK-S,black,S,24
MWC-BLK-M,black,M,18
MWC-NAV-L,navy,L,7
The difference is visible at a glance: 589 bytes versus 263. But byte count is not the metric that matters for LLMs — tokens are. Running both through a modern tokenizer:
JSON: ~147 tokens
TOON: ~65 tokens (56% fewer)
A 56% reduction on a single record. Real payloads are larger — a paginated listing returns 25-50 items, an inventory sync might return hundreds — and the savings compound across every request in an agent's reasoning chain. Where do those savings come from? Two mechanisms:
1. Punctuation overhead
JSON requires braces, brackets, quotes on every key,
colons, and commas. For an object with 4 fields:
JSON: { "sku": "...", "color": "...", "size": "...", "stock": 24 }
TOON: the same four values as one CSV row under a shared schema header
Each punctuation character is either a token itself or
forces a token boundary, inflating the count.
2. Schema-aware array compression
JSON repeats every key for every object in an array.
TOON declares the schema once and streams rows:
JSON (3 variants):
{"sku":"MWC-BLK-S","color":"black","size":"S","stock":24},
{"sku":"MWC-BLK-M","color":"black","size":"M","stock":18},
{"sku":"MWC-NAV-L","color":"navy","size":"L","stock":7}
TOON (3 variants):
variants[3]{sku,color,size,stock}:
MWC-BLK-S,black,S,24
MWC-BLK-M,black,M,18
MWC-NAV-L,navy,L,7
The 12 repeated key tokens in JSON become 1 schema line.
The larger the array, the bigger the savings.
The savings are most extreme in flat structures with repeated keys — which is exactly what REST APIs return. Product listings, user tables, transaction logs, search results. These are the payloads agents consume at volume, and they are precisely where the overhead hurts most.
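To make the second mechanism concrete, here is a minimal illustrative Python sketch of schema-aware array encoding. It handles only the happy path (a list of flat dicts with identical keys) and is not the official TOON encoder, which also handles quoting, escaping, nesting, and non-uniform arrays:

```python
def encode_uniform_array(name: str, items: list[dict]) -> str:
    """Encode a list of flat dicts sharing the same keys as a TOON-style
    table: one schema header line, then one CSV row per item."""
    if not items:
        return f"{name}[0]:"
    fields = list(items[0].keys())
    # This sketch only supports uniform arrays: every item, same keys
    assert all(list(it.keys()) == fields for it in items), "non-uniform array"
    header = f"{name}[{len(items)}]{{{','.join(fields)}}}:"
    rows = [",".join(str(it[f]) for f in fields) for it in items]
    return "\n".join([header] + ["  " + r for r in rows])

variants = [
    {"sku": "MWC-BLK-S", "color": "black", "size": "S", "stock": 24},
    {"sku": "MWC-BLK-M", "color": "black", "size": "M", "stock": 18},
]
print(encode_uniform_array("variants", variants))
```

The key names are emitted exactly once, in the header; every additional item adds only its values. That is why the reduction holds steady as arrays grow.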
On token counts
Token counts vary by tokenizer. The estimates in this article are representative of modern tokenizers (cl100k_base, o200k_base) used by current frontier models. The exact numbers will differ by a few percent between model families, but the relative savings — roughly half the tokens — are consistent across tokenizers because TOON eliminates structural characters that all tokenizers must encode.
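To reproduce these measurements against your own payloads, count tokens with the tokenizer of your target model. A hedged Python sketch, falling back to the common (and rough) 4-characters-per-token heuristic when the tiktoken package is not installed:

```python
def estimate_tokens(text: str, encoding: str = "cl100k_base") -> int:
    """Count tokens with tiktoken when available; otherwise approximate
    with ~4 characters per token (a rough heuristic, not exact)."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding(encoding).encode(text))
    except ImportError:
        return max(1, len(text) // 4)

json_payload = '{"sku": "MWC-BLK-S", "color": "black", "size": "S", "stock": 24}'
toon_payload = "sku: MWC-BLK-S\ncolor: black\nsize: S\nstock: 24"
print(estimate_tokens(json_payload), estimate_tokens(toon_payload))
```

Whichever path runs, the JSON form costs more tokens than the TOON form; only the exact magnitude of the gap varies by tokenizer.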
04
Context Rot
Fewer tokens does not just mean lower cost. It means better reasoning.
An LLM has a fixed context window — a hard ceiling on how much information it can hold at once. Everything in that window competes for attention: the system prompt, conversation history, retrieved documents, tool outputs, and the API data itself. Structural tokens from JSON crowd out the information that actually matters.
This is context rot: the gradual degradation of model performance as the window fills with low-signal content. It does not fail catastrophically. The agent does not throw an error. It just gets slightly worse — slightly less accurate, slightly more prone to hallucination. The failure mode is invisible, which makes it dangerous.
Context engineering is the discipline of managing this window deliberately. For every token we put in, what is the marginal value? System prompts: high value. Retrieved documents: high value. Thirty-seven closing braces from a JSON payload: zero value.
TOON is a context engineering tool. By eliminating structural overhead, it shifts the ratio in the context window toward signal. Consider an agent that needs to reason over 200 products to find the best match for a customer query:
Available context: 128,000 tokens
System prompt: 2,000 tokens
Conversation history: 8,000 tokens
RAG documents: 12,000 tokens
─────────────────────────────────────────
Remaining for API data: 106,000 tokens
With JSON (~75 tokens/product):
→ 1,413 products fit → 200 products = 15,000 tokens
With TOON (~40 tokens/product):
→ 2,650 products fit → 200 products = 8,000 tokens
→ 7,000 tokens freed for additional context
Those 7,000 reclaimed tokens can hold:
→ ~175 more products, OR
→ ~2 additional RAG documents, OR
→ Richer system prompt + few-shot examples
Those reclaimed tokens are working memory. More room for chain-of-thought reasoning, few-shot examples, and guardrails that prevent hallucination. Every token you save on serialization overhead is a token you can spend on intelligence.
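The budget above is simple arithmetic. An illustrative Python sketch of the same calculation; the per-product token costs are the article's estimates, not measured values:

```python
CONTEXT_WINDOW = 128_000
OVERHEAD = 2_000 + 8_000 + 12_000  # system prompt + history + RAG docs

def data_budget() -> int:
    """Tokens left over for API data after fixed context overhead."""
    return CONTEXT_WINDOW - OVERHEAD

def tokens_for(products: int, per_product: int) -> int:
    """Token cost of serializing a product list at a given rate."""
    return products * per_product

budget = data_budget()           # 106_000 tokens remain for API data
json_cost = tokens_for(200, 75)  # 200 products as JSON: 15_000 tokens
toon_cost = tokens_for(200, 40)  # 200 products as TOON:  8_000 tokens
freed = json_cost - toon_cost    # 7_000 tokens reclaimed for reasoning
print(budget // 75, budget // 40)  # max products that fit: 1413 vs 2650
```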
Why this matters now
As agent systems move from single-shot API calls to multi-step reasoning chains, the context window becomes a shared resource across many tool calls. A ~50% reduction per call means the difference between an agent that can complete a 12-step research task and one that runs out of context at step 8.
05
The Middleware
The implementation is embarrassingly simple, which is the point. One middleware, deployed once, gives every endpoint TOON support through standard HTTP content negotiation. No per-endpoint changes, no API rewrites.
The pattern: intercept the response, check the Accept header, serialize to TOON if requested, pass through unchanged otherwise. Here is the ASP.NET Core implementation:
using Microsoft.AspNetCore.Http;
using System.Text.Json;
using ToonFormat; // See toonformat.dev/ecosystem/implementations
public class ToonMiddleware
{
private readonly RequestDelegate _next;
public ToonMiddleware(RequestDelegate next)
{
_next = next;
}
public async Task InvokeAsync(HttpContext context)
{
var acceptHeader = context.Request.Headers.Accept.ToString();
if (!acceptHeader.Contains("text/toon"))
{
await _next(context);
return;
}
// Capture the original response body
var originalBody = context.Response.Body;
using var buffer = new MemoryStream();
context.Response.Body = buffer;
await _next(context);
// Read the JSON response
buffer.Seek(0, SeekOrigin.Begin);
using var json = await JsonDocument.ParseAsync(buffer);
// Serialize to TOON
var toon = ToonEncoder.Encode(json.RootElement);
// Write the TOON response; clear any Content-Length set
// downstream, since the TOON body is a different size
context.Response.Body = originalBody;
context.Response.ContentLength = null;
context.Response.ContentType = "text/toon; charset=utf-8";
context.Response.Headers["X-Original-Content-Type"]
= "application/json";
await context.Response.WriteAsync(toon);
}
}
Registration is one line in your pipeline:
var app = builder.Build();
app.UseMiddleware<ToonMiddleware>();
app.MapGet("/api/products", async (AppDbContext db) =>
{
var products = await db.Products
.Include(p => p.Variants)
.Take(50)
.ToListAsync();
return Results.Ok(new { products });
});
// That's it. If the client sends "Accept: text/toon",
// they get TOON. Otherwise, they get JSON.
// Every endpoint behind this middleware works the same way.
The critical design decision is that TOON is opt-in per request, not per endpoint. Your OpenAPI spec stays the same. Your tests stay the same. Your human consumers never see the difference. Only the agents that know to ask for text/toon get the optimized format.
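From the agent side, opting in is nothing more than setting the Accept header. An illustrative Python sketch using the standard library (the URL is a placeholder, not a real endpoint):

```python
import urllib.request

def toon_request(url: str) -> urllib.request.Request:
    """Build a request that opts into TOON, keeping JSON as a lower-
    priority fallback the server may still choose (standard HTTP
    content negotiation via quality values)."""
    return urllib.request.Request(
        url,
        headers={"Accept": "text/toon, application/json;q=0.9"},
    )

# Placeholder URL for illustration only
req = toon_request("https://api.example.com/api/products")
print(req.get_header("Accept"))
```

Servers without the middleware simply ignore the unfamiliar media type and answer with JSON, so the agent degrades gracefully.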
Other frameworks
TOON has official and community implementations for TypeScript, Python, Go, Rust, and more. The middleware pattern is identical in every framework. See toonformat.dev/ecosystem/implementations for the full list.
Production consideration
In production, you may want to add TOON support at the API gateway level (YARP, Envoy, Kong, or a custom edge function) rather than per-application. This gives you organization-wide coverage, centralized caching, and the ability to toggle it with a feature flag — no application code changes needed.
06
Full Response Comparison
Beyond a single product, here is what a real paginated API response looks like — the kind of payload an agent receives from a product search endpoint:
{
"meta": {
"total": 142,
"page": 1,
"per_page": 3,
"query": "wool"
},
"products": [
{
"id": "prd_8kx92m",
"name": "Merino Wool Crew",
"price": 89.00,
"in_stock": true,
"variants": [
{
"sku": "MWC-BLK-S",
"color": "black",
"size": "S",
"stock": 24
},
{
"sku": "MWC-NAV-M",
"color": "navy",
"size": "M",
"stock": 12
}
]
},
{
"id": "prd_3jn71q",
"name": "Wool Zip Hoodie",
"price": 148.00,
"in_stock": true,
"variants": [
{
"sku": "WZH-GRY-L",
"color": "grey",
"size": "L",
"stock": 6
}
]
},
{
"id": "prd_9vm45r",
"name": "Lambswool Scarf",
"price": 55.00,
"in_stock": false,
"variants": []
}
]
}
meta:
total: 142
page: 1
per_page: 3
query: wool
products[3]:
- id: prd_8kx92m
name: Merino Wool Crew
price: 89
in_stock: true
variants[2]{sku,color,size,stock}:
MWC-BLK-S,black,S,24
MWC-NAV-M,navy,M,12
- id: prd_3jn71q
name: Wool Zip Hoodie
price: 148
in_stock: true
variants[1]{sku,color,size,stock}:
WZH-GRY-L,grey,L,6
- id: prd_9vm45r
name: Lambswool Scarf
price: 55
in_stock: false
variants[0]:
913 bytes versus 480 — but the token breakdown is what matters:
Token comparison — paginated response
JSON: ~228 tokens
TOON: ~120 tokens (47% fewer)
The schema-aware compression is doing the heavy lifting. In JSON, the four variant field names repeat for every variant object; across all three products, the keys appear 12 times. In TOON, they appear once per product as a schema header: variants[2]{sku,color,size,stock}:. After that, each row is pure values. The gap widens as payloads grow:
Products JSON tokens TOON tokens Reduction
───────── ─────────── ─────────── ─────────
1 ~147 ~65 ~56%
3 ~228 ~120 ~47%
10 ~750 ~390 ~48%
25 ~1,870 ~960 ~49%
50 ~3,740 ~1,920 ~49%
100 ~7,480 ~3,840 ~49%
Schema-aware arrays become more efficient as they grow —
field names declared once, not repeated for every object.
At 100 products, TOON cuts token usage roughly in half.
A note on cost
The direct dollar savings depend on which model you use and how you are billed — pricing changes frequently and varies by provider. What does not change is the ratio: TOON consistently uses ~50% fewer tokens than JSON for typical API payloads. That means ~50% lower input token cost for any model, at any price point. Whether your bill is $100/month or $100,000/month, halving the tokens spent on serialization overhead is worth capturing.
07
Where This Goes
TOON is not replacing JSON. Browsers, mobile apps, legacy integrations — they keep speaking JSON. The transformation happens at the edge, transparently, for the consumers that benefit from it.
The pattern is the same one we followed with gzip. Nobody rewrote their APIs to produce compressed output. A middleware layer checked for Accept-Encoding: gzip and handled it transparently. TOON applies the same idea to a different bottleneck: where gzip optimized for bandwidth, TOON optimizes for context windows.
The middleware is trivial. The content negotiation is standard HTTP. The hard part is recognizing that the consumers of your API have changed — and that optimizing for them is a competitive advantage, not an academic exercise.
Deploy the middleware. Ship the header. Let your agents breathe.