What RTB looks like when the page is a paragraph
Real-time bidding was built for page loads. Conversations load one token at a time. Here's what has to change — and what surprisingly doesn't.
Real-time bidding has a comfortable set of assumptions baked into it. A user hits a page. The publisher fires an ad request with a URL, some viewport dimensions, maybe a user ID. DSPs bid, a winner is picked, a creative is returned, and the browser renders it. All of this happens in the ~100ms before the page paints.
None of those assumptions survive a streaming AI response. There is no page URL. There’s no viewport. There’s often no user ID. And the “content” the ad is supposed to sit next to doesn’t exist yet — the model is still generating it.
So what does RTB have to become?
The bid request moves upstream
In a conversational product, the ad decision has to happen before the model starts streaming, not after. By the time tokens are flowing, it’s too late to inject a sponsored recommendation into the middle of a paragraph that’s already been sent.
That means the ad request has to fire at inference time, not render time. The inputs are different too: instead of a URL and a viewport, the DSP sees a structured intent signal — what the user asked, what the model is about to answer, and what shape of placement the product supports (inline mention, attached card, follow-up suggestion).
The DSP isn’t bidding on an impression. It’s bidding on a moment.
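To make the shape of that inference-time request concrete, here's a minimal sketch. The field names are illustrative, not a published schema — the point is that the payload carries intent and supported placement shapes instead of a URL and viewport.

```python
from dataclasses import dataclass, field

# A sketch of a conversational bid request, fired at inference time —
# before the model streams. Field names here are illustrative only.
@dataclass
class ConversationalBidRequest:
    request_id: str
    intent_text: str       # what the user asked (possibly summarized)
    predicted_topic: str   # what the model is about to answer about
    # placement shapes the client supports: inline mention, card, follow-up
    placements: list[str] = field(default_factory=list)

req = ConversationalBidRequest(
    request_id="req-123",
    intent_text="best carry-on luggage for frequent flyers",  # hypothetical query
    predicted_topic="travel/luggage",
    placements=["inline", "card"],
)
```

A DSP receiving this has everything it needs to price the moment, and nothing it would normally key off a page.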
Creatives stop being rectangles
A 300×250 banner assumes the publisher has 300×250 pixels to give it. A conversational placement has no pixels. It has words, and maybe a structured card.
So the creative format has to change. In our work at Growl, the unit looks more like:
- A short product name and one-line description, written to match the tone of the surrounding generated text
- A canonical URL for the clickthrough
- A product image (optional, surfaced only when the client supports a card format)
- A taxonomy tag advertisers use to target what the user is doing rather than what page they’re on
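The unit above can be sketched as a small data structure — one field per bullet, with the image optional exactly because not every client can render a card. The example values are hypothetical advertiser copy, not a real campaign.

```python
from dataclasses import dataclass
from typing import Optional

# One creative unit for a conversational surface: words and a link,
# not pixels. Mirrors the bullet list above; names are illustrative.
@dataclass
class ConversationalCreative:
    product_name: str      # short, written to match surrounding tone
    description: str       # one line
    click_url: str         # canonical clickthrough URL
    taxonomy_tag: str      # what the user is doing, not what page they're on
    image_url: Optional[str] = None  # surfaced only when the client supports a card

card = ConversationalCreative(
    product_name="Roamer 22",  # hypothetical product
    description="A hard-shell carry-on sized for strict airline limits.",
    click_url="https://example.com/roamer-22",
    taxonomy_tag="shopping/travel/luggage",
)
```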
Advertisers writing creative for conversational surfaces have to let go of pixel-perfect layout control. In exchange, they get something banner ads never had: the ad renders in the user’s native reading context, at their reading width, in their theme, with their font.
What doesn’t have to change
A lot, actually. The auction mechanics — first-price, second-price, header bidding analogues — map over cleanly. The DSP/SSP split is still useful. Measurement primitives (impression, click, conversion) are still meaningful, just pegged to a slightly different surface.
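The auction mechanics really do carry over unchanged. A minimal first/second-price resolution — the same logic whether the inventory is a banner slot or a conversational moment:

```python
def resolve_auction(
    bids: dict[str, float], second_price: bool = True
) -> tuple[str, float]:
    """Pick the highest bidder; clear at the runner-up's bid plus a
    one-cent tick (second-price) or at the winning bid (first-price)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top = ranked[0]
    if second_price and len(ranked) > 1:
        return winner, ranked[1][1] + 0.01
    return winner, top

# e.g. with bids {"dsp_a": 2.50, "dsp_b": 1.75}, dsp_a wins and
# clears just above dsp_b's 1.75 under second-price rules.
```

Nothing in this function knows what surface it's clearing for — which is exactly the point.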
The identity graph is where the industry has to make a real call. Some conversational products will have a logged-in user and can pass a hashed ID. Many won't, and those need a pure contextual path: bidding on the intent signal alone, with no user identifier at all. That path already exists for CTV and for some cookieless web inventory, and it ports over well.
The latency budget is different
Web RTB runs on a ~100ms budget because that’s what page paint tolerates. Conversational RTB runs on a different budget entirely: the time between when the user hits enter and when the model starts streaming. For most consumer AI products that’s 300–800ms, sometimes longer when the model is doing tool calls.
That’s a lot of headroom by RTB standards. You can run a real auction, apply real brand-safety filters, and still come in under budget. The constraints we’ve been living with on the web — shaving milliseconds off bid timeouts — mostly dissolve on this surface.
What to actually build
If you’re a DSP or SSP thinking about AI inventory, three things are worth working on now:
- A conversational bid request schema that models intent instead of URL. OpenRTB 2.6 is close enough to extend; we don’t need a fresh protocol.
- Creative formats that degrade gracefully from rich card to inline mention, because different clients will support different things.
- An intent taxonomy that slots into what buyers already know — product categories, purchase stages, verticals — without requiring them to learn a new vocabulary.
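All three of those pieces can live inside an OpenRTB-2.x-shaped request, since `ext` objects are OpenRTB's sanctioned extension point. The specific keys inside the `ext` objects below are a sketch of what such an extension could carry, not a ratified spec:

```python
# An OpenRTB-2.x-shaped bid request with conversational fields carried
# in `ext` objects. Keys inside `ext` are illustrative assumptions.
bid_request = {
    "id": "req-123",
    "imp": [{
        "id": "1",
        "ext": {
            # creative shapes this client can render, richest first,
            # so buyers can degrade gracefully from card to inline
            "placement_kinds": ["card", "inline", "followup"],
        },
    }],
    "ext": {
        "intent": {
            # slots into categories buyers already know
            "taxonomy": "shopping/travel/luggage",
            "stage": "consideration",  # purchase stage; hypothetical values
        },
    },
}
```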
None of this requires tearing down the existing ad stack. It requires meeting it at a new surface.