Decision

250 tokens

Three sentences. Hard cap. In any language.

~1 min · 181 words

We picked 250 tokens as the hard ceiling for every chat reply.

The model literally cannot exceed it at the API layer.

250 fits about three sentences in any language we've tested — English, Russian, Georgian, Turkish — plus a markdown image URL on its own line, which we need because the chat sometimes recommends a catalog photo and a truncated URL is worse than no URL. Lower than 250, we risked breaking the URL mid-line. Higher, and the model started writing paragraphs. Paragraphs sound like an essay, not a conversation.

The companion rule in the system prompt: “Reply in 1–3 sentences. Never more than 3, in any language. This is a hard rule.” Sentence-based rules are language-neutral; the model counts boundaries (. ? !) regardless of whether it's writing in Russian or Georgian. We don't have to localize the rule per language.

Side effect: 774 tokens recovered from the old 1024 budget. That's a lot of conversation history a single reply no longer has to compete with.

You wouldn't think a number could be a design decision. This one is.

You wouldn't think a number could be a design decision. This one is.