Structured output has become a de facto expectation for LLM APIs, but streaming that structured output is still a mess. That's a big miss: streaming partial results in parallel can nearly double agent throughput.
OpenAI, Anthropic, and Gemini all support streaming structured responses, but each breaks developer expectations in its own way: missing parsed results, lost final token stats, or unnecessary boilerplate.
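To illustrate the "missing parsed results" pain point: without SDK support, the naive fallback is to buffer raw text deltas and retry a full JSON parse after each chunk, which only yields a usable object once the payload is complete. This is a minimal stdlib-only sketch assuming a generic iterator of text deltas; real provider SDKs emit different event shapes.

```python
import json

def parse_partial(chunks):
    """Accumulate streamed text deltas and yield the most recent
    successfully parsed JSON value after each chunk.
    (Hypothetical stream shape; provider SDK events differ.)"""
    buf = ""
    latest = None
    for delta in chunks:
        buf += delta
        try:
            latest = json.loads(buf)
        except json.JSONDecodeError:
            pass  # JSON still incomplete; keep the last good parse
        yield latest

# Example: a structured response arriving in fragments.
deltas = ['{"name": "a', 'da", "score":', ' 0.9}']
results = list(parse_partial(deltas))
# Intermediate chunks yield None; only the final chunk parses.
```

A lenient incremental JSON parser would do better here by returning partial objects as fields complete, which is roughly what the providers' "parsed" streaming modes promise and, per the post, deliver inconsistently.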
Comments URL: https://news.ycombinator.com/item?id=45607915
Points: 1
# Comments: 0
Source: schnabl.cx