A cheap trick for reliable structured output: feed the validation error back into the retry
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
If you generate structured output from an LLM and validate against a schema, you know the failure mode: usually fine, occasionally a missing field or an unparseable response. The common fix is a retry, but a plain retry is the same prompt at the same temperature, so you are just re-rolling.
What worked much better for me was making the retry self-correcting: when validation fails, put the validation error and the model's own previous output back into the next prompt and ask it to fix that specific thing. It edits instead of regenerating.
```python
except ValidationError as e:
    attempts += 1
    error_message = f"""
    The last response failed validation due to this error:
    <error>{format_error_for_llm(e)}</error>
    Fix the error and return the corrected data:
    <data>{serialize(response).decode()}</data>
    """
    response = None  # next loop appends error_message to the prompt
```
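For anyone who wants to see the whole loop around that except branch, here is a minimal, self-contained sketch. It is not the code from my platform: `ValidationError`, `validate`, `generate_structured`, and `fake_model` are hypothetical stand-ins that use only the standard library, the schema is a toy one, and the sketch feeds the raw response text back rather than a serialized object.

```python
import json
from typing import Callable


class ValidationError(Exception):
    """Hypothetical stand-in for a schema library's validation error."""


def validate(raw: str) -> dict:
    """Toy schema (assumption): a JSON object with a str 'name' and an int 'age'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"response is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValidationError("top-level value must be a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValidationError("field 'name' must be a string")
    age = data.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValidationError(
            f"field 'age' must be an int, you sent {type(age).__name__}: {age!r}"
        )
    return data


def generate_structured(
    call_model: Callable[[str], str], base_prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model; on a validation failure, retry with the error and prior output."""
    prompt = base_prompt
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate(raw)
        except ValidationError as e:
            # Self-correcting retry: the next prompt carries the error and the
            # model's own previous output, so it edits instead of regenerating.
            prompt = (
                f"{base_prompt}\n\n"
                "The last response failed validation due to this error:\n"
                f"<error>{e}</error>\n"
                "Fix the error and return the corrected data:\n"
                f"<data>{raw}</data>"
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    """Fake model for demonstration: wrong type first, fixed once it sees feedback."""
    if "<error>" in prompt:
        return '{"name": "Ada", "age": 36}'
    return '{"name": "Ada", "age": "36"}'
```

Calling `generate_structured(fake_model, "Return a person as JSON.")` takes two model calls here: the first response fails on `age`, and the second prompt contains both the error text and the rejected output.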
Two details matter: describe the error for the model, not for a log ("field X must be an int, you sent a string"), and hand back its own prior output as the thing to correct. Tradeoffs: an extra call plus a longer prompt on failures (cap attempts); it only works when the bad output is parseable enough to feed back; and if you also fail over between providers, do not count the swap as an attempt.
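A rough sketch of two of those points, the model-facing error text and the attempt accounting during provider failover. Everything here is a hypothetical illustration: `FieldError`, `describe_errors_for_model` (a stand-in for `format_error_for_llm`, whose real signature is not shown in this post), `ProviderError`, and `run_with_failover` are names I made up for the example, and providers are modeled as plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldError:
    """Hypothetical, simplified shape for one schema violation."""
    path: str              # e.g. "age"
    expected: str          # e.g. "an int"
    got: object = None     # the offending value the model sent
    missing: bool = False  # True when the field was absent entirely


def describe_errors_for_model(errors: list[FieldError]) -> str:
    """Phrase errors as instructions to the model, not as log lines."""
    lines = []
    for err in errors:
        if err.missing:
            lines.append(f"field '{err.path}' is required but was missing")
        else:
            lines.append(
                f"field '{err.path}' must be {err.expected}, "
                f"you sent {type(err.got).__name__}: {err.got!r}"
            )
    return "\n".join(lines)


class ProviderError(Exception):
    """Hypothetical: the provider itself failed (timeout, 5xx), not the output."""


def run_with_failover(
    providers: list[Callable[[str], str]],
    prompt: str,
    check: Callable[[str], list[FieldError]],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Return (valid_output, attempts_used); provider swaps do not use up attempts."""
    attempts = 0
    provider_index = 0
    current_prompt = prompt
    while attempts < max_attempts:
        if provider_index >= len(providers):
            raise RuntimeError("all providers failed")
        try:
            raw = providers[provider_index](current_prompt)
        except ProviderError:
            provider_index += 1  # swap provider; deliberately not counted
            continue
        attempts += 1  # only a response that reached validation counts
        errors = check(raw)
        if not errors:
            return raw, attempts
        current_prompt = (
            f"{prompt}\n\n"
            "The last response failed validation due to this error:\n"
            f"<error>{describe_errors_for_model(errors)}</error>\n"
            "Fix the error and return the corrected data:\n"
            f"<data>{raw}</data>"
        )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

The reason for keeping the swap out of the count: with a cap of two attempts, a dead first provider would otherwise leave only one validated try, and the feedback retry would never run.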
This is from a RAG platform I built. How are others handling this: constrained decoding / grammars, or feedback loops like this?