Posted by gmays 2 days ago
https://blog.dottxt.ai/say-what-you-mean.html
The blog post is doubly bad because any "failures" involving images and image understanding can't necessarily be traced back to structured generation at all!!!
> you need a parser that can find JSON in your output and, when working with non-frontier models, can handle unquoted strings, key-value pairs without comma delimiters, unescaped quotes and newlines; and you need a parser that can coerce the JSON into your output schema, if the model, say, returns a float where you wanted an int, or a string where you wanted a string[].
Oh cool I'm sure that will be really reliably. Facepalm.
> Allow it to respond in a free-form style: let it refuse to count the number of entries in a list, let it warn you when you've given it contradictory information, let it tell you the correct approach when you inadvertently ask it to use the wrong approach
This makes zero sense. The whole point of structured output is that it's a (non-AI) program reading it. That program needs JSON input with a given schema. If it is able to handle contradictory-information warnings, or being told you're using the wrong approach then that will be in the schema anyway!
I think the point about thinking models is interesting, but the solution to that is obviously to allow it to think without the structuring constraint, and then feed the output from that into a query with the structured output constraint.