Using “underdrawings” for accurate text and numbers

Posted by samcollins 2 days ago

Using “underdrawings” for accurate text and numbers (samcollins.blog)

264 points | 86 comments

nottorp 5 hours ago

LLMs are like a box of chocolates...

docheinestages 4 hours ago

And what happens if the model can't come up with a good enough SVG to begin with?

nine_k 6 hours ago

It's normal to first create a plan, then allow agents to write code. But it seems to be surprising for many to first create a draft / outline of a picture, then go for a final render.

utopiah 3 hours ago

Love the concluding note: it works, but not really.

So typical of the LLM/GenAI craze. An entire article to show that it's nearly there, yet it's not, despite a convoluted effort to make it just so on a very, very niche example.

Al-Khwarizmi 2 hours ago

But if it works part of the time, it's useful. It's easy for a human to check that the numbers are correct, and if they aren't, just regenerate the image. Orders of magnitude easier than creating the image from scratch without the model.

cheekyant 4 hours ago

Has anyone built a platform that has image-to-image pipelines and lets you use prompt-to-SVG generation from SOTA LLMs?

TeMPOraL 4 hours ago

ComfyUI?

BobbyTables2 9 hours ago

How is it that LLMs aren’t good at rendering the sequence of numbers but can reliably put the supplied pieces all in the right order?

mk_stjames 9 hours ago

Because the image generation is powered by a diffusion model that is only guided by the transformer model and still has a somewhat vague spatial representation, especially when it comes to coupling things like counting and complex positioning.

But if you use the LLM to generate the kind of code an SVG graphic is made of, and then feed a rasterized image of that SVG into the diffusion model, the image takes the place of the raw noise input and guides the denoising process so that the numerical parts land in the right spots.

The LLM is putting the SVG in the right order because the code that drives the SVG is just that - code - and the numerical order is easily defined there, even if it has to follow something like a spiral.

Edit: LLMs may now also be using thinking modes, with feedback during generation, to help with complex positioning when drawing something like an SVG. I just asked Claude to generate one such spiral-number SVG; it did so interactively via thinking, and the generated code is incredibly explicit about positions, so that must help. But the underlying idea of a two-step SVG-to-diffusion pipeline is the real key here.
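
To make the "it's just code" point concrete, here is a minimal sketch of what such a spiral-number SVG generator might look like. The function name, the Archimedean spiral, and all sizes are my own assumptions for illustration, not the code Claude or the article produced:

```python
import math


def spiral_number_svg(count=20, size=400, turns=3.0):
    """Return SVG markup with the numbers 1..count laid out along a spiral.

    Hypothetical helper: the spiral shape and every parameter are
    illustrative assumptions, not taken from the article.
    """
    cx = cy = size / 2            # centre of the canvas
    max_radius = size / 2 - 20    # leave a margin for the outermost label
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">'
    ]
    for i in range(count):
        t = i / max(count - 1, 1)           # 0.0 at the centre, 1.0 at the rim
        angle = 2 * math.pi * turns * t     # angle grows linearly with i
        radius = max_radius * t             # radius grows linearly too
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        # The label order is fixed by the loop index, so it cannot drift.
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" font-size="14" '
            f'text-anchor="middle">{i + 1}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(spiral_number_svg(count=12))
```

The sequence comes from a loop counter and the positions from a formula, so the ordering is exact by construction. The rasterized result is what would then be handed to the image model as the underdrawing.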

wg0 7 hours ago

Has anyone had good luck with making consistent game art and assets?

choeger 8 hours ago

Transformers are great translators. So, yeah, starting with structured output like SVG is probably the best way to start.

It should be fairly trivial to fix any logic errors in the structured output, too.
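
As a rough sketch of what checking the structured output could look like (assuming the labels are plain non-negative integers in `<text>` elements and are meant to count up from 1; both function names are hypothetical):

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def numeric_labels(svg_markup):
    """Collect integer labels from <text> elements, in document order."""
    root = ET.fromstring(svg_markup)
    labels = []
    for element in root.iter():
        # Accept both namespaced and un-namespaced <text> tags.
        if element.tag in (SVG_NS + "text", "text"):
            content = "".join(element.itertext()).strip()
            if content.isascii() and content.isdigit():
                labels.append(int(content))
    return labels


def find_sequence_errors(svg_markup, start=1):
    """Return (expected, found) pairs where the labels break the sequence."""
    errors = []
    for offset, found in enumerate(numeric_labels(svg_markup)):
        expected = start + offset
        if found != expected:
            errors.append((expected, found))
    return errors


if __name__ == "__main__":
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        "<text>1</text><text>3</text><text>3</text></svg>"
    )
    print(find_sequence_errors(sample))
```

This only checks label order, not placement, but it shows why errors are cheap to catch at the SVG stage: the numbers are still text you can parse, not pixels.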

tracerbulletx 10 hours ago

I've been doing charts for slides like this for a while. I noticed HTML viz was super reliable, and I could style it with a diffusion model. It's very useful for data viz.
