Using “underdrawings” for accurate text and numbers

Posted by samcollins 2 days ago

Using “underdrawings” for accurate text and numbers (samcollins.blog)

283 points | 92 comments | page 3

tracerbulletx 12 hours ago|

I've been doing charts for slides like this for a while. I noticed HTML viz was super reliable, but I could style it with a diffusion model. It's very useful for data viz.
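
For anyone wanting to try this, here is a minimal sketch of the deterministic half: drawing a plain chart whose labels and numbers are exact before any restyling. The function name `bar_chart_svg`, the sample data, and the layout constants are all made up for illustration, and the diffusion restyling step is not shown since it depends on whichever tool you use.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, width=400, height=240, margin=30):
    """Return an SVG string with one bar per (label, value) pair.

    Hypothetical helper: assumes at least one pair and positive values.
    The exact value is printed above each bar so the numbers are fixed
    before any image model touches the picture.
    """
    max_value = max(value for _, value in data)
    slot = (width - 2 * margin) / len(data)   # horizontal space per bar
    plot_height = height - 2 * margin
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data):
        bar_h = plot_height * value / max_value
        x = margin + i * slot
        y = height - margin - bar_h
        cx = x + slot / 2                      # center of this bar's slot
        parts.append(
            f'<rect x="{x + slot * 0.15:.1f}" y="{y:.1f}" '
            f'width="{slot * 0.7:.1f}" height="{bar_h:.1f}" fill="#888"/>'
        )
        parts.append(
            f'<text x="{cx:.1f}" y="{y - 4:.1f}" '
            f'text-anchor="middle">{value}</text>'
        )
        parts.append(
            f'<text x="{cx:.1f}" y="{height - margin + 14:.1f}" '
            f'text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up sample data, purely for illustration.
svg = bar_chart_svg([("Q1", 12), ("Q2", 30), ("Q3", 21)])
```

Rasterize the result with whatever you like and use that image as the underdrawing; the point is only that the text and numbers come from code rather than from the image model.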

SomaticPirate 7 hours ago||

inb4 this technique is subsumed into the next MoE model release

LLMs are evolving so fast I wouldn’t be surprised if this technique was not needed in <6 months

krackers 7 hours ago||

I don't think the MoE part has anything to do with it, but the current gen of multimodal models can do thinking interleaved with autoregressive(?*) image-gen so it's probably not long before they bake this into the RL process, same way native thought obviated the need for "think carefully step by step" prompts.

rimliu 7 hours ago||

LLMs are rather devolving at this point.

Melamune 7 hours ago||

I wondered why I was losing all passion for creating. These tips and tricks are part of the answer.

globular-toast 6 hours ago||

Wait, where did it get the "Sweet Path//Trail of treats" thing from in the SVG? It wasn't about sweets at that point. Something missing here, I think.

jeffrallen 9 hours ago||

I wish the opposite was true: that when I tell Gemini I want "a diagram of X" that it immediately breaks out Python and matplotlib, instead of wasting my time with Nano Banana.

nullc 10 hours ago||

Inpainting/guiding from a sketch is how I've always used diffusion models. I thought everyone did that, or at least everyone who wasn't just trying to get some arbitrary filler material without much care of what the output looked like.

foxes 6 hours ago||

I feel sorry for the recipient.

psychoslave 7 hours ago||

A few months ago I tried to get Mistral's Le Chat to write French poetry in alexandrines (12 syllables per line). Disastrous at first. Then, after adding to the specification that each line also had to be transcribed in IPA with every syllable counted, it went better.

Still emotionally unrelatable, but it was definitely producing something that matched the specifications, as long as they are explicit and systematically enforced through deterministic means. For now, my takeaway is that LLMs can't grasp the ineffable, and that they are so untrustworthy that they can only be employed under very clear and inescapable constraints, or they will go awry just as surely as water is wet.
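
As a rough sketch of what "enforced through deterministic means" could look like, the count can be checked in code instead of trusting the model's own tally. This assumes the model is told to emit IPA with `.` between syllables and spaces between words; that output format, the function names, and the placeholder lines are all invented for illustration.

```python
def count_syllables(ipa_line):
    """Count syllables in a line where '.' separates syllables
    and whitespace separates words (an assumed output format)."""
    total = 0
    for word in ipa_line.split():
        total += len([syl for syl in word.split(".") if syl])
    return total


def check_alexandrines(ipa_lines, target=12):
    """Return (line_number, syllable_count) for every line that
    does not have exactly `target` syllables."""
    problems = []
    for number, line in enumerate(ipa_lines, start=1):
        count = count_syllables(line)
        if count != target:
            problems.append((number, count))
    return problems


# Placeholder "transcriptions", not real IPA: only the separators matter.
lines = [
    "la.la la.la.la la la.la.la la.la la",  # 12 syllables
    "la.la la.la",                          # 4 syllables
]
problems = check_alexandrines(lines)
# problems == [(2, 4)]
```

Any line that comes back in `problems` can be sent back to the model for another attempt, so the constraint is checked by the program and not by the model.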

gwern 11 hours ago||

tldr: do a standard img2img workflow where you lay out a skeleton or low-res version, and then turn it into the final high-quality photorealistic version, instead of trying to zeroshot it purely from a text prompt.
