Posted by wahnfrieden 2 days ago
System card: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as with chess: even though IBM's Deep Blue "solved" chess by beating Kasparov, we still value humans playing chess. We value "hand made" items (clothes, furniture) over factory-made stuff. We appreciate and value human effort more than machine output. Do you prefer a hand-written birthday card or an email?
Feels like now is a bit of a catch-up after the pretty tepid period that was most of my life.
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
Consistency? So it fails less often?
Based on the released images (especially the "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").
Especially when it comes to detailed outputs or non-standard prompts.
I do believe it will get even better. I'm not sure it will happen within a year, but I wouldn't be incredibly surprised if it did.
I experimented with procedurally generating Waldo-style scavenger images with Flux models, with rather disappointing results (unsurprisingly).
If you asked me what I expected: since this one has "thinking", I'd have expected it to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit".
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.
Even SOTA models only scored 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.
Here's one I just came up with:
A Mercator projection of Earth where the land and oceans are inverted (aka land = ocean, and oceans = land): https://chatgpt.com/s/m_69e8cc31dac48191a09bb9c00d5aa3fe
kinda funny, I guess
"Hey give me a comic of how to create a rocket engine i can build at home"
Unlimited creativity will be shackled by safety.
Still pretty amazing.