Posted by wahnfrieden 2 days ago

ChatGPT Images 2.0 (openai.com)
Livestream: https://openai.com/live/

System card: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

994 points | 894 comments | page 6
modeless 2 days ago|
Can it generate transparent PNGs yet?
alasano 2 days ago||
Previous gpt image models could (when generating, not editing) but gpt-image-2 can't.

Noticed it earlier while updating my playground to support it

https://github.com/alasano/gpt-image-playground

lxgr 2 days ago||
Works for me, but really weirdly on iOS: Copying to clipboard somehow seems to break transparency; saving to the iOS gallery does not. (And I’ve made sure to not accidentally depend on iOS’s background segmentation.)
vunderba 2 days ago||
OpenAI’s API docs are frustratingly unclear on this. From my experience, you can definitely generate true transparent PNG files through the ChatGPT interface, including with the new GPT-Image-2 model, but I haven’t found any definitive way to do the same thing via the API.
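
For anyone who wants to check whether a file they got back is a true transparent PNG rather than a flattened image with a painted checkerboard, here is a minimal sketch using only the Python standard library. It reads the PNG header and reports whether the file declares an alpha channel (colour type 4 or 6) or carries a tRNS chunk. The function name `png_has_alpha` is just an illustrative name, and the check only shows that the file can hold transparency, not that any pixel is actually transparent.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """Return True if the PNG declares an alpha channel or a tRNS chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            # IHDR layout: width (4), height (4), bit depth (1), colour type (1), ...
            color_type = body[9]
            if color_type in (4, 6):  # grey+alpha or RGBA
                return True
        elif ctype == b"tRNS":
            return True  # palette or colour-key transparency
        elif ctype == b"IDAT":
            break  # tRNS must appear before the image data
        pos += 12 + length
    return False
```

Usage would be something like `png_has_alpha(open("out.png", "rb").read())`.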
baalimago 2 days ago||
"Benchmarks" aside, does anyone actually use these image models for anything?
medlazik 2 days ago||
Look around? It's everywhere. Try talking to a graphic designer looking for a job these days. Companies didn't wait for these tools to be good to start using them.
razorbeamz 2 days ago|||
Here in Japan every fucking food truck uses them for pictures of their menu, which really pisses me off because it's not representative of their food at all.
sumedh 1 day ago|||
People are using them for creating marketing material for their business.
croisillon 2 days ago||
MAGA to show how terrible Europe is ;)
ghstinda 1 day ago||
Humans have a new tool to make porn.
lifeisstillgood 2 days ago||
Pretty much all of the kerfuffle over AI would go away if it were accurately priced.

After 2008 and 2020, vast amounts of money (tens of trillions) were printed (reasonably) by Western governments and never removed from the money supply. So there are vast sums swilling about, funding things like massively computationally intensive work to help me pick a recipe for tonight.

Google and Facebook had online advertising sewn up - but AI is waaay better at answering my queries. So OpenAI wants some of that - but the cost per query must be orders of magnitude larger

So charge me, or my advertisers the correct amount. Charge me the right amount to design my logo or print an amusing cat photo.

Charge me the right cost for the AI slop on YouTube

Charge the right amount - and watch as people just realise it ain’t worth it 95% of the time.

Great technology - but price matters in an economy.

codebolt 2 days ago||
Has anyone tested it for generating 2D art for games? Getting Nano Banana to generate consistent sprite sheets was seemingly impossible last time I tried a few months ago.
hersko 1 day ago|
I'm still looking for a free tool that converts images to 3D models well.
rambojohnson 1 day ago||
Just tried it and got the usual six fingers, and half a thumb. What are they actually iterating on with these models by now…
thevinter 2 days ago||
Every time a new image gen comes out I keep saying that it won't get better just to be surprised again and again. Some of the examples are incredible (and incredibly scary. I feel like this is truly the point where understanding if something is AI becomes impossible)
lehmacdj 2 days ago|
So do you think there will be a better image model in a year?
throw310822 2 days ago|||
I'll bite: no, I don't think so. If the examples are not cherry-picked and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence, so there isn't much space for further improvement. The images don't just look real, they look tasteful: the model is not just generating a credible image, it's generating one that shows the talent of a good photographer/designer/artist.
Vachyas 2 days ago|||
I'm honestly unsure what could be improved at this point.

Consistency? So it fails less often?

Based on the released images, (especially the one "screenshot" of the Mac desktop) I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (ex. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96 so this image is probably fake")

thevinter 2 days ago|||
There is definitely room for improvement: https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

Especially when it comes to detailed outputs or non-standard prompts.

I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.

vunderba 2 days ago|||
Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.

I experimented with procedural generation of Waldo-style scavenger images using Flux models, with rather disappointing results (unsurprisingly).

Vachyas 2 days ago||||
That's a good example, actually.

If you asked me what I expected, since this one has "thinking", it'd be that it would've thought to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit"

throw310822 2 days ago|||
I wonder if at this point you could just ask the agent to iteratively refine the image in smaller portions.
RobinL 2 days ago||||
I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between the them, to help teach fractions to a child.

It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
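
For comparison, the geometry being asked for is easy to pin down deterministically. Below is a rough sketch (not anything the models do internally) that computes the angular span of each of 10 equal slices with a gap between them and emits a plain SVG. The function names, the 4-degree gap, and the colours are all illustrative assumptions; the gap is angular, so the slices still meet at the centre.

```python
import math


def slice_angles(n: int, gap_deg: float = 4.0) -> list[tuple[float, float]]:
    """Return (start, end) angles in degrees for n equal slices with a gap."""
    if n < 3:
        raise ValueError("need at least 3 slices for this simple arc drawing")
    step = 360.0 / n
    if not 0 <= gap_deg < step:
        raise ValueError("gap must be smaller than one slice")
    # Trim half the gap from each side so every slice has the same width.
    return [(i * step + gap_deg / 2, (i + 1) * step - gap_deg / 2) for i in range(n)]


def pizza_svg(n: int = 10, radius: float = 100.0, gap_deg: float = 4.0) -> str:
    """Draw n equal wedges as SVG paths around a common centre."""
    c = radius + 10  # centre coordinate, leaving a small margin
    paths = []
    for start, end in slice_angles(n, gap_deg):
        x1 = c + radius * math.cos(math.radians(start))
        y1 = c + radius * math.sin(math.radians(start))
        x2 = c + radius * math.cos(math.radians(end))
        y2 = c + radius * math.sin(math.radians(end))
        # Centre -> rim point -> arc along the rim -> back to centre.
        paths.append(
            f'<path d="M{c:.2f},{c:.2f} L{x1:.2f},{y1:.2f} '
            f'A{radius:.2f},{radius:.2f} 0 0 1 {x2:.2f},{y2:.2f} Z" fill="orange"/>'
        )
    size = 2 * c
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size:.0f}" height="{size:.0f}">'
        + "".join(paths)
        + "</svg>"
    )
```

Writing `pizza_svg()` to a `.svg` file gives a reference image to compare a generated one against.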

jinushaun 2 days ago||||
Cost? Speed?
vunderba 2 days ago|||
> I'm honestly unsure what could be improved at this point.

That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.

Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.

Here's one I just came up with:

  A Mercator projection of earth where the land/oceans are inverted. (aka land = ocean, and oceans = land)
jumploops 2 days ago||
Looks like analog clocks work well enough now; however, it still struggles with left-handed people.

Overall, quite impressed with its continuity and agentic (i.e. research) features.

sumitkumar 1 day ago|
prompt: create a qr code to https://www.anthropic.com

response: https://chatgpt.com/backend-api/estuary/content?id=file_0000...

result: FAIL
