ChatGPT Images 2.0 - Hacker News

Posted by wahnfrieden 20 hours ago

ChatGPT Images 2.0(openai.com)

Livestream: https://openai.com/live/

System card: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

898 points | 750 comments | page 2

skybrian 15 hours ago|

This time it passed the piano keyboard test:

https://chatgpt.com/s/m_69e7ffafbb048191b96f2c93758e3e40

But it screwed up when attempting to label middle C:

https://chatgpt.com/s/m_69e8008ef62c8191993932efc8979e1e

Edit: it did fix it when asked.

vunderba 15 hours ago|

When NB 2 came out I actually had to increase the difficulty of the piano test - reversing the colors of all the accidentals and the naturals, and it still managed it perfectly.

https://mordenstar.com/other/nb-pro-2-tests

porphyra 16 hours ago||

The improvement in Chinese text rendering is remarkable! I still found some typos in the Chinese sample pic about Wuxi though. For example the 笼 in 小笼包 was written incorrectly. And the "极小中文也清晰可读" section contains even more typos, although it's still legible. Still, truly amazing progress. Vastly better than any previous image generation model.

Lucasoato 14 hours ago|

Is this even better than Chinese models? I suppose they focus much more on that aspect, simply because their training data might include many more examples of Chinese text.

Ladioss 6 hours ago||

Maybe they just use Qwen Image under the hood ;p

Lucasoato 1 hour ago||

It wouldn’t surprise me at this point ahaha

justani 9 hours ago||

I have a few cases where nano banana fails all the time, and even gpt image 2 fails on them.

A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...

Create a dot grid containing a rectangle covering 4 dots horizontally and 3 dots vertically - https://chatgpt.com/share/69e85e4b-252c-83e8-b25f-416984cf30...

One where Nano banana fails but gpt image 2 worked: create a grid from 1 to 100 and in that grid put a snake, with its head at 75 and tail at 31 - https://chatgpt.com/share/69e85e8b-2a1c-83e8-a857-d4226ba976...

teruakohatu 8 hours ago|

> A 3 * 3 cube made out of small cubes, with a small 2 * 2 cube removed from it - https://chatgpt.com/share/69e85df6-5840-83e8-b0e9-3701e92332...

It is a little ambiguous (what exactly is a "3x3 cube") but I tried a bunch of variations and I simply could not get any Gemini models to produce the right output.

sigmoid10 6 hours ago||

You can do it, but it takes two steps. Code is generally better for creating such strict geometry (even from ambiguous prompts), while the image diffusion model is great for tuning style and lighting.

https://chatgpt.com/share/69e88b5c-8628-83eb-8851-f587ef2c95...
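
A minimal sketch of what the code step might look like, assuming the prompt means a 3x3x3 cube with a 2x2x2 block removed from one corner (other readings are possible, since the prompt is ambiguous). The function name and the choice of corner are hypothetical, not taken from the linked chat; the resulting coordinates could be passed to any renderer before an image model is used for styling.

```python
from itertools import product

def cube_with_corner_removed(size=3, hole=2):
    """Return unit-cube coordinates of a size^3 cube minus a hole^3 corner block.

    Assumption: the removed block sits at the corner with the largest
    x, y and z coordinates. The prompt does not say which corner.
    """
    full = set(product(range(size), repeat=3))
    removed = set(product(range(size - hole, size), repeat=3))
    return full - removed

cells = cube_with_corner_removed()
print(len(cells))  # 27 - 8 = 19 small cubes remain

# Print each horizontal layer, bottom to top: '#' is a cube, '.' is empty.
for z in range(3):
    print(f"layer z={z}")
    for y in range(3):
        print("".join("#" if (x, y, z) in cells else "." for x in range(3)))
```

Because the geometry is computed rather than drawn, the cube count and the position of the gap are exact by construction.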

schneehertz 13 hours ago||

Generating a 4096x4096 image with gemini-3.1-flash-image-preview consumes 2,520 tokens, which is equivalent to $0.151 per image.

Generating a 3840x2160 image with gpt-image-2 consumes 13,342 tokens, which is equivalent to $0.4 per image.

This model is more than twice as expensive as Gemini.
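
The arithmetic can be checked with a short sketch. It uses only the figures quoted above; the per-megapixel and per-token numbers are derived here, not taken from either vendor's price list, and the two images have different resolutions, so the comparison is approximate.

```python
# Figures quoted in the comment above: (width, height, tokens, USD per image).
QUOTED = {
    "gemini-3.1-flash-image-preview": (4096, 4096, 2_520, 0.151),
    "gpt-image-2": (3840, 2160, 13_342, 0.40),
}

def summarize(width, height, tokens, usd):
    """Derive comparison metrics from one quoted data point."""
    megapixels = width * height / 1_000_000
    return {
        "usd_per_image": usd,
        "usd_per_megapixel": usd / megapixels,
        "usd_per_million_tokens": usd / tokens * 1_000_000,
    }

stats = {name: summarize(*row) for name, row in QUOTED.items()}
gemini = stats["gemini-3.1-flash-image-preview"]
gpt = stats["gpt-image-2"]

per_image_ratio = gpt["usd_per_image"] / gemini["usd_per_image"]
per_megapixel_ratio = gpt["usd_per_megapixel"] / gemini["usd_per_megapixel"]

for name, s in stats.items():
    print(f"{name}: ${s['usd_per_image']:.3f}/image, "
          f"${s['usd_per_megapixel']:.4f}/MP, "
          f"${s['usd_per_million_tokens']:.2f}/M tokens")
print(f"per-image ratio: {per_image_ratio:.2f}x")
print(f"per-megapixel ratio: {per_megapixel_ratio:.2f}x")
```

With these inputs the per-image ratio comes out at roughly 2.6x, and the gap widens to roughly 5x per megapixel because the quoted Gemini image has about twice as many pixels.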

strangescript 13 hours ago|

this is apples to oranges, the flash version versus a full version

this thing is like 5x better than flash at fine grain detail

ac29 13 hours ago|||

Google's naming might be misleading, currently 3.1 flash image outperforms the available pro version (3.0 pro) on most benchmarks: https://deepmind.google/models/model-cards/gemini-3-1-flash-...

altcognito 13 hours ago|||

.40 cents for high quality output is insanely cheap

it is only going to get cheaper

eclipticplane 13 hours ago|||

> .40 cents

Warning: Verizon math ahead.

tfehring 12 hours ago||

In case anyone is unfamiliar with one of the most infuriating phone calls of all time: https://www.youtube.com/watch?v=MShv_74FNWU

ai_fry_ur_brain 2 hours ago|||

You people keep saying this and token prices keep doubling. The cope of the gambler is truly one to marvel at.

Oarch 4 hours ago||

Every groundbreaking new AI release feels like a volley of cannonfire towards the soul. Oof.

arkensaw 4 hours ago|

no AI could ever be so poetic. nice

TrackerFF 5 hours ago||

This is the first model I've used for mockups where I feed reference images, and they truly look real and good enough for pro use. I'm impressed.

ripped_britches 3 minutes ago|

Mockups for what type of work? Web or mobile UI?

dktp 18 hours ago||

One interesting thing I found comparing OpenAI and Gemini image editing: Gemini rejects anything involving a well-known person. Anything. OpenAI was happy to edit and change every time I tried.

I have a side project where I want to display standup comedies. I thought I could edit standup comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any standup comedy poster involving a well-known human. OpenAI does not care and is happy to edit away.

Melatonic 18 hours ago||

How does it determine they are well known and not just similar looking?

yreg 15 hours ago|||

Gemini often rejects photos of random people (even ones it generated itself) because it thinks they look too similar to some well known person.

dktp 18 hours ago||||

I don't know tbh. I've tried it on 10-20 standups of various levels of fame and Gemini refuses every time.

Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it

arjie 18 hours ago|||

It's not super deterministic but it didn't fail once on my attempts. See: https://imgur.com/a/james-acaster-cold-lasagne-1R7fpzQ

dktp 17 hours ago||

Very interesting. It fails every single time for me. I'm in Germany, maybe Google is stricter here?

See https://imgur.com/a/77BRDQv

arjie 17 hours ago||

That makes sense to me. I just Googled around like a fool and got here https://en.wikipedia.org/wiki/Personality_rights#Germany

It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.

jliptzin 16 hours ago||

Yea, especially when they know all that work will be completely pointless in a few years when open source / local models will be just as good and won't have any legal limitations, so people will be generating fake images of famous people like crazy with nothing stopping them

Melatonic 17 hours ago|||

What if you change the prompt to tell it specifically it's not a famous person? Or try it without text?

BoorishBears 12 hours ago|||

There are models specifically for detecting well known people https://docs.aws.amazon.com/rekognition/latest/dg/celebritie...

vunderba 15 hours ago||

Are you using Google Gemini directly? I've found the Vertex API seems to be significantly less strict.

6thbit 19 hours ago||

System card link with safety details https://deploymentsafety.openai.com/chatgpt-images-2-0

direct pdf https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

dang 14 hours ago|

Link added to toptext. Thanks!

bsenftner 3 hours ago||

My problem with all of this is the terrible education everyone has: they cannot discriminate images from art, nor art from communication, and if they could they would realize that the points this entire debate hinges on are a manipulation to create people who will not help themselves with the latest technologies. But explaining it makes people angry, because they either think I'm trying to manipulate them, or they fall into despair when they realize the magnitude of this crime.

amunozo 18 hours ago|

This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.

aulin 9 hours ago||

Well that was probably the most important general skill even before this.

sigmoid10 6 hours ago||

Perhaps for managers. But for everyone actually doing something, you used to need technical proficiency with tools. Now AI is becoming the universal tool.

bamboozled 5 hours ago|||

In other words, communication is an important skill.

echelon 16 hours ago||

> I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.

Without question.

AI will be indistinguishable from having a team. Communicating clearly has always mattered and always will.

This, however, is even stronger. Because you can program and use logic in your communications.

We're going to collectively develop absolutely wild command over instruction as a society. That's the skill to have.

adamhartenz 13 hours ago|||

How can AI be the amazing thing you say it is, but also too stupid to understand unless you get really good at communicating? Wouldn't better AI just mean it understands your ramblings better?

pickleRick243 12 hours ago|||

It's fine if the "rambling" is logically coherent. So the communication ability isn't really about expressing your thoughts eloquently, but just effectively and clearly. Run-on sentences and trains of thought are fine as long as you are saying something meaningful. But no AI will be able to read your mind and know exactly what you mean by "make really cool looking website, not lame please, also nice colors, not boring". Declarative programming through natural language will become incredibly powerful.

jstanley 7 hours ago||||

It can't grab out information that isn't there. If your ramblings are ambiguous then it has to make a guess.

raincole 12 hours ago|||

Many humans are great at their expertise but bad at communicating. How?

yreg 15 hours ago|||

On the other hand LLMs are getting very good at understanding poorly constructed instructions as well.

So being able to express oneself clearly in a structured way may not be such an edge.

amunozo 9 hours ago||

Yes, I agree, but as one of the other comments says, they are not able to read your mind. So even if the structure and style are not clear, you must be able to express what you want.

yreg 4 hours ago||

Certainly. I just think "expressing thoughts in words clearly" might in the end turn out to be something different from what we humans consider clear.

For example, long unstructured rambling might turn out to be a non-issue, while as a human I would rank such a message low no matter how good it is in other informational aspects.

amunozo 2 hours ago||

That's true. I feed Codex some very long .md files that I use as a kind of work diary and that are a pain to use, and it turns them into something very much usable. Writing your thoughts is important even if done carelessly.
