That said, I am surprised Seedream 4.0 beat it in these tests.
Google is so weirdly non-integrated.
https://blog.google/technology/ai/nano-banana-google-product...
> Google is so weirdly non-integrated.
Where by *try gemini* non-integrated *have you tried gemini* you mean *gemini is here* they shove *use gemini* Gemini into every single product they have?
OP here. While Seedream did have the edge in adherence, it also tends to introduce slight (but noticeable) color grading changes. It's not a huge deal for me, but it might be for other people depending on their goals, in which case Nano Banana would be the better choice.
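If you want to sanity-check that drift on your own images, the crudest measure is the per-channel mean shift between source and output. A rough sketch (filenames are placeholders):

    # Crude check for color-grading drift: per-channel mean shift
    # between the source image and the edited output.
    import numpy as np
    from PIL import Image

    src = np.asarray(Image.open("source.png").convert("RGB"), dtype=np.float64)
    out = np.asarray(Image.open("edited.png").convert("RGB"), dtype=np.float64)

    # Shift in 0-255 units per channel; near-zero means the edit left
    # the overall grading alone, while a global shift of even a few
    # units is already visible to the eye.
    shift = out.mean(axis=(0, 1)) - src.mean(axis=(0, 1))
    print(dict(zip("RGB", shift.round(2))))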
Seedream 4.0 is somewhat slept on for offering 4K output at the same cost as Nano Banana. It's not as great at perfect 1:1 edits, but its aesthetics are much better and it's significantly more reliable in production for me.
Models with LLM backbones/omni-modal models are not rare anymore; even Qwen Image Edit is available with open weights.
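For anyone curious about the open-weights route, local inference looks roughly like this. The pipeline class and model id are from memory, so double-check the Qwen-Image-Edit model card before relying on it:

    # Rough local-inference sketch with diffusers; class name, model id,
    # and defaults are recalled from memory, not verified.
    import torch
    from diffusers import QwenImageEditPipeline
    from PIL import Image

    pipe = QwenImageEditPipeline.from_pretrained(
        "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
    ).to("cuda")

    edited = pipe(
        image=Image.open("input.png").convert("RGB"),
        prompt="remove the coffee cup from the table",
    ).images[0]
    edited.save("edited.png")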
I've been using Nano Banana quite a lot, and I know that it absolutely struggles at exterior architecture and landscaping. Getting it to add or remove things like curbs, walkways, and gutters, or to match colors, is almost futile.
I think this was fairly predictable, but as engineering improvements keep happening and the prompt adherence rate tightens up we're enjoying a wild era of unleashed creativity.
If I were to make an image editing app, this would be the model I'd choose.
E.g. Gemini 2.5 Flash is given extreme leeway with how much it edits the image and changes the style in "Girl with a Pearl Earring", while OpenAI's gpt-image-1 does a (comparatively) much better job yet is still declared failed after 8 attempts. That's fewer attempts than Seedream 4 got (and it passed), and less than half the attempts given to OmniGen2 (which still looks way farther off in comparison).
Even so, Gemini would lose by 1, but I found that I would often choose it as the winner (especially, say, The Wave surfer). Would love to see an x/10 instead of pass/fail.
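A toy illustration of why a grade plus the attempt count says more than pass/fail with uneven attempt budgets (the numbers and the retry penalty are made up):

    # Hypothetical scores; the 10%-per-retry penalty is arbitrary.
    runs = {
        "model_a": {"quality": 9, "attempts": 8},
        "model_b": {"quality": 7, "attempts": 1},
    }
    for name, r in runs.items():
        adjusted = r["quality"] / (1 + 0.1 * (r["attempts"] - 1))
        print(f"{name}: raw {r['quality']}/10, adjusted {adjusted:.1f}")
    # model_a: raw 9/10, adjusted 5.3
    # model_b: raw 7/10, adjusted 7.0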
Still, to my eye, AI-generated images feel a bit off when worked into real-world photographs.
George's hair, for example, looks over the top, or brushed on.
The tree added to the photo of the person sleeping on the ground... the tree looks plastic, or too homogenized.
It's mostly because image model size and required compute for both training and inference have grown faster than self-hosted compute capability for hobbyists. Sure, you can run Flux Kontext locally, but if you have to use a heavily quantized model and wait forever for the generation to actually run, the economics are harder to justify. That's not counting the "you can generate images from ChatGPT for free" factor.
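To put numbers on the model-size point: a back-of-envelope for the weights alone, assuming a ~12B-parameter editor (my ballpark, not an official figure), and ignoring activations, text encoders, and the VAE:

    # VRAM needed just to hold the weights at different quantization
    # levels; real usage is higher once activations are counted.
    def weight_vram_gb(params_billion: float, bits: int) -> float:
        return params_billion * 1e9 * bits / 8 / 1e9

    for bits in (16, 8, 4):  # fp16, int8, 4-bit (nf4-style)
        print(f"{bits:>2}-bit: ~{weight_vram_gb(12, bits):.0f} GB")
    # -> 24 GB, 12 GB, 6 GB: only the 4-bit build fits a typical
    #    8-12 GB hobbyist card, hence the heavy quantization.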
> George's hair, for example, looks over the top, or brushed on.
IMO, the judge was being too generous with the passes for that test. The only one that really passes is Gemini 2.5 Flash Image:
Flux Kontext: In addition to the hair looking too slick, it does not match the VHS-esque color grading of the image.
Qwen-Image-Edit: The hair is too slick and the sharpness/saturation of the face unnecessarily increases.
Seedream 4: Color grading of the entire image changes, which is the case with most of the Seedream 4 edits shown in this post, and why I don't like it.
The economics 1000% do not justify me owning a GPU to do this. I just happen to own one.
My use case: an image of a cartoon character, holding an object and looking at it. I wanted to edit it so that the character no longer has the object in her hand and is now looking toward the camera.
Result with Nano Banana: on the first pass it only removed the object the character was holding; there was no change in her eyeline, and she was still looking down at her now-empty hand. A second prompt explicitly asked it to change the eyeline to look at the camera. Unsuccessful. A third attempt asked for the character to look toward the ceiling. Success, but an unusable edit, since I wanted her looking at the camera.
Result with Reve: on the first attempt it gave me 4 options, and all 4 were usable. It not only removed the object and changed the character's eyeline to look at the camera, it also adjusted her posture so the empty hands were appropriately positioned. And since the character is now in a different situation (sans the object that was holding her attention), Reve posed her in different ways that were very appropriate, which I hadn't thought to prompt for (maybe because my focus was on the immediate need: object removal and the eyeline change).
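For reference, the Nano Banana round trip is a single call through the Gemini API. A minimal sketch; the model id and response plumbing are from memory, so verify against the current google-genai docs, and the filenames are mine:

    # Send the source image plus an edit instruction; the edited image
    # comes back as an inline binary part.
    from google import genai
    from PIL import Image

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    resp = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[
            Image.open("character.png"),
            "Remove the object from her hand and make her look at the camera.",
        ],
    )
    for part in resp.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("edited.png", "wb") as f:
                f.write(part.inline_data.data)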
On a little more digging I found this writeup, which is going to make me sign up for their product.
Some might critique the prompts and say this or that would have done better, but they were the kind of prompts your dad would type in, not knowing how to push the right buttons.