Qwen-Image-2.0: Professional infographics, exquisite photorealism

Posted by meetpateltech 15 hours ago

Qwen-Image-2.0: Professional infographics, exquisite photorealism(qwen.ai)

364 points | 158 comments | page 2

fguerraz 14 hours ago|

I found the horse revenge-porn image at the end quite disturbing.

engcoach 9 hours ago||

It's the year of the horse in their zodiac. The (translated) prompt is wild:

""" A desolate grassland stretches into the distance, its ground dry and cracked. Fine dust is kicked up by vigorous activity, forming a faint grayish-brown mist in the low sky. Mid-ground, eye-level composition: A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat, his short, messy dark brown hair plastered to his forehead, his thick beard slightly damp; he wears a badly worn, grey-green medieval-style robe, the fabric torn and stained with mud in several places, a thick hemp rope tied around his waist, and scratched ankle-high leather boots; his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight. The background is a range of undulating grey-blue mountains, their outlines stark, their peaks hidden beneath a low-hanging, leaden-grey, cloudy sky. The thick clouds diffuse a soft, diffused light, which pours down naturally from the left front at a 45-degree angle, casting clear and voluminous shadows on the horse's belly, the back of the man's hands, and the cracked ground. The overall color scheme is strictly controlled within the earth tones: the horsehair is warm brown, the robe is a gradient of gray-green-brown, the soil is a mixture of ochre, dry yellow earth, and charcoal gray, the dust is light brownish-gray, and the sky is a transition from matte lead gray to cool gray with a faint glow at the bottom of the clouds. 
The image has a realistic, high-definition photographic quality, with extremely fine textures—you can see the sweat on the horse's neck, the wear and tear on the robe's warp and weft threads, the skin pores and stubble, the edges of the cracked soil, and the dust particles. The atmosphere is tense, primitive, and full of suffocating tension from a struggle of biological forces. """

embedding-shape 13 hours ago|||

I think they call it "horse riding a human" which could have taken two very different directions, and the direction the model seems to have taken was the least worst of the two.

wongarsu 13 hours ago|||

At first I thought it's a clever prompt because you see which direction the model takes it, and whether it "corrects" it to the more common "human riding a horse" similar to the full wine glass test.

But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.

"... A muscular, robust adult brown horse standing proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man ... and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat ... his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight ..."

embedding-shape 12 hours ago||

> But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.

Yeah, as they go through their workflow earlier in the blog post, that prompt they share there seems to be generated by a different input, then that prompt is passed to the actual model. So the workflow is something like "User prompt input -> Expand input with LLMs -> Send expanded prompt to image model".

So I think "horse riding a human" is the user prompt, which gets expanded to what they share in the post, which is what the model actually uses. This is also how they've presented all their previous image models, by passing user input through an LLM for "expansion" first.

Seems poorly thought out not to make it 100% clear what the actual humanly-written prompt is though, not sure why they wouldn't share that upfront.
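To make the workflow I'm describing concrete, here is a minimal sketch of "user prompt -> expand with an LLM -> send expanded prompt to image model". Everything in it is hypothetical: `generate_image`, `fake_expand` and `fake_render` are names I made up for illustration, not Qwen's actual API. The one design point it shows is keeping the human-written prompt next to the expanded one, so both can be published.

```python
from typing import Callable, Dict

def generate_image(
    user_prompt: str,
    expand: Callable[[str], str],    # hypothetical LLM "prompt expansion" step
    render: Callable[[str], bytes],  # hypothetical text-to-image step
) -> Dict[str, object]:
    """Run the two-stage pipeline and keep both prompts for later inspection."""
    expanded = expand(user_prompt)
    image = render(expanded)
    return {
        "user_prompt": user_prompt,   # what the human actually typed
        "expanded_prompt": expanded,  # what the image model actually saw
        "image": image,
    }

# Stand-ins so the sketch runs without any model; real calls would go here.
def fake_expand(prompt: str) -> str:
    return f"{prompt}. Desolate grassland, eye-level composition, earth tones."

def fake_render(prompt: str) -> bytes:
    return f"<image rendered from {len(prompt)} characters>".encode()

result = generate_image("horse riding a human", fake_expand, fake_render)
print(result["user_prompt"])
print(result["expanded_prompt"])
```

With both fields in the output, nobody has to guess which of the two prompts a blog post is quoting.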

chakintosh 10 hours ago|||

Is it related to "Mr Hands" ?

blitzar 13 hours ago||

Won't someone think of the horses?

skerit 14 hours ago||

> Qwen-Image-2.0 not only accurately models the “riding” action but also meticulously renders the horse’s musculature and hair > https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwe...

What the actual fuck

wongarsu 13 hours ago||

For reference, below is the prompt translated (with my highlighting of the part that matters). They did very much ask for this version of "horse riding a man", not the "horse sitting upright on a crawling human" version

---

A desolate grassland stretches into the distance, its ground dry and cracked. Fine dust is kicked up by vigorous activity, forming a faint grayish-brown mist in the low sky.

Mid-ground, eye-level composition: A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat, his short, messy dark brown hair plastered to his forehead, his thick beard slightly damp; he wears a badly worn, grey-green medieval-style robe, the fabric torn and stained with mud in several places, a thick hemp rope tied around his waist, and scratched ankle-high leather boots; his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight.

The background is a range of undulating grey-blue mountains, their outlines stark, their peaks hidden beneath a low-hanging, leaden-grey, cloudy sky. The thick clouds diffuse a soft, diffused light, which pours down naturally from the left front at a 45-degree angle, casting clear and voluminous shadows on the horse's belly, the back of the man's hands, and the cracked ground.

The overall color scheme is strictly controlled within the earth tones: the horsehair is warm brown, the robe is a gradient of gray-green-brown, the soil is a mixture of ochre, dry yellow earth, and charcoal gray, the dust is light brownish-gray, and the sky is a transition from matte lead gray to cool gray with a faint glow at the bottom of the clouds.

The image has a realistic, high-definition photographic quality, with extremely fine textures—you can see the sweat on the horse's neck, the wear and tear on the robe's warp and weft threads, the skin pores and stubble, the edges of the cracked soil, and the dust particles. The atmosphere is tense, primitive, and full of suffocating tension from a struggle of biological forces.

badhorseman 12 hours ago||

The significance of the hemp rope is that it is a symbol of mourning and the loss of one's decedent.

embedding-shape 12 hours ago||

I like how sometimes I get angry at a LLM for not understanding what I meant, but then I realize that I just forgot to mention it in the context. It's fun to see the same thing happen in humans reading websites too, where they don't understand the context yet react with strong feelings anyways.

Deukhoofd 15 hours ago||

The text rendering is quite impressive, but is it just me, or do all these generated 'realistic' images have a distinctly uncanny feel to them? I can't quite put my finger on what it is, but they just feel off to me.

elorant 14 hours ago||

The lighting is wrong, that's what's telling to me. They look too crisp. No proper shadows, everything looks crystal clear.

techpression 14 hours ago||

It’s the HDR era all over again, where people edited their photos to lack all contrast and just be ultra flat.

finnjohnsen2 14 hours ago|||

I agree. They make me nauseous. The same kind of light nausea as car sickness.

I assume our brains are used to stuff which we don't notice consciously, and reject very mild errors. I've stared at the picture a bit now and the finger holding the balloon is weird. The out-of-place snowman feels weird. If you follow the background blur around, it isn't at the same depth everywhere. Everything that reflects has reflections that I can't see in the scene.

I don't feel good staring at it now so I had to stop.

jbl0ndie 14 hours ago||

Sounds like you're describing the uncanny valley https://en.wikipedia.org/wiki/Uncanny_valley

brookst 12 hours ago|||

Everything is weightless. When real people stand and gesture there’s natural muscle use, hair and clothing drape, papers lay flat on surfaces.

likium 14 hours ago|||

At least for the real life pictures, there’s no depth of field. Everything is crystal clear like it’s composited.

derefr 14 hours ago|||

> like it’s composited

Like focus stacking, specifically.

I’m always surprised when people bother to point out more-subtle flaws in AI images as “tells”, when the “depth-of-field problem” is so easily spotted, and has been there in every AI image ever since the earliest models.

Mashimo 14 hours ago||

I had no problems getting images with blurry background with the appropriate prompts. Something like "shallow depth of fields, bokeh, DSLR" can lead to good results. https://cdn.discordapp.com/attachments/1180506623475720222/1... [0]

But I found that that results in more professional looking images, and not more realistic photos.

Adding something like "selfy, Instagram, low resolution, flash" can lead to a .. worse image that looks more realistic.

[0] I think I did this one with z image turbo on my 4060 ti

afro88 14 hours ago||

The blur isn't correct though. Like the amount of blur is wrong for the distance, zoom amount etc. So the depth of field is really wrong even if it conforms to "subject crisp, background blurred"
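A rough way to put numbers on "the amount of blur is wrong for the distance": the standard thin-lens blur circle (circle of confusion) formula gives the size of the blur disk on the sensor. This is only a sketch; the function name and the example lenses and distances are my own choices for illustration, not measurements from any of the images discussed.

```python
def blur_circle_mm(focal_mm: float, f_number: float,
                   focus_dist_mm: float, object_dist_mm: float) -> float:
    """Diameter (mm, on the sensor) of the blur disk for a point at
    object_dist_mm when a thin lens is focused at focus_dist_mm."""
    aperture_mm = focal_mm / f_number                       # entrance pupil diameter
    magnification = focal_mm / (focus_dist_mm - focal_mm)   # at the focus plane
    return (aperture_mm * magnification
            * abs(object_dist_mm - focus_dist_mm) / object_dist_mm)

# Same scene (subject at 2 m, background at 10 m), two illustrative lenses.
normal = blur_circle_mm(50, 1.8, 2_000, 10_000)
tele = blur_circle_mm(200, 4.0, 2_000, 10_000)
print(f"50mm f/1.8 background blur:  {normal:.2f} mm")
print(f"200mm f/4  background blur:  {tele:.2f} mm")
```

The point is that blur is tied to focal length, aperture and both distances at once, so a background can be "blurred" and still be the wrong amount of blur for the framing.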

derefr 5 hours ago||

Exactly.

My personal mechanistic understanding of diffusion models is that, "under the hood", the core thing they're doing, at every step and in every layer, is a kind of apophenia — i.e. they recognize patterns/textures they "know" within noise, and then they nudge the noise (least-recognizable pixels) in the image toward the closest of those learned patterns/textures, "snapping" those pixels into high-activation parts of their trained-in texture-space (with any text-prompt input just adding a probabilistic bias toward recognizing/interpreting the noise in certain parts of the image as belonging to certain patterns/textures.)

I like to think of these patterns/textures that diffusion models learn as "brush presets", in the Photoshop sense of the term: a "brush" (i.e. a specific texture or pattern), but locked into a specific size, roughness, intensity, rotation angle, etc.

Due to the way training backpropagation works (and presuming a large-enough training dataset), each of these "brush presets" that a diffusion model learns, will always end up learned as a kind of "archetype" of that brush preset. Out of a collection of examples in the training data where uses of that "brush preset" appear with varying degrees of slightly-wrong-size, slightly-wrong-intensity, slightly-out-of-focus-ness, etc, the model is inevitably going to learn most from the "central examples" in that example cluster, and distill away any parts of the example cluster that are less shared. So whenever a diffusion model recognizes a given one of its known brush presets in an image and snaps pixels toward it, the direction it's moving those pixels will always be toward that archetypal distilled version of that brush preset: the resultant texture in perfect focus, and at a very specific size, intensity, etc.

This also means that diffusion models learn brushes at distinctively-different scales / rotation angles / etc as entirely distinct brush presets. Diffusion models have no way to recognize/repair toward "a size-resampled copy of" one of their learned brush presets. And due to this, diffusion models will never learn to render in details small enough that the high-frequency components of their recognizable textural-detail would be lost below the Nyquist floor (which is why they suck so much at drawing crowds, tiny letters on signs, etc.) And they will also never learn to recognize or reproduce visual distortions like moire or ringing, that occur when things get rescaled to the point that beat-frequencies appear in their high-frequency components.

Which means that:

- When you instruct a diffusion model that an image should have "low depth-of-field", what you're really telling it is that it should use a "smooth-blur brush preset" to paint in the background details.

- And even if you ask for depth-of-field, everything in what a diffusion model thinks of as the "foreground" of an image will always have this surreal perfect focus, where all the textures are perfectly evident.

- ...and that'll be true, even when it doesn't make sense for the textures to be evident at all, because in real life, at the distance the subject is from the "camera" in the image, the presumed textures would actually be so small as to be lost below the Nyquist floor at anything other than a macro-zoom scale.

These last two problems combine to create an effect that's totally unlike real photography, but is actually (unintentionally) quite similar to how digital artists tend to texture video-game characters for "tactile legibility." Just like how you can clearly see the crisp texture of e.g. denim on Mario's overalls (because the artist wanted to make it feel like you're looking at denim, even though you shouldn't be able to see those kinds of details at the scaling and distance Mario is from the camera), diffusion models will paint anything described as "jeans" or "denim" as having a crisply-evident denim texture, despite that being the totally wrong scale.

It's effectively a "doll clothes" effect — i.e. what you get when you take materials used to make full-scale clothing, cut tiny scraps of those materials to make a much smaller version of that clothing, put them on a doll, and then take pictures far closer to the doll, such that the clothing's material textural detail is visibly far larger relative to the "model" than it should be. Except, instead of just applying to the clothing, it applies to every texture in the scene. You can see the pores on a person's face, and the individual hairs on their head, despite the person standing five feet away from the camera. Nothing is ever aliased down into a visual aggregate texture — until a subject gets distant enough that the recognition maybe snaps over to using entirely different "brush preset" learned specifically on visual aggregate textures.
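For anyone who wants to sanity-check the "lost below the Nyquist floor" claim, here is a back-of-the-envelope sketch. It projects a repeating texture onto the sensor with thin-lens magnification and asks whether one texture period still covers at least two pixels. The texture periods, lens and pixel pitch below are illustrative assumptions, not measured values.

```python
def projected_period_px(texture_period_mm: float, subject_dist_mm: float,
                        focal_mm: float, pixel_pitch_mm: float) -> float:
    """How many sensor pixels one period of a repeating texture spans."""
    magnification = focal_mm / (subject_dist_mm - focal_mm)  # thin-lens approximation
    return texture_period_mm * magnification / pixel_pitch_mm

def is_resolvable(texture_period_mm: float, subject_dist_mm: float,
                  focal_mm: float, pixel_pitch_mm: float) -> bool:
    """Nyquist criterion: a period needs at least 2 samples to be captured."""
    return projected_period_px(texture_period_mm, subject_dist_mm,
                               focal_mm, pixel_pitch_mm) >= 2.0

# Assumed numbers: 50 mm lens, 6 micron pixels, subject about five feet away.
FOCAL_MM, PITCH_MM, FIVE_FEET_MM = 50, 0.006, 1_500

for name, period_mm in [("denim weave (~0.5 mm)", 0.5), ("skin pores (~0.05 mm)", 0.05)]:
    px = projected_period_px(period_mm, FIVE_FEET_MM, FOCAL_MM, PITCH_MM)
    ok = is_resolvable(period_mm, FIVE_FEET_MM, FOCAL_MM, PITCH_MM)
    print(f"{name}: {px:.2f} px per period, resolvable={ok}")
```

Under these assumptions the pores fall well below two pixels per period at five feet, which is the case where a real camera would record an aggregate texture rather than individual pores.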

vunderba 9 hours ago||||

Which is pretty amusing - because it's the exact opposite problem that BFL had with the original Flux model - every single image looked like it was taken with a 200mm f/4.

albumen 14 hours ago|||

Every photoreal image on the demo page has depth of field, it’s just subtle.

BoredPositron 15 hours ago|||

Qwen has always suffered from its subpar RoPE implementation, and Qwen 2 seems to suffer from it as well. The uncanny feel comes down to the sparsity of text-to-image tokens, and the higher in resolution you go, the worse it gets. It's why you can't take the higher end of the MP numbers seriously, no matter the model. At the moment no model can go to 4K without problems; you will always get high-frequency artifacts.

belter 15 hours ago|||

Agree, looks like the same effect they are applying on YouTube Shorts...

GaggiX 14 hours ago||

For me the only model that can really generate realistic images is nano banana pro (also known as gemini-3-pro-image). Other models are closing the gap, this one is pretty meh in my opinion in realistic images.

Mashimo 14 hours ago||

You can get flux and maybe z-image to do so, but you have to experiment with the prompt a bit. Or maybe get a LoRA to help.

cubefox 14 hours ago||

The examples I saw of z-image look much more realistic than Nano Banana Pro, which is likely using Imagen 4 (plus editing) internally, which isn't very realistic. But Nano Banana Pro has obviously much better prompt alignment than something like z-image.

GaggiX 14 hours ago||

Are you sure you are not confusing nano banana pro with nano banana? z-image still has a bit of an AI look that I do not find with nano banana pro. Example for comparison: https://i.ibb.co/YFtxs4hv/594068364-25101056889517041-340369...

Also Imagen 4 and Nano Banana Pro are very different models.

cubefox 11 hours ago||

In your example, z-image and Nano Banana Pro look basically equally photorealistic to me. Perhaps the NBP image looks a bit more real because it resembles an unstaged smartphone shot with wide angle. Anyway, the difference is very small. I agree the lighting in Flux.2 Pro looks a bit off.

But anyway, realistic environments like a street cafe are not suited to test for photorealism. You have to use somewhat more fantastical environments.

I don't have access to z-image, but here are two examples with Nano Banana Pro:

"A person in the streets of Atlantis, portrait shot." https://i.ibb.co/DgMXzbxk/Gemini-Generated-Image-7agf9b7agf9...

"A person in the streets of Atlantis, portrait shot (photorealistic)" https://i.ibb.co/nN7cTzLk/Gemini-Generated-Image-l1fm5al1fm5...

These are terribly unrealistic. Far more so than the Flux.2 Pro image above.

> Also Imagen 4 and Nano Banana Pro are very different models.

No, Imagen 4 is a pure diffusion model. Nano Banana Pro is a Gemini scaffold which uses Imagen to generate an initial image, then Gemini 3 Pro writes prompts to edit the image for much better prompt alignment. The prompts above are very simple, so there is little for Gemini to alter, so they look basically identical to plain Imagen 4. Both pictures (especially the first) have the signature AI look of Imagen 4, which is different from other models like Imagen 3.

By the way, here is GPT Image 1.5 with the same prompts:

"A person in the streets of Atlantis, portrait shot." https://i.ibb.co/Df8nDHFL/Chat-GPT-Image-10-Feb-2026-14-17-1...

"A person in the streets of Atlantis, portrait shot (photorealistic)" https://i.ibb.co/Nns4pdGX/Chat-GPT-Image-10-Feb-2026-14-17-2...

The first is very fake and the second is a strong improvement, though still far from the excellent cafe shots above (fake studio lighting, unrealistic colors etc).

GaggiX 10 hours ago||

>In your example, z-image and Nano Banana Pro look basically equally photorealistic to me

I disagree, the nano banana pro result is in a completely different league compared to flux.2 and z-image.

>But anyway, realistic environments like a street cafe are not suited to test for photorealism

Why? It's the perfect settings in my opinion.

Btw I don't think you are using nano banana pro, probably standard nano banana, I'm getting this from your prompt: https://i.ibb.co/wZHx0jS9/unnamed-1.jpg

>Nano Banana Pro is a Gemini scaffold which uses Imagen to generate an initial image, then Gemini 3 Pro writes prompts to edit the image for much better prompt alignment.

First of all, how would you know the architecture details of gemini-3-pro-image? Second of all, how can the model modify the image if gemini itself is just rewriting the prompt (like old chatgpt+dalle)? imagen 4 is just a text-to-image model, not an editing one. It doesn't make sense; nano banana pro can edit images (like the ones you can provide).

cubefox 10 hours ago||

> I disagree, nano banana pro result is on a completely different league.

I strongly disagree. But even if you are right, the difference between the cafe shots and the Atlantis shots is clearly much, much larger than the difference between the different cafe shots. The Atlantis shots are super unrealistic. They look far worse than the cafe shots of Flux.2 Pro.

> Why? It's the perfect settings in my opinion

Because it's too easy obviously. We don't need an AI to make fake realistic photos of realistic environments when we can easily photograph those ourselves. Unrealistic environments are more discriminative because they are much more likely to produce garbage that doesn't look photorealistic.

> Btw I don't think you are using nano banana pro, I'm getting this from your prompt: https://i.ibb.co/wZHx0jS9/unnamed-1.jpg

I'm definitely using Nano Banana Pro, and your picture has the same strong AI look to it that is typical of NBP / Imagen 4.

> First of all how should you know the architecture details of gemini-3-pro-image, second of all how the model can modify the image if gemini itself is just rewriting the prompt (like old chatgpt+dalle), imagen 4 is just a text-to-image model, not an editing one, it doesn't make sense, nano banana pro can edit images (like the ones you can provide).

There were discussions about it previously on HN. Clearly NBP is using Gemini reasoning, and clearly the style of NBP strongly resembles Imagen 4 specifically. There is probably also a special editing model involved, just like in Qwen-Image-2.0.

GaggiX 9 hours ago||

>Because it's too easy obviously.

Still, the vast majority of models fail at delivering an image that looks real. I want realism for a realistic setting; if it can't do that, then what's the point? Of course you can always pay people and equipment to make the perfect photo for you ahah

If the z-image turbo image looks as good as the nano banana pro one to you, you are probably so used to slop that a model that does not produce obvious artifacts like super shiny skin is immediately indistinguishable from a real image (like the nano banana pro one, which to me looks as real as a real photo). And yes, I'm ignoring the fact that in the z-image-turbo one the cup is too large and the bag is inside the chair. Z-image is good (in particular given its size) but not as good.

cubefox 9 hours ago||

It seems you are ignoring the fact that the NBP Atlantis pictures look much, much worse than the z-image picture of the cafe. They look far more like AI slop. (Perhaps the Atlantis prompt would look even worse with z-image, I don't know.)

GaggiX 9 hours ago||

I have generated my own using your prompt and posted it in the previous comment. You haven't posted a z-image one of Atlantis. I'm not at home to try, but I have trained a LoRA for z-image (it's a relatively lightweight model). I know the model; it's not as good as nano banana pro. Use what you prefer.

cubefox 8 hours ago||

> I have generated my own using your prompt and post it in the previous comment.

Yes, and it has a very unrealistic AI look to it. That was my point.

> You haven't posted a z-image one of Atlantis.

Yes, I don't doubt that it might well be just as unrealistic or even worse. I also just tried the Atlantis prompts in Grok (no idea what image model they use internally) and they look somewhat more realistic, though not on cafe level.

ranger_danger 8 hours ago||

When I tried Qwen-Image-2512 I could not even get it to spell correctly. And often the letters would be garbled anyways.

cubefox 14 hours ago||

The complex prompt-following ability and editing are seriously impressive here. They don't seem to be much behind OpenAI and Google, which is backed up by the AI Arena ranking.

goga-piven 14 hours ago||

Why is the only image featuring non-Asian men the one under the horse?

z3dd 14 hours ago||

they explicitly called for that in the prompt

goga-piven 13 hours ago|||

Exactly why did they choose this prompt with a white person and not an Asian person, as in all the other examples?

wtcactus 13 hours ago|||

But why? That image actually puzzled me. Does it have some background context? Some historical legend or something of the like?

joeycodes 13 hours ago|||

It is Lunar New Year season right now, 2026 is year of the horse, there is celebratory horse imagery everywhere in many Asian countries right now, so this image could be interpreted as East trampling West. I have no way to know the intention of the person at Qwen who wrote this, but you can form your own conclusions from the prompt:

A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male...

wtcactus 13 hours ago||

[flagged]

badhorseman 13 hours ago|||

[dead]

andruby 14 hours ago||

Is the problem the position/horse or that Qwen mostly shows asian people?

Do western AI models mostly default to white people?

goga-piven 13 hours ago|||

Well, what if some western models showcase white people in all good-looking images and the only embarrassing image features Asian people? wouldn't that be considered racism?

embedding-shape 12 hours ago||

> and the only embarrassing image

Embarrassing image? I'm white, why would I be embarrassed over that image? It's a computer generated image with no real people in it, how could it be embarrassing for alive humans?

badhorseman 12 hours ago||

[flagged]

embedding-shape 11 hours ago||

Yeah, why would I feel embarrassed over either of those things? I get angry when I see nazi propaganda, feel hopeless sometimes when I see racist caricatures, but never "embarrassed", that wouldn't make much sense. What would I be embarrassed about exactly?

badhorseman 10 hours ago||

Indeed, if one's own race is not being denigrated, one would not feel embarrassed, although one may be embarrassed that racist material was created by their people. If one's own race is being denigrated, then one may indeed feel embarrassment, and perhaps also anger and hopelessness. As for why embarrassment exactly: if the purpose is to degrade by pointing out some reason why the author holds your people in contempt, and you are indeed helpless to stop it, shame and embarrassment are often what is felt.

In another post you talked about people getting mad at the image without context. What context are we missing, exactly? I do not feel ill-informed or angry. But I could indeed be missing something; can you explain the context? If you were to say it's because of the LLM adding more context, then that could be plausible, but why the medieval style and the hemp rope? I know how sensitive the western companies have been about ridding their models of negative racial stereotypes, going as far as to avoid and modify certain training data. Would you accept an LLM producing negative stereotypes, or tending to put one particular racial group into a submissive situation more than others?

I really do find it unlikely that the LLM would just take the prompt "a human male being ridden by a horse", add all those other details, and go straight for a darker, somber tone and expression and a dynamic of domination and submission rather than a more humorous description.

embedding-shape 10 hours ago||

> although one may be embarrassed that racist material was created by their people

Why? I don't see that. Are black people embarrassed if a black person commits a crime, yet not embarrassed if a white person commits a crime? That sounds very contrived to me and not at all how things work in reality.

> If ones own race is being denigrated then one may indeed feel embarrassment

I also don't understand this. Why would every white person feel any sort of embarrassment over images denigrating white people? Feel hate, anger or lots of other emotions, that'd make sense. But I still don't understand why "embarrassment" or shame is even on the table, embarrassment over what exactly? That there are racists?

badhorseman 9 hours ago||

Your posts in this thread have seemingly been in bad faith and have made rather blatant non-sequiturs. The post by 'goga-piven' said that the pictures were embarrassing, not that one should actually feel shame and embarrassment. His meaning, I believe, is that the image is meant to embarrass and humiliate a people, or just portray them contemptibly; that is, to me, clearly his meaning of 'embarrassing image'.

My comment was to try to highlight that this is the point of various racist depictions, and that if one is powerless then this can indeed become an embarrassing shame. Maybe you do not see it that way, but in any kind of bondage that a group of people are subject to, shame and embarrassment will follow, along with many other feelings. I was not saying a white person should be embarrassed, and I don't think 'goga-piven' was either. Rather, the images could be manifestations of contempt or other hostile emotions on the author's part.

>Why? I don't see that. Are black people embarrassed if a black person commits a crime, yet not embarrassed if a white person commits a crime? That sounds very contrived to me and not at all how things work in reality.

I did not make a point about black people being embarrassed at black people committing a crime. I was thinking more of the kind of collective guilt some German people speak of for Nazism. I made no prescriptive claims about the shame or embarrassment, only that these are ways that people do behave.

> I also don't understand this. Why would every white person feel any sort of embarrassment over images denigrating white people? Feel hate, anger or lots of other emotions, that'd make sense. But I still don't understand why "embarrassment" or shame is even on the table, embarrassment over what exactly? That there are racists?

You have subtly changed your position here to one where it's not an absurdity to feel an emotional response to an image that denigrates your people.

Of course this was not the most pressing issue; the more important one would be the intent of the image. You seemed to ignore that part entirely, even though that is the main question. You made claims of missing context in other threads, and I made some preemptive counterarguments. Do tell me a more plausible context if the one I provided is incorrect.

wtcactus 13 hours ago|||

[flagged]

viraptor 13 hours ago||

> they mostly default to black people

You're referring to a case of one version of one model. That's not "mostly" or "default to".

raincole 13 hours ago||

Out of curiosity I just tried this prompt:

> Generate a photo of the founding fathers of a future, non-existing country. Five people in total.

with Nano Banana Pro (the SOTA). I tried the same prompt 5 times and every time black people are the majority. So yeah, I think the parent comment is not that far off.

viraptor 13 hours ago|||

Luck? 1 black person, 3 south Asian in total for me.

But for an out of context imaginary future... why would you choose non-black people? There's about the same reason to go with any random look.

wtcactus 13 hours ago||

So, the answer to the question "Do western AI models mostly default to white people?" is clearly a resounding: no, they don't.

viraptor 13 hours ago||

No. But neither do they default to black people, or to anyone specifically. So we got to a nice balance, it seems.

KingMob 13 hours ago|||

I mean it's still far off, because they said "historical context", i.e., the actual past, but your prompt is about a hypothetical future.

(I suspect you tried a prompt about the original founding fathers, and found it didn't make that mistake any more.)

wtcactus 13 hours ago||

[flagged]

KingMob 12 hours ago||

[flagged]

dang 6 hours ago|||

Ideological battle is against the intended purpose of this site, and crossing into personal attack as part of it is particularly bad. We ban accounts that do this, so please don't do this here.

https://news.ycombinator.com/newsguidelines.html

Edit: you've been breaking the site guidelines egregiously lately. I'm not going to ban you right now because (unlike the other account, which I did just ban) it doesn't look like you have a long history of doing this, and also because we haven't warned you before. But please don't use the site primarily for ideological battle, and please follow the rules regardless of how wrong other people are or you feel they are. Comments like these are particularly against the rules:

https://news.ycombinator.com/item?id=46867569

https://news.ycombinator.com/item?id=46866597

computerthings 6 hours ago||

[dead]

pizzafeelsright 8 hours ago||||

Where do we find this tagging?

wtcactus 10 hours ago|||

[flagged]

dang 6 hours ago|||

We've banned this account for repeatedly breaking the HN guidelines and ignoring our requests to stop.

Please don't create accounts to break HN's rules with.

https://news.ycombinator.com/newsguidelines.html

hiccup_socks 9 hours ago|||

[flagged]

modzu 6 hours ago||

image generation kind of reminds me of video games or any cgi in general.. the progress is undeniable, and yet with every milestone it seems the last gap to "photorealism" is infinitely wide

engcoach 9 hours ago||

My response to the horse image: https://i.postimg.cc/hG8nJ4cv/IMG-5289-copy.jpg

wtcactus 9 hours ago||

So, I just gave it this prompt:

"Analyze this webpage: https://en.wikipedia.org/wiki/1989_Tiananmen_Square_protests...

Generate an infographic with all the data about the main event timeline and estimated number of victims.

The background image should be this one: https://en.wikipedia.org/wiki/Tank_Man#/media/File:Tank_Man_(Tiananmen_Square_protester).jpg

Improve the background image clarity and resolution."

I've received an error:

"Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input file data may contain inappropriate content."

I wonder if the model they published in December has the same censorship in place when run locally (i.e. if it's already trained like this), or if the censorship required by the Chinese regime is implemented for the web service only.

badhorseman 13 hours ago|

[dead]
