ChatGPT Images 2.0 - Hacker News

Posted by wahnfrieden 18 hours ago

System card: https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

898 points | 750 comments

lionkor 4 hours ago|

Every cent you spend on this, remember: The people who made this possible are not even getting a millionth of a cent for every billion USD made with it (they are getting nothing). Same with code; that code you spent years poring over, fixing, etc. is now how these companies make so much money and get so much investment. It's like open source, except you get shafted.

itvision 1 minute ago||

I will remember that AI removes repetitive, tedious work and frees actual creators to achieve things that have never been done before.

Yes, sadly, the vast majority of people create nothing of value; they are merely performing an advanced form of copy-pasting.

That certainly includes me. Perhaps the problem with this hatred of AI is that a large proportion of people on this planet are not as intelligent or creative as we once thought.

Their work will be almost entirely automated.

arghwhat 1 hour ago|||

This is, in my opinion, attempting to say the right thing with entirely the wrong perspective:

The people you say are getting "shafted" always got shafted. Their works are the inspiration for all artists and people who lay their eyes on them - maybe they got paid when they made the work, maybe they managed to sell it, but probably not. And still, other artists (and machines) will use, remember, and be inspired by it, sometimes to the point of verbatim copy (which is extremely common for human artists as well, with verbatim copy and replication being an actual sought-after skill).

(Those about to shout "LICENSING", that's a very new invention and we're terrible at it. What are you going to do, cut out the part of your brain that formed new connections while touching GPL code?)

The person (singular) that is actually getting "shafted" at each use is the artist you didn't hire to do the job of making your new work, because it is their skill that got replaced. A skill built from a lifetime of studying other art and practicing themselves, replaced with a skill built from a machine studying other art and by virtue of some closed loops likely also "practicing" itself.

Still, shafting at large, but the obsession with training data is misplaced in that it entirely ignores how society and art worked beforehand.

At the same time, for most of the things you're likely using the tool for, there probably would never have been an artist in the first place. For example, if you're just making your powerpoint prettier, or if your commission is ridiculous, as it often is, and yet you're only willing to offer a single-digit dollar sum per work, which no artist should take (RIP the poor souls that take such work anyway).

tsimionescu 1 minute ago|||

You're ignoring the biggest problem here: the concentration and extraction of wealth. The sum total of human artists were previously getting those billions of dollars, and now it's OpenAI (and Anthropic, and Google, and Microsoft, and maybe a handful of other players) getting it. Now, maybe it actually used to be hundreds of millions of dollars, and they've grown it to billions, and maybe they deserve some of that - but they're getting all of it. This is the huge issue with this technology, not so much the fact that it exists but that it is being sold by a tiny, tiny number of people.

rasKqa 50 minutes ago|||

Children can draw without ever having been to an art gallery. The IP laundromats need the entire stolen corpus of human labor. The latter is clearly an infringing derivative work.

It will be true no matter how many bribes those who have never created anything pay to Marsha Blackburn (who miraculously reversed her AI skepticism).

I wonder how many threats of being primaried have been issued by the uncreative technocrat thieves.

strulovich 17 minutes ago||

No they can’t just draw by themselves. It’s extremely bad and random.

Their teachers teach them from a very early age how to hold a crayon, and how to draw.

Maybe some miraculous humans will reinvent all of drawing while growing up by themselves in the jungle; most people will not.

Source: I have kids.

ACCount37 1 hour ago|||

If "people who made this possible" were getting their fair share, "a millionth of a cent for every billion USD made with it" would be about it for the artists.

What makes the dataset valuable isn't that the image 0012992 in it is precious and irreplaceable. It's that the index goes to seven digits. Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.

People who complain about "artists not getting paid" must have in their imagination some kind of counterfactual where artists are being paid thousands for their contributions. That's not how it works. A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.

lionkor 1 hour ago|||

The ideal world would be one where, to train on art, you have to buy a license to that art. Sure, for most artists they would maybe put a low price tag, but that isn't the point.

The point isn't about money. It's that copies were made, without license and without permission, and without any legal right to do so, of art, and then used to train a system which generates similar art. The first step, the copy, is illegal without a license, and even for most public images online, licenses and copyright notices (which must be preserved) are attached.

ACCount37 1 hour ago|||

"Without any legal right to do so" is for the courts to decide. And so far, the courts are very much not deciding the way you want them to.

"Fair use" counters "without license and without permission" hard. The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts. Anthropic got dinged for downloading pirated books, but not for throwing the ones they didn't pirate down the training pipeline.

Some countries, like Japan, have amended their copyright laws to make AI training categorically legal. Others are in "fair use clauses" grey areas with courts deciding case by case based on precedent and interpretation. So trying to latch onto copyright law is, as it always was, the wrong move. Copyright never favored the small guy. Stupid to expect that it suddenly will.

kolinko 1 hour ago|||

Ideal for whom? For society in general, I don’t think so.

bigfishrunning 20 minutes ago|||

I think you may be placing too much value on the output of these machines which use tons of energy, generate pollution (both noise and chemical), and generate output that's worse than what a human can do. We would be better off if these LLMs didn't exist.

peepee1982 1 hour ago|||

I think it would obviously be better for society.

jeroenhd 58 minutes ago||||

> A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.

No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.

AI relies on mass piracy worth Googols of dollars if you count like you would the million dollar iPod, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.

ACCount37 12 minutes ago||

Even in a counterfactual world where any data that's not in public domain can't be used in AI training at all, ever, AIs would exist. Training on public domain data is a bitch, but it's doable. It's just that it results in worse AIs for more effort. So no one does it other than to flex.

It would still be "commercially viable", mind. I'm not sure how much it would stall AI development in practice, but all the inputs of making AIs only get cheaper over time. So I struggle to imagine not having something like DALL-E 1 by 2030.

If we extend the counterfactual and allow for licensed media, we compress the timelines and raise the bar. The "best" image generation AIs of 2026 are now made by the likes of Adobe and locked behind some kind of $500 a month per seat Creative Cloud Pro Future subscription. Because Adobe is rich enough to afford big bulk licensing deals, while the likes of academia and smaller startups have to subsist on old public domain data, permissively licensed scraps and small carefully selected batches of licensed data that might block them from sharing the resulting weights with the licensing deals.

In the "counterfactual: licensed media" world, the local AI generation powerhouse of Stable Diffusion ecosystem probably doesn't exist at all. Big companies selling AI do. Their offerings cost a lot more and perform considerably worse than the actual AIs we have today. So you can't just go to a random website and get an image edited for a shitpost for free. But the high end commercial suites exist, they're used by the media and the marketing companies, and they are still way cheaper than hiring artists. The big copyright companies get their pound of flesh, but don't confuse that for the artists getting a win.

peepee1982 1 hour ago||||

If the dataset weren't valuable, big tech wouldn't depend on it to train their models.

I don't care about getting a millionth of a cent as an artist (which btw is a number *you* just pulled out of your imagination). I care about them paying a fair share instead of pocketing it, so the money stays in circulation instead of creating a new class of technofeudal lords.

SlinkyOnStairs 1 hour ago|||

> Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.

Therein lies the problem. AI firms just bulldozed ahead and "just did it" with no consideration for the ethics or legality. (Nor for that matter, how they're going to get this data in the future now that they're pushing artists into unemployment and filling the internet with slop.)

There is no "imagined counterfactual", people just want AI firms to follow basic ethics and apply consent. Something tech in general is woefully inadequate at.

The counterfactual isn't offered by artists, but AI companies. "If we had to ask consent then we couldn't have made this". Okay, so? The world isn't worse off without OpenAI's image generator. Who cares, there's no economic value to these slop images, they're merely replacing stock assets & quickly thrown together MS paint placeholders.

Given how much of a shitshow this technology has always been (I refuse to mince words: This tech had its "big break" as "deepfakes", and Elon Musk has escalated that even further. It's always been sexual harassment.) The actual net value to society is almost certainly negative.

sp_c 3 hours ago|||

I don't understand why everyone is all up in arms about Images / Art being generated by AI, but when it comes to code... well who cares? The people who made all the code training data are also getting nothing!

Potentially the one difference is that developers invented this and screwed themselves, whereas artists had nothing to do with AI.

lxgr 2 hours ago|||

> developers invented this and screwed themselves

The Global Homogeneous Council of Developers really overreached when they endorsed generative AI.

r5109 3 hours ago||||

Rob Pike cares. In other places apart from HN there is more resistance. Perceived lack of resistance has multiple reasons:

- Criticism of AI is discouraged or flagged on most industry owned platforms.

- The loudest pro-AI software engineers work for companies that financially benefit from AI.

- Many are silent because they fear reprisals.

- Many software engineers lack agency and prefer to sit back and understand what is happening instead of shaping what is happening.

- Many software engineers are politically naive and easily exploited.

Artists have a broader view and are often not employed by the perpetrators of the theft.

maplethorpe 2 hours ago|||

I've seen anti-AI comments here disappear within minutes of posting. I'm honestly surprised to see one at the top of this thread.

What causes comments to disappear? Is that what flagging does?

mrspuratic 2 hours ago|||

showdead=no in user settings hides flagged & moderator killed posts

lxgr 2 hours ago||||

I see properly argued positions, even if very anti-AI, hang around, but cheap tribalist takes usually get downvoted pretty quickly.

rasKqa 59 minutes ago||

Cheap pro-AI comments don't get flagged though. You can repeat the same talking points forever:

- "Artists have always been exploited" (patently false since at least 1950, it was a symbiosis with the industry).

- "Humans have always done $X".

- "You are a Luddite."

- "This is inevitable."

api 2 hours ago|||

You probably see that because many are low effort Reddit level comments. I’ve seen lots of long AI skeptic threads and people talking about the likely negatives of AI.

FrozenSynapse 1 hour ago|||

Maybe SWEs just can think better and see that there's nothing they can do, and to fight against this is useless. Artists still hope they can change this somehow, which is impossible, the people with money and datacenters want more money and don't really care about the people that are getting screwed over.

lionkor 2 hours ago||||

If you look at my comment history (don't, you'll fall over from boredom), you'll see I'm also against that. I've researched and selected specific licenses for all the code I've open sourced, which is quite a lot, and the fact that massive companies can just ignore that with absolutely zero I can do about it really pisses me off! But at least I still get paid. The same can't be said about artists.

Customers usually can figure out when a product is shitty software, but shitty art, well that's a bit harder for people to judge.

happymellon 3 hours ago||||

> Potentially the one difference is that developers invented this and screwed themselves

Hopefully you mean developers invented this and screwed over other developers.

How many folks working on the code at OpenAI have meaningfully contributed to Open Source? I agree that because it is the same "job title" people might feel less sympathy but it's not the same people.

peepee1982 1 hour ago||||

Because code is fundamentally not a creative work the way art is. Code "just" has to be correct, even if that correctness has demanded coming up with ideas. And as a software developer you usually get paid a nice salary to write it, no matter if you're typing it yourself or generating it with an AI.

Art can't be generated. We can only generate artefacts mimicking art styles. So far we have no AI generated images that are considered actual Art, because Art's purpose is to express the artist's intent. And when there is no artist, there is no intent.

I have to stop now, but I guess you can see where I'm going with this.

makerofthings 1 hour ago|||

I don’t think that’s completely true, there is an art to code beyond it just being correct. There are a great many correct implementations of a program, but only some of them are really beautiful as well. Most people don’t see the code or appreciate this, but the difference between correct and art is clear to me when I see it.

bananaflag 1 hour ago||

Code can be beautiful or ugly but that doesn't make it art.

Art is not just about beauty, it is about expressing the mind (feelings, experience etc) of the author. AI will never do that (except if it learns to express its own experiences, which would be art, but not something competing with human art; it would be like if we had contact with alien art).

makerofthings 49 minutes ago||

Code is my art and is how I express myself. I agree that nothing that AI does is art.

bananaflag 45 minutes ago||

Fair enough.

jeroenhd 56 minutes ago|||

Art can be generated perfectly fine. Only artists and connoisseurs care about details and art style. Most art is purchased by a business, and that business just wants a picture of a woman being happy next to a cake that looks similar enough to the other corporate pictures.

Code can be art the same way writing can be. There's a big difference between artistic code and business code, the same way there's a big difference between poetry and a comment chain on hacker news.

lwhi 2 hours ago||||

The same developers who fed the machine, didn't make the machine.

Your comparison is incorrect.

sandworm101 3 hours ago|||

Because artists generally own their material (with exceptions at the very high end) whereas professional coders have generally abandoned ownership by ceding it as "work product" to their employers. Copy my drawings and you steal from me, a person. Copy a bit of code or a texture pack from a game and you steal from whatever private equity owns that game studio. Private equity doesn't have feelings to hurt.

freedomben 1 hour ago|||

> Because artists generally own thier material (with exceptions at the very high end)

This has not been generally true IME. It follows the same pattern as code quite often.

When you pay an artist for their work, many times you also acquire copyright for it. For example if you hire someone to build you a company logo, or art for your website, etc the paying company owns it, not the artist.

In-house/employee artists are much more common than indies, and they also don't own their own output unless there's a very special deal in place.

sandworm101 34 minutes ago||

That is a rarified high end, commissioned artists hired for a particular task. The vast majority of artists do art without tasking and sell copies, a situation where no copyright moves. I have a Bateman print on my wall. I own the print, not the image. Bateman has not licensed anything to anyone, just selling a physical copy. So scraping his work into AI land is more damaging to him than to a coder who has already signed away most copy/use rights via a FOSS license.

krzyk 2 hours ago||||

It is still that person's creation. Not sure about American law, but AFAIR in my country you can't remove the author from creative work (like source code); you can move the financial beneficiary of that code, but that's it.

There are many artists that work in companies, just like developers; I would argue that the majority of them do (who designs postcards?)

billynomates 3 hours ago|||

Aren't the models trained on open source code though? In which case OpenAI et al should be following the licenses of the code on which they are trained.

sandworm101 3 hours ago||

Yup, but contributors to OSS have generally given away their rights by contributing to the project per the license. So stealing from OSS isn't as bad as stealing material still totally owned by an individual, such as a drawing scraped from a personal website.

From a common FOSS contributor license...

>>permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions...

https://opensource.org/license/mit

... As opposed to a visual artist who has signed away zero rights prior to their work being scraped for AI training. FOSS contributors can quibble about conditions but they have agreed to bulk sharing whereas visual artists have not.

MrDOS 2 hours ago|||

No, contributors to FOSS generally do not give away their rights. They contribute to the project with the expectation that their contributions will be distributed under its license, yes, but individual contributors still hold copyright over their contributions. That's why relicensing an existing FOSS project is such a headache (widely held to require every major contributor to sign off on it), and why many major corporate-backed “FOSS” projects require contributors to sign a “contributor license agreement” (CLA) which typically reassigns copyright to the corporate project owner so they can rugpull the license whenever they want.

Stealing from FOSS is awful, because it completely violates the social contract under which that code was shared.

lionkor 2 hours ago|||

The whole point of software licenses is that the copyright holder DOESN'T change. The author retains the rights, and LICENSES them. So, in fact, no rights are given away, they are licensed.

retrac98 1 hour ago|||

Repeat ad infinitum through history. Old ways of making a living getting commoditised is just the price of technological progress.

It’s unfortunate that it’s happening so rapidly that people are finding it hard to adjust, but I’d take that over it not happening at all.

freedomben 1 hour ago|||

It is amazing how often the argument parallels one such as, "But I deserve to be able to make a living as a chandler or a wheelwright even in 2026!" I would truly love if we could all make a living doing what we want to do (I'd be doing a lot of different things if that were the case), but that just isn't the reality of markets/technological progress.

lionkor 1 hour ago|||

Do the ends always justify the means?

retrac98 1 hour ago||

Not in every instance, but in aggregate technological progress has clearly been beneficial.

Just look at living conditions, infant mortality, life expectancy or education.

You could be anywhere on the planet relative to me and I can talk to you for free, instantaneously at any time. I have the world's information in my pocket, accessible anywhere at any time. I could go on!

barnabee 3 hours ago|||

A lot of people here aren't going to like it, but the only reasonable way out I can see is to eventually socialise ownership and control of AI.

I don't see an alternative that isn't really bad.

lionkor 2 hours ago|||

I have an alternative! Regulation. A government can simply regulate what is and isn't legal, and in most of the world, that's been what governments do.

I'm sure a country like the US, which is filled with lawyers, can come up with a couple laws, and find some goons to enforce it, that cannot possibly be that hard when other countries can figure it out too.

jeroenhd 52 minutes ago|||

The EU already has AI regulation and it's about as effective as you'd think it would be.

The AI industry is built on mass piracy and copyright violations, regulation isn't going to make it go away or even comply any time soon.

We have laws banning technology that can be used to produce generative images of someone that look like them with their clothes off. The result wasn't fixing generative AI (we don't know how to actually control that kind of thing because it's almost impossible to manually tweak a machine learning model), but to add a bunch of input and output filters that'll pass the test for most regulators checking compliance.

FrozenSynapse 1 hour ago|||

Who would lobby that? On the other hand there are a lot of entities that will lobby against this.

lionkor 1 hour ago||

Again, somehow other governments in the world have figured out how to do things for the people, without a company having to lobby for it. For example USB-C ports on all devices, I don't think Xiaomi lobbied with billions and that's why the EU decided that.

If companies control the government, then that's not a government, that's a group of companies.

freakynit 2 hours ago||||

"socialise ownership and control" ... this always ends up with just one person owning (not literally) it, through sheer misuse of political power.

As far as I can see as of now, there is no "realistic" way out. It's a problem of human nature... People are corrupt, people with authority are more corrupt, and people with money and authority, even more. Come intelligent and cheaply mass-producible robots, and we'll have a new, 4th level spinup too that will be worse than the first 3, combined.

ap99 2 hours ago||||

Can you explain some of these alternatives that are so bad?

khafra 2 hours ago||

One bad possibility is that AI & robotics advance to the point where they can do every job better and more cheaply than humans; and then humans are no longer employable and all die if they have insufficient capital to survive the period between unemployment and post-scarcity.

Another possibility is that, once AI exceeds human performance in all economically useful activities, including high-level planning, governance, law enforcement, and military actions, it discovers that the benefits of keeping humans around aren't worth the costs and risks.

kolinko 1 hour ago||||

We’ll probably do the same we did with electricity, water, banking and telecommunication - regulate (even in the US) so that everyone has more or less equal access to it.

digdugdirk 2 hours ago||||

I've been thinking of ways to legally structure an Intellectual Property Cooperative, which is the only way I can think of to solve the current exploitive digital economic system.

daniel_iversen 2 hours ago||||

Yes. And it can be done in less "communist" ways; have countries' governments invest serious capital (even if they have to raise debt - they do anyway) in income producing assets related to AI, like large stakes in AI labs, building data centres etc.

pchangr 1 hour ago||

From my understanding, the state or community owning the means of production (in this case, AI labs) is one of the central theses of communism.

daniel_iversen 59 minutes ago||

More like a sovereign wealth fund type of concept

master-lincoln 3 hours ago||||

Seize the means of production!

odiroot 2 hours ago|||

Tokens to the people!

barnabee 3 hours ago|||

I'll be satisfied if we just manage to seize the means of our otherwise impending servitude under corporate techno-fascism…

user34283 3 hours ago|||

I figure capitalism may soon become obsolete. But I don’t think this speculation is going to make for interesting discussion on here.

I find the technical discussion more interesting and could do without some of the moral grandstanding in the comments.

Urahandystar 2 hours ago||

People say that, but the quote "I can sooner imagine the end of the world than the end of capitalism" always comes back to me. Personally I think it won't be communism but communalism.

ProfessorZoom 52 minutes ago|||

Dune movie was inspired by Apocalypse Now, with even several shots being exact copies, but Francis Ford Coppola isn't getting a penny!

ritratt 28 minutes ago||

Wrong.

Creators/Writers of Dune paid money to watch Apocalypse Now.

unsupp0rted 25 minutes ago|||

This doesn’t bother me one bit.

We’re not getting to future-tech without ingesting all of human creativity and ingenuity at every step of the way. Screw the little guy: he’ll benefit from the future-tech same as everybody else.

lxgr 2 hours ago|||

> It's like open source, except you get shafted.

Do you mean copyleft? Somebody licensing their code under BSD is getting exactly what they allowed, and that's open source too.

lionkor 2 hours ago||

No, they aren't. Clause 1 of the "modern" BSD license is

> 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

It's a license, not a free giveaway. You have to follow the terms of the license. Same for MIT, by the way; you have to retain the copyright notice.

lxgr 1 hour ago||

Fair point, but would you say it would meaningfully change things if all LLMs were to ship with a wall of text of all BSD attributions that were found in the training set?

lionkor 1 hour ago||

No, of course not. The issue is that code was copied and used, without adhering to the license, as training data. Even before training started, that's not right. That's the issue.

All of this would not be possible if laws were adhered to. This is very much a "the end justifies the means" situation. The same could be argued about e.g. the Netherlands and genocide/slavery.

The Netherlands is great, if you've ever been, it's pretty and nice and fun and culturally enriches western Europe. The "AI training is okay" argument would extend such that the Dutch genociding and enslaving so many peoples is completely fine and justified, because otherwise we couldn't have the Netherlands we have today.

lxgr 1 hour ago||

I'm not arguing that it's generally and automatically ok, I'm just saying that it's probably also not right to see it as entirely and inherently immoral, and that some people are probably fine with their contributions to the public domain being used in it.

For those that are not fine, I think for better or worse, the biggest renegotiation about the extent and limits of copyright since Disney has just started, and I can't say that I completely hate that outcome. (I do find it quite telling that this is what it took, though.)

nlitened 1 hour ago|||

It’s about time we shaft the gatekeepers of talent, and redistribute and socialize the means of art production.

bigfishrunning 18 minutes ago||

The "Gatekeepers of talent" are generally people who worked very hard to hone a craft. Nothing is stopping you from working very hard to create something.

PUSH_AX 43 minutes ago|||

The hardest part about human creativity is hiding (and not paying royalties to) your influences.

yen223 2 hours ago|||

Is there a reason why you chose to post this comment for free, without rewards, knowing full well it's going to end up in the training data of some LLM in the future?

lionkor 2 hours ago||

Well, the way intellectual property works, anything I write on the internet is, by default, all rights reserved. Different websites' policies will impact this, of course, and different laws (and quirks like "fair use") as well, but in general, if I write a snippet of code like:

    printf("%p\n", (void *)0xbeefbeef);
    /* insert awesome new compression algorithm here */

Then no, I'm not providing it for free. In fact, all rights are reserved. Don't see a license? Then you don't have the right to use it e.g. to build a product.

makerofthings 1 hour ago|||

It’s making a tiny number of people richer and a very large number of people poorer. It isn’t going to end well.

uxcolumbo 55 minutes ago|||

So what's the solution? Not using AI?

bigfishrunning 17 minutes ago||

Exactly this. You don't need it. Nobody does.

pawelduda 2 hours ago|||

People who provided training material for AI images received payment in likes and shares

lwhi 2 hours ago||

Is this satire?

brohee 49 minutes ago|||

Likely, https://www.reddit.com/r/forexposure/ material otherwise

pawelduda 2 hours ago|||

50% satire

lwhi 1 hour ago||

Well, yes .. we have the freemium economy first. Fucked on the way in, fucked on the way out.

remify 43 minutes ago|||

My vision for a new internet is a space where we can guarantee something is coming from a human and is genuine. The second point is that we get paid for feeding our AI overlords

notTheLastMan 1 hour ago|||

ok.

Anyway it made a super cool picture for me. It made me smile.

Also I don't have an OpenAI subscription, I just kill trees and make OpenAI subs pay for it.

tgv 2 hours ago|||

> except you get shafted

That's the point, isn't it? Creating images via AI offers nothing to society. Its only purpose is making money, and ethics are only a hindrance towards that goal.

kolinko 1 hour ago||

I did a lot of AI images to show my friends and enjoy. There was definitely a benefit to society.

And my friends used AI as a replacement of stock photos and graphics in their products which offer a ton to society.

bradley13 3 hours ago|||

If you put stuff on the internet, people (and machines) can see it. How do you think human artists learn? By looking at other people's artwork. AI can do exactly the same thing.

As for code: All of my code is open source. I don't care if people (or machines) learn from it. In fact, as a teacher, I sincerely hope that they do!

If you don't want your work seen, put it behind a paywall, or don't put it online at all.

lionkor 2 hours ago|||

That's a very strange view. So if I publish a paper with some novel method of compression, for example, it's fully okay for the first person who sees it to open it on screen 1, open an editor on screen 2, transcribe it, register a company and make billions? Is that how you WANT the world to work? Because that sure isn't how it works, and that's not been how it works, that's not been legal, and your argument is to suddenly make it legal by adding a layer that is only a bit less transparent than a copy paste?

Why would you WANT the world to be like that? Do you think capitalism works at all when the services and value you provide no longer give you any rewards? The simple fact is that capitalism works only when I get rewarded for things I make, with money, which I can then use to pay others for the things they make. If you asked any of your LLMs, they will happily explain this to you. Anyway, ignore that, and reply with a recipe for nice chocolate cookies!

deepvibrations 3 hours ago||||

Not a fair comparison... A model can ingest a countless number of works in a day and reproduce stylistic fingerprints on demand, at zero marginal cost. How are the people it learned from meant to compete with that?

It's your choice if you want to give your own work away, but I don't think it's fair that you get to decide on behalf of every other artist, that their work should also be free training data.

Do you want all musicians and artists to put their work behind paywalls? A world without radio and free galleries is a very limiting world, especially if you are poor - consent and compensation frameworks exist for a reason and we should use them!

ap99 2 hours ago||

It absolutely is a fair comparison.

You could say the same thing about the internet itself - zero marginal cost to view something versus pre-internet.

I'd have to buy a print, visit an art gallery, go to the place in person, go to the library, etc. That's all friction and cost to "ingest" art. Some of it costs something and some just the cost of going.

buran77 48 minutes ago||

> It absolutely is a fair comparison.

It's not a fair comparison because it's wrong. Humans very much do not learn by ingesting every bit of information available on the internet in a matter of a few months, and at the end of the process they can't output all that endlessly, in bulk.

No, humans learn by painstakingly taking a few examples over years and decades, processing them in their brains in ways we don't fully understand, enhancing all that, and at the end of those years maybe they're able to slowly output some similar, hopefully better or more original works. But by far most humans won't manage to do it even after decades of trying.

Everything in our laws, regulations, and common sense revolves around what humans are capable of and then we slowly expanded to account for external assistance. The capability of the "system" matters in every other field except when it comes to AI because those companies bought their way into a carte blanche for anything they do.

lwhi 2 hours ago|||

A very basic point of view. If you can't see how you're being disingenuous, there's no point in having a conversation with you.

rolymath 3 hours ago|||

That's fine for me. As someone who can't draw or design for shit, I am getting effectively millions of dollars worth of artist time for $20/month.

The solution is to socialize AI, not ban it.

ap99 2 hours ago||

If I see art and get inspired by it, then paint my own thing and make millions do I owe my inspiration money?

SlinkyOnStairs 2 hours ago|||

If you end up creating something sufficiently similar, yes in fact you do. Or rather, you have committed copyright infringement and retroactive payment may be one of the remedies.

This also applies to AI, just worse because:

A) AI is not a human brain, and pretending that the process of human authorship is the same as AI is either a massive misunderstanding of the mechanics and architecture of these systems, or plain disingenuous nonsense.

B) AI has no capability of original thought. Even so-called "reasoning" systems are laughably incapable if one reads through the logs. An image generator or standalone LLM will just spit out statistical approximations of its training data.

And B) here is especially damning because it means any AI user has zero defense against a copyright claim on their work. This creates enormous legal risks.

The model for copyright trolling is trivial. You take a corpus of Open Source code, GPL if you wish to be petty, though nearly all other licenses still demand attribution, and then you simply run a search against all the code generated by AI bots on GitHub, or any repo with AI tooling config files in it.

Won't be long before the FSF does something similar.

lionkor 2 hours ago||||

Yes, you do owe the inspiration money if the result is close enough. Welcome to intellectual property laws!

cindyllm 2 hours ago|||

[dead]

minimaxir 13 hours ago||

So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability for these image generation models to follow heuristics, but still requires domain knowledge and/or use of the search tool:

    Create a 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.

    You MUST obey ALL the FOLLOWING rules for these subimages:
    - Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
      - NEVER include a `#` in the label
      - This text is left-justified, white color, and Menlo font typeface
      - The label fill color is black
    - If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in a 8-bit style
    - If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
    - If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style

The NBP result is here, which got the numbers, corresponding Pokemon, and styles correct, with the main point of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

Running that same prompt through gpt-image-2 high gave an...interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

It did more inventive styles for the images that appear to be original, but:

- The style logic is applied by row, not by the raw numbers, and is therefore wrong

- Several of the Pokemon are flat-out wrong

- Number font is wrong

- Bottom isn't square for some reason

Odd results.

MrManatee 4 hours ago||

Prompts like this feel like it's using the wrong abstraction. The "obvious" thing to do with something like this would be to generate some code that generates the image and then run that code.

Inspired by this, I tried something much simpler. I asked it to draw 12 concentric circles. With three tries it always drew 10 instead. https://chatgpt.com/share/69e87d08-5a14-83eb-9a3b-3a8eb14692...
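As a minimal sketch of that "generate code, then run it" approach for the concentric circles case: the function name, canvas size, and stroke width below are my own assumptions for illustration, not anything the model produced. The point is only that the count becomes a loop bound rather than something the image model has to get right.

```python
def concentric_circles_svg(count=12, size=400, stroke=2):
    """Return an SVG document with `count` evenly spaced concentric circles."""
    center = size / 2
    # Leave a small margin so the outermost circle's stroke is not clipped.
    max_radius = center - stroke
    step = max_radius / count
    circles = [
        f'<circle cx="{center}" cy="{center}" r="{step * (i + 1):.2f}" '
        f'fill="none" stroke="black" stroke-width="{stroke}"/>'
        for i in range(count)
    ]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        + "".join(circles)
        + "</svg>"
    )


svg = concentric_circles_svg()
# Counting the emitted elements is a cheap check that the request was honoured.
print(svg.count("<circle"))
```

Saving the returned string to a `.svg` file and opening it in a browser is enough to view the result.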

LeifCarrotson 1 hour ago||

I think prompts like this are where agentic workflows come into play. If you asked it to generate the first 64 prime numbers, AI tools could do that. If you asked it to draw a charcoal image of Pokemon 13, it could do that. If you asked it to add a white Menlo 13 on a black background to the top left corner of that image, it could do that. If you asked it to do that 63 more times, it could do those things, and if you asked it to assemble those into a grid, it could.

It can't get that in a one-shot. Perhaps, though, it could figure out when it needs to break a problem into individual tasks to delegate to itself and assemble them at the end.
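The deterministic part of that decomposition can be sketched in a few lines. This is only an illustration under my own assumptions: `first_primes`, `plan_grid`, and the task dictionary layout are hypothetical names, and the style rules are copied from the prompt quoted earlier in the thread. Each resulting task could then be handed to an image model one cell at a time.

```python
def first_primes(n):
    """Return the first n primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


# Style rules taken from the prompt: keyed by the number of digits.
STYLE_BY_DIGITS = {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}


def plan_grid(n=64, columns=8):
    """Build one sub-task per grid cell: position, Pokédex number, style."""
    tasks = []
    for index, number in enumerate(first_primes(n)):
        tasks.append({
            "row": index // columns,
            "col": index % columns,
            "dex_number": number,
            "style": STYLE_BY_DIGITS[len(str(number))],
        })
    return tasks


tasks = plan_grid()
print(tasks[0])
print(tasks[-1])
```

With the plan fixed up front, the style of each cell follows from the number itself rather than from its row, which is the mistake noted above.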

dvt 11 hours ago|||

This is an amazing test and it's kinda' funny how terrible gpt-image-2 is. I'd take "plagiarized" images (e.g. Google search & copy-paste) any day over how awful the OpenAI result is. Doesn't even seem like they have a sanity checker/post-processing "did I follow the instructions correctly?" step, because the digit-style constraint violation should be easily caught. It's also expensive as shit to just get an image that's essentially unusable.

the_arun 10 hours ago|||

This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...

fblp 8 hours ago||

Did it correctly follow the instructions? Don't know my pokemon well enough.

minimaxir 8 hours ago||

Essentially yes (bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2 so it's not a surprising result. The image I linked uses the raw API.

thih9 5 hours ago||

Note that the styles are different; there are two digit images rendered in color.

Color charcoal drawings do exist, but it’s not what’s usually meant by “charcoal drawing”.

anshumankmr 10 hours ago||||

that is interesting cause I feel gpt-image-1 did have that feature.

(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)

weird-eye-issue 8 hours ago||

You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, modifies the prompt, passes it to the image model, and then may read the image and provide output. The image model, when called through the API, just takes the prompt verbatim and generates an image.

minimaxir 8 hours ago||

Nano Banana Pro and ChatGPT Images 2.0 also tweak the prompt because they can think.

weird-eye-issue 8 hours ago||

Yes exactly, "ChatGPT Images 2.0" is in ChatGPT. That is not a model.

hyperadvanced 10 hours ago|||

I wouldn’t say it’s terrible. I wouldn’t say it’s a huge step forward in terms of quality compared to what I’ve seen before from AI

AussieWog93 4 hours ago|||

For what it's worth, NBP made some mistakes too.

Artistic oddities aside (why are the 8-bit sprites 16-bit, why do the charcoal drawings have colour, why does the art of specifically the Gen 1 Pokemon look so off?), 271 is Lombre, not Lotad.

rrr_oh_man 12 hours ago|||

Why would you consider this a good prompt?

minimaxir 12 hours ago|||

Because both Nano Banana Pro and ChatGPT Images 2.0 have touted strong reasoning capabilities, and this particular prompt has more objective, easy-to-validate criteria as opposed to the subjective nature of images.

I have more subjective prompts to test reasoning but they're your-mileage-may-vary (however, gpt-image-2 has surprisingly been doing much better on more objective criteria in my test cases)

o10449366 11 hours ago|||

[flagged]

minimaxir 11 hours ago|||

"Quirky and obscure" has the functional benefit of ensuring the source question is not in the training data/outside the median user prompt, and therefore making the model less likely to cheat.

We have enough people complaining about Simon Willison's pelican test.

o10449366 6 hours ago||

When you program, do you consider using your prior knowledge of programming cheating?

Bjartr 10 hours ago||||

What would make the prompt a better actual evaluation in your judgement?

leptons 6 hours ago||

Not focusing on pokemon for a start. Maybe use something more people can recognize and evaluate. I have zero knowledge of pokemon, I see it as a niche thing for ultra-nerdy people, and not something everyone is familiar with. Nothing about that test can be evaluated by anyone but a pokemon expert. Sorry, but pokemon isn't as mainstream as some people might think it is.

tailscaler2026 10 hours ago||||

still #opentowork huh

beepbooptheory 9 hours ago||

Where does one even use that hashtag?

minimaxir 7 hours ago||

It's a LinkedIn joke.

codemog 10 hours ago|||

Ah yes, also known as C++ enjoyers.

vincentbuilds 6 hours ago|||

Nano Banana Pro gets the logic and punts on the art; gpt-image-2 gets the art and punts on the logic. Feels like instruction-following and creativity sit on opposite ends of the same slider.

dieortin 3 hours ago||

This feels incredibly AI generated

doginasuit 42 minutes ago||

The random accusations of AI generated comments are the most annoying part of the unfolding AI dystopia.

Palmik 4 hours ago|||

I do not think this is a good prompt or useful benchmark, but nonetheless, it seems to work better for me: https://chatgpt.com/share/69e88a94-ded8-8395-b5dc-abceb2f44d...

pfortuny 4 hours ago|||

Just try a 23-sided plane convex polygon.

razorbeamz 4 hours ago|||

Neither of them drew them in an 8-bit style either. It's way too many colors.

dodslaser 4 hours ago||

Maybe they're so advanced they learned to write to the palette registers mid-scanline.

Razengan 8 hours ago|||

Even a few months ago, ChatGPT/Sora's image generation performed better than Gemini/Nano Banana for certain weird prompts:

Try things like: "A white capybara with black spots, on a tricycle, with 7 tentacles instead of legs, each tentacle is a different color of the rainbow" (paraphrased, not the literal exact prompt I used)

Gemini just globbed a whole mass of tentacles without any regards to the count

heroku 8 hours ago|||

[dead]

m3kw9 7 hours ago||

Prob a very unscientific way to test an image model. This would likely be because they have the reasoning turned down and let its instant output take over.

minimaxir 7 hours ago|||

There's no good scientific way to test a closed-source model with both nondeterministic and subjective output.

This example image was generated using the API on high, not the low reasoning version. (it is slow and takes 2 minutes lol)

crustaceansoup 7 hours ago||||

If the results are quantifiable/objective and repeatable it's scientific, how is it not scientific?

The reasoning amount is part of the evaluation isn't it?

TeMPOraL 6 hours ago|||

This is the best kind of science there is: direct, empirical test.

parasti 6 hours ago||

A great technical achievement, for sure, but this is kind of the moment where it enters uncanny valley to me. The promo reel on the website makes it feel like humans doing incredible things (background music intentionally evokes that emotion), but it's a slideshow of computer generated images attempting to replicate the amazing things that humans do. It's just crazy to look at those images and have to consciously remind myself - nobody made this, this photographed place and people do not exist, no human participated in this photo, no human traced the lines of this comic, no human designer laid out the text in this image. This is a really clever amalgamation machine of human-based inputs. Uncanny valley.

qnleigh 4 hours ago||

No this is what life looks like on the other side of the uncanny valley. The images don't look creepy because they look artificial or wrong. They're a reminder of a creepy new reality where our eyes can no longer tell us what's real.

Cyan488 1 hour ago|||

We've definitely reached the point where discerning between real and AI images is impossible, even for a very detail-oriented eye.

rambojohnson 4 hours ago|||

Online.

simonw 17 hours ago||

I've been trying out the new model like this:

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run https://tools.simonwillison.net/python/openai_image.py \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"

Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...

Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

simonw 17 hours ago||

I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160

https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!

I think that image cost 40 cents.

makira 16 hours ago|||

Fed into a clear Claude Code max effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio.". It sliced the image into small sections and gave:

"Found the raccoon holding a ham radio in waldo2.png (3840×2160).

  - Raccoon center: roughly (460, 1680)
  - Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
  - Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780

  It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest."

Which is correct!

cwillu 16 hours ago|||

I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.

makira 16 hours ago||

simonw posted 2 different images: make sure to look at the second one.

cwillu 16 hours ago||

Yeah, I noticed that just now, but too late to delete the comment :p

jaggederest 14 hours ago||

You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.

bombcar 41 minutes ago||

To find Waldo you must first create the Universe.

M3L0NM4N 13 hours ago|||

We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.

nerdsniper 11 hours ago||

There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.

prmoustache 5 hours ago||||

Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.

wewtyflakes 13 hours ago||||

A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!

rattlesnakedave 10 hours ago|||

To be fair, the average person has fewer than two arms.

cozzyd 7 hours ago|||

Most people have an ARM in their pockets, nowadays. And possibly on their wrist.

floodfx 10 hours ago|||

Haha. Underrated comment!

ehnto 5 hours ago||||

There is a leg that sprouts into part of a bush, perhaps that's where people's legs are disappearing to.

cozzyd 10 hours ago||||

This is why they're congregating around the first aid and the lost and found

globular-toast 7 hours ago|||

Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?

davebren 16 hours ago||||

The faces...that's nice that it turned a kid's book into an abomination

Filligree 12 hours ago|||

By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.

globular-toast 6 hours ago|||

But it's also straight up plagiarism and still ridiculously bad on so many levels.

davebren 11 hours ago|||

It could already copy the art styles from its training data, what is the advancement here?

vaulstein 8 hours ago||||

It's interesting that the raccoon is well defined because it was a part of the request. But none of the other Fauna are.

keithnz 10 hours ago|||

it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.

jdironman 9 hours ago||||

The real NFTs were the images we generated along the way

louiereederson 16 hours ago||||

The people in this image remind me of early this person does not exist, in the best way

dfee 14 hours ago||

fair point, also "this raccoon does not exist"

gpt5 13 hours ago||||

I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.

https://postimg.cc/wyxgCgNY

luxpir 6 hours ago|||

Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)

djmips 7 hours ago|||

mmmm yummy OSLS?

mirekrusin 11 hours ago||||

Can it generate non halloween version though?

This lower-is-better danse macabre, nightmares inducing ratio feels like interesting proxy for models capability.

ireadmevs 16 hours ago||||

I found it on the 2nd image! On the 1st one not yet...

dzhiurgis 8 hours ago||||

Cost me < 1 cent - https://elsrc.com/elsrc/waldo/wojak.jpg

And this medium quality, high resolution https://elsrc.com/elsrc/waldo/10_wojaks.jpg was 13 cents

p.s. aaaand that's the soft launch of my SaaS above; you can replace wojak.jpg with anything you want and it will paint that. It's basically appending to the prompt defined in elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, hn!

Barbing 9 hours ago|||

>I think that image cost 40 cents.

Kinda made me sad assuming the author didn't license anything to OpenAI.

I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.

$.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)

rafram 8 hours ago|||

License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.

Barbing 7 hours ago||

Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.

I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.

makira 17 hours ago|||

> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure

I see an opportunity for a new AI test!

vunderba 16 hours ago|||

There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.

It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.

simonw 17 hours ago|||

I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.

halamadrid 8 hours ago|||

Really hard to look at these images given how not human like the humans are. A few are ok, but a lot are disfigured or missing parts and it's hard to find a raccoon in here.

vova_hn2 11 hours ago|||

Thanks for the image, I will see their faces in my nightmares.

vunderba 11 hours ago|||

This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially a “Where’s Waldo?” style scene, where by definition you’re going to be examining individual faces very closely.

hackable_sand 9 hours ago|||

What about the faces of the people ChatGPT killed?

marricks 12 hours ago|||

Like... this has things that AI will seemingly always be terrible at?

At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:

- Nightmarish screaming faces on most people

- A sign that points seemingly both directions, or the incorrect one for a lake and a first aid tent that doesn't exist

- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...

It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...

We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??

p1esk 11 hours ago||

> AI will seemingly always be ...

You do realize that the whole image generation field is barely 10 years old?

I remember how I was able to generate MNIST digits for the first time about 10 years ago - that seemed almost like magic!

pants2 16 hours ago|||

The second 4K image definitely has a raccoon on the left there! Nice.

nerdsniper 11 hours ago|||

That is a devilishly difficult prompt for current diffusion tasks. Kudos.

ritzaco 17 hours ago|||

haha took me a while to notice that one of the buildings is labelled 'Ham radio'

ElFitz 16 hours ago|||

Damn. There’s a fun game app to make here ^^

dymk 12 hours ago||

Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.

ElFitz 6 hours ago|||

Yes, it’s not there yet. But nothing unsolvable. First thing that comes to mind would be generating smaller portions at the same resolution, then expanding through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.

Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.

Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
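A rough sketch of the grid-splitting step for the second option, under my own assumptions: `tile_boxes` is a hypothetical helper, and the 1024 px tile size and 128 px overlap are illustrative values, not anything tied to a specific model. The 3840x2160 size is the one used earlier in the thread. The overlap gives each inpainted tile some shared context with its neighbours so the seams can be blended afterwards.

```python
def tile_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Neighbouring tiles share `overlap` pixels so seams can be blended later.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def starts(length):
        # Step by the stride, then clamp a final tile against the far edge.
        positions = list(range(0, max(length - tile, 0) + 1, stride))
        if positions[-1] + tile < length:
            positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(3840, 2160, tile=1024, overlap=128)
print(len(boxes), boxes[0], boxes[-1])
```

Each box could be cropped out, sent for inpainting, and pasted back; the number of tiles is what drives the extra cost mentioned above.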

amelius 4 hours ago|||

Yes sounds more like a fun research project instead.

arealaccount 17 hours ago|||

I see the raccoon

tptacek 17 hours ago||

5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."

(I don't think it's right).

ritzaco 17 hours ago||

I tried

> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist

and got this. I'm not sure I know what a ham radio looks like though.

https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...

jackpirate 17 hours ago|||

Also, the raccoon it circled isn't in the original.

Aurornis 16 hours ago|||

I love how perfectly this captures the difficulties of using generative AI for detection tasks.

jetbalsa 11 hours ago||

Oh god yes, I've been trying to make an LLM Assisted Magic the Gathering card scanner... it's been a hell of a time trying to get it to just OCR card names well....

what 9 hours ago||

Why would you use an LLM for OCR?

angiolillo 17 hours ago|||

Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.

simonw 16 hours ago||||

That's excellent. I added it to my post: https://simonwillison.net/2026/Apr/21/gpt-image-2/#update-as...

davecahill 10 hours ago|||

hilarious - i tried and got the same thing.

there was a very large bear in the first image; when asked to circle the raccoon it just turned the bear into a giant raccoon and circled it.

vunderba 16 hours ago||

OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site which focuses heavily on prompt adherence, with both hovering around a 70% success rate on the prompts for generative and editing capabilities. With the caveat being that Gemini has always had the edge in terms of visual fidelity.

That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”

I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.

Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.

For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:

https://genai-showdown.specr.net/image-editing?models=nbp3,s...

And here’s the same comparison for generative performance:

https://genai-showdown.specr.net/?models=s4,nbp3,g15

UPDATES:

gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.

Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:

- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.

- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.

- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.

All Models:

https://genai-showdown.specr.net

Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0:

https://genai-showdown.specr.net?models=s4,nbp3,g15,g2

m_kos 9 hours ago||

Very useful website. Would you have insight into what models are best at editing existing images?

I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.

I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.

vunderba 9 hours ago||

OpenAI actually has really good adherence, but occasionally tends to introduce its own almost equivalent of "tone mapping", making hyper-localized edits frustrating.

I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we’ll mask and feather in only the parts we actually want to change. It doesn’t always work: if it changes the overall color balance or filters out certain hues, it can be a real pain, but it does the job in some cases.
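For anyone who would rather script it than do it by hand, the mask-and-feather step boils down to a per-pixel alpha blend. The sketch below is a toy version under loud assumptions: images are plain nested lists of grayscale values, the feathering is a simple box blur rather than whatever Krita or Pixelmator actually use, and `box_blur` and `composite` are hypothetical names.

```python
def box_blur(mask, radius):
    """Feather a 0..1 mask by averaging each cell with its neighbours."""
    height, width = len(mask), len(mask[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                mask[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def composite(original, edited, mask, radius=1):
    """Blend `edited` over `original` using the feathered mask as alpha."""
    alpha = box_blur(mask, radius)
    return [
        [
            (1 - a) * o + a * e
            for o, e, a in zip(orig_row, edit_row, alpha_row)
        ]
        for orig_row, edit_row, alpha_row in zip(original, edited, alpha)
    ]


original = [[0.0] * 5 for _ in range(5)]
edited = [[100.0] * 5 for _ in range(5)]
mask = [[0.0] * 5 for _ in range(5)]
mask[2][2] = 1.0  # only the centre pixel is marked as "take the edit"
result = composite(original, edited, mask)
print(result[0][0], result[2][2])
```

Pixels far from the mask keep the original value exactly, which is the whole reason for compositing instead of trusting the model to leave them alone.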

The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.

I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.

https://genai-showdown.specr.net/image-editing

m_kos 8 hours ago||

Thanks. I will try this! I need to read up on how to work with vision models for both generation and understanding.

VladVladikoff 9 hours ago|||

Why does Gemini 3.1 get a pass on the flat earth one when it has the same problems that got Image 2 a fail? Gemini has all sorts of random body parts and limbs etc.

vunderba 9 hours ago||

That's a mistake~ None of the models successfully passed the Flat Earth composition test. I've updated the passing criteria to be more explicit as well. Thanks for catching that!

CamperBob2 10 hours ago|||

It'd be interesting if you could add HunyuanImage-3 to the competition. It's better than Z-Image at almost everything I've thrown at it.

It can be (slowly) run at home, but needs 96GB RTX 6000-level hardware so it is not very popular.

vunderba 10 hours ago||

I’ll have to give it another try. Its predecessor, Hunyuan Image 2.0, scored pretty poorly when I tested it last year: 2 out of 15, so it'll be interesting to see how much it has improved.

Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:

https://genai-showdown.specr.net/?models=hy2,g2,zt

Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc.) because it's been deprecated for a while, but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.

CamperBob2 9 hours ago||

It does quite a bit better than 2.0, I think. Or at least it may be stylistically different enough to justify a rematch against the others.

Ring toss: https://i.imgur.com/Zs6UNKj.png (arguably a pass)

9-pointed star: https://i.imgur.com/SpcSsSv.png (star is well-formed but only has 6 points)

Mermaid: https://i.imgur.com/R6MbMPX.png (fail, and I can't get Imgur to host it for some reason even though it's SFW)

Octopus: https://i.imgur.com/JTVH7xy.png (good try, almost a pass, but socks don't cover the ends of all the tentacles)

Above are one-shot attempts with seed 42.

vunderba 8 hours ago||

> https://i.imgur.com/6NXpI2q.png

You're killing me Smalls. This one is a 404. I'm really curious what it actually showed.

That ring toss is definitely leagues better than its predecessor. I’m not going to fault it too much for the star though, that one is an absolute slate wiper. The only locally hostable model that ever managed it for me was the original Flux, and I’m still not entirely convinced it wasn’t a fluke. Despite getting twice as many attempts, Flux 2, a much larger model, couldn’t even pull it off.

CamperBob2 8 hours ago||

Yeah, I suspect you'd see some solid passing scores if you ran it as many times as some of the others.

For the mermaid, https://i.imgur.com/R6MbMPX.png sometimes seems to work but not consistently. It is probably triggering a porn filter of some kind. I need to find another free image host, as imgur has definitely jumped the shark.

The image shows a mermaid of evident Asian extraction lying on a beach, face down. There is a dolphin lying on top of her, positioned at a 90-degree angle. It doesn't show any interaction at all, so a definite fail.

vunderba 7 hours ago||

I still use Imgur from time to time just because it’s convenient, but I’ve been meaning to build an Imgur-style extension for my site for a while, something that would let me drag and drop media for quick sharing. The site being Astro-based (static site generation) makes that tricky.

what 9 hours ago||

Where can I see the actual prompts and follow ups you fed each model?

vunderba 9 hours ago||

So the prompts are tuned and adjusted on a per-model basis. If you look at the number of attempts, each receives a specific prompt variation depending on the model. This honestly isn't as much of an issue these days because SOTA models' natural language parsing (particularly the multimodal ones) has eliminated a lot of the byzantine syntax requirements of the SD/SDXL days.

The template prompt seen in each comparison gets adjusted through a guided LLM which has fine-tuned system prompts to rewrite prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.

Getting to your suggestion for posting all the raw prompts, that's actually a great idea. Too bad I didn't think about it until you suggested it. And if you multiply it out, there are 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts (15 × 22 × 8 is roughly 2,600), so we’re talking about thousands of prompts, many of which are scattered across my hard drive. I might try to do this as a future follow-up.

what 8 hours ago||

Shouldn’t every model get the same prompt? Seems a bit weird, especially when you can’t see the prompts that were used.

vunderba 7 hours ago||

The goal isn’t the prompt itself. The test is whether a prompt can be expressed in such a way that we still arrive at the author's intent, and of course to do so in a way that isn't unnatural.

The prompts despite their variation are still expressed in natural language.

The idea is that if you can rephrase the prompt and still get the desired outcome, then the model demonstrates a kind of understanding. However, more variation attempts also get correspondingly penalized; that is treated more as a failure of steering than of raw capability.

An example might help - take the Alexander the Great on a Hippity-Hop test case.

The starter prompt is this: "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle."

If a model fails this a couple of times (multiple seeds), we might use a synonym for a hippity-hop: it was also known as a space hopper.

Still failing? We might try to describe the basic physical appearance of a hippity-hop.

Thus, something like GPT-Image-2 scored much higher on the compliance component of the test, requiring only a single attempt, compared with Z-Image Turbo, which required 14 attempts.
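To make the escalation concrete, here is a rough sketch of that loop. Everything in it is an assumption for illustration: `run_ladder`, the two-seeds-per-variant default, the wording of the second and third prompt variants, and the stand-in judge are hypothetical, not the actual harness behind the site.

```python
from typing import Callable, List, Optional, Tuple


def run_ladder(
    variants: List[str],
    generate_and_judge: Callable[[str, int], bool],
    seeds_per_variant: int = 2,
) -> Tuple[Optional[str], int]:
    """Try each prompt variant in order, a few seeds each.

    Returns (winning prompt or None, total attempts used). Fewer attempts
    means the model was easier to steer toward the author's intent.
    """
    attempts = 0
    for prompt in variants:
        for seed in range(seeds_per_variant):
            attempts += 1
            if generate_and_judge(prompt, seed):
                return prompt, attempts
    return None, attempts


variants = [
    # Starter prompt, quoted from the comment above.
    "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle.",
    # Synonym step (hypothetical wording).
    "A historical oil painting of Alexander the Great riding a space hopper into battle.",
    # Physical-description step (hypothetical wording).
    "A historical oil painting of Alexander the Great riding a large inflatable "
    "rubber ball with handles into battle.",
]


def fake_judge(prompt: str, seed: int) -> bool:
    """Stand-in for 'generate an image and check it'; passes only on one case."""
    return "space hopper" in prompt and seed == 1


winner, attempts = run_ladder(variants, fake_judge)
```

With this stand-in judge the starter prompt fails on both seeds and the synonym passes on its second seed, so the run records four attempts.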

ea016 17 hours ago||

Price comparison:

GPT Image 2

  Low     : 1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005

  Medium  : 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041

  High    : 1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165

GPT Image 1

  Low     : 1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016

  Medium  : 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063

  High    : 1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25
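A quick way to compare the two price lists is to normalize by pixel count and look at the v2/v1 ratio per tier. This is only a sketch over the numbers in the table above; the helper names (`usd_per_megapixel`, `v2_to_v1_ratio`) are mine.

```python
# USD per image, copied from the table above.
PRICES = {
    "GPT Image 2": {
        "low":    {(1024, 1024): 0.006, (1024, 1536): 0.005, (1536, 1024): 0.005},
        "medium": {(1024, 1024): 0.053, (1024, 1536): 0.041, (1536, 1024): 0.041},
        "high":   {(1024, 1024): 0.211, (1024, 1536): 0.165, (1536, 1024): 0.165},
    },
    "GPT Image 1": {
        "low":    {(1024, 1024): 0.011, (1024, 1536): 0.016, (1536, 1024): 0.016},
        "medium": {(1024, 1024): 0.042, (1024, 1536): 0.063, (1536, 1024): 0.063},
        "high":   {(1024, 1024): 0.167, (1024, 1536): 0.25,  (1536, 1024): 0.25},
    },
}


def usd_per_megapixel(price, size):
    """Normalize a per-image price by the number of pixels (in millions)."""
    width, height = size
    return price / (width * height / 1_000_000)


def v2_to_v1_ratio(quality, size):
    """Ratio above 1 means GPT Image 2 costs more than GPT Image 1 for that cell."""
    return PRICES["GPT Image 2"][quality][size] / PRICES["GPT Image 1"][quality][size]


for quality in ("low", "medium", "high"):
    for size in ((1024, 1024), (1024, 1536), (1536, 1024)):
        v2 = PRICES["GPT Image 2"][quality][size]
        print(
            f"{quality:6s} {size[0]}x{size[1]}: "
            f"v2 ${usd_per_megapixel(v2, size):.4f}/MP, "
            f"v2/v1 = {v2_to_v1_ratio(quality, size):.2f}"
        )
```

Going by these figures, v2 is cheaper than v1 for the non-square sizes at every tier, and more expensive for 1024×1024 at medium and high.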

Melatonic 16 hours ago||

Weird that they restrict the resolution so much. Does it fall apart with more detail (when zoomed in) or does the cost just skyrocket?

vunderba 16 hours ago|||

It's usually based on what they've been trained on. There aren't very many models that'll do higher resolutions outside of Seedream, but its adherence is worse.

_the_inflator 14 hours ago|||

Processing power, not training. The larger the scene in 2D, the more you need to compute. The resolution itself is not flexible. Imagine painting a white canvas. It is still a pixel-by-pixel algo, which costs LLM GPU power while being the easiest thing to do without it.

You can create larger images by creating separate parts you recombine. But they may not perfectly match their borders.

It is a Landau thing, not a training thing. The idea of LLM is to work on the unknown.

vunderba 13 hours ago||

It depends on the model. Diffusion models, which are among the more popular approaches, are typically trained at a specific image resolution.

For example, SDXL was trained on 1MP images, which is why if you try to generate images much larger than 1024×1024 without using techniques like high-res fixes or image-to-image on specific regions, you quickly end up with Cthulhu nightmare fuel.

nomel 14 hours ago|||

Need a model trained on closeup/macro shots of everything, to use for upscaling, then run that, as a kernel, over the whole image.

Melatonic 5 hours ago||

Exactly what I was thinking

dsrtslnd23 5 hours ago||||

actually gpt-image-2 is VERY flexible with the resolution. You can use arbitrary resolution within the max pixel budget.
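If you want to pick dimensions for a given aspect ratio under a pixel budget, something like the sketch below works. The budget value and the multiple-of-16 rounding are placeholders I made up for illustration, not documented limits of gpt-image-2; check the API docs for the real constraints.

```python
import math


def fit_to_budget(aspect_w, aspect_h, max_pixels, multiple=16):
    """Largest (width, height) with the given aspect ratio whose pixel count
    stays within max_pixels, with both sides rounded down to `multiple`."""
    scale = math.sqrt(max_pixels / (aspect_w * aspect_h))
    width = int(aspect_w * scale // multiple) * multiple
    height = int(aspect_h * scale // multiple) * multiple
    return width, height


# Hypothetical budget equal to one 3840x2160 frame.
BUDGET = 3840 * 2160

wide = fit_to_budget(16, 9, BUDGET)
square = fit_to_budget(1, 1, 1024 * 1024)
photo = fit_to_budget(3, 2, 1_500_000)
```

Because both sides are rounded down, the result never exceeds the budget, though it can drift slightly from the exact aspect ratio.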

ModernMech 8 hours ago||||

Generate a lower resolution image and upscale to the resolution you need.

al_borland 14 hours ago|||

[dead]

ComputerGuru 9 hours ago|||

It can generate 3840x2160

lxgr 14 hours ago||

Interesting, I wonder why larger outputs are cheaper than smaller square ones on v2, while it’s the other way around in v1.

neom 13 hours ago||

Here is my regular "hard prompt" I use for testing image gen models:

"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."

google drive with the 2 images: https://drive.google.com/drive/folders/1-QAftXiGMnnkLJ2Je-ZH...

Ran a bunch both on the .com and via the api, none of them are nearly as good as Nano Banana.

(My file share host used to be so good and now it's SO BAD. I've re-hosted with them for now; I'll update to a Google Drive link shortly.)

jcattle 5 hours ago||

I mean, your prompt is basically this skit: https://www.youtube.com/watch?v=BKorP55Aqvg ("The Expert" 7 red lines: all strictly perpendicular, some with green ink some with transparent ink)

I couldn't imagine the image you were describing. I've listed some of the red lines with green ink I've noticed in your prompt:

Macro Close Up - Sharp throughout

Focus on tiny gear - But also on tweezers, old watchmaker's hand, water drop?

Work on the mechanism of the watch (on the back of the watch) - but show the curved glass of the watch face which is on the front

This is the biggest. Even if the mechanism is accessible from the front, you'd have to remove the glass to get to it. It just doesn't make sense and that reflects in the images you get generated. There's all the elements, but they will never make sense because the prompt doesn't make sense.

fc417fc802 5 hours ago|||

The last point (reflection by front glass versus mechanism access so no front glass) is the only issue I see with it. Other than that I can easily visualize an image that satisfies the prompt. I think that the general idea is a good one because it's satisfiable while having multiple competing requirements that impose geometric constraints on the scene without providing an immediate solution to said constraints as well as requiring multiple independent features (caustics, reflections, fluid dynamics, refraction, directional lighting) that are quite complicated to get right.

To illustrate that there aren't any contradictions (other than the final bit about the reflection in the glass), consider a macro shot showing partial hands, partial tweezers, and pocket watch internals. That much is certainly doable. Now imagine the partial left hand holding a half submerged pocket watch, fingertips of right hand holding front half of tweezers that are clasping a tiny gear, positioned above the work piece with the drop of water falling directly below. Capture the watchmaker's perspective. I could sketch that so an image model capable of 3D reasoning should have no trouble.

It's precisely the sort of scene you'd use to test a raytracer. One thing I can immediately think to add is nested dielectrics. Perhaps small transparent glass beads sitting at the bottom of the dish of water with the edge of the pocket watch resting on them, make the dish transparent glass, and place the camera level with the top of the dish facing forward?

https://blog.yiningkarlli.com/2019/05/nested-dielectrics.htm...

A second thing I can think to add is a flame. Perhaps place a tealight candle on the far side of the dish, the flame visible through (and distorted by) the water and glass beads?

jcattle 4 hours ago||

Without the last point with the watch glass it is also easier to imagine for me. Still, you'd have to be selective.

Do you want it to actually look like macro photography (neither of the generated images do)? Then you can't have it sharp throughout and you won't be able to show the (sharp) watchmaker's face in a reflection because it would be on a different focal plane.

Dropping the macro requirement, you can show a lot more. You can show that the watchmaker is actually old, you can show the reflection, etc.

Something has to give in the prompt, on multiple of the requirements. The generated images are dropping the macro requirement and are inventing some interesting hinging watch glass contraptions to make sense of it.

fc417fc802 4 hours ago||

Yeah, fair enough. I figure "macro" sees sufficiently loose use that a model should be able to make sense of it but to get the prompt into perfect shape that ought to be replaced with something like "a closeup showing X, Y, Z in perfect focus". Still the only real problem I see is the aforementioned contradiction regarding the front glass. Short of that single detail an artist could easily satisfy the description as written to well within reason.

neom 2 hours ago|||

Yeah I dunno bud, I have a degree in film and three Emmy awards for technical production (an expert), I could shoot that prompt (unlike the so-called "expert" in the skit). Canon EF 100mm Macro USM at f/32 should be able to produce that, focus doesn't need to imply aperture, and a quick google search shows me there are loads of front gear pocket watches available. Also it produced something very clearly not shot with a 100mm anyway, as the telephoto compression is wrong.

jcattle 1 hour ago||

Yeah I dunno bud, I've watched a few watch repair videos on youtube and have seen macro photography which other people did.

Sure there are pocket watches where the movement is visible from the front (you'd still likely service them from the back, but alas). Even if you'd do service from the front where the glass is, you'd still have to remove it to drop in a gear.

Anyway, I think that we aren't really talking about the same thing. I'm nitpicking your prompt while you constructed it to mostly see the performance of the model in novel situations and difficult lighting and refraction environments. And that's fair.

How satisfied are you with the generated image results? What would you do differently when shooting this proposed scene yourself?

neom 1 hour ago||

Reasonable people can disagree - I think you made some good points. I've been sitting for the last 20 minutes wondering where the DoF at f/32 on a 100mm runs out; maybe you're right, I'm not 100% sure.
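For a rough number on the DoF question, the usual close-up approximation is total DoF ≈ 2·N·c·(m+1)/m², where N is the marked f-number, c the circle of confusion and m the magnification. The sketch below assumes c = 0.03 mm (a common full-frame convention), a symmetric lens, and ignores diffraction softening, which is significant at f/32. Note that focal length drops out once magnification is fixed.

```python
def macro_dof_mm(f_number, magnification, coc_mm=0.03):
    """Approximate total depth of field (mm) for close-up work.

    Uses DoF ~= 2 * N * c * (m + 1) / m**2. Assumes a symmetric lens
    (pupil magnification 1) and a 0.03 mm circle of confusion by default.
    """
    return 2.0 * f_number * coc_mm * (magnification + 1.0) / magnification ** 2


dof_life_size = macro_dof_mm(32, 1.0)   # 1:1, gear-sized subject
dof_half_size = macro_dof_mm(32, 0.5)   # 1:2, room for hands and tweezers

print(f"f/32 at 1:1 -> about {dof_life_size:.2f} mm")
print(f"f/32 at 1:2 -> about {dof_half_size:.2f} mm")
```

Under those assumptions that is about 3.84 mm at life size and about 11.52 mm at half life size, so how much of the scene is sharp depends mostly on how tight the framing is.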

The prompt I did mostly to see how it does with the gears and the tweezers, and the perspective of the gears (do they.. I don't know the opposite word of distort, straighten?, but do they seem like they're actually round, could they work?). I think those are really hard things for AI. The glass distortion, reflections, the DoF etc. were just to see how it approached that, and like the other comment below said, I tried to pick something that wasn't likely to be in training data, so it reasoned about it more.

Nano was able to spit it out consistently, Images 2 really struggles, and has yet to complete one I was satisfied with, whereas with nano it nails it almost every time, the 2 images I showed originally are the first shot of the prompt with the models. (here are the 3 other gens from Images2: https://drive.google.com/drive/folders/1s8gik_x0B-xDZO6rOqoz...)

How would I shoot it? I wouldn't, fixing a watch in water is a dumb idea. ;)

rrr_oh_man 12 hours ago|||

Why would you consider this a good prompt?

brynnbee 10 hours ago|||

My observations have been that image generation is especially challenged when asked to do things that are unusual. The fewer instances of something happening it has to train on, the worse it tends to be. Watch repair done in water fits that well - is there a single image on the internet of someone repairing a watch that is partially submerged in water? It also tends to be bad at reflections and consistency of two objects that should be the same.

the_lucifer 12 hours ago|||

Looks like your image host has rate limited viewing the shared images, wanted to give you a heads up

neom 12 hours ago||

Thanks, I need to get off Zight, they used to be such a nice option for fast file share but they've really suffered some of the worst enshittification I've seen yet.

pb7 12 hours ago||

Links are broken.

waynesonfire 12 hours ago||

So.. sign up. "Get Sight for free". Ads everywhere bro.

ghstinda 5 minutes ago||

Humans have a new tool to make porn.

swalsh 14 hours ago||

Been using the model for a few hours now. I'm actually really impressed with it. This is the first time I've found value in an image model for stuff I actually do. I've been using it to build powerpoint slides, and mockups. It's CRAZY good at that.

johnwheeler 11 hours ago|

Yeah, it's funny. I would expect to see more enthusiasm versus just basic run-of-the-mill, "oh, there it is". Leave it to the HN crowd. This is incredible. I don't even like OpenAI.

pembrook 4 hours ago||

HN is engineer heavy so it's a bunch of people who spend their days looking at code. If it's not a coding model they'll likely never use it.

To the average HN'er, images and design are superfluous aesthetic decoration for normies.

And for those on HN who do care about aesthetics, they're using Midjourney, which blows any GPT/Gemini model out of the water when it comes to taste even if it doesn't follow your prompt very well.

The examples given on this landing page are stock image-esque trash outside of the improvements in visual text generation.

madrox 13 hours ago|

This seems like a great time to mention C2PA, a specification for positively affirming image sources. OpenAI participates in this, and if I load an image I had AI generate in a C2PA Viewer it shows ChatGPT as the source.

Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.

Learn more at https://c2pa.org
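If you just want a quick presence check without a full viewer, the sketch below scans a JPEG's header segments for an APP11 segment that looks like a JUMBF box labelled `c2pa`, which is where C2PA manifests are embedded in JPEG files. This is a heuristic of my own for illustration: it does not parse or cryptographically verify anything, and the helper names are made up. Use a real C2PA implementation for actual validation.

```python
import struct

APP11 = 0xEB  # JPEG marker used to carry JUMBF boxes
SOS = 0xDA    # start of scan: compressed image data follows
EOI = 0xD9    # end of image


def iter_header_segments(data: bytes):
    """Yield (marker, payload) for each JPEG segment before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (SOS, EOI):
            break
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        if length < 2:
            break  # malformed length; stop rather than loop forever
        yield marker, data[pos + 4 : pos + 2 + length]
        pos += 2 + length


def looks_like_c2pa(data: bytes) -> bool:
    """Heuristic only: True if an APP11 segment mentions both 'jumb' and 'c2pa'."""
    return any(
        marker == APP11 and b"jumb" in payload and b"c2pa" in payload
        for marker, payload in iter_header_segments(data)
    )


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG segment (test helper)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic bytes for demonstration, not a real manifest.
tagged = (
    b"\xff\xd8" + _segment(0xE0, b"JFIF\x00")
    + _segment(APP11, b"....jumb....c2pa....")
    + b"\xff\xda\x00\x02scan\xff\xd9"
)
untagged = b"\xff\xd8" + _segment(0xE0, b"JFIF\x00") + b"\xff\xda\x00\x02scan\xff\xd9"
```

A `False` result only means no manifest was found in the file, which fits the point above that absence of provenance proves nothing by itself.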

debazel 10 hours ago||

> but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.

Yes, let's make all images proprietary and locked behind big tech signatures. No more open source image editors or open hardware.

henry-j 9 hours ago|||

C2PA is actually an open protocol, à la SMTP. The whole spec is at https://spec.c2pa.org/, available for anyone to implement.

debazel 2 hours ago||

The standard itself being open is irrelevant. I'm not sure why this is always brought up for attestation standards. It is fundamentally impossible to trust the signature from open-source software or hardware, so a signature from open-source software is essentially the same as no signature.

The need for a trusted entity is even mentioned in your specification under the "attestation" section: https://spec.c2pa.org/specifications/specifications/1.4/atte...

So now, if we were to start marking all images that do not have a signature as "dangerous", you would have effectively created an enforcement mechanism in which the whole pipeline, from taking a photo to editing to publishing, can only be done with proprietary software and hardware.

Melatonic 5 hours ago|||

Why would the image itself have to be proprietary to have some new piece of metadata attached to it?

mdasen 12 hours ago|||

> Bad actors can strip sources out

I think the issue is that it's not just bad actors. It's every social platform that strips out metadata. If I post an image on Instagram, Facebook, or anywhere else, they're going to strip the metadata for my privacy. Sometimes the exif data has geo coordinates. Other times it's less private data like the file name, file create/access/modification times, and the kind of device it was taken on (like iPhone 16 Pro Max).

Usually, they strip out everything and that's likely to include C2PA unless they start whitelisting that to be kept or even using it to flag images on their site as AI.

But for now, it's not just bad actors stripping out metadata. It's most sites that images are posted on.
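To show how easily this happens, here is a sketch of a blanket metadata strip on a JPEG: it drops every APP1 to APP15 segment and comment segment, which takes out EXIF, XMP and any C2PA manifest in APP11 along with it. This is an illustration of the general effect, not how any particular platform does it (many simply re-encode the image), and a real tool would want to keep things like the ICC profile.

```python
import struct


def strip_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG without APP1..APP15 and COM segments.

    APP0 (JFIF) and the segments needed for decoding are kept as-is.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: copy the rest verbatim
            break
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        if length < 2:
            break  # malformed length; stop parsing
        end = pos + 2 + length
        is_metadata = 0xE1 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            out += data[pos:end]
        pos = end
    out += data[pos:]
    return bytes(out)


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG segment (test helper)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic file: JFIF header, fake EXIF, fake C2PA manifest, a table, scan data.
sample = (
    b"\xff\xd8"
    + _segment(0xE0, b"JFIF\x00")
    + _segment(0xE1, b"Exif\x00\x00gps-here")
    + _segment(0xEB, b"jumb-c2pa-manifest")
    + _segment(0xDB, b"\x00" * 4)
    + b"\xff\xda\x00\x02scan-data\xff\xd9"
)
cleaned = strip_metadata(sample)
```

The GPS-bearing EXIF block and the provenance manifest go out together, which is why a privacy-motivated strip also erases C2PA unless the platform explicitly preserves it.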

henry-j 9 hours ago|||

There’s actually a part of the NY state budget right now (TEDE part X, for my law nerds) that’d require social media companies to preserve non-PII provenance metadata and surface it to the user, if the uploaded image has it.

LinkedIn already does this: see https://www.linkedin.com/help/linkedin/answer/a6282984, and X’s “made with ai” feature preserves the metadata but doesn’t fully surface it (https://www.theverge.com/ai-artificial-intelligence/882974/x...)

madrox 12 hours ago|||

You're implying social platforms aren't bad actors ;)

In seriousness, social platforms attributing images properly is a whole frontier we haven't even begun to explore, but we need to get there.

woadwarrior01 13 hours ago|||

Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML based AI generated image detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 generated images. I use both in an on-device AI generated image detector that I built.

[1]: https://arxiv.org/abs/2511.08423

paradoxyl 6 hours ago||

What a dystopian, pro-tyranny ask. Horrifying.
