Posted by davidbarker 8 hours ago
Two prompts I'd consider "interesting" for image-gen testing. It did pretty well.
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens." - The only major problem I could find at a glance is that the clasps probably don't make sense, and the drop of water inside the watch on the cog doesn't make sense (the cog is mangled into the tweezers).
"A candid photograph taken from behind an elderly woman sitting alone on a park bench in late autumn. She is gently resting one hand on the empty seat beside her, where a man's weathered flat cap and a folded newspaper sit untouched. Fallen golden leaves cover the path ahead. The low afternoon sun casts her long shadow alongside a second, fainter shadow that almost seems to be there, the suggestion of someone sitting next to her, visible only in the light on the ground. Muted, warm color palette, shallow depth of field on the background trees, photojournalistic style." - I don't know why, but it internal-errored twice on this one before getting there.
I use all those fancy image models' editing capabilities for my fast-fashion web shop. I must say: product photography for clothing and accessories is dead. These models are amazing at style transfer and garment transfer.
We'll see how good the full version of Seedream 5.0 will be.
You can argue things like code generation are an extension of the engineer wielding it. Image generation just seems like a net negative overall if it’s used at scale.
Edit: By scale, I mean large corporations putting content in front of millions. I understand the appeal for smaller businesses where they probably weren’t going to pay an artist anyway.
When a company sends an email or a DocuSign, they don't want to pay a courier.
Technology supplements or replaces jobs, often reducing costs. This is no different.
It's an ethical conundrum because we're not paying anyone, but we don't have the money to pay anyone, and it's good enough for our budget.
But we're getting used to the process of changing part of the text in a few seconds, without any artist involved, and for $0.
I guess that soon we'll be able to create voice samples from known personalities for a few dollars, with prices based on the popularity of the artist and some sanity checks based on the artist's preferences.
My thought is that the large corps that could afford it still won't, because it's a cost they don't need to incur. For them it's not even a moral conundrum.
Much like the star-bellied Sneetches: when the quality of some ad format becomes untethered from the cost of production and placement, marketers will flock to some alternative.
YouTube influencers fill[ed] that niche for a while, because content-milling SEO spam and fake reviews are a lot more expensive if you present the results in video form with good production values. (Not sure how long that will be true, since AI is getting better at short-form video.)
This is like the last mile for online presence. The average barber out here doesn't use Squarespace, barely knows how to use Facebook, and doesn't touch GenAI. But they can still cut your hair pretty well; tech-savviness doesn't have a huge connection to business competence out here.
The average person won't notice, and wouldn't care either way.
Things that would take me an hour or so the old way take three minutes with NB.
But I can see this applying to small businesses. Something that some random person would have to spend an hour photoshopping can be done in a few minutes with NB.
Larian Studios was most recently under fire for this [1]. I can see a director going "what would X look like?" and then heading over to the concept artists for a proper rendition if they liked it. I don't think this happens at scale, though. Any large business is just going to get rid of the concept artists.
[1]: https://www.pcgamer.com/games/rpg/baldurs-gate-3-developer-l...
I'm torn on the scale thing. It definitely seems net negative. But I think we collectively underestimate just how deeply sick the existing thing already is. We're repulsed by image gen at scale because it breaks our expectation that images are at least somewhat based on reality, that they reflect the natural world or what we can really expect from a product, from a company, from the future. But that was already a bad expectation: when's the last time you saw a McDonald's meal that looked like the advert? Or a sub-$30 Amazon product that wasn't a complete piece of shit? Advertisements were already actively malicious fantasies designed to exploit the way our brains react to pictures. They were just fantasies that required whole teams of humans doing weird bullshit with lighting and Photoshop, and I'm not sure that's much better. It was already slop. All the grieving we do about the loss of truth, or the extent to which corps will gleefully spray us with mind-breaking waterfalls of outright lies: I think those ships sailed a long time ago. The disgust, the deceit, the rage we feel about genAI slop is the way we should have felt about all commercials since at least the 80s, IMO.
This is a good point. My gut reaction is "well, at least someone was paid to do it and can continue to keep society/the economy going".
I can see the other side, where that's a soulless job. Not sure which is worse: a soulless job where your skills apply, or even fewer jobs in a competitive industry.
You could easily say the same about any time computers or robots or automation have taken a job away. We've been going down this road for decades.
Why can't Google, for example, just call:
Gemini Image = Nano Banana
Gemini Video = Veo
...and not a (botched) fake white/gray grid background of the kind commonly used to visualize transparency?
I guess even Google is running out of GPUs.
My main use case is editing user uploads to enhance their clothing images. A large part of it is preserving logos, graphics, and other technical details. Over time it felt like Nano Banana had gotten worse at this.
I have a test set of graphic t-shirts, and I noticed the model seemingly getting worse on it. That, combined with the price and the terrible experience of their cloud console, got me to migrate off.
EDIT: after significant prompting, it actually solved it. I think it's the first one to do so in my testing.
Pretty close to Gemini 3 Pro Image (aka Nano Banana Pro) in most benchmarks, even without thinking+search, and it even exceeds it in the two most important ones, 'Overall Preference' and 'Visual Quality'. I'm excited about the big jump in Infographics/Factuality (even without thinking+search; I'm surprised that text+image search grounding doesn't make an even bigger dent).