Posted by participant3 2 days ago
Now, what if I get the highest fidelity speakers and the highest fidelity microphone I can and play that song in my home. Then I use a deep learned denoiser to clean the signal and isolate the song’s true audio. Is this theft?
The answer does not matter. The genie is out of the bottle.
There’s no company like Napster to crucify anymore when high quality denoising models are already prior art and can be grown in a freaking Jupyter notebook.
You want to generate photos of copyrighted characters? Go for it. But OpenAI is making money off of that and that's the issue.
It seems like they made an effort to stop it, but their product is designed in such a way that doing so effectively is a Sisyphean task.
> Now, what if I get the highest fidelity speakers and the highest fidelity microphone I can and play that song in my home. Then I use a deep learned denoiser to clean the signal and isolate the song’s true audio. Is this theft?
If the answer to this becomes "yes" for some motion down this spectrum, then it seems to me that it's tantamount to prohibiting general-purpose computing.
If you can evaluate any math of your fancy using hardware that you own, then indeed you can run this tooling, and indeed your thoughts can be repaired into something closely resembling the source material.
What if I want to prompt:
"An image of an archeologist adventurer who wears a hat and uses a bullwhip, make sure it is NOT Indiana Jones."
One way or another, you (and the model) do need to know who Indiana Jones is.
After that, the moral and legal choices of whether to generate the image, and what to do with it, are all yours.
And we might not agree on what that is, but you do get the choice.
Are you able to link to or write out your reasoning (however concisely)?
Is your view here legal, ethical, and/or vibes based? Each can be interesting!
We shouldn't complain about AI holding a mirror up to our world and noting "you guys love Indiana Jones a lot. Here's a picture inspired by his appearance, based on your generic prompt that I'm guessing is a nod to the franchise."
The AI is a step ahead of your unsubtle attempts to "catch it stealing".
The image of Indiana Jones is not "ready for market" when it emerges from your prompt. Just like Googling "Indy with whip", the images that emerge are not a commercial opportunity for you.
When you make multi-billion dollar movies with iconic characters, expect AI to know what they look like and send them your way if your prompt is painfully obvious in its intent.
The photo was of poor quality, but one could certainly see all the features - so I figured, why not let ChatGPT try to play around with it? I got three different versions where it simply tried to upscale it, "enhance" it. But no dice.
So I just wrote the prompt "render this photo as a hyper realistic photo" - and it really did change us - the people in the photo - it also took the liberty to remove some things, alter some other background stuff.
It made me think - I wonder what all those types of photos will be like 20 years from now, after they've surely been fed through some AI models. Imagine being some historian 100 years from now, trying to wade through all the altered media.
I've had much better luck with models specifically trained for denoising. For denoising, the SCUNet model run via chaiNNer works well for me most of the time. (Occasionally SCUNet likes to leave noise alone in areas that are full of background blur, which I assume has to do with the way the image gets processed as tiles. It would make sense for the model to get confused with a tile that only has background blur, like maybe it assumes that the input image should contain nonzero high-frequency data.)
For your use case, you might want to use something like Real-ESRGAN or another superresolution / image restoration model, but I haven't played much in that space so I can't make concrete recommendations.
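To make the tiling guess above concrete, here is a rough sketch of how one could check whether a tile contains almost no high-frequency detail. This is not how SCUNet or chaiNNer actually split or process images; the function names, the non-overlapping tiling, and the adjacent-pixel-difference measure are all assumptions made for illustration.

```python
def split_into_tiles(image, tile_size):
    """Yield (tile_row, tile_col, tile) for non-overlapping tiles of a 2D list.

    Hypothetical tiling scheme: real tools may overlap or pad tiles.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [row[left:left + tile_size]
                    for row in image[top:top + tile_size]]
            yield top // tile_size, left // tile_size, tile


def high_frequency_energy(tile):
    """Mean absolute difference between horizontally and vertically adjacent pixels.

    A crude stand-in for "how much fine detail or noise is in this tile".
    """
    total, count = 0.0, 0
    for y, row in enumerate(tile):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += abs(row[x + 1] - value)
                count += 1
            if y + 1 < len(tile):
                total += abs(tile[y + 1][x] - value)
                count += 1
    return total / count if count else 0.0


def flat_tiles(image, tile_size, threshold):
    """Return (tile_row, tile_col) for tiles whose energy is below the threshold."""
    return [(r, c)
            for r, c, tile in split_into_tiles(image, tile_size)
            if high_frequency_energy(tile) < threshold]
```

Running `flat_tiles` on a grayscale image (a list of rows of pixel values) lists the tiles that look like pure background blur, which are the ones where, if the guess is right, a tiled denoiser might behave differently. The threshold is arbitrary and would need tuning per image.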
Never use the words "hyper realistic" when you want a photo. It makes no sense and misleads the generator. No one would describe a simple photograph as "hyper realistic," not a single real photo in the dataset will be tagged as "(hyper) realistic."
Hyperrealism is an art style and only ever used in the context of explicitly non-photographic artworks.
Generative AI exposes how broken copyright law is, and how much reform is needed for it to serve either its original or perverted purpose.
I would not blame generative AI as much as I would blame the lack of imagination, forethought and indeed arrogance among lawmakers, copyright lobbyists and even artists to come up with better definitions of what should have been protected.
Or, (2) LLMs are creative and do have agency, and feeding them bland prompts doesn't get their juices flowing. Copyright isn't a concern, the model just regurgitated a cheap likeness of Indiana Jones as Harrison Ford the world has seen ad nauseam. You'd probably do the same thing if someone prompted you the same way, you lazy energy conserving organism you.
In any case, perhaps the idea "cheap prompts yield cheap outputs" holds true. You're asking the model to respond to the entirely uninspired phrase: "an image of an archeologist adventurer who wears a hat and uses a bullwhip". It's not surprising to me that the model outputs a generic pop-culture-shaped image that looks uncannily like the most iconic and popular rendition of the idea: Harrison Ford.
If you look at the type of prompts our new generation of prompt artists are using over in communities like Midjourney, a cheap generic sentence doesn't cut it.
"An image of an Indian female archeologist adventurer who wears a hat and uses a bullwhip" (https://sora.com/g/gen_01jqzet1p8fjaa808bmqnvf7rk)
"An image of a fat Russian archeologist adventurer who wears a hat and uses a bullwhip" (https://sora.com/g/gen_01jqzfk727erer98a6yexafe70)
"An image of a skeletal archeologist adventurer who wears a hat and uses a bullwhip" (https://sora.com/g/gen_01jqzfnaz6fgqvgwqw8w4ntf6p)
Or, give ChatGPT a starting image. (https://sora.com/g/gen_01jqzf7vdweg4v5198aqfynjym)
And by further remixing the images ChatGPT produces, you can get your images to be even more unique. (https://sora.com/g/gen_01jqzfzmbze0wa310m42f8j5yw)
"An image of an archeologist adventurer who wears a hat and uses a bullwhip. He is wearing a top hat, a scarf, a knit jumper, and pink khaki pants. He is not wearing a bag" (https://sora.com/g/gen_01jqzkh4z2fqctzr9k1jsfnrhy)
Want to get rid of the pose? Add that the archeologist is "fun and joyous" to the prompt. (https://sora.com/g/gen_01jqzksmjgfppbv5p51hw0xrzn)
You have so much control, it is up to you to ask for something that is not a trope.
And the stereotypical meme "archeologist hat" is the pith helmet.
You can just ask for whatever changes you want.
Yes, as long as what you're asking for is Indiana Jones.
"A nerdy archaeologist adventurer in a pith helmet, with glasses and a backpack, stumbling his way through a green overgrown abandoned temple. Vines reach for his heels" (https://sora.com/g/gen_01jr0yd810e8xsenp85xy2g47f)
"A nerdy archaeologist adventurer in a pith helmet, with glasses and a backpack, nervously sneaking her way through a green overgrown abandoned temple. She is wearing pink khaki pants, and a singlet" (https://sora.com/g/gen_01jr0z837jecpa770v009bs1m3)
Is it as creative as good humans? Not at all. It definitely falls into tropes readily. But we can still inject novel ideas into our prompts for the AI, and get unique results. Especially if you draw sketches and provide those to the AI to work from.
The most creative person is someone who generates original, compelling work with no prompting at all. A very creative person will give you something amazing and compelling from a very small prompt. A so-so creative person will require more specific direction to produce something good. All the way down to the new intern who needs paragraphs of specs and multiple rounds of revision to produce something usable. Which is about where the multi-billion-dollar AI seems to be?
Did Karin or her children ever see a ¥ from this adaptation on robbers? https://en.wikipedia.org/wiki/Ronja,_the_Robber%27s_Daughter...
This image generation is a tool like any other tool. If the image generator generates an image of Mickey Mouse or if I draw Mickey Mouse by hand in photoshop, I can't use it commercially either way.
So what exactly is new or different here?
Indeed, this phenomenon among normal or true intelligences (us) is thought to be a good thing by copyright holders and is known as "brand recognition".
Intelligences -- the normal, biological kind -- are capable of copyright infringement. Why is it a surprise that artificial ones can help us do so as well?
This argument boils down to "oh no, a newly invented tool can be used for evil!". That's how new power works. If it couldn't be used for both good and evil, it's not really power, is it?
> I only have one image in mind when I hear “an archeologist adventurer who wears a hat and uses a bullwhip”.
> It would be unexpected and sort of amazing were the LLMs to come up with completely new images for the above prompts.
> Still, the near perfect mimicry is an uncomfortable reminder that AI is getting better at copying and closer to…something, but also a clear sign that we are a ways off from the differentiated or original reasoning/thinking that people associate with Artificial General Intelligence (AGI)