Posted by meetpateltech 3 days ago
It uses a prompt transmutation trick (convert the uploaded images into a textual description; can verify by viewing the description of the uploaded image) and the strength of Imagen 3's actually modern text encoder to be able to adhere to those long transmuted descriptions for Subject/Scene/Style.
> Enter your email to be notified when it becomes available
(Submit)
> We can't collect your emails at the moment
Easily circumvented with a VPN though; most just limit by location, not by account data.
Yes, the costs will get so low that there will be almost no barrier to making content. But if there is no barrier to making content, the ROI will be massive, and so everyone will be doing it. You can more or less have the exact movie you want in your head on demand, and even if you want a bespoke movie from an artist with great taste and a point of view, there will be 10,000 of them every year.
This is what Instagram and YouTube did and we got MrBeast and Kylie Jenner making billions of dollars. The cost of creating content is tapping record on your phone and the traditional "quality" as defined by visuals doesn't matter (see Quibi). Viral videos are selfies recorded in the bedroom.
When you lower the barrier to entry, things get more heterogeneous, not less. So you have bigger outcomes, not smaller, because the playing field expands. TikTok's insight was to surface the 1 good video from a pool of tens of millions. The platforms that surface the best content will be even more important.
It's a little disheartening, I think, for people to think that the only reason they can't be creative is money, time, or technical skill, but in reality, it's just that they aren't that creative.
So yes, everyone can create content in a world of AI, but not everyone is a good content creator/director/artist (or has the vision), same as it is now.
He started out simple, pointing a phone camera at himself counting really high, but his current channel is not a great example of a low barrier to entry. He explicitly sets himself apart by doing what other YouTube creators or TV shows simply can't do.
That doesn't mean they aren't incredibly good at what they do, or that millions (billions) of people haven't tried to do what they have and failed.
One of the reasons it's "common denominator crap" is because the blob of the internet has 100s of millions of videos copying MrBeast, and the Jenner/Kardashians created an entire generation of people that wanted to be influencers. Most of the copies are slop.
Once they are entrenched they can continue to produce "crap", as you call it, because they have distribution. The copies don't work because they aren't novel, which makes people feel like it doesn't take talent and is the algorithm's fault, until the next person to be "creative" gets distribution and the cycle repeats.
There is just a lot less creativity than people imagine. It's not a right that we all have as humans; it's rare. 8.2 billion people on earth, 365 days in a year, 3 trillion shots on goal, and only a few hundred novel discoveries, art creations, companies, and ideas come from it.
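As a quick sanity check on that arithmetic, here is a minimal sketch. It assumes the figure means one "shot on goal" per person per day, which is my reading rather than something the comment spells out:

```python
# One "shot on goal" per person per day (an assumed reading of the figure above).
population = 8_200_000_000  # 8.2 billion people
days_per_year = 365

shots_per_year = population * days_per_year
print(f"{shots_per_year:,}")  # 2,993,000,000,000
```

So "3 trillion" is the right order of magnitude under that assumption.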
People are always out there trying to convince others that AI is better than humans at X. How close is it to being better than humans at being a content creator itself? Or how long before that threshold is crossed?
Even when AI is objectively better and dominates in blind ratings tests, there will still be a strong market for "authentic" media.
For instance, we already have factories that churn out wares that are cheaper, stronger, better looking, and longer lasting than "hand made", yet people still seek out malformed $60 coffee mugs from the local artisan section in country shops.
For some content, say summer blockbusters, the answer may just be that it is a moderately entertaining way to spend some time. I expect AI may well be able to do reasonably well in this category, although what we find entertaining may well shift if the supply/demand curve shifts drastically enough. In other words, people may still pay to see a new action film even if it hasn't anything particularly new to say.
Then there is the more cerebral kind of art, where there is an actual message that someone is trying to communicate to us. It's a form of argument, not purely logical but also aesthetic. I'm completely unconvinced that present day AI architectures will ever have something to say, purely because they lack agency, and so there isn't anyone there saying it to us.
Finally, there is the art that is entirely spiritual or internal. The whole point of that kind is the author baring their soul to us. Why on earth would anyone want a soulless machine baring their non-existent soul?
Another issue is quality. Most of these AI generators output quite blurry 720p. If you want proper 4K output, we're at least a couple of doublings away.
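To put a rough number on "a couple of doublings", here is a small sketch. It assumes 720p means 1280x720 and 4K means UHD 3840x2160, and it only counts resolution, not sharpness or model quality:

```python
import math

# Assumed resolutions: 720p = 1280x720, 4K UHD = 3840x2160.
w_720p, h_720p = 1280, 720
w_4k, h_4k = 3840, 2160

linear_ratio = w_4k / w_720p                      # 3.0x wider (and 3.0x taller)
pixel_ratio = (w_4k * h_4k) / (w_720p * h_720p)   # 9.0x as many pixels

doublings_per_axis = math.log2(linear_ratio)
doublings_in_pixels = math.log2(pixel_ratio)

print(f"per-axis doublings: {doublings_per_axis:.2f}")      # 1.58
print(f"pixel-count doublings: {doublings_in_pixels:.2f}")  # 3.17
```

So "a couple of doublings" holds if you count doublings of width and height; counted in total pixels it is a bit over three.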
I think we will have some decent AI-generated animations next year, because 2D cartoons are relatively easy to upscale.
And yet Marvel exists.
Turns out in a world of infinite supply, value comes from story, character, branding, marketing and celebrity. Those factors in combination have very limited supply and people still pay up.
I don't see any reason why AI-gen video is any different.
Granted, 5 minutes isn't 1h30 but it's not a million miles away either.
I just watched Kitsune, thanks for sharing.
It reminds me why Flow was so good.
Flow was great because I could see the shader artifacts. It was the opposite of a Disney model, it was not polished and perfect.
That's why I loved it. Disney would never do a movie with a plot like Flow. They would write and rewrite it and it would be a perfect example of humanity, but totally devoid of the humanity behind it.
It is ironic that this coming wave of AI-generated (or AI-assisted) films feels like it has more human craftsmanship than Disney films, when honestly it is the opposite. Disney has incredible and brilliant animators, but that is all crushed behind the merchandising and gross behemoth of the Disney corporation.
I used to love seeing independent films. Those art house theaters really only exist in places like Portland, OR these days. But, I'm excited about the next wave of film because it'll permit small storytelling, and that's going to be great.
I've been a VideoFX tester, and have made a couple of five-minute shorts. You end up having to generate a lot of shots that you throw away. This is a lot easier to bear if you are a tester without really strict monthly limits, or without having to pay to get past them.
Also, there are all sorts of things you have to juggle or sidestep related to character consistency and sound synchronization. There'll be all sorts of improvements there, but I suspect getting to 90 minutes isn't really a question of spending more time and generations. Right now I think a strong option for solo aspiring AI film makers is to work on a number of small projects, to master the art, and tackle longer projects when the tooling is better.
Given the dreck coming out of Hollywood, I'm open to that, even if other folks have to wade through a million shitty videos for me to get it.
Someone created that relatively coherent 5min animated story largely by communicating with a computer in natural language.
The masses have had plenty worse
This kind of rhetoric can best be summed up by one meme: "It's the children who are wrong"
Spouting off "unwashed masses" prose will only make people hate (snobs + critics + artists by proxy) more, if you're not willing to do your part and stop shooting down beginning attempts as "amateurish and cliche".
Actually say, **in words**, what directions & improvements can be made.
$36 million and an Academy Award. A l m o s t done by just one person. And entirely with open source software.
The guy's previous movie was a true one-man show but didn't really get screenings: https://en.wikipedia.org/wiki/Away_(2019_film)
It is closer to one than to the staff count of other animated films. It's a good data point to keep in mind as AI tools enable even smaller teams to do more.
This isn't a solo dev game project.
Once industry adopts AI generation, which it will, a new law will be quickly signed.
In a way, not allowing copyright of AI material really only serves a tiny group of people. "We want to empower everyone to bring their ideas to market, not just those with the ability to draw them" is not a particularly evil or amoral sentiment.
Hollywood can barely get any well made movies past $100 million these days unless it's based on some well known franchise (Minecraft, Captain America, Snow White) or it has some well known actor.
If we're talking regular people, the best chance would be someone like Andy Weir, blogging their way to a successful book, and working on the side on a video project. I wouldn't be surprised if something along these lines happens sooner or later.
Wouldn't it be possible to draw a rough sketch of a terrain, drop a picture of the character, draw a 3D spline for the walk path, while having a traditional keyframe style editor, and give certain points some keyframe actions (like character A turns on his flashlight at frame 60) - in short, something that allows minute creative control just like current tools do?
To train these models you need inputs and expected outputs. For text-image pairs there exist vast amounts of data (in the billions). The models are trained on text + noise to output a denoised image.
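A toy sketch of that training setup, assuming the common noise-prediction formulation used by diffusion models (my assumption about what is meant here). The function names and the `alpha_bar` value are hypothetical, and the "model" is replaced by a perfect prediction just to show what the loss compares:

```python
import math
import random

def add_noise(clean, noise, alpha_bar):
    """Mix a clean signal with noise; alpha_bar in (0, 1] is the share of signal kept."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(clean, noise)]

def denoise(noisy, predicted_noise, alpha_bar):
    """Invert add_noise, given a prediction of the noise that was mixed in."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(y - b * e) / a for y, e in zip(noisy, predicted_noise)]

def mse(u, v):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)

random.seed(0)
image = [0.1, 0.5, 0.9, 0.3]                     # stand-in for pixel values
noise = [random.gauss(0.0, 1.0) for _ in image]  # the noise the model must predict
noisy = add_noise(image, noise, alpha_bar=0.5)

# A real model would take (noisy image, text embedding, timestep) and predict the noise.
# A hypothetical perfect prediction shows the loss that training pushes toward zero.
perfect_prediction = noise
loss = mse(perfect_prediction, noise)
recovered = denoise(noisy, perfect_prediction, alpha_bar=0.5)
```

With a perfect noise prediction the loss is zero and `recovered` matches `image` up to floating-point error; a real model only approximates this, conditioned on the text.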
The dataset of sketch-image pairs is significantly smaller, but you can finetune an already trained text->image model using the smaller dataset by replacing the noise with a sketch, or anything else really, but the quality of the output of the finetuned model will highly depend on the base text->image model. You only need several thousand samples to create a decent (but not excellent) finetune.
You can even do it without finetuning the base model, by training a separate network that is applied on top of the base text->image model weights. This allows you to have a model that essentially can wear many hats and do all kinds of image transformations without affecting the performance of the base model. These are called ControlNets and are popular with the Stable Diffusion family of models, but the general technique can be applied to almost any model.
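A very reduced sketch of that idea, under stated assumptions: the base model is frozen, a side branch reads the extra condition (a sketch, say), and its contribution is gated by a weight that starts at zero, so the base model's behavior is untouched at initialization. Every function and weight name here is hypothetical, and real ControlNets use full networks rather than single scalars:

```python
def base_model(x, w_base):
    """Frozen base network, reduced here to an elementwise scale."""
    return [w_base * v for v in x]

def control_branch(cond, w_ctrl):
    """Trainable side network that encodes the extra condition (e.g. a sketch)."""
    return [w_ctrl * c for c in cond]

def controlled_model(x, cond, w_base, w_ctrl, w_zero):
    """Base output plus the control signal, gated by a zero-initialized weight."""
    base_out = base_model(x, w_base)
    ctrl_out = control_branch(cond, w_ctrl)
    return [b + w_zero * c for b, c in zip(base_out, ctrl_out)]

x = [1.0, 2.0, 3.0]
sketch = [0.5, -0.5, 1.0]
w_base = 2.0  # frozen: never updated while the control branch trains

# At initialization the gate is zero, so the output equals the base model's output.
at_init = controlled_model(x, sketch, w_base, w_ctrl=0.7, w_zero=0.0)

# After (hypothetical) training the gate is nonzero and the sketch steers the output.
after_training = controlled_model(x, sketch, w_base, w_ctrl=0.7, w_zero=1.0)
```

Because only `w_ctrl` and `w_zero` would be trained, several such branches can be swapped in and out over the same base weights, which is the "many hats" property described above.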
There are many workflows for using generative AI to adhere to specific functional requirements (the entire ComfyUI ecosystem, which includes tools such as LoRAs/ControlNet/InstantID for persistence) and there are many startups which abstract out generative AI pipelines for specific use cases. Those aren't fun, though.
Multi modality is new; you won’t have to wait too long until they can do what you’re describing.
And there's a near infinity of data out there to train "image-to-3D-scene" models. You can literally take existing stuff and render it from different angles, different lighting, different background, etc.
I've seen a few inconclusive demos of "...-to-3D-scene" but this is 100% coming.
I can't wait to sketch out a very crude picture and have an AI generate me a 3D scene out of that.
> ... in short, something that allows minute creative control just like current tools do?
With 3D scenes generated by AI, one shall be able to decide to just render it as is (with proper lighting btw) or one shall have all the creative control one wants.
I want this now. But I'll settle with waiting a bit.
P.S: same for songs and sound FX by the way... I want the AI to generate me stuff I can import in an open-source DAW. And this is 100% coming too.
(It isn't "Hayao Miyazaki".)
These things are great (I am not being sarcastic, I mean it when I say great) if and only if you don't actually care about all of your requirements being met, but if exactness matters they are mind-bogglingly frustrating because you'll get so close to what you want but some important detail is wrong.
Even a bad VFX artist has so much more control over what they do. I think that the day "text-to-video" reaches the level of control that said bad VFX artist has from week one, it will be because we have sentient AIs which will, for all intents and purposes, be people.
That's not to say that there is no place for AI-generated content. Worst case scenario, it will be so good at poisoning the well that people will need to find another well.
I would say 97% of the time, the results are not what I want (and of course that's the case, it's just textual input) and so I change the text slightly, and a whole new thing comes out that is once again incorrect, and then I sit there for 5 minutes while some new slop churns out of the slop factory. All of this back and forth drains not only my wallet/credits, but my patience and my soul. I really don't know how these "tools" are ever supposed to help creatives, short of generating short form ad content that few people really want to work on anyway. So far the only products spawning from these tools are tiktok/general internet spam companies.
The closest thing that I've bumped into that actually feels like it empowers artists is https://github.com/Acly/krita-ai-diffusion that plugs into Krita and uses a combination of img2img with masking and txt2img. A slightly more rewarding feedback loop.
Help me here. If tiktok becomes filled with these, will it mean that watching tiktok "curated" algorithmic results will be about digesting AI content? Like going to a restaurant to be served rubber balloons full of air that people will then do their best to swallow whole?[^1] Could this be it? The demise of the algorithm? Or will people just swallow rubber balloons filled with air?
[^1]: Do please use this sentence as a prompt :-)