Posted by redbell 11 hours ago
https://security.googleblog.com/2014/01/ffmpeg-and-thousand-...
So, while it's a demo of the capabilities of LLMs, this should not be at all surprising. Ffmpeg is absolutely not something you should be running outside of a sandbox if you're touching any untrusted or user-supplied content. I know that people do, and these people are taking unreasonable risks.
99% of what I throw through ffmpeg is trusted, i.e. I created it. It’s not a major risk.
They were talking about how there was a vulnerability in an extremely niche codec that is only used for one video game from the 90s or something, and were saying that the person who reported the vulnerability was acting like it was a big deal but it's really not because this codec is hardly ever used.
I was left wondering whether they were oblivious to the fact that an attacker who can supply a video file to you is free to use whatever video codec they want? It wouldn't matter if the developers thought the codec was never used at all; if it is still available then an attacker can use it.
Or was I just missing something? Is there a good reason why vulnerabilities in this codec are not a big deal after all?
If your service works by taking whatever file the user gives you and shoving it into unsandboxed ffmpeg, you've already fucked up. It would be nice if you could do that, but that's not a guarantee ffmpeg has ever provided, nor would it make sense for them to spend their limited resources on it.
Security is the punch line for ffmpeg.
goddamn, and this is a project that prides itself on having hand-written assembly in it
I couldn’t believe they had fallen for an April Fools' joke so hard.
I agree it reflects poorly on them though
Do you have an example?
That said, that dude has a point. "Researchers" chasing clout with their names attached to CVEs is kind of ridiculous. Half these CVEs are missing bounds checks that can be fixed with a patch in as much effort as writing up the blog post announcing that there was a missing bounds check.
The developers of ffmpeg are very good at the first thing and not very good at the second. But few people on this planet, if instructed to write a complex video format parser in C or assembly, can produce something that's secure on the first try. The main failing of the ffmpeg team is that they should have spent more time on architectural hardening and mitigations. Most other large projects of this type do.
Except yourself, presumably. To me it almost seems nobody is perfect.
ffmpeg is Free Software. You are also free not to use it.
Oddly enough, despite all these endless grievances, no one has come up with a better or more capable tool, certainly not one that is freely available.
Evidently no one cares either, because most implementations of ffmpeg I've seen typically run it as root "because we have to". Don't worry we use Docker bro.
Actual well written vulnerability reports are not the same as slop.
AI slop is a real problem and annoying. Just because it exists does not mean every vulnerability report is AI slop.
Ffmpeg devs are free not to care, but then they can't complain when they start to get a bad reputation.
Ok but who is going to sift through it all to triage the good bits when you're working on something for free?
> Ffmpeg devs are free not to care, but then they can't complain when they start to get a bad reputation
Who gives a shit about reputation when you're the only game in town?
There is nothing out there that even attempts to approximate an ffmpeg clone. They are the Swiss army knife of media encoding and all complainers have produced are plastic sporks.
It's like anything else in open source. Maintainers will do so if they care. Maybe they decide they don't care. That is always their decision to make, but there are consequences for the project. Maybe those consequences make sense. Being a maintainer is all about making cost-benefit trade-offs.
> Who gives a shit about reputation when you're the only game in town?
It's up to the maintainers whether they care or not. It depends on what they value.
Ultimately, if maintainers make decisions that are at odds with what their userbase wants, someone eventually forks and people vote with their feet.
Today it's an industry driven by unscrupulous clout-chasers and a commitment to quantity over quality.
There is a difference between going through patches and pull requests vs. the endless stream of LLM-assisted bullshit that has started cluttering security inboxes in the last few years.
Until someone cares enough to do it. This is open source software. When it comes to open source, the golden rule is you either do the things you care about yourself or stfu.
Given the libav fork wasn't all that long ago, it can obviously happen to ffmpeg just as much as it can happen to any other project.
Within the framework there are multitudes of plugin packages that contain said elements and many of them are built on top of ffmpeg.
In both cases you are best off restricting things to what you actually use.
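One way to restrict ffmpeg to what you actually use is to compile everything else out. The sketch below only assembles the argument list for FFmpeg's `./configure` script, using its `--disable-everything` and `--enable-*` switches. The function name and the example component list are made up for illustration, and a real build would also need whatever encoders, muxers and filters your pipeline depends on.

```python
def minimal_configure_args(decoders, demuxers, parsers=(), protocols=("file",)):
    """Build ./configure arguments for an FFmpeg build that contains only
    the listed components. Anything not listed is compiled out, so a file
    using some obscure codec simply fails to open."""
    args = ["./configure", "--disable-everything"]
    args += [f"--enable-decoder={name}" for name in decoders]
    args += [f"--enable-demuxer={name}" for name in demuxers]
    args += [f"--enable-parser={name}" for name in parsers]
    args += [f"--enable-protocol={name}" for name in protocols]
    return args


if __name__ == "__main__":
    # Hypothetical allowlist: H.264 video and AAC audio in MP4/MOV containers.
    print(" ".join(minimal_configure_args(["h264", "aac"], ["mov"], ["h264", "aac"])))
```

This shrinks the attack surface but does not replace a sandbox: the decoders you keep can still have bugs.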
Different use cases, really. I think both are good.
Gstreamer has a different model, chaining together plugins. Lots of overlap, but I think Gstreamer only has real traction because some silicon vendors use it.
ffmpeg's core functionality (encode, decode, streams, pipes, channels) is all implemented in `libav`, which gstreamer links against.
ffmpeg and other media frameworks (Windows Media Foundation, Apple’s AVFoundation) only support static pipelines. You can use “switcher” components but the inputs are still static.
GStreamer is extremely special. The only thing that comes close was Microsoft’s DirectShow, which has since been replaced with Media Foundation which can’t do it. And while DirectShow did support it, it was fragile because many 3rd party filters did not support dynamic configuration.
GStreamer does use ffmpeg, but it just wraps the core encoder/decoder/filter code and discards the streams/graph/pipe part of ffmpeg.
FFmpeg doesn't do “pipelines”. It's a library, not a framework.
You would change your opinion quickly if your browser, apps and TV suddenly stopped supporting videos due to relying on FFmpeg.
It's okay for a sandbox to fall over due to bad inputs and poor memory security if it can just be restarted and move onto other streams.
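A rough sketch of that "let it fall over and move on" idea: each job runs in its own child process, and a crash or hang is recorded as a failed job instead of taking the service down. The function names are invented for this example, the crash check relies on POSIX return-code conventions, and process isolation alone is not a sandbox. It only shows the restart-and-continue part.

```python
import subprocess
import sys


def run_isolated(argv, timeout_s=60):
    """Run one job in its own process and report how it ended."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        return "crashed"
    return "ok" if proc.returncode == 0 else "failed"


def process_streams(jobs, timeout_s=60):
    """Process every job; a bad input only costs its own result."""
    return {name: run_isolated(argv, timeout_s) for name, argv in jobs.items()}


if __name__ == "__main__":
    # Stand-in children instead of real ffmpeg invocations.
    demo_jobs = {
        "good": [sys.executable, "-c", "pass"],
        "bad": [sys.executable, "-c",
                "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"],
    }
    print(process_streams(demo_jobs))
```

In a real deployment the `argv` would be the sandboxed ffmpeg command, and the caller would decide whether to retry, skip or quarantine inputs that crash the worker.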
Thus:
1. Code which processes untrusted input
2. Code written in unsafe languages like C or C++
3. Code that runs without a sandbox
So ffmpeg should be sandboxed, same as the network code and GPU process are sandboxed.
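The three items above read like Chromium's "Rule of 2" (pick no more than two of the three). That reading is an assumption, since the list is not named here. Under it, the check is trivial to write down; the `Component` type and the example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    untrusted_input: bool   # 1. processes untrusted input
    unsafe_language: bool   # 2. written in C, C++ or similar
    unsandboxed: bool       # 3. runs without a sandbox


def violates_rule_of_two(c):
    """True when all three risk factors are present at once."""
    return c.untrusted_input and c.unsafe_language and c.unsandboxed


if __name__ == "__main__":
    for c in (
        Component("ffmpeg on user uploads, no sandbox", True, True, True),
        Component("ffmpeg on user uploads, sandboxed", True, True, False),
        Component("ffmpeg on files I created", False, True, True),
    ):
        print(c.name, "->", "violates" if violates_rule_of_two(c) else "acceptable")
```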
Cheap arse low resource TVs should either include some form of sandboxing OR the entire device should be treated as a "can fall over" sandbox .. well isolated from any household LAN of consequence, etc.
It seems unlikely that BoxStore Brand Android TVs will be well designed with an eye to security so <shrug> they're an exercise for home net admin masochists and/or an opportunity to market sensible easy to use IoT age routers that come preconfigured to handle bad-device(s).
Yes, there are security issues, but quite a few are not related to ffmpeg itself - the input is pretty shabby or at least not exactly easy to deal with!
Obviously, they could do with some assistance and I'm sure you and I will both dive in with equal zeal.
They should prompt one of the more adventurous LLMs to find security bugs and with some luck it will deviate from the prompt and rewrite ffmpeg in Rust.
A few months ago I started working on a system that finds critical security issues and opens PRs instead of just filing reports. The acceptance rate is sitting at roughly 94% so far. Most of the failures were due to project-specific kill switches or other internal mechanisms that weren’t documented, not because the vulnerability itself was misidentified.
Developers generally seem to prefer this approach. A bug report creates work. A good PR removes work. That sounds obvious, but a lot of security products still stop at the report and call it a day.
Indeed: The industry optimizes for speed, time to market, and features, and applies the ostrich model to everything that doesn't bring short-term revenue (security considerations, accessibility, vendor lock-in, interoperability, …)
This has been going on for as long as the industry has existed, and now we start to have the proper tools to assess the damage and understand the brittleness of it all.
Wow this is actually pretty serious - I'm even surprised it's being published. There are several services where I can imagine this is exploitable today.
(There are a number of reasons for this, not least being that C makes it very easy to ship partially initialized memory over the wire.)
Oh, and licensing. Licensing is the real killer. I could just write my own mp3 decoder easily (the format not the file type) but I'm not gonna risk my company getting sued into the ground by doing that.
I agree about long periods of development and difficult standards, though.
Very serious, though in practice it doesn't sound like this bug achieves arbitrary RCE on its own (especially in the presence of ASLR). You would need there to be some writable and executable page of memory lying around.
If a security bug is exploited in the wild, it's an n-day if it's been first exploited n days after the publication of the bug, and a zero-day if it's been exploited before or on the day of the publication.
When a bug is not yet exploited in the wild, it's just a discovery of a bug, not a zero-day.
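Written out as code, that definition looks roughly like this. It is a sketch of the definition as stated in this comment, not an industry-standard classifier, and the function name and dates are invented.

```python
from datetime import date
from typing import Optional


def classify(published: date, first_exploited: Optional[date]) -> str:
    """Label a bug by when it was first exploited relative to publication."""
    if first_exploited is None:
        return "bug discovery (not exploited in the wild)"
    days = (first_exploited - published).days
    if days <= 0:
        # Exploited before, or on the day of, publication.
        return "zero-day"
    return f"{days}-day"


if __name__ == "__main__":
    pub = date(2024, 1, 10)  # hypothetical publication date
    print(classify(pub, date(2024, 1, 5)))   # exploited before publication
    print(classify(pub, date(2024, 1, 17)))  # exploited a week after
    print(classify(pub, None))               # never seen in the wild
```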
I understand why it's poorly understood. It's a snappy term, and people assume it means "bad" and nothing else because that's all you can get from the context. However, since most people also don't know the difference between a vulnerability and an exploit, they won't understand the definition of a zero-day when they read it.
But I'm still going to complain if a security vulnerability research company is using the term incorrectly in their own press copy. It makes them look amateurish.
is it the difference between a knife and a stab wound?
The vulnerability is the exposed weakness. Vulnerabilities get fixes, and they exist without anybody knowing about them. Vulnerabilities get CVEs assigned to them.
The exploit is the means of attack. It's the specific actions or calls that let you take advantage of a vulnerability. It could be a worm, or botnet scripts, or specifically crafted data[0]. A proof of concept is not an exploit itself, but it demonstrates that the vulnerability can be exploited.
An example of a vulnerability might be a gate where the gap between the door and the jamb is too wide. The exploit is a coat hanger used to lift the inside latch from outside the gate. That results in unprivileged access.
And zero-day specifically compares when the white hats (vendors, system owners) and the black hats learn about the existence of a vulnerability. If white hats learn that a vulnerability exists by being subject to an in-the-wild black hat exploit of it, then it's a true zero-day.
But I can't think of a program more worthy of sandboxing when run with untrusted input than ffmpeg. It's a huge amount of C dealing with the most complicated video and audio codecs, which is notoriously impossible to get completely right.
But it's not actually that big of a problem. I run ffmpeg inside a VM or gVisor, and the end result is usually a video file that I'm perfectly willing to play in my browser, where it gets decoded in yet another sandbox because this shit is hard.
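For the gVisor variant, a minimal sketch of what the invocation could look like when driven from a script. It assumes gVisor is installed and registered with Docker as the `runsc` runtime, and that the image (name hypothetical) has no entrypoint and has `ffmpeg` on its PATH. The function name, mount points and the unprivileged UID are illustrative choices, not anything prescribed here.

```python
def gvisor_ffmpeg_argv(image, in_dir, out_dir, src_name, dst_name):
    """Build a `docker run` command that transcodes one file under gVisor."""
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",                   # gVisor instead of the default runtime
        "--network=none",                    # no network access from the container
        "--cap-drop=ALL",                    # drop all Linux capabilities
        "--security-opt=no-new-privileges",
        "--user=65534:65534",                # assumption: an unprivileged UID/GID, not root
        "--volume", f"{in_dir}:/in:ro",      # untrusted input, mounted read-only
        "--volume", f"{out_dir}:/out",       # must be writable by the UID above
        image,
        "ffmpeg", "-nostdin", "-i", f"/in/{src_name}", f"/out/{dst_name}",
    ]


if __name__ == "__main__":
    # Hypothetical image name and paths.
    print(" ".join(gvisor_ffmpeg_argv(
        "example/ffmpeg:latest", "/srv/uploads", "/srv/transcoded",
        "upload.bin", "result.mp4")))
```

Passing the list to `subprocess.run` without a shell avoids quoting problems with user-controlled file names.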
Secure sandboxing tends to mean opportunities to make unrestricted copies.
It's 'safe to assume' it's not. It's emphatically not safe to assume any mitigation is perfect.
Why would that be safe to assume? If that were a reasonable assumption, you could just as well assume that it's safe to run ffmpeg.
A manually run ffmpeg on the command line does nothing to restrict its privileges, and its security model has very little interest in doing so, while browsers very much have.
And get hardware acceleration working...
If attackers of ffmpeg need to use services like these authors' to find RCEs in popular tools, then what the ffmpeg team needs to do to defeat them is reduce the efficiency of tools such as depthfirst.
LLMs constantly and confidently give me this same-sounding script, with "the root cause" and how it "is simple", while being completely incorrect.
In and of itself there's not a massive issue from what I can see, they're entry vectors that can lead to worse situations.
That's not to say they're not serious but if a Russian hacking group is using one of them it's in conjunction with other exploits or security flaws. Which is common in practice when it comes to decoding.