Posted by qbow883 12/27/2025
My biggest misconception, bar none, was around what a codec is exactly, and how well specified they are. I'd keep hearing downright mythical sounding claims, such as how different hardware and software encoders, and even decoders, produce different quality outputs.
This sounded absolutely mental to me. I thought that when someone said AVC / H.264, then there was some specification somewhere, that was then implemented, and that's it. I could not for the life of me even begin to fathom where differences in quality might seep in. Chief among these was when somebody claimed that using single-threaded encoding instead of multi-threaded encoding was superior. I legitimately considered that I was being messed with, or that the person I was talking to simply didn't know what they were talking about.
My initial thoughts on this were that okay, maybe there's a specification, and the various codec implementations just "creatively interpret" these. This made intuitive sense to me because "de jure" and "de facto" distinctions are immensely common in the real world, be it for laws, standards, what have you. So I'd start differentiating and going "okay so this is H.264 but <implementation name>". I was pretty happy with this, but eventually, something felt off enough to make me start digging again.
And then, not even a very long time ago, the mystery unraveled. What the various codec specifications actually describe, and what these codecs actually "are", is the on-disk bitstream format, and how to decode it. Just the decode. Never the encode. This applies to video, image, and sound formats; all lossy media formats. Except for telephony, all these codecs only ever specify the end result and how to decode that, but not the way to get there.
And so suddenly, the differences between implementations made sense. It isn't that they're flouting the standard: for the encoding step, there simply isn't one. The various codec implementations are left to compete on finding the "best" way to compress information to the same cross-compatibly decodable bitstream. It is the individual encoders' responsibility to craft a so-called psychovisual or psychoacoustic model, and then build a compute-efficient encoder that can get you the most bang for the buck. This is how you get differences between different hardware and software encoders, and how you can get differences even between single- and multi-threaded codepaths of the same encoder. Some of the approaches they chose might simply not work, or not work well, with multi-threading.
One question that escaped me then was how e.g. "HEVC / H.265" can be "more optimal" than "AVC / H.264" if all these standards define is the end result and how to decode that end result. The answer is actually kinda trivial: more features. Literally just more knobs to tweak. These of course introduce some overhead, so the question becomes whether you can reliably beat this overhead to achieve parity, or even gain efficiency. The OP claims this is not a foregone conclusion, but doesn't substantiate it. In my anecdotal experience, it is: parity or even an efficiency gain is pretty much guaranteed.
Finally, I mentioned differences between decoder output quality. That is a bit more boring. It is usually a matter of fault tolerance, and indeed, standards violations, such as supporting a 10-bit format in H.264 when the standard (supposedly, never checked) only specifies 8-bit. And of course, just basic incorrectness / bugs.
Regarding subbing then, unless you're burning in subs (called hard-subs), all this malarkey about encoding doesn't actually matter. The only thing you really need to know about is subtitle formats and media containers. OP's writing is not really for you.
As a specific example, DVD-Video had a random feature that discs could use. There was one brand of player that had a preset list of random numbers, so every time you played a disc that used randomness, the "random" results would be exactly the same. This made designing DVD-Video games "interesting", as not all players behaved the same.
This was when I first became aware that just because there's a spec doesn't mean you can count on the spec being followed in the same way everywhere. As you mentioned, video decoders also play fast and loose with specs. That's why some players cannot decode 10-bit encodes, as that's an "advanced" feature. Some players could not decode all of the profiles/levels a codec could use according to the spec. Apple's QuickTime Player could not decode the more advanced profiles/levels, just to show that it's not only "small" devs making limited decoders.
Let's just say we were encoding a list of numbers. We get a keyframe (an exact number), and then all frames after that until the next keyframe are just deltas: how much to add to the running number.
keyframe = 123
nextFrame += 2 // result = 125
nextFrame += 3 // result = 128
nextFrame -= 1 // result = 127
etc... A different encoder might have different deltas. When it comes to video, those differences are likely relatively subtle, though some definitely look better than others. The "spec" or "codec" only defines that each frame is encoded as a delta. It doesn't say what those deltas are or how they are computed, only how they are applied.
This is also why most video encoding software has quality settings, and those settings often reflect the fact that higher quality is slower. Some of those settings are about bitrate or bit depth or other things, but others are about how much time is spent looking for the perfect or better delta values to get closer to the original image, because searching for better matches takes time. Especially because it's lossy, there is no "correct" answer. There is just opinion.
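To make that concrete, here's a toy sketch in plain Python (nothing to do with any real codec; the function names and numbers are made up for illustration). The decode function is the whole "spec"; the two encoders stand in for a lazy and a careful implementation, and both produce streams the same decoder accepts:

def decode(keyframe, deltas):
    # the only part a real spec would pin down: start at the keyframe
    # and apply each delta in order
    out, current = [keyframe], keyframe
    for d in deltas:
        current += d
        out.append(current)
    return out

def encode_lazy(samples, step=4):
    # quantizes each delta against the *original* previous sample, ignoring
    # the error that piles up in what the decoder reconstructs
    return samples[0], [round((b - a) / step) * step for a, b in zip(samples, samples[1:])]

def encode_careful(samples, step=4):
    # quantizes against what the decoder will actually have reconstructed,
    # so the error doesn't accumulate; more bookkeeping, better result
    deltas, recon = [], samples[0]
    for s in samples[1:]:
        d = round((s - recon) / step) * step
        deltas.append(d)
        recon += d
    return samples[0], deltas

samples = [100, 102, 104, 106, 108, 110]
print(decode(*encode_lazy(samples)))     # [100, 100, 100, 100, 100, 100]
print(decode(*encode_careful(samples)))  # [100, 100, 104, 104, 108, 108]

Same decoder, same kind of stream, different quality: the careful encoder just spends more effort deciding what to write.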
Soooo with everyone getting used to creative names instead of descriptive names over the past decade or two, I guess "codec" just became a blob, and it just never crosses people's minds that this is right there in the name: COding/DECoding. No ENCoding.
So that's a swing and a miss, I'm afraid. But I'm very interested to hear what you think a "coder" library does in this context if not encode, and why it is juxtaposed with "decoder" if not for doing the exact opposite.
the compressor (encoder) decides exactly how to pack the data; the spec doesn't dictate it, so you can do a better job at it or a worse one
which is why we have "better" zlib implementations which compress more tightly
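you can watch the same thing happen with Python's bundled zlib module: every level produces a valid stream that the one decoder turns back into identical bytes, just packed more or less tightly (exact sizes will vary by zlib build):

import zlib

data = b"the quick brown fox jumps over the lazy dog " * 1000

for level in (1, 6, 9):
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data   # one decoder, identical output
    print(level, len(packed))                # higher levels pack tighter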
Makes a lot of sense in retrospect, to the extent that it bothers me I didn't figure it out myself earlier.
hardware encoders (like the ones in GPUs) typically work realtime-ish, so they do minimal exploration of encoding space
you also have the one-pass/two-pass thing which is key for unlocking high quality compression
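if you haven't run into it, the classic two-pass pattern with ffmpeg looks roughly like this (filenames and the bitrate are placeholders; check the docs for your build) — the first pass only gathers statistics, the second uses them to spend the bitrate where it matters:

ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2M -pass 1 -an -f null /dev/null
ffmpeg -i input.mp4 -c:v libx264 -b:v 2M -pass 2 -c:a copy output.mp4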
[1] Technically the term codec refers to a specific program that can encode and decode a certain format.
With video there are 3 formats in play: the video stream itself, the audio stream itself, and the container (only the container is knowable from the extension). They can technically be mixed in any combination.
The video stream especially is costly in CPU to encode, and transcoding it can degrade quality significantly, so it's just a shame to re-encode if the original codec is usable.
The mkv container format is notorious for not being supported out of the box on lots of consumer devices, even if they have codecs for the audio and video streams it typically contains. (It has cool features geeks like, but for some reason it gets less support.)
Also there's one user-level aspect of MKV that makes it not too surprising to me: It can contain any number of video/audio/subtitle streams and the interface needs some way of allowing the user to pick between them. Easier to just skip that complexity, I guess.
I can't say I've experienced either of the ones mentioned, but I have had trouble in the past with output resolution selection (ending up with a larger file than expected with the encoding resolution much larger than the intended display resolution). User error, of course, but that tab is a bit non-obvious so it might be fair to call it a footgun.
The author's POV is that HandBrake does a lossy conversion, and people often use it in cases where they could have used a different tool that is lossless.
My uses of HandBrake are cases where I always want a lossy conversion, so no issue. A good example is any time I make a screen capture and want to post it on GitHub. I want it under the 10 MB limit (or whatever it is), so I want it re-encoded to be smaller. I don't mind the loss in quality.
I remember all the weird repackaged video codec installers that put mystery goo all over the machine.
The article bashes VLC but I tell you what… VLC plays just about everything you feed it without complaint. Even horribly corrupt files it will attempt to handle. It might not be perfect by any means but it does almost always work.
In most circumstances, an MPEG-TS file can be remuxed (without re-encoding) to a more reasonable container format like MP4, and it'll play better that way. In some cases, it'll even be a smaller file.
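With ffmpeg that's usually just a stream copy, something along these lines (paths are placeholders, and depending on the source the odd extra flag may be needed):

ffmpeg -i input.ts -c copy output.mp4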
(nb they did often use their own demuxers instead of libavformat)
No need to know anything about the video file anymore.
(Of course if you're hosting billions of videos on a website like YouTube it is a different story, but at that point you need to learn a _lot_ more e.g. about hardware accelerators, etc.)
If I want the best possible quality image at a precisely specified time, what would I do?
Can I increase quality if I have some leeway regarding the time (to use the closest keyframe)?
Is there a way to "undo" motion blur and get a sharp picture?
ffmpeg -ss 00:00:12.435 -i '/Users/weinzieri/videofile.mp4' -vframes 1 '/Users/weinzieri/image.png'
That means “go to 00:00:12.435 in the file /Users/weinzieri/videofile.mp4 and extract one frame to the file /Users/weinzieri/image.png”.

Not really, no, any more than there is a way to unblur something that was shot out of focus.
You can play clever tricks with motion estimation and neural networks but really all you're getting is a prediction of what it might have been like if the data had really been present.
Once the information is gone, it's gone.
video has certain temporal statistics which can allow you to fit the missing information
only true blurred white noise is impossible to recover
but across many consecutive frames, the information is spread out temporally and can be recovered (partially)
the same principle as how you can get a high-resolution image from a short video, by extracting the same patch from multiple frames
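a toy numpy sketch of that principle, under very idealized assumptions (the sub-pixel shifts are known exactly, no noise or blur), just to show how several low-resolution views of the same scene can carry the full-resolution signal between them:

import numpy as np

# one "scene", sampled at high resolution
scene = np.sin(np.linspace(0, 8 * np.pi, 240))

# four low-res "frames", each seeing the scene at a different sub-pixel offset
frames = [scene[offset::4] for offset in range(4)]

# knowing the offsets, interleaving the frames rebuilds the full-res signal
rebuilt = np.empty_like(scene)
for offset, frame in enumerate(frames):
    rebuilt[offset::4] = frame

print(np.allclose(rebuilt, scene))  # True

real multi-frame super-resolution has to estimate those shifts and fight noise and blur, which is where the "(partially)" comes from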
It is predicting what the information might maybe have been like.
I get that what you're describing can statistically "unblur" stuff you've blurred with overly-simplistic algorithms.
I can provide you with real-world footage that has "natural" motion blur in it, if you can demonstrate this technique working? I'd really like to see how it's done.
This is actually possible:
https://en.wikipedia.org/wiki/Deconvolution
If you have a high-quality image (before any compression) with a consistent blur, you can actually remove blur surprisingly well. Not completely perfectly, but often to a surprising degree that defies intuition.
And it's not a prediction -- it's recovering the actual data. Just because it's blurred doesn't mean it's gone -- it's just smeared across pixels, and clever math can be used to recover it. It's used widely in certain types of scientific imaging.
For photographers, it's most useful in removing motion blur from accidentally moving the camera while snapping a photo.
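Here's a minimal numpy sketch of the idea, under the most forgiving assumptions possible: the blur kernel is known exactly and there's no noise or compression. Real deblurring has to cope with both, which is where Wiener filtering, Richardson-Lucy and friends come in:

import numpy as np

rng = np.random.default_rng(0)
signal = rng.random(256)        # stand-in for one row of a sharp image

kernel = np.zeros(256)
kernel[:9] = 1 / 9              # a 9-tap box blur, a crude motion-blur stand-in

# blurring is convolution, i.e. multiplication in the frequency domain
blurred = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)).real

# deconvolution divides that multiplication back out
recovered = np.fft.ifft(np.fft.fft(blurred) / np.fft.fft(kernel)).real

print(np.max(np.abs(recovered - signal)))  # tiny: the data was smeared, not destroyed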
In mpc-hc, you can framestep using CTRL+LeftArrow (steps a frame backward) or CTRL+RightArrow (steps a frame forward). This lets you select the frame you want to capture. You do not need to be on a keyframe. These keybinds are configurable and may be different on the latest version.
Then in the File menu, there's an export image option. It directly exports the frame you're currently on, to disk. Make sure to use a lossless format for comparisons (e.g. PNG).
I'm aware this can be done in other players - like mpv - as well, although there I believe no keybinds are set up for this by default, and the default export format is JPEG.
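For the format at least, mpv reads a screenshot-format option from mpv.conf, so a line like this should give you PNG output (worth double-checking against the manual for your version):

screenshot-format=png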