Posted by ternaus 3 days ago

OpenCV 5 Is Here: The Biggest Leap in Years for Computer Vision(opencv.org)
566 points | 102 comments
plasticeagle 8 hours ago|
The thing I love about OpenCV is that it remains hands down the best library for simply loading images and video. I've never even used any of its fancy computer vision features, but if I need to load a video file and look at the pixels - which I did need to do recently for an art project - OpenCV does it in about four lines of code.
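For reference, the "about four lines" pattern usually looks something like the sketch below. It assumes the `opencv-python` package is installed; `open_video` and `iter_frames` are illustrative helper names, not OpenCV APIs. The loop relies only on the `read()` method of `cv2.VideoCapture`, which returns a success flag and a frame.

```python
def open_video(path):
    """Open a video file with OpenCV (assumes opencv-python is installed)."""
    import cv2  # imported lazily so the frame loop below has no hard dependency

    capture = cv2.VideoCapture(path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {path}")
    return capture


def iter_frames(capture):
    """Yield frames until read() reports that no frame is available."""
    while True:
        ok, frame = capture.read()  # for cv2.VideoCapture, frame is a BGR NumPy array
        if not ok:
            break
        yield frame
```

Typical use would be `for frame in iter_frames(open_video("clip.mp4")): ...`, where `clip.mp4` is a placeholder path and each `frame` can be indexed as `frame[y, x]` to inspect a pixel.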
Joel_Mckay 3 hours ago||
Done a few projects with OpenCV over the years, and I agree it can be fun.

However, it has a few issues:

1. Patented algorithms that are effectively impossible to license in a commercial setting.

2. A shifting API that changes how identically named functions behave across versions.

3. Hardware CUDA version coupling deprecating support every major release.

4. Inconsistent and contradictory documentation, a result of the constant subtle changes. Downstream projects tend to version-lock the lib for really practical reasons.

5. A shift away from core C libraries like ImageMagick & V4L, and into C++ abstractions with legacy Swig wrapper libraries in Java or Python.

6. Perpetual-beta culture means the library is unlikely ever to fully stabilize.

It is a fun library until people actually try to deploy something serious, as users will often simply suggest using an older release if there is a bug.

Nothing from the build flags to the API documentation has ever fully stabilized. ymmv =3

markusMB 1 hour ago|||
Done a few projects with OpenCV myself, and your list of issues reads as if you throw OpenCV and opencv_contrib into the same bucket, which you shouldn't. Maybe your assessment is outdated here and there, and it is time to look again.

- OpenCV is Apache-licensed. Yes, it used to be more complicated.

- The only patented algorithm I am aware of, SIFT, used to be part of opencv_contrib. And the README in opencv_contrib would greet you with a warning that the code may not be fit for commercial use for various reasons. Only when the patent expired was it moved into OpenCV core.

- Same observation for Aruco marker detection, which was in contrib for a long time because the options to choose from were either not-well-maintained or GPL-licensed code. It is now in core OpenCV (and Apache).

- Despite its age, I think that OpenCV is still more than relevant today. And being part of modern languages like C++, Swig, Java and Python (and for years already) is part of that. Still I was surprised how long they maintained OpenCV 2 and 3.

- Over the past few years and releases, my impression was actually that the core API was very much stable(izing). Can't say what happened in contrib, or what it feels like when you treat core and contrib as one and a feature progressed from contrib to core.

- I do agree that I would usually check that a MINOR release wasn't actually a MAJOR release, breaking some API or behavior I was relying on. I am hoping that version 5 pulls the ambitions for doing things differently away from version 4, so v4 can be used stably ;-)

Joel_Mckay 1 hour ago||
My point was the release numbers are meaningless, as there is always something subtly broken even in the packaged versions. One can't just use the library beyond basic functionality without becoming involved in the code base.

Indeed, if your library dependency constellation works, some will static link to stabilize/freeze their project for more than a few months.

It wasn't that v3 was particularly good, but rather v4 was a mess. I predict v5 inherited that mess, and improved it... lol =3

Sesse__ 2 hours ago||||
Also, performance is generally pretty low; I've been on projects where we rewrote OpenCV code into more-or-less obvious hand-rolled code and won 5x perf. The abstractions are generally a bit too thick and oriented around single pixels (which also makes the API a bit too verbose for my taste).
Joel_Mckay 1 hour ago||
Machine vision has always been resource intensive... and if you are doing trained ML projects the hardware choices are actually very limited.

To enable Intel TBB, CUDA, and CPU specific compiler optimizations... one will almost certainly need to re-build the library, and customize your application build.

Some tasks degrade in performance on a GPU, and others are 740 times faster... ymmv. =3

harrall 3 hours ago|||
Agree with this too. OpenCV is functionally great, but its constituent parts are written by many different people who all kind of do things a little differently, and it shows.

But I can’t really complain because it’s open source and added to by contributors.

Joel_Mckay 1 hour ago||
One can... and should report when stuff is broken, or the project becomes worthless to all but one person's passing interest. =3
dheera 1 hour ago|||
> best library for simply loading images and video

But not for saving video. That fourcc pile of crap doesn't open in QuickTime Player, the default Ubuntu video player, or anything anybody actually uses. I've always had to add an os.system("ffmpeg [ask llm to generate the command for you]") afterwards to fix anything that OpenCV generates.
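The workaround described here could be sketched roughly as follows. The ffmpeg flags are an assumption, since the comment does not give the actual command: re-encoding to H.264 with the `yuv420p` pixel format is one common choice for player compatibility, and it requires an ffmpeg build that includes `libx264`. `build_reencode_command` and `reencode` are hypothetical helper names, and `subprocess.run` with an argument list stands in for `os.system` to avoid shell quoting problems.

```python
import subprocess


def build_reencode_command(src, dst):
    """Build an ffmpeg command that re-encodes src to H.264 in an MP4 container.

    The flag choice is an assumption for illustration, not the commenter's command.
    """
    return [
        "ffmpeg",
        "-y",                       # overwrite dst if it already exists
        "-i", src,                  # input file, e.g. what cv2.VideoWriter produced
        "-c:v", "libx264",          # encode the video stream as H.264
        "-pix_fmt", "yuv420p",      # pixel format that most players can decode
        "-movflags", "+faststart",  # move the MP4 index to the front of the file
        dst,
    ]


def reencode(src, dst):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_reencode_command(src, dst), check=True)
```

A call such as `reencode("raw.avi", "out.mp4")` (both file names are placeholders) would then run after the OpenCV writer has been released.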

deadbabe 2 hours ago|||
What are you looking for in the pixels?
Geee 2 hours ago||
To see if it's fake.
doctorpangloss 1 hour ago||
opencv file loading is crap. it will load images with the wrong gamma, it will give you floating point values that hide the limitation that it pretty much only loads colors in 8 bit, and it will not be able to save to anything useful.
pzo 7 hours ago||
Quite a good release, although I'm not sure why they invest so much time in their ONNX engine. I don't think they have enough staff or deep enough pockets to compete with ONNXRuntime, CoreAI, ExecuTorch, LiteRT.

I'm happy they added an option for ONNXRuntime. I wish their cv.dnn was mostly that unified wrapper around many different backends (ONNXRuntime, ExecuTorch, LiteRT, CoreAI) and maybe just some tooling around it (performance metrics tools, model downloads etc). The Transformers(.js) approach looks better to me.

Wish they also invested more time into better production ready Camera I/O (for mobiles, device/format discovery, manual settings, depthmap support, etc) and better Highgui that could use different backends (skia, webgpu) and on mobiles.

ftchd 10 hours ago||
> One practical detail is worth knowing. The new engine is CPU-only at the moment, so if you select a non-CPU backend and target (for example CUDA or OpenVINO through setPreferableBackend and setPreferableTarget), you will want the classic engine.

So there's room for even better performance!

wongarsu 10 hours ago||
It's certainly a choice to make your headline feature a new ONNX engine and feature a bunch of comparisons of how it's better than ONNXRuntime, while casually mentioning on the side that the cool new much faster engine is CPU-only.

Sure, running models on the CPU is very much a thing in computer vision (the benchmarked YOLOv8n has about 3.2M params). But this whole announcement feels more like OpenCV catching up to the modern world, not "The Biggest Leap in Years for Computer Vision".

Still great, needing fewer libraries is a good thing, but maybe a bit oversold

VadimPR 9 hours ago||
The release post is AI-written with little human oversight and it shows.
claytongulick 7 hours ago|||
I had to stop reading after: "This is not just another incremental release. OpenCV 5 is a major step forward."

If a human can't be bothered to write a piece, I can't be bothered to read it.

dismantlethesun 10 minutes ago|||
I felt that this was an indication that OpenCV had finally discovered SemVer.
danjc 5 hours ago||||
It's not just annoying, it's tiring
VulgarExigency 6 hours ago||||
The endless deluge of AI prose really wears on the soul once you start noticing it.
kphorn 1 hour ago||||
I think the only thing that the human did was remove the emdash between the two sentence fragments and replace with a period.
thin_carapace 5 hours ago|||
i initially adopted this line of thinking. after exposure to arguably valid cases like translated articles, it now seems to me that the most efficient path forward (after first noting AI prose) is to scan past all language and evaluate whether or not useful content is encoded within. there's no benefit to anyone (except those benefitting from societal atrophy) in wasting brain cycles on unnecessary verbosity; however, blanket rejection necessarily involves loss of valuable information.
trklausss 7 hours ago||||
This is what I hate about AI. Not that people use it; it's great for accelerating specific workflows, making fewer mistakes, etc. It's blindly trusting it, just saying "Make a post about a CV library release, make no mistakes" and calling it a day.

Where has the human creativity in writing release notes gone?

vdfs 8 hours ago|||
The illustrations couldn't be any more generic-ai
kphorn 1 hour ago||
my code, my commit - ugh
nnevatie 10 hours ago||
No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.
amorroxic 9 hours ago|||
Curious on what backs this assertion. As a counterpoint we’ve been running 200+ models in production for more than 5 years - language models, embedding, classifiers, low tens to hundred M params. Traffic in the order of 1-2M requests/day and everything is enabled by onnx with some cgo (or Rust) plumbing on top. What’s your SLA?
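To put the quoted traffic in per-second terms, a quick conversion helps. This is only an average: the comment gives no peak rate, and real traffic is rarely uniform over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day):
    """Convert a daily request count into an average requests-per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


low = average_rps(1_000_000)
high = average_rps(2_000_000)
print(f"{low:.1f} to {high:.1f} requests/s on average")
# prints: 11.6 to 23.1 requests/s on average
```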
nnevatie 3 hours ago||
Ahh, I should have probably added some context around my hyperbole. I was referring to real-time computer vision - think of e.g. segmenting FHD/UHD video.
snovv_crash 9 hours ago||||
Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.
pzo 7 hours ago||||
How are we supposed to use TensorRT on iOS, iPadOS, Android or even the Web? Production is not only cloud.
OvervCW 8 hours ago||||
You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
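A minimal sketch of that combination, assuming the `onnxruntime` package (a GPU build with TensorRT support for the first provider to be available). `pick_providers` and `create_session` are illustrative helper names; the provider strings, `get_available_providers()`, and the `providers=` argument of `InferenceSession` are ONNX Runtime's own.

```python
# Execution provider names as used by ONNX Runtime, in order of preference.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED_PROVIDERS):
    """Keep the preferred providers that this ONNX Runtime build actually offers."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers is available")
    return chosen


def create_session(model_path):
    """Create an InferenceSession (assumes the onnxruntime package is installed)."""
    import onnxruntime as ort  # lazy import: only needed when a session is created

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

With this ordering, ONNX Runtime tries TensorRT first and falls back to CUDA or CPU for anything the earlier providers cannot handle.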
gunalx 9 hours ago||||
Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.
antonvs 6 hours ago||||
We use this in production:

https://docs.rs/onnxruntime/latest/onnxruntime/

It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.

monster_truck 5 hours ago||||
I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.
cik 5 hours ago|||
Ummm embedded robotics is all about this. For years.
boredemployee 2 hours ago||
How can I learn the practical side of computer vision in 2026?

I'm not interested in understanding papers or the math behind it, but rather in how to put a system into production, whether it's object detection, running 20 cameras in parallel on a single computer, like sizing hardware for a specific task, and so on.

Any tips?
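For the hardware-sizing part of the question, a back-of-the-envelope estimate is a common starting point. The sketch below is only that: `workers_needed` is a hypothetical helper, the numbers are made up for illustration, and it ignores decoding, batching, and I/O overhead.

```python
import math


def workers_needed(cameras, fps_per_camera, inference_ms):
    """Estimate how many parallel inference workers a camera setup needs.

    Assumes each worker handles one frame at a time and ignores decoding,
    batching, and I/O overhead, so treat the result as a lower bound.
    """
    frames_per_second = cameras * fps_per_camera
    busy_ms_per_second = frames_per_second * inference_ms
    return math.ceil(busy_ms_per_second / 1000)


# Hypothetical numbers: 20 cameras, 5 analysed frames per second each,
# and 30 ms of inference per frame.
print(workers_needed(cameras=20, fps_per_camera=5, inference_ms=30))  # prints 3
```

Measuring `inference_ms` on the target machine with the real model is the step that makes such an estimate meaningful.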

bonoboTP 1 hour ago||
By doing it. Decide on a small project, like tracking your cat, detecting food items in your fridge, then take it step by step.

Then do a slightly more ambitious project. Start with something very simple.

It also heavily depends on what you already know regarding programming, image processing etc.

eastof 2 hours ago|||
One of the great things about OpenCV is how ubiquitous it is: there are a ton of samples online and it's well represented in frontier model training data. I recently vibe-coded an object detector for my own personal photo library so I could separate out my pictures with humans in them. Very approachable with Codex + feeding it a sample from GitHub.
yayitswei 2 hours ago||
Try a coding agent for writing and tuning the OpenCV part, and have it explain its choices. That's probably the most practical path to shipping a working system.

Speaking from experience: never used OpenCV before, recently vibe coded a tool that makes supercuts of pool videos, trimming each clip from the cue ball's first strike to when the motion stops.

GreenSalem 6 hours ago||
AI written release post and it shows...
Npovview 1 hour ago||
I think Technical posts should be written with 3 levels of audiences in mind. Expert, Middle, Beginner. But I guess that is not necessary, since AIs can cut the flab easily.
oceansky 5 hours ago|||
I can't say for sure, but there is a suspicious amount of "it's not x, it's y". At least there are no em-dashes.
_qua 5 hours ago|||
The diagrams definitely look like LLM output as well
M4v3R 4 hours ago|||
The diagrams were generated with Nano Banana Pro (most probably, or alternatively with ChatGPT Image 2), if you look closely in high contrast areas you'll see artifacts in the background that give it away.

I personally don't mind AI generated content when it's properly reviewed, but unfortunately more often than not the author just glances at the result and decides it's good enough.

Example: https://opencv.org/wp-content/uploads/2026/06/image-1.jpeg

I'm not knowledgeable enough to determine whether this diagram is 100% accurate, but some things look off: the arrows in the bottom left seem superfluous, some arrows are connected in weird ways, and the mini diagram in the AttentionLayer block doesn't look right (it has two Softmax icons and one MatMul icon, while the "before" diagram is the opposite).

bl0b 3 hours ago||
Yeah that diagram is all over the place. The arrows on the left branching from the outline of the diagram itself?
saberience 4 hours ago|||
Tested one of the diagrams: "Yes, the digital watermark indicates that most or all of this image was generated or edited using Google AI."
jampekka 4 hours ago||
Indeed. Well written, clear, informative and to the point.
xpct 1 hour ago|||
As of now, any human effort is still ~= quality. Human-written article signals to me that a certain amount of time was spent on it, which is a proxy for quality. This goes for both text and diagrams.

If someone slapped together an article from an LLM and a few internal documents, that tells me exactly how much they cared about it.

Aachen 3 hours ago||||
So to-the-point that it comes with a table of contents. Idk if it needs saying that ToCs have legitimate uses, but the number of search results and blog posts having one since ~2022 is, eh, interesting. You come for whatever the headline was and you get a page with thousands of words, split up into five or more chapters, many of them overlapping or a rephrasing of the same question if you've hit a true content farm. This is not that, but I also can't fathom how one could argue that slop is concise as a hallmark

As for being well-written, does that refer to correct use of grammar and no typos, or do you mean that you find that bots write better than humans in any other way?

marknutter 3 hours ago|||
It could be the best written, most informative article they've ever read, but anti-ai folks would dismiss it as slop the moment someone told them it was written by ai.
smt88 2 hours ago||
The problem is that we don’t know if a human fact-checked it before release or if we’re the first humans reading it closely.
jampekka 2 hours ago||
We don't really know that about human written text either.
arcanine 10 hours ago||
They really improved the performance. I tested the YOLOv8 medium segmentation model on an 11th-gen Intel i7 CPU.

OpenCV 4.11: ~255 ms
OpenCV 5.0.0: ~185 ms

with the same code.
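Taking the two approximate timings above at face value, the improvement works out as follows. This is plain arithmetic on the quoted numbers, not a new measurement, and `compare_latency` is just an illustrative helper.

```python
def compare_latency(before_ms, after_ms):
    """Return (speedup factor, fractional latency reduction) for two timings."""
    speedup = before_ms / after_ms
    reduction = (before_ms - after_ms) / before_ms
    return speedup, reduction


speedup, reduction = compare_latency(before_ms=255, after_ms=185)
print(f"{speedup:.2f}x faster, {reduction:.1%} less time per inference")
# prints: 1.38x faster, 27.5% less time per inference
```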

bobmcnamara 3 hours ago|
Intel never really improved their memory controller and busses and it shows.
ge96 1 hour ago||
I remember trying to do photo stitching myself (panoramas) then I failed miserably but it's built into opencv ha. I've used quite a bit of OpenCV features eg. laplace variance for an automatic zoom/focusing mechanical lens camera system (steppers) and contour/blob finding for crude color segmentation.
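The Laplacian-variance focus measure mentioned here can be sketched without any dependencies. In OpenCV the usual one-liner is `cv2.Laplacian(gray, cv2.CV_64F).var()`; the version below is a simplified stand-in that skips the image border, so its values will differ somewhat from OpenCV's, and `laplacian_variance` is an illustrative name.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior of a grayscale image.

    gray is a list of rows of numbers, at least 3x3 in size. Higher values mean
    more edges, which usually indicates a sharper (better focused) image.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A uniform image has no edges; a checkerboard is all edges.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
print(laplacian_variance(flat), laplacian_variance(checker))
```

An autofocus loop would step the lens, compute this score for each position, and keep the position with the highest value.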
wiradikusuma 51 minutes ago||
Curious how do people usually use OpenCV with CCTV? (Use cases)
shelled 8 hours ago||
A few years ago I was using OpenCV in a commercial Android SDK (it might still be in use, partly because iOS provided almost all of those "needs" ready-made and Android just didn't, and neither did Firebase or the Jetpack suites/tools). I was the one who had added it to the SDK. There was a lot we could do, but as an Android developer (with barely any exposure to CV or even C/C++), what I felt we lacked was documentation and a community. We struggled even with shaving off the parts that we did not want to ship with our SDK. Speed was such an issue. The problem was that, for someone who just wanted to use the lib (on mobile), a lot of things felt esoteric and out of reach, i.e. difficult. It didn't have to be. Sadly, LLMs weren't at full speed back then: barely usable, not even talked about. Something like this would have been a perfect use case for AI/LLMs: a coder from outside the specific field the tool came from being able to take full advantage of its capabilities in a nuanced, selective manner.
hbcondo714 3 days ago|
> LLMs and VLMs, Running Inside OpenCV…Qwen 2.5, Gemma 3, PaliGemma, and the GPT-2 / GPT-4 family

Why these specific models / versions?

mkl 4 hours ago|
Yes, it's weird that they're so old.