RF-DETR is both faster and more accurate, and it's truly open source under an Apache 2.0 license: https://github.com/roboflow/rf-detr
Full disclosure: I’m one of the co-founders of Roboflow (we made RF-DETR, wrote this blog post, and are a sub-licensor of Ultralytics’ models).
Misleading marketing statement.
The catch is that for image resolutions of 700x700 pixels or more (most production use cases), the Roboflow license is actually PML 1.0 rather than Apache 2.0: https://github.com/roboflow/rf-detr#license
Citation needed? The 2XL model looks like it goes up to 800x800 pixel inputs. This isn't the dealbreaker you say it is: every pipeline benefits from thoughtful cropping and rescaling before inference.
Rescaling is fine for some purposes, but not for all. For many domain-specific objects (often less common and oddly dimensioned), downscaling will severely reduce recall. There's a reason Roboflow puts a non-open-source license on those specific architectures.
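To put rough numbers on the rescaling point, here's a minimal sketch. It assumes a square 800x800 model input (the 2XL size mentioned above) and aspect-preserving letterbox resizing. The function names are made up for illustration and aren't from RF-DETR or any other library.

```python
def letterbox_scale(img_w: int, img_h: int, target: int = 800) -> float:
    """Uniform scale factor that fits the whole image inside a target x target input.

    Assumes aspect ratio is preserved (letterboxing), so one factor applies to both axes.
    """
    return min(target / img_w, target / img_h)


def object_size_after_resize(obj_w: float, obj_h: float,
                             img_w: int, img_h: int, target: int = 800):
    """Apparent size of an object once the full frame is squeezed into the model input."""
    s = letterbox_scale(img_w, img_h, target)
    return obj_w * s, obj_h * s


# A 24x24 px object in a 4000x3000 frame, resized to fit an 800x800 input
w, h = object_size_after_resize(24, 24, 4000, 3000)
print(f"{w:.1f} x {h:.1f} px")  # prints "4.8 x 4.8 px"
```

The whole frame gets a scale factor of 0.2, so a 24-pixel object ends up about 5 pixels wide. Cropping an 800x800 region around the area of interest keeps the scale at 1.0, but only if you already know where to crop.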
In some cases, tiled inference (for example with https://github.com/obss/sahi) might do the job.
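For anyone unfamiliar with the idea, here's a rough sketch of what tiled inference does. It is not SAHI's API. It's a hand-rolled illustration with hypothetical names (`make_tiles`, `tiled_detect`, and a `detect_fn` callable standing in for a real model run on one crop). The steps: cut the full-resolution image into overlapping tiles, run the detector on each tile, shift boxes back into full-image coordinates, and de-duplicate objects seen in several overlapping tiles. The de-duplication step here is plain greedy NMS, and SAHI's own merging options are more elaborate.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]            # (box, confidence score)


def tile_origins(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length)."""
    if tile >= length:
        return [0]
    stride = max(1, int(tile * (1 - overlap)))
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # last tile flush with the image edge
    return starts


def make_tiles(width: int, height: int, tile: int = 640, overlap: float = 0.2) -> List[Box]:
    """Overlapping tile rectangles in full-image coordinates."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


def to_image_coords(box: Box, tile_box: Box) -> Box:
    """Shift a box from tile-local coordinates back to full-image coordinates."""
    x0, y0 = tile_box[0], tile_box[1]
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thresh: float = 0.5) -> List[Detection]:
    """Greedy non-maximum suppression: keep the highest-scoring box of each overlapping group."""
    kept: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, kb) < iou_thresh for kb, _ in kept):
            kept.append((box, score))
    return kept


def tiled_detect(width: int, height: int,
                 detect_fn: Callable[[Box], List[Detection]],
                 tile: int = 640, overlap: float = 0.2,
                 iou_thresh: float = 0.5) -> List[Detection]:
    """Run a (hypothetical) per-tile detector over the whole image and merge results.

    `detect_fn` receives the tile rectangle; in real use it would crop the image
    array to that rectangle, run the model, and return tile-local boxes.
    """
    dets: List[Detection] = []
    for tb in make_tiles(width, height, tile, overlap):
        for box, score in detect_fn(tb):
            dets.append((to_image_coords(box, tb), score))
    return nms(dets, iou_thresh)
```

The overlap matters because an object cut in half by a tile boundary can still appear whole in a neighbouring tile. The cost is that it gets detected more than once, which is why the merge step is needed.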
That said, many of the claimed improvements in this model were efficiency-related.
I then tried training it on a lot of sample images from a 3D point-and-shoot game, and was quite disappointed with how it performed.
Has anyone else experimented with it recently? How well does it work as a base model for training custom classifiers? And given hardware growth over the last ~5 years, is it suitable to run in parallel with graphically intensive games?
If you want to detect objects and speed matters too much to use an LLM-based architecture, you can give it a try too.
Meanwhile, their very own Peter Skalski already does a super job with a host of write-ups and examples for all sorts of YOLO models, and is well respected.