Erm, isn't this a bit bad?
The general lesson from this is that when you do not allow invalid data to be changed or replaced (which is a legitimate choice), you get stuck handling the bad data in every system that consumes it, and then you have to worry about different components handling the badness in different ways (see e.g. browsers).
This is no more true for version upper bounds than it is for version lower bounds, assuming that package installers ensure all package version constraints are satisfied.
I presume you think version lower bounds should still be honoured?
In the reverse direction, many version lower bounds are also "purely defensive" -- arising from nothing more than the version of the dep that you happened to get when you started the project. (Just because you installed "the latest baz" and got version 2.3.4, you have no evidence that version 2.3.3 wouldn't also work fine; without testing it, adding the lower bound >=2.3.4 is purely defensive.)
Basically, the two bound types are isomorphic.
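As a small illustration of that symmetry, here is a sketch using the `packaging` library (the version numbers are just the ones from the example above); installers ultimately evaluate both kinds of bound through the same specifier check:

```python
from packaging.specifiers import SpecifierSet

# A pin with a defensive lower bound and a defensive upper bound.
spec = SpecifierSet(">=2.3.4,<3")

print("2.3.3" in spec)  # False -- excluded by the lower bound
print("2.4.0" in spec)  # True  -- satisfies both bounds
print("3.0.1" in spec)  # False -- excluded by the upper bound
```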
Are we losing out on performance of the actual installed thing, then? (I'm not 100% clear on .pyc files TBH; I'm guessing they speed up start time?)
My first cynical instinct is to say that this is uv making itself look better by deferring the costs to the application, but it's probably a good trade-off: if any significant percentage of the compiled files would never be used anyway, the overall cost is lower when you defer compilation to run time.
I would bet on a subset for pretty much any non-trivial package (i.e. larger than one or two user-facing modules). And for those trivial packages? Well, they are usually small, so the cost is small as well. I'm sure there are exceptions: maybe a single gargantuan module that consists of autogenerated FFI bindings for some C library or such, but that is likely the minority.
Sure, but you pay that hit either way. Real-world performance is always usage based: the assumption that uv makes is that people run (i.e. import) packages more often than they install them, so amortizing at the point of the import machinery is better for the mean user.
(This assumption is not universal, naturally!)
(The key part being that 'less common' doesn't mean a non-trivial amount of time.)
I just read the thread and use Python, I can't comment on the % speedup attributed to uv that comes from this optimization.
If it were an optional toggle, it would probably become best practice to enable compilation in Dockerfiles.
It seems like tons of people are creating container images with an installer tool and having it do a bunch of installations, rather than creating the image with the relevant Python packages already in place. Hard to understand why.
For that matter, a pre-baked Python install could do much more interesting things to improve import times than just leaving a forest of `.pyc` files in `__pycache__` folders all over the place.
My Docker build generating the bytecode saves it into the image, sharing the cost at build time across all image deployments, whereas building at first execution means that each deployed image instance has to generate its own bytecode!
That’s a massive amplification, on the order of 10-100x.
“Well just tell it to generate bytecode!”
Sure — but when is the default supposed to be better?
Because this sounds like a massive footgun for a system where requests >> deploys >> builds. That is, every service I’ve written in Python for the last decade.
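For what it's worth, a minimal sketch of doing that work once at image build time with the stdlib's `compileall` (the `/app/.venv` path is just an assumed layout, adjust for yours):

```python
# build_bytecode.py -- run once during the image build, after installing dependencies.
import compileall

ok = compileall.compile_dir(
    "/app/.venv/lib",  # site-packages live under here in this assumed layout
    quiet=1,           # only report errors
    workers=0,         # 0 = use all available CPU cores
)
raise SystemExit(0 if ok else 1)
```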
They do.
> Are we losing out on performance of the actual installed thing, then?
When you consciously precompile Python source files, you can parallelize that process. When you `import` from a `.py` file, you only get that benefit if you somehow coincidentally were already set up for `multiprocessing` and happened to have your workers trying to `import` different files at the same time.
Unfortunately, it typically doesn't work out as well as you might expect, especially given the expectation of putting `import` statements at the top of the file.
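One (not always tasteful) workaround is to defer an expensive import into the code path that actually needs it, so the compile-and-cache cost is only paid if that feature is ever used. A tiny sketch with a hypothetical module name:

```python
def export_report(data):
    # Deferred import: the (hypothetical) heavy_bindings module is only parsed,
    # byte-compiled and cached the first time someone actually exports a report.
    import heavy_bindings

    return heavy_bindings.render(data)
```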
What I want from a package manager is that it just works.
That's what I mostly like about uv.
Many of the changes that made speed possible were to reduce the complexity and thus the likelihood of things not working.
What I don't like about uv (or pip or many other package managers) is that the programmer isn't given a clear mental model of what's happening, and thus of how to fix the inevitable problems. Better (PubGrub) error messages are good, but it's rare that they can provide specific fixes. So even if you get 99% speed, you end up with 1% perplexity and diagnostic black boxes.
To me the time that matters most is time to fix problems that arise.
This is a priority for PAPER; it's built on a lower-level API so that programmers can work within a clear mental model, and I will be trying my best to communicate well in error messages.
This is kind of fascinating. I've never considered runtime upper bound requirements. I can think of compelling reasons for lower bounds (dropping version support) or exact runtime version requirements (each version works for exact, specific CPython versions). But now that I think about it, it seems like upper bounds solve a hypothetical problem that you'd never run into in practice.
If PSF announced v4 and declared a set of specific changes, I think this would be reasonable. In the 2/3 era it was definitely reasonable (even necessary). Today though, it doesn't actually save you any trouble.
But if we accept that it currently ignores any upper-bound check greater than v3, that's interesting. Does that imply that once Python 4 is available, uv will slow down because it needs to actually run those checks?
That said, even if it does happen, I highly doubt that is the main part of the speed up compared to pip.
Whether a Python package will be compatible with a version that has not been released yet is simply unanswerable right now.
Having an enum like [compatible, incompatible, untested] would at least fix this.
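Something along these lines (a purely hypothetical field, not an existing metadata standard):

```python
from enum import Enum

class PythonSupport(Enum):
    COMPATIBLE = "compatible"      # tested against this interpreter version
    INCOMPATIBLE = "incompatible"  # known not to work
    UNTESTED = "untested"          # no claim either way
```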
What does backwards compatibility have to do with parallel downloads? Or global caching? The metadata-only resolution is the only backwards-compatibility issue in there, and pip can run without a setup.py file being present if pyproject.toml is there.
Short answer: most, or at least a whole lot, of the improvements in uv could be integrated into pip as well (especially parallelizing downloads). But they're not, because there is uv instead, which is also maintained by a for-profit startup. So pip is the loser.
- uncompressing packages while they are still being downloaded, in memory, so that you only have to write to disk once
- designing its own lock file format for speed

But yes, Rust is actually making it faster because:
- real threads, no need for multi-processing
- no python VM startup overhead
- the dep resolution algo is exactly the type of workload that is faster in a compiled language
Source: this interview with Charlie Marsh: https://www.bitecode.dev/p/charlie-marsh-on-astral-uv-and-th...
The guy has a lot of interesting things to say.
Parallel downloads don't need multi-processing, since this is an IO-bound use case. asyncio or GIL-threads (which unblock on IO) would be perfectly fine. Native threads will eventually be the default, too.
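A minimal sketch of that kind of IO-bound parallelism with plain threads (the wheel URLs are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder wheel URLs; the point is only that the GIL is released while each
# thread is blocked on network IO, so the downloads overlap.
urls = [
    "https://example.invalid/pkg_a-1.0-py3-none-any.whl",
    "https://example.invalid/pkg_b-2.0-py3-none-any.whl",
]

def fetch(url: str) -> bytes:
    with urlopen(url) as resp:
        return resp.read()

with ThreadPoolExecutor(max_workers=8) as pool:
    wheels = list(pool.map(fetch, urls))
```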
Now, I believe unzip already releases the GIL, so we could already benefit from that, and the rest likely doesn't dominate performance.
But still, rust software is faster on average than python software.
After all, all those things are possible in python, and yet we haven't seen them all in one package manager before uv.
Maybe the strongest advantage of rust, on top of very clean and fast default behaviors, is that it attracts people that care about speed, safety and correctness. And those devs are more likely to spend time implementing fast software.
Though the main benefit of uv is not that it's fast. It's very nice, and opens up more use cases, but it's not the killer feature.
The killer feature is, being a stand alone executable, it bypasses all python bootstrapping problems.
Again, that could technically be achieved in python, but friction is a strong force.
People who have this opinion should be using Rust, and not Python at all. If Python code does not have sufficient speed, safety, and correctness for someone, it should not be used. Python's tools should be written in Python.
> The killer feature is, being a stand alone executable, it bypasses all python bootstrapping problems.
I can't speak for Windows or Macs, but on Linux system Pythons are standard, and there is no "bootstrapping problem" with using well-known utilities that happen to be written in Python.
Bootstrapping a clean Python env is the single biggest problem for people who are not coding in Python daily.
That's half of the community in the Python world.
When you write SQLAlchemy, that's not obvious, because you know a lot. But for the average user, uv was a savior.
I wrote a pretty long article on that here:
https://www.bitecode.dev/p/why-not-tell-people-to-simply-use
We also discuss it with Brett Cannon here:
https://www.bitecode.dev/p/brett-cannon-on-python-humans-and
But the most convincing argument comes from teaching Python to kids, accountants, mathematicians, Java coders, and sysadmins.
After 20 years of doing that, I saw the same problems again and again.
And then uv arrived. And those problems disappeared for those people.
I'm not arguing against tools that make things as easy as possible for non-programmers; I'm arguing against gigantic forks in the Python installation ecosystem. Forks like these are harmful to the tooling. I'm already suffering quite a bit from the flake8/ruff forking, where ruff built a much better linter engine but didn't feel like implementing plugins, so everyone is stuck on what I feel is a mediocre set of linting tools. Overall I don't like Astral's style, and I think a for-profit startup forking out huge chunks of the Python ecosystem is going to be a bad thing long term.
... but the archive directory is at the end of the file?
> no python VM startup overhead
This is about 20 milliseconds on my 11-year-old hardware.
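An easy way to measure that number on your own machine (spawning a child interpreter that does nothing):

```python
import subprocess
import sys
import time

start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)  # bare interpreter start + exit
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"interpreter startup + teardown: {elapsed_ms:.1f} ms")
```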
As for the 20 ms: if you spawn a worker process for each of 20 dependencies, that's 400 ms of interpreter startup just to start working.
Shaving half a second off many operations is what makes things fast.
Although, as we saw with zeeek in the other comment, you likely don't need multiprocessing, since the network stack and the stdlib's unzip release the GIL.
Threads are cheaper.
Maybe if you bundled pubgrub as a compiled extension, you could get pretty close to uv's perf.
If I have 64 cores, and 20 dependencies, I do want the 20 of them to be uncompressed in parallel. That's faster and if I'm installing something, I wanna prioritize that workload.
But it doesn't have to be 20. Even with, say, 5 workers and queues, that's 100 ms. It adds up.
This has bothered me more than once when building a base Docker image. Why would I want a venv inside a container running as root?
Personally, I never want a program to touch global shared libraries, ever. Yuck.
You absolutely can. But it's not best practice.
https://docs.docker.com/engine/containers/multi-service_cont...
PyPA has been a mess for a very long time, with in-fighting, astroturfing, gatekeeping and so on, with pip as the battlefield. The uv team just did the one thing that PyPA & co stopped doing a long time ago (if they ever did it): actually solving their users' pain points, and never saying "it's not possible because [insert bullshit]" or replying "it's OSS, do it yourself" only to then reject the work with attitude and baseless arguments.
They listened to their users' issues and solved their pain points without denying them. Period.
I will bring popcorn on the Python 4 release date.