Building a commercial product means you pay money (or something they equally value) to people to do your bidding. You don't have to worry about politics, licensing, and all the usual FOSS-related drama. You pay them to set their opinions aside and build what you want, not what they want (and if that doesn't work, it just means you need to offer more money).
In this case it's a company that believes they can make a "good" package manager they can sell/monetize somehow and so built that "good" package manager. Turns out it's at least good enough that other people now like it too.
This would never work in a FOSS world because the project would be stuck in endless planning: everyone would have an opinion on how it should be done and nothing would actually get done.
Similar story with systemd: all the bitching you hear about it (to this day!) is the stuff that would've happened during its development phase had it been developed as a typical FOSS project, and would ultimately have made it go nowhere. Instead it was one guy who just did what he wanted and shared it with the world, and enough other people liked it and started building upon it.
> quite a few people would consider "benevolent dictator for life" an outdated model for open source communities.
I think what most people dislike are rugpulls and when commercial interests override what contributors/users/maintainers are trying to get out of a project.
For example, we use forgejo at my company because it was not clear to us to what extent gitea would play nicely with us if we externally offered a hosted version/deployment of their open source software (which they somewhat recently formed a company around, and which led to forgejo forking it under the GPL). I'm also not a fan of what minio did recently to that effect, and am skeptical but hopeful that seaweedfs is not going to do something similar.
We ourselves are building out a community around our static site generator https://github.com/accretional/statue as FOSS with commercial backing. The difference is that we're open and transparent about it from the beginning, and static site generators/component libraries are probably some of the least painful to fork or take issue with their direction, vs critical infrastructure like distributed systems' storage layer.
Bottom line is, BDFL works when 1. you aren't asking people to bet their business on you staying benevolent 2. you remain benevolent.
You’re not wrong, but those are the projects we’re talking about in this thread. uv has become large enough to enter this realm.
> Bottom line is, BDFL works when 1. you aren't asking people to bet their business on you staying benevolent 2. you remain benevolent.
That second point is doing a lot of heavy lifting. All of the BDFL models depend on that one person remaining aligned, interested, and open to new ideas. A lot of the small projects I’ve worked with have had BDFL models where even simple issues like the BDFL becoming busy or losing interest became the death knell of the project. On the other hand, I can think of a few committee-style projects where everything collapsed under infighting and drama from the committee.
The seemingly irresistible social pressure to committee-ize development is a paper tiger. It disappears if you stand your ground and state firmly "This is MY project".
Money is indeed a great lubricant.
However, it's not black-and-white: office politics is a long standing term for a reason.
Of course a group can also have bad actors but that’s not really an issue with politics specifically. Politics are neither good nor bad.
At a large company, your job after a certain level depends on your “impact” and “value delivered”. The challenge is getting 20 other teams to work on your priorities and not their priorities. They too need to play to win to keep their job or get that promotion.
I would not say it’s about getting other people aligned with your priorities instead of theirs but rather finding ways such that your priorities are aligned. There’s always the “your boss says it needs to help me” sort of priority alignment but much better is to find shared priorities. e.g. “We both need X; let’s work together.” “You need Foo which you could more easily achieve by investing your efforts into my platform Bar.”
When you are higher up (that is, when you become said good boss, or that boss's boss), the dynamics of the grandfather comment kick in fully.
Impact is a handwavy way of saying “is your work good for the company”.
I believe office politics happens when there are simply too many people at a company or org.
The experts believe they've been hired for the value of their opinions, rather than for being 'yes-people', and have differing opinions to each other.
At a certain pay threshold, there are multiple people whose motivation is not "how do I maximise my compensation?" and instead is "how do I do the best work I can?" Sometimes this presents as vocal disagreements between experts.
The former is probably easier. They don't have to justify or determine the salaries, and don't have to figure out who's worth the money, and don't have to figure out how to figure that out.
Or switch to the Nobel prize going to the man instead of the woman who did the work … sometimes. Take the credit and get the promotion.
There's an old saying: politics began when two people in a cave found themselves with only one blanket.
Rust's development used a structured, community RFC process—endless planning by your definition. The result was a famously well-designed toolchain that the entire community praises. FOSS didn't hold it back; it made it good.
So no, commercial backing isn't the only way to ship something good. FOSS is more than capable of shipping great software when done right.
Lots of lessons from other FLOSS package managers helped `cargo` become great, and then this knowledge helped shape `uv`.
How's the company behind uv making money?
It doesn't have to make money now. But it's clearly pouring commercial-project-level of resources into uv, on the belief they will somehow recoup that investment later on.
If you’re a Python shop, compare
- writing uv and keeping it private makes package management easier for your own packages
- writing uv and opening it up, and getting all/most third party libs to use it makes package management easier for your own packages and third party packages you use
I suggest everyone save this comment and review it five years later.
It could be that they calculate the existence of uv saves their team more time (and therefore expense) in their other work than it took to create. It could be that recognition for making the tool is worth the cost as a marketing expense. It could be that other companies donate money to them, either ahead of time in order to get uv made, or after it was made to encourage more useful tools to be made. etc
Edit: 6 months ago, user simonw wrote an HN comment: "Here's a loose answer to that question from uv founder Charlie Marsh last September [2024]": https://hachyderm.io/@charliermarsh/113103564055291456
«« I don't want to charge people money to use our tools, and I don't want to create an incentive structure whereby our open source offerings are competing with any commercial offerings (which is what you see with a lot of hosted-open-source-SaaS business models).
What I want to do is build software that vertically integrates with our open source tools, and sell that software to companies that are already using Ruff, uv, etc. Alternatives to things that companies already pay for today.
An example of what this might look like (we may not do this, but it's helpful to have a concrete example of the strategy) would be something like an enterprise-focused private package registry. A lot of big companies use uv. We spend time talking to them. They all spend money on private package registries, and have issues with them. We could build a private registry that integrates well with uv, and sell it to those companies. [...]
But the core of what I want to do is this: build great tools, hopefully people like them, hopefully they grow, hopefully companies adopt them; then sell software to those companies that represents the natural next thing they need when building with Python. Hopefully we can build something better than the alternatives by playing well with our OSS, and hopefully we are the natural choice if they're already using our OSS. »»
(whether it will pan out or not is another matter, but in the meantime we got a decent open-source package manager out of it)
maybe they would get acqui-hired like Bun? idk, somebody defo needs this
And it's true: while I disagree with a lot of systemd decisions, focus has a leveraging effect that's disproportionate.
numpy is the de-facto foundation for data science in python, which is one of the main reasons, if not the main reason, why people use python
it's FOSS
and it "actually got done"
At least in my limited experience, the selling point with the most traction is that you don't already need a working python install to get UV. And once you have UV, you can just go!
If I had a dollar for every time I've helped somebody untangle the mess of python environment libraries created by an undocumented mix of python delivered through the distribution's package management versus native pip versus manually installed...
At least on paper, both poetry and UV have a pretty similar feature set. You do, however, need a working python environment to install and use poetry.
I still genuinely do not understand why this is a serious selling point. Linux systems commonly already provide (and heavily depend upon) a Python distribution which is perfectly suitable for creating virtual environments, and Python on Windows is provided by a traditional installer following the usual idioms for Windows end users. (To install uv on Windows I would be expected to use the PowerShell equivalent of a curl | sh trick; many people trying to learn to use Python on Windows have to be taught what cmd.exe is, never mind PowerShell.) If anything, new Python-on-Windows users are getting tripped up by the moving target of attempts to make it even easier (in part because of things Microsoft messed up when trying to coordinate with the CPython team; see for example https://stackoverflow.com/questions/58754860/cmd-opens-windo... when it originally happened in Python 3.7).
> If I had a dollar for every time I've helped somebody untangle the mess of python environment libraries created by an undocumented mix of python delivered through the distribution's package management versus native pip versus manually installed...
Sure, but that has everything to do with not understanding (or caring about) virtual environments (which are fundamental, and used by uv under the hood because there is really no viable alternative), and nothing to do with getting Python in the first place. I also don't know what you mean about "native pip" here; it seems like you're conflating the Python installation process with the package installation process.
Even languages with a great compat story are moving to support multi-toolchains natively. For instance, go 1.22 on Ubuntu 24.04 LTS is outdated, but it will automatically download the 1.25 toolchain when it sees go 1.25.0 in go.mod.
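For reference, the Go side is just a declaration in go.mod (the module path below is hypothetical); a go command from 1.21 onward that is older than the declared version will fetch and run the newer toolchain automatically:

    module example.com/hello

    go 1.25.0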
They can be a bit long in the tooth, yes, but from past experience another Python version I don't want to use is anything ending in .0, so I can cope with them being a little older.
That's in quite a bit of contrast to something like Go, where I will happily update on the day a new version comes out. Some care is still needed - they allow security changes particularly to be breaking, but at least those tend to be deliberate changes.
Even with LTS Ubuntu updated only at EOL, its Python will not be EOL most of the time.
> A single Python version for the entire system fundamentally doesn’t work for many people thanks to shitty compat story in the vast ecosystem.
My experience has been radically different. Everyone is trying their hardest to provide wheels for a wide range of platforms, and all the most popular projects succeed. Try adding `--only-binary=:all:` to your pip invocations and let me know the next time that actually causes a failure.
Besides which, I was very specifically talking about the user story for people who are just learning to program and will use Python for it. Because otherwise this problem is trivially solved by anyone competent. In particular, building and installing Python from source is just the standard configure / make / make install dance, and it Just Works. I have done it many times and never needed any help to figure it out even though it was the first thing I tried to build from C source after switching to Linux.
> Because otherwise this problem is trivially solved by anyone competent. In particular, building and installing Python from source is just the standard configure / make / make install dance, and it Just Works. I have done it many times and never needed any help to figure it out even though it was the first thing I tried to build from C source after switching to Linux.
I compiled the latest GCC many times with the standard configure / make / make install dance when I just started learning *nix command line. I even compiled gmp, mpfr, etc. many times. It Just Works. Do you compile your GCC every time before you compile your Python? Why not? It Just Works.
Time. CPython compiles in a few minutes on an underpowered laptop. I don't recall last time I compiled GCC, but I had to compile LLVM and Clang recently, and it took significantly longer than "a few minutes" on a high-end desktop.
Can you name some?
> Do you compile your GCC every time before you compile your Python? Why not? It Just Works.
If I needed a different version of GCC to make Python work, then probably, yes. But I haven't yet.
Just like I barely ever need a different version of Python. I keep several mainly so that I can test/verify compatibility of my own code.
I'll be using uv for that though, as I'll be using it for its superior package management anyway.
Also, what do you do when you want to locally test your codebase across many Python versions? Do you keep track of several different containers? If you start writing some tool to wrap that, you're back at square one.
I haven’t found that there was any breakage across Python 3.x. Python 2.x to 3.x yes.
Anyways, this all could be wrapped in a CICD job and automated if you wanted to test across all versions.
Further, we have a rather largish set of internal libraries most of our python programs rely on. And some of those rely on external 3rd party API's (often REST). When we find a bug or something changes, more often than not, we want to roll out the changed internal lib so that all programs that use it get the fix. Having to get everyone to rebuild and/or redeploy everything is a non-starter as many of the people involved are not primarily software developers.
We usually install into the system dirs and have a dependency problem maybe once a year. And it's usually trivially resolved (the biggest problem was with some google libs which had internally inconsistent dependencies at one point).
I can understand encouraging the use of virtual environments, but this movement towards requiring them ignores what, I think, is a very common use case. In short, no one way is suitable for everyone.
You would have a standard container image
Sometimes this is the right answer. Sometimes docker/podman/runc are not an option nor would the headache of volumes/mounts/permissions/hw-pass-through be worth the additional mess.
It is hard to overstate how delightful putting `uv` in the shebang is:
in `demo.py`:
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.13"
# ///

print("hello, world")
Then `chmod +x demo.py; ./demo.py`. At no point did I have to take a detour to figure out why `python` is symlinked to `python3`, unless I am in some random directory where there is a half-broken `conda` environment...
Though, this isn’t about avoiding installs; it’s about making the one install (uv) the only thing you have to get right, instead of debugging whatever python means today.
I was advocating for containers as the “hard isolation / full stack” solution which eliminate host interpreter ambiguity and OS drift by running everything inside a pinned image. But you do need podman and have the permissions set right on it.
Also, to use uv like this you either need to specify its path, or as shown in the example invoke /usr/bin/env. The Linux shebang requires a path rather than an executable name, and a relative path only works if you're in the exact right directory.
So in practical terms we have gained nothing, since if we want to avoid "PATH-driven interpreter selection" we could specify an absolute path like /usr/bin/python in the shebang, and uv doesn't let us avoid that.
That said, the PEP 723 interface is really nice (there's a lot more going on in the example than just figuring out which Python to use), and the experience of using uv as the interpreter is nicer in the sense that you only need uv to exist in one place. (This, too, is a problem that can be solved just fine in Python, and there are many approaches to it out there already.)
Many Linux distributions ship Python. Alpine and DSL don’t. You can add it to Alpine. If you want the latest, you install it.
Sounds like you've never actually used Python. You should never, ever be using the system Python for anything you need to run yourself. Don't even touch it. It's a great way to break your entire system. Many distros have stopped providing it at all, for good reason.
The first step every Python dev has to take on every single system they want to run their project on is to install their own sandboxed version of Python, its libraries, and its library manager. Alternatively you pre-build a docker container with it all packed inside, which is the same basic thing.
Better option still is to simply ditch Python and switch to compiled languages that don't have these stupid problems.
With other dependency managers you end up with "system Python 3.45 -> dep manager -> project Python 1.23 -> project". Or worse, "system Python 1.23 -> dep manager -> project Python 1.23 -> project". And of course there will be people who read about the problem and install their own Python manager, so they end up with a "system Python -> virtualenv Python -> poetry Python -> project" stack. Or the other way around, and they'll end up installing their project dependencies globally...
Virtual environments are the fundamental way of setting up a Python project, whether or not you use uv, which creates and manages them for you. And these virtual environments can freely either use or not use the system environment, whether or not you use uv to create them. It's literally a single-line difference in the `pyvenv.cfg` file, which is a standard required part of the environment (see https://peps.python.org/pep-0405/), created whether or not you use uv.
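For concreteness, a venv's `pyvenv.cfg` looks roughly like this (the path and version are illustrative); flipping `include-system-site-packages` to `true` is the single-line difference in question:

    home = /usr/bin
    include-system-site-packages = false
    version = 3.12.3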
Most of the time you don't need a different Python version from the system one. When you do, uv can install one for you, but it doesn't change what your dependency chain actually is.
Python-native tools like Poetry, Hatch etc. also work by managing standards-defined virtual environments (which can be created using the standard library, and you don't even have to bootstrap pip into them if you don't want to) in fundamentally the same way that uv does. Some of them can even grab Python builds for you the same way that uv does (of course, uv doesn't need a "system Python" to exist first). "system Python -> virtualenv Python -> poetry Python -> project" is complete nonsense. The "virtualenv Python" is the system Python — either a symlink or a stub executable that launches that Python — and the project will be installed into that virtual environment. A tool like Poetry might use the system Python directly, or it might install into its own separate virtual environment; but either way it doesn't cause any actual complication.
Anyone who "ends up installing their project dependencies globally" has simply not read and understood Contemporary Python Development 101. In fact, anyone doing this on a reasonably new Linux has gone far out of the way to avoid learning that, by forcefully bypassing multiple warnings (such as described in https://peps.python.org/pep-0668/).
No matter what your tooling, the only sensible "stack" to end up with, for almost any project, is: base Python (usually the system Python but may be a separately installed Python) -> virtual environment (into which both the project and its dependencies are installed). The base Python provides the standard library; often there will be no third-party libraries, and even if there are they will usually be cut off intentionally. (If your Linux comes with pre-installed third-party libraries, they exist primarily to service tools that are part of your Linux distribution; you may be able to use them for some useful local hacking, but they are not appropriate for serious, publishable development.)
Your tooling sits parallel to, and isolated from, that as long as it is literally anything other than pip — and even with pip you can have that isolation (it's flawed but it works for common cases; see for example https://zahlman.github.io/posts/2025/02/28/python-packaging-... for how I set it up using a vendored copy of pip provided by Pipx), and have been able to for three years now.
Except for literally anytime you’re collaborating with anyone, ever? I can’t even begin to imagine working on a project where folks just use whatever python version their OS happens to ship with. Do you also just ship the latest version of whatever container because most of the time nothing has changed?
I, as a user, do not care whatsoever about any of this. At all. If you're explaining "virtual environments", you've lost the plot.
Compiled languages got this right. The dev creates a binary and I as a user simply run it. That's it. That's the holy grail.
It's good to see at last someone in the Python space got their ducks in a row and we've finally got a sensible tool.
They haven't. At the end of the day, they just want their program to work. You and I can design a utopian packaging system, but the physics PhD with a hand-me-down windows laptop and access to her university's Linux research cluster doesn't care about python other than that it has a PITA library situation that UV addresses.
As for why conda: wheels do not have post-installation hooks (which, given the issues with npm, I'm certainly a fan of), and while for most packages this isn't an issue, I've encountered enough packages where sadly they are required (for integration purposes), and the PyPI packages are subtly broken on install without them. Additionally, conda (especially Anaconda Inc's commercial repositories) has significantly more optimised builds (not as good as the custom builds well-run clusters provide, but better than PyPI-provided ones). I personally do not use conda (because I tend to want to test/modify/patch/upstream packages lower down the chain and test with higher up packages), but for novices (especially novices on Windows), conda for all its faults is the best option for those in the "data science" ecosystem.
Probably either python-serial or python-pexpect, judging by the file dates, and neither of these is so exciting that there should have been any version conflicts at all.
And the only reason I had to rebuild it at all was due to another version conflict in the apm distribution that expects a particular version of pixbuf to be present on the system and all hell breaks loose if it isn't, and you can't install that version on a modern system because that breaks other packages.
It is insane how bad all this package management crap is. The GNU project and the linux kernel are the only ones that have never given me any trouble.
Users do NOT read the manual. Users ignore warnings. Users double click "AnnaKurnikovaNude.exe".
macos and linux usually come with a python installation out of the box. windows should be following suit but regardless, using uv vs venv is not that different for most users. in fact to use uv in a project, `uv venv` seems like a prerequisite.
Yep. But it's either old or broken or both. Using a tool not dependent on the python ecosystem to manage the python ecosystem is the trick here that makes it so reliable and invulnerable to issues that characterize python / dependency hell.
conda already had the independence from python distribution, but it still had its own set of problems with overlap with pip (see mamba).
i personally use uv for projects at work, but for smaller projects, `requirements.txt` feels more readable than the `toml` and `uv.lock`. in the spirit of encouraging best practices, it is still probably simpler to do it with older tools. but larger projects definitely benefit, such as in building container images.
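for what it's worth, the toml side doesn't have to be much heavier than a requirements.txt; a minimal PEP 621 `pyproject.toml` (names and pins below are made up) is just:

    [project]
    name = "example-app"
    version = "0.1.0"
    requires-python = ">=3.11"
    dependencies = [
        "requests>=2.31",
    ]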
If I want to bootstrap from uv on Windows, the simplest option offered involves Powershell.
Either way, I can write quite a bit with just the standard library before I have to understand what uv really is (or what pip is). At that point, yes, the pip UX is quite a bit messier. But I already have Python, and pip itself was also trivially installable (e.g. via the standard library `ensurepip`, or from a Linux system package manager — yes, still using the command line, but this hypothetical is conditioned on being a Linux user).
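A sketch of that stdlib-only path on a Linux box that already ships Python (the package name is just an example):

    python3 -m venv .venv                      # standard-library virtual environment
    .venv/bin/python -m ensurepip --upgrade    # bootstrap pip from the bundled wheel (usually a no-op after venv)
    .venv/bin/python -m pip install requests   # installs into the venv only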
The aftertaste python leaves is lasting-disgusting.
Just recently: a build of a piece of software that itself wasn't written in python but that urgently needed a very particular version of it, with a whole bunch of dependencies that refused to play nice with Anaconda for some reason (which, in spite of the fact that it too is becoming less reliable, is probably still the better one). The solution? Temporarily move anaconda to a backup directory, remove the venv activation code from .bashrc and compile the project, then restore everything to the way it was before (which I need it to be because I have some other stuff on the stove that is built using python, because there isn't anything else).
And let's not go into bluetooth device support in python, anything involving networking that is a little bit off the beaten path and so on.
Please name a set of common packages that causes this problem reliably.
It currently includes two executables, but having it contain two executables and a bunch of .so libraries would be a fairly trivial change. It only gets messy when you want it to make use of system-provided versions of the libraries, rather than simply vendoring them all yourself.
There is not so much difference between shipping a statically linked binary, and a dynamically linked binary that brings its own shared object files.
But if they are equivalent, static linking has the benefit of simplicity: Why create and ship N files that load each other in fancy ways, when you can do 1 that doesn't have this complexity?
I think there's a few things going on here:
- If you're going to have a project that's obsessed with speed, you might as well use rust/c/c++/zig/etc to develop the project, otherwise you're always going to have python and the python ecosystem as a speed bottleneck. rust/c/c++/zig ecosystems generally care a lot about speed, so you can use a library and know that it's probably going to be fast.
- For example, the entire python ecosystem generally does not put much emphasis on startup time. I know there's been some recent work here on the interpreter itself, but even modules in the standard library will pre-compile regular expressions at import time, even if they're never used, like the "email" module.
- Because the python ecosystem doesn't generally optimize for speed (especially startup), the slowdowns end up being contagious. If you import a library that doesn't care about startup time, why should your library care about startup time? The same could maybe be said for memory usage.
- The bootstrapping problem is also mostly solved by using a compiled language like c/rust/go. If the package manager is written in python (or even node/javascript), you first have to have python+dependencies installed before you can install python and your dependencies. With uv, you copy/install a single binary file which can then install python + dependencies and automatically do the right thing (see the sketch after this list).
- I think it's possible to write a pretty fast implementation using python, but you'd need to "greenfield" it by rewriting all of the dependencies yourself so you can optimize startup time and bootstrapping.
- Also, as the article mentions there are _some_ improvements that have happened in the standards/PEPs that should eventually make their way into pip, though it probably won't be quite the gamechanger that uv is.
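A sketch of that bootstrap flow (the installer URL is the one Astral documents; adjust to taste):

    curl -LsSf https://astral.sh/uv/install.sh | sh   # single binary, no Python needed yet
    uv python install 3.12                            # uv fetches an interpreter itself
    uv sync                                           # then resolves and installs the project's dependencies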
You'd think PyPy would be more popular, then.
> even modules in the standard library will pre-compile regular expressions at import time, even if they're never used, like the "email" module.
Hmm, that is slower than I realized (although still just a fraction of typical module import time):
$ python -m timeit --setup 'import re' 're.compile("foo.*bar"); re.purge()'
10000 loops, best of 5: 26.5 usec per loop
$ python -m timeit --setup 'import sys' 'import re; del sys.modules["re"]'
500 loops, best of 5: 428 usec per loop
I agree the email module is atrocious in general, which specifically matters because it's used by pip for parsing "compiled" metadata (PKG-INFO in sdists, when present, and METADATA in wheels). The format is intended to look like email headers and be parseable that way; but the RFC mandates all kinds of things that are irrelevant to package metadata, and despite the streaming interface it's hard to actually parse only the things you really need to know.

> Because the python ecosystem doesn't generally optimize for speed (especially startup), the slowdowns end up being contagious. If you import a library that doesn't care about startup time, why should your library care about startup time? The same could maybe be said for memory usage.
I'm trying to fight this, by raising awareness and by choosing my dependencies carefully.
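A minimal illustration of the import-time vs. first-use trade-off (the pattern and names are made up):

    import functools
    import re

    # eager: every importer pays the compile cost, even if the pattern is never used
    GREETING = re.compile(r"hello,\s+\w+")

    # lazy: compiled on first call, cached afterwards
    @functools.cache
    def greeting_pattern() -> re.Pattern[str]:
        return re.compile(r"hello,\s+\w+")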
> you first have to have python+dependencies installed before you can install python and your dependencies
It's unusual that you actually need to install Python again after initially having "python+dependencies installed". And pip vendors all its own dependencies except for what's in the standard library. (Which is highly relevant to Debian getting away with the repackaging that it does.)
> I think it's possible to write a pretty fast implementation using python, but you'd need to "greenfield" it by rewriting all of the dependencies yourself so you can optimize startup time and bootstrapping.
This is my current main project btw. (No, I don't really care that uv already exists. I'll have to blog about why.)
> there are _some_ improvements that have happened in the standards/PEPs that should eventually make their way into pip
Most of them already have, along with other changes. The 2025 pip experience is, believe it or not, much better than the ~2018 pip experience, notwithstanding higher expectations for ecosystem complexity.
PyPy is hamstrung by a limited (previously, a lack of) compatibility with compiled Python modules. If it had been a drop-in replacement for the equivalent Python versions, then it'd probably have been much more popular
PyPy doesn't do anything to help startup time. In fact, it's typically a bit slower to start up than CPython.
You reap the speed benefits from PyPy once it's been running for a little while and it can JIT compile the hot bits of code.
Considerably slower on my machine. Yes, that was my point. If the community doesn't care about startup time, you'd expect more adoption of an implementation that sacrifices that startup time for later performance.
Hah. Yes sounds like we are very much on the same page here. Python stdlib could really use a simple generic email/http header parser.
> It's unusual that you actually need to install Python again after initially having "python+dependencies installed".
I’m thinking about 3rd party installers like poetry, pip-tools, pdm, etc, where your installer needs python+dependencies installed before it can start installing.
> “write a pretty fast implementation using python” This is my current main project btw. (No, I don't really care that uv already exists. I'll have to blog about why.)
Do you have anything public yet? I’m totally curious. I started doing this for flake8 and pip back in 2021/2022, but when ruff+uv came along I figured it wasn’t worth my time any more.
The repo is https://github.com/zahlman/paper but it's not really usable and it's missing a bunch of local very unfinished stuff (and my README template definitely needs fixing). More of a "watch this space" but I would really like to push out a Show HN for the first chunk of functionality soon.
But yeah. Python packaging has been dumb for decades and successive Python package managers recapitulated the same idiocies over and over. Anyone who had used both Python and a serious programming language knew it, the problem was getting anyone to do anything about it. I can't help thinking that maybe the main reason using Rust worked is that it forced anyone who wanted to contribute to it to experience what using a language with a non-awful package manager is like.
rustdoc has only slightly changed since the 2010s, it's still very hard to figure out generic/trait-oriented APIs, and it still only does API documentation in mostly the same basic 1:1 "list of items" style. Most projects end up with two totally disjointed sets of documentation, usually one somewhere on github pages and the rustdoc.
Rust is overall a good language, don't get me wrong. But it and the ecosystem also have a ton of issues (and that's without even mentioning async), and most of these have been sticking around since basically 1.0.
(However, the rules around initialization are just stupid and unsafe is no good. Rust also tends to favor a very allocation-heavy style of writing code, because avoiding allocations tends to be possible but often annoying and difficult in unique-to-rust ways. For somewhat related reasons, trivial things are at times really hard in Rust for no discernible reason. As a concrete, simplistic but also real-world example, Vec::push is an incredibly pessimistic method, but if you want to get around it, you either have to initialize the whole Vec, which is a complete waste of cycles, or you yolo it with reserve+set_len, which is invalid Rust because you didn't properly use MaybeUninit for locations which are only ever written.)
`Vec::spare_capacity_mut`[1] gives you a view into the unused capacity. There's nothing "invalid" about it.
[1]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.spa...
It seems like your complaint is more about "more APIs should accept possibly uninitialized data, but they don't." Which is fair, but I don't know how big of a deal it really is. There is for sure desire to make this work with `std::io::Read`, and indeed, there are unstable APIs for that[1].
[1]: https://doc.rust-lang.org/std/io/trait.Read.html#method.read...
I've had to tell people to stop syncing their cargo env around nfs so many times, but sometimes they have no choice.
Anything doing locks on nfs, including trying to use sqlite, is a mistake. This is not a cargo problem; this is an nfs problem.
Pun intended?
Jokes aside, what you describe is a common pattern. It's also why Google internally used to get decent speedups from rewriting some old C++ project in Go for a while: the magic was mostly in the rewrite-with-hindsight.
If you put effort into it, you can also get there via an incremental refactoring of an existing system. But the rewrite is probably easier to find motivation for, I guess.
Someone on this site said most tech problems are people problems - this feels like one.
Greenfield mostly solves the problem because it's all new people.
In my mind they’re pretty heavily linked. But that may be based on not experiencing the opposite. At least not as far as I can remember.
* or workflow tools as they're called here https://packaging.python.org/en/latest/guides/tool-recommend...
Travis Oliphant is a founder of Anaconda and one of the most important people behind NumPy, SciPy etc.
This feels like a very unfair take to me. Uv didn’t happen in isolation, and wasn’t the first alternative to pip. It’s built on a lot of hard work by the community to put the standards in place, through the PEP process, that make it possible.
What uv did was to bring it all together.
By definition, a greenfield project is literally free from constraints.
So the answer is in your question: Why did it take a team unbound by constraints to try something new, as compared to a project with millions of existing stakeholders?
Single vision. Smaller team. What they landed on is a hit (no guarantee of that in advance!)
Conversely, with so many stakeholders, getting everyone to rally around a change (in advance) is hard.
In my experience this is about human nature/organisation and spans all types of organisations, not just python or open source etc.
It also looks like python would have got there, given the foundations put in place as noted in the article.
From having sum-types to also having a reasonable packaging system itself.
It's the same way in JS land. You can make a game in a few kilobytes, but most web pages are still many megabytes for what should have been no JS at all.
That's TensorRT-LLM in its entirety at 1.2.0rc6, locked to run on Ubuntu or NixOS with full MPI and `nvshmem`, the DGX container Jensen's Desk edition (I know because I also rip apart and `autopatchelf` NGC containers for repackaging on Grace/SBSA).
It's... arduous. And the benefit is what exactly? A very mixed collection of maintainers have asserted that software behavior is monotonic along a single axis most of which they can't see and we ran a solver over those guesses?
I think the future is collections of wheels that have been through a process the consumer regards as credible.
A lot of that, in turn, boils down to realizing that it could be fast, and then expecting that and caring enough about it.
> but with the same design decisions (PEP 517/518/621/658 focus, HTTP range tricks, aggressive wheel-first strategy, ignoring obviously defensive upper bounds, etc.), I strongly suspect we'd be debating a 1.3× vs 1.5× speedup instead of a 10× headline
I'm doing a project of this sort (although I'm hoping not to reinvent the wheel (heh) for the actual resolution algorithm). I fully expect that some things will be barely improved or even slower, but many things will be nearly as fast as with uv.
For example, installing from cache (the focus for the first round) mainly relies on tools in the standard library that are written in C and have to make system calls and interact with the filesystem; Rust can't do a whole lot to improve on that. On the other hand, a new project can improve by storing unpacked files in the cache (like uv) instead of just the artifact (I'm storing both; pip stores the artifact, but with a msgpack header) and hard-linking them instead of copying them (so that the system calls do less I/O). It can also improve by actually making the cached data accessible without a network call (pip's cache is an HTTP cache; contacting PyPI tells it what the original download URL is for the file it downloaded, which is then hashed to determine its path).
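A sketch of the link-instead-of-copy idea (the function name is made up):

    import os
    import shutil

    def place(cached_file: str, target: str) -> None:
        # Hard-link an already-unpacked file from the cache into the environment;
        # fall back to a real copy when the cache lives on a different filesystem.
        try:
            os.link(cached_file, target)
        except OSError:  # e.g. EXDEV (cross-device) or an unsupported filesystem
            shutil.copy2(cached_file, target)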
For another example, pre-compiling bytecode can be parallelized; there's even already code in the standard library for it. Pip hasn't been taking advantage of that all this time, but to my understanding it will soon feature its own logic (like uv does) to assign files to compile to worker processes. But Rust can't really help with the actual logic being parallelized, because that, too, is written purely in C (at least for CPython), within the interpreter.
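The stdlib hook referred to above is `compileall`, which already knows how to fan work out to worker processes (the path is hypothetical; `workers=0` means one worker per CPU):

    import compileall

    compileall.compile_dir(
        "/path/to/venv/lib/python3.12/site-packages",
        workers=0,   # 0 = use os.cpu_count() worker processes
        quiet=1,
    )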
> why did it take a greenfield project to give Python the package manager behavior people clearly wanted for the last decade?
(Zeroth, pip has been doing HTTP range tricks, or at least trying, for quite a while. And the exact point of PEP 658 is to obsolete them. It just doesn't really work for sdists with the current level of metadata expressive power, as in other PEPs like 440 and 508. Which is why we have more PEPs in the pipeline trying to fix that, like 725. And discussions and summaries like https://pypackaging-native.github.io/.)
First, you have to write the standards. People in the community expect interoperability. PEP 518 exists specifically so that people could start working on alternatives to Setuptools as a build backend, and PEP 517 exists so that such alternatives could have the option of providing just the build backend functionality. (But the people making things like Poetry and Hatch had grander ideas anyway.)
But also, consider the alternative: the only other viable way would have been for pip to totally rip apart established code paths and possibly break compatibility. And, well, if you used and talked about Python at any point between 2006 and 2020, you should have the first-hand experience required to complete that thought.
Specifically regarding the "aggressive wheel-first strategy", I strongly encourage you to read the discussion on https://github.com/pypa/pip/issues/9140.
Tapping the Rust community is a decent reason to do a project in Rust.
With that said, Rust was a good language for this in my experience. Like any "interesting" thing, there was a moderate bit of language-nerd side quest thrown in, but overall, a good selection metric. I do think it's one of the best Rewrite it in X languages available today due to the availability of good developers with Rewrite in Rust project experience.
The Haskell commentary is curious to me. I've used Haskell professionally but never tried to hire for it. With that said, the other FP-heavy languages that were popular ~2010-2015 were absolutely horrible for this in my experience. I generally subscribe to a vague notion that "skill in a more esoteric programming language will usually indicate a combination of ability to learn/plasticity and interest in the trade," however, using this concept, I had really bad experiences hiring both Scala and Clojure engineers; there was _way_ too much academic interest in language concepts and way too little practical interest in doing work. YMMV :)
Alternately, if you have the sort of work or culture that taps into people's intrinsic motivation, why would that work worse with Haskell or Clojure programmers than anybody else?
People are interested in different things along different dimensions. The way somebody is motivated by what they're doing and the way somebody is motivated by how they're doing it really don't seem all that correlated to me.
> there was way too much academic interest in language concepts and way too little practical interest in doing work.
They are communicating something real, but perhaps misattributing the root cause.
The purely abstract ‘ideal’ form of software development is unconstrained by business requirements. In this abstraction, perfect software would be created to purely express an idea. Academia allows for this, and to a lesser extent some open source projects.
In the real world, the creation of software must always be subordinate to the goals of the business. The goals are the purpose, and the software is the means.
Languages that are academically interesting, unsurprisingly, attract a greater preponderance of academically minded individuals. Of these, only a percentage have the desire or ability to let go of the pure abstract, and instead focus on the business domain. So it inevitably creates a management challenge; not an insurmountable one, but a challenge.
Hence the simplified ‘these people won’t do the work!’.
The fastest-iterating engineers I've worked with often have a deep user focus rather than a language affiliation.
I think the cultural context has changed.
In "python paradox", 'knows python' is an indication that the developer is interested in something technically interesting but otherwise impractical. Hence, it's a 'paradox' that you end up practically better off by selecting for something impractical.
These days, Python is surely a practical choice, so doesn't really resemble the "interested in something technically interesting but impractical".
Sometimes I thought our teams would be a terrible fit for more cookie-cutter applications where rapid development and deployment was the primary objective. We got into the weeds all the time (sometimes because of rust itself), but it happened to be important to do so.
Had we built those projects with JavaScript or Python I suspect the outcomes would have been worse for reasons apart from the language choice.
But that’s precisely why it is good for developer tools. And it turns out people who write systems code are really damn good at writing tools code.
As someone who cut my teeth on C and low level systems stuff I really ought to learn Rust one of these days but Python is just so damn nice for high level stuff and all my embedded projects still seem to require C so here I am, rustless.
What I like about Rust is ADTs, pattern matching, execution speed. The things that really give me confidence are error handling (the right balance between the "you can't accidentally ignore errors" of checked exceptions and easy escape hatches for when you want to YOLO) and the rarity of "looks right, but is subtly wrong in dangerous ways" that I ran into a lot in dynamic languages and more footgun-prone languages.
Compile times suck.
It just doesn't come up in the web and devtools development worlds. Either you're dealing with user input, which is completely untrusted and has to be validated anyways, or you're passing around known validated data.
The closest is maybe ETL pipelines, but type checking can't help there either since your entire goal is to wrestle with horrors.
Type validation can help with some of that but at some point it becomes way easier to just use imperative validation for something like this. It turns out that validating things that are easy is easy no matter what you do, and validating complex rules that were written by people who think imperatively is almost impossible to do declaratively in a maintainable way.
Sure it's a little more verbose than bash one-liners, but if you need any kind of error handling and recovery, it's way more effective than bash and doesn't break when you switch platforms (i.e. mac/bsd utility incompatibilities with gnu utilities).
My only complaint would be that dealing with OsString is more difficult than necessary. Way too much of the stdlib encourages programmers to just assume "non-utf8 paths don't exist" and panic/ignore when encountering one. (Not a malady exclusive to rust, but I wish they'd gotten it right.)
Example I had handy: <https://gist.github.com/webstrand/945c738c5d60ffd7657845a654...>
For instance the popular `fd` utility can't actually see files containing malformed utf-8, so you can hide files from system administrators naively using those tools by just adding invalid utf-8.
touch $'example\xff.txt'
fd 'example.*txt' // not found
fd -F $'example\xff.txt' // fails non-utf8
The existing rust libraries for manipulating OsString push people towards ignorance or rejection of non-utf8 filenames and paths.

I genuinely can't understand why you suppose that has to do with the implementation language at all.
Languages that attract novice programmers (JS is an obvious one; PHP was one 20 years ago) have a higher noise to signal ratio than one that attracts intermediate and above programmers.
If you grabbed an average Assembly programmer today, and an average JavaScript programmer today, who do you think is more careful about programming? The one who needs to learn arcane shit to do basic things and then has to compile it in order to test it out, or the one who can open up Chrome's console and console.log("i love boobies")
How many embedded systems programmers suck vs full stack devs? I'm not saying full stack devs are inferior. I'm saying that more inferior coders are attracted to the latter because the barriers to entry are SO much easier to bypass.
The node supply chain attacks are also not unique to the node community; you see them happening on crates.io and many other places. In fact, the build-time scripts that cause issues in node modules are probably worse with the flexibility of crate build scripts, and they're going to be harder to work around than in npm.
uv doesn't exactly stop python package supply chain attacks...
Go: Enforces global, append-only integrity via a checksum database and version immutability; once a module version exists, its contents cannot be silently altered without detection, shifting attacks away from artifact substitution toward “publish a malicious new version” or bypass the proxy/sumdb.
Maven: Requires structured namespace ownership and signed artifacts, making identity more explicit at publish time; this raises the bar for casual impersonation but still fundamentally trusts that the key holder and build pipeline were not compromised.
Your average Go project likely has 10x fewer deps than a JS project. Those deps will not get auto-updated to their latest versions either. Much lower attack surface area.
If A causes other things besides B, then knowing about those other caused things tells us nothing about whether C happened, because we already know it did. "no further information" is elided to things that are relevant to the statement being made. Please apply basic charity in interpreting ideas expressed in prose; LWers who want to express something precisely in logical or mathematical notation are certainly not afraid to do so.
> Less wrong is a bunch of people who think they understand Bayes better than they do.
The objection you point out is not relevant to demonstrating an understanding of Bayes' Law. It's just a semantic quibble.
If you take a group of people who are squarely in the enterprise Java school of thought and have them write Rust, the language won't make much of a difference. They will eventually be influenced by the broader Rust community and the Rust philosophy towards programming, but, unless they're already interested in changed approaches, this will be a small, gradual difference. So you'll end up with Enterprise Java™ code, just in Rust.
But if you hire from the Rust community, you will get people who have a fundamentally different set of practices and expectations around programming. They will not only have a stronger grasp of Rust and Rust idioms but will also have explicit knowledge based on Rust (eg Rust-flavored design patterns and programming techniques) and, crucially, tacit knowledge based on Rust (Rust-flavored ways of programming that don't break down into easy-to-explain rules). And, roughly speaking, the same is going to be true for whatever other language you substitute for "Rust".
(I say roughly because there doesn't have to be a 1:1 relationship between programming languages, schools of thought and communities of practice. A single language can have totally different communities—just compare web Python vs data scientist Python—and some communities/schools can span multiple languages. But, as an over-simplified model, seeing a language as a community is not the worst starting point.)
"Rewrite" is important as "Rust".
I think Rust itself has this benefit
Helpfully, though, uv retains compatibility with newer (but still well-established) standards in the Python community that don't share this insanity!
> uv parses TOML and wheel metadata natively, only spawning Python when it hits a setup.py-only package that has no other option
The article implies that pip also prefers toml and wheel metadata, but has to shell out to parse those, unlike uv.
Not as far as I can tell, except perhaps in extended-support legacy environments (for example, ActiveState is still maintaining a Python 2.x distribution).
Isn't assigning out what all made things fast presumptuous without benchmarks? Yes, I imagine a lot is gained by the work of those PEPs. I'm more questioning how much weight is put on dropping of compatibility compared to the other items. There is also no coverage of decisions influenced by language choice, which likely influences "Optimizations that don’t need Rust".
This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.
We also have the benchmark of "pip now vs. pip years ago". That has to be controlled for pip version and Python version, but the former hasn't seen a lot of changes that are relevant for most cases, as far as I can tell.
> This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.
This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML. A package installer really only needs to look at it at all when building from source, and part of what these standards have done for us is improve wheel coverage. (Other relevant PEPs here include 600 and its predecessors.) Although that has also largely been driven by education within the community, things like e.g. https://blog.ganssle.io/articles/2021/10/setup-py-deprecated... and https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... .
I don't know the details of Python's resolution algorithm, but for Cargo (which is where epage is coming from) a lockfile (which is encoded in TOML) can be somewhat large-ish, maybe pushing 100 kilobytes (to the point where I'm curious if epage has benchmarked to see if lockfile parsing is noticeable in the flamegraph).
(not sure how uv does it, just guessing what can be done)
- synchronization operations are implicit so we need to re-resolve to confirm the lockfile is still valid. We could take some short cut but it would require re-implementing some logic
- dependency resolution only uses `Cargo.toml` for local and git dependencies. Registry dependencies have a json summary of what content is relevant for dependency resolution. Cargo parses nearly every locked package's `Cargo.toml` to know how to build it.
[fruit.apple]
[animal]
[fruit.orange]
So the only way to know that you have all the keys in a given table is to literally read the entire file. This is one of those unfortunate things in TOML that I would honestly ignore if I were writing my own TOML parser, even if it meant I wasn't "compliant".

- Tables can be in any order, independent of hierarchy
- keys can be dotted, creating subtables in any order
On top of that, most use cases for the format are not benefitted by streaming.
I mean, of course it wasn't specifically Rust that made it fast, it's really a banal statement: you need only very moderate serious programming experience to know, that rewriting legacy system from scratch can make it faster even if you rewrite it in a "slower" language. There have been C++ systems that became faster when rewritten in Python, for god's sake. That's what makes system a "legacy" system: it does a ton of things and nobody really knows what it does anymore.
But when listing things that made uv faster it really mentions some silly things, among others. Like, it doesn't parse pip.conf. Right, sure, the secret of uv's speed lies in not-parsing other package manager's config files. Great.
So all in all, yes, no doubt that hundreds of little things contributed to making uv faster, but listing a few dozen of them (surely a non-exhaustive list) doesn't really enable you to draw any conclusions about the relative importance of different improvements whatsoever. I suppose the mentioned talk[0] (even though it's more than a year old now) would serve as a better technical report.
> PEP 658 went live on PyPI in May 2023. uv launched in February 2024. uv could be fast because the ecosystem finally had the infrastructure to support it. A tool like uv couldn’t have shipped in 2020. The standards weren’t there yet.
In 2020 you could still have had a whole bunch of performance wins before the PEP 658 optimization. There's also the "HTTP range requests" optimization, which is the next best thing. (And the uv tool itself is really good with "uv run" and "uv python".)
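The range trick is conceptually tiny: a wheel is a zip, the central directory sits at the end, so you can ask the server for just the tail and dig METADATA's offset out of that instead of downloading the whole artifact. A sketch (the URL is hypothetical, and real code still has to parse the zip structures it fetched):

    import urllib.request

    url = "https://example.org/packages/example-1.0-py3-none-any.whl"  # hypothetical
    req = urllib.request.Request(url, headers={"Range": "bytes=-65536"})  # last 64 KiB only
    with urllib.request.urlopen(req) as resp:
        tail = resp.read()
        partial = resp.status == 206  # True if the server honored the range request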
> What uv drops: Virtual environments required. pip lets you install into system Python by default. uv inverts this, refusing to touch system Python without explicit flags. This removes a whole category of permission checks and safety code.
pip also refuses to touch system Python without explicit flags?
For uv, there are flags that allow it, so it doesn't really remove "a whole category of permission checks and safety code"? Doesn't uv then have its own "permission checks and safety code" to figure out whether it's looking at the system Python? I don't think uv has "dropped" anything here.
> Optimizations that don’t need Rust: Python-free resolution. pip needs Python running to do anything.
This seems to be implying that Python is inherently slow, so yes, this optimization requires a faster language? Or maybe I don't get the full point.
> Where Rust actually matters: No interpreter startup. ... uv is a single static binary with no runtime to initialize.
This one's pretty petty/pedantic, but "Rust technically has a very lightweight runtime." https://users.rust-lang.org/t/does-rust-have-a-runtime/11406...
> uv picks from the first index that has the package, stopping there. This prevents dependency confusion attacks and avoids extra network requests.
As long as the "first" index is e.g. your organization's internal one, that does ensure that some random thing on PyPI won't override it. A tool that checks every index first still has to have the right rule for choosing one.
It is, however, indeed a terrible point. I don't think I've even seen evidence that pip does anything different here. But it's the sort of problem best addressed in other ways.
By "syncopation" perhaps you mean "sycophancy"? I don't see how musical rhythms are relevant here.
Just commenting to preempt any comments telling you that the article doesn’t say this.
I blame fixed AI system prompts - they forcibly collapse all inputs into the same output space. Truly disappointing that OpenAI et al. have no desire to change this before everything on the internet sounds the same forever.
As you said, reading this stuff is taxing. What's more, this is a daily occurrence by now. If there's a silver lining, it's that the LLM smells are so obvious at the moment; I can close the tab as soon as I notice one.
Fairly easy, in my wife's experience. She repeatedly got accused of using ChatGPT in her original writing (she's not a native English speaker, and was taught to use many of the same idioms that LLMs use) until she started actually using ChatGPT with about two pages of instructions for tone to "humanize" her writing. The irony is staggering.
It’s really useful for taking my first drafts and cleaning them up ready for a final polish.
The problem isn’t the surface tics—em dashes, short exclamatory sentences, lists of three, “Not X: Y!”.
Those are symptoms of the deep, statistically-built tissue of LLM “understanding” of “how to write a technical blog post”.
If you randomize the surface choices you’re effectively running into the same problem Data did on Star Trek: The Next Generation when he tried to get the computer to give him a novel Sherlock Holmes mystery on the holodeck. The computer created a nonsense mishmash of characters, scenes, and plot points from stories in its data bank.
Good writing uses a common box of metaphorical & rhetorical tools in novel ways to communicate novel ideas. By design, LLMs are trying to avoid true (unpredictable) novelty! Thus they’ll inevitably use these tools to do the reverse of what an author should be attempting.
[1]: https://github.com/andrew/nesbitt.io/commit/0664881a524feac4...
Or, use the "deep research" mode for writing your prose instead. It's far less sloppy in how it writes.
These people are amateurs at humanizing their writing.
Same. I'm actually more tired of this AI witch hunt
I heard high school and college students are doing this routinely so their papers don't get flagged as AI. Whether they used an LLM for the whole assignment or wrote it themselves, it has to pass through a "re-humanizing" LLM either way, just to avoid drama.
pip is simply difficult to maintain. Backward-compatibility concerns surely contribute to that, but there are other factors too, like an older project having to satisfy the needs of modern times.
For example, my employer (Datadog) allowed me and two other engineers to improve various aspects of Python packaging for nearly an entire quarter. One of the items was to satisfy a few long-standing pip feature requests. I discovered that the cross-platform resolution feature I considered most important is basically incompatible [1] with the current code base. Maintainers would have to decide which path they prefer.
Backwards compatibility is the one thing that prevents the code in an older project from being replaced with a better approach in situ. It cannot be more difficult than a rewrite, except that rewrites (arguably including my project) may hold themselves free to skip hard legacy cases, at least initially (they might not be relevant by the time other code is ready).
(I would be interested in hearing from you about UX designs for cross-platform resolution, though. Are you just imagining passing command-line flags that describe the desired target environment? What's the use case exactly — just making a .pylock file? It's hard to imagine cross-platform installation....)
This (zero-copy deserialization) is not a Rust-specific technique, so I'm not entirely sure why the author describes it as one. Any good low-level language (C/C++ included) can do this, in my experience.
For example "No interpreter startup" is not specific to Rust either.
(But also, I think Rust can fairly claim that it's made zero-copy deserialization a lot easier and safer.)
Also, aside from memory bandwidth, there’s a latency cost inherent in traversing object graphs - 0 copy techniques ensure you traverse that graph minimally, just what’s needed to actually be accessed which is huge when you scale up. There’s a difference between one network request and fetching 1 MB vs making 100 requests to fetch 10kib and this difference also appears in memory access patterns unless they’re absorbed by your cache (not guaranteed for object graph traversal that a package manager would be doing).
(I'm agnostic on whether zero-copy "matters" in every single context. If there's no complexity cost, which is what Rust's abstractions often provide, then it doesn't really hurt.)
    import io, zipfile

    def read_member(archive_name, file_name):
        # read one member of the archive fully into memory
        with zipfile.ZipFile(archive_name) as a:
            with a.open(file_name) as f, io.BytesIO() as b:
                b.write(f.read())
                return b.getvalue()
(That does, of course, copy data around within memory, but.)
This is not what zero-copy means. Here's a working definition[1].
Specifically, it's not just about keeping things in memory; copying in memory is normal. The goal is to not make copies (or more precisely, what Rust would call "clones"), but to instead convey the original representation/views of that representation through the program's lifecycle where feasible.
> a deserialized version of the data necessarily cannot be the same object as the original data
rust-asn1 would be an example of a Rust library that doesn't make any copies of data unless you explicitly ask it to. When you load e.g. a Utf8String[2] in rust-asn1, you get a view into the original input buffer, not an intermediate owning object created from that buffer.
> (That does, of course, copy data around within memory, but.)
Yes, that's what makes it not zero-copy.
[1]: https://rkyv.org/zero-copy-deserialization.html
[2]: https://docs.rs/asn1/latest/asn1/struct.Utf8String.html
Yeah, so you'd have to pass around the `BytesIO` instead.
I know that zero-copy doesn't ordinarily mean what I described, but that seemed to be how TFA was using it, based on the logic in the rest of the sentence.
That wouldn’t be zero-copy either: BytesIO is an I/O abstraction over a buffer, so it intentionally masks the “lifetime” of the original buffer. In effect, reading from the BytesIO creates new copies of the underlying data by design, in new `bytes` objects.
(This is actually a great capsule example of why zero-copy design is difficult in Python: the Pythonic thing to do is to make lots of bytes/string/rich objects as you parse, each of which owns its data, which in turn means copies everywhere.)
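To make that concrete, here's a toy sketch of my own (not from the article) contrasting a copying slice with a view in Python:

    data = b"Name: example-package\n"
    start = data.index(b": ") + 2            # offset of the value

    # Slicing a bytes object copies the value into a brand-new object...
    value_copy = data[start:-1]

    # ...while slicing a memoryview just creates a window onto data's buffer.
    value_view = memoryview(data)[start:-1]

    assert value_copy == b"example-package"
    assert value_view == b"example-package"  # content comparison, still no copy
    assert value_view.obj is data            # backed by the original buffer

Of course, the moment you call bytes() or .decode() on the view you are copying again, which is exactly the "lots of rich objects" problem described above.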
I'm not convinced this is going to bottleneck things, though.
(On the flip side, I guess the OS is likely to cache any disk write in memory anyway.)
It’s ~impossible in Python (because you don’t control memory) and hard in C/similar (because of use-after-free).
Rust’s borrow checker makes it easier, but it’s still tricky (for non-trivial applications). You have to do all your transformations and data movements while only referencing the original data.
json = '{"user":"nugget"}' // from somewhere
A simple way to extract json["user"] to a new variable would be to copy the bytes. In pythony/c pseudo code let user = allocate_string(6 characters)
for i in range(0, 6)
user[i] = json["user"][i]
// user is now the string "nugget"
instead, a zero copy strategy would be to create a string pointer to the address of json offset by 9, and with a length of 6. {"user":"nugget"}
^ ]end
The reason this can be tricky in C is that when you call free(json), since user is a pointer to the same string that was json, you have effectively done free(user) as well.So if you use user after calling free(json), You have written a classic _memory safety_ bug called a "use after free" or UAF. Search around a bit for the insane number of use after free bugs there have been in popular software and the havoc they have wreaked.
In Rust, when you create a variable referencing the memory of another (user pointing into json), it keeps track of that as a "borrow" (which is what the borrow checker checks, if you have read about that) and won't compile if json is freed while you still have access to user. That's the main memory-safety issue involved in zero-copy deserialization techniques.
Indeed. As demonstrated by the fact that pip has been doing exactly the same for years.
Part of the reason things are improving is that "tries PEP 658 metadata first" is more likely to succeed, and at some point build tools may have become more aware of how pip expects the zip to be organized (see https://packaging.python.org/en/latest/specifications/binary...), and way more projects ship wheels (because the manylinux standard has improved, and because pure-Python devs have become aware of things like https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-...).
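For anyone unfamiliar with it, here is roughly what "tries PEP 658 metadata first" buys an installer, as a hedged sketch (placeholder URL, no error handling, and the fallback path omitted):

    import urllib.request

    # Under PEP 658 (and PEP 714), an index may serve a wheel's core metadata
    # at the wheel's own URL with ".metadata" appended.
    wheel_url = "https://example.invalid/example-1.0-py3-none-any.whl"  # placeholder
    with urllib.request.urlopen(wheel_url + ".metadata") as resp:
        metadata = resp.read().decode()

    # Dependencies are plain "Requires-Dist:" lines; no wheel download, no build step.
    deps = [line for line in metadata.splitlines() if line.startswith("Requires-Dist:")]
    print(deps)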
Edit to add: I use Python daily
uv has been a delight to use
I'd characterize that as unusable, for sure.
I do use it on CI/CD pipelines, but I wouldn't dare type uvx commands myself on a daily basis.
I'm a big fan of uv, but don't like that part of uvx.
(makes me wonder if a small wrapper can do this - safe uvx, or suvx for short)
It is also really nice when working interactively to have snappy tools that don't take you out of the flow more than absolutely necessary. But then I'm quite sensitive to this; I'm one of those people who turn off all GUI animations because they waste my time and make the system feel slow.
"That's the kind of performance that actually changes how you work. It's no longer doing the same thing faster, it's allowing you to work in a completely different manner. That is why performance matters and why you really should not look at anything but git. Hg (Mercurial) is pretty good, but git is better."
(in a talk he did at Google, of which I found the transcript here: https://gist.github.com/dukeofgaming/2150263)
Sometimes making something much faster turns it from something you try to avoid, maybe even unconsciously, to something you gladly make part of your workflow.
Since I started using uv I regularly create new venvs just for e.g. installing a package I'm not familiar with to try some things out and see if it fits my needs. With pip I would sometimes do that too, but not nearly as often because it would take too much time. Instead I would sometimes install the package in an existing venv, potentially polluting that project's dependencies. Or I use uvx to run tools that I would not consider using otherwise because of too much friction.
I was skeptical at first too. It's not until you start using uv and experience its speed and other useful features that you fully get why so many people switch from pip or poetry or whatever to uv.
But small efficiencies do matter; see e.g. https://danluu.com/productivity-velocity/.
E.g.: if you compare it on your own machine, that's a different thing than comparing it on a locked-down corporate machine.
I have clients whose Python setup is so bad that installing all the deps for a project takes... 18 minutes. Those are not crazy projects either. It's just the context that is bad. And you won't be able to change the context. But we are in talks to change the package manager to uv.
There are so many setups that are different from yours. If you are a professional trainer and you get a new group every week, having 12 people install their envs in a blink is a win. If you are a researcher and want to download the top 100 PyPI packages and attempt to install them, speed is bliss. If you are a blogger and try a lot of new stuff for an article, it's great. If you are working at repl.io and you get millions of venvs created every day, boy does that matter. If you are a sysadmin in charge of deploying Kubernetes pods, you might be looking at serious savings. Etc.
Speed affects many things:
- CI runs
- AI iterations
- docker builds
- isolated builds over multiple versions of Python
But it also unlocks some use cases.
E.g.:
- uvx is great only because uv is fast. Because uv calls are virtually instant, using uvx feels like magic.
- "uv run --with" exists only because overlaying a new venv on top of another is basically free. And it's a killer feature.
- You never explicitly create lock files in uv, because the operation is done transparently in the background. I can't recall the last time I ran uv sync, because uv run calls it automatically, and it's so fast you don't notice. So you just skip the middleman and go straight to coding.
I was a big proponent of "speed is not that important". Until I got speed.
And then I realized I had been missing a lot, because there are things you just can't do if you are slow.
(also switching python version is so fast)
A similar, but less drastic speedup applied to docker images.
This does not add much to your argument: if you're doing low-velocity development and manage to remember all the best practices that ought to be followed, then fine. But if you have to do CI/CD like I do, uv is a revelation. It just works out of the box, and fast.
That said, other than the solver, most of what uv does is always going to be IO-bound.
[1] https://pixi.sh/
How/why did the package maintainers start using all these improvements? Some of them sound like a bunch of work, and getting a package ecosystem to move is hard. Was there motivation to speed up installs across the ecosystem? If setup.py was working okay for folks, what incentivized them to start using pyproject.toml?
It wasn't working okay for many people, and many others haven't started using pyproject.toml.
For what I consider the most egregious example: Requests is one of the most popular libraries, under the PSF's official umbrella, which uses only Python code and thus doesn't even need to be "built" in a meaningful sense. It has a pyproject.toml file as of the last release. But that file isn't specifying the build setup following PEP 517/518/621 standards. That's supposed to appear in the next minor release, but they've only done patch releases this year and the relevant code is not at the head of the repo, even though it already caused problems for them this year. It's been more than a year and a half since the last minor release.
... Ah, I got confused for a bit. When I first noticed the `pyproject.toml` deficiency, it was because Requests was affected by the major Setuptools 72 backwards incompatibility. Then this year they were hit again by the major Setuptools 78 backwards incompatibility (which the Setuptools team consciously ignored in testing because Requests already publishes their own wheel, so this only affected the build-from-source purists like distro maintainers). See also my writeup https://lwn.net/Articles/1020576/ .
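For readers who haven't seen one, a minimal standards-based build declaration (illustrative values only, not Requests' actual metadata) looks roughly like this:

    [build-system]                    # PEP 517/518: how to build the project
    requires = ["setuptools>=61"]
    build-backend = "setuptools.build_meta"

    [project]                         # PEP 621: core metadata
    name = "example-package"
    version = "1.0.0"
    dependencies = ["some-dependency>=2"]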