Posted by j12y 13 hours ago
Looking back ten years to `left-pad`, are there more successful attacks now than ever? I would suspect so, and surely the value of a successful attack has also increased, so are we actually getting better as a broad community at detecting them before package release? It's a complex space, and commercial software houses should do better, but it seems that whilst there are some excellent commercial products (e.g. CI scan tools), generally accessible, idiot-friendly tooling is somewhat lacking for projects which start as hobby/amateur code but end up as a dependency in many other projects.
I've cross-posted my comment from the current SAP supply chain attack thread [0].
It's real. As of the beginning of April we'd had 7 in the past 12 months vs 9 in the two decades before that: https://www.jefftk.com/p/more-and-more-extensive-supply-chai...
I assume you're using hyperbole.
Some of us are very aware and concerned about the risk. But like Cassandra from Greek mythology, we see the coming disaster and feel powerless to stop it.
Business school. Ahaha.
The value has increased, and that is what drives all these attacks. Cryptocurrencies are to blame in particular, because they not only provided a way to launder the proceeds but are also a juicy target in themselves.
And what is stolen with today's malware? Cloud credentials. Either to use for illicit mining, which is on the decline, or to run extortion campaigns, which is made possible by cryptocurrencies. All too often it's North Korea or Iran running these campaigns.
NPM should have returned error codes when the author of left-pad attempted to remove all his data with the intention of leaving the service.
To quote Wikipedia:
> After Koçulu expressed his disappointment with npm, Inc.'s decision and stated that he no longer wished to be part of the platform, Schlueter [author of NPM] provided him with a command that would delete all 273 modules that he had registered.
I can't vouch for the number of attacks, but, since we are talking about Python, nothing has substantially changed since the time of `left-pad`. The same bad things that enabled supply chain attacks in Python ten years ago are in place today. However, there are more projects and they are more interconnected than before, so it's likely that there are either more supply chain attacks, or that they are more damaging, or both.
Here's my anecdotal experience with Python's packaging tools. For a while, I was maintaining a package to parse the libconfuse configuration language. It started as a Python 2.7 project, but at the time there was already some version of Python 3 available, so it was written in a way that was supposed to be future-proof.
I didn't need to change the code of the project in the last ten or so years, but roughly once a year something would break in setup.py, usually because PyPA decided to remove a thing that wasn't bothering anyone.
When Python 3.13 came out, like clockwork, setup.py broke. I rolled up my sleeves and removed the dependency on setuptools; instead, I wrote some Python code that generated a wheel from the project's sources. I didn't look up the specification of the RECORD file in the dist-info directory, and assumed that sha256().hexdigest() would generate the checksums in the desired format. And that's how I shipped my packages...
Some time later, the company added an AI reviewer to its repo, and it discovered that instead of hexdigest() output, the checksums have to be base64-encoded with the padding removed...
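For reference, this is roughly what the wheel spec wants in RECORD (a sketch; the helper name is mine):

    import base64
    import hashlib

    def record_hash(data: bytes) -> str:
        # RECORD entries are "sha256=" + urlsafe base64 of the raw digest,
        # with the trailing "=" padding stripped
        digest = hashlib.sha256(data).digest()
        return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

    # What I had been shipping instead (hex), which installed just fine:
    # "sha256=" + hashlib.sha256(data).hexdigest()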
Now, to the punchline: nobody cared. The incorrectly generated packages installed perfectly fine without warnings. Nobody checks the checksums.
What's more: nobody checks whether, during `pip install` or the fancier `uv pip install`, the packages are built locally (i.e. nobody cares that package installation can result in arbitrary code execution). It's not just common, it's almost universal to run `pip install` on production machines as a means of deploying a Python program. How do I know this? -- The company I work for ships its Python client as a... source package. Not intentionally. We are just lazy. But nobody cares.
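For what it's worth, pip can be told to refuse sdists and to insist on hashes, which at least closes the build-on-install execution path (assuming your requirements file actually pins hashes):

    pip install --only-binary :all: --require-hashes -r requirements.txt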
Maybe a Python culture problem; maybe a hallmark of Python's status as an "easy to hire for", manager-friendly, least-common-denominator blub language; maybe a risk that stems from the conveniences of interpreted languages... but this is such a shame in this day and age.
It's seriously not difficult to do better. And if this is what you're doing, you're also missing out on reproducible environments both in dev and in prod. At least autogenerate a Nix package! You still don't need to publish any artifacts, but you can at least have the thing build in a sandbox or yeet the whole closure over SSH.
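The SSH part really is a one-liner, something like this (`prod-host` being a placeholder for your machine):

    nix copy --to ssh://prod-host ./result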
It's also not that hard to get a Docker image out of a Python project.
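A minimal sketch is all of five lines (`yourpackage` is a placeholder for the real entry point):

    FROM python:3.12-slim
    WORKDIR /app
    COPY . .
    RUN pip install --no-cache-dir .
    CMD ["python", "-m", "yourpackage"]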
You only need one platform-minded person on the whole development team to make this happen.
What is going on???
I ran an open source project with tens of thousands of downloads (presumably all either developer machines or webservers, so even a small number is valuable) and never received a malicious pull request, offer of a bribe to install malware, or a phishing attempt with enough effort to even catch my attention.
What it says to me is that there weren't a lot of people working on the crime side of this. It's like dropping your wallet in a bar bathroom and coming back to find it still there.
If you're interested in synchronicity and frequency illusion, Sergei V. Chekanov wrote a book that sounds interesting: https://jwork.org/designed-world/
Have you ever experienced coincidences that cannot be logically explained? This book helps the readers understand the meaning of synchronicity, or remarkable coincidences in people's lives. This work not only explains the mystery of synchronicity, originally introduced by Carl Jung, but it also shows how to make simple calculations to estimate the chances that coincidences are not due to mere randomness.
[1] https://github.com/Lightning-AI/pytorch-lightning/issues/216...
[2] https://github.com/Lightning-AI/pytorch-lightning/issues/216...
[3] https://github.com/Lightning-AI/pytorch-lightning/issues/216...
[4] https://github.com/Lightning-AI/pytorch-lightning/issues/216...
[5] https://socket.dev/blog/lightning-pypi-package-compromised
An extreme example: nowadays, when I make interactive educational apps for my daughter, I just have Opus use plain JS and HTML; from double pendulums to fluid simulations, it works one-shot. Before, I had hundreds of dependencies.
Luckily, with MIT-licensed code I can just tell Opus to extract exactly the pieces I need and embed them, tweaked for my use case. So far it works great for hobby projects, but hopefully in the future production software will have no dependencies.
vs. the dependency breaking something, and now you're responsible for working around someone else's broken code.
Honestly, I've seen much more of the latter. Especially nowadays, with every single dependency thinking it's a fully fledged OS because an agent can add 1000 features/bugs in no time. Picking the right dependency, maintained by a sane maintainer, is like digging potatoes in a minefield.
I don't buy the notion of things breaking down over time, though. For "first-party" code that sticks to HTML and CSS standards, and Stage 4 / finished ECMAScript standards, the web is an absurdly stable platform.
It certainly used to be that we had to do all sorts of weird vendor hacks because nobody agreed on anything, supporting IE6 and 7 was a nightmare, and BlackBerry's browser was awful, but those days are largely behind us unless you're doing some cutting-edge, Chrome-only, early-days proposed stuff, a browser-specific extension, or something else that isn't a polished standard.
Even with timezone changes, you're better off using the system's information with Intl.DateTimeFormat.
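e.g. a one-liner like this formats against the host's tz database rather than a copy you bundled and have to keep updated:

    // uses the system/ICU timezone data, no dependency needed
    new Intl.DateTimeFormat("en-US", { timeZone: "Europe/Paris", timeStyle: "long" }).format(new Date());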
I think devs who didn't care back then also won't care in the future and will still be running around with a requirements.txt file in 10 years.
So if you just need to do something simple, like firing off a compute-heavy background task and then getting a result when it's done, you should probably just roll your own implementation on top of the threading API in your language. That'll probably be very stable. You don't need a massive background task orchestration framework.
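A stdlib-only sketch of what that can look like in Python (the names here are mine, not from any framework):

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=4)

    def run_in_background(fn, *args, on_done=None):
        # submit the compute-heavy task; hand the result to a callback when done
        future = _pool.submit(fn, *args)
        if on_done is not None:
            future.add_done_callback(lambda f: on_done(f.result()))
        return future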
People might object that the frameworks will handle edge cases that you've never thought of, but I've actually found in enterprise settings that the small custom implementations--if you actually keep it small and focused--can cover more of the edge cases. And the big frameworks often engineer their own brittle edge cases due to concerns that you just don't have.
So anyway, it isn't as simple as "dependencies are bad" or "dependencies are good"; every dependency has a cost/benefit analysis that needs to go along with it. And in an enterprise, I'd argue that if you audit the existing dependencies you will find way too many that should be removed or consolidated, because they were added for the speed of initial delivery and greenfielding. Eventually, when you accumulate way too many of those dependencies, the exposure to their supply chains, the need to keep them updated, the need to track CVEs in those deps, and the need to fix code to use updated versions of them, along with not having the direct ability to bugfix them, all combine to produce an ongoing tax of either continual maintenance or tech debt that will eventually bite you hard.
We seem to greatly overestimate the amount of code needed to do something.
For example, there are billions of lines of code involved between me pressing a key and you seeing what I wrote. But if we were to make a special program that communicates via IPv6 and ICMP, written for a Hazard3-based RP2350 with a W5500 Ethernet breakout, the whole thing, including the C compiler to compile your code (which could very well outperform gcc -O3), would be 5-6k lines of code, including RA, barebones SPI drivers, and even a small preemptive OS.
So, it is not unreasonable to manage all of those changes.
I'm going to go publish some MIT-licensed remote access code and get that into Opus's training data.
Cross-platform portability, easy cross-compilation, its very-few-dependencies/stdlib approach, its simplicity: I just really love golang.
I built[0] a cuckoo.org alternative at https://fossbox.cloud which has only one dependency, Gorilla WebSocket, aside from the stdlib.
If I were to rewrite it in rust, I couldn't say the same. Golang's stdlib is that good.
My point is, although I understand Rust has some advantages in other areas, the advantages of golang outweigh Rust's for me by a very high margin. There is also the factor that I just feel more comfortable reading and picking through golang code than Rust.
It is my opinion that you can go a much longer way with a garbage collector than people imagine, even on constrained systems. Unless it's absolutely necessary, avoiding a GC feels like premature optimization in many instances, and that's worth thinking about.
[0]: More like vibecoded(?), as this is just a single-file main.go which I prompted out of Gemini 3.1 Pro some time ago. It was just a prototype I made because I was using the cuckoo website with friends and it kept lagging, but it works surprisingly well.
Now, I think Go will come close to this number, so in reality there might not be a real difference. But a leak somewhere is far more likely, especially as these are mostly vibe-coded (my binary has multiple features).
The biggest advantage that Go has over Rust is the stdlib and an ecosystem that doesn't depend on 100 packages. Maybe that will be the deciding factor in the future, or someone (I'm getting increasingly itchy for it) will need to reinvent the ecosystem to be less like npm.
It should be feasible to design vulnerabilities which look benign individually in training data but which, when composed together in the agent plane and executed in a chain, introduce an exploit.
There’s nothing technical really stopping that from existing right now. It’s just that nobody has put the effort in yet.
Of course, most of us devs lie to ourselves, out of ego, that pulling in deps is /just/ a time-saving measure; but of course we know there are some incredibly high-quality libraries and frameworks that we don't have the skills or experience to replicate to the same level.
https://github.com/search?q=A%20Mini%20Shai-Hulud%20has%20Ap...
This malware isn't even trying. Then again it's Microsoft so they're not even trying either.
2. This is just one of the four techniques the worm uses to phone home.
> The attack steals credentials, authentication tokens, environment variables, and cloud secrets, while also attempting to poison GitHub repositories.
If I remember correctly from Shai-Hulud 2, the attacker exfiltrated creds by posting them in public GitHub repos with minor, easily reversible encoding. I believe it was double b64 last time.
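Which, if that's right, means reversing it is a two-liner (a hypothetical illustration, not the worm's actual code):

    import base64

    blob = base64.b64encode(base64.b64encode(b"AWS_SECRET_ACCESS_KEY=..."))  # what gets posted
    secrets = base64.b64decode(base64.b64decode(blob))                       # trivially undone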
I'm assuming the logic there is that every security researcher and company is going to pull and scan those creds for their stuff and their clients' stuff. So the attacker is just 1 of N people downloading it. As opposed to trying to send it to their own machine directly.
If they have a clue, the attacker still will not download that without using a botnet tunnel or Tor at a minimum.
Note, though, that these credentials aren't even encrypted with some lightweight ECC to prevent others from capturing them; they're posted in cleartext. Embarrassment might be part of the point.
The public repo path is just one of four parallel paths, with the goal of getting around any barriers:
From their website [1]:
> The exfiltration component shares its design with the "Mini Shai-Hulud" mechanism from their last campaign, using four parallel channels so stolen data gets out even if individual paths are blocked.
> Python does not publish official distributable binaries. As such, uv uses distributions from the Astral python-build-standalone project. See the Python distributions documentation for more details.
It points to this GitHub repo https://github.com/astral-sh/python-build-standalone which mentions this other link https://gregoryszorc.com/docs/python-build-standalone/main/r...
If I understand correctly, the source code for building Python is not fetched directly from python.org. Not so sure how secure that is.
I have the same concern for asdf [2]. However, it uses pyenv [3], which, I think, feels more official.
Can someone clarify this? Which tool is better/more secure for installing python: uv or asdf?
[1] https://docs.astral.sh/uv/guides/install-python/
[2] https://github.com/asdf-community/asdf-python
[3] https://github.com/pyenv/pyenv/tree/master/plugins/python-bu...
python-build-standalone fetches CPython sources directly from python.org[1]. I don't even know where else we would get them from!
[1]: https://github.com/astral-sh/python-build-standalone/blob/a2...
I'm worried about, say, `mdformat` (a widely used formatter mostly maintained by one person in their spare time), not to mention some super-specific dependency that hasn't been updated in years and is 3 levels deep in your dep tree. I really don't want to pin & manually approve every single update for an app that's under active development, but it's beginning to look like that's mandatory for any serious app.
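If it does come to that, pip-tools at least makes the pinning mechanical (uv has an equivalent in `uv pip compile`):

    pip-compile --generate-hashes requirements.in
    pip install --require-hashes -r requirements.txt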
In the meantime, I've gotta go get my API keys out of my unencrypted `.env` files! Getting burned on a large, consumer-facing webapp would be embarrassing but logical; losing hundreds to thousands of dollars because some indirect dependency of some silly one-off demo repo happens to be on the same host and system as my `.env`s... oof.
Anyone know if OAI or Anthropic will refund you if you get your keys stolen like this? Or is it user error?
For example, at that time, one way to distribute machine learning models was via Python pickles, which are executable objects with no restrictions built in. Models in this format could do anything on a computer where the model was imported. Such an early 'wild-west' ecosystem can definitely make security compromises easier and the resulting supply chain attacks more common.
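The classic demonstration, for anyone who hasn't seen it, is that unpickling simply is code execution:

    import os
    import pickle

    class Payload:
        def __reduce__(self):
            # whatever __reduce__ returns gets *called* at load time
            return (os.system, ("echo arbitrary code ran at unpickle time",))

    blob = pickle.dumps(Payload())
    pickle.loads(blob)  # "loading the model" runs the command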
You say you rely on CC to suggest software to install from the internet, and then you install it.
I haven't heard anyone suggest CC or any LLM as a "filter" for "is this package safe right now", and it seems like a very bad heuristic to me, not least for the reason you gave.
- I ask the LLM for multiple options
- I tell it what I need and what I don't need
- I then look at the packages it has suggested. Sometimes LLMs suggest unmaintained packages with 5 downloads a month just because they came up at the top of a web search.
- If it's not a very well-known project, I look at the code; I have received vibe-coded dependency suggestions before that don't even function.
LLMs are useful resources for "getting the pulse of the ecosystem", but just pressing enter is crazy.
1. Packj (https://github.com/ossillate-inc/packj) detects malicious PyPI/NPM/Ruby/PHP/etc. dependencies using behavioral analysis. It uses static+dynamic code analysis to scan for indicators of compromise (e.g., spawning of shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks for several metadata attributes to detect bad actors (e.g., typo squatting).
One of these days Anthropic is going to be compromised and we’re all gonna be f*cked.
"deependujha hi @thebaptiste, thanks for inquiring. Release of 2.6.2 is blocked due to some internal reasons. Will notify once release is made. "
I'd hate it if they knew of the problem that long ago and didn't warn until now. If someone has more info and can clarify I'd be thankful.
https://github.com/Lightning-AI/pytorch-lightning/issues/216...