Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library

Posted by j12y 13 hours ago

Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library(semgrep.dev)

356 points | 123 comments

wlkr 11 hours ago|

This might just be the frequency illusion at play, but there seem to have been a number of high-profile supply chain attacks of late in major packages. There are several articles on the first few pages of HN right now with different cases.

Looking back ten years to `left-pad`, are there more successful attacks now than ever? I would suspect so, and surely the value of a successful attack has also increased, so are we actually getting better as a broad community at detecting them before package release? It's a complex space, and commercial software houses should do better, but it seems that whilst there are some excellent commercial products (e.g. CI scan tools), generally accessible, idiot friendly tooling is somewhat lacking for projects which start as hobby/amateur code but end up being a dependency in many other projects.

I've cross-posted my comment from the current SAP supply chain attack thread [0].

[0]: https://news.ycombinator.com/item?id=47964003

jefftk 3 hours ago||

>This might just be the frequency illusion at play, but there seem to have been a number of high-profile supply chain attacks of late in major packages.

It's real. As of the beginning of April we'd had 7 in the past 12 months vs 9 in the two decades before that: https://www.jefftk.com/p/more-and-more-extensive-supply-chai...

godelski 1 hour ago||

I think the real question is "are we just hearing about it more now or has the actual rate of attack increased?"

nullsanity 39 minutes ago||

[dead]

JohnMakin 10 hours ago|||

People are ramming tons of code into places without ever looking at it, it would follow that supply chain attacks would increase thusly.

eddythompson80 10 hours ago|||

Yeah, and ultimately no body cares. Everyone assumes it’s just some process miss, and we need to add another step to the process and move on. Fuck ups that would have killed the credibility of projects 10 years ago are now treated as “eeh what are you gonna do. Sometimes you ship malware. Will look into it”

CoastalCoder 6 hours ago|||

> Yeah, and ultimately no body cares.

I assume you're using hyperbole.

Some of us are very aware and concerned about the risk. But like Cassandra from Greek mythology, we see the coming disaster and feel powerless to stop it.

Bengalilol 10 hours ago||||

Good old « release first, fix later »

fennecbutt 5 hours ago||||

It's not that nobody cares. It's just that nobody who does care has the money or the power to change it.

Business school. Ahaha.

fjdjshsh 4 hours ago|||

Are you talking about open source or commercial products? I can't speak for the pytorch lighting case, but I wouldn't be surprised if the maintainers didn't get any $ from it. They would be sad if the credibility of the package suffers, but ultimately it wouldn't make a big difference to them

mschuster91 9 hours ago|||

> Looking back ten years to `left-pad`, are there more successful attacks now than ever? I would suspect so, and surely the value of a successful attack has also increased, so are we actually getting better as a broad community at detecting them before package release?

The value has increased, and that is what drives all these attacks. Cryptocurrencies are to blame in particular because they not just provided a way for money laundering the proceeds but also a juicy target in itself.

And what is stolen with today's malware? Cloud credentials. Either to use for illicit mining, which is on the decline, or to run extortion campaigns, which is made possible by cryptocurrencies. All too often it's North Korea or Iran running these campaigns.

zarzavat 2 hours ago|||

FWIW left-pad was not an attack, it was a bug in NPM. It should not be possible to unpublish package versions that are depended on by other published packages. On the other hand, it should be possible to unpublish certain package versions that are new and not depended on.

NPM should have returned error codes when the author of left-pad attempted to remove all his data with the intention of leaving the service.

To quote Wikipedia:

> After Koçulu expressed his disappointment with npm, Inc.'s decision and stated that he no longer wished to be part of the platform, Schlueter [author of NPM] provided him with a command that would delete all 273 modules that he had registered.

crabbone 9 hours ago|||

> Looking back ten years to `left-pad`, are there more successful attacks now than ever?

I can't vouch for the number of attacks, but, and since we are talking about Python, nothing substantially changed since the time of `left-pad`. The same bad things that enabled supply chain attacks in Python ten years ago are in place today. However, it looks like there are more projects and they are more interconnected than before, so, it's likely that there are either more supply chain attacks, or that they are more damaging, or both.

Here's my anecdotal experience with Python's packaging tools. For a while, I was maintaining a package to parse libconfuse configuration language. It started as a Python 2.7 project, but at the time there was already some version of Python 3 available, so, it was written in a way that was supposed to be future-proof.

I didn't need to change the code of the project in the last ten or so years, but roughly once a year something would break in the setup.py. Usually, because PyPA decided to remove a thing that didn't bother anyone.

When Python 3.13 came out, as clockwork, setup.py broke. I rolled up my sleeves and removed the dependency on setuptools, instead, I wrote some Python code that generated a wheel from the project's sources. I didn't look up the specification of the RECORD file in dist-info directory, and assumed that sha256().hexdigest() will generate the checksums in the desired format. And that's how I shipped my packages...

Some time later, the company added an AI reviewer to the company's repo and it discovered that instead of hexdigest() the checksums have to be base64-encoded and then padding removed...

Now, to the punchline: nobody cared. The incorrectly generated packages installed perfectly fine without warnings. Nobody checks the checksums.

More so: nobody checks that during `pip install` or the more fancy `uv pip install` the packages aren't built locally (i.e. nobody cares that package installation will result in arbitrary code execution). It's not just common, it's almost universal to run `pip install` on production machines as a means of deploying a Python program. How do I know this? -- The company I work for ships its Python client as a... source package. Not intentionally. We are just lazy. But nobody cares.

zelphirkalt 7 hours ago|||

It's probably the same people, who think that merely having a requirements.txt stating packages with versions or even without that (2010 sends its regards) is fine. Open a random open source Python project on GitHub, and chances are you will see this kind of thing. Stands to reason, that people in companies are not acting much different.

pxc 4 hours ago||||

> It's not just common, it's almost universal to run `pip install` on production machines as a means of deploying a Python program.

Maybe a Python culture problem; maybe a hallmark of Python's status as an "easy to hire for", manager-friendly, least common denominator blub language; maybe a risk that stems from the conveniences of interpreter languages... but this is such a shame in this day and age.

It's seriously not difficult to do better. And if this is what you're doing, you're also missing out on reproducible environments both in dev and in prod. At least autogenerate a Nix package! You still don't need to publish any artifacts, but you can at least have the thing build in a sandbox or yeet the whole closure over SSH.

It's also not that hard to get a Docker image out of a Python project.

You only need one platform-minded person on the whole development team to make this happen.

What is going on???

jrumbut 4 hours ago||||

As scary as it is right now, it warms your heart a little bit that this system existed for 30 years and is only now reaching a crisis point.

I ran an open source project with tens of thousands of downloads (presumably all either developer machines or webservers, so even a small number is valuable) and never received a malicious pull request, offer of a bribe to install malware, or a phishing attempt with enough effort to even catch my attention.

What it says to me is that there weren't a lot of people working on the crime side of this. It's like dropping your wallet in a bar bathroom and coming back to find it still there.

hulahoof 6 hours ago|||

left-pad was an npm issue

sieabahlpark 4 hours ago|||

[dead]

cachius 7 hours ago||

No need to invoke frequency illusion when every moderate HN lurker already stopped counting. https://socket.dev/blog gives a good impression, but a dedicated article would be nice. Maybe recurring once or twice a year.

If you're interested in synchronicity and frequency illusion, Sergei v. Chekanov wrote a book that sounds interesting https://jwork.org/designed-world/

Have you ever experienced coincidences that cannot be logically explained? This book helps the readers understand the meaning of synchronicity, or remarkable coincidences in people's lives. This work not only explains the mystery of synchronicity, originally introduced by Carl Jung, but it also shows how to make simple calculations to estimate the chances that coincidences are not due to mere randomness.

RandyOrion 2 hours ago||

One thing that makes me wonder is that there are 4 security issues raised and all of them were automatically commented and closed by some bot called `pl-ghost` [1][2][3][4]. In the end, only this one [4] properly handled, and all bot comments are deleted. You can see the bot comments in another report [5], which is more informative than the OP one.

[1] https://github.com/Lightning-AI/pytorch-lightning/issues/216...

[2] https://github.com/Lightning-AI/pytorch-lightning/issues/216...

[3] https://github.com/Lightning-AI/pytorch-lightning/issues/216...

[4] https://github.com/Lightning-AI/pytorch-lightning/issues/216...

[5] https://socket.dev/blog/lightning-pypi-package-compromised

jackdoe 11 hours ago||

I cant wait to have no dependencies.

An extreme example is now when I make interactive educational apps for my daughter, I just make Opus use plain js and html; from double pendulums to fluid simulations, works one shot. Before I had hundreds of dependencies.

Luckily with MIT licensed code I can just tell Opus to extract exactly the pieces I need and embed them, and tweaked for my usecase. So far works great for hobby projects, but hopefully in the future productions software will have no dependencies.

mandevil 10 hours ago||

The problem with this is now you are solely responsible for managing all of the changes, all of the variation of life. Chrome changed the shape of this API, you are responsible for finding it and updating it. Morocco changed when their daylight savings took effect, now you need to update your date/time handling code. There are a lot of these things that we take for granted because our libraries handle it for us, and with no dependencies you have to do all the work. Not a big deal for making a double-pendulum simulator for your daughter to play with that will stop mattering next week, but is a concern for a company which is trying to build something that can run indefinitely into the future.

Aperocky 9 hours ago|||

> you are responsible for finding it and updating it.

vs the dependency broke something and now you're responsible for working around someone else's broken code.

Honestly, I've seen much more of the latter. Especially nowadays with every single dependency thinking they are an fully fledged OS because an agent can add 1000 feature/bug in no time. Picking the right dependency maintaining by a sane maintainer is like digging potatoes in a minefield.

zdragnar 10 hours ago||||

As a general principle, I agree with you that large companies and teams benefit from common runtimes (i.e. libraries and frameworks).

I don't buy the notion of things breaking down over time, though. For "first-party" code that sticks to HTML and CSS standards, and Stage 4 / finished ecmascript standards, the web is an absurdly stable platform.

It certainly used to be that we had to do all sorts of weird vendor hacks because nobody agreed on anything and supporting IE6 and 7 were nightmares, and blackberry's browser was awful, but those days are largely behind us unless you're doing some cutting-edge chrome-only early days proposed stuff or a browser specific extension or something else that isn't a polished standard.

Even with timezone changes, you're better off using the system's information with Intl.DateTimeFormat.

skydhash 9 hours ago||

I don’t know where the fear of breaking changes in deps comes from, but most good projects tries to keep their API stable. Even with fast-evolving platforms like Android and iOS sdk.

awakeasleep 8 hours ago||

It comes from trying to use Python apps you found on GitHub before uv tool install was a thing

zelphirkalt 7 hours ago||

In the Python ecosystem making software with reproducibility in mind was a thing before the advent of uv. Some earlier options include Pipenv and Poetry. I used Pipenv already some 6y ago to achieve that and later switched to Poetry.

I think devs who didn't care back then also won't care in the future and will still run around with requirements.txt file in 10 years.

dualvariable 10 hours ago||||

In companies, though, you often wind up with three+ massive dependency trees in your software to handle the same problem because people went and added the new hotness without deprecating the old stuff. You also find dependencies that are much heavier than necessary for the actual task at hand because the software developer was also solving the problem of needing that dependency on their resume. And then there's just the relatively tiny dependencies for fairly solved problems, like leftpad, which don't really require deps, and you can accept the maintenance burden, because not everything is an abstraction layer over chrome.

So if you just need to do something simple like fire off a compute heavy background task and then get a result when it is done, you should probably just roll your own implementation on top of the threading API in your language. That'll probably be very stable. You don't need a massive background task orchestration framework.

People might object that the frameworks will handle edge cases that you've never thought of, but I've actually found in enterprise settings that the small custom implementations--if you actually keep it small and focused--can cover more of the edge cases. And the big frameworks often engineer their own brittle edge cases due to concerns that you just don't have.

So anyway, it isn't as simple as "dependencies are bad" or "dependencies are good", but every dependency has a cost/benefit analysis that needs to go along with it. And in an Enterprise, I'd argue that if you audit the existing dependencies you will find way too many of them that should be removed or consolidated because they were done for the speed of initial delivery and greenfielding. Eventually when you accumulate way too many of those dependencies the exposure to the supply chains, the need to keep them updated, the need to track CVEs in those deps, and the need to fix code to use updated versions of those dependencies, along with not have the direct ability to bugfix them, all combine to produce an ongoing tax of either continual maintenance or tech debt that will eventually bite you hard.

RALaBarge 3 hours ago||||

I think we are stuck with LLMs. They are already in a place where they can find these issues in the first place. They can access RSS feeds. You could cron an agent to look to see if you are pwned as frequently as you want at literally almost zero cost. When you do ingest the libraries, keep a list and of what version and that can help as well.

jackdoe 7 hours ago|||

> The problem with this is now you are solely responsible for managing all of the changes

We seem to greatly overestimate the amount of code needed to do something.

For example, there are billions of lines of code from me pressing a key, to you seeing what I wrote. But if we were to make a special program that communicates via ipv6 and icmp, and it is written for hazard3 pico2350 with wiz5500 ethernet breakout, the whole thing including the c compiler to compile your code (which could very well outperform gcc -O3) will be 5-6k lines of code, including RA, and even barebones spi drivers, and a small preemptive os.

So, it is not unreasonable to manage all of those changes.

solid_fuel 10 hours ago|||

And of course, you will go over every line of code that Opus produces with the same scrutiny we expect of open source maintainers, right? Right?

I'm going to go publish some MIT-licensed remote access code and get that into Opus's training data.

Aperocky 11 hours ago|||

I am torn because I like rust over go, and rust is better from an LLM perspective. But the dependency philosophy on rust is basically a security blackhole whereas go is much better.

kblissett 11 hours ago|||

I have found Go is an amazing language for LLMs. What do you prefer about Rust?

Aperocky 10 hours ago||

A portion of context and vibe protection that are required is exported to the compiler. In addition rust binaries are generally smaller both in terms of size and footprint.

Imustaskforhelp 10 hours ago||

I sort of agree with you but for me, I prefer golang because I believe that for most use cases, Golang fits perfectly (I run a 500mb 7$/yr vps with debian and use golang binaries)

Cross portability and compilation and its very few dependency/stdlib approach with simplicity, I just really love golang.

I had built[0] a cuckoo.org alternative at https://fossbox.cloud which has only one dependency of gorilla web sockets aside from stdlib

If I were to rewrite it in rust, I couldn't say the same. Golang's stdlib is that good.

My point is, although I understand Rust can have some advantages in other areas, the advantages of golang outweigh rust for me by a very high margin. There is also the factor that I just feel more comfortable reading golang code and picking through it than rust.

It is my opinion that you can go a very very long way with a garbage collector than people imagine even on constrained systems. Unless absolutely necessary, thinking about GC feels like it might be a premature optimization in many instances which is worth thinking about.

[0]: More like (vibecoded?) as this is just a single file main.go which I had prompted on gemini 3.1 pro sometime ago. It was just a prototype which works surprisingly well that I had made because I was using the cuckoo website with friends but it kept on lagging.

Aperocky 9 hours ago||

Well I almost have the same story, my agent harness is a 5mb rust binary that runs as systemd service and occupy 10mb of memory after days. This handles all communciations between 100+ agents.

Now I think go will come close to this number, so in reality, there might not be a real difference. But a leak somewhere is far more likely especially as these are mostly vibe coded (my binary has multiple functionality).

The biggest advantage that go have over rust is the stdlib and ecosystem that doesn't depend on 100 packages. And maybe that will be the deciding factor in the future or someone (I'm getting increasingly itchy for it) will need to reinvent the ecosystem to be less like npm.

mamcx 10 hours ago|||

Vendoring don't basically copy what go does?

Aperocky 9 hours ago||

You can trust a single big stdlib more than the 100 dependency that tokio pulls at any given time.

RALaBarge 3 hours ago||

Yeah then you can version lock changes to one thing post-evaluation vs or even easier as noted above, download the stdlib and host it yourself.

v4nderstruck 10 hours ago|||

well surely Opus would never introduce vulnerabilities into the code so that sounds like the solution.

2ndorderthought 9 hours ago||

So true. Whenever I run opus I absolutely do not look at the code at all. That's for luddites.

OtherShrezzing 10 hours ago|||

I think in the relatively near future we’re going to start seeing sophisticated supply chain attacks into language model training data.

It should be feasible to design vulnerabilities which look benign individually in training data, but when composed together in the agent plane & executed in a chain introduce an exploit.

There’s nothing technical really stopping that from existing right now. It’s just that nobody has put the effort in yet.

lacunary 9 hours ago||

The develop-test-refine feedback loop for this kind of attack is so long (or expensive) that it seems likely to limit its real world use. Poison training data, wait months? a year? for the model to come out, see how well it worked, refine... or do you see a faster way to iterate?

sieabahlpark 4 hours ago||

[dead]

contingencies 2 hours ago|||

Love it :) Excellent quippy summary of the zeitgeist. Added to https://github.com/globalcitizen/taoup

gib444 10 hours ago||

Your LLM isn't a dependency?

stronglikedan 9 hours ago||

It's a tool for building things. I can build those things equally well with or without it, maybe saving some time with it (arguable)), but I'm not dependent on it.

gib444 35 minutes ago||

No, I'd posit the average developer who pulls in hundreds of deps but now uses LLMs to effectively replace them can not build things equally well without either.

Of course most devs lie to ourselves because of our ego that pulling in deps is /just/ a time-saving measure, but of course we know there are some incredibly high quality libraries and frameworks that we don't have the skills or experience to replicate to the same level

mkeeter 11 hours ago||

A repository search shows 2.2K repos with the text "A Mini Shai-Hulud has Appeared", all created within the past day:

https://github.com/search?q=A%20Mini%20Shai-Hulud%20has%20Ap...

rhdunn 11 hours ago||

The repository names all look like two terms/words from dune (harkonen, mentat, ornithoptor, etc.) followed by a number. This would indicate that the account (possibly GitHub auth/actions token) has been compromised and then used to create the repository.

avaer 7 hours ago|||

Why can't GitHub get on the case and just block any repo where the README matches the regex? I thought they'd have learned their lesson the last time it happened.

This malware isn't even trying. Then again it's Microsoft so they're not even trying either.

eddythompson80 7 hours ago|||

6 minutes later an HN submission "GitHub blocks your account if you mention X in the README" with a top comment "This is absurd, are they just doing regex matching to check for malware?"

bbor 52 minutes ago||||

1. This happened less than 24 hours ago.

2. This is just one of the four techniques the worm uses to phone home.

sgskinner 4 hours ago|||

“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”

i_think_so 3 hours ago||

[dead]

spate141 11 hours ago||

what's this all about?

foo12bar 11 hours ago|||

FTFA

> The attack steals credentials, authentication tokens, environment variables, and cloud secrets, while also attempting to poison GitHub repositories.

CodeAndCuffs 11 hours ago||

That doesn't really explain why there is a bunch of GitHub repos created as well.

If I remember correctly from Shai-Hulud 2, the attacker extricated creds by posting them in public github repos with minor easily reversible encryption. I believe it was double b64 last time.

I'm assuming the logic there is that every security researcher and company is going to pull and scan those creds for their stuff and their clients' stuff. So the attacker is just 1 of N people downloading it. As opposed to trying to send it to their own machine directly.

arsome 11 hours ago|||

I think it's more about convenience and bypassing filters - developers are already logged in to github, already have access to create repos and publish code, firewalls will allow it. Even fancy HIDS systems will think the git push is rather normal.

If they have a clue, the attacker still will not download that without using a botnet tunnel or Tor at a minimum.

Note though that these credentials aren't even encrypted using some lightweight ECC to prevent others from capturing them, they're posted in cleartext. Embarassment might be part of the point.

bbor 49 minutes ago||||

With HN ettiquette in mind, I must make an exception: this is a case where skimming the first parts of the article would help a lot!

The public repo path is just one of four parallel paths, with the goal of getting around any barriers:

  The exfiltration component shares its design with the "Mini Shai-Hulud" mechanism from their last campaign, using four parallel channels so stolen data gets out even if individual paths are blocked.

i_think_so 3 hours ago|||

[dead]

progbits 11 hours ago|||

Malware uploading the credentials it managed to steal

auraham 10 hours ago||

This week I was wondering whether using uv for managing Python versions is a good idea.

From their website [1]

> Python does not publish official distributable binaries. As such, uv uses distributions from the Astral python-build-standalone project. See the Python distributions documentation for more details.

It points to this GitHub repo https://github.com/astral-sh/python-build-standalone which mentions this other link https://gregoryszorc.com/docs/python-build-standalone/main/r...

If I understand correctly, the source code for building Python is not fetched directly from python.org. Not so sure how secure is that.

I have the same concern for asdf [2]. However, they use pyenv [3] which, I think, feels more official.

Can someone clarify this? Which tool is better/more secure for installing python: uv or asdf?

[1] https://docs.astral.sh/uv/guides/install-python/

[2] https://github.com/asdf-community/asdf-python

[3] https://github.com/pyenv/pyenv/tree/master/plugins/python-bu...

woodruffw 9 hours ago||

> If I understand correctly, the source code for building Python is not fetched directly from python.org. Not so sure how secure is that.

python-build-standalone fetches CPython sources directly from python.org[1]. I don't even know where else we would get them from!

[1]: https://github.com/astral-sh/python-build-standalone/blob/a2...

auraham 9 hours ago||

Thanks for pointing that out.

bbor 42 minutes ago|||

I'm really not worried about `uv` and `cpython` -- their processes are robust, their response times fast, and (now) their funding significant

I'm worried about, say, `mdformat` (a widely used formatter mostly maintained by one person in their spare time), not to mention some super-specific dependency that hasn't been updated in years and is 3 levels deep in your dep tree. I really don't want to pin & manually approve every single update for an app that's under active development, but it's beginning to look like that's mandatory for any serious app.

In the meantime, I've gotta go get my API keys out of my unencrypted `.env` files! Getting burned on a large, consumer-facing webapp would be embarrasing but logical, but losing hundreds to thousands of dollars because of some indirect dependency of some silly one-off demo repo that just happens to be on the same host & system as my `.env`s... oof.

Anyone know if OAI or Anthropic will refund you if you get your keys stolen like this? Or is it user error?

throawayonthe 9 hours ago||

i mean... uv is already a binary you run on your computer to manage python binaries, packages (and any binaries with those), systemwide tools etc; how much does it change whether they build the python binaries or someone else?

auraham 9 hours ago||

Both uv and asdf can be compiled from source. I prefer that way.

mixedbit 9 hours ago||

When I was doing Fast.AI Deep Learning course, I was surprised by the number of Python dependencies machine learning projects bring. Web front-end projects were always considered very third-party dependencies heavy, but to me, the machine learning ecosystem looks much more entangled. In addition, unlike web development, which is considered security critical and has over the many years accumulated a lot of wisdom and good security-related practices, machine learning development looks much more ad-hoc, with many common software engineering practices not applied.

For example, at that time, one way to distribute machine learning models was via Python pickles. Which are executable objects with no restriction built in. Models in this format could do anything on a computer where the model was imported. Such an early 'wild-west' ecosystem can definitely make security compromises easier and resulting supply chain attacks more common.

zelphirkalt 7 hours ago|

There are many people in that ecosystem, who are not primarily software engineers. Some just learned some coding along the way. Some are mathematicians. Some are devs who are AI drunk or something. Some have the mindset of "code doesn't matter any longer, if it works it works". For many proper dependency management is just a chore, that they don't want to care about. These things come together in various ML projects, even though ML projects should be amongst the projects most focused on reproducibility.

nrengan 9 hours ago||

Most of my pip installs come from Claude Code suggesting them now and me just hitting enter. Model was trained months ago, so it has no clue what got compromised this week. We built the worst possible filter for "is this package safe right now".

moritzwarhier 8 hours ago||

What filter?

You say you rely on CC to suggest software to install from the internet, and then you install it.

I haven't heard anyone suggest CC or any LLM as a "filter" for "is this package safe right now", and it seems like a very bad heuristic to me, not only, but also for the reason you gave.

nrengan 8 hours ago||

Well, people weren't checking CVEs before pip install before CC either, CC just scaled the habit to a larger audience at a faster cadence. The blast radius for day-zero compromises is what changed.

throwatdem12311 5 hours ago|||

Stop blaming the LLM for your laziness and lack of due diligence.

zarzavat 1 hour ago||

Indeed, I also use LLMs to suggest dependencies but:

- I ask the LLM for multiple options

- I tell it what I need and what I don't need

- I then look at the packages it has suggested. Sometimes LLMs suggest unmaintained packages with 5 downloads a month just because it came at the top of a web search.

- if it's not a very well known project, I look at the code, I have received vibecoded dependency suggestions before that don't even function

LLMs are useful resources for "getting the pulse of the ecosystem", but just pressing enter is crazy.

throwatdem12311 59 minutes ago||

exactly

nulltrace 6 hours ago|||

Stale training data is part of it. But even a current model can't tell what setup.py is going to run on your box. Nothing actually inspects the package before it executes. You'd want something that pulls the metadata and checks what hooks are in there before anything runs.

ashishbijlani 5 hours ago||

Built Packj [1] to do exactly this.

1. Packj (https://github.com/ossillate-inc/packj) detects malicious PyPI/NPM/Ruby/PHP/etc. dependencies using behavioral analysis. It uses static+dynamic code analysis to scan for indicators of compromise (e.g., spawning of shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks for several metadata attributes to detect bad actors (e.g., typo squatting).

BrenBarn 9 hours ago||

By "the worst possible filter" do you mean "hitting enter when claude tells you to"?

throwawayqqq11 9 hours ago||

"Sandbox this project before you make no mistakes."

throwatdem12311 5 hours ago||

Claude Code updates almost every day, sometimes multiple times.

One of these days Anthropic is going to be compromised and we’re all gonna be f*cked.

woodson 3 hours ago|

Not if one is running it in a non-privileged vm/container with restricted network access. But everything is YOLO these days.

gcapu 10 hours ago||

On GitHub, I saw this message from April 20, and I’m a bit confused.

"deependujha hi @thebaptiste, thanks for inquiring. Release of 2.6.2 is blocked due to some internal reasons. Will notify once release is made. "

I'd hate it if they knew of the problem that long ago and didn't warn until now. If someone has more info and can clarify I'd be thankful.

https://github.com/Lightning-AI/pytorch-lightning/issues/216...

andymcsherry 7 hours ago||

Andy from Lightning here. The malicious packages were published today at 12:45 PM UTC to PyPi. Before that, there were no affected distributions, and we were unaware of any leak. The original release on Github did not contain the issue, but we have taken it down to prevent any confusion.

mil22 10 hours ago||

For those using uv: https://docs.astral.sh/uv/reference/settings/#exclude-newer

gcapu 10 hours ago||

I appreciate the tip, but your response has nothing to do with my question

achandra03 12 hours ago|

Bless the Maker and His water.

More comments...