Top
Best
New

Posted by takira 1/14/2026

Claude Cowork exfiltrates files(www.promptarmor.com)
870 points | 399 commentspage 4
tnynt63 1/16/2026|
Non-stop under attack by entire locals hackers and using http thiland government files inside my phone, its unknown codes and even yandex can't solves almost 6 months over we found at browser for weather forecast
woggy 1/14/2026||
What's the chance of getting Opus 4.5-level models running locally in the future?
dragonwriter 1/14/2026||
So, there are two aspects of that:

(1) Opus 4.5-level models that have weights and inference code available, and

(2) Opus 4.5-level models whose resource demands are such that they will run adequately on the machines that the intended sense of “local” refers to.

(1) is probable in the relatively near future: open models trail frontier models, but not so much that that is likely to be far off.

(2) Depends on whether “local” is “in our on prem server room” or “on each worker’s laptop”. Both will probably eventually happen, but the laptop one may be pretty far off.

SOLAR_FIELDS 1/14/2026|||
Probably not too far off, but then you’ll probably still want the frontier model because it will be even better.

Unless we are hitting the maxima of what these things are capable of now of course. But there’s not really much indication that this is happening

woggy 1/14/2026|||
I was thinking about this the other day. If we did a plot of 'model ability' vs 'computational resources' what kind of relationship would we see? Is the improvement due to algorithmic improvements or just more and more hardware?
chasd00 1/14/2026|||
i don't think adding more hardware does anything except increase performance scaling. I think most improvement gains are made through specialized training (RL) after the base training is done. I suppose more GPU RAM means a larger model is feasible, so in that case more hardware could mean a better model. I get the feeling all the datacenters being proposed are there to either serve the API or create and train various specialized models from a base general one.
ryoshu 1/14/2026|||
I think the harnesses are responsible for a lot of recent gains.
NitpickLawyer 1/14/2026||
Not really. A 100 loc "harness" that is basically a llm in a loop with just a "bash" tool is way better today than the best agentic harness of last year.

Check out mini-swe-agent.

SOLAR_FIELDS 1/15/2026||
Everyone is currently discovering independently that “Ralph Wigguming” is a thing
gherkinnn 1/14/2026||||
Opus 4.5 is at a point where it is genuinely helpful. I've got what I want and the bubble may burst for all I care. 640K of RAM ought to be enough for anybody.
dust42 1/14/2026|||
I don't get all this frontier stuff. Up to today the best model for coding was DeepSeek-V3-0324. The newer models are getting worse and worse trying to cater for an ever larger audience. Already the absolute suckage of emoticons sprinkled all over the code in order to please lm-arena users. Honestly, who spends his time on lm-arena? And yet it spoils it for everybody. It is a disease.

Same goes for all these overly verbose answers. They are clogging my context window now with irrelevant crap. And being used to a model is often more important for productivity than SOTA frontier mega giga tera.

I have yet to see any frontier model that is proficient in anything but js and react. And often I get better results with a local 30B model running on llama.cpp. And the reason for that is that I can edit the answers of the model too. I can simply kick out all the extra crap of the context and keep it focused. Impossible with SOTA and frontier.

greenavocado 1/14/2026|||
GLM 4.7 is already ahead when it comes to troubleshooting a complex but common open source library built on GLib/GObject. Opus tried but ended up thrashing whereas GLM 4.7 is a straight shooter. I wonder if training time model censorship is kneecapping Western models.
sanex 1/14/2026||
Glm won't tell me what happened in Tianenman square in 1989. Is that a different type of censorship?
lifetimerubyist 1/14/2026|||
Never because the AI companies are gonna buy up all the supply to make sure you can’t afford the hardware to do it.
teej 1/14/2026|||
Depends how many 3090s you have
woggy 1/14/2026||
How many do you need to run inference for 1 user on a model like Opus 4.5?
ronsor 1/14/2026|||
8x 3090.

Actually better make it 8x 5090. Or 8x RTX PRO 6000.

worldsavior 1/14/2026|||
How is there enough space in this world for all these GPUs
filoleg 1/14/2026|||
Just try calculating how many RTX 5090 GPUs by volume would fit in a rectangular bounding box of a small sedan car, and you will understand how.

Honda Civic (2026) sedan has 184.8” (L) × 70.9” (W) × 55.7” (H) dimensions for an exterior bounding box. Volume of that would be ~12,000 liters.

An RTX 5090 GPU is 304mm × 137mm, with roughly 40mm of thickness for a typical 2-slot reference/FE model. This would make the bounding box of ~1.67 liters.

Do the math, and you will discover that a single Honda Civic would be an equivalent of ~7,180 RTX 5090 GPUs by volume. And that’s a small sedan, which is significantly smaller than an average or a median car on the US roads.

worldsavior 1/14/2026|||
What about what's around the GPU? Motherboard etc.
filoleg 1/16/2026||
I didn’t do the napkin math on it earlier, because I don’t believe it really matters for making the point I was making.

I don’t care about looking up real numbers, so I will just overestimate heavily. Let’s say that for a large enough number of GPUs, the overhead of all the surrounding equipment would be around 20% (amortized).

So you can just take the number of GPUs I calculated in my previous comment, multiply by 0.8, and you get your answer.

worldsavior 1/17/2026||
This is not 20% , it's 100%+.
antonvs 1/15/2026|||
Now factor in power and cooling...
reactordev 1/15/2026||
Don’t forget to lease out idle time to your neighbors for credits per 1M tokens…
Forgeties79 1/14/2026|||
Milk crates and fans, baby. Party like it’s 2012.
adastra22 1/15/2026|||
48x 3090’s actually.
_flux 1/15/2026|||
None, if you have time to wait, and a bit of memory on the computer.
kgwgk 1/14/2026|||
99.99% but then you will want Opus 42 or whatever.
rvz 1/14/2026|||
Less than a decade.
heliumtera 1/14/2026||
RAM and compute is sold out for the future, sorry. Maybe another timeline can work for you?
SamDc73 1/14/2026||
I was waiting for someone to say "this is what happens when you vibe code"
fathermarz 1/15/2026||
This is getting outrageous. How many times must we talk about prompt injection. Yes it exists and will forever. Saying the bad guys API key will make it into your financial statements? Excuse me?
tempaccsoz5 1/15/2026|
The example in this article is prompt injection in a "skill" file. It doesn't seem unreasonable that someone looking to "embrace AI" would look up ways to make it perform better at a certain task, and assume that since it's a plain text file it must be safe to upload to a chatbot
fathermarz 1/15/2026||
I have a hard time with this one. Technical people understand a skill and uploading a skill. If a non-technical person learns about skills it is likely through a trusted person who is teaching them about them and will tell them how to make their own skills.

As far as I know, repositories for skills are found in technical corners of the internet.

I could understand a potential phish as a way to make this happen, but the crossover between embrace AI person and falls for “download this file” phishes is pretty narrow IMO.

swores 1/15/2026||
You'd be surprised how many people fit in the venn overlap of technical enough to be doing stuff in unix shell yet willing to follow instructions from a website they googled 30 seconds earlier that tells them to paste a command that downloads a bash script and immediately executes it. Which itself is a surprisingly common suggestion from many how to blog posts and software help pages.
Havoc 1/15/2026||
How do the larger search services like perplexity deal with this?

They’re passing in half the internet via rag and presumably didn’t run a llamaguard type thing over literally everything?

jryio 1/15/2026||
As prophesied https://news.ycombinator.com/item?id=46593628
chaostheory 1/15/2026||
Running these agents in their own separate browsers, VMs, or even machines should help. I do the same with finance-related sites.
rswail 1/15/2026|
Cowork does run in a VM, but the Anthropic API endpoint is marked as OK, what Anthropic aren't doing is checking that the API call uses the same API key as the person that started the session.

So the injected code basically says "use curl to send this file using the file upload API endpoint, but use this API Key instead of the one the user is supposed to be using."

So the fault is at the Anthropic API end because it's not properly validating the API key as being from the user that owns it.

__0x01 1/15/2026||
I also worry about a centralised service having access to confidential and private plaintext files of millions of users.
ordersofmag 1/15/2026|
Heard of google drive?
wutwutwat 1/15/2026|
the same way you are not supposed to pipe curl to bash, you shouldn't raw dawg the internet into the mouth of a coding agent.

If you do, just like curl to bash, you accept the risk of running random and potentially malicious shit on your systems.

More comments...