Potential session/cache leakage between workspace instances or consumer accounts

Posted by chatmasta 5 hours ago

Potential session/cache leakage between workspace instances or consumer accounts (github.com)

208 points | 96 comments | page 2

dchest 3 hours ago|

Could this be malware? Something like https://news.ycombinator.com/item?id=48667495

acepl 5 hours ago||

Oh yes, we do not need programmers any more…

kylehotchkiss 5 hours ago||

50% unemployment :D

JohnMakin 3 hours ago|||

it’s the wet dream of execs and pm types. however, i have not seen anything close to it in my life. I remember the UML days, lol. the issue is not the code, it’s the translation layer between business and code. maybe someday ai bridges that gap. history has shown probably not

emehex 5 hours ago||

"Coding is largely solved"

supriyo-biswas 4 hours ago|||

The funny thing is at my current employer, they mentioned that "coding is increasingly becoming a solved problem" and in the same breath, mentioned that one project was too hard for anyone to do so they're not doing it and would rather sell existing features...

throwatdem12311 2 hours ago||

Weird. Coding isn’t really “solved” - because “coding” isn’t just the process of typing in characters as fast as possible - BUT the skill floor has been massively lowered while also raising the skill ceiling considerably.

We’re doing projects now that seemed impossible before because we have access to these powerful AI models. They can make things that would have taken weeks or months take days now, freeing up time for even more ambitious buildouts we never would’ve even considered before.

consp 5 hours ago||||

While abused by LLM vendors, that phrase in one form or another I've been hearing since the early '00s and it's likely way older.

ethagnawl 4 hours ago||

Sure but have you ever seen it actually play out in practice like it currently is? Whether or not it's true (of course it's not) people are currently behaving as if it is and firing/hiring accordingly.

philipov 4 hours ago||

Well, when was the last time you wrote machine code by hand?

... but then they went and changed what coding meant.

We've always been layering abstractions on top of abstractions. If we get to an abstraction that works well enough that you no longer have to dive down into the previous layer, we say we've solved coding, and change what coding means. Obviously LLMs aren't there yet.

techpression 5 hours ago||||

I love that quote, especially considering the insane amount of bugs that are produced. It’s as easy to debunk as someone claiming “I can jump to the moon”.

CamperBob2 3 hours ago|||

"This thing isn't 100% perfect, contrary to what absolutely no one anywhere said at any time"

solenoid0937 2 hours ago||

> one tool call result that includes a string that printed a pathname including minecraft.py

This seems like a hallucination.

jstummbillig 4 hours ago||

Is there anything particular about LLMs that would make separating customer data harder than in all SaaS cases?

bri3d 3 hours ago||

Yes:

* There's an enormous amount of very expensive shared state (context cache) which you do not want to duplicate when you can avoid it.

* Memory locality is crucially important for performance.

* Hardware is extremely over-subscribed.

* Hardware is extremely expensive.

These factors all make hardware or even traditional memory-space (hypervisor/VM/hardware assisted virtualization) isolation a non-starter for most workloads and customers, which forces all isolation to the software layer. This already makes things way harder than they are in commodity SaaS.

Moving beyond that, the tools, frameworks, and hardware which the system runs on (GPU) weren't designed for task isolation, and building this isolation is even more of an emergent research field than it is in x86 CPU hardware-sharing (which has required a huge amount of effort over the past 30+ years to get where we are today).

And, the ratio of usage/sensitivity to maturity is also just poor overall; these are young companies with rapid development and enormous delivery pressure under incredible customer workload requirements, too.

I can't tell if the original post is a real issue or not, but I'm surprised there aren't more like this overall; the whole thing really is a house of cards in this sense.

jstummbillig 2 hours ago||

> which forces all isolation to the software layer. This already makes things way harder than they are in commodity SaaS.

Is this not what happens in most SaaS? Isolation at the software layer? I understand there are special agreements, but they seem to be mostly that – no?

> the ratio of usage/sensitivity to maturity is also just poor overall; these are young companies with rapid development and enormous delivery pressure under incredible customer workload requirements, too.

Mh. The talent density in these companies is apparently quite exceptional. Things like customer data separation are obvious and top of mind. I don't see why they would not hire the best to implement these relatively boring/solved things correctly at an architectural level.

bri3d 2 hours ago||

> Is this not what happens in most SaaS?

I think it's fairly popular to try to do more logical isolation in SaaS now, especially with VM-scheduling-as-a-service becoming more popular. For example, I did security architecture at a company that did relatively simple financial processing; we worked to move to a model where customer documents were encrypted using a tenant key which we'd then wrap in both a service key and a login key; users could only get the login key stapled to their session by authenticating against that account, and the processing jobs ran on a cloud vendor's logical isolation. So the user needed a login key, the service needed the attested service key, and the job ran in what amounted to a mini-VM, avoiding issues like "whoops we sent the wrong document ID and the backend gave it back to us" or "whoops, we routed the request to the wrong tenant backend!" This level of isolation would be really hard to achieve in an LLM vendor context.

> I don't see why they would not hire the best to implement these relatively boring/solved things correctly at an architectural level.

I think a lot of these things develop over time; obviously hiring people who have done them before helps, but it's hard. Even the people with strong experience often only know little slices. And unfortunately, every system operating at these scales has emergent behavior which can become really challenging at scale; mistakes like "we used hash(id) as a key in a memory cache without a collision list, and it collided" which would simply never affect most startups become more and more frequent at scale. High rate of change makes it hard to suss these mistakes out and root-cause them, too; "a customer gave us a log where we swapped X and Y" is hard to bisect when you're doing 500 code deploys a day.
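A minimal sketch of the "hash(id) as a key without a collision list" mistake mentioned above. Everything here is hypothetical and for illustration only: `slot_for`, `NUM_SLOTS`, and both cache classes are made-up names, and the slot count is kept tiny so a collision is easy to see. The safer variant stores the full key and treats a mismatch as a miss; that is a simpler alternative to a real collision list (chaining), not the same thing.

```python
from typing import Dict, Optional, Tuple

NUM_SLOTS = 8  # assumption: deliberately tiny so collisions are easy to trigger


def slot_for(key: int) -> int:
    """Hypothetical slot function: maps a key to one of NUM_SLOTS slots."""
    return key % NUM_SLOTS


class UncheckedCache:
    """Buggy: keeps only the value per slot, so colliding keys alias each other."""

    def __init__(self) -> None:
        self._slots: Dict[int, str] = {}

    def put(self, key: int, value: str) -> None:
        self._slots[slot_for(key)] = value

    def get(self, key: int) -> Optional[str]:
        # No check that the stored value belongs to `key`.
        return self._slots.get(slot_for(key))


class CheckedCache:
    """Safer: stores the full key next to the value and verifies it on read."""

    def __init__(self) -> None:
        self._slots: Dict[int, Tuple[int, str]] = {}

    def put(self, key: int, value: str) -> None:
        self._slots[slot_for(key)] = (key, value)

    def get(self, key: int) -> Optional[str]:
        entry = self._slots.get(slot_for(key))
        if entry is None or entry[0] != key:
            return None  # absent or collision: report a miss, never another key's data
        return entry[1]


# Keys 3 and 11 land in the same slot (3 % 8 == 11 % 8).
unchecked = UncheckedCache()
unchecked.put(3, "tenant-3-doc")
leaked = unchecked.get(11)       # returns tenant 3's value for tenant 11

checked = CheckedCache()
checked.put(3, "tenant-3-doc")
not_leaked = checked.get(11)     # None: the key mismatch is detected
```

With a realistic slot count the collision is rare rather than immediate, which is why this class of bug tends to surface only at high request volume.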

adam_arthur 4 hours ago|||

Vibe-coding the implementation.

I haven't had much issue with Codex, but seems Claude Code has major issues being reported nearly on the daily.

They also happen to be the most boastful about not reading or looking at the code.

LLMs are very capable, but not nearly to the level they seem to be messaging.

(We've actually moved on from vibe-coding to having the LLM vibe code itself in a loop)

27183 4 hours ago|||

> having the LLM vibe code itself in a loop

The businesslatin name for this is Recursive Self-Improvement

rabbidruster 4 hours ago|||

Interestingly I had an almost identical experience to this report in codex. It output a user memory file that looked awfully real and wasn't at all related to my work.

27183 4 hours ago|||

If I had to hazard a guess, doing anything in a multi-tenant way on a GPU is going to be hard mode compared to most SaaS due to lack of memory safe tooling. I've built multi-tenant SaaS systems, and I've done a little GPU programming (a long time ago), but I've never tried to combine the two disciplines.

woadwarrior01 4 hours ago||

It'd be terribly compute inefficient to not share prefix caches (KV cache) across customers.

acepl 4 hours ago||

What is the probability that two customers will have exactly the same tokens in cache? Wouldn't it require using the exact same CLAUDE.md, skills, MCPs and context? After that it gets even worse, given the nondeterminism of LLMs and humans.

27183 4 hours ago|||

I suspect what GP is getting at is there will be a strong incentive to implement some structural sharing across tenants to avoid redundantly storing the same tokens over and over. At least I'd be tempted to do this if I was working with a very precious, constrained resource (e.g. VRAM). Doing this correctly seems... very difficult. [edit] To answer your question directly: the probability that the entire cache is identical between two different users is very low, but the probability that there exist identical chunks of cache between two different users is very high. Exploiting those commonalities successfully will significantly compress the data.
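A rough sketch of the structural sharing described above, under stated assumptions: tokens are split into fixed-size blocks, and each block's key is a hash chained over everything before it, so two requests share a block only when their whole prefix up to that block is identical. The chaining matters because in a causal transformer the cached state for a token depends on all earlier tokens. `BLOCK_SIZE`, `block_keys`, and `BlockStore` are hypothetical names; this is not a description of how any particular vendor does it.

```python
import hashlib
from typing import Dict, List, Sequence

BLOCK_SIZE = 4  # assumption: tokens per block, chosen small for illustration


def block_keys(tokens: Sequence[int]) -> List[str]:
    """Return one key per full block; each key covers the entire prefix up to that block."""
    keys: List[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for start in range(0, full, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        payload = parent + "|" + ",".join(str(t) for t in block)
        parent = hashlib.sha256(payload.encode()).hexdigest()
        keys.append(parent)
    return keys


class BlockStore:
    """Toy shared store: counts how many blocks were reused versus computed."""

    def __init__(self) -> None:
        self.blocks: Dict[str, str] = {}  # key -> placeholder for cached state
        self.hits = 0
        self.misses = 0

    def admit(self, tokens: Sequence[int]) -> List[str]:
        keys = block_keys(tokens)
        for key in keys:
            if key in self.blocks:
                self.hits += 1
            else:
                self.misses += 1
                self.blocks[key] = "computed"
        return keys


# Two users share an 8-token system prompt, then diverge.
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
store = BlockStore()
keys_a = store.admit(system_prompt + [100, 101, 102, 103])
keys_b = store.admit(system_prompt + [200, 201, 202, 203])
# The first two blocks are shared; the third differs, so only 4 blocks are stored in total.
```

The isolation risk sits in the key: if the key ever under-describes the prefix (a truncated hash, a missing tenant or model identifier), two different prefixes can resolve to the same stored block.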

weitendorf 1 hour ago||

Agree with this and I have been thinking about it recently as well. I think you could implement a cord-like vocabulary to identify large duplicated substrings for exact deduplication and pairwise correlations or vocabulary profiles/small classifiers for forward-looking or speculative deduplications. A clear example is the GPL license: it’s a large substring you might encounter often, and it is highly likely to be accompanied by lots of C code.

This is probably something you’d be doing on the CPU before sending anything to the GPU, though the GPU is definitely the sensitive surface since it’s hardware without good multitenancy. I assume the interface between the CPU and GPU is where you would be most likely to make a mistake where you start decoding data from one fd that was meant for another, or from the wrong position, and get someone else’s data.

I wouldn’t be confident that these are active exploits from deliberately abusing kv cache optimizations though, possibly just the kind of bugs you get from active low level performance tuning/systems work. Since this is something I have seen across providers lately I personally suspect it to be a driver issue.

dezgeg 3 hours ago||||

System prompt for something like Claude Code should be identical, no?

cmrdporcupine 1 hour ago|||

Could just be a bug in the radix tree for the KVCache with deeper, wrong, levels of the trie returning for the same initial prefix match.
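For reference, a minimal sketch of what a correct longest-prefix lookup looks like. It uses a plain token trie rather than a compressed radix tree, and `Node`, `PrefixTrie`, and `cache_id` are hypothetical names. The property that matters is that the walk stops at the first mismatching token and only reports state attached to nodes actually on the query's path; the bug hypothesised above would amount to returning a deeper node's state after only the initial prefix matched.

```python
from typing import Dict, Optional, Sequence, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}
        self.cache_id: Optional[str] = None  # handle to cached state ending at this node


class PrefixTrie:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, tokens: Sequence[int], cache_id: str) -> None:
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())
        node.cache_id = cache_id

    def longest_match(self, tokens: Sequence[int]) -> Tuple[int, Optional[str]]:
        """Return (matched_length, cache_id) for the deepest cached prefix of `tokens`."""
        node = self.root
        best: Tuple[int, Optional[str]] = (0, None)
        for depth, tok in enumerate(tokens, start=1):
            child = node.children.get(tok)
            if child is None:
                break  # first mismatch: stop, never descend into unrelated branches
            node = child
            if node.cache_id is not None:
                best = (depth, node.cache_id)
        return best


trie = PrefixTrie()
trie.insert([1, 2, 3], "shared-prefix")
trie.insert([1, 2, 3, 7, 8], "tenant-a-session")

# A different user shares only the first three tokens.
match = trie.longest_match([1, 2, 3, 9, 9])  # (3, "shared-prefix")
```

A lookup for `[1, 2, 3, 9, 9]` must never yield `"tenant-a-session"`, even though both sequences start with the same three tokens.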

Trasmatta 2 hours ago||

The first reply clearly being a copy and paste from Claude made me want to vomit

If people absolutely need to use AI to write replies, they NEED to start including an "everything after this was generated by AI" disclaimer

ai_fry_ur_brain 3 hours ago||

OpenRouter's model providers quite frequently give me URLs that other people have given them.

Kapura 4 hours ago||

happy fourth of july everybody!

ofjcihen 3 hours ago|

Happy fourth to you too :)

ryantsuji 4 hours ago|

Note the repro condition: first response after 5+ min, i.e. a cache miss. A cache leak would show up on hits (someone else's cached prefix), not on misses where everything is recomputed from your own tokens.
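The hit/miss distinction can be sketched with a toy time-to-live cache. The 300-second TTL is an assumption taken from the "5+ min" in the comment, and `TtlCache` and `respond` are hypothetical names; this sketch also does not refresh the timestamp on a hit, which real systems may do. On a miss, the returned state is built only from the caller's own tokens; stale or foreign data can only come back on the hit path.

```python
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 300.0  # assumption: matches the "5+ min" repro condition


class TtlCache:
    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock  # injected so expiry can be demonstrated without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= TTL_SECONDS:
            del self._entries[key]  # expired: behaves exactly like a miss
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)


def respond(cache: TtlCache, prefix_key: str, own_tokens: str) -> Tuple[str, str]:
    """Return ("hit", cached_state) or ("miss", state recomputed from own_tokens)."""
    cached = cache.get(prefix_key)
    if cached is not None:
        return ("hit", cached)
    state = "state(" + own_tokens + ")"  # recomputed purely from the caller's input
    cache.put(prefix_key, state)
    return ("miss", state)


now = [0.0]
cache = TtlCache(lambda: now[0])
first = respond(cache, "prefix", "my tokens")   # miss: nothing cached yet
now[0] = 10.0
second = respond(cache, "prefix", "my tokens")  # hit: within the TTL
now[0] = 400.0
third = respond(cache, "prefix", "my tokens")   # miss again: entry expired
```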
