Posted by bearsyankees 12/3/2025
Imagine the potential impact. You're a single mother, fighting for custody of your kids. Your lawyer has documentation of something that happened to you - something that wasn't your fault, but would look bad if brought up in court. Suddenly you receive a phone call: a mysterious voice demanding $10,000, or they'll send the documents to the opposition. The caller has no connection to you or your lawyer; someone just found a trove of documents through an open back door and wanted to make a quick buck.
This is exactly what a software building code would address (if we had one!). Just like you can't open a new storefront in a new building without the building being inspected, you should not be able to process millions of sensitive files without having your software inspected. The safety and privacy of all of us shouldn't be optional.
If any human employee publicly sexually harassed his female CEO, he'd be out of a job and would find it very hard to get a new one. But Grok can do it, and it's the CEO who ends up quitting.
You can't fit every security consideration into the context window.
They also know not to, say, temporarily disable auth to be able to look at the changes they've made on a page hidden behind auth, which is what I observed Gemini 3 Pro doing just yesterday.
That's what makes it bad at security. It cannot comprehend more than a floppy disk's worth of data before it reverts to absolute gibberish.
Let's imagine a codebase that fits onto a revolutionary piece of technology known as a floppy disk. As we all know, a floppy disk can hold <2 megabytes. But 100k tokens is only about 400 kilobytes. So, to process the whole codebase that fits on a floppy disk, you need 5 agents, plus a sixth "parent process" that those 5 agents report to.
Those five agents can report "no security issues found" in their own little chunk of the codebase to the parent process, and that parent process will still be none the wiser about how those different chunks interact with each other.
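The arithmetic above can be sketched in a few lines; the floppy capacity and bytes-per-token ratio are the comment's own rough assumptions, not measurements:

```python
import math

# Rough figures from the comment above (assumptions, not measurements):
# a floppy holds just under 2 MB, and 100k tokens of code is ~400 kB.
FLOPPY_KB = 2000   # ~2 MB floppy-sized codebase
CHUNK_KB = 400     # ~100k tokens per agent's context window

child_agents = math.ceil(FLOPPY_KB / CHUNK_KB)  # agents reviewing chunks
total_processes = child_agents + 1              # plus the parent process
print(child_agents, total_processes)            # prints: 5 6

# The catch: each child sees only its own chunk, so a vulnerability that
# spans two chunks (auth enforced in one file, bypassed in another) is
# invisible to every individual reviewer - and to the parent process.
```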
It's almost as if it has additional problems beyond the context limits :)
And what keeps security problems from making it into prod in the real world?
Code review, testing, static and dynamic code scanning, and fuzzing.
Why aren't these things done?
Because there isn't enough people-time and expertise.
So in order for LLMs to improve security, they need to be able to improve our ability to do one of: code review, testing, static and dynamic code scanning, and fuzzing.
It seems very unlikely that those forms of automation won't be improved in the near future by even the dumbest form of LLMs.
And if you offered CISOs a "pay to scan" service that actually worked cross-language and -platform (in contrast to most "only supported languages" scanners), they'd jump at it.
Why? Context. LLMs, today, go off the rails fairly easily. As I've mentioned in prior comments, I've been working a lot with different models and agentic coding systems. When a codebase starts to approach 5k lines (building the entire codebase with an agent), things start to get very rough.

First of all, the agent cannot wrap its context (it has no brain) around the code in a complete way. Even when everything is very well documented as part of the build and outlined so the LLM has indicators of where to pull in code, it almost always fails to keep schemas, requirements, or patterns in line. I've had instances where the APIs being developed were to follow a specific schema, require specific tests, and abide by specific constraints for integration. Almost always, in that relatively small codebase, the agentic system gets something wrong - but because of sycophancy, it gleefully informs me all the work is done and everything is A-OK!

The kicker is that when you show it why and where it's wrong, you're stuck in a loop of burning tokens trying to put the train back on the track. LLMs can't be efficient with new(ish) codebases because they're always having to go look up new documentation, burning through more context beyond what they're targeting to build / update / refactor / etc.
So, sure. You can "call an LLM multiple times". But this is hugely missing the point with how these systems work. Because when you actually start to use them you'll find these issues almost immediately.
Spot on. If we look at historical "AI" (pre-LLM), the data sets were much more curated, cleaned, and labeled. Look at CV, for example. Computer Vision is a prime example of how AI can easily go off the rails given 1) garbage input data and 2) biased input data. LLMs have both of these as inputs, in spades and in vast quantities. Has everyone forgotten about Google's classification of African American people in images [0]? Or, more hilariously, the fix [1]? Most people I talk to who are using LLMs think the data being strung into these models has been fine-tuned, hand-picked, etc. For some small models that were explicitly curated, sure. But in the context (no pun intended) of all the popular frontier models: no way in hell.
The one thing I'm really surprised nobody is talking about is the system prompt. Not in the sense of jailbreaking it or even extracting it. But I can't imagine that these system prompts aren't accumulating massive tech debt at this point. I'm sure there's band-aid after band-aid of simple fixes to nudge the model in ever-so-slightly different directions based on things that are, ultimately, out of the control of such a large culmination of random data. I can't wait to see how these long-term issues crop up and get duct-taped over with the quick fixes these tech behemoths are becoming known for.
[0] https://www.bbc.com/news/technology-33347866
[1] https://www.theguardian.com/technology/2018/jan/12/google-ra...
But also, you'd need to have some metrics - how good are developers at security already? What if the bar is on the floor and LLM code generators are already better?
I've seen a lot of job ads (Canva) lately that mandate AI use or AI experience, and as an AI company if they wanted that I think they would have put it in the ad.
For the record I think I may be fine with the insincerity of selling AI but not using it!
Yes, but adding these common sense considerations is actually something LLMs can already do reasonably well.
If we're saying the way to ensure competency is to instill fear of not getting money tomorrow as a consequence of failure, then AI companies and humans are on equal footing.
It's like having multiple people audit your systems. Even if everyone only catches 90%, as long as they don't catch exactly the same 90%, this parallel effort helps.
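A quick sketch of that argument: if each auditor's misses were truly independent (a big assumption - shared blind spots are common), the combined miss rate shrinks geometrically with the number of auditors:

```python
# Sketch of the parallel-audit argument, assuming each auditor
# independently catches 90% of issues. Independence is the crux:
# correlated blind spots (same training, same tools) break this.

def combined_catch(n_auditors: int, per_auditor: float = 0.90) -> float:
    """Probability at least one of n independent auditors catches a bug."""
    return 1 - (1 - per_auditor) ** n_auditors

for n in (1, 2, 3):
    print(n, round(combined_catch(n), 4))
# prints:
# 1 0.9
# 2 0.99
# 3 0.999
```

The same formula also shows why correlation matters: if all auditors miss the same 10%, the exponent never kicks in and more reviewers add nothing.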
He wanted to demonstrate that he indeed had the private data. But he fucked up the tar command and it ended up having his username in the directory names - a username he used in other places on the internet.
The problem here however is that they get away with their sloppiness as long as the security researcher who found this is a whitehat, and the regular news won't pick it up. Once regular media pick this news up (and the local ones should), their name is tarnished and they may regret their sloppiness. Which is a good way to ensure they won't make the same mistake. After all, money talks.
The story is an example of the market self-correcting, but out comes this “building code” hobby horse anyway. All a software “building code” will do is ossify certain current practices, not even necessarily the best ones. It will tilt the playing field in favor of large existing players and to the disadvantage of innovative startups.
The model fails to apply in multiple ways. Building physical buildings is a much simpler, much less complex process with many fewer degrees of freedom than building software. Local city workers inspecting to the local municipality's code at least have clear jurisdiction because of the building's fixed physical location. Who will write the software "building code"? Who will be the inspectors?
This is HN. Of all places, I’d expect to see this presented as an opportunity for new startups, not calls for slovenly bureaucracy and more coercion. The private market is perfectly capable of performing this function. E&O and professional liability insurers if they don’t already will be soon motivated after seeing lawsuits to demand regular pentests.
The reported incident is a great reminder of caveat emptor.
I don't... think this is true? Google has no problem shipping complex software projects, yet their London HQ is years behind schedule and vastly over budget.
Construction is really complex. These can be mega-projects with tens of thousands of people involved, where the consequences of failure are injury or even death. When software failure does have those consequences - things like aviation control software, or medical device firmware - engineers are held to a considerably higher standard.
> The private market is perfectly capable of performing this function
But it's totally not! There are so many examples in the construction space of private markets being wholly unable to perform quality control because there are financial incentives not to.
The reason building codes exist and are enforced by municipalities is because the private market is incapable of doing so.
I used to think developers had to be supremely incompetent to end up with vulnerabilities like this.
But now I understand it’s not the developers who are incompetent…
There are organisations that are generally competent, and there are places that are less competent. It's not all that uncommon for the whole organisation to be generally incompetent.
The saddest places (for me) are those where almost every individual you talk to seems generally competent, but judging by their output the company might as well be staffed by idiots. Something in the way they are organised suppresses the competence. (I worked at one such company.)
> Maybe I have just been lucky, but I have not had the displeasure of working with people either that incompetent or willfully ignorant yet.
It's very important before you start any new job to suss out how competent people and the organisation are. Ideally, you probably want to work for a competent company. But at least you want to know what you are getting into.
There's a bit of luck involved, if you go in blindly, but you can also use skill and elbow-grease to investigate.
It's a natural outcome of authoritarian structures when the people at the top are idiots. When that happens, the whole organization rots.
how does one do this, without first having the job and being embedded in there? From the outside, it's near impossible to see these details imho.
It's fundamentally the same problem that the company is trying to solve when they interview you, just the other way 'round.
Some ideas: observe and ask in the interviews and hiring process in general. See what you can find out about the company from friends, contacts and even strangers. Network! Do some online research, too.
Btw, lots of the cliché interview questions ("What are your greatest weaknesses?" etc) actually make decent questions you can ask about the company and team you are about to join.
Reeves orders Treasury inquiry over Budget leaks
Chancellor’s policies found their way to the press before she announced them to MPs
https://www.telegraph.co.uk/news/2025/12/03/reeves-orders-tr...
There’s definitely plenty of incompetence regardless. But I’ve never seen a company where the incompetence was more noteworthy in the cog positions than “leadership”.
Is the issue that people aren't checking their security@ email addresses? People are on holiday? These emails get so much spam it's really hard to separate the noise from the legit signal? I'm genuinely curious.
Companies hire a "security team" and put them behind the security@ email, then decide they'll figure out how to handle issues later.
When an issue comes in, the security team tries to forward the security issue to the team that owns the project so it can be fixed. This is where complicated org charts and difficult incentive structures can get in the way.
Determining which team actually owns the code containing the bug can be very hard, depending on the company. Many security team people I've worked with were smart, but not software developers by trade. So they start trying to navigate the org chart to figure out who can even fix the issue. This can take weeks of dead-ends and "I'm busy until Tuesday next week at 3:30PM, let's schedule a meeting then" delays.
Even when you find the right team, it can be difficult to get them to schedule the fix. In companies where roadmaps are planned 3 quarters in advance, everyone is focused on their KPIs and other acronyms, and bonuses are paid out according to your ticket velocity and on-time delivery stats (despite PMs telling you they're not), getting a team to pick up the bug and work on it is hard. Again, it can become a wall of "Our next 3 sprints are already full with urgent work from VP so-and-so, but we'll see if we can fit it in after that."
Then legal wants to be involved, too. So before you even respond to reports you have to flag the corporate counsel, who is already busy and doesn't want to hear it right now.
So half or more of the job of the security team becomes navigating corporate bureaucracy and slicing through all of the incentive structures to inject this urgent priority somewhere.
Smart companies recognize this problem and will empower security teams to prioritize urgent things. This can cause another problem where less-than-great security teams start wielding their power to force everyone to work on not-urgent issues that get spammed to the security@ email all day long demanding bug bounties, which burns everyone out. Good security teams will use good judgment, though.
Now if you needed to develop something not-urgent that involved, say, the performance department, database department, and your own, hope you’ve got a few months to blow on conference calls and procedure documents.
For that industry it made sense though.
Now that I think of it, I’ll bet a lot of companies have a system similar to this for their infrastructure… they just outsource it to AWS, Azure, Google, etc. and comparatively fly by the seat of their pants on the dev side. You could only scale that system down so much, I imagine.
A lot of them are people who can't code at all and can't administer systems - they just fill in tables and check boxes, maybe from some automated suite. They don't know what HTTP and HTTPS are, because they're just paper pushers, which is far from real security - more like security in name only.
And they joined the field because it pays well.
At my past employers it was "The VP of such-and-such said we need to ship this feature as our top priority, no exceptions"
And of course nobody remembered the setup, and the logging was only accessible by that same person, so figuring it out also took weeks.
Email the memo to a decision maker with the important flag on and CC: another person as a witness.
If you have been saying it for a long time and nobody has taken any action, you may use the word "escalation" as part of the subject line.
If things hit the fan, it will also make sure that what drops from the fan falls on the right people, and not on you.
They have a specific time of day when they check their email, they give that 30 minutes, and they check emails from most recent down.
The email comes in two hours earlier, and, by the time they check their email, it's been buried under 50 spams and near-spams, each of which needs to be checked, so they run out of their 30 minutes before they get to it. The next day, by email-check time, another 400 spams have been thrown on top.
Think I'm kidding?
Many folks that have worked for large companies (or bureaucracies) have seen exactly this.
That said, in my experience this spam is still a few emails a day at the most, I don't think there's any excuse for not immediately patching something like that. I guess maybe someone's on holiday like you said.
There is so much spam from random people about meaningless issues in our docs. AI has made the problem worse. Determining the meaningful from the meaningless is a full time job.
The other half was people demanding payment.
I reckon only 1% of reports are valid.
LLMs can now make a plausible-looking exploit report ("there is a use-after-free bug in your server-side implementation of X library which allows shell access to your server if you time these two API calls correctly"), but the LLM has made the whole thing up. That can easily waste hours of an expert's time on a total falsehood.
I can completely see why some companies decide it'll be an office-hours-only task to go through all the reports every day.
Of course, this could have been a real vulnerability if it disclosed the real server IP behind Cloudflare. That was not the case; we were sending via an AWS email gateway.
Outside of startups and big tech, it's not uncommon to have release cycles that are months long. Especially common if there is any legal or regulatory involvement.
I remember heartbleed dropping shortly after a deployment and not being allowed to patch for like ten months because the fix wasn't "validated". This was despite insurers stating this issue could cost coverage and legal getting involved.
I have unfortunately seen way worse. If it will take more than an hour and the wrong people are in charge of the money, you can go a pretty long time with glaring vulnerabilities.
In a complex system it can be very hard to understand what will break, if anything. In a less complex system, it can still be hard to understand if the person who knows the security model very well isn't available.
There is always the simple answer: these are lawyers, so they are probably scrambling internally to write a response that covers themselves legally while also trying to figure out how fucked they are.
1 week is surprisingly not that slow.
Consider a vulnerability where:
1) the hack is straightforward to do;
2) it can do a lot of damage (get PII or other confidential info in most cases);
3) downtime of the service wouldn't hurt anyone, especially if we compare it to the risk of the damage.
But, instead of insisting on immediately shutting down the affected service, we give companies weeks or months to fix the issue while notifying no one in the process and continuing with business as usual.
I've submitted 3 very easy exploits to 3 different companies in the past year and, thankfully, they fixed them in about a week every time. Yet the exploits were trivial (as I'm not good enough to find the hard ones, I admit). Mostly IDORs, like changing id=123456 to id=1 all the way up to id=123455 and seeing a lot of medical data that doesn't belong to me. All 3 cases were medical labs, because I had to have some tests done and wanted to see how secure my data was.
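For readers unfamiliar with the term: an IDOR (insecure direct object reference) is exactly this - a guessable id with no ownership check behind it. A minimal sketch of the vulnerable pattern and its fix, with all names and data hypothetical:

```python
# Hypothetical records keyed by sequential, guessable ids.
RESULTS = {
    1: {"owner": "alice", "data": "blood panel ..."},
    2: {"owner": "bob", "data": "biopsy ..."},
}

def get_result_vulnerable(result_id: int) -> dict:
    # IDOR: anyone who can count can read anyone's results.
    return RESULTS[result_id]

def get_result_fixed(result_id: int, authenticated_user: str) -> dict:
    record = RESULTS.get(result_id)
    # The fix: verify the record belongs to the authenticated user.
    # Raise the same error for "missing" and "not yours" so the
    # endpoint doesn't even leak which ids exist.
    if record is None or record["owner"] != authenticated_user:
        raise PermissionError("not found")
    return record
```

Note the check happens server-side against the session's authenticated user; switching to random/UUID ids alone only hides the problem, it doesn't fix it.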
Sadly, in all 3 cases I had to send a follow-up e-mail after ~1 week, saying that I'll make the exploit public if they don't fix it ASAP. What happened was, again, in all 3 cases, the exploit was fixed within 1-2 days.
If I'd given them a month, I feel they would've fixed the issue after a month. If I'd given them a year - after a year.
And it's not like there aren't 10 different labs in my city. It's not like online access to results is critical, either. You can get a printed result or call them to write them down. Yes, it would be tedious, but more secure.
So I should've said from the beginning something like:
> I found this trivial exploit that gives me access to medical data of thousands of people. If you don't want it public, shut down your online service until you fix it, because it's highly likely someone else figured it out before me. If you don't, I'll make it public and ruin your reputation.
Now, would I make it public if they don't fix it within a few days? Probably not, but I'm not sure. But shutting down their service until the fix is in seems important. If it was some hard-to-do hack chaining several exploits, including a 0-day, it would be likely that I'd be the first one to find it and it wouldn't be found for a while by someone else afterwards. But ID enumerations? Come on.
So does the standard "responsible disclosure", at least in the scenario I've given (easy to do; not critical if the service is shut down), help the affected parties (the customers) or the businesses? Why should I care about a company worth $X losing $Y if it's their fault?
I think in the future I'll anonymously contact companies with way more strict deadlines if their customers (or others) are in serious risk. I'll lose the ability to brag with my real name, but I can live with it.
As to the other comments talking about how spammed their security@ mail is - that's the cost of doing business. It doesn't seem like a valid excuse to me. Security isn't one of hundreds random things a business should care about. It's one of the most important ones. So just assign more people to review your mail. If you can't, why are you handling people's PII?
I understand you think you are doing the right thing but be aware that by shutting down a medical communication services there's a non-trivial chance someone will die because of slower test results.
Your responsibility is responsible disclosure.
Their responsibility is how to handle it. Don't try to decide that for them.
What you're describing is likely a crime. The sad reality is most businesses don't view protection of customers' data as a sacred duty, but simply another of the innumerable risks to be managed in the course of doing business. If they can say "we were working on fixing it!" their asses are likely covered even if someone does leverage the exploit first—and worst-case, they'll just pay a fine and move on.
The more casualties, the more media attention -> the more likely they, and others in their field, will take security more seriously in the future.
If we let them do nothing for a month, they'll eventually fix it, but in the mean time malicious hackers may gain access to the PII. They might not make it public, but sell that PII via black markets. The company may not get the negative publicity it deserves and likely won't learn to fix their systems in time and to adopt adequate security measures. The sale of the PII and the breach itself might become public knowledge months after the fact, while the company has had a chance to grow in the meantime, and make more security mistakes that may be exploited later on.
And yes, I know it may be a crime - that's why I said I'd report it anonymously from now on. But if the company sits on their asses for a month, shouldn't that count as a crime, as well? The current definition of responsible disclosure gives companies too much leeway, in my opinion.
If I knew I operated a service that was trivial to exploit and was hosting people's PII, I'd shut it down until I fixed it. People won't die if I do everything in my power to provide the test results (in my example of medical labs) to doctors and patients via other means, such as paper or phone. And if people did die, it would be devastating, of course, but it would mean society had put too much trust in a single system without making sure it wasn't vulnerable to the most basic of attacks. So it would have happened sooner or later anyway. Although I can't imagine someone dying because their doctor had to make a phone call to the lab instead of typing in a URL.
The same argument about people dying due to the disruption of the medical communications system could be made about too-big-to-fail companies that are entrenched in society because a lot of pension funds have invested in them. If such a company goes under, the innocent people dependent on the pension fund's finances would suffer. That would be awful, of course, but would the alternative be to never let such companies go bankrupt? Or would it be better for such funds not to rely so heavily on one specific company in the first place? That is to say, in both cases (security, or stocks in general) the reality is that people are currently too dependent on a few singular entities, when they shouldn't be. That has to change, and the change has to begin somewhere.
Also … shows you what a SOC 2 audit is worth: https://www.filevine.com/news/filevine-proves-industry-leade...
Even the most basic pentest would have caught this.
The auditors themselves pretty much only care that you answered all questions, they don’t really care what the answers are and absolutely aren’t going to dig any deeper.
(I’m responsible for the SOC2 audits at our firm)
I asked my manager if that's all that was required and he said yes, just make sure you do it again next year. I spent the rest of my time worrying that we'd missed something. I genuinely didn't believe him until your comment.
Edit: missing sentence.
I don't at all get why there is a paragraph thanking them for their communication if that is the case.
I wouldn't expect them to find any computer problems either to be honest.
They should have given you some money.
They could have sold this to a ransomware group or affiliate for 5-6 figures, and then the ransomware group could have exfiltrated the data and attempted to extort the company for millions.
Then, if the company didn't pay and the ransomware group leaked the info to the public, they'd likely have to spend millions on lawsuits and fines anyway.
They should have paid this dude 5-6 figures for this find. It's scenarios like this that lead people to sell these vulns on the gray/black market instead of going the traditional whitehat bug bounty route.
My argument is we're in the Wild West with AI and this stuff is being built so fast with so many evolving tools that corners are being cut even when they don't realize it.
This article demonstrates that, but it does raise the question of why you'd trust one over the other when they both promise the same safeguards.
Specifically, it does not appear that AI is invoked in any way at the search endpoint - it is clearly piping results from some Box API.
Point out one (1) "AI product" company that isn't described accurately by that sentence
In truth the company forced our hand by pricing us out of the on-premise solution and will do that again with the other on-premise we use, which is set to sunset in five years or so.
Storing lots of legal data doesn’t seem to be one of these cases though.
Selling an on-premise service requires customer support, engineering, and duplication of effort if you’re pushing to the cloud as well. Then you get the temptations and lock in of cloud-only tooling and an army of certified consultant drones whose resumes really really need time on AWS-doc-solution-2035, so the on premise becomes a constant weight on management.
SaaS and the cloud is great for some things some of the time, but often you’re just staring at the marketing playbook of MS or Amazon come to life like a golem.
The funny thing is that this exploit (from the OP) has nothing to do with AI and could be <insert any SaaS company> that integrates into another service.
If SaaS Y just says "Give me your data and it will be secure", that's where it gets suspect.
I am one of the engineers who had to suffer through countless screenshots and forms to get these, because they show that you are "compliant and safe" - while the really impactful things are ignored.
https://jon4hotaisle.substack.com/i/180360455/anatomy-of-the...
It is crazy how this gets perpetuated in the industry as actually having security value, when in reality, it is just a pay-to-play checkbox.
If the options mainly consist of "trust me bro" vs "we can demonstrate that we put in some effort", the latter seems more preferable, even if it's not perfect.
It's also impossible to guarantee a 100% secure infrastructure, no matter how good your product team is.
In the grey is a term of art: "best efforts."
If data is leaking, and it wasn't because hackers bypassed a bunch of safeguards, if it can be shown that you didn't use Best Efforts to secure said data, there is liability.
1. The standards aren't clearly defined (i.e., you must specifically do this).
2. They are defined in terms of efforts rather than effects. It is like saying "every car sold must be made of steel" rather than "every car sold must be capable of withstanding an impact against a concrete wall at 60mph with X amount of deformation, etc." We want the rules to determine what level of threat is protected against, not just what motions the company went through. In the case in the article, it wasn't because hackers bypassed a bunch of safeguards; the company didn't protect against even basic threats.
3. It's not enough to have "liability". That puts the onus on individuals to sue the company for their specific damages. We need criminal penalties that are designed to punish companies (and the individuals who direct them) for the harm they do to society by the overall process of rushing ahead selling things instead of slowing down and being careful. We need large-scale enforcement so that companies actually stop doing these things because the cost of doing them becomes too enormous.
4. Our laws do not adequately take account of the differential power of those who cut corners, and the differential gains reaped. We frequently find small operators on the wrong end of painful lawsuits and onerous criminal penalties, while the biggest companies and wealthiest individuals use their position to avoid consequences. Laws need to explicitly take this into account, lowering the standard of proof for penalties against larger, wealthier, and more powerful companies and individuals, and also making those penalties exponentially higher.
Edit: I agree with you that we shouldn't let companies like this get away with what amounts to a slap on the wrist. But everything else seems irresponsible as well.
In the current world, I dunno. I guess it depends on what the company is. If it's something like a hedge fund or a fossil fuel company I think I'd be fine with some kind of wikileaks-like avenue for exposing it in such a way that it results in the company being totally destroyed.
I'd love to know who filevine uses for penetration testing (which they do, according to their website) because holy shit, how do you miss this? I mean, they list their bug bounty program under a pentesting heading, so I guess it's just nice internet people.
It's inexcusable.
Security reminds me of the Anna Karenina principle: All happy families are alike; each unhappy family is unhappy in its own way.
To be fair, data security breaches seldom are.
and otherwise well-structured engineering orgs have lost their goddamn minds with move fast and break things
because they're worried that OpenAI/Google/Meta/Amazon/Anthropic will release the tool they're working on tomorrow
literally all of them are like this