Posted by ninjagoo 10 hours ago
Maybe the Internet Archive might be ok to keeping some things private until x time passes; or they could require an account to access them
Their big requirement is you need to not be doing any DNS filtering or blocking of access to what it wants, so I've got the pod DNS pointed to the unfiltered quad9 endpoint and rules in my router to allow the machine it's running on to bypass my PiHole enforcement+outside DNS blocks.
^1 https://wiki.archiveteam.org/
^2 https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior
Sometimes it feels like ai-use concerns are a guise to diminish the public record. While on the other hand services like Ring or Flock are archiving the public forever.
Either way I'm fairly certain that blocking AI agent access isn't a viable long term solution.
Great point. If my personal AI assistant cannot find your product/website/content, it effectively may no longer exist! For me. Ain't nobody got the time to go searching that stuff up and sifting through the AI slop. The pendulum may even swing the other way and the publishers may need to start paying me (or whoever my gatekeeper is) for access to my space...
They do not care and we will be all worse off for it if these AI companies keep continuing to bombard news publishers RSS feeds.
It is a shame that the open web as we know it is closing down because of these AI companies.
I've seen companies fail compliance reviews because a third-party vendor's published security policy that they referenced in their own controls no longer exists at the URL they cited. The web being unarchivable isn't just a cultural loss. It's becoming a real operational problem for anyone who has to prove to an auditor that something was true at a specific point in time.
The very first result was a 404
https://aws.amazon.com/compliance/reports/
The jokes write themselves.
Links alone can be tempting as you've to reference the same docs or policies over and over for various controls.
Even if the content is taken down, changed or moved, a copy is likely to still be available in the Wayback Machine.
Sidebar:
Having been part of multiple SOC audits at large financial firms, I can say that nothing brings adults closer to physical altercations in a corporate setting than trying to define which jobs are "critical".
- The job that calculates the profit and loss for the firm, definitely critical
- The job that cleans up the logs for the job above, is that critical?
- The job that monitors the cleaning up of the logs, is that critical too?
These are simple examples but it gets complex very quickly and engineering, compliance and legal don't always agree.
Sadly, it does not even have to be an acquisition or rebrand. For most companies, a simple "website redo", even if the brand remains unchanged, will change up all the URL's such that any prior recorded ones return "not found". Granted, if the identical attestation is simply at a new url, someone could potentially find that new url and update the "policy" -- but that's also an extra effort that the insurance company can avoid by requiring screen shots or PDF exports.
Yes we have hundreds of identical Microsoft and Aws policies, but it's the only way. Checksum the full zip and sign it as part of the contract, that's literally how we do it
That's actually a potentially good business idea - a legally certifiable archiving software that captures the content at a URL and signs it digitally at the moment of capture. Such a service may become a business requirement as Internet archivability continues to decline.
I don't know how exactly it achieves being "legally certifiable", at least to the point that courts are trusting it. Signing and timestamping with independent transparency logs would be reasonable.
Any vendor who you work with should make it trivial to access these docs, even little baby startups usually make it quite accessible - although often under NDA or contract, but once that's over with you just download a zip and everything is there.
That's what I thought the first time I was involved in a SOC2 audit. But a lot of the "evidence" I sent was just screenshots. Granted, the stuff I did wasn't legal documents, it was things like the output of commands, pages from cloud consoles, etc.
What I would not do is take a screenshot of a vendor website and say "look, they have a SOC2". At every company, even tiny little startup land, vendors go through a vendor assessment that involves collecting the documents from them. Most vendors don't even publicly share docs like that on a site so there'd be nothing to screenshot / link to.
That is: if it's not accessible by a human who was blocked?
Having your cake and eating it too should never be valid law.
Or are you thinking of companies like Iron Mountain that provide such a service for paper? But even within corporations, not everything goes to a service like Iron Mountain, only paper that is legally required to be preserved.
A society that doesn't preserve its history is a society that loses its culture over time.
[1] https://www.mololamken.com/assets/htmldocuments/NLJ_5th%20Ci...
[2] https://www.nortonrosefulbright.com/en-au/knowledge/publicat...
I hope I’m wrong, but my bot paranoia is at all time highs and I see these patterns all throughout HN these days.
@dang do you have any thoughts about how you’re performing AI moderation on HN? I’m very worried about the platform being flooded with these Submarine comments (as PG might call them).
Seriously? What kind of auditor would "fail" you over this? That doesn't sound right. That would typically be a finding and you would scramble to go appease your auditor through one process or another, or reach out to the vendor, etc, but "fail"? Definitely doesn't sound like a SOC2 audit, at least.
Also, this has never particularly hard to solve for me (obviously biased experience, so I wonder if this is just a bubble thing). Just ask companies for actual docs, don't reference urls. That's what I've typically seen, you get a copy of their SOC2, pentest report, and controls, and you archive them yourself. Why would you point at a URL? I've actually never seen that tbh and if a company does that it's not surprising that they're "failing" their compliance reviews. I mean, even if the web were more archivable, how would reliance on a URL be valid? You'd obviously still need to archive that content anyway?
Maybe if you use a tool that you don't have a contract with or something? I feel like I'm missing something, or this is something that happens in fields like medical that I have no insight into.
This doesn't seem like it would impact compliance at all tbh. Or if it does, it's impacting people who could have easily been impacted by a million other issues.
A link disappearing isn’t a major issue. Not something I’d worry about (but yea might show up as a finding on the SOC 2 report, although I wouldn’t be surprised if many auditors wouldn’t notice - it’s not like they’re checking every link)
I’m also confused why the OP is saying they’re linking to public documents on the public internet. Across the board, security orgs don’t like to randomly publish their internal docs publicly. Those typically stay in your intranet (or Google Drive, etc).
lol seriously, this is like... at least 50% of the time how it would play out, and I think the other 49% it would be "ah sorry, I'll grab that and email it over" and maybe 1% of the time it's a finding.
It just doesn't match anything. And if it were FEDRAMP, well holy shit, a URL was never acceptable anyways.
You're missing the existence of technology that allows anyone to create superficially plausible but ultimately made-up anecdotes for posting to public forums, all just to create cover for a few posts here and there mixing in advertising for a vaguely-related product or service. (Or even just to build karma for a voting ring.)
Currently, you can still sometimes sniff out such content based on the writing style, but in the future you'd have to be an expert on the exact thing they claim expertise in, and even then you could be left wondering whether they're just an expert in a slightly different area instead of making it all up.
EDIT: Also on the front page currently: "You can't trust the internet anymore" https://news.ycombinator.com/item?id=47017727
Every comment section here can be summed up as "LLM bad" these days.
It's not "LLM bad" — it's "LLM good, some people bad, bad people use LLM to get better at bad things."
Insurance pays as long as you aren't knowingly grossly negligent. You can even say "yes, these systems don't meet x standard and we are working on it" and be ok because you acknowledged that you were working on it.
Your boss and your bosses boss tell you "we have to do this so we don't get fucked by insurance if so and so happens" but they are either ignorant, lying, or just using that to get you to do something.
I've seen wildly out of date and unpatched systems get paid out because it was a "necessary tradeoff" between security and a hardship to the business to secure it.
I've actually never seen a claim denied and I've seen some pretty fuckin messy, outdated, unpatched legacy shit.
Bringing a system to compliance can reasonably take years. Insurance would be worthless without the "best effort" clause.
[1]: https://arstechnica.com/civis/threads/journalistic-standards...
In the past libraries used to preserve copies of various newspapers, including on microfiche, so it was not quite feasible to make history vanish. With print no longer out there, the modern historical record becomes spotty if websites cannot be archived.
Perhaps there needs to be a fair-use exception or even a (god forbid!) legal requirement to allow archivability? If a website is open to the public, shouldn't it be archivable?
I am sad about link rot and old content disappearing, but it's better than everything be saved for all time, to be used against folks in the future.
I don't understand this line of thinking. I see it a lot on HN these days, and every time I do I think to myself "Can't you realize that if things kept on being erased we'd learn nothing from anything, ever?"
I've started archiving every site I have bookmarked in case of such an eventuality when they go down. The majority of websites don't have anything to be used against the "folks" who made them. (I don't think there's anything particularly scandalous about caring for doves or building model planes)
The truly important stuff exists in many forms, not just online/digital. Or will be archived with increased effort, because it's worth it.
If you don’t want your bad behavior preserved for the historical record, perhaps a better answer is to not engage in bad behavior instead of relying on some sort of historical eraser.
BUT, it's hard to learn from history if there's no history to learn...