Posted by neilfrndes 6 hours ago
Onboarding my non-software engineer teammates to it has super-charged them and essentially given them all their own personal developer that can automate tasks for them. Managing codebases, etc. is still a hassle though.
90% of the power of Excel was that it was functionally a database that a normal person could actually use. I think we'll see something similar with coding agents.
That's what they aim Claude Cowork at. Every executive/leader I've shown Claude Cowork to has gone from 'what is AI' to 'vibecoding whole apps' in weeks. Then when Claude is down for an hour, they get visibly angry and don't remember how to do anything pre-Claude :)
I understand the impulse to provide a UI to manage codebases, etc. But my observation is that these people just ask Claude to do whatever it is they need done. Codebase needs managing? They just ask Claude to do it. No idea how to deploy an app? They just ask Claude to do it.
Any app built on top of this stack to 'make it easier' is competing with 'I don't care what's happening, just ask Claude to do it'.
Do you, and those executives, own the risks associated with that practice? Are those risks actually indemnified?
It's neat that 'anyone can do anything', but if they don't actually know what the risk to the business or 3rd parties is, why is this a good thing, especially in the enterprise where there are actors who are explicitly looking for this type of environment to exploit?
I've been working in tech since the late 90s. This is the biggest and most sudden change in company behavior I've ever seen. The only thing that comes close was the web 1.0 world in the 90s where everything suddenly became websites.
That creates tons of risks and opportunities. Good and bad. Maybe a great time to start a security company. But maybe a terrible time to be a small time web app developer when your clients can get 'good enough' in minutes for dollars on their own.
Wait, you exposed people to a technology, taught them how to use it, then you are not going to own the implications of that action without teaching them about the risks or telling them how they need to ensure they don't shoot themselves in the face or violate their duty of care?
Do you understand what you are saying and the implications of that in the real world relative to the insurance contracts that they have?
Your company is associated with HIPAA, you should have a much higher standard than this.
The drug is scary when everyone is depending on it. I wonder what the future will be like.
I do agree quality will be missed, and shadow IT will again be a big issue like at the end of the 80s and early 90s.
I heard the same from the best devs I’ve known, and from some who thought themselves to be the best, long before LLMs were ever a thing.
I’m sure others heard the same when JavaScript and Python became near ubiquitous. When PHP emerged. When C supplanted Fortran and COBOL. When these two took over from Assembly. When punch cards went the way of the dodo.
There’s always someone for whom shitty is becoming the new normal. If that makes it a rule, what do we make of that rule?
Also we went from compilers with an IDE that had a debugger, profiler, built-in help and would fit on a 3.5" disk and would load on machines with 640KiB RAM (Turbo Pascal) to chat apps or password managers that are hundreds of megabytes and regularly gobble up more than a gigabyte of memory because they ship with their own browser.
Something is lost along the way.
Coding per se is not hard. Proper engineering is. I do hope this change brings a change in focus (people train in algorithms, efficiency, solid development patterns) but I am afraid it won’t be the case.
When the electricity goes out, (most) people get similarly upset. No electricity means no internet, and all of a sudden everything that people had planned to do can’t be done until the power returns.
Like Slack or GitHub or AWS or whatever. It’s almost always a net positive to wait vs do it yourself.
However, the temptation of productivity gains is strong, and few of the customers look into relaxing these rules.
What could possibly go wrong.
I can't wait for a Hollywood blockbuster that'll pretty much be science non-fiction.
Probably "don't do anything to upset AI companies or you will effectively become a handicapped person"
Not that different from life in China: "don't do anything to upset Tencent and AliPay or you will become an outcast"
Or life in the US if you're a content creator: "don't do anything to upset Meta or Youtube or you will not be able to pay your rent"
The future: ToS basically becomes law, and you will be stripped of your own second brain if you violate it or say anything they deem "sensitive"
* ransomware attack, fire in the server room, database HDD crash, car accident takes out the internet connection, ...
Reading the first part, I was going to say they don’t even care about whether or not there’s a codebase. It doesn’t matter; it could be all gremlins and hamsters in wheels for all they care, and for all they should care. All that matters is the functionality, the value it gives them.
We’re even getting disposable code now. Entire single-use ephemeral web apps, built on the go to enable, visualise, or simplify a specific thing, then thrown away.
Will it all lead to some trouble? Definitely. So did computers, and so did the internet.
Weird times. Fun times.
I would get called in to rewrite it using a proper database and documented rules, and to ensure it stayed scalable - and everyone would be happy.
These Access "apps" were abominations from a technical point of view - but they got the job done without having to spend a load of money on off-the-shelf or bespoke software. And the "tech guy" made a valuable contribution to the company. It's only at a certain point that Access started to struggle.
I foresee the exact same thing happening in the near future - except we won't be building the replacement apps ourselves - we'll just know how to give the coding agents well-specified prompts and tell them when they're making a mistake.
I think what a lot of us are concerned about is that the vibe-coded stuff bloats fast. It's so verbose and all over the place that picking that thing apart will be a huge job, and relying on an AI to pick apart work that an AI already failed to maintain seems like wishful thinking.
It's literally "The AI is failing! Don't worry I'll just use AI to fix the AI!".
What I needed to do was sit with a user (not a manager/the person buying my services) and ask them to show me the different things they did with the software. Then I could write a spec for the actual _feature_ and would only need to look at the existing codebase if they needed data transferring across[1]. I don't see why our new LLM-based future would be any different.
[1] Of course this meant I would leave out edge-cases and/or weird quirks of the system - often this was actually a bonus as they were either no longer relevant or worked that way because that was the only way they knew how to do it
To put it another way, the customers of these frontier models are implicitly being competed against by the model itself.
I'm currently doing something like this in the internal model-independent LLM chat app I work on at a F100, specifically targeted at our everyday users. <input type="file" webkitdirectory> lets the user give the model read and write access to a local folder (and OPFS lets us reuse the same fs tools we give the model for files manually attached to the chat, or for files tools want to create if they haven't granted folder access).
Every time we used to release a new version it was "still can't handle the 6MB Excel file I drop into it" when that was being extracted to CSV and added to context - now it can poke about in the big Excel file directly with SheetJS to pull the sheets/headers and inspect the shape of the data, and use locally sandboxed code execution to write code against either extracted data or the spreadsheet itself via SheetJS for pivot tables and such (all locally - none of which need go into the context).
The base models are good enough at tool calling (I really mean Claude, though, the GPTs just go on a tear calling tools with no context for the user) that they're already decent at automating stuff for the user without a dedicated harness (our default system prompt is still "You are a helpful AI assistant", lol). Add tools for Graph API stuff, and now it can pull the nightly batch file from a support inbox, unzip the spreadsheet within, diff it against yesterday's, generate an import file for new users and draft an email to welcome them, something that used to be a daily support task (which I'd already automated most of - but now you don't need a dev for this kind of thing). Or go find the big 450,000+ row spreadsheet that's being automated somewhere on SharePoint, pull it down in 150,000 row chunks (Graph Excel REST API limit) and write code to go figure out whatever the user is asking.
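For anyone curious what the two mechanical steps in that workflow look like, here's a minimal sketch: splitting a big sheet into fixed-size row windows, and diffing today's batch against yesterday's to find new users. It's plain Python on CSV text, not the in-browser SheetJS/Graph setup described above; the function names and the `email` key column are assumptions made up for illustration, and the 150,000 default is just the limit quoted in the comment.

```python
import csv
import io


def row_chunks(total_rows, chunk_size=150_000):
    """Yield 1-based inclusive (first_row, last_row) windows covering total_rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(1, total_rows + 1, chunk_size):
        yield start, min(start + chunk_size - 1, total_rows)


def new_users(yesterday_csv, today_csv, key="email"):
    """Return rows from today's CSV text whose key does not appear in yesterday's.

    The key column name is hypothetical; keys are compared case-insensitively.
    """
    seen = {
        row[key].strip().lower()
        for row in csv.DictReader(io.StringIO(yesterday_csv))
    }
    return [
        row
        for row in csv.DictReader(io.StringIO(today_csv))
        if row[key].strip().lower() not in seen
    ]
```

With these defaults a 450,000-row sheet splits into three windows, and a 450,001-row sheet needs a fourth window containing only the last row.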
Having implemented and used it, I like this setup so much it kinda ruined Claude.ai and ChatGPT.com for me, so I've hooked up similar access for them using a browser extension to add the folder picker input, with the extension talking to a local server to tell it which folder to give access to, and Claude/ChatGPT talking to the same server over MCP via a CloudFlare Tunnel to work with the selected folder.
We're obviously going to be holding ourselves back in terms of scale and in terms of not being a "true" SaaS with this approach, but my thesis is that we get much higher quality results and higher compliance/activation and can charge more for the bespoke model backed by our own platform.
I haven't tried it and don't know a lot about it, but isn't this the whole claw thing?
The power of Excel is not what it was. Nor is the power of ordinary thought.
Isn’t that literally Claude’s web UI?
This is probably fine as long as the code is acting on local resources. The moment you have vibe-coded software interacting with shared state or a shared database, the risk increases exponentially, and all it takes to have a bad day is a poorly worded prompt from one of those users.
Some oversight by humans or automated guardrails will probably reduce those instances.
/s
A Figma-like dashboard for turning Claude Code, Gemini CLI, Codex into an OpenClaw but with security measures to break the lethal trifecta while running on a VM.
But it's not quite there in terms of usability. I agree that is the hardest part of the equation. It's something I'm constantly experimenting with and haven't found the solution to it yet. Open to feedback!
It's targeted for creatives atm. For the few in private testing, it's been amazing what they're able to do with the little tooling I've given them. It is a legitimate change in their daily drive.
I don't know anyone not building a product in that space
I have a vision for what will be the next household ChatGPT:
1. An actually frictionless way of keeping the human in the loop. My product is primarily targeting that: Your tools should feel like an extension of you, not replacing you.
2. Juggling work. I feel like what I'm making here is the secret sauce, so keeping a hush on it :)
3. Keeping all your work in one place. Drawing, sketching, developing, emailing, planning, writing; there is no reason to depend on other apps if you have one place that does it all, and it's the best offering among them.
Edit with some follow up thoughts -- I think what I'm trying to make is best summarized as claude code for non-developers (that's what I put in my YC application), but I think what I'm trying to make doesn't quite even have a developer equivalent.
There's not an environment you can go into right now and say "after this builds every single time, deploy to this machine" and it actually seamlessly does that. The tech is there but making it a whole Factorio-esque operation is still very manual -- and that's what I'm solving.
Good for your feelings, but I feel the same for my work ..
The main problem is still that agents are not reliable, and what normal (and dev) people really want is for them to be reliable. Or, well, tools to manage unreliable agents in a clearer way.
(It is a big market I think)
Super early stage but I am really happy to read your comment.
You mean UX? Isn't Claude Cowork supposed to be 'Claude but for normies'? As for Claude Code / OpenAI Codex for non-programmers, I believe Replit, Lovable, & others are trying & succeeding.
WhatsApp comes to mind in how its sole focus on replacing SMS (rather than Skype/AOL/MSN Messenger/YChat/GChat) meant it had no (user-facing) password/username, no elaborate signup, no login, no chat/friend requests, no sync etc. & became the biggest social network right under the nose of well resourced competitors with worldwide distribution, like Google & Facebook.
Phone operators probably were not impacted either: SMSes bundled with flat plans are still flat plans, and European-style unlimited calls + 100 SMS per month plans are still there, and those SMSes are still mostly unused.
So we could have a killer app and yet nothing changes in the flow of money around it.
UX wise, WhatsApp is a big improvement over SMS. Vocal messages, I'm not a fan of them. A waste of my time.
Mobile network operators lost the profits (at prices that were pretty much pure margin) they had on pay-as-you-go messages, and messages not included in flat plans (e.g. overseas SMSes). They also lost a huge amount on highly profitable overseas calls. Those of us with family in other countries save a lot of money by using WhatsApp and similar instead of phone calls.
Claude can write code pretty well, but there are just a few tasks that I need to do to orchestrate everything. If it could do those tasks well even some of the time it would be about 10x more useful.
It's called Zenning AI - we're a small team in London, testing it with a few companies at the moment!
Honestly though we are finding that a little FDE to set up pre-bake stuff that’s sufficiently specific to the customer is needed. Otherwise people are like, “I don’t need to close the books, I need to do a per-working-day profitability analysis for 10 EU countries with different public holidays”, and they get stuck there.
[1] https://www.arte.tv/en/videos/126831-000-A/arte-reportage/
Example: https://www.theguardian.com/world/2024/dec/18/why-former-fac...
AGI will solve poverty, btw. Any second now. Just need 500 bil more bro.
Or don’t tell me, if it’s well worth the 24min watch
In my neck of the woods, B2B invoices are now required to be delivered over the Peppol network in UBL format, which further improves reliability.
Doesn't necessarily eliminate the need for an accountant, because the chosen UBL standard has lots of room for interpretation and ambiguity, and it's impossible to uniformly decide how to process an invoice based on the invoice alone (e.g. is this deductible? is this even a business expense at all? which ledger should this go in? etc).
* find invoice I_E for expense E
* associate and categorize E based on I_E and transaction field
These things are annoying but Claude Code is great at it and it leaves a much smaller set I have to manually resolve. This is a class of problems that are tractable and checkable, which I happily use LLMs on. If it miscategorizes it, I'm going to see it because I'm looking over the accounts. In fact, I was previously using a different accounting app which had poor API support, so I dumped it so I could use Claude and it's incredible how much this helps me.
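To make the "tractable and checkable" point concrete, here's a rough sketch of the two steps above as a deterministic matcher. This is not what Claude Code does internally; it's a hypothetical baseline where the `Expense`/`Invoice` shapes, the vendor-in-description rule, and the 14-day window are all assumptions. Anything it can't pair unambiguously lands in the set left for manual resolution.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Expense:
    description: str   # bank transaction text (hypothetical field)
    amount_cents: int
    booked_on: date


@dataclass
class Invoice:
    vendor: str
    amount_cents: int
    issued_on: date
    category: str      # ledger category taken from the invoice (assumed)


def match_expenses(expenses, invoices, max_days=14):
    """Pair each expense E with its invoice I_E; return (matched, unresolved)."""
    unused = list(invoices)
    matched, unresolved = [], []
    for expense in expenses:
        candidates = [
            inv for inv in unused
            if inv.amount_cents == expense.amount_cents
            and abs((expense.booked_on - inv.issued_on).days) <= max_days
            and inv.vendor.lower() in expense.description.lower()
        ]
        if len(candidates) == 1:
            # Exactly one plausible invoice: consume it so it can't match twice.
            unused.remove(candidates[0])
            matched.append((expense, candidates[0]))
        else:
            # Zero or several candidates: leave it for a human to resolve.
            unresolved.append(expense)
    return matched, unresolved
```

The category for each matched expense is then just read off the paired invoice, and the `unresolved` list is the part you'd still look over by hand.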
There is an enormous number of use-cases that Claude/GPT are good for and the hard part is market penetration here. As an example, my dad was looking at some statistical health survey data in India and working out what things you could glean from it. Claude identified the things that would complicate his analysis in no time. He's 70 years old, and he'd done it all manually until he asked me (I've got a Mathematics degree) if something made statistical sense to do. I told him what it likely was and then asked him to try Claude. Knocked out his work and mine in moments. But he didn't think to use it. Now I have to get him a ChatGPT/Claude subscription.
It's like how if you go to the Datadog pricing page they don't list a feature set. They have all these use-case lists with prices. You can build things using their base metrics functionality and logs functionality but showing the use-cases must have more adoption.
Interesting, sometimes they want to show you they’ll simply charge 2-3 percent of your monthly spend (https://www.datadoghq.com/pricing/?product=audit-trail#produ...)
Anthropic's response: let's make a nice package out of this, and let's target specifically the businesses that are less likely to be ready to manage such horrible events.
Also, small business contracts likely do not have the same type of language around indemnity/SLAs, so it is easier for the harms of this type of system to go unpunished because those who are harmed are even less knowledgeable.
It's just like getting Google support.
This is dangerous. Relying on a third party for so much of your business. We've seen this many times before, where businesses get destroyed because something gets broken somewhere that they have outsourced and have no control over.
In my view this service should not be used unless there is a local LLM or a clear manual alternative.
Then it raises the question: why use Claude at all?
Maybe as a proof of concept only, while you come up with a real solution. Maybe to use Claude to get rid of Claude.
The people who get dazzled by bright lights are going to be the ones licking their wounds later. There is going to be egg on faces one day.
Must be nice being able to ruthlessly lie with "this is the future" marketing claims, while hiding behind this term of service.
It amazes me that we are going to litigate this like they did with cars over horses, or machines vs human labor. I honestly don't think Claude should be running companies.
I can tell you the drag is between your own tools and the real world (which is very messy and inconsistent): taxes, compliance, payroll, amendments, share structures, etc.
Within my island, my books are in order, invoicing and timekeeping are fully automated, calendars and sales pipelines are connected.
I'm sure there are many businesses whose inner islands are not as orderly. The zillion tools out there all try to bring equanimity to the chaos, and yet here we still are with FreshBooks, QuickBooks, and Xero...
I scaled to 30+ people with automated administration. My cost was under $150 a month for everything we needed to run a successful consultancy and product business. Our accountant was blown away by how simple his life was.
I'm constantly amazed at how it has gotten much worse in the resulting decade.
E.g. traditional automation + humans handling the drag = $4,000 per month with a couple of known blunders each year
vs traditional automation + AI = $400 per month, with an unknown number of blunders.
Of course it depends on how much a blunder costs to solve or swallow. But I would bet that accounting errors, even for a small business, would cost the business in the long run. And that's assuming we don't yet have adversarial behavior, which we can expect to come from both the inside and the outside.
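A back-of-the-envelope way to frame that trade-off, assuming both figures are monthly costs: work out how many extra blunders per year the cheaper setup can absorb before the savings are gone. The function name and the per-blunder cost are hypothetical; only the $4,000 and $400 come from the comparison above.

```python
def break_even_blunders(human_monthly, ai_monthly, cost_per_blunder):
    """Extra blunders per year the cheaper option can absorb before savings vanish."""
    if cost_per_blunder <= 0:
        raise ValueError("cost_per_blunder must be positive")
    annual_savings = (human_monthly - ai_monthly) * 12
    return annual_savings / cost_per_blunder
```

With $4,000 vs $400 the annual savings are $43,200, so at a made-up $10,000 per blunder the AI setup breaks even at roughly 4.3 additional blunders a year. The catch is exactly the one raised above: with an unknown blunder count and adversarial behavior, neither input is easy to estimate.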