Posted by y42 14 hours ago
For work, unlimited usage via Bedrock.
Yes I’d like to get more usage out of my personal sub, but at $20/mo, no complaints.
https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience...
And by crikey do I empathise with the poor support in this article. Nothing has soured me on Anthropic more than their attitude.
Great AI engineers. Questionable command line engineers (but highly successful.) Downright awful to their customers.
Strange how things can change!
The services (OpenAI, Anthropic) are not wildly changing that much. People are just using LLMs more and getting frustrated because they were told it would change the world, and then they take it out on their current patron. Give it a month and we'll be hearing how far OpenAI has fallen behind.
The first job of any support system—both in terms of importance and chronologically—is triage. This is not a research issue and it's not an interaction issue. It's at root a classification problem and should be trained and implemented as such.
There are three broad categories of interaction: cranks, grandmas, and wtfs.
Cranks are the people opening a support chat to tell you they have vital missing information about the Kennedy Assassination or they want your help suing the government for their exposure to Agent Orange when they were stationed at Minot. "Unfortunately I can't help with that. We are a website that sells wholesale frozen lemonade. Good luck!"
Grandma questions are the people who can't navigate your website. (This isn't meant to be derogatory, just vivid; I have grandma questions often enough myself.) They need to be pointed toward some resource: a help page, a kb article, a settings page, whatever. These are good tasks for a human or LLM agent with a script or guideline and excellent knowledge/training on the support knowledge base.
WTFs are everything else. Every weird undocumented behavior, every emergent circumstance, every invalid state, etc. These are your best customers and they should be escalated to a real human, preferably a smart one, as soon as realistically possible. They're your best customers because (a) they are investing time into fixing something that actually went wrong; (b) they will walk you through it in greater detail than a bug report, live, and help you figure it out; and (c) they are invested, which means you have an opportunity for real loyalty and word-of-mouth gains.
What most AI systems (whether LLMs or scripts) do wrong is that they treat WTFs like they're grandmas. They're spending significant money on building these systems just to destroy the value they get from the most intelligent and passionate people in their customer base doing in-depth production QC/QA.
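The triage-first design described above could be sketched as a routing function. The three class names follow the taxonomy in the comment; the keyword heuristics are purely hypothetical stand-ins for what would, in a real system, be a classifier trained on labeled support transcripts:

```python
from enum import Enum

class TicketClass(Enum):
    CRANK = "crank"      # off-topic; politely decline
    GRANDMA = "grandma"  # navigation help; point to a resource
    WTF = "wtf"          # genuine anomaly; escalate to a human

# Hypothetical keyword hints standing in for a trained classifier.
CRANK_HINTS = {"conspiracy", "sue the government", "secret evidence"}
GRANDMA_HINTS = {"password", "login", "where is", "how do i", "can't find"}

def triage(message: str) -> TicketClass:
    text = message.lower()
    if any(hint in text for hint in CRANK_HINTS):
        return TicketClass.CRANK
    if any(hint in text for hint in GRANDMA_HINTS):
        return TicketClass.GRANDMA
    # Everything else is a WTF: undocumented behavior, invalid states.
    # Default to escalation so real bugs never get a canned answer.
    return TicketClass.WTF
```

The key design choice is the default: anything that doesn't match a known pattern escalates to a human, which is the opposite of what most deployed systems do.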
I occasionally ask AI to write lots of code, such as a whole feature (medium t-shirt size or larger) or sometimes even bigger components of said feature, and I often just revert what it generated. It's not good, for all the reasons mentioned.
Other times I accept its output as a rough draft and then tell it how to refactor its code from mid to senior level.
I'm sure it will get better but this is my trust level with it. It saves me time within these confines.
Edit: it is a valuable code reviewer for me, especially as a solo stealth startup.
The thing is, running local LLMs gives you some kind of reliability and fixed expectations, which saves a lot of time. Sure, Claude might be fantastic one day, but what do I do when the same workload churns out shit the next and I'm halfway through updating and referencing a 500-document set?
Better the devil you know and all that.
Even with a simple prompt focused on two files, I told Claude to do a thing to file A and not change file B (we were using it as a reference).
Claude’s plan was to not touch file B.
First thing it did was alter file B. Astonishingly simple task and total failure.
It was all of one prompt, simple task, it failed outright.
I also had it declare that some function did not have a default value, and then explain what the function does and how it defaults to a specific value…
Fundamentally absurd failures that have seriously impacted my level of trust with Claude.