Posted by serjester 4 days ago
I think this piece is underestimating the difficulty involved here though. If only it was as easy as "just pick a single task and make the agent really good at that"!
The problem is that if your UI involves human beings typing or talking to you in a human language, there is an unbounded set of ways things could go wrong. You can't test against every possible variant of what they might say. Humans are bad at clearly expressing things, but even worse is the challenge of ensuring they have a concrete, accurate mental model of what the software can and cannot do.
It's almost like we really might benefit from using the advances in AI for stuff like speech recognition to build concrete interfaces with specific predefined vocabularies and a local-first UX. But stuff like that undermines a cloud-based service and a constantly changing interface and the opportunities for general spying and manufacturing "engagement" while people struggle to use the stuff you've made. And of course, producing actual specifications means that you would have to own bugs. Besides eliminating employees, much interest in AI is all about completely eliminating responsibility. As a user of ML-based monitoring products and such for years.. "intelligence" usually implies no real specifications, and no specifications implies no bugs, and no bugs implies rent-seeking behaviour without the burden of any actual responsibilities.
It's frustrating to see how often even technologists buy the story that "users don't want/need concrete specifications" or that "users aren't smart enough to deal with concrete interfaces". It's a trick.
An app? We don’t even need to put AI in it, turns out you can book flights without one.
Similarly, software eating the world was actually pretty much fine, but SaaS is/was a bit of a trap. And anyone who thought SaaS was bad should be terrified about the moats and platform lock-in that billion dollar models might mean, the enshittification that inevitably follows market dominance, etc.
Honestly we kinda need a new Stallman for the brave new world, someone who is relentlessly beating the drum on this stuff even if they come across as anticorporate and extreme. An extremist might get traction; a plea to preserve things as they are probably can't, and maybe shouldn't.
It's a shame if new interface = credible by default. Look at all the car manufacturers (well, some, probably not enough) finally conceding, after many years, that changing to touch interfaces "because new" was a terrible idea, when the right old tool for the job was simply better... and obviously so to end-users very quickly.
I'm not equating new = bad. I'm saying new = good is wrong. And based on your last sentence, you do think car manufacturers all switching over to all-touch controls was a problem. Almost everyone prefers buttons to touch screens; that's my point. The better, more popular option was rejected because of a false premise, or false belief.
More to the point though.. at the beginning at least, Stallman was a respected hacker and not just some random person pushing politics on a community he was barely involved with. It's gotta be that way I think, anyone who's not a respected AI/ML insider won't get far
You don’t have any more credibility than most other HN users… so just stating insinuations as if they were self evident doesn’t even make sense.
"Oh, there's one tiny feature that management is really really interested in, make the AI gently upsell the user on a higher tier of subscription if an opportunity presents itself."
I can't imagine we're anywhere even close to the kind of perfection required not to need something like this - if it's even possible. Humans use all kinds of review and audit processes precisely because perfection is rarely attainable, and that might be fundamental.
It is almost impossible to produce a useful result, as far as I've seen, unless one eliminates that mistake from the context window.
There are so many times where I get to a point where the conversation is finally flowing in the way that I want and I would love to "fork" into several directions from that one specific part of the conversation.
Instead I have to rely on a prompt that requests the LLM to compress the entire conversation into a non-prose format that attempts to be as semantically lossless as possible; this sadly never works as intended.
Certainly true, but coaching it past sometimes helps (not always).
- roll back to the point before the mistake.
- add instructions so as to avoid the same path: "Do not try X. We tried X; it does not work, as it leads to Y."
- add resources that could aid a misunderstanding (api documentation, library code)
- rerun the request (improve/reword with observed details or insights)
I feel like some of the agentic frameworks are already including some of these heuristics, but a helping hand still can work to your benefit
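A minimal sketch of that rollback-and-retry loop, assuming a hypothetical message-list chat API. The function names, the message format, and `fake_send` are all invented for illustration, not from any real framework:

```python
# Hypothetical chat-agent recovery loop: roll back to the last good
# checkpoint instead of arguing with the model inside a polluted context.
def retry_with_rollback(history, checkpoint, correction, extra_context, send):
    """history: list of {"role", "content"} messages.
    checkpoint: number of messages to keep (everything before the mistake).
    send(messages) -> assistant reply text (hypothetical LLM call)."""
    # 1) Roll back to the point before the mistake.
    clean = history[:checkpoint]
    # 2) Add instructions steering away from the failed path.
    clean.append({"role": "user", "content": correction})
    # 3) Add resources that prevent the misunderstanding (docs, library code).
    for doc in extra_context:
        clean.append({"role": "user", "content": doc})
    # 4) Rerun the request with the improved wording.
    reply = send(clean)
    clean.append({"role": "assistant", "content": reply})
    return clean

# Toy send() so the sketch runs end to end: echoes the last message.
def fake_send(messages):
    return "ACK: " + messages[-1]["content"][:20]

hist = [{"role": "user", "content": "build X"},
        {"role": "assistant", "content": "tried X via Y (wrong)"}]
new_hist = retry_with_rollback(
    hist, checkpoint=1,
    correction="Do not try Y; it leads to Z. Use W instead.",
    extra_context=["API doc excerpt for W"], send=fake_send)
```

The point of the checkpoint index is that the mistaken assistant turn never re-enters the context, which matches the observation above that the mistake must be eliminated, not argued with.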
But then again, I know how it could avoid the mistake, so I point that out, from that point onwards it seems fine (in that chat).
LLMs are supposed to save us from the toils of software engineering, but it looks like we're going to reinvent software engineering to make AI useful.
Problem: Programming languages are too hard.
Solution: AI!
Problem: AI is not reliable, it's hard to specify problems precisely so that it understands what I mean unambiguously.
Solution: Programming languages!
When smartphones first popped up, browsing the web on them was a pain. Now pretty much the whole web has phone versions that make it easier*.
*I recognize the folly of stating this on HN.
There's apps that open links in their embedded browser where ads aren't blocked. So I need to copy the link and open them in my real browser.
Well, cryptocurrency was supposed to save us from the inefficiences of the centralized banking system.
There's a lesson to be learned here, but alas our society's collective context window is less than five years.
It doesn't have to be this way. Even before the pandemic I remember some companies simply gave me access to an internal app to choose flights, where the only flights shown were those with the right date, right airport, and right price.
The only problem with most of the flights I book now is that they're with low cost airlines and packed with dark patterns designed to push upgrades.
Would an AI salesman be any better though? At least the website can't actively try to persuade me to upgrade.
An actually useful agent is something that is totally doable with technologies even from a decade ago, which you by necessity need to host yourself, with a sizeable amount of DIY and duct tape, since it won't be allowed to exist as a hosted product. The purveyor of goods and services cannot bargain it into putting useless junk into your shopping cart on impulse. You cannot really upsell it, all the ad impressions are lost on it, and you cannot phish it with ad buttons that look like the UI of your site — it goes in with the sole purpose of making your bookings/arrangements; it's a quick in-and-out. It is, by its very definition and design, very adversarial to how most companies with Internet presences run things.
And like, as a socially anxious millennial, no I don't particularly like phone calls. However I also recognize that setting my discomfort aside, a direct connection to a human being who can help reason out a problem I'm having is not something easily replaced with a chatbot or an AI assistant. It just isn't. Perfect example: called a place to make a reservation for myself, my wife and girlfriend (poly long story) and found the place didn't usually do reservations on the day in question, but the person did ask when we'd be there. As I was talking to a person, I could provide that information immediately, and say "if you don't take reservations don't worry, that's fine," but it was an off-busy hour so we got one anyway. How does an AI navigate that conversation more efficiently than me?
As a techie person I basically spend the entire day interacting with various software to perform various tasks, work related and otherwise. I cannot overstate: NONE of these interactions, not a single one, is improved one iota by turning it into a conversation, verbal or text-based, with my or someone else's computer. By definition it makes basic tasks take longer, every time, without fail.
We need to go back to a more innocent time when we could ask a select group of friends and their trusted chain of friends for recommendations. Not what social media is today.
I dislike driving through Texas, and so, most road trips involve McDonalds - the only time I eat the junk.
My car's inbuilt nav is 13 years out of date, so it knows major throughways but not, for instance, that the road I live on has its own interface to the "highway", and so on, up to restaurants. Phones are unreliable in a lot of the US, and at one point I had a spare phone with all of its storage dedicated to offline Google maps just so I wouldn't get stuck in the Rockies somewhere.
Microsoft used to sell trip planning software and those were the good old days.
Like, really what you're wanting is legitimate information not bound to the whims of advertisers and marketers (and again, to be clear, don't we fucking all), but I don't think an LLM is going to do that for you. If it does it now, and that's a load-bearing if, I have a strong feeling that's because this tech, like all tech, is in its infancy stage. It hasn't yet gotten enough attention from corporations and their slimy marketing divisions, but that's a temporary state of affairs, and has been for every past tech too. Like, OpenAI just closed another funding round and its valuation is now THREE HUNDRED BILLION. Do you REALLY think they, and by extension their competitors, are going to be thinking about editorial independence when existing established information institutions already can't?
Or, for that matter, solutions you can trust. Remember the pitch for Amazon Dash buttons, where you press it and it maybe-reorders a product for delivery, instantly and sight-unseen? What if the price changed? What if it's not exactly the same product anymore? Wait, did someone else already press it? Maybe I can get a better deal? etc.
Actually, that spurs a random thought: Perhaps some of these smart-speaker ordering pitches land differently if someone is in a socioeconomic class where they're already accustomed to such tasks being done competently by human office-assistants, nannies, etc. Their default expectation might be higher, and they won't need to invest time pinching pennies like the rest of us.
My parents were way older than boomers, and hers were boomers, so maybe that's it?
If I were rich enough to have some bot fly me somewhere, I'd have a real-life minion do it for me.
Unless these tools can be run locally independent of a service provider, we're just trading one boss for another.
Once performance of a local setup is on par with online ones or good enough, that'll be game over for them.
So true, it's not going to be "I use PolishDude20GPTBook because my family and friends are on it". It's going to be, "I use PolishDude20GPTBook because they have contracts with Gazeta.pl, Onet, TVN24, OLX and Allegro, so I can use it to get local news and find best-priced products in a convenient way, whereas I can't use TeMPOraLxity for any of that".
Contracts over APIs, again.
As long as the "think of my copyright / AI slop oneoneone" crowd wins. It must not.
You forgot the dotcom boom? :)
Existence of AI slop has nothing to do with whether the tech itself is exceeding or falling short of its hype. It exists because it's good enough for advertising, the cancer on modern society that metastasizes to every new medium and technology, defiling and destroying everything it touches.
Many of them already can be. Many more existing models will become local options if/when RAM prices decline.
But this won't necessarily prevent enshittification, as there's always a possibility of a new model being tasked with pushing adverts or propaganda. And perhaps existing models already have been — certainly some people talk as if it's so.
The same fate may come to AI, and that worries me. It won't matter whether you're using OpenAI models, Anthropic models, or locally run models, any more than it matters whether you use Firefox, Chrome or raw cURL - if the business gets to differentiate further between users and AI agents working as users, and especially if they get legal backing to do that, you can kiss all the benefits of LLMs goodbye. They won't be yours as end-user; they'll all accrue to capitalists, who in turn will lend slivers of it to you, for the price of a subscription.
Oh, you mean like everyone who shows up to the Cloudflare submissions pointing out how they've been blocklisted from about 50% of the Internet, without recourse, due to the audacity to not run Chrome? In that circumstance, it's actually worse(?) because to the best of my knowledge I cannot subscribe to Cloudflare Verified to not get the :fu:; I just have to hope the Eye of Sauron doesn't find me.
That reminds me, it's probably time for my semi-annual Google Takeout
It stems from the problem I described though - blocking you for not using Chrome is just "only illegitimate users don't use Chrome", which is the next step after "only illegitimate users would want to use our API endpoints without starting a business and signing a formal contract with us".
Not only that, we have to be careful about all the integrations being built around it. Thankfully the MCP standard is becoming mainstream (used by Anthropic, OpenAI and next could be Google) and it's an open standard, even if started by Anthropic so we won't have e.g. Anthropic specific integrations.
That's the root of the problem. That's precisely why computers are not the "bicycles for the minds" they were imagined to be.
It's not a conspiracy theory, either. Most of the tech industry makes money inserting themselves between you and your problem and trying to make sure you're stuck with them.
I can’t think of any technological reasons why every digital system can’t have an API (barring security concerns, as those would need to be case by case)
So instead, we put 100s of billions of dollars into statistical models hoping they could do it for us.
It’s kind of backwards.
If I run a flight aggregator that has a majority of flight bookings, I can start charging 'rents' by allowing featured/sponsored listings to be promoted above the 'best' result, leading to a prisoner's dilemma where airlines should pay up to their margins to keep market share.
If an AI company becomes the default application human interface, they can do the same thing. Pay OpenAI tribute or be ended as a going concern.
What I’m saying is that if there was a standard protocol for making travel plans over the internet, we wouldn’t need an AI agent to book a trip.
We could just create great user experiences that expose those APIs like we do for pretty much everything on the web.
I think this highlights how we still haven’t cracked intelligence. Many of these issues come from the model’s very limited ability to adapt on the fly.
If you think about it every little action we take is a micro learning opportunity. A small-scale scientific process of trying something and seeing the result. Current AI models can’t really do that.
I believe that's a famous Army Ranger expression: "the map is not the terrain" (I tried to find an attribution for it but it seems it comes in "the map is not the territory" flavors, too)
>> Yet too many AI projects consistently underestimate this, chasing flashy agent demos promising groundbreaking capabilities—until inevitable failures undermine their credibility.
This is the problem with the 'MCP for Foo' posts that have been popping up recently.
Adding a capability to your agent that the agent can't use just gives us exactly that:
> inevitable failures undermine their credibility
It should be relatively easy for everyone to agree that giving agents an unlimited set of arbitrary capabilities will just make them terrible at everything; and that promising that giving them these capabilities will make them better is:
A) false
B) undermining the credibility of agentic systems
C) undermining the credibility of the people making these promises
...I get it, it is hard to write good agent systems, but surely, a bunch of half-baked, function-calling wrappers that don't really work... like, it's not a good look right?
It's just vibe coding for agents.
I think it's quite reasonable to say, if you're building a system now, then:
> The key to navigating this tension is focus—choosing a small number of tasks to execute exceptionally well and relentlessly iterating upon them.
^ This seems like exceptionally good advice. If you can't make something that's actually good by iterating on it until it is good and it does work, then you're going to end up being a devin (ie. over promised, over hyped failure).
I literally sat in a meeting with one of our board members who used this exact example of how "AI can do everything now!" and it was REALLY hard not to laugh.
I'd love it if it could help with that, but I haven't figured it out with Google Flights yet. My dream is to tell an AI agent the above and let it figure out the best deal.
Check the official website, compare pricing with an aggregator, check other dates, check people's availability on cheap dates. Sometimes I only do the first step, if the official price is reasonable (I travel 1-2x a month, so I have an expectation of how much it should cost).
Don't get me started if I also consider which credit card to use for the points rewards.
For example, this person[0] could have simply booked a United flight from the United site for 15k points. Instead the person batch emailed Turkish Airlines booking offices, found the Thai office that was willing to make that booking but required bank transfers in Thai baht to pay taxes, made two more phone calls to Turkish Airlines to pay taxes with a credit card, and in the end only spent 7.5k points for the same trip on United.
This may be an extreme example, but it shows the amount of familiarity with the points system, the customer service phone tree and the actual rules to get cheap flights.
If AI can do all of that, it'd be useful. Otherwise I'll stick to manual booking.
[0]: https://frequentmiler.com/yes-you-can-still-book-united-flig...
https://www.bitsaboutmoney.com/archive/seeing-like-a-bank/
> As a sophisticated user of the banking system, a useful skill to have is understanding whether the ultimate solution to an issue facing you is probably available to Tier Two or probably only available to a professional earning six figures a year. You can then route your queries to the bank to get in front of the appropriate person with the minimal amount of effort expended on making this happen.
> You might think bank would hate this, and aggressively direct people who discover side channels to Use The 1-800 Number That Is What It Is For. For better or worse, the side channels are not an accident. They are extremely intentionally designed. Accessing them often requires performance of being a professional-managerial class member or otherwise knowing some financial industry shibboleths. This is not accidental; that greatly cuts down on “misuse” of the side channels by that guy.
This sounds interesting?
Just don't book a round trip, don't check a bag, don't do it too often. Also you're gambling that they don't cancel your flight and book you on a new one to the city you don't actually want to go to (that no longer connects via the hidden city). You can get half price tickets sometimes with this trick.
Persistent mispricings can only exist if the cost of exploitation removes the benefit or constrains the population.
With a really excellent human assistant who deeply understood my brain (at least the travel related parts of it), it was kind of nice. But even then there were times when I thought it would be easier and better to just do it myself. Maybe it's a failure of imagination, but I find it very hard to see the path from today's technology to an AI agent that I would trust enough to hand it off, and that would save enough time and hassle that I wouldn't prefer to just do it myself.
But if you wanna find the cheapest way to get to A, compare different retailers, check multiple people's availability, calculate the effects of credit cards, etc., it takes time. Aren't those things that could be automated with an agent that can find the cheapest flights, propose dates, check availability with multiple people via a messaging app, calculate which credit card to use, etc.?
When I'm picking out a flight I'm looking at, among other things:
* Is the itinerary aggravatingly early or late
* Is the layover aggravatingly short or long
* Is the layover in an airport that sucks
* Is the flight on a carrier that sucks
* What does it cost
If you asked me to encode ahead of time the relative value of each of these dimensions I'd never be able to do it. Heck, the relative value to me isn't even constant over time. But show me five options and I can easily select between them. A clear case where search is more convenient than some agent doing it for me.
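For what it's worth, the "encode it ahead of time" version would look something like this. Every weight and penalty flag below is invented, which is exactly the problem: nobody can state these numbers honestly, and they drift over time anyway:

```python
# A fixed utility function for ranking flights. The weights are made up,
# which is the point: the relative value of these dimensions is hard to
# state up front and isn't even constant over time.
WEIGHTS = {"early_or_late": -3.0, "layover_pain": -2.0,
           "bad_airport": -1.5, "bad_carrier": -2.5, "price_per_100": -1.0}

def score(flight):
    # Higher (less negative) score = more attractive itinerary.
    return (WEIGHTS["early_or_late"] * flight["early_or_late"]
            + WEIGHTS["layover_pain"] * flight["layover_pain"]
            + WEIGHTS["bad_airport"] * flight["bad_airport"]
            + WEIGHTS["bad_carrier"] * flight["bad_carrier"]
            + WEIGHTS["price_per_100"] * flight["price"] / 100)

flights = [
    {"id": "A", "early_or_late": 1, "layover_pain": 0, "bad_airport": 0,
     "bad_carrier": 0, "price": 250},
    {"id": "B", "early_or_late": 0, "layover_pain": 1, "bad_airport": 1,
     "bad_carrier": 0, "price": 180},
]
best = max(flights, key=score)
```

Tweak any one weight and the ranking flips, which is why "show me five options and let me pick" beats handing the whole decision to an agent running a fixed formula.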
It's the same problem with Alexa. I don't trust it to blindly reorder me basic stuff when I have to sift through so many bad product listings on the Amazon marketplace.
An LLM could organise flights with a lower error rate, however, when it goes wrong what is the recourse? I imagine it's anger and a self-promise never to use AI for this again.
*If you're saying that the AI just supplies suggestions then maybe it's useful. Though wouldn't people still be double checking everything anyway? Not sure how much effort this actually saves?
In one chapter he describes his frustration with GPS based navigation apps. I thought it was similar to what you describe.
> If I am commuting home, I may prefer a slower route that avoids traffic jams. (Humans, unlike GPS devices, would rather keep moving slowly than get stuck in stop-start traffic.) GPS devices also have no notion of trade-offs, in particular relating to optimising ‘average, expected journey time’ and minimising ‘variance’ – the difference between the best and the worst journey time for a given route.
For instance, whenever I drive to the airport, I often ignore my GPS. This is because what I need when I’m catching a flight is not the fastest average journey, but the one with the lowest variance in journey time – the one with the ‘least-bad worst-case scenario’. The satnav always recommends that I travel there by motorway, whereas I mostly use the back roads.
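That trade-off is just optimising for the worst case instead of the mean. A toy sketch with invented journey times:

```python
# Two routes with sampled journey times in minutes (invented numbers).
# The motorway is faster on average but has a fat tail; the back roads
# are slower but predictable.
routes = {
    "motorway":   [35, 36, 38, 40, 95],   # an occasional jam blows it up
    "back_roads": [50, 52, 53, 55, 58],
}

def by_mean(times):
    # Average expected journey time: what a satnav optimises.
    return sum(times) / len(times)

def by_worst_case(times):
    # Least-bad worst case: what you want when missing a flight is costly.
    return max(times)

daily_commute = min(routes, key=lambda r: by_mean(routes[r]))
airport_run   = min(routes, key=lambda r: by_worst_case(routes[r]))
```

Same data, two different objectives, two different answers: the motorway wins the commute, the back roads win the airport run.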
Because there is no "correct" flight. Your preference changes as you discover information about what's available at a given time and price.
The helpful AI assistant would present you with options, you'd choose what you prefer, it would refine the options, and so on, until you make your final selection. There would be no communication lag as there would be with a human assistant. That sounds very doable to me.
Flights are a good example but I often cite Uber as a good one too. Nobody wants to tell their assistant to book them an Uber - the UX/UI is so streamlined and easy, it's almost always easy enough to just do it yourself (or if you are too important for that, you probably have a private driver already). Basically anything you can do with an iPhone and the top 20 apps is in this category. You are literally competing against hundreds of engineers/product designers who had no other goal than to build the best possible experience for accomplishing X. Even if LLMs would have been helpful a priori - they aren't after every edge case has already been enumerated and planned for.
I think part of what's been happening here is that the hubris of the AI startups is really showing through.
People working on these startups are by definition much more likely than average to have bought the AI hype. And what's the AI hype? That AI will replace humans at somewhere between "a lot" and "all" tasks.
Given that we're filtering for people who believe that, it's unsurprising that they consciously or unconsciously devalue all the human effort that went into the designs of the apps they're looking to replace and think that an LLM could do better.
I think it is somewhat reductive to assign this "hubris" to "AI startups". I would posit that this hubris is more akin to the superiority we feel as human beings.
I have heard people say several times that they "treat AI like a Jr. employee". I think that within the context of a project, AI should be treated based on its level of contribution. If AI is the expert, I am not going to approach it as if I am an SME who knows exactly what to ask. I am going to focus on the things I know best, and ask questions around those to discover and learn the best approach. Obviously there is nuance here that is outside the scope of this discussion, but these two fundamentally different approaches have yielded materially different outcomes in my experience.
Absolutely not. When giving tasks to an AI, we supply them with context, examples of what to do, examples of what not to do, and we clarify their role and job. We stick with them as they work and direct them accordingly when something goes wrong.
I've no idea what would happen if we treated a junior developer like that.
I concur, and would like to add that they are also restrained by the limitations of existing "systems" and our implicit and explicit expectations of said systems. I am currently attempting to mitigate the harm done by this restriction by focusing on, and starting with, a first-principles analysis of the problem being solved before starting the work. For example, let's take a well-established and well-documented system like the SSA.
When attempting to develop, refactor, extend etc... such a system; what is the proper thought process. As I see it, there are two paths:
Path 1:
a) Breakdown the existing workflows
b) Identify key performance indicators (KPIs) that align with your business goals
c) Collect and analyze data related to those KPIs using BPM tools
d) Find the most expensive worst performing workflows
e) Automate them E2E w/ interface contracts on either side
This approach locks you into the existing restrictions of the system, workflows, implementation, etc.
Path 2:
a) Analyze the system to understand its goal in terms of first principles, e.g.: What is the mission of the SSA? To move money based on conditional logic.
b) What systems / data structures are closest to this function, and does the legacy system reflect this at its core? E.g.: the SSA should just be a ledger, IMO.
c) If yes, go to "Path 1"; if no, go to (d)
d) Identify the core function of the system, the critical path (core workflow) and all required parties
e) Make an MVP which only does the bare minimum
By following Path 2 and starting off with an AI analysis of the actual problem, not the problem as it exists as a solution within the context of an existing system, it is my opinion that the previous restrictions can be avoided.
Note: Obviously this is a gross oversimplification of the project management process, and there are usually external factors that weigh in and decide which path is possible for a given initiative; my goal here was just to highlight a specific deviation from my normal process that has yielded benefits so far in my own personal experience.
Plug: We just posted a demo of our agent doing sophisticated reasoning over a huge dataset (JFK assassination files, 80,000 PDF pages): https://x.com/peterjliu/status/1906711224261464320
Even on small amounts of files, I think there's quite a palpable difference in reliability/accuracy vs the big AI players.
OMFG thank you for saying this. As a core contributor to RA.Aid, optimizing it for SWE-bench seems like it would actively go against perf on real-world tasks. RA.Aid came about in the first place as a pragmatic programming tool (I created it while making another software startup, Fictie.) It works well because it was literally made and tested by making other software, and these days it mostly creates its own code.
Do you have any tips or suggestions on how to do more formalized evals, but on tasks that resemble real world tasks?
And before going to crowd-workers (maybe you can skip them entirely) try LLMs.
What I'm doing right now is this:
1) I have X problem to solve using the coding agent.
2) I ask the agent to do X
3) I use my own brain: did the agent do it correctly?
If the agent did not do it correctly, I then ask: should the agent have been able to solve this? If so, I try to improve the agent so it's able to do that.
The hardest part about automating this is #3 above; each evaluation is one-off, and it would be hard to even formalize the evaluation.
SWE-bench, for example, uses unit tests for this, and the agent is blind to them: it has to make a red test (which it has never seen) go green.
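A sketch of that hidden-test setup: the agent sees only the problem statement, and grading runs a unit test it has never seen. `toy_agent` and the tasks here are stand-ins, not a real benchmark:

```python
# Minimal hidden-test eval harness, SWE-bench style: the agent gets the
# prompt only; grading runs a unit test the agent never saw.
def evaluate(agent, tasks):
    passed = 0
    for task in tasks:
        solution = agent(task["prompt"])   # agent sees the prompt only
        try:
            task["hidden_test"](solution)  # red test must go green
            passed += 1
        except AssertionError:
            pass
    return passed / len(tasks)

# Toy stand-ins so the harness runs end to end.
def toy_agent(prompt):
    # Pretend the agent only knows how to write a doubling function.
    return (lambda x: x * 2) if "double" in prompt else (lambda x: x)

def check_double(f):
    assert f(3) == 6

def check_triple(f):
    assert f(3) == 9

tasks = [
    {"prompt": "write a function that doubles a number",
     "hidden_test": check_double},
    {"prompt": "write a function that triples a number",
     "hidden_test": check_triple},
]
eval_score = evaluate(toy_agent, tasks)  # toy agent solves 1 of 2 tasks
```

The hard part, as noted above, is exactly the `hidden_test` column: for real one-off tasks, writing that check is as much work as judging the result by hand.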
There’s no way I could do what some of these “vibe coders” are doing where they allow AI to write code for them that they don’t even understand.
Basically, what's worse? "Vibes" code that no one understands or a cascade of 20 spreadsheets that no one understands? At least with the "vibes" code you can stick it in git and have some semblance of sane revision control and change tracking.
That sort of makes sense, but then again... if you run some analysis code and it spits out a few plots, how do you know what you're looking at is correct if you have no idea what the code is doing?
Does it reaffirm the biases of the one who signs my paychecks? If so, then the code is correct.
Considering the hallucinations we've all seen I don't know how they can be comfortable using AI generated data analysis to drive the future direction of the business.
> Basically, what's worse? "Vibes" code that no one understands or a cascade of 20 spreadsheets that no one understands?
Correction: it's a "cascade of 20 spreadsheets" that one person understood/understands.
Write only code still needs to work, and someone at some point needs to understand it well enough to know that it works.
I think this is a great use case for AI, but the analyst still needs to understand what the code that is output does. There are a lot of ways to transform data that result in inaccurate or misleading results.
From the boasting I've seen, Vibe coders are also using AI to slop out their tests as well.
You can for spreadsheets too.
There's always been a group of beginners that throws stuff together without fully understanding what it does. In the past, this would be copy n' paste from Stackoverflow. Now, that process is simply more automated.
A lot of these vibe coders just have a much lower bar for reliability than you.
For example, while I feel the need to understand the code I wrote using pytorch, I don't generally feel the need to totally grok how pytorch works.
We just need to be better about making it clear which code is that way and which is not.
That's my experience working with a largeish mature codebase (all on non-prod code) where you can't get far if you can't use various internal libraries correctly. With standalone (or small greenfield) projects, where results can lean more on public info from pre-training and there's not as much project specific info to pull in, you might see different outcomes.
Maybe the tech and surrounding practice will change over time, but in my short experience it's mostly been about trying to just get to 'acceptable' for this kind of task.
When we tried the 'full agent' approach (letting it roam freely through our codebase), we ended up with some impressive demos but constant production incidents. We've since pivoted to more constrained workflows with human checkpoints, and while less flashy, user satisfaction has gone way up.
The Cursor wipeout incident is a perfect example. It's not about blaming users who don't understand git - it's about tools that should know better. When I hand my code to another developer, they understand the implied contract of 'don't delete all my shit without asking.' Why should AI get a pass?
Reliable > clever. It's the difference between a senior engineer who delivers consistently and a junior who occasionally writes brilliant code but breaks the build every other week.
> Does it terrify anyone else that there is an entire cohort of new engineers who are getting into programming because of AI, but missing these absolute basic bare necessities?
> > Terrify? No, it's reassuring that I might still have a place in the world.
[0] https://www.reddit.com/r/cursor/comments/1inoryp/comment/mdo...
Why would you ask the community a question like "how to source control" when you've been working with (presumably) a programming genius LLM that could provide the most personally tailored path for baby's first git experience? Even if you don't know that "git" is a thing, you could ask questions as if you were a golden retriever and the model would still inevitably recommend git in the first turn of conversation.
Is it really the case that a person who has the ability to use a compiler, IDE, LLM, web browser, reddit, etc., somehow simultaneously lacks the ability to frame basic-ass questions about the very mission they set out on? If stuff like this is not manufactured, then we should all walk away feeling pretty fantastic about our future job prospects.
People think "this is hard, I'll re-invent it in an easier way" and end up with a half-assed version of the tooling we've honed over the decades.
This is a win in the long run, because occasionally the solution people labor over really is a better way.
When the goal is "re-invent programming to make it easier," all you get is a hodgepodge of half-assed solutions like GP said. Enhancing traditional focused workflows seems a lot more interesting to me than "coding assistant".
Hopefully AI tooling will continue to evolve. I don't see how you get around the reliability issues with this iteration of AI (GPT+RLHF+RAG, etc). Transfer learning is still abysmal.
https://www.reddit.com/r/cursor/comments/1inoryp/comment/mdr...
> I'm not a dev or engineers at all (just a geek working in Finance)
This fits my experience of teaching very intelligent students how to code; if you're an experienced programmer, you simply cannot fathom the kinds of assumptions beginners will make due to gaps in foundational knowledge they have yet to acquire. I remember having to tell students to be mindful when searching Stack Overflow for help, because something as simple as an error from Requests (e.g. while doing web scraping) could lead them down a rabbit hole of "solutions," such as completely uninstalling their Python in favor of a different or older version.
Often when using an AI agent, I think to myself that a web search gets me what I need more reliably and just as quickly. Maybe AI has to learn to crawl before it learns to walk, but each agent I use leaves me without confidence that it will ever be useful, and I genuinely wonder if they've ever been tested before being published...
Nowhere in that story above is there a customer or factory worker feeding in open-ended inputs. The factory is precise, it takes inputs and produces outputs. The variability is restricted to variability of inputs and the reliability of the factory kit.
Much business software is analogous to the factory. You have human workers who ultimately operate the business. And software is built to automate those tasks precisely.
AI struggles because engineers are trying to build factories through incantation - if they just say the right series of magic spells, the LLM will produce a factory.
And often it can. It’s just a shitty factory that does simple things, often inefficiently with unforeseen edge cases.
At the moment, skilled factory builders (software engineers) are better at holistically understanding the needs of the business and building precise, maintainable, specific factories.
The factory builders will use AI as a tool to help build better factories. Trying to get the AI to build the whole factory soup-to-nuts won’t work.