Posted by sandwichsphinx 5 days ago
The video shows a private fork of a public repository. The bug is real, but it was resolved in February 2023, and it doesn't seem like the solution was automated [1]
The bug has a stack trace attached, with a big arrow pointing to line 223 of a backend_compat.py file. A quick glance at this stack trace and you already know what happened, why, and how to fix it, but…
not for the agent. It seems to analyze the repository in multiple steps and tries to locate the class. Why did they even release this video?
so, they organize hackathons where devs build a hypothetical agentic framework nobody will dare use, so mgmt can claim: look what we have done to be agentic.
you should ask: would you dogfood your agent? and the answer is no way. these are meant purely for marketing purposes, as they don't meet an end-user need.
just goes to show, it is all a big song-and-dance. much ado about nothing.
The term "agent" implies you can give the AI full access to your repos and fire the software engineers you're grudgingly paying six figures to.
The latter is much more valuable to executives not wanting to pay the software people, who demand higher salaries than virtually everyone else in the organization.
i am saying the thing is snake oil: a solution looking for a problem.
Long term value, I agree.
Fundraising, hard disagree.
This way it would at least look like it might work.
we are at a phase where the early adopters have seen the writing on the wall, i.e. that LLMs are useful for a limited set of use cases. but there are lots of late adopters who are still awestruck and not yet disillusioned.
I made a few minor edits, but I think we all know this is coming. It calls itself "for developers" for now, but really it's "instead of developers", and at some point the mask will come off.
I have very little fear for my own job no matter how good models get. What happens is that software gets cheaper and more of it is bought. It’s what happened in every industry with automation.
Those who can’t operate a machine though (in this case an AI) should maybe worry. But chances are their jobs weren’t very secure to begin with.
> LLMs are a tool, nothing more, they don't magically imbue the user with competency.
Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.
> Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.
I disagree. In their current incarnation, LLMs require a human subject matter expert to determine if the output is valid. In the project manager team lead example, the LLM won't tell you if the database is sized correctly, or if you even need a database.
This is 100% the play.
Right now you can hire 5 devs in India to do the job of 1 competent US dev and save 30-40% on total cost.
Add in AI and it will only take 3 devs in India to do the same work, and you can now save 50-60% on total cost.
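A quick back-of-the-envelope check of those numbers; the salary figures below are hypothetical placeholders chosen to match the claimed ratios, not real market data:

```python
# Hypothetical salaries: chosen so that 5 offshore devs cost ~35% less
# than 1 US dev, as the comment above claims. Not real market data.
us_dev_cost = 150_000
offshore_dev_cost = 19_500

team_before = 5 * offshore_dev_cost   # 97,500: 5 devs, no AI
team_after = 3 * offshore_dev_cost    # 58,500: 3 devs with AI assistance

savings_before = 1 - team_before / us_dev_cost   # falls in the 30-40% band
savings_after = 1 - team_after / us_dev_cost     # roughly the 50-60% band
print(f"{savings_before:.0%}, {savings_after:.0%}")
```

Under these placeholder figures the claimed bands are at least internally consistent: dropping from 5 devs to 3 at the same per-head cost moves the savings from about 35% to about 61%.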
Although I also wonder about the development of new languages that may be optimized for transformers, as it seems clumsy and wasteful to have transformers juggle all the tokens needed to make code readable by humans. It would be really interesting to have a model that outputs code that functions incredibly well but is indecipherable to humans.
I don't think junior devs are going to benefit; if anything, the whole role of 'junior' has been made obsolete. The rote, repetitive work a junior would traditionally do can now be delegated wholesale to an LLM.
I figure productivity is going to increase a lot. We'll need fewer developers as a result. The duties associated with developers are going to morph and become more solutions/architecture oriented.
Let the AI write all the code and programmers will do the fixes.
They're not even in the top half of the leaderboard; their score is almost half that of the first-place agent.
What a weird sentence. Mx. Puri doesn't argue anything; this is just an unfounded claim. So far it just looks like snake oil to be sold to other companies.
This would actually be a good business strategy: Sell software that diminishes productivity to your competition and watch them disintegrate.
All I got instead are lame tools for developers.
Or is this the next iteration of static analysis?