Agent Skills - Hacker News

Posted by BOOSTERHIDROGEN 9 hours ago

227 points | 99 comments

wg0 3 hours ago|

Snake oil. Good to read for sure. Seems all plausible too. But snake oil nevertheless.

Here's why: The slot machine can drop any hard requirement that you specifically in your AGENTS.md, memory.md or your dozens of skill markdowns. Pretty much guaranteed.

These harnesses approaches pretend as if LLMs are strict and perfect rule followers and the only problem is not being able to specify enough rules clearly enough. That's fundamental cognitive lapse in how LLMs operate.

That leaves only one option not reliable but more reliable nevertheless: Human review and oversight. Possibly two of them one after the other.

Everything else is snake oil but at that point, you also realize that promised productivity gains are also snake oil because reading code and building a mental model is way harder than having a mental model and writing it into code.

vidarh 1 hour ago||

Humans also drop any hard requirements you specify regularly, and similarly require review. Nevertheless we manage to increase reliability of human output through processes and reviews, and most of the methods we use for harnesses are taken from experience with how to reduce reliability issues in humans, who are notoriously difficult to ensure delivers reliably.

kaashif 36 minutes ago||

The primary way to increase reliability is to automate. Instead of humans producing some output manually, humans producing machines which produce that output.

I've seen a disturbing trend where a process that could've been a script or a requirement that could've been enforced deterministically is in fact "automated" through a set of instructions for an LLM.

cortesoft 2 hours ago|||

Everything you say is all possible, and in theory I agree with you.

However, I have been using spec-kit (which is basically this style of AI usage) for the last few months and it has been AMAZING in practice. I am building really great things and have not run into any of the issues you are talking about as hypotheticals. Could they eventually happen? Sure, maybe. I am still cautious.

But at some point once you have personally used it in practice for long enough, I can't just dismiss it as snake oil. I have been a computer programmer for over 30 years, and I feel like I have a good read on what works and what doesn't in practice.

wg0 2 hours ago||

We can build all the scaffolding around but I assure you that the LLMs aren't perfect rule following machines is the fundamental problem here and that would remain.

Give it a few more months and I'm sure you'll see some of what I see if not all.

I'm saying all the above having all sorts of systems tried and tested with AI leading me to say what I said.

cortesoft 2 hours ago|||

I have been doing this for 6 months or so now, and I am not sure that even if you have a lot more experience than me that it would make your assessment more accurate, since that just means you have more experience with prior generations of the models. What I have experienced is that the AI has been getting better and better, and is making fewer and fewer mistakes.

Now, part of that is my advancements as well, as I learn how to specify my instructions to the AI and how to see in advance where the AI might have issues, but the advancements are also happening in the models themselves. They are just getting better, and rapidly.

The combination of getting better at steering the AI along with the AI itself getting better is leading me to the opposite conclusion you have. I have production systems that I wrote using spec-kit, that have been running in production for months, and have been doing spectacularly. I have been able to consistently add the new features that I need to, without losing any cohesion or adherence to the principals i have defined. Now, are there mistakes? Of course, but nothing that can't be caught and fixed, and not at a higher rate than traditional programming.

Quarrel 1 hour ago||||

> LLMs aren't perfect rule following machines is the fundamental problem here

I kind of get what you're saying, but let us not pretend that SW engineers are perfect rule followers either.

Having a framework to work within, whether you are an LLM or a human, can be helpful.

saidnooneever 1 hour ago|||

i think it depend on your goals and also your preference / expectation how your experience with LLMs is. i dont mind if they hallucinate. even if i have mental model of code i wont write it myself perfectly either.

the only downside i see is getting out of practice, which is why for my passion projects i dont use it. work is just work and pressing 1 or 2 and having 'good enough' can be a fine way to get through the day. (lucky me i dont write production code ;D... goals...)

kajman 29 minutes ago|||

I hope the only reason people are pretending these markdown suggestions are a "workflow" is fear that a more structured approach will be obsolete by the time it's polished. I can't imagine the pace of innovation with the underlying models will stay like this forever.

I hope to see harnesses that will demand instead of ask. Kill an agent that was asked to be in plan mode but did not play the prescribed planning game. Even if it's not perfect, it'd have to better than the current regime when combined with a human in the loop.

raincole 25 minutes ago|||

Don't let the perfect be the enemy of the good. Of course we know the AGENTS.md and skills aren't 100% effective. But no, it doesn't mean that they're 0% effective.

chaostheory 35 minutes ago||

I can see why this would seem to be “snake oil” logically. However, this approach does work in reality. Your comment just shows that you seem inexperienced with using generative AI.

stellalo 2 hours ago||

> A skill is a markdown file with frontmatter that gets injected into the agent’s context when the situation calls for it.

When the LLM decides that the situation calls for it

> It is a workflow: a sequence of steps the agent follows, with checkpoints that produce evidence, ending in a defined exit criterion.

A sequence of steps the LLM can decide to follow

ai_fry_ur_brain 5 hours ago||

Cant wait for everyone to realize they've wasted a year + messing with agents and experiencing a feeling of psuedo productivity.

cortesoft 2 hours ago||

I can understand skepticism to a degree, and even fundamentally believing that AI is bad for all sorts of reasons, but I am becoming more and more perplexed at the certainty behind statements like this one. How are you so certain that AI development is this doomed? It just hasn't matched my experience at all, and I wonder what your experience is that has driven you to this level of certainty about the certain doom of AI coding?

Is it just a philosophical belief that AI is morally bad? Or have you actually used AI to build things and feel confident that you have explored the space enough to come to such a strong conclusion?

I have been writing code every day for over 30 years, and have been doing it professionally for over 20. I have seen fads come and go, and I have seen real developments that have changed the way I do what I do numerous times. The more experience and the more projects I create with AI, the more certain I am that this is a lasting and fundamental change to how we produce software, and how we use computers generally. I have seen AI get better, and I have seen myself get more proficient at using it to get real work done, work that has already been tested with real world, production, workloads.

You can hate that it is happening, and hate the way working with AI feels, but that doesn't mean it is not providing real value for people and doing real work.

tokioyoyo 2 hours ago|||

I’m a bit curious with these takes. Arguing in good faith - is the general assumption that people who use AI/agents/harnesses don’t ship features? We’ve been all in Claude Code since ~Septemberish, and have been able to successfully track the boost. Like the features that we ship that get used in production. Both from infrastructure side, and business logic implementations. Frontend and backend.

I don’t think people are wasting too much time. Although, I do agree most of these posts are just bs, including this one. But AI-development has been a thing across a lot of companies in the world.

raincole 21 minutes ago|||

You're replying to an account specifically created to post inflammable AI takes (likely a bot anyway). So your attempt

> Arguing in good faith

will be futile, unfortunately.

bot403 2 hours ago||||

Ignore the people who haven't found out how to use ai yet or don't want to.

AI is a powerful tool. Depending on what I need I use chatgpt, in-ide agents, or a platform like Devin.ai.

I use it when it helps me advance my goals. I don't when it doesn't. Sometimes it misses the mark and I scale back and have it do a specific piece and I'll do the rest.

Sometimes I use it to analyze the code base in seconds vs minutes. Sometimes I use it to pinpoint a bug fast.

Ive solved customer issues in seconds and minutes with it vs hours.

I worked on a banking app with deeply domain specific data issues. AI was not very helpful on that team. My current work on consumer web apps mean my problems are more mundane and AI is a big accelerant.

Being and engineer means solving the problems with the right tools with the right tradeoffs as well. It's why I use an idea vs notepad, I use chatgpt for one-off scripts and "chat", and i use agentic workflows for big, repetitive, or "boring" low-stakes tasks.

swyx 2 hours ago||||

> have been able to successfully track the boost.

lets get nitty gritty on this - can you say how you did this? because a lot of people think this is an unsolved problem

vidarh 1 hour ago||

Not the same person, but it really depends on projects. E.g. I have some projects that involve working to large specification sets where we can measure rate of delivery against the spec. If your spec is fuzzy and incomplete, then it gets hard, but then you have little insight into human productivity for those projects either.

_sharp 4 hours ago|||

Right, just like all the productivity lost when people stopped using paper ledgers to mess around with these so-called 'databases'

vidarh 1 hour ago|||

I work on projects where we measure the output. There's nothing "pseudo" about it.

zbentley 1 hour ago||

Tell me, what do you measure? Changes shipped? Lines of code? Customer satisfaction? Defect rate? MTTR? New engineer onboarding time/TTFC? Security/compliance audit turnaround time? Uptime? Employee retention? Rollback/forward-fix rates? Linter errors? Test coverage? Meaningful test coverage?

vidarh 1 hour ago||

Depends on the clients maturity, but some places all of the above.

c0rruptbytes 4 hours ago|||

i treat it like Minecraft automation - it's just for funsies and to pass the time haha

I don't think agentic workflows are there yet, but implementing skills to manually call and use while working side by side with an AI is definitely nice - our company is focused a lot on sandboxing right now and having safe skills

I don't think we've gotten feature development well yet, but the review skills + grafana skills they wrote have been pretty solid

jeremie_strand 1 hour ago||

[dead]

0000000000100 4 hours ago|||

Trick is to not burn too much time worrying about the perfect skills and this and that. See a lot of people filling skills with LLM junk, or overdoing rules that start confusing the LLM. Just try Vanilla, see something you don't like? Then you make a skill and funnel the LLM to use it for the style of task it's working on. E.g. database work is a mixed bag with LLMs, they tend to do work in totally different styles if you leave them unconstrained.

Agents are unbelievably useful at helping takeover and refactor messy codebases though. I just started taking over this monstrous nightmare of a codebase, truly ancient code the bulk of it written over 10+ years ago in PHP. With the use of Claude / Codex I was able to port over the vast majority of the existing legacy storefront and laid the groundwork for centralizing the 10-20k LOC mega-controller logic over to reusable repo/service patterns.

Just shit that would've taking years previously, is achievable in under a month.

BOOSTERHIDROGEN 2 hours ago||

This.

Everything needs an element of human touch, I would somehow only run vanilla things. But if, let’s say, I’m creating backup scripts, I meticulously outline the plan.

pantheragmb 4 hours ago|||

I couldn't agree more, just because I know I already wasted months and pulled the plug :D

wahnfrieden 4 hours ago|||

You haven’t made money from their use yet?

slopinthebag 3 hours ago|||

They will lie to themselves and deny it.

wg0 2 hours ago|||

This will be another Microservices moment in our industry.

nothinkjustai 5 hours ago||

You’ll get downvoted for this hearsay!

footy 4 hours ago||

I think you mean heresy. But maybe I don't get the reference you're making when you say hearsay

bot403 2 hours ago|||

I'm wondering if there are anti-ai bots trolling the boards. Look at all the usernames of the negative AI posts.

Or maybe the only people left opposing AI are so hardcore against it they form their identity (username) around it

nothinkjustai 1 hour ago||

ok bot403

IncRnd 3 hours ago|||

Hearsay is a rumor or something that can't be verified.

dmix 4 hours ago||

I've tried these larger agent skillsets in the past and felt it was a waste of time because it was just doing too much. Just like vim it's often better to pick and choose from the community instead of installing skills like they are an IDE. Skills are way too personal because every dev and dev team is different. So better to treat these as a reference for your own config rather than bulk install someone else's config.

thatmf 4 hours ago||

Why are people so excited to put themselves out of a job?

Not that these or any "skills" will do that, but just- in principle. This is like alienation from labor at scale.

dewey 3 hours ago||

Because usually the people who lose their jobs are people who do not adapt to the market.

Right now it's not clear in which direction everything is involving and that's why people experiment with handing all their data to random agents, figuring out how to store and access context, re-use prompts and other attempts to harness this tech. Most of these will maybe be useless in a year as they might be deeply integrated into the next wave of models but staying on top of the development has always been part of the fun of working in this field.

kiba 3 hours ago||

People are building bots to do the most legible thing possible which is feature in X amount of time. But it doesn't matter if the bottleneck is human thinking time required to output quality code rather than X amount of code written.

dawnerd 1 hour ago|||

It's likely the people that were not good developers that suddenly got accelerated "to the top" that seem the most for it. All of the good devs I know have been a bit more cautious on the uptake.

onion2k 1 hour ago||

I think it's more subtle than that. There are a lot of measures of what a 'good developer' is, and one of them is 'shipping things'. AI is specifically accelerating that part of the industry - it's much easier to ship code faster now. If you're in a domain that doesn't need quality (easy horizontal scaling, bugs rarely have a critical impact, customers are relatively loyal) then AI is proving that shipping features is more important than code quality.

If you're in a part of the software industry that needs well-optimized and bug-free code then it's less useful. The problem for devs is that those parts of the industry are much smaller.

hibikir 4 hours ago|||

Because we've been automating large parts of our former jobs for decades. Otherwise we'd all be trying to build things in the least efficient way possible to maximize how long the job takes, which IMO isn't a great idea.

Humans have been minimizing how much work is needed to get a certain level of output for as long as we can track. It's civilization. Should we go back to farming by hand with hoes, to maximize labor used? Go back to streetlights that are individually lit? The society that falls behind on automation becomes poorer, and eventually just dies, as even the people born there tend to choose to leave to higher productivity places. It happened to eastern europe, it happens to the Amish. To any poor society which gets emigration. Doing more with less has always been exciting.

clapthewind 4 hours ago|||

Some people are playing the global optimization game; a world where anyone can have any (production grade) software they want.

cortesoft 2 hours ago|||

I don't understand this thinking as a computer programmer. My whole life has been about getting a computer to do work so humans don't have to anymore. Every single piece of software written is supposed to take away work from someone.

Do you feel this way about every automation you create? I do know some old school sys admins who felt this way about a lot of infrastructure automation advancements, and didn't like that we were creating scripts and systems to do the work that used to be done by hand. My team created an automated patching system at a job that would automatically run patching across our 30,000 servers, taking systems in and out of production autonomously, allowing the entire process to be hands free. We used to have a team whose full time job was running that process manually. Did we take their jobs by automating it?

Sure, in a sense. But there was other work that needed to be done, and now they could do it.

The whole reason I like programming and computers and technology is precisely because it does things for us so we don't have to do it. My utopia is robots doing all the hard work so humans can do whatever we want. AI is bringing us one step closer to that, and I would rather focus on trying to figure out how we can make sure the whole world can benefit from robots taking our jobs (and not just the rich owners), rather than focus on trying to make sure we leave enough work for humans to stay busy doing shit they don't actually want to do.

cuteboy19 3 hours ago|||

people are now being encouraged to use ai notetaking features under the guise of productivity.

a worker is just the sum total of all work related context. to collate, verify and organize this context is just asking to be replaced.

yieldcrv 2 hours ago||

Month 30 of software engineers not existing in 6 months

CharlesW 7 hours ago||

From an SEO/LLMO perspective, the discoverability of these skills will be difficult without a rename: https://agentskills.io/

If Addy reads this, how do you pitch this vs. Superpowers? https://github.com/obra/superpowers

consumer451 7 hours ago||

I would love to know how many people are actually using superpowers.

I showed up on the agentic dev scene prior to superpowers, and I am getting concerned that >50% of my self-rolled processes are now covered by superpowers.

I no longer trust gh stars, can anyone chime in? Is superpowers now truly adopted?

If it is truly valuable, why hasn't Boris integrated the concepts yet?

supermdguy 3 hours ago|||

I've used it off and on over the last month or so. For more complicated tasks (30+ minutes) it works well, and seems to replace a lot of prompting that I'd normally need to do (e.g. asking questions about requirements, creating specs and implementation plans, staying on task). For simple tasks, it tries to do too much and gets in the way.

marcus_holmes 6 hours ago||||

I adopted superpowers, but then adapted it. I've changed some things, added some things. I suspect that my set of agent skills is probably overlapping with OP's by quite a lot now.

I also found that I have different skills for different tasks; at work security is a huge concern and I over-emphasise security in the skills. At play I'm less bothered about security and so the skills I've written to help me build stupid one-shot exploratory websites are less about security and more about refactoring and exploring concepts.

RideOnTime22 5 hours ago||||

It's just the new thing.

People were hyping up Oh My Opencode. When they realized it didn't lead to any significant gains in performance they hopped on the next thing.

And when the same thing happens to Superpowers it'll be something else they cling on because "this time it's different"

nullstyle 7 hours ago|||

I just removed superpowers from my own setup. In my opinion, given the quality of the planning modes in both claude code and codex, superpowers was really just slowing things down and burning more tokens than vanilla.

consumer451 7 hours ago|||

Thank you for the data point.

To give back as much as I can, I use the two built-in CC review processes when appropriate. But, those only do "is this PR good code?"

Far too late did I finally roll my own custom review skill that tests: "does this PR accomplish what the specs required?"

If I could ask for one more vanilla CC skill, it might be that. However, maybe rolling your own repo-aware skill via prompt is better?

ramoz 3 hours ago||||

It never worked well for me. The only thing I really needed outside of the harnesses was a better plan review surface. https://github.com/backnotprop/plannotator

horsawlarway 5 hours ago|||

anecdata, but I ended up in the same spot.

I used superpowers - but it burns waay more tokens for basically the same outcome as a single line that states

"Please do planning and ask any required questions before implementing.

[my prompt]"

On the latest models and with a decent harness, the planning modes are quite good, and the single sentence telling it to ask you questions lets the model pick the right thing to ask about, instead of wasting a bunch of time/tokens on predefined skills that try to force basically the same result.

It does introduce a second set of required interactions, but you can have another agent be your "questions answerer" if you need it (result quality goes down a bit vs answering myself, but still quite good, especially if you spend a bit of time on the answerer prompt)

Basically - things are moving fast enough I'm not convinced buying into superpowers/agentskills/[daily prompt magic beans]/etc tooling really makes sense.

I'd stick to the defaults in the harness for most cases, and then work on being clear with the ask.

ssgodderidge 4 hours ago|||

This is like creating a React framework called ReactJS to compete with NextJS

esafak 7 hours ago|||

Looks like a bunch of canned skills served through a plugin?

ricardobeat 7 hours ago||

Does superpowers actually work? The main skill file doesn't inspire much confidence:

    "If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill."

CharlesW 6 hours ago||

This kind of "overprompting" is one technique that even the best skills/agents use to compensate for under-invocation, which happens when more demure advisory language tends to be rationalized away by LLMs.

It shouldn't be your default, but should absolutely be tried when your skill/agent test suite displays evidence that it's not being reliably invoked without it.

ColinEberhardt 1 hour ago||

Agents Skills are built upon “Five design decisions [that] are the load-bearing ones”

And Open Design (HN front page yesterday) is supported by “Six load-bearing ideas”

The similarities in the way these prompt libraries are documented doesn’t feel coincidental.

zmmmmm 6 hours ago||

I was surprised how long some of these skills are. They are pages and pages long with tables and checkbox lists and code examples, etc.

Curious how normal that is - it would only take a couple of these to really fill the context alot.

_pdp_ 48 minutes ago||

The reason they are long is because these skills are produced mostly by Claude Code and Opus and no sensible human will read these files, let alone build a mental model around them. There is just layers of assumptions that this works - when in reality it doesn't and it is wasteful.

Here is a fun experiment.

Ask any LLM to write something vaguely familiar. For example, ask it "write a fib". Since almost all LLMs are fine tuned on code, I find that all of them will respond with a fibonacci sequence algorithm even-though to a non-programmer "write a fib" means to write an unimportant lie.

So there is compression. You can express an outcome in just 3 vague tokens without going into details what exactly is a fibonacci sequence.

That should be enough to understand that the length of the prompt does not matter. What matters is the right words, frequency and order. You can write two page prompt or two sentence prompt and both can have the same outcome.

gwerbin 5 hours ago|||

I quickly skimmed and it looks like at least a few of them are intended to be more like system prompts for a tightly scoped sub agent than a skill as such. I agree, I wouldn't want to use a lot of of these in a longer-running work session.

I have been successful with short and focused skills so far. I treat them as a reusable snippet of context, but small ones. For example a couple of paragraphs at most about how to use Python in my project and how to run unit tests. I also have several short "info" skills that don't actually provide the agent instructions, they merely contain useful contextual information that the agent can choose to pull in if needed.

Even having too many skills can be an issue because the list of skill names and their descriptions all end up in the context at some point.

tecoholic 6 hours ago|||

I have written zero skills, so not sure how normal it is. I counted the words in couple of them and they seem to be around 2k range. So 5 skills would be around 10K. Even at a small LLM context of 128k, that's still around 10%. And for a 1M context window like the big ones, it barely registers.

umeshunni 3 hours ago|||

> it would only take a couple of these to really fill the context alot.

Only skill front-matter (name, description, triggers etc) are loaded within context by default, so this isn't likely to happen without 1000s of skills.

sergiotapia 5 hours ago||

I reviewed the line counts of my own project skill files, and the top 3 I have are:

    805 lines
    660 lines
    511 lines

Maybe I am _too_ conservative here. Lots to explore.

mohamedkoubaa 5 hours ago||

No, you aren't.

tariky 1 hour ago||

What is difference between superpowers and this?

I use superpowers for several months now and it really does help. But still 90/10 rule applies, 10% of time it will produce stupid decision. So always check spec.

cortesoft 2 hours ago|

What makes this better/different than spec-kit? It seems to have a very similar philosophy. I wonder if they could work together? Or would they just be duplicative?

https://github.com/github/spec-kit

More comments...