Agent Skills - Hacker News

Posted by mooreds 8 hours ago

326 points | 187 comments

iainmerrick 7 hours ago|

This stuff smells like maybe the bitter lesson isn't fully appreciated.

You might as well just write instructions in English in any old format, as long as it's comprehensible. Exactly as you'd do for human readers! Nothing has really changed about what constitutes good documentation. (Edit to add: my parochialism is showing there, it doesn't have to be English)

Is any of this standardization really needed? Who does it benefit, except the people who enjoy writing specs and establishing standards like this? If it really is a productivity win, it ought to be possible to run a comparison study and prove it. Even then, it might not be worthwhile in the longer run.

theshrike79 33 minutes ago||

Skills can contain scripts, making them a lot more versatile than just a document.

Of course any LLM can write any script based on a document, but that's not very deterministic.

A good example is Anthropic's PDF creator skill. It has the basic english instructions as well as actual Python code to generate PDFs

rfw300 2 minutes ago||

This strikes me as entirely logical in the short run, and an insane way of packaging software that we will certainly regret in the long run.

zby 6 hours ago|||

The instructions are standard documents - but this is not all. What the system adds is an index of all skills, built from their descriptions, that is passed to the llm in each conversation. The idea is to let the llm read the skill when it is needed and not load it into context upfront. Humans use indexes too - but not in this way. But there are some analogies with GUIs and how they enhance discoverability of features for humans.

I wish they arranged it around READMEs. I have a directory with my tasks and I have a README.md there - before codex had skills it already understood that it needs to read the readme when it was dealing with tasks. The skills system is less directory dependent so is a bit more universal - but I am not sure if this is really needed.

giancarlostoro 3 hours ago|||

Claude reads from .claude/instructions.md whenever you make a new convo as a default thing. I usually have Claude add things like project layout info and summaries, preferred tooling to use, etc. So there's a reasonable expectation of how it should run. If it starts 'forgetting' I tell it to re-read it.

ethbr1 1 hour ago||||

> What the system adds is an index of all skills, built from their descriptions, that is passed to the llm in each conversation. The idea is to let the llm read the skill when it is needed and not load it into context upfront.

This is different from swagger / OpenAPI how?

I get cross trained web front-end devs set a new low bar for professional amnesia and not-invented-here-ism, but maybe we could not do that yet another time?

iainmerrick 5 hours ago|||

Humans use indexes too - but not in this way.

What's different?

zby 4 hours ago||

Hmm - maybe I should not call it index - people lookup stuff in the index when needed. Here the whole index is inserted in the conversation - it is as if when starting a task human read the whole table of contents of the manual for that task.

postalcoder 6 hours ago|||

Folks have run comparisons. From a huggingface employee:

  codex + skills finetunes Qwen3-0.6B to +6 on humaneval and beats the base score on the first run.

  I reran the experiment from this week, but used codex's new skills integration. Like claude code, codex consumes the full skill into context and doesn't start with failing runs. It's first run beats the base score, and on the second run it beats claude code.

https://xcancel.com/ben_burtenshaw/status/200023306951767675...

That said, it's not a perfect comparison because of the Codex model mismatch between runs.

The author seems to be doing a lot of work on skills evaluation.

https://github.com/huggingface/upskill

iainmerrick 6 hours ago|||

I can't quite tell what's being compared there -- just looks like several different LLMs?

To be clear, I'm suggesting that any specific format for "skills.md" is a red herring, and all you need to do is provide the LLM with good clear documentation.

A useful comparison would be between: a) make a carefully organised .skills/ folder, b) put the same info anywhere and just link to it from your top-level doc, c) just dump everything directly in the top-level doc.

My guess is that it's probably a good idea to break stuff out into separate sections, to avoid polluting the context with stuff you don't need; but the specific way you do that very likely isn't important at all. So (a) and (b) would perform about the same.

postalcoder 6 hours ago|||

Your skepticism is valid. Vercel ran a study where they said that skills underperform putting a docs index in AGENTS.md[0].

My guess is that the standardization is going to make its way into how the models are trained and Skills are eventually going to pull out ahead.

0: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

vidarh 4 hours ago||

Agents add a docs index in context for skills, so this is an issue of finding that the current specific implementation of skills in Claude Code is suboptimal.

Their reasoning about it is also flawed. E.g. "No decision point. With AGENTS.md, there's no moment where the agent must decide "should I look this up?" The information is already present." - but this is exactly the case for skills too. The difference is just where in the context the information is, and how it is structured.

Having looked at their article, ironically I think the reason it works is that they likely force more information into context by giving the agent less information to work with:

Instead of having a description, which might convince the agent a given skill isn't relevant, their index is basically a list of vague filenames, forcing the agent to make a guess, and potentialy reading the wrong thing.

This is basically exactly what skills were added to avoid. But it will break if the description isn't precise enough. And it's perfectly possible that current tooling isn't aggressive enough about pruning detail that might tempt the agent to ignore relevant files.

anupamchugh 3 hours ago|||

> If you want a clean comparison, I’d test three conditions under equal context budgets: (A) monolithic > AGENTS.md, (B) README index that links to docs, (C) skills with progressive disclosure. Measure task > success, latency, and doc‑fetch count across 10–20 repo tasks. My hunch: (B)≈(C) on quality, but (C) > wins on token efficiency when the index is strong. Also, format alone isn’t magic—skills that reference > real tools/assets via the backing MCP are qualitatively different from docs‑only skills, so I’d > separate those in the comparison. Have you seen any benchmarks that control for discovery overhead?

pton_xd 6 hours ago||||

I think the point is it smells like a hack, just like "think extra hard and I'll tip you $200" was a few years ago. It increases benchmarks a few points now but what's the point in standardizing all this if it'll be obsolete next year?

mbesto 5 hours ago||||

I think this tweet sums it correctly doesn't?

   A +6 jump on a 0.6B model is actually more impressive than a +2 jump on a 100B model. It proves that 'intelligence' isn't just parameter count; it is context relevance. You are proving that a lightweight model with a cheat sheet beats a giant with amnesia. This is the death of the 'bigger is better' dogma

Which is essentially the bitter lesson that Richard Sutton talks about?

Der_Einzige 2 hours ago||

Nice ChatGPT generated response in that tweet. Anyone too lazy to deslop their tweet shouldn't be listened to.

9dev 5 hours ago|||

Standards have to start somewhere to gain traction and proliferate themselves for longer than that.

Plus, as has been mentioned multiple times here, standard skills are a lot more about different harnesses being able to consistently load skills into the context window in a programmatic way. Not every AI workload is a local coding agent.

xrd 6 hours ago||||

Does this indicate running locally with a very small (quantized?) model?

I am very interested in finding ways to combine skills + local models + MCP + aider-ish tools to avoid using commercial LLM providers.

Is this a path to follow? Or, something different?

postalcoder 6 hours ago||

Check out the guy's work. He's doing a lot of work on precisely what you're talking about.

https://xcancel.com/ben_burtenshaw

https://huggingface.co/blog/upskill

https://github.com/huggingface/upskill

8cvor6j844qw_d6 6 hours ago|||

Sounds like the benchmark matrix just got a lot bigger, model * skill combinations.

ianbutler 2 hours ago|||

I'd argue we jumped that shark since the shift in focus to post training. Labs focus on getting good at specific formats and tasks. The generalization argument was ceded (not in the long term but in the short term) to the need to produce immediate value.

Now if a format dominates it will be post trained for and then it is in fact better.

Der_Einzige 2 hours ago||

Anthropic and Gemini still release new pre-training checkpoints regularly. It's just OpenAI who got stupid on that. RIP GPT-4.5

ianbutler 1 hour ago||

All models released from those providers go through stages of post training too, none of the models you interact with go from pre-training to release. An example of the post training pipeline is tool calling, that is to my understanding a part of post training and not pre training in general.

I can't speak to what the exact split is or what is a part of post training versus pre training at various labs but I am exceedingly confident all labs post train for effectiveness in specific domains.

Der_Einzige 1 hour ago||

I did not claim that post training doesn't happen on these models, and you are being extremely patronizing (I publish quite a bit of research on LLMs at top conferences).

I claimed that OpenAI overindexed on getting away with aggressive post-training on old pre-training checkpoints. Gemini / Anthropic correctly realized that new pre-training checkpoints need to happen to get the best gains in their latest model releases (which get post-trained too).

idopmstuff 6 hours ago|||

I have been using Claude Code to automate a bunch of my business tasks, and I set up slash commands for each of them. Each slash command starts by reading from a .md file of instructions. I asked Claude how this is different from skills and the only substantive thing it could come up with was that Claude wouldn't be able to use these on its own, without me invoking the slash command (which is fine; I wouldn't want it to go off and start checking my inventory of its own volition).

So yeah, I agree that it's all just documentation. I know there's been some evidence shown that skills work better, but my feeling is that in the long run it'll fall to the wayside, like prompt engineering, for a couple of reasons. First, many skills will just become unnecessary - models will be able to make slide decks or do frontend design without specific skills (Gemini's already excellent at design without anything beyond the base model, imho). Second, increased context windows and overall intelligence will obviate the need for the specific skills paradigm. You can just throw all the stuff you want Claude to know in your claude.md and call it a day.

steveklabnik 5 hours ago|||

Claude Code recently deprecated slash commands in favor of skills because they were so similar. Or another way of looking at it is, they added the ability to invoke a skill via /skill-name.

idopmstuff 5 hours ago||

Yeah, I saw that announcement but still can't figure out what the actual impact is - doesn't change anything for me (my non-skill slash commands still work).

steveklabnik 3 hours ago||

The actual impact is that there should be less confusion in the future about "what's the difference between these two" because there isn't really.

To overly programmer-brain it, a slash command is just a skill with a null frontmatter. This means that it doesn't participate in progressive disclosure, aka Claude won't consider invoking it automatically.

mordymoop 3 hours ago||||

Workflow-wise, the important distinction for me has been that I can refine a Skill by telling Claude Code to use it for related tasks until it does exactly what I want, correctly, the first time. Having a solid, iteratively perfected Skill really cuts down on subsequent iteration.

kurthr 6 hours ago||||

So how is this slash command limit enforced? Is it part of the Claude API/PostTraining etc? It seems like a useful tool if it is!

I'd like a user writeable, LLM readable, LLM non-writable character/sequence. That would make it a lot easier to know at a glance that a command/file/directory/username/password wasn't going to end up in context and being used by a rogue agent.

It wouldn't be fool proof, since it could probably find some other tool out there to generate it (eg write-me some unicode python), but it's something I haven't heard of that sounds useful. If it could be made fool/tool proof (fools and tools are so resourceful) that would be even better.

idopmstuff 6 hours ago||

It's part of the Claude Code harness. I honestly haven't thought at all about security related to it; it's just a nice convenience to trigger a commonly run process.

vidarh 4 hours ago|||

A bit of caution: it's perfectly able to look up and read the slash-command, so while it may be true it technically can't "invoke" a slash-command via TaskTool, it most certainly can execute all of the steps in it if the slash-command is somewhere you grant it read access, and will tend to try to do so if you tell it to invoke a slash command.

killerstorm 6 hours ago|||

> Is any of this standardization really needed?

This standardization, basically, makes a list of docs easier to scan.

As a human, you have a permanent memory. LLMs don't have it, they have to load it into the context, and doing it only as necessary can help.

E.g. if you had anterograde amnesia, you'd want everything to be optimally organized, labeled, etc, right? Perhaps an app which keeps all information handy.

iainmerrick 3 hours ago||

Everybody wants that, though, no? At least some of the time?

For example, if you've just joined a new team or a new project, wouldn't you like to have extensive, well-organised documentation to help get you started?

This reminds me of the "curb-cut effect", where accommodations for disabilities can be beneficial for everybody: https://front-end.social/@stephaniewalter/115841555015911839

ashdksnndck 3 hours ago|||

We’re working with the models that are available now, not theoretical future models with infinite context.

Claude is programmed to stop reading after it gets through the skill’s description. That means we don’t consume more tokens in the context until Claude decides it will be useful. This makes a big difference in practice. Working in a large repo, it’s an obvious step change between me needing to tell Claude to go read a particular readme that I know solves the problem vs Claude just knowing it exists because it already read the description.

Sure, if your project happened to already have a perfect index file with a one-sentence description of each other documentation file, that could serve as a similar purpose (if Claude knew about it). It’s worthwhile to spread knowledge about how effective this pattern is. Also, Claude is probably trained to handle this format specifically.

iainmerrick 3 hours ago||

To clarify, the bit where I think the bitter lesson applies is trying to standardize the directory names, the permitted headings and paragraph lengths, etc. It's pointless bikeshedding.

Making your docs nice and modular, and having a high-level overview that tells you where to find more detailed info on specific topics, is definitely a good idea. We already know that when we're writing docs for human readers. The LLMs are already trained on a big corpus written by and for humans. There's no compelling reason why we need to do anything radically different to help them out. To the contrary, it's better not to do anything radically different, so that new LLM-assisted code and docs can be accessible to humans too.

Well-written docs already play nicely with LLM context.

smithkl42 7 hours ago|||

It's all about managing context. The bitter lesson applies over the long haul - and yes, over the long haul, as context windows get larger or go away entirely with different architectures, this sort of thing won't be needed. But we've defined enough skills in the last month or two that if we were to put them all in CLAUDE.md, we wouldn't have any context left for coding. I can only imagine that this will be a temporary standard, but given the current state of the art, it's a helpful one.

OtherShrezzing 6 hours ago|||

I use Claude pretty extensively on a 2.5m loc codebase, and it's pretty decent at just reading the relevant readme docs & docstrings to figure out what's what. Those docs were written for human audiences years (sometimes decades) ago.

I'm very curious to know the size & state of a codebase where skills are beneficial over just having good information hierarchy for your documentation.

pertymcpert 4 hours ago||

Skills are more than code documentation. They can apply to anything that the model has to do, outside of coding.

iainmerrick 4 hours ago||||

To clarify, when I mentioned the bitter lesson I meant putting effort into organising the "skills" documentation in a very specific way (headlines, descriptions, etc).

Splitting the docs into neat modules is a good idea (for both human readers and current AIs) and will continue to be a good idea for a while at least. Getting pedantic about filenames, documentation schemas and so on is just bikeshedding.

storus 6 hours ago||||

Why not replace the context tokens on the GPU during inference when they become no longer relevant? i.e. some tool reads a 50k token document, LLM processes it, so then just flush those document tokens out of active context, rebuild QKV caches and store just some log entry in the context as "I already did this ... with this result"?

killerstorm 5 hours ago|||

Anthropic added features like this into 4.5 release:

https://claude.com/blog/context-management

> Context editing automatically clears stale tool calls and results from within the context window when approaching token limits.

> The memory tool enables Claude to store and consult information outside the context window through a file-based system.

But it looks like nobody has it as a part of an inference loop yet: I guess it's hard to train (i.e. you need a training set which is a good match for what people use context in practice) and make inference more complicated. I guess more high-level context management is just easier to implement - and it's one of things which "GPT wrapper" companies can do, so why bother?

zozbot234 6 hours ago|||

This is what agent calls do under the hood, yes.

storus 5 hours ago||

I don't think so, those things happen when agent yields the control back at the end of its inference call, not during the active agent inference with multiple tool calls ongoing. These days an agent can finish the whole task with 1000s tool calls during a single inference call without yielding control back to whatever called it to do some housekeeping.

vidarh 3 hours ago||

For agent, read sub-agent. E.g. the contents of your .claude/agents directory. When Claude Code spins up an agent, it provides the sub-agent with a prompt that combines the agents prompt and information composed by Claude from the outer context based on what Claude thinks needs to be communicated to the agent. Claude Code can either continue, with the sub-agent running in the background, or wait until it is complete. In either case, by default, Claude Code effectively gets to "check in" on messages from the sub-agent without seeing the whole thing (e.g. tool call results etc.), so only a small proportion of what the agent does will make it into the main agents context.

So if you want to do this, the current workaround is basically to have a sub-agent carry out tasks you don't want to pollute the main context.

I have lots of workflows that gets farmed out to sub-agents that then write reports to disk, and produce a summary to the main agent, who will then selectively read parts of the report instead of having to process the full source material or even the whole report.

ledauphin 7 hours ago||||

how is it different or better than maintaining an index page for your docs? Or a folder full of docs and giving Claude an instruction to `ls` the folder on startup?

d1sxeyes 7 hours ago|||

Vercel think it isn’t:

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

Avicebron 7 hours ago|||

It's hard to tell unless they give some hard data comparing the approaches systematically.. this feels like a grift or more charitably trying to build a presence/market around nothing. But who knows anymore, apparently saying "tell the agent to write it's own docs for reference and context continuity" is considered a revelation.

stingraycharles 6 hours ago|||

Not sure why you’re being downvoted so much, it’s a valid point.

It’s also related to attention — invoking a skill “now” means that the model has all the relevant information fresh in context, you’ll have much better results.

What I’m doing myself is write skills that invoke Python scripts that “inject” prompts. This way you can set up multi-turn workflows for eg codebase analysis, deep thinking, root cause analysis, etc.

Works very well.

MattRogish 2 hours ago|||

On the one hand, I agree.

The whole point of LLM-based code execution is, well, I can just type in any old language it understands and it ought to figure out what I mean!

A "skill" for searching a pdf could be :

* "You can search PDFs. The code is in /lib/pdf.py"

or it could be:

* "Here's a pile of libraries, figure out which you want to use for stuff"

or it could be:

* "Feel free to generate code (in any executable programming language) on the fly when you want to search a PDF."

or it could be:

* "Solve this problem <x>" and the LLM sees a pile of PDFs in the problem and decides to invent a parser.

or any other nearly infinite way of trying to get a non-deterministic LLM to do a thing you want it to do.

At some level, this is all the same. At least, it rounds to the same in a sort of kinda "Big O" order-of-magnitude comparison.

On the other hand, I also agree, but I can definitely see present value in trying to standardize it because humans want to see what is going on (see: JSON - it's highly desirable for programmers to be able to look at a string representation of data than send opaque binary over the wire, even though to a computer binary is gonna be a lot faster).

There is probably an argument, too, for optimization of context windows and tokens burned and all that kinda jazz. `O(n)` is the same as `O(10*n)` (where n is tokens burned or $$$ per second or context window size) and that doesn't matter in theory but certainly does in practice when you're the one paying the bill or you fill up the context window and get nonsense.

So if this is a _thoughtful_ standard that takes that kinda stuff into account then, well, great! It gives a benchmark we can improve and iterate upon.

With some hypothetical super LLM that has a nearly infinite context window and a cost/tok of nearly zero and throughput nearing infinity, you can just say "solve my problem" and it will (eventually) do it. But for now, I can squint and see how this might be helpful.

mhalle 5 hours ago|||

Skills are not just documentation. They include computability (programs/scripts), data (assets), and the documentation (resources) to use everything effectively.

Programs and data are the basis of deterministic results that are accessible to the llm.

Embedding an sqlite database with interesting information (bus schedules, dietary info, or a thousand other things) and a python program run by the skill can access it.

For Claude at least, it does it in a VM and can be used from your phone.

Sure, skills are more convention than a standard right now. Skills lack versioning, distribution, updates, unique naming, selective network access. But they are incredibly useful and accessible.

Spivak 5 hours ago||

Am I missing something because what you describe as the pack of stuff sounds like S tier documentation. I get full working examples and a pre-populated database it works on?

Lerc 4 hours ago|||

The main thing here would need standardisation is the environment in which the skill operates. The skill instructions are interpreted by the AI, any support scripts are. Interpreted by the environment.

You don't want to give an English description of how to compress LZMA and then let the AI do it token by token. Although that would be a pretty good arduous methodical benchmark task for an AI.

JohnMakin 2 hours ago|||

I agree with this and it's a conversation I've struggled to have with coworkers about using these -

IMO it's great if a plugin wants to have their own conventions for how to name and where to put these files and their general structure. I get the sense it doesn't matter to agents much (talking mostly claude here) and the way I use it I essentially give its own "skills" based on my own convention. It's very flexible and seems to work. I don't use the slash commands, I just script with prompts into claude CLI mostly, so if that's the only thing I gain from it, meh. I do see other comments speculating these skills work more efficiently but I'm not sure I have seen any evidence for that? Like a sibling comment noted I can just re-feed the skill knowledge back into the prompt.

runjake 4 hours ago|||

You may be right, but I find myself writing English differently depending on the audience: people vs AI.

I haven't done a formal study, so I can't prove it, but it seems like I get better output from agents if I tailor my English more towards the LLM way of "thinking".

avaer 5 hours ago|||

It's not about instructions, it's about discoverability and data.

Yeah, WWW is really just text but that doesn't mean you don't need HTTP + HTML and a browser/search engine. Skills is just that, but for agent capabilities.

Long term you're right though, agents will fetch this all themselves. And at some point they will not be our agents at all.

iainmerrick 5 hours ago||

I guess what I mean is that standardizing this bit of the problem right now feels sort of like XHTML. Many people thought that was a big deal back in the day, but it turned out to be a pointless digression.

Long term you're right though, agents will fetch this all themselves

It's not "long term", it's right now. If your docs are well-written and well-organised, agents can already use them. The most you might need to do is copy your README.md into CLAUDE.md.

3371 5 hours ago|||

You are right about it's just natural language but Standarization is very improtant, because it's never just about the model itself, the so called Harness is a big factor on LLM performance and standarization allows all harness to index all skills.

storus 6 hours ago|||

This is pushed by Antropic, OpenAI doesn't seem to care much about "skills". Maybe Anthropic is doing some extra training to better follow sections of text marked as skill, who knows? Or you can just store what worked as a skill and share with others without any need to do their own prompt for common tasks?

jonathanhefner 6 hours ago||

OpenAI has already adopted Agent Skills:

- https://community.openai.com/t/skills-for-codex-experimental...

- https://developers.openai.com/codex/skills/

- https://github.com/openai/skills

- https://x.com/embirico/status/2018415923930206718

storus 4 hours ago||

Yeah but this seems like a bolt-on and not something they train their model to understand at the token level like how they do tool calls. Maybe Anthropic has a token-level skills support (e.g. <SKILL_START>skill prompt<SKILL_END>).

fassssst 6 hours ago|||

Post training can make known formats more reliable.

apsurd 3 hours ago|||

yeah the boon of LLM is how it gives a masked incentive for every jane and joe to be intentional communicators.

tcdent 6 hours ago|||

Skills are for the most part already generated by LLMs. And, if you're implementing them in your own workflow, they're tailored to real-world problems you've encountered.

Having a super repo of everyone else's slop is backwards thinking; you are now in the era where creating written content and verifying it's effectiveness is easier than ever.

MuskIsAntidemo 6 hours ago|||

[dead]

0dayman 1 hour ago||

what a great comment

jgmedr 7 hours ago||

Our team has found success in treating skills more like re-usable semi-deterministic functions and less like fingers-crossed prompts for random edge-cases.

For example, we have a skill to /create-new-endpoint. The skill contains a detailed checklist of all the boilerplate tasks that an engineer needs to do in addition to implementing the logic (e.g. update OpenAPI spec, add integration tests, endpoint boilerplate, etc.). The engineer manually invokes the skill from the CLI via slash commands, provides a JIRA ticket number, and engages in some brief design discussion. The LLM is consistently able to one-shot these tickets in a way that matches our existing application architecture.

mooreds 5 hours ago|

How do you test these skills for consistency over time, or is that not needed?

theshrike79 5 hours ago|||

The same way you'd test a human following written instructions over time.

Check the results.

pizzafeelsright 3 hours ago|||

My experience has been that if the skill is broken down into a function, possibly paired with a validator in another stage, you're at 99.9% deterministic.

I have not yet tested this at scale but give me six months.

davidkunz 7 hours ago||

Please standardize the folder.

  .claude/skills
  .codex/skills
  .opencode/skills
  .github/skills

albert_e 6 hours ago||

This is happening as we speak.

Codex started this and OpenCode followed suit with the hour.

https://x.com/embirico/status/2018415923930206718

wernerb 37 minutes ago|||

Could we adhere to the XDG standard and put config in ~/config/agents Or perhaps create a new XDG standard? Like $XDG_AGENTS_HOME ?

PantaloonFlames 6 hours ago|||

“Proposal: include a standard folder where agent skills should be“

https://github.com/agentskills/agentskills/issues/15

prettyblocks 7 hours ago|||

I find that even though this isn't standard, that these -cli tools will scan the repo for .md files and for the most part execute the skills accordingly. Having said that, I would much prefer standards not just for this, but for plugins as well.

iainmerrick 6 hours ago||

Standards for plugins makes sense, because you're establishing a protocol that both sides need to follow to be able to work together.

But I don't see why you need a strict standard for "an informal description of how to do a particular task". I say "informal" because it's necessarily written in prose -- if it were formal, it'd be a shell script.

m4r71n 7 hours ago|||

That is being discussed in https://github.com/agentskills/agentskills/issues/15.

verdverm 7 hours ago|||

.agent/

Skills seem a bit early to standardize. We are so early in this, why do we want to handcuff our creativity so soon?

arrowsmith 7 hours ago|||

Skills are a really simple concept. They're just custom prompts with a name and some metadata. What are you afraid of handcuffing?

likium 7 hours ago|||

Just the decision of whether to allow models to invoke them has [1][2][3] different ways.

[1]: https://code.claude.com/docs/en/skills#control-who-invokes-a... [2]: https://opencode.ai/docs/skills/#disable-the-skill-tool [3]: https://developers.openai.com/codex/skills/#enable-or-disabl...

arrowsmith 7 hours ago||

All the more reason to standardise it

wernerb 35 minutes ago|||

We keep standardising without adding versioning :(

verdverm 7 hours ago|||

Eventually, you can standardize what you don't understand

The problem I see now is that everyone wants to be the winner in a hype cycle and be the standards bringer. How many "standards" have we seen put out now? No one talks about MCP much anymore, langchain I haven't seen in more than a year, will we be talking about Skills in another year?

verdverm 7 hours ago||||

They are more than that, for example the frontmatter and code files around them. The spec: https://agentskills.io/specification

Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?

What tools do they have access to, can I define this so it's dynamic? Do skills even have a concept for sub tools or sub agents? Why do I want to put references in a folder instead of a search engine? Does frontmatter even make sense, why not something closer to a package.json in a file next to it?

Does it even make sense to have skills in the repo? How do I use them across projects? How do we build an ecosystem and dependency management system for skills (which are themselves versioned)

arrowsmith 7 hours ago||

> They are more than that, for example the frontmatter and code files around them.

You are right. I have edited my post slightly.

> Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?

You don't have to put scripts in skills. The script can be anywhere the agent can access. The skill just needs to tell the LLM how to run it.

> Does it even make sense to have skills in the repo? How do I use them across projects?

You don't have to put them in the repo. E.g. with Claude Code you can put project-specific skills in `.claude/skills` in the repo and system-wide skills in `~/.claude/skills`.

verdverm 6 hours ago||

2. The spec / docs show people how to put code in a subdir. While you can reference external scripts, there is a blessed pattern that seems like an anti-pattern to me

3. generalize: how do I store, maintain, and distribute skills shared by employees who work on multiple repos. Sounds like standard dependency management to me. Does to some of the people building collections / registries. Not sure if any of them account for versioning, have not seen anything tied to lock files (though I'd avoid that by using MVS for dep selection)

vidarh 7 hours ago|||

Agreed. I think being overly formal about what can be in the frontmatter would be a mistake, but the beauty of doing this with an LLM is that you can pretty much emulate skills in any agent by telling it to start by reading the frontmatter of each skills file and use that to decide when to read the rest, so given that as a fallback, it's hardly imposing some massive burden to standardise it a bit.

nikcub 2 hours ago|||

it's actually .agents/ :)

verdverm 2 hours ago||

why plural?

nikcub 1 hour ago||

because more than one accesses it? :shrug:

mijoharas 5 hours ago|||

I mean, it'd be good if these tools followed the xdg base spec and put their config in `~/.config/claude` e.t.c instead of `~/.claude`.

It's one of my biggest pet peeves with a lot of these tools (now admittedly a lot of them have a config env var to override, but it'd be nice if they just did the right thing automatically).

tobyhinloopen 7 hours ago|||

ln -s to the rescue!

davidkunz 7 hours ago|||

The root cause should be fixed.

flurdy 6 hours ago||||

It's why I wrapped my tiny skills repo with a script that softlink them into whichever is your skills folder, defaulting to Claude, but could be any other.

I treat my skills the same as I would write tiny bash scripts and fish functions in the days gone to simplify my life by writing 2 words instead of 2 sentences. Tiny improvement that only makes sense for a programmer at heart.

[1] https://github.com/flurdy/agent-skills

smithkl42 7 hours ago||||

That doesn't work very well if your developers are on Windows (and most are). Uneven Git support for symbolic links across platforms is going to end up causing more problems than it solves.

xrd 7 hours ago|||

Why not hardlinks?

dmd 7 hours ago||

You can't hardlink a directory.

throwaway98797 6 hours ago|||

might be too early to standardize

standards are good but they slow development and experimentation

rvz 7 hours ago|||

There are 14 competing standards.

d1sxeyes 7 hours ago|||

The problem is that the de facto standard is `.claude`, which is problematic for folks not using Claude.

OtherShrezzing 6 hours ago||

Your skill then just becomes an .md file containing

>any time you want to search for a skill in `./codex`, search instead in `./claude`

and continue as you were.

AndroidKitKat 6 hours ago||

I see it similar to browser user-agents all claiming to be an ancient version of Mozilla or KHTML. We pick whatever works and then move on. It might not be "correct," but as long as our tools know what to do, who cares?

PurpleRamen 7 hours ago||||

Now, there are 15 competing standards.

smithkl42 7 hours ago|||

Soon...

behnamoh 7 hours ago||

Worse yet; opencode uses singular words by default:

    .opencode/skill

davidkunz 6 hours ago||

On the website[1] it says:

  .opencode/skills

[1]: https://opencode.ai/docs/skills/#place-files

the_mitsuhiko 4 hours ago||

They changed it. It was singular.

CuriouslyC 6 hours ago||

Pro tip: create README.md files in subfolders with helpful content that you might put in an AGENTS.md file (but, ya know, for humans too), and *link relevant skills there*. You don't even have to call them skills or use the skills format. It works for everything (including humans!).

I wrote a rant about skills a while ago that's still relevant in some ways: https://sibylline.dev/articles/2025-10-20-claude-skills-cons...

Sammi 2 hours ago|

Exactly.

It feels like people think they are something new and novel that there is something technical about them that one needs to learn.

"Skills" are just readmes on particular subjects. They can be for whatever purpose you want them to be. Any time you find that you need to repeatedly tell the agent about something, you can put it in a "skill".

You don't even have to follow the skill standard and use the standard folder and filenames. That's just so the agent can auto find and load them. You can name them whatever you want and put them wherever you want and just add them to context yourself when you need them.

bazhand 1 hour ago||

The third most popular skill on skills.sh[1] with 50k/week installs is a link to download a command[2]

[1] https://skills.sh/vercel-labs/agent-skills/web-design-guidel... [2] https://github.com/vercel-labs/agent-skills/blob/main/skills...

All of these SKILLS.md/AGENTS.md/COMMANDS.md are just simple prompts, maybe even some with context links.

And quite dangerous.

Soerensen 7 hours ago||

The observation about agents not using skills without being explicitly asked resonates. In practice, I've found success treating skills as explicit "workflows" rather than background context.

The pattern that works: skills that represent complete, self-contained sequences - "do X, then Y, then Z, then verify" - with clear trigger conditions. The agent recognizes these as distinct modes of operation rather than optional reference material.

What doesn't work: skills as general guidelines or "best practices" documents. These get lost in context or ignored entirely because the agent has no clear signal for when to apply them.

The mental model shift: think of skills less like documentation and more like subroutines you'd explicitly invoke. If you wouldn't write a function for it, it probably shouldn't be a skill.

philipp-gayret 7 hours ago||

Better yet is a system which activates skills in certain situations. I use hooks for this with Claude, works great. The skill descriptions are "Do not activate unless instructed by guidance."

Example: A Python file is read or written, guidance is given back (once, with a long cooldown) to activate global and company-specific Python skills. Claude activates the skills and writes Python to our preference.

smithkl42 7 hours ago|||

That does raise the question of what the value is of a "skill" vs a "command". Claude Code supports both, and it's not entirely clear to me when we should use one vs the other - especially if skills work best as, well, commands.

sReinwald 6 hours ago||

IMO the value and differentiating factor is basically just the ability to organize them cleanly with accompanying scripts and references, which are only loaded on demand. But a skill just by itself (without scripts or references) is essentially just a slash command with metadata.

Another value add is that theoretically agents should trigger skills automatically based on context and their current task. In practice, at least in my experience, that is not happening reliably.

8cvor6j844qw_d6 6 hours ago|||

Reminds me of my personal Obsidian notes, CLI commands for tasks I need just rarely enough to forget, with explanations for future me.

vidarh 7 hours ago||

The description "just" needs to be excruciatingly precise about when to use the skill, because the frontmatter is all the model will see in context.

But on the other hand, in Claude Code, at least, the skill "foo" is accessible as /foo, as the generalisation of the old commands/ directory, so I tend to favour being explicit that way.

thisisthenewme 6 hours ago||

My unproven theory is that agent skills are just a good way to 'acquire' unspoken domain rules. A lot of things that developers do are just in their heads, and using 'skills' forces them to write these down. Then you feed this back to the LLM company for them to train on.

esafak 8 hours ago||

Does anyone find that agents just don't use them without being asked?

libraryofbabel 7 hours ago||

This has been a problem for us too. Sometimes they reach for skills, sometimes they don’t and just try to do the thing on their own. It’s annoying.

I think this is (mostly) a solvable problem. The current generation of SotA models wasn’t RLVR-trained on skills (they didn’t exist at that time) and probably gets slightly confused by the way the little descriptions are all packed into the same tool call schema. (At least that’s how it works with Claude Code.) The next generation will have likely been RLVRed on a lot of tasks where skills are available, and will use them much more reliably. Basically, wait until the next Opus release and you should hopefully see major improvements. (Of course, all this stuff is non-deterministic blah blah, but I think it’s reasonable to expect going from “misses the skill 30% of the time” to “misses it 2% of the time”.)

empath75 7 hours ago||

I think this is mostly a problem of making things skills that don't need to be skills (telling it how to do something it already knows how to do), and having way too much context, so that the skills effectively disappear. If skills are important, information about using skills needs to be a relatively large proportion of the context. Probably the right way to do it, is aggressively trimming anything that might distract from them.

modernerd 7 hours ago|||

That's also what Vercel found:

> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it. Adding the skill produced no improvement over baseline.

> …

> Skills aren't useless. The AGENTS.md approach provides broad, horizontal improvements to how agents work with Next.js across all tasks. Skills work better for vertical, action-specific workflows that users explicitly trigger,

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

jillesvangurp 7 hours ago|||

Depends what you use perhaps. I use codex and it seems to mostly stick to instructions I give. I use an AGENTS.md that explicitly points to the repository's skill directory. I mostly keep instructions in there for obvious things like how to build, how to test, what to do before declaring a thing done, etc. I don't tend to have a lot of skills in there either.

Probably the more skills you have, the more confused it might get. The more potentially conflicting instructions you give the harder it gets for an LLM to figure out what you actually want to happen.

If I catch it going off script, I often interrupt it and tell it what to do and update the relevant skill. Seems to work pretty good. Keeping things simple seems to work.

rco8786 7 hours ago|||

Yep. I have an incredibly hard time getting them to use Skills at all, even when asked.

I saw someone's analysis a few days ago and they found that their agents were more accurate when just dumping the skill context directly into AGENTS.md

shmoogy 7 hours ago|||

I often find they aren't triggered when I would expect using a keyword and explicitly trigger them.

tobyhinloopen 7 hours ago|||

Same! If I put the skill's instructions in the general AGENTS.md, it works just fine.

troupo 7 hours ago||

Because "skills" are just .md files that the lossy compressing statistical output machine may or may not find and that may or may not be retained in the tiny context window

chasd00 7 hours ago||

I don’t think you should be downvoted. Skills and history get added to the prompt, there’s no other interface to the model to do anything different. I think it’s smart to keep this in mind when working with LLMs. It’s like keeping in mind that a webserver just responds to HTTP requests when developing a web application. You need to keep perspective.

Edit: btw I’ve gone from genai value denier to skeptic to cautiously optimistic to fairly impressed in the span of a year. (I’m a user of Claude code)

clarity_hacker 2 hours ago||

The real value isn't the format itself — it's progressive disclosure. When you dump everything into one monolithic doc, you're burning context tokens on instructions the agent doesn't need for the current task.

Skills as a pattern let the agent scan a lightweight index of descriptions, then pull in the full instructions only when relevant. Whether that's a .skills/ folder or a README index pointing to separate docs doesn't matter much. What matters is the separation between "what capabilities exist" and "how to execute this specific one."

The standardization part is mostly useful for distribution — being able to install and share skills across projects without manually wiring them up. Same reason we standardize package formats even though you could just copy-paste code.

csummers 3 hours ago|

I'm developing a new programming language, so I have to provide a way for LLMs to know about and generate code for a language they have not seen (i.e., have no training data for).

My tooling was previously adding in AI hints with CLAUDE.md, Cursor Rules, Windsurf Rules, AGENTS.md, etc., but I recently switched to using only AGENTS.md and SKILLS. I appreciate the standardization from this perspective.

More comments...