Posted by rochansinha 4 days ago

Skills Officially Comes to Codex (developers.openai.com)
300 points | 129 comments
pupppet 3 days ago|
How are skills different than tool/function calling?
esafak 3 days ago||
It's the catalog for the tools. Especially useful if you have custom tools; they expect the basics like grep and jq to be there.
mkagenius 3 days ago|||
You can somewhat achieve what Skills achieve via function calling.

I have this mental map:

Frontmatter <---> Name and arguments of the function

Text part of Skill md <---> description field of the function

Code part of the Skill <---> body of the function

But the function wouldn't look as organised as the .md; also, a Skill can have multiple function definitions.
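
For illustration, a rough sketch of that mapping (the skill text, field names, and schema below are made up for the example, not the actual Skill format or tool-calling spec):

    # Hypothetical skill; field names are illustrative only.
    skill_md = """\
    ---
    name: query_dev_db        # frontmatter ~ function name and arguments
    description: Run a read-only SQL query against the dev database.
    ---
    Use psql with the DEV_DB_URL env var. Only run SELECT statements.
    """

    # Roughly the same thing expressed as a function/tool definition.
    tool_definition = {
        "name": "query_dev_db",
        "description": "Run a read-only SQL query against the dev database.",
        "parameters": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    }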

jinushaun 3 days ago||
I agree. I don’t see how this is different from tool calling. We just put the tool instructions in a folder of markdown files.
yousif_123123 3 days ago||
It doesn't need to be describing a function. It could be explaining the skill in any way; it's kind of just more instructions and metadata to be loaded just in time rather than given to the model all at once.
mellosouls 3 days ago||
How can skills be monetised by creators?

Obviously they are empowering Codex and Claude etc, and many will be open source or free.

But for those who have commercial resources or tools to add to the skills choice, is there documentation for doing that smoothly, or a pathway to it?

I can see at least a couple of ways it might be done - skills requiring API keys or other authentication approaches - but this adds friction to an otherwise smooth skill integration process.

Having instead a transparent commission on usage sent to registered skill suppliers would be much cleaner but I'm not confident that would be offered fairly, and I've seen no guidance yet on plans in that regard.

nextaccountic 2 days ago||
Sometimes, monetization is just impossible. But if you insist, have the skill call an API that needs a token, and charge for the API.
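
A minimal sketch of that approach (the endpoint, env var, and token here are hypothetical): the skill's markdown tells the agent to run a helper script, and the script calls a metered API that only works with a paid key:

    # Hypothetical helper script shipped with a paid skill; billing happens
    # at the API behind the token, not in the markdown itself.
    import os
    import urllib.request

    API_URL = "https://api.example.com/v1/convert"   # made-up metered endpoint

    def call_paid_api(payload: bytes) -> bytes:
        token = os.environ["EXAMPLE_API_TOKEN"]      # key the user pays for
        req = urllib.request.Request(
            API_URL,
            data=payload,
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()
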
shrx 3 days ago||
How would you enforce DRM on a markdown file?
tacone 3 days ago||
I don't understand how skills are different from just instructing your model to read all the front-matters from a given folder on your filesystem and then decide whether it needs to read the file body.
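
Roughly that comparison as a sketch (the folder location and frontmatter delimiters are assumptions, not the documented behaviour):

    # The "manual" version: collect every skill's frontmatter so the model
    # can decide which file bodies are worth reading in full.
    from pathlib import Path

    SKILLS_DIR = Path.home() / ".codex" / "skills"   # assumed location

    def list_frontmatters(skills_dir: Path = SKILLS_DIR) -> list[str]:
        summaries = []
        for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
            text = skill_file.read_text(encoding="utf-8")
            if text.startswith("---"):
                # keep only the YAML between the first two '---' delimiters
                frontmatter = text.split("---", 2)[1].strip()
                summaries.append(f"{skill_file}:\n{frontmatter}")
        return summaries
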
pests 3 days ago||
That is basically what it is tho.

One difference is the model might have been trained/fine-tuned to be better at "read all the front-matters from a given folder on your filesystem and then decide..." compared to a model with those instructions only in its context.

Also, does your method run scripts and code in any kind of sandbox or other containment or do you give it complete access to your system? #yolo

tacone 3 days ago||
Not my method really, just a comparison. I didn't know about the sandbox.

I see there might be advantages. The manual alternative could be tweaked further though. For example you might make it hierarchical.

Or you could create a "howTo" MCP with more advanced search capabilities (or a grandma MCP to ask for advice after a failure).

Interesting topic. I guess nobody has found a real best practice yet; everybody is still exploring.

shimman 3 days ago|||
Yes, I'm confused as well; it feels like it's still all prompting, which isn't new or different in the LLM space.
mbreese 3 days ago||
It's all just loading data into the context/conversation. Sometimes, as part of the chat response, the LLM will request that the client do something - read a file, call a tool, etc. The results of that end up back in the context as well.
fassssst 3 days ago||
Post training :)
mikaelaast 3 days ago||
Are we sure that unrestricted free-form Markdown content is the best configuration format for this kind of thing? I know there is a YAML frontmatter component to this, but doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process? I would like my agents to be inherently evaluable, and free-text instructions do not lend themselves easily to systematic evaluation.
coldtea 3 days ago||
>doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?

The non-deterministic statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted skills file or prompt format.

Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.

At best you could parse the more structured docs with external tools more easily, but that's about it, not much difference when it comes to their LLM consumption.

Etheryte 3 days ago|||
The modern state of the art is inherently not verifiable. Which way you give it input is really secondary to that fact. When you don't see weights or know anything else about the system, any idea of verifiability is an illusion.
mikaelaast 3 days ago|||
Sure. Verifiability is far-fetched. But say I want to produce a statistically significant evaluation result from this – essentially testing a piece of prose. How do I go about this, short of relying on a vague LLM-as-a-judge metric? What are the parameters?
visarga 3 days ago|||
You 100% need to test work done by AI. If it's code, it needs to pass extensive tests; if it's just a question answered, it needs to be the common conclusion of multiple independent agents. You can trust a single AI about as much as an HN or Reddit comment, but you can trust a committee of four like a real expert.

More generally I think testing AI by using its web search, code execution and ensembling is the missing ingredient to increased usage. We need to define the opposite of AI work - what validates it. This is hard, but once done you can trust the system and it becomes cheaper to change.
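
A rough sketch of the committee idea (ask_agent is a hypothetical call to one independent agent run):

    # Accept an answer only when a majority of independent runs agree.
    from collections import Counter

    def committee_answer(question: str, ask_agent, n: int = 4) -> str | None:
        answers = [ask_agent(question) for _ in range(n)]
        best, count = Counter(answers).most_common(1)[0]
        return best if count > n // 2 else None   # no consensus: don't trust it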

JamesSwift 3 days ago||||
How would you evaluate it if the agent were not a fuzzy logic machine?

The issue isn't the LLM, it's that verification is actually the hard part. In any case, it's typically called "evals" and you can probably craft a test harness to evaluate these if you think about it hard enough.

coldtea 3 days ago|||
Would a structured skills file format help you evaluate the results more?
mikaelaast 3 days ago||
Yes. It would make it much easier to evaluate results if the input contents were parameterized and normalized to some agreed-upon structure.

Not to mention the advantages it would present for iteration and improvement.

coldtea 3 days ago||
"if the input contents were parameterized and normalized to some agreed-upon structure"

Just the format would be. There's no rigid structure that gets any preferential treatment from the LLM, even if it did accept one. In the end it's just instructions that are no different in any way from the prompt text.

And nothing stops you from making something "parameterized and normalized to some agreed-upon structure" and passing it directly to the LLM as skills content, or parsing it and dumping it as regular skills text content.

hu3 3 days ago|||
At least MCPs can be unit tested.

With Skills, however, you just selectively append more text to the prompt and pray.

joshka 3 days ago|||
The DSPy + GEPA idea for this mentioned above[1] seems like it could be a reasonable approach for systematic evaluation of skills (not agents as a whole though). I'm going to give this a bit of a play over the holiday break to sort out a really good jj-vcs skill.

[1]: https://news.ycombinator.com/item?id=46338371

heliumtera 3 days ago||
Then rename your markdown skill files to skills.md.yaml.

There you go, you're welcome.

ollysb 3 days ago||
Given how precious the main context is, would it not make sense to have the skill index and skill runner live in a subagent? E.g. for "run this query against the dev db", the skills index subagent finds the db skill, runs the query, then returns the result to the main context.
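
A rough sketch of that flow (spawn_subagent and the prompt wording are hypothetical, not an existing Codex API):

    # Only the subagent pays the context cost of the skill index and skill
    # body; the main context just receives the final result.
    def run_with_skill(task: str, spawn_subagent) -> str:
        subagent_prompt = (
            "Read the frontmatter of every skill in ./skills, pick the one "
            "relevant to the task, follow it, and report only the result.\n"
            f"Task: {task}"
        )
        return spawn_subagent(subagent_prompt)

    # e.g. run_with_skill("run this query against the dev db", spawn_subagent)
    # returns the query output, not the skill text.
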
well_ackshually 3 days ago||
Ah, yes, simple text files that describe concepts, and that may contain references to other concepts, or references to dive in deeper. We could even call these something like a link. And they form a sort of... web, maybe ?

Close enough, welcome back index.htm, can't wait to see the first ads being served in my skills

username223 3 days ago|
Imagine SUBPROGRAMs that implement well-specified sequences of operations in a COmmon Business-Oriented Language, which can CALL each other. We are truly sipping rocket fuel.
zingar 2 days ago||
What is the advantage of skills over just calling code? From where I’m standing a Claude.md with a couple of examples of a particular bash script (examples and bash also written by Claude) is enough.
alexgotoi 3 days ago||
At any HR conference you go to, there are two overused words: AI and Skills.

As of this week, this also applies to Hacker News.

stared 3 days ago||
Yes! I was raving about Claude Skills a few days ago (vide https://quesma.com/blog/claude-skills-not-antigravity/), and I'm excited they've come to Codex as well!
derrida 3 days ago|
Thanks for that! You mentioned Antigravity seemed slow. I just started playing with it too (though I haven't really given it a good enough go to evaluate), but I had the model set to Gemini Flash; maybe you get a speedup if you do that?
stared 3 days ago||
My motivation was to use the smartest model available (overall, not only from Google) - I wanted to squeeze more out of Gemini 3 Pro than I could in Cursor. With new model releases there are usually outages and such; these things are ever changing.

That said, for many tasks (summaries and data extraction) I do use Gemini 2.5 Flash, as it's cheap and fast. So I'm excited to try Gemini 3 Flash as well.

rdli 3 days ago|
This is great. At my startup, we have a mix of Codex/CC users so having a common set of skills we can all use for building is exciting.

It's also interesting to see how, instead of a plan mode like CC's, Codex is implementing planning as a skill.

greymalik 3 days ago|
I’m probably missing it, but I don’t see how you can share skills across agents, other than maybe symlinking .claude/skills and .codex/skills to the same place?
rdli 3 days ago|||
Nothing super-fancy. We have a common GitHub repo in our org for skills, and everyone checks out the repo into their preferred setup locally.

(To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)
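
A minimal sketch of that kind of setup (the repo path and skills directories are assumptions about common defaults): one checkout, symlinked into both agents' skills locations:

    # One clone of the shared skills repo, linked into both agents' dirs.
    from pathlib import Path

    shared = Path.home() / "src" / "team-skills"          # the shared checkout
    for agent_dir in (Path.home() / ".claude" / "skills",
                      Path.home() / ".codex" / "skills"):
        if not agent_dir.exists():
            agent_dir.parent.mkdir(parents=True, exist_ok=True)
            agent_dir.symlink_to(shared, target_is_directory=True)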

hugh-avherald 3 days ago|||
Codex 5.2 automatically picked up my Claude agents' skills. I didn't prompt for it; it just so happened that, for what I asked, one of Claude's agents' prompts was useful, so Codex ran with it.