Posted by rochansinha 4 days ago

Skills Officially Comes to Codex (developers.openai.com)
300 points | 129 comments
pupppet 3 days ago|
How are skills different than tool/function calling?
esafak 3 days ago||
It's the catalog for the tools. Especially useful if you have custom tools; they expect the basics like grep and jq to be there.
mkagenius 3 days ago|||
You can somewhat achieve what Skills achieve via function calling.

I have this mental map:

Frontmatter <---> Name and arguments of the function

Text part of Skill md <---> description field of the function

Code part of the Skill <---> body of the function

But the function wouldn't look as organised as the .md; also, a Skill can have multiple function definitions.
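
For illustration, a rough sketch of that mapping (the skill text, field names, and schema below are made up for the example, not the actual Skill format or tool-calling spec):

    # Hypothetical skill; field names are illustrative only.
    skill_md = """\
    ---
    name: query_dev_db        # frontmatter ~ function name and arguments
    description: Run a read-only SQL query against the dev database.
    ---
    Use psql with the DEV_DB_URL env var. Only run SELECT statements.
    """

    # Roughly the same thing expressed as a function/tool definition.
    tool_definition = {
        "name": "query_dev_db",
        "description": "Run a read-only SQL query against the dev database.",
        "parameters": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    }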

jinushaun 3 days ago||
I agree. I don’t see how this is different from tool calling. We just put the tool instructions in a folder of markdown files.
yousif_123123 3 days ago||
It doesn't need to be describing a function. It could be explaining the skill in any way; it's kind of just more instructions and metadata to be loaded just in time rather than given to the model all at once.
mellosouls 3 days ago||
How can skills be monetised by creators?

Obviously they are empowering Codex and Claude etc, and many will be open source or free.

But for those who have commercial resources or tools to add to the skills choice, is there documentation for doing that smoothly, or a pathway to it?

I can see at least a couple of ways it might be done - skills requiring API keys or other authentication approaches - but this adds friction to an otherwise smooth skill integration process.

Having instead a transparent commission on usage sent to registered skill suppliers would be much cleaner but I'm not confident that would be offered fairly, and I've seen no guidance yet on plans in that regard.

nextaccountic 2 days ago||
Sometimes, monetization is just impossible. But if you insist, have the skill call an API that needs a token, and charge for the API.
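
A minimal sketch of that approach (the endpoint, env var, and token here are hypothetical): the skill's markdown tells the agent to run a helper script, and the script calls a metered API that only works with a paid key:

    # Hypothetical helper script shipped with a paid skill; billing happens
    # at the API behind the token, not in the markdown itself.
    import os
    import urllib.request

    API_URL = "https://api.example.com/v1/convert"   # made-up metered endpoint

    def call_paid_api(payload: bytes) -> bytes:
        token = os.environ["EXAMPLE_API_TOKEN"]      # key the user pays for
        req = urllib.request.Request(
            API_URL,
            data=payload,
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()
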
shrx 3 days ago||
How would you enforce DRM on a markdown file?
tacone 3 days ago||
I don't understand how skills are different from just instructing your model to read all the front-matters from a given folder on your filesystem and then decide whether it needs to read the file body.
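
Roughly that comparison as a sketch (the folder location and frontmatter delimiters are assumptions, not the documented behaviour):

    # The "manual" version: collect every skill's frontmatter so the model
    # can decide which file bodies are worth reading in full.
    from pathlib import Path

    SKILLS_DIR = Path.home() / ".codex" / "skills"   # assumed location

    def list_frontmatters(skills_dir: Path = SKILLS_DIR) -> list[str]:
        summaries = []
        for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
            text = skill_file.read_text(encoding="utf-8")
            if text.startswith("---"):
                # keep only the YAML between the first two '---' delimiters
                frontmatter = text.split("---", 2)[1].strip()
                summaries.append(f"{skill_file}:\n{frontmatter}")
        return summaries
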
pests 3 days ago||
That is basically what it is tho.

One difference is the model might have been trained/fine-tuned to be better at "read all the front-matters from a given folder on your filesystem and then decide..." compared to a model with those instructions only in its context.

Also, does your method run scripts and code in any kind of sandbox or other containment or do you give it complete access to your system? #yolo

tacone 3 days ago||
Not my method really, just a comparison. I didn't know about the sandbox.

I see there might be advantages. The manual alternative could be tweaked further though. For example you might make it hierarchical.

Or you could create a "howTo" MCP with more advanced search capabilities (or a grandma MCP to ask for advice after a failure).

Interesting topic. I guess nobody has found a real best practice yet; everybody is still exploring.

shimman 3 days ago|||
Yes, I'm confused as well; it feels like it's still all prompting, which isn't new or different in the LLM space.
mbreese 3 days ago||
It's all just loading data into the context/conversation. Sometimes, as part of the chat response, the LLM will request that the client do something - read a file, call a tool, etc. The results of that end up back in the context as well.
fassssst 3 days ago||
Post training :)
mikaelaast 3 days ago||
Are we sure that unrestricted free-form Markdown content is the best configuration format for this kind of thing? I know there is a YAML frontmatter component to this, but doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process? I would like my agents to be inherently evaluable, and free-text instructions do not lend themselves easily to systematic evaluation.
coldtea 3 days ago||
>doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?

The non-deterministic statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted skills file or prompt format.

Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.

At best you could parse the more structured docs with external tools more easily, but that's about it, not much difference when it comes to their LLM consumption.

Etheryte 3 days ago|||
The modern state of the art is inherently not verifiable. Which way you give it input is really secondary to that fact. When you don't see weights or know anything else about the system, any idea of verifiability is an illusion.
mikaelaast 3 days ago|||
Sure. Verifiability is far-fetched. But say I want to produce a statistically significant evaluation result from this – essentially testing a piece of prose. How do I go about this, short of relying on a vague LLM-as-a-judge metric? What are the parameters?
visarga 3 days ago|||
You 100% need to test work done by AI. If it's code, it needs to pass extensive tests; if it's just a question answered, it needs to be the common conclusion of multiple independent agents. You can trust a single AI about as much as an HN or Reddit comment, but you can trust a committee of four like a real expert.

More generally I think testing AI by using its web search, code execution and ensembling is the missing ingredient to increased usage. We need to define the opposite of AI work - what validates it. This is hard, but once done you can trust the system and it becomes cheaper to change.
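
A rough sketch of the committee idea (ask_agent is a hypothetical call to one independent agent run):

    # Accept an answer only when a majority of independent runs agree.
    from collections import Counter

    def committee_answer(question: str, ask_agent, n: int = 4) -> str | None:
        answers = [ask_agent(question) for _ in range(n)]
        best, count = Counter(answers).most_common(1)[0]
        return best if count > n // 2 else None   # no consensus: don't trust it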

JamesSwift 3 days ago||||
How would you evaluate it if the agent were not a fuzzy logic machine?

The issue isn't the LLM, it's that verification is actually the hard part. In any case, it's typically called "evals" and you can probably craft a test harness to evaluate these if you think about it hard enough.

coldtea 3 days ago|||
Would a structured skills file format help you evaluate the results more?
mikaelaast 3 days ago||
Yes. It would make it much easier to evaluate results if the input contents were parameterized and normalized to some agreed-upon structure.

Not to mention the advantages it would present for iteration and improvement.

coldtea 3 days ago||
"if the input contents were parameterized and normalized to some agreed-upon structure"

Just the format would be. There's no rigid structure that gets any preferential treatment from the LLM, even if it did accept one. In the end it's just instructions that are no different in any way from the prompt text.

And nothing stops you from making something "parameterized and normalized to some agreed-upon structure" and passing it directly to the LLM as skills content, or parsing it and dumping it as regular skills text content.

hu3 3 days ago|||
At least MCPs can be unit tested.

With Skills, however, you just selectively append more text to the prompt and pray.

joshka 3 days ago|||
The DSPy + GEPA idea for this mentioned above[1] seems like it could be a reasonable approach for systematic evaluation of skills (not agents as a whole though). I'm going to give this a bit of a play over the holiday break to sort out a really good jj-vcs skill.

[1]: https://news.ycombinator.com/item?id=46338371

heliumtera 3 days ago||
Then rename your markdown skill files to skills.md.yaml.

There you go, you're welcome.

ollysb 3 days ago||
Given how precious the main context is, would it not make sense to have the skill index and skill runner live in a subagent? E.g. for "run this query against the dev db", the skills index subagent finds the db skill, runs the query, then returns the result to the main context.
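
A rough sketch of that flow (spawn_subagent and the prompt wording are hypothetical, not an existing Codex API):

    # Only the subagent pays the context cost of the skill index and skill
    # body; the main context just receives the final result.
    def run_with_skill(task: str, spawn_subagent) -> str:
        subagent_prompt = (
            "Read the frontmatter of every skill in ./skills, pick the one "
            "relevant to the task, follow it, and report only the result.\n"
            f"Task: {task}"
        )
        return spawn_subagent(subagent_prompt)

    # e.g. run_with_skill("run this query against the dev db", spawn_subagent)
    # returns the query output, not the skill text.
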
well_ackshually 3 days ago||
Ah, yes, simple text files that describe concepts, and that may contain references to other concepts, or references to dive in deeper. We could even call these something like a link. And they form a sort of... web, maybe ?

Close enough, welcome back index.htm, can't wait to see the first ads being served in my skills

username223 3 days ago|
Imagine SUBPROGRAMs that implement well-specified sequences of operations in a COmmon Business-Oriented Language, which can CALL each other. We are truly sipping rocket fuel.
zingar 2 days ago||
What is the advantage of skills over just calling code? From where I’m standing a Claude.md with a couple of examples of a particular bash script (examples and bash also written by Claude) is enough.
alexgotoi 3 days ago||
At any HR conference you go to, there are two overused words: AI and Skills.

As of this week, this also applies to Hacker News.

stared 3 days ago||
Yes! I was raving about Claude Skills a few days ago (vide https://quesma.com/blog/claude-skills-not-antigravity/), and I'm excited they've come to Codex as well!
derrida 3 days ago|
Thanks for that! You mentioned Antigravity seemed slow. I just started playing with it too (though I haven't really given it a good enough go to evaluate), but I had the model set to Gemini Flash; maybe you get a speedup if you do that?
stared 3 days ago||
My motivation was to use the smartest model available (overall, not only from Google) - I wanted to squeeze more out of Gemini 3 Pro than I could in Cursor. With new model releases there are usually outages and such; these things are ever changing.

That said, for many tasks (summaries and data extraction) I do use Gemini 2.5 Flash, as it's cheap and fast. So I'm excited to try Gemini 3 Flash as well.

rdli 3 days ago|
This is great. At my startup, we have a mix of Codex/CC users so having a common set of skills we can all use for building is exciting.

It's also interesting to see how, instead of a plan mode like CC's, Codex is implementing planning as a skill.

greymalik 3 days ago|
I’m probably missing it, but I don’t see how you can share skills across agents, other than maybe symlinking .claude/skills and .codex/skills to the same place?
rdli 3 days ago|||
Nothing super-fancy. We have a common GitHub repo in our org for skills, and everyone checks out the repo into their preferred setup locally.

(To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)
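
A minimal sketch of that kind of setup (the repo path and skills directories are assumptions about common defaults): one checkout, symlinked into both agents' skills locations:

    # One clone of the shared skills repo, linked into both agents' dirs.
    from pathlib import Path

    shared = Path.home() / "src" / "team-skills"          # the shared checkout
    for agent_dir in (Path.home() / ".claude" / "skills",
                      Path.home() / ".codex" / "skills"):
        if not agent_dir.exists():
            agent_dir.parent.mkdir(parents=True, exist_ok=True)
            agent_dir.symlink_to(shared, target_is_directory=True)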

hugh-avherald 3 days ago|||
Codex 5.2 automatically picked up my Claude agents' skills. I didn't prompt for it; it just so happened that, for what I asked, one of Claude's agents' prompts was useful, so Codex ran with it.