Posted by rochansinha 4 days ago
I have this mental map:
Frontmatter <---> Name and arguments of the function
Text part of Skill md <---> description field of the function
Code part of the Skill <---> body of the function
But the function wouldn't look as organised as the .md file, and a Skill can also contain multiple function definitions.
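Roughly what I mean, sketched as a (hypothetical) Python function, just to make the mapping concrete; the skill name, parameters and wording below are all made up for illustration:

    # A hypothetical "summarize-pdf" skill expressed as a function,
    # following the mental map above (everything here is invented):

    def summarize_pdf(path: str, max_bullets: int = 5) -> list[str]:
        # frontmatter            <->  the name and parameters in this signature
        """Summarize a PDF file into a short list of bullet points."""
        # text part of SKILL.md  <->  the docstring/description above
        # code part of the skill <->  the body below (the bundled scripts)
        raise NotImplementedError("stand-in for the skill's bundled scripts")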
Obviously they are empowering Codex and Claude etc., and many will be open source or free.
But for those with commercial resources or tools to add to the pool of available skills, is there documentation for doing that smoothly, or a pathway to it?
I can see at least a couple of ways it might be done - skills requiring API keys or other authentication approaches - but this adds friction to an otherwise smooth skill integration process.
Having instead a transparent commission on usage sent to registered skill suppliers would be much cleaner, but I'm not confident that would be offered fairly, and I've seen no guidance yet on plans in that regard.
One difference is the model might have been trained/fine-tuned to be better at "read all the front-matters from a given folder on your filesystem and then decide..." compared to a model with those instructions only in its context.
Also, does your method run scripts and code in any kind of sandbox or other containment or do you give it complete access to your system? #yolo
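For the frontmatter-scanning part, here's a rough sketch of what that step might look like in a harness; the folder layout and the '---' fence handling are my guesses, not anything documented for Codex or Claude:

    # Read the YAML frontmatter of every */SKILL.md under a skills folder,
    # so the model can pick which skill to load in full.
    from pathlib import Path

    def read_frontmatters(skills_dir: str) -> dict[str, str]:
        """Map each skill folder name to the raw frontmatter of its SKILL.md."""
        frontmatters = {}
        for skill_md in Path(skills_dir).glob("*/SKILL.md"):
            text = skill_md.read_text(encoding="utf-8")
            if text.startswith("---"):
                # frontmatter sits between the first two '---' fences
                frontmatters[skill_md.parent.name] = text.split("---", 2)[1].strip()
        return frontmatters

Everything after that step (which skill to load fully, whether to run its scripts) is where the sandboxing question above bites.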
I see there might be advantages. The manual alternative could be tweaked further, though. For example, you might make it hierarchical.
Or you could create a "howTo" MCP with more advanced search capabilities. (Or a "grandma" MCP to ask for advice after a failure.)
Interesting topic. I guess nobody has found a real best practice yet; everybody is still exploring.
The non-deterministic, statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted skills file or prompt format.
Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.
At best you could parse the more structured docs with external tools more easily, but that's about it, not much difference when it comes to their LLM consumption.
More generally, I think testing AI output (using its web search, code execution and ensembling) is the missing ingredient for increased usage. We need to define the opposite of AI work: what validates it. This is hard, but once it's done you can trust the system, and it becomes cheaper to change.
The issue isn't the LLM, it's that verification is actually the hard part. In any case, it's typically called “evals”, and you can probably craft a test harness to evaluate these if you think about it hard enough.
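Something like this, as a minimal sketch: given a prompt, check whether the agent routes to the skill you expected. The cases and the choose_skill hook are made up, and a real harness would also score the output, not just the routing:

    # Tiny eval harness for skill selection; `choose_skill` is a hypothetical
    # callable, e.g. one that shows the model all frontmatters and asks it to pick.

    EVAL_CASES = [
        # (user prompt, skill the agent should select)
        ("turn this quarterly report PDF into bullet points", "summarize-pdf"),
        ("fetch the latest exchange rates and chart them", "currency-dashboard"),
    ]

    def run_evals(choose_skill) -> float:
        """Return the fraction of cases where the expected skill was selected."""
        passed = 0
        for prompt, expected in EVAL_CASES:
            chosen = choose_skill(prompt)
            status = "PASS" if chosen == expected else "FAIL"
            print(f"{status}: {prompt!r} -> {chosen}")
            passed += chosen == expected
        return passed / len(EVAL_CASES)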
Not to mention the advantages it would present for iteration and improvement.
Just the format would be. There's no rigid structure that gets any preferential treatment by the LLM, even if it did accept one. In the end it's just instructions that are no different in any way from the prompt text.
And nothing stops you from writing your skills in a format that's parameterized and normalized to some agreed-upon structure, then passing it directly to the LLM as the skill content, or parsing it and dumping it as the skill's regular text content.
With Skills, however, you just selectively append more text to the prompt and pray.
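To spell out the "agreed-upon structure" option: you could keep skills in a schema you can lint and validate with ordinary tools, then render them back to the plain text the model actually consumes. A rough sketch with an invented schema (not any official format):

    # Structured skill definition plus a renderer back to free-form skill text.
    from dataclasses import dataclass

    @dataclass
    class Skill:
        name: str
        description: str
        instructions: str

    def render_for_prompt(skill: Skill) -> str:
        """Dump the structured skill into the plain text the LLM sees."""
        return (
            f"---\nname: {skill.name}\ndescription: {skill.description}\n---\n"
            f"{skill.instructions}\n"
        )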
There you go, you're welcome.
Close enough; welcome back, index.htm. Can't wait to see the first ads being served in my skills.
As of this week, this also applies to Hacker News.
That said, for many tasks (summaries and data extraction) I do use Gemini 2.5 Flash, as it's cheap and fast. So excited to try Gemini 3 Flash as well.
It’s also interesting to see how, instead of a plan mode like CC's, Codex is implementing planning as a skill.
(To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)