Posted by simonw 3 days ago

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI (simonwillison.net)
577 points | 321 comments
brainless 3 days ago|
The skills approach is great for agents and LLMs, but I feel agents have to keep wider context and become more proactive in orchestration.

I have been running Claude Code with simple prompts (e.g. 1) to orchestrate opencode when I do large refactors. I have also tried generating orchestration scripts instead: generate a list of tasks at a high level, then have a script go task by task, create a small task-level prompt (using a good model), and pass the task to an agent (with a cheaper model). Keeping context low and focused has many benefits, and you can use cheaper models for simple, small, well-scoped tasks.
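
A rough sketch of that kind of loop (the tasks.txt format and the opencode invocation here are just illustrative, not my exact setup):

```python
# Hypothetical orchestration loop: a strong model writes the task list up front,
# then each small, well-scoped task is handed to a cheaper coding agent one at a time.
import subprocess
from pathlib import Path

tasks = [line.strip() for line in Path("tasks.txt").read_text().splitlines() if line.strip()]

for i, task in enumerate(tasks, start=1):
    prompt = (
        f"Task {i} of {len(tasks)}: {task}\n"
        "Work only on this task. Keep changes small and focused."
    )
    # Run the coding agent non-interactively and wait for it to finish
    subprocess.run(["opencode", "run", prompt], check=True)
    # A review step (a second agent or a test suite) could go here
```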

This brings me to skills. In my product, nocodo, I am building a heavier agent which keeps track of a project, past prompts, and the skills needed, and uses the right agents for the job. Agents are basically a mix of system prompt and tools, all selected on the fly. The user does not even have to generate or maintain skills docs; I can get them generated and maintained by high-quality models from the existing code in the project or the tasks at hand.

1 Example prompt I recently used: Please read GitHub issue #9. We have phases clearly marked. Analyze the work and codebase. Use opencode, which is a coding agent installed. Check `opencode --help` about how to run a prompt in non-interactive mode. Pass each phase to opencode, one phase at a time. Add extra context you think is needed to get the work done. Wait for opencode to finish, then review the work for the phase. Do not work on the files directly, use opencode

My product, nocodo: https://github.com/brainless/nocodo

bluedino 3 days ago||
> It took just over eleven minutes to produce this PDF,

Incredibly dumb question, but when they say this, what actually happens?

Is it using TeX? Is it producing output using the PDF file spec? Is there some print driver it's wired into?

simonw 3 days ago|
Visit this link and click on the "Thought for 11m38s" text: https://chatgpt.com/share/693ca54b-f770-8006-904b-9f31a58518... - that will show you exactly what it spent those 11 minutes doing, most of which was executing Python code using the reportlab library to generate PDF files, then visually inspecting those PDF files and deciding to make further tweaks to the code that generates them.
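
For a sense of what that looks like: reportlab builds the PDF programmatically, with no TeX or print driver involved. A trivial script in the style the model writes and runs (just an illustration, not the code from that session) might be:

```python
# Minimal reportlab example: draw text onto a page and write the PDF directly.
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

c = canvas.Canvas("report.pdf", pagesize=A4)
width, height = A4
c.setFont("Helvetica-Bold", 18)
c.drawString(72, height - 72, "Generated with reportlab")
c.setFont("Helvetica", 11)
c.drawString(72, height - 100, "No TeX or print driver involved; the PDF bytes are written directly.")
c.showPage()
c.save()
```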
8cvor6j844qw_d6 3 days ago||
Does this mean I can point to a code snippet and a link to the related documentation and have the coding agent refer to them instead of writing "outdated" code?

Some frameworks/languages move really fast unfortunately.

simonw 3 days ago|
Yes, definitely. I've already had a lot of success showing LLMs short examples of libraries that aren't covered by their core training data.
lexoj 3 days ago||
In this new world order, frameworks need to stop changing their APIs for marginal improvements in syntax.
canadiantim 3 days ago||
Can or should skills be used to manage documentation for a project's dependencies and capture expertise in them?

I’ve been playing with doing this, but it kind of doesn’t feel like the most natural fit.

cubefox 3 days ago||
(Minor grammar note: "OpenAI are" -- it should say "OpenAI is" -- because "OpenAI" is a name and therefore singular.)
simonw 3 days ago||
Apparently this is a British vs American English thing. I've decided to stay stubbornly British on this one.
Esophagus4 2 days ago||
Collective nouns!

They vary between British and American English. In this case, either would be acceptable depending on your dialect.

Also very noticeable with sports teams.

American: “Team Spain is going to the final.”

British: “Team Spain are going to the final.”

https://editorsmanual.com/articles/collective-nouns-singular...

cubefox 2 days ago||
I thought this was the same in all languages (my reference was German) because names are singular terms, even in British English, but apparently there are special rules.
Esophagus4 2 days ago||
Yeah, English is… a mess.

Blame it on a messy divorce a few hundred years ago :)

cubefox 2 days ago||
That would make sense if before that divorce, the British didn't use singular terms in plural constructions. Otherwise it must have to do with something else. It doesn't make much sense to me. For example, consider:

The traffic jam are expanding.

The forest are growing.

That's just like the OpenAI case.

robkop 3 days ago||
Hasn’t ChatGPT already supported skills under a different name for several months now, through “agent”?

Back then they gave it folders with instructions and executable files, iirc.

simonw 3 days ago|
Not quite the same thing. Implementing skills specifically means that you have code which, on session start, scans the skills/*/skill.md files, reads in their `description:` metadata, and loads that into the system prompt, along with an instruction that says "if the user asks about any of these particular things, go and read the skill.md file for further instructions".

Here's the prompt within Codex CLI that does that: https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd...

I extracted that into a Gist to make it easier to read: https://gist.github.com/simonw/25f2c3a9e350274bc2b76a79bc8ae...
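
A rough sketch of that scanning step, assuming skill.md files with YAML frontmatter carrying `name:` and `description:` keys (just an illustration, not the actual Codex CLI code):

```python
# Sketch: collect skill descriptions at session start and build a system prompt
# snippet telling the model when to go read the full skill.md file.
# Assumes PyYAML is available and each skills/<name>/skill.md starts with frontmatter.
from pathlib import Path
import yaml

def load_skill_summaries(root: str = "skills") -> str:
    lines = []
    for skill_file in Path(root).glob("*/skill.md"):
        text = skill_file.read_text()
        if text.startswith("---"):
            frontmatter = yaml.safe_load(text.split("---")[1])
            name = frontmatter.get("name", skill_file.parent.name)
            description = frontmatter.get("description", "")
            lines.append(f"- {name}: {description} (details: {skill_file})")
    return (
        "You have the following skills available. If the user's request matches one,\n"
        "read the listed skill.md file for full instructions before proceeding.\n"
        + "\n".join(lines)
    )
```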

robkop 3 days ago||
I remember you did some reverse engineering when they released agent. Does it not feel quite similar to you?

I know they didn’t dynamically scan for new skill folders, but they did have mentions of the existing folders (slides, docs, …) in the system prompt.

simonw 3 days ago||
The main similarity is that both of them take full advantage of the bash tool + file system combination.
Pooge 3 days ago||
Does anybody have examples of life-changing skills? I can't quite understand how they're useful, yet...
hadlock 3 days ago||
Giving the LLM access to Ghidra so it can directly read and iterate through the Sudoku puzzle that is decompiling binaries seems like a good one. Ghidra has a CLI mode and various bindings, so you can automate decompiling binaries. For example, right now if you want to isolate the physics step of Microsoft Flight Simulator 3.0, Codex will hold your hand and walk you through (over the course of 3-4 hours, using the GUI) finding the main loop and making educated guesses about which decompiled C functions in there are likely physics related, but it would be a lot easier to just give it the "Ghidra" skill and say, "isolate the physics engine and export it as a portable cargo package in Rust". If you're an NSA analyst, you can probably use it to disassemble and isolate interesting behavior in binaries from state actors a lot faster.
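
A skill's bundled helper could drive Ghidra's headless analyzer with something like this (paths, project names, and the post-script are made up for illustration):

```python
# Hypothetical helper a "Ghidra" skill might bundle: import a binary into a
# throwaway Ghidra project and run a post-script that dumps decompiled C for
# the agent to read. Assumes Ghidra's analyzeHeadless launcher is on PATH.
import subprocess

def decompile(binary: str, script: str = "ExportDecompiledC.py") -> None:
    subprocess.run(
        [
            "analyzeHeadless", "/tmp/ghidra-projects", "scratch",  # project dir + name
            "-import", binary,               # binary to analyze
            "-postScript", script,           # Ghidra script run after auto-analysis
            "-scriptPath", "./ghidra_scripts",
            "-deleteProject",                # don't keep the throwaway project around
        ],
        check=True,
    )

decompile("target.exe")
```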
noname120 3 days ago||
Do you have experience using Ghidra in such a way? I’m curious how well it actually performs on that use case.
hadlock 1 day ago||
Yes, I extracted the physics engine from MS Flight Simulator 3.0 (C) and ported it into my own project (Rust), going from never having opened Ghidra to working Rust code in just over three hours, as a complete novice. It helped a lot that I have previous experience writing similar software of my own, so I knew what to start looking for, and also MS FS 3.0 is only about 9,500 LOC, much of it graphics.

But yeah, Codex will totally hold your hand and teach you Ghidra if you have a few hours to spare and the barest grasp of assembly.

Adrig 2 days ago|||
I don't know about life-changing, but to me there are two major benefits that get me really interested:

- Augmenting the CLI with specific knowledge and processes: I love the ability to work on my files, but I can only call in a smart generalist to do the work. With skills, if I want, say, a design review, I can write down the process, what I'm looking for, and the design principles I want to highlight, rather than getting the average of every blog post about UX. I created custom gems/projects before (with PDFs of all my notes), but I couldn't replicate that on CLIs.

- A great way to build a library of prompts and build on it: in my org everyone is experimenting with AI, but it's hard to document and share good processes and tools. With this, the copywriters can work on a "tone of voice" skill, the UX writers can extend it with an "Interface microcopy" skill, and I can add both to my "design review" agent.

Veen 3 days ago|||
Small use case, but I'm using skills for analysing and scoring content, then producing charts. The LLM does the scoring, then calls a Python script bundled in the skill that makes a variety of PNG charts based on metrics passed in via command-line arguments. Claude presents the generated files for download. The skill.md file explains how to run the analysis and how to call the script, and with what options. That way you get very consistent charts, because they're generated programmatically, but you can still use the LLM for what it's good at.
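
A stripped-down version of that kind of bundled script (the metric names and flags are placeholders; the point is that the LLM only passes numbers on the command line and the chart itself is deterministic):

```python
# Hypothetical chart script bundled in a skill: the LLM scores the content,
# then calls this with the scores so the PNG output is always consistent.
# Usage as a skill.md might document it:
#   python make_chart.py --out scores.png --clarity 7 --accuracy 9 --tone 6
import argparse
import matplotlib
matplotlib.use("Agg")  # no display needed when run by an agent
import matplotlib.pyplot as plt

parser = argparse.ArgumentParser()
parser.add_argument("--out", required=True)
parser.add_argument("--clarity", type=float, required=True)
parser.add_argument("--accuracy", type=float, required=True)
parser.add_argument("--tone", type=float, required=True)
args = parser.parse_args()

labels = ["Clarity", "Accuracy", "Tone"]
values = [args.clarity, args.accuracy, args.tone]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(labels, values)
ax.set_ylim(0, 10)
ax.set_title("Content scores")
fig.savefig(args.out, dpi=150, bbox_inches="tight")
```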
simonw 2 days ago|||
The best examples I've seen are still the ones built into ChatGPT and Claude to improve their abilities to edit documents.

The Claude frontend-design skill seems pretty good too for getting better HTML+CSS: https://github.com/anthropics/skills/blob/main/skills/fronte...

sunaookami 3 days ago||
I have made a skill that uses Playwright to control Chrome, together with functionality to extract HTML, execute JS, click things and, most importantly, log full network requests. It's a blessing for reverse-engineering and making userscripts.
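
A minimal sketch of that idea using Playwright's Python sync API (simplified, not the actual skill):

```python
# Open a page in Chromium, log every network request and response, and dump
# the rendered HTML for the agent to inspect.
from pathlib import Path
from playwright.sync_api import sync_playwright

def inspect(url: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Log full request/response traffic as it happens
        page.on("request", lambda req: print("->", req.method, req.url))
        page.on("response", lambda res: print("<-", res.status, res.url))
        page.goto(url, wait_until="networkidle")
        Path("page.html").write_text(page.content())  # HTML after JS has run
        browser.close()

inspect("https://example.com")
```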
j45 3 days ago||
Something important to keep in mind is that the way skills work in different tools shouldn't be assumed to be the same.
taw1285 3 days ago||
Curious if anyone has applied this "skills" mindset to how they build tool calls for their LLM agent applications?

Say I have a CMS (I use a thin layer of the Vercel AI SDK) and I want to let users interact with it via chat: tag a blog post, add an entry, etc. Should those be organized into discrete skill units like that? And how do we go about adding progressive discovery?
