Posted by meetpateltech 6 hours ago

GPT‑5.3‑Codex‑Spark (openai.com)
428 points | 190 comments
pdeva1 6 hours ago|
This seems closer to 5.1 mini and is tied to the Pro account. GLM 4.7 is available on-demand on Cerebras today [1] and performs better for less money... [1] https://www.cerebras.ai/blog/glm-4-7
ehzb2827 5 hours ago|
GLM 4.7 scores 41.0% on Terminal Bench 2.0 [1] compared to 58.4% for GPT-5.3-Codex-Spark [2].

[1] https://z.ai/blog/glm-4.7 [2] https://openai.com/index/introducing-gpt-5-3-codex-spark/

mbm 2 hours ago||
Works pretty well as a general-purpose computer. The speed is really enjoyable; it could actually replace some of my Claude Code use. For coding, set it to xhigh and use it for personal tools or small projects.

Example repo that Codex with Spark made for me in about 15 minutes, since `claude --resume` has been finicky lately: https://github.com/mzxrai/claude-sessions

jbellis 2 hours ago||
really too bad that the Codex models are so tightly coupled to the Codex harness as to be useless for everything else

edit: not useless in an absolute sense, but worse than the vanilla GPT models

thehamkercat 1 hour ago|
GPT-5.2-codex or 5.3-codex works pretty well for me in opencode
ttul 3 hours ago||
Great move by OpenAI. With coding agents, if you have access to a fast and cheap model, you can afford to let it rip, making lots of mistakes and iterating until it gets things right. With the right scaffolding (AGENTS.md, SKILLS.md, etc.), a fast and light model can do great things. And when it's done, you can still have the heavyweight model come in to clean up any messes.
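A rough sketch of that fast-draft/heavy-review loop, assuming the OpenAI Python SDK's Responses API; the model names and prompts below are placeholders, not anything OpenAI documents:

```python
# Sketch: let a cheap, fast model iterate on a task, then have a
# heavyweight model do one expensive cleanup pass at the end.
from openai import OpenAI

client = OpenAI()

FAST_MODEL = "gpt-5.3-codex-spark"  # assumed model name
HEAVY_MODEL = "gpt-5.3-codex"       # assumed model name

def draft_and_review(task: str, iterations: int = 3) -> str:
    draft = ""
    for _ in range(iterations):
        # Cheap, fast iterations: the draft is refined each round.
        draft = client.responses.create(
            model=FAST_MODEL,
            input=f"Task: {task}\n\nCurrent draft:\n{draft}\n\nImprove the draft.",
        ).output_text
    # One slow, expensive pass to catch whatever the fast model missed.
    return client.responses.create(
        model=HEAVY_MODEL,
        input=f"Review and fix this solution to '{task}':\n\n{draft}",
    ).output_text
```

The shape is the point: many cheap iterations first, one expensive pass last.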
antirez 6 hours ago||
The search for speed is vain. Often Claude Code Opus 4.6, on hard enough problems, can give the impression of acting fast without really making progress, for lack of focus on what matters. Then you spin up the much slower GPT 5.3-Codex and it fixes everything in 3 minutes by doing the right thing.
mickeyp 6 hours ago||
I disagree. This is great for bulk tasks: renaming, finding and searching for things, etc.
ghosty141 2 hours ago||
What Codex often does for this is write a small Python script and execute it, to bulk rename for example; something like the sketch below.
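A minimal sketch of that kind of throwaway script; the directory, glob pattern, and naming scheme are made up for illustration:

```python
# Rename every .txt file in a directory to a numbered scheme:
# note_000.txt, note_001.txt, ...
from pathlib import Path

def bulk_rename(directory: str, prefix: str = "note") -> None:
    # sorted(...) materializes the file list before any renames happen.
    for i, path in enumerate(sorted(Path(directory).glob("*.txt"))):
        path.rename(path.with_name(f"{prefix}_{i:03d}{path.suffix}"))

if __name__ == "__main__":
    bulk_rename("./docs")  # hypothetical target directory
```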

I agree that there is a use for fast "simpler" models; there are many tasks where the regular codex-5.3 is not necessary. But I think it's rarely worth the extra friction of switching from regular 5.3 to 5.3-spark.

Aurornis 5 hours ago|||
I will always take more speed. My use of LLMs always comes back to me doing something manually, from reviewing code to testing it to changing direction. The faster I can get the LLM part of the back-and-forth to complete, the more I can stay focused on my part.
jusgu 5 hours ago||
Disagree. While intelligence is important, speed is especially important when productionizing AI. It's difficult to formalize the gain in user experience per increase in TPS, but it most definitely exists.
capevace 5 hours ago||
Seems like the industry is moving further toward low-latency/high-speed models for direct interaction, and slow, long-thinking models for longer tasks / deeper thinking.

Quick/instant LLMs for human use (think UI); slow, deep-thinking LLMs for autonomous agents.

gaigalas 5 hours ago||
You always want faster feedback. If it's not a human leveraging the fast cycles, it's another automated system (e.g. CI).

Slow, deep tasks are mostly for flashy one-shot demos that have little to no practical use in the real world.

foobar10000 4 hours ago||
I mean, yes, one always does want faster feedback; can't argue with that!

But some of the longer stuff, automating kernel fusion and the like, is just a hard problem. And a small model (or even most bigger ones) will not get the direction right…

gaigalas 3 hours ago||
From my experience, larger models also fail to get the direction right a surprising number of times. You just take longer to notice when it happens, or you start being defensive (over-speccing) to account for the longer waits. Even the simplest task can appear "hard" with that over-specced approach (like building a React app).

Iterating with a faster model is, from my perspective, the superior approach. Whatever the task's complexity, the quick feedback more than compensates.

varispeed 5 hours ago||
Are they really thinking, or are they just sprinkled with Sleep(x)?
storus 4 hours ago||
Anyone using OpenClaw to manage a bunch of coding agents so that you only set the high-level vision and leave all the prompting, testing, debugging, forking to agents? If yes, how did you glue it all together? Are you using local models? What is the SOTA for what I can run locally with a 512GB M3 Ultra, 2x DGX Spark, 2x RTX Pro 6000 Max-Q in one machine and 1x RTX Pro 6000 WS in another machine?
OsrsNeedsf2P 6 hours ago|
No hint on pricing. I'm curious whether faster is more expensive, given the slight trade-off in accuracy.
sauwan 5 hours ago|
It's either more expensive or dumber.
kristianp 28 minutes ago||
It will be more expensive because it's running on more expensive hardware, Cerebras. Does it also need to be smaller to fit on a single Cerebras node?