Posted by meetpateltech 11 hours ago

GPT-5.3-Codex (openai.com)
1109 points | 424 comments
ffitch 10 hours ago|
> our team was blown away by how much Codex was able to accelerate its own development

they forgot to add “Can’t wait to see what you do with it”

kingstnap 11 hours ago||
> GPT‑5.3-Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems. We are grateful to NVIDIA for their partnership.

This is hilarious lol

uh_uh 11 hours ago|
How so?
Philpax 11 hours ago|||
They're on shaky ground right now https://arstechnica.com/information-technology/2026/02/five-...
kingstnap 10 hours ago|||
It's kind of a suck-up that more or less confirms the beef stories that were floating around this past week.

In case you missed it. For example:

Nvidia's $100 billion OpenAI deal has seemingly vanished - Ars Technica

https://arstechnica.com/information-technology/2026/02/five-...

Specifically this paragraph is what I find hilarious.

> According to the report, the issue became apparent in OpenAI’s Codex, an AI code-generation tool. OpenAI staff reportedly attributed some of Codex’s performance limitations to Nvidia’s GPU-based hardware.

dajonker 9 hours ago|||
There was never a $100 billion deal. Only a letter of intent which doesn't mean anything contractually.
esafak 10 hours ago|||
> OpenAI staff reportedly attributed some of Codex’s performance limitations to Nvidia’s GPU-based hardware.

They should design their own hardware, then. Somehow the other companies seem to be able to produce fast-enough models.

textlapse 7 hours ago||
I would love to see a nutrition-facts label showing how many prompts, what % of code, and what ratio of human involvement it took to use these models to develop their latest models, across the various parts of their systems.
farazbabar 4 hours ago||
I have held back from answering comments that ask for proof of real work/productivity gains, because everyone works differently, has different skill levels, and frankly not everyone is working on world-changing stuff. I really liked a comment someone made a few of these posts ago: these models are amazing! amazing! if you don't actually need them, but if you actually do need them, you are going to find yourself in a world of hurt. I cannot agree more. I (believe I) am a good software engineer; I have developed some interesting pieces of software over the decades, and usually when I got passionate about a project I could do really interesting things within weeks, sometimes months.

I will say this: I am working on some really cool stuff, stuff I cannot tell you about, or else. And what used to take me months now takes days, and what used to take weeks takes hours. I still review everything. I understand all the gotchas of distributed systems, performance, latency/throughput, C, Java, SQL, data and infra costs; I get all of it, so I am able to catch these mofos when they are about to stab me in the back. But man, my productivity is through the roof, and I am loving it. Just so I can stop saying "I can't tell you what I'm working on," I will start something I can share soon (as soon as decades of pent-up work is done; it's probably less than a few months away!).

Take it with a grain of salt, and know this: these things are not your friends. They WILL stab you in the back when you least expect it, cut a corner, take a shortcut, so you have to be the PHB (Dilbert reference!) with actual experience to catch them slacking. Good luck.
modeless 10 hours ago||
It's so difficult to compare these models because they're not running the same set of evals. I think literally the only eval variant that was reported for both Opus 4.6 and GPT-5.3-Codex is Terminal-Bench 2.0, with Opus 4.6 at 65.4% and GPT-5.3-Codex at 77.3%. None of the other evals were identical, so the numbers for them are not comparable.
alexhans 10 hours ago||
Isn't the best eval the one you build yourself, for your own use cases and value production?

I encourage people to try. You can even timebox it and come up with some simple things that might initially look insufficient, but that discomfort is actually a sign that there's something there. It's very similar to going from having no unit/integration tests for design or regression to starting to have them.

rsanek 10 hours ago|||
I usually wait to see what ArtificialAnalysis says for a direct comparison.
input_sh 10 hours ago||
It's better on a benchmark I've never heard of!? That is groundbreaking, I'm switching immediately!
modeless 10 hours ago||
I also wasn't that familiar with it, but the Opus 4.6 announcement leaned pretty heavily on the Terminal-Bench 2.0 score to quantify how much of an improvement it was for coding, so it looks pretty bad for Anthropic that OpenAI beat them on that specific benchmark so soundly.

Looking at the Opus model card I see that they also have by far the highest score for a single model on ARC-AGI-2. I wonder why they didn't advertise that.

input_sh 10 hours ago||
No way! Must be a coinkydink, no way OpenAI knew ahead of time that Anthropic was gonna put a focus on that specific useless benchmark as opposed to all the other useless benchmarks!?

I'm firing 10 people now instead of 5!

dawidg81 10 hours ago||
May AI not write the code for me.

May I at least understand what it has "written". AI help is good, but it shouldn't replace real programmers completely. I've had enough of copy-pasting code I don't understand. What if AI falls over one day and there are no real programmers left to write the software? AI as a helper is good, but I don't want AI writing whole files in my project; then something breaks and I won't know what's broken. I've experienced it many times already. I told the AI to write something for me and the code didn't work at all: it compiled normally, but the program was bugged. Or when I built a bigger project with ChatGPT alone, it mostly worked, but over time, as I was prompting more and more things, everything broke.

katspaugh 10 hours ago||
Honest question: have you tried evolving your code architecture when adding features instead of just "prompting more and more things"?
dawidg81 8 hours ago||
I've tried that too, but it was much the same: ChatGPT kept forgetting things about the code and project structure. In short, AI gets problematic for me and I run into trouble with it. This is one of the reasons I still prefer a traditional text editor like Vim for writing code over "software on steroids" like VS Code and the like...
pixl97 9 hours ago|||
> What if one day AI will fall down and there will be no real programmers to write the software.

What if you want to write something very complex now that most people don't understand? You keep offering more money until someone takes the time to learn it and accomplish it, or you give up.

I mean, there are still people that hammer out horseshoes over a hot fire. You can get anything you're willing to pay money for.

nubg 8 hours ago||
Sorry, but companies will not hire you; they'll hire the person who learned how to code with AI. Get with the times or lose.
dawidg81 8 hours ago|||
I'm afraid of the whole modern world, especially in technology. I guess if I "came back" now to all of these modern new things: the commercialized world, AI, corporations, etc., my head would explode. I can't imagine living in such a world. I'm not sure I would be alright with myself in all of it. This is just too much...
cheeze 8 hours ago|||
It's that Austin Powers clip of the guy slowly getting smooshed by the steam roller.
koolala 8 hours ago||
I want to recompile a Rust project to be f32 instead of f64.

Am I better off buying 1 month of Codex, Claude, or Antigravity?

I want to have the agent continuously recompile and fix compile errors in a loop until all the bugs from switching to f32 are gone.

azuanrb 39 minutes ago||
Codex by a mile. Also, there are double rate limits until April, so you're paying for 1 month and getting 2 months' usage.
TuxSH 6 hours ago|||
If I'm not mistaken Codex is free until April 2nd with the previous generous rate limits (while paying customers get 2x).
vatsachak 8 hours ago|||
Literally just find and replace
koolala 6 hours ago||
Find and replace is step 1; it generates all the compile errors I want it to loop through.

I want to do it on an entire programming language written in Rust: https://github.com/uiua-lang/uiua

Because there are no float32 array languages in existence today.
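For what it's worth, one way to make that kind of switch mechanical is to route the float width through a single type alias; this is a hypothetical sketch (not how uiua is actually structured, and `Num`/`mean` are made-up names), but it shows the pattern:

```rust
// Hypothetical sketch: funnel the crate's float width through one
// alias so an f64 <-> f32 switch becomes a one-line change instead
// of a crate-wide find and replace. Names here are illustrative.
type Num = f32; // flip back to f64 to restore the old behavior

fn mean(xs: &[Num]) -> Num {
    if xs.is_empty() {
        return 0.0;
    }
    let sum: Num = xs.iter().copied().sum();
    sum / xs.len() as Num
}

fn main() {
    let samples: Vec<Num> = vec![1.0, 2.0, 4.0];
    println!("mean = {}", mean(&samples));
}
```

A plain find and replace still misses literal suffixes (`1.0f64`), trait bounds, and FFI signatures, which is exactly where the compile-error loop earns its keep.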

xyzsparetimexyz 5 hours ago||
Why do you want a float32 array language? Anyway, the free GLM-4.6 model that opencode defaults to should be fine. Why pay for something to do this?
koolala 4 hours ago||
I want to use an array language for real-time 3D. Float32 is faster for real-time calculations, and the memory can map directly to the GPU, since 3D graphics runtimes are limited to float32.
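On the direct-to-GPU point: an `&[f32]` slice can be reinterpreted as raw bytes for a vertex-buffer upload with no conversion pass, whereas f64 data would first need a lossy converting copy. A minimal sketch (the `as_bytes` helper is made up for illustration; in practice a crate like bytemuck does this safely):

```rust
// f32 data can be viewed as raw bytes and handed to a GPU API as-is;
// f64 data would first require a lossy converting copy.
fn as_bytes(v: &[f32]) -> &[u8] {
    // SAFETY: f32 has no padding bytes and u8 has no validity
    // requirements, so this reinterpretation is sound.
    unsafe {
        std::slice::from_raw_parts(
            v.as_ptr() as *const u8,
            v.len() * std::mem::size_of::<f32>(),
        )
    }
}

fn main() {
    let vertices: Vec<f32> = vec![0.0, 1.0, 0.5];
    let bytes = as_bytes(&vertices);
    println!("{} floats -> {} bytes", vertices.len(), bytes.len());
}
```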
argsnd 8 hours ago|||
All of them can do it but Codex has the least frustrating usage limits.
koolala 6 hours ago||
When using it in VS Code? The browser system running its own container seems like it would be the most demanding on their resources. The standalone client is Mac-only, but I don't know if it makes a difference.

My goal is to do it within the usage I get from a $20 monthly plan.

energy123 2 hours ago||
Why would you use it in VS Code?

OpenAI is offering double the normal usage limits for Codex for two months. Go with them and use it in the terminal, or the macOS Codex app if you have a Mac.

koolala 1 hour ago||
Is it different to use it in the terminal vs. VS Code? I don't have a Mac.
energy123 35 minutes ago||
Sorry, I wasn't aware it's available in VS Code. Scratch my suggestion, then.
koolala 22 minutes ago||
It is confusing especially when token efficiency is on the line.
EmilStenstrom 8 hours ago||
Doesn't matter which one. All of them can do things like this now, given a good enough feedback loop. Which your problem has.
sidgarimella 6 hours ago||
Many are saying Codex is more interactive, but ironically I think that very interactivity/determinism works best when using Codex remotely as a cloud agent in highly async cases. Conversely, I find Opus great locally, where I can ram messages into it to best leverage its autonomy (and interrupt/clean up).
prng2021 10 hours ago||
Did they post the knowledge cutoff date somewhere?
jacekm 6 hours ago|
It's here: https://platform.claude.com/docs/en/about-claude/models/over...

Reliable knowledge cutoff: May 2025, training data cutoff: August 2025

brikym 1 hour ago||
This is the thread for GPT 5.3
vatsachak 8 hours ago|
AI-designed websites are so easy to spot that I need to actively design my UI so it doesn't look AI-generated.