
Posted by meetpateltech 9 hours ago

GPT-5.3-Codex(openai.com)
1041 points | 397 comments | page 2
tombert 5 hours ago|
Actually kind of excited for this. I've been using 5.2 for a while now, and it's already pretty impressive if you set the context window to "high".

Something I have been experimenting with is AI-assisted proofs. Right now I've been playing with TLAPS to help write some more comprehensive correctness proofs for a thing I've been building, and 5.2 didn't seem quite up to it; I was able to figure out proofs on my own a bit better than it was, even when I would tell it to keep trying until it got it right.

I'm excited to see if 5.3 fares a bit better; if I can get mechanized proofs working, then Fields Medal here I come!
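For anyone curious what "mechanized proof" means here, a toy example of the flavor, written in Lean rather than TLAPS since it's more compact (the theorem is a textbook warm-up, not anything from my project; real TLAPS obligations for a protocol are far hairier):

```lean
-- A machine-checked proof that 0 + n = n, by induction on n.
-- Lean's standard library already has this as Nat.zero_add;
-- we redo it from scratch purely for illustration.
theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                           -- base case: 0 + 0 = 0 by computation
  | succ k ih => simp [Nat.add_succ, ih]  -- step: 0 + (k+1) = (0 + k) + 1 = k + 1
```

The appeal of handing this to a model is exactly that the proof assistant checks every step, so a hallucinated proof simply fails to compile.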

EnPissant 2 hours ago|
"High" the the reasoning level. The context window never changes.
tombert 2 hours ago||
You're right! Still learning the details of this agentic stuff; I was pretty late to the party.
RivieraKid 6 hours ago||
Do software engineers here feel threatened by this? I certainly am. I'm surprised that this topic is almost entirely missing in these threads.
AstroBen 4 hours ago||
No. It turns into a complete mess without someone that knows what they're doing to steer it. It's an upgrade to autocomplete
energy123 1 hour ago||
Unless you're retiring in less than 5 years this is extremely short sighted.
lurking_swe 19 minutes ago|||
It’s also silly to try predicting the future 5 years from now, IMO. Historically progress is very unpredictable. It often plateaus when you least expect it.

It’s good to be cautious and not in denial, but I usually ignore people who talk so authoritatively about the future. It’s just a waste of time. Everyone thinks they are right.

My recommendation is have a very generous emergency fund and do your best to be effective at work. That’s the only thing you can control and the only thing that matters.

AstroBen 15 minutes ago|||
What would things look like to make someone with currently ~10 years of experience unemployable?

It's possible the job might change drastically, but I'm struggling to think of any scenario that doesn't also put most white collar professions out of work alongside me, and I don't think that's worth worrying about

llmslave 6 hours ago|||
There's a lot of denial, and people who haven't taken a serious look at the AI models.
vatsachak 6 hours ago|||
AI is mostly garbage at creating useful abstractions. I'd feel threatened if I was a competitive programmer or IMO kid
OsrsNeedsf2P 6 hours ago|||
I would feel threatened if I didn't invest in learning how to best use AI
ReptileMan 6 hours ago|||
Jevons paradox hints that the situation is not as bleak as it sounds.
svachalek 3 hours ago||
I've been in this profession for 32 years now and this is my experience. Every time coding gets easier or cheaper, the response is first to lay off developers but quickly the demand for more software spikes and they need everyone back and more than ever.

When we achieve true AGI we're truly cooked, but by definition of AGI it won't just be software developers, it will be everyone else too. Still, the last people in the building before they turn the lights out for good will be the software developers.

worldsavior 6 hours ago||
No. AI does not work well enough; you still need a person to look at it and CODE. It probably never will, until AGI, which in my opinion will also never come.
dude250711 5 hours ago||
It's a super-special AI tier that can replace developers and other grunts, yet somehow cannot replace managers and C-suite.

It can only replace whoever is not writing a fat cheque to it.

trilogic 9 hours ago||
When two multi-billion-dollar giants advertise on the same day, it is not competition but rather a sign of struggle and survival. With all the power of the "best artificial intelligence" at your disposal, plus plenty of capital and all the brilliant minds, THIS IS WHAT YOU COULD COME UP WITH?

Interesting

sdf2erf 9 hours ago||
Yeah they are both fighting for survival. No surprise really.

Need to keep the hype going if they are both IPO'ing later this year.

thethimble 8 hours ago|||
The AI market is an infinite sum market.

Consider the fact that 7-year-old TPUs are still sitting at near 100% utilization today.

superze 8 hours ago|||
How many IPOs can a company really do?
re-thc 8 hours ago||
As many as they want. They can "spin off" and then "merge" again.
rishabhaiover 9 hours ago|||
What happened to you?
raincole 9 hours ago||
AI fried brains, unfortunately.
wasmainiac 9 hours ago||
I mean, he has a point it’s just not very eloquently written.
trilogic 8 hours ago||
I empathize with the situation, no elegance from them, no eloquence from me :)
lossolo 9 hours ago|||
What's funny is that most of this "progress" is new datasets + post-training shaping the model's behavior (instruction + preference tuning). There is no moat besides that.
Davidzheng 9 hours ago|||
"post-training shaping the models behavior" it seems from your wording that you find it not that dramatic. I rather find the fact that RL on novel environments providing steady improvements after base-model an incredibly bullish signal on future AI improvements. I also believe that the capability increase are transferring to other domains (or at least covers enough domains) that it represents a real rise in intelligence in the human sense (when measured in capabilities - not necessarily innate learning ability)
CuriouslyC 7 hours ago||
What evidence do you base your opinions on capability transfer on?
WarmWash 8 hours ago||||
>There is no moat besides that.

Compute.

Google didn't announce $185 billion in capex to do cataloguing and flash cards.

causalmodels 8 hours ago||
Google didn't buy 30% of Anthropic to starve them of compute
WarmWash 8 hours ago||
Probably why it's selling them TPUs.
riku_iki 6 hours ago|||
> is new datasets + post-training shaping the model's behavior (instruction + preference tuning). There is no moat besides that.

Sure, but acquiring/generating/curating that much high-quality data is still a significant moat.

dllrr 7 hours ago||
Using Opus 4.6 in Claude Code right now. It's taking about 5x longer to think things through, if not more.
andyferris 6 hours ago|
The notes explicitly call out that you may want to dial the effort setting back to medium to reduce latency/tokens (high is the default, and apparently there is a max setting too).
mmaunder 4 hours ago||
Take a screenshot of the ARC-AGI-2 leaderboard now, because GPT-5.3-Codex isn't up there yet and I suspect it'll cram down Claude Opus 4.6, which rules the roost for the next few hours. King for a day.
farazbabar 2 hours ago||
I have held back from answering comments that ask for proof of real work/productivity gains, because everyone works differently, has different skill levels, and frankly not everyone is working on world-changing stuff. I really liked a comment someone made a few of these posts ago: these models are amazing! amazing! if you don't actually need them, but if you actually do need them, you are going to find yourself in a world of hurt. I cannot agree more.

I (believe I) am a good software engineer. I have developed some interesting pieces of software over the decades, and usually when I got passionate about a project, I could do really interesting things within weeks, sometimes months. I will say this: I am working on some really cool stuff, stuff I cannot tell you about, or else. And my velocity: what used to take months now takes days, and what used to take weeks takes hours. I still review everything. I understand all the gotchas of distributed systems, performance, latency/throughput, C, Java, SQL, data and infra costs. I get all of it, so I am able to catch these mofos when they are about to stab me in the back, but man, my productivity is through the roof. And I am loving it.

Just so I can avoid saying "I cannot tell you what I am working on," I will start something that I can share soon (as soon as decades of pent-up work is done; it's probably less than a few months away!). Take it with a grain of salt, and know this: these things are not your friends. They WILL stab you in the back when you least expect it, cut a corner, take a shortcut, so you have to be the PHB (Dilbert reference!) with actual experience to catch them slacking. Good luck.
morleytj 9 hours ago||
The behind-the-scenes process of deciding when to release these models has got to be insanely stressful if they're coming out within 30 minutes or so of each other.
meisel 8 hours ago||
I wonder if their "5.3" was being continuously updated, with benchmarks regenerated after each improvement, and they just stayed ready to release whenever Claude did.
morleytj 7 hours ago||
This seems plausible. It would be shocking if these companies didn't have an automated testing suite that recomputes these benchmarks on a regular basis and uploads the results to a dashboard for supervision.

Given that the language and marketing materials were already pre-approved, there's no real reason they couldn't just leave everything lined up, one function call away from going live once the key players make the call.

Havoc 8 hours ago||
It’s also practically impossible without some sort of insider knowledge or coordination.
morleytj 8 hours ago||
Could be, could also be situations where things are lined up to launch in the near future and then a mad dash happens upon receiving outside news of another launch happening.

I suppose coincidences happen too but that just seems too unlikely to believe honestly. Some sort of knowledge leakage does seem like the most likely reason.

gallerdude 7 hours ago||
Both Opus 4.6 and GPT-5.3 one shot a Gameboy emulator for me. Guess I need a better benchmark.
well_ackshually 5 hours ago||
There are hundreds of Game Boy emulators available on GitHub that they've been trained on. It's quite literally the simplest piece of emulation you could do. The fact that they couldn't do it before is an indictment of how shit they were, but a Game Boy emulator should be a weekend project for anyone even ever so slightly qualified. Your benchmark was awful to begin with.
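For scale: the skeleton really is just a fetch-decode-execute loop. A minimal Python sketch, handling only four opcodes (the opcode values match the real LR35902 instruction set; graphics, timers, interrupts, and the other ~500 opcodes that make a real emulator hard are all omitted):

```python
# Toy fetch-decode-execute core, illustrative only.
class CPU:
    def __init__(self, rom: bytes):
        # Load the ROM at address 0 and pad out a flat 64 KiB address space.
        self.mem = bytearray(rom) + bytearray(0x10000 - len(rom))
        self.pc = 0          # program counter
        self.a = 0           # accumulator register
        self.halted = False

    def step(self):
        op = self.mem[self.pc]   # fetch
        self.pc += 1
        if op == 0x00:           # NOP
            pass
        elif op == 0x3E:         # LD A, d8 -- load immediate byte into A
            self.a = self.mem[self.pc]
            self.pc += 1
        elif op == 0x3C:         # INC A (wrap at 8 bits)
            self.a = (self.a + 1) & 0xFF
        elif op == 0x76:         # HALT
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {op:#04x}")

# Tiny program: LD A, 0x41; INC A; HALT
cpu = CPU(bytes([0x3E, 0x41, 0x3C, 0x76]))
while not cpu.halted:
    cpu.step()
print(hex(cpu.a))  # 0x42
```

Fleshing this out to the full opcode table is tedious rather than deep, which is the point being argued either way in this thread.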
nasreddin 1 hour ago|||
"a gameboy emulator should be a weekend project for anyone even ever so slightly qualified" do you really believe something so ridiculous?
plantain 2 hours ago|||
Your expectations are wild. Most software engineers could not write a Game Boy emulator, and now you need zero programming skills whatsoever to write one.
paxys 7 hours ago|||
As coding agents get "good enough" the next differentiator will be which one can complete a task in fewer tokens.
tgtweak 7 hours ago|||
Or quicker, or more comprehensively for the same price.
nlh 6 hours ago|||
Or the same number of tokens in less time. Kinda feels like the CPU / modem wars of the 90s all over again - I remember those differences you felt going from a 386 -> 486 or from a 2400 -> 9600 baud modem.

We're in the 2400 baud era for coding agents and I for one look forward to the 56k era around the corner ;)
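To make the trade-off concrete, a back-of-the-envelope sketch with entirely made-up token counts, prices, and speeds: an agent that finishes the same task in fewer tokens can win on both cost and wall-clock time even at a higher per-token price.

```python
# Hypothetical per-task comparison of two coding agents. All numbers
# are invented for illustration; real pricing is quoted per million tokens.

def task_cost(tokens: int, usd_per_mtok: float) -> float:
    """Dollar cost of a task given tokens used and $/million-token price."""
    return tokens / 1_000_000 * usd_per_mtok

agents = {
    # name: (tokens used for the task, $/Mtok, tokens/sec)
    "agent_a": (180_000, 15.0, 60),
    "agent_b": (90_000, 25.0, 90),
}

for name, (tokens, price, tps) in agents.items():
    cost = task_cost(tokens, price)
    minutes = tokens / tps / 60
    print(f"{name}: ${cost:.2f}, {minutes:.1f} min")
# agent_a: $2.70, 50.0 min
# agent_b: $2.25, 16.7 min
```

Under these made-up numbers the pricier-per-token agent is both cheaper and 3x faster per task, which is why token efficiency could become the differentiator once raw capability levels off.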

gf000 5 hours ago||
Is such an emulator not part of their training data sets?
fishpham 9 hours ago||
Model card: https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc6...
textlapse 6 hours ago|
I would love to see a nutrition-facts label on how many prompts, what percentage of code, and what ratio of human involvement it took to use these models to develop the various parts of their own latest models.