Building more with GPT-5.1-Codex-Max (openai.com)

Posted by hansonw 12 hours ago

359 points | 206 comments

epolanski 9 hours ago

Small off-topic question about the GPT CLI tool.

I gave it a shot last month but didn't enjoy it because it lacked a proper planning mode and the ability to accept each edit independently. Has it improved?

simianwords 11 hours ago

> Compaction enables GPT‑5.1-Codex-Max to complete tasks that would have previously failed due to context-window limits, such as complex refactors and long-running agent loops by pruning its history while preserving the most important context over long horizons. In Codex applications, GPT‑5.1-Codex-Max automatically compacts its session when it approaches its context window limit, giving it a fresh context window. It repeats this process until the task is completed.

Wouldn't the model automatically do that using attention techniques? Why do you need to do it at the token layer and not leave it to the model to automatically decide which tokens are worth paying attention to?

adastra22 11 hours ago

Attention is quadratic, so you have to pick a cutoff for context window size. In addition, the error/noise in state space increases with longer contexts, resulting in poorer performance. So even if you're willing to take the O(n^2) slowdown of a larger context window, it still won't work.
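A minimal pure-Python sketch of single-head scaled dot-product attention (no masking or batching; the function name is illustrative) makes the quadratic cost concrete: every query is scored against every key, so the score matrix has n × n entries.

```python
import math


def naive_attention(q, k, v):
    """Single-head scaled dot-product attention, computed naively.

    q, k, v are lists of vectors (lists of floats). Returns the outputs and
    the full query-by-key score matrix, whose size is the quadratic cost.
    """
    d = len(q[0])
    # Every query is compared with every key: len(q) * len(k) dot products.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    out = []
    for row in scores:
        # Numerically stable softmax over one row of scores.
        m = max(row)
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        out.append([sum(wt * vt[c] for wt, vt in zip(w, v)) / z for c in range(len(v[0]))])
    return out, scores


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, scores = naive_attention(x, x, x)
print(len(scores) * len(scores[0]))  # 16 scores for 4 tokens
```

Doubling the number of tokens quadruples the number of scores, which is why a hard cutoff on context length is needed.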

fancy_pantser 10 hours ago

> Attention is quadratic

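Exactly. Standard multi-head attention materializes a score matrix with one entry per query-key pair, so a 64K sequence already means about 4.3B entries (65,536²) per head as a starting point. FlashAttention v2 helps somewhat by never materializing that matrix in full, but the compute is still quadratic, and as you grow to 128K context length you still need over 1TB/s of memory bandwidth to stay compute-bound in practice, even with this optimization.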
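The arithmetic behind the 64K figure, as a quick sketch: one head's score matrix has n² entries (scores, not model parameters). Assuming 2 bytes per entry (fp16/bf16) and counting a single head in a single layer, materializing it naively would take:

```python
def score_matrix_entries(seq_len: int) -> int:
    """Number of query-key scores for one attention head: n * n."""
    return seq_len * seq_len


def score_matrix_gib(seq_len: int, bytes_per_entry: int = 2) -> float:
    """Memory for one head's full score matrix, assuming 2-byte (fp16/bf16) entries."""
    return score_matrix_entries(seq_len) * bytes_per_entry / 2**30


for n in (65_536, 131_072):
    print(f"{n} tokens: {score_matrix_entries(n):,} entries, {score_matrix_gib(n):.0f} GiB")
```

At 64K tokens this is 4,294,967,296 entries (8 GiB); at 128K it is 17,179,869,184 entries (32 GiB). FlashAttention-style kernels avoid storing this by computing the softmax tile by tile, but the number of scores computed is unchanged.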

So there has been a lot of research in this area and model architectures released this year are showing some promising improvements. Sliding windows lose context fidelity and if you go fully linear, you sacrifice math, logic, and long multi-turn (agentic) capabilities, so everyone is searching for a good alternative compromise.
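To make the sliding-window trade-off concrete, here is a minimal sketch of a causal sliding-window mask (names are illustrative): token i only sees the previous window tokens, so the cost grows as n × window instead of n², and anything older simply falls out of view.

```python
def sliding_window_mask(n: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: query i may attend to key j iff i - window < j <= i."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the query-key pairs that actually get scored."""
    return sum(sum(row) for row in mask)


n = 8
print(attended_pairs(sliding_window_mask(n, window=3)))  # 21 pairs
print(attended_pairs(sliding_window_mask(n, window=n)))  # 36 pairs (full causal)
```

The last token in this example cannot see token 0 at all with window=3, which is the "lost context fidelity" in miniature.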

MiniMax-M1 uses lightning attention to scale up to 1M-token context lengths. It's "I/O aware" via tiling and computes attention block-wise in two ways (traditional attention within a block, linear attention between blocks), thereby avoiding the speed-inhibiting cumulative summation.
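The intra-block/inter-block split can be illustrated with plain, unnormalized causal linear attention. This is a simplified sketch of the idea (no softmax, no decay, no GPU tiling), not MiniMax's kernel, and the function names are made up. Within a block the masked product (q · k) v is computed directly; everything from earlier blocks is folded into a fixed-size running state, so the running sum is updated once per block rather than once per token.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention(q, k, v):
    """Reference: o_t = sum over s <= t of (q_t . k_s) * v_s."""
    out = []
    for t in range(len(q)):
        o = [0.0] * len(v[0])
        for s in range(t + 1):
            w = dot(q[t], k[s])
            o = [oc + w * vc for oc, vc in zip(o, v[s])]
        out.append(o)
    return out


def blockwise_linear_attention(q, k, v, block):
    """Same result, computed block by block."""
    dk, dv = len(k[0]), len(v[0])
    # Running sum of outer products k_s v_s^T over all previous blocks (dk x dv).
    state = [[0.0] * dv for _ in range(dk)]
    out = []
    for start in range(0, len(q), block):
        end = min(start + block, len(q))
        for t in range(start, end):
            # Inter-block part: one matrix-vector product with the running state.
            o = [sum(q[t][i] * state[i][c] for i in range(dk)) for c in range(dv)]
            # Intra-block part: masked attention within the current block only.
            for s in range(start, t + 1):
                w = dot(q[t], k[s])
                o = [oc + w * vc for oc, vc in zip(o, v[s])]
            out.append(o)
        # Fold the finished block into the state before moving on.
        for s in range(start, end):
            for i in range(dk):
                for c in range(dv):
                    state[i][c] += k[s][i] * v[s][c]
    return out


q = [[1.0, 0.5], [0.2, -1.0], [0.3, 0.3], [-0.4, 1.0], [1.0, 1.0]]
ref = causal_linear_attention(q, q, q)
blk = blockwise_linear_attention(q, q, q, block=2)
```

Both functions return the same outputs because q_t · (sum of k_s v_sᵀ) equals the sum of (q_t · k_s) v_s; only the order of work changes.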

DeepSeek V3.2 uses DeepSeek Sparse Attention (DSA), which avoids full quadratic attention by only computing "interesting" pairs. For example, at 128K context length only 10-20% of attention pairs need to be materialized.
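A rough sketch of the top-k idea behind this kind of sparse attention. It is not DeepSeek's implementation: the selection score here is just the scaled dot product, standing in for a small, cheap scoring module, and the names are illustrative. Each query keeps only its top_k highest-scoring earlier tokens and runs softmax attention over those alone.

```python
import math


def topk_sparse_attention(q, k, v, top_k):
    """Causal attention where each query attends only to its top_k selected keys.

    Returns the outputs and the number of query-key pairs that were attended.
    """
    d = len(q[0])
    out, pairs = [], 0
    for t, qt in enumerate(q):
        # Selection step: score every earlier token (cheap in real systems,
        # here simply the scaled dot product).
        scores = [sum(a * b for a, b in zip(qt, k[s])) / math.sqrt(d) for s in range(t + 1)]
        chosen = sorted(range(t + 1), key=lambda s: scores[s], reverse=True)[:top_k]
        pairs += len(chosen)
        # Softmax attention restricted to the chosen tokens.
        m = max(scores[s] for s in chosen)
        w = {s: math.exp(scores[s] - m) for s in chosen}
        z = sum(w.values())
        out.append([sum(w[s] * v[s][c] for s in chosen) / z for c in range(len(v[0]))])
    return out, pairs


x = [[math.sin(i), math.cos(i)] for i in range(10)]
out, pairs = topk_sparse_attention(x, x, x, top_k=3)
print(pairs)  # 27 attended pairs, versus 55 for full causal attention
```

The selection step still touches every pair, but the expensive part (softmax and value mixing) grows as n × top_k rather than n².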

Both Qwen3-Next and Kimi Linear build on Gated DeltaNet, which combines Mamba2-style gating with the DeltaNet delta rule. Qwen3-Next alternates three Gated DeltaNet (linear attention) layers for every one gated [full] attention layer. The speedup comes from those layers keeping a fixed-size state that is updated with the delta rule, which basically amounts to caching in a hand-wavy way.
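For the curious, a minimal sketch of one step of the gated delta rule, per head, with the usual assumption that keys are L2-normalized: S_t = S_{t-1} (α_t (I − β_t k_t k_tᵀ)) + β_t v_t k_tᵀ, read out as o_t = S_t q_t. The state S has a fixed d_v × d_k size no matter how long the sequence is. Here α (decay gate) and β (write strength) are plain arguments, whereas a model would produce them per token; real implementations also use a chunked parallel form rather than this token-by-token loop. The "caching" intuition: the update erases what S currently returns for key k, then writes v there.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta-rule update of a d_v x d_k state matrix S (list of rows).

    Computes alpha * (S - beta * (S k) k^T) + beta * v k^T, which equals
    S (alpha * (I - beta * k k^T)) + beta * v k^T.
    """
    dv, dk = len(S), len(S[0])
    # What the state currently "recalls" for key k: S k (length d_v).
    recalled = [sum(S[r][i] * k[i] for i in range(dk)) for r in range(dv)]
    return [
        [alpha * (S[r][i] - beta * recalled[r] * k[i]) + beta * v[r] * k[i] for i in range(dk)]
        for r in range(dv)
    ]


def read(S, q):
    """Output o = S q."""
    return [sum(row[i] * q[i] for i in range(len(q))) for row in S]


S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]  # unit-norm key
S = gated_delta_step(S, key, v=[1.0, 2.0], alpha=1.0, beta=1.0)
S = gated_delta_step(S, key, v=[5.0, -1.0], alpha=1.0, beta=1.0)
print(read(S, key))  # [5.0, -1.0]: the second write replaced the first
```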

There's no universally-adopted solution yet, as these are all pretty heavy-duty compromises, but the search is going strong right now for linear or better attention mechanisms that still perform well.

qsort 11 hours ago

> due to context-window limits

simianwords 11 hours ago

context window is not some physical barrier but rather the attention just getting saturated. what did i get wrong here?

qsort 11 hours ago

> what did i get wrong here?

You don't know how an LLM works and you are operating on flawed anthropomorphic metaphors.

Ask a frontier LLM what a context window is, it will tell you.

Palmik 10 hours ago

It's a fair question, even if it might be coming from a place of misunderstanding.

For example, DeepSeek 3.2, which employs sparse attention [1], is not only faster with long context than normal 3.1, but also seems to be better (perhaps thanks to reducing the noise?).

[1] It still uses a quadratic router, but the router is small, so it scales well in practice. https://api-docs.deepseek.com/news/news250929

ed 9 hours ago

Parent is likely thinking of sparse attention, which allows a significantly longer context to fit in memory.

qsort 9 hours ago

My comment was harsher than it needed to be and I'm sorry, I think I should have gotten my point across in a better way.

With that out of the way, parent was wondering why compaction is necessary arguing that "context window is not some physical barrier but rather the attention just getting saturated". We're trying to explain that 3+2=2+3 and you people are sitting in the back going "well, actually, not all groups are abelian".

paradite 10 hours ago

In theory, autoregressive models should not have a limit on context: each next token is generated from all previous tokens.

In practice, when training a model, people select a context window so that during inference, you know how much GPU memory to allocate for a prompt and reject the prompt if it exceeds the memory limit.

Of course there's also degrading performance as context gets longer, but I suspect memory limit is the primary factor of why we have context window limits.
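To put numbers on the memory side, here is a back-of-the-envelope KV-cache estimate. The formula (keys and values, per layer, per KV head, per token) is standard, but the model dimensions below are hypothetical rather than those of any particular model, and the estimate ignores weights, activations, and batching.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading 2 accounts for storing both K and V.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, fp16 cache.
for seq_len in (8_192, 131_072):
    gib = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128) / 2**30
    print(f"{seq_len} tokens: {gib:.1f} GiB")  # 1.0 GiB, then 16.0 GiB
```

The cache grows linearly with length and is reserved per request, which is consistent with the point that a fixed maximum context makes GPU memory planning possible.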

kenjackson 9 hours ago

I think attention literally doesn't see anything beyond the context window. Even within the context window you may start to see attentional issues, but that's a different problem.

tunesmith 10 hours ago

I've been dealing with Codex CLI for a while and I love it, but I'm wondering if my thinking is just limited. While I'm starting discussions and creating plan docs, I've never been able to ask it to do anything that takes it longer than 25 minutes or so. Usually far less. I'm having trouble imagining what I can ask it to do that would make it take hours - like, wouldn't that require putting together an absolutely massive planning doc that would take hours to put together anyway? I'd rather just move incrementally.

GenerWork 10 hours ago

Perhaps they're combining an incredibly complex product that has a lot of interactive features, a big codebase, test creation, and maybe throwing some MCP stuff in there, such as creating a ticket in Jira if a test fails?

CuriouslyC 10 hours ago

Easy way to get an agent to run a long time is just to get it to babysit CI/CD, tell it to iterate on it until it passes. I got Sonnet 4 to run for >6 hours that way.

aerhardt 9 hours ago

The idea of giving it a task that may take six hours and reviewing it also gives me shivers.

I'm a very happy Codex customer, but everything turns to disgusting slop if I don't provide:

(1) Up-to-date AGENTS.md and an excellent prompt

(2) A full file-level API with function signatures, return types and function-level guidance if it's a complex one

(3) Multiple rounds of feedback until the result is finely sculpted

Overall it's very small units of work - one file or two, tops.

I've been letting the above standards slip for the last couple of weeks due to crunch, and looking at some of the hotspots of slop now lying around has me going all Homelander-face [1].

Those hotspots are a few hundred lines in the worst cases; I'm definitely not ready to deal with the fallout of any unit of work that takes more than 20 minutes.

[1] https://i.kym-cdn.com/entries/icons/original/000/050/702/ab7...

jillesvangurp 6 hours ago

I've been doing a few fairly big refactorings on our code base in the last few days. It does a decent job, and I generally don't put a lot of effort into my prompts.

It seems to pick up a lot from my code base. I do have an Agents.md with some basics on how to run things and what to do, which seems to keep it from going off on a wild goose chase trying to figure out how to run things by doing the wrong things.

Codex has improved a lot since I first used it around July. It actually seems to do well in larger code bases that have a lot of existing structure and examples of how things are done. It does a lot of things without my asking, simply because lots of other code in the code base does them that way.

After recent experiences, I have some confidence this might work out well.

spmartin823 11 hours ago

I still want something no one has, which is the ability to launch agents in different git worktrees simultaneously and check the results out on my main branch for testing when they are finished.
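For reference, the worktree half of this is plain git: each agent gets its own branch checked out in its own directory via git worktree add, so parallel runs never touch each other's files. A minimal sketch in Python; the agent command, the agent/ branch prefix, and the sibling-directory layout are hypothetical placeholders, not any tool's real defaults.

```python
import shlex
import subprocess
from pathlib import Path


def plan_worktrees(repo: Path, tasks: dict[str, str], agent_cmd: list[str], base: str = "main"):
    """Return (cwd, argv) steps: one worktree per task, then the agent run inside it."""
    steps = []
    for name, prompt in tasks.items():
        worktree = repo.parent / f"{repo.name}-{name}"  # sibling directory per task
        branch = f"agent/{name}"  # hypothetical naming scheme
        # git worktree add -b <new-branch> <path> <start-point>
        steps.append((repo, ["git", "worktree", "add", "-b", branch, str(worktree), base]))
        steps.append((worktree, agent_cmd + [prompt]))
    return steps


def run_parallel(steps):
    """Run the git steps synchronously, start each agent in the background, then wait for all."""
    procs = []
    for cwd, argv in steps:
        if argv[:3] == ["git", "worktree", "add"]:
            subprocess.run(argv, cwd=cwd, check=True)
        else:
            procs.append(subprocess.Popen(argv, cwd=cwd))
    return [p.wait() for p in procs]


steps = plan_worktrees(Path("/src/app"), {"auth": "fix login bug", "docs": "update README"},
                       agent_cmd=["my-agent", "--prompt"])  # hypothetical agent CLI
for cwd, argv in steps:
    print(cwd, shlex.join(argv))
```

When an agent finishes, merging its branch (for example git merge agent/auth) brings the result onto the main branch for testing. Checking the branch out directly only works after removing its worktree, since git refuses by default to check out the same branch in two worktrees.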

agentifysh 11 hours ago

There are lots of tools that do this. I ended up going down this rabbit hole myself, aiming for something that could just plug into Codex instead of requiring a fork:

http://github.com/agentify-sh/10x

It has minimal overhead for agent orchestration (it's just bash/TypeScript), since its main focus was adding enhancements to Codex: double-redundant checkpoints via git and jj (a lesson learned from Codex being git reset --hard happy), something like Claude skills (just a bunch of .md files that steer it toward a specific activity like think, plan, execute), timeout wrappers (to get you unstuck if Codex waits a long time), and a command blacklist during yolo mode (rm -rf and git reset are banned in case it, by some small chance, runs them). MIT licensed.

You can work sequentially (subagents launch one after the other) or in parallel (worktrees), but tbh sequential is better because you understand what is going on. Parallel might be best for dealing with tests and UI.
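The command blacklist and timeout wrapper ideas are simple to sketch. This is not the 10x implementation (described in the comment as bash/TypeScript); it is a hypothetical Python guard that tokenizes a command with shlex, refuses it if any blocked token sequence appears, and otherwise runs it with subprocess.run's timeout. Token matching like this is easy to bypass (rm -fr, rm -r -f, or a script that calls rm), so treat it as a speed bump rather than a sandbox.

```python
import shlex
import subprocess

# Token sequences that are refused outright (illustrative list).
BLOCKED = [("rm", "-rf"), ("git", "reset", "--hard")]


def is_blocked(command: str) -> bool:
    """True if any blocked token sequence appears anywhere in the command."""
    argv = shlex.split(command)
    for pattern in BLOCKED:
        n = len(pattern)
        if any(tuple(argv[i:i + n]) == pattern for i in range(len(argv) - n + 1)):
            return True
    return False


def run_guarded(command: str, timeout_s: float = 600.0) -> subprocess.CompletedProcess:
    """Run a command unless it is blocked; raises subprocess.TimeoutExpired if it hangs."""
    if is_blocked(command):
        raise PermissionError(f"blocked command: {command}")
    # Run without a shell, so operators like &&, | and ; are not interpreted.
    return subprocess.run(shlex.split(command), capture_output=True, text=True, timeout=timeout_s)
```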

poly2it 11 hours ago

Your link is a 404.

cube2222 11 hours ago

I think I’ve described how I achieve kinda your desired workflow in a comment yesterday [0].

[0]: https://news.ycombinator.com/item?id=45970668

agentifysh 11 hours ago

Ha! Very interesting how slept-on jj is.

It's been essential to my workflow as well.

I use both jj and git, and jj is great for just creating a snapshot that I can revert to in case it fails.

I'm still exploring it to see what else I can do with it for agentic use.

rane 9 hours ago

tmux users might find this useful: https://github.com/raine/workmux

lysecret 11 hours ago

Cursor has this too

bradly 11 hours ago

Would this be similar to how Charlie and Jules work?

esafak 7 hours ago

How efficient is it; does it go through your subscription quota faster?

tptacek 10 hours ago

Is "compaction" a trained-in feature of the model, or just tooling around the model calls? Agents already do compaction.
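Whether or not the model is also trained for it, the tooling side of compaction can be sketched simply. Below is a hypothetical, simplified step (not OpenAI's implementation): token counts are approximated by word counts, and summarize is a stub where a real agent would ask the model to write a summary of the older turns. Once the history nears the limit, older turns are replaced by one summary and the most recent turns are kept verbatim.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: whitespace-separated words."""
    return len(text.split())


def summarize(messages: list[str]) -> str:
    """Hypothetical stub: a real agent would ask the model for this summary."""
    return "SUMMARY: " + " | ".join(m.split(".")[0] for m in messages)


def compact(history: list[str], limit: int, keep_recent: int = 2, threshold: float = 0.8) -> list[str]:
    """Replace older turns with one summary once the history nears the context limit."""
    used = sum(count_tokens(m) for m in history)
    if used < threshold * limit or len(history) <= keep_recent:
        return history  # plenty of room left, or nothing old enough to compact
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent


history = [f"Step {i}. " + "edited files and ran the tests " * 2 for i in range(5)]
print(len(compact(history, limit=1000)))  # 5: under the threshold, unchanged
print(len(compact(history, limit=40)))    # 3: one summary plus the last two turns
```

An agent loop would call compact before each model request and repeat it as the session grows, which matches the "fresh context window" behavior described in the quoted announcement.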

hereme888 8 hours ago

It's getting so cut-throat over who has the current SOTA model. That seems to be the big income driver.

rolisz 9 hours ago

I got prompted to try it out on the web. It gave me this after 5 minutes:

"I wasn’t able to finish creating the new base homepage module template and updating every module to inherit from it within the available time. I did not make any changes or commits."

Told it to get back to work. Let's see how that goes.

nowittyusername 6 hours ago

Glad to see proper context management evolving. Automatic compaction is months overdue, so I'm happy to see it finally arrive.

ed_mercer 1 hour ago

As a long-time CC user, I was like, "Wait, they didn't have auto-compaction all this time??"

kachapopopow 9 hours ago

Not sure if I'm actually using 5.1-codex-max or just normal 5.1-codex (is there even a 5.1-codex?). I tried to continue work where Gemini 3 left off, and a couple of prompts in I had to switch back: it was reimplementing and changing things that didn't need changing, and it tried to deal with typos by making the code that implements those things work with the typo. Weird behavior; it's probably just not compatible with the way Gemini tries to solve problems.

sumedh 9 hours ago

Just run the /model command in codex and select the model which you want.
