Why OpenAI's Codex Isn't Your Pair Programmer — And That's the Point

Friday night. Glass of wine. OpenAI's Codex just dropped, and I was ready for the future of AI coding.

Immediate reaction: confusion, then disappointment. No live preview. Each task required spinning up a fresh environment: slow, opaque, and completely unlike the real-time feedback loop I was used to from Claude Code and Cursor. Where was the co-pilot experience? Where was the instant gratification?

Second glass of wine. Watched the actual release videos and documentation. The penny dropped.

Codex isn't built for what I was expecting. It's built for something entirely different — and arguably more important.

The Vibe Coding Misunderstanding

If you've been using Cursor, Claude Code, or any of the current crop of AI coding tools, you're in the vibe coding paradigm. Real-time collaboration. Instant feedback. The AI writes alongside you, suggesting completions, answering questions, iterating through multi-turn conversations. It feels like pair programming.

Codex isn't pair programming. It's delegation.

The design philosophy is fundamentally different. You give Codex a task — not a conversation, a task — and it works on it asynchronously. It spins up an isolated environment, reads the codebase, writes code, runs tests, and produces a result. You don't see the work happening. You see the output.

This is jarring if you expect a co-pilot. It makes perfect sense if you think about what enterprise engineering teams actually need.

The Enterprise Play

Most enterprise codebases have thousands of open issues. Bug fixes. Dependency updates. Small refactors. Test coverage gaps. Documentation. The kind of work that's important but never urgent enough to prioritise over customer-facing features.

This is the work Codex is designed for. Not the creative, exploratory coding that benefits from real-time collaboration. The boring, essential work that benefits from parallelism and automation.

Imagine firing off 50 backlog tasks at once. Codex works on each one independently, in parallel. Even if it only nails 30% of them on the first try, that's 15 tasks resolved for the cost of a code review. The other 35 come back with partial progress that a developer can often finish in minutes instead of hours.

That's not a co-pilot. It's a workforce.
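To make the delegation model concrete, here is a minimal Python sketch of the fan-out-and-collect pattern behind the 50-task scenario above. It is not Codex's API: `delegate()` is a hypothetical stand-in for handing one ticket to an asynchronous coding agent, and the 30% first-try hit rate is simulated with a fixed rule purely to mirror those numbers.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TaskResult:
    ticket_id: int
    resolved: bool  # True if the (simulated) change passed its tests first time


async def delegate(ticket_id: int) -> TaskResult:
    # Hypothetical stand-in for delegating one ticket to an agent.
    # A real agent would spin up an isolated environment, edit code and run
    # tests; here we just yield control and apply a fixed rule that marks
    # tickets whose id ends in 0, 1 or 2 as resolved (a simulated 30% rate).
    await asyncio.sleep(0)
    return TaskResult(ticket_id, resolved=(ticket_id % 10 < 3))


async def run_backlog(ticket_ids: list[int]) -> tuple[list[int], list[int]]:
    # Fan out every ticket at once; nothing is visible until results return.
    results = await asyncio.gather(*(delegate(t) for t in ticket_ids))
    done = [r.ticket_id for r in results if r.resolved]
    needs_developer = [r.ticket_id for r in results if not r.resolved]
    return done, needs_developer


if __name__ == "__main__":
    done, needs_developer = asyncio.run(run_backlog(list(range(50))))
    print(f"{len(done)} resolved, {len(needs_developer)} need a developer")
```

The shape is the point: you submit work and triage the results afterwards, rather than watching each step as it happens.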

What This Means for Teams

Codex's design implies a specific vision for how AI integrates into engineering teams: not replacing developers, but restructuring the division of labour.

Human engineers focus on architecture, creative problem-solving, and customer-facing features — the work that benefits from judgment and context. Codex handles the maintenance, the tech debt, the tickets that sit in the backlog because nobody wants to do them.

The long-running tension between product managers (who want features) and engineers (who want quality) might finally ease when AI agents absorb the invisible labour of keeping a codebase healthy.

The Honest Caveats

My testing revealed real friction. You can't steer Codex mid-task — once it's running, you wait for the result. There's no live preview, so you're trusting the tests. And the environment setup adds overhead that makes Codex impractical for quick, one-off tasks.

These aren't fatal flaws. They're tradeoffs. The asynchronous model is the right architecture for parallel background work. It's the wrong architecture for the interactive coding session where I'm figuring out what I want to build while I build it.

Both architectures will coexist. Claude Code for thinking and building. Codex for maintenance and scale.

Where I Land

My initial disappointment was entirely about mismatched expectations. I approached Codex as a vibe coder and found a tool that wasn't designed for vibes. It's designed for enterprises with 10,000-issue backlogs, for teams drowning in tech debt, for the kind of work that's important and unfulfilling and perfect for automation.

I'm keeping Claude Code for my daily development work. But I'm watching Codex closely for the client projects where the backlog never shrinks and the tech debt never gets addressed because there are always more features to build.

Different tools. Different problems. Both important.