Codex agents¶
Codex is OpenAI's umbrella brand for autonomous and semi-autonomous coding agents. Where ChatGPT itself is a generalist chat product, "Codex" denotes a specific agentic stack: a terminal CLI, an IDE sidebar, a hosted cloud agent, and a set of mobile / chat triggers, all sharing the same backend reasoning model, the same on-disk configuration, the same project-level instructions file (AGENTS.md), and the same registry of Model Context Protocol (MCP) tools. The product line was relaunched on May 16, 2025 as a research preview of a cloud software-engineering agent powered by a model called codex-1, and over the following year it grew into the multi-surface platform documented here. The information on this page reflects state observable in May 2026; the volatile sections (model lineup, pricing) are explicitly marked.
A historical note before the meat: the name "Codex" first appeared in 2021 as OpenAI's code-davinci-002 / code-cushman-001 line, the model that powered the original GitHub Copilot. That model family was deprecated in March 2023. The modern Codex described here is an agent product, not a model, and the names look similar only because OpenAI revived the brand.
The map of Codex surfaces¶
The modern Codex consists of four execution surfaces, all of which talk to the same backend agent service and read the same on-disk configuration.
| Surface | Where it runs | Primary trigger |
|---|---|---|
| Codex CLI | Local terminal (Rust binary) | codex command |
| Codex IDE extension | VS Code / Cursor / Windsurf / JetBrains 2025.3+ | Sidebar chat |
| Codex Cloud (Codex Web) | OpenAI-managed sandbox VMs | chatgpt.com/codex, ChatGPT iOS, Slack, GitHub PR comments |
| Codex desktop app | macOS native; Android remote-control app in preview | codex app or installer |
These surfaces are not separate products with bridges between them; they are different front-ends onto the same agent. A task started in the CLI can be promoted to a Cloud VM if it needs longer compute. A session started in the IDE can be resumed from the CLI with codex resume. A Cloud task can be reviewed and merged from the iOS app. The shared on-disk state lives in ~/.codex/ and consists of config.toml (preferences, profiles, MCP servers), auth.json (OAuth refresh tokens), session transcripts under ~/.codex/sessions/, and an optional global AGENTS.md.
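On disk, that shared state looks roughly like this (a sketch of the layout described above, not an exhaustive listing):
~/.codex/
├── config.toml   # preferences, profiles, MCP server registry
├── auth.json     # OAuth refresh tokens from the ChatGPT sign-in flow
├── AGENTS.md     # optional global instructions
└── sessions/     # persisted session transcripts, resumable with codex resume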
Beyond those four surfaces, two ambient triggers exist: /review mode, which spawns a second Codex session that scopes itself to a diff and produces structured review comments, and Codex's role as a third-party coding agent inside GitHub Copilot, where it can be assigned to issues on github.com under the Copilot "third-party agents" framework (more on that below).
The Codex CLI¶
What it is¶
Codex CLI is an open-source coding agent that runs locally and operates on the codebase in the current working directory. It is published under the Apache-2.0 licence at github.com/openai/codex and is implemented almost entirely in Rust (~96% of the repository). The original 2025 versions were Node.js; the Rust rewrite shipped in September 2025 and was driven primarily by two concerns: cold-start latency (the Node version took multiple seconds before the first prompt could be answered) and the need to bind directly against OS-level sandboxing primitives (Landlock, Seatbelt, seccomp) without going through a high-overhead bridge.
Install and run¶
The two officially supported install paths are npm (a thin Node shim that downloads the platform-specific Rust binary) and Homebrew (cask):
# npm — requires Node.js 22+, shim invokes the Rust binary
npm install -g @openai/codex
# Homebrew — macOS and Linux
brew install --cask codex
# Or download the platform binary from a GitHub Release directly
# e.g. codex-aarch64-apple-darwin.tar.gz, codex-x86_64-unknown-linux-musl.tar.gz
codex # launch the interactive TUI
codex --version
On Windows, OpenAI recommends WSL2 for the best experience. A PowerShell-native build exists that uses Windows Sandbox containment, but several Landlock-equivalent isolation guarantees do not hold on the native Windows build (see Limitations).
Authentication — ChatGPT plan vs API key¶
On first launch, Codex offers two auth modes. Sign in with ChatGPT is the recommended path: a browser OAuth flow drops a refresh token in ~/.codex/auth.json and from that point on, Codex usage draws from the developer's ChatGPT Plus / Pro / Business / Edu / Enterprise quota at no marginal cost. The CLI uses a loopback callback on localhost:1455, so SSH sessions need a port-forward (ssh -L 1455:localhost:1455 user@host). API key auth (OPENAI_API_KEY in the environment) bills per-token against the developer's OpenAI API account and is intended for CI/CD boxes where browser flows are impossible. API-key users get delayed access to the newest model snapshots — Spark variants in particular stay ChatGPT-only for some weeks before reaching the API.
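As a concrete illustration of the API-key path, a CI runner that cannot complete a browser flow exports the key and drives the non-interactive codex exec subcommand covered below; the secret name and prompt are illustrative:
# Headless API-key auth: bills per-token, no browser OAuth required
export OPENAI_API_KEY="${OPENAI_API_KEY_SECRET}"   # illustrative secret reference
codex exec "Summarise the diff introduced by the last commit" \
  --sandbox read-only \
  --ask-for-approval never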
The agent harness¶
When codex starts, it does roughly the following:
- Discovers AGENTS.md files (global + project tree, see below).
- Loads ~/.codex/config.toml plus any project-scoped .codex/config.toml in a trusted project.
- Establishes a sandbox (Apple Seatbelt on macOS, Landlock + seccomp on Linux) scoped to the current working directory.
- Opens a TUI (terminal UI) with a prompt input, a streaming reasoning panel, and a diff viewer.
- On each turn, sends the user message plus tool descriptions (shell, file edits, MCP tools) to the configured model and streams reasoning back. Tool calls are executed locally through the sandbox; results are fed back into the conversation until the model emits a final assistant message.
Sessions are persisted on disk and can be resumed with codex resume --last or codex resume <SESSION_ID>.
For scripting, the CLI exposes a non-interactive subcommand:
codex exec "Audit src/ for unhandled promise rejections and add tests" \
--ask-for-approval never \
--sandbox workspace-write
codex exec is the entry point for CI/CD pipelines, git hooks, and the experimental multi-agent orchestration mode that runs several Codex agents in parallel on isolated Git worktrees.
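A sketch of that in hook form, assuming an advisory pre-push check (the hook body and prompt are illustrative; only codex exec and its flags come from the CLI):
#!/usr/bin/env sh
# .git/hooks/pre-push: advisory Codex pass over the outgoing changes
codex exec "Review the diff between origin/main and HEAD for unhandled promise rejections and missing tests" \
  --sandbox read-only \
  --ask-for-approval never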
Inside the TUI, slash commands extend the agent: /model switches model and reasoning effort (low / medium / high / xhigh); /review invokes the code-review sub-agent on a diff or branch; /status shows the current sandbox, approval policy, model, and token usage; /skills loads named skill recipes (rolling out through Q1 2026); and image attachment is supported alongside text for screenshots and design specs.
Sandbox modes¶
The sandbox model and the approvals model are two orthogonal axes that together decide what the agent can actually do. Three sandbox levels are configurable per session:
| Mode | File system | Network |
|---|---|---|
| read-only | Read anything, write nothing | Blocked |
| workspace-write (default) | Read anywhere; write only inside cwd and a small tmp scratch space | Blocked by default |
| danger-full-access | Full read/write everywhere | Unrestricted |
On macOS the sandbox is enforced with Apple Seatbelt (sandbox-exec profiles); on Linux with Landlock for the file system and seccomp-bpf for syscall filtering. Network egress is blocked at sandbox setup for workspace-write unless the operator explicitly opts in:
codex -s workspace-write \
-c 'sandbox_workspace_write.network_access=true' \
"Install dependencies and run the test suite"
danger-full-access disables the file-system isolation profile entirely, removes the network egress block, and skips the seccomp syscall filter. It is intended for short, well-understood automation runs where the agent must, for example, write to /etc, mount a filesystem, or open arbitrary outbound sockets.
Approval policies¶
Independent of the sandbox, the approval policy decides when the agent must stop and ask before running a tool call:
- untrusted — every shell command requires a prompt.
- on-failure — auto-run, but escalate to a prompt if a command exits non-zero.
- on-request — the model decides when to ask.
- never — fully autonomous (the --full-auto flag is equivalent).
Combining --sandbox workspace-write with --ask-for-approval never is a popular daily-driver setup: the agent moves quickly inside the working directory but cannot reach the rest of the disk or the network. For genuinely unattended runs, codex --dangerously-bypass-approvals-and-sandbox exists, but the flag name is intentionally inconvenient.
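Spelled out as a full invocation (the flags are the ones documented above; the prompt and the positional-prompt usage are illustrative):
# Fast inside the repo, no reach beyond the working directory or the network
codex --sandbox workspace-write --ask-for-approval never \
  "Fix the failing tests in src/parser and explain each change"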
A real ~/.codex/config.toml¶
Persistent configuration lives in ~/.codex/config.toml. The same file is read by the CLI, the IDE extension, and the desktop app:
# ~/.codex/config.toml
model = "gpt-5.3-codex"
approval_policy = "on-failure"
sandbox_mode = "workspace-write"
# Raise the AGENTS.md instruction budget from 32 KiB
project_doc_max_bytes = 65536
project_doc_fallback_filenames = ["TEAM_GUIDE.md", ".agents.md"]
[sandbox_workspace_write]
network_access = true
# Named profiles activated with `codex --profile deep`
[profiles.deep]
model = "gpt-5.3-codex"
model_reasoning_effort = "high"
[profiles.fast]
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
# MCP servers — STDIO transport
[mcp_servers.filesystem]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/home/dev"]
# MCP servers — HTTP transport with bearer auth
[mcp_servers.github]
type = "http"
url = "https://api.githubcopilot.com/mcp/"
bearer_token_env_var = "GITHUB_MCP_TOKEN"
Profiles are a CLI-only feature today; the IDE extension does not yet honour --profile and requires a top-level model key.
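Switching between the named profiles from the CLI is a single flag; the prompts here are illustrative:
codex --profile deep "Refactor the billing module and document its invariants"
codex --profile fast "Rename the config field and update every call site"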
Codex Cloud¶
Codex Cloud (also called Codex Web) is the hosted agent that runs at chatgpt.com/codex. It was the first surface OpenAI shipped on May 16, 2025, originally limited to Pro / Business / Enterprise plans and powered by codex-1. Plus and Edu access followed in June 2025; iOS and Slack triggers came later in 2025; an Android sidebar that can drive Cloud sessions was in development as of May 2026.
Each Codex Cloud task runs inside a dedicated, ephemeral sandboxed VM:
- The user picks a connected GitHub repository, enters a natural-language task ("Add JWT refresh-token rotation to the auth service and update its tests"), and optionally pins a base branch.
- The VM boots, clones the repo at that branch, runs the configured setup script (an environment-level command like pnpm install && pnpm build), and then hands control to the agent.
- The agent reads AGENTS.md, plans, edits, runs tests, and iterates. It streams progress in the ChatGPT UI; the developer can ask follow-up questions, redirect it, or stop it mid-flight.
- When finished, the agent produces a diff. The user can review it in-browser, edit it, and either apply it locally (via "Open in IDE" or codex pull <task-id> from the CLI) or open a pull request directly into the upstream repo via the GitHub integration.
Network egress inside Codex Cloud VMs was fully blocked in the May 2025 preview; opt-in internet access during task execution arrived in June 2025, with the operator configuring an allowlist of domains the VM may reach.
The triggers that can launch a Cloud task are deliberately diverse: the Codex tab in ChatGPT's left sidebar; ChatGPT iOS with a swipeable diff viewer; the IDE extension's "Delegate to Codex Cloud" affordance, which moves a long-running task off the developer's machine; CLI codex cloud subcommands that list, follow, and apply Cloud tasks without leaving the terminal; the Slack /codex slash command in Business and Enterprise workspaces; and GitHub PR-comment triggers such as @codex review, @codex implement, and @codex address comments that operate directly on a pull request.
Because Cloud, CLI, and IDE share the same task model and configuration, the bi-directional hand-off between local and Cloud is the architectural feature OpenAI emphasises most when contrasting Codex with single-surface competitors: the result of a Cloud run can be pulled back into the local working tree with a single command, and a local session can be promoted to a Cloud session when it needs longer compute or unattended execution.
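The local side of that hand-off is a single command; <task-id> follows the placeholder notation used above, and the surrounding git commands are just one way of staging the result:
# Pull the finished Cloud task's diff into the local working tree
codex pull <task-id>
# Then treat it like any local change
git switch -c codex/jwt-refresh-rotation   # illustrative branch name
git commit -am "Apply Codex Cloud task"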
The Codex IDE extension¶
The IDE extension is the surface most developers see first. It ships for VS Code (publisher openai, ID openai.chatgpt) and for the Code forks Cursor and Windsurf. A separate JetBrains plugin, integrated into JetBrains AI Chat, shipped in January 2026 for IntelliJ IDEA, PyCharm, WebStorm, Rider, and other 2025.3+ JetBrains IDEs.
Beneath the chat input the extension exposes a mode switcher with three positions:
- Chat — read-only conversation. No file changes, no command execution. Used for architecture discussions, code reading, and planning before committing to edits.
- Agent (default) — read-write inside the workspace, command execution gated by per-tool approval prompts with a diff preview. This maps to approval_policy = "on-failure" plus sandbox_mode = "workspace-write".
- Agent (Full Access) — equivalent to --full-auto. Full filesystem and network access with no approval prompts. The extension shows a persistent warning banner whenever this mode is active.
Inline edits, a file context picker (@ followed by a fuzzy filename match), and a side-by-side diff view with hunk-level accept/reject buttons round out the experience. The same /review flow that runs in the CLI is also exposed as a sidebar action, scoping itself to uncommitted changes, a branch-vs-base diff, or a specific commit range and producing structured review comments — positioned as a pre-commit / pre-push gate.
JetBrains adds two auth options on top of the usual ChatGPT account and API key: a JetBrains AI subscription can pay for Codex usage directly, and free promotional credits were offered through JetBrains AI from January 2026 until the credit pool was exhausted.
The Codex model family¶
Codex is model-portable — any reasoning-class OpenAI model can drive it — but in practice each generation has a flagship "Codex" fine-tune that ships alongside the agent. The lineage observable as of May 2026:
| Model | Released | Context | Output | Notes |
|---|---|---|---|---|
| codex-1 | May 16, 2025 | 192K | 32K | o3-based; Cloud-only at launch |
| codex-mini | Jun 2025 | 200K | 100K | Faster, cheaper sibling for CLI |
| GPT-5-Codex | Sep 2025 | 256K | 64K | First GPT-5-based Codex tune |
| GPT-5.2-Codex | Dec 18, 2025 | 256K | 64K | SWE-Bench-Pro record at launch |
| GPT-5.3-Codex | Feb 5, 2026 | 400K | 128K | Current general default; 25% faster than 5.2 |
| GPT-5.3-Codex-Spark | Feb 12, 2026 | 256K | 64K | Cerebras-accelerated; text-only; Pro-only preview |
| GPT-5.3-Codex-nano | Q1 2026 | 128K | 32K | Edge / low-quota tier; bundled with Free and Go plans |
| GPT-5.4-Codex | Apr 2026 (reported) | 400K | 128K | Plus/Pro flagship snapshot; specifics still rolling out |
The -Spark variants are Cerebras-accelerated wafer-scale inference runs that trade off image input and tool breadth for raw token throughput, which makes them useful for tight inner loops (test-fix-rerun cycles). The -nano variants are smaller, cheaper distillations intended for autocomplete-like inner loops and low-quota Free / Go usage. The GPT-5.4-Codex snapshot postdates most of the primary documentation behind this page and is best treated as the Plus/Pro flagship at the time of writing rather than a fully specified release; the Plus pricing tier lists "GPT-5.5 / 5.4 / 5.3-Codex" as its eligible model basket, which is how the family appears in user-facing surfaces.
OpenAI describes GPT-5.3-Codex as the first model that was "instrumental in creating itself" — earlier versions were used internally to debug training, manage deployment, and analyse evaluations. The agentic tuning targets four benchmarks (SWE-Bench Pro, Terminal-Bench 2.0, OSWorld, GDPval) rather than HumanEval-style snippet completion, and the reinforcement-learning environment is a multi-turn shell + tool sandbox, not just code text. The user-visible effects are longer coherent task horizons (multi-hour runs without losing context), better tool-call accuracy (fewer wasted shell commands), and more conservative editing behaviour (smaller, more surgical patches by default).
Defaults per surface: the IDE extension and Cloud both default to the current flagship (gpt-5.3-codex or gpt-5.4-codex once rolled out to a given user); the CLI inherits the same default but is easy to override via model = "..." in config.toml; the -Spark and -nano variants are opt-in. ChatGPT-side selection respects the active plan's eligible-model basket.
AGENTS.md¶
AGENTS.md is the project-level instructions file Codex reads before every task. It started as a Codex-specific convention but has since been adopted as an open format by Cursor, Gemini CLI, Windsurf, Devin, Aider, Junie, GitHub Copilot's coding agent, Amp, Factory, Antigravity, and more than 60,000 open-source projects per the spec's own counter. The open specification is governed under the Linux Foundation's Agentic AI Foundation as of early 2026, which is why competing vendors can ship support without any contractual relationship with OpenAI.
Discovery rules (Codex implementation)¶
When Codex starts a session it builds an instruction chain by walking from a global scope down to the current working directory:
- Global: ~/.codex/AGENTS.override.md if present, else ~/.codex/AGENTS.md.
- Project: starting at the Git root, walk down toward the cwd. In each directory, prefer AGENTS.override.md, then AGENTS.md, then any fallback filenames listed in project_doc_fallback_filenames.
- Merge: concatenate root → leaf, separated by blank lines. Files closer to the cwd appear later, which gives them effective precedence in the prompt.
Codex stops adding files once the combined size hits project_doc_max_bytes (32 KiB by default; raise it in config.toml for big monorepos). A common pitfall: a 30 KiB global file silently truncates the project-specific instructions developers actually care about. The fix is to keep the global file small (1–3 KB) and push detail into project and subdirectory files. AGENTS.override.md does not extend the sibling AGENTS.md — it replaces it at that scope level, useful for temporary local overrides without modifying the committed file.
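A hypothetical monorepo layout makes the precedence concrete: files closer to the cwd load later in the chain and therefore win on conflicts:
~/.codex/AGENTS.md                     # global: 1–3 KB of personal defaults
repo/AGENTS.md                         # repo root: setup, style, PR conventions
repo/packages/api/AGENTS.md            # package-specific test commands
repo/packages/api/AGENTS.override.md   # if present, replaces the sibling AGENTS.md at this scope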
An example AGENTS.md¶
# AGENTS.md
## Project overview
A TypeScript monorepo with three packages: api/, web/, and shared/.
PNPM workspaces; Turborepo for orchestration.
## Setup commands
- Install: `pnpm install`
- Dev: `pnpm dev`
- Test: `pnpm test`
- Lint: `pnpm lint`
## Code style
- TypeScript strict mode, single quotes, no semicolons.
- Prefer functional patterns; avoid mutable shared state.
- Public functions in `packages/shared/` require TSDoc.
## Testing instructions
- After any change in `packages/api/`, run `pnpm --filter api test`.
- After any change in `packages/web/`, run `pnpm --filter web test` and `pnpm --filter web e2e`.
- Do not commit if any test fails or any new TODO comment appears.
## Tools available
- `gh` CLI is authenticated; use it for PRs, not the web UI.
- A Postgres MCP server is configured; use it for schema lookups.
## PR conventions
- Title format: `[scope] short summary`
- Always include a Test Plan section in the PR body.
- Never bump a top-level dependency without explaining why in the PR body.
## Don't touch
- `infrastructure/terraform/` — managed by the platform team.
- Any file under `secrets/` or `.env.*`.
The largest known production deployment is the openai/openai monorepo itself, which reportedly carries 88 nested AGENTS.md files at the time of writing. Cross-vendor traction means the same file now also drives GitHub Copilot's coding agent, Google's Antigravity, Anthropic's Claude Code, Cursor's agent mode, and several smaller players — a rare instance of a vendor-originated convention surviving as a genuine cross-tool standard.
MCP integration¶
Codex consumes Model Context Protocol servers as a generic mechanism for adding tools — anything from a Figma reader to a Postgres client to an internal documentation index. Two transports are supported: STDIO (the server is a local process that the agent spawns over stdin/stdout) and Streamable HTTP (the server is a remote URL with bearer-token or OAuth authentication; codex mcp login <server-name> performs the OAuth dance for the latter).
# ~/.codex/config.toml — STDIO server
[mcp_servers.postgres]
command = "uvx"
args = ["mcp-server-postgres", "postgres://localhost/mydb"]
env = { PGPASSWORD = "secret" }
# Streamable HTTP server with OAuth
[mcp_servers.linear]
type = "http"
url = "https://mcp.linear.app/sse"
The same MCP registry serves CLI, IDE extension, and desktop app. codex mcp list enumerates configured servers; codex mcp add and codex mcp remove mutate the registry without hand-editing TOML.
A serious security note: MCP STDIO servers run as arbitrary local processes with the agent's user privileges, and they are not sandboxed independently of the agent — they execute inside whatever sandbox the parent Codex session opened. Throughout 2025 the wider MCP ecosystem saw an explosion of community-published servers, with reports of on the order of 200,000 publicly indexed STDIO servers by late 2025; many were unsigned, unaudited, and capable of arbitrary code execution. Best practice as of 2026 is to pin MCP server versions, prefer HTTP transports with OAuth (which surface a consent screen), and audit STDIO server commands the way one audits a shell script.
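One concrete mitigation is pinning the exact package version the STDIO command resolves, so a compromised "latest" release cannot slip in silently; the version below is a placeholder:
# ~/.codex/config.toml: pin the STDIO server package instead of floating on latest
[mcp_servers.filesystem]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem@1.2.3", "/home/dev"]   # placeholder version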
A second gotcha: on macOS, when Codex is itself exposed as an MCP server, sandbox setup for workspace-write sessions has known failures (see GitHub issue #18243); teams running Codex as an MCP backend usually need danger-full-access or run on Linux.
Codex also has built-in tools that do not require MCP: web search (/web in the TUI, gated per plan), image generation, file editing, and shell execution. Custom slash commands and "skills" — bundled prompt + tool recipes — rolled out through Q1 2026.
Third-party-agent role inside GitHub Copilot¶
GitHub Copilot's "coding agent" framework opened up a third-party-agents extension point in late 2025 that lets non-Microsoft agents be assigned to issues and pull requests on github.com. Codex was one of the first integrations to ship: a maintainer can @codex an issue, or assign it as the agent of record, and the GitHub UI will hand the task off to an OpenAI-hosted Codex Cloud session that runs against the repo and opens a draft pull request when finished. The integration uses GitHub's standard agent-permissions model (read access to the repo, write access to a fork branch) and reports progress back via PR comments. For organisations standardising on GitHub Copilot Business or Enterprise as the seat-license vehicle, this is the path of least resistance for getting Codex into existing review workflows without provisioning ChatGPT seats for every developer.
Plan availability and pricing¶
Codex is included in every paid ChatGPT plan, plus a capped Free tier:
| Plan | Price | Codex scope |
|---|---|---|
| Free | $0 | Limited quick tasks, capped weekly usage; -nano model |
| Go | $8 / mo | Lightweight tasks; -nano and base 5.3-Codex |
| Plus | $20 / mo | A few focused coding sessions per week; all surfaces; GPT-5.4 / 5.3-Codex; cloud features (review, Slack) |
| Pro | from $100 / mo | 5× / 10× / 20× Plus quota tiers; -Spark research preview; double-quota promo through May 31, 2026 |
| Business / Edu / Enterprise | per-seat | Admin controls, audit logs, SAML, zero-data-retention, managed config; flexible API-credit extension |
| API key | pay-as-you-go | CLI / SDK / IDE only; no Cloud features (no GitHub auto-review, no Slack); delayed access to newest models |
Rate limits reset on a rolling 5-hour window for ChatGPT-backed usage and on per-minute / per-day buckets for API-backed usage; exact numbers shift per tier. Pro subscribers who exhaust their quota can buy ChatGPT credits to keep working without changing plans.
API pricing for gpt-5.3-codex is $1.75 per million input tokens ($0.175 cached) and $14.00 per million output tokens, with proportional pricing for -mini and -nano. Through 2026 OpenAI is gradually shifting Cloud-side accounting from a per-session quota model toward a usage-based credit model that meters compute-hours and tokens, on the grounds that long-horizon agentic tasks vary too much in cost to be metered as discrete "sessions". The shift began on Plus / Pro plans in Q1 2026 and is expected to reach Business and Enterprise through 2026.
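As a rough worked example at those rates (token counts invented for illustration): a task consuming 1.5M fresh input tokens, 0.5M cached input tokens, and 150K output tokens costs about 1.5 × $1.75 + 0.5 × $0.175 + 0.15 × $14.00 ≈ $4.81 on the API, whereas the same task under a ChatGPT plan draws from the plan's quota rather than billing per-token.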
Enterprise admins get a separate panel: turning Codex Cloud on/off per workspace, requiring SSO, scoping the GitHub Connector, configuring data residency, and shipping events into the ChatGPT Compliance API. Codex inherits ChatGPT Enterprise's no-training-on-customer-data guarantee and the AES-256-at-rest / TLS 1.2+ in-transit baseline.
Security and sandbox model in depth¶
The defence-in-depth story for Codex on the local surfaces has three layers. First, the OS-level sandbox: Apple Seatbelt profiles on macOS confine the agent to a specific working directory and a small allowlist of system paths; Linux uses Landlock rules for the file system and seccomp-bpf for syscall filtering. Second, the network egress block: in workspace-write, outbound connections are dropped at sandbox setup time and only enabled by an explicit sandbox_workspace_write.network_access=true option. Third, the approval policy, which is independent of the sandbox and gates tool calls at the agent layer.
danger-full-access disables (1) the file-system isolation, (2) the seccomp filter, and (3) the network egress block in one step. It does not disable the model's own reasoning about what is safe, nor the per-call approval prompts unless --ask-for-approval never is also set. The flag is intentionally verbose and the mode is paired with a persistent UI banner so that long sessions cannot drift into it unnoticed.
Cloud-side, every task runs in a fresh ephemeral VM that is destroyed when the task ends. Network egress is off by default; operators configure an allowlist of permitted domains. Diffs are surfaced through the ChatGPT UI rather than applied to a long-lived environment, which keeps the blast radius of a misbehaving agent bounded to a single PR.
Timeline — May 2025 to May 2026¶
Compressed milestone log:
- May 16, 2025 — Codex Cloud launches at chatgpt.com/codex for Pro / Business / Enterprise plans, powered by codex-1.
- June 2025 — Plus tier access; opt-in internet during Cloud task execution; codex-mini released for CLI.
- September 2025 — GPT-5-Codex released; the Rust rewrite of the CLI ships; first IDE extension preview for VS Code; AGENTS.md open specification published.
- December 2025 — GPT-5.2-Codex released; cloud–local hand-off finalised; /review mode generally available.
- January 2026 — Native JetBrains integration ships; GitHub PR-comment triggers (@codex review, @codex implement) launch; Codex becomes a third-party agent inside GitHub Copilot.
- February 5, 2026 — GPT-5.3-Codex launches with a 400K-token context window and 25% latency improvement.
- February 12, 2026 — GPT-5.3-Codex-Spark research preview, Cerebras-accelerated, Pro-only.
- Q1 2026 — -nano variants for Free / Go; Codex Cloud reaches general availability; usage-based pricing pilot on Plus / Pro.
- April 27–28, 2026 — Microsoft–OpenAI exclusivity period ends; Codex models reach Amazon Bedrock with full CLI / desktop / VS Code integration.
- April 2026 — GPT-5.4-Codex Plus / Pro flagship snapshot reportedly rolls out.
- May 2026 — Antigravity adopts AGENTS.md; Android remote-control of Codex sessions in preview; multi-agent worktree orchestration still flagged experimental.
Limitations and known sharp edges¶
- Windows is second-class. The PowerShell-native binary works for simple cases, but WSL2 is the recommended path; some Landlock-equivalent isolation guarantees do not yet hold on Windows.
- Sandbox setup is fragile on macOS when Codex is invoked through its own MCP server interface; danger-full-access is sometimes the only workable mode in that configuration (issue #18243).
- Profiles are CLI-only. The IDE extension does not honour [profiles.*] blocks; switching models must be done manually in the sidebar.
- AGENTS.md truncation is silent. Exceeding project_doc_max_bytes drops the deepest, most specific files first — exactly the opposite of what most developers expect.
- Cross-tool AGENTS.md consumption is inconsistent. Although Copilot, Antigravity, Claude Code, Cursor, Aider, and others read the same file, the interpretation varies: some respect Don't touch directives strictly, others treat them as soft hints; some honour all nested files, others only the repo root.
- Codex Cloud trust boundary. By default, Cloud VMs cannot reach the open internet; teams enabling network access must maintain the allowlist themselves. A misconfigured allowlist is a credible exfiltration risk.
- Model context limits. Even 400K tokens is not infinite; very large monorepos still require AGENTS.md discipline, file context pickers, and (for some workflows) deliberate task decomposition.
- Multi-agent orchestration is experimental. Running parallel Codex agents on isolated Git worktrees works, but conflict resolution at merge time is a manual step.
- Model availability lags on the API. -Spark and the very newest snapshots reach ChatGPT-plan users weeks before API-key callers.
- MCP supply chain. The 200,000-plus publicly indexed STDIO servers as of late 2025 are unsigned by default; teams should pin versions and audit commands.
Where Codex sits in the broader landscape¶
Codex sits in the same category as Anthropic's Claude Code, Google's Antigravity / Gemini CLI, GitHub Copilot's coding agent, Devin, and Cursor's agent mode. Two design choices distinguish it. First, multi-surface with shared state: the same configuration, the same AGENTS.md, the same MCP registry across CLI, IDE, Cloud, and mobile. Competing agents typically ship one or two surfaces and require separate setup per surface. Second, bi-directional cloud / local hand-off: long-running or unattended work can be promoted from the laptop to a managed VM and pulled back when done. Most competitors are either local-only (Claude Code, Gemini CLI) or cloud-only (Devin).
The trade-off is lock-in: every Codex surface relies on OpenAI's hosted services for the model and (for Cloud surfaces) the execution VMs. Teams with strict data-locality requirements often pair Codex CLI (local execution, API key) with self-hosted MCP servers, deliberately skipping the Cloud surface. The AGENTS.md standard, ironically, is also Codex's largest defection vector — a project that writes good AGENTS.md files is one that can switch agents with relatively little friction.
Sources¶
- Introducing Codex (OpenAI, May 2025)
- Introducing GPT-5.3-Codex (OpenAI, Feb 2026)
- Codex docs index
- Codex CLI docs
- Codex IDE extension docs
- Codex MCP guide
- AGENTS.md guide (OpenAI)
- Codex pricing
- Codex Enterprise admin setup
- GPT-5.3-Codex model page
- openai/codex repository
- Issue #18243 — macOS MCP sandbox setup
- AGENTS.md open specification
- Codex in JetBrains IDEs (JetBrains blog, Jan 2026)
- Codex IDE Extension deep-dive (Daniel Vaughan, Apr 2026)
- Codex code review walk-through (Apidog)
- Getting started with Codex CLI (DeployHQ)
- OpenAI Codex CLI comprehensive guide (SmartScope)
- GPT-5.3-Codex (Wikipedia)
- GitHub Copilot coding agent concepts
- Android remote-control of Codex (Android Authority, May 2026)
- OpenAI brings GPT-5.5 to AWS Bedrock (Forbes, May 2026)
Changelog¶
- 2026-05-11 — Page created from OpenAI primary docs + Codex repo + secondary coverage (Type B, confidence 70)