Three Ways Codex Controls a Computer: Differences, Scenarios, and How to Choose

A clear breakdown of Codex's Computer Use, Chrome extension, and built-in browser—covering mechanisms, speed, use cases, and limitations—so you can quickly pick the right tool for the job.

Ethan Lin

Published Jun 17, 2026 · 5 min read

Codex can now control a computer in three distinct ways: Computer Use, the Chrome extension, and the built-in browser. Each targets different scenarios. Once you understand the differences, you'll know exactly when to use which.

Why Three Methods?

It comes down to the variety of applications and environments you work in. Some tasks demand manipulating traditional desktop apps with no API, others require preserving your login state, and some need deep integration with a local development environment. These three methods aren't redundant—they're deliberately designed for different jobs.

Let's unpack how each one works, its strengths, limitations, and typical use cases.

1. Computer Use: Broadest Coverage, Slowest Speed

How It Works

Computer Use lets Codex see the screen, click the mouse, and type on the keyboard—just like a human would—to operate any graphical application on your computer. It relies on visual recognition and UI-element targeting instead of structured data behind the scenes.

What It Can Do

You can have it change songs in Spotify, adjust project settings in Xcode, modify system preferences, or even control your iPhone through iPhone Mirroring. In theory, Codex can attempt to operate any app that you can operate by hand.

The Cost

It's slow. Every step involves:

  • Capturing a screenshot
  • Analyzing the interface layout
  • Locating the target button or input field
  • Executing the action
  • Waiting for the UI to respond
  • Verifying the result
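The per-step cycle above is essentially an observe-act-verify loop. The sketch below is a minimal Python illustration of that loop, not Codex's actual implementation: `FakeScreen`, `locate`, and `run_step` are hypothetical names, and a real agent would replace the dictionary lookup with screenshot analysis.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop UI: element labels mapped to their state."""
    elements: dict = field(default_factory=lambda: {"Play": "paused", "Next": "idle"})
    clicks: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent captures pixels; this fake returns a copy of the UI state.
        return dict(self.elements)

    def click(self, label: str) -> None:
        self.clicks.append(label)
        if label == "Play":
            self.elements["Play"] = "playing"  # only "Play" visibly changes state


def locate(snapshot: dict, label: str) -> Optional[str]:
    # Stands in for analyzing the layout and finding the target element.
    return label if label in snapshot else None


def run_step(screen: FakeScreen, label: str, expected: str, max_attempts: int = 3) -> bool:
    """One Computer Use-style step: capture, locate, act, verify, retrying on failure."""
    for _ in range(max_attempts):
        snapshot = screen.screenshot()           # capture a screenshot
        target = locate(snapshot, label)         # analyze the layout, locate the target
        if target is None:
            return False                         # nothing to click on this screen
        screen.click(target)                     # execute the action
        # (a real loop would wait here for the UI to respond)
        if screen.screenshot().get(label) == expected:  # verify the result
            return True
    return False
```

Even in this toy version, every attempt takes two snapshots plus an action, and a failed verification repeats the whole cycle. That overhead is why visual control is slower than a plugin that makes a single API call.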

Compared to structured plugins that call APIs directly, the latency is noticeably higher. But its real value is handling apps that have no API—a blind spot no other method can cover.

Platform Differences Matter

  • macOS: Codex operates silently in the background without disturbing your work. You keep using your computer; it acts like an invisible second user.
  • Windows: Codex must take over the foreground, meaning you can't use the machine during operation. This is an OS-level limitation with no current workaround.

A Real-World Example

Team member Jason shared a scenario: his package was stolen, and Amazon customer service had an estimated wait time of 25 minutes. He had Codex check the chat window every five minutes, and once an agent appeared, switch to checking every minute and then automatically complete the refund process. He took a shower, and the refund was already in his account when he returned.
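Jason's instruction amounts to an adaptive polling schedule: a slow interval while waiting in the queue, then a fast one once someone is on the other end. The sketch below assumes hypothetical callbacks `agent_present` and `advance_refund`, which in practice would be Computer Use steps that read and act on the chat window. Injecting `sleep` keeps the schedule testable without waiting real minutes.

```python
from typing import Callable, List

WAITING_INTERVAL_S = 5 * 60  # while queued: check the chat every five minutes
ACTIVE_INTERVAL_S = 60       # once an agent joins: check every minute


def adaptive_poll(
    agent_present: Callable[[], bool],
    advance_refund: Callable[[], bool],
    sleep: Callable[[float], None],
    max_checks: int = 200,
) -> List[int]:
    """Poll slowly until an agent appears, then quickly until the refund is done.

    Returns the sleep intervals used, so the schedule can be inspected afterward.
    """
    intervals: List[int] = []
    for _ in range(max_checks):
        if agent_present():
            break
        intervals.append(WAITING_INTERVAL_S)
        sleep(WAITING_INTERVAL_S)
    else:
        raise TimeoutError("no agent joined the chat")

    for _ in range(max_checks):
        if advance_refund():  # True once the refund flow reports completion
            return intervals
        intervals.append(ACTIVE_INTERVAL_S)
        sleep(ACTIVE_INTERVAL_S)
    raise TimeoutError("refund was not completed")
```

The `max_checks` cap is a safety assumption that prevents an unattended agent from polling forever if the chat never progresses.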

When to Choose It

Use Computer Use when you need to operate a desktop app and no dedicated plugin or API exists. It's the fallback of last resort—but also the most universal.

2. Chrome Extension: Carries Your Login State

How It Works

The Chrome extension lets Codex directly use your authenticated browser session, including cookies, account state, and open tabs. It understands browser-level context, not just screen coordinates.

Core Advantage: Identity Awareness

Many web tools (Gmail, LinkedIn, Salesforce, internal admin panels) require login to function. Computer Use can operate a browser, but it's clicking from the outside based on visuals. The Chrome extension works from the inside, reading the DOM structure, form fields, and page state.

It can also act across multiple tabs simultaneously: read data in one tab, compare it in another, and submit a form in a third.

Security Boundaries to Note

Websites treat Codex's clicks and form submissions as your own actions. That means:

  • Research, browsing, and drafting content can be safely automated.
  • Sending messages, public posting, and completing payments are best left to manual confirmation.
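One way to enforce this boundary in your own automation is a small gating policy. This is a hypothetical sketch, not a Codex setting: the action names and the `gate` and `perform` helpers are illustrative assumptions that mirror the two categories above.

```python
from enum import Enum
from typing import Callable


class Policy(Enum):
    AUTO = "auto"        # safe to run unattended
    CONFIRM = "confirm"  # pause and ask the user first


# Hypothetical action names; only explicitly listed actions run unattended.
SAFE_ACTIONS = {"browse", "research", "read", "draft"}


def gate(action: str) -> Policy:
    """Classify an action. Unknown actions default to CONFIRM, since the site
    will attribute whatever Codex does to you."""
    return Policy.AUTO if action in SAFE_ACTIONS else Policy.CONFIRM


def perform(action: str, confirm: Callable[[str], bool]) -> bool:
    """Return True if the action may proceed: either it is safe, or the user approved it."""
    if gate(action) is Policy.AUTO:
        return True
    return confirm(action)
```

The policy uses an allowlist rather than a blocklist, so any action it did not anticipate, such as a new kind of submission, falls back to manual confirmation.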

Jason's approach: have Codex use Chrome daily to check Twitter DMs, browse relevant news, and collect feedback—saving valuable content to a local file—but never send a single message on his behalf.

Typical Scenarios

  • SaaS tools that require login
  • Cross-site automation workflows
  • Browsing tasks that need your personal account context

When to Choose It

This is the first choice when the task depends on your login state and happens mostly through a web interface.

3. Built-In Browser: An Isolated Sandbox for Developers

How It Works

The built-in browser lives inside a Codex conversation thread, where you and Codex share the same rendered page. It carries no login state or cookies—it's a completely isolated environment.

Why "Stateless" Is an Advantage

For development tasks, isolation is a feature. Its sweet spots include:

  • Local dev servers (e.g., localhost:3000)
  • File previews
  • Public web pages
  • Responsive layout checks
  • Reproducing visual bugs

Codex can modify code, refresh the page, take a screenshot, tweak again—creating a tight feedback loop without interference from caches or login states.

Killer Feature: Page Annotations

You can click an element directly on the page and leave a comment, such as:

  • "This z-index is inverted"
  • "Button spacing is too tight"
  • "The color doesn't match the design spec"

Codex receives the screenshot and element context, modifies the code, and reopens the same page for your next round of annotations. This cuts out the back-and-forth of screenshots and text descriptions.
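To see why annotations save round-trips, it helps to look at the information a single annotation bundles together. The sketch below is an assumption about the general shape, not Codex's actual format. `Annotation` and `to_task` are hypothetical names, selectors and file names are placeholders, and all annotations in a round are assumed to be on the same page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical shape of one page annotation: where you clicked and what you said."""
    url: str
    selector: str         # CSS selector of the clicked element
    comment: str
    screenshot_path: str  # capture of the page when the comment was left


def to_task(annotations: List[Annotation]) -> str:
    """Bundle one round of annotations on a single page into one fix request."""
    if not annotations:
        return "No issues to fix."
    lines = [f"Fix {len(annotations)} issue(s) on {annotations[0].url}:"]
    for i, note in enumerate(annotations, start=1):
        lines.append(f"{i}. [{note.selector}] {note.comment} (see {note.screenshot_path})")
    return "\n".join(lines)
```

Because each comment is tied to a specific element and screenshot, the request does not depend on a prose description like "the button near the top right."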

When to Choose It

Choose this when doing front-end development, UI adjustments, or needing a clean environment to preview pages.

Comparison at a Glance

Method | Target | Login state | Speed | Platform limitation | Typical scenarios
--- | --- | --- | --- | --- | ---
Computer Use | Desktop apps, graphical interfaces | User must log in manually | Slow | macOS: background; Windows: foreground | Legacy software, cross-app tasks
Chrome extension | Web apps requiring login | Preserves browser login | Fast | None | SaaS tools, cross-site workflows
Built-in browser | Local dev pages, public web pages | Stateless | Fast | None | Frontend dev, UI checks

Which One to Pick? A Simple Decision Tree

  1. Does the task involve a local desktop application (non-browser)?
    → Use Computer Use (prefer macOS; note foreground restrictions on Windows)

  2. Does the task need your login state to access a website?
    → Use the Chrome extension

  3. Are you doing front-end development and need to interact with localhost or file previews?
    → Use the built-in browser

Additionally, if a dedicated plugin or MCP (Model Context Protocol) tool exists for the task, use that structured option first. Visual control is a last resort, not the default.
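The decision tree, together with the structured-tools-first rule, fits in a few lines of code. This sketch encodes only the criteria stated in this article. The `Task` fields are hypothetical simplifications, and treating every remaining case (localhost, file previews, public pages) as a built-in browser task is an assumption drawn from the built-in browser section.

```python
from dataclasses import dataclass


@dataclass
class Task:
    has_structured_tool: bool = False  # a dedicated plugin, MCP tool, or API covers it
    desktop_app: bool = False          # targets a non-browser desktop application
    needs_login: bool = False          # depends on your logged-in web session
    platform: str = "macOS"            # "macOS" or "Windows"


def choose_method(task: Task) -> str:
    """Encode the article's decision tree, checking structured tools first."""
    if task.has_structured_tool:
        return "plugin/MCP tool"
    if task.desktop_app:
        if task.platform == "Windows":
            return "Computer Use (takes over the foreground)"
        return "Computer Use (runs in the background)"
    if task.needs_login:
        return "Chrome extension"
    # Remaining cases: localhost, file previews, and public pages with no login.
    return "built-in browser"
```

The order of the checks matters: a desktop task that also has an MCP tool still goes to the structured option, which reflects the rule that visual control is the last resort.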

Summary

Codex's three computer-control methods aren't technology for its own sake; each responds to a real difference in workflows. Know their boundaries, and you'll place automation where it belongs: let Computer Use tackle stubborn legacy desktop apps, use the Chrome extension for work that runs through your logged-in web accounts, and lean on the built-in browser to speed up frontend iteration.

The guiding principle: use an API whenever possible, then structured plugins, and only then visual control.
