Where agentic AI actually ships work.

Concrete shapes of work where Codex, Claude Code, MCP and friends earn their place in the team. Each use case names the tool focus, the workflow shape, and the outcome to expect.

Refactor

Large refactor with full test coverage

A multi-file refactor, like extracting a service or renaming a domain concept, where the test suite is the contract and the agent does the mechanical work.

Shape of the work: Read-only review of the area, agreed scope, agent runs in a working branch with the test suite as a gate. Diff reviewed in passes, not all at once.
What changes for the team: Two-week refactors compressed to two days of agent work plus one day of review. The test coverage is the artefact that makes it safe.

Knowledge

Repository question-and-answer for new joiners

Onboarding a new joiner without burning a senior engineer for a week. The agent reads the repo and answers structural questions with grounded references.

Shape of the work: Codex or Claude Code with read-only access to the repository. A short written guide on the kinds of questions that work, the kinds that do not, and the right escalation point.
What changes for the team: New joiners reach the first useful PR a week earlier. Senior time is reserved for the questions the agent cannot answer well.

Migration

Deterministic migration across the codebase

A version bump, an API rename, a dependency swap — the kind of change that touches dozens of files and where each touch follows the same rule.

Shape of the work: Rule written down first as plain English plus example. Agent applies the rule file by file, with the test suite as the gate and a small per-file checklist.
What changes for the team: Migrations that would have stalled in a backlog get done in days. The written rule is the durable artefact for the next migration.

Eval

Codebase-specific eval suite for agent use

A small set of repeatable tasks with known good outputs. Every model upgrade, prompt change, or workflow tweak is run against the same set before it touches production.

Shape of the work: Three to five tasks chosen with the team. Each has a written prompt, a known good diff or output, and an automated comparator. The set lives in the repo.
What changes for the team: Model and prompt changes stop being a vibes call. The team has a short list of tasks it knows the workflow should pass.

Tooling

MCP-mediated internal tool

An internal tool — a search, a deploy command, a private dataset query — exposed to the agent through a custom MCP server with proper scopes.

Shape of the work: API design pass, MCP server skeleton, scopes wired through identity, observability on call patterns. The agent calls the tool on real tasks within a week.
What changes for the team: Internal knowledge becomes reachable in agent flows without sprawl. The server is reviewed monthly against the call log.

Review

Code review triage for the senior reviewer

A senior reviewer using an agent as a first pass on long PRs — surfacing the parts that need eyes, summarising the parts that do not, never deciding alone.

Shape of the work: Agent runs over the PR diff with a tight prompt. Output is a structured triage note, not an approval. The senior reviewer reads the note then the diff.
What changes for the team: Long PRs get reviewed in less time without losing the senior judgement. The triage notes themselves become a corpus the team learns from.

Book a workshop Read the field notes