The Agent Maturity Ladder

One Big Idea

The boardroom question of 2026 is not whether AI agents can act. They can — they plan, use tools, and execute across your systems. It is how far you should let them.

The moment an agent acts on its own, three things move with it: your capital, your customer, and your name on a regulator's letter. Yet most banks still treat autonomy as a single switch — on or off — and so they either over-reach or over-restrict. Both are expensive.

The better mental model is a ladder. This issue gives you the four rungs, the two rails that decide how high any decision may climb, and the gate every autonomous action must pass before it leaves human hands.

The Insight

Two cases frame the stakes.

Zillow wrote off three hundred and four million dollars when its home-buying algorithm kept acting and the humans paid to approve its bids simply nodded along. That is over-reach — and the sharp lesson is that a "human in the loop" is worthless if that human is aligned with the machine instead of independent of it. The AI worked perfectly; the oversight did not.

Separately, the SEC brought its first "AI-washing" cases, fining two investment advisers a combined four hundred thousand dollars for claiming AI sophistication they did not have. The offence was the gap between the marketing and the machine.

The lesson is not "use less AI," nor "always keep a human in the loop." It is that autonomy is a per-decision choice, not a per-bank setting. Most banks fail not because they deploy too much AI, but because they put it on the wrong rung.

Framework of the Week · The Agent Maturity Ladder

Four rungs — how much of the decision the human still makes:

L1 Assisted — AI drafts and recommends; the human decides everything.
L2 Augmented — AI proposes; the human approves every material action. Most banking lives here.
L3 Supervised autonomy — AI acts inside a bounded policy; the human audits samples and can override (human-on-the-loop).
L4 Full autonomy — the agent acts end to end. No major bank runs this on a customer or balance-sheet decision; in the EU and Vietnam the law does not permit it.

Grade each use case on three axes before you assign it a rung: reversibility (can it be undone? This is the primary gate), containment (how large is the blast radius if it is wrong?), and audit trail (can a regulator replay who acted, on what evidence, and why?). When an action cannot be undone, keep it one rung lower.

Two rails decide how high a use case may climb, regardless of how capable the model is: materiality (how much is at stake) and customer proximity (how close to, and how vulnerable, the customer). The higher they run, the lower the defensible rung.

Then comes the gate, and it has three parts: Explainable, Overridable, Accountable. Clear all three to climb above supervision. Fail any one and "the algorithm decided" becomes your only answer, which is no answer at all.
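For teams that want to turn the ladder into a screening rule, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the scoring model from the Frameworks library: the 1-to-3 rail scale, the mapping from rail score to rung, and the names UseCase and defensible_rung are all hypothetical. It also reads "above supervision" as "above L2", which is one possible interpretation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    ASSISTED = 1    # L1: AI drafts, human decides everything
    AUGMENTED = 2   # L2: AI proposes, human approves each material action
    SUPERVISED = 3  # L3: AI acts inside a bounded policy, human audits
    FULL = 4        # L4: agent acts end to end


@dataclass(frozen=True)
class UseCase:
    name: str
    materiality: int         # rail 1: 1 (low) to 3 (high), assumed scale
    customer_proximity: int  # rail 2: 1 (far) to 3 (close or vulnerable)
    reversible: bool         # primary axis: can the action be undone?
    explainable: bool        # gate, part 1
    overridable: bool        # gate, part 2
    accountable: bool        # gate, part 3


def defensible_rung(uc: UseCase) -> Rung:
    """Return the highest rung this sketch would defend for a use case."""
    # Rails: the higher rail sets the ceiling (assumed mapping).
    rail = max(uc.materiality, uc.customer_proximity)
    ceiling = {1: Rung.FULL, 2: Rung.SUPERVISED, 3: Rung.AUGMENTED}[rail]

    # An action that cannot be undone stays one rung lower.
    if not uc.reversible:
        ceiling = Rung(max(Rung.ASSISTED, ceiling - 1))

    # Failing any part of the gate caps the use case at L2 (assumption).
    if not (uc.explainable and uc.overridable and uc.accountable):
        ceiling = min(ceiling, Rung.AUGMENTED)

    return Rung(ceiling)


# Illustrative inputs only; the scores are invented for the example.
reconciliation = UseCase("post-trade reconciliation", 2, 1, True, True, True, True)
underwriting = UseCase("credit underwriting", 3, 3, True, True, True, True)

print(defensible_rung(reconciliation).name)  # SUPERVISED
print(defensible_rung(underwriting).name)    # AUGMENTED
```

The result is a ceiling, not a target: a bank may always choose to sit lower. Taking the maximum of the two rails is a deliberately conservative choice, so a low-value decision that touches a vulnerable customer is still held down.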

The full framework, with the scoring axes drawn out, lives in the Frameworks library.

Use Case · Where Banks Actually Sit

Map the framework onto practice and a clean pattern appears.

Where decisions are reversible and far from the customer, banks have already climbed. BNY runs over a hundred AI "digital workers" in post-trade and cut false positives by forty percent. State Street drove reconciliation exceptions from thirty-one thousand down to about four thousand. JPMorgan put a drafting platform in front of two hundred thousand staff. Fiserv onboards commercial loans straight into the core ledger — with a human approving the entry. And at Vietnam's top fifteen banks, AI now turns a four-hour credit file into a forty-five-minute memo while the analyst keeps the pen.

Where the stakes rise and the customer is close, the same banks stop at L2: credit underwriting keeps a human signature; a distressed customer escalates to a person.

The rule: earn L3 in the back and middle office; keep a human signature at the front, because that is exactly where the legal gate bites.

Risk Note

2026 rewrote the rulebook, and the pieces do not all agree. In the United States, SR 26-2 modernized model risk around materiality — but the OCC pointedly left generative and agentic AI outside its scope, a vacuum banks must fill themselves; and the CFPB insists a "black box" is no defense for a credit denial. The EU AI Act makes creditworthiness high-risk and, under Article 14, demands a human who can understand, intervene, and switch the system off. Vietnam's AI Law 134/2025 plus the State Bank's Circular 83 layer bank-specific controls, and a February draft would require telling customers before an AI interacts with them. Singapore's regulators wrote the most precise agentic-AI rulebook of all — define the checkpoints, and never let accountability diffuse across a crowd of agents.

The through-line across all four jurisdictions: explainability and accountability are non-negotiable, and a rubber stamp, such as a reviewer clearing AML alerts that are wrong more than ninety percent of the time, does not count as oversight. Design the human in where their judgment changes the outcome: not everywhere, and not nowhere.

Latest Video

This week's video — How Far Should a Bank Let AI Act on Its Own? — walks the four rungs, the two rails, the gate, and the real cases behind each, on screen.

Watch: youtu.be/xyhPi2q9k3M

Pair it with the five-page playbook in the Frameworks library — the operating model and the use-case map, ready to take to your next risk committee.

Hit reply and tell me which rung your bank is stuck on today — and whether it is capability or courage holding it there. I read every response. Forward this to a banking executive weighing how much to let AI decide.

Was this forwarded to you? Subscribe to The AI Architect Letter — free, every Saturday.

Minh Tran · AI Business Architect · LinkedIn · Workshops & advisory: aibusinessarchitect.ai