Agent Reliability Patterns

Three techniques for ensuring AI agents complete multi-step workflows without dropping tasks.

Discovered while fixing the calendar-audit agent, which would complete Steps 1-5 but exit early, skipping Steps 6-8 (writing logs, updating status files, updating the audit log). The problem was worse when a step involved extended back-and-forth with the user - the agent interpreted finishing a conversation-heavy step as finishing the entire task.

1. TodoWrite Checklist (Awareness Layer)

What it does: At the very start of the workflow, the agent creates a visible task list using TodoWrite with all steps listed. It marks each step in_progress/completed as it goes.

Why it works: The checklist persists visually throughout the session. Even after a long back-and-forth exchange, the agent can see “5 of 8 complete” and knows there’s more to do. It’s harder to forget remaining tasks when they’re staring at you.

Where to put it: First instruction in Step 1 of the agent definition, before any actual work begins.

Example:

Use TodoWrite to create the following task list before doing anything else:
1. "Get current time & check last audit" -> in_progress
2. "Fetch calendar events" -> pending
3. "Check existing logs" -> pending
...
Mark each as in_progress when starting and completed when done.

Limitation: The agent still has to remember to check the list. It’s a visual aid, not enforcement.

2. Inline Reminders at Failure Points (Prevention Layer)

What it does: A bold, impossible-to-miss warning placed at the exact point in the workflow where the agent tends to exit early.

Why it works: When you know where the agent derails, you can place a targeted intervention right there. It’s like a road sign before a dangerous curve - you put it where drivers actually need it, not at the start of the highway.

Where to put it: At the end of the step that precedes the failure. If the agent tends to quit after Step 5, the warning goes at the bottom of Step 5’s instructions.

Example:

STOP - DO NOT END THE SESSION HERE.
You are at Step 5 of 8. Three critical steps remain:
-> Step 6: Write entries to session logs
-> Step 7: Update status files
-> Step 8: Update audit log & show summary
Rescheduling is NOT the end of the workflow. Check your TodoWrite list now.

Limitation: Still relies on the agent reading its own instructions. If the context window is very long, the agent may not revisit this section.

3. SubagentStop Hook (Enforcement Layer)

What it does: A shell script that runs automatically when the agent tries to stop. It checks whether the expected outputs exist (e.g., was the audit log updated today?). If not, it blocks the agent from stopping and tells it to complete the remaining steps.

Why it works: This is external enforcement - it doesn’t depend on the agent remembering anything. Even if the agent completely ignores the TodoWrite list and the inline warning, the hook catches it at the door and sends it back to finish.

How it works:

Agent tries to stop
Claude Code fires the SubagentStop hook
Hook script checks for a completion marker (e.g., today’s date in the audit log)
If marker missing: returns {"decision": "block", "reason": "..."} - agent can’t stop
If marker present: exits 0 - agent is allowed to stop
Important: checks stop_hook_active to prevent infinite loops (if the agent still can’t complete after being sent back, let it stop on the second attempt)

Configuration: Lives in .claude/settings.json under hooks.SubagentStop with a matcher scoped to the specific agent.

Example hook script logic:

# Check if audit log was updated today (Step 8 = last step)
if grep -q "$TODAY" "$AUDIT_LOG"; then
  exit 0  # All good, let the agent stop
fi
# Block the agent
echo '{"decision":"block","reason":"Steps 6-8 not completed."}'

Limitation: The completion check needs to be something concrete and verifiable (file modified, entry exists). It can’t check nuanced quality.

Defense in Depth

These three techniques work at different layers and should be used together:

TodoWrite  ->  Helps the agent stay on track (self-awareness)
Inline     ->  Warns at the known danger point (prevention)
Hook       ->  Catches it if both fail (enforcement)

The first two are free (just text in the agent definition). The hook is a small shell script with minimal overhead.

When to Apply These Patterns

Use all three when an agent:

Has 5+ steps in its workflow
Involves user interaction mid-workflow (which creates false “completion” signals)
Writes to files as its final steps (gives the hook something to verify)
Has previously failed to complete all steps

For simpler agents (2-3 steps, no user interaction), the TodoWrite alone may be sufficient.

The broader pattern: when a team keeps skipping the boring part, the answer isn’t to lecture them. It’s to make the boring step visible, put a reminder where attention drops, and build a gate that won’t let you skip it.