AI Tool Use Patterns: Safety Guide

From AISApedia, the AI skills & terms encyclopedia

Tool use patterns define how AI agents interact with external systems — databases, APIs, file systems, and web services — in a controlled and predictable manner. Effective patterns emphasise explicit planning before action, continuous verification during execution, and safe rollback when things go wrong, distinguishing reliable agents from those that cause cascading failures.

What is the plan-execute-verify pattern?

The most fundamental tool use pattern requires the agent to articulate what it intends to do before doing it. Rather than immediately executing a database query or API call, the agent first generates a plan that breaks the task into steps: which tools it will use, in what order, and what outcome it expects from each. This plan is either shown to the user for approval or validated against a set of safety rules before execution begins.

The verification step happens after each tool call, not just at the end. The agent checks whether the result matches its expectations — did the API return a success code, does the data look reasonable, did the file write complete without error? If any verification fails, the agent stops and reports rather than proceeding with invalid state. This prevents the most common agent failure: confidently building on a broken foundation.
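The plan, execute, and verify steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: `Step`, `run_plan`, and the toy tools in `plan` are hypothetical names introduced here, not part of any agent framework.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """One planned tool call plus the check its result must pass."""
    description: str
    action: Callable[[], Any]
    verify: Callable[[Any], bool]


def run_plan(steps: list[Step]) -> list[str]:
    """Execute steps in order, verifying each result before moving on."""
    log = [f"PLAN: {s.description}" for s in steps]  # state intent first
    for step in steps:
        try:
            result = step.action()
        except Exception as exc:
            log.append(f"STOP: {step.description} raised {exc!r}")
            return log
        if not step.verify(result):
            # Stop and report rather than build on an invalid state.
            log.append(f"STOP: {step.description} failed verification")
            return log
        log.append(f"OK: {step.description}")
    return log


# Toy tools (assumed for the demo): the second call returns a bad status.
plan = [
    Step("fetch record", lambda: {"status": 200}, lambda r: r["status"] == 200),
    Step("update record", lambda: {"status": 500}, lambda r: r["status"] == 200),
    Step("notify user", lambda: {"status": 200}, lambda r: r["status"] == 200),
]
for line in run_plan(plan):
    print(line)
```

In this run the third step never executes, because the second step fails its check and the loop returns early. The logged plan lines double as the audit trail.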

This pattern mirrors how experienced engineers approach production systems. An operator who runs a database migration does not execute all statements at once and hope; they run each step, verify the outcome, and only proceed when the state is confirmed correct. AI agents that interact with real systems need the same discipline because they operate on real data with real consequences.

The planning step also creates an audit trail. When an agent explains what it intends to do before doing it, the logged plan provides context for understanding what went wrong if the execution fails. Without this trail, debugging agent failures requires reconstructing intent from outcomes — a much harder problem.

What goes wrong when agents skip safety checks?

The most common failure is confident action on incorrect assumptions. An agent tasked with 'clean up duplicate records' might delete entries that appear identical by one field but differ in ways that matter. Without a confirmation step showing the user which records it considers duplicates and why, the agent optimises for speed at the cost of data integrity.
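One way to surface the difference between true duplicates and look-alikes is to report them separately before anything is deleted. The sketch below assumes records are plain dictionaries and that duplicates are first grouped by a single key field; `find_duplicates` and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_duplicates(rows: list[dict], key: str):
    """Split rows that share `key` into exact duplicates and near duplicates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    exact, needs_review = [], []
    for group in groups.values():
        if len(group) < 2:
            continue  # unique on the key field, nothing to report
        first = group[0]
        if all(r == first for r in group[1:]):
            exact.append(group)          # identical in every field
        else:
            needs_review.append(group)   # same key, but other fields differ
    return exact, needs_review


rows = [
    {"email": "a@x.com", "name": "Ann", "plan": "pro"},
    {"email": "a@x.com", "name": "Ann", "plan": "pro"},
    {"email": "b@x.com", "name": "Bob", "plan": "free"},
    {"email": "b@x.com", "name": "Bob", "plan": "pro"},
    {"email": "c@x.com", "name": "Cy", "plan": "free"},
]
exact, needs_review = find_duplicates(rows, "email")
print(len(exact), "exact group(s);", len(needs_review), "group(s) need review")
```

The function only reports. Deciding what to delete stays with the user, who can see that the two `b@x.com` rows differ in `plan`.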

Cascading failures are the more severe risk. An agent that modifies a configuration file based on outdated information might break a service, which causes a monitoring alert, which triggers another automated response, which compounds the original error. Each step is locally reasonable; the cascade is globally catastrophic. These failures are particularly difficult to recover from because each step changes the system state in ways that make rolling back the previous step harder.

Permission escalation is a subtler concern. An agent with write access to a database might, through a series of individually authorised operations, produce a state that no single operation would have been authorised to create. The human-in-the-loop pattern addresses this by requiring approval at decision boundaries, not just at the individual tool level.

Side effects in external systems are the hardest to reverse. An agent that sends an email, publishes a post, or triggers a webhook has created real-world consequences that cannot be undone by reverting code or data. Tool use patterns must distinguish between read-only operations, which are generally safe to automate, and write operations with external side effects, which require either human approval or very robust safety constraints. AI workflow audits should document this distinction.

How do teams implement guardrails for tool-using agents?

The first layer is capability scoping: the agent should only have access to the tools it needs for its current task, with the minimum required permissions. An agent that analyses data does not need write access. An agent that updates records does not need delete access. This follows the principle of least privilege, a long-standing security principle, and limits the blast radius of any single agent error.
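Capability scoping can be sketched as a wrapper that exposes only the tools a task has been granted. The registry `TOOLS`, the class `ScopedToolbox`, and the three permission names are assumptions made for this example; real tool frameworks define their own permission models.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> (permission it needs, implementation).
TOOLS: dict[str, tuple[str, Callable[..., str]]] = {
    "query_rows": ("read", lambda table: f"rows from {table}"),
    "update_row": ("write", lambda table, row_id: f"updated {table}/{row_id}"),
    "delete_row": ("delete", lambda table, row_id: f"deleted {table}/{row_id}"),
}


class ScopedToolbox:
    """Exposes only the tools covered by the permissions granted for a task."""

    def __init__(self, granted: set[str]):
        self._granted = frozenset(granted)

    def available(self) -> list[str]:
        return sorted(n for n, (perm, _) in TOOLS.items() if perm in self._granted)

    def call(self, name: str, *args: str) -> str:
        permission, func = TOOLS[name]
        if permission not in self._granted:
            raise PermissionError(f"{name} needs '{permission}' permission")
        return func(*args)


analyst = ScopedToolbox({"read"})            # analysis task: read only
updater = ScopedToolbox({"read", "write"})   # update task: no delete

print(analyst.available())
print(updater.call("update_row", "customers", "42"))
```

Because the check lives in `call`, a tool outside the granted set fails with `PermissionError` even if the agent names it directly.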

The second layer is output validation. After every tool call, the result is checked against expected patterns. A database query that returns zero rows when the agent expected data triggers a pause rather than proceeding with empty context. An API call that returns an error code is handled explicitly rather than silently ignored. These validation checks should be implemented in code, not left to the agent's judgment.
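The two checks mentioned above could look like this when written as ordinary code paths rather than left to the agent. `ValidationError`, `ApiResponse`, `validate_rows`, and `validate_response` are hypothetical names, and treating every non-2xx status as a failure is a simplifying assumption.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Raised when a tool result does not match what the caller expected."""


@dataclass
class ApiResponse:
    status_code: int
    body: dict


def validate_rows(rows: list[dict], min_rows: int = 1) -> list[dict]:
    """Reject an unexpectedly empty query result instead of passing it on."""
    if len(rows) < min_rows:
        raise ValidationError(f"expected at least {min_rows} row(s), got {len(rows)}")
    return rows


def validate_response(resp: ApiResponse) -> dict:
    """Treat any non-2xx status as an explicit failure."""
    if not 200 <= resp.status_code < 300:
        raise ValidationError(f"API returned status {resp.status_code}")
    return resp.body


try:
    validate_rows([])  # the query came back empty
except ValidationError as err:
    print(f"paused: {err}")
```

Raising an exception forces the calling loop to handle the failure, so an empty or failed result cannot quietly become the agent's context.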

The third layer is reversibility. Before making any destructive change, the agent creates a restore point — a backup of the current state, a snapshot of the file being modified, or a logged record of what is about to change. If the operation fails or produces unexpected results, the system can return to the previous state. Agents that cannot undo their actions should require explicit human approval before proceeding.
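For the file case, a restore point can be sketched as a context manager that snapshots the file and copies it back if anything fails. The helper name `restore_point` and the `.bak` suffix are illustrative choices; databases and external services would need their own snapshot mechanisms.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def restore_point(path: Path):
    """Snapshot a file before modifying it; restore the snapshot on any error."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)          # create the restore point
    try:
        yield path
    except Exception:
        shutil.copy2(backup, path)      # roll back to the previous state
        raise
    finally:
        backup.unlink(missing_ok=True)  # remove the snapshot either way


# Demo on a temporary file: the failed edit is rolled back.
workdir = Path(tempfile.mkdtemp())
config = workdir / "settings.ini"
config.write_text("timeout=30\n")

try:
    with restore_point(config) as target:
        target.write_text("timeout=oops\n")
        raise ValueError("post-write check failed")
except ValueError:
    pass

print(config.read_text())
```

The exception is re-raised after the rollback, so the caller still learns that the operation failed.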

The fourth layer is rate limiting and circuit breakers. An agent in an error loop might repeatedly call the same failing API, accumulating costs or triggering rate limits. Circuit breakers that halt execution after a configurable number of failures prevent this pattern. Similarly, spending limits on API calls and token usage prevent a malfunctioning agent from running up unbounded costs.
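A circuit breaker of this kind can be quite small. The sketch below counts consecutive failures, which is one assumed reading of "a configurable number of failures"; `CircuitBreaker`, `CircuitOpenError`, and `flaky_api` are hypothetical names.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker refuses further calls."""


class CircuitBreaker:
    """Halts calls after `max_failures` consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, func, *args, **kwargs):
        if self.failures >= self.max_failures:
            raise CircuitOpenError(f"halted after {self.failures} consecutive failures")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success resets the count
        return result


def flaky_api():
    raise TimeoutError("upstream timed out")


breaker = CircuitBreaker(max_failures=3)
attempts = 0
for _ in range(10):  # an agent stuck in a retry loop
    try:
        breaker.call(flaky_api)
    except TimeoutError:
        attempts += 1
    except CircuitOpenError:
        break

print(attempts)  # the failing API was reached 3 times, not 10
```

A spending limit could follow the same shape, with a running cost total in place of the failure count.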

Why should read and write tools be designed differently?

Read-only tools — database queries, file reads, API lookups — carry minimal risk because they do not change system state. If a read operation returns wrong or irrelevant data, the worst outcome is a poor analysis. The system itself is unaffected. These tools can be made available to agents with relatively loose guardrails, enabling fluid exploration and investigation — the foundation of safe multi-tool workflows.

Write tools — database inserts, file modifications, API mutations, message sends — carry real risk because they change state and may not be reversible. Each write operation should require either explicit confirmation from the user or validation against a strict policy. The asymmetry between read and write risk should be reflected in the agent's tool configuration.
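The asymmetry can be expressed directly in the dispatch path. In this sketch the tool names, the two sets, and the `confirm` callback are assumptions; in practice the callback might prompt a user or evaluate a policy.

```python
from typing import Callable

# Hypothetical classification of tools by risk.
READ_TOOLS = {"query_rows", "read_file"}
WRITE_TOOLS = {"insert_row", "send_message"}


def dispatch(tool: str, run: Callable[[], str], confirm: Callable[[str], bool]) -> str:
    """Run read tools directly; run write tools only after confirmation."""
    if tool in READ_TOOLS:
        return run()
    if tool in WRITE_TOOLS:
        if confirm(tool):
            return run()
        return f"{tool}: blocked, confirmation denied"
    raise KeyError(f"unknown tool: {tool}")  # unclassified tools never run


def deny_all(tool: str) -> bool:
    return False


print(dispatch("query_rows", lambda: "3 rows", confirm=deny_all))
print(dispatch("send_message", lambda: "sent", confirm=deny_all))
```

Tools that appear in neither set are rejected outright, so a newly added tool cannot slip through unclassified.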

A practical design principle is to default all new tools to read-only access and require explicit promotion to write access. This ensures that expanding an agent's capabilities is a deliberate decision rather than an oversight. The promotion step should include documentation of what the write tool does, what could go wrong, and what safeguards are in place — creating the audit trail that responsible deployment requires.
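The read-only default and the documented promotion step might be modelled as follows. `ToolSpec`, `promote_to_write`, and the three required note fields are hypothetical; the fields simply mirror the three questions listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    access: str = "read"  # every new tool starts read-only
    promotion_notes: dict = field(default_factory=dict)


REQUIRED_NOTES = ("what_it_does", "what_could_go_wrong", "safeguards")


def promote_to_write(tool: ToolSpec, **notes: str) -> ToolSpec:
    """Grant write access only when all promotion notes are filled in."""
    missing = [k for k in REQUIRED_NOTES if not notes.get(k)]
    if missing:
        raise ValueError(f"cannot promote {tool.name}: missing {missing}")
    return ToolSpec(tool.name, access="write", promotion_notes=dict(notes))


crm = ToolSpec("crm_records")
print(crm.access)

crm_writer = promote_to_write(
    crm,
    what_it_does="updates contact fields",
    what_could_go_wrong="overwrites manual edits",
    safeguards="snapshot before each update",
)
print(crm_writer.access)
```

Promotion returns a new `ToolSpec` and leaves the original read-only one unchanged, so the stored notes serve as the record of why write access was granted.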

Logging requirements differ between read and write operations as well. Write operations should log the before-state, the intended change, the after-state, and the identity of the agent and user who authorised the action. Read operations need minimal logging beyond performance metrics. This asymmetric logging ensures that the audit trail is comprehensive for operations that change state while avoiding excessive log volume from harmless read activity.
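The two record shapes could be sketched as JSON lines. The function names and field names below are assumptions chosen to match the items listed above, not a standard log schema.

```python
import json
from datetime import datetime, timezone


def write_audit_entry(agent: str, user: str, before: dict, change: dict, after: dict) -> str:
    """Full audit record for a state-changing operation, as one JSON line."""
    return json.dumps({
        "kind": "write",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "authorised_by": user,
        "before": before,
        "intended_change": change,
        "after": after,
    }, sort_keys=True)


def read_metric_entry(tool: str, duration_ms: float) -> str:
    """Minimal record for a read: performance metrics only."""
    return json.dumps({"kind": "read", "tool": tool, "duration_ms": duration_ms})


print(write_audit_entry("agent-1", "alice", {"qty": 1}, {"qty": 2}, {"qty": 2}))
print(read_metric_entry("query_rows", 12.5))
```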

Try this yourself

Enable MCP in Claude or use ChatGPT's code interpreter with a real spreadsheet from work. Ask it to clean the data, but insist that it explain each step before executing it, then observe how this prevents cascading errors.

Real-world example

Reckless agent: Deletes 'duplicate' rows that were actually important variations, with no way to undo the change.

Surgical agent: 'I found 47 potential duplicates. Here are 3 examples [shows data]. Should I: (a) remove exact matches only, (b) mark for review, or (c) create a backup first?'

Crisis averted through systematic verification.