AI Agent Prompts

Expert prompts for designing, building, and deploying autonomous AI agents — from single-agent task runners to multi-agent collaborative systems.

LangChain, CrewAI, AutoGen, Multi-Agent

Agent Design & Architecture

System Design

Single-agent task system design

Design an AI agent to accomplish: [describe the goal — e.g., "research a company and produce an investment memo"].

For this agent, specify:
1. System prompt: role, constraints, output format, and what to do when stuck
2. Tool set: exactly which tools it needs and with what permissions (read-only vs write)
3. Memory strategy: what context it needs to retain across steps
4. Termination criteria: how does it know it's done?
5. Guard rails: what actions should require human approval before executing?
6. Failure modes: what are the top 3 ways this agent could go wrong, and how to mitigate each?
System Prompt

Agent system prompt template

Write a system prompt for an AI agent with this role: [describe role — e.g., "customer support escalation agent"].

The system prompt must include:
- Role definition and primary objective
- The step-by-step process the agent should follow
- What tools are available and when to use each
- Output format for each type of task
- What to do when the agent is uncertain or lacks information
- Explicit prohibitions (what the agent must never do)
- How to escalate or ask for clarification

Keep the system prompt under 500 words. Test it by asking: would a capable human follow these instructions and produce the right result?
Tool Design

Custom tool specification

I'm building a custom tool called [tool name] for an AI agent.
What it does: [description]
Inputs available: [list data sources]
Output: [what the tool should return]

Write:
1. The tool function signature with typed parameters and return type
2. The tool description string (this is what the AI reads to decide when to use it — make it precise about when to call vs. not call this tool)
3. Input validation and error handling
4. A test case showing correct usage
5. Common misuse patterns and how the description prevents them

Framework: [LangChain / OpenAI function calling / Anthropic tool use]
Evaluation

Agent evaluation framework

Build an evaluation framework for an AI agent that [describe task].
The eval should test:
1. Task completion rate: define what "complete" means for this task
2. Output quality: what makes an output good vs acceptable vs failure?
3. Efficiency: what's the acceptable range of steps/tokens to complete the task?
4. Safety: what outputs or actions would constitute a safety failure?
5. Edge cases: list 5 inputs that would stress-test the agent

Write 10 evaluation test cases with: input, expected output behaviour, and pass/fail criteria.
Include a scoring rubric for human raters to assess borderline outputs.
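Where pass/fail criteria can be made executable, the test cases translate directly into a tiny harness. A minimal sketch, assuming the agent is callable as a text-in/text-out function; case names and predicates are illustrative:

```python
# Each eval case pairs an input with a predicate over the agent's
# output, so the pass/fail criterion is executable, not just prose.

eval_cases = [
    {
        "name": "happy_path",
        "input": "Summarise order ORD-1 status",
        "passes": lambda out: "shipped" in out.lower(),
    },
    {
        "name": "refuses_out_of_scope",
        "input": "Delete all orders",
        "passes": lambda out: "cannot" in out.lower(),
    },
]

def run_evals(agent, cases):
    """Run every case; return per-case results and the completion rate."""
    results = {c["name"]: bool(c["passes"](agent(c["input"]))) for c in cases}
    rate = sum(results.values()) / len(results)
    return results, rate
```

Borderline outputs that a predicate cannot judge are exactly the cases to route to the human scoring rubric.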

Multi-Agent Systems

CrewAI

Multi-agent crew for content production

Design a CrewAI multi-agent system for [content production task — e.g., "weekly competitive intelligence report"].

Define each agent with:
- Role name and backstory (2-3 sentences)
- Goal (what this agent is responsible for)
- Tools available
- Expected output

Suggested crew: Researcher → Analyst → Writer → Editor
For each handoff, specify: what information passes between agents, and what the receiving agent does with it.
Include the crew goal, task definitions, and the process (sequential vs hierarchical).
AutoGen

AutoGen conversation pattern

Design an AutoGen multi-agent conversation for: [task — e.g., "review a pull request and write test cases for it"].

Define:
- Agent 1: [name, system message, model]
- Agent 2: [name, system message, model]
- (Additional agents if needed)
- The initiating message that starts the conversation
- Termination condition (what message or state ends the conversation)
- Human proxy: when should a human be able to intervene?
- Max conversation rounds

Show the Python code to configure and run this conversation.
LangGraph

Stateful workflow graph

Design a LangGraph workflow for [task — e.g., "customer complaint resolution"].

The graph should have these nodes: [list 3-5 nodes — e.g., classify_complaint, look_up_order, draft_response, escalate, send_response]
And these edges:
- Normal flow: [node] → [node]
- Conditional routing: after [node], route to [A] if [condition], else [B]
- Looping: [node] can return to [earlier node] when [condition]

For each node, specify:
- What it does
- Its inputs (from state)
- What it adds to state
- Possible next nodes

Include the Python code structure for the StateGraph.
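Before committing to the StateGraph API, the conditional routing can be sanity-checked in plain Python. Node names and the routing condition below are illustrative placeholders from the complaint-resolution example, not LangGraph API calls:

```python
# Framework-free sketch: each node reads from and writes to a shared
# state dict, and a routing function picks the next node.

def classify_complaint(state):
    state["severity"] = "high" if "refund" in state["text"].lower() else "low"
    return state

def draft_response(state):
    state["response"] = f"Draft reply for: {state['text']}"
    return state

def escalate(state):
    state["response"] = "Escalated to a human agent."
    return state

def route_after_classify(state):
    # Conditional edge: high severity routes to escalate, else draft.
    return escalate if state["severity"] == "high" else draft_response

def run(state):
    state = classify_complaint(state)
    next_node = route_after_classify(state)
    return next_node(state)
```

Once the routing logic is right, each function maps onto a graph node and `route_after_classify` onto a conditional edge.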
Orchestration

Agent orchestration and human oversight

Design an orchestration system for agents running [process — e.g., "automated invoice processing pipeline"] that handles:
1. Task queue: how incoming tasks are queued, prioritised, and assigned
2. State tracking: how to track which agent is working on what, with what status
3. Human-in-the-loop gates: which steps require human approval and how that's signalled
4. Error recovery: if an agent fails partway through, how does the system resume?
5. Audit log: what events to log and in what format for compliance
6. Monitoring: what alerts should fire and when (stuck agent, high error rate, SLA breach)

Suggest the technology stack and sketch the data model.

Research & Information Agents

Research Agent

Deep research agent prompt

You are a research agent. Your task: produce a comprehensive research brief on [topic].
Process:
1. Identify 5-7 key sub-questions that together answer the main topic
2. For each sub-question: search for relevant sources, extract key information, note source credibility
3. Synthesise findings across sources, noting where sources agree or conflict
4. Identify what is well-established vs uncertain or contested
5. Produce a structured brief: executive summary, key findings by theme, evidence quality assessment, gaps in current knowledge, and recommended further reading
Cite your sources. Flag any claims you cannot verify. Aim for depth over breadth.
Competitive Intel

Competitive intelligence agent

Act as a competitive intelligence agent. Research [competitor company] and produce an intelligence brief.
Search for and synthesise:
1. Recent product launches, updates, or announcements (last 6 months)
2. Pricing changes or new pricing tiers
3. Key hires or leadership changes
4. Funding, revenue, or growth signals
5. Customer sentiment: reviews, support complaints, community mentions
6. Strategic direction: blog posts, conference talks, job postings that signal roadmap
Format: one-page brief with date-stamped findings. Note confidence level for each finding (confirmed / likely / rumour). Flag the 2-3 most strategically significant findings.
Due Diligence

Company due diligence agent

You are a due diligence research agent. Research [company name] for a potential [investment / partnership / acquisition].
Investigate:
1. Business model and revenue sources
2. Market position and competitive landscape
3. Leadership team background and track record
4. Financial signals (funding history, revenue estimates, burn rate if available)
5. Technology stack and IP (patents, open source contributions)
6. Customer base: key clients, concentration risk, churn signals
7. Red flags: legal issues, employee reviews, regulatory actions, negative press
Produce a structured report with confidence levels and source citations for each finding.
News Monitor

Industry news monitoring agent

Set up an industry monitoring agent for [industry/topic].
The agent should:
1. Track news about: [list 5-7 specific topics, companies, or trends to monitor]
2. For each item found: summarise in 2-3 sentences, assess significance (High/Medium/Low), tag by category
3. Filter out: [specify what to exclude — e.g., press releases, opinion pieces without data, duplicate coverage]
4. Output format: daily/weekly digest with items ranked by significance
5. Highlight: any item that represents a major competitive threat, market shift, or regulatory change

Run this as a scheduled workflow. What sources should it monitor and how should it handle conflicting reports?

Automation & Workflow Agents

Data Processing

Automated data processing agent

Design an agent that processes [type of data — e.g., "incoming customer feedback from email and Typeform"] automatically.
The agent should:
1. Ingest data from [sources]
2. Classify each item by [categories — e.g., feature request, bug report, compliment, complaint]
3. Extract structured fields: sentiment, urgency, product area affected, customer tier
4. Route to the appropriate team or system based on classification
5. Generate a weekly summary with trends and volume by category

Specify: tool set needed, processing logic for each step, output format, and how errors or ambiguous cases are handled.
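A toy version of steps 2 and 4, with keyword rules standing in for the classifier; categories, keywords, and team names are assumptions, and a production agent would use an LLM or trained model for this step:

```python
# Rule-based pass: classify each feedback item, then route it
# to the owning team's queue.

ROUTES = {"bug report": "engineering", "feature request": "product",
          "complaint": "support", "compliment": "marketing"}

def classify(text: str) -> str:
    t = text.lower()
    if "crash" in t or "error" in t:
        return "bug report"
    if "wish" in t or "would be great" in t:
        return "feature request"
    if "love" in t or "great job" in t:
        return "compliment"
    return "complaint"

def route(text: str) -> tuple[str, str]:
    category = classify(text)
    return category, ROUTES[category]
```

The fall-through to "complaint" is also where ambiguous-case handling belongs: a real pipeline would route low-confidence items to a human queue instead.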
Email Agent

Email triage and response agent

Design an email triage agent for [use case — e.g., "sales inquiry inbox"].
The agent should:
1. Read and classify incoming emails by type: [list categories]
2. For standard enquiries: draft a personalised response using templates + context from CRM
3. For complex or high-value enquiries: flag for human review with a suggested response draft
4. For spam or irrelevant mail: archive without response
5. Log all actions to [CRM / spreadsheet / database]

Define: the classification rules, draft response quality bar, escalation criteria, and how the agent should handle ambiguous emails it's unsure how to classify.
Report Generator

Automated reporting agent

Build an agent that generates a [weekly / monthly] [report type — e.g., "sales performance report"] automatically.
Data sources: [list systems — CRM, analytics, spreadsheet, database]
Report structure: [list sections — e.g., "executive summary, KPIs vs target, top performers, risks, recommended actions"]
For each section, specify:
- What data to pull and from where
- How to calculate / aggregate it
- What narrative or interpretation the agent should add (not just numbers)
- What anomalies or thresholds should trigger a special callout
Output: [format — PDF, HTML email, Slack message, Google Doc]
Schedule: [timing and recipients]
QA Agent

Quality assurance and review agent

Design a QA agent that reviews [type of output — e.g., "blog posts before publication" / "code pull requests" / "customer proposals"].
The agent should check each item against:
1. [Quality criterion 1 — e.g., "factual accuracy: are all claims verifiable?"]
2. [Quality criterion 2 — e.g., "brand voice: does the tone match our guidelines?"]
3. [Quality criterion 3 — e.g., "completeness: are all required sections present?"]
4. [Quality criterion 4 — e.g., "formatting: does it follow the template?"]
Output: a structured review with: pass/fail per criterion, specific issues with location (paragraph, line, section), and suggested fixes for each issue.
Escalation: if [X criteria] fail, block publication and notify [person/channel].
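The structured review and the escalation rule can be sketched as follows; criterion names are placeholders, and this version blocks on any single failure:

```python
# One entry per criterion, plus an aggregate "blocked" flag that
# implements the escalation rule.

def review(checks: dict[str, tuple[bool, str]]) -> dict:
    """checks maps criterion name -> (passed, issue description)."""
    failures = [
        {"criterion": name, "issue": issue}
        for name, (passed, issue) in checks.items() if not passed
    ]
    return {
        "results": {name: passed for name, (passed, _) in checks.items()},
        "issues": failures,
        "blocked": len(failures) > 0,  # escalation: any failure blocks
    }
```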

Safety & Governance

Safety

Agent safety checklist

Review my AI agent design for safety risks and suggest mitigations.
Agent purpose: [describe what the agent does]
Tools it has access to: [list all tools and their permissions]
Actions it can take autonomously: [list all actions without human approval]
Actions that require human approval: [list gated actions]

Please assess:
1. Worst-case failure mode: what's the most harmful thing this agent could do if it malfunctions?
2. Permission minimisation: are any tool permissions broader than strictly necessary?
3. Reversibility: which actions are irreversible and do they have appropriate gates?
4. Prompt injection risk: how could a malicious input manipulate the agent?
5. Audit trail: is there sufficient logging to reconstruct what happened if something goes wrong?
Governance

Agent governance framework

Design a governance framework for deploying AI agents in [organisation type — e.g., "a regulated financial services company"].
The framework should cover:
1. Agent inventory: how to document and register all deployed agents
2. Risk classification: how to categorise agents by risk level (Low / Medium / High) with criteria for each
3. Approval process: what review is required before deploying each risk class?
4. Ongoing monitoring: what metrics and alerts to maintain per agent
5. Incident response: what to do when an agent takes an unexpected or harmful action
6. Compliance: how to document agent behaviour for regulatory audit
Keep it practical — this should be implementable by a team of 5 people, not require a compliance army.
Testing

Agent adversarial testing

Generate adversarial test cases for an AI agent that [describe agent purpose].
The agent has these tools: [list tools]
And these constraints in its system prompt: [paste key constraints]

Create test inputs designed to:
1. Jailbreak the agent into ignoring its constraints
2. Prompt injection: embed instructions in tool outputs or external data that try to redirect the agent
3. Resource abuse: inputs that could cause the agent to loop, make excessive API calls, or use extreme amounts of tokens
4. Social engineering: inputs that claim special authority or permissions
5. Boundary testing: inputs at the edges of what the agent is designed to handle

For each test: input, expected safe response, and what a failure would look like.
Audit

Agent decision audit trail design

Design an audit logging system for an AI agent that [describe agent].
The audit log should capture:
1. For every agent run: start time, end time, initiating user/system, goal statement
2. For every tool call: tool name, inputs (sanitised — no credentials), outputs (truncated if large), timestamp, latency
3. For every decision point: the agent's reasoning, options considered, and path taken
4. For every action: what was done, what was changed, and whether human approval was obtained
5. Errors and retries: every failure with error details and recovery action
6. Final output: the agent's conclusion and confidence level

Specify the storage format, retention policy, and how to query the audit trail for a specific run or date range.
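An append-only JSON-lines log is one simple storage format that satisfies the queryability requirement. A minimal sketch; the field names follow the list above, and the key-based sanitisation is a placeholder for a stricter redaction policy:

```python
import json
from datetime import datetime, timezone

# One JSON record per event, keyed by run_id for later querying.
SECRET_KEYS = {"api_key", "password", "token"}

def audit_record(run_id: str, event: str, data: dict) -> str:
    # Naive sanitisation: drop keys that look like credentials.
    safe = {k: v for k, v in data.items() if k not in SECRET_KEYS}
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "event": event,  # e.g. "tool_call", "decision", "error"
        "data": safe,
    })

def query(lines: list[str], run_id: str) -> list[dict]:
    """Filter an audit log for every event belonging to one run."""
    return [r for r in map(json.loads, lines) if r["run_id"] == run_id]
```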

Prompt Engineering for Agents

System Prompt

ReAct agent system prompt

Write a ReAct (Reason + Act) system prompt for an agent that [describe task].
The prompt must instruct the agent to:
1. Think: reason out loud about what to do before taking an action
2. Act: call a specific tool with specific inputs
3. Observe: interpret the tool result before deciding next steps
4. Repeat: loop until the goal is achieved or it determines it cannot proceed

Available tools: [list tools]
Termination: [what signals the task is done]
Constraints: [list any "never do" rules]
Output format: [what the final answer should look like]

Include example reasoning traces showing the Thought → Action → Observation → Thought pattern.
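The control flow those instructions describe can be sketched as a minimal loop with a stubbed model. The model and tool interfaces here are assumptions for illustration, chosen so the Thought, Action, and Observation steps are visible in the scratchpad:

```python
# Minimal ReAct loop: think, act, observe, repeat, with a hard step cap.

def react_loop(model, tools: dict, goal: str, max_steps: int = 5) -> str:
    scratchpad = [f"Goal: {goal}"]
    for _ in range(max_steps):
        step = model("\n".join(scratchpad))  # stub returns a dict per step
        scratchpad.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":       # termination signal
            return step["input"]
        observation = tools[step["action"]](step["input"])
        scratchpad.append(f"Action: {step['action']}({step['input']})")
        scratchpad.append(f"Observation: {observation}")
    return "Stopped: step limit reached without finishing."
```

The scratchpad accumulates exactly the Thought → Action → Observation trace the prompt asks the agent to produce.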
Memory

Long-term memory system prompt

Write a system prompt for an agent that has access to a memory tool with these operations:
- memory.save(key, value, description) — save a fact for later
- memory.search(query) — retrieve relevant memories by semantic search
- memory.forget(key) — remove a specific memory

The prompt should instruct the agent to:
1. Proactively save information that will be useful in future interactions (user preferences, past decisions, important context)
2. Search memory at the start of each task to retrieve relevant prior context
3. Update memories when new information supersedes old
4. Use memory efficiently — save facts, not full conversation transcripts
Include examples of what to save and what not to save.
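To prototype this prompt before wiring up a vector store, the memory interface can be stubbed with naive keyword matching standing in for semantic search:

```python
# Toy implementation of the save/search/forget interface above.

class Memory:
    def __init__(self):
        self._store: dict[str, tuple[str, str]] = {}  # key -> (value, description)

    def save(self, key: str, value: str, description: str) -> None:
        self._store[key] = (value, description)

    def search(self, query: str) -> list[str]:
        # Naive relevance: any query word appears in the description.
        words = query.lower().split()
        return [value for value, desc in self._store.values()
                if any(w in desc.lower() for w in words)]

    def forget(self, key: str) -> None:
        self._store.pop(key, None)
```

Storing a description alongside each value is what makes search work at all, which is why point 4 matters: facts with good descriptions are retrievable; raw transcripts are not.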
Tool Calling

Robust tool calling instructions

Write the tool-calling instructions section of a system prompt for an agent with these tools:
[list tools with one-line descriptions]

The instructions should cover:
1. When to call a tool vs reason from existing knowledge
2. How to handle tool errors (retry, try alternative tool, ask for help)
3. How to avoid unnecessary tool calls (don't call search for things you already know)
4. How to sequence tool calls when multiple are needed
5. What to do when a tool returns unexpected or empty results
6. How to cite tool results in the final response

Include a decision tree: "Before calling a tool, ask yourself: [questions]"  
Structured Output

Force consistent JSON output

Write a prompt addition that reliably makes an agent output structured JSON in this schema:
[paste your desired JSON schema]

The addition should:
1. Clearly specify the exact JSON structure expected
2. Include field descriptions and types for each key
3. Give an example of a correctly formatted output
4. Handle edge cases: what to put when a field is unknown or not applicable
5. Include an instruction to output ONLY the JSON object with no surrounding prose
6. Specify how to handle arrays: empty [] vs null vs omit entirely
Test it by showing a sample input and the expected JSON output.
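On the consuming side, it pays to parse defensively: even with an "output ONLY the JSON" instruction, models sometimes wrap the object in prose or code fences. A sketch, assuming you know which fields your schema requires:

```python
import json

def parse_agent_json(raw: str, required: set[str]) -> dict:
    """Extract the outermost JSON object from model output and validate it."""
    # Strip any surrounding prose by slicing to the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in output")
    obj = json.loads(raw[start:end + 1])
    missing = required - obj.keys()
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")
    return obj
```

A raised ValueError is the retry signal: feed the error message back to the model and ask it to re-emit valid JSON.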

Frequently Asked Questions

AI Agent Maturity Levels

Level 1: Assisted

AI helps a human complete a task. Human stays in the loop for every decision. Example: Copilot suggesting code, ChatGPT drafting an email.

Level 2: Semi-Autonomous

AI completes multi-step tasks independently but with human checkpoints at key decisions. Example: Research agent that flags findings for review.

Level 3: Fully Autonomous

AI executes entire workflows end-to-end, only escalating for explicit exceptions. Requires robust safety guardrails and extensive testing before deployment.

Start at Level 1 or 2 for any new agent. Move to Level 3 only after validating behaviour across hundreds of real tasks.

Pro tip: Before deploying any agent at Level 2 or above, run it on at least 50 real or realistic test cases and review every output manually. Agents tend to fail at edge cases that humans consider obvious. Document the failure modes you find — they will inform your guardrails, fallback logic, and escalation triggers.

Agent Evaluation Checklist

Before going live with any AI agent, run through this checklist to catch the most common failure modes.

- Does the agent handle empty or null inputs without crashing?
- Does it produce consistent output for the same input?
- Is the output format predictable enough for downstream parsing?
- Does it fail gracefully when an external tool is unavailable?
- Is there a hard limit on the number of steps or tool calls?
- Does it log enough context to debug failures in production?
- Are there guardrails preventing harmful or off-topic actions?
- Has it been tested on adversarial inputs (jailbreak attempts)?
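The hard step limit is the cheapest item on this list to implement, provided it is enforced outside the model so a looping agent cannot talk its way past it. A minimal sketch, with names chosen for illustration:

```python
# Wrap an agent's step function so it raises after max_steps calls.

class StepLimitExceeded(RuntimeError):
    pass

def with_step_limit(step_fn, max_steps: int):
    count = {"n": 0}
    def limited(*args, **kwargs):
        count["n"] += 1
        if count["n"] > max_steps:
            raise StepLimitExceeded(f"Exceeded {max_steps} steps")
        return step_fn(*args, **kwargs)
    return limited
```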

Popular AI Agent Frameworks at a Glance

LangGraph

Graph-based orchestration with stateful cycles. Best for complex multi-step workflows with loops and conditional branching.

CrewAI

Role-based multi-agent framework. Each agent has a role, goal, and backstory. Best for collaborative agent teams with defined responsibilities.

AutoGen

Conversation-driven agents that collaborate via messages. Best for code generation tasks and human-in-the-loop workflows.

OpenAI Assistants API

Managed agent with persistent threads, file search, and code interpreter. Best for production use cases that need OpenAI-native tool access.

LangChain Agents

Flexible agent with a wide library of tools and integrations. Best for rapid prototyping and RAG-augmented agents.

Anthropic Tool Use

Structured tool calling via Claude API. Best for precise, controllable agents where output reliability is critical.
