Reflexion: Prompting Guide & Examples
Reflexion is a technique in which an AI agent reflects on its failed previous attempts to improve its future performance. After an attempt fails or produces suboptimal results, the model writes a verbal reflection on what went wrong and uses that insight in the next attempt.
How It Works
Reflexion runs a three-phase loop: (1) attempt the task, (2) evaluate the result (success or failure, plus specific feedback), and (3) generate a verbal reflection on what to do differently. The reflection is stored and included in the prompt for the next attempt as learned experience, and the loop repeats until the task succeeds or an attempt limit is reached.
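A minimal sketch of this loop in Python, assuming the three phases are supplied as plain callables. The names `reflexion_loop`, `attempt`, `evaluate`, `reflect`, and `max_trials` are hypothetical; in a real agent, `attempt` and `reflect` would wrap model calls. The `fake_*` stubs stand in for an LLM so the control flow can be run as-is.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReflexionResult:
    output: str
    success: bool
    trials: int  # number of attempts actually made
    reflections: List[str] = field(default_factory=list)


def reflexion_loop(
    attempt: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str, str], str],
    task: str,
    max_trials: int = 3,
) -> ReflexionResult:
    """Run attempt -> evaluate -> reflect until success or max_trials."""
    memory: List[str] = []  # stored reflections, fed into every later attempt
    output = ""
    for trial in range(1, max_trials + 1):
        # Phase 1: attempt the task, conditioned on past reflections.
        output = attempt(task, memory)
        # Phase 2: evaluate (success flag plus specific feedback).
        success, feedback = evaluate(output)
        if success:
            return ReflexionResult(output, True, trial, memory)
        # Phase 3: verbal reflection on what to do differently next time.
        memory.append(reflect(task, output, feedback))
    return ReflexionResult(output, False, max_trials, memory)


# Hypothetical stubs standing in for model calls.
def fake_attempt(task: str, memory: List[str]) -> str:
    # Only "improves" once a lesson is present in memory.
    return "Hello!" if memory else "Hello"


def fake_evaluate(output: str) -> Tuple[bool, str]:
    ok = output.endswith("!")
    return ok, "" if ok else "Output must end with '!'."


def fake_reflect(task: str, output: str, feedback: str) -> str:
    return f"Last output {output!r} failed: {feedback} Add the '!' next time."


result = reflexion_loop(fake_attempt, fake_evaluate, fake_reflect, "Greet the user")
print(result.success, result.trials)  # True 2
```

Passing the phases in as functions keeps the orchestration separate from any particular model or evaluator, and `max_trials` bounds the cost of repeated iterations.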
When to Use
Use reflexion for iterative problem-solving, code generation with tests, tasks with clear success/failure criteria, and building agents that learn from mistakes. It is especially effective when combined with automated testing or evaluation, which supplies a reliable success/failure signal.
Model-Specific Tips
ChatGPT / GPT-4
GPT-4 handles reflexion well in multi-turn conversations. Provide explicit failure feedback and ask for structured reflection before retrying.
Claude
Claude tends to assess its own mistakes candidly. Provide test results or failure data and ask Claude to diagnose the issues before retrying; grounding the reflection in concrete evidence makes it more useful.
Gemini
Gemini supports reflexion patterns. Use structured feedback and ask for explicit lessons learned before retrying the task.
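All three tips share one pattern: give the model explicit failure feedback and ask for a structured reflection (lessons learned) before it retries. A minimal, model-agnostic sketch of a prompt builder for that pattern follows, assuming a plain-text prompt. The function name `build_retry_prompt` and the three reflection headings are illustrative choices, not a format prescribed by any of these vendors.

```python
from typing import List


def build_retry_prompt(task: str, previous_output: str, feedback: str,
                       lessons: List[str]) -> str:
    """Assemble a retry prompt that asks for a structured reflection first."""
    lines = [
        f"Task: {task}",
        "",
        "Your previous attempt:",
        previous_output,
        "",
        f"Evaluation feedback: {feedback}",
    ]
    # Carry forward reflections from earlier attempts as "lessons learned".
    if lessons:
        lines += ["", "Lessons learned from earlier attempts:"]
        lines += [f"- {lesson}" for lesson in lessons]
    # Ask for the reflection explicitly, in fixed sections, before the retry.
    lines += [
        "",
        "Before retrying, write a reflection with these sections:",
        "1. What went wrong",
        "2. Likely root cause",
        "3. What you will do differently",
        "",
        "Then write a new attempt that applies your reflection.",
    ]
    return "\n".join(lines)


prompt = build_retry_prompt(
    task="Summarize the report in 3 bullet points",
    previous_output="(five-bullet summary)",
    feedback="Too long: 5 bullets instead of 3.",
    lessons=["Count the bullets before answering."],
)
print(prompt)
```

Fixed section headings make the reflection easy to parse or store, and the same string can be sent to any of the models above.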
Pros & Cons
Pros
- Learns from mistakes within a session
- Results tend to improve with each iteration
- Works well with automated testing
- Foundation for self-improving agents
Cons
- Requires clear success/failure signals
- Multiple iterations increase cost
- Reflections can be superficial
- Needs orchestration logic for automation
Example Prompts
Write a function to solve this coding problem: [problem] Test results: 3/5 tests passed. Failed on edge cases: empty array, single element. Reflect: What went wrong? What specific changes would fix the failing tests? Now write an improved version incorporating your reflection.
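To make the first prompt concrete, here is a hypothetical problem (the largest gap between neighbors after sorting, assumed to return 0 for fewer than two elements) with a five-case test suite under which a plausible first attempt passes 3/5 and fails exactly on the empty-array and single-element cases. The second version shows what a reflection such as "`max()` on an empty sequence raises `ValueError`; guard for fewer than two elements" should lead to. All function names and test cases are illustrative.

```python
from typing import Callable, List, Tuple


def max_gap_v1(xs: List[int]) -> int:
    # First attempt: silently assumes at least two elements.
    s = sorted(xs)
    return max(b - a for a, b in zip(s, s[1:]))


def max_gap_v2(xs: List[int]) -> int:
    # After reflection: with fewer than two elements there are no adjacent
    # pairs, so max() would see an empty sequence. Handle that case first.
    if len(xs) < 2:
        return 0
    s = sorted(xs)
    return max(b - a for a, b in zip(s, s[1:]))


TESTS: List[Tuple[str, List[int], int]] = [
    ("basic", [3, 9, 1], 6),
    ("already sorted", [1, 2, 4, 7], 3),
    ("duplicates", [5, 5, 5, 8], 3),
    ("empty array", [], 0),
    ("single element", [42], 0),
]


def run_tests(fn: Callable[[List[int]], int]) -> Tuple[int, List[str]]:
    """Return (number passed, names of failed cases); exceptions count as failures."""
    failed = []
    for name, xs, expected in TESTS:
        try:
            ok = fn(xs) == expected
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return len(TESTS) - len(failed), failed


passed, failed = run_tests(max_gap_v1)
print(f"Test results: {passed}/{len(TESTS)} tests passed. Failed on: {', '.join(failed)}")
# Test results: 3/5 tests passed. Failed on: empty array, single element
passed, failed = run_tests(max_gap_v2)
print(f"After reflection: {passed}/{len(TESTS)} tests passed.")
# After reflection: 5/5 tests passed.
```

The harness output is exactly the kind of feedback string the prompt expects, so it can be pasted (or programmatically inserted) into the next attempt.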
The previous marketing email had a 2% open rate (goal: 15%). Reflection prompt: Analyze why the previous email underperformed. Consider: subject line, preview text, personalization, value proposition, and CTA. Write specific lessons learned. Now draft a new email applying these lessons.
Your previous SQL query returned incorrect results for Q4 data. Reflect on what went wrong: Was it a join issue? Date filter? Aggregation error? Write a specific diagnosis. Now write the corrected query based on your reflection.
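As a hypothetical illustration of the "Date filter?" branch of that diagnosis, the sketch below uses Python's built-in `sqlite3` module with an invented `orders` table whose timestamps are stored as ISO-8601 text. A `BETWEEN '2023-10-01' AND '2023-12-31'` filter silently drops every order placed after midnight on December 31, which is the kind of specific diagnosis the reflection step should surface. The table, data, and fix are assumptions for illustration; a real Q4 bug could just as well be a join or aggregation error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (order_ts, amount) VALUES (?, ?)",
    [
        ("2023-09-30 23:59:00", 50.0),   # Q3: must be excluded
        ("2023-10-01 00:00:00", 100.0),  # first moment of Q4
        ("2023-11-15 12:30:00", 200.0),
        ("2023-12-31 18:45:00", 300.0),  # last day of Q4, afternoon
        ("2024-01-01 00:00:00", 75.0),   # next year's Q1: must be excluded
    ],
)

# Original query: BETWEEN with a date-only upper bound. Timestamps are compared
# as text, and '2023-12-31 18:45:00' > '2023-12-31', so Dec 31 orders are dropped.
buggy = """SELECT SUM(amount) FROM orders
           WHERE order_ts BETWEEN '2023-10-01' AND '2023-12-31'"""

# Corrected after reflection ("the date filter truncates the last day"):
# a half-open range whose upper bound is the first day of the next quarter.
fixed = """SELECT SUM(amount) FROM orders
           WHERE order_ts >= '2023-10-01' AND order_ts < '2024-01-01'"""

print(conn.execute(buggy).fetchone()[0])  # 300.0
print(conn.execute(fixed).fetchone()[0])  # 600.0
```

The half-open range (`>=` start, `<` next start) avoids depending on the time component, so it stays correct whether the column holds dates or full timestamps.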