🎯 What You'll Learn Today
LangGraph Tutorial: Error Handling Patterns - Unit 2.3 Exercise 6
Joint Initiative: This tutorial is part of a collaboration between AI Product Engineer and the Nebius Academy.
This tutorial demonstrates comprehensive error handling strategies in LangGraph, including error collection, categorization, routing, and state management. You'll learn how to build robust error handling systems that gracefully handle failures while maintaining system stability.
Key Concepts Covered
- Error State Management
- Error Categorization
- Message-Based Error Handling
- Error Routing Patterns
- Error Analytics
# !pip install langchain-core
# !pip install langgraph
from typing import Annotated, Any, TypedDict

from langchain_core.messages import AIMessage, BaseMessage, SystemMessage, ToolMessage
from langgraph.graph.message import add_messages
Step 1: State Definition with Error Tracking
Define the state structure supporting comprehensive error handling.
Why This Matters
Error-aware state is crucial because:
- Enables systematic error tracking
- Maintains error history
- Supports error analytics
- Facilitates recovery strategies
Debug Tips
- State Structure:
  - Verify that all error fields are initialized
  - Check type annotations
  - Monitor state growth
- Common Issues:
  - Missing error fields
  - Type mismatches
  - Memory growth from error accumulation
class State(TypedDict):
    """State for error handling.

    Attributes:
        messages: Conversation history
        pending_tools: Tools awaiting execution
        results: Successful tool results
        errors: Collected error messages
        error_counts: Track error frequencies
    """

    messages: Annotated[list[BaseMessage], add_messages]
    pending_tools: list[dict]
    results: dict[str, Any]
    errors: dict[str, str]
    error_counts: dict[str, int]
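One way to guard against the missing-field issue listed above is to build every state through a single factory and to check incoming states before they are used. The sketch below is only an illustration: `make_initial_state` and `missing_error_fields` are hypothetical helpers, not LangGraph APIs, and they use plain dictionaries so the example runs without any dependencies.

```python
from typing import Any

# Field names mirror the State TypedDict above. The helpers themselves are
# illustrative assumptions and are not part of LangGraph.
REQUIRED_FIELDS = ("messages", "pending_tools", "results", "errors", "error_counts")


def make_initial_state() -> dict[str, Any]:
    """Return a fresh state with every error-tracking field initialized."""
    return {
        "messages": [],
        "pending_tools": [],
        "results": {},
        "errors": {},
        "error_counts": {},
    }


def missing_error_fields(state: dict[str, Any]) -> list[str]:
    """List the required fields that are absent from a state dictionary."""
    return [name for name in REQUIRED_FIELDS if name not in state]


print(missing_error_fields(make_initial_state()))  # []
print(missing_error_fields({"messages": []}))
# ['pending_tools', 'results', 'errors', 'error_counts']
```

Because the factory builds new containers on every call, two states never share the same `errors` or `error_counts` dictionary by accident.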
Step 2: Error Categorization Implementation
Implement error categorization for systematic handling.
Why This Matters
Error categorization is essential because:
- Enables targeted handling strategies
- Facilitates error analytics
- Supports recovery patterns
- Improves error reporting
Debug Tips
- Categorization Logic:
  - Check string matching patterns
  - Verify category assignments
  - Monitor category distribution
- Common Problems:
  - Missed error patterns
  - Incorrect categorization
  - Case sensitivity issues
def categorize_error(error: str) -> str:
    """Categorize error message for better handling.

    Args:
        error: Error message string

    Returns:
        Error category identifier

    Examples:
        >>> categorize_error("Connection timeout after 30s")
        'TIMEOUT'
        >>> categorize_error("API rate limit exceeded")
        'RATE_LIMIT'
    """
    if "timeout" in error.lower():
        return "TIMEOUT"
    elif "rate limit" in error.lower():
        return "RATE_LIMIT"
    elif "permission" in error.lower():
        return "PERMISSION"
    return "UNKNOWN"
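As the number of categories grows, the chain of `elif` branches can become hard to audit for missed patterns. A possible alternative, sketched here as an assumption rather than as the tutorial's implementation, keeps the patterns in an ordered table. The names `ERROR_PATTERNS` and `categorize_error_table` are hypothetical; the three patterns are the ones used above.

```python
# Ordered (substring, category) pairs; earlier entries win when several match.
# The table layout is an illustrative alternative to the if/elif chain.
ERROR_PATTERNS: list[tuple[str, str]] = [
    ("timeout", "TIMEOUT"),
    ("rate limit", "RATE_LIMIT"),
    ("permission", "PERMISSION"),
]


def categorize_error_table(error: str) -> str:
    """Return the first matching category, or 'UNKNOWN' if none match."""
    lowered = error.lower()  # lowercase once to avoid case-sensitivity issues
    for needle, category in ERROR_PATTERNS:
        if needle in lowered:
            return category
    return "UNKNOWN"


print(categorize_error_table("Connection TIMEOUT after 30s"))  # TIMEOUT
print(categorize_error_table("Disk full"))  # UNKNOWN
```

Adding a category then means adding one row, and the order of the rows makes the matching priority explicit.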
Step 3: Error Handler Implementation
Implement main error handling logic with message generation.
Why This Matters
Error handling implementation is critical because:
- Ensures error capture
- Maintains system stability
- Provides error visibility
- Enables error analysis
Debug Tips
- Handler Logic:
  - Verify message generation
  - Check error counting
  - Monitor state updates
- Common Issues:
  - Message format errors
  - Counter inconsistencies
  - State corruption
def error_handler(state: State) -> State:
    """Handle errors from parallel execution.

    Creates appropriate error messages and updates state.

    Args:
        state: Current state containing errors

    Returns:
        Updated state with error messages
    """
    messages = list(state["messages"])
    # Copy the counts so the incoming state is not mutated in place
    error_counts = dict(state.get("error_counts", {}))
    if state["errors"]:
        # Add system message for error tracking
        messages.append(
            SystemMessage(content=f"Processing {len(state['errors'])} errors")
        )
        # Process each error
        for tool_id, error in state["errors"].items():
            # Create tool message for the error
            tool_message = ToolMessage(
                content=error, tool_call_id=tool_id, name=tool_id.split("_")[0]
            )
            messages.append(tool_message)
            # Track error frequency
            error_type = categorize_error(error)
            error_counts[error_type] = error_counts.get(error_type, 0) + 1
        # Add AI summary of errors
        error_summary = "Encountered the following errors:\n"
        error_summary += "\n".join(
            f"- {tool}: {error}" for tool, error in state["errors"].items()
        )
        if error_counts:
            error_summary += "\n\nError distribution:\n"
            error_summary += "\n".join(
                f"- {type_}: {count} occurrences"
                for type_, count in error_counts.items()
            )
        messages.append(AIMessage(content=error_summary))
    return {
        "messages": messages,
        "pending_tools": [],
        "results": state["results"],
        "errors": state["errors"],
        "error_counts": error_counts,
    }
Step 4: Error Routing Implementation
Implement routing logic based on error analysis.
Why This Matters
Error routing is important because:
- Enables specialized handling
- Supports error prioritization
- Facilitates recovery flows
- Maintains system control
Debug Tips
- Routing Logic:
  - Verify condition evaluation
  - Check route selection
  - Monitor routing patterns
- Common Problems:
  - Missing route conditions
  - Incorrect priorities
  - Dead-end routes
def route_results(state: State) -> str:
    """Route to appropriate handler based on errors.

    Args:
        state: Current state

    Returns:
        Name of next handler node
    """
    if state.get("errors", {}):
        # Count critical errors
        critical_count = sum(
            1 for error in state["errors"].values() if "critical" in error.lower()
        )
        # Route based on error severity
        if critical_count > 0:
            return "critical_error_handler"
        return "error_handler"
    return "result_aggregator"
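Dead-end routes appear when a router returns a name that no node is registered under. A lightweight check, sketched here under the assumption that you can supply a handful of representative states, compares the names a router produces against the set of known nodes. `find_dead_end_routes` and `demo_router` are hypothetical and exist only for this example.

```python
from typing import Any, Callable


def find_dead_end_routes(
    router: Callable[[dict[str, Any]], str],
    sample_states: list[dict[str, Any]],
    known_nodes: set[str],
) -> list[str]:
    """Return route names produced by the router that no node handles."""
    produced = {router(state) for state in sample_states}
    return sorted(produced - known_nodes)


# Hypothetical router used only for this example: it returns a
# "retry_handler" route that was never registered as a node.
def demo_router(state: dict[str, Any]) -> str:
    errors = state.get("errors", {})
    if any("timeout" in message.lower() for message in errors.values()):
        return "retry_handler"
    return "error_handler" if errors else "result_aggregator"


registered = {"critical_error_handler", "error_handler", "result_aggregator"}
samples = [
    {"errors": {}},
    {"errors": {"rate_1": "API rate limit exceeded"}},
    {"errors": {"timeout_1": "Tool timeout after 30s"}},
]
print(find_dead_end_routes(demo_router, samples, registered))  # ['retry_handler']
```

The check only covers the sampled states, so it complements rather than replaces a review of the routing conditions.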
Step 5: Demonstration Implementation
Example usage showing error handling patterns.
Why This Matters
Demonstration code is valuable because:
- Shows practical usage patterns
- Illustrates error flows
- Demonstrates recovery paths
- Provides testing scenarios
Debug Tips
- Demo Execution:
  - Monitor error generation
  - Verify handling paths
  - Check error summaries
- Common Issues:
  - Invalid test states
  - Missing error cases
  - Incorrect assertions
def demonstrate_error_handling():
    """Demonstrate error handling with various scenarios."""
    print("Error Handling Demonstration")
    print("=" * 50)
    # Test different error scenarios
    test_cases = [
        {
            "name": "Mixed Errors",
            "state": {
                "messages": [],
                "pending_tools": [],
                "results": {"success_1": "OK"},
                "errors": {
                    "timeout_1": "Tool timeout after 30s",
                    "rate_1": "API rate limit exceeded",
                },
                "error_counts": {},
            },
        },
        {
            "name": "Critical Error",
            "state": {
                "messages": [],
                "pending_tools": [],
                "results": {},
                "errors": {"critical_1": "Critical: Database connection failed"},
                "error_counts": {},
            },
        },
    ]
    for case in test_cases:
        print(f"\nTest Case: {case['name']}")
        result = error_handler(case["state"])
        print("\nMessages:")
        for msg in result["messages"]:
            prefix = type(msg).__name__.replace("Message", "")
            print(f"{prefix}: {msg.content}")
        if result.get("error_counts"):
            print("\nError Counts:")
            for error_type, count in result["error_counts"].items():
                print(f"{error_type}: {count}")
        print("\nRouting Decision:")
        route = route_results(result)
        print(f"Next node: {route}")
        print("-" * 50)
Common Pitfalls
- Incomplete Error Capture
  - Missing error types
  - Lost error context
  - Inadequate categorization
- Poor Error Routing
  - Missing edge cases
  - Incorrect prioritization
  - Dead-end paths
- State Management Issues
  - Inconsistent error tracking
  - Lost error history
  - Counter corruption
- Message Generation Problems
  - Incorrect message types
  - Missing error details
  - Poor formatting
Key Takeaways
- Systematic Error Handling
  - Complete error capture
  - Proper categorization
  - Clear error routing
- Error Analytics
  - Error type tracking
  - Frequency analysis
  - Pattern detection
- Message-Based Architecture
  - Structured error reporting
  - Clear error communication
  - Maintainable system
Next Steps
- Enhanced Error Recovery
  - Add retry mechanisms
  - Implement backoff strategies
  - Create recovery paths
- Advanced Analytics
  - Add error metrics
  - Create error dashboards
  - Implement trend analysis
- Custom Error Handling
  - Add specialized handlers
  - Create error hierarchies
  - Implement recovery strategies
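As a starting point for the retry and backoff items above, the sketch below wraps a callable in a simple exponential backoff loop. It is a minimal illustration, not part of the tutorial's code: `retry_with_backoff`, its parameters, and the `flaky` stand-in tool are all hypothetical, and the `sleep` argument is injectable so the example runs without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    Waits base_delay * 2**attempt seconds between attempts and re-raises
    the last exception once the attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


# Stand-in tool that fails twice with a timeout, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("Tool timeout after 30s")
    return "OK"


delays: list[float] = []
print(retry_with_backoff(flaky, sleep=delays.append))  # OK
print(delays)  # [0.5, 1.0]
```

In a real graph you would probably retry only categories that are worth retrying, such as TIMEOUT or RATE_LIMIT, and route the rest straight to an error handler.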
Expected Output
Error Handling Demonstration
==================================================

Test Case: Mixed Errors

Messages:
System: Processing 2 errors
Tool: Tool timeout after 30s
Tool: API rate limit exceeded
AI: Encountered the following errors:
- timeout_1: Tool timeout after 30s
- rate_1: API rate limit exceeded

Error distribution:
- TIMEOUT: 1 occurrences
- RATE_LIMIT: 1 occurrences

Error Counts:
TIMEOUT: 1
RATE_LIMIT: 1

Routing Decision:
Next node: error_handler
--------------------------------------------------

Test Case: Critical Error

Messages:
System: Processing 1 errors
Tool: Critical: Database connection failed
AI: Encountered the following errors:
- critical_1: Critical: Database connection failed

Error distribution:
- UNKNOWN: 1 occurrences

Error Counts:
UNKNOWN: 1

Routing Decision:
Next node: critical_error_handler
--------------------------------------------------
if __name__ == "__main__":
    demonstrate_error_handling()