LangGraph Tutorial: Error Handling Patterns - Unit 2.3 Exercise 6

Learn how to implement robust error handling patterns in LangGraph. This tutorial covers error categorization, routing, and analytics to build resilient systems. Explore strategies for error tracking, message-based reporting, and systematic recovery, ensuring stability and transparency in multi-tool workflows.

🎯 What You'll Learn Today

This tutorial is also available as a Google Colab notebook and as a download.

Joint Initiative: This tutorial is part of a collaboration between AI Product Engineer and the Nebius Academy.

This tutorial demonstrates comprehensive error handling strategies in LangGraph, including error collection, categorization, routing, and state management. You'll learn how to build robust error handling systems that gracefully handle failures while maintaining system stability.

Key Concepts Covered

  1. Error State Management
  2. Error Categorization
  3. Message-Based Error Handling
  4. Error Routing Patterns
  5. Error Analytics

# Install dependencies (uncomment in a notebook):
# !pip install langchain-core langgraph
from typing import Annotated, Any, TypedDict

from langchain_core.messages import AIMessage, BaseMessage, SystemMessage, ToolMessage
from langgraph.graph.message import add_messages

Step 1: State Definition with Error Tracking

Define the state structure supporting comprehensive error handling.

Why This Matters

Error-aware state is crucial because:

  1. Enables systematic error tracking
  2. Maintains error history
  3. Supports error analytics
  4. Facilitates recovery strategies
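
As a minimal sketch of the "maintains error history" point, assuming you want errors from parallel branches merged rather than replaced: LangGraph combines a field declared as Annotated[type, reducer] with each node's update through that reducer, the same way add_messages handles messages. A plain field such as errors keeps only the latest value, and LangGraph generally rejects two parallel branches writing it in the same step. The merge_errors reducer and ErrorHistoryState below are hypothetical names; the block simulates two branch updates without importing LangGraph.

```python
from typing import Annotated, TypedDict


def merge_errors(current: dict[str, str], update: dict[str, str]) -> dict[str, str]:
    """Hypothetical reducer: merge error dicts instead of replacing them.

    Returns a new dict so neither input is mutated; on a key clash the
    newer update wins.
    """
    merged = dict(current or {})
    merged.update(update or {})
    return merged


class ErrorHistoryState(TypedDict):
    # The reducer is attached as Annotated metadata, like add_messages on messages.
    errors: Annotated[dict[str, str], merge_errors]
    error_counts: dict[str, int]


# Simulate two parallel tool branches, each reporting one error.
errors: dict[str, str] = {}
for branch_update in (
    {"timeout_1": "Tool timeout after 30s"},
    {"rate_1": "API rate limit exceeded"},
):
    errors = merge_errors(errors, branch_update)

print(sorted(errors))  # ['rate_1', 'timeout_1']
```

The trade-off is memory: a merging reducer never forgets, so long-running graphs may need a separate step that trims old entries.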

Debug Tips

  1. State Structure:

    • Verify that all error fields are initialized
    • Check type annotations
    • Monitor state growth
  2. Common Issues:

    • Missing error fields
    • Type mismatches
    • Memory growth from error accumulation
class State(TypedDict):
    """State for error handling.

    Attributes:
        messages: Conversation history
        pending_tools: Tools awaiting execution
        results: Successful tool results
        errors: Collected error messages
        error_counts: Track error frequencies
    """

    messages: Annotated[list[BaseMessage], add_messages]
    pending_tools: list[dict]
    results: dict[str, Any]
    errors: dict[str, str]
    error_counts: dict[str, int]
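
The initialization and type-annotation tips above can be turned into a small runtime check, which matters because a TypedDict is not enforced at runtime. This is a sketch with a hypothetical make_initial_state factory and validate_state checker; REQUIRED_KEYS is an assumed mirror of the State fields, not something LangGraph provides.

```python
from typing import Any

# Hypothetical schema mirroring the tutorial's State: field name -> container type.
REQUIRED_KEYS: dict[str, type] = {
    "messages": list,
    "pending_tools": list,
    "results": dict,
    "errors": dict,
    "error_counts": dict,
}


def make_initial_state(**overrides: Any) -> dict[str, Any]:
    """Build a state where every field starts as a fresh empty container."""
    state: dict[str, Any] = {key: container() for key, container in REQUIRED_KEYS.items()}
    state.update(overrides)
    return state


def validate_state(state: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the state looks well-formed."""
    problems: list[str] = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in state:
            problems.append(f"missing field: {key}")
        elif not isinstance(state[key], expected):
            problems.append(
                f"{key} should be {expected.__name__}, got {type(state[key]).__name__}"
            )
    return problems


print(validate_state(make_initial_state()))  # []
print(validate_state({"messages": [], "errors": []}))
```

Calling each container type (list(), dict()) per state avoids sharing one mutable default between test cases.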

Step 2: Error Categorization Implementation

Implement error categorization for systematic handling.

Why This Matters

Error categorization is essential because:

  1. Enables targeted handling strategies
  2. Facilitates error analytics
  3. Supports recovery patterns
  4. Improves error reporting
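
To make "targeted handling strategies" concrete, here is a sketch that maps each category returned by categorize_error to a handling strategy. The Strategy enum and the mapping are assumptions for illustration, not part of the tutorial's graph.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical handling strategies, one per kind of recovery."""

    RETRY_WITH_BACKOFF = "retry_with_backoff"
    WAIT_AND_RETRY = "wait_and_retry"
    ESCALATE = "escalate"
    REPORT = "report"


# Hypothetical policy: which strategy each categorize_error category triggers.
STRATEGY_BY_CATEGORY: dict[str, Strategy] = {
    "TIMEOUT": Strategy.RETRY_WITH_BACKOFF,  # transient, often succeeds on retry
    "RATE_LIMIT": Strategy.WAIT_AND_RETRY,  # wait for the quota window to reset
    "PERMISSION": Strategy.ESCALATE,  # retrying cannot fix missing access
}


def strategy_for(category: str) -> Strategy:
    """Look up the strategy; unmapped categories such as UNKNOWN are only reported."""
    return STRATEGY_BY_CATEGORY.get(category, Strategy.REPORT)


print(strategy_for("RATE_LIMIT").value)  # wait_and_retry
print(strategy_for("UNKNOWN").value)  # report
```

Keeping the policy in a table rather than in if/else chains makes it easy to review and to change without touching the handler code.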

Debug Tips

  1. Categorization Logic:

    • Check string matching patterns
    • Verify category assignments
    • Monitor category distribution
  2. Common Problems:

    • Missed error patterns
    • Incorrect categorization
    • Case sensitivity issues
def categorize_error(error: str) -> str:
    """Categorize error message for better handling.

    Args:
        error: Error message string

    Returns:
        Error category identifier

    Examples:
        >>> categorize_error("Connection timeout after 30s")
        'TIMEOUT'
        >>> categorize_error("API rate limit exceeded")
        'RATE_LIMIT'
    """
    if "timeout" in error.lower():
        return "TIMEOUT"
    elif "rate limit" in error.lower():
        return "RATE_LIMIT"
    elif "permission" in error.lower():
        return "PERMISSION"
    return "UNKNOWN"
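
Substring checks miss variants such as "timed out" or an HTTP 429 status. A table-driven variant using re with re.IGNORECASE is sketched below; the patterns are illustrative assumptions, and because the first match wins, their order decides overlaps (a "Permission check timed out" message lands in TIMEOUT). collections.Counter then gives the category distribution mentioned in the debug tips.

```python
import re
from collections import Counter

# Hypothetical pattern table, checked in order; the first match wins.
# re.IGNORECASE removes case-sensitivity issues, and alternations catch variants.
ERROR_PATTERNS = [
    ("TIMEOUT", re.compile(r"timeout|timed out", re.IGNORECASE)),
    ("RATE_LIMIT", re.compile(r"rate[ -]?limit|\b429\b|too many requests", re.IGNORECASE)),
    ("PERMISSION", re.compile(r"permission|forbidden|unauthorized|\b403\b", re.IGNORECASE)),
]


def categorize_error_regex(error: str) -> str:
    """Regex-based variant of categorize_error with the same category names."""
    for category, pattern in ERROR_PATTERNS:
        if pattern.search(error):
            return category
    return "UNKNOWN"


errors = {
    "search_1": "Request timed out",
    "api_1": "HTTP 429 Too Many Requests",
    "db_1": "Critical: Database connection failed",
}
# Counter tallies how often each category occurs across the collected errors.
distribution = Counter(categorize_error_regex(message) for message in errors.values())
print(distribution)
```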

Step 3: Error Handler Implementation

Implement the main error-handling logic, including message generation.

Why This Matters

Error handling implementation is critical because:

  1. Ensures error capture
  2. Maintains system stability
  3. Provides error visibility
  4. Enables error analysis
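
Error capture starts before error_handler runs: something has to turn tool exceptions into the errors dict. A minimal sketch, assuming each tool is a plain Python callable: the hypothetical run_tool_safely helper returns a partial update holding either a result or an error string, prefixed with the exception type to keep some context. The tools slow_search and echo are stand-ins for illustration.

```python
from typing import Any, Callable


def run_tool_safely(
    tool_id: str, tool: Callable[..., Any], args: dict[str, Any]
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: run one tool and return a partial state update.

    On success the value goes under "results"; on failure the exception is
    stored under "errors" as "<ExceptionType>: <message>" so the string still
    fits the errors: dict[str, str] field.
    """
    try:
        return {"results": {tool_id: tool(**args)}, "errors": {}}
    except Exception as exc:  # broad on purpose: one failing tool must not stop the batch
        return {"results": {}, "errors": {tool_id: f"{type(exc).__name__}: {exc}"}}


def slow_search(query: str) -> str:
    raise TimeoutError("Tool timeout after 30s")


def echo(text: str) -> str:
    return text


results: dict[str, Any] = {}
errors: dict[str, Any] = {}
for tool_id, tool, args in [
    ("search_1", slow_search, {"query": "langgraph"}),
    ("echo_1", echo, {"text": "OK"}),
]:
    update = run_tool_safely(tool_id, tool, args)
    results.update(update["results"])
    errors.update(update["errors"])

print(results)  # {'echo_1': 'OK'}
print(errors)  # {'search_1': 'TimeoutError: Tool timeout after 30s'}
```

The recorded string still contains "timeout", so categorize_error from Step 2 classifies it as TIMEOUT.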

Debug Tips

  1. Handler Logic:

    • Verify message generation
    • Check error counting
    • Monitor state updates
  2. Common Issues:

    • Message format errors
    • Counter inconsistencies
    • State corruption
def error_handler(state: State) -> State:
    """Handle errors from parallel execution.

    Creates appropriate error messages and updates state.

    Args:
        state: Current state containing errors

    Returns:
        Updated state with error messages
    """
    messages = list(state["messages"])
    # Copy the counts so the incoming state's dict is not mutated in place
    error_counts = dict(state.get("error_counts") or {})

    if state["errors"]:
        # Add system message for error tracking
        messages.append(
            SystemMessage(content=f"Processing {len(state['errors'])} errors")
        )

        # Process each error
        for tool_id, error in state["errors"].items():
            # Create tool message for the error
            tool_message = ToolMessage(
                content=error, tool_call_id=tool_id, name=tool_id.split("_")[0]
            )
            messages.append(tool_message)

            # Track error frequency
            error_type = categorize_error(error)
            error_counts[error_type] = error_counts.get(error_type, 0) + 1

        # Add AI summary of errors
        error_summary = "Encountered the following errors:\n"
        error_summary += "\n".join(
            f"- {tool}: {error}" for tool, error in state["errors"].items()
        )

        if error_counts:
            error_summary += "\n\nError distribution:\n"
            error_summary += "\n".join(
                f"- {type_}: {count} occurrences"
                for type_, count in error_counts.items()
            )

        messages.append(AIMessage(content=error_summary))

    return {
        "messages": messages,
        "pending_tools": [],
        "results": state["results"],
        "errors": state["errors"],
        "error_counts": error_counts,
    }
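
The counter update inside error_handler can also be factored into a pure function, which makes counter inconsistencies easier to test in isolation. The hypothetical count_errors below copies the previous counts instead of mutating them, so the caller's state is left untouched.

```python
def count_errors(previous: dict[str, int], categories: list[str]) -> dict[str, int]:
    """Hypothetical pure helper: return updated counts without touching previous."""
    counts = dict(previous)  # a shallow copy suffices because the values are ints
    for category in categories:
        counts[category] = counts.get(category, 0) + 1
    return counts


before = {"TIMEOUT": 2}
after = count_errors(before, ["TIMEOUT", "RATE_LIMIT"])
print(before)  # {'TIMEOUT': 2}
print(after)  # {'TIMEOUT': 3, 'RATE_LIMIT': 1}
```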

Step 4: Error Routing Implementation

Implement routing logic based on error analysis.

Why This Matters

Error routing is important because:

  1. Enables specialized handling
  2. Supports error prioritization
  3. Facilitates recovery flows
  4. Maintains system control
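
route_results distinguishes only critical from non-critical errors. One way to express explicit priorities is a severity ranking that routes on the single most severe error, sketched below. The ranking, the keywords, and the node names permission_handler and retry_handler are hypothetical; critical_error_handler, error_handler, and result_aggregator match the tutorial.

```python
# Hypothetical severity ranking: lower rank means handled first.
SEVERITY_RANK = {"CRITICAL": 0, "PERMISSION": 1, "RATE_LIMIT": 2, "TIMEOUT": 3, "UNKNOWN": 4}

# Target node per severity; permission_handler and retry_handler are hypothetical.
NODE_FOR_SEVERITY = {
    "CRITICAL": "critical_error_handler",
    "PERMISSION": "permission_handler",
    "RATE_LIMIT": "retry_handler",
    "TIMEOUT": "retry_handler",
    "UNKNOWN": "error_handler",
}


def severity(error: str) -> str:
    """Classify one message, checking the most severe keywords first."""
    text = error.lower()
    for keyword, level in (
        ("critical", "CRITICAL"),
        ("permission", "PERMISSION"),
        ("rate limit", "RATE_LIMIT"),
        ("timeout", "TIMEOUT"),
    ):
        if keyword in text:
            return level
    return "UNKNOWN"


def route_by_priority(errors: dict[str, str]) -> str:
    """Route on the single most severe error instead of the first one found."""
    if not errors:
        return "result_aggregator"
    most_severe = min((severity(e) for e in errors.values()), key=SEVERITY_RANK.__getitem__)
    return NODE_FOR_SEVERITY[most_severe]


print(route_by_priority({"t_1": "Tool timeout after 30s", "p_1": "Permission denied"}))  # permission_handler
```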

Debug Tips

  1. Routing Logic:

    • Verify condition evaluation
    • Check route selection
    • Monitor routing patterns
  2. Common Problems:

    • Missing route conditions
    • Incorrect priorities
    • Dead-end routes
def route_results(state: State) -> str:
    """Route to appropriate handler based on errors.

    Args:
        state: Current state

    Returns:
        Name of next handler node
    """
    if state.get("errors", {}):
        # Count critical errors
        critical_count = sum(
            1 for error in state["errors"].values() if "critical" in error.lower()
        )

        # Route based on error severity
        if critical_count > 0:
            return "critical_error_handler"
        return "error_handler"

    return "result_aggregator"
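
Note that error_handler returns errors unchanged, so if a graph sends the handler's output back through route_results, the router picks error_handler again and the loop never ends (the demo below shows the same routing decision after handling). A sketch of one guard, assuming a hypothetical error_passes counter that each handler increments and a hypothetical give_up node; the final check confirms every route target is a registered node, which catches dead-end routes.

```python
from typing import Any

MAX_ERROR_PASSES = 3  # hypothetical cap on trips through the error handlers

# Every name the router can return must be a node in the graph, or the run dead-ends.
REGISTERED_NODES = {"result_aggregator", "error_handler", "critical_error_handler", "give_up"}


def route_with_cap(state: dict[str, Any]) -> str:
    """Variant of route_results that stops after MAX_ERROR_PASSES handler visits.

    Assumes a hypothetical integer field "error_passes" that each error
    handler increments, and a hypothetical terminal node "give_up".
    """
    errors = state.get("errors") or {}
    if not errors:
        return "result_aggregator"
    if state.get("error_passes", 0) >= MAX_ERROR_PASSES:
        return "give_up"
    if any("critical" in error.lower() for error in errors.values()):
        return "critical_error_handler"
    return "error_handler"


# Exercise every branch and confirm no route points at an unknown node.
sample_states = [
    {"errors": {}},
    {"errors": {"timeout_1": "Tool timeout after 30s"}, "error_passes": 0},
    {"errors": {"critical_1": "Critical: Database connection failed"}},
    {"errors": {"timeout_1": "Tool timeout after 30s"}, "error_passes": 3},
]
routes = [route_with_cap(s) for s in sample_states]
print(routes)  # ['result_aggregator', 'error_handler', 'critical_error_handler', 'give_up']
assert set(routes) <= REGISTERED_NODES, "router returned an unregistered node"
```

Clearing errors inside the handler is an alternative guard; the counter keeps the error history visible while still bounding the loop.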

Step 5: Demonstration Implementation

Example usage showing error handling patterns.

Why This Matters

Demonstration code is valuable because:

  1. Shows practical usage patterns
  2. Illustrates error flows
  3. Demonstrates recovery paths
  4. Provides testing scenarios

Debug Tips

  1. Demo Execution:

    • Monitor error generation
    • Verify handling paths
    • Check error summaries
  2. Common Issues:

    • Invalid test states
    • Missing error cases
    • Incorrect assertions
def demonstrate_error_handling():
    """Demonstrate error handling with various scenarios."""
    print("Error Handling Demonstration")
    print("=" * 50)

    # Test different error scenarios
    test_cases = [
        {
            "name": "Mixed Errors",
            "state": {
                "messages": [],
                "pending_tools": [],
                "results": {"success_1": "OK"},
                "errors": {
                    "timeout_1": "Tool timeout after 30s",
                    "rate_1": "API rate limit exceeded",
                },
                "error_counts": {},
            },
        },
        {
            "name": "Critical Error",
            "state": {
                "messages": [],
                "pending_tools": [],
                "results": {},
                "errors": {"critical_1": "Critical: Database connection failed"},
                "error_counts": {},
            },
        },
    ]

    for case in test_cases:
        print(f"\nTest Case: {case['name']}")
        result = error_handler(case["state"])

        print("\nMessages:")
        for msg in result["messages"]:
            prefix = type(msg).__name__.replace("Message", "")
            print(f"{prefix}: {msg.content}")

        if result.get("error_counts"):
            print("\nError Counts:")
            for error_type, count in result["error_counts"].items():
                print(f"{error_type}: {count}")

        print("\nRouting Decision:")
        route = route_results(result)
        print(f"Next node: {route}")
        print("-" * 50)
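
The demo prints its results for visual inspection. The same scenarios, plus a no-error case and a lowercase "critical" case the demo does not cover, can be expressed as table-driven checks. This sketch copies route_results so it runs on its own; ROUTING_CASES and run_routing_cases are hypothetical names.

```python
from typing import Any


def route_results(state: dict[str, Any]) -> str:
    """Copy of the tutorial's router so this block runs on its own."""
    if state.get("errors", {}):
        critical_count = sum(
            1 for error in state["errors"].values() if "critical" in error.lower()
        )
        if critical_count > 0:
            return "critical_error_handler"
        return "error_handler"
    return "result_aggregator"


# Hypothetical test table: (case name, state, expected next node).
ROUTING_CASES = [
    ("no errors", {"errors": {}}, "result_aggregator"),
    (
        "mixed errors",
        {"errors": {"timeout_1": "Tool timeout after 30s", "rate_1": "API rate limit exceeded"}},
        "error_handler",
    ),
    (
        "critical error",
        {"errors": {"critical_1": "Critical: Database connection failed"}},
        "critical_error_handler",
    ),
    ("lowercase critical", {"errors": {"db_1": "critical failure"}}, "critical_error_handler"),
]


def run_routing_cases() -> list[str]:
    """Return one message per failing case; an empty list means all passed."""
    failures = []
    for name, state, expected in ROUTING_CASES:
        actual = route_results(state)
        if actual != expected:
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures


print(run_routing_cases())  # []
```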

Common Pitfalls

  1. Incomplete Error Capture

    • Missing error types
    • Lost error context
    • Inadequate categorization
  2. Poor Error Routing

    • Missing edge cases
    • Incorrect prioritization
    • Dead-end paths
  3. State Management Issues

    • Inconsistent error tracking
    • Lost error history
    • Counter corruption
  4. Message Generation Problems

    • Incorrect message types
    • Missing error details
    • Poor formatting
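
For the "lost error context" pitfall, one option is to store structured records instead of bare strings. The ErrorRecord dataclass below is a hypothetical shape; adopting it would mean changing the errors field of State, for example to dict[str, dict] holding asdict(record) values.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ErrorRecord:
    """Hypothetical structured error entry that keeps context a bare string loses."""

    tool_id: str
    message: str
    category: str
    attempt: int = 1
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def summary(self) -> str:
        return f"[{self.category}] {self.tool_id} (attempt {self.attempt}): {self.message}"


record = ErrorRecord(tool_id="timeout_1", message="Tool timeout after 30s", category="TIMEOUT")
print(record.summary())  # [TIMEOUT] timeout_1 (attempt 1): Tool timeout after 30s
# asdict() returns a plain dict, which is easier to serialize than a custom object
print(asdict(record)["category"])  # TIMEOUT
```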

Key Takeaways

  1. Systematic Error Handling

    • Complete error capture
    • Proper categorization
    • Clear error routing
  2. Error Analytics

    • Error type tracking
    • Frequency analysis
    • Pattern detection
  3. Message-Based Architecture

    • Structured error reporting
    • Clear error communication
    • Maintainable system

Next Steps

  1. Enhanced Error Recovery

    • Add retry mechanisms
    • Implement backoff strategies
    • Create recovery paths
  2. Advanced Analytics

    • Add error metrics
    • Create error dashboards
    • Implement trend analysis
  3. Custom Error Handling

    • Add specialized handlers
    • Create error hierarchies
    • Implement recovery strategies
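
The first next step, retries with backoff, can be sketched with the standard library alone. This helper is a sketch under stated assumptions: it retries only the exception types in retry_on (TimeoutError by default, since permission errors rarely succeed on retry), doubles the delay on each attempt up to max_delay, and applies "full jitter" by waiting a random time between zero and that delay. The sleep parameter is injectable so the example runs instantly; flaky_tool is a stand-in.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying errors listed in retry_on with exponential backoff.

    Delays grow as base_delay * 2**(attempt - 1), capped at max_delay, and each
    wait is drawn uniformly from [0, delay] ("full jitter").
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller record the final error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable: the loop always returns or raises")


calls = {"count": 0}


def flaky_tool() -> str:
    """Fails twice with a timeout, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Tool timeout after 30s")
    return "OK"


waits: list[float] = []
print(retry_with_backoff(flaky_tool, sleep=waits.append))  # OK
print(calls["count"])  # 3
```

Jitter spreads retries from parallel tools over time, so they do not all hit a rate-limited API at the same moment.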

Expected Output

Error Handling Demonstration
==================================================

Test Case: Mixed Errors

Messages:
System: Processing 2 errors
Tool: Tool timeout after 30s
Tool: API rate limit exceeded
AI: Encountered the following errors:
- timeout_1: Tool timeout after 30s
- rate_1: API rate limit exceeded

Error distribution:
- TIMEOUT: 1 occurrences
- RATE_LIMIT: 1 occurrences

Error Counts:
TIMEOUT: 1
RATE_LIMIT: 1

Routing Decision:
Next node: error_handler
--------------------------------------------------

Test Case: Critical Error

Messages:
System: Processing 1 errors
Tool: Critical: Database connection failed
AI: Encountered the following errors:
- critical_1: Critical: Database connection failed

Error distribution:
- UNKNOWN: 1 occurrences

Error Counts:
UNKNOWN: 1

Routing Decision:
Next node: critical_error_handler
--------------------------------------------------

if __name__ == "__main__":
    demonstrate_error_handling()

Rod Rivera
