From routing graphs to GOAP

A Tier 1 tutorial that takes a hand-wired LangGraph incident-response workflow and rewrites it as a six-action LangGOAP graph. The same business logic, expressed declaratively, absorbs runtime disruptions that would require code changes in the routing-graph version.

The scenario and action specs match examples/screencast/incident/ and are exercised end-to-end by tests/integration/test_screencast_incident.py.

1. The baseline: a hand-wired routing graph

The conventional LangGraph implementation needs three conditional edges to express the recovery escalation: restart → rollback → analyze + hotfix → failover. Every branch is hand-coded.

from typing import Any, Literal
from langgraph.graph import END, START, StateGraph
from typing_extensions import TypedDict

class IncidentState(TypedDict, total=False):
    status: str
    recovery_method: str
    disruptions: dict[str, bool]

def restart(state):
    if state.get("disruptions", {}).get("restart_fails"):
        return {"status": "still_down"}
    return {"status": "service_healthy", "recovery_method": "restart"}

def rollback(state):
    if state.get("disruptions", {}).get("rollback_fails"):
        return {"status": "still_down"}
    return {"status": "service_healthy", "recovery_method": "rollback"}

def analyze(state):
    return {"status": "root_cause_found"}

def hotfix(state):
    return {"status": "service_healthy", "recovery_method": "hotfix"}

def failover(state):
    return {"status": "service_healthy", "recovery_method": "failover"}

def notify(state):
    return {"status": "resolved",
            "recovery_method": state.get("recovery_method", "unknown")}

# Routing functions: each one inspects the state and names the next node.
def after_restart(s) -> Literal["notify", "rollback"]:
    return "notify" if s["status"] == "service_healthy" else "rollback"

def after_rollback(s) -> Literal["notify", "analyze"]:
    return "notify" if s["status"] == "service_healthy" else "analyze"

def after_hotfix(s) -> Literal["notify", "failover"]:
    return "notify" if s["status"] == "service_healthy" else "failover"

b = StateGraph(IncidentState)
for n, fn in [("restart", restart), ("rollback", rollback),
              ("analyze", analyze), ("hotfix", hotfix),
              ("failover", failover), ("notify", notify)]:
    b.add_node(n, fn)
b.add_edge(START, "restart")
b.add_conditional_edges("restart", after_restart)
b.add_conditional_edges("rollback", after_rollback)
b.add_edge("analyze", "hotfix")
b.add_conditional_edges("hotfix", after_hotfix)
b.add_edge("failover", "notify")
b.add_edge("notify", END)
baseline = b.compile()
result = baseline.invoke({"status": "incident_detected", "disruptions": {}})
(result["status"], result["recovery_method"])
('resolved', 'restart')

Routing functions: 3. Conditional edges: 3. Adding a new recovery strategy (a canary rollback, a circuit breaker) means editing routing logic, adding a node, updating edges, and adjusting type hints. The flow is implicit in the graph structure.

2. The same logic as GOAP: actions and effects only

Each recovery step becomes an ActionSpec with declarative preconditions and effects. A numeric cost encodes priority (cheaper = preferred). This version has no routing function and no conditional edge; the planner discovers the chain at runtime.

from typing import Any
from langgoap.actions import ActionSpec

def restart_service(ws):
    if ws.get("restart_ineffective"):
        raise RuntimeError("OOM from memory leak")
    return {"service_healthy": True, "recovery_method": "restart"}

def rollback_deployment(ws):
    if ws.get("rollback_blocked"):
        raise RuntimeError("irreversible DB migration")
    return {"service_healthy": True, "recovery_method": "rollback"}

def analyze_error_logs(ws):
    return {"root_cause_hypothesized": True,
            "root_cause": "memory leak in RequestHandler cache"}

def apply_hotfix(ws):
    return {"service_healthy": True, "recovery_method": "hotfix"}

def failover_to_backup(ws):
    return {"service_healthy": True, "recovery_method": "failover"}

def notify_stakeholders(ws):
    return {"stakeholders_notified": True}

incident_actions = [
    ActionSpec(name="restart_service", cost=1.0,
               preconditions={"incident_detected": True},
               effects={"service_healthy": True},
               execute=restart_service),
    ActionSpec(name="rollback_deployment", cost=2.0,
               preconditions={"incident_detected": True},
               effects={"service_healthy": True},
               execute=rollback_deployment),
    ActionSpec(name="analyze_error_logs", cost=1.0,
               preconditions={"incident_detected": True},
               effects={"root_cause_hypothesized": True},
               execute=analyze_error_logs),
    ActionSpec(name="apply_hotfix", cost=3.0,
               preconditions={"root_cause_hypothesized": True},
               effects={"service_healthy": True},
               execute=apply_hotfix),
    ActionSpec(name="failover_to_backup", cost=6.0,
               preconditions={"incident_detected": True},
               effects={"service_healthy": True},
               execute=failover_to_backup),
    ActionSpec(name="notify_stakeholders", cost=1.0,
               preconditions={"service_healthy": True},
               effects={"stakeholders_notified": True},
               execute=notify_stakeholders),
]
[a.name for a in incident_actions]
['restart_service',
 'rollback_deployment',
 'analyze_error_logs',
 'apply_hotfix',
 'failover_to_backup',
 'notify_stakeholders']
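To make the precondition and effect semantics concrete, here is a minimal pure-Python sketch. The `Action` dataclass and the `applicable` and `apply_effects` helpers are hypothetical stand-ins written for this illustration; they are not LangGOAP's API, and the library's own matching rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical stand-in for an ActionSpec (illustration only)."""
    name: str
    cost: float
    preconditions: dict
    effects: dict


def applicable(action: Action, state: dict) -> bool:
    # An action may run only when every precondition matches the state.
    return all(state.get(k) == v for k, v in action.preconditions.items())


def apply_effects(action: Action, state: dict) -> dict:
    # Planning-time simulation: copy the state and overlay the declared effects.
    return {**state, **action.effects}


analyze = Action("analyze_error_logs", 1.0,
                 {"incident_detected": True},
                 {"root_cause_hypothesized": True})
hotfix = Action("apply_hotfix", 3.0,
                {"root_cause_hypothesized": True},
                {"service_healthy": True})

state = {"incident_detected": True}
hotfix_ready_before = applicable(hotfix, state)          # no hypothesis yet
state_after_analyze = apply_effects(analyze, state)
hotfix_ready_after = applicable(hotfix, state_after_analyze)
print(hotfix_ready_before, hotfix_ready_after)
```

The hotfix only becomes applicable once analyze_error_logs has contributed its effect, which is how a chain can emerge without an explicit edge between the two actions.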

3. Happy path: planner picks the cheapest chain

With no disruptions, restart_service (cost 1.0) is the cheapest action that achieves service_healthy=True. The planner picks it, then chains in notify_stakeholders to satisfy stakeholders_notified=True.

from langgoap.goals import GoalSpec
from langgoap.graph.builder import GoapGraph

goal = GoalSpec(conditions={
    "service_healthy": True,
    "stakeholders_notified": True,
})
result = GoapGraph(incident_actions).invoke(
    goal=goal,
    world_state={"incident_detected": True},
)
path = [h.action_name for h in result["execution_history"] if h.success]
(result["status"], result["world_state"]["recovery_method"], path)
('goal_achieved', 'restart', ['restart_service', 'notify_stakeholders'])
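One way a planner could arrive at this chain is a uniform-cost search over simulated world states. The sketch below is an assumption for illustration, not LangGOAP's actual search algorithm; the `ACTIONS` tuples copy the names, costs, preconditions, and effects from the list in section 2, and `plan` is a hypothetical helper.

```python
import heapq
from itertools import count

# (name, cost, preconditions, effects), copied from the tutorial's action list.
ACTIONS = [
    ("restart_service", 1.0, {"incident_detected": True}, {"service_healthy": True}),
    ("rollback_deployment", 2.0, {"incident_detected": True}, {"service_healthy": True}),
    ("analyze_error_logs", 1.0, {"incident_detected": True}, {"root_cause_hypothesized": True}),
    ("apply_hotfix", 3.0, {"root_cause_hypothesized": True}, {"service_healthy": True}),
    ("failover_to_backup", 6.0, {"incident_detected": True}, {"service_healthy": True}),
    ("notify_stakeholders", 1.0, {"service_healthy": True}, {"stakeholders_notified": True}),
]


def matches(conditions, state):
    return all(state.get(k) == v for k, v in conditions.items())


def plan(state, goal):
    """Uniform-cost search; returns (total_cost, [action names]) or None."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    frontier = [(0.0, next(tie), state, [])]
    seen = set()
    while frontier:
        cost, _, current, path = heapq.heappop(frontier)
        if matches(goal, current):
            return cost, path
        key = frozenset(current.items())
        if key in seen:
            continue  # this state was already expanded at equal or lower cost
        seen.add(key)
        for name, step_cost, pre, eff in ACTIONS:
            if matches(pre, current):
                heapq.heappush(
                    frontier,
                    (cost + step_cost, next(tie), {**current, **eff}, path + [name]),
                )
    return None


goal = {"service_healthy": True, "stakeholders_notified": True}
total_cost, steps = plan({"incident_detected": True}, goal)
print(total_cost, steps)
```

Under these assumptions the search returns restart_service followed by notify_stakeholders at a total cost of 2.0, matching the path in the output above.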

4. Disruption: restart is ineffective; planner escalates

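Set restart_ineffective=True in the world state. The restart_service action raises at runtime; the observer blacklists it and the planner picks the next-cheapest path that still achieves the goal, which is rollback_deployment (cost 2.0) followed by notify_stakeholders. No routing code changed. The path below lists only successful steps, so the failed restart does not appear in it.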

result = GoapGraph(incident_actions).invoke(
    goal=goal,
    world_state={"incident_detected": True, "restart_ineffective": True},
)
path = [h.action_name for h in result["execution_history"] if h.success]
(result["status"], result["world_state"]["recovery_method"],
 result["replan_count"], path)
Action 'restart_service' failed: OOM from memory leak
('goal_achieved',
 'rollback',
 1,
 ['rollback_deployment', 'notify_stakeholders'])
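The execute, fail, blacklist, replan cycle can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not LangGOAP's observer: `cheapest_plan` uses brute-force enumeration (adequate for six actions), `run` is a hypothetical driver, and the `failing` set stands in for an execute function that raises.

```python
from itertools import permutations

# name -> (cost, preconditions, effects), copied from the tutorial's action list.
ACTIONS = {
    "restart_service": (1.0, {"incident_detected": True}, {"service_healthy": True}),
    "rollback_deployment": (2.0, {"incident_detected": True}, {"service_healthy": True}),
    "analyze_error_logs": (1.0, {"incident_detected": True}, {"root_cause_hypothesized": True}),
    "apply_hotfix": (3.0, {"root_cause_hypothesized": True}, {"service_healthy": True}),
    "failover_to_backup": (6.0, {"incident_detected": True}, {"service_healthy": True}),
    "notify_stakeholders": (1.0, {"service_healthy": True}, {"stakeholders_notified": True}),
}


def simulate(names, state):
    """Return the end state if each action's preconditions hold in order, else None."""
    for name in names:
        _, pre, eff = ACTIONS[name]
        if any(state.get(k) != v for k, v in pre.items()):
            return None
        state = {**state, **eff}
    return state


def cheapest_plan(state, goal, banned):
    """Enumerate every ordering of the allowed actions and keep the cheapest valid one."""
    allowed = [n for n in ACTIONS if n not in banned]
    best = None
    for length in range(1, len(allowed) + 1):
        for names in permutations(allowed, length):
            end = simulate(names, state)
            if end is not None and all(end.get(k) == v for k, v in goal.items()):
                cost = sum(ACTIONS[n][0] for n in names)
                if best is None or cost < best[0]:
                    best = (cost, list(names))
    return best[1] if best else None


def run(state, goal, failing):
    """Execute step by step; on failure, blacklist the action and replan."""
    banned, history, replans = set(), [], 0
    plan = cheapest_plan(state, goal, banned)
    while plan:
        name = plan.pop(0)
        if name in failing:  # stand-in for execute() raising at runtime
            banned.add(name)
            replans += 1
            plan = cheapest_plan(state, goal, banned)
            continue
        state = {**state, **ACTIONS[name][2]}
        history.append(name)
    return history, replans


goal = {"service_healthy": True, "stakeholders_notified": True}
history, replans = run({"incident_detected": True}, goal, failing={"restart_service"})
print(history, replans)
```

With restart_service marked as failing, the sketch replans once and completes via rollback_deployment and notify_stakeholders, mirroring the replan_count of 1 shown above.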

5. Cascading disruption: both restart and rollback fail

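Set rollback_blocked=True as well. With both restart_service and rollback_deployment blacklisted, the planner falls through to analyze_error_logs → apply_hotfix (combined cost 4.0), which is still cheaper than failover_to_backup (6.0). Still no routing logic to edit. Adding a new recovery strategy is one more ActionSpec.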

result = GoapGraph(incident_actions).invoke(
    goal=goal,
    world_state={
        "incident_detected": True,
        "restart_ineffective": True,
        "rollback_blocked": True,
    },
)
path = [h.action_name for h in result["execution_history"] if h.success]
(result["status"], result["world_state"]["recovery_method"], path)
Action 'restart_service' failed: OOM from memory leak
Action 'rollback_deployment' failed: irreversible DB migration
('goal_achieved',
 'hotfix',
 ['analyze_error_logs', 'apply_hotfix', 'notify_stakeholders'])
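The escalation order observed across sections 3 to 5 follows from summing the cost values along each chain that reaches the goal. The short calculation below uses the costs from section 2; the `CHAINS` table is written out by hand for illustration, whereas a planner would derive these chains itself.

```python
# Costs copied from the ActionSpec list in section 2.
COST = {
    "restart_service": 1.0,
    "rollback_deployment": 2.0,
    "analyze_error_logs": 1.0,
    "apply_hotfix": 3.0,
    "failover_to_backup": 6.0,
    "notify_stakeholders": 1.0,
}

# Hand-listed chains that satisfy both goal conditions, keyed by recovery method.
CHAINS = {
    "restart": ["restart_service", "notify_stakeholders"],
    "rollback": ["rollback_deployment", "notify_stakeholders"],
    "hotfix": ["analyze_error_logs", "apply_hotfix", "notify_stakeholders"],
    "failover": ["failover_to_backup", "notify_stakeholders"],
}

totals = {method: sum(COST[a] for a in chain) for method, chain in CHAINS.items()}
escalation_order = sorted(totals, key=totals.get)
print(totals)            # {'restart': 2.0, 'rollback': 3.0, 'hotfix': 5.0, 'failover': 7.0}
print(escalation_order)  # ['restart', 'rollback', 'hotfix', 'failover']
```

The resulting order matches the escalation that the baseline graph hard-codes in its conditional edges, which is why changing a cost value is enough to reorder the fallbacks in the GOAP version.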

What you traded

  • Three routing functions + three conditional edges → zero.

  • Implicit fallback order encoded in graph topology → explicit cost values on each action.

  • Adding a new recovery option means editing routing code → one more ActionSpec.

Next steps

  • Read the side-by-side scripts in examples/screencast/incident/ to see the contrast at the file level.

  • See examples/tutorials/supply_chain_disruption_mediator.ipynb for a richer recovery story with stuck handlers and goal relaxation.