From routing graphs to GOAP#
A Tier 1 tutorial that takes a hand-wired LangGraph incident-response workflow and rewrites it as a six-action LangGOAP graph. The same business logic, expressed declaratively, absorbs runtime disruptions that the routing-graph version would require code changes to handle.
The scenario and action specs match examples/screencast/incident/ and are exercised end-to-end by tests/integration/test_screencast_incident.py.
1. The baseline — a hand-wired routing graph#
The conventional LangGraph implementation needs three conditional edges to express the recovery escalation: restart → rollback → analyze + hotfix → failover. Every branch is hand-coded.
from typing import Any, Literal

from langgraph.graph import END, START, StateGraph
from typing_extensions import TypedDict


class IncidentState(TypedDict, total=False):
    status: str
    recovery_method: str
    disruptions: dict[str, bool]


# Recovery nodes: each returns a partial state update.
def restart(state):
    if state.get("disruptions", {}).get("restart_fails"):
        return {"status": "still_down"}
    return {"status": "service_healthy", "recovery_method": "restart"}

def rollback(state):
    if state.get("disruptions", {}).get("rollback_fails"):
        return {"status": "still_down"}
    return {"status": "service_healthy", "recovery_method": "rollback"}

def analyze(state):
    return {"status": "root_cause_found"}

def hotfix(state):
    return {"status": "service_healthy", "recovery_method": "hotfix"}

def failover(state):
    return {"status": "service_healthy", "recovery_method": "failover"}

def notify(state):
    return {"status": "resolved",
            "recovery_method": state.get("recovery_method", "unknown")}


# Routing functions: each one hard-codes a single escalation step.
def after_restart(s) -> Literal["notify", "rollback"]:
    return "notify" if s["status"] == "service_healthy" else "rollback"

def after_rollback(s) -> Literal["notify", "analyze"]:
    return "notify" if s["status"] == "service_healthy" else "analyze"

def after_hotfix(s) -> Literal["notify", "failover"]:
    return "notify" if s["status"] == "service_healthy" else "failover"


# Wire nodes and edges by hand.
b = StateGraph(IncidentState)
for n, fn in [("restart", restart), ("rollback", rollback),
              ("analyze", analyze), ("hotfix", hotfix),
              ("failover", failover), ("notify", notify)]:
    b.add_node(n, fn)
b.add_edge(START, "restart")
b.add_conditional_edges("restart", after_restart)
b.add_conditional_edges("rollback", after_rollback)
b.add_edge("analyze", "hotfix")
b.add_conditional_edges("hotfix", after_hotfix)
b.add_edge("failover", "notify")
b.add_edge("notify", END)
baseline = b.compile()

result = baseline.invoke({"status": "incident_detected", "disruptions": {}})
(result["status"], result["recovery_method"])
('resolved', 'restart')
Routing functions: 3. Conditional edges: 3. Adding a new recovery strategy (a canary rollback, a circuit breaker) means editing routing logic, adding a node, updating edges, and adjusting type hints. The flow is implicit in the graph structure.
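To make that edit cost concrete, the baseline's wiring can be modeled as a plain transition table. This is a pure-Python stand-in rather than LangGraph code: `ROUTES`, `walk`, and the `canary_rollback` node are hypothetical names introduced only for illustration.

```python
# Pure-Python model of the baseline's wiring (not LangGraph itself).
# Each node maps an outcome ("ok" or "fail") to the next node.
ROUTES = {
    "restart":  {"ok": "notify", "fail": "rollback"},
    "rollback": {"ok": "notify", "fail": "analyze"},
    "analyze":  {"ok": "hotfix"},
    "hotfix":   {"ok": "notify", "fail": "failover"},
    "failover": {"ok": "notify"},
    "notify":   {"ok": "END"},
}


def walk(routes, failing=frozenset(), start="restart"):
    """Follow the table from `start`, treating nodes in `failing` as failed."""
    path, node = [], start
    while node != "END":
        path.append(node)
        outcome = "fail" if node in failing else "ok"
        node = routes[node][outcome]
    return path


print(walk(ROUTES))
# ['restart', 'notify']
print(walk(ROUTES, failing={"restart", "rollback"}))
# ['restart', 'rollback', 'analyze', 'hotfix', 'notify']

# Slotting a hypothetical canary step in after rollback means adding a node
# AND rewiring an existing one: the order lives in the table itself.
with_canary = {
    **ROUTES,
    "rollback": {"ok": "notify", "fail": "canary_rollback"},  # edited entry
    "canary_rollback": {"ok": "notify", "fail": "analyze"},   # new entry
}
print(walk(with_canary, failing={"restart", "rollback"}))
# ['restart', 'rollback', 'canary_rollback', 'notify']
```

The escalation order is never stated anywhere in this model; it only emerges from which entry points at which.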
2. The same logic as GOAP — actions and effects only#
Each recovery step becomes an ActionSpec with declarative preconditions and effects. A numeric cost encodes priority (cheaper = preferred). This version has no routing function and no conditional edge; the planner discovers the chain at runtime.
from typing import Any

from langgoap.actions import ActionSpec


def restart_service(ws):
    if ws.get("restart_ineffective"):
        raise RuntimeError("OOM from memory leak")
    return {"service_healthy": True, "recovery_method": "restart"}

def rollback_deployment(ws):
    if ws.get("rollback_blocked"):
        raise RuntimeError("irreversible DB migration")
    return {"service_healthy": True, "recovery_method": "rollback"}

def analyze_error_logs(ws):
    return {"root_cause_hypothesized": True,
            "root_cause": "memory leak in RequestHandler cache"}

def apply_hotfix(ws):
    return {"service_healthy": True, "recovery_method": "hotfix"}

def failover_to_backup(ws):
    return {"service_healthy": True, "recovery_method": "failover"}

def notify_stakeholders(ws):
    return {"stakeholders_notified": True}


incident_actions = [
    ActionSpec(name="restart_service", cost=1.0,
               preconditions={"incident_detected": True},
               effects={"service_healthy": True},
               execute=restart_service),
    ActionSpec(name="rollback_deployment", cost=2.0,
               preconditions={"incident_detected": True},
               effects={"service_healthy": True},
               execute=rollback_deployment),
    ActionSpec(name="analyze_error_logs", cost=1.0,
               preconditions={"incident_detected": True},
               effects={"root_cause_hypothesized": True},
               execute=analyze_error_logs),
    ActionSpec(name="apply_hotfix", cost=3.0,
               preconditions={"root_cause_hypothesized": True},
               effects={"service_healthy": True},
               execute=apply_hotfix),
    ActionSpec(name="failover_to_backup", cost=6.0,
               preconditions={"incident_detected": True},
               effects={"service_healthy": True},
               execute=failover_to_backup),
    ActionSpec(name="notify_stakeholders", cost=1.0,
               preconditions={"service_healthy": True},
               effects={"stakeholders_notified": True},
               execute=notify_stakeholders),
]
[a.name for a in incident_actions]
['restart_service',
'rollback_deployment',
'analyze_error_logs',
'apply_hotfix',
'failover_to_backup',
'notify_stakeholders']
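To see what the preconditions buy you, the sketch below checks which of the six actions could run in a given world state. It uses plain dicts in place of ActionSpec and assumes a precondition holds when the state has exactly the required value for each key; whether LangGOAP matches preconditions this way is an assumption, and `applicable` and `ready` are hypothetical helper names.

```python
# Plain-dict stand-ins for the six ActionSpec entries above:
# name -> (preconditions, effects). This is not the LangGOAP API.
ACTIONS = {
    "restart_service": ({"incident_detected": True}, {"service_healthy": True}),
    "rollback_deployment": ({"incident_detected": True}, {"service_healthy": True}),
    "analyze_error_logs": ({"incident_detected": True}, {"root_cause_hypothesized": True}),
    "apply_hotfix": ({"root_cause_hypothesized": True}, {"service_healthy": True}),
    "failover_to_backup": ({"incident_detected": True}, {"service_healthy": True}),
    "notify_stakeholders": ({"service_healthy": True}, {"stakeholders_notified": True}),
}


def applicable(state, preconditions):
    """True when every precondition key has the required value in `state`."""
    return all(state.get(key) == value for key, value in preconditions.items())


def ready(state):
    """Names of all actions whose preconditions hold in `state`."""
    return [name for name, (pre, _) in ACTIONS.items() if applicable(state, pre)]


start = {"incident_detected": True}
print(ready(start))
# ['restart_service', 'rollback_deployment', 'analyze_error_logs', 'failover_to_backup']

# Once analysis has produced a hypothesis, apply_hotfix unlocks as well.
print(ready({**start, "root_cause_hypothesized": True}))
# ['restart_service', 'rollback_deployment', 'analyze_error_logs', 'apply_hotfix', 'failover_to_backup']
```

Note that apply_hotfix and notify_stakeholders are unavailable at the start. That dependency, not a hand-drawn edge, is what forces analyze_error_logs to come before apply_hotfix.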
3. Happy path — planner picks the cheapest chain#
With no disruptions, restart_service (cost 1.0) is the cheapest action whose effects include service_healthy=True. The planner picks it, then chains in notify_stakeholders to satisfy the second goal condition.
from langgoap.goals import GoalSpec
from langgoap.graph.builder import GoapGraph

goal = GoalSpec(conditions={
    "service_healthy": True,
    "stakeholders_notified": True,
})
result = GoapGraph(incident_actions).invoke(
    goal=goal,
    world_state={"incident_detected": True},
)
path = [h.action_name for h in result["execution_history"] if h.success]
(result["status"], result["world_state"]["recovery_method"], path)
('goal_achieved', 'restart', ['restart_service', 'notify_stakeholders'])
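How a planner can arrive at that chain is easiest to see in a toy version. The sketch below runs a uniform-cost search over plain tuples that mirror the names, costs, preconditions, and effects above. It is an illustration only: LangGOAP's actual search algorithm is not described in this tutorial, and `plan` and `satisfied` are hypothetical helpers.

```python
import heapq
from itertools import count

# (name, cost, preconditions, effects), mirroring the ActionSpec list above.
ACTIONS = [
    ("restart_service", 1.0, {"incident_detected": True}, {"service_healthy": True}),
    ("rollback_deployment", 2.0, {"incident_detected": True}, {"service_healthy": True}),
    ("analyze_error_logs", 1.0, {"incident_detected": True}, {"root_cause_hypothesized": True}),
    ("apply_hotfix", 3.0, {"root_cause_hypothesized": True}, {"service_healthy": True}),
    ("failover_to_backup", 6.0, {"incident_detected": True}, {"service_healthy": True}),
    ("notify_stakeholders", 1.0, {"service_healthy": True}, {"stakeholders_notified": True}),
]
GOAL = {"service_healthy": True, "stakeholders_notified": True}


def satisfied(state, conditions):
    return all(state.get(key) == value for key, value in conditions.items())


def plan(start, goal, actions=ACTIONS):
    """Uniform-cost search: return (total_cost, [action names]) or None."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    frontier = [(0.0, next(tie), start, [])]
    seen = set()
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if satisfied(state, goal):
            return cost, path
        key = frozenset(state.items())
        if key in seen:
            continue
        seen.add(key)
        for name, step_cost, pre, eff in actions:
            new_state = {**state, **eff}
            # Expand only applicable actions that actually change the state.
            if satisfied(state, pre) and new_state != state:
                heapq.heappush(
                    frontier,
                    (cost + step_cost, next(tie), new_state, path + [name]),
                )
    return None


print(plan({"incident_detected": True}, GOAL))
# (2.0, ['restart_service', 'notify_stakeholders'])
```

The search never sees an ordering of recovery strategies. It only sees costs, and restart_service plus notify_stakeholders is the cheapest sequence whose effects cover both goal conditions.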
4. Disruption — restart is ineffective; planner escalates#
Set restart_ineffective=True in world state. The restart_service action raises at runtime; the observer blacklists it and the planner picks the next-cheapest path that still achieves the goal. No routing code changed.
result = GoapGraph(incident_actions).invoke(
    goal=goal,
    world_state={"incident_detected": True, "restart_ineffective": True},
)
path = [h.action_name for h in result["execution_history"] if h.success]
(result["status"], result["world_state"]["recovery_method"],
 result["replan_count"], path)
Action 'restart_service' failed: OOM from memory leak
('goal_achieved',
'rollback',
1,
['rollback_deployment', 'notify_stakeholders'])
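The fail, blacklist, replan cycle described above can be sketched as a small loop around a toy planner. This is a simplified model, not LangGOAP's observer: the `failing` set stands in for an executor raising, and `plan`, `run`, and `banned` are hypothetical names.

```python
import heapq
from itertools import count

# name -> (cost, preconditions, effects), copied from the ActionSpec list above.
ACTIONS = {
    "restart_service": (1.0, {"incident_detected": True}, {"service_healthy": True}),
    "rollback_deployment": (2.0, {"incident_detected": True}, {"service_healthy": True}),
    "analyze_error_logs": (1.0, {"incident_detected": True}, {"root_cause_hypothesized": True}),
    "apply_hotfix": (3.0, {"root_cause_hypothesized": True}, {"service_healthy": True}),
    "failover_to_backup": (6.0, {"incident_detected": True}, {"service_healthy": True}),
    "notify_stakeholders": (1.0, {"service_healthy": True}, {"stakeholders_notified": True}),
}
GOAL = {"service_healthy": True, "stakeholders_notified": True}


def satisfied(state, conditions):
    return all(state.get(key) == value for key, value in conditions.items())


def plan(state, goal, banned):
    """Cheapest action sequence reaching `goal`, skipping `banned` actions."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    frontier = [(0.0, next(tie), state, [])]
    seen = set()
    while frontier:
        cost, _, current, path = heapq.heappop(frontier)
        if satisfied(current, goal):
            return path
        key = frozenset(current.items())
        if key in seen:
            continue
        seen.add(key)
        for name, (step, pre, eff) in ACTIONS.items():
            nxt = {**current, **eff}
            if name not in banned and satisfied(current, pre) and nxt != current:
                heapq.heappush(frontier, (cost + step, next(tie), nxt, path + [name]))
    return None


def run(state, goal, failing):
    """Execute plans until the goal holds; ban each failed action and replan."""
    banned, history, replans = set(), [], 0
    while not satisfied(state, goal):
        steps = plan(state, goal, banned)
        if steps is None:
            raise RuntimeError("no plan reaches the goal")
        for name in steps:
            if name in failing:  # stand-in for the executor raising
                banned.add(name)
                replans += 1
                break
            state = {**state, **ACTIONS[name][2]}  # apply declared effects
            history.append(name)
    return history, replans


print(run({"incident_detected": True}, GOAL, failing={"restart_service"}))
# (['rollback_deployment', 'notify_stakeholders'], 1)
```

The fallback order is never written down. Banning restart_service simply makes rollback_deployment the cheapest remaining way to reach service_healthy=True.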
5. Cascading disruption — both restart and rollback fail#
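With restart_service and rollback_deployment both failing, the planner falls through to analyze_error_logs → apply_hotfix, whose combined cost (1.0 + 3.0 = 4.0) still undercuts failover_to_backup (6.0). Still no routing logic to edit. Adding a new recovery strategy is one more ActionSpec.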
result = GoapGraph(incident_actions).invoke(
    goal=goal,
    world_state={
        "incident_detected": True,
        "restart_ineffective": True,
        "rollback_blocked": True,
    },
)
path = [h.action_name for h in result["execution_history"] if h.success]
(result["status"], result["world_state"]["recovery_method"], path)
Action 'restart_service' failed: OOM from memory leak
Action 'rollback_deployment' failed: irreversible DB migration
('goal_achieved',
'hotfix',
['analyze_error_logs', 'apply_hotfix', 'notify_stakeholders'])
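Why the hotfix route wins over failover comes down to arithmetic on the declared costs. The sketch below sums the two chains that remain once restart and rollback are out of the picture, assuming the planner minimizes total plan cost; the `candidates` grouping is a hypothetical illustration, not planner output.

```python
# Costs copied from the ActionSpec list above.
COST = {
    "analyze_error_logs": 1.0,
    "apply_hotfix": 3.0,
    "failover_to_backup": 6.0,
    "notify_stakeholders": 1.0,
}

# The two chains that can still reach the goal after restart and rollback fail.
candidates = {
    "hotfix": ["analyze_error_logs", "apply_hotfix", "notify_stakeholders"],
    "failover": ["failover_to_backup", "notify_stakeholders"],
}

totals = {label: sum(COST[name] for name in chain)
          for label, chain in candidates.items()}
print(totals)
# {'hotfix': 5.0, 'failover': 7.0}
print(min(totals, key=totals.get))
# hotfix
```

Under that assumption, raising the cost of apply_hotfix above 5.0 would make failover the preferred chain, so costs are the knob that replaces the hand-wired escalation order.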
What you traded#
Three routing functions + three conditional edges → zero.
Implicit fallback order encoded in graph topology → explicit cost values on each action.
Adding a new recovery option means editing routing code → one more ActionSpec.
Next steps#
Read the side-by-side scripts in examples/screencast/incident/ to see the contrast at the file-system level.
See examples/tutorials/supply_chain_disruption_mediator.ipynb for a richer recovery story with stuck handlers and goal relaxation.