Action QoS and retry policy

ActionQos gives an action an in-executor retry loop with exponential backoff, so transient failures (rate limits, timeouts, flaky network calls) are absorbed before the planner ever sees them. Without QoS, the executor reports the first exception as action_failed and the graph falls back to its replan loop. The examples on this page are backed by the integration test tests/integration/test_action_qos_loop.py.
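The retry loop described above can be sketched in plain Python. This is a minimal illustration, not langgoap's executor code. `run_with_retry` and its parameters are hypothetical: the names echo the ActionQos fields used later on this page, and the sketch assumes the delay doubles after each failure up to a cap.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def run_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    backoff_initial_ms: float = 1.0,
    backoff_multiplier: float = 2.0,
    backoff_max_ms: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Call fn until it succeeds or max_attempts is used up (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delay_ms = backoff_initial_ms
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: the caller (e.g. the planner) sees this error
            time.sleep(delay_ms / 1000.0)  # back off before the next attempt
            delay_ms = min(delay_ms * backoff_multiplier, backoff_max_ms)
    raise AssertionError("unreachable")  # the loop always returns or raises

calls = [0]

def flaky() -> dict:
    # Fails twice with TimeoutError, then succeeds.
    calls[0] += 1
    if calls[0] <= 2:
        raise TimeoutError(f"attempt {calls[0]}")
    return {"done": True}

print(run_with_retry(flaky), calls[0])  # {'done': True} 3
```

If `max_attempts` were 2, the second TimeoutError would propagate to the caller, which mirrors the "first exception surfaces" behavior of running without QoS.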

1. A flaky action that succeeds on the third try

The closure raises TimeoutError on its first two attempts and returns {"done": True} on the third. We run it under two retry configurations and compare how many graph-level replans each one needs.

from langgoap.actions import ActionSpec
from langgoap.goals import GoalSpec
from langgoap.graph.builder import GoapGraph
from langgoap.qos import ActionQos

def make_flaky(qos=None):
    """Build an action that fails twice with TimeoutError, then succeeds."""
    calls = [0]  # mutable counter shared with the closure; returned so callers can inspect it
    def execute(ws):  # ws: the current world state (unused here)
        calls[0] += 1
        if calls[0] <= 2:
            raise TimeoutError(f"attempt {calls[0]}")
        return {"done": True}
    action = ActionSpec(
        name="flaky",
        preconditions={},  # always applicable
        effects={"done": True},
        execute=execute,
        qos=qos,  # None means no in-executor retry
    )
    return action, calls

goal = GoalSpec(conditions={"done": True})

2. Without QoS: the graph replans on every failure

Each TimeoutError surfaces as action_failed. The observer logs the failure, blacklists the action, and asks the planner to try again. Because the planner has only one action, the blacklist-fallback path clears the blacklist and schedules another attempt. The result is two failures and two replans, followed by success on the third try.

action, calls = make_flaky(qos=None)
result = GoapGraph([action]).invoke(goal=goal, world_state={})
(result["status"], calls[0], result["replan_count"])
Action 'flaky' failed: attempt 1
Action 'flaky' failed: attempt 2
('goal_achieved', 3, 2)
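To see why replan_count is 2, here is a toy model of the fallback path described above. This hypothetical sketch assumes that each failure costs exactly one replan and that the single action is always rescheduled. `replan_loop`, `max_replans`, and the `"replan_limit_reached"` status are invented for illustration and are not part of GoapGraph.

```python
from typing import Any, Callable, Dict, Tuple

def replan_loop(
    execute: Callable[[Dict[str, Any]], Dict[str, Any]],
    max_replans: int = 5,
) -> Tuple[str, int]:
    """Toy model: one action, no in-executor retry, one replan per failure."""
    world_state: Dict[str, Any] = {}
    replans = 0
    while True:
        try:
            # Success ends the loop (the action's single effect satisfies the goal).
            world_state.update(execute(world_state))
            return "goal_achieved", replans
        except Exception as exc:
            # Without QoS the first exception surfaces straight away (action_failed).
            print(f"Action 'flaky' failed: {exc}")
            if replans == max_replans:
                return "replan_limit_reached", replans  # hypothetical status
            # Only one action exists, so the fallback reschedules it: one replan.
            replans += 1

calls = [0]

def flaky(ws: Dict[str, Any]) -> Dict[str, Any]:
    calls[0] += 1
    if calls[0] <= 2:
        raise TimeoutError(f"attempt {calls[0]}")
    return {"done": True}

status, replans = replan_loop(flaky)
print(status, calls[0], replans)  # goal_achieved 3 2
```

The model makes the cost visible: every transient error becomes a full planner round trip.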

3. With ActionQos(max_attempts=3, idempotent=True)

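Here the retry happens inside the executor. The action runs three times within a single executor invocation: the first two attempts raise, and the third succeeds. The planner sees one successful execution, so replan_count stays at 0. The backoff settings are small (1 ms initial delay, doubling, capped at 10 ms), and jitter is disabled.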

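qos = ActionQos(
    max_attempts=3,           # total attempts, including the first
    idempotent=True,          # declares the action safe to re-run
    backoff_initial_ms=1.0,   # delay before the first retry
    backoff_multiplier=2.0,   # each subsequent delay doubles
    backoff_max_ms=10.0,      # upper bound on any single delay
    jitter=0.0,               # no randomization
)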
action, calls = make_flaky(qos=qos)
result = GoapGraph([action]).invoke(goal=goal, world_state={})
(result["status"], calls[0], result["replan_count"])
('goal_achieved', 3, 0)
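The backoff parameters above imply a delay schedule. The sketch below computes one using an assumed formula: the k-th retry (counting from 0) waits min(initial * multiplier**k, max). Jitter is modeled as a random scale factor in [1 - jitter, 1 + jitter]. `backoff_delays_ms` is hypothetical, and langgoap may apply jitter differently. Under this assumption, the configuration above makes its two retries after 1 ms and 2 ms.

```python
import random
from typing import List, Optional

def backoff_delays_ms(
    max_attempts: int,
    initial_ms: float,
    multiplier: float,
    max_ms: float,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Delays (ms) slept before each retry; there are max_attempts - 1 retries."""
    if rng is None:
        rng = random.Random()
    delays: List[float] = []
    for k in range(max_attempts - 1):
        base = min(initial_ms * multiplier ** k, max_ms)  # exponential growth, capped
        # Assumed jitter semantics: scale by a random factor in [1 - jitter, 1 + jitter].
        factor = 1.0 + rng.uniform(-jitter, jitter) if jitter else 1.0
        delays.append(base * factor)
    return delays

print(backoff_delays_ms(3, 1.0, 2.0, 10.0))  # [1.0, 2.0]
print(backoff_delays_ms(6, 1.0, 2.0, 10.0))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

The second call shows the cap at work: the fifth delay would be 16 ms but is clamped to backoff_max_ms.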

4. FIRE_ONCE: one attempt, no retry

Use ActionQos(max_attempts=1) (or the convenience FIRE_ONCE constant) for actions whose side effects are not safe to re-run.

from langgoap import FIRE_ONCE

(FIRE_ONCE.max_attempts, FIRE_ONCE.idempotent)
(1, False)
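A FIRE_ONCE-style policy gives the retry loop a budget of one attempt, so the first exception propagates immediately. In this sketch, `QosSketch`, `allows_retry`, and `run` are hypothetical stand-ins. Only the field names and the (1, False) values come from this page; the defaults are invented. In the sketch, only max_attempts controls retrying, and idempotent is carried along as a declaration.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class QosSketch:
    """Hypothetical stand-in for ActionQos. Field names follow this page; defaults are invented."""
    max_attempts: int = 1
    idempotent: bool = False
    backoff_initial_ms: float = 100.0
    backoff_multiplier: float = 2.0
    backoff_max_ms: float = 5000.0
    jitter: float = 0.0

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")

    def allows_retry(self, attempts_so_far: int) -> bool:
        # Another attempt is allowed only while the attempt budget is not used up.
        return attempts_so_far < self.max_attempts

# Mirrors the (1, False) values shown for FIRE_ONCE above.
FIRE_ONCE_SKETCH = QosSketch(max_attempts=1, idempotent=False)

def run(fn: Callable[[], T], qos: QosSketch) -> T:
    """Retry fn on TimeoutError while qos allows it (backoff omitted for brevity)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn()
        except TimeoutError:
            if not qos.allows_retry(attempts):
                raise

calls = [0]

def send_email() -> None:
    # A timeout does not prove the email was not sent, so a retry could send it twice.
    calls[0] += 1
    raise TimeoutError("smtp timeout")

try:
    run(send_email, FIRE_ONCE_SKETCH)
except TimeoutError as exc:
    print(f"surfaced after {calls[0]} attempt(s): {exc}")  # surfaced after 1 attempt(s): smtp timeout
```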

Next steps

  • Combine ActionQos with MaxActionsPolicy so retries still count against a global cap (see termination_policies.ipynb).

  • See examples/tutorials/sql_query_agent.ipynb for an end-to-end use of retry-on-failure plus effect_validator.
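The first bullet's idea, that retries still consume a global action budget, can be sketched as a shared counter that every attempt draws from. `ActionBudget`, `BudgetExhausted`, `run_with_budget`, and `flaky_counter` are hypothetical and do not reflect how MaxActionsPolicy is implemented. Backoff is omitted for brevity.

```python
from typing import Any, Callable, List, Tuple

class BudgetExhausted(RuntimeError):
    """Raised when the shared action budget has no units left."""

class ActionBudget:
    """Hypothetical global cap: every attempt, retries included, uses one unit."""

    def __init__(self, max_actions: int) -> None:
        self.max_actions = max_actions
        self.used = 0

    def consume(self) -> bool:
        if self.used >= self.max_actions:
            return False
        self.used += 1
        return True

def run_with_budget(fn: Callable[[], Any], max_attempts: int, budget: ActionBudget) -> Any:
    """Retry fn on TimeoutError, charging each attempt to the shared budget."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        if not budget.consume():
            raise BudgetExhausted(
                f"global cap of {budget.max_actions} reached on attempt {attempt}"
            )
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # per-action retries exhausted before the global cap

def flaky_counter(failures: int) -> Tuple[Callable[[], dict], List[int]]:
    """Return a callable that raises TimeoutError `failures` times, then succeeds."""
    calls = [0]

    def fn() -> dict:
        calls[0] += 1
        if calls[0] <= failures:
            raise TimeoutError(f"attempt {calls[0]}")
        return {"done": True}

    return fn, calls

fn, calls = flaky_counter(failures=2)
budget = ActionBudget(max_actions=2)
try:
    run_with_budget(fn, max_attempts=3, budget=budget)
except BudgetExhausted as exc:
    print(f"{exc} after {calls[0]} calls")  # global cap of 2 reached on attempt 3 after 2 calls
```

With a larger budget (say 5), the same action succeeds on its third attempt and uses 3 units, so generous per-action retries can still be bounded globally.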