Detection Layers | ARGUS Docs

Overview

ARGUS doesn't throw everything at an LLM. Detection runs in four layers, each more expensive than the last, and each only fires when needed.

Layer 1 — Heuristic Engine

Pattern matching against 150+ known failure signatures. Deterministic, zero cost, catches ~80% of failures.

What it catches:

‣Placeholder outputs — "Lorem ipsum", "TODO", template strings left in responses
‣Empty results — a node returns {} or drops a required field
‣Error keys — nested error objects, rate-limit responses, API error patterns
‣Semantic degradation markers — refusal patterns, repeated filler text, corrupted JSON
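
As a rough illustration of what signature-style checks like these look like, here is a minimal sketch. It is not the ARGUS implementation: the function name `heuristic_findings`, the three patterns, and the finding strings are all assumptions made up for this example, and it only inspects top-level keys rather than nested error objects.

```python
import re

# Hypothetical signatures for illustration only; not the bundled ARGUS set.
PLACEHOLDER_PATTERNS = [
    re.compile(r"lorem ipsum", re.IGNORECASE),
    re.compile(r"\bTODO\b"),
    re.compile(r"\{\{\s*\w+\s*\}\}"),  # unrendered template variable such as {{name}}
]


def heuristic_findings(output, required_fields):
    """Return a list of findings for one node output (a dict)."""
    findings = []
    if not output:
        findings.append("empty result")
    for field in required_fields:
        if field not in output:
            findings.append(f"missing required field: {field}")
    if "error" in output:  # top-level only in this sketch
        findings.append("error key present")
    for value in output.values():
        if isinstance(value, str):
            for pattern in PLACEHOLDER_PATTERNS:
                if pattern.search(value):
                    findings.append(f"placeholder text matched: {pattern.pattern}")
    return findings
```

Every check here is a plain dictionary lookup or regular expression, which is why this kind of layer can stay deterministic and free of API cost.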

The heuristic engine loads signatures from three tiers: bundled (ships with ARGUS), private (your local patterns), and shared (community-contributed, synced from cloud). All three tiers are merged and deduplicated at startup.
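
The merge-and-deduplicate step can be pictured with the sketch below. The signature shape (a dict with a `pattern` key), the function name `merge_signatures`, and the precedence order (bundled first, then private, then shared) are assumptions for illustration; the docs do not say how ARGUS resolves duplicates.

```python
def merge_signatures(bundled, private, shared):
    """Merge three tiers of signatures, keeping the first occurrence of each pattern."""
    merged = {}
    tiers = (("bundled", bundled), ("private", private), ("shared", shared))
    for tier_name, tier in tiers:
        for sig in tier:
            key = sig["pattern"]
            if key not in merged:
                # Record which tier the surviving copy came from.
                merged[key] = {**sig, "tier": tier_name}
    return list(merged.values())
```

For example, a pattern that appears in both the bundled and shared tiers would be kept once, tagged as bundled.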

Adaptive

The heuristic engine grows over time. When the LLM investigator discovers a new failure pattern, it proposes a candidate signature. After you approve it in the Approvals page, the pattern is added to the heuristic engine — catching the same failure deterministically next time. See Adaptive Learning.

Layer 2 — Anomaly Detector

Statistical checks for suspicious patterns that the heuristic engine can't catch with fixed signatures. Still deterministic, no LLM calls.

‣Unexpected field types — list[int] vs list[str] mismatches
‣Output size anomalies — a response that's dramatically shorter or longer than expected
‣Timing outliers — a node that usually takes 2s now takes 30s
‣Latency degradation — near-timeout nodes (≥95% of limit), suspiciously fast responses (likely cached/stale), and fast completions paired with quality failures
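
To make the timing checks concrete, here is a small sketch under stated assumptions. The near-timeout rule (at or above 95% of the limit) comes from the list above; the z-score test, the threshold of 3.0, and the name `timing_flags` are assumptions chosen for the example, not a description of how ARGUS computes outliers.

```python
from statistics import mean, stdev


def timing_flags(history_s, duration_s, timeout_s, z_threshold=3.0):
    """Flag a node run whose duration looks suspicious.

    history_s: previous durations for this node, in seconds.
    """
    flags = []
    if duration_s >= 0.95 * timeout_s:
        flags.append("near_timeout")
    if len(history_s) >= 2:  # stdev needs at least two samples
        mu = mean(history_s)
        sigma = stdev(history_s)
        if sigma > 0 and abs(duration_s - mu) / sigma > z_threshold:
            flags.append("timing_outlier")
    return flags
```

With a history clustered around 2 seconds, a 30-second run would be flagged as a timing outlier well before it approaches a 60-second timeout.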

Layer 3 — Correlator

Traces failure propagation across nodes. This is the layer that tells you where the problem actually started, not just where it surfaced.

If node 3 dropped a field and node 5 crashed because of it, the correlator builds the causal chain and points you at node 3, not node 5.

text

   2  validate    12 ms    ⚠  silent failure
      └─  Field "score" is missing
      └─  process received bad state
   3  process    891 ms    ✗  crashed
      └─  KeyError: 'score'
      └─  Field 'score' was absent from the incoming state

root cause   validate

The correlator connects the KeyError in process back to the missing field in validate — so you fix the right node.
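
The idea behind this trace can be sketched in a few lines. The event shape (`node`, `status`, `missing_fields`, `failed_field`) and the function name `root_cause` are hypothetical, invented for this example; the real correlator's data model is not documented here.

```python
def root_cause(events):
    """Return the node that best explains the first crash, or None if nothing crashed.

    events: list of dicts in execution order. Each has "node" and "status";
    a silent failure may list "missing_fields", and a crash may name the
    "failed_field" it tripped on.
    """
    dropped_by = {}  # field name -> first node that dropped it
    for event in events:
        for field in event.get("missing_fields", []):
            dropped_by.setdefault(field, event["node"])
        if event["status"] == "crashed":
            # Blame the upstream node that dropped the field, else the crashing node.
            return dropped_by.get(event.get("failed_field"), event["node"])
    return None


events = [
    {"node": "validate", "status": "silent_failure", "missing_fields": ["score"]},
    {"node": "process", "status": "crashed", "failed_field": "score"},
]
print(root_cause(events))  # validate
```

This sketch only follows missing fields; a fuller correlator would presumably also track bad values and other kinds of corrupted state.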

Layer 4 — LLM Investigator

Only triggers on ambiguous failures or when explicitly enabled. This is the expensive layer — it calls an LLM.

‣Generates root cause explanations and causal hypotheses
‣Proposes new heuristic signatures so the same failure gets caught deterministically next time
‣Provides debugging suggestions with specific node and field references

python

# Control when the LLM investigator runs
watcher = ArgusWatcher(
    graph,
    investigate=True,        # on failure only (default)
    # investigate="always",  # every node
    # investigate=False,     # never
)
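
The three settings can be read as a simple gate. The helper below is a hypothetical sketch of that decision, not ARGUS code; `should_investigate` and `node_failed` are names made up for illustration.

```python
def should_investigate(investigate, node_failed):
    """Decide whether the LLM investigator runs for one node."""
    if investigate == "always":
        return True          # every node, failed or not
    if investigate is True:
        return node_failed   # only when the node failed
    return False             # investigate=False: never
```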

Cost

The LLM investigator uses the model set by judge_model (default: gpt-4o), which adds API cost per run. Keep investigate=True (the default) so it runs only when something fails.

Semantic Judge

Deterministic checks catch ~80% of production failures. For the remaining 20% — subtle quality issues like wrong tone, unhelpful responses, or outdated information — enable the semantic judge:

python

watcher = ArgusWatcher(graph, semantic_judge=True)

The judge runs after deterministic checks, on every node that passes them. It evaluates output quality and flags issues that pattern matching can't catch. It won't override a clear heuristic failure; it only steps in when the picture is ambiguous.
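
The ordering described here can be sketched as follows. Everything in the block is an assumption for illustration: `evaluate_node`, the result dict, and the `judge` callable (which stands in for an LLM call and returns a list of quality issues) are not part of the documented API.

```python
def evaluate_node(output, deterministic_findings, judge=None):
    """Combine deterministic findings with an optional judge verdict.

    judge: hypothetical callable taking the node output and returning a
    list of quality issues (empty when the output looks fine).
    """
    if deterministic_findings:
        # A clear deterministic failure stands; the judge is not consulted.
        return {"status": "failed", "source": "deterministic",
                "issues": list(deterministic_findings)}
    if judge is None:
        return {"status": "passed", "source": "deterministic", "issues": []}
    issues = list(judge(output))
    status = "flagged" if issues else "passed"
    return {"status": status, "source": "judge", "issues": issues}
```

Skipping the judge on nodes that already failed keeps the paid call for cases where the deterministic layers have nothing to say.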

Requires OPENAI_API_KEY in your environment.

When to use: complex multi-agent pipelines, customer-facing outputs, LLM-generated content where quality matters.

When to skip: simple pipelines, CI/CD speed runs, zero-cost monitoring.