Documentation — ARGUS

What is ARGUS?

ARGUS is a production readiness platformfor AI agent pipelines. It wraps your LangGraph (or any Python-based) workflow and watches every node execution, state transition, and tool call — then runs a multi-layered detection system to catch the failures that don't throw exceptions.

Your LangGraph pipeline runs. No exception. But three nodes later something crashes with a KeyError. The node that crashed didn't cause it — some node upstream returned a dict with a missing field, and nothing caught it. ARGUS sits between your nodes and catches silent failures, semantic degradation, and contract violations before they reach production.

The Problem

AI agent pipelines fail differently from traditional software. They don't crash — they degrade. A retrieval step returns irrelevant documents. A planning node generates a reasonable-looking but wrong plan. A tool call succeeds with bad parameters. The pipeline finishes, returns a result, and nobody knows it's garbage until a human reads it.

These are silent failures — the pipeline technically succeeds while the output quality collapses. Standard monitoring (latency, error rates, uptime) is blind to them. You need something that understands what your pipeline is supposed to do and can tell when it stops doing it.

How ARGUS Works

ARGUS wraps your pipeline with a single call and instruments every execution step automatically. No manual tracing. No decorators on every function. One wrapper, full visibility.

python

from argus import ArgusWatcher

# Option A — pass graph to constructor (recommended)
watcher = ArgusWatcher(graph)      # attaches monitoring automatically
app = graph.compile()
result = app.invoke(initial_state) # run auto-saves when the last node finishes
print(watcher.run_id)              # access the run ID directly

All parameters are optional. Here's a full example with customization:

python

watcher = ArgusWatcher(
    graph,                           # LangGraph graph to monitor
    max_field_size=50_000,           # max chars per captured state field
    strict=False,                    # True = raise on detection (useful for CI)
    investigate=True,                # run root cause analysis on failures
    redact_keys={"api_key", "token"},# scrub sensitive fields from traces
    persist_state=True,              # save state at each step for replay
    record_http=True,                # record HTTP calls for mocked replay
    semantic_judge=False,            # enable LLM-as-judge evaluation
    judge_model="gpt-4o",           # model for semantic judging
    validators={
        "summarize": lambda o: (len(o.get("summary", "")) > 10, "Summary too short"),
    },
)

Runs are saved automatically for linear and fan-out/fan-in graphs. Only cyclic graphs (with back-edges) need a manual watcher.finalize() call.

ARGUS architecture diagram showing the watcher wrapping a pipeline, detectors analyzing the trace, and forensic output — ARGUS wraps your pipeline, runs multi-layer detection, and produces forensic traces

Key Capabilities

‣Silent failure detection— catches semantic degradation, hallucinated outputs, and logic errors that don't raise exceptions
‣Root cause analysis — traces failures back through the execution graph to the node that caused the problem
‣Execution replay — re-run any node with frozen upstream state. Only the target node onward re-executes with your fixed code.
‣Four detection layers — heuristic engine (150+ signatures), anomaly detector, correlator, and LLM investigator
‣Zero-config instrumentation — one wrapper call, no decorators, no manual span creation

Who Is It For?

ARGUS is built for engineers shipping AI agent pipelines to production. If you're building with LangGraph, LangChain, or any Python-based agent framework and you need to know when your pipeline is silently producing bad output — ARGUS is for you.

ARGUS also works without LangGraph — use ArgusSession to wrap plain Python functions, Prefect tasks, or Temporal workflows.

Beta

ARGUS is currently in beta (v0.6.18). The core API is stable, but some detection layers and CLI commands are still being refined. Requires Python 3.9+. LangGraph 0.2+ only needed for ArgusWatcher. Join the Discord for early access and to shape the roadmap.

Ready to try it? Jump to the Quickstart.

Introduction

What is ARGUS?

The Problem

How ARGUS Works

Key Capabilities

Who Is It For?