ARGUS — ArgusLabs

Watchers

A deep dive into ArgusWatcher, the core monitoring primitive.

Overview

ArgusWatcher is the main class you interact with. It instruments your graph, records execution data, runs detectors, and produces traces. Use one watcher per execution run.

Basic Usage

python
from argus import ArgusWatcher

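# Minimal: all defaults
# (`graph` is your uncompiled graph and `state` its input; both defined elsewhere)
watcher = ArgusWatcher()
watcher.watch(graph)        # instrument the graph (called before compile())
app = graph.compile()
result = app.invoke(state)
watcher.finalize()          # run detectors, generate forensics, store the trace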

# Access results
trace = watcher.get_trace()
print(trace.summary)

Parameters

All parameters are optional. Pass them to the ArgusWatcher() constructor to override config file and environment variable values.
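The precedence rule above can be pictured as a layered lookup in which the first layer that defines a key wins. The sketch below uses Python's standard collections.ChainMap; the `resolve_params` helper and the layer contents are hypothetical, and placing environment variables above the config file is an assumption, since this page only states that constructor arguments override both.

```python
from collections import ChainMap

# Documented defaults for a few parameters.
DEFAULTS = {"max_field_size": 50_000, "strict": False, "investigate": True}


def resolve_params(constructor_kwargs, env_values, config_values):
    """Hypothetical helper: merge layers; earlier maps take precedence."""
    # ASSUMPTION: environment variables outrank the config file.
    # Only "constructor arguments win" is documented.
    return dict(ChainMap(constructor_kwargs, env_values, config_values, DEFAULTS))


effective = resolve_params(
    constructor_kwargs={"strict": True},
    env_values={"max_field_size": 10_000},
    config_values={"strict": False, "investigate": "always"},
)
print(effective)
```

Here `strict` comes from the constructor even though the config file also sets it, while the other two keys fall through to the lower layers.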

Core

max_field_size (int)

Maximum number of characters captured per state field. Longer values are truncated, and a marker is appended to show where the cut was made.

Default: 50_000
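Truncation of this kind can be sketched in a few lines. The `truncate_field` helper is hypothetical, and the marker text `...[truncated N chars]` is an assumption; the exact marker Argus writes is not documented on this page.

```python
def truncate_field(value, max_field_size=50_000):
    """Hypothetical sketch: keep the first max_field_size characters of a field."""
    text = str(value)
    if len(text) <= max_field_size:
        return text
    dropped = len(text) - max_field_size
    # ASSUMPTION: the real truncation marker format is not documented.
    return text[:max_field_size] + f"...[truncated {dropped} chars]"


print(truncate_field("a" * 12, max_field_size=5))
```

Recording how many characters were dropped keeps the trace honest about how much of the original value is missing.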

strict (bool)

When True, raises an exception if any detector fires during finalize(). Useful for CI/CD quality gates.

Default: False
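In a CI/CD pipeline, strict=True means that any detection fails the job. The sketch below mimics that gate in plain Python; `StrictModeError`, `finalize_gate`, the dict shape of a detection, and the detector name `empty_output` are all hypothetical stand-ins, not Argus's actual exception or API.

```python
class StrictModeError(Exception):
    """Hypothetical stand-in for the exception raised in strict mode."""


def finalize_gate(detections, strict=False):
    """Return detections, or raise if strict and any detector fired."""
    if strict and detections:
        names = ", ".join(d["detector"] for d in detections)
        raise StrictModeError(f"{len(detections)} detection(s): {names}")
    return detections


# Non-strict: detections are reported, not raised.
finalize_gate([{"detector": "empty_output"}], strict=False)

try:
    finalize_gate([{"detector": "empty_output"}], strict=True)
except StrictModeError as exc:
    print(f"CI gate failed: {exc}")  # a CI script would exit non-zero here
```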

investigate (bool | "always")

Runs forensic root-cause analysis when detections are found. Set to "always" to analyze every trace, whether or not anything was detected.

Default: True
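Because this parameter accepts either a bool or the string "always", the decision it controls is three-way. The hypothetical `should_investigate` helper below mirrors the documented cases; treating False as "never investigate" is an assumption, since this page does not spell out that case.

```python
def should_investigate(investigate, detections):
    """Hypothetical helper mirroring the documented `investigate` semantics."""
    if investigate == "always":
        return True                # analyze every trace regardless
    if investigate:
        return bool(detections)    # default: only when something was detected
    return False                   # ASSUMPTION: False disables forensics


print(should_investigate(True, []))
print(should_investigate("always", []))
```

Checking for "always" first matters: a non-empty string is truthy, so testing `if investigate:` first would silently treat "always" like True.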

Security

redact_keys (list[str])

List of state field names to redact in traces. Values are replaced with [REDACTED]. Supports glob patterns.

Default: None
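Glob matching of field names can be sketched with the standard-library fnmatch module. The sketch assumes nested fields are addressed by dot-separated paths, so "*.secret" matches "db.secret"; that path convention and the `redact` helper are assumptions, not documented Argus behavior.

```python
from fnmatch import fnmatchcase


def redact(state, redact_keys, prefix=""):
    """Hypothetical sketch: copy a nested dict, redacting fields that match a glob."""
    out = {}
    for key, value in state.items():
        path = f"{prefix}{key}"  # ASSUMPTION: dotted paths for nested fields
        if any(fnmatchcase(path, pattern) for pattern in redact_keys):
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            # Recurse so patterns like "*.secret" can match nested paths.
            out[key] = redact(value, redact_keys, prefix=f"{path}.")
        else:
            out[key] = value
    return out


state = {"api_key": "abc123", "query": "hi", "db": {"secret": "pw", "host": "x"}}
print(redact(state, ["api_key", "password", "*.secret"]))
```

The original state is left untouched; only the copy written to the trace is redacted.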

validators (dict)

Custom validation functions keyed by field name. Each function receives the field value and returns True if the value is valid, False otherwise.

Default: {}

python
# Redact sensitive fields
watcher = ArgusWatcher(
    redact_keys=["api_key", "password", "*.secret"],
)

# Custom validators
watcher = ArgusWatcher(
    validators={
        "output": lambda v: len(str(v)) > 10,
        "confidence": lambda v: 0 <= v <= 1,
    }
)
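This page does not say how validator results are surfaced. As one possible reading, the sketch below applies a validators dict, like the one above, to a state snapshot and collects the names of failing fields. `run_validators` is hypothetical, and skipping absent fields and counting a raised exception as a failure are both assumptions.

```python
def run_validators(state, validators):
    """Hypothetical sketch: return fields whose validator returned False or raised."""
    failures = []
    for field, check in validators.items():
        if field not in state:
            continue  # ASSUMPTION: absent fields are skipped, not failed
        try:
            ok = bool(check(state[field]))
        except Exception:
            ok = False  # ASSUMPTION: a crashing validator counts as a failure
        if not ok:
            failures.append(field)
    return failures


validators = {
    "output": lambda v: len(str(v)) > 10,
    "confidence": lambda v: 0 <= v <= 1,
}
print(run_validators({"output": "too short", "confidence": 0.9}, validators))
```

Catching exceptions keeps a type mismatch, such as a string confidence of "high", from crashing the run while still flagging the field.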

Replay & Eval

persist_state (bool)

Saves the full state at each step to enable replay. Disable it to reduce storage when replay isn't needed.

Default: True
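Replay from a saved step depends on each snapshot being an independent copy of the state. The sketch below records a deep copy after every node and then resumes from a chosen step. `run_pipeline`, the two node functions, and list-based snapshot storage are hypothetical simplifications of what a watcher might store.

```python
import copy


def run_pipeline(nodes, state, snapshots=None, start=0):
    """Run nodes in order from index `start`; optionally record per-step state.

    snapshots=None plays the role of persist_state=False (nothing stored).
    """
    for node in nodes[start:]:
        state = node(state)
        if snapshots is not None:
            # Deep copy so later nodes cannot mutate a saved snapshot.
            snapshots.append(copy.deepcopy(state))
    return state


def add_draft(s):
    return {**s, "draft": s["query"].upper()}


def add_answer(s):
    return {**s, "answer": s["draft"] + "!"}


nodes = [add_draft, add_answer]
snaps = []
final = run_pipeline(nodes, {"query": "hello"}, snapshots=snaps)

# Replay only the second node, starting from the state saved after step 1.
replayed = run_pipeline(nodes, copy.deepcopy(snaps[0]), start=1)
print(replayed["answer"])
```

The storage cost grows with one full copy per step, which is why turning persist_state off saves space when replay isn't needed.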

record_http (bool)

Record HTTP requests made during execution. Enables replaying with mocked external calls.

Default: False
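HTTP record-and-replay usually works by storing each response under a key derived from its request, then serving the stored response on replay instead of calling the network. The sketch below shows that idea with an injected fetch function. `HttpRecorder` and its keying on (method, url) are hypothetical; Argus may key or store requests differently.

```python
class HttpRecorder:
    """Hypothetical record/replay cache for HTTP responses."""

    def __init__(self, fetch):
        self.fetch = fetch      # the real network call, injected
        self.tape = {}          # (METHOD, url) -> recorded response
        self.replaying = False

    def request(self, method, url):
        key = (method.upper(), url)  # ASSUMPTION: keyed on method + URL only
        if self.replaying:
            if key not in self.tape:
                raise KeyError(f"no recording for {method} {url}")
            return self.tape[key]    # mocked: no network access
        response = self.fetch(method, url)
        self.tape[key] = response
        return response


calls = []


def fake_fetch(method, url):
    calls.append(url)
    return {"status": 200, "body": f"data from {url}"}


rec = HttpRecorder(fake_fetch)
rec.request("get", "https://example.com/api")   # recorded
rec.replaying = True
rec.request("GET", "https://example.com/api")   # served from the tape
print(len(calls))  # fetch ran only during recording
```

Failing loudly on a missing recording, rather than falling back to the network, keeps a replay deterministic.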

semantic_judge (bool)

Enables LLM-as-judge semantic evaluation of node outputs. This adds latency and cost, but can catch subtle quality issues.

Default: False

judge_model (str)

The LLM used for semantic judging. Accepts any OpenAI-compatible model string.

Default: "gpt-4o"

Cost warning

Enabling semantic_judge sends node outputs to an LLM for evaluation. This adds API cost and latency proportional to the number of nodes in your graph. Use it selectively in staging/CI rather than on every production run.
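Because judging overhead scales with node count, a quick estimate can help decide where to enable it. The sketch assumes one judge call per node per run and sequential calls within a run. The per-call cost and latency are inputs you supply from your provider's pricing; the numbers in the example call are illustrative only, not Argus or provider figures.

```python
def estimate_judge_overhead(num_nodes, runs, cost_per_call, seconds_per_call):
    """Rough semantic-judge overhead, assuming one judge call per node per run."""
    calls = num_nodes * runs
    return {
        "judge_calls": calls,
        "total_cost": calls * cost_per_call,
        # ASSUMPTION: judge calls within a run happen sequentially.
        "added_seconds_per_run": num_nodes * seconds_per_call,
    }


# Illustrative inputs only: substitute your own graph size and pricing.
print(estimate_judge_overhead(num_nodes=8, runs=200, cost_per_call=0.002, seconds_per_call=1.5))
```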

Lifecycle

A Watcher goes through four phases:

  1. Created: constructor called, parameters loaded, storage initialized
  2. Watching: watch() called, graph instrumented, ready for execution
  3. Recording: pipeline is running; the watcher captures node inputs, outputs, and timing
  4. Finalized: finalize() called, detectors run, forensics generated, trace stored

python
# Full lifecycle
watcher = ArgusWatcher(strict=True)    # Created
watcher.watch(graph)                    # Watching
app = graph.compile()
result = app.invoke(state)              # Recording (happens during invoke)
watcher.finalize()                      # Finalized

# After finalize, access everything
trace = watcher.get_trace()
detections = trace.detections
forensics = trace.forensics

Do not reuse

A Watcher instance is single-use. After finalize(), create a new Watcher for the next execution. Calling watch() on a finalized Watcher raises WatcherStateError.
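The four phases and the single-use rule amount to a small state machine. The sketch below models them with an Enum. `WatcherLifecycle` and `on_node_start` are hypothetical stand-ins for ArgusWatcher's internal bookkeeping, and the local `WatcherStateError` only mirrors the documented name, since its real import path is not given here. Rejecting watch() in every phase other than Created, and finalize() before watch(), are stricter assumptions than this page states.

```python
from enum import Enum, auto


class Phase(Enum):
    CREATED = auto()
    WATCHING = auto()
    RECORDING = auto()
    FINALIZED = auto()


class WatcherStateError(RuntimeError):
    """Local stand-in mirroring the documented exception name."""


class WatcherLifecycle:
    """Hypothetical model of a single-use watcher's phases."""

    def __init__(self):
        self.phase = Phase.CREATED

    def watch(self):
        # ASSUMPTION: watch() is only valid once, from the Created phase.
        if self.phase is not Phase.CREATED:
            raise WatcherStateError(f"watch() not allowed in {self.phase.name}")
        self.phase = Phase.WATCHING

    def on_node_start(self):
        # Recording begins when the instrumented graph starts executing.
        if self.phase is Phase.WATCHING:
            self.phase = Phase.RECORDING

    def finalize(self):
        # ASSUMPTION: finalize() requires a prior watch().
        if self.phase not in (Phase.WATCHING, Phase.RECORDING):
            raise WatcherStateError(f"finalize() not allowed in {self.phase.name}")
        self.phase = Phase.FINALIZED


lc = WatcherLifecycle()
lc.watch()
lc.on_node_start()
lc.finalize()
try:
    lc.watch()  # reusing a finalized watcher
except WatcherStateError as exc:
    print(exc)
```

Making the finalized state terminal is what enforces single use: the only way forward is a fresh instance.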