Node Configuration Reference
Complete specification for the NodeConfig YAML format (v1.0) — schema, field definitions, credential resolution, model alias system, and example configurations.
API Version: 100monkeys.ai/v1 | Kind: NodeConfig | Status: Canonical
The Node Configuration defines the capabilities, resources, and LLM providers available on an AEGIS Agent Host (Orchestrator Node or Edge Node). It uses the same Kubernetes-style declarative format (apiVersion/kind/metadata/spec) as the Agent Manifest and Workflow Manifest.
Key capabilities:
- BYOLLM (Bring Your Own LLM) — use any provider (OpenAI, Anthropic, Ollama, LM Studio)
- Air-gapped operation — local LLMs (Ollama) for fully offline deployments
- Provider abstraction — agent manifests use model aliases, not hardcoded provider names
- Hot-swappable models — change underlying LLM without updating agent manifests
For an annotated walkthrough of every field, see Daemon Configuration.
Annotated Full Example
```yaml
apiVersion: 100monkeys.ai/v1          # required; must be exactly this value
kind: NodeConfig                      # required; must be exactly "NodeConfig"
metadata:
  name: production-node-01            # required; unique human-readable node name
  version: "1.0.0"                    # optional; configuration version for tracking
  labels:                             # optional; key-value pairs for categorization
    environment: production
    region: us-west-2
spec:
  # ─── Node Identity ──────────────────────────────────────────────────────────
  node:
    id: "550e8400-e29b-41d4-a716-446655440000"  # required; stable UUID
    type: orchestrator                # required; edge | orchestrator | hybrid
    region: us-west-2                 # optional; geographic region
    tags:                             # optional; for execution_targets matching
      - production
      - gpu
    resources:                        # optional; available compute resources
      cpu_cores: 8
      memory_gb: 32
      disk_gb: 500
      gpu: true

  # ─── Image Tag ──────────────────────────────────────────────────────────────
  image_tag: "0.1.0-pre-alpha"        # optional; written by aegis init --tag / aegis update

  # ─── LLM Providers ──────────────────────────────────────────────────────────
  llm_providers:
    - name: openai-primary
      type: openai
      endpoint: "https://api.openai.com/v1"
      api_key: "env:OPENAI_API_KEY"
      enabled: true
      models:
        - alias: default
          model: gpt-4o
          capabilities: [chat, code, reasoning]
          context_window: 128000
          cost_per_1k_tokens: 0.005
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015
    - name: anthropic-primary
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "secret:aegis-system/llm/anthropic-api-key"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003
    - name: ollama-local
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: local
          model: qwen2.5-coder:32b
          capabilities: [chat, code]
          context_window: 32000
          cost_per_1k_tokens: 0.0

  # ─── LLM Selection Strategy ─────────────────────────────────────────────────
  llm_selection:
    strategy: prefer-local            # prefer-local | prefer-cloud | cost-optimized | latency-optimized
    default_provider: openai-primary
    fallback_provider: ollama-local
    max_retries: 3
    retry_delay_ms: 1000

  # ─── Runtime ────────────────────────────────────────────────────────────────
  runtime:
    bootstrap_script: "assets/bootstrap.py"
    default_isolation: docker         # docker | firecracker | inherit | process
    container_socket_path: "/var/run/docker.sock"  # or Podman: /run/user/1000/podman/podman.sock
    container_network_mode: "aegis-network"
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    nfs_port: 2049
    nfs_mountport: 2049
    runtime_registry_path: "runtime-registry.yaml"  # default value

  # ─── Network ────────────────────────────────────────────────────────────────
  network:
    bind_address: "0.0.0.0"
    port: 8088
    grpc_port: 50051
    orchestrator_endpoint: null       # WebSocket URL for edge → orchestrator (edge nodes only)
    heartbeat_interval_seconds: 30
    tls:
      cert_path: "/etc/aegis/tls/server.crt"
      key_path: "/etc/aegis/tls/server.key"

  # ─── Storage ────────────────────────────────────────────────────────────────
  storage:
    backend: seaweedfs                # seaweedfs | local_host | opendal
    fallback_to_local: true
    nfs_port: 2049
    seaweedfs:
      filer_url: "http://localhost:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
    local_host:
      mount_point: "/data/shared_llm_weights"
    opendal:
      provider: "memory"

  # ─── Deploy Built-In Templates ──────────────────────────────────────────────
  # Deploy vendored built-in agent and workflow templates on startup.
  # Includes agent-creator-agent, workflow-generator-planner-agent, judge agents,
  # intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator,
  # and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows.
  # Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function.
  deploy_builtins: false

  # ─── MCP Tool Servers ───────────────────────────────────────────────────────
  mcp_servers:
    - name: web-search
      enabled: true
      executable: "node"
      args: ["/opt/aegis-tools/web-search/index.js"]
      capabilities:
        - name: web.search
          skip_judge: true            # read-only lookup — skip inner-loop judge overhead
        - name: web.fetch
          skip_judge: true            # read-only fetch — skip inner-loop judge overhead
      credentials:
        SEARCH_API_KEY: "secret:aegis-system/tools/search-api-key"
      environment:
        LOG_LEVEL: "info"
      health_check:
        interval_seconds: 60
        timeout_seconds: 5
        method: "tools/list"
      resource_limits:
        cpu_millicores: 1000
        memory_mb: 512

  # ─── SEAL ───────────────────────────────────────────────────────────────────
  seal:
    private_key_path: "/etc/aegis/seal/private.pem"
    public_key_path: "/etc/aegis/seal/public.pem"
    issuer: "aegis-orchestrator"
    audiences: ["aegis-agents"]
    token_ttl_seconds: 3600

  # ─── Security Contexts ──────────────────────────────────────────────────────
  security_contexts:
    - name: coder-default
      description: "Standard coder context — filesystem + commands + safe package registries"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: [/workspace, /agent]
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.fetch"
          domain_allowlist: [pypi.org, crates.io, npmjs.com]
          rate_limit:
            calls: 30
            per_seconds: 60
      deny_list: []
    - name: aegis-system-operator
      description: "Platform operator — all safe tools plus destructive and orchestrator commands"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: [/workspace, /agent, /shared]
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff, stash]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.*"
        - tool_pattern: "aegis.agent.delete"
        - tool_pattern: "aegis.workflow.delete"
        - tool_pattern: "aegis.task.remove"
        - tool_pattern: "aegis.system.info"
        - tool_pattern: "aegis.system.config"
      deny_list: []

  # ─── Builtin Dispatchers ────────────────────────────────────────────────────
  builtin_dispatchers:
    - name: "cmd"
      description: "Execute shell commands inside the agent container via Dispatch Protocol"
      enabled: true
      capabilities:
        - name: cmd.run
          skip_judge: false           # state-mutating — always validate
    - name: "fs"
      description: "Filesystem operations routed through AegisFSAL"
      enabled: true
      capabilities:
        - name: fs.read
          skip_judge: true            # read-only — skip inner-loop judge overhead
        - name: fs.write
          skip_judge: false           # state-mutating — always validate
        - name: fs.list
          skip_judge: true            # read-only — skip inner-loop judge overhead
        - name: fs.grep
          skip_judge: true            # read-only — skip inner-loop judge overhead
        - name: fs.glob
          skip_judge: true            # read-only — skip inner-loop judge overhead
        - name: fs.edit
          skip_judge: false           # state-mutating — always validate
        - name: fs.multi_edit
          skip_judge: false           # state-mutating — always validate
        - name: fs.create_dir
          skip_judge: false           # state-mutating — always validate
        - name: fs.delete
          skip_judge: false           # state-mutating — always validate

  # ─── IAM (OIDC) ─────────────────────────────────────────────────────────────
  iam:
    realms:
      - slug: aegis-system
        issuer_url: "https://auth.myzaru.com/realms/aegis-system"
        jwks_uri: "https://auth.myzaru.com/realms/aegis-system/protocol/openid-connect/certs"
        audience: "aegis-orchestrator"
        kind: system
    jwks_cache_ttl_seconds: 300
    claims:
      zaru_tier: "zaru_tier"
      aegis_role: "aegis_role"

  # ─── gRPC Auth ──────────────────────────────────────────────────────────────
  grpc_auth:
    enabled: true
    exempt_methods:
      - "/aegis.v1.InnerLoop/Generate"

  # ─── Secrets (OpenBao) ──────────────────────────────────────────────────────
  secrets:
    backend:
      address: "https://openbao.internal:8200"
      auth_method: approle
      approle:
        role_id: "env:OPENBAO_ROLE_ID"
        secret_id_env_var: "OPENBAO_SECRET_ID"
      namespace: "aegis-system"
      tls:
        ca_cert: "/etc/aegis/openbao-ca.pem"

  # ─── Database ───────────────────────────────────────────────────────────────
  database:
    url: "env:AEGIS_DATABASE_URL"
    max_connections: 10
    connect_timeout_seconds: 5

  # ─── Temporal ───────────────────────────────────────────────────────────────
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://temporal-worker:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    namespace: "default"
    task_queue: "aegis-agents"
    max_connection_retries: 30

  # ─── Cortex ─────────────────────────────────────────────────────────────────
  cortex:
    grpc_url: "http://cortex:50052"
    api_key: "env:CORTEX_API_KEY"     # Required for Zaru SaaS; absent = memoryless mode

  # ─── External SEAL Tooling Gateway ──────────────────────────────────────────
  seal_gateway:
    # gRPC endpoint URL for aegis-seal-gateway
    url: "http://aegis-seal-gateway:50055"

  # ─── Execution Limits ───────────────────────────────────────────────────────
  max_execution_list_limit: 1000

  # ─── Cluster Protocol ───────────────────────────────────────────────────────
  # Configures this node's role in the multi-node cluster topology.
  cluster:
    # Enable cluster mode. Default: false.
    enabled: true
    # Node role in cluster. Options: controller | worker | hybrid. Default: hybrid.
    role: worker
    # Controller settings (required for workers)
    controller:
      # gRPC endpoint of the controller node.
      endpoint: "grpc://aegis-controller:50056"
      # Bootstrap token for initial attestation (Step 0).
      token: "env:AEGIS_CLUSTER_TOKEN"
    # Port for NodeClusterService gRPC (controller only). Default: 50056.
    cluster_grpc_port: 50056
    # Static list of peer controller addresses. Default: [].
    peers: []
    # Path to the persistent Ed25519 keypair file for node identity.
    # Generated automatically on first startup if missing.
    node_keypair_path: "/etc/aegis/node_keypair.pem"
    # Interval in seconds for worker heartbeats to the controller. Default: 30.
    heartbeat_interval_secs: 30
    # Re-attest this many seconds before the security token expires. Default: 120.
    token_refresh_margin_secs: 120
    # TLS configuration for cluster communication (mTLS).
    tls:
      enabled: true
      cert_path: "/etc/aegis/certs/node.crt"
      key_path: "/etc/aegis/certs/node.key"
      ca_cert: "/etc/aegis/certs/ca.crt"

  # ─── Observability ──────────────────────────────────────────────────────────
  observability:
    logging:
      level: info
      format: json
      # ── OTLP Log Export ─────────────────────────────────────────
      # Set otlp_endpoint to start shipping logs to any OpenTelemetry-compatible
      # backend (Grafana Cloud, Datadog, self-hosted OTEL Collector, etc.).
      # otlp_endpoint: "http://otel-collector:4317"                # grpc (default)
      # otlp_endpoint: "https://otlp-gateway.grafana.net/v1/logs"  # Grafana Cloud
      # otlp_protocol: grpc                    # grpc (default) | http
      # otlp_headers:                          # API keys / auth headers
      #   Authorization: "env:OTLP_AUTH_TOKEN"
      # otlp_min_level: info                   # min log level exported (default: info)
      # otlp_service_name: aegis-orchestrator  # service.name resource attr
      # batch:
      #   max_queue_size: 2048
      #   scheduled_delay_ms: 5000
      #   max_export_batch_size: 512
      #   export_timeout_ms: 10000
      # tls:
      #   verify: true                         # set false to skip cert verify (dev only)
      #   ca_cert_path: null                   # custom CA cert for self-signed backends
    metrics:
      enabled: true
      port: 9091
      path: "/metrics"
    tracing:
      enabled: false
```

Manifest Envelope
All node configuration files use the Kubernetes-style envelope:
| Field | Type | Required | Value |
|---|---|---|---|
apiVersion | string | ✅ | 100monkeys.ai/v1 |
kind | string | ✅ | NodeConfig |
metadata.name | string | ✅ | Unique human-readable node name |
metadata.version | string | ❌ | Semantic version for tracking |
metadata.labels | map | ❌ | Key-value pairs for categorization |
spec | object | ✅ | All configuration sections documented below |
Credential Resolution
Any string value in the config supports credential prefixes:
| Prefix | Example | Resolution |
|---|---|---|
env:VAR_NAME | env:OPENAI_API_KEY | Read from daemon process environment at startup |
secret:path | secret:aegis-system/kv/api-key | Resolved from OpenBao at runtime (requires spec.secrets.backend) |
literal:value | literal:test-key | Use literal string (not recommended for production) |
| (bare string) | sk-abc123... | Plaintext. Avoid for secrets. |
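The resolution rules above can be sketched as follows. This is an illustrative helper, not the daemon's actual code; `resolve_credential` and the `secret_store` parameter are hypothetical names:

```python
import os

def resolve_credential(value, secret_store=None):
    """Resolve a config string per the prefix rules above (illustrative sketch).

    - env:VAR     -> read from the daemon process environment
    - secret:path -> look up in the configured secrets backend (OpenBao)
    - literal:v   -> use v verbatim
    - bare string -> returned as-is (plaintext)
    """
    if value.startswith("env:"):
        var = value[len("env:"):]
        resolved = os.environ.get(var)
        if resolved is None:
            raise KeyError(f"environment variable {var!r} is not set")
        return resolved
    if value.startswith("secret:"):
        if secret_store is None:
            raise RuntimeError("secret: prefix requires spec.secrets.backend")
        return secret_store[value[len("secret:"):]]
    if value.startswith("literal:"):
        return value[len("literal:"):]
    return value  # bare string: plaintext
```

Note that `env:` values are read once at daemon startup, while `secret:` values are resolved at runtime against the OpenBao backend.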
Model Alias System
Agent manifests reference model aliases, not provider-specific model names. The node configuration maps aliases to real models, enabling hot-swapping and provider independence.
Standard Aliases
| Alias | Purpose |
|---|---|
default | General-purpose model (balanced cost/performance) |
fast | Low-latency model (quick responses) |
smart | High-capability model (complex reasoning) |
cheap | Cost-optimized model |
local | Local-only model (air-gapped) |
How It Works
Agent manifest references an alias:

```yaml
# agent.yaml
spec:
  task:
    prompt_template: ...
    # The agent uses whatever model is mapped to "default" on the node
```

Node A (cloud) maps default → GPT-4o:

```yaml
llm_providers:
  - name: openai
    type: openai
    models:
      - alias: default
        model: gpt-4o
```

Node B (air-gapped) maps default → Llama 3.2:

```yaml
llm_providers:
  - name: ollama
    type: ollama
    models:
      - alias: default
        model: llama3.2:latest
```

Same agent manifest runs on both nodes without changes.
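Conceptually, alias resolution is a lookup over the node's provider list. A minimal sketch under that assumption (hypothetical `resolve_alias` helper; the orchestrator's real selection logic also honors spec.llm_selection):

```python
def resolve_alias(llm_providers, alias):
    """Return (provider_name, model_id) for a model alias (illustrative sketch).

    Scans enabled providers in declaration order and returns the first
    model whose alias matches -- the agent manifest never sees the model id.
    """
    for provider in llm_providers:
        if not provider.get("enabled", True):
            continue
        for model in provider.get("models", []):
            if model["alias"] == alias:
                return provider["name"], model["model"]
    raise LookupError(f"no enabled provider maps alias {alias!r}")

# The two node configs above, reduced to dicts:
node_a = [{"name": "openai", "type": "openai",
           "models": [{"alias": "default", "model": "gpt-4o"}]}]
node_b = [{"name": "ollama", "type": "ollama",
           "models": [{"alias": "default", "model": "llama3.2:latest"}]}]
```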
Section Reference
spec.node
Required. Identifies this node within the AEGIS cluster.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
id | string | ✅ | — | Unique stable node identifier. UUID recommended. |
type | enum | ✅ | — | edge \| orchestrator \| hybrid
region | string | ❌ | null | Geographic region (e.g., us-east-1) |
tags | string[] | ❌ | [] | Capability tags matched against execution_targets in agent manifests |
resources.cpu_cores | u32 | ❌ | — | Available CPU cores |
resources.memory_gb | u32 | ❌ | — | Available RAM in GB |
resources.disk_gb | u32 | ❌ | — | Available disk in GB |
resources.gpu | bool | ❌ | false | GPU available |
spec.image_tag
Optional. The Docker image tag used for all AEGIS-owned service containers. Written by aegis init --tag <TAG> at initialization and updated automatically by aegis update. When absent, both commands default to the version string embedded in the aegis binary.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
image_tag | string | ❌ | <binary version> | Tag applied to all AEGIS-owned Docker images (e.g. ghcr.io/100monkeys-ai/aegis-orchestrator:<tag>). Written by aegis init --tag and updated by aegis update. |
spec.llm_providers
Required array. At least one entry with at least one model is required.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | ✅ | — | Unique provider name |
type | enum | ✅ | — | openai \| anthropic \| ollama \| openai-compatible
endpoint | string | ✅ | — | API endpoint URL |
api_key | string | ❌ | null | API key. Supports env: and secret: prefixes. |
enabled | bool | ❌ | true | Whether this provider is active |
models[].alias | string | ✅ | — | Alias referenced in agent manifests |
models[].model | string | ✅ | — | Provider-side model identifier |
models[].capabilities | string[] | ✅ | — | chat \| embedding \| reasoning \| vision \| code
models[].context_window | u32 | ✅ | — | Max context window in tokens |
models[].cost_per_1k_tokens | f64 | ❌ | 0.0 | Cost per 1K tokens (0.0 for free/local) |
Provider Types
| Type | Use Case | API Key Required |
|---|---|---|
openai | OpenAI API | Yes |
anthropic | Anthropic API | Yes |
ollama | Local Ollama server | No |
openai-compatible | LM Studio, vLLM, or any OpenAI-compatible API | Depends |
spec.llm_selection
Optional. Controls runtime provider selection strategy.
| Key | Type | Default | Description |
|---|---|---|---|
strategy | enum | prefer-local | prefer-local \| prefer-cloud \| cost-optimized \| latency-optimized
default_provider | string | null | Provider to use when no preference is specified |
fallback_provider | string | null | Provider to use if the primary fails |
max_retries | u32 | 3 | Maximum retry attempts on LLM failure |
retry_delay_ms | u64 | 1000 | Delay between retries in milliseconds |
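A minimal sketch of the retry/fallback behavior these fields imply (hypothetical `call_with_fallback` helper; the actual retry loop lives in the orchestrator):

```python
import time

def call_with_fallback(call, default_provider, fallback_provider,
                       max_retries=3, retry_delay_ms=1000):
    """Apply the retry/fallback policy above (illustrative sketch).

    Tries the default provider up to max_retries times, sleeping
    retry_delay_ms between attempts, then tries the fallback once.
    `call(provider)` stands in for the actual LLM invocation.
    """
    last_err = None
    for _attempt in range(max_retries):
        try:
            return call(default_provider)
        except Exception as err:
            last_err = err
            time.sleep(retry_delay_ms / 1000)
    if fallback_provider is not None:
        return call(fallback_provider)
    raise last_err
```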
spec.runtime
Optional. Controls how agent containers are launched.
| Key | Type | Default | Description |
|---|---|---|---|
bootstrap_script | string | assets/bootstrap.py | Path to bootstrap script relative to orchestrator binary |
default_isolation | enum | inherit | docker \| firecracker \| inherit \| process. The docker value works with both Docker and Podman runtimes — the orchestrator auto-detects the engine from the configured socket. |
container_socket_path | string | (platform default) | Container runtime socket path. The orchestrator auto-detects whether the socket belongs to Docker or Podman. Common values: /var/run/docker.sock (Docker), /run/user/<UID>/podman/podman.sock (Podman rootless), /run/podman/podman.sock (Podman rootful). If omitted, the orchestrator checks CONTAINER_HOST, then DOCKER_HOST, then falls back to the platform default (/var/run/docker.sock). |
container_network_mode | string | null | Container network name for agent containers. Supports env:. Works with both Docker and Podman networks. |
orchestrator_url | string | http://localhost:8088 | Callback URL reachable from inside agent containers. Supports env:. |
nfs_server_host | string | null | NFS server host as seen by the Docker daemon host OS. Supports env:. |
nfs_port | u16 | 2049 | NFS server port |
nfs_mountport | u16 | 2049 | NFS mountd port |
runtime_registry_path | string | runtime-registry.yaml | Path to the StandardRuntime registry YAML. Resolved relative to the daemon working directory. Hard-fails at startup if missing. |
nfs_server_host by environment:
| Environment | Value |
|---|---|
| WSL2 / Linux native | "127.0.0.1" |
| Docker Desktop (macOS) | "host.docker.internal" |
| Linux bridge network | "172.17.0.1" (Docker bridge gateway) |
| Remote / VM host | <physical host IP> |
| Via env var | "env:AEGIS_NFS_HOST" |
spec.network
Optional. Configures ports and TLS.
| Key | Type | Default | Description |
|---|---|---|---|
bind_address | string | 0.0.0.0 | Network interface to bind all listeners |
port | u16 | 8088 | HTTP REST API port |
grpc_port | u16 | 50051 | gRPC API port |
orchestrator_endpoint | string | null | WebSocket URL for edge → orchestrator connection (edge nodes only) |
heartbeat_interval_seconds | u64 | 30 | Health check ping interval |
tls.cert_path | string | — | TLS certificate path |
tls.key_path | string | — | TLS private key path |
tls.ca_path | string | null | CA certificate path (optional) |
spec.storage
Optional. Defaults to the local_host backend.
| Key | Type | Default | Description |
|---|---|---|---|
backend | enum | local_host | seaweedfs \| local_host \| opendal
fallback_to_local | bool | true | Gracefully fall back to local storage when SeaweedFS is unreachable |
nfs_port | u16 | 2049 | NFS Server Gateway listen port |
seaweedfs.filer_url | string | http://localhost:8888 | SeaweedFS Filer endpoint |
seaweedfs.mount_point | string | /var/lib/aegis/storage | Host filesystem mount point |
seaweedfs.default_ttl_hours | u32 | 24 | Default TTL for ephemeral volumes (hours) |
seaweedfs.default_size_limit_mb | u64 | 1000 | Default per-volume size quota (MB) |
seaweedfs.max_size_limit_mb | u64 | 10000 | Hard ceiling on volume size (MB) |
seaweedfs.gc_interval_minutes | u32 | 60 | Expired volume GC interval (minutes) |
seaweedfs.s3_endpoint | string | null | Optional SeaweedFS S3 gateway endpoint |
seaweedfs.s3_region | string | us-east-1 | S3 gateway region |
local_host.mount_point | string | /var/lib/aegis/local-host-volumes | Host filesystem mount point for local volumes |
opendal.provider | string | memory | OpenDAL scheme provider |
opendal.options | map | {} | OpenDAL provider options |
spec.deploy_builtins
Optional. Default: false.
| Key | Type | Default | Description |
|---|---|---|---|
deploy_builtins | bool | false | Deploy vendored built-in agent and workflow templates on startup. Includes agent-creator-agent, workflow-generator-planner-agent, judge agents, intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator, and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows. Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function. |
spec.force_deploy_builtins
Optional. Default: disabled.
| Key | Type | Default | Description |
|---|---|---|---|
force_deploy_builtins | string | — | Force re-register all built-in agents and workflows on startup. Accepts "true" or "env:VAR_NAME". Use after upgrades to flush stale definitions. |
spec.mcp_servers
Optional array. Each entry defines an external MCP Tool Server process.
| Key | Type | Default | Description |
|---|---|---|---|
name | string | — | Unique server name on this node |
enabled | bool | true | Whether to start this server |
executable | string | — | Executable path |
args | string[] | [] | Command-line arguments |
capabilities | CapabilityConfig[] | [] | Per-tool capability objects (see below) |
credentials | map | {} | API keys/tokens injected as env vars. Values support secret:. |
environment | map | {} | Non-secret env vars for the server process |
health_check.interval_seconds | u64 | 60 | Health check interval |
health_check.timeout_seconds | u64 | 5 | Health check timeout |
health_check.method | string | tools/list | MCP method used to health-check the server |
resource_limits.cpu_millicores | u32 | 1000 | CPU limit (1000 = 1 core) |
resource_limits.memory_mb | u32 | 512 | Memory limit (MB) |
Each CapabilityConfig entry:
| Key | Type | Default | Description |
|---|---|---|---|
name | string | — | Tool name exposed to agents (e.g. "web.search", "gmail.read") |
skip_judge | bool | false | When true, the orchestrator bypasses the inner-loop semantic pre-execution judge for this tool even if spec.execution.tool_validation is enabled in the agent manifest. Set true only for read-only or otherwise idempotent tools. Set false for any state-mutating tool. |
skip_judge is an operator override. It does not disable SEAL authentication, SecurityContext policy, argument validation, or routing. It only removes the extra semantic review step before dispatch.
spec.seal
Optional. Enables cryptographic agent authorization via SEAL. Required in production.
| Key | Type | Default | Description |
|---|---|---|---|
private_key_path | string | — | Path to RSA private key PEM for signing SecurityToken JWTs |
public_key_path | string | — | Path to RSA public key PEM for verifying SecurityToken JWTs |
issuer | string | aegis-orchestrator | JWT iss claim |
audiences | string[] | [aegis-agents] | JWT aud claims |
token_ttl_seconds | u64 | 3600 | SecurityToken lifetime in seconds |
spec.security_contexts
Optional array. Named permission boundaries assigned to agents at execution time.
The platform ships with the following built-in contexts that are registered automatically and do not need to be defined in aegis-config.yaml:
| Name | Surface | Description |
|---|---|---|
aegis-system-agent-runtime | Execution | Agent containers — fs.* scoped to /workspace, cmd.run, web.*, aegis read/execution tools |
aegis-system-default | Authoring | Platform authoring agents — unrestricted tool access for manifest authoring and validation |
zaru-free | Chat/MCP | Zaru Free tier chat surface |
zaru-pro | Chat/MCP | Zaru Pro tier chat surface |
zaru-business | Chat/MCP | Zaru Business tier chat surface |
zaru-enterprise | Chat/MCP | Zaru Enterprise tier chat surface |
aegis-system-operator | Operator | Platform operators — all safe tools plus destructive and orchestrator commands |
Entries in spec.security_contexts define operator-provided contexts that supplement the built-in list. Built-in context names cannot be redefined here.
Each entry (SecurityContextDefinition):
| Key | Type | Default | Description |
|---|---|---|---|
name | string | — | Unique context name, referenced in agent manifests |
description | string | "" | Human-readable description |
capabilities | array | [] | Tool permissions granted by this context |
deny_list | string[] | [] | Explicit tool deny list; overrides any matching capability |
Each capabilities entry (CapabilityDefinition):
| Key | Type | Description |
|---|---|---|
tool_pattern | string | Tool name pattern (e.g., "fs.*", "cmd.run", "web.fetch") |
path_allowlist | string[] | Allowed filesystem path prefixes (for fs.* tools) |
subcommand_allowlist | object | Map of base command → allowed first positional arguments (for cmd.run). Example: {cargo: ["build","test"]}. |
domain_allowlist | string[] | Allowed network domain suffixes (for web.* tools) |
rate_limit.calls | u32 | Number of calls allowed per window |
rate_limit.per_seconds | u32 | Window size in seconds |
max_response_size | u64 | Max response size in bytes |
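How these pieces combine can be sketched as follows, assuming glob-style tool_pattern matching and deny-over-allow precedence as described above (hypothetical `is_allowed` helper, not the platform's enforcement code):

```python
from fnmatch import fnmatch

def is_allowed(context, tool, argv=None):
    """Evaluate a SecurityContext against a tool call (illustrative sketch).

    deny_list entries override any matching capability; tool_pattern uses
    glob-style matching; subcommand_allowlist constrains cmd.run by base
    command and its first positional argument.
    """
    if any(fnmatch(tool, pat) for pat in context.get("deny_list", [])):
        return False
    for cap in context.get("capabilities", []):
        if not fnmatch(tool, cap["tool_pattern"]):
            continue
        allow = cap.get("subcommand_allowlist")
        if allow is not None and argv:
            base, sub = argv[0], (argv[1] if len(argv) > 1 else None)
            return base in allow and sub in allow[base]
        return True
    return False

# A trimmed version of the coder-default context, plus an illustrative deny:
coder_default = {
    "capabilities": [
        {"tool_pattern": "fs.*"},
        {"tool_pattern": "cmd.run",
         "subcommand_allowlist": {"git": ["clone", "status"]}},
    ],
    "deny_list": ["fs.delete"],
}
```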
spec.builtin_dispatchers
Optional array. Configures the built-in in-process tool handlers. These are not external MCP server processes — they are implemented directly inside the orchestrator binary and dispatched via the Dispatch Protocol.
Each entry:
| Key | Type | Default | Description |
|---|---|---|---|
name | string | — | Dispatcher identifier (e.g. "cmd", "fs") |
description | string | "" | Human-readable description forwarded to the LLM tool schema |
enabled | bool | true | Activate or deactivate this dispatcher |
capabilities | CapabilityConfig[] | [] | Per-tool capability objects (same schema as spec.mcp_servers[].capabilities) |
Each CapabilityConfig entry follows the same schema described under spec.mcp_servers above.
skip_judge defaults by tool:
| Tool | Default skip_judge | Rationale |
|---|---|---|
cmd.run | false | State-mutating — subprocess output must always be validated |
fs.read | true | Read-only — file contents are deterministic |
fs.write | false | State-mutating — written content must be validated |
fs.list | true | Read-only — directory listings are deterministic |
fs.grep | true | Read-only — search results are deterministic |
fs.glob | true | Read-only — glob matches are deterministic |
fs.edit | false | State-mutating — edits must be validated |
fs.multi_edit | false | State-mutating — edits must be validated |
fs.create_dir | false | State-mutating |
fs.delete | false | State-mutating — destructive operation |
web.search | true | Read-only external lookup |
web.fetch | true | Read-only HTTP fetch |
If spec.builtin_dispatchers is omitted the orchestrator uses the compiled-in defaults shown above. Explicit configuration is only needed when overriding those defaults.
For external MCP servers, skip_judge follows the same rule: true for deterministic reads, false for anything that can mutate state or trigger side effects. The node configuration is the source of truth for the bypass decision.
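The override behavior can be sketched as follows (hypothetical `effective_skip_judge` helper; the compiled-in defaults mirror the table above):

```python
# Compiled-in defaults from the table above (sketch).
DEFAULT_SKIP_JUDGE = {
    "cmd.run": False, "fs.read": True, "fs.write": False, "fs.list": True,
    "fs.grep": True, "fs.glob": True, "fs.edit": False,
    "fs.multi_edit": False, "fs.create_dir": False, "fs.delete": False,
    "web.search": True, "web.fetch": True,
}

def effective_skip_judge(config_capabilities, tool):
    """Resolve skip_judge for a tool (illustrative sketch).

    An explicit capability entry in the node config overrides the
    compiled-in default; unknown tools default to False (always judge).
    """
    for cap in config_capabilities:
        if cap.get("name") == tool:
            return cap.get("skip_judge", False)
    return DEFAULT_SKIP_JUDGE.get(tool, False)
```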
spec.iam
Optional. Configures IAM/OIDC as the trusted JWT issuer. Omit to disable JWT validation (dev only).
| Key | Type | Default | Description |
|---|---|---|---|
realms[].slug | string | — | Realm name matching the Keycloak configuration |
realms[].issuer_url | string | — | OIDC issuer URL |
realms[].jwks_uri | string | — | JWKS endpoint for JWT signature verification |
realms[].audience | string | — | Expected aud claim in tokens from this realm |
realms[].kind | enum | — | system \| consumer \| tenant
jwks_cache_ttl_seconds | u32 | 300 | JWKS key cache TTL |
claims.zaru_tier | string | zaru_tier | Keycloak claim name carrying ZaruTier |
claims.aegis_role | string | aegis_role | Keycloak claim name carrying AegisRole |
spec.grpc_auth
Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint. Requires spec.iam.
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enforce JWT validation on gRPC methods |
exempt_methods | string[] | [/aegis.v1.InnerLoop/Generate] | gRPC method full paths exempt from auth |
spec.secrets
Optional. Configures OpenBao as the secrets backend. Follows the Keymaster Pattern — agents never access OpenBao directly.
| Key | Type | Default | Description |
|---|---|---|---|
backend.address | string | — | OpenBao server URL |
backend.auth_method | string | approle | Only approle is currently supported |
backend.approle.role_id | string | — | AppRole Role ID (public; safe to commit) |
backend.approle.secret_id_env_var | string | OPENBAO_SECRET_ID | Env var name containing the AppRole Secret ID |
backend.namespace | string | — | OpenBao namespace (maps 1:1 to an IAM realm) |
backend.tls.ca_cert | string | null | CA certificate path |
backend.tls.client_cert | string | null | mTLS client certificate path |
backend.tls.client_key | string | null | mTLS client key path |
spec.database
Optional. PostgreSQL connection for persistent state (executions, patterns, workflows). If omitted, the daemon uses in-memory repositories (development mode only).
| Key | Type | Default | Description |
|---|---|---|---|
url | string | — | PostgreSQL connection URL. Supports env: and secret:. |
max_connections | u32 | 5 | Maximum connections in the pool |
connect_timeout_seconds | u64 | 5 | Connection timeout |
Example:

```yaml
database:
  url: "env:AEGIS_DATABASE_URL"
  max_connections: 10
  connect_timeout_seconds: 5
```

spec.temporal
Optional. Temporal workflow engine configuration for durable workflow execution. If omitted, workflow orchestration features are unavailable.
| Key | Type | Default | Description |
|---|---|---|---|
address | string | temporal:7233 | Temporal gRPC server address |
worker_http_endpoint | string | http://localhost:3000 | HTTP endpoint for Temporal worker callbacks. Supports env:. |
worker_secret | string | null | Shared secret for authenticating worker callbacks. Supports env:. |
namespace | string | default | Temporal namespace |
task_queue | string | aegis-agents | Temporal task queue name |
max_connection_retries | i32 | 30 | Maximum number of connection retries when establishing the Temporal client. |
Example:

```yaml
temporal:
  address: "temporal:7233"
  worker_http_endpoint: "http://aegis-runtime:3000"
  worker_secret: "env:TEMPORAL_WORKER_SECRET"
  namespace: "default"
  task_queue: "aegis-agents"
  max_connection_retries: 30
```

spec.cortex
Optional. Cortex memory and learning service configuration. If omitted or grpc_url is null, the daemon runs in memoryless mode — no error, no retry, patterns are simply not stored.
| Key | Type | Default | Description |
|---|---|---|---|
grpc_url | string | null | Cortex gRPC service URL. Supports env:. |
api_key | string | null | API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret: prefixes. When absent, the orchestrator connects without authentication (local/open cortex). |
Example:

```yaml
cortex:
  grpc_url: "env:CORTEX_GRPC_URL"
  api_key: "env:CORTEX_API_KEY"  # Required for Zaru SaaS
```

spec.discovery
Semantic agent and workflow search (aegis.agent.search, aegis.workflow.search) is powered by the Cortex service. When spec.cortex is configured with a valid grpc_url and api_key, discovery is available automatically; no separate spec.discovery section is required.
spec.seal_gateway
Optional. Configures forwarding of external tool invocations to the standalone SEAL tooling gateway.
| Key | Type | Default | Description |
|---|---|---|---|
url | string | null | gRPC endpoint URL of aegis-seal-gateway (example: http://aegis-seal-gateway:50055). |
If omitted, the orchestrator does not forward unknown/external tools to the gateway and continues with built-in routing only.
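For example, a minimal sketch pointing the orchestrator at a gateway on the internal network (the hostname matches the example value in the table above):

```yaml
seal_gateway:
  url: "http://aegis-seal-gateway:50055"
```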
spec.max_execution_list_limit
Optional. Upper bound on executions returned by a single list_executions request.
| Key | Type | Default | Description |
|---|---|---|---|
max_execution_list_limit | usize | 1000 | Maximum number of executions returned by a single list_executions request. Protects against excessive memory usage. |
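As a sketch, the cap is a single scalar directly under spec; the value here is illustrative:

```yaml
spec:
  max_execution_list_limit: 500 # tighten the default cap of 1000
```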
spec.cluster
Optional. Configures this node's role in the multi-node cluster topology. If omitted, the node defaults to hybrid and operates as a standalone single-node deployment.
spec.cluster.role is orthogonal to spec.node.type. A node with type: orchestrator can have cluster.role: worker.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
enabled | bool | ❌ | false | Enable cluster mode. |
role | enum | ✅ | hybrid | Node role: controller \| worker \| hybrid
controller_endpoint | string | ❌ | — | gRPC endpoint of the controller node. Required for role: worker. |
cluster_grpc_port | u16 | ❌ | 50056 | Port for NodeClusterService gRPC (controller only). |
peers | string[] | ❌ | [] | Static list of peer controller addresses. |
node_keypair_path | string | ✅ | — | Path to the persistent Ed25519 keypair file for node identity. |
heartbeat_interval_secs | u64 | ❌ | 30 | Interval in seconds for worker heartbeats. |
token_refresh_margin_secs | u64 | ❌ | 120 | Token re-attestation margin in seconds. |
tls.enabled | bool | ❌ | true | Enable TLS for cluster communication. |
tls.cert_path | string | ❌ | — | Path to node TLS certificate. |
tls.key_path | string | ❌ | — | Path to node TLS private key. |
tls.ca_cert | string | ❌ | — | Path to CA certificate for peer verification. |
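A sketch of a worker joining an existing cluster; the controller address and file paths are placeholders, not defaults:

```yaml
cluster:
  enabled: true
  role: worker
  controller_endpoint: "controller-01:50056" # placeholder address
  node_keypair_path: "/var/lib/aegis/node.key" # placeholder path
  heartbeat_interval_secs: 30
  tls:
    enabled: true
    ca_cert: "/etc/aegis/tls/ca.pem" # placeholder path
```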
Cluster Roles
| Role | Exposes | Connects To |
|---|---|---|
controller | NodeClusterService on cluster_grpc_port (default 50056) | Nothing (it is the authority) |
worker | ForwardExecution stream on spec.network.grpc_port (50051) | Controller at controller_endpoint |
hybrid | Both 50051 and 50056 | Itself (no external cluster required) |
Node Attestation
On startup, a worker node:
- Loads (or generates) its Ed25519 keypair from node_keypair_path
- Calls AttestNode on the controller
- Receives a ChallengeNode call and signs the challenge with its private key
- Receives a NodeSecurityToken (signed JWT)
- Wraps all subsequent cluster RPCs in a secure envelope
- Sends periodic heartbeats to the controller
The NodeSecurityToken is automatically refreshed before expiry. The keypair is persistent across restarts, enabling the controller to recognize re-connecting workers.
spec.observability
Optional. Observability configuration: logging, metrics, and tracing.
logging
| Key | Type | Default | Env Override | Description |
|---|---|---|---|---|
logging.level | enum | info | RUST_LOG | error \| warn \| info \| debug \| trace |
logging.format | enum | json | AEGIS_LOG_FORMAT | json \| text |
logging.file | string | null | — | Log file path. Omit to write to stdout. |
logging.otlp_endpoint | string | null | AEGIS_OTLP_ENDPOINT | OTLP collector endpoint. Setting this value enables OTLP log export. Omit or null to disable. |
logging.otlp_protocol | enum | grpc | AEGIS_OTLP_PROTOCOL | grpc \| http. grpc uses port 4317 (default); http uses port 4318 with Protobuf over HTTP/1.1. |
logging.otlp_headers | map | {} | AEGIS_OTLP_HEADERS | Key-value HTTP/gRPC metadata headers sent on every export RPC (e.g. Authorization, api-key). Values support env: prefixes. When set via env var, use comma-separated key=value pairs: Authorization=Bearer token,x-scope-orgid=12345. |
logging.otlp_min_level | string | info | AEGIS_OTLP_LOG_LEVEL | Minimum log level forwarded to OTLP. Does not affect stdout output. |
logging.otlp_service_name | string | aegis-orchestrator | AEGIS_OTLP_SERVICE_NAME | Value of the service.name OpenTelemetry resource attribute. |
logging.batch.max_queue_size | u32 | 2048 | — | Maximum number of log records buffered before export. |
logging.batch.scheduled_delay_ms | u64 | 5000 | — | Interval (ms) between batch export flushes. |
logging.batch.max_export_batch_size | u32 | 512 | — | Maximum records per export RPC. |
logging.batch.export_timeout_ms | u64 | 10000 | — | Timeout (ms) for a single export RPC. |
logging.tls.verify | bool | true | — | Whether to verify the OTLP endpoint's TLS certificate. Set false only for development. |
logging.tls.ca_cert_path | string | null | — | Path to a custom CA certificate PEM for self-signed OTLP backends. |
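Putting the logging keys together, a sketch that enables OTLP export to a self-hosted collector; the endpoint hostname and header env var name are illustrative:

```yaml
observability:
  logging:
    level: info
    format: json
    otlp_endpoint: "http://otel-collector:4317" # setting this key enables OTLP export
    otlp_protocol: grpc
    otlp_headers:
      Authorization: "env:OTLP_AUTH_TOKEN" # illustrative env var name
    batch:
      scheduled_delay_ms: 5000
```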
metrics
| Key | Type | Default | Description |
|---|---|---|---|
metrics.enabled | bool | true | Enable Prometheus metrics |
metrics.port | u16 | 9091 | Prometheus metrics exposition port |
metrics.path | string | /metrics | HTTP path for scraping |
tracing
| Key | Type | Default | Description |
|---|---|---|---|
tracing.enabled | bool | false | Enable distributed tracing via OpenTelemetry |
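The metrics and tracing blocks are small; a sketch using the documented defaults with tracing switched on:

```yaml
observability:
  metrics:
    enabled: true
    port: 9091
    path: /metrics
  tracing:
    enabled: true
```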
Config Discovery Order
The daemon searches for a configuration file in this order (first match wins):
1. --config <path> CLI flag
2. AEGIS_CONFIG_PATH environment variable
3. ./aegis-config.yaml (working directory)
4. ~/.aegis/config.yaml
5. /etc/aegis/config.yaml (Linux/macOS)
Example Configurations
Minimal (Local Development)
```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: dev-laptop
spec:
  node:
    id: "dev-local"
    type: edge
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: default
          model: llama3.2:latest
          capabilities: [chat, code]
          context_window: 8192
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama
```
Air-Gapped Production
```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: prod-airgap-001
  version: "1.0.0"
  labels:
    environment: production
    deployment: air-gapped
spec:
  node:
    id: "550e8400-e29b-41d4-a716-446655440001"
    type: edge
    tags: [production, air-gapped, local-llm]
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: default
          model: llama3.2:latest
          capabilities: [chat, code, reasoning]
          context_window: 8192
          cost_per_1k_tokens: 0.0
        - alias: fast
          model: phi3:mini
          capabilities: [chat, code]
          context_window: 4096
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama
```
Cloud Multi-Provider
```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: cloud-multi-001
  version: "1.0.0"
  labels:
    environment: production
    deployment: cloud
spec:
  node:
    id: "550e8400-e29b-41d4-a716-446655440002"
    type: orchestrator
    region: us-west-2
    tags: [production, cloud, multi-provider]
  llm_providers:
    - name: openai
      type: openai
      endpoint: "https://api.openai.com/v1"
      api_key: "env:OPENAI_API_KEY"
      enabled: true
      models:
        - alias: default
          model: gpt-4o
          capabilities: [chat, code, reasoning]
          context_window: 128000
          cost_per_1k_tokens: 0.005
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015
    - name: anthropic
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "env:ANTHROPIC_API_KEY"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003
  llm_selection:
    strategy: cost-optimized
    default_provider: openai
    fallback_provider: anthropic
```
Docker Compose Deployment
```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: docker-compose-node
  version: "1.0.0"
spec:
  node:
    id: "env:AEGIS_NODE_ID"
    type: orchestrator
  llm_providers:
    - name: ollama
      type: ollama
      endpoint: "http://ollama:11434"
      enabled: true
      models:
        - alias: default
          model: phi3:mini
          capabilities: [chat, code, reasoning]
          context_window: 4096
          cost_per_1k_tokens: 0.0
  llm_selection:
    strategy: prefer-local
    default_provider: ollama
  runtime:
    default_isolation: docker
    container_network_mode: "env:AEGIS_CONTAINER_NETWORK"
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    runtime_registry_path: "runtime-registry.yaml"
  storage:
    backend: seaweedfs
    fallback_to_local: true
    seaweedfs:
      filer_url: "http://seaweedfs-filer:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
  database:
    url: "env:AEGIS_DATABASE_URL"
    max_connections: 5
    connect_timeout_seconds: 5
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://aegis-runtime:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    namespace: "default"
    task_queue: "aegis-agents"
  cortex:
    grpc_url: "env:CORTEX_GRPC_URL"
    api_key: "env:CORTEX_API_KEY" # Required for Zaru SaaS
  observability:
    logging:
      level: info
```
Related Documents
- Daemon Configuration — annotated walkthrough of every field
- Agent Manifest Reference — agent manifest specification
- Workflow Manifest Reference — workflow manifest specification
- Docker Deployment — Docker-specific deployment guide
- Secrets Management — OpenBao integration guide
- IAM Integration — Keycloak configuration guide