Configuration Reference
Full annotated aegis-config.yaml with all supported keys, types, defaults, and descriptions.
Daemon Configuration
The AEGIS daemon is configured via a single YAML file, by default aegis-config.yaml in the working directory. Pass a custom path with --config:
```shell
aegis daemon start --config /etc/aegis/config.yaml
```
Config Discovery Order
The daemon searches for a config file in the following order. The first match wins:
1. `--config <path>` CLI flag
2. `AEGIS_CONFIG_PATH` environment variable
3. `./aegis-config.yaml` (working directory)
4. `~/.aegis/config.yaml`
5. `/etc/aegis/config.yaml` (Linux/macOS)
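The lookup order above is a first-match search. The sketch below is a hypothetical illustration of that order, not the daemon's actual code: the function name `find_config` and its parameters are assumptions, it assumes an explicit `--config` value or `AEGIS_CONFIG_PATH` is used as given (without checking that the file exists), and the file-existence check is injected so the logic can be exercised without touching the filesystem.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional


def find_config(
    cli_path: Optional[str],
    env: Mapping[str, str],
    exists: Callable[[Path], bool] = Path.is_file,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first config file in discovery order, or None if nothing matches."""
    # 1. --config <path> CLI flag (assumed to be used as given, without probing).
    if cli_path:
        return Path(cli_path)
    # 2. AEGIS_CONFIG_PATH environment variable (same assumption).
    env_path = env.get("AEGIS_CONFIG_PATH")
    if env_path:
        return Path(env_path)
    # 3-5. Well-known locations, probed in order; the first existing file wins.
    home = home if home is not None else Path.home()
    for candidate in (
        Path("aegis-config.yaml"),          # working directory
        home / ".aegis" / "config.yaml",    # per-user config
        Path("/etc/aegis/config.yaml"),     # system-wide (Linux/macOS)
    ):
        if exists(candidate):
            return candidate
    return None
```

For example, `find_config(None, os.environ)` returns whichever file the daemon would be expected to pick up on the current machine under these assumptions.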
Credential Resolution
Any string value in the config file can use a credential prefix instead of a literal:
| Prefix | Resolution |
|---|---|
| `env:VAR_NAME` | Read from the daemon process environment at startup. |
| `secret:path/to/secret` | Resolved from OpenBao at runtime (requires `spec.secrets.backend` to be configured). |
| (literal) | Plaintext. Avoid for secrets. |

In production, use `secret:` references for all API keys and credentials, and fall back to `env:` when OpenBao is not available.
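The prefix rules in the table can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the daemon's implementation: `resolve_credential` and `CredentialError` are hypothetical names, and `secret_lookup` is a stand-in callable for whatever OpenBao client the orchestrator actually uses. The sketch does not model *when* resolution happens (startup for `env:`, runtime for `secret:`).

```python
import os
from typing import Callable, Mapping, Optional


class CredentialError(Exception):
    """Raised when a prefixed credential reference cannot be resolved."""


def resolve_credential(
    value: str,
    env: Mapping[str, str] = os.environ,
    secret_lookup: Optional[Callable[[str], str]] = None,
) -> str:
    """Resolve an env:/secret: reference, or return a literal unchanged."""
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in env:
            raise CredentialError(f"environment variable {name!r} is not set")
        return env[name]
    if value.startswith("secret:"):
        path = value[len("secret:"):]
        if secret_lookup is None:
            # Mirrors the documented requirement that spec.secrets.backend be configured.
            raise CredentialError("secret: reference used but no secrets backend is configured")
        return secret_lookup(path)
    # No recognised prefix: treat the value as a plaintext literal.
    return value
```

Keeping the secret fetcher injectable means the same function works with or without a secrets backend, which matches the documented "`env:` as a fallback" pattern.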
Multi-Node Cluster Topology
AEGIS supports a distributed Controller-Worker topology for high availability and horizontal scaling. Nodes can be configured as controllers (scheduling and management), workers (agent execution), or hybrid nodes.
Key features of the cluster topology include:
- Distributed Scheduling: Controllers route agent executions to the most suitable available workers.
- Node Identity: Each node maintains a stable identity via a persistent Ed25519 keypair, used for secure attestation and SealNodeEnvelope signing.
- Secure Communication: Inter-node traffic is protected by mTLS and SEAL-derived security tokens.
- Seamless Scaling: New workers can be added to the cluster by pointing them at the controller endpoint and providing valid certificates.
Manifest Envelope
All aegis-config.yaml files use the Kubernetes-style manifest envelope:
```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "my-aegis-node" # required
  version: "1.0.0" # optional
  labels: # optional
    environment: "production"
spec:
  # All configuration sections documented below sit under spec:
```
The sections documented below all belong under the top-level spec: key.
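Once the file has been parsed into a mapping (with any YAML loader), the envelope can be checked before the `spec` sections are read. The sketch below is a hypothetical pre-flight check: `envelope_errors` is an illustrative name, and it assumes values other than `100monkeys.ai/v1` and `NodeConfig` should be rejected, which the source does not state explicitly.

```python
from typing import Any, Mapping

EXPECTED_API_VERSION = "100monkeys.ai/v1"
EXPECTED_KIND = "NodeConfig"


def envelope_errors(doc: Mapping[str, Any]) -> list[str]:
    """Return envelope problems; an empty list means the envelope looks valid."""
    errors: list[str] = []
    if doc.get("apiVersion") != EXPECTED_API_VERSION:
        errors.append(f"apiVersion must be {EXPECTED_API_VERSION!r}")
    if doc.get("kind") != EXPECTED_KIND:
        errors.append(f"kind must be {EXPECTED_KIND!r}")
    # metadata.name is the only required metadata field; version and labels are optional.
    metadata = doc.get("metadata")
    if not isinstance(metadata, Mapping) or not metadata.get("name"):
        errors.append("metadata.name is required")
    # Every configuration section lives under spec, so it must be a mapping.
    if not isinstance(doc.get("spec"), Mapping):
        errors.append("spec must be a mapping")
    return errors
```

Collecting all errors instead of stopping at the first one lets an operator fix a malformed file in a single pass.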
Full Annotated Configuration
```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "aegis-node"
spec:
  # ─── Node Identity ────────────────────────────────────────────────────────────
  node:
    # Unique stable node identifier. UUID recommended. Required.
    id: "env:AEGIS_NODE_ID"
    # Node type. Options: edge | orchestrator | hybrid. Required.
    type: orchestrator
    # Geographic region, e.g. "us-east-1". Optional.
    region: "us-east-1"
    # Capability tags used to match agent manifest execution_targets. Optional.
    tags:
      - gpu
      - high-memory
    # Physical resources available on this node. Optional.
    resources:
      cpu_cores: 8
      memory_gb: 32
      disk_gb: 500
      gpu: false

  # ─── Image Tag ────────────────────────────────────────────────────────────────
  # Docker image tag for all AEGIS-owned service containers.
  # Written by `aegis init --tag <TAG>` and updated by `aegis update`.
  # When absent, both commands default to the version of the aegis binary.
  image_tag: "0.1.0-pre-alpha"

  # ─── LLM Providers ────────────────────────────────────────────────────────────
  # Array of LLM provider configurations. At least one entry is required.
  llm_providers:
    - name: openai-primary # Unique provider name on this node. Required.
      type: openai # openai | anthropic | ollama | openai-compatible. Required.
      endpoint: "https://api.openai.com/v1" # API endpoint URL. Required.
      api_key: "env:OPENAI_API_KEY" # Supports env: and secret: prefixes.
      enabled: true # Default: true
      models: # Must have at least one entry. Required.
        - alias: default # Alias referenced in agent manifests. Required.
          model: gpt-4o # Provider-side model name. Required.
          capabilities: # chat | embedding | reasoning | vision | code. Required.
            - chat
            - code
            - reasoning
          context_window: 128000 # Max context window in tokens. Required.
          cost_per_1k_tokens: 0.005 # Default: 0.0 (free/local)
        - alias: fast
          model: gpt-4o-mini
          capabilities: [chat, code]
          context_window: 128000
          cost_per_1k_tokens: 0.00015
    - name: anthropic-primary
      type: anthropic
      endpoint: "https://api.anthropic.com/v1"
      api_key: "secret:aegis-system/llm/anthropic-api-key"
      enabled: true
      models:
        - alias: smart
          model: claude-sonnet-4-5
          capabilities: [chat, code, reasoning]
          context_window: 200000
          cost_per_1k_tokens: 0.003
    - name: ollama-local
      type: ollama
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: local
          model: qwen2.5-coder:32b
          capabilities: [chat, code]
          context_window: 32000
          cost_per_1k_tokens: 0.0

  # ─── LLM Selection Strategy ───────────────────────────────────────────────────
  # Optional. Controls how the orchestrator picks providers at runtime.
  llm_selection:
    # prefer-local | prefer-cloud | cost-optimized | latency-optimized
    # Default: prefer-local
    strategy: prefer-local
    # Provider name to use when no preference is specified. Default: null (auto-select).
    default_provider: openai-primary
    # Provider to use if the primary fails. Default: null.
    fallback_provider: ollama-local
    # Maximum retry attempts on LLM failure. Default: 3
    max_retries: 3
    # Delay between retries in milliseconds. Default: 1000
    retry_delay_ms: 1000

  # ─── Execution Limits ────────────────────────────────────────────────────────
  # Optional. Protects list_executions from returning unbounded result sets. Defaults to 1000 when omitted.
  max_execution_list_limit: 1000

  # ─── Runtime ──────────────────────────────────────────────────────────────────
  # Optional. Has safe defaults for Docker.
  runtime:
    # Path to the agent bootstrap script, relative to the orchestrator binary.
    # Default: assets/bootstrap.py
    bootstrap_script: "assets/bootstrap.py"
    # Default isolation level for executions that do not specify one.
    # Options: docker | podman | firecracker | inherit | process
    # Default: inherit (uses the node's compiled-in default)
    default_isolation: docker
    # Container runtime socket path. Omit to use the platform default:
    #   Docker on Linux/macOS:  /var/run/docker.sock
    #   Podman rootless:        /run/user/<UID>/podman/podman.sock
    #   Podman system (root):   /run/podman/podman.sock
    # Can also be set via CONTAINER_HOST or DOCKER_HOST env vars (see below).
    container_socket_path: "/var/run/docker.sock"
    # Container network name for agent containers.
    # Supports env: prefix. Example: "env:AEGIS_CONTAINER_NETWORK"
    container_network_mode: "aegis-network"
    # URL that agent containers use to call back to the orchestrator.
    # Must be reachable from inside containers, not from the host.
    # Default: http://localhost:8088
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"

    # ── Container Host Environment Variables ──────────────────────────────────
    # The container socket can also be configured via environment variables.
    # These override container_socket_path when set:
    #
    #   CONTAINER_HOST=unix:///run/user/1000/podman/podman.sock
    #   DOCKER_HOST=unix:///var/run/docker.sock
    #
    # CONTAINER_HOST takes precedence over DOCKER_HOST when both are set.
    # Podman users should set CONTAINER_HOST to point to the Podman socket.

    # ── NFS Storage Gateway (required when spec.storage.backend: seaweedfs) ───
    # Hostname/IP of the NFS server. Must resolve from the HOST OS where the
    # Docker daemon runs — NOT from inside agent containers.
    # Supports env: prefix.
    #
    # Platform-specific values:
    #   WSL2 / Linux native:       "127.0.0.1"
    #   Docker Desktop (Win/Mac):  "host.docker.internal"
    #   Linux bridge network:      "172.17.0.1"  (Docker bridge gateway)
    #   Remote / VM host:          "<physical host IP>"
    nfs_server_host: "env:AEGIS_NFS_HOST"
    # NFS server listen port. Default: 2049
    nfs_port: 2049
    # NFS mountd port. Default: 2049
    nfs_mountport: 2049

  # ─── Network ──────────────────────────────────────────────────────────────────
  # Optional. Has safe defaults.
  network:
    # Bind address for all listeners. Default: 0.0.0.0
    bind_address: "0.0.0.0"
    # HTTP REST API port. Default: 8088
    port: 8088
    # gRPC API port (inner-loop transport, Temporal workers). Default: 50051
    grpc_port: 50051
    # WebSocket URL for edge-node → orchestrator connection. Omit on orchestrator nodes.
    orchestrator_endpoint: null
    # Health check ping interval in seconds. Default: 30
    heartbeat_interval_seconds: 30
    # Optional TLS. Omit for plaintext (dev only; never in production).
    tls:
      cert_path: "/etc/aegis/tls/server.crt"
      key_path: "/etc/aegis/tls/server.key"
      ca_path: null # Optional CA certificate path

  # ─── Cluster ──────────────────────────────────────────────────────────────────
  # Optional. Configures the node's role in the multi-node cluster topology.
  cluster:
    # Enable cluster mode. Default: false.
    enabled: true
    # Node role in cluster. Options: controller | worker | hybrid. Default: hybrid.
    # - controller: Manages routing and registration; does not run executions.
    # - worker: Runs agent executions; does not perform routing decisions.
    # - hybrid: Both controller and worker duties (default).
    role: worker
    # Controller settings (required for workers)
    controller:
      # gRPC endpoint of the controller node.
      endpoint: "grpc://aegis-controller:50056"
      # Bootstrap token for initial attestation (Step 0).
      token: "env:AEGIS_CLUSTER_TOKEN"
    # Port for NodeClusterService (controllers/hybrids). Default: 50056.
    cluster_grpc_port: 50056
    # Static list of peer controller addresses. Default: [].
    peers: []
    # Path to the persistent Ed25519 keypair file for node identity.
    # Generated automatically on first startup if missing.
    node_keypair_path: "/etc/aegis/node_keypair.pem"
    # Interval in seconds for worker heartbeats to the controller. Default: 30.
    heartbeat_interval_secs: 30
    # Re-attest this many seconds before the security token expires. Default: 120.
    token_refresh_margin_secs: 120
    # TLS configuration for secure inter-node cluster communication (mTLS).
    tls:
      enabled: true
      cert_path: "/etc/aegis/certs/node.crt"
      key_path: "/etc/aegis/certs/node.key"
      ca_cert: "/etc/aegis/certs/ca.crt"

  # ─── Storage ──────────────────────────────────────────────────────────────────
  # Optional. Defaults to local_host backend.
  storage:
    # Storage backend. Options: seaweedfs | local_host. Default: local_host
    backend: seaweedfs
    # Gracefully fall back to local storage when SeaweedFS is unreachable.
    # Default: true
    fallback_to_local: true
    # NFS Server Gateway listen port. Default: 2049
    nfs_port: 2049
    # SeaweedFS backend config. Required when backend: seaweedfs.
    seaweedfs:
      # SeaweedFS Filer endpoint.
      # Default: http://localhost:8888
      filer_url: "http://localhost:8888"
      # Host filesystem mount point.
      # Default: /var/lib/aegis/storage
      mount_point: "/var/lib/aegis/storage"
      # Default TTL for ephemeral volumes in hours. Default: 24
      default_ttl_hours: 24
      # Default per-volume size quota in MB. Default: 1000
      default_size_limit_mb: 1000
      # Hard ceiling on any single volume size in MB. Default: 10000
      max_size_limit_mb: 10000
      # GC interval for expired volumes in minutes. Default: 60
      gc_interval_minutes: 60
      # Optional S3 gateway endpoint (e.g., for direct object uploads).
      s3_endpoint: null
      # S3 gateway region. Default: us-east-1
      s3_region: "us-east-1"
    # Local storage config. Used when backend: local_host, or as the fallback target.
    local_host:
      # Base directory for local volume storage.
      # Default: /var/lib/aegis/local-volumes
      base_path: "/var/lib/aegis/local-volumes"
      # Default TTL for ephemeral volumes in hours. Default: 24
      default_ttl_hours: 24
      # Default per-volume size quota in MB. Default: 1000
      default_size_limit_mb: 1000
      # Hard ceiling on any single volume size in MB. Default: 10000
      max_size_limit_mb: 10000

  # ─── Deploy Built-In Templates ───────────────────────────────────────────────
  # Deploy vendored built-in agent and workflow templates on startup.
  # Includes agent-creator-agent, workflow-generator-planner-agent, judge agents,
  # intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator,
  # and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows.
  # Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function.
  # Default: false
  deploy_builtins: false

  # ─── Force Deploy Built-In Templates ─────────────────────────────────────────
  # Force re-registration of all built-in agents and workflows on startup, even if
  # already registered. Use after a platform upgrade when built-in agent UUIDs or
  # definitions have changed and stale registrations need to be flushed.
  # Accepts: "true" | "false" | "env:VAR_NAME". Optional. Default: disabled.
  force_deploy_builtins: "false"

  # ─── MCP Tool Servers ─────────────────────────────────────────────────────────
  # External MCP server processes. Optional array.
  mcp_servers:
    - name: web-search # Unique server name on this node. Required.
      enabled: true # Default: true
      # Executable path (absolute or relative to /usr/local/bin). Required.
      executable: "node"
      # Command-line arguments. Default: []
      args:
        - "/opt/aegis-tools/web-search/index.js"
      # Tool capabilities this server provides (used for routing). Default: []
      capabilities:
        - web.search
        - web.fetch
      # API keys and tokens — resolved via env: or secret: prefixes.
      # Values are injected as environment variables into the server process.
      credentials:
        SEARCH_API_KEY: "secret:aegis-system/tools/search-api-key"
      # Non-secret environment variables for the server process. Default: {}
      environment:
        LOG_LEVEL: "info"
      # Health check configuration.
      health_check:
        interval_seconds: 60 # Default: 60
        timeout_seconds: 5 # Default: 5
        method: "tools/list" # MCP method used for health check. Default: tools/list
      # Resource limits for the server process.
      resource_limits:
        cpu_millicores: 1000 # 1000 = 1 CPU core
        memory_mb: 512

  # ─── SEAL (Signed Envelope Attestation Layer) ────────────────────────────────────
  # Optional. Required in production to enable cryptographic agent authorization.
  seal:
    # RSA private key PEM used to sign SecurityToken JWTs issued at attestation.
    private_key_path: "/etc/aegis/seal/private.pem"
    # RSA public key PEM used to verify SecurityToken JWTs on tool calls.
    public_key_path: "/etc/aegis/seal/public.pem"
    # JWT iss claim. Default: aegis-orchestrator
    issuer: "aegis-orchestrator"
    # JWT aud claim values. Default: [aegis-agents]
    audiences:
      - "aegis-agents"
    # SecurityToken lifetime in seconds. Default: 3600 (1 hour)
    token_ttl_seconds: 3600

  # ─── Security Contexts ────────────────────────────────────────────────────────
  # Named permission boundaries controlling what tools agents may invoke.
  # Referenced by name in agent manifests and in spec.iam ZaruTier mappings.
  security_contexts:
    - name: coder-default
      description: "Standard coder context — filesystem + commands + safe package registries"
      capabilities:
        - tool_pattern: "fs.*" # ← tool_pattern, not tool
          path_allowlist:
            - /workspace
            - /agent
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff, stash]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.fetch"
          domain_allowlist:
            - pypi.org
            - crates.io
            - npmjs.com
          rate_limit:
            calls: 30 # ← object with calls + per_seconds, not "30/minute"
            per_seconds: 60
      # Explicit deny list — overrides any matching capability above.
      deny_list: []

    - name: zaru-free
      description: "Zaru Free tier: ephemeral volumes only, no outbound network"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist:
            - /workspace
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            python: ["-m"]
            npm: [install, run, test]
      deny_list:
        - "web.*"

    - name: zaru-pro
      description: "Zaru Pro tier: full coder-default capabilities"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist:
            - /workspace
            - /agent
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.*"
          domain_allowlist:
            - pypi.org
            - crates.io
            - npmjs.com
            - api.github.com
          rate_limit:
            calls: 60
            per_seconds: 60

    # aegis-system-agent-runtime is a platform built-in — shown here for reference only.
    # Do NOT copy this block into your aegis-config.yaml; it is registered automatically.
    - name: aegis-system-agent-runtime
      description: "Execution surface for agent containers. Grants filesystem, shell, and web access scoped to /workspace."
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist:
            - /workspace
        - tool_pattern: "cmd.run"
        - tool_pattern: "web.*"
        - tool_pattern: "aegis.agent.get"
        - tool_pattern: "aegis.agent.list"
        - tool_pattern: "aegis.workflow.get"
        - tool_pattern: "aegis.workflow.list"
        - tool_pattern: "aegis.workflow.signal"
        - tool_pattern: "aegis.task.execute"
      deny_list:
        - "aegis.agent.delete"
        - "aegis.workflow.delete"
        - "aegis.task.remove"
        - "aegis.system.info"
        - "aegis.system.config"

    - name: aegis-system-operator
      description: "Platform operator — all safe tools plus destructive and orchestrator commands"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist:
            - /workspace
            - /agent
            - /shared
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            git: [clone, add, commit, push, pull, status, diff, stash]
            cargo: [build, test, fmt, clippy, check, run]
            npm: [install, run, test, build, ci]
            python: ["-m"]
        - tool_pattern: "web.*"
        # Destructive commands (operator-only)
        - tool_pattern: "aegis.agent.delete"
        - tool_pattern: "aegis.workflow.delete"
        - tool_pattern: "aegis.task.remove"
        # Orchestrator commands (operator-only)
        - tool_pattern: "aegis.system.info"
        - tool_pattern: "aegis.system.config"
      deny_list: []

  # ─── Builtin Dispatchers ──────────────────────────────────────────────────────
  # Configuration for the in-process Dispatch Protocol handler.
  # The cmd dispatcher is NOT an MCP server — it runs subprocesses inside agent
  # containers via the bidirectional bootstrap channel.
  builtin_dispatchers:
    cmd:
      # Enable cmd.run dispatch. Default: true
      enabled: true
      # Default per-subprocess timeout in seconds. Default: 60
      default_timeout_secs: 60
      # Ceiling timeout an agent manifest may request. Default: 300
      max_timeout_ceiling_secs: 300
      # Maximum stdout + stderr captured per subprocess in bytes. Default: 524288 (512 KB)
      max_output_bytes: 524288
      # Maximum concurrent subprocesses per execution. Default: 1
      max_concurrent_per_execution: 1
      # Environment variables that must never be forwarded to agent subprocesses.
      global_env_denylist:
        - AEGIS_TOKEN
        - OPENAI_API_KEY
        - ANTHROPIC_API_KEY
        - SEAL_PRIVATE_KEY
        - AWS_SECRET_ACCESS_KEY
        - GOOGLE_API_KEY

  # ─── IAM (OIDC) ───────────────────────────────────────────────────────────────
  # Optional. Omit to disable JWT validation (dev only; never in production).
  # Configures Keycloak as the trusted OIDC issuer for all
  # human and service-account identities.
  iam:
    # Array of Keycloak realms trusted by this node.
    realms:
      - slug: "aegis-system" # Realm name matching Keycloak config. Required.
        issuer_url: "https://auth.myzaru.com/realms/aegis-system" # OIDC issuer URL. Required.
        jwks_uri: "https://auth.myzaru.com/realms/aegis-system/protocol/openid-connect/certs" # Required.
        audience: "aegis-orchestrator" # Expected aud claim. Required.
        kind: system # system | consumer | tenant. Required.
      - slug: "zaru-consumer"
        issuer_url: "https://auth.myzaru.com/realms/zaru-consumer"
        jwks_uri: "https://auth.myzaru.com/realms/zaru-consumer/protocol/openid-connect/certs"
        audience: "aegis-orchestrator"
        kind: consumer
    # JWKS key refresh interval in seconds. Supports live key rotation.
    # Default: 300
    jwks_cache_ttl_seconds: 300
    # Custom claim names injected by Keycloak attribute mappers.
    claims:
      zaru_tier: "zaru_tier" # Claim carrying ZaruTier (Free | Pro | Business | Enterprise)
      aegis_role: "aegis_role" # Claim carrying AegisRole in aegis-system realm
      tenant_id: "tenant_id" # Claim carrying per-user tenant slug (u-{uuid}) for consumer users

  # ─── gRPC Auth ────────────────────────────────────────────────────────────────
  # Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint.
  # Requires spec.iam to be configured.
  grpc_auth:
    # Enable JWT validation on all gRPC methods. Default: true
    enabled: true
    # gRPC method full paths exempt from auth.
    # The inner-loop bootstrap channel must always be exempt.
    exempt_methods:
      - "/aegis.v1.InnerLoop/Generate"

  # ─── Secrets (OpenBao) ────────────────────────────────────────────────────────
  # Optional. Omit to rely solely on env: references.
  # Follows the Keymaster Pattern: only the orchestrator accesses
  # OpenBao — agents never receive secret backend credentials directly.
  secrets:
    backend:
      # OpenBao server address.
      address: "https://openbao.internal:8200"
      # Authentication method. Currently only approle is supported.
      auth_method: "approle"
      approle:
        # AppRole Role ID — public, can be committed to config.
        role_id: "env:OPENBAO_ROLE_ID"
        # Name of the environment variable containing the AppRole Secret ID.
        # The Secret ID itself must never be committed to config files.
        secret_id_env_var: "OPENBAO_SECRET_ID"
      # OpenBao namespace for multi-tenancy. Maps 1:1 to a Keycloak realm.
      namespace: "aegis-system"
      # Optional mTLS to OpenBao.
      tls:
        ca_cert: "/etc/aegis/openbao-ca.pem"
        client_cert: null # Optional mTLS client cert
        client_key: null # Optional mTLS client key

  # ─── Database ─────────────────────────────────────────────────────────────────
  # Optional. If omitted the daemon uses InMemory repositories (dev only).
  database:
    # PostgreSQL connection URL. Supports env: and secret: prefixes.
    url: "env:AEGIS_DATABASE_URL"
    # Maximum connections in the pool. Default: 5
    max_connections: 10
    # Connection timeout in seconds. Default: 5
    connect_timeout_seconds: 5

  # ─── Temporal ─────────────────────────────────────────────────────────────────
  # Optional. If omitted workflow orchestration is unavailable.
  temporal:
    # Temporal gRPC server address. Default: temporal:7233
    address: "temporal:7233"
    # HTTP endpoint for Temporal worker callbacks. Default: http://localhost:3000
    worker_http_endpoint: "http://aegis-runtime:3000"
    # Shared secret for authenticating worker callbacks. Supports env:.
    worker_secret: "env:TEMPORAL_WORKER_SECRET"
    # Temporal namespace. Default: default
    namespace: "default"
    # Temporal task queue. Default: aegis-agents
    task_queue: "aegis-agents"
    # Maximum number of connection retries when establishing the Temporal client.
    # If omitted, a default of 30 retries is used.
    max_connection_retries: 30

  # ─── Cortex ───────────────────────────────────────────────────────────────────
  # Optional. If omitted or grpc_url is null the daemon runs in memoryless mode.
  # When configured, discovery (aegis.agent.search, aegis.workflow.search) is
  # available automatically — no separate discovery section is required.
  cortex:
    # Cortex gRPC service URL. Supports env:.
    grpc_url: "env:CORTEX_GRPC_URL"
    # API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret:.
    # Omit for local or open cortex deployments (connects without authentication).
    api_key: "env:CORTEX_API_KEY"

  # ─── External SEAL Tooling Gateway ─────────────────────────────────
  # Optional. Omit to keep external tool routing disabled.
  seal_gateway:
    # gRPC endpoint URL for aegis-seal-gateway.
    url: "http://aegis-seal-gateway:50055"

  # ─── Observability ────────────────────────────────────────────────────────────
  # Optional.
  observability:
    logging:
      # Log level: error | warn | info | debug | trace. Default: info
      level: info
      # Output format: json | text. Default: json
      format: json
      # Log file path. Omit to write to stdout.
      file: null
      # ── OTLP Log Export ──────────────────────────────────────────────
      # Set to ship logs to Grafana Cloud, Datadog, or a self-hosted OTEL Collector.
      # Omit (or null) to disable. Override with AEGIS_OTLP_ENDPOINT.
      # otlp_endpoint: "http://otel-collector:4317"                 # gRPC
      # otlp_endpoint: "https://otlp-gateway.grafana.net/v1/logs"   # HTTP
      # otlp_protocol: grpc        # grpc (default) | http. Override: AEGIS_OTLP_PROTOCOL
      # otlp_headers:              # auth headers; values support env: / secret:
      #   Authorization: "env:OTLP_AUTH_TOKEN"
      # otlp_min_level: info       # min level exported. Override: AEGIS_OTLP_LOG_LEVEL
      # otlp_service_name: aegis-orchestrator  # service.name attr. Override: AEGIS_OTLP_SERVICE_NAME
      # batch:
      #   max_queue_size: 2048
      #   scheduled_delay_ms: 5000
      #   max_export_batch_size: 512
      #   export_timeout_ms: 10000
      # tls:
      #   verify: true
      #   ca_cert_path: null
    metrics:
      # Enable Prometheus metrics exposition. Default: true
      enabled: true
      # Prometheus metrics exposition port. Default: 9091
      port: 9091
      # HTTP path for scraping. Default: /metrics
      path: "/metrics"
    tracing:
      # Enable distributed tracing via OpenTelemetry. Default: false
      enabled: false
```
Section Reference
spec.node
Required. Identifies this node within the AEGIS cluster.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `id` | string | ✅ | — | Unique stable node identifier. UUID recommended. Fails validation if empty. |
| `type` | enum | ✅ | — | `edge` \| `orchestrator` \| `hybrid` |
| `region` | string | ❌ | `null` | Geographic region (e.g., `"us-east-1"`) |
| `tags` | string[] | ❌ | `[]` | Capability tags matched against `execution_targets` in agent manifests |
| `resources.cpu_cores` | u32 | ❌ | — | Available CPU cores |
| `resources.memory_gb` | u32 | ❌ | — | Available RAM in GB |
| `resources.disk_gb` | u32 | ❌ | — | Available disk in GB |
| `resources.gpu` | bool | ❌ | `false` | GPU available |

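The table says `tags` are matched against an agent manifest's `execution_targets`, but not how. The sketch below assumes the simplest rule, that a node qualifies only if it carries every requested tag; both the subset rule and the helper names `node_matches` / `eligible_nodes` are hypothetical.

```python
from typing import Iterable


def node_matches(node_tags: Iterable[str], execution_targets: Iterable[str]) -> bool:
    """Assumed rule: every requested target tag must be present on the node."""
    return set(execution_targets) <= set(node_tags)


def eligible_nodes(nodes: dict[str, list[str]], execution_targets: list[str]) -> list[str]:
    """Return ids of nodes whose tags satisfy the targets, sorted for stable output."""
    return sorted(
        node_id for node_id, tags in nodes.items() if node_matches(tags, execution_targets)
    )
```

With `nodes = {"node-a": ["gpu", "high-memory"], "node-b": ["high-memory"]}`, a manifest targeting `["gpu"]` would only be eligible for `node-a` under this assumed rule.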
spec.image_tag
Optional. Docker image tag for AEGIS-owned services.
| Key | Type | Default | Description |
|---|---|---|---|
| `image_tag` | string | `<binary version>` | Tag applied to all AEGIS-owned Docker images. Written by `aegis init --tag` and updated by `aegis update`. When absent, defaults to the version string embedded in the `aegis` binary. |

spec.llm_providers
Required array. At least one entry with at least one model is required.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `name` | string | ✅ | — | Unique provider name |
| `type` | enum | ✅ | — | `openai` \| `anthropic` \| `ollama` \| `openai-compatible` |
| `endpoint` | string | ✅ | — | API endpoint URL |
| `api_key` | string | ❌ | `null` | API key. Supports `env:` and `secret:`. |
| `enabled` | bool | ❌ | `true` | Whether this provider is active |
| `models[].alias` | string | ✅ | — | Alias referenced in agent manifests |
| `models[].model` | string | ✅ | — | Provider-side model identifier |
| `models[].capabilities` | string[] | ✅ | — | `chat` \| `embedding` \| `reasoning` \| `vision` \| `code` |
| `models[].context_window` | u32 | ✅ | — | Max context window in tokens |
| `models[].cost_per_1k_tokens` | f64 | ❌ | `0.0` | Cost per 1K tokens (`0.0` for free/local) |

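To show how `models[].alias`, `enabled`, and `cost_per_1k_tokens` fit together, here is a hypothetical lookup over the parsed `llm_providers` list. The helper names are illustrative, and the "first enabled provider exposing the alias wins" rule is an assumption; the source does not say whether aliases must be unique across providers.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class ResolvedModel:
    provider: str
    model: str
    cost_per_1k_tokens: float


def resolve_alias(providers: Sequence[Mapping[str, Any]], alias: str) -> ResolvedModel:
    """Return the first enabled provider/model pair exposing `alias` (assumed rule)."""
    for provider in providers:
        if not provider.get("enabled", True):  # enabled defaults to true
            continue
        for entry in provider["models"]:
            if entry["alias"] == alias:
                return ResolvedModel(
                    provider=provider["name"],
                    model=entry["model"],
                    cost_per_1k_tokens=entry.get("cost_per_1k_tokens", 0.0),  # default 0.0
                )
    raise KeyError(f"no enabled provider exposes alias {alias!r}")


def estimate_cost(resolved: ResolvedModel, tokens: int) -> float:
    """Cost = tokens / 1000 * cost_per_1k_tokens."""
    return tokens / 1000 * resolved.cost_per_1k_tokens
```

Using the sample configuration, 10,000 tokens on the `default` alias (`gpt-4o` at 0.005 per 1K) would be estimated at 0.05, while the `local` Ollama alias costs nothing because its rate is `0.0`.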
spec.llm_selection
Optional. Controls runtime provider selection strategy.
| Key | Type | Default | Description |
|---|---|---|---|
| `strategy` | enum | `prefer-local` | `prefer-local` \| `prefer-cloud` \| `cost-optimized` \| `latency-optimized` |
| `default_provider` | string | `null` | Provider to use when no preference is specified |
| `fallback_provider` | string | `null` | Provider to use if the primary fails |
| `max_retries` | u32 | `3` | Maximum retry attempts on LLM failure |
| `retry_delay_ms` | u64 | `1000` | Delay between retries in milliseconds |

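`max_retries`, `retry_delay_ms`, and `fallback_provider` combine into a retry-then-fallback loop. The sketch below is one plausible reading, not the orchestrator's actual behaviour: it assumes each provider gets one initial attempt plus up to `max_retries` retries, that the fallback gets the same budget, and it does not model the `strategy` setting. `call_with_retries` and the injected `sleep` hook are hypothetical.

```python
import time
from typing import Callable, Optional

Call = Callable[[str], str]  # takes a prompt, returns a completion


def call_with_retries(
    primary: Call,
    fallback: Optional[Call],
    prompt: str,
    max_retries: int = 3,
    retry_delay_ms: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try primary (1 + max_retries attempts), then fallback with the same budget."""
    last_error: Optional[Exception] = None
    for provider in (primary, fallback):
        if provider is None:
            continue
        for attempt in range(max_retries + 1):
            try:
                return provider(prompt)
            except Exception as exc:  # a real client would catch narrower error types
                last_error = exc
                if attempt < max_retries:
                    sleep(retry_delay_ms / 1000)  # wait before the next retry
    raise RuntimeError("all providers failed") from last_error
```

With the sample values (`max_retries: 3`, `retry_delay_ms: 1000`), a persistently failing `openai-primary` would be tried four times, with one-second pauses between attempts, before `ollama-local` is used.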
spec.runtime
Optional. Controls how agent containers are launched.
| Key | Type | Default | Description |
|---|---|---|---|
| `bootstrap_script` | string | `assets/bootstrap.py` | Path to the bootstrap script, relative to the orchestrator binary |
| `default_isolation` | enum | `inherit` | `docker` \| `podman` \| `firecracker` \| `inherit` \| `process` |
| `container_socket_path` | string | (platform default) | Container runtime socket path. Docker: `/var/run/docker.sock`. Podman rootless: `/run/user/<UID>/podman/podman.sock`. Podman system: `/run/podman/podman.sock`. Overridden by the `CONTAINER_HOST` or `DOCKER_HOST` environment variables when set; `CONTAINER_HOST` takes precedence if both are set. |
| `container_network_mode` | string | `null` | Container network name for agent containers. Supports `env:`. |
| `orchestrator_url` | string | `http://localhost:8088` | Callback URL reachable from inside agent containers |
| `nfs_server_host` | string | `null` | Critical for volume mounts; required when `spec.storage.backend: seaweedfs`. NFS server host as seen by the host OS running the Docker daemon (not from inside agent containers). See the platform table below. |
| `nfs_port` | u16 | `2049` | NFS server port |
| `nfs_mountport` | u16 | `2049` | NFS mountd port |

nfs_server_host by environment:
| Environment | Value |
|---|---|
| WSL2 / Linux native | `"127.0.0.1"` |
| Docker Desktop (Windows/macOS) | `"host.docker.internal"` |
| Linux bridge network | `"172.17.0.1"` (Docker bridge gateway) |
| Remote / VM host | `<physical host IP>` |
| Via env var | `"env:AEGIS_NFS_HOST"` |

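The `container_socket_path` precedence described above (`CONTAINER_HOST`, then `DOCKER_HOST`, then the configured path, then the platform default) can be sketched as follows. This is an illustrative helper, not AEGIS code: the function names, the `runtime`/`uid` inputs, and the choice of rootless vs. system Podman socket by UID are assumptions; only the precedence order and the default paths come from this page.

```python
from typing import Mapping, Optional


def platform_default_socket(runtime: str, uid: int) -> str:
    """Documented platform defaults; `runtime` and `uid` are illustrative inputs."""
    if runtime == "docker":
        return "/var/run/docker.sock"
    if runtime == "podman":
        # Assumption: uid 0 (root) uses the system socket, other users the rootless one.
        return "/run/podman/podman.sock" if uid == 0 else f"/run/user/{uid}/podman/podman.sock"
    raise ValueError(f"unknown runtime {runtime!r}")


def resolve_socket(
    env: Mapping[str, str],
    container_socket_path: Optional[str],
    runtime: str = "docker",
    uid: int = 0,
) -> str:
    """CONTAINER_HOST > DOCKER_HOST > container_socket_path > platform default."""
    for var in ("CONTAINER_HOST", "DOCKER_HOST"):
        value = env.get(var)
        if value:
            # Env vars hold a URL such as unix:///var/run/docker.sock; strip the scheme.
            # Non-unix URLs (e.g. tcp://...) are returned unchanged.
            return value.removeprefix("unix://")
    if container_socket_path:
        return container_socket_path
    return platform_default_socket(runtime, uid)
```

A rootless Podman user with `CONTAINER_HOST=unix:///run/user/1000/podman/podman.sock` exported would therefore get the Podman socket even if `container_socket_path` still points at Docker.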
spec.network
Optional. Configures ports and TLS.
| Key | Type | Default | Description |
|---|---|---|---|
| `bind_address` | string | `0.0.0.0` | Network interface to bind all listeners |
| `port` | u16 | `8088` | HTTP REST API port |
| `grpc_port` | u16 | `50051` | gRPC API port |
| `orchestrator_endpoint` | string | `null` | WebSocket URL for edge → orchestrator connection (edge nodes only) |
| `heartbeat_interval_seconds` | u64 | `30` | Health check ping interval (seconds) |
| `tls.cert_path` | string | — | TLS certificate path |
| `tls.key_path` | string | — | TLS private key path |
| `tls.ca_path` | string | `null` | CA certificate path (optional) |

spec.cluster
Optional. Configures the node's role in the multi-node cluster topology.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `enabled` | bool | ❌ | `false` | Enable cluster mode. |
| `role` | enum | ❌ | `hybrid` | Node role: `controller` \| `worker` \| `hybrid` |
| `controller.endpoint` | string | ❌ | — | gRPC endpoint of the controller node. Required for `role: worker`. |
| `controller.token` | string | ❌ | — | Bootstrap token for initial attestation. Required for `role: worker`. Supports `env:`. |
| `cluster_grpc_port` | u16 | ❌ | `50056` | Port for NodeClusterService (controllers/hybrids) |
| `peers` | string[] | ❌ | `[]` | Static list of peer controller addresses |
| `node_keypair_path` | string | ✅ | — | Path to the persistent Ed25519 keypair file for node identity. The keypair is generated on first startup if the file is missing. |
| `heartbeat_interval_secs` | u64 | ❌ | `30` | Interval in seconds for worker heartbeats |
| `token_refresh_margin_secs` | u64 | ❌ | `120` | Re-attest this many seconds before the security token expires |
| `tls.enabled` | bool | ❌ | `true` | Enable TLS for cluster communication |
| `tls.cert_path` | string | ❌ | — | Path to node TLS certificate |
| `tls.key_path` | string | ❌ | — | Path to node TLS private key |
| `tls.ca_cert` | string | ❌ | — | Path to CA certificate for peer verification |

spec.storage
Optional. Defaults to the local_host backend.
| Key | Type | Default | Description |
|---|---|---|---|
| `backend` | enum | `local_host` | `seaweedfs` \| `local_host` |
| `fallback_to_local` | bool | `true` | Gracefully fall back to local storage when SeaweedFS is unreachable |
| `nfs_port` | u16 | `2049` | NFS Server Gateway listen port |
| `seaweedfs.filer_url` | string | `http://localhost:8888` | SeaweedFS Filer endpoint |
| `seaweedfs.mount_point` | string | `/var/lib/aegis/storage` | Host filesystem mount point |
| `seaweedfs.default_ttl_hours` | u32 | `24` | Default TTL for ephemeral volumes (hours) |
| `seaweedfs.default_size_limit_mb` | u64 | `1000` | Default per-volume size quota (MB) |
| `seaweedfs.max_size_limit_mb` | u64 | `10000` | Hard ceiling on volume size (MB) |
| `seaweedfs.gc_interval_minutes` | u32 | `60` | Expired volume GC interval (minutes) |
| `seaweedfs.s3_endpoint` | string | `null` | Optional SeaweedFS S3 gateway endpoint |
| `seaweedfs.s3_region` | string | `us-east-1` | S3 gateway region |
| `local_host.base_path` | string | `/var/lib/aegis/local-volumes` | Base directory for local volume storage |
| `local_host.default_ttl_hours` | u32 | `24` | Default TTL for ephemeral volumes (hours) |
| `local_host.default_size_limit_mb` | u64 | `1000` | Default per-volume quota (MB) |
| `local_host.max_size_limit_mb` | u64 | `10000` | Hard ceiling on volume size (MB) |

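The storage limits and the fallback flag interact as sketched below. This is a hypothetical illustration: the helper names are invented, and it assumes that a request above `max_size_limit_mb` is rejected (rather than clamped) and that `fallback_to_local: false` makes an unreachable SeaweedFS a hard error. Neither behaviour is confirmed by this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class VolumeLimits:
    default_ttl_hours: int = 24
    default_size_limit_mb: int = 1000
    max_size_limit_mb: int = 10000


def effective_size_mb(limits: VolumeLimits, requested_mb: Optional[int]) -> int:
    """Use the default quota when none is requested; reject anything above the ceiling."""
    size = limits.default_size_limit_mb if requested_mb is None else requested_mb
    if size > limits.max_size_limit_mb:
        raise ValueError(f"{size} MB exceeds max_size_limit_mb={limits.max_size_limit_mb}")
    return size


def expires_at(limits: VolumeLimits, created: datetime) -> datetime:
    """Ephemeral volumes expire default_ttl_hours after creation."""
    return created + timedelta(hours=limits.default_ttl_hours)


def pick_backend(configured: str, seaweedfs_reachable: bool, fallback_to_local: bool = True) -> str:
    """Fall back to local_host only when SeaweedFS is down and fallback is enabled."""
    if configured == "seaweedfs" and not seaweedfs_reachable:
        if fallback_to_local:
            return "local_host"
        raise RuntimeError("SeaweedFS unreachable and fallback_to_local is false")
    return configured
```

Under these assumptions, a volume created without an explicit size gets 1000 MB, a 20000 MB request fails, and a volume created at midnight becomes eligible for GC 24 hours later.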
spec.deploy_builtins
Optional. Default: false.
| Key | Type | Default | Description |
|---|---|---|---|
| `deploy_builtins` | bool | `false` | Deploy vendored built-in agent and workflow templates on startup. Includes agent-creator-agent, workflow-generator-planner-agent, judge agents, intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator, and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows. Required for `aegis.agent.generate`, `aegis.workflow.generate`, and `aegis.execute.intent` to function. |

spec.mcp_servers
Optional array. Each entry defines an external MCP Tool Server process.
| Key | Type | Default | Description |
|---|---|---|---|
name | string | — | Unique server name on this node |
enabled | bool | true | Whether to start this server |
executable | string | — | Executable path |
args | string[] | [] | Command-line arguments |
capabilities | string[] | [] | Tool names this server provides (used for routing) |
credentials | map | {} | API keys/tokens injected into the server process as env vars. Values support the secret: prefix. |
environment | map | {} | Non-secret env vars for the server process |
health_check.interval_seconds | u64 | 60 | Health check interval |
health_check.timeout_seconds | u64 | 5 | Health check timeout |
health_check.method | string | tools/list | MCP method used to health-check the server |
resource_limits.cpu_millicores | u32 | 1000 | CPU limit (1000 = 1 core) |
resource_limits.memory_mb | u32 | 512 | Memory limit (MB) |
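For illustration, here is a single MCP server entry with every documented key set. The server name, executable path, arguments, tool names, and secret path are all hypothetical. The sketch also assumes that each credentials key becomes the environment variable name seen by the server process.

```yaml
spec:
  mcp_servers:
    - name: github                                  # hypothetical server name
      enabled: true
      executable: /usr/local/bin/github-mcp-server  # hypothetical path
      args: ["stdio"]                               # hypothetical arguments
      capabilities:                                 # hypothetical tool names, used for routing
        - github.create_issue
        - github.search_code
      credentials:
        GITHUB_TOKEN: secret:aegis/github/token     # hypothetical secret path
      environment:
        LOG_LEVEL: info                             # non-secret variable
      health_check:
        interval_seconds: 60
        timeout_seconds: 5
        method: tools/list
      resource_limits:
        cpu_millicores: 500                         # half a core
        memory_mb: 256
```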
spec.seal
Optional in the schema, but required in production. Enables cryptographic agent authorization via SEAL.
| Key | Type | Default | Description |
|---|---|---|---|
private_key_path | string | — | Path to RSA private key PEM for signing SecurityToken JWTs |
public_key_path | string | — | Path to RSA public key PEM for verifying SecurityToken JWTs |
issuer | string | aegis-orchestrator | JWT iss claim |
audiences | string[] | [aegis-agents] | JWT aud claims |
token_ttl_seconds | u64 | 3600 | SecurityToken lifetime in seconds |
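Here is a minimal sketch of a SEAL block. The key file paths are hypothetical placeholders. issuer and audiences are shown with their defaults, and the shorter token lifetime is an illustrative choice rather than a recommendation.

```yaml
spec:
  seal:
    private_key_path: /etc/aegis/seal/private.pem   # hypothetical path (RSA private key, PEM)
    public_key_path: /etc/aegis/seal/public.pem     # hypothetical path (RSA public key, PEM)
    issuer: aegis-orchestrator                      # default JWT iss claim
    audiences:
      - aegis-agents                                # default JWT aud claim
    token_ttl_seconds: 900                          # 15 minutes instead of the 3600 s default
```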
spec.security_contexts
Optional array. Named permission boundaries assigned to agents at execution time.
Each entry (SecurityContextDefinition):
| Key | Type | Default | Description |
|---|---|---|---|
name | string | — | Unique context name, referenced in agent manifests |
description | string | "" | Human-readable description |
capabilities | array | [] | Tool permissions granted by this context |
deny_list | string[] | [] | Explicit tool deny list; overrides any matching capability |
Each capabilities entry (CapabilityDefinition):
| Key | Type | Description |
|---|---|---|
tool_pattern | string | Tool name pattern (e.g., "fs.*", "cmd.run", "web.fetch", "*") |
path_allowlist | string[] | Allowed filesystem path prefixes (for fs.* tools) |
subcommand_allowlist | object | Map of base command → allowed first positional arguments (for cmd.run). Example: {cargo: ["build","test"]}. |
domain_allowlist | string[] | Allowed network domain suffixes (for web.* tools) |
rate_limit.calls | u32 | Number of calls allowed per window |
rate_limit.per_seconds | u32 | Window size in seconds |
max_response_size | u64 | Max response size in bytes |
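The sketch below defines one hypothetical context for a Rust build agent, reusing the cmd.run example from the table above. The context name, path prefix, domain, rate-limit values, and the fs.delete tool name in deny_list are assumptions for illustration. Because deny_list overrides matching capabilities, fs.delete would stay blocked even though fs.* is granted.

```yaml
spec:
  security_contexts:
    - name: rust-builder                  # hypothetical context name
      description: Build and test a Rust workspace
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist:
            - /workspace                  # hypothetical path prefix
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            cargo: ["build", "test"]      # only `cargo build` and `cargo test`
          rate_limit:
            calls: 30                     # at most 30 calls...
            per_seconds: 60               # ...per 60-second window
        - tool_pattern: "web.fetch"
          domain_allowlist:
            - crates.io                   # domain suffix
          max_response_size: 1048576      # 1 MiB
      deny_list:
        - fs.delete                       # hypothetical tool name; overrides "fs.*"
```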
spec.builtin_dispatchers
Optional. Configures the cmd Dispatch Protocol handler. This is an in-process handler — it is not an MCP server process.
| Key | Type | Default | Description |
|---|---|---|---|
cmd.enabled | bool | true | Enable cmd.run dispatch |
cmd.default_timeout_secs | u64 | 60 | Default subprocess timeout |
cmd.max_timeout_ceiling_secs | u64 | 300 | Maximum timeout an agent manifest may request |
cmd.max_output_bytes | u64 | 524288 | Maximum combined stdout and stderr size per subprocess, in bytes (524288 = 512 KiB) |
cmd.max_concurrent_per_execution | u32 | 1 | Max concurrent subprocesses per execution |
cmd.global_env_denylist | string[] | (see sample) | Env vars that must never be forwarded to agent subprocesses |
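The dotted keys above nest under spec.builtin_dispatchers. A fragment that lengthens the default timeout and allows two concurrent subprocesses per execution might look like the following. The values are illustrative. global_env_denylist is deliberately left out so that its default list applies.

```yaml
spec:
  builtin_dispatchers:
    cmd:
      enabled: true
      default_timeout_secs: 120          # illustrative; must not exceed the ceiling below
      max_timeout_ceiling_secs: 300
      max_output_bytes: 524288           # 512 KiB combined stdout + stderr
      max_concurrent_per_execution: 2    # illustrative; default is 1
```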
spec.iam
Optional. Configures IAM/OIDC as the trusted JWT issuer. Omit to disable JWT validation (dev only; never in production).
| Key | Type | Default | Description |
|---|---|---|---|
realms[].slug | string | — | Realm name matching the Keycloak configuration |
realms[].issuer_url | string | — | OIDC issuer URL |
realms[].jwks_uri | string | — | JWKS endpoint for JWT signature verification |
realms[].audience | string | — | Expected aud claim in tokens from this realm |
realms[].kind | enum | — | Realm type. One of: system, consumer, tenant |
jwks_cache_ttl_seconds | u32 | 300 | JWKS key cache TTL; refreshed automatically to support key rotation |
claims.zaru_tier | string | zaru_tier | Keycloak claim name carrying ZaruTier |
claims.aegis_role | string | aegis_role | Keycloak claim name carrying AegisRole |
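Here is a minimal sketch with one system realm, assuming a Keycloak server at the hypothetical host auth.example.com. The issuer and JWKS URLs follow Keycloak's standard /realms/<realm> and /protocol/openid-connect/certs layout. The realm slug and audience are placeholders.

```yaml
spec:
  iam:
    realms:
      - slug: aegis-system                          # hypothetical realm name
        issuer_url: https://auth.example.com/realms/aegis-system
        jwks_uri: https://auth.example.com/realms/aegis-system/protocol/openid-connect/certs
        audience: aegis-orchestrator                # hypothetical expected aud claim
        kind: system
    jwks_cache_ttl_seconds: 300
    claims:
      zaru_tier: zaru_tier                          # default claim names
      aegis_role: aegis_role
```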
spec.grpc_auth
Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint. Requires spec.iam to be configured.
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enforce JWT validation on gRPC methods |
exempt_methods | string[] | [/aegis.v1.InnerLoop/Generate] | gRPC method full paths exempt from auth. The inner-loop bootstrap channel must always be exempt. |
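Here is a sketch of a custom exemption list. The second entry is the standard gRPC health-check method. It is a hypothetical addition that assumes the daemon exposes that service. Whatever list you supply, keep /aegis.v1.InnerLoop/Generate in it.

```yaml
spec:
  grpc_auth:
    enabled: true
    exempt_methods:
      - /aegis.v1.InnerLoop/Generate     # inner-loop bootstrap channel; must always be exempt
      - /grpc.health.v1.Health/Check     # hypothetical extra exemption
```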
spec.secrets
Optional. Configures OpenBao as the secrets backend. Follows the Keymaster Pattern — agents never access OpenBao directly.
| Key | Type | Default | Description |
|---|---|---|---|
backend.address | string | — | OpenBao server URL |
backend.auth_method | string | approle | Authentication method. Only approle is currently supported. |
backend.approle.role_id | string | — | AppRole Role ID (public; safe to commit) |
backend.approle.secret_id_env_var | string | OPENBAO_SECRET_ID | Name of the environment variable containing the AppRole Secret ID. Never commit the actual Secret ID. |
backend.namespace | string | — | OpenBao namespace (maps 1:1 to an IAM realm) |
backend.tls.ca_cert | string | null | CA certificate path |
backend.tls.client_cert | string | null | mTLS client certificate path |
backend.tls.client_key | string | null | mTLS client key path |
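Here is a minimal sketch of an AppRole-authenticated OpenBao backend over mTLS. The address, Role ID, namespace, and certificate paths are placeholders. Only the name of the Secret ID environment variable appears in the file; the Secret ID itself never does.

```yaml
spec:
  secrets:
    backend:
      address: https://openbao.internal:8200           # hypothetical address
      auth_method: approle                             # only supported method
      approle:
        role_id: 00000000-0000-0000-0000-000000000000  # placeholder Role ID
        secret_id_env_var: OPENBAO_SECRET_ID           # the daemon reads the Secret ID from this variable
      namespace: tenant-acme                           # hypothetical; maps 1:1 to an IAM realm
      tls:
        ca_cert: /etc/aegis/tls/openbao-ca.pem         # hypothetical paths
        client_cert: /etc/aegis/tls/aegis-client.pem
        client_key: /etc/aegis/tls/aegis-client-key.pem
```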
spec.database
Optional. PostgreSQL connection for persistent state (executions, patterns, workflows). If omitted, the daemon uses in-memory repositories (development mode only).
| Key | Type | Default | Description |
|---|---|---|---|
url | string | — | PostgreSQL connection URL. Supports env: and secret:. |
max_connections | u32 | 5 | Maximum connections in the pool |
connect_timeout_seconds | u64 | 5 | Connection timeout |
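Here is a minimal sketch that keeps the connection URL out of the file by reading it from the environment. AEGIS_DATABASE_URL is a hypothetical variable name. It would hold a standard PostgreSQL URL such as postgres://user:password@host:5432/dbname.

```yaml
spec:
  database:
    url: env:AEGIS_DATABASE_URL    # hypothetical variable; secret: also works
    max_connections: 10            # illustrative; default is 5
    connect_timeout_seconds: 5
```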
spec.temporal
Optional. Temporal workflow engine configuration for durable workflow execution. If omitted, workflow orchestration features are unavailable.
| Key | Type | Default | Description |
|---|---|---|---|
address | string | temporal:7233 | Temporal gRPC server address |
worker_http_endpoint | string | http://localhost:3000 | HTTP endpoint for Temporal worker callbacks. Supports env:. |
worker_secret | string | null | Shared secret for authenticating worker callbacks. Supports env: and secret:. |
namespace | string | default | Temporal namespace |
task_queue | string | aegis-agents | Temporal task queue name |
max_connection_retries | i32 | 30 | Maximum number of connection retries when establishing the Temporal client. |
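Here is a sketch of a Temporal block that resolves the callback endpoint and shared secret at runtime. The environment variable name and secret path are hypothetical. The remaining values are the documented defaults.

```yaml
spec:
  temporal:
    address: temporal:7233
    worker_http_endpoint: env:AEGIS_TEMPORAL_WORKER_URL   # hypothetical variable name
    worker_secret: secret:aegis/temporal/worker-secret    # hypothetical secret path
    namespace: default
    task_queue: aegis-agents
    max_connection_retries: 30
```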
spec.cortex
Optional. Cortex memory and learning service configuration. If omitted or grpc_url is null, the daemon runs in memoryless mode — patterns are simply not stored.
| Key | Type | Default | Description |
|---|---|---|---|
grpc_url | string | null | Cortex gRPC service URL. Supports env:. |
api_key | string | null | API key for the 100monkeys-hosted Cortex (Zaru SaaS). Supports env: and secret: prefixes. When absent, the orchestrator connects without authentication (local or open Cortex). |
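Here is a minimal sketch for a hosted Cortex. The environment variable name and secret path are hypothetical. For a local, unauthenticated Cortex, drop api_key.

```yaml
spec:
  cortex:
    grpc_url: env:AEGIS_CORTEX_GRPC_URL     # hypothetical variable name
    api_key: secret:aegis/cortex/api-key    # hypothetical secret path; omit for a local Cortex
```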
spec.discovery
Semantic agent and workflow search (aegis.agent.search, aegis.workflow.search) is powered by the Cortex service. When spec.cortex is configured with a valid grpc_url and api_key, discovery is available automatically; no separate spec.discovery configuration is required.
spec.seal_gateway
Optional. Configures forwarding of external tool invocations to the standalone SEAL tooling gateway.
| Key | Type | Default | Description |
|---|---|---|---|
url | string | null | gRPC endpoint URL of aegis-seal-gateway (example: http://aegis-seal-gateway:50055). |
If omitted, the orchestrator does not forward unknown or external tools to the gateway and uses built-in routing only.
spec.max_execution_list_limit
Optional. Upper bound on executions returned by a single list_executions request.
| Key | Type | Default | Description |
|---|---|---|---|
max_execution_list_limit | usize | 1000 | Maximum number of executions returned by a single list_executions request. Protects against excessive memory usage. |
spec.observability
Optional.
logging
| Key | Type | Default | Env Override | Description |
|---|---|---|---|---|
logging.level | enum | info | RUST_LOG | Log verbosity. One of: error, warn, info, debug, trace |
logging.format | enum | json | AEGIS_LOG_FORMAT | Log output format. One of: json, text |
logging.file | string | null | — | Log file path. Omit to write to stdout. |
logging.otlp_endpoint | string | null | AEGIS_OTLP_ENDPOINT | OTLP collector endpoint. Setting this enables OTLP log export. |
logging.otlp_protocol | enum | grpc | AEGIS_OTLP_PROTOCOL | OTLP transport protocol. One of: grpc, http |
logging.otlp_headers | map | {} | AEGIS_OTLP_HEADERS | Key-value headers sent with every OTLP export request. Values support env: and secret: prefixes. The env override takes comma-separated key=value pairs (e.g., key1=value1,key2=value2). |
logging.otlp_min_level | string | info | AEGIS_OTLP_LOG_LEVEL | Minimum log level forwarded to OTLP (does not affect stdout). |
logging.otlp_service_name | string | aegis-orchestrator | AEGIS_OTLP_SERVICE_NAME | service.name resource attribute |
logging.otlp_batch.max_queue_size | u32 | 2048 | — | Maximum buffered records before export |
logging.otlp_batch.scheduled_delay_ms | u64 | 5000 | — | Batch flush interval (ms) |
logging.otlp_batch.max_export_batch_size | u32 | 512 | — | Records per export RPC |
logging.otlp_batch.export_timeout_ms | u64 | 10000 | — | Per-call timeout (ms) |
logging.otlp_tls.verify | bool | true | — | Verify OTLP endpoint TLS certificate |
logging.otlp_tls.ca_cert_path | string | null | — | Custom CA certificate for self-signed backends |
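Here is a sketch that exports warn-and-above logs to an OpenTelemetry Collector over gRPC while still writing info-level JSON to stdout. The collector hostname, CA path, and secret path are hypothetical. 4317 is the conventional OTLP/gRPC port (4318 for OTLP/HTTP). Any key listed with an env override above, such as AEGIS_OTLP_ENDPOINT, can replace the value in the file.

```yaml
spec:
  observability:
    logging:
      level: info                     # RUST_LOG overrides this
      format: json
      # logging.file is omitted, so logs are written to stdout
      otlp_endpoint: https://otel-collector.internal:4317   # hypothetical collector host
      otlp_protocol: grpc
      otlp_headers:
        authorization: secret:aegis/otlp/authorization      # hypothetical secret path
      otlp_min_level: warn            # affects OTLP export only, not stdout
      otlp_service_name: aegis-orchestrator
      otlp_batch:                     # documented defaults
        max_queue_size: 2048
        scheduled_delay_ms: 5000
        max_export_batch_size: 512
        export_timeout_ms: 10000
      otlp_tls:
        verify: true
        ca_cert_path: /etc/aegis/tls/otel-ca.pem            # hypothetical private CA
```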
metrics
| Key | Type | Default | Description |
|---|---|---|---|
metrics.enabled | bool | true | Enable Prometheus metrics |
metrics.port | u16 | 9091 | Prometheus metrics exposition port |
metrics.path | string | /metrics | HTTP path for scraping |
tracing
| Key | Type | Default | Description |
|---|---|---|---|
tracing.enabled | bool | false | Enable distributed tracing via OpenTelemetry |