Full annotated aegis-config.yaml with all supported keys, types, defaults, and descriptions.

Daemon Configuration

The AEGIS daemon is configured via a single YAML file, by default aegis-config.yaml in the working directory. Pass a custom path with --config:

aegis daemon start --config /etc/aegis/config.yaml

Config Discovery Order

The daemon searches for a config file in the following order. The first match wins:

--config <path> CLI flag
AEGIS_CONFIG_PATH environment variable
./aegis-config.yaml (working directory)
~/.aegis/config.yaml
/etc/aegis/config.yaml (Linux/macOS)
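
The lookup order above can be sketched in a few lines of Python. This is an illustrative model, not the daemon's actual implementation: the function name `discover_config` and its parameters are hypothetical, and it assumes that a candidate path which does not exist is skipped instead of causing an error.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def discover_config(cli_path: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None,
                    cwd: Optional[Path] = None,
                    home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing candidate in the documented order, or None."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    candidates = [
        cli_path,                         # 1. --config <path> CLI flag
        env.get("AEGIS_CONFIG_PATH"),     # 2. AEGIS_CONFIG_PATH environment variable
        cwd / "aegis-config.yaml",        # 3. working directory
        home / ".aegis" / "config.yaml",  # 4. per-user config
        Path("/etc/aegis/config.yaml"),   # 5. system-wide config (Linux/macOS)
    ]
    for candidate in candidates:
        # Assumption: unset or missing candidates are skipped, not treated as errors.
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None
```

Passing `env`, `cwd`, and `home` explicitly keeps the function easy to exercise without touching the real environment.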

Credential Resolution

Any string value in the config file can use a credential prefix instead of a literal:

Prefix	Resolution
`env:VAR_NAME`	Read from the daemon process environment at startup.
`secret:path/to/secret`	Resolved from OpenBao at runtime (requires `spec.secrets.backend` to be configured).
(literal)	Plaintext. Avoid for secrets.

In production, use secret: references for all API keys and credentials. Use env: as a fallback when OpenBao is not available.
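
As a rough illustration of the prefix rules, the following Python sketch classifies a config value and resolves it. The helper `resolve_credential` and the `secret_lookup` callback are hypothetical stand-ins. The real daemon resolves `env:` values at startup and `secret:` values through OpenBao at runtime, which this sketch collapses into a single call.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_credential(value: str,
                       env: Optional[Mapping[str, str]] = None,
                       secret_lookup: Optional[Callable[[str], str]] = None) -> str:
    """Resolve a config string that may carry an env: or secret: prefix."""
    env = os.environ if env is None else env

    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in env:
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    if value.startswith("secret:"):
        # Assumption: secret_lookup stands in for a configured secrets backend.
        if secret_lookup is None:
            raise RuntimeError("secret: reference used but no secrets backend is configured")
        return secret_lookup(value[len("secret:"):])

    # No recognised prefix: treat the value as a plaintext literal.
    return value
```

For example, `resolve_credential("env:OPENAI_API_KEY", env={"OPENAI_API_KEY": "sk-test"})` returns `"sk-test"`, while a value without a prefix is returned unchanged.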

Cluster Topology

AEGIS supports a distributed Controller-Worker topology for high availability and horizontal scaling. Nodes can be configured as controllers (scheduling and management), workers (agent execution), or hybrid nodes.

Key features of the cluster topology include:

Distributed Scheduling: Controllers route agent executions to the most suitable available workers.
Node Identity: Each node maintains a stable identity via a persistent Ed25519 keypair, used for secure attestation and SealNodeEnvelope signing.
Secure Communication: Inter-node traffic is protected by mTLS and SEAL-derived security tokens.
Seamless Scaling: New workers can be added to the cluster by pointing them at the controller endpoint and providing valid certificates.

Manifest Envelope

All aegis-config.yaml files use the Kubernetes-style manifest envelope:

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "my-aegis-node"          # required
  version: "1.0.0"               # optional
  labels:                        # optional
    environment: "production"
spec:
  # All configuration sections documented below sit under spec:

The sections documented below all belong under the top-level spec: key.
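
A minimal pre-flight check of the envelope might look like the Python sketch below. It operates on an already-parsed document (a plain `dict`), so it needs no YAML library. The function name `validate_envelope` and the error messages are hypothetical, and the checks cover only what the envelope example shows.

```python
from typing import Any, Dict, List


def validate_envelope(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the envelope looks valid."""
    errors: List[str] = []

    if doc.get("apiVersion") != "100monkeys.ai/v1":
        errors.append("apiVersion must be '100monkeys.ai/v1'")
    if doc.get("kind") != "NodeConfig":
        errors.append("kind must be 'NodeConfig'")

    metadata = doc.get("metadata")
    name = metadata.get("name") if isinstance(metadata, dict) else None
    if not name:
        errors.append("metadata.name is required")

    # All configuration sections live under spec:, so it must be a mapping.
    if not isinstance(doc.get("spec"), dict):
        errors.append("spec must be a mapping containing the configuration sections")

    return errors
```

A check like this catches the common mistake of placing sections such as `node:` or `llm_providers:` at the top level instead of under `spec:`.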

Full Annotated Configuration

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "aegis-node"
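spec:
# Note: the sections below are shown without leading indentation.
# In a real config file, nest each one under spec: (see Manifest Envelope).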

# ─── Node Identity ────────────────────────────────────────────────────────────
node:
  # Unique stable node identifier. UUID recommended. Required.
  id: "env:AEGIS_NODE_ID"
  # Node type. Options: edge | orchestrator | hybrid. Required.
  type: orchestrator
  # Geographic region, e.g. "us-east-1". Optional.
  region: "us-east-1"
  # Capability tags used to match agent manifest execution_targets. Optional.
  tags:
    - gpu
    - high-memory
  # Physical resources available on this node. Optional.
  resources:
    cpu_cores: 8
    memory_gb: 32
    disk_gb: 500
    gpu: false

# ─── Image Tag ────────────────────────────────────────────────────────────────
# Docker image tag for all AEGIS-owned service containers.
# Written by `aegis init --tag <TAG>` and updated by `aegis update`.
# When absent, both commands default to the version of the aegis binary.
image_tag: "0.1.0-pre-alpha"

# ─── LLM Providers ────────────────────────────────────────────────────────────
# Array of LLM provider configurations. At least one entry is required.
llm_providers:
  - name: openai-primary           # Unique provider name on this node. Required.
    type: openai                   # openai | anthropic | ollama | openai-compatible. Required.
    endpoint: "https://api.openai.com/v1"  # API endpoint URL. Required.
    api_key: "env:OPENAI_API_KEY"  # Supports env: and secret: prefixes.
    enabled: true                  # Default: true
    models:                        # Must have at least one entry. Required.
      - alias: default             # Alias referenced in agent manifests. Required.
        model: gpt-4o              # Provider-side model name. Required.
        capabilities:              # chat | embedding | reasoning | vision | code. Required.
          - chat
          - code
          - reasoning
        context_window: 128000     # Max context window in tokens. Required.
        cost_per_1k_tokens: 0.005  # Default: 0.0 (free/local)
      - alias: fast
        model: gpt-4o-mini
        capabilities: [chat, code]
        context_window: 128000
        cost_per_1k_tokens: 0.00015

  - name: anthropic-primary
    type: anthropic
    endpoint: "https://api.anthropic.com/v1"
    api_key: "secret:aegis-system/llm/anthropic-api-key"
    enabled: true
    models:
      - alias: smart
        model: claude-sonnet-4-5
        capabilities: [chat, code, reasoning]
        context_window: 200000
        cost_per_1k_tokens: 0.003

  - name: ollama-local
    type: ollama
    endpoint: "http://localhost:11434"
    enabled: true
    models:
      - alias: local
        model: qwen2.5-coder:32b
        capabilities: [chat, code]
        context_window: 32000
        cost_per_1k_tokens: 0.0

# ─── LLM Selection Strategy ───────────────────────────────────────────────────
# Optional. Controls how the orchestrator picks providers at runtime.
llm_selection:
  # prefer-local | prefer-cloud | cost-optimized | latency-optimized
  # Default: prefer-local
  strategy: prefer-local
  # Provider name to use when no preference is specified. Default: null (auto-select).
  default_provider: openai-primary
  # Provider to use if the primary fails. Default: null.
  fallback_provider: ollama-local
  # Maximum retry attempts on LLM failure. Default: 3
  max_retries: 3
  # Delay between retries in milliseconds. Default: 1000
  retry_delay_ms: 1000

# ─── Execution Limits ────────────────────────────────────────────────────────
# Optional. Protects list_executions from returning unbounded result sets. Defaults to 1000 when omitted.
max_execution_list_limit: 1000

# ─── Runtime ──────────────────────────────────────────────────────────────────
# Optional. Has safe defaults for Docker.
runtime:
  # Path to the agent bootstrap script, relative to the orchestrator binary.
  # Default: assets/bootstrap.py
  bootstrap_script: "assets/bootstrap.py"
  # Default isolation level for executions that do not specify one.
  # Options: docker | podman | firecracker | inherit | process
  # Default: inherit (uses the node's compiled-in default)
  default_isolation: docker
  # Container runtime socket path. Omit to use the platform default:
  #   Docker on Linux/macOS:    /var/run/docker.sock
  #   Podman rootless:          /run/user/<UID>/podman/podman.sock
  #   Podman system (root):     /run/podman/podman.sock
  # Can also be set via CONTAINER_HOST or DOCKER_HOST env vars (see below).
  container_socket_path: "/var/run/docker.sock"
  # Container network name for agent containers.
  # Supports env: prefix. Example: "env:AEGIS_CONTAINER_NETWORK"
  container_network_mode: "aegis-network"
  # URL that agent containers use to call back to the orchestrator.
  # Must be reachable from inside containers, not from the host.
  # Default: http://localhost:8088
  orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"

  # ── Container Host Environment Variables ──────────────────────────────────
  # The container socket can also be configured via environment variables.
  # These override container_socket_path when set:
  #
  #   CONTAINER_HOST=unix:///run/user/1000/podman/podman.sock
  #   DOCKER_HOST=unix:///var/run/docker.sock
  #
  # CONTAINER_HOST takes precedence over DOCKER_HOST when both are set.
  # Podman users should set CONTAINER_HOST to point to the Podman socket.

  # ── NFS Storage Gateway (required when spec.storage.backend: seaweedfs) ───
  # Hostname/IP of the NFS server. Must resolve from the HOST OS where the
  # Docker daemon runs — NOT from inside agent containers.
  # Supports env: prefix.
  #
  # Platform-specific values:
  #   WSL2 / Linux native:          "127.0.0.1"
  #   Docker Desktop (Win/Mac):     "host.docker.internal"
  #   Linux bridge network:         "172.17.0.1"  (Docker bridge gateway)
  #   Remote / VM host:             "<physical host IP>"
  nfs_server_host: "env:AEGIS_NFS_HOST"
  # NFS server listen port. Default: 2049
  nfs_port: 2049
  # NFS mountd port. Default: 2049
  nfs_mountport: 2049

# ─── Network ──────────────────────────────────────────────────────────────────
# Optional. Has safe defaults.
network:
  # Bind address for all listeners. Default: 0.0.0.0
  bind_address: "0.0.0.0"
  # HTTP REST API port. Default: 8088
  port: 8088
  # gRPC API port (inner-loop transport, Temporal workers). Default: 50051
  grpc_port: 50051
  # WebSocket URL for edge-node → orchestrator connection. Omit on orchestrator nodes.
  orchestrator_endpoint: null
  # Health check ping interval in seconds. Default: 30
  heartbeat_interval_seconds: 30
  # Optional TLS. Omit for plaintext (dev only; never in production).
  tls:
    cert_path: "/etc/aegis/tls/server.crt"
    key_path: "/etc/aegis/tls/server.key"
    ca_path: null    # Optional CA certificate path

# ─── Cluster ──────────────────────────────────────────────────────────────────
# Optional. Configures the node's role in the multi-node cluster topology.
cluster:
  # Enable cluster mode. Default: false.
  enabled: true
  
  # Node role in cluster. Options: controller | worker | hybrid. Default: hybrid.
  # - controller: Manages routing and registration; does not run executions.
  # - worker: Runs agent executions; does not perform routing decisions.
  # - hybrid: Both controller and worker duties (default).
  role: worker

  # Controller settings (required for workers)
  controller:
    # gRPC endpoint of the controller node.
    endpoint: "grpc://aegis-controller:50056"
    # Bootstrap token for initial attestation (Step 0).
    token: "env:AEGIS_CLUSTER_TOKEN"

  # Port for NodeClusterService (controllers/hybrids). Default: 50056.
  cluster_grpc_port: 50056
  # Static list of peer controller addresses. Default: [].
  peers: []
  # Path to the persistent Ed25519 keypair file for node identity.
  # Generated automatically on first startup if missing.
  node_keypair_path: "/etc/aegis/node_keypair.pem"
  # Interval in seconds for worker heartbeats to the controller. Default: 30.
  heartbeat_interval_secs: 30
  # Re-attest this many seconds before the security token expires. Default: 120.
  token_refresh_margin_secs: 120
  # TLS configuration for secure inter-node cluster communication (mTLS).
  tls:
    enabled: true
    cert_path: "/etc/aegis/certs/node.crt"
    key_path: "/etc/aegis/certs/node.key"
    ca_cert: "/etc/aegis/certs/ca.crt"

# ─── Storage ──────────────────────────────────────────────────────────────────
# Optional. Defaults to local_host backend.
storage:
  # Storage backend. Options: seaweedfs | local_host. Default: local_host
  backend: seaweedfs
  # Gracefully fall back to local storage when SeaweedFS is unreachable.
  # Default: true
  fallback_to_local: true
  # NFS Server Gateway listen port. Default: 2049
  nfs_port: 2049

  # SeaweedFS backend config. Required when backend: seaweedfs.
  seaweedfs:
    # SeaweedFS Filer endpoint.
    # Default: http://localhost:8888
    filer_url: "http://localhost:8888"
    # Host filesystem mount point.
    # Default: /var/lib/aegis/storage
    mount_point: "/var/lib/aegis/storage"
    # Default TTL for ephemeral volumes in hours. Default: 24
    default_ttl_hours: 24
    # Default per-volume size quota in MB. Default: 1000
    default_size_limit_mb: 1000
    # Hard ceiling on any single volume size in MB. Default: 10000
    max_size_limit_mb: 10000
    # GC interval for expired volumes in minutes. Default: 60
    gc_interval_minutes: 60
    # Optional S3 gateway endpoint (e.g., for direct object uploads).
    s3_endpoint: null
    # S3 gateway region. Default: us-east-1
    s3_region: "us-east-1"

  # Local storage config. Used when backend: local_host, or as the fallback target.
  local_host:
    # Base directory for local volume storage.
    # Default: /var/lib/aegis/local-volumes
    base_path: "/var/lib/aegis/local-volumes"
    # Default TTL for ephemeral volumes in hours. Default: 24
    default_ttl_hours: 24
    # Default per-volume size quota in MB. Default: 1000
    default_size_limit_mb: 1000
    # Hard ceiling on any single volume size in MB. Default: 10000
    max_size_limit_mb: 10000

# ─── Deploy Built-In Templates ───────────────────────────────────────────────
# Deploy vendored built-in agent and workflow templates on startup.
# Includes agent-creator-agent, workflow-generator-planner-agent, judge agents,
# intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator,
# and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows.
# Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function.
# Default: false
deploy_builtins: false

# ─── Force Deploy Built-In Templates ─────────────────────────────────────────
# Force re-registration of all built-in agents and workflows on startup, even if
# already registered. Use after a platform upgrade when built-in agent UUIDs or
# definitions have changed and stale registrations need to be flushed.
# Accepts: "true" | "false" | "env:VAR_NAME". Optional. Default: disabled.
force_deploy_builtins: "false"

# ─── MCP Tool Servers ─────────────────────────────────────────────────────────
# External MCP server processes. Optional array.
mcp_servers:
  - name: web-search              # Unique server name on this node. Required.
    enabled: true                 # Default: true
    # Executable path (absolute or relative to /usr/local/bin). Required.
    executable: "node"
    # Command-line arguments. Default: []
    args:
      - "/opt/aegis-tools/web-search/index.js"
    # Tool capabilities this server provides (used for routing). Default: []
    capabilities:
      - web.search
      - web.fetch
    # API keys and tokens — resolved via env: or secret: prefixes.
    # Values are injected as environment variables into the server process.
    credentials:
      SEARCH_API_KEY: "secret:aegis-system/tools/search-api-key"
    # Non-secret environment variables for the server process. Default: {}
    environment:
      LOG_LEVEL: "info"
    # Health check configuration.
    health_check:
      interval_seconds: 60        # Default: 60
      timeout_seconds: 5          # Default: 5
      method: "tools/list"        # MCP method used for health check. Default: tools/list
    # Resource limits for the server process.
    resource_limits:
      cpu_millicores: 1000        # 1000 = 1 CPU core
      memory_mb: 512

# ─── SEAL (Signed Envelope Attestation Layer) ────────────────────────────────────
# Optional. Required in production to enable cryptographic agent authorization.
seal:
  # RSA private key PEM used to sign SecurityToken JWTs issued at attestation.
  private_key_path: "/etc/aegis/seal/private.pem"
  # RSA public key PEM used to verify SecurityToken JWTs on tool calls.
  public_key_path: "/etc/aegis/seal/public.pem"
  # JWT iss claim. Default: aegis-orchestrator
  issuer: "aegis-orchestrator"
  # JWT aud claim values. Default: [aegis-agents]
  audiences:
    - "aegis-agents"
  # SecurityToken lifetime in seconds. Default: 3600 (1 hour)
  token_ttl_seconds: 3600

# ─── Security Contexts ────────────────────────────────────────────────────────
# Named permission boundaries controlling what tools agents may invoke.
# Referenced by name in agent manifests and in spec.iam ZaruTier mappings.
security_contexts:
  - name: coder-default
    description: "Standard coder context — filesystem + commands + safe package registries"
    capabilities:
      - tool_pattern: "fs.*"        # ← tool_pattern, not tool
        path_allowlist:
          - /workspace
          - /agent
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          git: [clone, add, commit, push, pull, status, diff, stash]
          cargo: [build, test, fmt, clippy, check, run]
          npm: [install, run, test, build, ci]
          python: ["-m"]
      - tool_pattern: "web.fetch"
        domain_allowlist:
          - pypi.org
          - crates.io
          - npmjs.com
        rate_limit:
          calls: 30                 # ← object with calls + per_seconds, not "30/minute"
          per_seconds: 60
    # Explicit deny list — overrides any matching capability above.
    deny_list: []

  - name: zaru-free
    description: "Zaru Free tier: ephemeral volumes only, no outbound network"
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          python: ["-m"]
          npm: [install, run, test]
    deny_list:
      - "web.*"

  - name: zaru-pro
    description: "Zaru Pro tier: full coder-default capabilities"
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
          - /agent
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          git: [clone, add, commit, push, pull, status, diff, stash]
          cargo: [build, test, fmt, clippy, check, run]
          npm: [install, run, test, build, ci]
          python: ["-m"]
      - tool_pattern: "web.*"
        domain_allowlist:
          - pypi.org
          - crates.io
          - npmjs.com
          - api.github.com
        rate_limit:
          calls: 60
          per_seconds: 60

  # aegis-system-agent-runtime is a platform built-in — shown here for reference only.
  # Do NOT copy this block into your aegis-config.yaml; it is registered automatically.
  - name: aegis-system-agent-runtime
    description: "Execution surface for agent containers. Grants filesystem, shell, and web access scoped to /workspace."
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
      - tool_pattern: "cmd.run"
      - tool_pattern: "web.*"
      - tool_pattern: "aegis.agent.get"
      - tool_pattern: "aegis.agent.list"
      - tool_pattern: "aegis.workflow.get"
      - tool_pattern: "aegis.workflow.list"
      - tool_pattern: "aegis.workflow.signal"
      - tool_pattern: "aegis.task.execute"
    deny_list:
      - "aegis.agent.delete"
      - "aegis.workflow.delete"
      - "aegis.task.remove"
      - "aegis.system.info"
      - "aegis.system.config"

  - name: aegis-system-operator
    description: "Platform operator — all safe tools plus destructive and orchestrator commands"
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
          - /agent
          - /shared
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          git: [clone, add, commit, push, pull, status, diff, stash]
          cargo: [build, test, fmt, clippy, check, run]
          npm: [install, run, test, build, ci]
          python: ["-m"]
      - tool_pattern: "web.*"
      # Destructive commands (operator-only)
      - tool_pattern: "aegis.agent.delete"
      - tool_pattern: "aegis.workflow.delete"
      - tool_pattern: "aegis.task.remove"
      # Orchestrator commands (operator-only)
      - tool_pattern: "aegis.system.info"
      - tool_pattern: "aegis.system.config"
    deny_list: []

# ─── Builtin Dispatchers ──────────────────────────────────────────────────────
# Configuration for the in-process Dispatch Protocol handler.
# The cmd dispatcher is NOT an MCP server — it runs subprocesses inside agent
# containers via the bidirectional bootstrap channel.
builtin_dispatchers:
  cmd:
    # Enable cmd.run dispatch. Default: true
    enabled: true
    # Default per-subprocess timeout in seconds. Default: 60
    default_timeout_secs: 60
    # Ceiling timeout an agent manifest may request. Default: 300
    max_timeout_ceiling_secs: 300
    # Maximum stdout + stderr captured per subprocess in bytes. Default: 524288 (512 KB)
    max_output_bytes: 524288
    # Maximum concurrent subprocesses per execution. Default: 1
    max_concurrent_per_execution: 1
    # Environment variables that must never be forwarded to agent subprocesses.
    global_env_denylist:
      - AEGIS_TOKEN
      - OPENAI_API_KEY
      - ANTHROPIC_API_KEY
      - SEAL_PRIVATE_KEY
      - AWS_SECRET_ACCESS_KEY
      - GOOGLE_API_KEY

# ─── IAM (OIDC) ───────────────────────────────────────────────────────────────
# Optional. Omit to disable JWT validation (dev only; never in production).
# Configures Keycloak as the trusted OIDC issuer for all
# human and service-account identities.
iam:
  # Array of Keycloak realms trusted by this node.
  realms:
    - slug: "aegis-system"         # Realm name matching Keycloak config. Required.
      issuer_url: "https://auth.myzaru.com/realms/aegis-system"  # OIDC issuer URL. Required.
      jwks_uri: "https://auth.myzaru.com/realms/aegis-system/protocol/openid-connect/certs"  # Required.
      audience: "aegis-orchestrator"  # Expected aud claim. Required.
      kind: system                 # system | consumer | tenant. Required.
    - slug: "zaru-consumer"
      issuer_url: "https://auth.myzaru.com/realms/zaru-consumer"
      jwks_uri: "https://auth.myzaru.com/realms/zaru-consumer/protocol/openid-connect/certs"
      audience: "aegis-orchestrator"
      kind: consumer
  # JWKS key refresh interval in seconds. Supports live key rotation.
  # Default: 300
  jwks_cache_ttl_seconds: 300
  # Custom claim names injected by Keycloak attribute mappers.
  claims:
    zaru_tier: "zaru_tier"         # Claim carrying ZaruTier (Free | Pro | Business | Enterprise)
    aegis_role: "aegis_role"       # Claim carrying AegisRole in aegis-system realm
    tenant_id: "tenant_id"         # Claim carrying per-user tenant slug (u-{uuid}) for consumer users

# ─── gRPC Auth ────────────────────────────────────────────────────────────────
# Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint.
# Requires spec.iam to be configured.
grpc_auth:
  # Enable JWT validation on all gRPC methods. Default: true
  enabled: true
  # gRPC method full paths exempt from auth.
  # The inner-loop bootstrap channel must always be exempt.
  exempt_methods:
    - "/aegis.v1.InnerLoop/Generate"

# ─── Secrets (OpenBao) ────────────────────────────────────────────────────────
# Optional. Omit to rely solely on env: references.
# Follows the Keymaster Pattern: only the orchestrator accesses
# OpenBao — agents never receive secret backend credentials directly.
secrets:
  backend:
    # OpenBao server address.
    address: "https://openbao.internal:8200"
    # Authentication method. Currently only approle is supported.
    auth_method: "approle"
    approle:
      # AppRole Role ID — public, can be committed to config.
      role_id: "env:OPENBAO_ROLE_ID"
      # Name of the environment variable containing the AppRole Secret ID.
      # The Secret ID itself must never be committed to config files.
      secret_id_env_var: "OPENBAO_SECRET_ID"
    # OpenBao namespace for multi-tenancy. Maps 1:1 to a Keycloak realm.
    namespace: "aegis-system"
    # Optional mTLS to OpenBao.
    tls:
      ca_cert: "/etc/aegis/openbao-ca.pem"
      client_cert: null            # Optional mTLS client cert
      client_key: null             # Optional mTLS client key

# ─── Database ─────────────────────────────────────────────────────────────────
# Optional. If omitted the daemon uses InMemory repositories (dev only).
database:
  # PostgreSQL connection URL. Supports env: and secret: prefixes.
  url: "env:AEGIS_DATABASE_URL"
  # Maximum connections in the pool. Default: 5
  max_connections: 10
  # Connection timeout in seconds. Default: 5
  connect_timeout_seconds: 5

# ─── Temporal ─────────────────────────────────────────────────────────────────
# Optional. If omitted workflow orchestration is unavailable.
temporal:
  # Temporal gRPC server address. Default: temporal:7233
  address: "temporal:7233"
  # HTTP endpoint for Temporal worker callbacks. Default: http://localhost:3000
  worker_http_endpoint: "http://aegis-runtime:3000"
  # Shared secret for authenticating worker callbacks. Supports env:.
  worker_secret: "env:TEMPORAL_WORKER_SECRET"
  # Temporal namespace. Default: default
  namespace: "default"
  # Temporal task queue. Default: aegis-agents
  task_queue: "aegis-agents"
  # Maximum number of connection retries when establishing the Temporal client.
  # If omitted, a default of 30 retries is used.
  max_connection_retries: 30

# ─── Cortex ───────────────────────────────────────────────────────────────────
# Optional. If omitted or grpc_url is null the daemon runs in memoryless mode.
# When configured, discovery (aegis.agent.search, aegis.workflow.search) is
# available automatically — no separate discovery section is required.
cortex:
  # Cortex gRPC service URL. Supports env:.
  grpc_url: "env:CORTEX_GRPC_URL"
  # API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret:.
  # Omit for local or open cortex deployments (connects without authentication).
  api_key: "env:CORTEX_API_KEY"

# ─── External SEAL Tooling Gateway ─────────────────────────────────
# Optional. Omit to keep external tool routing disabled.
seal_gateway:
  # gRPC endpoint URL for aegis-seal-gateway.
  url: "http://aegis-seal-gateway:50055"

# ─── Observability ────────────────────────────────────────────────────────────
# Optional.
observability:
  logging:
    # Log level: error | warn | info | debug | trace. Default: info
    level: info
    # Output format: json | text. Default: json
    format: json
    # Log file path. Omit to write to stdout.
    file: null
    # ── OTLP Log Export ──────────────────────────────────────────────
    # Set to ship logs to Grafana Cloud, Datadog, or a self-hosted OTEL Collector.
    # Omit (or null) to disable. Override with AEGIS_OTLP_ENDPOINT.
    # otlp_endpoint: "http://otel-collector:4317"    # gRPC
    # otlp_endpoint: "https://otlp-gateway.grafana.net/v1/logs"  # HTTP
    # otlp_protocol: grpc   # grpc (default) | http. Override: AEGIS_OTLP_PROTOCOL
    # otlp_headers:         # auth headers; values support env: / secret:
    #   Authorization: "env:OTLP_AUTH_TOKEN"
    # otlp_min_level: info  # min level exported. Override: AEGIS_OTLP_LOG_LEVEL
    # otlp_service_name: aegis-orchestrator  # service.name attr. Override: AEGIS_OTLP_SERVICE_NAME
    # batch:
    #   max_queue_size: 2048
    #   scheduled_delay_ms: 5000
    #   max_export_batch_size: 512
    #   export_timeout_ms: 10000
    # tls:
    #   verify: true
    #   ca_cert_path: null
  metrics:
    # Enable Prometheus metrics exposition. Default: true
    enabled: true
    # Prometheus metrics exposition port. Default: 9091
    port: 9091
    # HTTP path for scraping. Default: /metrics
    path: "/metrics"
  tracing:
    # Enable distributed tracing via OpenTelemetry. Default: false
    enabled: false
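
The `security_contexts` entries in the configuration above state that the deny list overrides any matching capability. The Python sketch below models only that rule. It assumes `tool_pattern` and `deny_list` entries behave like shell-style globs, and it ignores path, subcommand, and domain allowlists as well as rate limits. The function name `is_tool_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def is_tool_allowed(context: Dict[str, Any], tool: str) -> bool:
    """Deny-list entries win; otherwise the tool must match some capability pattern."""
    # 1. An explicit deny always overrides a matching capability.
    if any(fnmatchcase(tool, pattern) for pattern in context.get("deny_list", [])):
        return False
    # 2. Otherwise the tool must match at least one capability's tool_pattern.
    return any(
        fnmatchcase(tool, capability["tool_pattern"])
        for capability in context.get("capabilities", [])
    )


# The zaru-free context from the configuration, reduced to its patterns.
zaru_free = {
    "name": "zaru-free",
    "capabilities": [{"tool_pattern": "fs.*"}, {"tool_pattern": "cmd.run"}],
    "deny_list": ["web.*"],
}
```

With this context, `fs.read` and `cmd.run` are allowed, `web.fetch` is rejected by the deny list, and a tool such as `aegis.agent.delete` is rejected because no capability matches it.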

Section Reference

`spec.node`

Required. Identifies this node within the AEGIS cluster.

Key	Type	Required	Default	Description
`id`	string	✅	—	Unique stable node identifier. UUID recommended. Fails validation if empty.
`type`	enum	✅	—	`edge` \| `orchestrator` \| `hybrid`
`region`	string	❌	null	Geographic region (e.g., `"us-east-1"`)
`tags`	string[]	❌	`[]`	Capability tags matched against `execution_targets` in agent manifests
`resources.cpu_cores`	u32	❌	—	Available CPU cores
`resources.memory_gb`	u32	❌	—	Available RAM in GB
`resources.disk_gb`	u32	❌	—	Available disk in GB
`resources.gpu`	bool	❌	`false`	GPU available

`spec.image_tag`

Optional. Docker image tag for AEGIS-owned services.

Key	Type	Default	Description
`image_tag`	string	`<binary version>`	Tag applied to all AEGIS-owned Docker images. Written by `aegis init --tag` and updated by `aegis update`. When absent, defaults to the version string embedded in the `aegis` binary.

`spec.llm_providers`

Required array. At least one entry with at least one model is required.

Key	Type	Required	Default	Description
`name`	string	✅	—	Unique provider name
`type`	enum	✅	—	`openai` \| `anthropic` \| `ollama` \| `openai-compatible`
`endpoint`	string	✅	—	API endpoint URL
`api_key`	string	❌	null	API key. Supports `env:` and `secret:`.
`enabled`	bool	❌	`true`	Whether this provider is active
`models[].alias`	string	✅	—	Alias referenced in agent manifests
`models[].model`	string	✅	—	Provider-side model identifier
`models[].capabilities`	string[]	✅	—	`chat` \| `embedding` \| `reasoning` \| `vision` \| `code`
`models[].context_window`	u32	✅	—	Max context window in tokens
`models[].cost_per_1k_tokens`	f64	❌	`0.0`	Cost per 1K tokens (0.0 for free/local)

`spec.llm_selection`

Optional. Controls runtime provider selection strategy.

Key	Type	Default	Description
`strategy`	enum	`prefer-local`	`prefer-local` \| `prefer-cloud` \| `cost-optimized` \| `latency-optimized`
`default_provider`	string	null	Provider to use when no preference is specified
`fallback_provider`	string	null	Provider to use if the primary fails
`max_retries`	u32	`3`	Maximum retry attempts on LLM failure
`retry_delay_ms`	u64	`1000`	Delay between retries in milliseconds
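
To show how `max_retries`, `retry_delay_ms`, and `fallback_provider` could interact, here is a hedged Python sketch. It is not the orchestrator's selection logic. It assumes `max_retries` counts total attempts against the primary provider and that the fallback is tried once afterwards. The names `call_with_fallback` and `call` are hypothetical.

```python
import time
from typing import Callable, Optional


def call_with_fallback(call: Callable[[str], str],
                       primary: str,
                       fallback: Optional[str] = None,
                       max_retries: int = 3,
                       retry_delay_ms: int = 1000,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Try the primary provider up to max_retries times, then the fallback once."""
    attempts = max(1, max_retries)  # always make at least one attempt
    last_error: Optional[Exception] = None

    for attempt in range(attempts):
        try:
            return call(primary)
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(retry_delay_ms / 1000)  # fixed delay between attempts

    if fallback is not None:
        return call(fallback)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable so the retry timing can be observed in tests without real delays.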

`spec.runtime`

Optional. Controls how agent containers are launched.

Key	Type	Default	Description
`bootstrap_script`	string	`assets/bootstrap.py`	Path to bootstrap script relative to orchestrator binary
`default_isolation`	enum	`inherit`	`docker` \| `podman` \| `firecracker` \| `inherit` \| `process`
`container_socket_path`	string	(platform default)	Container runtime socket path. Docker: `/var/run/docker.sock`. Podman rootless: `/run/user/<UID>/podman/podman.sock`. Podman system: `/run/podman/podman.sock`. Can be overridden via `CONTAINER_HOST` or `DOCKER_HOST` env vars.
`container_network_mode`	string	null	Container network name for agent containers
`orchestrator_url`	string	`http://localhost:8088`	Callback URL reachable from inside agent containers
`nfs_server_host`	string	null	Required when `spec.storage.backend` is `seaweedfs`. NFS server host as resolved from the host OS where the Docker daemon runs, not from inside agent containers. See the platform table below.
`nfs_port`	u16	`2049`	NFS server port
`nfs_mountport`	u16	`2049`	NFS mountd port

nfs_server_host by environment:

Environment	Value
WSL2 / Linux native	`"127.0.0.1"`
Docker Desktop (Windows/macOS)	`"host.docker.internal"`
Linux bridge network	`"172.17.0.1"` (Docker bridge gateway)
Remote / VM host	`<physical host IP>`
Via env var	`"env:AEGIS_NFS_HOST"`
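
The socket precedence described for `container_socket_path` (`CONTAINER_HOST` over `DOCKER_HOST`, either one over the config value, then the platform default) can be sketched as follows. The function name `resolve_container_socket` is hypothetical, and stripping the `unix://` scheme from the environment value is an assumption about how a URI maps to a socket path.

```python
import os
from typing import Mapping, Optional


def resolve_container_socket(config_path: Optional[str] = None,
                             env: Optional[Mapping[str, str]] = None,
                             platform_default: str = "/var/run/docker.sock") -> str:
    """Pick the container runtime socket using the documented precedence."""
    env = os.environ if env is None else env

    # Environment variables override the config file; CONTAINER_HOST wins over DOCKER_HOST.
    for var in ("CONTAINER_HOST", "DOCKER_HOST"):
        value = env.get(var)
        if value:
            # Assumption: a unix:// URI maps directly to a filesystem socket path.
            return value[len("unix://"):] if value.startswith("unix://") else value

    # Fall back to container_socket_path, then to the platform default.
    return config_path or platform_default
```

Podman users who export `CONTAINER_HOST=unix:///run/user/1000/podman/podman.sock` would therefore get the Podman socket even if `container_socket_path` still points at Docker.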

`spec.network`

Optional. Configures ports and TLS.

Key	Type	Default	Description
`bind_address`	string	`0.0.0.0`	Network interface to bind all listeners
`port`	u16	`8088`	HTTP REST API port
`grpc_port`	u16	`50051`	gRPC API port
`orchestrator_endpoint`	string	null	WebSocket URL for edge → orchestrator connection (edge nodes only)
`heartbeat_interval_seconds`	u64	`30`	Health check ping interval
`tls.cert_path`	string	—	TLS certificate path
`tls.key_path`	string	—	TLS private key path
`tls.ca_path`	string	null	CA certificate path (optional)

`spec.cluster`

Optional. Configures the node's role in the multi-node cluster topology.

Key	Type	Required	Default	Description
`enabled`	bool	❌	`false`	Enable cluster mode.
`role`	enum	❌	`hybrid`	Node role: `controller` \| `worker` \| `hybrid`
`controller.endpoint`	string	❌	—	gRPC endpoint of the controller node. Required for `role: worker`.
`controller.token`	string	❌	—	Bootstrap token for initial attestation. Supports `env:`.
`cluster_grpc_port`	u16	❌	`50056`	Port for `NodeClusterService` (controllers/hybrids)
`peers`	string[]	❌	`[]`	Static list of peer controller addresses
`node_keypair_path`	string	✅	—	Path to the persistent Ed25519 keypair file for node identity
`heartbeat_interval_secs`	u64	❌	`30`	Interval in seconds for worker heartbeats
`token_refresh_margin_secs`	u64	❌	`120`	Token re-attestation margin in seconds
`tls.enabled`	bool	❌	`true`	Enable TLS for cluster communication
`tls.cert_path`	string	❌	—	Path to node TLS certificate
`tls.key_path`	string	❌	—	Path to node TLS private key
`tls.ca_cert`	string	❌	—	Path to CA certificate for peer verification

`spec.storage`

Optional. Defaults to the local_host backend.

Key	Type	Default	Description
`backend`	enum	`local_host`	`seaweedfs` \| `local_host`
`fallback_to_local`	bool	`true`	Gracefully fall back to local storage when SeaweedFS is unreachable
`nfs_port`	u16	`2049`	NFS Server Gateway listen port
`seaweedfs.filer_url`	string	`http://localhost:8888`	SeaweedFS Filer endpoint
`seaweedfs.mount_point`	string	`/var/lib/aegis/storage`	Host filesystem mount point
`seaweedfs.default_ttl_hours`	u32	`24`	Default TTL for ephemeral volumes (hours)
`seaweedfs.default_size_limit_mb`	u64	`1000`	Default per-volume size quota (MB)
`seaweedfs.max_size_limit_mb`	u64	`10000`	Hard ceiling on volume size (MB)
`seaweedfs.gc_interval_minutes`	u32	`60`	Expired volume GC interval (minutes)
`seaweedfs.s3_endpoint`	string	null	Optional SeaweedFS S3 gateway endpoint
`seaweedfs.s3_region`	string	`us-east-1`	S3 gateway region
`local_host.base_path`	string	`/var/lib/aegis/local-volumes`	Base directory for local volume storage
`local_host.default_ttl_hours`	u32	`24`	Default TTL for ephemeral volumes (hours)
`local_host.default_size_limit_mb`	u64	`1000`	Default per-volume quota (MB)
`local_host.max_size_limit_mb`	u64	`10000`	Hard ceiling on volume size (MB)
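
For example, switching to the SeaweedFS backend while keeping the local fallback could look like this. Every value other than backend is the documented default; the nesting of the seaweedfs.* and local_host.* keys is an assumption based on the dotted names in the table.

```yaml
spec:
  storage:
    backend: seaweedfs
    fallback_to_local: true            # fall back if the Filer is unreachable
    nfs_port: 2049
    seaweedfs:
      filer_url: "http://localhost:8888"
      mount_point: "/var/lib/aegis/storage"
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
    local_host:
      base_path: "/var/lib/aegis/local-volumes"   # used when falling back
```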

`spec.deploy_builtins`

Optional. Default: false.

Key	Type	Default	Description
`deploy_builtins`	bool	`false`	Deploy vendored built-in agent and workflow templates on startup. Includes agent-creator-agent, workflow-generator-planner-agent, judge agents, intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator, and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows. Required for `aegis.agent.generate`, `aegis.workflow.generate`, and `aegis.execute.intent` to function.

`spec.mcp_servers`

Optional array. Each entry defines an external MCP Tool Server process.

Key	Type	Default	Description
`name`	string	—	Unique server name on this node
`enabled`	bool	`true`	Whether to start this server
`executable`	string	—	Executable path
`args`	string[]	`[]`	Command-line arguments
`capabilities`	string[]	`[]`	Tool names this server provides (used for routing)
`credentials`	map	`{}`	API keys/tokens injected as env vars. Values support `secret:`.
`environment`	map	`{}`	Non-secret env vars for the server process
`health_check.interval_seconds`	u64	`60`	Health check interval
`health_check.timeout_seconds`	u64	`5`	Health check timeout
`health_check.method`	string	`tools/list`	MCP method used to health-check the server
`resource_limits.cpu_millicores`	u32	`1000`	CPU limit (1000 = 1 core)
`resource_limits.memory_mb`	u32	`512`	Memory limit (MB)
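
A single entry might be sketched as follows. The server name, executable path, argument, tool name, environment variable names, and secret path are all hypothetical; only the key names and defaults come from the table above.

```yaml
spec:
  mcp_servers:
    - name: "web-search"                              # hypothetical server name
      enabled: true
      executable: "/usr/local/bin/web-search-mcp"     # hypothetical binary
      args: ["--stdio"]                               # hypothetical argument
      capabilities: ["web.search"]                    # hypothetical tool name
      credentials:
        SEARCH_API_KEY: "secret:aegis/mcp/web-search/api-key"   # hypothetical secret path
      environment:
        LOG_LEVEL: "info"                             # hypothetical non-secret variable
      health_check:
        interval_seconds: 60                          # default
        timeout_seconds: 5                            # default
        method: "tools/list"                          # default
      resource_limits:
        cpu_millicores: 1000                          # default: 1 core
        memory_mb: 512                                # default
```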

`spec.seal`

Optional. Enables cryptographic agent authorization via SEAL. Required in production.

Key	Type	Default	Description
`private_key_path`	string	—	Path to RSA private key PEM for signing `SecurityToken` JWTs
`public_key_path`	string	—	Path to RSA public key PEM for verifying `SecurityToken` JWTs
`issuer`	string	`aegis-orchestrator`	JWT `iss` claim
`audiences`	string[]	`[aegis-agents]`	JWT `aud` claims
`token_ttl_seconds`	u64	`3600`	`SecurityToken` lifetime in seconds
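
A minimal sketch of this block, using the documented defaults for everything except the key paths, which are placeholders:

```yaml
spec:
  seal:
    private_key_path: "/etc/aegis/seal/private.pem"   # placeholder path
    public_key_path: "/etc/aegis/seal/public.pem"     # placeholder path
    issuer: "aegis-orchestrator"                      # default iss claim
    audiences: ["aegis-agents"]                       # default aud claims
    token_ttl_seconds: 3600                           # default: 1 hour
```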

`spec.security_contexts`

Optional array. Named permission boundaries assigned to agents at execution time.

Each entry (SecurityContextDefinition):

Key	Type	Default	Description
`name`	string	—	Unique context name, referenced in agent manifests
`description`	string	`""`	Human-readable description
`capabilities`	array	`[]`	Tool permissions granted by this context
`deny_list`	string[]	`[]`	Explicit tool deny list; overrides any matching capability

Each capabilities entry (CapabilityDefinition):

Key	Type	Description
`tool_pattern`	string	Tool name pattern (e.g., `"fs.*"`, `"cmd.run"`, `"web.fetch"`, `"*"`)
`path_allowlist`	string[]	Allowed filesystem path prefixes (for `fs.*` tools)
`subcommand_allowlist`	object	Map of base command → allowed first positional arguments (for `cmd.run`). Example: `{cargo: ["build","test"]}`.
`domain_allowlist`	string[]	Allowed network domain suffixes (for `web.*` tools)
`rate_limit.calls`	u32	Number of calls allowed per window
`rate_limit.per_seconds`	u32	Window size in seconds
`max_response_size`	u64	Max response size in bytes
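
To show how the two tables fit together, here is a sketch of one context with three capabilities. The context name, paths, domain, limits, and the denied tool name are invented for illustration, and the nesting of rate_limit is assumed from the dotted key names.

```yaml
spec:
  security_contexts:
    - name: "rust-builder"                       # hypothetical context name
      description: "Build and test Rust code inside the workspace"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: ["/workspace"]         # hypothetical path prefix
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            cargo: ["build", "test"]             # example from the table above
        - tool_pattern: "web.fetch"
          domain_allowlist: ["crates.io"]        # hypothetical domain suffix
          rate_limit:
            calls: 30                            # 30 calls ...
            per_seconds: 60                      # ... per 60-second window
          max_response_size: 1048576             # 1 MiB, in bytes
      deny_list: ["fs.delete"]                   # hypothetical tool name; overrides fs.*
```

Because deny_list overrides any matching capability, the fs.* grant in this sketch would not extend to the denied tool.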

`spec.builtin_dispatchers`

Optional. Configures the cmd Dispatch Protocol handler. This is an in-process handler — it is not an MCP server process.

Key	Type	Default	Description
`cmd.enabled`	bool	`true`	Enable `cmd.run` dispatch
`cmd.default_timeout_secs`	u64	`60`	Default subprocess timeout
`cmd.max_timeout_ceiling_secs`	u64	`300`	Maximum timeout an agent manifest may request
`cmd.max_output_bytes`	u64	`524288`	Max stdout + stderr per subprocess (512 KB)
`cmd.max_concurrent_per_execution`	u32	`1`	Max concurrent subprocesses per execution
`cmd.global_env_denylist`	string[]	(see sample)	Env vars that must never be forwarded to agent subprocesses
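
A sketch of the handler configuration with the documented defaults is shown here. The denylist entries are illustrative only and are not the shipped default list; the cmd.* keys are assumed to nest under a cmd mapping.

```yaml
spec:
  builtin_dispatchers:
    cmd:
      enabled: true
      default_timeout_secs: 60
      max_timeout_ceiling_secs: 300      # upper bound for manifest-requested timeouts
      max_output_bytes: 524288           # 512 KB of stdout + stderr
      max_concurrent_per_execution: 1
      global_env_denylist:               # illustrative entries, not the defaults
        - "OPENBAO_SECRET_ID"
        - "AWS_SECRET_ACCESS_KEY"
```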

`spec.iam`

Optional. Configures IAM/OIDC as the trusted JWT issuer. Omit to disable JWT validation (dev only; never in production).

Key	Type	Default	Description
`realms[].slug`	string	—	Realm name matching the Keycloak configuration
`realms[].issuer_url`	string	—	OIDC issuer URL
`realms[].jwks_uri`	string	—	JWKS endpoint for JWT signature verification
`realms[].audience`	string	—	Expected `aud` claim in tokens from this realm
`realms[].kind`	enum	—	`system` \| `consumer` \| `tenant`
`jwks_cache_ttl_seconds`	u32	`300`	JWKS key cache TTL; refreshed automatically to support key rotation
`claims.zaru_tier`	string	`zaru_tier`	Keycloak claim name carrying `ZaruTier`
`claims.aegis_role`	string	`aegis_role`	Keycloak claim name carrying `AegisRole`
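
A single-realm configuration could look roughly like this. The realm slug, hostname, and audience are hypothetical; the URL layout follows Keycloak's standard realm endpoints, and realms[] is assumed to be a YAML sequence under spec.iam.

```yaml
spec:
  iam:
    realms:
      - slug: "aegis-system"                       # hypothetical realm name
        issuer_url: "https://auth.example.com/realms/aegis-system"
        jwks_uri: "https://auth.example.com/realms/aegis-system/protocol/openid-connect/certs"
        audience: "aegis-orchestrator"             # hypothetical aud value
        kind: system
    jwks_cache_ttl_seconds: 300                    # default
    claims:
      zaru_tier: "zaru_tier"                       # default claim name
      aegis_role: "aegis_role"                     # default claim name
```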

`spec.grpc_auth`

Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint. Requires spec.iam to be configured.

Key	Type	Default	Description
`enabled`	bool	`true`	Enforce JWT validation on gRPC methods
`exempt_methods`	string[]	`[/aegis.v1.InnerLoop/Generate]`	gRPC method full paths exempt from auth. The inner-loop bootstrap channel must always be exempt.
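
Written out explicitly, the defaults correspond to the following fragment. When adding further exemptions, the inner-loop method should remain in the list.

```yaml
spec:
  grpc_auth:
    enabled: true
    exempt_methods:
      - "/aegis.v1.InnerLoop/Generate"   # inner-loop bootstrap channel; keep exempt
```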

`spec.secrets`

Optional. Configures OpenBao as the secrets backend. Follows the Keymaster Pattern — agents never access OpenBao directly.

Key	Type	Default	Description
`backend.address`	string	—	OpenBao server URL
`backend.auth_method`	string	`approle`	Authentication method. Only `approle` is currently supported.
`backend.approle.role_id`	string	—	AppRole Role ID (public; safe to commit)
`backend.approle.secret_id_env_var`	string	`OPENBAO_SECRET_ID`	Name of the environment variable containing the AppRole Secret ID. Never commit the actual Secret ID.
`backend.namespace`	string	—	OpenBao namespace (maps 1:1 to an IAM realm)
`backend.tls.ca_cert`	string	null	CA certificate path
`backend.tls.client_cert`	string	null	mTLS client certificate path
`backend.tls.client_key`	string	null	mTLS client key path
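
An AppRole setup might be sketched as follows. The server address, role ID placeholder, namespace, and CA path are assumptions; the Secret ID itself never appears in the file, only the name of the environment variable that holds it.

```yaml
spec:
  secrets:
    backend:
      address: "https://openbao.example.internal:8200"   # hypothetical URL
      auth_method: approle
      approle:
        role_id: "<your-approle-role-id>"                # placeholder
        secret_id_env_var: "OPENBAO_SECRET_ID"           # default variable name
      namespace: "aegis-system"                          # hypothetical; maps to an IAM realm
      tls:
        ca_cert: "/etc/aegis/openbao/ca.crt"             # placeholder path
```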

`spec.database`

Optional. PostgreSQL connection for persistent state (executions, patterns, workflows). If omitted, the daemon uses in-memory repositories (development mode only).

Key	Type	Default	Description
`url`	string	—	PostgreSQL connection URL. Supports `env:` and `secret:`.
`max_connections`	u32	`5`	Maximum connections in the pool
`connect_timeout_seconds`	u64	`5`	Connection timeout
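
For instance, the connection URL can be kept out of the file with an env: reference. The variable name DATABASE_URL is an assumption; the other values are the documented defaults.

```yaml
spec:
  database:
    url: "env:DATABASE_URL"        # hypothetical variable name, read at startup
    max_connections: 5
    connect_timeout_seconds: 5
```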

`spec.temporal`

Optional. Temporal workflow engine configuration for durable workflow execution. If omitted, workflow orchestration features are unavailable.

Key	Type	Default	Description
`address`	string	`temporal:7233`	Temporal gRPC server address
`worker_http_endpoint`	string	`http://localhost:3000`	HTTP endpoint for Temporal worker callbacks. Supports `env:`.
`worker_secret`	string	null	Shared secret for authenticating worker callbacks. Supports `env:` and `secret:`.
`namespace`	string	`default`	Temporal namespace
`task_queue`	string	`aegis-agents`	Temporal task queue name
`max_connection_retries`	i32	`30`	Maximum number of connection retries when establishing the Temporal client.
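
A sketch using the documented defaults, with the worker secret resolved from the secrets backend. The secret path is hypothetical.

```yaml
spec:
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://localhost:3000"
    worker_secret: "secret:aegis/temporal/worker-secret"   # hypothetical secret path
    namespace: "default"
    task_queue: "aegis-agents"
    max_connection_retries: 30
```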

`spec.cortex`

Optional. Cortex memory and learning service configuration. If omitted or grpc_url is null, the daemon runs in memoryless mode — patterns are simply not stored.

Key	Type	Default	Description
`grpc_url`	string	null	Cortex gRPC service URL. Supports `env:`.
`api_key`	string	null	API key for 100monkeys hosted Cortex (Zaru SaaS). Supports `env:` and `secret:` prefixes. When absent, the orchestrator connects without authentication (local/open cortex).
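
As a sketch, a node pointing at a Cortex instance might use the fragment below. The environment variable name and the secret path are assumptions; omit api_key for a local, unauthenticated Cortex.

```yaml
spec:
  cortex:
    grpc_url: "env:CORTEX_GRPC_URL"              # hypothetical variable name
    api_key: "secret:aegis/cortex/api-key"       # hypothetical secret path
```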

`spec.discovery`

Semantic agent and workflow search (aegis.agent.search, aegis.workflow.search) is powered by the Cortex service. When spec.cortex is configured with a valid grpc_url and api_key, discovery is available automatically; no separate spec.discovery configuration is required.

`spec.seal_gateway`

Optional. Configures forwarding of external tool invocations to the standalone SEAL tooling gateway.

Key	Type	Default	Description
`url`	string	null	gRPC endpoint URL of `aegis-seal-gateway` (example: `http://aegis-seal-gateway:50055`).

If omitted, the orchestrator does not forward unknown or external tools to the gateway and continues with built-in routing only.
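
Using the example URL from the table, enabling gateway forwarding would look like this:

```yaml
spec:
  seal_gateway:
    url: "http://aegis-seal-gateway:50055"   # example endpoint from the table above
```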

`spec.max_execution_list_limit`

Optional. Upper bound on executions returned by a single list_executions request.

Key	Type	Default	Description
`max_execution_list_limit`	usize	`1000`	Maximum number of executions returned by a single `list_executions` request. Protects against excessive memory usage.

`spec.observability`

Optional.

`logging`

Key	Type	Default	Env Override	Description
`logging.level`	enum	`info`	`RUST_LOG`	`error` \| `warn` \| `info` \| `debug` \| `trace`
`logging.format`	enum	`json`	`AEGIS_LOG_FORMAT`	`json` \| `text`
`logging.file`	string	null	—	Log file path. Omit to write to stdout.
`logging.otlp_endpoint`	string	null	`AEGIS_OTLP_ENDPOINT`	OTLP collector endpoint. Setting this enables OTLP log export.
`logging.otlp_protocol`	enum	`grpc`	`AEGIS_OTLP_PROTOCOL`	`grpc` \| `http`
`logging.otlp_headers`	map	`{}`	`AEGIS_OTLP_HEADERS`	Key-value headers sent with every OTLP export RPC. Values support `env:` and `secret:` prefixes. Env var uses comma-separated `key=value` pairs.
`logging.otlp_min_level`	string	`info`	`AEGIS_OTLP_LOG_LEVEL`	Minimum log level forwarded to OTLP (does not affect stdout).
`logging.otlp_service_name`	string	`aegis-orchestrator`	`AEGIS_OTLP_SERVICE_NAME`	`service.name` resource attribute
`logging.otlp_batch.max_queue_size`	u32	`2048`	—	Maximum buffered records before export
`logging.otlp_batch.scheduled_delay_ms`	u64	`5000`	—	Batch flush interval (ms)
`logging.otlp_batch.max_export_batch_size`	u32	`512`	—	Records per export RPC
`logging.otlp_batch.export_timeout_ms`	u64	`10000`	—	Per-call timeout (ms)
`logging.otlp_tls.verify`	bool	`true`	—	Verify OTLP endpoint TLS certificate
`logging.otlp_tls.ca_cert_path`	string	null	—	Custom CA certificate for self-signed backends

`metrics`

Key	Type	Default	Description
`metrics.enabled`	bool	`true`	Enable Prometheus metrics
`metrics.port`	u16	`9091`	Prometheus metrics exposition port
`metrics.path`	string	`/metrics`	HTTP path for scraping

`tracing`

Key	Type	Default	Description
`tracing.enabled`	bool	`false`	Enable distributed tracing via OpenTelemetry
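
Putting the three subsections together, an observability block with OTLP log export enabled might be sketched as follows. The collector hostname, the header name, and the environment variable are assumptions; port 4317 is the conventional OTLP gRPC port, and the remaining values are the documented defaults except otlp_min_level, which is raised to warn for illustration.

```yaml
spec:
  observability:
    logging:
      level: info
      format: json
      otlp_endpoint: "http://otel-collector:4317"   # hypothetical collector host
      otlp_protocol: grpc
      otlp_headers:
        authorization: "env:OTLP_AUTH_HEADER"       # hypothetical header and variable
      otlp_min_level: warn                          # forward only warn and above to OTLP
      otlp_service_name: "aegis-orchestrator"
      otlp_batch:
        max_queue_size: 2048
        scheduled_delay_ms: 5000
        max_export_batch_size: 512
        export_timeout_ms: 10000
      otlp_tls:
        verify: true
    metrics:
      enabled: true
      port: 9091
      path: "/metrics"
    tracing:
      enabled: false
```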

Configuration Reference