Configuration Reference

Full annotated aegis-config.yaml with all supported keys, types, defaults, and descriptions.

Daemon Configuration

The AEGIS daemon is configured via a single YAML file, by default aegis-config.yaml in the working directory. Pass a custom path with --config:

aegis daemon start --config /etc/aegis/config.yaml

Config Discovery Order

The daemon searches for a config file in the following order. The first match wins:

  1. --config <path> CLI flag
  2. AEGIS_CONFIG_PATH environment variable
  3. ./aegis-config.yaml (working directory)
  4. ~/.aegis/config.yaml
  5. /etc/aegis/config.yaml (Linux/macOS)

Credential Resolution

Any string value in the config file can use a credential prefix instead of a literal:

Prefix                 Resolution
------                 ----------
env:VAR_NAME           Read from the daemon process environment at startup.
secret:path/to/secret  Resolved from OpenBao at runtime (requires spec.secrets.backend to be configured).
(literal)              Plaintext. Avoid for secrets.

The recommended production pattern is secret: references for all API keys and credentials. Use env: as a fallback when OpenBao is not available.
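
As an illustration, the three forms might appear side by side as in the fragment below. The key names come from the annotated configuration on this page; the secret path aegis-system/cortex/api-key is a made-up placeholder, not a path the platform defines.

```yaml
spec:
  database:
    # env: prefix, read from the daemon process environment at startup
    url: "env:AEGIS_DATABASE_URL"
  cortex:
    grpc_url: "env:CORTEX_GRPC_URL"
    # secret: prefix, resolved from OpenBao at runtime (placeholder path)
    api_key: "secret:aegis-system/cortex/api-key"
  observability:
    logging:
      # literal value, acceptable here because a log level is not a secret
      level: info
```

The secret: line only resolves when spec.secrets.backend is configured; without OpenBao, switching it to an env: reference is the documented fallback.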


Multi-Node Cluster Topology

AEGIS supports a distributed Controller-Worker topology for high availability and horizontal scaling. Nodes can be configured as controllers (scheduling and management), workers (agent execution), or hybrid nodes.

Key features of the cluster topology include:

  • Distributed Scheduling: Controllers route agent executions to the most suitable available workers.
  • Node Identity: Each node maintains a stable identity via a persistent Ed25519 keypair, used for secure attestation and SealNodeEnvelope signing.
  • Secure Communication: Inter-node traffic is protected by mTLS and SEAL-derived security tokens.
  • Seamless Scaling: New workers can be added to the cluster by pointing them at the controller endpoint and providing valid certificates.
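
A minimal sketch of the cluster section for a dedicated controller node, using only keys from the spec.cluster reference on this page. Whether a controller needs further settings beyond these is not stated here, so treat the fragment as a starting point under that assumption rather than a verified configuration.

```yaml
spec:
  cluster:
    enabled: true
    # controller: manages routing and registration; does not run executions
    role: controller
    # Port for NodeClusterService (documented default)
    cluster_grpc_port: 50056
    # Persistent Ed25519 keypair that gives the node its stable identity
    node_keypair_path: "/etc/aegis/node_keypair.pem"
    # mTLS material for inter-node traffic
    tls:
      enabled: true
      cert_path: "/etc/aegis/certs/node.crt"
      key_path: "/etc/aegis/certs/node.key"
      ca_cert: "/etc/aegis/certs/ca.crt"
```

A worker would then set role: worker and point its controller endpoint at this node's cluster_grpc_port, as the annotated configuration does with grpc://aegis-controller:50056.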

Manifest Envelope

All aegis-config.yaml files use the Kubernetes-style manifest envelope:

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "my-aegis-node"          # required
  version: "1.0.0"               # optional
  labels:                        # optional
    environment: "production"
spec:
  # All configuration sections documented below sit under spec:

The sections documented below all belong under the top-level spec: key.
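
Combining the envelope with the two sections this page marks as required (spec.node and spec.llm_providers) gives roughly the smallest file one might start from. This is a sketch that assumes every section marked optional can be omitted; it has not been checked against the daemon's validator, and the metadata name and model choice are arbitrary.

```yaml
apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "dev-node"                 # arbitrary example name
spec:
  node:
    id: "env:AEGIS_NODE_ID"        # required
    type: orchestrator             # required
  llm_providers:                   # at least one provider with one model
    - name: ollama-local
      type: ollama
      endpoint: "http://localhost:11434"
      models:
        - alias: default
          model: qwen2.5-coder:32b
          capabilities: [chat, code]
          context_window: 32000
```

Here enabled and cost_per_1k_tokens are left out and fall back to their documented defaults (true and 0.0). Note that node and llm_providers are indented one level so that they nest under spec:.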


Full Annotated Configuration

apiVersion: 100monkeys.ai/v1
kind: NodeConfig
metadata:
  name: "aegis-node"
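spec:
  # The sections below are printed flush-left. In a real file, indent each
  # of them one level so that they nest under spec:.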

# ─── Node Identity ────────────────────────────────────────────────────────────
node:
  # Unique stable node identifier. UUID recommended. Required.
  id: "env:AEGIS_NODE_ID"
  # Node type. Options: edge | orchestrator | hybrid. Required.
  type: orchestrator
  # Geographic region, e.g. "us-east-1". Optional.
  region: "us-east-1"
  # Capability tags used to match agent manifest execution_targets. Optional.
  tags:
    - gpu
    - high-memory
  # Physical resources available on this node. Optional.
  resources:
    cpu_cores: 8
    memory_gb: 32
    disk_gb: 500
    gpu: false

# ─── Image Tag ────────────────────────────────────────────────────────────────
# Docker image tag for all AEGIS-owned service containers.
# Written by `aegis init --tag <TAG>` and updated by `aegis update`.
# When absent, both commands default to the version of the aegis binary.
image_tag: "0.1.0-pre-alpha"

# ─── LLM Providers ────────────────────────────────────────────────────────────
# Array of LLM provider configurations. At least one entry is required.
llm_providers:
  - name: openai-primary           # Unique provider name on this node. Required.
    type: openai                   # openai | anthropic | ollama | openai-compatible. Required.
    endpoint: "https://api.openai.com/v1"  # API endpoint URL. Required.
    api_key: "env:OPENAI_API_KEY"  # Supports env: and secret: prefixes.
    enabled: true                  # Default: true
    models:                        # Must have at least one entry. Required.
      - alias: default             # Alias referenced in agent manifests. Required.
        model: gpt-4o              # Provider-side model name. Required.
        capabilities:              # chat | embedding | reasoning | vision | code. Required.
          - chat
          - code
          - reasoning
        context_window: 128000     # Max context window in tokens. Required.
        cost_per_1k_tokens: 0.005  # Default: 0.0 (free/local)
      - alias: fast
        model: gpt-4o-mini
        capabilities: [chat, code]
        context_window: 128000
        cost_per_1k_tokens: 0.00015

  - name: anthropic-primary
    type: anthropic
    endpoint: "https://api.anthropic.com/v1"
    api_key: "secret:aegis-system/llm/anthropic-api-key"
    enabled: true
    models:
      - alias: smart
        model: claude-sonnet-4-5
        capabilities: [chat, code, reasoning]
        context_window: 200000
        cost_per_1k_tokens: 0.003

  - name: ollama-local
    type: ollama
    endpoint: "http://localhost:11434"
    enabled: true
    models:
      - alias: local
        model: qwen2.5-coder:32b
        capabilities: [chat, code]
        context_window: 32000
        cost_per_1k_tokens: 0.0

# ─── LLM Selection Strategy ───────────────────────────────────────────────────
# Optional. Controls how the orchestrator picks providers at runtime.
llm_selection:
  # prefer-local | prefer-cloud | cost-optimized | latency-optimized
  # Default: prefer-local
  strategy: prefer-local
  # Provider name to use when no preference is specified. Default: null (auto-select).
  default_provider: openai-primary
  # Provider to use if the primary fails. Default: null.
  fallback_provider: ollama-local
  # Maximum retry attempts on LLM failure. Default: 3
  max_retries: 3
  # Delay between retries in milliseconds. Default: 1000
  retry_delay_ms: 1000

# ─── Execution Limits ────────────────────────────────────────────────────────
# Optional. Protects list_executions from returning unbounded result sets. Defaults to 1000 when omitted.
max_execution_list_limit: 1000

# ─── Runtime ──────────────────────────────────────────────────────────────────
# Optional. Has safe defaults for Docker.
runtime:
  # Path to the agent bootstrap script, relative to the orchestrator binary.
  # Default: assets/bootstrap.py
  bootstrap_script: "assets/bootstrap.py"
  # Default isolation level for executions that do not specify one.
  # Options: docker | podman | firecracker | inherit | process
  # Default: inherit (uses the node's compiled-in default)
  default_isolation: docker
  # Container runtime socket path. Omit to use the platform default:
  #   Docker on Linux/macOS:    /var/run/docker.sock
  #   Podman rootless:          /run/user/<UID>/podman/podman.sock
  #   Podman system (root):     /run/podman/podman.sock
  # Can also be set via CONTAINER_HOST or DOCKER_HOST env vars (see below).
  container_socket_path: "/var/run/docker.sock"
  # Container network name for agent containers.
  # Supports env: prefix. Example: "env:AEGIS_CONTAINER_NETWORK"
  container_network_mode: "aegis-network"
  # URL that agent containers use to call back to the orchestrator.
  # Must be reachable from inside containers, not from the host.
  # Default: http://localhost:8088
  orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"

  # ── Container Host Environment Variables ──────────────────────────────────
  # The container socket can also be configured via environment variables.
  # These override container_socket_path when set:
  #
  #   CONTAINER_HOST=unix:///run/user/1000/podman/podman.sock
  #   DOCKER_HOST=unix:///var/run/docker.sock
  #
  # CONTAINER_HOST takes precedence over DOCKER_HOST when both are set.
  # Podman users should set CONTAINER_HOST to point to the Podman socket.

  # ── NFS Storage Gateway (required when spec.storage.backend: seaweedfs) ───
  # Hostname/IP of the NFS server. Must resolve from the HOST OS where the
  # Docker daemon runs — NOT from inside agent containers.
  # Supports env: prefix.
  #
  # Platform-specific values:
  #   WSL2 / Linux native:          "127.0.0.1"
  #   Docker Desktop (Win/Mac):     "host.docker.internal"
  #   Linux bridge network:         "172.17.0.1"  (Docker bridge gateway)
  #   Remote / VM host:             "<physical host IP>"
  nfs_server_host: "env:AEGIS_NFS_HOST"
  # NFS server listen port. Default: 2049
  nfs_port: 2049
  # NFS mountd port. Default: 2049
  nfs_mountport: 2049

# ─── Network ──────────────────────────────────────────────────────────────────
# Optional. Has safe defaults.
network:
  # Bind address for all listeners. Default: 0.0.0.0
  bind_address: "0.0.0.0"
  # HTTP REST API port. Default: 8088
  port: 8088
  # gRPC API port (inner-loop transport, Temporal workers). Default: 50051
  grpc_port: 50051
  # WebSocket URL for edge-node → orchestrator connection. Omit on orchestrator nodes.
  orchestrator_endpoint: null
  # Health check ping interval in seconds. Default: 30
  heartbeat_interval_seconds: 30
  # Optional TLS. Omit for plaintext (dev only; never in production).
  tls:
    cert_path: "/etc/aegis/tls/server.crt"
    key_path: "/etc/aegis/tls/server.key"
    ca_path: null    # Optional CA certificate path

# ─── Cluster ──────────────────────────────────────────────────────────────────
# Optional. Configures the node's role in the multi-node cluster topology.
cluster:
  # Enable cluster mode. Default: false.
  enabled: true
  
  # Node role in cluster. Options: controller | worker | hybrid. Default: hybrid.
  # - controller: Manages routing and registration; does not run executions.
  # - worker: Runs agent executions; does not perform routing decisions.
  # - hybrid: Both controller and worker duties (default).
  role: worker

  # Controller settings (required for workers)
  controller:
    # gRPC endpoint of the controller node.
    endpoint: "grpc://aegis-controller:50056"
    # Bootstrap token for initial attestation (Step 0).
    token: "env:AEGIS_CLUSTER_TOKEN"

  # Port for NodeClusterService (controllers/hybrids). Default: 50056.
  cluster_grpc_port: 50056
  # Static list of peer controller addresses. Default: [].
  peers: []
  # Path to the persistent Ed25519 keypair file for node identity.
  # Generated automatically on first startup if missing.
  node_keypair_path: "/etc/aegis/node_keypair.pem"
  # Interval in seconds for worker heartbeats to the controller. Default: 30.
  heartbeat_interval_secs: 30
  # Re-attest this many seconds before the security token expires. Default: 120.
  token_refresh_margin_secs: 120
  # TLS configuration for secure inter-node cluster communication (mTLS).
  tls:
    enabled: true
    cert_path: "/etc/aegis/certs/node.crt"
    key_path: "/etc/aegis/certs/node.key"
    ca_cert: "/etc/aegis/certs/ca.crt"

# ─── Storage ──────────────────────────────────────────────────────────────────
# Optional. Defaults to local_host backend.
storage:
  # Storage backend. Options: seaweedfs | local_host. Default: local_host
  backend: seaweedfs
  # Gracefully fall back to local storage when SeaweedFS is unreachable.
  # Default: true
  fallback_to_local: true
  # NFS Server Gateway listen port. Default: 2049
  nfs_port: 2049

  # SeaweedFS backend config. Required when backend: seaweedfs.
  seaweedfs:
    # SeaweedFS Filer endpoint.
    # Default: http://localhost:8888
    filer_url: "http://localhost:8888"
    # Host filesystem mount point.
    # Default: /var/lib/aegis/storage
    mount_point: "/var/lib/aegis/storage"
    # Default TTL for ephemeral volumes in hours. Default: 24
    default_ttl_hours: 24
    # Default per-volume size quota in MB. Default: 1000
    default_size_limit_mb: 1000
    # Hard ceiling on any single volume size in MB. Default: 10000
    max_size_limit_mb: 10000
    # GC interval for expired volumes in minutes. Default: 60
    gc_interval_minutes: 60
    # Optional S3 gateway endpoint (e.g., for direct object uploads).
    s3_endpoint: null
    # S3 gateway region. Default: us-east-1
    s3_region: "us-east-1"

  # Local storage config. Used when backend: local_host, or as the fallback target.
  local_host:
    # Base directory for local volume storage.
    # Default: /var/lib/aegis/local-volumes
    base_path: "/var/lib/aegis/local-volumes"
    # Default TTL for ephemeral volumes in hours. Default: 24
    default_ttl_hours: 24
    # Default per-volume size quota in MB. Default: 1000
    default_size_limit_mb: 1000
    # Hard ceiling on any single volume size in MB. Default: 10000
    max_size_limit_mb: 10000

# ─── Deploy Built-In Templates ───────────────────────────────────────────────
# Deploy vendored built-in agent and workflow templates on startup.
# Includes agent-creator-agent, workflow-generator-planner-agent, judge agents,
# intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator,
# and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows.
# Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function.
# Default: false
deploy_builtins: false

# ─── Force Deploy Built-In Templates ─────────────────────────────────────────
# Force re-registration of all built-in agents and workflows on startup, even if
# already registered. Use after a platform upgrade when built-in agent UUIDs or
# definitions have changed and stale registrations need to be flushed.
# Accepts: "true" | "false" | "env:VAR_NAME". Optional. Default: disabled.
force_deploy_builtins: "false"

# ─── MCP Tool Servers ─────────────────────────────────────────────────────────
# External MCP server processes. Optional array.
mcp_servers:
  - name: web-search              # Unique server name on this node. Required.
    enabled: true                 # Default: true
    # Executable path (absolute or relative to /usr/local/bin). Required.
    executable: "node"
    # Command-line arguments. Default: []
    args:
      - "/opt/aegis-tools/web-search/index.js"
    # Tool capabilities this server provides (used for routing). Default: []
    capabilities:
      - web.search
      - web.fetch
    # API keys and tokens — resolved via env: or secret: prefixes.
    # Values are injected as environment variables into the server process.
    credentials:
      SEARCH_API_KEY: "secret:aegis-system/tools/search-api-key"
    # Non-secret environment variables for the server process. Default: {}
    environment:
      LOG_LEVEL: "info"
    # Health check configuration.
    health_check:
      interval_seconds: 60        # Default: 60
      timeout_seconds: 5          # Default: 5
      method: "tools/list"        # MCP method used for health check. Default: tools/list
    # Resource limits for the server process.
    resource_limits:
      cpu_millicores: 1000        # 1000 = 1 CPU core
      memory_mb: 512

# ─── SEAL (Signed Envelope Attestation Layer) ────────────────────────────────────
# Optional. Required in production to enable cryptographic agent authorization.
seal:
  # RSA private key PEM used to sign SecurityToken JWTs issued at attestation.
  private_key_path: "/etc/aegis/seal/private.pem"
  # RSA public key PEM used to verify SecurityToken JWTs on tool calls.
  public_key_path: "/etc/aegis/seal/public.pem"
  # JWT iss claim. Default: aegis-orchestrator
  issuer: "aegis-orchestrator"
  # JWT aud claim values. Default: [aegis-agents]
  audiences:
    - "aegis-agents"
  # SecurityToken lifetime in seconds. Default: 3600 (1 hour)
  token_ttl_seconds: 3600

# ─── Security Contexts ────────────────────────────────────────────────────────
# Named permission boundaries controlling what tools agents may invoke.
# Referenced by name in agent manifests and in spec.iam ZaruTier mappings.
security_contexts:
  - name: coder-default
    description: "Standard coder context — filesystem + commands + safe package registries"
    capabilities:
      - tool_pattern: "fs.*"        # ← tool_pattern, not tool
        path_allowlist:
          - /workspace
          - /agent
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          git: [clone, add, commit, push, pull, status, diff, stash]
          cargo: [build, test, fmt, clippy, check, run]
          npm: [install, run, test, build, ci]
          python: ["-m"]
      - tool_pattern: "web.fetch"
        domain_allowlist:
          - pypi.org
          - crates.io
          - npmjs.com
        rate_limit:
          calls: 30                 # ← object with calls + per_seconds, not "30/minute"
          per_seconds: 60
    # Explicit deny list — overrides any matching capability above.
    deny_list: []

  - name: zaru-free
    description: "Zaru Free tier: ephemeral volumes only, no outbound network"
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          python: ["-m"]
          npm: [install, run, test]
    deny_list:
      - "web.*"

  - name: zaru-pro
    description: "Zaru Pro tier: full coder-default capabilities"
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
          - /agent
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          git: [clone, add, commit, push, pull, status, diff]
          cargo: [build, test, fmt, clippy, check, run]
          npm: [install, run, test, build, ci]
          python: ["-m"]
      - tool_pattern: "web.*"
        domain_allowlist:
          - pypi.org
          - crates.io
          - npmjs.com
          - api.github.com
        rate_limit:
          calls: 60
          per_seconds: 60

  # aegis-system-agent-runtime is a platform built-in — shown here for reference only.
  # Do NOT copy this block into your aegis-config.yaml; it is registered automatically.
  - name: aegis-system-agent-runtime
    description: "Execution surface for agent containers. Grants filesystem, shell, and web access scoped to /workspace."
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
      - tool_pattern: "cmd.run"
      - tool_pattern: "web.*"
      - tool_pattern: "aegis.agent.get"
      - tool_pattern: "aegis.agent.list"
      - tool_pattern: "aegis.workflow.get"
      - tool_pattern: "aegis.workflow.list"
      - tool_pattern: "aegis.workflow.signal"
      - tool_pattern: "aegis.task.execute"
    deny_list:
      - "aegis.agent.delete"
      - "aegis.workflow.delete"
      - "aegis.task.remove"
      - "aegis.system.info"
      - "aegis.system.config"

  - name: aegis-system-operator
    description: "Platform operator — all safe tools plus destructive and orchestrator commands"
    capabilities:
      - tool_pattern: "fs.*"
        path_allowlist:
          - /workspace
          - /agent
          - /shared
      - tool_pattern: "cmd.run"
        subcommand_allowlist:
          git: [clone, add, commit, push, pull, status, diff, stash]
          cargo: [build, test, fmt, clippy, check, run]
          npm: [install, run, test, build, ci]
          python: ["-m"]
      - tool_pattern: "web.*"
      # Destructive commands (operator-only)
      - tool_pattern: "aegis.agent.delete"
      - tool_pattern: "aegis.workflow.delete"
      - tool_pattern: "aegis.task.remove"
      # Orchestrator commands (operator-only)
      - tool_pattern: "aegis.system.info"
      - tool_pattern: "aegis.system.config"
    deny_list: []

# ─── Builtin Dispatchers ──────────────────────────────────────────────────────
# Configuration for the in-process Dispatch Protocol handler.
# The cmd dispatcher is NOT an MCP server — it runs subprocesses inside agent
# containers via the bidirectional bootstrap channel.
builtin_dispatchers:
  cmd:
    # Enable cmd.run dispatch. Default: true
    enabled: true
    # Default per-subprocess timeout in seconds. Default: 60
    default_timeout_secs: 60
    # Ceiling timeout an agent manifest may request. Default: 300
    max_timeout_ceiling_secs: 300
    # Maximum stdout + stderr captured per subprocess in bytes. Default: 524288 (512 KB)
    max_output_bytes: 524288
    # Maximum concurrent subprocesses per execution. Default: 1
    max_concurrent_per_execution: 1
    # Environment variables that must never be forwarded to agent subprocesses.
    global_env_denylist:
      - AEGIS_TOKEN
      - OPENAI_API_KEY
      - ANTHROPIC_API_KEY
      - SEAL_PRIVATE_KEY
      - AWS_SECRET_ACCESS_KEY
      - GOOGLE_API_KEY

# ─── IAM (OIDC) ───────────────────────────────────────────────────────────────
# Optional. Omit to disable JWT validation (dev only; never in production).
# Configures Keycloak as the trusted OIDC issuer for all
# human and service-account identities.
iam:
  # Array of Keycloak realms trusted by this node.
  realms:
    - slug: "aegis-system"         # Realm name matching Keycloak config. Required.
      issuer_url: "https://auth.myzaru.com/realms/aegis-system"  # OIDC issuer URL. Required.
      jwks_uri: "https://auth.myzaru.com/realms/aegis-system/protocol/openid-connect/certs"  # Required.
      audience: "aegis-orchestrator"  # Expected aud claim. Required.
      kind: system                 # system | consumer | tenant. Required.
    - slug: "zaru-consumer"
      issuer_url: "https://auth.myzaru.com/realms/zaru-consumer"
      jwks_uri: "https://auth.myzaru.com/realms/zaru-consumer/protocol/openid-connect/certs"
      audience: "aegis-orchestrator"
      kind: consumer
  # JWKS key refresh interval in seconds. Supports live key rotation.
  # Default: 300
  jwks_cache_ttl_seconds: 300
  # Custom claim names injected by Keycloak attribute mappers.
  claims:
    zaru_tier: "zaru_tier"         # Claim carrying ZaruTier (Free | Pro | Business | Enterprise)
    aegis_role: "aegis_role"       # Claim carrying AegisRole in aegis-system realm
    tenant_id: "tenant_id"         # Claim carrying per-user tenant slug (u-{uuid}) for consumer users

# ─── gRPC Auth ────────────────────────────────────────────────────────────────
# Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint.
# Requires spec.iam to be configured.
grpc_auth:
  # Enable JWT validation on all gRPC methods. Default: true
  enabled: true
  # gRPC method full paths exempt from auth.
  # The inner-loop bootstrap channel must always be exempt.
  exempt_methods:
    - "/aegis.v1.InnerLoop/Generate"

# ─── Secrets (OpenBao) ────────────────────────────────────────────────────────
# Optional. Omit to rely solely on env: references.
# Follows the Keymaster Pattern: only the orchestrator accesses
# OpenBao — agents never receive secret backend credentials directly.
secrets:
  backend:
    # OpenBao server address.
    address: "https://openbao.internal:8200"
    # Authentication method. Currently only approle is supported.
    auth_method: "approle"
    approle:
      # AppRole Role ID — public, can be committed to config.
      role_id: "env:OPENBAO_ROLE_ID"
      # Name of the environment variable containing the AppRole Secret ID.
      # The Secret ID itself must never be committed to config files.
      secret_id_env_var: "OPENBAO_SECRET_ID"
    # OpenBao namespace for multi-tenancy. Maps 1:1 to a Keycloak realm.
    namespace: "aegis-system"
    # Optional mTLS to OpenBao.
    tls:
      ca_cert: "/etc/aegis/openbao-ca.pem"
      client_cert: null            # Optional mTLS client cert
      client_key: null             # Optional mTLS client key

# ─── Database ─────────────────────────────────────────────────────────────────
# Optional. If omitted the daemon uses InMemory repositories (dev only).
database:
  # PostgreSQL connection URL. Supports env: and secret: prefixes.
  url: "env:AEGIS_DATABASE_URL"
  # Maximum connections in the pool. Default: 5
  max_connections: 10
  # Connection timeout in seconds. Default: 5
  connect_timeout_seconds: 5

# ─── Temporal ─────────────────────────────────────────────────────────────────
# Optional. If omitted workflow orchestration is unavailable.
temporal:
  # Temporal gRPC server address. Default: temporal:7233
  address: "temporal:7233"
  # HTTP endpoint for Temporal worker callbacks. Default: http://localhost:3000
  worker_http_endpoint: "http://aegis-runtime:3000"
  # Shared secret for authenticating worker callbacks. Supports env:.
  worker_secret: "env:TEMPORAL_WORKER_SECRET"
  # Temporal namespace. Default: default
  namespace: "default"
  # Temporal task queue. Default: aegis-agents
  task_queue: "aegis-agents"
  # Maximum number of connection retries when establishing the Temporal client.
  # If omitted, a default of 30 retries is used.
  max_connection_retries: 30

# ─── Cortex ───────────────────────────────────────────────────────────────────
# Optional. If omitted or grpc_url is null the daemon runs in memoryless mode.
# When configured, discovery (aegis.agent.search, aegis.workflow.search) is
# available automatically — no separate discovery section is required.
cortex:
  # Cortex gRPC service URL. Supports env:.
  grpc_url: "env:CORTEX_GRPC_URL"
  # API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret:.
  # Omit for local or open cortex deployments (connects without authentication).
  api_key: "env:CORTEX_API_KEY"

# ─── External SEAL Tooling Gateway ─────────────────────────────────
# Optional. Omit to keep external tool routing disabled.
seal_gateway:
  # gRPC endpoint URL for aegis-seal-gateway.
  url: "http://aegis-seal-gateway:50055"

# ─── Observability ────────────────────────────────────────────────────────────
# Optional.
observability:
  logging:
    # Log level: error | warn | info | debug | trace. Default: info
    level: info
    # Output format: json | text. Default: json
    format: json
    # Log file path. Omit to write to stdout.
    file: null
    # ── OTLP Log Export ──────────────────────────────────────────────
    # Set to ship logs to Grafana Cloud, Datadog, or a self-hosted OTEL Collector.
    # Omit (or null) to disable. Override with AEGIS_OTLP_ENDPOINT.
    # otlp_endpoint: "http://otel-collector:4317"    # gRPC
    # otlp_endpoint: "https://otlp-gateway.grafana.net/v1/logs"  # HTTP
    # otlp_protocol: grpc   # grpc (default) | http. Override: AEGIS_OTLP_PROTOCOL
    # otlp_headers:         # auth headers; values support env: / secret:
    #   Authorization: "env:OTLP_AUTH_TOKEN"
    # otlp_min_level: info  # min level exported. Override: AEGIS_OTLP_LOG_LEVEL
    # otlp_service_name: aegis-orchestrator  # service.name attr. Override: AEGIS_OTLP_SERVICE_NAME
    # batch:
    #   max_queue_size: 2048
    #   scheduled_delay_ms: 5000
    #   max_export_batch_size: 512
    #   export_timeout_ms: 10000
    # tls:
    #   verify: true
    #   ca_cert_path: null
  metrics:
    # Enable Prometheus metrics exposition. Default: true
    enabled: true
    # Prometheus metrics exposition port. Default: 9091
    port: 9091
    # HTTP path for scraping. Default: /metrics
    path: "/metrics"
  tracing:
    # Enable distributed tracing via OpenTelemetry. Default: false
    enabled: false

Section Reference

spec.node

Required. Identifies this node within the AEGIS cluster.

Key                  Type      Required  Default  Description
---                  ----      --------  -------  -----------
id                   string    yes                Unique stable node identifier. UUID recommended. Fails validation if empty.
type                 enum      yes                edge | orchestrator | hybrid
region               string    no        null     Geographic region (e.g., "us-east-1")
tags                 string[]  no        []       Capability tags matched against execution_targets in agent manifests
resources.cpu_cores  u32                          Available CPU cores
resources.memory_gb  u32                          Available RAM in GB
resources.disk_gb    u32                          Available disk in GB
resources.gpu        bool      no        false    GPU available

spec.image_tag

Optional. Docker image tag for AEGIS-owned services.

Key        Type    Default           Description
---        ----    -------           -----------
image_tag  string  <binary version>  Tag applied to all AEGIS-owned Docker images. Written by aegis init --tag and updated by aegis update. When absent, defaults to the version string embedded in the aegis binary.

spec.llm_providers

Required array. At least one entry with at least one model is required.

Key                          Type      Required  Default  Description
---                          ----      --------  -------  -----------
name                         string    yes                Unique provider name
type                         enum      yes                openai | anthropic | ollama | openai-compatible
endpoint                     string    yes                API endpoint URL
api_key                      string    no        null     API key. Supports env: and secret:.
enabled                      bool      no        true     Whether this provider is active
models[].alias               string    yes                Alias referenced in agent manifests
models[].model               string    yes                Provider-side model identifier
models[].capabilities        string[]  yes                chat | embedding | reasoning | vision | code
models[].context_window      u32       yes                Max context window in tokens
models[].cost_per_1k_tokens  f64       no        0.0      Cost per 1K tokens (0.0 for free/local)
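
The openai-compatible type has no entry in the annotated configuration, so the fragment below sketches what one might look like. The provider name, endpoint, environment variable, and model alias are all hypothetical; only the key names and the type value come from this page.

```yaml
spec:
  llm_providers:
    - name: vllm-internal                        # hypothetical provider name
      type: openai-compatible
      endpoint: "http://vllm.internal:8000/v1"   # hypothetical endpoint
      api_key: "env:VLLM_API_KEY"                # hypothetical variable name
      models:
        - alias: local-code                      # hypothetical alias
          model: qwen2.5-coder:32b
          capabilities: [chat, code]
          context_window: 32000
          # cost_per_1k_tokens omitted: defaults to 0.0
```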

spec.llm_selection

Optional. Controls runtime provider selection strategy.

Key                Type    Default       Description
---                ----    -------       -----------
strategy           enum    prefer-local  prefer-local | prefer-cloud | cost-optimized | latency-optimized
default_provider   string  null          Provider to use when no preference is specified
fallback_provider  string  null          Provider to use if the primary fails
max_retries        u32     3             Maximum retry attempts on LLM failure
retry_delay_ms     u64     1000          Delay between retries in milliseconds

spec.runtime

Optional. Controls how agent containers are launched.

Key                     Type    Default                Description
---                     ----    -------                -----------
bootstrap_script        string  assets/bootstrap.py    Path to bootstrap script relative to orchestrator binary
default_isolation       enum    inherit                docker | podman | firecracker | inherit | process
container_socket_path   string  (platform default)     Container runtime socket path. Docker: /var/run/docker.sock. Podman rootless: /run/user/<UID>/podman/podman.sock. Podman system: /run/podman/podman.sock. Can be overridden via CONTAINER_HOST or DOCKER_HOST env vars.
container_network_mode  string  null                   Container network name for agent containers
orchestrator_url        string  http://localhost:8088  Callback URL reachable from inside agent containers
nfs_server_host         string  null                   Critical for volume mounts. NFS server host as seen by the Docker daemon host OS. See platform table below.
nfs_port                u16     2049                   NFS server port
nfs_mountport           u16     2049                   NFS mountd port
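
For a rootless Podman host, the runtime section might look like the following sketch. The socket path follows the documented pattern, with 1000 standing in for the UID of the user that runs the daemon (an assumption; substitute the real UID).

```yaml
spec:
  runtime:
    default_isolation: podman
    # Rootless Podman socket; 1000 is a stand-in UID
    container_socket_path: "/run/user/1000/podman/podman.sock"
    container_network_mode: "aegis-network"
    # Must be reachable from inside agent containers
    orchestrator_url: "env:AEGIS_ORCHESTRATOR_URL"
```

Setting CONTAINER_HOST in the daemon's environment overrides container_socket_path, per the precedence rules given in the annotated configuration.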

nfs_server_host by environment:

Environment             Value
-----------             -----
WSL2 / Linux native     "127.0.0.1"
Docker Desktop (macOS)  "host.docker.internal"
Linux bridge network    "172.17.0.1" (Docker bridge gateway)
Remote / VM host        <physical host IP>
Via env var             "env:AEGIS_NFS_HOST"
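
As a sketch of how the table is applied, a Docker Desktop node on macOS that uses the SeaweedFS backend might pair the two sections like this. Every value is one listed on this page; only the combination is illustrative.

```yaml
spec:
  runtime:
    # Resolved by the host OS running the Docker daemon, not by agent containers
    nfs_server_host: "host.docker.internal"
    nfs_port: 2049
    nfs_mountport: 2049
  storage:
    # nfs_server_host is required when this backend is selected
    backend: seaweedfs
    fallback_to_local: true
    seaweedfs:
      filer_url: "http://localhost:8888"
```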

spec.network

Optional. Configures ports and TLS.

Key                         Type    Default  Description
---                         ----    -------  -----------
bind_address                string  0.0.0.0  Network interface to bind all listeners
port                        u16     8088     HTTP REST API port
grpc_port                   u16     50051    gRPC API port
orchestrator_endpoint       string  null     WebSocket URL for edge → orchestrator connection (edge nodes only)
heartbeat_interval_seconds  u64     30       Health check ping interval
tls.cert_path               string           TLS certificate path
tls.key_path                string           TLS private key path
tls.ca_path                 string  null     CA certificate path (optional)
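
An edge node is the one case where orchestrator_endpoint is set. The fragment below is a sketch only: the WebSocket URL, including its scheme, host, and port, is a placeholder, since this page does not document the endpoint's exact form.

```yaml
spec:
  node:
    id: "env:AEGIS_NODE_ID"
    type: edge
  network:
    # Placeholder URL; replace with the orchestrator's real WebSocket endpoint
    orchestrator_endpoint: "wss://orchestrator.example.com"
    heartbeat_interval_seconds: 30
```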

spec.cluster

Optional. Configures the node's role in the multi-node cluster topology.

Key                        Type      Required  Default  Description
---                        ----      --------  -------  -----------
enabled                    bool      no        false    Enable cluster mode.
role                       enum      no        hybrid   Node role: controller | worker | hybrid
controller_endpoint        string                       gRPC endpoint of the controller node. Required for role: worker.
cluster_grpc_port          u16       no        50056    Port for NodeClusterService (controllers/hybrids)
peers                      string[]  no        []       Static list of peer controller addresses
node_keypair_path          string                       Path to the persistent Ed25519 keypair file for node identity
heartbeat_interval_secs    u64       no        30       Interval in seconds for worker heartbeats
token_refresh_margin_secs  u64       no        120      Token re-attestation margin in seconds
tls.enabled                bool      no        true     Enable TLS for cluster communication
tls.cert_path              string                       Path to node TLS certificate
tls.key_path               string                       Path to node TLS private key
tls.ca_cert                string                       Path to CA certificate for peer verification

spec.storage

Optional. Defaults to the local_host backend.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| backend | enum | local_host | Storage backend: seaweedfs or local_host |
| fallback_to_local | bool | true | Gracefully fall back to local storage when SeaweedFS is unreachable |
| nfs_port | u16 | 2049 | NFS Server Gateway listen port |
| seaweedfs.filer_url | string | http://localhost:8888 | SeaweedFS Filer endpoint |
| seaweedfs.mount_point | string | /var/lib/aegis/storage | Host filesystem mount point |
| seaweedfs.default_ttl_hours | u32 | 24 | Default TTL for ephemeral volumes (hours) |
| seaweedfs.default_size_limit_mb | u64 | 1000 | Default per-volume size quota (MB) |
| seaweedfs.max_size_limit_mb | u64 | 10000 | Hard ceiling on volume size (MB) |
| seaweedfs.gc_interval_minutes | u32 | 60 | Expired volume GC interval (minutes) |
| seaweedfs.s3_endpoint | string | null | Optional SeaweedFS S3 gateway endpoint |
| seaweedfs.s3_region | string | us-east-1 | S3 gateway region |
| local_host.base_path | string | /var/lib/aegis/local-volumes | Base directory for local volume storage |
| local_host.default_ttl_hours | u32 | 24 | Default TTL for ephemeral volumes (hours) |
| local_host.default_size_limit_mb | u64 | 1000 | Default per-volume quota (MB) |
| local_host.max_size_limit_mb | u64 | 10000 | Hard ceiling on volume size (MB) |
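
For example, switching to the SeaweedFS backend while keeping the local fallback might look like the sketch below. It assumes the seaweedfs.* keys nest under a seaweedfs mapping; every value shown is the documented default except backend.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  storage:
    backend: seaweedfs
    fallback_to_local: true        # fall back if the Filer is unreachable
    nfs_port: 2049
    seaweedfs:
      filer_url: "http://localhost:8888"
      mount_point: /var/lib/aegis/storage
      default_ttl_hours: 24
      default_size_limit_mb: 1000
      max_size_limit_mb: 10000
      gc_interval_minutes: 60
```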

spec.deploy_builtins

Optional. Default: false.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| deploy_builtins | bool | false | Deploy vendored built-in agent and workflow templates on startup. Includes agent-creator-agent, workflow-generator-planner-agent, judge agents, intent-executor-discovery-agent, intent-result-formatter-agent, skill-validator, and the builtin-workflow-generator, builtin-intent-to-execution, and skill-import workflows. Required for aegis.agent.generate, aegis.workflow.generate, and aegis.execute.intent to function. |

spec.mcp_servers

Optional array. Each entry defines an external MCP Tool Server process.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Unique server name on this node |
| enabled | bool | true | Whether to start this server |
| executable | string | | Executable path |
| args | string[] | [] | Command-line arguments |
| capabilities | string[] | [] | Tool names this server provides (used for routing) |
| credentials | map | {} | API keys/tokens injected as env vars. Values support secret:. |
| environment | map | {} | Non-secret env vars for the server process |
| health_check.interval_seconds | u64 | 60 | Health check interval |
| health_check.timeout_seconds | u64 | 5 | Health check timeout |
| health_check.method | string | tools/list | MCP method used to health-check the server |
| resource_limits.cpu_millicores | u32 | 1000 | CPU limit (1000 = 1 core) |
| resource_limits.memory_mb | u32 | 512 | Memory limit (MB) |
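
A single entry in the array might look like the following sketch. The server name, executable, argument, tool name, credential name, and secret path are all hypothetical and only illustrate the shape of an entry; the health-check and resource values are the documented defaults.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  mcp_servers:
    - name: web-search                              # hypothetical server name
      enabled: true
      executable: /usr/local/bin/web-search-mcp     # hypothetical executable
      args: ["--stdio"]                             # hypothetical argument
      capabilities: ["web.search"]                  # hypothetical tool name
      credentials:
        # Injected as an env var; resolved from OpenBao (hypothetical path).
        SEARCH_API_KEY: "secret:aegis/mcp/search-api-key"
      environment:
        LOG_LEVEL: info                             # hypothetical variable
      health_check:
        interval_seconds: 60
        timeout_seconds: 5
        method: tools/list
      resource_limits:
        cpu_millicores: 1000                        # 1 core
        memory_mb: 512
```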

spec.seal

Optional. Enables cryptographic agent authorization via SEAL. Required in production.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| private_key_path | string | | Path to RSA private key PEM for signing SecurityToken JWTs |
| public_key_path | string | | Path to RSA public key PEM for verifying SecurityToken JWTs |
| issuer | string | aegis-orchestrator | JWT iss claim |
| audiences | string[] | [aegis-agents] | JWT aud claims |
| token_ttl_seconds | u64 | 3600 | SecurityToken lifetime in seconds |
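
A minimal spec.seal block could look like this sketch. The key file paths are hypothetical; the remaining values are the documented defaults.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  seal:
    private_key_path: /etc/aegis/seal/private.pem   # hypothetical path
    public_key_path: /etc/aegis/seal/public.pem     # hypothetical path
    issuer: aegis-orchestrator
    audiences: [aegis-agents]
    token_ttl_seconds: 3600                         # 1 hour
```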

spec.security_contexts

Optional array. Named permission boundaries assigned to agents at execution time.

Each entry (SecurityContextDefinition):

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | Unique context name, referenced in agent manifests |
| description | string | "" | Human-readable description |
| capabilities | array | [] | Tool permissions granted by this context |
| deny_list | string[] | [] | Explicit tool deny list; overrides any matching capability |

Each capabilities entry (CapabilityDefinition):

| Key | Type | Description |
| --- | --- | --- |
| tool_pattern | string | Tool name pattern (e.g., "fs.*", "cmd.run", "web.fetch", "*") |
| path_allowlist | string[] | Allowed filesystem path prefixes (for fs.* tools) |
| subcommand_allowlist | object | Map of base command → allowed first positional arguments (for cmd.run). Example: {cargo: ["build","test"]}. |
| domain_allowlist | string[] | Allowed network domain suffixes (for web.* tools) |
| rate_limit.calls | u32 | Number of calls allowed per window |
| rate_limit.per_seconds | u32 | Window size in seconds |
| max_response_size | u64 | Max response size in bytes |
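
Putting both tables together, a security context for a build agent might be sketched as below. The context name, paths, domain, limits, and the denied tool name are hypothetical; the subcommand_allowlist entry follows the example given in the table.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  security_contexts:
    - name: build-sandbox                      # hypothetical context name
      description: "Workspace access and cargo builds"
      capabilities:
        - tool_pattern: "fs.*"
          path_allowlist: ["/workspace"]       # hypothetical path prefix
        - tool_pattern: "cmd.run"
          subcommand_allowlist:
            cargo: ["build", "test"]
        - tool_pattern: "web.fetch"
          domain_allowlist: ["crates.io"]      # hypothetical domain
          rate_limit:
            calls: 30                          # 30 calls...
            per_seconds: 60                    # ...per 60-second window
          max_response_size: 1048576           # 1 MiB
      deny_list: ["fs.delete"]                 # hypothetical tool name
```

Because deny_list overrides any matching capability, the hypothetical fs.delete tool would stay blocked in this sketch even though "fs.*" matches it.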

spec.builtin_dispatchers

Optional. Configures the cmd Dispatch Protocol handler. This is an in-process handler — it is not an MCP server process.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| cmd.enabled | bool | true | Enable cmd.run dispatch |
| cmd.default_timeout_secs | u64 | 60 | Default subprocess timeout |
| cmd.max_timeout_ceiling_secs | u64 | 300 | Maximum timeout an agent manifest may request |
| cmd.max_output_bytes | u64 | 524288 | Max stdout + stderr per subprocess (512 KB) |
| cmd.max_concurrent_per_execution | u32 | 1 | Max concurrent subprocesses per execution |
| cmd.global_env_denylist | string[] | (see sample) | Env vars that must never be forwarded to agent subprocesses |
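
The sketch below shows the cmd dispatcher with its documented defaults spelled out. The global_env_denylist entries are illustrative assumptions, not the list from the sample configuration.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  builtin_dispatchers:
    cmd:
      enabled: true
      default_timeout_secs: 60
      max_timeout_ceiling_secs: 300
      max_output_bytes: 524288          # 512 * 1024 bytes
      max_concurrent_per_execution: 1
      global_env_denylist:              # illustrative entries only
        - OPENBAO_SECRET_ID
        - AWS_SECRET_ACCESS_KEY
```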

spec.iam

Optional. Configures IAM/OIDC as the trusted JWT issuer. Omit to disable JWT validation (dev only; never in production).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| realms[].slug | string | | Realm name matching the Keycloak configuration |
| realms[].issuer_url | string | | OIDC issuer URL |
| realms[].jwks_uri | string | | JWKS endpoint for JWT signature verification |
| realms[].audience | string | | Expected aud claim in tokens from this realm |
| realms[].kind | enum | | Realm kind: system, consumer, or tenant |
| jwks_cache_ttl_seconds | u32 | 300 | JWKS key cache TTL; refreshed automatically to support key rotation |
| claims.zaru_tier | string | zaru_tier | Keycloak claim name carrying ZaruTier |
| claims.aegis_role | string | aegis_role | Keycloak claim name carrying AegisRole |
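
A single-realm setup might be sketched as follows. The realm slug, hostname, and audience are hypothetical; the URL shapes follow the usual Keycloak layout and should be verified against your identity provider.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  iam:
    realms:
      - slug: aegis-system                     # hypothetical realm name
        issuer_url: "https://keycloak.example.com/realms/aegis-system"
        jwks_uri: "https://keycloak.example.com/realms/aegis-system/protocol/openid-connect/certs"
        audience: aegis-orchestrator           # hypothetical audience
        kind: system
    jwks_cache_ttl_seconds: 300
    claims:
      zaru_tier: zaru_tier
      aegis_role: aegis_role
```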

spec.grpc_auth

Optional. Controls IAM/OIDC JWT enforcement on the gRPC endpoint. Requires spec.iam to be configured.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enforce JWT validation on gRPC methods |
| exempt_methods | string[] | [/aegis.v1.InnerLoop/Generate] | gRPC method full paths exempt from auth. The inner-loop bootstrap channel must always be exempt. |
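
Written out explicitly, the documented defaults correspond to a block like this sketch, which keeps the inner-loop bootstrap method exempt:

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  grpc_auth:
    enabled: true
    exempt_methods:
      - /aegis.v1.InnerLoop/Generate   # must always remain exempt
```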

spec.secrets

Optional. Configures OpenBao as the secrets backend. Follows the Keymaster Pattern — agents never access OpenBao directly.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| backend.address | string | | OpenBao server URL |
| backend.auth_method | string | approle | Authentication method. Only approle is currently supported. |
| backend.approle.role_id | string | | AppRole Role ID (public; safe to commit) |
| backend.approle.secret_id_env_var | string | OPENBAO_SECRET_ID | Name of the environment variable containing the AppRole Secret ID. Never commit the actual Secret ID. |
| backend.namespace | string | | OpenBao namespace (maps 1:1 to an IAM realm) |
| backend.tls.ca_cert | string | null | CA certificate path |
| backend.tls.client_cert | string | null | mTLS client certificate path |
| backend.tls.client_key | string | null | mTLS client key path |
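
An AppRole-based OpenBao connection could be sketched as below. The server address, role ID placeholder, namespace, and CA path are hypothetical; note that only the name of the environment variable holding the Secret ID appears in the file, never the Secret ID itself.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  secrets:
    backend:
      address: "https://openbao.example.internal:8200"   # hypothetical URL
      auth_method: approle
      approle:
        role_id: "<your-role-id>"                # placeholder, not a real ID
        secret_id_env_var: OPENBAO_SECRET_ID     # env var name only
      namespace: aegis-system                    # hypothetical namespace
      tls:
        ca_cert: /etc/aegis/openbao/ca.crt       # hypothetical path
```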

spec.database

Optional. PostgreSQL connection for persistent state (executions, patterns, workflows). If omitted, the daemon uses in-memory repositories (development mode only).

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | | PostgreSQL connection URL. Supports env: and secret:. |
| max_connections | u32 | 5 | Maximum connections in the pool |
| connect_timeout_seconds | u64 | 5 | Connection timeout |
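
Since the connection URL usually embeds a password, it is a natural candidate for a credential prefix. A minimal sketch, assuming a hypothetical DATABASE_URL variable is set in the daemon's environment:

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  database:
    url: "env:DATABASE_URL"        # hypothetical variable name
    max_connections: 5
    connect_timeout_seconds: 5
```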

spec.temporal

Optional. Temporal workflow engine configuration for durable workflow execution. If omitted, workflow orchestration features are unavailable.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| address | string | temporal:7233 | Temporal gRPC server address |
| worker_http_endpoint | string | http://localhost:3000 | HTTP endpoint for Temporal worker callbacks. Supports env:. |
| worker_secret | string | null | Shared secret for authenticating worker callbacks. Supports env: and secret:. |
| namespace | string | default | Temporal namespace |
| task_queue | string | aegis-agents | Temporal task queue name |
| max_connection_retries | i32 | 30 | Maximum number of connection retries when establishing the Temporal client. |
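
The sketch below spells out the documented defaults and adds a worker secret read from the environment. The TEMPORAL_WORKER_SECRET variable name is a hypothetical choice.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  temporal:
    address: "temporal:7233"
    worker_http_endpoint: "http://localhost:3000"
    worker_secret: "env:TEMPORAL_WORKER_SECRET"   # hypothetical variable name
    namespace: default
    task_queue: aegis-agents
    max_connection_retries: 30
```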

spec.cortex

Optional. Cortex memory and learning service configuration. If omitted or grpc_url is null, the daemon runs in memoryless mode — patterns are simply not stored.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| grpc_url | string | null | Cortex gRPC service URL. Supports env:. |
| api_key | string | null | API key for 100monkeys hosted Cortex (Zaru SaaS). Supports env: and secret: prefixes. When absent, the orchestrator connects without authentication (local/open cortex). |
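
Connecting to an authenticated Cortex instance might look like this sketch. The environment variable name and the secret path are hypothetical; omit api_key for a local, unauthenticated Cortex.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  cortex:
    grpc_url: "env:CORTEX_GRPC_URL"            # hypothetical variable name
    api_key: "secret:aegis/cortex/api-key"     # hypothetical OpenBao path
```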

spec.discovery

Semantic agent and workflow search (aegis.agent.search, aegis.workflow.search) is powered by the Cortex service. When spec.cortex is configured with a valid grpc_url and api_key, discovery is available automatically. No separate spec.discovery configuration is required.


spec.seal_gateway

Optional. Configures forwarding of external tool invocations to the standalone SEAL tooling gateway.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | null | gRPC endpoint URL of aegis-seal-gateway (example: http://aegis-seal-gateway:50055). |

If omitted, the orchestrator does not forward unknown or external tools to the gateway and uses built-in routing only.
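
Using the example URL from the table, enabling gateway forwarding is a one-key block, sketched here:

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  seal_gateway:
    url: "http://aegis-seal-gateway:50055"
```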


spec.max_execution_list_limit

Optional. Upper bound on executions returned by a single list_executions request.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| max_execution_list_limit | usize | 1000 | Maximum number of executions returned by a single list_executions request. Protects against excessive memory usage. |

spec.observability

Optional.

logging

| Key | Type | Default | Env Override | Description |
| --- | --- | --- | --- | --- |
| logging.level | enum | info | RUST_LOG | Log level: error, warn, info, debug, or trace |
| logging.format | enum | json | AEGIS_LOG_FORMAT | Log format: json or text |
| logging.file | string | null | | Log file path. Omit to write to stdout. |
| logging.otlp_endpoint | string | null | AEGIS_OTLP_ENDPOINT | OTLP collector endpoint. Setting this enables OTLP log export. |
| logging.otlp_protocol | enum | grpc | AEGIS_OTLP_PROTOCOL | OTLP transport: grpc or http |
| logging.otlp_headers | map | {} | AEGIS_OTLP_HEADERS | Key-value headers sent with every OTLP export RPC. Values support env: and secret: prefixes. Env var uses comma-separated key=value pairs. |
| logging.otlp_min_level | string | info | AEGIS_OTLP_LOG_LEVEL | Minimum log level forwarded to OTLP (does not affect stdout). |
| logging.otlp_service_name | string | aegis-orchestrator | AEGIS_OTLP_SERVICE_NAME | service.name resource attribute |
| logging.otlp_batch.max_queue_size | u32 | 2048 | | Maximum buffered records before export |
| logging.otlp_batch.scheduled_delay_ms | u64 | 5000 | | Batch flush interval (ms) |
| logging.otlp_batch.max_export_batch_size | u32 | 512 | | Records per export RPC |
| logging.otlp_batch.export_timeout_ms | u64 | 10000 | | Per-call timeout (ms) |
| logging.otlp_tls.verify | bool | true | | Verify OTLP endpoint TLS certificate |
| logging.otlp_tls.ca_cert_path | string | null | | Custom CA certificate for self-signed backends |

metrics

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| metrics.enabled | bool | true | Enable Prometheus metrics |
| metrics.port | u16 | 9091 | Prometheus metrics exposition port |
| metrics.path | string | /metrics | HTTP path for scraping |

tracing

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| tracing.enabled | bool | false | Enable distributed tracing via OpenTelemetry |
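
The three observability subsections might combine into one block like the sketch below. It assumes the dotted keys nest under logging, metrics, and tracing mappings; the collector endpoint, header name, and secret path are hypothetical.

```yaml
# Fragment only: envelope fields above `spec` are omitted.
spec:
  observability:
    logging:
      level: info
      format: json
      otlp_endpoint: "http://otel-collector:4317"    # hypothetical collector
      otlp_protocol: grpc
      otlp_headers:
        authorization: "secret:aegis/otlp/token"     # hypothetical header and path
      otlp_min_level: warn          # forward only warn and above to OTLP
      otlp_service_name: aegis-orchestrator
      otlp_batch:
        max_queue_size: 2048
        scheduled_delay_ms: 5000
    metrics:
      enabled: true
      port: 9091
      path: /metrics
    tracing:
      enabled: false
```

In this sketch stdout still receives info-level logs, because otlp_min_level only filters what is forwarded to the OTLP collector.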
