⚠️ v0.1 — Early preview. APIs and schema may change.
Observability

Zil agents ship with pre-wired OpenTelemetry conventions for agent-specific spans. This is one of the highest-value features — most agents in the wild have poor observability. Zil standardizes it.

How it works

Zil builds on ADK’s built-in OpenTelemetry tracing. When you call zil.create_agent(), it:

  1. Reads observability/config.yaml from your project
  2. Sets standard OTel environment variables (OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_SERVICE_NAME)
  3. Injects resource attributes (agent.name, agent.version) so every span is tagged with the agent’s identity
  4. Calls ADK’s maybe_set_otel_providers() to register the OTLP exporter

ADK then automatically emits spans for agent invocations, LLM calls, and tool executions, following the OpenTelemetry GenAI Semantic Conventions.
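
The first three steps can be sketched in plain Python. This is an illustration under stated assumptions, not Zil's implementation: `build_otel_env` is a hypothetical helper, the config is passed in as an already-parsed dict, and carrying `agent.name` and `agent.version` through the standard `OTEL_RESOURCE_ATTRIBUTES` variable is an assumption about how the attributes are injected. Step 4 (the ADK call) is left out.

```python
def build_otel_env(config: dict, agent_name: str, agent_version: str) -> dict:
    """Return the OTel environment variables implied by a parsed config (hypothetical helper)."""
    obs = config.get("observability", {})
    tracing = obs.get("tracing", {})
    resource = dict(obs.get("resource_attributes", {}))

    env = {}
    # Step 2: the two standard OTel variables named in the list above.
    if tracing.get("endpoint"):
        env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = tracing["endpoint"]
    env["OTEL_SERVICE_NAME"] = resource.pop("service.name", agent_name)

    # Step 3: identity attributes. OTEL_RESOURCE_ATTRIBUTES takes
    # comma-separated key=value pairs (using it here is an assumption).
    resource.update({"agent.name": agent_name, "agent.version": agent_version})
    env["OTEL_RESOURCE_ATTRIBUTES"] = ",".join(
        f"{key}={value}" for key, value in sorted(resource.items())
    )
    return env


config = {
    "observability": {
        "tracing": {"exporter": "otlp", "endpoint": "http://localhost:4318/v1/traces"},
        "resource_attributes": {"service.name": "my-agent"},
    }
}
env = build_otel_env(config, agent_name="my-agent", agent_version="1.0.0")
```

Returning a dict instead of writing to `os.environ` directly keeps the sketch free of side effects, which makes the mapping easy to inspect.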

Configuration

# observability/config.yaml
observability:
  tracing:
    exporter: otlp
    endpoint: ${OTEL_EXPORTER_OTLP_TRACES_ENDPOINT}
    sample_rate: 1.0
  resource_attributes:
    service.name: my-agent
  span_conventions:
    - agent.session
    - agent.turn
    - agent.reasoning
    - agent.skill.invoke
    - agent.mcp.tool_call
    - agent.guardrail.check
  required_attributes:
    - agent.name
    - agent.version
    - session.id
    - tokens.input
    - tokens.output
    - cost.usd

Configuration fields

tracing

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| exporter | string | Yes | Exporter type. otlp is the standard choice. |
| endpoint | string | Yes | OTel collector URL. Supports ${ENV_VAR} substitution. |
| sample_rate | float | No | Sampling rate (0.0–1.0). Default: 1.0 (sample everything). |
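
To make sample_rate concrete, here is a minimal sketch of trace-ID ratio sampling, the approach commonly used by OpenTelemetry SDKs for a fixed sampling rate. Whether Zil or ADK maps sample_rate onto this exact sampler is an assumption, and `should_sample` is a hypothetical name.

```python
import random

# Only the low 64 bits of the 128-bit trace ID take part in the decision.
TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, sample_rate: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    bound = round(sample_rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


# Seeded generator so the demonstration is repeatable.
rng = random.Random(0)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.25) for trace_id in trace_ids)
```

Because the decision depends only on the trace ID, every span in a trace is kept or dropped together. With sample_rate set to 1.0 every trace is kept, and `kept` above lands near a quarter of the 10,000 IDs.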

resource_attributes

Custom OTel resource attributes attached to all spans. Typically used for service.name. Supports ${ENV_VAR} substitution.
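
The ${ENV_VAR} substitution can be pictured as a small regex pass over each string value. This is a sketch, not Zil's code: `substitute_env` is a hypothetical helper, and raising on an unset variable is an assumption, since the behavior for missing variables is not documented here.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env=None) -> str:
    """Replace each ${NAME} with its value; raise KeyError if NAME is unset."""
    env = os.environ if env is None else env

    def _lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(_lookup, value)


resolved = substitute_env(
    "${OTEL_EXPORTER_OTLP_TRACES_ENDPOINT}",
    env={"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": "http://localhost:4318/v1/traces"},
)
```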

span_conventions

The standard span types emitted by Zil agents:

| Span | Description |
| --- | --- |
| agent.session | Top-level span for an entire user session |
| agent.turn | One request-response cycle within a session |
| agent.reasoning | LLM inference / chain-of-thought step |
| agent.skill.invoke | Invocation of an agent skill or function |
| agent.mcp.tool_call | Call to an MCP server tool |
| agent.guardrail.check | Guardrail evaluation (pass/block/escalate) |

required_attributes

Attributes that must be present on every span:

| Attribute | Type | Description |
| --- | --- | --- |
| agent.name | string | Agent name from the manifest |
| agent.version | string | Agent version from the manifest |
| session.id | string | Unique session identifier |
| tokens.input | integer | Input tokens consumed |
| tokens.output | integer | Output tokens generated |
| cost.usd | float | Estimated cost in USD for this operation |

Token and cost attributes are mandatory because they enable cost governance — tracking spend per agent, per session, per request.
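
As an illustration of what these attributes enable, the sketch below checks exported span attributes for the required keys and totals cost.usd per session. The span data is made up for the example, and both helpers are hypothetical; they are not part of the Zil SDK.

```python
from collections import defaultdict

REQUIRED_ATTRIBUTES = (
    "agent.name", "agent.version", "session.id",
    "tokens.input", "tokens.output", "cost.usd",
)


def missing_attributes(span_attributes: dict) -> list:
    """Return the required attribute names absent from one span, in config order."""
    return [name for name in REQUIRED_ATTRIBUTES if name not in span_attributes]


def cost_per_session(spans: list) -> dict:
    """Sum cost.usd by session.id, skipping spans that lack any required attribute."""
    totals = defaultdict(float)
    for attrs in spans:
        if not missing_attributes(attrs):
            totals[attrs["session.id"]] += attrs["cost.usd"]
    return dict(totals)


base = {"agent.name": "my-agent", "agent.version": "1.0.0"}
spans = [
    {**base, "session.id": "s1", "tokens.input": 120, "tokens.output": 40, "cost.usd": 0.5},
    {**base, "session.id": "s1", "tokens.input": 60, "tokens.output": 20, "cost.usd": 0.25},
    {**base, "session.id": "s2", "tokens.input": 30, "tokens.output": 10, "cost.usd": 0.125},
    {"agent.name": "my-agent", "session.id": "s2"},  # incomplete span, skipped
]
totals = cost_per_session(spans)
```

The sum assumes cost.usd on each span covers only that span's own operation. If parent spans carried rolled-up totals, adding every span together would double count.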

Dev vs Prod

Local development

Quick debugging — print spans to stderr with no infrastructure:

zil run --trace-console

This uses OpenTelemetry’s ConsoleSpanExporter — every span is printed as it completes.

Full observability with Docker — use zil web --docker --trace to get traces, metrics, and logs in one command:

zil web --docker --trace

This starts a Grafana OTEL-LGTM container alongside your agent, providing:

| Component | What it collects | Grafana data source |
| --- | --- | --- |
| Tempo | Distributed traces | Tempo |
| Mimir | Metrics (request counts, latencies) | Prometheus |
| Loki | Structured logs | Loki |

  • Agent UI: http://localhost:8000
  • Grafana: http://localhost:3000 (login: admin / admin)

To explore traces:

  1. Open Grafana → Explore (compass icon)
  2. Select Tempo as the data source
  3. Click Search, then Run query

Both containers stop on Ctrl+C.

Manual OTLP setup — point to any collector:

export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
zil run --trace

Production (Cloud Run)

Deploy with --trace to send spans to Google Cloud Trace:

zil deploy --project my-project --region us-central1 --trace

View traces at: GCP Console → Observability → Trace Explorer

For other backends, set the endpoint in your .env:

# .env
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://your-collector.example.com/v1/traces

The SDK picks this up automatically via zil.create_agent() — no code changes needed between dev and prod.

SDK integration

Telemetry is wired up automatically when you use zil.create_agent():

import zil

agent = zil.create_agent(tools=[...])
# Tracing is active if observability/config.yaml exists
# and OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set
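
The activation condition described in that snippet's comment amounts to two checks. The helper below is a hypothetical sketch of them for use in your own scripts or tests; `tracing_is_active` is not a Zil function, and resolving the config relative to a project directory is an assumption.

```python
import os
from pathlib import Path


def tracing_is_active(project_dir, env=None) -> bool:
    """True when the config file exists and the endpoint variable is set and non-empty."""
    env = os.environ if env is None else env
    config_exists = (Path(project_dir) / "observability" / "config.yaml").is_file()
    endpoint_set = bool(env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"))
    return config_exists and endpoint_set
```

Calling `tracing_is_active(".")` from the project root gives a quick way to see why no spans are arriving before digging into the collector.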

To disable automatic telemetry setup (e.g., in tests or when managing OTel yourself):

agent = zil.create_agent(tools=[...], enable_telemetry=False)

For programmatic control:

from zil.sdk import setup_telemetry, setup_console_telemetry

# Console exporter for local debugging
setup_console_telemetry(agent_name="my-agent", agent_version="1.0.0")

# Or OTLP exporter from config
setup_telemetry(obs_config, agent_name="my-agent", agent_version="1.0.0")

Compatible backends

The otlp exporter works with any OTel-compatible backend:

| Backend | Notes |
| --- | --- |
| Google Cloud Trace | Native GCP integration via zil deploy --trace |
| Grafana OTEL-LGTM | Built-in with zil web --docker --trace (traces + metrics + logs) |
| Datadog | Via OTel Collector |
| Honeycomb | Direct OTLP support |
| Grafana Cloud | Hosted Tempo / Mimir / Loki |
| Jaeger | Open-source, traces only |

Environment setup

Set the collector endpoint in your .env:

# .env
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
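
The /v1/traces suffix matters for OTLP over HTTP. Under the OpenTelemetry exporter specification, the signal-specific variable is used exactly as given, whereas the generic OTEL_EXPORTER_OTLP_ENDPOINT has /v1/traces appended by the exporter. The sketch below illustrates that rule. `resolve_traces_endpoint` is a hypothetical helper, not part of Zil or ADK, and whether Zil consults the generic variable at all is an assumption. It returns None when neither variable is set instead of falling back to a default.

```python
def resolve_traces_endpoint(env: dict):
    """Resolve the OTLP/HTTP traces URL from OTel environment variables."""
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        # Signal-specific variable: used exactly as given, no path added.
        return specific
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if base:
        # Generic variable: the per-signal path is appended to the base URL.
        return base.rstrip("/") + "/v1/traces"
    return None
```

In practice this means a value such as http://localhost:4318 in the signal-specific variable would post to the collector's root path, which an OTLP/HTTP receiver typically does not serve.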

For Google Cloud Trace, use the --otel_to_cloud flag with adk web or configure the GCP OTLP endpoint.

CLI flags

Both zil run and zil web support tracing flags:

| Flag | Description |
| --- | --- |
| --trace | Enable OTLP trace export to the configured endpoint |
| --trace-console | Print spans to stderr (no collector needed) |

# Export to OTLP collector
zil run --trace
zil web --trace

# Local debugging: no infra required
zil run --trace-console

Validation

zil validate checks that:

  • observability/config.yaml exists
  • The cost.usd attribute is present (warns if missing)

✓ observability/config.yaml — present
⚠ observability/config.yaml — missing recommended attribute: cost.usd
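
The two checks can be approximated as follows. This is a sketch rather than the zil validate implementation: `check_observability` is a hypothetical helper that takes the already-parsed YAML (or None when the file is absent), and treating a missing file as an error is an assumption, since the severity is not stated above.

```python
from typing import Optional


def check_observability(config: Optional[dict]) -> list:
    """Return (level, message) pairs for the two checks described above."""
    if config is None:
        # Assumed severity; the source only says the file's existence is checked.
        return [("error", "observability/config.yaml not found")]

    results = [("ok", "observability/config.yaml present")]
    required = config.get("observability", {}).get("required_attributes", [])
    if "cost.usd" not in required:
        results.append(("warn", "missing recommended attribute: cost.usd"))
    return results


report = check_observability(
    {"observability": {"required_attributes": ["agent.name", "agent.version"]}}
)
```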