Guardrails
Guardrails define rules the agent must follow. They live at identity/guardrails.yaml and combine runtime-enforced checks (regex pattern matching before and after the LLM) with LLM instruction rules (passed to the system prompt).
Configuration
```yaml
# identity/guardrails.yaml

# --- Runtime-enforced rules (checked by the GuardrailEngine) ---

# Detection toggles (built-in pattern libraries)
detection:
  prompt_injection: true   # block common jailbreak patterns (10 patterns)
  pii_output: true         # block SSN/credit card in agent output
  pii_input: false         # optionally scan user input for PII

# Custom regex patterns
blocked_patterns:
  - name: internal_urls
    pattern: "https?://internal\\."
    target: output         # "input" | "output" | "both"
    severity: block        # "block" | "warn" | "log"

# Keyword-based topic blocking (checked on input)
denied_topics:
  - "competitor pricing"
  - "salary information"

# Output length enforcement
output_constraints:
  max_response_length: 4000

# --- LLM instruction rules (not runtime-enforced) ---

hard_blocks:
  - topic: illegal_activity
    description: Refuse requests for illegal activities.
  - topic: personal_data_extraction
    description: Never extract or store personal data beyond the session.

escalation_triggers:
  - condition: user_requests_human
    action: escalate
    message: "Connecting you with a human agent."
  - condition: confidence_below_threshold
    threshold: 0.3
    action: escalate
    message: "I'm not confident in my answer. Let me connect you with a specialist."
```
Runtime engine
The GuardrailEngine runs regex-based checks on every input (before the LLM) and output (after the LLM). It’s loaded automatically by zil.create_agent() and attached to the agent as agent._zil_guardrails.
Built-in detection
When detection.prompt_injection: true, the engine checks input against 10 patterns covering:
- Ignore/disregard instructions
- DAN jailbreak attempts
- System prompt override and extraction
- XML/instruction tag injection
- Rule override attempts
- Instruction extraction via questions
- Task override attempts
When detection.pii_output: true, the engine checks output for SSN and credit card patterns. Set detection.pii_input: true to also scan user input.
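To make the idea of regex-based detection concrete, here is a minimal sketch in plain Python. The patterns, the names INJECTION_PATTERNS and PII_PATTERNS, and both helper functions are illustrative assumptions; the engine's actual built-in pattern library is not listed in this document and is likely broader.

```python
import re

# Illustrative patterns only (assumption): NOT the engine's real built-in library.
INJECTION_PATTERNS = [
    re.compile(r"\b(ignore|disregard)\b.{0,40}\binstructions\b", re.IGNORECASE),
    re.compile(r"\byou are now DAN\b", re.IGNORECASE),
    re.compile(r"\b(reveal|print|show)\b.{0,40}\bsystem prompt\b", re.IGNORECASE),
]

# Simplified SSN and 16-digit card shapes (assumption); real detectors are stricter.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def find_injection(text: str) -> bool:
    """Return True if any illustrative injection pattern matches the text."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def find_pii(text: str) -> list[str]:
    """Return the names of the PII patterns that match the text."""
    return [name for name, p in PII_PATTERNS.items() if p.search(text)]
```

In this sketch, find_injection would be applied to user input and find_pii to agent output, or to both when pii_input is enabled.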
Custom patterns
Add project-specific regex patterns via blocked_patterns:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Pattern identifier |
| pattern | string | Yes | Regex pattern |
| target | string | Yes | input, output, or both |
| severity | string | Yes | block (reject), warn (log + allow), or log (allow) |
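The interaction between target and severity can be sketched as follows. This is an assumption-laden illustration, not the GuardrailEngine implementation: match_patterns and is_blocked are hypothetical helpers, and the rule dicts simply mirror the fields in the table above.

```python
import re

# Same shape as a blocked_patterns entry after the YAML has been parsed.
BLOCKED_PATTERNS = [
    {
        "name": "internal_urls",
        "pattern": r"https?://internal\.",
        "target": "output",
        "severity": "block",
    },
]


def match_patterns(text: str, direction: str, rules: list[dict]) -> list[dict]:
    """Return the rules that apply to this direction ("input" or "output") and match."""
    hits = []
    for rule in rules:
        if rule["target"] not in (direction, "both"):
            continue  # rule does not apply to this direction
        if re.search(rule["pattern"], text):
            hits.append(rule)
    return hits


def is_blocked(hits: list[dict]) -> bool:
    """Only severity "block" rejects; "warn" and "log" let the text through."""
    return any(rule["severity"] == "block" for rule in hits)
```

With the internal_urls rule, an output containing http://internal.example.com is rejected, while the same text in user input passes because the rule targets output only.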
Denied topics
Denied topics use simple keyword matching on user input. If any listed keyword appears in the input (case-insensitive), the input is blocked.
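A minimal sketch of case-insensitive keyword matching is shown below. The function name find_denied_topic is hypothetical, and plain substring matching is an assumption; whether the engine matches on word boundaries is not stated here.

```python
from typing import Optional


def find_denied_topic(user_input: str, denied_topics: list[str]) -> Optional[str]:
    """Return the first denied keyword found in the input, or None if there is none."""
    lowered = user_input.casefold()
    for topic in denied_topics:
        # Substring match (assumption): "salary information" also matches inside longer text.
        if topic.casefold() in lowered:
            return topic
    return None
```

Because this is substring matching, short keywords can match inside unrelated words, so multi-word phrases such as "competitor pricing" tend to be safer than single short terms.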
Output constraints
| Field | Type | Required | Description |
|---|---|---|---|
| max_response_length | integer | No | Maximum characters per response |
SDK API
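```python
from zil.sdk.guardrails import GuardrailEngine

# guardrails_dict: the guardrails configuration as a dict
# (for example, parsed from identity/guardrails.yaml)
engine = GuardrailEngine.from_config(guardrails_dict)


def respond(user_message, call_llm):
    # call_llm is a placeholder for your own model call
    # Check user input before sending to LLM
    result = engine.check_input(user_message)
    if result.blocked:
        return "I can't help with that."

    agent_response = call_llm(user_message)

    # Check agent output before returning to user
    result = engine.check_output(agent_response)
    if result.blocked:
        return "[redacted]"
    return agent_response
```
GuardrailResult fields: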
| Field | Type | Description |
|---|---|---|
| passed | bool | True if no blocking violations |
| blocked | bool | True if action is block |
| action | str | allow, block, or warn |
| violations | list | List of Violation objects with rule, description, severity, matched_text |
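The relationship between these fields can be illustrated with a stand-in model. The classes below are not the SDK's own; they are a sketch that assumes action is derived from the most severe violation and that log-only violations map to allow. In the SDK these are plain fields, whereas the sketch uses properties to make the derivation explicit.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    rule: str
    description: str
    severity: str  # "block", "warn", or "log"
    matched_text: str


@dataclass
class GuardrailResult:
    violations: list[Violation] = field(default_factory=list)

    @property
    def action(self) -> str:
        # Assumption: the most severe violation decides the action.
        severities = {v.severity for v in self.violations}
        if "block" in severities:
            return "block"
        if "warn" in severities:
            return "warn"
        return "allow"

    @property
    def blocked(self) -> bool:
        return self.action == "block"

    @property
    def passed(self) -> bool:
        # "No blocking violations": warn and log still pass.
        return not self.blocked
```

Under these assumptions, a result whose only violation has severity warn has action warn, passed True, and blocked False.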
OTel integration
When a tracer is active, GuardrailCallback emits spans:
- guardrail.check.input: with violation attributes
- guardrail.check.output: with violation attributes
LLM instruction rules
These sections are not runtime-enforced. They are converted to natural-language directives and included in the system prompt.
hard_blocks[]
| Field | Type | Required | Description |
|---|---|---|---|
| topic | string | Yes | Identifier for the blocked topic |
| description | string | Yes | Human-readable explanation of the block |
escalation_triggers[]
| Field | Type | Required | Description |
|---|---|---|---|
| condition | string | Yes | Trigger condition identifier |
| action | string | Yes | Action to take (escalate) |
| message | string | No | Message shown to the user during escalation |
| threshold | float | No | Numeric threshold for conditions like confidence_below_threshold |
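To show what converting these sections to natural-language directives could look like, here is a hedged sketch. The function render_directives and the exact wording of each directive are assumptions; the SDK's real phrasing is not documented here.

```python
def render_directives(config: dict) -> list[str]:
    """Turn hard_blocks and escalation_triggers into plain-language lines."""
    lines = []
    for block in config.get("hard_blocks", []):
        lines.append(f"Hard block ({block['topic']}): {block['description']}")
    for trigger in config.get("escalation_triggers", []):
        line = f"When {trigger['condition']}"
        if "threshold" in trigger:  # optional field
            line += f" (threshold {trigger['threshold']})"
        line += f", {trigger['action']}"
        if "message" in trigger:  # optional field
            line += f' and tell the user: "{trigger["message"]}"'
        lines.append(line + ".")
    return lines
```

Because these lines only reach the model through the system prompt, the model is asked to comply but nothing checks that it does, which is why rules that must always hold belong in the runtime-enforced sections instead.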
Security audit
Use zil audit to assess your guardrail configuration:
```bash
zil audit        # full security report
zil audit --fix  # with remediation suggestions
```
See the zil audit section of the CLI Reference for details.
Identity directory
The guardrails file is one of three required files in the identity/ directory:
```
identity/
├── persona.md        # Who the agent is (markdown)
├── instructions.md   # How the agent behaves (markdown)
└── guardrails.yaml   # Hard rules (structured YAML)
```
All three files are required when the spec.identity field is present in the manifest. zil validate checks for their existence and validates the structure of guardrails.yaml.
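A pre-flight check for the three required files can be sketched with the standard library alone. This is not how zil validate is implemented; missing_identity_files is a hypothetical helper that covers only the existence check, not the structural validation of guardrails.yaml.

```python
from pathlib import Path

REQUIRED_FILES = ("persona.md", "instructions.md", "guardrails.yaml")


def missing_identity_files(identity_dir: str) -> list[str]:
    """Return the required identity files that are absent from identity_dir."""
    root = Path(identity_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means all three files are present; any other result names the files to create before running zil validate.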
How the SDK uses identity files
When you call zil.create_agent(), the SDK:
- Loads the runtime engine from guardrails.yaml and attaches it as agent._zil_guardrails for input/output checking
- Composes a system prompt from all three identity files:
  - persona.md: included as-is
  - instructions.md: included as-is
  - guardrails.yaml: hard_blocks and escalation_triggers converted to natural-language directives
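The composition step can be pictured with a short sketch. The function compose_system_prompt, the section order, the blank-line separators, and the bullet formatting are all assumptions made for illustration; the sketch also trims surrounding whitespace, which the SDK may not do.

```python
def compose_system_prompt(persona: str, instructions: str, directives: list[str]) -> str:
    """Join the persona and instructions text, then append guardrail directives as bullets."""
    sections = [persona.strip(), instructions.strip()]
    if directives:
        sections.append("\n".join(f"- {d}" for d in directives))
    return "\n\n".join(sections)
```

Here persona and instructions stand for the contents of persona.md and instructions.md, and directives for the natural-language lines derived from hard_blocks and escalation_triggers.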
You can disable the runtime engine with enable_guardrails=False:
```python
agent = zil.create_agent(enable_guardrails=False)
```
See SDK Reference for all options.
Examples
Financial agent (strict)
```yaml
detection:
  prompt_injection: true
  pii_output: true
  pii_input: true   # financial data is sensitive

blocked_patterns:
  - name: account_numbers
    pattern: '\b\d{10,12}\b'
    target: output
    severity: block

denied_topics:
  - "competitor pricing"
  - "investment advice"

output_constraints:
  max_response_length: 2000

hard_blocks:
  - topic: financial_advice
    description: Never provide specific financial advice or recommendations.
  - topic: legal_advice
    description: Never provide specific legal counsel.

escalation_triggers:
  - condition: user_requests_human
    action: escalate
  - condition: regulatory_topic_detected
    action: escalate
    message: "This topic requires human review. Connecting you now."
```
Internal helpdesk agent
```yaml
detection:
  prompt_injection: true
  pii_output: true

blocked_patterns:
  - name: internal_urls
    pattern: "https?://internal\\."
    target: output
    severity: block

denied_topics: []

output_constraints:
  max_response_length: 3000

hard_blocks:
  - topic: external_data_sharing
    description: Never share internal company data outside the organization.

escalation_triggers:
  - condition: user_requests_human
    action: escalate
  - condition: confidence_below_threshold
    threshold: 0.5
    action: escalate
```