Checks run before the LLM judge to save cost. If any check fails, the judge is skipped (fail-fast). A scenario passes only when all checks pass AND the judge passes.
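The pass rule above can be sketched in a few lines of Python. This is a minimal illustration, not kensa's actual runner: `run_scenario`, its signature, and the local `CheckResult` stand-in are hypothetical names. The source does not say whether the remaining checks still run after the first failure; this sketch runs them all and only skips the judge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    # Local stand-in mirroring the fields used elsewhere on this page.
    check: str
    passed: bool
    detail: str = ""


def run_scenario(
    checks: list[Callable[[], CheckResult]],
    judge: Callable[[], bool],
) -> tuple[bool, list[CheckResult], bool]:
    """Return (scenario_passed, check_results, judge_was_called)."""
    # Assumption: every check is evaluated so the report lists all failures.
    results = [check() for check in checks]
    if not all(r.passed for r in results):
        # At least one check failed: skip the (paid) judge call entirely.
        return False, results, False
    # All checks passed, so the scenario outcome is the judge's verdict.
    return judge(), results, True
```

The judge is only invoked on the last line, so a failing check can never incur judge cost.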
Check types
| Check | What it tests |
|---|---|
| output_contains | Output includes a string or pattern |
| output_matches | Output matches a regex |
| tools_called | All listed tools were invoked (set membership, order-free) |
| tools_not_called | None of the listed tools were invoked |
| tool_order | Tools called in this temporal sequence (use only when order is load-bearing) |
| trajectory | Match the expected tool-call path, optionally with accuracy threshold and inline budgets |
| max_cost | Total cost under threshold |
| max_turns | LLM call count under limit |
| max_duration | Execution time under limit |
| no_repeat_calls | No duplicate tool calls with identical arguments |
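The difference between the order-free tool checks and tool_order is easiest to see side by side. The functions below are a hypothetical sketch that operates on a plain list of tool names, not kensa's implementation. In particular, treating tool_order as a subsequence match (other calls may sit between the listed tools) is an assumption.

```python
def tools_called(called: list[str], required: list[str]) -> bool:
    # Set membership: every required tool appears somewhere, in any order.
    return set(required) <= set(called)


def tools_not_called(called: list[str], forbidden: list[str]) -> bool:
    # Passes only when no forbidden tool appears at all.
    return set(forbidden).isdisjoint(called)


def tool_order(called: list[str], order: list[str]) -> bool:
    # Assumed semantics: `order` must appear as a subsequence of `called`.
    # `tool in remaining` consumes the iterator up to the first match,
    # so each tool is searched for only after the previous one was found.
    remaining = iter(called)
    return all(tool in remaining for tool in order)


calls = ["book_flight", "search_flights"]
print(tools_called(calls, ["search_flights", "book_flight"]))  # True
print(tool_order(calls, ["search_flights", "book_flight"]))    # False
```

The same trace passes the order-free check and fails the ordered one, which is why tool_order is worth reserving for cases where the sequence really matters.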
Examples
Output checks
checks:
  # String containment (case-insensitive by default)
  - type: output_contains
    params: { value: "confirmation number" }

  # Case-sensitive containment
  - type: output_contains
    params: { value: "OK", case_sensitive: true }

  # Regex match
  - type: output_matches
    params: { pattern: "\\d{6,}" }
    description: Output contains a 6+ digit number
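As a rough illustration of what the two output checks above compare, here is a hypothetical Python sketch. The function names and signatures are illustrative. Using `re.search` (match anywhere in the output) is an assumption, suggested by the "contains a 6+ digit number" description.

```python
import re


def output_contains(output: str, value: str, case_sensitive: bool = False) -> bool:
    # Case-insensitive unless case_sensitive is set, as in the examples above.
    if case_sensitive:
        return value in output
    return value.casefold() in output.casefold()


def output_matches(output: str, pattern: str) -> bool:
    # Assumption: the pattern may match anywhere in the output.
    return re.search(pattern, output) is not None


# The double-quoted YAML string "\\d{6,}" decodes to the regex \d{6,}.
print(output_matches("Booking ref 1234567", r"\d{6,}"))
print(output_contains("Your Confirmation Number is ready", "confirmation number"))
```

Both calls print `True`. With `case_sensitive=True`, an output of "ok" would not satisfy a value of "OK".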
Tool checks
checks:
  # Tools were called (set membership, order-free)
  - type: tools_called
    params: { tools: [search_flights] }

  # Tools were NOT called (safety check)
  - type: tools_not_called
    params: { tools: [delete_account] }
    description: Agent must never call delete

  # Tools called in order
  - type: tool_order
    params: { order: [search_flights, book_flight] }
    description: Must search before booking

  # Canonical tool-call trajectory with optional budgets
  - type: trajectory
    params:
      steps:
        - tool: search_flights
        - tool: book_flight
      ordering: exact
      args: ignore
      min_accuracy: 1.0
      max_steps: 2
      max_tokens: 2000
      max_duration_seconds: 30
    description: Search, then book, within budget

  # No duplicate calls (trace-wide; flags any tool called twice with the same args)
  - type: no_repeat_calls
    description: Agent should not redo identical work
trajectory is the higher-level path check for tool correctness. It emits trajectory_accuracy
and step_efficiency metrics in reports, and in V1 it is limited to one trajectory check per
scenario.
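For the no_repeat_calls check in the example above, "same args" can be made concrete with a small sketch. This is a hypothetical illustration, not kensa's code: it models each call as a `(tool_name, args)` pair and assumes arguments are JSON-serializable and compared independently of key order.

```python
import json
from typing import Any


def find_repeat_calls(calls: list[tuple[str, dict[str, Any]]]) -> list[str]:
    """Return the names of tools called again with identical arguments."""
    seen: set[tuple[str, str]] = set()
    repeats: list[str] = []
    for name, args in calls:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal.
        key = (name, json.dumps(args, sort_keys=True))
        if key in seen:
            repeats.append(name)
        seen.add(key)
    return repeats


trace = [
    ("search_flights", {"from": "SFO", "to": "JFK"}),
    ("search_flights", {"to": "JFK", "from": "SFO"}),  # same args, reordered
    ("search_flights", {"from": "SFO", "to": "LAX"}),  # different args
]
print(find_repeat_calls(trace))  # ['search_flights']
```

Calling the same tool with different arguments is not flagged; only exact repeats count, tracked across the whole trace.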
Resource checks
checks:
  # Cost cap
  - type: max_cost
    params: { max_usd: 0.10 }
    description: Under 10 cents

  # Turn limit
  - type: max_turns
    params: { max: 5 }
    description: Complete in 5 LLM calls

  # Time limit
  - type: max_duration
    params: { max_seconds: 30 }
    description: Under 30 seconds
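Resource checks reduce to aggregating a number over the trace and comparing it with a limit. The sketch below is hypothetical: the `Span` fields (`kind`, `cost_usd`) are invented for illustration, and treating the limit as inclusive (`<=`) is an assumption the source does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical fields; the real Span model may differ.
    kind: str              # "llm" or "tool"
    cost_usd: float = 0.0


def max_cost(spans: list[Span], max_usd: float) -> bool:
    # Total cost across all spans, compared inclusively (assumption).
    return sum(s.cost_usd for s in spans) <= max_usd


def max_turns(spans: list[Span], limit: int) -> bool:
    # Only LLM calls count as turns; tool spans are ignored.
    return sum(1 for s in spans if s.kind == "llm") <= limit


trace = [Span("llm", 0.02), Span("tool"), Span("llm", 0.03)]
print(max_cost(trace, max_usd=0.10))  # True
print(max_turns(trace, limit=1))      # False
```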
Adding a check
Checks use a registry pattern. To add a new check type:
- Add a value to CheckType in models.py
- Write a check function in checks.py
- Register it in CHECK_REGISTRY
# checks.py
def check_my_check(spans: list[Span], params: dict[str, Any]) -> CheckResult:
    # Your logic here
    return CheckResult(check="my_check", passed=True, detail="...")

CHECK_REGISTRY: dict[CheckType, CheckFn] = {
    # ...existing checks...
    CheckType.MY_CHECK: check_my_check,
}
No call-site changes needed. The registry handles dispatch.
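To see why no call-site changes are needed, here is a self-contained sketch of registry dispatch. It mirrors the names used above (`CheckType`, `CheckResult`, `CheckFn`, `CHECK_REGISTRY`), but the definitions are simplified stand-ins: the `run_check` dispatcher and the dict-based span shape are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class CheckType(str, Enum):
    MAX_TURNS = "max_turns"
    MY_CHECK = "my_check"  # step 1: new enum value


@dataclass
class CheckResult:
    check: str
    passed: bool
    detail: str = ""


Spans = list[dict[str, Any]]  # simplified stand-in for list[Span]
CheckFn = Callable[[Spans, dict[str, Any]], CheckResult]


def check_max_turns(spans: Spans, params: dict[str, Any]) -> CheckResult:
    turns = sum(1 for s in spans if s.get("kind") == "llm")
    return CheckResult("max_turns", turns <= params["max"], f"{turns} turns")


def check_my_check(spans: Spans, params: dict[str, Any]) -> CheckResult:
    # Step 2: the new check function.
    return CheckResult(check="my_check", passed=True, detail="...")


CHECK_REGISTRY: dict[CheckType, CheckFn] = {
    CheckType.MAX_TURNS: check_max_turns,
    CheckType.MY_CHECK: check_my_check,  # step 3: registration
}


def run_check(type_name: str, spans: Spans, params: dict[str, Any]) -> CheckResult:
    # Hypothetical dispatcher: it never names a specific check, so adding
    # a new type does not touch this function.
    check_type = CheckType(type_name)  # raises ValueError for unknown types
    return CHECK_REGISTRY[check_type](spans, params)


print(run_check("my_check", [], {}).passed)  # True
```

The dispatcher looks the function up by enum value, so the three steps listed above are the only edits a new check requires.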