Scenarios are YAML files in `.kensa/scenarios/`. Your coding agent generates these, but you can write them by hand.
Full example
Fields
| Field | Required | Description |
|---|---|---|
| `id` | Yes | Unique identifier |
| `name` | No | Human-readable name. Defaults to `id`. |
| `description` | No | What this scenario tests |
| `source` | No | How it was generated: `code`, `traces`, or `user` |
| `input` | No | Literal input, or the JSONL field selector when `cases` is set. |
| `cases` | No | Path to a JSONL file for parameterized cases. Resolves relative to the scenario YAML. |
| `trials` | No | Number of repeated executions per case. Defaults to 1 (smoke). Values above 1 are measured runs. |
| `run_command` | Command mode | Argv list passed to `subprocess.run` (no shell). When literal `input` is set, it is appended as the final argv element. |
| `env_overrides` | No | Extra environment variables for this scenario’s subprocess |
| `dataset` | No | Legacy alias for `cases` |
| `input_field` | No | Legacy alias for `input` when `cases`/`dataset` is set |
| `expected_outcome` | No | Natural-language description of success |
| `checks` | No | List of deterministic checks |
| `criteria` | No | Natural-language criteria for the LLM judge (mutually exclusive with `judge`) |
| `judge` | No | Reference to a judge spec in `.kensa/judges/` (mutually exclusive with `criteria`) |
| `trace_refs` | No | Paths to previous trace files for context |
| `failure_pattern` | No | Known failure pattern this scenario targets |
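
The defaults, legacy aliases, and exclusivity rules in the table can be sketched in Python. This is a minimal illustration and not Kensa's actual loader: `normalize_scenario` and `build_argv` are hypothetical helper names, the error messages are invented, and the scenario values (`refund_flow`, `agent.py`) are made up for the example.

```python
from typing import Any


def normalize_scenario(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and aliases to a parsed scenario mapping."""
    if "id" not in raw:
        raise ValueError("scenario requires an id")
    if "criteria" in raw and "judge" in raw:
        raise ValueError("criteria and judge are mutually exclusive")

    scenario = dict(raw)
    # Legacy aliases: dataset -> cases, input_field -> input (when cases is set).
    if "dataset" in scenario and "cases" not in scenario:
        scenario["cases"] = scenario.pop("dataset")
    if "input_field" in scenario and "input" not in scenario and "cases" in scenario:
        scenario["input"] = scenario.pop("input_field")

    scenario.setdefault("name", scenario["id"])  # name defaults to id
    scenario.setdefault("trials", 1)             # 1 trial is a smoke run
    return scenario


def build_argv(scenario: dict[str, Any]) -> list[str]:
    """Return the argv list; a literal input becomes the final element."""
    argv = list(scenario["run_command"])
    # With cases set, input is a field selector, so nothing is appended here.
    if "cases" not in scenario and scenario.get("input") is not None:
        argv.append(str(scenario["input"]))
    return argv


scenario = normalize_scenario({
    "id": "refund_flow",
    "run_command": ["python", "agent.py"],
    "input": "I want a refund",
})
print(scenario["name"], scenario["trials"])  # refund_flow 1
print(build_argv(scenario))  # ['python', 'agent.py', 'I want a refund']
```

Because the command is an argv list rather than a shell string, the input arrives as a single argument even when it contains spaces or quotes.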
Checks vs criteria
Checks are deterministic and free. Use them for objective, binary conditions:
- Was a specific tool called?
- Did the agent follow the expected tool trajectory?
- Did the agent stay under budget?
- Did it complete in fewer than N turns?

Criteria are natural-language statements evaluated by the LLM judge. Use them for conditions that need judgment:
- Did the agent confirm before taking action?
- Was the response professional in tone?
- Did the agent avoid hallucinating details?
Case-driven scenarios
Point at a JSONL file where each row becomes a separate case. Use `cases` for the file and `input` for the field selector.
The field named by `input` becomes the scenario input. Other fields can be referenced in check params via `{{...}}` placeholders. `dataset` and `input_field` still load for older scenarios, but new scenarios should use `cases` and `input`.
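
The row-to-case expansion described above can be sketched in Python. This is an illustration under assumptions, not Kensa's implementation: `render` and `expand_cases` are hypothetical helpers, the `{{field_name}}` placeholder form is assumed, and the row fields (`query`, `expected_tool`) and the `tool` param are invented for the example.

```python
import json
import re
from typing import Any, Iterator

# Assumed placeholder form: {{field_name}}, with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value: Any, row: dict[str, Any]) -> Any:
    """Replace placeholders in strings, recursing into lists and dicts."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), value)
    if isinstance(value, list):
        return [render(v, row) for v in value]
    if isinstance(value, dict):
        return {k: render(v, row) for k, v in value.items()}
    return value


def expand_cases(
    jsonl_text: str, input_field: str, check_params: dict[str, Any]
) -> Iterator[tuple[Any, dict[str, Any]]]:
    """Yield (scenario_input, rendered_params) for each non-empty JSONL row."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        yield row[input_field], render(check_params, row)


rows = (
    '{"query": "Cancel order 42", "expected_tool": "cancel_order"}\n'
    '{"query": "Where is my package?", "expected_tool": "track_shipment"}\n'
)

for scenario_input, params in expand_cases(rows, "query", {"tool": "{{expected_tool}}"}):
    print(scenario_input, params)
```

In this sketch a placeholder that names a missing field raises `KeyError`, which surfaces a malformed row early instead of running a check with an empty parameter.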
For pytest-native evals, case rows often hold partial conversations.
Pass `case.messages` into the real application and record the result with `case.output(...)`.
Trajectory checks
Use `trajectory` when tool-call correctness matters more than any single tool event.
Trajectory checks add `trajectory_accuracy` and `step_efficiency` metrics to reports. In V1, each
scenario can define at most one trajectory check.
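
To make the two metric names concrete, here is one plausible way such scores could be computed. The formulas are assumptions chosen for illustration, since this page does not define them: accuracy is taken as the share of expected tool calls that appear in order in the actual run, and efficiency as the ratio of expected to actual steps, capped at 1. The helper names and tool names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two tool-name lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def trajectory_metrics(expected: list[str], actual: list[str]) -> dict[str, float]:
    """Score an actual tool-call sequence against the expected one (assumed formulas)."""
    if not expected:
        raise ValueError("expected trajectory must not be empty")
    matched = lcs_length(expected, actual)
    accuracy = matched / len(expected)
    efficiency = min(1.0, len(expected) / len(actual)) if actual else 0.0
    return {"trajectory_accuracy": accuracy, "step_efficiency": efficiency}


expected = ["lookup_order", "check_policy", "issue_refund"]
actual = ["lookup_order", "lookup_order", "check_policy", "issue_refund"]
print(trajectory_metrics(expected, actual))
# {'trajectory_accuracy': 1.0, 'step_efficiency': 0.75}
```

Under these assumed definitions, the run above hits every expected step in order but spends one redundant call, so only the efficiency score drops.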