Documentation Index
Fetch the complete documentation index at: https://kensa.sh/docs/llms.txt
Use this file to discover all available pages before exploring further.
kensa works standalone, without a coding agent. It requires Python 3.10+.
kensa init
Scaffold .kensa/, add the CLI to your project’s dev deps, and install skills for your coding agent.
kensa init # bare scaffold + uv add --dev kensa + interactive agent prompt
kensa init --example # also write a demo agent and scenario
kensa init --pytest # scaffold a pytest-native eval in tests/evals/
kensa init --force # overwrite the existing example (with --example)
kensa init --no-cli # skip the uv add --dev step
kensa init -a codex # non-interactive: install skills for one agent
kensa init -a all # install skills for every supported agent target
In an interactive terminal, kensa init prompts for which coding agent to install skills for. The default is inferred from existing project markers, or is all in a fresh repo. In non-interactive environments, pass -a/--agent with one of claude, codex, cursor, opencode, gemini, other, all, or none.
With --example, kensa detects your API keys and writes a demo:
- ANTHROPIC_API_KEY set → Anthropic agent using claude-haiku-4-5
- OPENAI_API_KEY set → OpenAI agent using gpt-5.4-mini
- Neither → stub agent (no API call)
--example writes .kensa/agents/example.py and .kensa/scenarios/example.yaml; run kensa eval afterwards to verify your setup. --pytest writes tests/evals/test_example.py, .kensa/scenarios/example.yaml, and .kensa/scenarios/example.jsonl without touching production code. Without --example or --pytest, kensa init leaves the directory empty so you (or your coding agent) can author scenarios from scratch.
kensa capture
Capture one real agent invocation so kensa generate has a trace to mine. Pass the child command after --.
kensa capture -i "refund this order" -- python agent.py
kensa capture -i "classify this ticket" -- uv run python src/agent.py
kensa capture -- python agent.py # agent reads its own input
| Flag | Default | Description |
|---|---|---|
| -i, --input | — | Input string appended as the final argv element (mirrors scenario.input) |
Writes a capture-kind run manifest under .kensa/runs/ and a JSONL trace under .kensa/traces/. The child process inherits KENSA_TRACE_DIR, so any installed instrumentor (Anthropic, OpenAI, LangChain) records spans automatically. If no spans land, the CLI flags it and points at kensa.instrument().
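The mechanics described above can be sketched in a few lines. This is an illustration only, not kensa's implementation: the helper name `launch_with_trace_dir` and the inline stand-in agent are hypothetical. It shows the input string landing as the last argv element and the child seeing KENSA_TRACE_DIR in its environment.

```python
import json
import os
import subprocess
import sys
import tempfile


def launch_with_trace_dir(child_argv, input_text, trace_dir):
    """Run child_argv with input_text appended and KENSA_TRACE_DIR exported."""
    argv = list(child_argv)
    if input_text is not None:
        argv.append(input_text)  # the input becomes the final argv element
    # The child inherits the parent's environment plus the trace directory.
    env = dict(os.environ, KENSA_TRACE_DIR=str(trace_dir))
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Hypothetical stand-in "agent" that reports what it received.
CHILD = (
    "import json, os, sys; "
    "print(json.dumps({'input': sys.argv[-1], "
    "'trace_dir': os.environ['KENSA_TRACE_DIR']}))"
)

with tempfile.TemporaryDirectory() as trace_dir:
    result = launch_with_trace_dir(
        [sys.executable, "-c", CHILD], "refund this order", trace_dir
    )
    seen = json.loads(result.stdout)
    seen_trace_dir_matches = seen["trace_dir"] == trace_dir
```

An agent script written this way can read its input from `sys.argv[-1]` when -i is passed, and fall back to its own input source otherwise.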
kensa run rejects capture-kind manifests; only kensa generate consumes them.
kensa generate
Synthesize scenario YAMLs from captured traces using an LLM. Run kensa capture (or a prior kensa run) first so there are traces to mine.
kensa generate # 3 scenarios from the latest run
kensa generate -n 5 # 5 scenarios
kensa generate --run-id 20260423T120000 # from a specific run
kensa generate --trace path/to/trace.jsonl # from an explicit trace file (repeatable)
kensa generate --dry-run # print YAML to stdout, write nothing
kensa generate --force # overwrite existing scenario files
kensa generate --run-command 'python .kensa/agents/app.py' # override entrypoint hint
| Flag | Default | Description |
|---|---|---|
| --run-id | latest | Run ID to source traces from |
| --trace | — | Specific trace file(s); repeatable. Overrides --run-id |
| -n, --count | 3 | Number of scenarios to generate (1–20) |
| --model | resolved | LLM model override (e.g. claude-sonnet-4-6) |
| --dry-run | off | Print YAML to stdout; do not write files |
| --force | off | Overwrite existing scenario files |
| --scenario-dir | .kensa/scenarios | Where to write generated scenarios |
| --source-scenario-dir | auto | Where to scan for existing scenarios when recovering the observed run_command. Defaults to --scenario-dir if it already has scenarios, otherwise .kensa/scenarios. |
| --run-command | inferred | Entrypoint argv to hint to the LLM (repeatable) |
Source priority: --trace → --run-id → latest capture manifest → latest run manifest. When the manifest references exactly one run_command, generated scenarios are rewritten to use it verbatim; with multiple observed entrypoints, anything outside the allowlist is rejected. Pass --run-command explicitly when no manifest is available.
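The precedence and the run_command rule above can be read as two small decision functions. The sketch below is a hedged illustration, not kensa's code: the function names, the string labels, and the behaviour when nothing was observed are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def pick_trace_source(
    trace_files: Sequence[str] = (),
    run_id: Optional[str] = None,
    latest_capture: Optional[str] = None,
    latest_run: Optional[str] = None,
) -> Tuple[str, object]:
    """Return (kind, value) for the first available source, in documented order."""
    if trace_files:
        return ("trace", list(trace_files))
    if run_id:
        return ("run-id", run_id)
    if latest_capture:
        return ("capture-manifest", latest_capture)
    if latest_run:
        return ("run-manifest", latest_run)
    raise LookupError("no traces found; run kensa capture or kensa run first")


def resolve_run_command(proposed: str, observed: List[str]) -> str:
    """Apply the single-entrypoint rewrite or the multi-entrypoint allowlist."""
    if len(observed) == 1:
        return observed[0]  # exactly one observed command: use it verbatim
    if proposed in observed:
        return proposed  # several observed commands: act as an allowlist
    # Assumption: with no observed commands the caller must supply one explicitly.
    raise ValueError(f"run_command {proposed!r} is not among observed entrypoints")
```

For example, `pick_trace_source(run_id="20260423T120000", latest_run="r")` selects the explicit run ID, because --run-id outranks the latest manifests.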
Model resolution mirrors kensa judge: KENSA_JUDGE_MODEL env var → ANTHROPIC_API_KEY (uses claude-sonnet-4-6) → OPENAI_API_KEY (uses gpt-5.4-mini). Keys can live in a .env walked up from cwd.
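As a rough sketch of that resolution order, under stated assumptions: the function names below are hypothetical, the `cli_model` parameter assumes --model takes precedence because the flag tables describe it as an override, and the `None` fallback is a placeholder since the behaviour with no keys is not described here.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_judge_model(
    env: Mapping[str, str], cli_model: Optional[str] = None
) -> Optional[str]:
    """Pick a model using the documented precedence."""
    if cli_model:  # assumption: --model wins, since it is called an override
        return cli_model
    if env.get("KENSA_JUDGE_MODEL"):
        return env["KENSA_JUDGE_MODEL"]
    if env.get("ANTHROPIC_API_KEY"):
        return "claude-sonnet-4-6"
    if env.get("OPENAI_API_KEY"):
        return "gpt-5.4-mini"
    return None  # nothing configured; placeholder for undocumented behaviour


def find_dotenv(start: Path) -> Optional[Path]:
    """Return the nearest .env at or above start, or None if there is none."""
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None
```

Calling `resolve_judge_model(os.environ)` after loading the file returned by `find_dotenv(Path.cwd())` approximates the lookup described above. Note that when both keys are set, the Anthropic key decides the model.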
Generated scenarios must include at least one of max_cost or max_turns, must have either checks or criteria, and cannot reference a judge: file (use inline criteria instead). Scenarios that fail these rules are surfaced in the CLI output with the reason.
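These rules are easy to pre-check on a parsed scenario before committing it. The sketch below assumes the scenario has already been loaded into a dict and that max_cost, max_turns, checks, criteria, and judge are top-level keys; that layout and the function name are assumptions for illustration, not kensa's validator.

```python
from typing import Any, Dict, List


def validate_generated_scenario(scenario: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons the scenario breaks the rules, if any."""
    problems = []
    if "max_cost" not in scenario and "max_turns" not in scenario:
        problems.append("needs at least one of max_cost or max_turns")
    if not scenario.get("checks") and not scenario.get("criteria"):
        problems.append("needs checks or criteria")
    if "judge" in scenario:
        problems.append("judge: file references are not allowed; use inline criteria")
    return problems


ok = {"max_turns": 5, "criteria": "Reply names the refund amount."}
bad = {"judge": "judges/refund.md"}
```

An empty list means the scenario satisfies all three rules; `bad` above violates every one of them.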
kensa eval
Run + judge + report in one shot.
kensa eval # all scenarios
kensa eval -s classify_ticket # specific scenario (repeatable)
kensa eval --pytest tests/evals/ -k draft -q # run pytest-native evals
kensa eval --format markdown # CI-friendly output
kensa eval --timeout 600 # 10-minute per-scenario timeout
kensa eval --model claude-sonnet-4-6 # override judge model
| Flag | Default | Description |
|---|---|---|
| --scenario-dir | .kensa/scenarios | Where scenario YAMLs live |
| -s, --scenario-id | all | Run a specific scenario (repeatable) |
| --timeout | 300 | Per-scenario timeout in seconds |
| --model | resolved | Judge model override |
| --pytest | off | Shell out to pytest and enable Kensa run/result artifacts |
| --format | terminal | terminal, markdown, or json |
With --pytest, any remaining arguments are passed through to pytest. The pytest plugin expands @pytest.mark.kensa(...) tests into scenario cases and trials, then writes Kensa run/result artifacts for the completed pytest run.
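When driving kensa eval from a script (a CI wrapper, for instance), it can help to assemble the argument list from the flags in the table above. The helper below is a hypothetical sketch: it only builds the list and uses only documented flags. Placing --pytest last so that everything after it goes to pytest is inferred from the example command, and combining it with the other flags is an untested assumption.

```python
from typing import List, Optional, Sequence

FORMATS = {"terminal", "markdown", "json"}


def build_eval_argv(
    scenario_ids: Sequence[str] = (),
    fmt: str = "terminal",
    timeout: int = 300,
    model: Optional[str] = None,
    pytest_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argv list for `kensa eval` from documented flags."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    argv = ["kensa", "eval", "--format", fmt, "--timeout", str(timeout)]
    for scenario_id in scenario_ids:
        argv += ["-s", scenario_id]  # -s is repeatable
    if model:
        argv += ["--model", model]
    if pytest_args is not None:
        # Assumption: --pytest goes last; remaining arguments pass through to pytest.
        argv += ["--pytest", *pytest_args]
    return argv
```

The resulting list can be handed to `subprocess.run(argv)` without shell quoting concerns.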
kensa run
Run scenarios and capture traces. No judging.
kensa run # all scenarios
kensa run -s classify_ticket # specific scenario
kensa run --dry-run # list what would run, don't execute
kensa run --format json # machine-readable manifest
| Flag | Default | Description |
|---|---|---|
| --scenario-dir | .kensa/scenarios | Where scenario YAMLs live |
| -s, --scenario-id | all | Run a specific scenario (repeatable) |
| --timeout | 300 | Per-scenario timeout in seconds |
| --dry-run | off | List scenarios that would run, without executing |
| --format | text | text or json |
Each scenario runs in its own subprocess with KENSA_TRACE_DIR set. Traces are written as JSONL to .kensa/traces/.
kensa judge
Score the latest run with checks + LLM judge.
kensa judge # default model, latest run
kensa judge --model claude-haiku-4-5 # override model
kensa judge --run-id abc123 # specific run
kensa judge --format json # machine-readable
| Flag | Default | Description |
|---|---|---|
| --run-id | latest | Which run to judge |
| --model | resolved | Judge model override |
| --format | text | text or json |
Checks run first. If all pass, the LLM judge evaluates criteria. If any check fails, the judge is skipped (fail-fast).
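The fail-fast ordering can be sketched as follows. This is a conceptual illustration, not kensa's scoring code: the `score` function, the `(name, callable)` check shape, and the result dict are hypothetical, and the sketch assumes all checks are evaluated before deciding whether to call the judge.

```python
from typing import Callable, Dict, List, Tuple

Check = Tuple[str, Callable[[str], bool]]


def score(output: str, checks: List[Check], judge: Callable[[str], bool]) -> Dict:
    """Run deterministic checks first; call the LLM judge only if all pass."""
    failed = [name for name, check in checks if not check(output)]
    if failed:
        # Fail-fast: skip the (slow, paid) judge call entirely.
        return {"passed": False, "failed_checks": failed, "judged": False}
    return {"passed": bool(judge(output)), "failed_checks": [], "judged": True}


judge_calls = []


def fake_judge(output: str) -> bool:
    """Stand-in for an LLM judge that records each invocation."""
    judge_calls.append(output)
    return True


checks = [("mentions_refund", lambda text: "refund" in text)]
skipped = score("order shipped", checks, fake_judge)
judged = score("refund issued", checks, fake_judge)
```

Here `skipped` never reaches the judge, so only one judge call is recorded. The practical effect is that cheap checks gate the more expensive model call.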
kensa report
Generate reports from the latest run.
kensa report # rich terminal output
kensa report --format markdown # CI-friendly
kensa report --format json # machine-readable
kensa report --format html # standalone HTML file
kensa report -o results.md --format markdown # write to file
kensa report --run-id abc123 -v # full reasoning for a past run
| Flag | Default | Description |
|---|---|---|
| --run-id | latest | Which run to render |
| --format | terminal | terminal, markdown, json, or html |
| -o, --output | stdout | Write to file instead of stdout |
| -v, --verbose | off | Show full check details and judge reasoning |
kensa report always writes a standalone HTML report to .kensa/reports/ as a side effect, regardless of --format.
When a scenario includes a trajectory check, report output also surfaces the numeric trajectory_accuracy and step_efficiency metrics alongside pass/fail.
kensa analyze
Surface cost, latency, and anomalies across runs.
kensa analyze # text summary
kensa analyze --format json # machine-readable
kensa analyze -o analysis.json --format json
| Flag | Default | Description |
|---|---|---|
| --trace-dir | .kensa/traces | Where trace JSONL files live |
| --format | text | text or json |
| -o, --output | stdout | Write to file instead of stdout |
Outputs trace-level cost and latency distributions, overall success rate, per-tool usage, and flagged anomalies such as cost outliers, latency outliers, repeated tool calls, and high turn counts.
kensa doctor
Verify your setup is ready to run.
Checks:
- Python version (3.10+)
- Package manager detection (uv, pipenv, pip)
- .kensa/scenarios/ directory exists
- .env file loaded
- API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY)
- Trace directory writable
- SDK instrumentation (scans agent scripts for openai/anthropic/langchain imports, verifies instrumentor packages)
- Judge provider instantiation
kensa mcp
Serve kensa over the Model Context Protocol for LLM clients (Claude Code, Cursor, Codex, Claude Desktop, etc.). Requires the mcp extra.
uv add "kensa[mcp]"
uv run kensa mcp # stdio transport (default)
uv run kensa mcp --http --port 8765 # streamable HTTP, localhost-only
For a zero-install launcher (no uv add needed), use the kensa-mcp shim package: uvx kensa-mcp.
| Flag | Default | Description |
|---|---|---|
| --http | off | Use HTTP transport instead of stdio |
| --host | 127.0.0.1 | HTTP host (with --http) |
| --port | 8765 | HTTP port (with --http) |
Exposes 7 tools (init, doctor, run, judge, eval, report, analyze) and 8 resources under the kensa:// namespace. See MCP server for the full reference.
Environment variables
| Variable | Purpose |
|---|---|
| KENSA_TRACE_DIR | Directory for JSONL span output. Set automatically during kensa run. |
| KENSA_JUDGE_MODEL | Override the default judge model. |
| ANTHROPIC_API_KEY | Anthropic API key for judge and/or agent. |
| OPENAI_API_KEY | OpenAI API key for judge and/or agent. |