CLI Reference

kensa works standalone, without a coding agent. It requires Python 3.10+.

kensa init

Scaffold .kensa/, add the CLI to your project’s dev deps, and install skills for your coding agent.

kensa init                 # bare scaffold + uv add --dev kensa + interactive agent prompt
kensa init --example       # also write a demo agent and scenario
kensa init --pytest        # scaffold a pytest-native eval in tests/evals/
kensa init --force         # overwrite the existing example (with --example)
kensa init --no-cli        # skip the uv add --dev step
kensa init -a codex        # non-interactive: install skills for one agent
kensa init -a all          # install skills for every supported agent target

The example agent picks its provider from whichever API key is set:

ANTHROPIC_API_KEY set → Anthropic agent using claude-haiku-4-5
OPENAI_API_KEY set → OpenAI agent using gpt-5.4-mini
Neither → stub agent (no API call)
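That selection rule fits in a few lines of Python. This is a minimal sketch. It assumes Anthropic takes precedence when both keys are set, which the listing order suggests but does not state. pick_example_provider is a hypothetical name, not part of kensa's API.

```python
from typing import Mapping, Optional, Tuple


# Hypothetical sketch of the documented rule; preferring Anthropic when both
# keys are present is an assumption, not confirmed behavior.
def pick_example_provider(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (provider, model) for the example agent based on which key is set."""
    if env.get("ANTHROPIC_API_KEY"):
        return ("anthropic", "claude-haiku-4-5")
    if env.get("OPENAI_API_KEY"):
        return ("openai", "gpt-5.4-mini")
    # No key available: fall back to a stub agent that makes no API call.
    return ("stub", None)
```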

--example writes .kensa/agents/example.py and .kensa/scenarios/example.yaml. Run kensa eval afterwards to verify your setup. --pytest writes tests/evals/test_example.py, .kensa/scenarios/example.yaml, and .kensa/scenarios/example.jsonl without touching production code. Without --example or --pytest, kensa init creates the .kensa/ scaffold but no agent or scenario files, so you (or your coding agent) can author scenarios from scratch.

kensa capture

Capture one real agent invocation so kensa generate has a trace to mine. Pass the child command after --.

kensa capture -i "refund this order" -- python agent.py
kensa capture -i "classify this ticket" -- uv run python src/agent.py
kensa capture -- python agent.py                # agent reads its own input

Flag	Default	Description
`-i, --input`	—	Input string appended as the final argv element (mirrors `scenario.input`)

Writes a capture-kind run manifest under .kensa/runs/ and a JSONL trace under .kensa/traces/. The child process inherits KENSA_TRACE_DIR, so any installed instrumentor (Anthropic, OpenAI, LangChain) records spans automatically. If no spans are recorded, the CLI warns you and suggests kensa.instrument(). kensa run rejects capture-kind manifests; only kensa generate consumes them.
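The documented contract (the -i value becomes the final argv element, and KENSA_TRACE_DIR is passed to the child) can be sketched with Python's subprocess module. build_argv, child_env, and run_child are hypothetical helpers for illustration. They are not how kensa capture is implemented.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


# Hypothetical helpers illustrating the documented behavior of `kensa capture`.
def build_argv(child: List[str], input_text: Optional[str]) -> List[str]:
    """Append the -i/--input value as the final argv element, if one was given."""
    return [*child, input_text] if input_text is not None else list(child)


def child_env(trace_dir: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the parent environment and point KENSA_TRACE_DIR at trace_dir."""
    env = dict(os.environ if base is None else base)
    env["KENSA_TRACE_DIR"] = trace_dir
    return env


def run_child(child: List[str], input_text: Optional[str], trace_dir: str) -> subprocess.CompletedProcess:
    """Run the agent command as a child process that inherits KENSA_TRACE_DIR."""
    return subprocess.run(
        build_argv(child, input_text),
        env=child_env(trace_dir),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in "agent" that echoes its trace dir and its final argv element.
    echo_agent = [sys.executable, "-c", "import os, sys; print(os.environ['KENSA_TRACE_DIR'], sys.argv[-1])"]
    result = run_child(echo_agent, "refund this order", ".kensa/traces")
    print(result.stdout.strip())  # .kensa/traces refund this order
```

Under this sketch, kensa capture -i "refund this order" -- python agent.py corresponds to running ["python", "agent.py", "refund this order"]. An agent that reads sys.argv[-1] therefore sees the same input that scenario.input would supply.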

kensa generate

Synthesize scenario YAMLs from captured traces using an LLM. Run kensa capture (or a prior kensa run) first so there are traces to mine.

kensa generate                               # 3 scenarios from the latest run
kensa generate -n 5                          # 5 scenarios
kensa generate --run-id 20260423T120000      # from a specific run
kensa generate --trace path/to/trace.jsonl   # from an explicit trace file (repeatable)
kensa generate --dry-run                     # print YAML to stdout, write nothing
kensa generate --force                       # overwrite existing scenario files
kensa generate --run-command 'python .kensa/agents/app.py'  # override entrypoint hint

Flag	Default	Description
`--run-id`	latest	Run ID to source traces from
`--trace`	—	Specific trace file(s); repeatable. Overrides `--run-id`
`-n, --count`	`3`	Number of scenarios to generate (1–20)
`--model`	resolved	LLM model override (e.g. `claude-sonnet-4-6`)
`--dry-run`	off	Print YAML to stdout; do not write files
`--force`	off	Overwrite existing scenario files
`--scenario-dir`	`.kensa/scenarios`	Where to write generated scenarios
`--source-scenario-dir`	auto	Where to scan for existing scenarios when recovering the observed `run_command`. Defaults to `--scenario-dir` if it already has scenarios, otherwise `.kensa/scenarios`.
`--run-command`	inferred	Entrypoint argv to hint to the LLM (repeatable)

Source priority: --trace → --run-id → latest capture manifest → latest run manifest.

When the manifest references exactly one run_command, generated scenarios are rewritten to use it verbatim. When it records several entrypoints, those entrypoints form an allowlist, and any generated scenario whose run_command falls outside it is rejected. Pass --run-command explicitly when no manifest is available.

Model resolution mirrors kensa judge: KENSA_JUDGE_MODEL env var → ANTHROPIC_API_KEY (uses claude-sonnet-4-6) → OPENAI_API_KEY (uses gpt-5.4-mini). Keys can live in a .env file found by walking up from the current working directory.

Generated scenarios must set at least one of max_cost or max_turns, must have either checks or criteria, and cannot reference a judge: file (use inline criteria instead). Scenarios that break these rules are listed in the CLI output with the reason.
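The model-resolution order and the validation rules above can be written as two small Python functions. Both are hypothetical sketches. resolve_model and validate_generated are illustrative names, and the validator assumes max_cost, max_turns, checks, criteria, and judge are top-level keys of the parsed scenario YAML, which this page does not confirm.

```python
from typing import Any, Dict, List, Mapping, Optional


# Hypothetical sketch of the documented resolution order, not kensa's code.
def resolve_model(env: Mapping[str, str]) -> Optional[str]:
    """KENSA_JUDGE_MODEL wins, then the Anthropic default, then the OpenAI default."""
    if env.get("KENSA_JUDGE_MODEL"):
        return env["KENSA_JUDGE_MODEL"]
    if env.get("ANTHROPIC_API_KEY"):
        return "claude-sonnet-4-6"
    if env.get("OPENAI_API_KEY"):
        return "gpt-5.4-mini"
    return None  # no provider configured


# Assumes a parsed scenario is a flat mapping with these keys at the top level.
def validate_generated(scenario: Dict[str, Any]) -> List[str]:
    """Return the reasons a generated scenario would be rejected (empty list = OK)."""
    reasons: List[str] = []
    if "max_cost" not in scenario and "max_turns" not in scenario:
        reasons.append("must set max_cost or max_turns")
    if not scenario.get("checks") and not scenario.get("criteria"):
        reasons.append("must have checks or criteria")
    if "judge" in scenario:
        reasons.append("judge: files are not allowed; use inline criteria")
    return reasons
```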

kensa eval

Run + judge + report in one shot.

kensa eval                       # all scenarios
kensa eval -s classify_ticket    # specific scenario (repeatable)
kensa eval --pytest tests/evals/ -k draft -q  # run pytest-native evals
kensa eval --format markdown     # CI-friendly output
kensa eval --timeout 600         # 10-minute per-scenario timeout
kensa eval --model claude-sonnet-4-6  # override judge model

Flag	Default	Description
`--scenario-dir`	`.kensa/scenarios`	Where scenario YAMLs live
`-s, --scenario-id`	all	Run a specific scenario (repeatable)
`--timeout`	`300`	Per-scenario timeout in seconds
`--model`	resolved	Judge model override
`--pytest`	off	Shell out to pytest and enable Kensa run/result artifacts
`--format`	`terminal`	`terminal`, `markdown`, or `json`

With --pytest, any remaining arguments are passed through to pytest. The pytest plugin expands @pytest.mark.kensa(...) tests into scenario cases and trials, then writes Kensa run/result artifacts for the completed pytest run.

kensa run

Run scenarios and capture traces. No judging.

kensa run                              # all scenarios
kensa run -s classify_ticket           # specific scenario
kensa run --dry-run                    # list what would run, don't execute
kensa run --format json                # machine-readable manifest

Flag	Default	Description
`--scenario-dir`	`.kensa/scenarios`	Where scenario YAMLs live
`-s, --scenario-id`	all	Run a specific scenario (repeatable)
`--timeout`	`300`	Per-scenario timeout in seconds
`--dry-run`	off	List scenarios that would run, without executing
`--format`	`text`	`text` or `json`

Each scenario runs in its own subprocess with KENSA_TRACE_DIR set. Traces are written as JSONL to .kensa/traces/.
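To inspect what a run captured, a short Python sketch can load every JSONL file under the trace directory. It assumes only that each non-blank line is one JSON object. It does not interpret span fields, because this page does not describe the span schema. load_traces is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


# Hypothetical helper: reads JSONL files only; span fields are left uninterpreted.
def load_traces(trace_dir: str = ".kensa/traces") -> Dict[str, List[Dict[str, Any]]]:
    """Map each JSONL file (relative path) to its parsed records, one per non-blank line."""
    root = Path(trace_dir)
    traces: Dict[str, List[Dict[str, Any]]] = {}
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            traces[str(path.relative_to(root))] = [json.loads(line) for line in fh if line.strip()]
    return traces


if __name__ == "__main__":
    for name, records in load_traces().items():
        print(f"{name}: {len(records)} record(s)")
```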

kensa judge

Score the latest run with checks + LLM judge.

kensa judge                            # default model, latest run
kensa judge --model claude-haiku-4-5   # override model
kensa judge --run-id abc123            # specific run
kensa judge --format json              # machine-readable

Flag	Default	Description
`--run-id`	latest	Which run to judge
`--model`	resolved	Judge model override
`--format`	`text`	`text` or `json`

Checks run first. If all pass, the LLM judge evaluates criteria. If any check fails, the judge is skipped (fail-fast).
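The fail-fast ordering can be sketched as follows. judge_scenario, Verdict, and the callables passed in are hypothetical. The sketch also assumes every check is evaluated, so all failures are reported before the judge is skipped. The description above does not specify that detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Verdict:
    passed: bool
    failed_checks: List[str] = field(default_factory=list)
    judge_ran: bool = False
    reasoning: Optional[str] = None


# Hypothetical sketch: deterministic checks run first; the LLM judge runs
# only if every check passes. Not kensa's actual implementation.
def judge_scenario(
    checks: Dict[str, Callable[[], bool]],
    llm_judge: Callable[[], Tuple[bool, str]],
) -> Verdict:
    # Evaluate all checks so every failure is reported (an assumption).
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        # Fail fast: skip the LLM judge call entirely.
        return Verdict(passed=False, failed_checks=failed)
    ok, reasoning = llm_judge()
    return Verdict(passed=ok, judge_ran=True, reasoning=reasoning)
```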

kensa report

Generate reports from the latest run.

kensa report                          # rich terminal output
kensa report --format markdown        # CI-friendly
kensa report --format json            # machine-readable
kensa report --format html            # standalone HTML file
kensa report -o results.md --format markdown  # write to file
kensa report --run-id abc123 -v       # full reasoning for a past run

Flag	Default	Description
`--run-id`	latest	Which run to render
`--format`	`terminal`	`terminal`, `markdown`, `json`, or `html`
`-o, --output`	stdout	Write to file instead of stdout
`-v, --verbose`	off	Show full check details and judge reasoning

kensa report always writes a standalone HTML report to .kensa/reports/ as a side effect, regardless of --format. When a scenario includes a trajectory check, report output also surfaces the numeric trajectory_accuracy and step_efficiency metrics alongside pass/fail.

kensa analyze

Surface cost, latency, and anomalies across runs.

kensa analyze                         # text summary
kensa analyze --format json           # machine-readable
kensa analyze -o analysis.json --format json

Flag	Default	Description
`--trace-dir`	`.kensa/traces`	Where trace JSONL files live
`--format`	`text`	`text` or `json`
`-o, --output`	stdout	Write to file instead of stdout

Outputs trace-level cost and latency distributions, overall success rate, per-tool usage, and flagged anomalies such as cost outliers, latency outliers, repeated tool calls, and high turn counts.
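This page does not spell out the anomaly rules kensa analyze applies. The Python sketch below substitutes two common heuristics: Tukey's 1.5 × IQR rule for cost or latency outliers, and a consecutive-repeat count for repeated tool calls. The function names and thresholds are assumptions, not kensa's.

```python
import statistics
from typing import Dict, List, Optional, Sequence

# Substitute heuristics for illustration, not kensa's documented rules.


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(values) < 4:
        return []  # too few traces for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def repeated_tool_calls(tool_names: Sequence[str], threshold: int = 3) -> Dict[str, int]:
    """Tools called at least `threshold` times in a row, mapped to their longest streak."""
    flagged: Dict[str, int] = {}
    current: Optional[str] = None
    streak = 0
    for name in tool_names:
        streak = streak + 1 if name == current else 1
        current = name
        if streak >= threshold:
            flagged[name] = max(flagged.get(name, 0), streak)
    return flagged


if __name__ == "__main__":
    costs = [0.010, 0.012, 0.011, 0.013, 0.012, 0.250]  # illustrative per-trace costs
    print(iqr_outliers(costs))  # [5]
    print(repeated_tool_calls(["search", "search", "search", "fetch"]))  # {'search': 3}
```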

kensa doctor

Verify your setup is ready to run.

kensa doctor

Checks:

Python version (3.10+)
Package manager detection (uv, pipenv, pip)
.kensa/scenarios/ directory exists
.env file loaded
API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY)
Trace directory writable
SDK instrumentation (scans agent scripts for openai/anthropic/langchain imports, verifies instrumentor packages)
Judge provider instantiation
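The SDK instrumentation check can be approximated with Python's ast module. The approach is to parse each agent script and record which of the openai, anthropic, or langchain packages it imports. This is a hypothetical sketch. The scan location (.kensa/agents, where kensa init --example writes its agent) and the helper names are assumptions. kensa's real check also verifies the matching instrumentor packages, which this sketch does not attempt.

```python
import ast
from pathlib import Path
from typing import Dict, Set

SDKS = ("openai", "anthropic", "langchain")


# Hypothetical approximation of the doctor import scan.
def imported_sdks(source: str) -> Set[str]:
    """Return which SDKs in SDKS a Python source string imports."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            top = module.split(".")[0]
            for sdk in SDKS:
                # Count companion packages too, e.g. langchain_openai -> langchain.
                if top == sdk or top.startswith(sdk + "_"):
                    found.add(sdk)
    return found


def scan_agent_scripts(agent_dir: str = ".kensa/agents") -> Dict[str, Set[str]]:
    """Map each .py file under agent_dir (assumed location) to the SDKs it imports."""
    return {
        str(path): imported_sdks(path.read_text(encoding="utf-8"))
        for path in sorted(Path(agent_dir).rglob("*.py"))
    }
```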

kensa mcp

Serve kensa over the Model Context Protocol for LLM clients (Claude Code, Cursor, Codex, Claude Desktop, etc.). Requires the mcp extra.

uv add "kensa[mcp]"
uv run kensa mcp                       # stdio transport (default)
uv run kensa mcp --http --port 8765    # streamable HTTP, localhost-only

For a zero-install launcher (no uv add needed), use the kensa-mcp shim package: uvx kensa-mcp.

Flag	Default	Description
`--http`	off	Use HTTP transport instead of stdio
`--host`	`127.0.0.1`	HTTP host (with `--http`)
`--port`	`8765`	HTTP port (with `--http`)

Exposes 7 tools (init, doctor, run, judge, eval, report, analyze) and 8 resources under the kensa:// namespace. See the MCP server page for the full reference.

Environment variables

Variable	Purpose
`KENSA_TRACE_DIR`	Directory for JSONL span output. Set automatically for child processes during `kensa run` and `kensa capture`.
`KENSA_JUDGE_MODEL`	Override the default model for `kensa judge` (also honored by `kensa generate`).
`ANTHROPIC_API_KEY`	Anthropic API key for judge and/or agent.
`OPENAI_API_KEY`	OpenAI API key for judge and/or agent.
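As described under kensa generate, API keys can live in a .env file found by walking up from the current directory. Here is a minimal Python sketch of that lookup. find_dotenv and parse_dotenv are hypothetical helpers. The parser handles only simple KEY=VALUE lines (with optional export prefixes, surrounding quotes, and # comments), so kensa may well use a fuller dotenv implementation.

```python
from pathlib import Path
from typing import Dict, Optional


# Hypothetical sketch: check the start directory, then each parent,
# and stop at the first .env file found.
def find_dotenv(start: Optional[Path] = None) -> Optional[Path]:
    here = (start or Path.cwd()).resolve()
    for directory in (here, *here.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


# Deliberately simplified parser (an assumption, not kensa's parser).
def parse_dotenv(text: str) -> Dict[str, str]:
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values
```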
