kensa works standalone, without a coding agent. It requires Python 3.10+.

kensa init

Scaffold .kensa/, add the CLI to your project’s dev deps, and install skills for your coding agent.
kensa init                 # bare scaffold + uv add --dev kensa + interactive agent prompt
kensa init --example       # also write a demo agent and scenario
kensa init --pytest        # scaffold a pytest-native eval in tests/evals/
kensa init --force         # overwrite the existing example (with --example)
kensa init --no-cli        # skip the uv add --dev step
kensa init -a codex        # non-interactive: install skills for one agent
kensa init -a all          # install skills for every supported agent target
In an interactive terminal, kensa init prompts for the coding agent to install skills for; the default is inferred from existing project markers, or is all in a fresh repo. In non-interactive environments, pass -a/--agent claude|codex|cursor|opencode|gemini|other|all|none. With --example, kensa detects your API keys and writes a demo:
  • ANTHROPIC_API_KEY set → Anthropic agent using claude-haiku-4-5
  • OPENAI_API_KEY set → OpenAI agent using gpt-5.4-mini
  • Neither → stub agent (no API call)
--example writes .kensa/agents/example.py and .kensa/scenarios/example.yaml; run kensa eval afterwards to verify your setup. --pytest writes tests/evals/test_example.py, .kensa/scenarios/example.yaml, and .kensa/scenarios/example.jsonl without touching production code. Without --example or --pytest, kensa init leaves the directory empty so you (or your coding agent) can author scenarios from scratch.
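The key detection described for --example can be pictured as a small lookup. This is only an illustrative sketch, not kensa's code: the function name pick_demo_agent is hypothetical, and the precedence when both keys are set (Anthropic first) is an assumption based on the order of the list above.

```python
from __future__ import annotations

import os
from typing import Mapping


def pick_demo_agent(env: Mapping[str, str]) -> tuple[str, str | None]:
    """Return (provider, model) for the demo agent, given an environment mapping.

    Hypothetical helper. Model names are taken from the list above; checking
    Anthropic before OpenAI is an assumption, not documented behavior.
    """
    if env.get("ANTHROPIC_API_KEY"):
        return ("anthropic", "claude-haiku-4-5")
    if env.get("OPENAI_API_KEY"):
        return ("openai", "gpt-5.4-mini")
    return ("stub", None)  # stub agent: no API call is made


if __name__ == "__main__":
    provider, model = pick_demo_agent(os.environ)
    print(f"demo agent: {provider} ({model or 'no model'})")
```

Running it in your shell before kensa init --example gives a rough idea of which demo you are likely to get.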

kensa capture

Capture one real agent invocation so kensa generate has a trace to mine. Pass the child command after --.
kensa capture -i "refund this order" -- python agent.py
kensa capture -i "classify this ticket" -- uv run python src/agent.py
kensa capture -- python agent.py                # agent reads its own input
| Flag | Default | Description |
| --- | --- | --- |
| -i, --input | | Input string appended as the final argv element (mirrors scenario.input) |
Writes a capture-kind run manifest under .kensa/runs/ and a JSONL trace under .kensa/traces/. The child process inherits KENSA_TRACE_DIR, so any installed instrumentor (Anthropic, OpenAI, LangChain) records spans automatically. If no spans land, the CLI flags it and points at kensa.instrument(). kensa run rejects capture-kind manifests; only kensa generate consumes them.
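To make the two mechanics above concrete (the input is appended as the final argv element, and the child inherits KENSA_TRACE_DIR), here is a minimal sketch of a capture-style wrapper. The helper names build_child and run_child are hypothetical, and this does not write manifests or traces the way kensa capture does; it only shows how the child process would see its arguments and environment.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_child(
    command: list[str], input_text: str | None, trace_dir: Path
) -> tuple[list[str], dict[str, str]]:
    """Return the (argv, env) pair a capture-style wrapper could hand to the child."""
    argv = list(command)
    if input_text is not None:
        argv.append(input_text)  # -i/--input becomes the final argv element
    env = dict(os.environ)
    env["KENSA_TRACE_DIR"] = str(trace_dir)  # the child process inherits this
    return argv, env


def run_child(command: list[str], input_text: str | None, trace_dir: Path) -> int:
    """Run the child to completion and return its exit code."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    argv, env = build_child(command, input_text, trace_dir)
    return subprocess.run(argv, env=env, check=False).returncode


if __name__ == "__main__":
    # Stand-in "agent": prints the input it received and where traces should go.
    agent = [
        sys.executable,
        "-c",
        "import os, sys; print(sys.argv[-1], os.environ['KENSA_TRACE_DIR'])",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        run_child(agent, "refund this order", Path(tmp) / "traces")
```

An agent script that reads sys.argv[-1] as its input therefore works unchanged under both invocation styles shown above.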

kensa generate

Synthesize scenario YAMLs from captured traces using an LLM. Run kensa capture (or a prior kensa run) first so there are traces to mine.
kensa generate                               # 3 scenarios from the latest run
kensa generate -n 5                          # 5 scenarios
kensa generate --run-id 20260423T120000      # from a specific run
kensa generate --trace path/to/trace.jsonl   # from an explicit trace file (repeatable)
kensa generate --dry-run                     # print YAML to stdout, write nothing
kensa generate --force                       # overwrite existing scenario files
kensa generate --run-command 'python .kensa/agents/app.py'  # override entrypoint hint
| Flag | Default | Description |
| --- | --- | --- |
| --run-id | latest | Run ID to source traces from |
| --trace | | Specific trace file(s); repeatable. Overrides --run-id |
| -n, --count | 3 | Number of scenarios to generate (1–20) |
| --model | resolved | LLM model override (e.g. claude-sonnet-4-6) |
| --dry-run | off | Print YAML to stdout; do not write files |
| --force | off | Overwrite existing scenario files |
| --scenario-dir | .kensa/scenarios | Where to write generated scenarios |
| --source-scenario-dir | auto | Where to scan for existing scenarios when recovering the observed run_command. Defaults to --scenario-dir if it already has scenarios, otherwise .kensa/scenarios. |
| --run-command | inferred | Entrypoint argv to hint to the LLM (repeatable) |
Source priority: --trace → --run-id → latest capture manifest → latest run manifest. When the manifest references exactly one run_command, generated scenarios are rewritten to use it verbatim; with multiple observed entrypoints, anything outside the allowlist is rejected. Pass --run-command explicitly when no manifest is available.

Model resolution mirrors kensa judge: KENSA_JUDGE_MODEL env var → ANTHROPIC_API_KEY (uses claude-sonnet-4-6) → OPENAI_API_KEY (uses gpt-5.4-mini). Keys can live in a .env file found by walking up from the current directory.

Generated scenarios must include at least one of max_cost or max_turns, must have either checks or criteria, and cannot reference a judge: file (use inline criteria instead). Scenarios that fail these rules are surfaced in the CLI output with the reason.
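The source priority and the rules for generated scenarios can be sketched as two small functions. This is a hedged illustration rather than kensa's implementation: pick_trace_source and validate_generated are hypothetical names, the scenario is assumed to be a plain dict with top-level keys, and "either checks or criteria" is read as "at least one of them".

```python
from __future__ import annotations

from typing import Any


def pick_trace_source(
    trace_files: list[str],
    run_id: str | None,
    latest_capture: str | None,
    latest_run: str | None,
) -> tuple[str, Any] | None:
    """Apply the priority: --trace, then --run-id, then capture, then run manifest."""
    if trace_files:
        return ("trace", trace_files)
    if run_id:
        return ("run-id", run_id)
    if latest_capture:
        return ("capture", latest_capture)
    if latest_run:
        return ("run", latest_run)
    return None  # nothing to mine; run kensa capture or kensa run first


def validate_generated(scenario: dict[str, Any]) -> list[str]:
    """Return the reasons a generated scenario would be rejected (empty if valid)."""
    problems: list[str] = []
    if "max_cost" not in scenario and "max_turns" not in scenario:
        problems.append("needs at least one of max_cost or max_turns")
    if not scenario.get("checks") and not scenario.get("criteria"):
        problems.append("needs checks or criteria")
    if "judge" in scenario:
        problems.append("judge: file references are not allowed; use inline criteria")
    return problems


if __name__ == "__main__":
    print(pick_trace_source([], None, "capture-manifest", "run-manifest"))
    print(validate_generated({"max_turns": 5, "criteria": "Agent issues a refund"}))
```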

kensa eval

Run + judge + report in one shot.
kensa eval                       # all scenarios
kensa eval -s classify_ticket    # specific scenario (repeatable)
kensa eval --pytest tests/evals/ -k draft -q  # run pytest-native evals
kensa eval --format markdown     # CI-friendly output
kensa eval --timeout 600         # 10-minute per-scenario timeout
kensa eval --model claude-sonnet-4-6  # override judge model
| Flag | Default | Description |
| --- | --- | --- |
| --scenario-dir | .kensa/scenarios | Where scenario YAMLs live |
| -s, --scenario-id | all | Run a specific scenario (repeatable) |
| --timeout | 300 | Per-scenario timeout in seconds |
| --model | resolved | Judge model override |
| --pytest | off | Shell out to pytest and enable Kensa run/result artifacts |
| --format | terminal | terminal, markdown, or json |
With --pytest, any remaining arguments are passed through to pytest. The pytest plugin expands @pytest.mark.kensa(...) tests into scenario cases and trials, then writes Kensa run/result artifacts for the completed pytest run.

kensa run

Run scenarios and capture traces. No judging.
kensa run                              # all scenarios
kensa run -s classify_ticket           # specific scenario
kensa run --dry-run                    # list what would run, don't execute
kensa run --format json                # machine-readable manifest
| Flag | Default | Description |
| --- | --- | --- |
| --scenario-dir | .kensa/scenarios | Where scenario YAMLs live |
| -s, --scenario-id | all | Run a specific scenario (repeatable) |
| --timeout | 300 | Per-scenario timeout in seconds |
| --dry-run | off | List scenarios that would run, without executing |
| --format | text | text or json |
Each scenario runs in its own subprocess with KENSA_TRACE_DIR set. Traces are written as JSONL to .kensa/traces/.
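Because traces are plain JSONL, they are easy to inspect with a few lines of Python. The sketch below makes no assumption about the span schema (each line is simply parsed as a JSON object); the iter_spans helper and the *.jsonl glob pattern are illustrative assumptions, not part of kensa.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterator


def iter_spans(trace_dir: Path) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (file name, span) for every non-empty JSON line under trace_dir."""
    for path in sorted(trace_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # tolerate blank lines between records
                    yield path.name, json.loads(line)


if __name__ == "__main__":
    # Count spans per trace file in the default trace directory, if it exists.
    counts = Counter(name for name, _ in iter_spans(Path(".kensa/traces")))
    for name, n in counts.items():
        print(f"{name}: {n} spans")
```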

kensa judge

Score the latest run with checks + LLM judge.
kensa judge                            # default model, latest run
kensa judge --model claude-haiku-4-5   # override model
kensa judge --run-id abc123            # specific run
kensa judge --format json              # machine-readable
| Flag | Default | Description |
| --- | --- | --- |
| --run-id | latest | Which run to judge |
| --model | resolved | Judge model override |
| --format | text | text or json |
Checks run first. If all pass, the LLM judge evaluates criteria. If any check fails, the judge is skipped (fail-fast).
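The checks-then-judge ordering above can be sketched in a few lines. This is an illustration of the fail-fast idea only: the score function, the check callables, and the result dict are all hypothetical, and the judge is a plain callable standing in for an LLM call.

```python
from __future__ import annotations

from typing import Any, Callable

Check = Callable[[str], bool]


def score(output: str, checks: list[Check], judge: Callable[[str], bool]) -> dict[str, Any]:
    """Run deterministic checks first; call the judge only if every check passes."""
    failed = [check.__name__ for check in checks if not check(output)]
    if failed:
        # Fail-fast: skip the (slower, paid) judge call entirely.
        return {"passed": False, "failed_checks": failed, "judged": False}
    return {"passed": judge(output), "failed_checks": [], "judged": True}


if __name__ == "__main__":
    def mentions_refund(out: str) -> bool:
        return "refund" in out

    def stub_judge(out: str) -> bool:
        return True  # stand-in for an LLM verdict

    print(score("order shipped", [mentions_refund], stub_judge))
    print(score("refund issued", [mentions_refund], stub_judge))
```

Ordering the cheap deterministic checks before the judge means a failing scenario costs no judge tokens.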

kensa report

Generate reports from the latest run.
kensa report                          # rich terminal output
kensa report --format markdown        # CI-friendly
kensa report --format json            # machine-readable
kensa report --format html            # standalone HTML file
kensa report -o results.md --format markdown  # write to file
kensa report --run-id abc123 -v       # full reasoning for a past run
| Flag | Default | Description |
| --- | --- | --- |
| --run-id | latest | Which run to render |
| --format | terminal | terminal, markdown, json, or html |
| -o, --output | stdout | Write to file instead of stdout |
| -v, --verbose | off | Show full check details and judge reasoning |
kensa report always writes a standalone HTML report to .kensa/reports/ as a side effect, regardless of --format. When a scenario includes a trajectory check, report output also surfaces the numeric trajectory_accuracy and step_efficiency metrics alongside pass/fail.

kensa analyze

Surface cost, latency, and anomalies across runs.
kensa analyze                         # text summary
kensa analyze --format json           # machine-readable
kensa analyze -o analysis.json --format json
| Flag | Default | Description |
| --- | --- | --- |
| --trace-dir | .kensa/traces | Where trace JSONL files live |
| --format | text | text or json |
| -o, --output | stdout | Write to file instead of stdout |
Outputs trace-level cost and latency distributions, overall success rate, per-tool usage, and flagged anomalies such as cost outliers, latency outliers, repeated tool calls, and high turn counts.
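How kensa decides what counts as an outlier is not specified here. As a rough illustration of the kinds of anomalies listed above, the sketch below flags high-cost traces with a modified z-score (median absolute deviation, threshold 3.5) and counts repeated tool calls. Both helpers, the thresholds, and the input shapes are assumptions chosen for the example, not kensa's rules.

```python
from __future__ import annotations

import statistics
from collections import Counter


def high_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Indices of values far above the median, using the modified z-score."""
    if len(values) < 3:
        return []
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * (v - median) / mad > threshold  # one-sided: only high values
    ]


def repeated_tools(tool_calls: list[str], limit: int = 3) -> dict[str, int]:
    """Tools called more than `limit` times within one trace."""
    return {name: n for name, n in Counter(tool_calls).items() if n > limit}


if __name__ == "__main__":
    costs = [0.010, 0.012, 0.011, 0.013, 0.200]  # made-up per-trace costs in USD
    print("cost outliers at indices:", high_outliers(costs))
    print("repeated tools:", repeated_tools(["search"] * 4 + ["fetch"]))
```

The same high_outliers helper applies to latency values; a median-based score is used because a single extreme trace barely moves the median.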

kensa doctor

Verify your setup is ready to run.
kensa doctor
Checks:
  • Python version (3.10+)
  • Package manager detection (uv, pipenv, pip)
  • .kensa/scenarios/ directory exists
  • .env file loaded
  • API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY)
  • Trace directory writable
  • SDK instrumentation (scans agent scripts for openai/anthropic/langchain imports, verifies instrumentor packages)
  • Judge provider instantiation
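A few of the checks above are easy to approximate in a pre-flight script of your own, for example in CI before kensa is installed. The sketch below is a simplified stand-in, not kensa doctor: the preflight function is hypothetical, it looks for .env only in the given root rather than walking up, and it skips the instrumentation and judge-provider checks.

```python
from __future__ import annotations

import os
import shutil
import sys
from pathlib import Path
from typing import Mapping


def preflight(root: Path, env: Mapping[str, str]) -> dict[str, bool]:
    """Return a name -> passed mapping for a subset of the checks listed above."""
    trace_dir = root / ".kensa" / "traces"
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "package manager": any(shutil.which(t) for t in ("uv", "pipenv", "pip")),
        "scenarios dir": (root / ".kensa" / "scenarios").is_dir(),
        ".env present": (root / ".env").is_file(),
        "api key": bool(env.get("ANTHROPIC_API_KEY") or env.get("OPENAI_API_KEY")),
        "trace dir writable": trace_dir.is_dir() and os.access(trace_dir, os.W_OK),
    }


if __name__ == "__main__":
    for name, ok in preflight(Path.cwd(), os.environ).items():
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
```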

kensa mcp

Serve kensa over the Model Context Protocol for LLM clients (Claude Code, Cursor, Codex, Claude Desktop, etc.). Requires the mcp extra.
uv add "kensa[mcp]"
uv run kensa mcp                       # stdio transport (default)
uv run kensa mcp --http --port 8765    # streamable HTTP, localhost-only
For a zero-install launcher (no uv add needed), use the kensa-mcp shim package: uvx kensa-mcp.
| Flag | Default | Description |
| --- | --- | --- |
| --http | off | Use HTTP transport instead of stdio |
| --host | 127.0.0.1 | HTTP host (with --http) |
| --port | 8765 | HTTP port (with --http) |
Exposes 7 tools (init, doctor, run, judge, eval, report, analyze) and 8 resources under the kensa:// namespace. See MCP server for the full reference.

Environment variables

| Variable | Purpose |
| --- | --- |
| KENSA_TRACE_DIR | Directory for JSONL span output. Set automatically during kensa run. |
| KENSA_JUDGE_MODEL | Override the default judge model. |
| ANTHROPIC_API_KEY | Anthropic API key for judge and/or agent. |
| OPENAI_API_KEY | OpenAI API key for judge and/or agent. |
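The way these variables combine to pick a judge model, as described under kensa generate, can be sketched as a single function. The resolve_judge_model name is hypothetical, and letting an explicit --model value win over KENSA_JUDGE_MODEL is an assumption based on the flag being documented as an override.

```python
from __future__ import annotations

import os
from typing import Mapping


def resolve_judge_model(env: Mapping[str, str], cli_model: str | None = None) -> str | None:
    """Resolve the judge model: --model, then KENSA_JUDGE_MODEL, then API keys."""
    if cli_model:  # assumption: the CLI flag takes precedence over the environment
        return cli_model
    if env.get("KENSA_JUDGE_MODEL"):
        return env["KENSA_JUDGE_MODEL"]
    if env.get("ANTHROPIC_API_KEY"):
        return "claude-sonnet-4-6"
    if env.get("OPENAI_API_KEY"):
        return "gpt-5.4-mini"
    return None  # no key available, so no judge model can be resolved


if __name__ == "__main__":
    print("judge model:", resolve_judge_model(os.environ))
```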
Last modified on May 4, 2026