How it works
Kensa turns agent behavior into repeatable evals: scenarios in, traces captured, checks run, reports out.
Zero to eval
Ask your coding agent to inspect the codebase and draft the first scenarios. You review evals instead of starting from a blank file.
Runs become traces
Kensa captures LLM calls, tool use, tokens, cost, and latency while your agent runs each scenario.
Checks gate judges
Assertions run before LLM judges, catching obvious regressions without spending tokens.
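The gating order described above can be illustrated with a small, self-contained sketch. It is not kensa's implementation or API: the names `Verdict`, `evaluate`, `non_empty`, and `stub_judge` are hypothetical, and the judge is a stand-in for a real LLM call. The point is only that a failed deterministic check returns early, so the judge is never invoked.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A check returns None on success, or a short failure message.
# (Hypothetical shape, chosen for illustration only.)
Check = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    passed: bool
    reason: str
    judge_called: bool


def evaluate(output: str, checks: list[Check], judge: Callable[[str], bool]) -> Verdict:
    """Run cheap deterministic checks first; call the judge only if all pass."""
    for check in checks:
        failure = check(output)
        if failure is not None:
            # Stop early: no judge call, so no tokens are spent.
            return Verdict(False, failure, judge_called=False)
    ok = judge(output)
    return Verdict(ok, "judge passed" if ok else "judge failed", judge_called=True)


def non_empty(output: str) -> Optional[str]:
    """Deterministic check: the agent must produce some visible text."""
    return None if output.strip() else "output is empty"


# Stand-in for an LLM judge; it records each call so the gating is observable.
calls: list[str] = []


def stub_judge(output: str) -> bool:
    calls.append(output)
    return True


print(evaluate("", [non_empty], stub_judge))                # stopped by the check
print(evaluate("Refund issued.", [non_empty], stub_judge))  # reaches the judge
```

In this sketch the first call never reaches `stub_judge`, which is the behavior the "without spending tokens" claim relies on.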
Ship with evidence
Get verdicts, traces, cost, latency, and failure details in terminal, Markdown, JSON, or HTML.
Each run leaves traces that kensa can turn into sharper scenarios.
Skills
Five skills take you from zero to eval, or from traces to targeted iteration.
/audit-evals: Assess readiness, identify testable behaviors, prepare the environment. The default entry point.
/generate-scenarios: Happy paths, edge cases, tool usage, error handling, cost bounds. One command.
/generate-judges: Binary pass/fail definitions with few-shot examples, ready to reuse across scenarios.
/validate-judge: Test judge accuracy against human labels. Iterates until TPR and TNR meet threshold.
/diagnose-errors: Categorize failures, identify patterns, recommend next action.
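For readers unfamiliar with the metrics that /validate-judge mentions, here is a minimal sketch of how TPR (true positive rate) and TNR (true negative rate) can be computed from human labels and judge verdicts. It is an illustration under stated assumptions, not kensa's code: the function names are hypothetical, and the 0.9 threshold is an arbitrary example value.

```python
def judge_rates(human: list[bool], judge: list[bool]) -> tuple[float, float]:
    """Return (TPR, TNR) of judge verdicts measured against human labels.

    True means "pass". TPR is the share of human-labeled passes the judge
    also passed; TNR is the share of human-labeled fails it also failed.
    """
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    positives = sum(human)
    negatives = len(human) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one human pass and one human fail")
    true_pos = sum(1 for h, j in zip(human, judge) if h and j)
    true_neg = sum(1 for h, j in zip(human, judge) if not h and not j)
    return true_pos / positives, true_neg / negatives


def meets_threshold(human: list[bool], judge: list[bool], threshold: float = 0.9) -> bool:
    """Both rates must clear the threshold (0.9 is an assumed example value)."""
    tpr, tnr = judge_rates(human, judge)
    return tpr >= threshold and tnr >= threshold


human_labels = [True, True, True, True, False, False]
judge_labels = [True, True, True, False, False, False]
tpr, tnr = judge_rates(human_labels, judge_labels)
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")
```

Requiring both rates to clear the bar guards against a judge that looks accurate overall only because it passes (or fails) nearly everything.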
CLI (Python 3.10+)
Works standalone for CI and local iteration. Checks run before the judge, so obvious failures stop early without spending tokens.
kensa init: Scaffold .kensa/ (bare; --example for a demo)
kensa capture: Record one real agent invocation as a trace
kensa generate: Synthesize scenarios from captured traces
kensa eval: run + judge + report in one shot
kensa run: Execute scenarios in subprocesses
kensa judge: Deterministic checks + LLM judge
kensa report: Terminal, Markdown, JSON, or HTML output
kensa analyze: Cost/latency stats + anomaly flagging
kensa doctor: Pre-flight environment checks
kensa mcp: Serve kensa over MCP for LLM clients
FAQ
What agents does kensa work with?
Any Python agent that makes LLM calls. Auto-instrumentation covers Anthropic, OpenAI, and LangChain out of the box. Other providers work with manual OTel config.
Do I need to modify my agent code?
No. kensa auto-instruments your agent at startup. Zero code changes needed.
Can I run kensa in CI?
Yes. kensa eval --format markdown is all you need. Deterministic checks need no API keys. Add judge keys as secrets for LLM-judged criteria.
Can I drive kensa from an MCP client?
Yes. In Claude Code, run claude mcp add kensa -- uvx kensa-mcp to register the stdio server. uvx fetches the kensa-mcp package from PyPI on first run, so no pre-install is needed. Every CLI action is exposed as a tool, and runs, scenarios, and judges are readable as resources under kensa://.
Is kensa free?
Yes, it is MIT licensed. The only cost is your LLM API calls for judge criteria, and that's optional.