Disciplined workflows for Claude Code.
Add structured specs, plans, audits, checkpoints, and handoffs to any project. One command. Native slash commands and skills. No daemon, no framework, no lock-in.
The problem with agent workflows
Most AI-assisted development workflows share the same structural failures. These are not model capability issues — they are protocol gaps.
- Agents wander from the plan; each step introduces compounding divergence from the intended architecture.
- Workflows burn tokens on narration, restated context, and theatrical reasoning instead of producing artifacts.
- Agents report success without distinguishing what was validated from what was assumed.
- Handoffs between agents or sessions lose assumptions, constraints, and decisions made earlier.
Native Claude Code integration
ZenKit installs as slash commands and skills that Claude Code already supports. No framework to learn. No runtime to manage. Just structured discipline added to your existing workflow.
$ npx zenkit init claude

ZenKit for Claude Code
======================

created .claude/commands/zenkit-spec.md
created .claude/commands/zenkit-plan.md
created .claude/commands/zenkit-build.md
created .claude/commands/zenkit-audit.md
created .claude/commands/zenkit-checkpoint.md
created .claude/commands/zenkit-handoff.md
created .claude/skills/zenkit-audit/SKILL.md
created .claude/skills/zenkit-handoff/SKILL.md
created .claude/skills/zenkit-checkpoint/SKILL.md
created CLAUDE.md

Done. Start with: /zenkit-spec "your feature"
Core primitives
Six categories of artifacts that compose into disciplined workflows. Each is a plain file — markdown or JSON Schema — readable by humans and machines alike.
- Eight workflow verbs: spec, plan, build, audit, refactor, handoff, checkpoint, ship. Each has a defined input/output contract.
- JSON Schema definitions for handoffs, tasks, audits, checkpoints, and benchmarks. Machine-validatable contracts.
- Reusable capabilities: architecture review, security audit, bug triage, prompt pruning, release checks. Composable within commands.
- Automatic checkpoints at workflow boundaries: pre-change validates that plans exist, post-change validates that tests pass, pre-ship validates all gates.
- Explicit state snapshots with gate conditions. They distinguish validated facts from assumptions and enable bounded rollback.
- Evaluation criteria for execution quality, verbosity, and architectural alignment, with quantified scoring on a 0-10 scale.
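A checkpoint with gate conditions can be pictured as a small artifact like the following sketch. The field names and the checkpoint identifier are illustrative, not ZenKit's published schema:

```json
{
  "checkpoint": "post-change",
  "validated": ["unit tests pass", "docs/openapi.yaml parses"],
  "assumed": ["staging database mirrors production"],
  "gates": [
    { "condition": "tests_pass", "status": "passed" }
  ],
  "rollback_to": "checkpoint-007"
}
```

Because validated and assumed claims live in separate fields, a later agent can see exactly which statements were checked and which were merely inherited.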
Why lightweight matters
Heavier frameworks solve real problems but introduce their own. Every abstraction layer is a source of drift, a cost multiplier, and a barrier to understanding what actually happened during execution.
Benchmark: criteria-driven verification
ZenKit benchmarks verify acceptance criteria against the actual implementation — file contents, schema validity, test execution, and JSON value checks. Not file existence. Not narrative claims.
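The distinction can be sketched in a few lines. This is a hypothetical illustration of criteria-driven checks, not ZenKit's actual benchmark API; the function names and criteria format are assumptions:

```python
import json
import tempfile
from pathlib import Path

# Illustrative sketch: each check inspects the implementation itself
# rather than trusting a narrative claim of success.

def check_file_contains(path: str, needle: str) -> bool:
    """Pass only if the file exists and actually contains the expected text."""
    p = Path(path)
    return p.is_file() and needle in p.read_text()

def check_json_value(path: str, key: str, expected) -> bool:
    """Pass only if the JSON document parses and the key holds the value."""
    try:
        doc = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError):
        return False
    return doc.get(key) == expected

# Demonstrate against throwaway artifacts instead of a real repo:
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "audit.json"
    report.write_text(json.dumps({"validation_status": "passed"}))
    passed = check_json_value(str(report), "validation_status", "passed")
    # A file-existence check alone would be fooled; a content check is not:
    phantom = check_file_contains(str(Path(tmp) / "missing.ts"), "PATCH")
```

The point is that `phantom` comes back false even though an agent could have claimed the file was written, while `passed` is grounded in a parsed value.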
Structured handoffs
Every agent-to-agent transfer uses the same contract. Context, assumptions, constraints, decisions, risks, and open questions — nothing is lost between stages.
{
"context": "Backend architect completed the user profile API endpoint with data validation and error handling.",
"assumptions": [
"PostgreSQL is the primary datastore",
"Authentication middleware is already in place",
"Profile images are stored in object storage, not the database"
],
"constraints": [
"Response time must be under 200ms at p95",
"No breaking changes to existing /api/v1 endpoints"
],
"decision": "Implemented as a new /api/v1/profile resource with GET/PATCH operations. Used existing ORM patterns rather than raw SQL for consistency.",
"deliverable": {
"type": "code",
"description": "Profile API endpoint with validation, tests, and OpenAPI spec",
"files_changed": [
"src/api/profile.ts",
"src/api/profile.test.ts",
"docs/openapi.yaml"
],
"validation_status": "passed"
},
"risks": [
{
"description": "Profile PATCH allows partial updates — concurrent writes could cause data races",
"severity": "medium",
"mitigation": "Added optimistic locking via updated_at timestamp check"
}
],
"open_questions": [
"Should profile deletion be soft-delete or hard-delete?",
"Is rate limiting needed on the PATCH endpoint?"
],
"next_agent": "frontend-architect"
}
This handoff is validated against handoff.schema.json before the next agent begins work. Invalid handoffs are rejected.
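A minimal gate in that spirit might look like the sketch below. The real contract lives in handoff.schema.json, which is not reproduced here, so the required keys are assumptions read off the example above:

```python
# Hypothetical handoff gate; field names are taken from the example handoff,
# not from ZenKit's actual schema.
REQUIRED_KEYS = {
    "context", "assumptions", "constraints", "decision",
    "deliverable", "risks", "open_questions", "next_agent",
}

def validate_handoff(handoff: dict) -> list[str]:
    """Return a list of violations; an empty list means the handoff may proceed."""
    errors = [f"missing field: {key}"
              for key in sorted(REQUIRED_KEYS - handoff.keys())]
    deliverable = handoff.get("deliverable")
    if isinstance(deliverable, dict) and \
            deliverable.get("validation_status") not in {"passed", "failed"}:
        errors.append("deliverable.validation_status must be an explicit outcome")
    return errors

# A handoff that omits its risk register and open questions is rejected:
violations = validate_handoff(
    {"context": "...", "next_agent": "frontend-architect"}
)
```

Rejecting the transfer up front is what keeps assumptions and risks from silently evaporating between stages.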
Self-audit, not self-certification
ZenKit uses its own benchmark system to audit itself. This is structured introspection, not proof of correctness. The claims are only as strong as the checks behind them.
- Tests whether ZenKit's primitives are expressive enough to describe real work
- Produces inspectable evidence — run the benchmark yourself and verify
- Forces the same honesty requirements on ZenKit itself
- Does not prove ZenKit is correct — a system can only check what it knows to check
- Does not replace independent inspection
- Does not validate the rubrics themselves
Safeguards: benchmark checks are verifiable, uncertainty is required (not optional), limitations are inherited from specs, illustrative data is labeled, telemetry is never fabricated.
Three layers, adopt what you need
- Slash commands, skills, and CLAUDE.md. One command to install, zero dependencies.
- Schemas, benchmarks, and a validation engine, for teams that want machine-verifiable workflows.
- Dynamic tool calls for validate, benchmark, and checkpoint. Coming soon.