npx zenkit init claude

Disciplined workflows for Claude Code.

Add structured specs, plans, audits, checkpoints, and handoffs to any project. One command. Native slash commands and skills. No daemon, no framework, no lock-in.

npx zenkit init claude

The problem with agent workflows

Most AI-assisted development workflows share the same structural failures. These are not model capability issues — they are protocol gaps.

Drift

Agents wander from the plan. Each step introduces compounding divergence from the intended architecture.

Verbosity

Workflows burn tokens on narration, restating context, and theatrical reasoning instead of producing artifacts.

Hidden uncertainty

Agents report success without distinguishing what was validated from what was assumed.

Lost context

Handoffs between agents or sessions lose assumptions, constraints, and decisions made earlier.

Native Claude Code integration

ZenKit installs as slash commands and skills that Claude Code already supports. No framework to learn. No runtime to manage. Just structured discipline added to your existing workflow.

/zenkit-spec: Define what to build before building it
/zenkit-plan: Break a spec into tasks with acceptance criteria
/zenkit-build: Implement with documented decisions
/zenkit-audit: Review for correctness, security, and alignment
/zenkit-checkpoint: Capture state — what's validated vs assumed
/zenkit-handoff: Transfer context without losing decisions
$ npx zenkit init claude

ZenKit for Claude Code
======================

  created .claude/commands/zenkit-spec.md
  created .claude/commands/zenkit-plan.md
  created .claude/commands/zenkit-build.md
  created .claude/commands/zenkit-audit.md
  created .claude/commands/zenkit-checkpoint.md
  created .claude/commands/zenkit-handoff.md
  created .claude/skills/zenkit-audit/SKILL.md
  created .claude/skills/zenkit-handoff/SKILL.md
  created .claude/skills/zenkit-checkpoint/SKILL.md
  created CLAUDE.md

Done. Start with: /zenkit-spec "your feature"

Core primitives

Six categories of artifacts that compose into disciplined workflows. Each is a plain file — markdown or JSON Schema — readable by humans and machines alike.

Commands

Eight workflow verbs: spec, plan, build, audit, refactor, handoff, checkpoint, ship. Each has a defined input/output contract.

/plan → structured plan with tasks, constraints, and acceptance criteria
Schemas

JSON Schema definitions for handoffs, tasks, audits, checkpoints, and benchmarks. Machine-validateable contracts.

handoff.schema.json validates every agent-to-agent transfer
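As a rough sketch, an abridged handoff contract in draft-07 might look like the following. This is illustrative only, not the shipped schema; the field names mirror the handoff example shown later on this page.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Handoff (abridged, illustrative)",
  "type": "object",
  "required": ["context", "assumptions", "decision", "next_agent"],
  "properties": {
    "context": { "type": "string" },
    "assumptions": { "type": "array", "items": { "type": "string" } },
    "decision": { "type": "string" },
    "next_agent": { "type": "string" }
  }
}
```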
Skills

Reusable capabilities: architecture review, security audit, bug triage, prompt pruning, release checks. Composable within commands.

security-review skill triggers during /audit on auth-related changes
Hooks

Automatic checkpoints at workflow boundaries. Pre-change validates plans exist. Post-change validates tests pass. Pre-ship validates all gates.

pre-ship hook blocks deploy if audit findings are unaddressed
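A pre-ship gate of this kind can be sketched in a few lines. The shapes and names below are hypothetical, not ZenKit's actual hook API:

```typescript
// Hypothetical pre-ship hook: block the ship step while any audit
// finding is still open. These types are illustrative, not ZenKit's API.
interface AuditFinding {
  id: string;
  addressed: boolean;
}

// Throws if any finding remains unaddressed, so shipping cannot proceed.
function preShipGate(findings: AuditFinding[]): void {
  const open = findings.filter((f) => !f.addressed);
  if (open.length > 0) {
    throw new Error(`pre-ship blocked: ${open.length} unaddressed finding(s)`);
  }
}
```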
Checkpoints

Explicit state snapshots with gate conditions. Distinguish validated facts from assumptions. Enable bounded rollback.

checkpoint captures git ref, test status, and token spend before shipping
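A checkpoint snapshot along these lines might look like the following. The field names are illustrative, not the shipped checkpoint schema:

```json
{
  "gate": "pre-ship",
  "git_ref": "a1b2c3d",
  "tests": "passing",
  "token_spend": 18200,
  "validated": ["unit tests pass", "all schemas compile"],
  "assumed": ["staging config matches production"]
}
```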
Rubrics

Evaluation criteria for execution quality, verbosity, and architectural alignment. Quantified scoring on a 0-10 scale.

verbosity-score penalizes restating known context or theatrical reasoning
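One way such a rubric could be scored is a simple deduction model: start at 10 and subtract a weight per flagged issue. The issue names and weights below are invented for illustration, not ZenKit's actual rubric:

```typescript
// Illustrative verbosity rubric: deduct a fixed penalty per flagged
// issue, clamped at 0. Issue names and weights are hypothetical.
type VerbosityIssue = "restated_context" | "theatrical_reasoning" | "redundant_summary";

const PENALTY: Record<VerbosityIssue, number> = {
  restated_context: 2,
  theatrical_reasoning: 3,
  redundant_summary: 1,
};

// Returns a score on the 0-10 scale described above.
function verbosityScore(issues: VerbosityIssue[]): number {
  const total = issues.reduce((sum, issue) => sum + PENALTY[issue], 0);
  return Math.max(0, 10 - total);
}
```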

Why lightweight matters

Heavier frameworks solve real problems but introduce their own. Every abstraction layer is a source of drift, a cost multiplier, and a barrier to understanding what actually happened during execution.

Instead of: Orchestration runtime with daemon processes
→ Plain files. No daemon. No runtime dependency.
Instead of: Agent personas with elaborate backstories
→ Agent contracts with bounded responsibility and explicit handoffs.
Instead of: Custom DSL for workflow definition
→ Markdown commands + JSON Schema. Tools you already know.
Instead of: Vendor-locked tool integrations
→ Runtime-agnostic. Works with Claude Code, local runtimes, or custom harnesses.
Instead of: Dashboard-first management layer
→ CLI-first. Files in your repo. Version-controlled with your code.

Benchmark: criteria-driven verification

ZenKit benchmarks verify acceptance criteria against the actual implementation — file contents, schema validity, test execution, and JSON value checks. Not file existence. Not narrative claims.
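The simplest of these checks, a file-contents check, can be sketched as follows. The types and function names are illustrative, not ZenKit's actual runner:

```typescript
// Illustrative sketch of a criteria-driven "contains" check.
// Shapes and names here are hypothetical, not ZenKit's real runner.
import { readFileSync } from "fs";

interface ContainsCheck {
  file: string;   // path to the file under test
  needle: string; // exact string the file must contain
}

// True when the file exists and contains the needle verbatim;
// a missing file counts as a failed check rather than a crash.
function runContainsCheck(check: ContainsCheck): boolean {
  try {
    return readFileSync(check.file, "utf8").includes(check.needle);
  } catch {
    return false;
  }
}
```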

6/6 specs passed
  • claude code pack: 12/12 criteria, 33/33 checks
  • cli tool: 7/7 criteria, 20/20 checks
  • handoff system: 9/9 criteria, 24/24 checks
  • protocol completeness: 10/10 criteria, 37/37 checks
  • schema validator playground: 8/8 criteria, 25/25 checks
  • self audit: 10/10 criteria, 25/25 checks
Detail: Schema Validator Playground
  Status: pass
  Criteria: 8/8
  Checks: 25/25
  Telemetry: ~37,500 tokens, ~$0.29 (estimated)
Acceptance criteria
  ac-1: Schema selector component exists and exports SchemaSelector
    src/components/playground/SchemaSelector.tsx contains 'export function SchemaSelector'
  ac-2: JSON editor component exists and accepts value/onChange props
    src/components/playground/JsonEditor.tsx contains 'export function JsonEditor'
  ac-3: Validation results component displays errors with paths
    src/components/playground/ValidationResults.tsx contains 'err.path'
  ac-4: Playground page wires schema selection, editing, and validation together
    src/app/playground/page.tsx contains 'validateAgainstSchema'
  ac-5: All 6 ZenKit schemas are registered and compilable
    6 schemas found (expected 6), 0 compilation errors
  ac-6: Example data exists and validates for each schema
    valid-handoff.json: valid against handoff.schema.json; schemas.ts registers 'handoff'; schemas.ts registers 'task'; schemas.ts registers 'audit'; schemas.ts registers 'checkpoint'; schemas.ts registers 'benchmark'
  ac-7: Unit tests exist and cover schema validation
    src/lib/__tests__/schemas.test.ts exists
  ac-8: All schemas use consistent draft-07 format
    All 6 schemas use http://json-schema.org/draft-07/schema#
What this does NOT prove
  • Token and cost figures are estimates — no actual API telemetry is captured by this runner
  • Acceptance criteria verify code structure and schema validity, not runtime UI behavior
  • Stage durations reflect verification time, not original implementation time
ZenKit vs baseline (data source: illustrative)

  Metric     ZenKit   Baseline
  Status     pass     pass
  Criteria   8        8
  Checks     25       25

Both modes verify the same codebase — the structural difference is in workflow metadata. A real comparison requires A/B execution.

Structured handoffs

Every agent-to-agent transfer uses the same contract. Context, assumptions, constraints, decisions, risks, and open questions — nothing is lost between stages.

handoff: backend-architect → frontend-architect
{
  "context": "Backend architect completed the user profile API endpoint with data validation and error handling.",
  "assumptions": [
    "PostgreSQL is the primary datastore",
    "Authentication middleware is already in place",
    "Profile images are stored in object storage, not the database"
  ],
  "constraints": [
    "Response time must be under 200ms at p95",
    "No breaking changes to existing /api/v1 endpoints"
  ],
  "decision": "Implemented as a new /api/v1/profile resource with GET/PATCH operations. Used existing ORM patterns rather than raw SQL for consistency.",
  "deliverable": {
    "type": "code",
    "description": "Profile API endpoint with validation, tests, and OpenAPI spec",
    "files_changed": [
      "src/api/profile.ts",
      "src/api/profile.test.ts",
      "docs/openapi.yaml"
    ],
    "validation_status": "passed"
  },
  "risks": [
    {
      "description": "Profile PATCH allows partial updates — concurrent writes could cause data races",
      "severity": "medium",
      "mitigation": "Added optimistic locking via updated_at timestamp check"
    }
  ],
  "open_questions": [
    "Should profile deletion be soft-delete or hard-delete?",
    "Is rate limiting needed on the PATCH endpoint?"
  ],
  "next_agent": "frontend-architect"
}

This handoff is validated against handoff.schema.json before the next agent begins work. Invalid handoffs are rejected.
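The reject-before-work step can be illustrated with a hand-rolled structural check. Real validation would run the handoff through handoff.schema.json in a JSON Schema validator; the function below is only a simplified sketch checking the core fields:

```typescript
// Simplified sketch of the "reject invalid handoffs" gate. The real
// gate validates against handoff.schema.json; this checks only the
// core fields' structure.
interface Handoff {
  context: string;
  assumptions: string[];
  constraints: string[];
  decision: string;
  risks: { description: string; severity: string }[];
  open_questions: string[];
  next_agent: string;
}

// Accepts a handoff only when every core field is present with the
// expected shape, so the next agent never starts from partial context.
function acceptHandoff(h: unknown): h is Handoff {
  const o = h as Record<string, unknown>;
  return (
    typeof o?.context === "string" &&
    Array.isArray(o.assumptions) &&
    Array.isArray(o.constraints) &&
    typeof o.decision === "string" &&
    Array.isArray(o.risks) &&
    Array.isArray(o.open_questions) &&
    typeof o.next_agent === "string"
  );
}
```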

Self-audit, not self-certification

ZenKit uses its own benchmark system to audit itself. This is structured introspection, not proof of correctness. The claims are only as strong as the checks behind them.

What self-audit does
  • Tests whether ZenKit's primitives are expressive enough to describe real work
  • Produces inspectable evidence — run the benchmark yourself and verify
  • Forces the same honesty requirements on ZenKit itself
What self-audit does NOT do
  • Does not prove ZenKit is correct — a system can only check what it knows to check
  • Does not replace independent inspection
  • Does not validate the rubrics themselves

Safeguards: benchmark checks are verifiable, uncertainty is required (not optional), limitations are inherited from specs, illustrative data is labeled, telemetry is never fabricated.

Three layers, adopt what you need

Layer 1: Claude Code pack

Slash commands, skills, and CLAUDE.md. One command, zero dependencies.

npx zenkit init claude
Layer 2: Protocol + CLI

Schemas, benchmarks, validation engine. For teams that want machine-verifiable workflows.

npm install zenkit
Layer 3: MCP server

Dynamic tool calls for validate, benchmark, and checkpoint. Coming soon.
