01
One command, one job.
Nothing silently chains into the next phase. Every handoff is a deliberate choice with the work product visible to a human.
By Erik Benjaminson, Founder, Sapient Technology Group · Published · v1
The factory moves work from a vague idea to merged, documented code through five connected phases. Every handoff between phases is explicit. No command silently chains into the next; humans see and approve every transition where judgment matters.
Brainstorm and clarify. Decide what to build before how.
Reuse-first audit, then minimal prose plan with optional deepening.
Prose to YAML to todo files. Wave-parallel agents, gated by build and test.
Automated PR review feeds a resolution loop until the quality bar is met.
Documentation sync, sealed by the same wave pattern as code.
The seam between interpretation and generation is the load-bearing wall of the whole system: a human-readable plan, a deterministic structured artifact, then the files.
Everything else in the factory follows from these. They are not preferences. They are the reason the wave-boundary quality gates almost never fail.
01
Nothing silently chains into the next phase. Every handoff is a deliberate choice with the work product visible to a human.
02
Interpretation is non-deterministic and belongs to a human reviewing a narrative plan. File generation is deterministic and belongs to a script.
03
Todos execute in parallel dependency waves. Each wave commits before the next begins, so a regression is a cheap rollback to a known-good point.
The discover phase answers a single question: what does the work need to produce? Brainstorming is an interview, not a generation. Clarification asks only the non-obvious questions. The output is one reviewable document, nothing more.
Many specs collapse at the first edge case because no one asked the implicit question. The clarify phase exists to ask it. Erik Benjaminson · Sapient Technology Group
brainstorming skill · /clarify-spec · Asks only non-obvious questions
/ca-plan never writes code. Step zero is a mandatory audit against existing types, components, and prior art. If reuse exists, the plan halts there. Research is spawned only for genuine gaps.
The most expensive line of code is the one that duplicates something already in the codebase. Erik Benjaminson · Sapient Technology Group
repo-research-analyst sweeps the codebase for existing types, functions, components, and prior solutions. Frontmatter in docs/solutions/ is filtered by relevance before any plan is written.
Halt the plan. Cite the existing code that solves the problem. No new implementation is written.
Extend prior art with the smallest viable change. MINIMAL plan is the default posture.
YAGNI checklist. /deepen-plan only when scope genuinely warrants research and review fan-out.
/ca-plan · step 0 always runs the audit · The most expensive line is the duplicate one
plans/<title>.md. code-simplicity-reviewer. P1/P2/P3 buckets. Approve/Reject/Modify per finding. The user decides what is incorporated.
The prose plan is for humans. The YAML is the contract. The todo files are deterministic. Splitting these stages is the difference between a reviewable plan and a wall of generated tasks no one fully read.
plan.md
Narrative. Reviewable. The product of human judgment about scope and posture.
/plan-yaml
Converts narrative into structured tasks: dependencies, priorities, files touched, success criteria, context.
plan.yaml
Editable. Reviewable. The canonical source of truth between planning and execution.
/plan-todos
Deterministic generator. Same YAML always produces the same todos. Archives prior runs.
todos/*.md
One file per task with frontmatter. A PreToolUse hook validates every write before it reaches disk.
Because the YAML is canonical, /plan-todos is safe to re-run. Same inputs, same outputs.
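A minimal sketch of what "same inputs, same outputs" can look like in practice. It is not the actual /plan-todos implementation: the `PLAN` dictionary stands in for a parsed plan.yaml, and the field names (`id`, `title`, `priority`, `depends_on`) and the helpers `render_todos` and `fingerprint` are assumptions made for illustration. Only the `todos/<id>-<slug>.md` path shape comes from the artifact table.

```python
import hashlib

# Hypothetical stand-in for a parsed plan.yaml; field names are assumptions.
PLAN = {
    "tasks": [
        {"id": "002", "title": "Add parser", "priority": "p2", "depends_on": ["001"]},
        {"id": "001", "title": "Define types", "priority": "p1", "depends_on": []},
    ]
}


def slugify(title: str) -> str:
    """Lowercase the title and join its words with hyphens."""
    return "-".join(title.lower().split())


def render_todos(plan: dict) -> dict:
    """Return {path: content}. Pure function: no clock, no randomness, sorted input."""
    files = {}
    for task in sorted(plan["tasks"], key=lambda t: t["id"]):
        path = f"todos/{task['id']}-{slugify(task['title'])}.md"
        deps = ", ".join(sorted(task["depends_on"]))
        files[path] = (
            "---\n"
            f"id: {task['id']}\n"
            f"priority: {task['priority']}\n"
            f"depends_on: [{deps}]\n"
            "---\n"
            f"# {task['title']}\n"
        )
    return files


def fingerprint(files: dict) -> str:
    """Hash every path and body in a fixed order so two runs can be compared."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(files[path].encode())
    return digest.hexdigest()


if __name__ == "__main__":
    first = render_todos(PLAN)
    second = render_todos(PLAN)
    print(sorted(first))
    print(fingerprint(first) == fingerprint(second))
```

Keeping timestamps and random identifiers out of the rendered files is what makes a re-run comparable byte for byte; anything time-dependent would belong in the archive step instead.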
The execute phase composes three pieces that are powerful only together. /execute-todos is the orchestrator. todo-executor is the worker. test-driven-development is the iron rail that auto-triggers under every implementation. Two independent quality filters stacked. Neither can be bypassed silently.
The orchestrator
/execute-todos
Computes the wave plan from todo dependencies. Spawns workers, gates the commit, never pushes.
The worker
todo-executor
One agent per todo per wave. Looks up current docs, deliberates three approaches, scores, picks, then implements.
The iron rail
test-driven-development
Auto-triggers on every implementation. No production code without a failing test first. If code exists before the test, delete it.
Wave 1 is every todo with no dependencies. Wave N is every todo whose dependencies were satisfied by waves 1 through N-1. npm run build && npm test must pass before each commit; agent-browser sweeps for runtime errors after the final wave.
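The wave rule above can be sketched as a layered topological sort. This is an illustration under stated assumptions, not the /execute-todos source: `compute_waves` and `run_waves` are hypothetical names, the `execute`, `gate`, and `commit` callables are injected placeholders (the gate would wrap `npm run build && npm test`), and workers run sequentially here rather than in parallel.

```python
def compute_waves(deps: dict) -> list:
    """Group todos into waves.

    deps maps each todo id to the set of todo ids it depends on.
    Wave 1 holds todos with no dependencies; wave N holds todos whose
    dependencies were all satisfied by earlier waves.
    """
    remaining = {todo: set(d) for todo, d in deps.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            # Nothing can start: a cycle, or a dependency that is not in the plan.
            raise ValueError(f"unresolvable dependencies among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for todo in ready:
            del remaining[todo]
    return waves


def run_waves(waves, execute, gate, commit) -> int:
    """Run each wave, then gate, then commit. Returns the number of waves committed.

    Stops at the first failing gate, leaving earlier waves committed.
    There is deliberately no push step.
    """
    for n, wave in enumerate(waves, start=1):
        for todo in wave:
            execute(todo)  # assumption: the real orchestrator spawns these in parallel
        if not gate():
            return n - 1
        commit(f"feat(wave-{n})")
    return len(waves)


if __name__ == "__main__":
    plan = {"types": set(), "parser": {"types"}, "ui": {"types"}, "e2e": {"parser", "ui"}}
    print(compute_waves(plan))
```

Because a failing gate returns before the commit call, the last committed wave is always a state that passed build and test, which is what makes rollback cheap.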
npm run build && npm test → Commit feat(wave-N) → Never pushes; humans decide when work leaves the branch
The todo-executor is required to generate three alternative approaches and score each on a five-criterion rubric. The highest-scoring approach is selected with written reasoning, then implemented. The rubric isn't a suggestion; it is baked into the agent's required response format.
Fast and small. Fails the long tail: parenthetical notes, ranges, fractions, locale.
Aligns with the existing recipe pipeline. Handles edge cases without bespoke rules.
Non-deterministic at the unit-test boundary. Wrong abstraction for a hot path.
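The three-approach deliberation can be pictured as a small scorecard. The sketch below is illustrative only: the five criterion names, the 1 to 5 scale, the approach labels A, B, and C, and every score are assumptions, since the source does not list the rubric. The reasoning strings reuse the three assessments above.

```python
from dataclasses import dataclass

# Hypothetical criterion names; the source says only that there are five.
CRITERIA = ("correctness", "simplicity", "reuse", "testability", "performance")


@dataclass
class Approach:
    name: str
    scores: dict      # criterion -> integer score, assumed scale 1 to 5
    reasoning: str    # written justification, required for every approach

    def total(self) -> int:
        return sum(self.scores[c] for c in CRITERIA)


def select(approaches: list) -> Approach:
    """Validate the scorecard shape, then return the highest-scoring approach."""
    if len(approaches) != 3:
        raise ValueError("exactly three approaches are required")
    for a in approaches:
        if set(a.scores) != set(CRITERIA):
            raise ValueError(f"{a.name}: every criterion must be scored")
        if not a.reasoning.strip():
            raise ValueError(f"{a.name}: written reasoning is required")
    # max() keeps the first of any tied approaches; a real rubric may break ties differently.
    return max(approaches, key=lambda a: a.total())


if __name__ == "__main__":
    # Scores are invented for illustration.
    candidates = [
        Approach("A", dict(zip(CRITERIA, (2, 5, 2, 4, 5))),
                 "Fast and small. Fails the long tail."),
        Approach("B", dict(zip(CRITERIA, (5, 4, 5, 4, 3))),
                 "Aligns with the existing pipeline. Handles edge cases."),
        Approach("C", dict(zip(CRITERIA, (4, 2, 2, 1, 1))),
                 "Non-deterministic at the unit-test boundary."),
    ]
    winner = select(candidates)
    print(winner.name, winner.total())
```

Rejecting a scorecard with a missing criterion or empty reasoning mirrors the idea that the rubric is part of the required response format rather than an optional note.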
The skill auto-triggers the moment a todo-executor begins implementation. There is no opt-in. The cycle runs Red, Verify Red, Green, Verify Green, Refactor. Each step has a verification gate.
Red
One behavior. Smallest meaningful unit.
Verify Red
For the expected reason. Skipping verify counts as skipping TDD.
Green
Minimal code. No design flourishes.
Verify Green
Output must be pristine. No warnings, no skipped tests.
Refactor
Tests green throughout. No new behavior added.
The iron law
No production code without a failing test first.
If code exists before the test, delete it.
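One way to read "Verify Red, for the expected reason" is that the test must fail on its assertion, not error out because of a typo or a missing import. The sketch below encodes that reading; it is an interpretation, not the skill's actual check, and `classify`, `add_stub`, `add`, and the tests are hypothetical names.

```python
def classify(test) -> str:
    """Run a zero-argument test and report 'red', 'error', or 'green'."""
    try:
        test()
    except AssertionError:
        return "red"      # failed on the assertion: the expected reason
    except Exception:
        return "error"    # broke for some other reason: not a valid red
    return "green"


def add_stub(a, b):
    return 0              # exists, but does not implement the behavior yet


def add(a, b):
    return a + b          # minimal code, written only after seeing red


def make_test(fn):
    def test():
        assert fn(2, 3) == 5   # one behavior, smallest meaningful unit
    return test


def broken_test():
    assert ad(2, 3) == 5       # deliberate typo: raises NameError, so this is an error, not red


if __name__ == "__main__":
    print(classify(make_test(add_stub)))  # Verify Red
    print(classify(broken_test))          # wrong reason
    print(classify(make_test(add)))       # Verify Green
```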
Multi-agent review across stakeholder perspectives (dev, ops, user, security, business) and scenarios (happy path, edge cases, scale, concurrency, failure modes). Conditional migration agents engage only when schema files are touched. Findings land in todos/{id}-pending-{priority}.md and feed straight back into /execute-todos. Todos are the universal currency.
Opening the PR triggers an automated Claude Code Review inside GitHub Actions. The bot posts severity-tagged findings as PR comments; /resolve-pr turns each one into a todo in the same format planning uses. /execute-todos then re-enters Phase 05 with those todos: identical TDD iron law, identical three-approach deliberation, identical wave commits, identical npm run build && npm test quality gate. Every comment travels the same factory rails that built the feature.
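A rough sketch of how a severity-tagged finding might become a todo. The severity labels (`critical`, `major`, `minor`), the frontmatter fields, and the function name `finding_to_todo` are assumptions for illustration; the source confirms only the P1/P2/P3 buckets and the `todos/pr<N>-f<id>.md` naming.

```python
# Assumed severity labels; only the P1/P2/P3 buckets appear in the source.
SEVERITY_TO_PRIORITY = {"critical": "p1", "major": "p2", "minor": "p3"}


def finding_to_todo(pr_number: int, finding_id: int, severity: str, summary: str):
    """Map one review finding to a (path, content) pair in the shared todo shape."""
    key = severity.strip().lower()
    if key not in SEVERITY_TO_PRIORITY:
        raise ValueError(f"unknown severity: {severity!r}")
    path = f"todos/pr{pr_number}-f{finding_id}.md"
    content = (
        "---\n"
        f"id: pr{pr_number}-f{finding_id}\n"
        f"priority: {SEVERITY_TO_PRIORITY[key]}\n"
        "status: pending\n"
        "---\n"
        f"# {summary}\n"
    )
    return path, content


if __name__ == "__main__":
    path, content = finding_to_todo(42, 3, "Major", "Handle empty input")
    print(path)
    print(content)
```

Failing loudly on an unknown severity keeps an unmapped finding from slipping into the queue with a default priority.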
Review feedback enters the factory as todos. Same shape as planned work. Same guardrails. Same proof of passing tests before anything merges. Erik Benjaminson · Sapient Technology Group
anthropics/claude-code-action@v1 · pull_request · opened + synchronize · severity → priority
pull_request event (opened, synchronize). Bot reads the diff and CI results, posts a severity-tagged review comment via gh pr comment. No human writes the first-pass review.
plans/pr-resolution-<N>.md, runs interactive triage (Accept / Skip / Modify / Investigate More), emits todos/pr<N>-f<id>.md using the standard template. Stops there.
todo-executor agent on opus with the three-approach scorecard, written under the test-driven-development skill (no production code without a failing test first), committed wave-by-wave, gated by npm run build && npm test, browser-validated on the final wave.
origin/master, push, embed /update-docs --analyze-only status into the PR body, open the PR, reset local master to origin/master so the next squash merge lands cleanly.
/update-docs closes the learning loop. It is both a pre-PR analyze step and a post-ship sync step. A state file tracks the last point of analysis, so the agent never re-reads what hasn't changed.
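The state-file idea can be sketched as follows. The JSON field name `last_analyzed_commit` and the three helper functions are assumptions, as the source does not describe the file's schema; the commit history is passed in as a plain list, oldest first, to keep the example free of git calls.

```python
import json
from pathlib import Path
from typing import Optional


def load_last_analyzed(state_path: Path) -> Optional[str]:
    """Return the last analyzed commit id, or None if no state exists yet."""
    if not state_path.exists():
        return None
    return json.loads(state_path.read_text()).get("last_analyzed_commit")


def commits_to_analyze(history: list, last: Optional[str]) -> list:
    """Return only the commits after the last analyzed one.

    history is ordered oldest to newest. If the marker is missing or unknown
    (for example after a history rewrite), fall back to analyzing everything.
    """
    if last is None or last not in history:
        return list(history)
    return history[history.index(last) + 1:]


def save_last_analyzed(state_path: Path, commit: str) -> None:
    """Persist the new high-water mark for the next run."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps({"last_analyzed_commit": commit}, indent=2) + "\n")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "docs" / "tech-docs" / ".doc-sync-state.json"
        history = ["a1", "b2", "c3"]
        print(commits_to_analyze(history, load_last_analyzed(state)))
        save_last_analyzed(state, "b2")
        print(commits_to_analyze(history, load_last_analyzed(state)))
```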
Code that ships without its documentation isn't done; it is a debt the next engagement will pay. Erik Benjaminson · Sapient Technology Group
/update-docs · pre-PR & post-ship · Same squash-safe pattern as code
git-change-analyzer categorizes commits, doc-gap-analyzer (opus) identifies stale docs, doc-updater rewrites them. In full mode, auto-opens a docs/sync-<date> PR and resets master.

| Phase | Artifact | Location |
|---|---|---|
| Discover | Brainstorm doc | docs/brainstorms/YYYY-MM-DD-*.md |
| Plan | Prose plan | plans/<kebab-case-title>.md |
| Plan | Structured tasks | plans/<title>.yaml |
| Execute | Todo files | todos/<id>-<slug>.md |
| Execute | Archived todos | todos/archive/<slug>-<timestamp>/ |
| Review | Review findings as todos | todos/<id>-pending-<priority>-*.md |
| Ship | PR | GitHub |
| Ship | PR resolution plan | plans/pr-resolution-<N>.md |
| Learn | Doc sync state | docs/tech-docs/.doc-sync-state.json |
The factory is strict about a small number of things and permissive about everything else. The strictness is at the seams; agents have full freedom inside a phase to choose how to do the work, but no freedom to cross the boundary unannounced.
01
/ca-plan audits existing code before research. Prevents reinventing wheels.
02
/plan-yaml happens while context is fresh. /plan-todos is deterministic and can be re-run safely.
03
If wave 3 breaks, waves 1 and 2 are already committed and safe.
04
Plans, reviews, and PR feedback all produce the same todo format. They all feed /execute-todos.
05
Both pre-PR check and post-ship cleanup. Documentation drift has no place to hide.
06
/execute-todos never pushes. /resolve-pr never executes. /ca-plan never codes. Every handoff is explicit.
Two independent quality filters on every line. Deliberation picks the best design on paper; test-driven development forces that design to prove itself before it ships. Erik Benjaminson · Sapient Technology Group