SDD Deep Dive: Five Tools, One Feature
The same forgot-password spec — built five different ways.
Same discipline. Different hands on the pen.
From Principles to Tools
In the intro deck, four principles produced a cleaner result:
| Principle | What it means |
|---|---|
| Gaps before code | Surface every ambiguity before the agent generates anything |
| Spec is the contract | The agent reads the spec — not your memory or your Slack thread |
| Tests trace to spec lines | A failing test names a decision, not a code location |
| Spec is the changelog | When requirements change, the spec changes first |
Now you’ll see the same forgot-password spec — built five different ways.
Each tool enforces all four principles. The pen changes hands.
What You’ll Learn
- The five main SDD toolchains: Spec Kit, BMAD, Matt Pocock Skills, Superpowers, OpenSpec. What each one is and who it’s for.
- How to match a tool to your context: solo vs team, greenfield vs legacy, AI-drafted vs author-driven.
- The same discipline across all five: same four principles, different workflow, different hands on the pen.
The Spec We’re Building From
The feature from the intro deck. The spec is already written.
# specs/forgot-password.yaml
endpoint: POST /auth/forgot-password
version: "1.0"
inputs:
  - name: email
    type: Email
    required: true
invariants:
  - response is identical for known and unknown emails # no enumeration
  - max 5 requests / hour / IP
  - max 3 requests / hour / email
  - token TTL <= 15 minutes
  - token is single-use
  - token stored as SHA-256 hash, never plaintext
outputs:
  status: 202 Accepted
  body: { ok: true }
side_effects:
  - email sent IF user exists
  - audit log: { userId?, ip, requestedAt }
Five tools. Five paths to this same spec, and to working code that satisfies it.
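Three of the invariants above (TTL, single-use, hashed at rest) describe the token's lifecycle. The following is a minimal TypeScript sketch of how they could fit together, assuming Node's built-in crypto module and an in-memory Map standing in for a real tokens table. The names ResetTokenRecord, tokenStore, issueToken, and redeemToken are hypothetical and do not come from the spec.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical record shape: the spec only requires a hash, an expiry, and single use.
interface ResetTokenRecord {
  tokenHash: string;
  expiresAt: number; // epoch milliseconds
  usedAt: number | null; // null until the token is redeemed
}

const TOKEN_TTL_MS = 15 * 60 * 1000; // spec: token TTL <= 15 minutes

// In-memory stand-in for a tokens table, keyed by the SHA-256 hash.
const tokenStore = new Map<string, ResetTokenRecord>();

function sha256Hex(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

// Returns the plaintext token for the emailed link; only its hash is stored.
function issueToken(now: number = Date.now()): string {
  const token = randomBytes(32).toString("hex");
  const tokenHash = sha256Hex(token);
  tokenStore.set(tokenHash, { tokenHash, expiresAt: now + TOKEN_TTL_MS, usedAt: null });
  return token;
}

// Succeeds at most once per token, and only before expiry.
function redeemToken(token: string, now: number = Date.now()): boolean {
  const record = tokenStore.get(sha256Hex(token));
  if (record === undefined || record.usedAt !== null || now > record.expiresAt) {
    return false;
  }
  record.usedAt = now;
  return true;
}
```

Because lookup goes through the hash, the plaintext token exists only in the emailed link; a second call to redeemToken with the same token returns false.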
Tool 1 of 5 — Spec Kit
You write the spec. Slash commands gate every phase.
Author-Driven
You write every decision. The AI executes, never guesses.
Gated Phases
Slash commands act as gates: specify → plan → tasks → implement.
Best For
Solo developers who want maximum control over every decision.
Spec Kit — Install & Constitution
uv tool install specify-cli \
--from git+https://github.com/github/spec-kit.git
specify init . --integration claude
Set the project-wide rules once with /speckit.constitution. They prepend to every subsequent agent prompt.
/speckit.constitution
Governing principles for an auth service: never leak account existence in any endpoint;
all security-sensitive endpoints rate-limited at multiple keys;
tokens must be single-use, hashed, time-bounded;
all auth events must be audit-logged.
Spec Kit — Specify
Describe intent, not stack.
/speckit.specify
POST /auth/forgot-password — user requests a reset email.
Same response for known and unknown emails (never leak existence).
Rate limit: 5/hour/IP, 3/hour/email. Token: 15-minute TTL, single-use, SHA-256 hashed at rest.
Audit log every request.
What the agent does next: drafts spec.md with acceptance criteria. Anywhere it sees a gap, it inserts a [NEEDS CLARIFICATION: <question>] marker rather than guessing. For critical ambiguities it may pause and ask up to 3 grouped questions before writing.
Spec Kit — Clarify
Don’t move on with [NEEDS CLARIFICATION] markers still in the spec. Run:
/speckit.clarify
What the agent does next: runs a structured ambiguity scan over the spec, then asks up to 5 targeted clarifying questions, one at a time, multiple-choice where possible (e.g. “200 / 202 / 204?”). Each answer is written back into the spec immediately. It stops when 5 are answered or no critical gaps remain.
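The rule above (no [NEEDS CLARIFICATION] markers left before planning) can also be checked mechanically. The following is a hypothetical helper, not part of Spec Kit, that lists the questions still open in a spec's text. The function names are illustrative.

```typescript
// Collects the question text from every [NEEDS CLARIFICATION: ...] marker.
function findClarificationMarkers(specText: string): string[] {
  const pattern = /\[NEEDS CLARIFICATION:\s*([^\]]*)\]/g;
  const questions: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(specText);
  while (match !== null) {
    questions.push((match[1] || "").trim());
    match = pattern.exec(specText);
  }
  return questions;
}

// A spec is ready for the plan phase only when no markers remain.
function isReadyForPlan(specText: string): boolean {
  return findClarificationMarkers(specText).length === 0;
}
```

A check like this could run before /speckit.plan as a local guard; where it runs is a workflow choice, not something the tool prescribes.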
Spec Kit — Plan
Now describe stack, not behaviour.
/speckit.plan
TypeScript + Express. Prisma + PostgreSQL.
Redis for rate-limit counters. Use express-rate-limit middleware.
Use crypto.randomBytes for tokens.
Use existing audit-log table and nodemailer SMTP wrapper.
What the agent does next: reads the clarified spec + the constitution, then writes plan.md mapping each spec line to a concrete implementation choice in your stack. It does not ask questions: by this point, the spec should be unambiguous.
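The plan names Redis for the rate-limit counters. To show the counting logic on its own, here is a fixed-window sketch that uses an in-memory Map in place of Redis and does not use express-rate-limit. createLimiter and allowRequest are hypothetical names, and the fixed-window strategy is an assumption: the spec states only the limits, not the windowing scheme.

```typescript
interface WindowState {
  windowStart: number; // epoch milliseconds when the current window opened
  count: number; // attempts recorded in the current window
}

type Limiter = (key: string, now?: number) => boolean;

// Builds a limiter that allows `limit` attempts per key per window.
function createLimiter(limit: number, windowMs: number): Limiter {
  const windows = new Map<string, WindowState>();
  return (key: string, now: number = Date.now()): boolean => {
    const state = windows.get(key);
    if (state === undefined || now - state.windowStart >= windowMs) {
      // First attempt for this key, or the previous window has elapsed.
      windows.set(key, { windowStart: now, count: 1 });
      return true;
    }
    state.count += 1;
    return state.count <= limit;
  };
}

const HOUR_MS = 60 * 60 * 1000;
const perIp = createLimiter(5, HOUR_MS); // spec: max 5 requests / hour / IP
const perEmail = createLimiter(3, HOUR_MS); // spec: max 3 requests / hour / email

// Both counters are charged on every request; the request passes only if both are within limit.
function allowRequest(ip: string, email: string, now: number = Date.now()): boolean {
  const ipOk = perIp(`ip:${ip}`, now);
  const emailOk = perEmail(`email:${email.toLowerCase()}`, now);
  return ipOk && emailOk;
}
```

With Redis, the Map operations would typically become an atomic increment with an expiry on the key; that translation is left out here to keep the sketch self-contained.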
Spec Kit — Tasks & Implement
/speckit.tasks
What the agent does next: reads the spec + plan, then generates a numbered task list, each task referencing the spec line it traces back to:
[ ] 1. Add Zod EmailSchema — Ref: spec acceptance line 1
[ ] 2. Add rate-limit middleware (5/h/IP) — Ref: line 2
[ ] 3. Add per-email rate limiter (3/h/email) — Ref: line 3
[ ] 4. Migration: tokens table (tokenHash, expiresAt, usedAt) — Ref: lines 4-6
[ ] 5. Implement forgotPassword handler — Ref: spec (all)
[ ] 6. Wire audit.log in success and miss paths — Ref: line 7
[ ] 7. Tests — one per acceptance line
/speckit.implement
What the agent does next: works through the task list in order, ticking each off. Every line of code it writes is traceable to a spec acceptance line. Tests are generated against acceptance lines, not against implementation shape.
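One way to keep the traceability claim honest is to check that every task carries a reference. The sketch below assumes the task-list format shown above ("[ ] N. title ... Ref: ..."). parseTasks and untracedTasks are hypothetical helpers, not Spec Kit commands.

```typescript
interface Task {
  id: number;
  title: string;
  ref: string | null; // text after "Ref:", or null when the task has no reference
}

// Parses checklist lines of the form "[ ] 3. Title ... Ref: line 3".
function parseTasks(taskList: string): Task[] {
  const tasks: Task[] = [];
  for (const line of taskList.split("\n")) {
    const match = /^\[[ x]\]\s*(\d+)\.\s*(.*)$/.exec(line.trim());
    if (match === null) continue; // not a checklist line
    const body = match[2];
    const refIndex = body.indexOf("Ref:");
    const rawTitle = refIndex === -1 ? body : body.slice(0, refIndex);
    tasks.push({
      id: Number(match[1]),
      // Drop trailing whitespace and dash separators left before "Ref:".
      title: rawTitle.replace(/[\s\u2014-]+$/, ""),
      ref: refIndex === -1 ? null : body.slice(refIndex + 4).trim(),
    });
  }
  return tasks;
}

// Tasks that cannot be traced back to a spec line.
function untracedTasks(tasks: Task[]): Task[] {
  return tasks.filter((task) => task.ref === null);
}
```

Any task returned by untracedTasks is a candidate for either a missing reference or work that the spec never asked for.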
Tool 2 of 5 — BMAD Method
AI agents draft each artifact. You review and approve between phases.
github.com/bmad-code-org/BMAD-METHOD
AI-Drafted
Analyst, PM, Architect, Developer, UX, Tech Writer agents draft each artifact.
Human-Approved
You review and approve before the next agent runs.
Best For
Teams with dedicated reviewers who want AI to draft the spec.
BMAD — Install & Meet the Agents
npx bmad-method install
# Select your AI IDE — Claude Code, Cursor, Copilot, etc.
Each agent is installed as a skill. Default agents:
| Agent skill | Persona | Role |
|---|---|---|
| bmad-analyst | Mary | Discovery & research |
| bmad-agent-pm | John | Product Manager |
| bmad-agent-architect | Winston | Software Architect |
| bmad-agent-dev | Amelia | Developer |
| bmad-ux-designer | Sally | UX Designer |
| bmad-tech-writer | Paige | Technical Writer |
| bmad-help | — | Navigator — invoke any time |
Each agent reads only what the previous agent produced. You review every artifact before the next agent runs. The chain is the spec.
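The approve-before-the-next-agent rule can be pictured as a simple gate. The sketch below is illustrative only and is not how BMAD is implemented; the phase names, the Artifact shape, and the nextStep function are assumptions made for this example.

```typescript
type Phase = "prd" | "architecture" | "implementation";
type NextStep = Phase | "blocked" | "done";

// Hypothetical ordering: PM, then Architect, then Developer.
const PHASE_ORDER: Phase[] = ["prd", "architecture", "implementation"];

interface Artifact {
  phase: Phase;
  approved: boolean; // set by the human reviewer, never by an agent
}

// Decides what may run next given the artifacts drafted so far.
function nextStep(artifacts: Artifact[]): NextStep {
  for (const phase of PHASE_ORDER) {
    const artifact = artifacts.find((a) => a.phase === phase);
    if (artifact === undefined) return phase; // nothing drafted yet: this phase may run
    if (!artifact.approved) return "blocked"; // drafted but unreviewed: wait for approval
  }
  return "done"; // every phase drafted and approved
}
```

The point of the gate is that an unapproved PRD blocks the architect from running at all, so an unreviewed artifact never becomes input to the next one.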
BMAD — Start With the Help Skill
After install, in a fresh chat:
bmad-help
What the skill does next: scans your project state (which artifacts already exist, which modules are installed) and recommends the next agent and workflow. For a new feature, it will point you at bmad-agent-pm + the create-PRD workflow.
BMAD — PM Drafts the PRD
bmad-agent-pm
Create a PRD for POST /auth/forgot-password.
Auth service. Must not leak account existence. Rate limit per IP and per email.
Tokens short-lived, single-use, hashed. Audit every request.
What the agent does next: the PM persona (John) activates and runs the create-PRD workflow. It walks you through goals → users → functional requirements → acceptance criteria, asking targeted questions where your brief was thin. The output is a PRD with acceptance criteria like:
[ ] Identical 202 response for known and unknown emails
[ ] 5/hour/IP and 3/hour/email rate limits
[ ] Token TTL <= 15 minutes, single-use
[ ] Token stored as hash, not plaintext
[ ] Audit log on every request
You review. Approve. Move on.
BMAD — Architect Designs Against the PRD
bmad-agent-architect
Read the PRD. Produce an architecture for forgot-password.
Stack: Express + Prisma + PostgreSQL + Redis.
What the agent does next: the Architect persona (Winston) activates, reads the PRD, and runs the create-architecture workflow. It produces the concurrency strategy, error classes, data model, and any cross-cutting decisions, asking only when a PRD acceptance line has more than one defensible design.
BMAD — Developer & Verification
bmad-agent-dev
Implement POST /auth/forgot-password.
Read the PRD and the architecture before any code.
Every acceptance criterion must be satisfied.
What the agent does next: the Developer persona (Amelia) activates, reads both artifacts, and runs the dev-story workflow. It implements one story at a time against the PRD, generating tests against acceptance criteria. Verification then walks the criteria one by one against the implementation:
✓ AC1 identical 202 for known and unknown — passes
✓ AC2 5/hour/IP — passes
✓ AC3 3/hour/email — passes
✓ AC4 token TTL — passes
✓ AC5 token hash storage — passes
✓ AC6 audit log on every request — passes
NFR p99 < 200ms — no load test present. Recommend k6 smoke before deploy.
Tool 3 of 5 — Matt Pocock’s Skills
Lightweight Claude Code skills. Each enforces one piece of the discipline. You compose them.
Composable
Pick only the rungs you need. Mix with other workflows.
Lightweight
No install ceremony. npx skills@latest add mattpocock/skills
Best For
Developers who want to mix and match — some specs by you, some by AI.
Matt Pocock Skills — Install & Setup
npx skills@latest add mattpocock/skills
Pick the skills you want, and make sure /setup-matt-pocock-skills is one of them. Then run it once per repo to scaffold the per-repo config (issue tracker, triage labels, docs location):
/setup-matt-pocock-skills
Now the engineering skills are wired up:
| Command | What it does |
|---|---|
| /grill-with-docs | Interrogates you about a feature against the existing domain model. Output: a hardened brief plus updates to CONTEXT.md and ADRs. |
| /to-prd | Turns the conversation into a PRD and submits it as a GitHub issue. |
| /to-issues | Breaks a plan, spec, or PRD into independently-grabbable GitHub issues (vertical slices). |
| /tdd | Drives test-first implementation, red → green → refactor. |
| /diagnose | Disciplined diagnosis loop for hard bugs: reproduce → minimise → hypothesise → instrument → fix. |
Matt Pocock Skills — Grill the Spec
The clarify phase as a skill.
/grill-with-docs Build POST /auth/forgot-password — email, link, done.
What the skill does next (quoting the SKILL.md): “Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. Ask the questions one at a time, waiting for feedback on each question before continuing.”
It also reads CONTEXT.md and any ADRs in your repo first, then sharpens the conversation against that vocabulary. Expect questions like:
1. What response for unknown emails? (any inconsistency leaks existence)
2. Per-IP rate limit, or per-email, or both?
3. Token TTL? Single-use, or re-usable until expiry?
4. Plaintext or hashed at rest?
5. Audit log scope — only user-found, or every request?
6. HTTP status — 200, 202, 204?
7. What does the email body say if user doesn't exist?
Matt Pocock Skills — TDD the Implementation
Spec hardened. Now drive the build.
/tdd Implement POST /auth/forgot-password against the brief above. Test first. Smallest possible step.
What the skill does next: enforces vertical-slice TDD: one acceptance line at a time, not all tests up front. The SKILL.md explicitly bans the “horizontal slice” of writing all tests then all code, because bulk-written tests test imagined behaviour instead of actual behaviour.
- Red: Write one failing integration-style test for one acceptance line, testing behaviour through the public interface, not implementation details.
- Green: Implement the minimum to pass it.
- Refactor: Only after green, never before.
If a bug surfaces, /diagnose runs the structured debugging loop (reproduce → minimise → hypothesise → instrument → fix → regression-test) instead of patching blind.
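As an illustration of one red-green step, here is a sketch of a single behaviour test for the acceptance line "identical response for known and unknown emails", together with the minimum handler that would pass it. Everything here is a hypothetical stand-in: forgotPassword is a plain function rather than an Express route, and users, sentEmails, and auditLog are in-memory substitutes for the database, the SMTP wrapper, and the audit table.

```typescript
interface ForgotPasswordResponse {
  status: number;
  body: { ok: boolean };
}

interface AuditEntry {
  userId?: string; // present only when the email matched a user
  ip: string;
  requestedAt: number;
}

// Hypothetical in-memory collaborators so the sketch runs without a database or SMTP server.
const users = new Map<string, string>([["known@example.com", "user-1"]]);
const sentEmails: string[] = [];
const auditLog: AuditEntry[] = [];

// Minimal "green" implementation: same response on both paths, email only on a hit,
// audit entry on every request.
function forgotPassword(email: string, ip: string, now: number = Date.now()): ForgotPasswordResponse {
  const normalised = email.toLowerCase();
  const userId = users.get(normalised);
  if (userId !== undefined) {
    sentEmails.push(normalised);
    auditLog.push({ userId, ip, requestedAt: now });
  } else {
    auditLog.push({ ip, requestedAt: now });
  }
  return { status: 202, body: { ok: true } };
}

// The "red" test, written first: it goes through the public function only,
// and its failure message names the invariant rather than a code location.
function testIdenticalResponse(): void {
  const known = forgotPassword("known@example.com", "203.0.113.7", 0);
  const unknown = forgotPassword("nobody@example.com", "203.0.113.7", 0);
  if (JSON.stringify(known) !== JSON.stringify(unknown)) {
    throw new Error("invariant violated: response is identical for known and unknown emails");
  }
}
```

The next slice would add a separate test for the next acceptance line (for example, the per-email rate limit) and only then the code to pass it.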
Tool 4 of 5 — Superpowers
A skills library that auto-activates the right discipline at the right moment. No slash commands to remember.
Auto-Activated
Skills trigger automatically by context — brainstorming before code, TDD during build, review between tasks.
Mandatory Workflows
“Mandatory workflows, not suggestions.” The agent checks for relevant skills before any task.
Best For
Reviewer-first culture where you want the discipline baked in by default.
Superpowers — Install
The plugin installs via the Claude Code plugin marketplace.
/plugin install superpowers@claude-plugins-official
Other agents have their own marketplaces: Codex CLI, Cursor, and Copilot CLI are documented in the README.
Once installed, the agent has the skills. There is no /brainstorm or /execute to type — the workflow activates from conversation context.
Superpowers — The Workflow Skills
| Skill | Activates when |
|---|---|
| brainstorming | You start describing a feature, before any code |
| writing-plans | A design has been agreed |
| subagent-driven-development | A plan exists and execution begins |
| executing-plans | Batch-running approved plan steps |
| test-driven-development | During implementation |
| requesting-code-review | Between tasks |
| verification-before-completion | Before claiming done |
The skill set covers the full loop from a fuzzy idea to verified code.
Superpowers — In Practice
Conversation, not commands:
> Need a forgot-password endpoint. Auth service.
What the skill does next (this is from the brainstorming SKILL.md checklist):
- Explore project context: Reads files, docs, and recent commits, grounding itself in your codebase before anything else.
- Ask clarifying questions, one at a time: Purpose, constraints, success criteria. Never bulk questions.
- Propose 2–3 approaches with tradeoffs: Each with a recommendation.
- Present the design, get approval section by section: Cannot proceed without it.
- Write design doc: Saved to docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md and committed.
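For illustration, the file-name pattern above could be produced as follows. The slug rule and the use of the UTC date are assumptions of this sketch; the skill's own naming logic may differ.

```typescript
// Lowercases the topic and collapses every run of non-alphanumerics into one hyphen.
function slugify(topic: string): string {
  return topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md for a given topic and date.
function designDocPath(topic: string, date: Date = new Date()): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `docs/superpowers/specs/${day}-${slugify(topic)}-design.md`;
}
```

Calling designDocPath("Forgot Password", new Date("2025-01-15T12:00:00Z")) yields docs/superpowers/specs/2025-01-15-forgot-password-design.md.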
Superpowers — The Output
What you get back from brainstorming for the forgot-password endpoint:
Options to decide:
- Response shape: identical (no leak) vs explicit (UX clearer)
- Rate limit: IP only / email only / both / sliding window
- Token storage: plaintext (fast) vs hashed (safe)
- TTL: 5 / 15 / 60 minutes — security vs UX tradeoff
Risks flagged:
- CWE-203 (account enumeration) if responses differ
- SMTP exhaustion if no email-level rate limit
- Token replay if no usedAt
Tool 5 of 5 — OpenSpec
Delta-spec for brownfield code. You don’t spec the whole system — you spec what’s changing.
github.com/Fission-AI/OpenSpec
Brownfield-First
No need to spec the whole system. Start with the delta.
Change as Changelog
Each delta becomes a permanent record of what changed and why.
Best For
Legacy codebases with no specs at all. Start anywhere.
OpenSpec — Install & Init
npm install -g @fission-ai/openspec@latest
openspec init
| Greenfield SDD | OpenSpec |
|---|---|
| Spec the whole feature | Spec only what’s changing |
| Full spec before code | Change proposal IS the spec |
| New codebase | Current code is the baseline |
| — | Old changes archived as changelog |
For an existing forgot-password with no rate limit, no TTL, no audit — you don’t write forgot-password.yaml from scratch. You write a change.
OpenSpec — Propose a Change
/opsx:propose add-forgot-password-hardening
What the command does next: the agent reads your existing code, infers the current behaviour, and generates a complete proposal folder at openspec/changes/add-forgot-password-hardening/:
| File | Purpose |
|---|---|
| proposal.md | why we’re doing this, what’s changing |
| specs/ | requirements and scenarios |
| design.md | technical approach |
| tasks.md | implementation checklist |
The flow is generation-first, not interview-first — you review and edit the artifacts, then run /opsx:apply when satisfied.
OpenSpec — Inside the Proposal
Looking at real proposals in the OpenSpec repo, proposal.md follows a four-section structure:
# Harden POST /auth/forgot-password
## Why
Current implementation has no rate limit, leaks account existence,
and stores plaintext tokens. Security review flagged all three.
## What Changes
- identical 202 response for known and unknown emails
- 5/hour/IP and 3/hour/email rate limits
- token TTL = 15 minutes, single-use (usedAt set on first use)
- token stored as SHA-256 hash
- audit log on every request
## Capabilities
- Rate limiting (per-IP, per-email)
- Token hashing and TTL enforcement
- Audit logging across success and miss paths
## Impact
- Existing tests must still pass
- 7 new tests required for the new invariants
- No breaking changes to email template, SMTP wrapper, or route shape
OpenSpec — Apply & Archive
/opsx:apply
What the command does next: walks the tasks.md checklist for the active change, ticking each item off as the code edits land. Adapted from the README’s worked example:
You: /opsx:apply
AI: Implementing tasks...
✓ 1.1 Add rate-limit middleware (per-IP)
✓ 1.2 Add rate-limit middleware (per-email)
✓ 2.1 Migrate tokens table — add tokenHash, expiresAt, usedAt
✓ 2.2 Hash tokens with SHA-256 on create
✓ 3.1 Wire audit.log into success and miss paths
All tasks complete!
Once green, archive the change so it becomes part of the changelog:
/opsx:archive
OpenSpec — Expanded Workflow (Optional)
The README documents an expanded set of commands behind a profile switch:
openspec config profile # select the expanded profile
openspec update # apply the new slash commands
| Command | Purpose |
|---|---|
| /opsx:new | Start a new change (more deliberate than propose) |
| /opsx:continue | Resume work on an existing change |
| /opsx:ff | Fast-forward progress |
| /opsx:verify | Check implementation against specs |
| /opsx:bulk-archive | Archive multiple changes at once |
| /opsx:onboard | Team workflow setup |
For most flows the three-command core (propose → apply → archive) is enough.
Side By Side — Five Paths
| Phase | Spec Kit | BMAD | Matt Pocock | Superpowers | OpenSpec |
|---|---|---|---|---|---|
| Clarify | /speckit.specify + /speckit.clarify | bmad-agent-pm PRD interview | /grill-with-docs | brainstorming skill | Existing code + “What’s changing?” |
| Spec | spec.md | PRD | Brief transcript | Design doc + plan | proposal.md + specs/ |
| Plan | /speckit.plan | bmad-agent-architect | (in /tdd prompt) | writing-plans skill | design.md + tasks.md |
| Implement | /speckit.implement | bmad-agent-dev | /tdd | subagent-driven-development | /opsx:apply |
| Verify | Tests from spec | Acceptance walk-through | /diagnose | verification-before-completion | /opsx:verify (expanded profile) |
| Asks questions? | Yes — /speckit.clarify asks ≤5, one at a time | Yes — PM/Architect walk you through | Yes — /grill-with-docs interviews | Yes — brainstorming one at a time | No — generates from existing code |
| Spec author | You | AI (PM agent) | You + AI grilling | You + AI brainstorm | You (change only) |
Same discipline ends every column: code traces to a spec, tests trace to spec lines, failures name invariants.
Which One For You?
Spec Kit
You want to own every decision yourself. Solo, maximum control.
BMAD
Your team has dedicated reviewers and you want AI to draft the artifacts.
Matt Pocock’s Skills
You want to mix and match — some specs by you, some by AI, all light.
Superpowers
You want the discipline auto-applied without remembering commands.
OpenSpec
You’re working in a legacy codebase with no specs at all.
Five Tools, One Discipline
| Tool | Best for | Principle it most visibly enforces |
|---|---|---|
| Spec Kit | Solo, max control | Gaps before code — you close every gap yourself |
| BMAD | Teams with dedicated reviewers | Spec is the contract — AI drafts, human signs off |
| Matt Pocock Skills | Lightweight, composable flow | Tests trace to spec — TDD-first loop built in |
| Superpowers | Reviewer-first culture | Spec is the contract — skills auto-apply the discipline |
| OpenSpec | Legacy codebases | Spec is the changelog — the change proposal IS the spec |
Resources
| Tool | Repository |
|---|---|
| Spec Kit | github.com/github/spec-kit |
| BMAD Method | github.com/bmad-code-org/BMAD-METHOD |
| Matt Pocock’s Skills | github.com/mattpocock/skills |
| Superpowers | github.com/obra/superpowers |
| OpenSpec | github.com/Fission-AI/OpenSpec |