SDD Deep Dive: Five Tools, One Feature

Intermediate · Deck · 35 slides

The same forgot-password spec — built five different ways.

Five Tools, One Feature

Same discipline. Different hands on the pen.

From Principles to Tools

In the intro deck, four principles produced a cleaner result:

| Principle | What it means |
| --- | --- |
| Gaps before code | Surface every ambiguity before the agent generates anything |
| Spec is the contract | The agent reads the spec — not your memory or your Slack thread |
| Tests trace to spec lines | A failing test names a decision, not a code location |
| Spec is the changelog | When requirements change, the spec changes first |

Now you’ll see the same forgot-password spec — built five different ways. Each tool enforces all four principles. The pen changes hands.

What You’ll Learn

  1. The five main SDD toolchains

    Spec Kit, BMAD, Matt Pocock Skills, Superpowers, OpenSpec — what each one is and who it’s for.

  2. How to match a tool to your context

    Solo vs team, greenfield vs legacy, AI-drafted vs author-driven.

  3. The same discipline across all five

    Same four principles. Different workflow. Different hands on the pen.

The Spec We’re Building From

The feature from the intro deck. The spec is already written.

yaml
# specs/forgot-password.yaml
endpoint: POST /auth/forgot-password
version:  "1.0"

inputs:
  - name: email
    type: Email
    required: true

invariants:
  - response is identical for known and unknown emails    # no enumeration
  - max 5 requests / hour / IP
  - max 3 requests / hour / email
  - token TTL <= 15 minutes
  - token is single-use
  - token stored as SHA-256 hash, never plaintext

outputs:
  status: 202 Accepted
  body:   { ok: true }

side_effects:
  - email sent IF user exists
  - audit log: { userId?, ip, requestedAt }
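The three token invariants can be sketched in the stack the deck later chooses (TypeScript with node:crypto). This is a minimal illustration; the name issueResetToken is an assumption, not part of the spec:

```typescript
import { randomBytes, createHash } from "node:crypto";

// Sketch of token issuance honouring three invariants: TTL <= 15 minutes,
// SHA-256 hash at rest, plaintext returned exactly once and never stored.
const TOKEN_TTL_MS = 15 * 60 * 1000;

function issueResetToken(): { token: string; tokenHash: string; expiresAt: Date } {
  const token = randomBytes(32).toString("hex");                      // emailed to the user
  const tokenHash = createHash("sha256").update(token).digest("hex"); // the only thing persisted
  return { token, tokenHash, expiresAt: new Date(Date.now() + TOKEN_TTL_MS) };
}
```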

Five tools. Five paths to this same spec — and to working code that satisfies it.

Tool 1 of 5 — Spec Kit

You write the spec. Slash commands gate every phase.

github.com/github/spec-kit

Author-Driven

You write every decision. The AI executes, never guesses.

Gated Phases

Slash commands act as gates: specify → plan → tasks → implement.

Best For

Solo developers who want maximum control over every decision.

Spec Kit — Install & Constitution

bash
uv tool install specify-cli \
  --from git+https://github.com/github/spec-kit.git

specify init . --integration claude

Set the project-wide rules once with /speckit.constitution; they are prepended to every subsequent agent prompt.

plaintext
/speckit.constitution 
Governing principles for an auth service: never leak account existence in any endpoint;
all security-sensitive endpoints rate-limited at multiple keys; 
tokens must be single-use, hashed, time-bounded; 
all auth events must be audit-logged.

Spec Kit — Specify

Describe intent, not stack.

plaintext
/speckit.specify 
POST /auth/forgot-password — user requests a reset email. 
Same response for known and unknown emails (never leak existence). 
Rate limit: 5/hour/IP, 3/hour/email. Token: 15-minute TTL, single-use, SHA-256 hashed at rest. 
Audit log every request.

What the agent does next: drafts spec.md with acceptance criteria. Anywhere it sees a gap, it inserts a [NEEDS CLARIFICATION: <question>] marker rather than guessing. For critical ambiguities it may pause and ask up to 3 grouped questions before writing.

Spec Kit — Clarify

Don’t move on with [NEEDS CLARIFICATION] markers still in the spec. Run:

plaintext
/speckit.clarify

What the agent does next: runs a structured ambiguity scan over the spec, then asks up to 5 targeted clarifying questions, one at a time — multiple-choice where possible (e.g. “200 / 202 / 204?”). Each answer is written back into the spec immediately. Stops when 5 are answered or no critical gaps remain.

Spec Kit — Plan

Now describe stack, not behaviour.

plaintext
/speckit.plan 
TypeScript + Express. Prisma + PostgreSQL. 
Redis for rate-limit counters. Use express-rate-limit middleware. 
Use crypto.randomBytes for tokens. 
Use existing audit-log table and nodemailer SMTP wrapper.

What the agent does next: reads the clarified spec + the constitution, then writes plan.md mapping each spec line to a concrete implementation choice in your stack. Does not ask questions — by this point, the spec should be unambiguous.

Spec Kit — Tasks & Implement

plaintext
/speckit.tasks

What the agent does next: reads the spec + plan, generates a numbered task list — each task referencing the spec line it traces back to:

plaintext
[ ] 1. Add Zod EmailSchema       — Ref: spec acceptance line 1
[ ] 2. Add rate-limit middleware (5/h/IP)   — Ref: line 2
[ ] 3. Add per-email rate limiter (3/h/email) — Ref: line 3
[ ] 4. Migration: tokens table (tokenHash, expiresAt, usedAt) — Ref: lines 4-6
[ ] 5. Implement forgotPassword handler — Ref: spec (all)
[ ] 6. Wire audit.log in success and miss paths — Ref: line 7
[ ] 7. Tests — one per acceptance line

plaintext
/speckit.implement

What the agent does next: works through the task list in order, ticking each off. Every line of code it writes is traceable to a spec acceptance line. Tests are generated against acceptance lines, not against implementation shape.
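A spec-traceable test can be as small as this sketch. The forgotPassword stub below stands in for the generated handler; it is illustrative, not Spec Kit output:

```typescript
// Sketch of a test that traces to a spec acceptance line rather than a code
// location. forgotPassword is a stub standing in for the real handler.
type ForgotPasswordResponse = { status: number; body: { ok: boolean } };

function forgotPassword(_email: string): ForgotPasswordResponse {
  // The real handler looks up the user, enqueues the email, writes the audit
  // log; the response never varies with user existence.
  return { status: 202, body: { ok: true } };
}

// Named after the invariant it verifies (spec acceptance line 1):
const known = forgotPassword("user@example.com");
const unknown = forgotPassword("nobody@example.com");
console.assert(
  JSON.stringify(known) === JSON.stringify(unknown),
  "acceptance line 1: identical response for known and unknown emails"
);
```

When a test like this fails, the message names a spec decision, not a file and line in the implementation.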

Tool 2 of 5 — BMAD Method

AI agents draft each artifact. You review and approve between phases.

github.com/bmad-code-org/BMAD-METHOD

AI-Drafted

Analyst, PM, Architect, Developer, UX, Tech Writer agents draft each artifact.

Human-Approved

You review and approve before the next agent runs.

Best For

Teams with dedicated reviewers who want AI to draft the spec.

BMAD — Install & Meet the Agents

bash
npx bmad-method install
# Select your AI IDE — Claude Code, Cursor, Copilot, etc.

Each agent is installed as a skill. Default agents:

| Agent skill | Persona | Role |
| --- | --- | --- |
| bmad-analyst | Mary | Discovery & research |
| bmad-agent-pm | John | Product Manager |
| bmad-agent-architect | Winston | Software Architect |
| bmad-agent-dev | Amelia | Developer |
| bmad-ux-designer | Sally | UX Designer |
| bmad-tech-writer | Paige | Technical Writer |
| bmad-help | | Navigator — invoke any time |

Each agent reads only what the previous agent produced. You review every artifact before the next agent runs. The chain is the spec.

BMAD — Start With the Help Skill

After install, in a fresh chat:

plaintext
bmad-help

What the skill does next: scans your project state (which artifacts already exist, which modules are installed) and recommends the next agent and workflow. For a new feature, it will point you at bmad-agent-pm + the create-PRD workflow.

BMAD — PM Drafts the PRD

plaintext
bmad-agent-pm
Create a PRD for POST /auth/forgot-password.
Auth service. Must not leak account existence. Rate limit per IP and per email.
Tokens short-lived, single-use, hashed. Audit every request.

What the agent does next: the PM persona (John) activates and runs the create-PRD workflow. It walks you through goals → users → functional requirements → acceptance criteria, asking targeted questions where your brief was thin. The output is a PRD with acceptance criteria like:

plaintext
[ ] Identical 202 response for known and unknown emails
[ ] 5/hour/IP and 3/hour/email rate limits
[ ] Token TTL <= 15 minutes, single-use
[ ] Token stored as hash, not plaintext
[ ] Audit log on every request

You review. Approve. Move on.

BMAD — Architect Designs Against the PRD

plaintext
bmad-agent-architect
Read the PRD. Produce an architecture for forgot-password.
Stack: Express + Prisma + PostgreSQL + Redis.

What the agent does next: the Architect persona (Winston) activates, reads the PRD, and runs the create-architecture workflow. It produces the concurrency strategy, error classes, data model, and any cross-cutting decisions — asking only when a PRD acceptance line has more than one defensible design.

BMAD — Developer & Verification

plaintext
bmad-agent-dev
Implement POST /auth/forgot-password.
Read the PRD and the architecture before any code.
Every acceptance criterion must be satisfied.

What the agent does next: the Developer persona (Amelia) activates, reads both artifacts, and runs the dev-story workflow. It implements one story at a time against the PRD, generating tests against acceptance criteria. Verification then walks the criteria one by one against the implementation:

plaintext
✓  AC1  identical 202 for known and unknown — passes
✓  AC2  5/hour/IP — passes
✓  AC3  3/hour/email — passes
✓  AC4  token TTL — passes
✓  AC5  token hash storage — passes
✓  AC6  audit log on every request — passes

NFR p99 < 200ms — no load test present. Recommend k6 smoke before deploy.
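The AC6 walk checks for an audit entry on both the success and the miss path. A minimal sketch of that side effect, with an in-memory array standing in for the real audit-log table (shape follows the spec's side_effects block):

```typescript
// Audit side effect on both paths: userId is present only when the user
// lookup succeeded. The in-memory array stands in for the audit-log table.
type AuditEntry = { userId?: string; ip: string; requestedAt: Date };

const auditLog: AuditEntry[] = [];
function audit(entry: AuditEntry): void {
  auditLog.push(entry);
}

// Success path (user found) and miss path (unknown email) both log:
audit({ userId: "u_123", ip: "203.0.113.7", requestedAt: new Date() });
audit({ ip: "203.0.113.8", requestedAt: new Date() });
console.assert(auditLog.length === 2, "every request is audit-logged");
```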

Tool 3 of 5 — Matt Pocock’s Skills

Lightweight Claude Code skills. Each enforces one piece of the discipline. You compose them.

github.com/mattpocock/skills

Composable

Pick only the rungs you need. Mix with other workflows.

Lightweight

No install ceremony. npx skills@latest add mattpocock/skills

Best For

Developers who want to mix and match — some specs by you, some by AI.

Matt Pocock Skills — Install & Setup

bash
npx skills@latest add mattpocock/skills

Pick the skills you want — and make sure /setup-matt-pocock-skills is one of them. Then run it once per repo to scaffold the per-repo config (issue tracker, triage labels, docs location):

plaintext
/setup-matt-pocock-skills

Now the engineering skills are wired up:

| Command | What it does |
| --- | --- |
| /grill-with-docs | Interrogates you about a feature against the existing domain model. Output: a hardened brief plus updates to CONTEXT.md and ADRs. |
| /to-prd | Turns the conversation into a PRD and submits it as a GitHub issue. |
| /to-issues | Breaks a plan, spec, or PRD into independently-grabbable GitHub issues (vertical slices). |
| /tdd | Drives test-first implementation, red → green → refactor. |
| /diagnose | Disciplined diagnosis loop for hard bugs: reproduce → minimise → hypothesise → instrument → fix. |

Matt Pocock Skills — Grill the Spec

The clarify phase as a skill.

plaintext
/grill-with-docs Build POST /auth/forgot-password — email, link, done.

What the skill does next (quoting the SKILL.md): “Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. Ask the questions one at a time, waiting for feedback on each question before continuing.”

It also reads CONTEXT.md and any ADRs in your repo first, then sharpens the conversation against that vocabulary. Expect questions like:

plaintext
1.  What response for unknown emails? (any inconsistency leaks existence)
2.  Per-IP rate limit, or per-email, or both?
3.  Token TTL? Single-use, or re-usable until expiry?
4.  Plaintext or hashed at rest?
5.  Audit log scope — only user-found, or every request?
6.  HTTP status — 200, 202, 204?
7.  What does the email body say if user doesn't exist?

Matt Pocock Skills — TDD the Implementation

Spec hardened. Now drive the build.

plaintext
/tdd Implement POST /auth/forgot-password against the brief above. Test first. Smallest possible step.

What the skill does next: enforces vertical-slice TDD — one acceptance line at a time, not all tests up front. The SKILL.md explicitly bans the “horizontal slice” of writing all tests then all code, because bulk-written tests test imagined behaviour instead of actual behaviour.

  1. Red

    Write one failing integration-style test for one acceptance line — testing behaviour through the public interface, not implementation details.

  2. Green

    Implement the minimum to pass it.

  3. Refactor

    Only after green, never before.

If a bug surfaces, /diagnose runs the structured debugging loop (reproduce → minimise → hypothesise → instrument → fix → regression-test) instead of patching blind.
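One vertical slice of that loop, sketched in TypeScript. The in-memory sliding-window counter is an assumption standing in for Redis; a real slice would exercise the middleware through the public route:

```typescript
// One acceptance line driven red -> green: "max 5 requests / hour / IP".
// In-memory sliding-window counter stands in for Redis (illustration only).
class RateLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const recent = (this.hits.get(key) ?? []).filter(t => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit inside the window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}

// The red test came first; the class above is the minimum that turns it green.
const perIp = new RateLimiter(5, 60 * 60 * 1000);
for (let i = 0; i < 5; i++) {
  console.assert(perIp.allow("203.0.113.7"), "requests 1-5 allowed");
}
console.assert(!perIp.allow("203.0.113.7"), "request 6 within the hour rejected");
```

The next slice (3/hour/email) gets its own failing test before any further code is written.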

Tool 4 of 5 — Superpowers

A skills library that auto-activates the right discipline at the right moment. No slash commands to remember.

github.com/obra/superpowers

Auto-Activated

Skills trigger automatically by context — brainstorming before code, TDD during build, review between tasks.

Mandatory Workflows

“Mandatory workflows, not suggestions.” The agent checks for relevant skills before any task.

Best For

Reviewer-first culture where you want the discipline baked in by default.

Superpowers — Install

The plugin installs via the Claude Code plugin marketplace.

plaintext
/plugin install superpowers@claude-plugins-official

Other agents have their own marketplaces — Codex CLI, Cursor, and Copilot CLI are documented in the README.

Once installed, the agent has the skills. There is no /brainstorm or /execute to type — the workflow activates from conversation context.

Superpowers — The Workflow Skills

| Skill | Activates when |
| --- | --- |
| brainstorming | You start describing a feature, before any code |
| writing-plans | A design has been agreed |
| subagent-driven-development | A plan exists and execution begins |
| executing-plans | Batch-running approved plan steps |
| test-driven-development | During implementation |
| requesting-code-review | Between tasks |
| verification-before-completion | Before claiming done |

The skill set covers the full loop from a fuzzy idea to verified code.

Superpowers — In Practice

Conversation, not commands:

plaintext
> Need a forgot-password endpoint. Auth service.

What the skill does next (this is from the brainstorming SKILL.md checklist):

  1. Explore project context

    Reads files, docs, recent commits — grounds itself in your codebase before anything else.

  2. Ask clarifying questions, one at a time

    Purpose, constraints, success criteria. Never bulk questions.

  3. Propose 2–3 approaches with tradeoffs

    Each with a recommendation.

  4. Present the design, get approval section by section

    Cannot proceed without it.

  5. Write design doc

    Saved to docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md and committed.

Superpowers — The Output

What you get back from brainstorming for the forgot-password endpoint:

plaintext
Options to decide:
  - Response shape: identical (no leak) vs explicit (UX clearer)
  - Rate limit: IP only / email only / both / sliding window
  - Token storage: plaintext (fast) vs hashed (safe)
  - TTL: 5 / 15 / 60 minutes — security vs UX tradeoff

Risks flagged:
  - CWE-203 (account enumeration) if responses differ
  - SMTP exhaustion if no email-level rate limit
  - Token replay if no usedAt
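The "token replay if no usedAt" risk is the single-use invariant. A sketch of the consume-once check, with an in-memory row standing in for the database record (field names are assumptions):

```typescript
// Single-use check: a token row is consumed exactly once by stamping usedAt.
// TokenRow is an assumed shape standing in for the database record.
type TokenRow = { tokenHash: string; expiresAt: Date; usedAt: Date | null };

function consumeToken(row: TokenRow, now = new Date()): boolean {
  if (row.usedAt !== null) return false; // replay: already consumed
  if (now > row.expiresAt) return false; // past the TTL
  row.usedAt = now;                      // burn it on first use
  return true;
}

const row: TokenRow = { tokenHash: "ab12", expiresAt: new Date(Date.now() + 60_000), usedAt: null };
console.assert(consumeToken(row) === true, "first use succeeds");
console.assert(consumeToken(row) === false, "replay is rejected");
```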

Tool 5 of 5 — OpenSpec

Delta-spec for brownfield code. You don’t spec the whole system — you spec what’s changing.

github.com/Fission-AI/OpenSpec

Brownfield-First

No need to spec the whole system. Start with the delta.

Change as Changelog

Each delta becomes a permanent record of what changed and why.

Best For

Legacy codebases with no specs at all. Start anywhere.

OpenSpec — Install & Init

bash
npm install -g @fission-ai/openspec@latest
openspec init

| Greenfield SDD | OpenSpec |
| --- | --- |
| Spec the whole feature | Spec only what's changing |
| Full spec before code | Change proposal IS the spec |
| New codebase | Current code is the baseline |
| | Old changes archived as changelog |

For an existing forgot-password endpoint with no rate limit, no TTL, and no audit log, you don't write forgot-password.yaml from scratch. You write a change.

OpenSpec — Propose a Change

plaintext
/opsx:propose add-forgot-password-hardening

What the command does next: the agent reads your existing code, infers the current behaviour, and generates a complete proposal folder at openspec/changes/add-forgot-password-hardening/:

| File | Purpose |
| --- | --- |
| proposal.md | why we're doing this, what's changing |
| specs/ | requirements and scenarios |
| design.md | technical approach |
| tasks.md | implementation checklist |

The flow is generation-first, not interview-first — you review and edit the artifacts, then run /opsx:apply when satisfied.

OpenSpec — Inside the Proposal

Looking at real proposals in the OpenSpec repo, proposal.md follows a four-section structure:

markdown
# Harden POST /auth/forgot-password

## Why
Current implementation has no rate limit, leaks account existence,
and stores plaintext tokens. Security review flagged all three.

## What Changes
- identical 202 response for known and unknown emails
- 5/hour/IP and 3/hour/email rate limits
- token TTL = 15 minutes, single-use (usedAt set on first use)
- token stored as SHA-256 hash
- audit log on every request

## Capabilities
- Rate limiting (per-IP, per-email)
- Token hashing and TTL enforcement
- Audit logging across success and miss paths

## Impact
- Existing tests must still pass
- 7 new tests required for the new invariants
- No breaking changes to email template, SMTP wrapper, or route shape

OpenSpec — Apply & Archive

plaintext
/opsx:apply

What the command does next: walks the tasks.md checklist for the active change, ticking each item off as the code edits land. From the README’s worked example:

text
You: /opsx:apply
AI:  Implementing tasks...
     ✓ 1.1 Add rate-limit middleware (per-IP)
     ✓ 1.2 Add rate-limit middleware (per-email)
     ✓ 2.1 Migrate tokens table — add tokenHash, expiresAt, usedAt
     ✓ 2.2 Hash tokens with SHA-256 on create
     ✓ 3.1 Wire audit.log into success and miss paths
     All tasks complete!

Once green, archive the change so it becomes part of the changelog:

plaintext
/opsx:archive

OpenSpec — Expanded Workflow (Optional)

The README documents an expanded set of commands behind a profile switch:

bash
openspec config profile   # select the expanded profile
openspec update           # apply the new slash commands

| Command | Purpose |
| --- | --- |
| /opsx:new | Start a new change (more deliberate than propose) |
| /opsx:continue | Resume work on an existing change |
| /opsx:ff | Fast-forward progress |
| /opsx:verify | Check implementation against specs |
| /opsx:bulk-archive | Archive multiple changes at once |
| /opsx:onboard | Team workflow setup |

For most flows the three-command core (propose → apply → archive) is enough.

Side By Side — Five Paths

| Phase | Spec Kit | BMAD | Matt Pocock | Superpowers | OpenSpec |
| --- | --- | --- | --- | --- | --- |
| Clarify | /speckit.specify + /speckit.clarify | bmad-agent-pm PRD interview | /grill-with-docs | brainstorming skill | Existing code + "What's changing?" |
| Spec | spec.md | PRD | Brief transcript | Design doc + plan | proposal.md + specs/ |
| Plan | /speckit.plan | bmad-agent-architect | (in /tdd prompt) | writing-plans skill | design.md + tasks.md |
| Implement | /speckit.implement | bmad-agent-dev | /tdd | subagent-driven-development | /opsx:apply |
| Verify | Tests from spec | Acceptance walk-through | /diagnose | verification-before-completion | /opsx:verify (expanded profile) |
| Asks questions? | Yes — /clarify asks ≤5, one at a time | Yes — PM/Architect walk you through | Yes — /grill-with-docs interviews | Yes — brainstorming one at a time | No — generates from existing code |
| Spec author | You | AI (PM agent) | You + AI grilling | You + AI brainstorm | You (change only) |

The same discipline closes every column: code traces to a spec, tests trace to spec lines, failures name invariants.

Which One For You?

Spec Kit

You want to own every decision yourself. Solo, maximum control.

BMAD

Your team has dedicated reviewers and you want AI to draft the artifacts.

Matt Pocock's Skills

You want to mix and match — some specs by you, some by AI, all light.

Superpowers

You want the discipline auto-applied without remembering commands.

OpenSpec

You’re working in a legacy codebase with no specs at all.

Five Tools, One Discipline

| Tool | Best for | Principle it most visibly enforces |
| --- | --- | --- |
| Spec Kit | Solo, max control | Gaps before code — you close every gap yourself |
| BMAD | Teams with dedicated reviewers | Spec is the contract — AI drafts, human signs off |
| Matt Pocock Skills | Lightweight, composable flow | Tests trace to spec — TDD-first loop built in |
| Superpowers | Reviewer-first culture | Spec is the contract — skills auto-apply the discipline |
| OpenSpec | Legacy codebases | Spec is the changelog — the change proposal IS the spec |

Resources

| Tool | Repository |
| --- | --- |
| Spec Kit | github.com/github/spec-kit |
| BMAD Method | github.com/bmad-code-org/BMAD-METHOD |
| Matt Pocock's Skills | github.com/mattpocock/skills |
| Superpowers | github.com/obra/superpowers |
| OpenSpec | github.com/Fission-AI/OpenSpec |