The Complete Guide
A pragmatic, AI-powered engineering discipline for solo developers and small teams — maintain a deployable, production-ready state at all times, without a large engineering org.
Describes skills library v1.3.0 (released 2026-04-18) · VERSION · CHANGELOG · verify locally with `cat ~/.claude/skills/VERSION`
Overview
InsanelyGreat's Shippable States Software Development (SSD) is a pragmatic engineering discipline designed for solo developers and small teams who use AI — specifically Claude Code — to build production software. The system maintains a deployable, production-ready state at all times throughout the development cycle.
The core principle: maintain a deployable, production-ready state at all times.
SSD synthesizes lessons from continuous deployment, trunk-based development, feature flags, and decades of software engineering failures where "90% done" meant "months from shipping." It was designed from the ground up to be operated by one person or a handful of people, with AI as a force multiplier at every step of the workflow.
The methodology is simple: maintain a deployable state at all times. The discipline is hard: no shortcuts, no "we'll fix it later," no broken code on main. The payoff is enormous: no death marches, predictable delivery, high quality, low stress — without needing a 10-person eng team to enforce it.
Why It Matters
The "90% Done" Problem
Traditional development creates a predictable trap:
- Week 1–8: "Making good progress!"
- Week 9: "We're 90% done!"
- Week 10: "Still 90% done..."
- Week 11: "Uh, still 90%..."
- Week 12: Panic, cut features, ship something broken
Why? The last 10% includes all the work no one budgeted for:
- Integration between components
- Error handling and edge cases
- Performance under real load
- Production deployment and data migration
- Security hardening and cross-browser testing
- Accessibility and documentation
InsanelyGreat's SSD solution: do the hard "last 10%" work incrementally throughout development, not as a crisis at the end. Claude Code skills handle the checklist so you stay focused on shipping.
The Iron Law
Every project has exactly three variables:
- Scope — what features and capabilities ship
- Time — when it ships
- Quality — how well it works
| Constraint | What Flexes | When to Use |
|---|---|---|
| Fix Time | Scope reduces, quality preserved | Hard deadlines (conference, contract, funding round) |
| Fix Scope | Timeline extends, quality preserved | API compliance, feature parity requirements |
| Fix Quality | Time and scope flex | Medical, financial, safety-critical systems |
Most projects are time-constrained. Declare your constraint at kickoff. Adjusting scope to meet a deadline is not failure — it's engineering judgment.
Principle 1: Constant Production Parity
Your development environment must match production as closely as possible from Day 1.
Traditional
- Weeks 1–8: Local development
- Week 9: "Okay let's deploy to staging..."
- Week 10: "Why doesn't it work in staging?"
- Week 11: "Production is different from staging..."
- Week 12: "What do you mean SSL certs take 3 days?"
InsanelyGreat's SSD
- Day 1: Deploy "Hello World" to production
- Day 2: Deploy first feature to production
- Day 3: Deploy improved version
- Day 30: Deploy to production (like every day)
Why this works: Deployment is never "the hard part" because you do it constantly. Production issues surface immediately when they're easy to fix. You know your deployment budget from Day 1.
Principle 2: The Shippable State Invariant
At the end of each work session, the system must be in a state where:
- All tests pass
- No compilation errors
- No broken user-facing features
- Documentation matches implementation
- Could be deployed to production without embarrassment
Not required: feature-complete or meeting all goals. Just that what exists actually works.
Principle 3: Feature Flags Over Feature Branches
Long-lived feature branches are antithetical to shippable states.
```text
# Problem: Feature branch divergence
Main:    A---B---C---D---E---F---G---H
              \
Feature:       I---J---K---L---M
                               \
                      (days of merge conflicts)

# SSD: All work on main, behind flags
Main: A---B---C---D---E---F---G---H
      Day 1: Add feature code (flag off by default)
      Day 2: Expand feature (still flagged off)
      Day 3: Feature works, flip flag on
```
All work happens on main/trunk. The feature exists in production but is invisible until ready.
```python
if feature_flags.is_enabled("new_checkout", user=user):
    return new_checkout_flow(user, plan)
else:
    return legacy_checkout_flow(user, plan)
```
Principle 4: The Ratchet Principle
Forward progress only. Each commit improves the system in some measurable way.
Banned commits:
- "WIP" or "checkpoint" commits
- "Broken, will fix tomorrow"
- Commented-out code "for later"
- Partially implemented features visible to users
The ratchet mechanism — every commit must:
- Pass CI/CD
- Maintain or improve code coverage
- Be deployable
If you need to save work that's not ready: use local stash (not committed), Draft PR with "DO NOT MERGE" (not on main), or a feature flag (committed, but invisible).
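The coverage half of the ratchet can be enforced mechanically in CI. The sketch below is an assumption about what such a step might look like; the baseline file path is hypothetical, not part of the skills' contract.

```python
# Hypothetical CI step: fail the build if coverage drops below the
# recorded baseline, and ratchet the baseline upward otherwise.
from pathlib import Path

def coverage_ratchet(new_pct, baseline_file):
    baseline_file = Path(baseline_file)
    old_pct = float(baseline_file.read_text()) if baseline_file.exists() else 0.0
    if new_pct < old_pct:
        # Ratchet violated: this commit reduces coverage.
        raise SystemExit(f"coverage {new_pct:.1f}% < baseline {old_pct:.1f}%")
    baseline_file.write_text(f"{new_pct:.1f}")  # ratchet forward
```

Run it after the test suite with the measured percentage, and commit the baseline file so the ratchet survives across CI runs.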
Principle 5: Scope Flexibility Is a Feature
Traditional thinking: "We must deliver all planned features by the deadline."
Result: Deliver nothing on time, or deliver broken features.
SSD thinking: "We deliver whatever is shippable by the deadline."
Result: Deliver working software, adjust scope based on reality.
How to cut scope well:
- Cut entire features, not the quality of existing features
- Cut depth, not breadth (fewer powerful features beats many broken features)
- Hide features behind flags rather than deleting (easy to resurrect)
- Communicate cuts early and often to stakeholders
Pattern 1: Deployed Day One
Before writing any business logic, establish a real deployment to your distribution channel. The specifics vary by platform — pick yours:
Web
- Frontend deployed to a real URL (even if it just renders a title)
- Backend API deployed and reachable from the frontend
- Database provisioned and migrated
- CI/CD: push to main → deploy to staging automatically
- One authenticated route working end-to-end
- Error tracking (Sentry) wired up in frontend and backend
- Domain + SSL configured
iOS
- App builds and runs on minimum target device/simulator
- Main tab/navigation structure in place (empty screens are fine)
- One piece of data persisted end-to-end: create → persist → relaunch → still there
- Authenticated session working: login → token in Keychain → cold launch restores
- CI: Xcode Cloud or GitHub Actions builds and runs tests on every push
- App archived and submitted to TestFlight (even a Hello World build)
- Crash reporting wired up (Sentry, Crashlytics, or Bugsnag)
- App Store Connect record created with bundle ID matching the app
Android
- App builds and runs on minimum target API
- Hilt dependency injection wired and working
- Navigation structure in place with NavHost
- One piece of data persisted end-to-end in Room: create → persist → kill app → relaunch → still there
- Authenticated session working: login → token in DataStore → relaunch restores
- CI: GitHub Actions or Bitrise builds debug APK and runs unit tests on every push
- Internal Testing track on Play Console with a working build uploaded
- Firebase Crashlytics (or equivalent) initialized and sending test crashes
macOS
- App builds and launches on minimum target OS
- Main window with placeholder navigation
- One real piece of persisted data: create it, see it in the UI, relaunch — still there
- Basic Settings window
- CI: Xcode Cloud or GitHub Actions builds and runs tests on every push
- Archive and notarization working (even for a Hello World app)
- Crash reporting wired up (Sentry, Bugsnag, or Crashlytics)
Headless / backend services
- Service containerized and deployed to production environment (even returning `{"status": "ok"}`)
- Health endpoints (`/health`, `/ready`) responding correctly
- Database provisioned, connected, and one migration applied
- Structured logging with `request_id` propagation on every request
- Error tracking (Sentry or equivalent) capturing unhandled exceptions
- One authenticated endpoint working end-to-end
- CI/CD: push to main → container built, tests run, deployed to staging
- `.env.example` documenting every required environment variable
This is your MVP. It does nothing useful, but it's real. If deployment takes 2 weeks and you budget 0 weeks, you're starting 2 weeks late on Day 1.
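For the headless checklist, the Day-1 deliverable really can be this small. A stdlib-only sketch (plain WSGI, no framework assumed) that answers the `/health` and `/ready` checks above:

```python
# Day-1 "does nothing useful, but it's real" service: health endpoints
# only, standard library only. Swap in your framework later.
import json

def app(environ, start_response):
    if environ.get("PATH_INFO") in ("/health", "/ready"):
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps({"status": "ok"}).encode()]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [json.dumps({"error": "not found"}).encode()]

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("", 8000, app).serve_forever()  # container entrypoint
```

Containerize this, deploy it, and wire the health endpoints into your orchestrator's probes; everything after that is iteration.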
Pattern 2: Walking Skeleton
Build one feature end-to-end before building any feature fully complete.
Wrong order
- Design all UI screens
- Build all database tables / persistence
- Write all API endpoints / services
- Connect everything
- Discover they don't fit together
Right order
- Build login flow end-to-end
- Build "add item" end-to-end
- Build "edit item" end-to-end
- Each step shippable as-is
Never build all UI then all backend/persistence. One complete flow first, then expand breadth. "End-to-end" means different things by platform: on web, UI → API → DB → response. On iOS, View → persist → relaunch → verify. On Android, Compose → Room → relaunch → verify. The principle is the same: one complete flow before breadth.
Pattern 3: Dark Launching
Launch features in production before they're visible to users. The pattern works across all platforms:
Web (Python):
```python
if feature_flags.new_dashboard and user.is_internal_tester:
    return render_new_dashboard(user)
return render_old_dashboard(user)
```
iOS (Swift):
```swift
if featureFlags.isEnabled("newDashboard", user: user) {
    return NewDashboardView()
}
return OldDashboardView()
```
Android (Kotlin):
```kotlin
if (featureFlags.isEnabled("newDashboard", user)) {
    NewDashboardScreen()
} else {
    OldDashboardScreen()
}
```
Benefits:
- Test in production without risk
- Gradual rollout (internal → beta → everyone)
- Easy rollback: flip flag
- Development never blocks deployment
Pattern 4: Timebox with Eject
For risky or exploratory work, timebox it with a pre-committed eject plan.
> "We'll spend 3 days exploring this approach. On day 3, we decide:
> - Ship it
> - Iterate it (extend timebox)
> - Abandon it (revert to last shippable state)"
This prevents:
- Sunk cost fallacy ("we've invested so much...")
- Endless exploration without shipping
- Half-finished experiments in the codebase
Pattern 5: The Nightly Ritual
End each day with a shippable state. Spend the last 30 minutes on this checklist:
- All tests pass locally
- Code committed and pushed
- CI/CD pipeline green
- Feature flags set appropriately
- Documentation updated if APIs changed
- Tomorrow's first task identified
Your future self (or your teammate) should be able to pick up exactly where you left off, with no confusion about what state things are in.
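The checkable parts of the ritual (tests, clean tree, pushed branch) lend themselves to a small script. The commands below are illustrative assumptions for a Python/pytest project; substitute your own stack's.

```python
# Hypothetical nightly-ritual runner: each check is a command whose
# exit code 0 means "shippable". The command list is illustrative.
import subprocess

DEFAULT_CHECKS = [
    ("tests pass locally", ["pytest", "-q"]),
    ("working tree clean (all committed)", ["git", "diff", "--quiet", "HEAD"]),
    ("branch pushed", ["git", "diff", "--quiet", "@{upstream}", "HEAD"]),
]

def nightly_ritual(checks=DEFAULT_CHECKS):
    results = {}
    for name, cmd in checks:
        try:
            results[name] = subprocess.run(cmd, capture_output=True).returncode == 0
        except FileNotFoundError:  # missing tool counts as a failing check
            results[name] = False
    return results  # all True -> shippable state

if __name__ == "__main__":
    for name, ok in nightly_ritual().items():
        print(("PASS " if ok else "FAIL ") + name)
```

Flags, docs, and "tomorrow's first task" stay human judgment; the script only removes the excuse of not checking.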
Decision Framework
Choosing Your Constraint
At project kickoff, declare your primary constraint and communicate it explicitly.
| Constraint Type | Example Projects | Scope | Quality |
|---|---|---|---|
| Time-Constrained | Conference demos, MVP for funding, contractual deliveries | Flexes | Preserved |
| Scope-Constrained | API compliance, platform migrations, feature parity | Fixed | Preserved |
| Quality-Constrained | Medical devices, financial systems, infrastructure | Flexes | Fixed |
When to Cut Scope
Scope cuts should happen early and often, not as last-minute panic.
Cut scope now if you see these signals:
- It's Wednesday and you're not confident about Friday's shippable state
- You're accumulating technical debt faster than paying it off
- Tests are being skipped "temporarily"
- "We'll clean it up after shipping" is appearing in conversations
Metrics That Matter
| Traditional (misleading) | SSD (actually useful) |
|---|---|
| Lines of code written | Days since last production deployment |
| Number of commits | Mean time to deploy a change |
| Features "in progress" | % of code behind feature flags (target: <5%) |
| Percentage complete | Test coverage (and is it passing?) |
Deployment Frequency
This is the single most important SSD metric:
- Once per month — Traditional waterfall
- Once per week — Decent
- Once per day — Excellent
- Multiple times per day — World-class
Common Objections
"This sounds like more work"
You're doing the work either way. Option A: days 1–85 ignore deployment, days 86–100 frantic debugging, ship broken. Option B: do the hard parts incrementally every day, day 90 ship the fully-working subset you completed. Same total effort, drastically different stress and quality.
"Our stakeholders need to see progress"
SSD gives better demos. Traditional: "Here's a mockup... this button doesn't work yet... imagine when this is connected to the backend..." SSD: "Here's the actual working product. Press any button." Which demo builds more confidence?
"We need to iterate quickly"
False dichotomy. Shippable states don't slow iteration — they enable it. Every iteration is testable by real users. No integration phase blocking feedback. Pivots are cheap because sunk cost is always minimal.
"My team isn't disciplined enough"
This is exactly why you need this. Discipline problems are solved with systems, not willpower. CI/CD forces tests to pass. Can't commit broken code. Daily deployments force completion. Visible production state keeps everyone honest. SSD creates discipline through automation and forcing functions.
"This doesn't work for mobile apps"
It works. You cannot deploy to the App Store daily (review takes 1-3 days). But you CAN deploy to TestFlight / Play Internal Testing daily. SSD targets the internal deployment pipeline, not the store review process. TestFlight is your "production" for SSD purposes until you cut a release.
Feature flags on mobile use an SDK (Firebase Remote Config, LaunchDarkly). Flag changes take effect on next app launch, not instantly. When you cut a store release, it should be a non-event — you've been shipping to testers daily. For macOS desktop: notarization is your deployment gate. Automate it in CI from Day 1.
Getting Started
Four weeks to establish the SSD rhythm. Success criteria: on day 30, deploy to production with confidence in under 10 minutes.
Day 0: Bootstrap
- Install the skills: `git clone https://github.com/AlexHorovitz/skills ~/.claude/skills`
- Run `/ssd-init` once at the project root — creates the `ssd/` working directory (gitignored), writes `ssd/project.yml` (detected stack/framework/platform), creates `docs/decisions/`, `docs/runbooks/`, and `docs/architecture/`, and runs prerequisite checks
- All `/ssd` phases refuse to proceed until init has run
Week 1: Foundation
- Set up CI/CD pipeline
- Deploy "Hello World" to your distribution channel (production server, TestFlight, Play Internal, notarized build)
- Configure automated testing
- Establish feature flag system (server-side for web, SDK-based for mobile/desktop)
- Invoke `/ssd start` to run the Walking Skeleton playbook
Week 2: First Feature
- Build one feature end-to-end
- Deploy to production behind flag
- Verify in production
- Enable for internal users
Week 3: Rhythm
- Deploy to production daily
- Every commit passes CI
- All incomplete features behind flags
- Documentation current
Week 4: Optimization
- Reduce deploy time to under 10 minutes
- Increase test coverage
- Remove old feature flags
- Retrospective: what's working?
Platform-specific Day 1 checklists for iOS, Android, macOS, Web, and Headless are in Pattern 1: Deployed Day One above.
Claude Code Skills
InsanelyGreat's SSD is implemented as a set of orchestrated skills for Claude Code — this is what makes the methodology practical for a single developer or small team. The /ssd orchestrator sequences the right sub-skills for each development phase, giving you the equivalent of a senior architect, systems designer, and code reviewer on call at all times. The full skill set is free for personal and internal organizational use — github.com/AlexHorovitz/skills (library v1.3.0, 2026-04-18).
Skill Taxonomy
| Type | Skills | When you invoke directly |
|---|---|---|
| Bootstrap | `/ssd-init` | Once, at project start (or when `ssd/` has drifted) |
| Orchestrator | `/ssd` | Always — start here after init |
| Domain | `/architect`, `/coder`, `/systems-designer`, `/refactor` | When working outside the SSD workflow |
| Review | `/code-reviewer`, `/codebase-skeptic`, `/software-standards` | On-demand or via SSD |
| Reference | `/methodology` | When you want to understand SSD doctrine or score self-adherence |
Step 1: `/ssd-init` — Project Bootstrap
Run once per project before any `/ssd` phase. First-run housekeeping: creates `ssd/` (gitignored working directory), writes `ssd/project.yml` (detected language, framework, platform, distribution channel), creates `ssd/current.yml` (active workstreams pointer), creates `docs/decisions/`, `docs/runbooks/`, and `docs/architecture/` (committed decision records), and runs SSD prerequisite checks.
Idempotent — safe to re-run. It never overwrites existing files, never deletes anything, and appends to `ssd/init-log.md` on each run.
`/ssd` refuses to proceed if `ssd/project.yml` is absent. Init is not auto-run — the user decides when to commit to the SSD convention.
Step 2: `/ssd` — The Orchestrator
Milestone → Verify Loop
Every milestone takes a before/after snapshot and requires explicit verification:
- Snapshot: record git SHA and metrics to `ssd/milestones/<topic>/sha-before` and `metrics-before.yml`.
- Deep audit: `codebase-skeptic` writes `skeptic-before.md`.
- Refactor planning: `refactor` emits `refactor-plan.md` — every item cites a specific finding ID from `skeptic-before.md`. No cite → not in scope.
- Validate: `code-reviewer` with `remediation_mode: true` on each refactor PR.
- Deploy and confirm production health.
- Verify (mandatory): re-run `codebase-skeptic` → `skeptic-after.md`; diff frontmatter; re-run `code-reviewer` on the remediation diff. The milestone is complete only when all original BLOCKER/🔴/💀 findings are ✅ closed, no new BLOCKER-severity regression was introduced, and the remediation diff has no BLOCKERs. A refactor that claims to close findings without verification is indistinguishable from wishful thinking.
Sub-Skill Reference
| Skill | Role in SSD | Phase |
|---|---|---|
| `/ssd-init` (v1.1.0) | First-run housekeeping: creates `ssd/` tree, writes `project.yml` + `current.yml`, runs prerequisite checks | prerequisite to all phases |
| `/architect` (v1.1.0) | Design: models, services, API contracts, ADRs, current-scale baseline. Platform-adaptive (web, iOS, Android, macOS, headless); web guides cover Next.js, Django, FastAPI, Rails, Laravel, Angular, Vue/Nuxt, Spring Boot, ASP.NET Core. Integration has a first-class contract. | start, feature |
| `/systems-designer` (v1.2.0) | Production readiness: reliability, observability, deployment safety. Validates architect spec in Phase 0. Covers AI/LLM integration, compliance & data lifecycle, cost observability, and chaos/failure injection. | start, feature, ship |
| `/coder` (v1.1.0) | Implementation from spec (Python, TypeScript, Swift, Ruby, Java, C#, PHP, Go, Rust, C/C++, Obj-C). Halts if the architect spec omits a feature flag. Spec-drift check amends ADRs. Emits `03-coder-status.md` with test/lint/typecheck results. | feature |
| `/code-reviewer` (v1.2.0) | PR gate: BLOCKER/MAJOR findings block merge. Phase 1.5 prior-review follow-up (remediation mode) and Phase 3.5 fix-introduces-edge-cases. Red flags include LLM prompt injection, IntegrityError fetch mismatch, cache-without-race-test, release theatre. Loads `examples.md` reference. | feature, milestone, gate, verify |
| `/codebase-skeptic` (v1.2.0) | Deep architectural critique through 10 expert voices. Mandatory Phase 2.5 Operational Failure Modes Sweep. Forward-Looking Pass in Phase 4. Incident-Story attestation (Beck), Domain-Modeling Stance (Evans), Deployment-Gate Hardening (Humble). | milestone |
| `/software-standards` (v1.1.0) | Adversarial comparative audit. Two modes: Comparative and Adversarial Single. Requires 2–3 evidence citations per /10 score. For vendor selection / legacy onboarding / pre-acquisition — not routine review. | audit |
| `/refactor` (v1.2.0) | Post-ship targeted improvement. Every item cites a specific finding from `skeptic-before.md`. Step 4.5 Budget Check with halt-and-rollback. Step 5 per-item re-check loop closure. Step 6 systems-designer coordination trigger. Loads `patterns.md` reference. | milestone |
| `/methodology` (v1.2.0) | SSD doctrine reference — Iron Law, Five Principles, Decision Framework. Provides machine-checkable rule source for `/ssd gate`. `/methodology score` emits a self-adherence metric. | reference / any phase |
Review Tier Selection
Three skills do "review" work. Never chain all three — pick the right tier:
- `/code-reviewer` — every PR, always, no exceptions (≤500 changed lines)
- `/codebase-skeptic` — milestone reviews and pre-release audits of an owned codebase
- `/software-standards` — comparative/adversarial evaluation only (vendor selection, legacy onboarding, pre-acquisition)
When `coder` and a language-specific coder (e.g. `python-django-coder`) both apply, the specific one wins. `code-reviewer` and `codebase-skeptic` are mutually exclusive on the same scope. `codebase-skeptic` and `software-standards` are mutually exclusive.
The SSD Artifact Tree
Every SSD invocation produces artifacts at well-known paths relative to the project root. Sub-skills read from and write to this tree — that is what lets a session resume, a reviewer verify, and a teammate onboard.
```text
<project-root>/
├── docs/                        # committed decision records
│   ├── decisions/               # ADRs from architect
│   ├── runbooks/                # runbooks from systems-designer
│   └── architecture/            # component diagrams, data models
└── ssd/                         # gitignored working directory
    ├── project.yml              # language, framework, platform
    ├── current.yml              # active workstreams + budgets
    ├── features/
    │   └── <slug>/
    │       ├── 00-brief.md
    │       ├── 01-architect.md
    │       ├── 02-systems-designer.md
    │       ├── 03-coder-status.md
    │       ├── 04-code-review.md
    │       └── 05-deploy.md
    ├── milestones/
    │   └── YYYY-MM-DD-<topic>/
    │       ├── sha-before
    │       ├── metrics-before.yml
    │       ├── skeptic-before.md
    │       ├── refactor-plan.md
    │       ├── refactor-prs.md
    │       ├── skeptic-after.md
    │       └── verification.md
    ├── audits/
    │   └── YYYY-MM-DD-<scope>/
    │       └── standards-report.md
    └── archive/                 # closed feature + milestone directories
```
Every primary output carries YAML frontmatter (`skill`, `version`, `produced_at`, `scope`, `consumed_by`). Review outputs add `finding_counts` and a computed `gate_pass`. Design outputs add a `deliverables` block. This is what makes `/ssd gate` mechanically checkable and milestone verification a frontmatter diff rather than prose reconciliation.
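As a sketch of why this makes the gate mechanical: once `finding_counts` is parsed from a review artifact's frontmatter, the merge decision reduces to a pure function. The severity names follow Hard Rule 1; the exact schema here is an assumption.

```python
# Hypothetical gate computation over review frontmatter.
# Hard Rule 1: no merge with BLOCKER or MAJOR findings.
def gate_pass(finding_counts):
    blocking = ("BLOCKER", "MAJOR")
    return all(finding_counts.get(sev, 0) == 0 for sev in blocking)
```

For example, `gate_pass({"BLOCKER": 0, "MAJOR": 0, "MINOR": 3})` passes, while a single BLOCKER fails it; no prose reconciliation involved.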
Session Continuity
On invocation, `/ssd` reads `ssd/current.yml`. Each active workstream carries a budget in hours. The orchestrator flags entries that are over budget ("suggest scope reduction, not more work") and entries last touched more than 3 days ago ("stale work that may need a fresh audit"). Closing a workstream archives its artifacts under `ssd/archive/features/<slug>/`.
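The triage the orchestrator performs can be pictured as a small pure function over the workstream list. The field names here (`slug`, `budget_hours`, `spent_hours`, `last_touched`) are illustrative assumptions about `ssd/current.yml`, not its documented schema.

```python
# Hedged sketch of the over-budget / staleness triage described above.
# Field names are assumptions for illustration only.
from datetime import date, timedelta

def triage(workstreams, today=None):
    today = today or date.today()
    flags = []
    for ws in workstreams:
        if ws["spent_hours"] > ws["budget_hours"]:
            flags.append((ws["slug"], "over budget: suggest scope reduction, not more work"))
        if today - ws["last_touched"] > timedelta(days=3):
            flags.append((ws["slug"], "stale: may need a fresh audit"))
    return flags
```

Note that the over-budget branch suggests cutting scope, never extending effort: the Iron Law applied at the workstream level.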
Methodology-Backed Gate Enforcement
Before `/ssd gate` passes, these doctrine rules are checked mechanically. Each cites a principle in `methodology/core.md`.
| Rule | Check | Source |
|---|---|---|
| Tests pass | `<test-command>` exits 0 | core.md §1 |
| No broken features | Covered by tests | core.md §2 |
| Docs match implementation | ADRs updated if architecture changed | core.md §2 |
| No WIP on main | `git log` grep for "WIP", "checkpoint", "TODO tomorrow" is empty | core.md §4 |
| Feature behind flag | Flag config delta present (unless infra) | core.md §3 |
| Deployable | CI passes; migration is reversible | core.md §2 |
If any rule fails, `/ssd gate` emits the failure with the doctrine cite and refuses to pass. "I know better" is not an override.
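The "No WIP on main" row, for instance, could be checked mechanically like this. The marker list and git invocation are assumptions for illustration, not the skill's actual implementation.

```python
# Hypothetical check for the "No WIP on main" gate rule (core.md §4):
# scan commit subjects for banned markers.
import subprocess

BANNED_MARKERS = ("wip", "checkpoint", "todo tomorrow")

def wip_subjects(subjects):
    # Pure helper: return the offending commit subjects.
    return [s for s in subjects if any(m in s.lower() for m in BANNED_MARKERS)]

def check_no_wip(rev_range="origin/main..HEAD"):
    out = subprocess.run(
        ["git", "log", "--format=%s", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return wip_subjects(out)  # empty list -> rule passes
```

An empty return means the rule passes; any hit is emitted with its doctrine cite.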
Hard Rules
1. No merge without a clean /ssd gate
No BLOCKER or MAJOR findings from the code-reviewer. No exceptions.
2. No incomplete work on main without a feature flag
WIP commits on main are banned. Use a feature flag or a local stash.
3. Tests must pass before and after every change
"I'll fix the tests tomorrow" is not a shippable state.
4. Refactor only after shipping
Separate PRs, never mixed with feature work. Milestones run after shipping, never instead of it.
5. Deploy beats perfection
Reduce scope rather than delay a deploy.
6. Production parity from day one
If you haven't deployed to production yet, that is your next task.