InsanelyGreat's Shippable States Software Development

The complete guide

A pragmatic, AI-powered engineering discipline for solo developers and small teams — maintain a deployable, production-ready state at all times, without a large engineering org.

Describes skills library v1.3.0 (released 2026-04-18) · VERSION · CHANGELOG · verify locally with cat ~/.claude/skills/VERSION

Overview

InsanelyGreat's Shippable States Software Development (SSD) is a pragmatic engineering discipline designed for solo developers and small teams who use AI — specifically Claude Code — to build production software. The system maintains a deployable, production-ready state at all times throughout the development cycle.

The core principle:

If you can't ship it right now, you don't have a product — you have a construction site.

SSD synthesizes lessons from continuous deployment, trunk-based development, feature flags, and decades of software engineering failures where "90% done" meant "months from shipping." It was designed from the ground up to be operated by one person or a handful of people, with AI as a force multiplier at every step of the workflow.

The methodology is simple: maintain a deployable state at all times. The discipline is hard: no shortcuts, no "we'll fix it later," no broken code on main. The payoff is enormous: no death marches, predictable delivery, high quality, low stress — without needing a 10-person eng team to enforce it.

Why It Matters

The "90% Done" Problem

Traditional development creates a predictable trap:

Week  1–8:  "Making good progress!"
Week  9:    "We're 90% done!"
Week 10:    "Still 90% done..."
Week 11:    "Uh, still 90%..."
Week 12:    Panic, cut features, ship something broken

Why? The last 10% includes all the work no one budgeted for:

  • Integration between components
  • Error handling and edge cases
  • Performance under real load
  • Production deployment and data migration
  • Security hardening and cross-browser testing
  • Accessibility and documentation

The SSD solution: do the hard "last 10%" work incrementally throughout development, not as a crisis at the end. Claude Code skills handle the checklist so you stay focused on shipping.

The Iron Law

Every project has exactly three variables:

  1. Scope — what features and capabilities ship
  2. Time — when it ships
  3. Quality — how well it works

Iron Law: You can fix at most ONE of these. The other two must flex.

| Constraint  | What Flexes                         | When to Use                                          |
|-------------|-------------------------------------|------------------------------------------------------|
| Fix Time    | Scope reduces, quality preserved    | Hard deadlines (conference, contract, funding round) |
| Fix Scope   | Timeline extends, quality preserved | API compliance, feature parity requirements          |
| Fix Quality | Time and scope flex                 | Medical, financial, safety-critical systems          |

Most projects are time-constrained. Declare your constraint at kickoff. Adjusting scope to meet a deadline is not failure — it's engineering judgment.

Principle 1: Constant Production Parity

Your development environment must match production as closely as possible from Day 1.

Traditional

  • Weeks 1–8: Local development
  • Week 9: "Okay let's deploy to staging..."
  • Week 10: "Why doesn't it work in staging?"
  • Week 11: "Production is different from staging..."
  • Week 12: "What do you mean SSL certs take 3 days?"

InsanelyGreat's SSD

  • Day 1: Deploy "Hello World" to production
  • Day 2: Deploy first feature to production
  • Day 3: Deploy improved version
  • Day 30: Deploy to production (like every day)

Why this works: Deployment is never "the hard part" because you do it constantly. Production issues surface immediately when they're easy to fix. You know your deployment budget from Day 1.

Principle 2: The Shippable State Invariant

At the end of each work session, the system must be in a state where:

  • All tests pass
  • No compilation errors
  • No broken user-facing features
  • Documentation matches implementation
  • Could be deployed to production without embarrassment

Not required: feature-complete or meeting all goals. Just that what exists actually works.

The practical test: "If I got hit by a bus right now and someone else had to ship what I've committed, would they hate me?"

Principle 3: Feature Flags Over Feature Branches

Long-lived feature branches are antithetical to shippable states.

# Problem: Feature branch divergence
Main:    A---B---C---D---E---F---G---H
              \
Feature:       I---J---K---L---M
                                \
                          (days of merge conflicts)

# SSD: All work on main, behind flags
Main:    A---B---C---D---E---F---G---H

Day 1: Add feature code (flag off by default)
Day 2: Expand feature (still flagged off)
Day 3: Feature works, flip flag on

All work happens on main/trunk. The feature exists in production but is invisible until ready.

if feature_flags.is_enabled("new_checkout", user=user):
    return new_checkout_flow(user, plan)
else:
    return legacy_checkout_flow(user, plan)

Flag rollout order: Internal team → 1% → 10% → 50% → 100% → remove flag and dead code.
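
The rollout order above can be sketched as a deterministic percentage check. This is a minimal sketch, not the API of any particular flag library: `FlagConfig`, `bucket`, and the hashing scheme are illustrative assumptions. The point it shows is that a user who is in the 1% group stays enabled as the percentage climbs to 10%, 50%, and 100%.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FlagConfig:
    """Hypothetical flag record: a rollout percentage plus an internal allowlist."""
    rollout_percent: int = 0  # 0 means off for everyone except internal users
    internal_users: set[str] = field(default_factory=set)


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, config: FlagConfig) -> bool:
    """Internal users always see the feature; everyone else by bucket."""
    if user_id in config.internal_users:
        return True
    return bucket(flag_name, user_id) < config.rollout_percent
```

Because the bucket depends only on the flag name and the user ID, raising `rollout_percent` only ever adds users. Nobody flips back and forth between the old and new flow during a rollout.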

Principle 4: The Ratchet Principle

Forward progress only. Each commit improves the system in some measurable way.

Banned commits:

  • "WIP" or "checkpoint" commits
  • "Broken, will fix tomorrow"
  • Commented-out code "for later"
  • Partially implemented features visible to users

The ratchet mechanism — every commit must:

  • Pass CI/CD
  • Maintain or improve code coverage
  • Be deployable

If you need to save work that's not ready: use local stash (not committed), Draft PR with "DO NOT MERGE" (not on main), or a feature flag (committed, but invisible).
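
The "maintain or improve code coverage" rule can be expressed as a one-way comparison against a stored baseline. The sketch below is an assumption about how such a check might look, not part of the skills library; `ratchet` and its `tolerance` parameter are hypothetical names.

```python
def ratchet(baseline: float, current: float, tolerance: float = 0.0) -> tuple[bool, float]:
    """Return (passes, new_baseline). The baseline only ever moves up.

    baseline: coverage percentage recorded on the last passing commit
    current: coverage percentage measured on this commit
    tolerance: allowed measurement noise, in percentage points
    """
    passes = current + tolerance >= baseline
    new_baseline = max(baseline, current) if passes else baseline
    return passes, new_baseline
```

A CI job would read the baseline from a committed file, call `ratchet`, fail the build when `passes` is false, and write `new_baseline` back when it is true.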

Principle 5: Scope Flexibility Is a Feature

Traditional thinking: "We must deliver all planned features by the deadline."
Result: Deliver nothing on time, or deliver broken features.

SSD thinking: "We deliver whatever is shippable by the deadline."
Result: Deliver working software, adjust scope based on reality.

Reducing scope is not failure — it's engineering judgment.

How to cut scope well:

  • Cut entire features, not the quality of existing features
  • Cut breadth, not depth (a few powerful features beat many broken ones)
  • Hide features behind flags rather than deleting (easy to resurrect)
  • Communicate cuts early and often to stakeholders

Pattern 1: Deployed Day One

Before writing any business logic, establish a real deployment to your distribution channel. The specifics vary by platform — pick yours:

Web

  • Frontend deployed to a real URL (even if it just renders a title)
  • Backend API deployed and reachable from the frontend
  • Database provisioned and migrated
  • CI/CD: push to main → deploy to staging automatically
  • One authenticated route working end-to-end
  • Error tracking (Sentry) wired up in frontend and backend
  • Domain + SSL configured

iOS

  • App builds and runs on minimum target device/simulator
  • Main tab/navigation structure in place (empty screens are fine)
  • One piece of data persisted end-to-end: create → persist → relaunch → still there
  • Authenticated session working: login → token in Keychain → cold launch restores
  • CI: Xcode Cloud or GitHub Actions builds and runs tests on every push
  • App archived and submitted to TestFlight (even a Hello World build)
  • Crash reporting wired up (Sentry, Crashlytics, or Bugsnag)
  • App Store Connect record created with bundle ID matching the app

Android

  • App builds and runs on minimum target API
  • Hilt dependency injection wired and working
  • Navigation structure in place with NavHost
  • One piece of data persisted end-to-end in Room: create → persist → kill app → relaunch → still there
  • Authenticated session working: login → token in DataStore → relaunch restores
  • CI: GitHub Actions or Bitrise builds debug APK and runs unit tests on every push
  • Internal Testing track on Play Console with a working build uploaded
  • Firebase Crashlytics (or equivalent) initialized and sending test crashes

macOS

  • App builds and launches on minimum target OS
  • Main window with placeholder navigation
  • One real piece of persisted data: create it → see it in the UI → relaunch → still there
  • Basic Settings window
  • CI: Xcode Cloud or GitHub Actions builds and runs tests on every push
  • Archive and notarization working (even for a Hello World app)
  • Crash reporting wired up (Sentry, Bugsnag, or Crashlytics)

Headless

  • Service containerized and deployed to production environment (even returning {"status": "ok"})
  • Health endpoints (/health, /ready) responding correctly
  • Database provisioned, connected, and one migration applied
  • Structured logging with request_id propagation on every request
  • Error tracking (Sentry or equivalent) capturing unhandled exceptions
  • One authenticated endpoint working end-to-end
  • CI/CD: push to main → container built, tests run, deployed to staging
  • .env.example documenting every required environment variable

This is your MVP. It does nothing useful, but it's real. If deployment takes 2 weeks and you budget 0 weeks, you're starting 2 weeks late on Day 1.
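
The health-endpoint item in the checklist above can be satisfied with very little code. The following is a minimal sketch using only the Python standard library. The split between /health (the process is alive) and /ready (dependencies respond) is a common convention assumed here, not something the checklist spells out, and `route`, `Handler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def route(path: str, dependencies_ready: bool) -> tuple[int, dict]:
    """Pure routing logic, kept separate so it can be tested without a socket."""
    if path == "/health":
        # Liveness: the process is up and able to answer.
        return 200, {"status": "ok"}
    if path == "/ready":
        # Readiness: true only once dependencies (e.g. the database) respond.
        if dependencies_ready:
            return 200, {"status": "ready"}
        return 503, {"status": "not ready"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    # Assumption: a real service would replace this with a database ping.
    dependencies_ready = True

    def do_GET(self) -> None:
        status, body = route(self.path, self.dependencies_ready)
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Start the server; call this from the container entry point."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Deploying this on Day 1 exercises the container build, the network path, and the deploy pipeline before any business logic exists.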

Pattern 2: Walking Skeleton

Build one feature end-to-end before building any feature fully complete.

Wrong order

  • Design all UI screens
  • Build all database tables / persistence
  • Write all API endpoints / services
  • Connect everything
  • Discover they don't fit together

Right order

  • Build login flow end-to-end
  • Build "add item" end-to-end
  • Build "edit item" end-to-end
  • Each step shippable as-is

Never build all UI then all backend/persistence. One complete flow first, then expand breadth. "End-to-end" means different things by platform: on web, UI → API → DB → response. On iOS, View → persist → relaunch → verify. On Android, Compose → Room → relaunch → verify. The principle is the same: one complete flow before breadth.

Pattern 3: Dark Launching

Launch features in production before they're visible to users. The pattern works across all platforms:

Web (Python):

if feature_flags.new_dashboard and user.is_internal_tester:
    return render_new_dashboard(user)
return render_old_dashboard(user)

iOS (Swift):

if featureFlags.isEnabled("newDashboard", user: user) {
    return NewDashboardView()
}
return OldDashboardView()

Android (Kotlin):

if (featureFlags.isEnabled("newDashboard", user)) {
    NewDashboardScreen()
} else {
    OldDashboardScreen()
}

Benefits:

  • Test in production without risk
  • Gradual rollout (internal → beta → everyone)
  • Easy rollback: flip flag
  • Development never blocks deployment

Critical rule: A feature is not "done" until the flag is removed. Flag code is technical debt: pay it off quickly once the rollout reaches 100%.

Platform note: Web feature flags can be server-side (hot-swap, no deploy needed). Mobile and desktop flags use an SDK (Firebase Remote Config, LaunchDarkly) and typically take effect on next app launch, so flag changes are not instant on mobile.

Pattern 4: Timebox with Eject

For risky or exploratory work, timebox it with a pre-committed eject plan.

"We'll spend 3 days exploring this approach.
On day 3, we decide:
  - Ship it
  - Iterate it (extend timebox)
  - Abandon it (revert to last shippable state)"

This prevents:

  • Sunk cost fallacy ("we've invested so much...")
  • Endless exploration without shipping
  • Half-finished experiments in the codebase

The eject plan must be decided before starting, not when you're attached to the code.

Pattern 5: The Nightly Ritual

End each day with a shippable state. Spend the last 30 minutes on this checklist:

  • All tests pass locally
  • Code committed and pushed
  • CI/CD pipeline green
  • Feature flags set appropriately
  • Documentation updated if APIs changed
  • Tomorrow's first task identified

Your future self (or your teammate) should be able to pick up exactly where you left off, with no confusion about what state things are in.

Decision Framework

Choosing Your Constraint

At project kickoff, declare your primary constraint and communicate it explicitly.

| Constraint Type     | Example Projects                                          | Scope  | Quality   |
|---------------------|-----------------------------------------------------------|--------|-----------|
| Time-Constrained    | Conference demos, MVP for funding, contractual deliveries | Flexes | Preserved |
| Scope-Constrained   | API compliance, platform migrations, feature parity       | Fixed  | Preserved |
| Quality-Constrained | Medical devices, financial systems, infrastructure        | Flexes | Fixed     |

When to Cut Scope

Scope cuts should happen early and often, not as last-minute panic.

Cut scope now if you see these signals:

  • It's Wednesday and you're not confident about Friday's shippable state
  • You're accumulating technical debt faster than paying it off
  • Tests are being skipped "temporarily"
  • "We'll clean it up after shipping" is appearing in conversations

Metrics That Matter

| Traditional (misleading) | SSD (actually useful)                       |
|--------------------------|---------------------------------------------|
| Lines of code written    | Days since last production deployment       |
| Number of commits        | Mean time to deploy a change                |
| Features "in progress"   | % of code behind feature flags (target: <5%) |
| Percentage complete      | Test coverage (and is it passing?)          |

Deployment Frequency

This is the single most important SSD metric:

  • Once per month — Traditional waterfall
  • Once per week — Decent
  • Once per day — Excellent
  • Multiple times per day — World-class

If you can't deploy daily, you don't have shippable states; you have deployment problems masquerading as development problems.

Platform note: "Deploy" means pushing to your primary distribution channel. For web: production server. For mobile: TestFlight / Play Internal Testing (daily). For App Store / Play Store production releases, weekly is excellent. The metric that matters: how quickly can a committed change reach a real tester?
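
Two of the SSD metrics above are simple to compute from timestamps you already have. The sketch below assumes you can export deployment times and (commit time, deploy time) pairs from your CI system; the function names and input shapes are illustrative, not part of any tool mentioned here.

```python
from datetime import datetime, timedelta


def days_since_last_deploy(deploys: list[datetime], now: datetime) -> int:
    """Whole days elapsed since the most recent deployment."""
    return (now - max(deploys)).days


def mean_time_to_deploy(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Average gap between a commit landing and that commit being deployed.

    Each entry is a (committed_at, deployed_at) pair.
    """
    total = sum((deployed - committed for committed, deployed in changes), timedelta())
    return total / len(changes)
```

Both functions raise on empty input, which is deliberate in this sketch: a project with no deployments has no meaningful value for either metric.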

Common Objections

"This sounds like more work"

You're doing the work either way. Option A: days 1–85 ignore deployment, days 86–100 frantic debugging, ship broken. Option B: do the hard parts incrementally every day, day 90 ship the fully-working subset you completed. Same total effort, drastically different stress and quality.

"Our stakeholders need to see progress"

SSD gives better demos. Traditional: "Here's a mockup... this button doesn't work yet... imagine when this is connected to the backend..." SSD: "Here's the actual working product. Press any button." Which demo builds more confidence?

"We need to iterate quickly"

False dichotomy. Shippable states don't slow iteration — they enable it. Every iteration is testable by real users. No integration phase blocking feedback. Pivots are cheap because sunk cost is always minimal.

"My team isn't disciplined enough"

This is exactly why you need this. Discipline problems are solved with systems, not willpower. CI/CD forces tests to pass. Can't commit broken code. Daily deployments force completion. Visible production state keeps everyone honest. SSD creates discipline through automation and forcing functions.

"This doesn't work for mobile apps"

It works. You cannot deploy to the App Store daily (review takes 1-3 days). But you CAN deploy to TestFlight / Play Internal Testing daily. SSD targets the internal deployment pipeline, not the store review process. TestFlight is your "production" for SSD purposes until you cut a release.

Feature flags on mobile use an SDK (Firebase Remote Config, LaunchDarkly). Flag changes take effect on next app launch, not instantly. When you cut a store release, it should be a non-event — you've been shipping to testers daily. For macOS desktop: notarization is your deployment gate. Automate it in CI from Day 1.

Getting Started

Four weeks to establish the SSD rhythm. Success criterion: on day 30, you can deploy to production with confidence in under 10 minutes.

Day 0: Bootstrap

  • Install the skills: git clone https://github.com/AlexHorovitz/skills ~/.claude/skills
  • Run /ssd-init once at the project root — creates the ssd/ working directory (gitignored), writes ssd/project.yml (detected stack/framework/platform), creates docs/decisions/, docs/runbooks/, docs/architecture/, and runs prerequisite checks
  • All /ssd phases refuse to proceed until init has run

Week 1: Foundation

  • Set up CI/CD pipeline
  • Deploy "Hello World" to your distribution channel (production server, TestFlight, Play Internal, notarized build)
  • Configure automated testing
  • Establish feature flag system (server-side for web, SDK-based for mobile/desktop)
  • Invoke /ssd start to run the Walking Skeleton playbook

Week 2: First Feature

  • Build one feature end-to-end
  • Deploy to production behind flag
  • Verify in production
  • Enable for internal users

Week 3: Rhythm

  • Deploy to production daily
  • Every commit passes CI
  • All incomplete features behind flags
  • Documentation current

Week 4: Optimization

  • Reduce deploy time to under 10 minutes
  • Increase test coverage
  • Remove old feature flags
  • Retrospective: what's working?

If you can't deploy daily after Week 1: Stop adding features. Fix the deployment pipeline. That is your only priority.

Platform-specific Day 1 checklists for iOS, Android, macOS, Web, and Headless are in Pattern 1: Deployed Day One above.

Claude Code Skills

InsanelyGreat's SSD is implemented as a set of orchestrated skills for Claude Code — this is what makes the methodology practical for a single developer or small team. The /ssd orchestrator sequences the right sub-skills for each development phase, giving you the equivalent of a senior architect, systems designer, and code reviewer on call at all times. The full skill set is free for personal and internal organizational use — github.com/AlexHorovitz/skills (library v1.3.0, 2026-04-18).

Skill Taxonomy

| Type         | Skills                                            | When you invoke directly                                         |
|--------------|---------------------------------------------------|------------------------------------------------------------------|
| Bootstrap    | /ssd-init                                         | Once, at project start (or when ssd/ has drifted)                |
| Orchestrator | /ssd                                              | Always; start here after init                                    |
| Domain       | /architect, /coder, /systems-designer, /refactor  | When working outside the SSD workflow                            |
| Review       | /code-reviewer, /codebase-skeptic, /software-standards | On-demand or via SSD                                        |
| Reference    | /methodology                                      | When you want to understand SSD doctrine or score self-adherence |

Step 1: /ssd-init — Project Bootstrap

Run once per project before any /ssd phase. First-run housekeeping: creates ssd/ (gitignored working directory), writes ssd/project.yml (detected language, framework, platform, distribution channel), creates ssd/current.yml (active workstreams pointer), creates docs/decisions/ / docs/runbooks/ / docs/architecture/ (committed decision records), and runs SSD prerequisite checks.

Idempotent — safe to re-run. It never overwrites existing files, never deletes anything, and appends to ssd/init-log.md on each run.

Prerequisite checks reported: CI/CD pipeline, test harness, linter/formatter, pre-commit hooks, feature flag system, deployed "Hello World", secrets management, README with setup steps. Missing items are flagged BLOCKER / MAJOR / MINOR — the user fixes them, not /ssd-init.

/ssd refuses to proceed if ssd/project.yml is absent. Init is not auto-run: the user decides when to commit to the SSD convention.

Step 2: /ssd — The Orchestrator

/ssd start       Walking Skeleton: architect + systems-designer for Day-1 deploy
/ssd feature     Feature loop: architect → systems-designer → coder → code-reviewer → deploy
/ssd milestone   Deep audit + targeted refactor (runs after shipping)
/ssd verify      Mandatory remediation verification after milestone refactor
/ssd gate        Shippable-state check (code-reviewer + methodology rules)
/ssd ship        Deploy readiness check (systems-designer checklist)
/ssd audit       Adversarial comparative review (the nuclear option)

Milestone → Verify Loop

Every milestone takes a before/after snapshot and requires explicit verification:

  1. Snapshot: record git SHA and metrics to ssd/milestones/<topic>/sha-before and metrics-before.yml.
  2. Deep audit: codebase-skeptic writes skeptic-before.md.
  3. Refactor planning: refactor emits refactor-plan.md — every item cites a specific finding ID from skeptic-before.md. No cite → not in scope.
  4. Validate: code-reviewer with remediation_mode: true on each refactor PR.
  5. Deploy and confirm production health.
  6. Verify (mandatory): re-run codebase-skeptic → skeptic-after.md; diff frontmatter; re-run code-reviewer on the remediation diff. The milestone is complete only when all original BLOCKER/🔴/💀 findings are ✅ closed, no new BLOCKER-severity regression was introduced, and the remediation diff has no BLOCKERs. A refactor that claims to close findings without verification is indistinguishable from wishful thinking.
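
The completion rule in step 6 reduces to three checks over the before and after finding sets. The sketch below illustrates that logic under stated assumptions: findings are modelled as a mapping from a finding ID to a severity label, `after` contains only findings that are still open, and the names `Verification` and `verify_milestone` are hypothetical. The skills library's actual frontmatter format may differ.

```python
from dataclasses import dataclass

# Severities that step 6 says must be closed before the milestone completes.
MUST_CLOSE = {"BLOCKER", "🔴", "💀"}


@dataclass
class Verification:
    still_open: list[str]       # original must-close findings not yet closed
    new_blockers: list[str]     # BLOCKER findings that did not exist before
    remediation_blockers: int   # BLOCKERs found in the remediation diff

    @property
    def complete(self) -> bool:
        return (
            not self.still_open
            and not self.new_blockers
            and self.remediation_blockers == 0
        )


def verify_milestone(
    before: dict[str, str],
    after: dict[str, str],
    remediation_blockers: int,
) -> Verification:
    """Compare finding sets from skeptic-before.md and skeptic-after.md."""
    still_open = sorted(
        fid for fid, sev in before.items() if sev in MUST_CLOSE and fid in after
    )
    new_blockers = sorted(
        fid for fid, sev in after.items() if fid not in before and sev == "BLOCKER"
    )
    return Verification(still_open, new_blockers, remediation_blockers)
```

For example, with `before = {"F-001": "BLOCKER", "F-002": "MINOR"}` and `after = {"F-002": "MINOR"}`, the result is complete as long as the remediation diff has zero BLOCKERs.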

Sub-Skill Reference

| Skill | Role in SSD | Phase |
|-------|-------------|-------|
| /ssd-init (v1.1.0) | First-run housekeeping: creates ssd/ tree, writes project.yml + current.yml, runs prerequisite checks | prerequisite to all phases |
| /architect (v1.1.0) | Design: models, services, API contracts, ADRs, current-scale baseline. Platform-adaptive (web, iOS, Android, macOS, headless); web guides cover Next.js, Django, FastAPI, Rails, Laravel, Angular, Vue/Nuxt, Spring Boot, ASP.NET Core. Integration has a first-class contract. | start, feature |
| /systems-designer (v1.2.0) | Production readiness: reliability, observability, deployment safety. Validates architect spec in Phase 0. Covers AI/LLM integration, compliance & data lifecycle, cost observability, and chaos/failure injection. | start, feature, ship |
| /coder (v1.1.0) | Implementation from spec (Python, TypeScript, Swift, Ruby, Java, C#, PHP, Go, Rust, C/C++, Obj-C). Halts if the architect spec omits a feature flag. Spec-drift check amends ADRs. Emits 03-coder-status.md with test/lint/typecheck results. | feature |
| /code-reviewer (v1.2.0) | PR gate: BLOCKER/MAJOR findings block merge. Phase 1.5 prior-review follow-up (remediation mode) and Phase 3.5 fix-introduces-edge-cases. Red flags include LLM prompt injection, IntegrityError fetch mismatch, cache-without-race-test, release theatre. Loads examples.md reference. | feature, milestone, gate, verify |
| /codebase-skeptic (v1.2.0) | Deep architectural critique through 10 expert voices. Mandatory Phase 2.5 Operational Failure Modes Sweep. Forward-Looking Pass in Phase 4. Incident-Story attestation (Beck), Domain-Modeling Stance (Evans), Deployment-Gate Hardening (Humble). | milestone |
| /software-standards (v1.1.0) | Adversarial comparative audit. Two modes: Comparative and Adversarial Single. Requires 2–3 evidence citations per /10 score. For vendor selection / legacy onboarding / pre-acquisition, not routine review. | audit |
| /refactor (v1.2.0) | Post-ship targeted improvement. Every item cites a specific finding from skeptic-before.md. Step 4.5 Budget Check with halt-and-rollback. Step 5 per-item re-check loop closure. Step 6 systems-designer coordination trigger. Loads patterns.md reference. | milestone |
| /methodology (v1.2.0) | SSD doctrine reference: Iron Law, Five Principles, Decision Framework. Provides machine-checkable rule source for /ssd gate. /methodology score emits a self-adherence metric. | reference / any phase |

Review Tier Selection

Three skills do "review" work. Never chain all three — pick the right tier:

  • /code-reviewer — every PR, always, no exceptions (≤500 changed lines)
  • /codebase-skeptic — milestone reviews and pre-release audits of an owned codebase
  • /software-standards — comparative/adversarial evaluation only (vendor selection, legacy onboarding, pre-acquisition)

Skill-overlap priority: when coder and a language-specific coder (e.g. python-django-coder) both apply, the specific one wins. code-reviewer and codebase-skeptic are mutually exclusive on the same scope. codebase-skeptic and software-standards are mutually exclusive.

The SSD Artifact Tree

Every SSD invocation produces artifacts at well-known paths relative to the project root. Sub-skills read from and write to this tree — that is what lets a session resume, a reviewer verify, and a teammate onboard.

<project-root>/
├── docs/                                 # committed decision records
│   ├── decisions/                        # ADRs from architect
│   ├── runbooks/                         # runbooks from systems-designer
│   └── architecture/                     # component diagrams, data models
└── ssd/                                  # gitignored working directory
    ├── project.yml                       # language, framework, platform
    ├── current.yml                       # active workstreams + budgets
    ├── features/
    │   └── <slug>/
    │       ├── 00-brief.md
    │       ├── 01-architect.md
    │       ├── 02-systems-designer.md
    │       ├── 03-coder-status.md
    │       ├── 04-code-review.md
    │       └── 05-deploy.md
    ├── milestones/
    │   └── YYYY-MM-DD-<topic>/
    │       ├── sha-before
    │       ├── metrics-before.yml
    │       ├── skeptic-before.md
    │       ├── refactor-plan.md
    │       ├── refactor-prs.md
    │       ├── skeptic-after.md
    │       └── verification.md
    ├── audits/
    │   └── YYYY-MM-DD-<scope>/
    │       └── standards-report.md
    └── archive/                          # closed feature + milestone directories

Every primary output carries YAML frontmatter (skill, version, produced_at, scope, consumed_by). Review outputs add finding_counts and a computed gate_pass. Design outputs add a deliverables block. This is what makes /ssd gate mechanically checkable and milestone verification a frontmatter diff rather than prose reconciliation.
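
To make the frontmatter idea concrete, here is a hedged sketch of a review output's frontmatter as a Python dictionary, with `gate_pass` computed from `finding_counts`. The field names come from the paragraph above; every value is illustrative. The pass rule (zero BLOCKER and zero MAJOR findings) is an assumption drawn from Hard Rule 1, not a confirmed description of how the skills compute it.

```python
def compute_gate_pass(finding_counts: dict[str, int]) -> bool:
    """Assumed rule: the gate passes only with no BLOCKER and no MAJOR findings."""
    return finding_counts.get("BLOCKER", 0) == 0 and finding_counts.get("MAJOR", 0) == 0


# Illustrative frontmatter for a code-review artifact; values are made up.
frontmatter = {
    "skill": "code-reviewer",
    "version": "1.2.0",
    "produced_at": "2026-04-18T17:30:00Z",
    "scope": "features/new-checkout",
    "consumed_by": ["ssd"],
    "finding_counts": {"BLOCKER": 0, "MAJOR": 1, "MINOR": 3},
}
frontmatter["gate_pass"] = compute_gate_pass(frontmatter["finding_counts"])
```

Because the verdict is derived from counts rather than read from prose, two runs can be compared field by field.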

Session Continuity

On invocation, /ssd reads ssd/current.yml. Each active workstream carries a budget in hours. The orchestrator flags entries that are over budget ("suggest scope reduction, not more work") and entries last-touched more than 3 days ago ("stale work that may need a fresh audit"). Closing a workstream archives its artifacts under ssd/archive/features/<slug>/.
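
The two checks described above (over budget, and untouched for more than 3 days) can be sketched as follows. The `Workstream` fields are hypothetical; in particular, `spent_hours` is an assumption, since the text states only that each workstream carries a budget in hours. The quoted messages are taken from the paragraph above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Workstream:
    """Hypothetical shape of one entry in ssd/current.yml."""
    slug: str
    budget_hours: float
    spent_hours: float
    last_touched: date


def review_workstreams(
    workstreams: list[Workstream],
    today: date,
    stale_after_days: int = 3,
) -> list[str]:
    """Return one note per problem found; an empty list means nothing to flag."""
    notes = []
    for ws in workstreams:
        if ws.spent_hours > ws.budget_hours:
            notes.append(f"{ws.slug}: over budget; suggest scope reduction, not more work")
        if (today - ws.last_touched).days > stale_after_days:
            notes.append(f"{ws.slug}: stale work that may need a fresh audit")
    return notes
```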

Methodology-Backed Gate Enforcement

Before /ssd gate passes, these doctrine rules are checked mechanically. Each cites a principle in methodology/core.md.

| Rule                      | Check                                                              | Source     |
|---------------------------|--------------------------------------------------------------------|------------|
| Tests pass                | <test-command> exits 0                                             | core.md §1 |
| No broken features        | Covered by tests                                                   | core.md §2 |
| Docs match implementation | ADRs updated if architecture changed                               | core.md §2 |
| No WIP on main            | git log grep for "WIP", "checkpoint", or "TODO tomorrow" is empty  | core.md §4 |
| Feature behind flag       | Flag config delta present (unless infra)                           | core.md §3 |
| Deployable                | CI passes; migration is reversible                                 | core.md §2 |

If any rule fails, /ssd gate emits the failure with the doctrine cite and refuses to pass. "I know better" is not an override.
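
As an illustration of what "checked mechanically" can mean, the "No WIP on main" rule could be approximated with a regular expression over commit subjects. This is a sketch, not the gate's actual implementation: the case-insensitive match, the function names, and the default revision range `origin/main..HEAD` are all assumptions.

```python
import re
import subprocess

# The three phrases named by the rule; case-insensitivity is an assumption.
BANNED = re.compile(r"\b(WIP|checkpoint|TODO tomorrow)\b", re.IGNORECASE)


def banned_subjects(subjects: list[str]) -> list[str]:
    """Return the commit subjects that contain a banned phrase."""
    return [s for s in subjects if BANNED.search(s)]


def commit_subjects(rev_range: str = "origin/main..HEAD") -> list[str]:
    """Read commit subject lines from git for the given revision range."""
    result = subprocess.run(
        ["git", "log", "--format=%s", rev_range],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

The rule passes when `banned_subjects(commit_subjects())` is empty. A plain word match like this will also flag legitimate subjects such as "Add checkpoint loading", so a real gate would need either a stricter pattern or a documented exception.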

Hard Rules

1. No merge without a clean /ssd gate

No BLOCKER or MAJOR findings from the code-reviewer. No exceptions.

2. No incomplete work on main without a feature flag

WIP commits on main are banned. Use a feature flag or a local stash.

3. Tests must pass before and after every change

"I'll fix the tests tomorrow" is not a shippable state.

4. Refactor only after shipping

Separate PRs, never mixed with feature work. Milestones run after shipping, never instead of it.

5. Deploy beats perfection

Reduce scope rather than delay a deploy.

6. Production parity from day one

If you haven't deployed to production yet, that is your next task.