Reference Architecture · v0.5

Domain-Adaptive Manufacturing

Software that builds and configures itself to the reality of the manufacturer — from a two-person startup to Nestlé and Apple — without converging on the lowest common denominator, and without becoming a thousand unmaintainable bespoke codebases.

C · Bespoke surface: generated experience + customer logic (unique per tenant)
B · Adaptation engine: composes & configures validated blocks (shared engine, per-tenant output)
A · Invariant substrate: ontology · fact ledger · data fabric · governance (shared, trusted, fixed)
determinism below · cognition above
Control layer: PLC · motion · robot · safety (deterministic, real-time, certified)
Document: Reference Architecture
Status: For discussion
Audience: Platform architects & eng leaders
Published via: insanely great

01  ·  The reframe

Standardize the physics. Bespoke the experience.

Traditional manufacturing SaaS — an MES bolted to an ERP — fails not because it shares a core, but because it draws the abstraction line backwards.

It standardizes the workflows, screens, and logic — the things that genuinely differ between companies and should be bespoke. And it leaves the primitives rigid and implicit — material, lot, operation, genealogy, measurement — the things that are actually universal and should be bedrock.

The result has weak opinions about physics and strong opinions about your daily routine. So the real process migrates into spreadsheets and tribal knowledge that live around the system, and the system of record becomes a fiction maintained for auditors.

Be uncompromising about the invariant. Hold zero opinion about the variable.

The agentic era makes this affordable, because the cost that forced one-size-fits-all — human dev-years per customer — is collapsing. Build-once-sell-many only mattered when building cost human-years.

02  ·  Design tenets

Seven decisions everything follows from.

01

Strong opinions on the invariant; none on the variable

The whole architecture falls out of where this line is drawn.

02

Determinism below, cognition above

Real-time safety-bearing control stays deterministic. Agents live above a governed, audited, human-gated contract. No agent in the interlock path — ever.

03

Immutable truth, bespoke interpretation

One canonical, append-oriented fact ledger; many bespoke projections and logics computed over it.

04

Bespoke by composition, not by codegen

Adaptation composes and configures validated blocks. True code generation is the gated exception — so bespoke fit never becomes a forever liability.

05

Open-world ontology

A customer can add concepts the core doesn't know without forking the core.

06

The reality is the spec — continuously

Onboarding is never one-time. The platform re-adapts as plant, products, and regulations change. No replatforming cliff.

07

Generation is trusted only through governance

Every adaptive artifact is versioned, validated against tests and golden checks, signed, and auditable. You validate the generator and the envelope — not just each output.
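
Tenet 03 (immutable truth, bespoke interpretation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's design: the `Fact` and `Ledger` names, the `produced` / `scrap` event kinds, and both projections are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    """One immutable ledger entry (hypothetical shape)."""
    seq: int
    kind: str
    data: Tuple[Tuple[str, Any], ...]  # frozen key/value pairs


class Ledger:
    """Append-only store: facts are added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def append(self, kind: str, **data: Any) -> Fact:
        fact = Fact(len(self._facts), kind, tuple(sorted(data.items())))
        self._facts.append(fact)
        return fact

    def facts(self) -> Tuple[Fact, ...]:
        return tuple(self._facts)  # callers receive a read-only snapshot


def scrap_by_lot(facts) -> Dict[str, int]:
    """One tenant's projection: scrapped quantity per lot."""
    out: Dict[str, int] = {}
    for f in facts:
        if f.kind == "scrap":
            d = dict(f.data)
            out[d["lot"]] = out.get(d["lot"], 0) + d["qty"]
    return out


def overall_yield(facts) -> float:
    """Another projection over the same facts: good units / produced units."""
    produced = sum(dict(f.data)["qty"] for f in facts if f.kind == "produced")
    scrapped = sum(dict(f.data)["qty"] for f in facts if f.kind == "scrap")
    return (produced - scrapped) / produced if produced else 0.0


ledger = Ledger()
ledger.append("produced", lot="L-001", qty=100)
ledger.append("scrap", lot="L-001", qty=4)
ledger.append("produced", lot="L-002", qty=50)
ledger.append("scrap", lot="L-002", qty=1)

print(scrap_by_lot(ledger.facts()))
print(overall_yield(ledger.facts()))
```

Both projections are plain functions over the same facts, so adding a third interpretation requires no change to the ledger.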

03  ·  The three strata

Trusted where reality is common. Bespoke where it differs.

A · Substrate

Shared, trusted, fixed

Universal ontology (open-world). The immutable, event-sourced fact ledger. The data fabric mapping historian, MES, ERP, ATE, LIMS, supplier portals into the ontology. The control boundary. Governance & audit.

B · Engine

Shared engine, per-tenant output

A capability library of validated, domain-blind blocks. The agentic onboarding that induces a tenant's ontology and profile and composes what fits. A gated generation envelope. Continuous re-adaptation.

C · Surface

Unique per tenant

The generated experience — UIs, operator screens, dashboards, work instructions in the customer's vocabulary. The customer's own logic & decisions: scheduling rules, dispositions, escalation paths, definitions of good.

Two layers authored once. Two induced per tenant. One derived.

The kernel and the regulatory packs are written once and reused. The ontology mapping and adapters are induced or configured per tenant. The defaults are derived from archetype × risk. No layer requires a code fork per customer — which is what keeps "adapt to anyone" a product rather than a consultancy.

And it demotes your ERP: in the data fabric, SAP and the historian and the MES become sources and sinks among others, not the tyrants whose schema you must obey. The semantic center of gravity moves to the platform's open, customer-fit ontology. Run it over what you have; migrate authority into the ledger on your terms.

04  ·  One platform, every scale

A startup isn't running a lite version. It's running the same platform.

The architecture doesn't change across scale — only the tenant profile, the scale envelope, and the governance intensity do.

Startup

Sources: spreadsheets, a few machines, no historian
What the engine composes: a lightweight system from a few blocks
Governance: light
Substrate · blocks · engine: identical

Nestlé / Apple

Sources: thousands of sources, deep ERP, full OT stack
What the engine composes: the same blocks, at scale, heavily governed
Governance: regulated, audited, signed, validated
Substrate · blocks · engine: identical

The startup grows continuously into the enterprise shape instead of hitting the traumatic replatforming cliff — the "rip out the spreadsheets and survive an 18-month SAP implementation" rite of passage. Removing that cliff is, by itself, a serious value proposition.

05  ·  The conformance / alignment layer

Born aligned. Not validated after the fact.

Borrow the practice teachers use to align lesson plans to a standards framework, and apply it to manufacturing's governing specs: make the spec a structured target, bind every system part to it, and prove the binding with evidence drawn from the fact ledger.

Spec-as-data. An IATF clause, an AS9100 requirement, a 21 CFR control, a HACCP critical control point, a control-plan line, a GD&T callout — each decomposes into atomic requirement nodes with stable IDs, bindingness, and revision. The manufacturing analog of an addressable standard.

Conformance bindings. Every part — a capability block, a control limit, a workflow, a generated screen — declares which nodes it satisfies. Bidirectional: node → implementing parts (coverage), part → nodes served (justification). Orphans and gaps fall out automatically.

Evidence, and a live posture. A tag is a claim; evidence is proof. Each binding links to design-time evidence and to runtime results streamed from the ledger — and carries live state: satisfied / at-risk / failing. Compliance posture is just another projection over immutable truth. No binder is assembled before an audit; the posture is always current.
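
The bidirectional index described above can be sketched as follows. This is an illustration only: the node and part IDs are taken from this document where possible, `UI:dashboard/oee` is a hypothetical part, and `SOP/cell-7-startup` is deliberately left unbound here to show a gap.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Requirement nodes the tenant must satisfy.
NODES: Set[str] = {
    "DRW:4471-A/rev-C/char-07",
    "IATF16949/9.1.1.1",
    "CTRLPLAN:line-14",
    "SOP/cell-7-startup",  # left unbound in this sketch to show a gap
}

# All parts in the tenant's composition (the dashboard is hypothetical).
PARTS: Set[str] = {"SPC:contract/turning-op-30/char-07", "UI:dashboard/oee"}

# (part, node) pairs: each part declares which nodes it satisfies.
BINDINGS: List[Tuple[str, str]] = [
    ("SPC:contract/turning-op-30/char-07", "DRW:4471-A/rev-C/char-07"),
    ("SPC:contract/turning-op-30/char-07", "IATF16949/9.1.1.1"),
    ("SPC:contract/turning-op-30/char-07", "CTRLPLAN:line-14"),
]


def index(bindings: List[Tuple[str, str]]):
    """Build both directions of the binding index in one pass."""
    coverage: Dict[str, Set[str]] = defaultdict(set)       # node -> parts
    justification: Dict[str, Set[str]] = defaultdict(set)  # part -> nodes
    for part, node in bindings:
        coverage[node].add(part)
        justification[part].add(node)
    return coverage, justification


coverage, justification = index(BINDINGS)

# A gap is a node with no implementing part; an orphan is a part serving no node.
gaps = sorted(n for n in NODES if not coverage.get(n))
orphans = sorted(p for p in PARTS if not justification.get(p))

print("gaps:", gaps)        # gaps: ['SOP/cell-7-startup']
print("orphans:", orphans)  # orphans: ['UI:dashboard/oee']
```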

One thread, end to end

A turned shaft diameter, monitored by a single SPC control limit that simultaneously discharges a drawing tolerance and an IATF clause.

Requirement nodes (ID · statement · bindingness):
DRW:4471-A/rev-C/char-07 · Ø12.000 ±0.025 mm, Key Product Characteristic · shall
IATF16949/9.1.1.1 · SPC & demonstrated capability for special characteristics · shall
CTRLPLAN:line-14 · Control-plan row (crosswalk) · shall

Implementing part:
SPC:contract/turning-op-30/char-07 · X̄-R limits · Cpk ≥ 1.33

Evidence:
design-time · registry · signed: Phase-I study · ARL calibration · gage R&R · Cpk study
runtime · fact ledger · live: measurement events · control state · rolling Cpk · gage calibration

Live conformance posture: satisfied · at-risk · failing

A projection over the fact ledger — rendered to both the audit view and the engineer's process view from the same underlying facts.

A control signal, a capability drop, or an out-of-cal gage flips the posture — and the flip lights up against both the drawing node and the IATF clause at once, because both are computed from the same ledger facts. The auditor's compliance view and the engineer's process view are one truth rendered twice.
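
A rough sketch of how one posture could be computed once and rendered against both nodes. The specification limits and the Cpk ≥ 1.33 target come from the example above. Everything else is an assumption made for illustration: the `CPK_FLOOR` cut-off between at-risk and failing, the rule that an out-of-cal gage means failing, and the sample values. Control-chart signals are not modelled here.

```python
from statistics import mean, stdev
from typing import Dict, Sequence

LSL, USL = 11.975, 12.025  # Ø12.000 ±0.025 mm, from the drawing node
CPK_TARGET = 1.33          # from the SPC contract
CPK_FLOOR = 1.00           # assumption: below this the binding is failing

NODES = ["DRW:4471-A/rev-C/char-07", "IATF16949/9.1.1.1"]


def cpk(samples: Sequence[float]) -> float:
    """Cpk = min(USL - mean, mean - LSL) / (3 * sample standard deviation)."""
    mu, sigma = mean(samples), stdev(samples)
    return min(USL - mu, mu - LSL) / (3 * sigma)


def posture(samples: Sequence[float], gage_in_cal: bool) -> str:
    if not gage_in_cal:
        return "failing"  # assumption: measurements from an out-of-cal gage are not evidence
    value = cpk(samples)
    if value >= CPK_TARGET:
        return "satisfied"
    return "at-risk" if value >= CPK_FLOOR else "failing"


def render(samples: Sequence[float], gage_in_cal: bool) -> Dict[str, str]:
    """One computed state, shown against every bound requirement node."""
    state = posture(samples, gage_in_cal)
    return {node: state for node in NODES}


good = [11.995, 12.000, 12.005, 12.000, 11.995, 12.005]  # made-up measurements
drifted = [x + 0.010 for x in good]                      # same spread, mean shifted

print(render(good, gage_in_cal=True))
print(render(drifted, gage_in_cal=True))
print(render(good, gage_in_cal=False))
```

Because `render` derives every node's state from the same call to `posture`, the drawing node and the IATF clause cannot disagree.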

And because bindings are live and bidirectional, a spec change — a drawing rev C → rev D tightening the tolerance — emits a precise re-validation worklist: exactly the one affected part, nothing more. Continuous adaptation produces a scoped delta instead of a full re-validation.
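
The scoped worklist can be sketched as a lookup over the bindings. The assumption here, which the source does not state, is that two revisions of a node share a stable key once the `/rev-X` segment is dropped. The `WI:` part and the `char-09` contract are hypothetical and serve only to show what is left off the worklist.

```python
from typing import Dict, List, Set

# part -> requirement nodes it is bound to
BINDINGS: Dict[str, Set[str]] = {
    "SPC:contract/turning-op-30/char-07": {
        "DRW:4471-A/rev-C/char-07",
        "IATF16949/9.1.1.1",
        "CTRLPLAN:line-14",
    },
    "SPC:contract/turning-op-30/char-09": {"DRW:4471-A/rev-C/char-09"},  # hypothetical
    "WI:cell-7/startup": {"SOP/cell-7-startup"},                         # hypothetical
}


def stable_key(node_id: str) -> str:
    """Drop any '/rev-X' segment so revisions of one node compare equal (assumed ID scheme)."""
    return "/".join(s for s in node_id.split("/") if not s.startswith("rev-"))


def revalidation_worklist(changed_nodes: Set[str]) -> List[str]:
    """Parts bound to any node whose stable key matches a changed node."""
    changed = {stable_key(n) for n in changed_nodes}
    return sorted(
        part
        for part, nodes in BINDINGS.items()
        if any(stable_key(n) in changed for n in nodes)
    )


# Drawing rev C -> rev D tightens the tolerance on characteristic 07 only.
print(revalidation_worklist({"DRW:4471-A/rev-D/char-07"}))
# ['SPC:contract/turning-op-30/char-07']
```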

06  ·  The representation layer

GitOps for the factory.

Every human gate, every conformance review, the anti-lock-in promise, and agent-editability all require the bespoke surface to be legible, diffable, signable text. You cannot meaningfully gate what you cannot read.

Artifact: Tenant profile, ontology, contracts, bindings, defaults (structured config)
Representation: YAML as the editing skin over a strict schema; a typed config language (CUE / Dhall / KCL) where correctness is critical
Why: human-diffable; the schema kills YAML's ambiguity footguns

Artifact: Work instructions, SOPs, requirement statements, rationale (prose + metadata)
Representation: Markdown + frontmatter, the SKILL.md pattern
Why: one file: machine reads the header, human reads the body, the two can't drift

Artifact: Scheduling logic, dispositions, alarms, envelopes (behavior)
Representation: a small, constrained, sandboxed DSL with a real grammar
Why: logic-in-YAML becomes the unmaintainable nested monster CI grew into

Artifact: The fact ledger (truth)
Representation: event store, not config at all
Why: declarative config and immutable events must stay strictly separate

If the entire bespoke surface is human-readable text under version control, the governance machinery is simply modern software practice: the human gate is a pull-request review; the validation envelope is CI — schema validation, golden checks, and the born-aligned conformance check; the audit trail is commit history; signing is your commit-signing policy; rollback is a revert; and the re-validation worklist is a CI job triggered when a requirement node changes.

The whole trust model falls out for free once customization is reviewed, validated, signed text.

07  ·  The control & automation layer

Lights-out isn't ungoverned. It's governance, front-loaded.

The platform is prepared by design to supervise and orchestrate the automation layer — and explicitly not to be the control or safety layer. Lights-out is the case that proves the boundary must be absolute.

L4: ERP / business
L3: MES / operations (platform cognition · probabilistic · supervisory)
L2: SCADA / supervisory (dispatch jobs · select & download recipes / robot programs · detect faults)
governed · audited · human-gated contract
L1: PLC · motion · robot · SAFETY (deterministic · hard-real-time · certified: IEC 61508/62061 · ISO 13849 · ISO 10218 / TS 15066)
L0: sensors / actuators (the platform observes & commands this layer; it never is this layer)

Any platform-initiated physical action passes through a governed actuation contract — itself a representation-layer artifact: human-readable, schema-validated, signed.

actuation_contract:
  id: cell-7/load-and-run
  authority_level: supervisory                     # never "safety"
  target: { resource: robot-cell-7, action: run_program }
  allowed:
    programs: [PGM-4471-rough, PGM-4471-finish]    # whitelist, not arbitrary
    setpoints:
      spindle_rpm: { min: 800, max: 2400 }         # hard clamps
    rate_limit: 1/min
  preconditions:                                   # all must hold, checked locally
    - interlocks_satisfied: true
    - gage.calibration: in_date                    # a conformance binding
    - operational_envelope: cell-7/turning
  on_violation: reject_and_hold                    # deterministic local guardian
  audit: signed
  binds: [IATF16949/9.1.1.1, SOP/cell-7-startup]   # conformance

The guardian that enforces on_violation lives at the certified layer, not in the platform — an out-of-envelope command is physically rejected regardless of what the cognition believes. And the human gate doesn't disappear at 3 a.m.; it relocates to design time. A human already approved exactly what the cell may do; at runtime it executes autonomously inside that pre-approved envelope.
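
The decision logic such a guardian applies can be illustrated as below. This is a sketch of the checks only. As the text says, the real guardian runs at the certified layer, and nothing here is meant as a safety implementation. The `Command` type, the `guard` function and the flat precondition names are hypothetical, and the contract's rate limit is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Values copied from the actuation contract above.
CONTRACT = {
    "programs": {"PGM-4471-rough", "PGM-4471-finish"},
    "spindle_rpm": (800, 2400),
}

# Hypothetical flat names for the contract's three preconditions.
PRECONDITIONS = ("interlocks_satisfied", "gage_in_date", "inside_envelope")


@dataclass(frozen=True)
class Command:
    program: str
    spindle_rpm: int


def guard(cmd: Command, state: Dict[str, bool]) -> Tuple[str, str]:
    """Return ("accept", "") or ("reject_and_hold", reason). Deterministic, no learning."""
    if cmd.program not in CONTRACT["programs"]:
        return "reject_and_hold", "program not whitelisted"
    lo, hi = CONTRACT["spindle_rpm"]
    if not lo <= cmd.spindle_rpm <= hi:
        return "reject_and_hold", "spindle_rpm outside clamps"
    for name in PRECONDITIONS:
        # A missing precondition counts as not satisfied.
        if not state.get(name, False):
            return "reject_and_hold", f"precondition failed: {name}"
    return "accept", ""


ok_state = {"interlocks_satisfied": True, "gage_in_date": True, "inside_envelope": True}

print(guard(Command("PGM-4471-rough", 1200), ok_state))
print(guard(Command("PGM-4471-rough", 3000), ok_state))
print(guard(Command("PGM-9999-test", 1200), ok_state))
print(guard(Command("PGM-4471-finish", 1200), {**ok_state, "gage_in_date": False}))
```

The function never consults what the requesting agent believes; it sees only the command and locally checked state.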

operational_envelope:
  id: cell-7/turning
  ratified_by: j.smith (Mfg Eng)        # design-time human gate
  valid_when:                           # the operational design domain
    material: [4140-steel]
    part_numbers: [4471-A/rev-D]
    shift: any                          # lights-out permitted
  autonomy: act_within_envelope         # run unattended inside the domain
  on_edge:
    excursion_minor: { action: adjust_within_clamps, log: true }
    excursion_major: { action: safe_hold, quarantine_wip: true, page: oncall-remote }
    outside_domain: { action: safe_hold, page: oncall-remote }   # never improvise
  twin_validated: true                  # simulated before deploy

The runtime's intelligence is boundary detection and safe degradation — recognize you're outside the validated domain, fail to a safe state, quarantine the WIP, page a remote human — never improvisation in a novel situation. Because lights-out removes the human who'd catch a bad config, "validate before deploy" for physical actuation means simulate against a digital twin first, not merely schema-check.
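
Boundary detection and safe degradation reduce to a small lookup, sketched below. The envelope values are copied from the example above. The `decide` function, the `"run"` action for normal operation, and the idea that an upstream step classifies excursions as `"minor"` or `"major"` are all assumptions for illustration.

```python
from typing import Dict, Optional

# Values copied from the operational envelope above.
ENVELOPE = {
    "material": {"4140-steel"},
    "part_numbers": {"4471-A/rev-D"},
    "on_edge": {
        "excursion_minor": {"action": "adjust_within_clamps", "log": True},
        "excursion_major": {"action": "safe_hold", "quarantine_wip": True,
                            "page": "oncall-remote"},
        "outside_domain": {"action": "safe_hold", "page": "oncall-remote"},
    },
}


def decide(material: str, part_number: str, excursion: Optional[str] = None) -> Dict:
    """Pick the pre-approved response; anything unrecognized degrades to safe_hold."""
    on_edge = ENVELOPE["on_edge"]
    inside = (material in ENVELOPE["material"]
              and part_number in ENVELOPE["part_numbers"])
    if not inside:
        return on_edge["outside_domain"]
    if excursion is None:
        return {"action": "run"}  # hypothetical label for normal operation
    # An unknown excursion class is treated as outside the domain: never improvise.
    return on_edge.get(f"excursion_{excursion}", on_edge["outside_domain"])


print(decide("4140-steel", "4471-A/rev-D"))
print(decide("4140-steel", "4471-A/rev-D", "minor"))
print(decide("4140-steel", "4471-A/rev-D", "major"))
print(decide("316-stainless", "4471-A/rev-D"))
```

The runtime never synthesizes a response: every returned action was ratified at design time or is the safe hold.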

The representation layer is what makes the control layer safe to automate.

08  ·  The hard problems

What this costs, stated honestly.

Validation of adaptive systems under GxP / AS9100

Computer-system validation assumes a frozen system; continuous adaptation strains it. Block-and-envelope validation plus the conformance layer is the path — but it's a frontier, and it needs regulators on board.

The OT / safety boundary in practice

Clean in a diagram; messy against real PLC fleets and legacy interlocks. The safety-architecture proof is non-trivial.

Data & document quality

Adaptation quality is bounded by the legibility of the floor's data, control plans, recipes, and SOPs.

Human factors & transferability

If every plant's software is bespoke, a mobile workforce loses transferable skill — which argues for a consistent interaction grammar even as specifics are bespoke.

Inverted lock-in

A wholly agent-generated system is a new dependency — mitigated by open standards in the fabric and exportable, human-readable, version-controlled compositions.

The unsafe-to-guess core

Subgrouping, spec provenance, risk tier, CCP designation, disposition authority — gated decisions, across every capability.

09  ·  The fractal

SPC is the whole platform in miniature.

Statistical process control is one entry in the capability library — and it has its own invariant kernel, its own adaptation layer, its own tenant profile. The same pattern holds at the level of a single quality method and at the level of the entire platform: a fixed validated core, agentic adaptation, a governed bespoke surface.

That self-similarity is the strongest evidence the abstraction line is drawn in the right place rather than a convenient one. Build the platform and SPC is one of its first, best-defined capabilities. Build SPC well and you've prototyped the platform's whole metabolism in one domain.

The thesis in one line

Standardize the physics. Bespoke the experience. Gate the decisions that are unsafe to guess.

A rock-solid invariant substrate, a shared engine that adapts by composing validated blocks rather than generating throwaway code, and a bespoke surface generated to each customer's reality — with a hard line keeping deterministic control below and probabilistic cognition above.
