QMX: Quantumind Modular eXpert Interconnect
QMX defines the hardware interface, software protocol, and routing architecture needed to connect multiple heterogeneous ASIC inference modules into a single coherent Mixture of Experts system — running entirely on-premise, owned by the operator, with zero cloud dependency.
MoE is the dominant architecture. The interconnect is a closed monopoly.
Many of the frontier LLMs shipped in 2025–2026 use a Mixture of Experts architecture, including DeepSeek V3/R1 (256 routed experts), Llama 4, Grok, and Mistral Large. The reason is efficiency: only a fraction of the model weights activate per token, so a model with 400B total parameters can run at roughly the per-token compute cost of a 40B dense model.
The problem is that expert routing creates massive communication overhead. For DeepSeek V3 with 256 experts across 64 GPUs, expert weight loading can consume up to 98.9% of decode time. NVIDIA's answer is NVLink 6, a proprietary, closed interconnect that works only with NVIDIA hardware.
When you want to run MoE on heterogeneous hardware — mixing Apple Silicon, AMD, FPGA modules, and purpose-built ASIC inference chips — there is no open standard for the interconnect or the routing layer. You are forced to either choose a single vendor's full stack or build bespoke glue code every time.
What exists today and why it falls short
Academic work (Mozart, A3D-MoE, NASiC) proposes chiplet architectures for MoE at wafer scale — research-stage, not field-deployable. MoE Sovereign (April 2026) is a software-only Docker-based multi-model router — useful, but purely a software abstraction layer with no hardware specification, no physical module standard, and explicitly cloud-flexible rather than sovereign-first. taitashaw/moe-router-engine is an open-source FPGA MoE token dispatcher — solves intra-machine routing overhead but targets homogeneous GPU clusters only.
Nobody has defined an open physical module specification that lets any ASIC inference chip — from any fab — plug into a common interconnect fabric, register its capabilities, and receive routed inference work from a sovereign on-premise orchestrator.
The gap QMX fills
QMX is the missing layer. It specifies:
- A physical module interface — the connector, power envelope, and signal standard that any ASIC or FPGA inference module must implement to join a QMX fabric
- A capability manifest protocol — how a module advertises its domain strengths, throughput, quantisation support, and current load to the fabric controller
- A routing protocol — how inference requests are dispatched to the right expert module based on semantic domain, latency budget, and queue depth
- A fabric controller spec — the software interface the Victron platform exposes to manage the full module registry, health monitoring, and upgrade lifecycle
QMX is designed to be silicon-agnostic. A Mac Mini's Neural Engine, an AMD Radeon inference module, an FPGA acceleration card, and a purpose-built ASIC chip can all coexist on the same QMX fabric — contributing their respective strengths to a single coherent intelligence stack.
The target deployment is not a hyperscaler data centre. It is a desk, a server room, or a small rack — owned by a person, a clinic, a law firm, or a government department.
Four-layer model
QMX separates concerns across four layers, each independently specifiable and upgradeable:
- L1 — Physical (QMX-M): The module connector standard. Defines pin layout, power rails (12V/5V/3.3V), PCIe 5.0 x4 signal lanes, and a dedicated QMX management bus (I²C variant) for out-of-band capability advertisement and health telemetry.
- L2 — Interconnect (QMX-F): The fabric standard. For local multi-module setups: CXL 3.x over the PCIe 5.0 lanes with memory pooling enabled. For rack-scale or distributed sovereign deployments: 400GbE with RDMA (RoCE v2) for sub-5µs inter-node latency.
- L3 — Capability Protocol (QMX-CAP): The manifest schema. Every module broadcasts a JSON capability document at boot and on state change, declaring: domain tags, active model identifiers, quantisation level, current TOPS, available VRAM, queue depth, time-to-first-token p50/p95, and thermal state.
- L4 — Routing (QMX-R): The dispatch protocol. The Victron fabric controller reads all registered QMX-CAP manifests and maintains a live routing table. Incoming inference requests are tagged by the router with a semantic domain score, then dispatched to the module best matched by domain, latency budget, and current load.
Compute Express Link
The primary interconnect for local multi-module QMX deployments. CXL 3.x provides cache-coherent memory pooling across all attached modules, meaning expert modules can share a unified memory fabric rather than copying tensors between devices. Latency: ~100ns. Bandwidth: up to 256 GB/s bidirectional over a x16 link at the 64 GT/s signalling rate that CXL 3.x shares with PCIe 6.0 (roughly half that at PCIe 5.0 rates). CXL is an open industry standard (not NVIDIA-proprietary), supported by Intel, AMD, Arm, and many ASIC vendors. The first CXL 3.x fabric switch silicon (Panmnesia PCIe 6.0/CXL 3.2) became available in late 2025.
PCI Express 5.0
For early-phase QMX deployments where CXL fabric switches are unavailable. QMX-M modules connect via PCIe 5.0 x4 (about 16 GB/s per direction, 32 GB/s bidirectional, per module). The fabric controller manages memory transfers explicitly. Latency is higher than CXL (~1–3µs) but sufficient for batched inference workloads where latency tolerance is ≥5ms.
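The PCIe 5.0 x4 figure can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it accounts for 128b/130b line encoding but ignores packet and protocol overhead, so real throughput will be lower. The function names and the 64 MiB payload are assumptions chosen for the example, not part of the QMX specification.

```python
# Back-of-envelope PCIe 5.0 link arithmetic (illustrative; ignores packet
# and protocol overhead beyond line encoding).

PCIE5_GT_PER_LANE = 32.0          # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130   # PCIe 3.0 through 5.0 use 128b/130b encoding


def pcie5_gbytes_per_s(lanes: int) -> float:
    """Approximate usable bandwidth in GB/s for one direction of the link."""
    return lanes * PCIE5_GT_PER_LANE * ENCODING_EFFICIENCY / 8


def transfer_ms(payload_bytes: int, lanes: int) -> float:
    """Time in milliseconds to move a payload one way across the link."""
    return payload_bytes / (pcie5_gbytes_per_s(lanes) * 1e9) * 1e3


if __name__ == "__main__":
    print(f"x4, one direction: {pcie5_gbytes_per_s(4):.2f} GB/s")
    # Hypothetical payload: a 64 MiB batch of activations.
    print(f"64 MiB over x4: {transfer_ms(64 * 2**20, 4):.2f} ms")
```

An x4 link works out to roughly 15.75 GB/s in each direction, which is where the approximately 32 GB/s bidirectional figure comes from. A transfer of tens of megabytes takes a few milliseconds, which is why this transport suits batched workloads with a latency tolerance in the millisecond range.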
Ethernet RDMA
For sovereign deployments spanning multiple physical nodes — different rooms, server racks, or buildings. RDMA over Converged Ethernet v2 (RoCE v2) brings inter-node latency to 1–5µs with kernel bypass. Each node runs its own local QMX fabric; the QMX-R router handles cross-node dispatch as a federated fabric. No NVSwitch, no InfiniBand licence, no vendor lock-in.
Ultra Accelerator Link
UALink 1.0 is on the QMX roadmap as the high-scale fabric option. The standard, backed by AMD, Intel, Astera Labs and others, targets scaling to 1,024 devices, with hardware arriving in late 2026. QMX-F is designed to be transport-agnostic at the specification level; UALink support will be a QMX-F v1.1 extension.
Semantic routing, not load balancing
The QMX router is not a simple round-robin or least-loaded dispatcher. It performs semantic domain scoring on every incoming inference request and matches it to the module whose capability manifest best fits the task type.
Domain tags in the QMX-CAP schema are drawn from a controlled vocabulary: GENERAL, CODE, MATH, REASONING, MEDICAL, LEGAL, STRUCTURED_DATA, CREATIVE, MULTILINGUAL, VISION, AUDIO. A module may declare multiple tags with confidence weights.
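A controlled vocabulary with confidence weights is straightforward to validate on the fabric controller. The following is a minimal sketch, assuming weights are floats in the range 0 to 1 (the text does not state the range) and using a hypothetical helper name, `parse_domain_weights`.

```python
from enum import Enum


class Domain(str, Enum):
    """The QMX-CAP controlled vocabulary of domain tags."""
    GENERAL = "GENERAL"
    CODE = "CODE"
    MATH = "MATH"
    REASONING = "REASONING"
    MEDICAL = "MEDICAL"
    LEGAL = "LEGAL"
    STRUCTURED_DATA = "STRUCTURED_DATA"
    CREATIVE = "CREATIVE"
    MULTILINGUAL = "MULTILINGUAL"
    VISION = "VISION"
    AUDIO = "AUDIO"


def parse_domain_weights(raw: dict[str, float]) -> dict[Domain, float]:
    """Reject unknown tags and out-of-range weights (range is an assumption)."""
    parsed: dict[Domain, float] = {}
    for tag, weight in raw.items():
        try:
            domain = Domain(tag)
        except ValueError:
            raise ValueError(f"unknown domain tag: {tag!r}") from None
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {tag} must be in [0, 1], got {weight}")
        parsed[domain] = float(weight)
    return parsed


if __name__ == "__main__":
    print(parse_domain_weights({"CODE": 0.9, "STRUCTURED_DATA": 0.6}))
```

Rejecting unknown tags at registration keeps the routing table limited to domains the classifier can actually emit.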
Routing decision pipeline
Each request passes through four stages before dispatch:
- Domain classification: a lightweight local classifier (≤100M params, always-hot) tags the request with domain scores in <10ms
- Candidate selection: modules with matching domain tags are ranked by an availability score combining queue depth, latency p50, and thermal headroom (shorter queues, lower latency, and more headroom rank higher)
- Latency budget check: if the request carries a latency SLA, only modules capable of meeting it are considered
- Dispatch + streaming: tokens stream back through the fabric controller to the caller; if the winning module saturates mid-stream, the router can hand off continuation to a secondary module
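The candidate-selection and latency-budget stages above can be sketched as follows. This is an illustration under stated assumptions, not the QMX-R algorithm: the `Module` fields, the 0-to-1 thermal headroom scale, the particular availability formula, and the use of p95 time-to-first-token for the budget check are all choices made for the example. Domain classification is assumed to have already produced `domain_scores`, and mid-stream handoff is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Module:
    name: str
    domain_weights: dict[str, float]  # domain tag -> confidence weight
    queue_depth: int
    ttft_p50_ms: float
    ttft_p95_ms: float
    thermal_headroom: float           # assumed scale: 0.0 (throttling) to 1.0 (cool)


def availability(m: Module) -> float:
    """Higher is better: penalise queue depth and latency, reward headroom."""
    return m.thermal_headroom / ((1 + m.queue_depth) * m.ttft_p50_ms)


def route(domain_scores: dict[str, float], modules: list[Module],
          latency_budget_ms: Optional[float] = None) -> Optional[Module]:
    # Candidate selection: keep modules declaring at least one requested domain.
    candidates = []
    for m in modules:
        match = sum(score * m.domain_weights.get(tag, 0.0)
                    for tag, score in domain_scores.items())
        if match > 0:
            candidates.append((match, m))
    # Latency budget check: drop modules whose p95 exceeds the request's budget.
    if latency_budget_ms is not None:
        candidates = [(s, m) for s, m in candidates
                      if m.ttft_p95_ms <= latency_budget_ms]
    if not candidates:
        return None
    # Rank by domain match weighted by availability; the top module is dispatched to.
    return max(candidates, key=lambda c: c[0] * availability(c[1]))[1]


if __name__ == "__main__":
    fabric = [
        Module("fpga-0", {"CODE": 0.9}, 0, 40.0, 90.0, 0.8),
        Module("asic-0", {"GENERAL": 0.8, "CODE": 0.5}, 0, 60.0, 120.0, 0.9),
    ]
    winner = route({"CODE": 0.9, "GENERAL": 0.1}, fabric, latency_budget_ms=100.0)
    print(winner.name if winner else "no module can serve this request")
```

Returning `None` when no module qualifies leaves the fallback policy (queue, reject, or relax the budget) to the caller, which seems preferable to hiding that decision inside the scorer.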
Multi-expert synthesis
For complex queries that span domains (e.g. a medical coding question requiring both MEDICAL and STRUCTURED_DATA expertise), QMX-R supports parallel expert dispatch with result synthesis. The fabric controller sends sub-queries to two modules simultaneously, and a lightweight synthesis model (also resident on the host) merges the outputs. This applies the MoE pattern at system level: not sequential fallback, but genuine parallel expert consultation.
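Parallel dispatch with synthesis maps naturally onto concurrent tasks. The sketch below uses Python's `asyncio.gather` to illustrate the shape of the flow; `query_expert` and `synthesise` are hypothetical stand-ins (a simulated delay and plain concatenation) for the real module call and the host-resident synthesis model.

```python
import asyncio


async def query_expert(module: str, sub_query: str) -> str:
    """Stand-in for a real module call, which would travel over the fabric."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return f"[{module}] answer to: {sub_query}"


def synthesise(outputs: list[str]) -> str:
    """Placeholder for the synthesis model: here, simple concatenation."""
    return "\n".join(outputs)


async def multi_expert(sub_queries: dict[str, str]) -> str:
    # Launch every expert call concurrently; gather returns results in input order.
    calls = [query_expert(module, query) for module, query in sub_queries.items()]
    outputs = await asyncio.gather(*calls)
    return synthesise(list(outputs))


if __name__ == "__main__":
    result = asyncio.run(multi_expert({
        "medical-0": "Which diagnosis does this note describe?",
        "structured-0": "Map the diagnosis to a billing code.",
    }))
    print(result)
```

Because the calls run concurrently, total latency is bounded by the slowest expert plus synthesis, rather than by the sum of all expert latencies.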
Capability manifest — QMX-CAP schema
Every QMX-compliant module broadcasts a manifest conforming to the QMX-CAP schema at registration and on any state change.
The manifest is published over the QMX management bus (out-of-band I²C) at boot, and pushed via a lightweight UDP multicast to the fabric controller on any field change. A pull endpoint is also available at http://[module-ip]:7070/qmx/cap.
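As an illustration of what such a manifest might carry, the sketch below models the fields named in the QMX-CAP layer description as a Python dataclass and serialises it to JSON. Every field name, unit, and value here is an assumption made for the example; the normative schema is whatever the QMX-CAP specification defines.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CapabilityManifest:
    # All field names and units are illustrative, not taken from the spec.
    module_id: str
    domain_tags: dict[str, float]   # domain tag -> confidence weight
    active_models: list[str]
    quantisation: str
    current_tops: float
    available_vram_gb: float
    queue_depth: int
    ttft_p50_ms: float
    ttft_p95_ms: float
    thermal_state: str

    def to_json(self) -> str:
        """Serialise with sorted keys so identical manifests compare equal as text."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "CapabilityManifest":
        return cls(**json.loads(payload))


if __name__ == "__main__":
    manifest = CapabilityManifest(
        module_id="fpga-0",                      # hypothetical module
        domain_tags={"CODE": 0.9, "STRUCTURED_DATA": 0.6},
        active_models=["example-code-model"],    # hypothetical identifier
        quantisation="int8",
        current_tops=120.0,
        available_vram_gb=12.5,
        queue_depth=3,
        ttft_p50_ms=40.0,
        ttft_p95_ms=90.0,
        thermal_state="nominal",
    )
    print(manifest.to_json())
```

The same JSON document could serve the boot-time publication, the multicast push, and the pull endpoint, so the fabric controller needs only one parser.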
Encrypted at Rest
All model weights stored on a QMX module must be encrypted using AES-256-GCM at minimum. The decryption key is held in the module's secure enclave and never exposed on the fabric. Post-quantum key encapsulation (ML-KEM-768 / Kyber) is required for QMX v1.0 certification.
Encrypted in Transit
All QMX-F fabric traffic between the fabric controller and modules is encrypted. For CXL/PCIe: AEAD encryption at the QMX-P protocol layer. For Ethernet: mandatory TLS 1.3 with post-quantum hybrid key exchange (X25519 + ML-KEM-768).
Hardware Attestation
Each QMX module must present a hardware-rooted identity certificate (stored in a dedicated secure element, TPM 2.0 compatible) at registration. The fabric controller verifies the certificate chain before accepting any module into the routing table, which prevents rogue module injection.
Zero External Calls
QMX-compliant modules must not initiate any network connection outside the local QMX fabric. The fabric controller maintains an egress policy enforced at the OS level. No telemetry, no update pings, no vendor callbacks — unless the operator explicitly provisions an outbound channel.
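The decision logic behind such an egress policy can be sketched with Python's standard `ipaddress` module. This shows only the allow/deny check; actual enforcement would sit in the operating system's firewall. The function name, the example subnet, and the representation of operator-provisioned channels as a list of CIDR blocks are assumptions for illustration.

```python
import ipaddress
from typing import Iterable


def egress_allowed(dest_ip: str,
                   fabric_networks: Iterable[str],
                   provisioned: Iterable[str] = ()) -> bool:
    """Allow a connection only if the destination lies inside the local fabric
    or inside a channel the operator has explicitly provisioned."""
    addr = ipaddress.ip_address(dest_ip)
    allowed = [ipaddress.ip_network(cidr)
               for cidr in (*fabric_networks, *provisioned)]
    return any(addr in net for net in allowed)


if __name__ == "__main__":
    fabric = ["10.42.0.0/24"]  # hypothetical fabric subnet
    print(egress_allowed("10.42.0.7", fabric))      # inside the fabric
    print(egress_allowed("203.0.113.10", fabric))   # outside, not provisioned
```

Defaulting `provisioned` to empty makes deny-by-default the behaviour unless the operator opts in, matching the intent of the requirement.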
Immutable Inference Log
All routing decisions, module assignments, and inference completions are logged to an append-only audit store on the fabric controller. Log integrity is protected by a SHA-3 hash chain. Such a log is required for regulated-sector deployments (GDPR, NHS DSP Toolkit, defence classification).
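A hash chain makes tampering detectable because each entry's hash covers the previous entry's hash. The following minimal sketch uses `hashlib.sha3_256` from the Python standard library; the record layout, the all-zero genesis value, and the in-memory list are assumptions for illustration, and a real audit store would also need durable, append-only storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """SHA3-256 over the previous hash plus a canonical JSON form of the record."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(prev_hash.encode("ascii") + body).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Add a record, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log: list[dict] = []
    append(log, {"event": "route", "module": "fpga-0"})
    append(log, {"event": "complete", "module": "fpga-0"})
    print(verify(log))
    log[0]["record"]["module"] = "rogue-1"  # simulate tampering
    print(verify(log))
```

Serialising records with sorted keys keeps the digest stable regardless of the order in which fields were inserted.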
Full Air-Gap Capable
The entire QMX fabric — fabric controller, all modules, routing layer — is designed to operate with zero external network connectivity. Updates are applied via signed offline packages. This is a hard design requirement, not an optional mode.
As of research conducted in June 2026, no publicly available project or product occupies the same position as QMX. The table below maps the closest existing work against the QMX specification dimensions.
| Project / Product | Open spec | Physical module standard | Heterogeneous ASIC | Sovereign / on-premise first | MoE routing | Personal / SME scale |
|---|---|---|---|---|---|---|
| QMX (QSSI) | ✓ Apache 2.0 | ✓ QMX-M connector | ✓ By design | ✓ Core requirement | ✓ QMX-R semantic router | ✓ Mac Mini upward |
| MoE Sovereign (Apr 2026) | ✓ Apache 2.0 | ✗ Software only | ~ GPU clusters | ~ Optional, cloud-flexible | ✓ Software router | ~ Docker required |
| moe-router-engine (FPGA) | ✓ Open source | ~ FPGA only | ✗ Homogeneous GPU | ~ Not specified | ✓ Hardware token dispatch | ✗ Data centre focus |
| Mozart / A3D-MoE (academic) | ~ Research paper | ~ Custom chiplet | ✓ Chiplet heterogeneous | ✗ Cloud / HPC | ✓ Hardware MoE | ✗ Wafer-scale only |
| SambaNova RDU | ✗ Proprietary | ✗ Proprietary | ✗ SambaNova only | ✗ Cloud SaaS | ✓ MoE inference | ✗ Enterprise only |
| NVLink 6 / NVSwitch | ✗ NVIDIA proprietary | ✗ NVIDIA only | ✗ NVIDIA GPUs only | ~ On-premise available | ✓ High-bandwidth | ✗ $100k+ entry |
| CXL 3.x (standard) | ✓ Open standard | ✓ Physical spec | ✓ Heterogeneous | ✓ On-premise | ✗ Interconnect only, no router | ~ Server-class hardware |
The conclusion is clear: CXL provides the physical interconnect foundation, but no project combines it with an open module specification, a semantic routing protocol, and a sovereign-first design principle targeting personal and SME-scale deployments. QMX is the first specification to occupy this space.
Single-node QMX
One Mac Mini (M4 Pro) or AMD workstation acts as both fabric controller and primary expert module. A single FPGA acceleration card connects via QMX-M (PCIe 5.0 x4 initially, CXL 3.x when available). This gives two experts on one physical machine, suitable for individual professionals and students.
Multi-module local fabric
One AMD EPYC node as fabric controller. Two to four QMX-M modules on a CXL 3.x fabric switch — e.g. one FPGA module (code/structured data), one general-purpose ASIC module (reasoning/NL), one domain-specific fine-tune (medical/legal). 8–24 concurrent users. The target configuration for the Local AI Instance business tier.
Distributed federated fabric
Multiple physical nodes (NVIDIA DGX or custom builds), each running a local QMX CXL fabric, interconnected over 400GbE RoCE. The fabric controllers form a federated QMX-R cluster. Suitable for government departments, hospitals, and financial institutions requiring department-level expert segregation and national-scale inference throughput.
Software-first validation
Implement the QMX-CAP manifest schema and the QMX-R semantic router entirely in software on existing Mac Mini / AMD / NVIDIA hardware. Different locally running models (Llama 3, Mistral, Qwen) act as software experts. Validate routing logic, latency budgets, domain classification accuracy, and synthesis quality before building any custom silicon.
PCIe 5.0 module interface — QMX-M draft
Define the physical QMX-M connector and power spec. Prototype first QMX-M module as a reference design using an existing FPGA development board (Xilinx Alveo U55C or similar). Publish the QMX-M draft specification as an open document under Apache 2.0.
CXL 3.x fabric — QMX-F
Integrate CXL 3.x fabric switch (Panmnesia or equivalent) as the primary QMX-F interconnect. Implement memory pooling across fabric controller and FPGA module. Benchmark latency and bandwidth against software-baseline. Publish QMX-F specification draft.
Full specification release + ASIC qualification programme
Publish complete QMX v1.0 specification. Launch QMX Certification Programme — a formal process by which ASIC and FPGA vendors can qualify their modules as QMX-compliant. First ASIC inference chip certifications. Integration into the Victron AIO-32-MAX as the fabric controller reference platform.
UALink 1.0 fabric extension + direct-to-silicon ASIC modules
Add UALink 1.0 as a QMX-F transport option for large-scale sovereign deployments. Qualify the first purpose-built ASIC inference chips (direct-to-silicon LLM inference modules) under the QMX-M physical spec. Enable petaflop-class sovereign MoE deployments in commodity rack space.
This specification is open. Contribute to it.
QMX is intended as a community specification — not a QSSI-proprietary standard. ASIC vendors, FPGA engineers, system architects, and sovereign AI advocates are invited to contribute. The goal is an open ecosystem, not a moat.