The MIND runtime provides deterministic execution of compiled models with minimal overhead. It supports deployment modes ranging from embedded devices to cloud servers.

Architecture

┌───────────────────────────────────────────┐
│              Application                  │
├───────────────────────────────────────────┤
│          Runtime API (C/Rust)             │
├───────────────────────────────────────────┤
│   Executor   │   Memory Manager           │
├──────────────┼────────────────────────────┤
│ CPU Backend  │  GPU + Accelerator Drivers │
│  (Shipped)   │      (Roadmap)             │
└──────────────┴────────────────────────────┘

The CPU backend ships in the open-source compiler and is the default execution target for compiled MIND artifacts. GPU and accelerator execution (CUDA, ROCm, Metal, and others) ships in the commercial mind-runtime, available under a commercial license. Bit-identical determinism across those substrates is on the active roadmap.

GPU Runtime (Commercial)

GPU and multi-vendor accelerator execution ships in the commercial mind-runtime under a commercial license. Capabilities include:

CUDA / ROCm / Metal: Vendor-native GPU backends via dynamic SDK loading
WebGPU / WebNN: Browser and edge-device acceleration targets
Specialized accelerators: TPU, NPU, FPGA, and ASIC targets under evaluation
Deterministic fallback: CPU reference path when a vendor SDK is unavailable
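The deterministic-fallback rule can be pictured as a simple selection loop. The sketch below is illustrative only: the `Backend` enum, the `select_backend` function, and the probe closure (standing in for dynamic SDK loading) are hypothetical names, not part of the mind-runtime API.

```rust
// Hypothetical sketch: `Backend`, `select_backend`, and the probe closure are
// illustrative names, not the mind-runtime API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cuda,
    Rocm,
    Metal,
    Cpu, // deterministic CPU reference path, always available
}

/// Return the first preferred vendor backend whose SDK the probe reports as
/// loadable; if none is, fall back to the CPU reference path.
fn select_backend(preferred: &[Backend], sdk_available: impl Fn(Backend) -> bool) -> Backend {
    preferred
        .iter()
        .copied()
        .find(|&b| sdk_available(b))
        .unwrap_or(Backend::Cpu)
}

fn main() {
    // Simulated host where only the ROCm SDK can be loaded.
    let rocm_only = |b: Backend| b == Backend::Rocm;
    println!("{:?}", select_backend(&[Backend::Cuda, Backend::Rocm], rocm_only)); // Rocm

    // No vendor SDK present: the CPU reference path is selected.
    println!("{:?}", select_backend(&[Backend::Cuda, Backend::Metal], |_| false)); // Cpu
}
```

Keeping the CPU path as the unconditional default means a missing SDK changes performance, never whether the model runs.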

Execution Modes

Mode	Use Case	Characteristics
AOT (Ahead-of-Time)	Production deployment	Fastest startup, smallest binary
JIT (Just-in-Time)	Development, dynamic shapes	Flexible, runtime optimization
Interpreter	Debugging, conformance	Reference implementation

Memory Management

Static allocation: Memory planned at compile time for AOT
Arena allocator: Fast bump allocation for intermediate tensors
Buffer reuse: Automatic sharing of memory between non-overlapping tensors
Device memory: Unified API for CPU and GPU memory
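The arena and buffer-reuse ideas can be sketched in plain Rust. This is a minimal model, assuming byte offsets into a single preallocated region and tensor live ranges expressed as inclusive op indices; `Arena` and `plan_buffers` are hypothetical names, and the real memory manager may plan differently.

```rust
/// A bump arena over a fixed byte budget: allocation only advances an offset,
/// and `reset` releases every intermediate tensor at once.
struct Arena {
    capacity: usize,
    offset: usize,
}

impl Arena {
    fn new(capacity: usize) -> Self {
        Arena { capacity, offset: 0 }
    }

    /// Returns the byte offset of a new block, or None if the budget is
    /// exhausted. `align` must be a power of two.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.offset + align - 1) & !(align - 1); // round up to `align`
        let end = start.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        self.offset = end;
        Some(start)
    }

    fn reset(&mut self) {
        self.offset = 0;
    }
}

/// Greedy buffer reuse: tensor i is live over live[i] = (first_use, last_use).
/// A tensor may take over a slot whose occupant's last use is strictly before
/// its first use. Returns the slot index for each tensor, in input order.
fn plan_buffers(live: &[(usize, usize)]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..live.len()).collect();
    order.sort_by_key(|&i| live[i].0); // visit tensors by first use

    let mut slot_busy_until: Vec<usize> = Vec::new(); // last_use of each slot's occupant
    let mut assignment = vec![0; live.len()];
    for i in order {
        let (first, last) = live[i];
        match slot_busy_until.iter().position(|&end| end < first) {
            Some(s) => {
                slot_busy_until[s] = last; // reuse a dead tensor's slot
                assignment[i] = s;
            }
            None => {
                slot_busy_until.push(last); // open a new slot
                assignment[i] = slot_busy_until.len() - 1;
            }
        }
    }
    assignment
}

fn main() {
    let mut arena = Arena::new(256);
    let a = arena.alloc(10, 8).unwrap();
    let b = arena.alloc(16, 16).unwrap(); // rounded up past `a`
    println!("a at {a}, b at {b}");
    arena.reset();

    // t0 lives over ops 0..=1, t1 over 1..=2, t2 over 2..=3: t2 can reuse t0's slot.
    println!("slots: {:?}", plan_buffers(&[(0, 1), (1, 2), (2, 3)]));
}
```

This sketch ignores tensor sizes; a real planner must also fit differently sized buffers into each shared slot.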

Determinism Tiers

MIND defines three independently verifiable determinism tiers. Each tier serves a different audit consumer and is independently observable. An implementation may satisfy any subset of the tiers, but conforming to a higher tier never weakens a lower one. The normative reference is mind-spec performance §determinism-tiers.

Tier 1 — Build determinism (required)

Same MIND source bytes → byte-identical compiled artifacts across runs, machines, operating systems, and time. This covers the mic@1 canonical text form, the mic@3 canonical binary form (RFC 0021), and the final native-ELF/cdylib/AOT object. Verification is by SHA-256 of the produced artifacts. The evidence-chain attestation (RFC 0016) anchors its trace_hash on the mic@3 binary; it was re-anchored on 2026-05-31, as the prior mic@1 text anchor was lossy for function bodies.

The native-ELF self-host fixed point is closed for the compiler front-end as of the v0.10.x line: the pure-MIND front-end reproduces its own bootstrap byte-identically, gated by the keystone suite (7/7). The bootstrap self-hosts; full-chain Rust-independence (removing Rust from the entire toolchain) is on the roadmap. The bootstrap mic@1 and mic@3 fixed points remain intact.
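Tier 1 evidence is a SHA-256 comparison, and two artifacts with identical bytes necessarily have identical digests. The sketch below, using only the Rust standard library, checks two build outputs byte for byte and reports where they first diverge, which is the useful detail when a build turns out not to be reproducible. The file names are placeholders; producing the actual SHA-256 digests for an attestation would need a hash implementation such as the RustCrypto `sha2` crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Compare two build outputs byte for byte. Returns Ok(None) when they are
/// identical, or Ok(Some(offset)) with the first differing byte offset
/// (a pure length mismatch reports the length of the shorter file).
fn first_difference(a: &Path, b: &Path) -> io::Result<Option<usize>> {
    let (x, y) = (fs::read(a)?, fs::read(b)?);
    if let Some(i) = x.iter().zip(&y).position(|(p, q)| p != q) {
        return Ok(Some(i));
    }
    if x.len() != y.len() {
        return Ok(Some(x.len().min(y.len())));
    }
    Ok(None)
}

fn main() -> io::Result<()> {
    // Placeholder stand-ins for two independent builds of the same source.
    let dir = std::env::temp_dir();
    let build1 = dir.join("mind_build_1.bin");
    let build2 = dir.join("mind_build_2.bin");
    fs::write(&build1, b"\x7fELF...same bytes")?;
    fs::write(&build2, b"\x7fELF...same bytes")?;

    match first_difference(&build1, &build2)? {
        None => println!("reproducible: artifacts are byte-identical"),
        Some(off) => println!("not reproducible: first difference at byte {off}"),
    }
    Ok(())
}
```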

Tier 2 — Within-substrate runtime determinism (required in deterministic mode)

Same input bytes + same hardware + same selected code path → byte-identical output bytes on every invocation. Floating-point operations follow IEEE 754-2008 strictly (including FMA). Threading introduces no non-determinism: deterministic mode disables work-stealing and any optimization that would change the fixed order of a reduction.

// Create runtime with deterministic mode (default)
let rt = Runtime::new(RuntimeConfig {
    deterministic: true,  // IEEE 754 strict, no threading non-determinism
    seed: 42,             // RNG seed for reproducibility
    ..Default::default()  // remaining fields keep their default values
});

// Same inputs always produce same outputs (Tier 2)
let out1 = model.forward(&input);
let out2 = model.forward(&input);
assert_eq!(out1, out2);  // Guaranteed
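Why thread scheduling matters: floating-point addition is not associative, so a parallel reduction that combines partial results in whatever order threads finish can change the low-order bits between runs. The sketch below shows the general technique of a fixed-order parallel reduction in plain Rust using `std::thread::scope`. It illustrates the principle only and is not the mind-runtime implementation; `ordered_parallel_sum` is a hypothetical name.

```rust
use std::thread;

/// Sum `data` on `n_threads` scoped threads, then combine the per-chunk
/// partial sums strictly in chunk order. Chunk boundaries depend only on
/// `data.len()` and `n_threads`, so thread scheduling cannot change the result.
fn ordered_parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    let n = n_threads.max(1);
    let chunk = ((data.len() + n - 1) / n).max(1);
    let partials: Vec<f64> = thread::scope(|s| {
        // Spawn one worker per chunk; each sums its chunk left to right.
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().fold(0.0_f64, |acc, x| acc + x)))
            .collect();
        // Join in spawn order, not completion order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    partials.iter().fold(0.0_f64, |acc, p| acc + p)
}

fn main() {
    // Floating-point addition is not associative: these two groupings differ.
    let left = (0.1_f64 + 0.2) + 0.3;
    let right = 0.1_f64 + (0.2 + 0.3);
    println!("left == right? {}", left == right); // false

    let data: Vec<f64> = (1..=100_000).map(|i| 1.0 / i as f64).collect();
    let first = ordered_parallel_sum(&data, 4);
    let second = ordered_parallel_sum(&data, 4);
    assert_eq!(first.to_bits(), second.to_bits()); // same thread count, same bits
    println!("sum = {first}");
}
```

The chunking, and therefore the exact result, depends on `n_threads`, so in this sketch the thread count is part of the configuration that must stay fixed for outputs to match bit for bit.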

Opt-in SIMD fast paths (such as the dense BLAS-style intrinsics shipped under the std-surface feature) are within-substrate deterministic by construction: a fixed input on a fixed CPU evaluating a fixed code path produces a fixed output. In floating point, the SIMD reduction order may differ from sequential scalar reduction, so results can differ from the scalar path; that difference is bounded and is itself deterministic on the same hardware.
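The gap between a SIMD-style reduction and the scalar reference can be reproduced without intrinsics. This sketch emulates a 4-lane reduction in portable Rust; the lane count is an arbitrary illustrative choice, not the width used by the std-surface intrinsics. Both functions return the same bits on every call, yet for inputs with cancellation they return different values.

```rust
/// Scalar reference: strict left-to-right summation.
fn scalar_sum(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0_f32, |acc, x| acc + x)
}

/// Portable emulation of a 4-lane SIMD reduction: lane k accumulates elements
/// k, k+4, k+8, ..., and the lanes are combined in a fixed tree order.
fn lane_sum(xs: &[f32]) -> f32 {
    let mut lanes = [0.0_f32; 4];
    for (i, x) in xs.iter().enumerate() {
        lanes[i % 4] += x;
    }
    (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])
}

fn main() {
    // Cancellation near 1e8, where adjacent f32 values are 8 apart.
    let xs = [1.0e8_f32, 1.0, -1.0e8, 1.0];
    println!("scalar: {}, lanes: {}", scalar_sum(&xs), lane_sum(&xs)); // scalar: 1, lanes: 0

    // Each path is repeatable on its own: same input, same bits.
    assert_eq!(lane_sum(&xs).to_bits(), lane_sum(&xs).to_bits());
    assert_eq!(scalar_sum(&xs).to_bits(), scalar_sum(&xs).to_bits());
}
```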

Tier 3 — Cross-substrate Q16.16 bit-identity (optional, substrate-thesis tier)

Q16.16 fixed-point operations produce byte-identical results across the verified substrate pair, x86 (AVX2) and ARM (NEON). This is gated by the cross_substrate suite and verified by SHA-256 over the concatenated (operation_id, q16_output) stream for a fixed conformance corpus. Extending this tier to GPU targets is on the roadmap and will ship with the commercial mind-runtime.
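A minimal Q16.16 sketch, assuming the common layout of a signed 32-bit integer with 16 fractional bits (a raw value r represents r / 2^16), flooring multiplication, and a record encoding of a little-endian u32 operation id followed by the little-endian i32 output. The spec's exact rounding rule and record layout may differ; `Q16` and `encode_stream` are hypothetical names. Every step is fixed-width integer arithmetic, which is what lets x86 and ARM agree bit for bit.

```rust
/// Q16.16 fixed point: raw value `r` represents r / 2^16.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Q16(i32);

const FRAC_BITS: u32 = 16;

impl Q16 {
    fn from_int(n: i16) -> Q16 {
        Q16(i32::from(n) << FRAC_BITS)
    }

    /// Addition wraps on overflow (modulo 2^32).
    fn add(self, other: Q16) -> Q16 {
        Q16(self.0.wrapping_add(other.0))
    }

    /// Widen to i64, multiply, shift back down. The arithmetic shift floors
    /// (rounds toward negative infinity); that rounding rule is an assumption.
    fn mul(self, other: Q16) -> Q16 {
        Q16(((i64::from(self.0) * i64::from(other.0)) >> FRAC_BITS) as i32)
    }

    fn to_f64(self) -> f64 {
        f64::from(self.0) / f64::from(1u32 << FRAC_BITS)
    }
}

/// Serialize (operation_id, q16_output) records as little-endian u32 + i32,
/// so the bytes do not depend on host endianness (layout is an assumption).
fn encode_stream(records: &[(u32, Q16)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(records.len() * 8);
    for (op_id, q) in records {
        out.extend_from_slice(&op_id.to_le_bytes());
        out.extend_from_slice(&q.0.to_le_bytes());
    }
    out
}

fn main() {
    let x = Q16(98_304); // 1.5
    let y = Q16::from_int(2);
    let records = [(0u32, x.add(y)), (1u32, x.mul(y))];
    println!("{} {}", records[0].1.to_f64(), records[1].1.to_f64()); // 3.5 3
    println!("{:?}", encode_stream(&records));
}
```

The conformance check described above would then hash this byte stream with SHA-256 on each substrate and compare the digests; the Rust standard library has no SHA-256, so that step is omitted here.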

Tier 3 is observable only on the Q16.16 path because integer-domain SIMD reduction is associative: SIMD fast paths produce byte sequences identical to the scalar reference at every input length. Scalar IEEE-754 f64/f32 now runs on the strict deterministic path (no reassociation, no FMA contraction) and is run-to-run bit-identical, verified on x86_64 (AVX2) and ARM64 (NEON) on 2026-07-05. Floating-point vector reduction is not associative, so cross-substrate bit-identity for float vector operations is not claimed at any tier. The Q16.16 path is the bridge from today's verified CPU pair to the GPU and accelerator backends on the commercial-runtime roadmap.
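The associativity argument can be checked directly. Wrapping i32 addition is addition modulo 2^32, which is associative and commutative, so summing raw Q16.16 values in scalar order or in a SIMD-like lane order yields identical bits at every input length, even when intermediate sums overflow. The sketch below is a portable emulation with a hypothetical 8-lane width, not the actual intrinsics; it assumes wrapping overflow, since a saturating add would not have this property.

```rust
/// Scalar reference: left-to-right wrapping sum of raw Q16.16 values.
fn scalar_sum_q16(xs: &[i32]) -> i32 {
    xs.iter().fold(0_i32, |acc, &x| acc.wrapping_add(x))
}

/// SIMD-like order: 8 lane accumulators, then a pairwise tree combine.
fn lane_sum_q16(xs: &[i32]) -> i32 {
    let mut lanes = [0_i32; 8];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 8] = lanes[i % 8].wrapping_add(x);
    }
    let mut width = lanes.len();
    while width > 1 {
        width /= 2;
        for k in 0..width {
            lanes[k] = lanes[k].wrapping_add(lanes[k + width]);
        }
    }
    lanes[0]
}

fn main() {
    // Values chosen so that intermediate sums overflow i32.
    let xs = [i32::MAX, 1, i32::MAX, -7, 65_536, i32::MIN, 3, 42, -99, 1 << 30, 12_345];
    for n in 0..=xs.len() {
        assert_eq!(scalar_sum_q16(&xs[..n]), lane_sum_q16(&xs[..n]));
    }
    println!("lane order matches scalar order at every length");
}
```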

Tier	Scope	Claim	Verification
1	Compilation	Same source → same artifact	SHA-256 of build output
2	Runtime, within substrate	Same input + same code path → same output	Repeated-invocation hash match
3	Runtime, across substrates	Q16.16 output byte-identical across substrates (x86 == ARM verified; GPU roadmap)	SHA-256 of conformance corpus

Tier 3 implies Tier 2 for the Q16.16 path; Tier 2 implies nothing about Tier 3; Tier 1 is orthogonal to both.

Resource Limits

let config = RuntimeConfig {
    max_memory_mb: 1024,      // Memory limit
    max_threads: 4,           // Thread pool size
    timeout_ms: Some(5000),   // Execution timeout
    ..Default::default()
};

let rt = Runtime::new(config);

Profiling

// Enable profiling
let rt = Runtime::new(RuntimeConfig {
    profile: true,
    ..Default::default()
});

model.forward(&input);

// Get profile data
let profile = rt.get_profile();
for op in profile.operations {
    println!("{}: {}ms", op.name, op.duration_ms);
}
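Per-op profile entries are easier to act on once repeated ops are aggregated. A small sketch, assuming each entry exposes a name and a duration in milliseconds as in the loop above; `OpRecord` and `hottest_ops` are hypothetical stand-ins, since the concrete profile types are not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one entry of `profile.operations`; the real
/// record type and field types may differ.
struct OpRecord {
    name: String,
    duration_ms: f64,
}

/// Total time per operation name, sorted from most to least expensive.
/// Ties are broken by name so the report order is stable across runs.
fn hottest_ops(ops: &[OpRecord]) -> Vec<(String, f64)> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for op in ops {
        *totals.entry(op.name.as_str()).or_insert(0.0) += op.duration_ms;
    }
    let mut ranked: Vec<(String, f64)> = totals
        .into_iter()
        .map(|(name, ms)| (name.to_string(), ms))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}

fn main() {
    let ops = vec![
        OpRecord { name: "matmul".into(), duration_ms: 2.5 },
        OpRecord { name: "relu".into(), duration_ms: 0.5 },
        OpRecord { name: "matmul".into(), duration_ms: 1.5 },
    ];
    for (name, ms) in hottest_ops(&ops) {
        println!("{name}: {ms}ms total");
    }
}
```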

Learn More

See the full runtime specification at mind-spec/runtime.md. The runtime is available as part of MIND Enterprise.