Runtime

The MIND runtime provides deterministic execution of compiled models with minimal overhead. It supports multiple deployment modes from embedded devices to cloud servers.

Architecture

┌───────────────────────────────────────────┐
│              Application                  │
├───────────────────────────────────────────┤
│          Runtime API (C/Rust)             │
├──────────────┬────────────────────────────┤
│   Executor   │   Memory Manager           │
├──────────────┼────────────────────────────┤
│ CPU Backend  │  GPU + Accelerator Drivers │
│  (Stable)    │  (17 production drivers)   │
└──────────────┴────────────────────────────┘

Backends (Apr 2026): mind-runtime ships 17 GPU and accelerator drivers in addition to the built-in CPU runtime: CUDA (Enterprise), ROCm, Metal, WebGPU, WebNN, TPU, NPU, LPU, DPU, FPGA, ASIC, Cerebras, Taalas, Tenstorrent, SambaNova, Graphcore IPU, and Intel Gaudi. Each driver loads its vendor SDK at runtime via libloading and falls back to a deterministic CPU reference implementation when the SDK is unavailable.
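The load-then-fall-back behavior described above can be sketched roughly as follows. This is an illustration only: the `Backend` trait, `CpuReference`, `try_load_vendor_sdk`, and `select_backend` are hypothetical names, not the mind-runtime API, and the dynamic-loading step is stubbed out so the example needs no vendor SDK.

```rust
/// Hypothetical error for a vendor SDK that could not be loaded.
#[derive(Debug)]
pub struct SdkUnavailable(pub String);

/// Hypothetical backend interface (an assumption, not the real runtime API).
pub trait Backend {
    fn name(&self) -> &str;
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32>;
}

/// Deterministic CPU reference used when no accelerator SDK is present.
pub struct CpuReference;

impl Backend for CpuReference {
    fn name(&self) -> &str {
        "cpu-reference"
    }

    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        // Element-wise addition in a fixed order.
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

/// Stand-in for the dynamic-loading step. A real driver would try to open
/// the vendor library here; this stub always reports it as missing so that
/// the fallback path is exercised.
pub fn try_load_vendor_sdk(library: &str) -> Result<Box<dyn Backend>, SdkUnavailable> {
    Err(SdkUnavailable(format!("{library} not found")))
}

/// Prefer the accelerator backend; fall back to the CPU reference on failure.
pub fn select_backend(library: &str) -> Box<dyn Backend> {
    match try_load_vendor_sdk(library) {
        Ok(backend) => backend,
        Err(err) => {
            eprintln!("falling back to CPU reference: {}", err.0);
            Box::new(CpuReference)
        }
    }
}
```

Calling `select_backend("libhypothetical_vendor.so")` here always yields the CPU reference, because the loader is a stub. The point is that callers receive a working `Backend` either way.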

GPU Runtime (Enterprise)

The Enterprise GPU runtime provides production-grade CUDA acceleration:

  • cuBLAS/cuDNN: TF32 Tensor Cores for matmul, auto-tuned convolutions
  • Memory Allocator: CachingAllocator achieves 8.3M allocs/sec (180x faster than cudaMalloc)
  • Tensor Cores: TF32, FP16, FP8 (Ada Lovelace+) with PTX mma.sync
  • Async Streams: 8 streams (6 compute, 2 transfer) for overlapped execution
  • Supported GPUs: SM_80+ (Ampere, Ada Lovelace, Hopper)
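To illustrate why a caching allocator can outpace direct device allocation, here is a minimal sketch of the general pattern: freed buffers are kept in per-size free lists and reused on the next request of the same size. It is not the Enterprise CachingAllocator; `CachingPool` and its methods are hypothetical, and host `Vec<u8>` buffers stand in for device memory.

```rust
use std::collections::HashMap;

/// Toy caching allocator. Host buffers stand in for device allocations.
pub struct CachingPool {
    /// Free lists keyed by buffer size in bytes.
    cache: HashMap<usize, Vec<Vec<u8>>>,
    /// Number of times the slow path (a fresh allocation) was taken.
    pub backing_allocs: usize,
}

impl CachingPool {
    pub fn new() -> Self {
        Self { cache: HashMap::new(), backing_allocs: 0 }
    }

    /// Returns a cached buffer of exactly `size` bytes if one is available,
    /// otherwise allocates a new one.
    pub fn alloc(&mut self, size: usize) -> Vec<u8> {
        if let Some(buf) = self.cache.get_mut(&size).and_then(|list| list.pop()) {
            return buf; // fast path: no call to the backing allocator
        }
        self.backing_allocs += 1; // slow path
        vec![0u8; size]
    }

    /// Returns a buffer to the pool instead of freeing it.
    pub fn release(&mut self, buf: Vec<u8>) {
        self.cache.entry(buf.len()).or_default().push(buf);
    }
}
```

In steady state, a model that requests the same tensor sizes on every iteration hits only the fast path, so the cost of the backing allocator is paid once per distinct size.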

Execution Modes

Mode                 Use Case                     Characteristics
AOT (Ahead-of-Time)  Production deployment        Fastest startup, smallest binary
JIT (Just-in-Time)   Development, dynamic shapes  Flexible, runtime optimization
Interpreter          Debugging, conformance       Reference implementation

Memory Management

  • Static allocation: Memory planned at compile time for AOT
  • Arena allocator: Fast bump allocation for intermediate tensors
  • Buffer reuse: Automatic sharing of memory between non-overlapping tensors
  • Device memory: Unified API for CPU and GPU memory
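Buffer reuse between non-overlapping tensors can be planned from tensor lifetimes. The sketch below uses a simple greedy strategy, assuming each intermediate tensor is described by the first and last execution step that touches it. The `Lifetime` type and `plan_buffers` function are hypothetical and only illustrate the idea; the runtime's actual planner may work differently.

```rust
/// Lifetime of an intermediate tensor, in steps of the execution order.
#[derive(Clone, Copy, Debug)]
pub struct Lifetime {
    pub first_use: usize,
    pub last_use: usize, // inclusive
    pub bytes: usize,
}

/// Greedily assigns each tensor to a buffer slot.
/// Returns (slot index per tensor, size in bytes of each slot).
pub fn plan_buffers(tensors: &[Lifetime]) -> (Vec<usize>, Vec<usize>) {
    // Visit tensors in order of first use.
    let mut order: Vec<usize> = (0..tensors.len()).collect();
    order.sort_by_key(|&i| tensors[i].first_use);

    let mut slot_of = vec![0usize; tensors.len()];
    let mut slot_bytes: Vec<usize> = Vec::new();
    // For each slot, the last step at which its current occupant is live.
    let mut slot_busy_until: Vec<usize> = Vec::new();

    for i in order {
        let t = tensors[i];
        // Reuse the first slot whose occupant is dead before this tensor starts.
        let reusable = slot_busy_until.iter().position(|&last| last < t.first_use);
        let slot = match reusable {
            Some(s) => s,
            None => {
                slot_bytes.push(0);
                slot_busy_until.push(0);
                slot_bytes.len() - 1
            }
        };
        // A slot must be large enough for its biggest occupant.
        slot_bytes[slot] = slot_bytes[slot].max(t.bytes);
        slot_busy_until[slot] = t.last_use;
        slot_of[i] = slot;
    }
    (slot_of, slot_bytes)
}
```

For a chain of four tensors in which each one is only live alongside its immediate neighbor, this plan needs two slots instead of four separate buffers.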

Determinism Guarantees

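In deterministic mode (the default), the runtime uses strict IEEE 754 arithmetic, avoids threading non-determinism, and seeds its RNG, so identical inputs always produce identical outputs: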

// Create a runtime in deterministic mode (the default)
let rt = Runtime::new(RuntimeConfig {
    deterministic: true,  // IEEE 754 strict, no threading non-determinism
    seed: 42,             // RNG seed for reproducibility
    ..Default::default()  // remaining fields keep their defaults
});

// Same inputs always produce same outputs
let out1 = model.forward(&input);
let out2 = model.forward(&input);
assert_eq!(out1, out2);  // Guaranteed
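One reason threading can break determinism is that floating-point addition is not associative, so a reduction whose order depends on thread scheduling can change the low bits of the result. The standalone sketch below (plain Rust, independent of the MIND API; `sum_in_order` is a hypothetical helper) shows that a fixed order is bitwise reproducible while a different order is not guaranteed to match.

```rust
/// Sums values strictly left to right, so the result depends only on
/// the values and their order.
pub fn sum_in_order(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, &v| acc + v)
}

/// Compares two results bit for bit rather than with a tolerance.
pub fn bitwise_equal(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}
```

With these helpers, summing `[0.1, 0.2, 0.3]` twice gives bitwise-identical results, whereas summing `[0.3, 0.2, 0.1]` gives a value that differs in the last bit.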

Resource Limits

let config = RuntimeConfig {
    max_memory_mb: 1024,      // Memory limit
    max_threads: 4,           // Thread pool size
    timeout_ms: Some(5000),   // Execution timeout
    ..Default::default()
};

let rt = Runtime::new(config);
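How limits like these might be enforced is not specified here. As a rough sketch under that caveat, a runtime could track a byte budget and a deadline and check them before each allocation or operation. The `Budget` type and `LimitError` enum below are hypothetical and are not part of the MIND API.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum LimitError {
    OutOfMemory { requested: usize, available: usize },
    Timeout,
}

/// Tracks memory use against a cap and wall-clock time against a deadline.
pub struct Budget {
    max_bytes: usize,
    used_bytes: usize,
    deadline: Option<Instant>,
}

impl Budget {
    pub fn new(max_memory_mb: usize, timeout_ms: Option<u64>) -> Self {
        Self {
            max_bytes: max_memory_mb * 1024 * 1024,
            used_bytes: 0,
            deadline: timeout_ms.map(|ms| Instant::now() + Duration::from_millis(ms)),
        }
    }

    /// Reserves `bytes` if they fit under the cap; otherwise reports the shortfall.
    pub fn reserve(&mut self, bytes: usize) -> Result<(), LimitError> {
        let available = self.max_bytes - self.used_bytes;
        if bytes > available {
            return Err(LimitError::OutOfMemory { requested: bytes, available });
        }
        self.used_bytes += bytes;
        Ok(())
    }

    /// Fails once the deadline has passed; always succeeds when no timeout is set.
    pub fn check_deadline(&self) -> Result<(), LimitError> {
        match self.deadline {
            Some(deadline) if Instant::now() >= deadline => Err(LimitError::Timeout),
            _ => Ok(()),
        }
    }
}
```

Returning an error instead of panicking lets the application decide whether to retry with a smaller batch, raise the limit, or abort.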

Profiling

// Enable profiling
let rt = Runtime::new(RuntimeConfig {
    profile: true,
    ..Default::default()
});

model.forward(&input);

// Get profile data and print the duration of each operation
let profile = rt.get_profile();
for op in &profile.operations {
    println!("{}: {}ms", op.name, op.duration_ms);
}
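For intuition about what per-operation profile data involves, the sketch below times closures with `std::time::Instant` and records a name and duration for each. The `Profiler` and `OpTiming` types are hypothetical stand-ins that mirror the `name` and `duration_ms` fields used above; they are not the runtime's own profiler.

```rust
use std::time::Instant;

/// One recorded operation.
pub struct OpTiming {
    pub name: String,
    pub duration_ms: f64,
}

/// Collects timings in the order the operations ran.
#[derive(Default)]
pub struct Profiler {
    pub operations: Vec<OpTiming>,
}

impl Profiler {
    /// Runs `op`, records how long it took, and returns its result.
    pub fn record<T>(&mut self, name: &str, op: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let result = op();
        let duration_ms = start.elapsed().as_secs_f64() * 1000.0;
        self.operations.push(OpTiming { name: name.to_string(), duration_ms });
        result
    }
}
```

Wrapping each operation this way adds two clock reads per call, which is one reason profiling is usually opt-in rather than always on.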

Learn More

See the full runtime specification at mind-spec/runtime.md. The runtime is available as part of MIND Enterprise.