WebGPU Benchmark
GEMM — Matrix Multiplication
Apples-to-apples comparison: MindLang's AOT-compiled WGSL shader vs ONNX Runtime Web's WebGPU backend. Same operation (1024×1024 GEMM), same GPU, same browser — roughly 2.1 GFLOP of floating-point work per run.
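The per-run work figure follows directly from the GEMM size: each of the 1024² outputs needs 1024 multiply-add pairs. A quick sanity check (illustrative helper, not part of the benchmark code):

```typescript
// FLOP count for an N×N×N GEMM: each of the N² outputs needs
// N multiplies and N adds, i.e. 2·N³ floating-point operations.
function gemmFlops(n: number): number {
  return 2 * n ** 3;
}

console.log(gemmFlops(1024));       // 2147483648
console.log(gemmFlops(1024) / 1e9); // ≈ 2.15 GFLOP per run
```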
AOT-compiled WGSL via mindc --target webgpu. Fetches a pre-compiled gemm.wgsl with 8×4 register tiling, vec4 loads, and bank-conflict-free shared memory, and dispatches 128×64 output tiles via 16×16 workgroups.
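The tile and workgroup numbers above are self-consistent: a 16×16 workgroup has 256 threads, each producing an 8×4 register tile, which exactly covers one 128×64 output tile. A sketch of the dispatch arithmetic (constants taken from the text; variable names are illustrative):

```typescript
// Dispatch geometry for the 1024×1024 GEMM described above.
const N = 1024;
const TILE_M = 128, TILE_N = 64;  // output tile per workgroup
const THREAD_M = 8, THREAD_N = 4; // register tile per thread
const WG_X = 16, WG_Y = 16;       // workgroup size (256 threads)

// Threads × per-thread outputs must cover the workgroup's tile.
const outputsPerWorkgroup = WG_X * WG_Y * THREAD_M * THREAD_N;
console.log(outputsPerWorkgroup);          // 8192 = 128 × 64

// Workgroups dispatched across the output matrix.
const groupsX = N / TILE_N;                // 16
const groupsY = N / TILE_M;                // 8
console.log(groupsX * groupsY);            // 128 workgroups total
```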
ONNX Runtime Web 1.21 with WebGPU execution provider. Loads a static-shape MatMul model (matching the selected size), creates InferenceSession, and runs inference.
Source Code
MindLang source — tiled GEMM kernel with static tensor types, shared memory tiles, and workgroup barrier synchronization.
Compiled WGSL output from mindc. This is exactly what runs on your GPU — no runtime JIT, no graph construction overhead.
MIND project manifest. Declares the WebGPU target, 16x16 workgroup size, and high-performance power preference.
Methodology
Both paths perform the identical mathematical operation: C = A × B, where A, B, and C are 1024×1024 f32 matrices. Each run includes one warmup dispatch (not counted) followed by five timed dispatches, with queue.onSubmittedWorkDone() synchronization between consecutive dispatches.
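The measurement protocol reduces to a simple rule: discard the warmup sample, then average the timed ones. A minimal sketch of that reduction (illustrative names, not the benchmark's actual harness; real timings come from queue.onSubmittedWorkDone()-bracketed dispatches):

```typescript
// Given per-dispatch wall times in ms, drop the warmup run(s)
// and average the remaining timed dispatches.
function averageDispatchMs(samples: number[], warmup = 1): number {
  const timed = samples.slice(warmup);
  return timed.reduce((a, b) => a + b, 0) / timed.length;
}

// One warmup + five timed dispatches, as in the methodology above.
averageDispatchMs([10, 4, 4, 5, 4, 3]); // → 4
```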
MindLang uses a pre-compiled WGSL compute shader fetched from /bench/gemm/gemm.wgsl. The shader uses 8×4 register tiling (32 FMAs per inner-loop iteration), bank-conflict-free shared memory (stride-17 padding), and vec4 vectorized loads. Shader compile time is measured separately and not included in the dispatch average.
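Why stride-17 padding removes bank conflicts: assuming 32 four-byte shared-memory banks (typical desktop-GPU hardware; WGSL itself does not expose banks), a column walk through a 16-wide tile hits bank (row · stride) mod 32. With stride 16 every row maps to only two banks; with stride 17 all 16 rows land in distinct banks. A small check of that arithmetic:

```typescript
// Count distinct banks touched by a column access through a padded tile,
// assuming 32 four-byte banks (a common GPU shared-memory layout).
const BANKS = 32;
const banksHit = (stride: number, rows = 16): number =>
  new Set(Array.from({ length: rows }, (_, r) => (r * stride) % BANKS)).size;

console.log(banksHit(16)); // 2  — rows pile onto banks {0, 16}: conflicts
console.log(banksHit(17)); // 16 — every row in its own bank: conflict-free
```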
ONNX Runtime Web v1.21 uses the WebGPU execution provider. A static-shape MatMul ONNX model matching the selected size is loaded, giving ONNX RT full opportunity to specialize its kernel. Session init time (including ONNX graph compilation to WGSL) is measured separately.
The “Include Compile” toggle amortizes each side's compile/init cost across all timed runs: effective = (compile + Σ dispatch) / N. MindLang's “compile” is fetching a pre-built WGSL file and creating a pipeline; ONNX RT's init includes runtime WGSL shader generation from the ONNX graph.
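The amortization formula can be restated as a small pure function (illustrative names and numbers, not the benchmark's actual code):

```typescript
// effective = (compile + Σ dispatch) / N, where N is the number
// of timed dispatches the one-time compile/init cost is spread over.
function effectiveMs(compileMs: number, dispatchMs: number[]): number {
  const total = compileMs + dispatchMs.reduce((a, b) => a + b, 0);
  return total / dispatchMs.length;
}

// e.g. 50 ms of init amortized over five 4 ms dispatches:
effectiveMs(50, [4, 4, 4, 4, 4]); // → 14
```

This is why the toggle matters: a side with fast dispatches but a heavy one-time init looks worse as N shrinks and better as N grows.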
Results vary by GPU, driver version, browser, and system load. Requires Chrome 113+, Edge 113+, or another browser with WebGPU enabled.