Running Benchmarks
Learn how to run MIND's performance benchmarks and verify the results yourself.
Prerequisites
```bash
# Clone the MIND repository
git clone https://github.com/star-ga/mind.git
cd mind

# Build MIND in release mode
cargo build --release
```
Determinism Benchmark
Verify that MIND produces bit-identical compilation output.
```bash
python3 benchmarks/determinism/benchmark_determinism.py
```
Expected Output
```
SUMMARY: 4/4 tests DETERMINISTIC ✅
DETERMINISM VERIFIED: All outputs are bit-identical across runs
```
What it tests: 4 different programs (scalar_math, small_matmul, medium_matmul, mlp), 10 compilation runs per program, with SHA256 hash comparison of each output. A program counts as deterministic only if 100% of its hashes are identical.
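The hash comparison at the core of this check can be sketched as follows. This is a simplified illustration of the technique, not the actual benchmark script:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """SHA256 hex digest of one compilation run's output bytes."""
    return hashlib.sha256(data).hexdigest()

def is_deterministic(outputs: list[bytes]) -> bool:
    """True iff every run produced bit-identical output.

    Collecting the digests into a set means a single differing byte
    in any run yields more than one unique hash, which is a failure.
    """
    return len({sha256_hex(o) for o in outputs}) == 1
```

In the real benchmark, each element of `outputs` would be the compiler's serialized output from one of the 10 runs of a given program.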
PyTorch Comparison Benchmark
Compare MIND frontend compilation speed vs PyTorch torch.compile() (GPU, RTX 3080).
```bash
# Install PyTorch if needed
pip install torch

# Run comparison
python3 benchmarks/pytorch_comparison/benchmark_pytorch_compile.py
```
Expected Output
```
MIND (Criterion):                     1.8-15.5 µs
PyTorch torch.compile GPU (RTX 3080): 99-878 ms
Ratio:                                35,000-176,000×
```
Note: MIND measures frontend only (parse + typecheck + IR). PyTorch's `torch.compile()` on GPU measures the full compilation pipeline, including Triton/cuBLAS kernel generation. These are different scopes of work.
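The cold-start measurement pattern used for `torch.compile()` (and for the Mojo and JAX comparisons below) can be sketched generically. Here `compile_fn` is a stand-in for whatever call triggers the framework's compilation; this is an illustration of the timing approach, not the benchmark script itself:

```python
import time

def time_cold_start_ms(compile_fn) -> float:
    """Wall-clock time of a single cold-start invocation, in milliseconds.

    For torch.compile, kernel generation happens on the *first* call of
    the compiled function, so timing that first call captures the full
    compile latency rather than steady-state execution time.
    """
    start = time.perf_counter()
    compile_fn()
    return (time.perf_counter() - start) * 1e3
```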
Mojo Comparison Benchmark
Compare MIND frontend compilation speed vs Mojo full LLVM compilation.
Results (February 2026)
```
MIND (Criterion):         1.8-6.1 µs
Mojo 0.26.1 (mojo build): 810-829 ms
Ratio:                    135,000-458,000×
```
Note: MIND measures frontend only (parse + typecheck + IR). Mojo's `mojo build` performs full LLVM compilation to a native binary. Different scopes of work.
JAX Comparison Benchmark
Compare MIND frontend compilation speed vs JAX cold-start XLA compilation.
Results (February 2026)
```
MIND (Criterion):                 1.8-6.1 µs
JAX 0.9 (jax.jit cold-start XLA): 37.5-360.5 ms
Ratio:                            21,200-95,100×
```
Note: MIND measures frontend only (parse + typecheck + IR). JAX's `jax.jit()` performs full XLA compilation (HLO lowering + optimization + code generation). The compilation cache was disabled via `JAX_ENABLE_COMPILATION_CACHE=0`. Different scopes of work.
Real Compilation Time (Criterion)
Measure MIND's true compilation time with in-process Criterion benchmarks.
```bash
# Run Criterion benchmarks (in-process, no subprocess overhead)
cargo bench --bench compiler
cargo bench --bench simple_benchmarks
```
Expected Output (v0.2.1+)
```
compiler_pipeline/parse_typecheck_ir/small_matmul
                        time:   [1.75 µs 1.77 µs 1.80 µs]
compiler_pipeline/parse_typecheck_ir/medium_mlp
                        time:   [2.85 µs 2.88 µs 2.92 µs]
compiler_pipeline/parse_typecheck_ir/large_network
                        time:   [4.68 µs 4.75 µs 4.83 µs]
```
These are in-process Criterion benchmarks: no process spawning, no FFI overhead. Results may vary by ±10% depending on hardware.
GPU Benchmarks (Enterprise)
The Enterprise runtime includes CUDA GPU benchmarks. Contact sales for access to:
- Memory allocation: CachingAllocator vs cudaMalloc (180x improvement)
- MatMul performance: cuBLAS with TF32/FP16 Tensor Cores (35-40% faster than PyTorch)
- Elementwise operations: float4 vectorized kernels (98% bandwidth utilization)
- Supported GPUs: NVIDIA SM_80+ (Ampere, Ada Lovelace, Hopper)
See Enterprise for licensing details.
Understanding the Results
Why Python Bindings?
The Python bindings (PyO3) allow calling the Rust compiler directly from Python, eliminating:
- Process spawning overhead (~2-3 ms)
- Inter-process communication (~1-2 ms)
- Total overhead: ~5 ms
This reveals MIND's true compilation performance: 1.8-15.5 µs (varies by machine).
Subprocess vs Direct Call
`subprocess.run("mind compile")`
- Spawn process: ~2-3 ms
- IPC overhead: ~1-2 ms
- Actual compile: 1.8-15.5 µs
- TOTAL: ~5 ms

`mind.compile()` (Python binding)
- Direct function call: ~0 µs
- Actual compile: 1.8-15.5 µs
- TOTAL: 1.8-15.5 µs
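The gap in the breakdown above is easy to demonstrate for yourself. The sketch below times a trivial subprocess against a trivial direct call; `sys.executable -c pass` stands in for `mind compile`, since process-spawn cost dominates regardless of what the child process does:

```python
import subprocess
import sys
import time

def best_of_ms(fn, runs: int = 5) -> float:
    """Minimum wall-clock time of fn() over several runs, in milliseconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return min(times)

# Subprocess path: pays process spawn + IPC on every invocation.
spawn_ms = best_of_ms(
    lambda: subprocess.run([sys.executable, "-c", "pass"], capture_output=True)
)

# Direct-call path: an ordinary function call, no process boundary.
direct_ms = best_of_ms(lambda: None)

print(f"subprocess: {spawn_ms:.2f} ms, direct call: {direct_ms:.5f} ms")
```

On a typical machine the subprocess path costs milliseconds while the direct call costs well under a microsecond, which is why subprocess-based timing hides microsecond-scale compile times.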
Benchmark Methodology
Same-Machine Testing
All comparisons performed on identical hardware:
- Same CPU, RAM, OS
- Same Python version
- Sequential testing (no parallel interference)
- Controlled environment
Statistical Rigor
- Warmup: 10 runs (eliminate cold-start)
- Sample size: 100 measurements
- Outlier detection: Tukey's method
- Confidence intervals: 95% CI
- Precision: Nanosecond resolution (perf_counter)
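The outlier-rejection and confidence-interval steps can be sketched with the Python standard library. This is a minimal illustration of the methodology, not the benchmark harness itself, and the CI uses a normal approximation that is only reasonable for large sample sizes:

```python
import math
import statistics

def tukey_filter(samples: list[float]) -> list[float]:
    """Keep samples inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if lo <= s <= hi]

def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Approximate 95% confidence interval for the mean (large-n normal approximation)."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - 1.96 * sem, mean + 1.96 * sem
```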
Determinism Verification
- SHA256 hashing: Cryptographic-strength verification
- Byte-level comparison: Exact binary match
- Multiple runs: 10+ per test
- Zero tolerance: Any mismatch = failure
Reproducing Published Results
The published benchmark results are from:
| Item | Value |
|---|---|
| Date | February 2026 |
| Platform | Ubuntu 24.04, Linux 6.17, x86_64 |
| GPU | RTX 3080, CUDA 12.8 |
| PyTorch | 2.10.0+cu128 |
| JAX | 0.9.0.1 |
| Mojo | 0.26.1.0 |
To reproduce exactly:
```bash
cargo build --release
# Run benchmarks as shown above
```
Results should be within ±10% due to hardware differences.
MIC/MAP Format Benchmark
Compare MIC format efficiency against JSON, TOML, and TOON.
```bash
cd benchmarks
python3 format_benchmark.py
```
Token Efficiency Results
| Format | Tokens | vs JSON | Parse Speed | Annual Cost (1M IRs) |
|---|---|---|---|---|
| JSON | 278 | baseline | 5.31 µs | $487 |
| TOML | 151 | 1.8× | 137.06 µs | $264 |
| TOON | 67 | 4.1× | 2.67 µs | $117 |
| MIC | 52 | 5.3× | 2.26 µs | $91 |
MIC saves $396/year per million IR operations vs JSON at GPT-5.2 pricing ($0.00175/1K input tokens).
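The annual-cost column follows directly from the token counts. A quick back-of-the-envelope check using only the figures stated above (token counts, $0.00175/1K input tokens, one million IR operations per year):

```python
PRICE_PER_1K_TOKENS = 0.00175  # GPT-5.2 input pricing, per the text above
OPS_PER_YEAR = 1_000_000       # one million IR operations

def annual_cost(tokens_per_op: int) -> float:
    """Annual input-token cost in dollars for one serialization format."""
    total_tokens = tokens_per_op * OPS_PER_YEAR
    return total_tokens / 1_000 * PRICE_PER_1K_TOKENS

for fmt, tokens in {"JSON": 278, "TOML": 151, "TOON": 67, "MIC": 52}.items():
    print(f"{fmt:4s} ${annual_cost(tokens):7.2f}/year")

# JSON vs MIC: the savings figure quoted in the text
savings = annual_cost(278) - annual_cost(52)
```

`annual_cost(278)` works out to $486.50 (rounded to $487 in the table) and `annual_cost(52)` to exactly $91.00, so the quoted savings of roughly $396/year checks out.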
MAP vs JSON-RPC
| Protocol | Size | Tokens | vs JSON-RPC |
|---|---|---|---|
| JSON-RPC | 1,004 bytes | 251 | baseline |
| MAP | 234 bytes | 58 | 4.3× fewer tokens |
Next Steps
- View Full Results — Complete benchmark data
- Performance Overview — Understand the performance characteristics
- Performance FAQ — Common questions answered