Running Benchmarks

Learn how to run MIND's performance benchmarks and verify the results yourself.

Prerequisites

# Clone the MIND repository
git clone https://github.com/star-ga/mind.git
cd mind

# Build MIND in release mode
cargo build --release

Determinism Benchmark

Verify that MIND produces bit-identical compilation output.

python3 benchmarks/determinism/benchmark_determinism.py

Expected Output

SUMMARY: 4/4 tests DETERMINISTIC
✅ DETERMINISM VERIFIED: All outputs are bit-identical across runs

What it tests: four programs (scalar_math, small_matmul, medium_matmul, mlp), each compiled 10 times, with SHA256 hash comparison of every output. All hashes identical = deterministic.
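The core of the check can be sketched in a few lines. This is an illustrative version, not the benchmark script itself; the hypothetical compile_fn stands in for one invocation of the MIND compiler that returns its output bytes.

```python
import hashlib

def output_digest(data: bytes) -> str:
    """SHA256 hex digest of one compilation's output bytes."""
    return hashlib.sha256(data).hexdigest()

def is_deterministic(compile_fn, runs: int = 10) -> bool:
    """Run the compiler `runs` times; deterministic iff every
    output produces the same SHA256 digest (bit-identical)."""
    digests = {output_digest(compile_fn()) for _ in range(runs)}
    return len(digests) == 1
```

Any single differing byte changes the digest, so a set of size one is equivalent to a byte-level match across all runs.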

PyTorch Comparison Benchmark

Compare MIND frontend compilation speed vs PyTorch torch.compile() (GPU, RTX 3080).

# Install PyTorch if needed
pip install torch

# Run comparison
python3 benchmarks/pytorch_comparison/benchmark_pytorch_compile.py

Expected Output

MIND (Criterion):                1.8-15.5 µs
PyTorch torch.compile GPU (RTX 3080): 99-878 ms
Ratio:                           35,000-176,000×

Note: MIND measures frontend only (parse + typecheck + IR). PyTorch torch.compile() on GPU measures the full compilation pipeline including Triton/cuBLAS kernel generation. These are different scopes of work.

Mojo Comparison Benchmark

Compare MIND frontend compilation speed vs Mojo full LLVM compilation.

Results (February 2026)

MIND (Criterion):           1.8-6.1 µs
Mojo 0.26.1 (mojo build):  810-829 ms
Ratio:                      135,000-458,000×

Note: MIND measures frontend only (parse + typecheck + IR). Mojo's mojo build performs full LLVM compilation to a native binary. Different scopes of work.

JAX Comparison Benchmark

Compare MIND frontend compilation speed vs JAX cold-start XLA compilation.

Results (February 2026)

MIND (Criterion):                   1.8-6.1 µs
JAX 0.9 (jax.jit cold-start XLA):  37.5-360.5 ms
Ratio:                              21,200-95,100×

Note: MIND measures frontend only (parse + typecheck + IR). JAX's jax.jit() performs full XLA compilation (HLO lowering + optimization + code generation), with the cache disabled via JAX_ENABLE_COMPILATION_CACHE=0. Different scopes of work.

Real Compilation Time (Criterion)

Measure MIND's true compilation time with in-process Criterion benchmarks.

# Run Criterion benchmarks (in-process, no subprocess overhead)
cargo bench --bench compiler
cargo bench --bench simple_benchmarks

Expected Output (v0.2.1+)

compiler_pipeline/parse_typecheck_ir/small_matmul
                        time:   [1.75 µs 1.77 µs 1.80 µs]
compiler_pipeline/parse_typecheck_ir/medium_mlp
                        time:   [2.85 µs 2.88 µs 2.92 µs]
compiler_pipeline/parse_typecheck_ir/large_network
                        time:   [4.68 µs 4.75 µs 4.83 µs]

In-process Criterion benchmarks — no process spawning, no FFI overhead. Results may vary ±10% by hardware.

GPU Benchmarks (Enterprise)

The Enterprise runtime includes CUDA GPU benchmarks. Contact sales for access to:

  • Memory allocation: CachingAllocator vs cudaMalloc (180x improvement)
  • MatMul performance: cuBLAS with TF32/FP16 Tensor Cores (35-40% faster than PyTorch)
  • Elementwise operations: float4 vectorized kernels (98% bandwidth utilization)
  • Supported GPUs: NVIDIA SM_80+ (Ampere, Ada Lovelace, Hopper)

See Enterprise for licensing details.

Understanding the Results

Why Python Bindings?

The Python bindings (PyO3) allow calling the Rust compiler directly from Python, eliminating:

  • Process spawning overhead (~2-3 ms)
  • Inter-process communication (~1-2 ms)
  • Total overhead: ~5 ms

This reveals MIND's true compilation performance: 1.8-15.5 µs (varies by machine).

Subprocess vs Direct Call

subprocess.run(["mind", "compile"])

  • Spawn process: ~2-3 ms
  • IPC overhead: ~1-2 ms
  • Actual compile: 1.8-15.5 µs
  • TOTAL: ~5 ms

mind.compile() (Python binding)

  • Direct function call: ~0 µs
  • Actual compile: 1.8-15.5 µs
  • TOTAL: 1.8-15.5 µs
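The difference is easy to demonstrate with a timing harness. This is a hypothetical sketch (time_call is not part of the benchmark suite): pass it a direct binding call such as mind.compile on one hand, and a subprocess wrapper on the other, and compare the medians.

```python
import statistics
import time

def time_call(fn, warmup: int = 10, samples: int = 100) -> float:
    """Median wall-clock time of fn() in seconds, measured with the
    nanosecond-resolution perf_counter after discarding warmup runs."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

For a subprocess-based target, the measured time includes process spawn and IPC; for an in-process binding, only the compile itself remains.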

Benchmark Methodology

Same-Machine Testing

All comparisons were performed on identical hardware:

  • Same CPU, RAM, OS
  • Same Python version
  • Sequential testing (no parallel interference)
  • Controlled environment

Statistical Rigor

  • Warmup: 10 runs (eliminate cold-start)
  • Sample size: 100 measurements
  • Outlier detection: Tukey's method
  • Confidence intervals: 95% CI
  • Precision: Nanosecond resolution (perf_counter)
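The outlier-rejection and confidence-interval steps above can be sketched with the standard library. This is an illustrative implementation (using Tukey's fences and a normal-approximation CI), not the benchmark suite's actual code.

```python
import statistics

def tukey_filter(samples, k: float = 1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if lo <= x <= hi]

def confidence_interval_95(samples):
    """Normal-approximation 95% CI for the sample mean."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return (mean - 1.96 * sem, mean + 1.96 * sem)
```

Filtering before computing the interval keeps a single OS-scheduler hiccup from inflating the reported range.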

Determinism Verification

  • SHA256 hashing: Cryptographic-strength verification
  • Byte-level comparison: Exact binary match
  • Multiple runs: 10+ per test
  • Zero tolerance: Any mismatch = failure

Reproducing Published Results

The published benchmark results are from:

Date:      February 2026
Platform:  Ubuntu 24.04, Linux 6.17, x86_64
GPU:       RTX 3080, CUDA 12.8
PyTorch:   2.10.0+cu128
JAX:       0.9.0.1
Mojo:      0.26.1.0

To reproduce exactly:

cargo build --release
# Run benchmarks as shown above

Results should be within ±10% due to hardware differences.

MIC/MAP Format Benchmark

Compare MIC format efficiency against JSON, TOML, and TOON.

cd benchmarks
python3 format_benchmark.py

Token Efficiency Results

Format   Tokens   vs JSON    Parse Speed   Annual Cost (1M IRs)
JSON     278      baseline   5.31 µs       $487
TOML     151      1.8x       137.06 µs     $264
TOON     67       4.1x       2.67 µs       $117
MIC      52       5.3x       2.26 µs       $91
MIC saves $396/year per million IR operations vs JSON at GPT-5.2 pricing ($0.00175/1K input tokens).
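The savings figure follows directly from the token counts and the pricing stated above; a quick arithmetic sketch:

```python
PRICE_PER_1K_TOKENS = 0.00175  # GPT-5.2 input pricing, per the text above
IR_OPS_PER_YEAR = 1_000_000

def annual_cost(tokens_per_ir: int) -> float:
    """Annual input-token cost for 1M IR operations in dollars."""
    total_tokens = tokens_per_ir * IR_OPS_PER_YEAR
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

json_cost = annual_cost(278)   # ~ $487 (486.50)
mic_cost = annual_cost(52)     # ~ $91
savings = json_cost - mic_cost # ~ $396 (395.50)
```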

MAP vs JSON-RPC

Protocol   Size          Tokens   vs JSON-RPC
JSON-RPC   1,004 bytes   251      baseline
MAP        234 bytes     58       4.3x fewer tokens

Next Steps