Running Benchmarks
Learn how to run MIND's performance benchmarks and verify the results yourself.
Prerequisites
```bash
# Clone the MIND repository
git clone https://github.com/star-ga/mind.git
cd mind

# Build MIND in release mode
cargo build --release
```
Determinism Benchmark
Verify that MIND produces bit-identical compilation output.
```bash
python3 benchmarks/determinism/benchmark_determinism.py
```
Expected Output
```
SUMMARY: 4/4 tests DETERMINISTIC ✅
DETERMINISM VERIFIED: All outputs are bit-identical across runs
```
What it tests: 4 different programs (scalar_math, small_matmul, medium_matmul, mlp), 10 compilation runs per program, with SHA256 hash comparison of each output. A program counts as deterministic only if 100% of its hashes are identical.
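The hash comparison at the core of this check can be sketched as follows. This is a simplified illustration of the technique, not the actual benchmark script:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """SHA256 hex digest of one compilation run's output bytes."""
    return hashlib.sha256(data).hexdigest()

def is_deterministic(outputs: list[bytes]) -> bool:
    """True iff every run produced bit-identical output.

    Collecting the digests into a set means a single differing byte
    in any run yields more than one unique hash, which is a failure.
    """
    return len({sha256_hex(o) for o in outputs}) == 1
```

In the real benchmark, each element of `outputs` would be the compiler's serialized output from one of the 10 runs of a given program.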
PyTorch Comparison Benchmark
Compare MIND frontend compilation speed vs PyTorch torch.compile() (GPU, RTX 3080).
```bash
# Install PyTorch if needed
pip install torch

# Run comparison
python3 benchmarks/pytorch_comparison/benchmark_pytorch_compile.py
```
Expected Output
```
MIND (Criterion):                     1.8-15.5 µs
PyTorch torch.compile GPU (RTX 3080): 99-878 ms
Ratio:                                35,000-176,000×
```
Note: MIND measures frontend only (parse + typecheck + IR). PyTorch's `torch.compile()` on GPU measures the full compilation pipeline, including Triton/cuBLAS kernel generation. These are different scopes of work.
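The cold-start measurement pattern used for `torch.compile()` (and for the Mojo and JAX comparisons below) can be sketched generically. Here `compile_fn` is a stand-in for whatever call triggers the framework's compilation; this is an illustration of the timing approach, not the benchmark script itself:

```python
import time

def time_cold_start_ms(compile_fn) -> float:
    """Wall-clock time of a single cold-start invocation, in milliseconds.

    For torch.compile, kernel generation happens on the *first* call of
    the compiled function, so timing that first call captures the full
    compile latency rather than steady-state execution time.
    """
    start = time.perf_counter()
    compile_fn()
    return (time.perf_counter() - start) * 1e3
```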
Mojo Comparison Benchmark
Compare MIND frontend compilation speed vs Mojo full LLVM compilation.
Results (February 2026)
```
MIND (Criterion):         1.8-6.1 µs
Mojo 0.26.1 (mojo build): 810-829 ms
Ratio:                    135,000-458,000×
```
Note: MIND measures frontend only (parse + typecheck + IR). Mojo's `mojo build` performs full LLVM compilation to a native binary. Different scopes of work.
JAX Comparison Benchmark
Compare MIND frontend compilation speed vs JAX cold-start XLA compilation.
Results (February 2026)
```
MIND (Criterion):                 1.8-6.1 µs
JAX 0.9 (jax.jit cold-start XLA): 37.5-360.5 ms
Ratio:                            21,200-95,100×
```
Note: MIND measures frontend only (parse + typecheck + IR). JAX's `jax.jit()` performs full XLA compilation (HLO lowering + optimization + code generation). The compilation cache was disabled via `JAX_ENABLE_COMPILATION_CACHE=0`. Different scopes of work.
Real Compilation Time (Criterion)
Measure MIND's true compilation time with in-process Criterion benchmarks.
```bash
# Run Criterion benchmarks (in-process, no subprocess overhead)
cargo bench --bench compiler
cargo bench --bench simple_benchmarks
```
Expected Output (v0.2.1+)
```
compiler_pipeline/parse_typecheck_ir/small_matmul
                        time:   [1.75 µs 1.77 µs 1.80 µs]
compiler_pipeline/parse_typecheck_ir/medium_mlp
                        time:   [2.85 µs 2.88 µs 2.92 µs]
compiler_pipeline/parse_typecheck_ir/large_network
                        time:   [4.68 µs 4.75 µs 4.83 µs]
```
These are in-process Criterion benchmarks: no process spawning, no FFI overhead. Results may vary by ±10% depending on hardware.
GPU Benchmarks (Enterprise)
The Enterprise runtime includes CUDA GPU benchmarks. Contact sales for access to:
- Memory allocation: CachingAllocator vs cudaMalloc (180x improvement)
- MatMul performance: cuBLAS with TF32/FP16 Tensor Cores (35-40% faster than PyTorch)
- Elementwise operations: float4 vectorized kernels (98% bandwidth utilization)
- Supported GPUs: NVIDIA SM_80+ (Ampere, Ada Lovelace, Hopper)
See Enterprise for licensing details.
Understanding the Results
Why Python Bindings?
The Python bindings (PyO3) allow calling the Rust compiler directly from Python, eliminating:
- Process spawning overhead (~2-3 ms)
- Inter-process communication (~1-2 ms)
- Total overhead: ~5 ms
This reveals MIND's true compilation performance: 1.8-15.5 µs (varies by machine).
Subprocess vs Direct Call
`subprocess.run("mind compile")`
- Spawn process: ~2-3 ms
- IPC overhead: ~1-2 ms
- Actual compile: 1.8-15.5 µs
- TOTAL: ~5 ms

`mind.compile()` (Python binding)
- Direct function call: ~0 µs
- Actual compile: 1.8-15.5 µs
- TOTAL: 1.8-15.5 µs
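The gap in the breakdown above is easy to demonstrate for yourself. The sketch below times a trivial subprocess against a trivial direct call; `sys.executable -c pass` stands in for `mind compile`, since process-spawn cost dominates regardless of what the child process does:

```python
import subprocess
import sys
import time

def best_of_ms(fn, runs: int = 5) -> float:
    """Minimum wall-clock time of fn() over several runs, in milliseconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return min(times)

# Subprocess path: pays process spawn + IPC on every invocation.
spawn_ms = best_of_ms(
    lambda: subprocess.run([sys.executable, "-c", "pass"], capture_output=True)
)

# Direct-call path: an ordinary function call, no process boundary.
direct_ms = best_of_ms(lambda: None)

print(f"subprocess: {spawn_ms:.2f} ms, direct call: {direct_ms:.5f} ms")
```

On a typical machine the subprocess path costs milliseconds while the direct call costs well under a microsecond, which is why subprocess-based timing hides microsecond-scale compile times.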
Benchmark Methodology
Same-Machine Testing
All comparisons performed on identical hardware:
- Same CPU, RAM, OS
- Same Python version
- Sequential testing (no parallel interference)
- Controlled environment
Statistical Rigor
- Warmup: 10 runs (eliminate cold-start)
- Sample size: 100 measurements
- Outlier detection: Tukey's method
- Confidence intervals: 95% CI
- Precision: Nanosecond resolution (perf_counter)
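The outlier-rejection and confidence-interval steps can be sketched with the Python standard library. This is a minimal illustration of the methodology, not the benchmark harness itself, and the CI uses a normal approximation that is only reasonable for large sample sizes:

```python
import math
import statistics

def tukey_filter(samples: list[float]) -> list[float]:
    """Keep samples inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if lo <= s <= hi]

def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Approximate 95% confidence interval for the mean (large-n normal approximation)."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - 1.96 * sem, mean + 1.96 * sem
```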
Determinism Verification
- SHA256 hashing: Cryptographic-strength verification
- Byte-level comparison: Exact binary match
- Multiple runs: 10+ per test
- Zero tolerance: Any mismatch = failure
Reproducing Published Results
The published benchmark results are from:
| Item | Value |
|---|---|
| Date | February 2026 |
| Platform | Ubuntu 24.04, Linux 6.17, x86_64 |
| GPU | RTX 3080, CUDA 12.8 |
| PyTorch | 2.10.0+cu128 |
| JAX | 0.9.0.1 |
| Mojo | 0.26.1.0 |
To reproduce exactly:
```bash
cargo build --release
# Run benchmarks as shown above
```
Results should be within ±10% due to hardware differences.
MIC/MAP Format Benchmark
Compare MIC format efficiency against JSON, TOML, and TOON.
```bash
cd benchmarks
python3 format_benchmark.py
```
Token Efficiency Results
| Format | Tokens | vs JSON | Parse Speed | Annual Cost (1M IRs) |
|---|---|---|---|---|
| JSON | 278 | baseline | 5.31 µs | $487 |
| TOML | 151 | 1.8× | 137.06 µs | $264 |
| TOON | 67 | 4.1× | 2.67 µs | $117 |
| MIC | 52 | 5.3× | 2.26 µs | $91 |
MIC saves $396/year per million IR operations vs JSON at GPT-5.2 pricing ($0.00175/1K input tokens).
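The annual-cost column follows directly from the token counts. A quick back-of-the-envelope check using only the figures stated above (token counts, $0.00175/1K input tokens, one million IR operations per year):

```python
PRICE_PER_1K_TOKENS = 0.00175  # GPT-5.2 input pricing, per the text above
OPS_PER_YEAR = 1_000_000       # one million IR operations

def annual_cost(tokens_per_op: int) -> float:
    """Annual input-token cost in dollars for one serialization format."""
    total_tokens = tokens_per_op * OPS_PER_YEAR
    return total_tokens / 1_000 * PRICE_PER_1K_TOKENS

for fmt, tokens in {"JSON": 278, "TOML": 151, "TOON": 67, "MIC": 52}.items():
    print(f"{fmt:4s} ${annual_cost(tokens):7.2f}/year")

# JSON vs MIC: the savings figure quoted in the text
savings = annual_cost(278) - annual_cost(52)
```

`annual_cost(278)` works out to $486.50 (rounded to $487 in the table) and `annual_cost(52)` to exactly $91.00, so the quoted savings of roughly $396/year checks out.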
MAP vs JSON-RPC
| Protocol | Size | Tokens | vs JSON-RPC |
|---|---|---|---|
| JSON-RPC | 1,004 bytes | 251 | baseline |
| MAP | 234 bytes | 58 | 4.3× fewer tokens |
Next Steps
- View Full Results — Complete benchmark data
- Performance Overview — Understand the performance characteristics
- Performance FAQ — Common questions answered