Performance FAQ

Common questions about MIND's performance characteristics.

Compilation Speed

How fast is MIND compilation?

1.8-15.5 microseconds for the frontend (parse + typecheck + IR), measured in-process via Rust Criterion benchmarks. Time scales with program complexity: a single expression takes ~1.8 µs, a 3-layer MLP takes ~15.5 µs. This does not include code generation or linking.

How does this compare to other frameworks?

| Framework | What's Measured | Time |
| --- | --- | --- |
| MIND | Frontend (parse + typecheck + IR) | 1.8-15.5 µs |
| PyTorch 2.10 (GPU) | Full pipeline (graph + codegen) | 99-878 ms GPU cold-start (35,000-176,000x ratio) |
| Mojo 0.26.1 | Full LLVM compilation (mojo build) | 810-829 ms (135,000-458,000x ratio) |
| JAX 0.9 | Cold-start XLA compilation (jax.jit()) | 37.5-360.5 ms (21,200-95,100x ratio) |

Important: These measure different amounts of work. MIND's frontend (no codegen) is 35,000-176,000x faster than PyTorch's full GPU torch.compile() pipeline, 135,000-458,000x faster than Mojo's full LLVM compilation, and 21,200-95,100x faster than JAX's cold-start XLA compilation. These ratios reflect a difference in measurement scope, not raw speed alone.

Why is MIND so fast?

  1. Specialized design: Built specifically for tensor operations, not general-purpose
  2. Single-pass compilation: No multi-stage optimization passes
  3. Efficient type checking: O(n log n) type inference
  4. Fast parser: O(n) recursive descent parsing
  5. No runtime tracing: Pure static compilation
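As a sketch of the single-pass parsing idea (point 4), here is a toy recursive-descent parser in Python for a tiny expression grammar. This is not MIND's grammar; it only illustrates why consuming each token exactly once, with no backtracking, keeps parsing O(n) in the token count:

```python
# Toy recursive-descent parser: expr := term ('+' term)*, term := atom ('*' atom)*
# Each token is consumed exactly once -> a single left-to-right O(n) pass.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                      # expr := term ('+' term)*
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1                 # consume '+'
            node = ("+", node, term())
        return node

    def term():                      # term := atom ('*' atom)*
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1                 # consume '*'
            node = ("*", node, atom())
        return node

    def atom():                      # atom := integer literal
        nonlocal pos
        tok = tokens[pos]
        pos += 1                     # consume the literal
        return int(tok)

    return expr()

print(parse(["1", "+", "2", "*", "3"]))  # ('+', 1, ('*', 2, 3))
```

Precedence falls out of the call structure (term binds tighter than expr), so no separate precedence-resolution pass is needed.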

Does fast compilation hurt runtime performance?

No. MIND optimizes both compilation and runtime:

  • Fast frontend (1.8-15.5 µs) enables rapid iteration
  • Efficient runtime ensures production performance

Many frameworks optimize one at the expense of the other (e.g., XLA optimizes runtime but takes tens to hundreds of milliseconds to compile).

Determinism

What does "100% deterministic" mean?

Every compilation of the same source code produces bit-identical output:

  • Same SHA256 hash
  • Byte-for-byte identical
  • Across different runs, machines, and times

How is this verified?

We use SHA256 cryptographic hashing of the complete compilation output:

  • 40 total test runs (4 programs × 10 runs each)
  • 0 hash mismatches across all runs
  • 100% reproducibility verified
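The verification loop can be sketched in a few lines of Python. Here compile_fn is a stand-in for any compiler entry point, not a real MIND API:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_determinism(compile_fn, source: str, runs: int = 10) -> bool:
    # Recompile the same source repeatedly; determinism means every run
    # produces the same digest, so the set collapses to a single element.
    digests = {sha256_hex(compile_fn(source)) for _ in range(runs)}
    return len(digests) == 1

# Stand-in "compiler": any pure function of the source is deterministic.
fake_compile = lambda src: src.encode("utf-8")[::-1]
print(verify_determinism(fake_compile, "tensor x = [1, 2]"))  # True
```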

Why does determinism matter?

  1. Reproducible research: Your results are exactly reproducible
  2. Debugging: Eliminate non-determinism as a variable
  3. Auditing: Verify production builds are identical to tested builds
  4. Caching: Can safely cache compilation results
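As an illustration of point 4, a content-addressed cache keyed by a hash of the source is only sound when compilation is deterministic. This Python sketch uses a hypothetical toy_compile stand-in, not a real MIND API:

```python
import hashlib

class CompileCache:
    """Content-addressed compile cache.

    Sound only because compilation is deterministic: identical source
    always yields byte-identical output, so hash(source) is a safe key.
    """
    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}             # sha256(source) -> compiled bytes

    def get(self, source: str) -> bytes:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:   # compile only on first sight
            self._store[key] = self._compile(source)
        return self._store[key]

calls = {"n": 0}
def toy_compile(src: str) -> bytes:
    calls["n"] += 1
    return src.encode("utf-8")       # stand-in for real compilation

cache = CompileCache(toy_compile)
cache.get("let x = 1")
cache.get("let x = 1")               # served from cache, no recompilation
print(calls["n"])                    # 1
```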

Do other frameworks have this?

Most frameworks do not guarantee determinism:

  • PyTorch: Non-deterministic (hash maps, random initialization)
  • JAX: "Mostly" deterministic (not guaranteed)
  • XLA: Non-deterministic (optimization passes)

MIND, by contrast, is designed to be 100% deterministic.

Autodiff

What is "compile-time autodiff"?

MIND generates gradient computation code during compilation, not at runtime.

Traditional (runtime) autodiff

  1. Run forward pass → Build tape
  2. Run backward pass → Walk tape
  3. Repeat every training iteration

MIND (compile-time) autodiff

  1. Compile → Generate gradient IR
  2. Training: Execute pre-generated code
  3. No tape, no per-iteration cost
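The two strategies can be contrasted with a toy Python example for f(x) = x² + 3x, whose derivative is 2x + 3. This illustrates the general idea only, not MIND's actual IR:

```python
# Toy contrast: tape-based (runtime) vs. pre-generated (compile-time) gradients.

def runtime_grad(x: float) -> float:
    # Tape-based autodiff: every call re-runs the forward pass, recording
    # each op's local derivative, then walks the tape backward.
    tape = []
    tape.append(2.0 * x)             # forward: record d(x*x)/dx
    tape.append(3.0)                 # forward: record d(3x)/dx
    return sum(reversed(tape))       # backward: walk the tape

def compile_grad():
    # Compile-time autodiff: derive the gradient expression once and emit
    # plain code for it. No tape exists at training time.
    return lambda x: 2.0 * x + 3.0

grad = compile_grad()                # paid once, at "compile" time
print(runtime_grad(2.0), grad(2.0))  # 7.0 7.0 -- only the first pays per-call tape cost
```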

How much faster is it?

Over 1000 training iterations:

  • MIND: ~38 µs of autodiff generation, paid once at compile time
  • PyTorch: ~50-500 ms of tape overhead, accumulated iteration by iteration
  • Advantage: 1,345-11,284× more efficient (depending on model complexity)
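A back-of-envelope check, assuming the 50-500 ms PyTorch figure is tape overhead accumulated over the 1000 iterations, lands in the same ballpark as the quoted range:

```python
# Order-of-magnitude check, not an exact reproduction of the benchmark.
mind_once_us = 38.0                  # one-time gradient generation (µs)
pytorch_total_ms = (50.0, 500.0)     # cumulative tape overhead (ms)

ratios = [(ms * 1000.0) / mind_once_us for ms in pytorch_total_ms]  # ms -> µs
print([f"{r:,.0f}x" for r in ratios])  # roughly 1,300x .. 13,000x
```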

Is there any runtime cost?

Zero per-iteration autodiff cost. The gradient code is already compiled — just execute it.

Benchmarks

Where can I see the full results?

Full benchmark results on GitHub

Can I reproduce the benchmarks?

Yes! See Running Benchmarks for step-by-step instructions.

What hardware were benchmarks run on?

  • Platform: Ubuntu 24.04, Linux 6.17 x86_64
  • GPU: NVIDIA RTX 3080 10GB, CUDA 12.8
  • CPU: Intel Core i7-5930K @ 3.50GHz, 64GB DDR4
  • PyTorch: 2.10.0+cu128 (GPU)
  • JAX: 0.9.0.1 (CUDA)
  • Mojo: 0.26.1.0 (pixi)
  • MIND: v0.2.1 (Criterion in-process benchmarks)
  • Date: February 2026

Why use Python bindings for measurement?

Python's subprocess.run() adds ~5 ms of overhead per invocation (process spawning + IPC). Measuring through Python bindings (PyO3) instead eliminates this overhead and reveals the true compilation time:

  • With subprocess: ~5.5 ms (dominated by the ~5 ms overhead)
  • With bindings: 1.8-15.5 µs (true compilation time)
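The overhead gap is easy to demonstrate without any compiler at all: time a do-nothing child process against a do-nothing in-process call (absolute numbers vary by machine):

```python
import subprocess
import sys
import time

def mean_us(fn, reps):
    # Average wall-clock time per call, in microseconds.
    start = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - start) / reps * 1e6

# A no-op child process vs. a no-op in-process call: the gap is pure
# process-spawn + IPC overhead, independent of what the process does.
spawn = lambda: subprocess.run([sys.executable, "-c", "pass"], check=True)
inproc = lambda: None

print(f"subprocess: {mean_us(spawn, 5):>10,.0f} µs")   # typically thousands of µs
print(f"in-process: {mean_us(inproc, 1000):>10,.2f} µs")
```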

Future Performance

Will compilation get even faster?

Yes! Planned improvements:

  • Short-term (6 months): target <1 µs (about 2× faster than today's 1.8 µs best case)
  • Long-term (1-2 years): target <0.5 µs (about 4× faster)

Methods: Parser optimizations, incremental compilation, caching

What about GPU support?

GPU support (CUDA, Metal, WebGPU, WebNN) is on the roadmap. Compilation will remain fast (1.8-15.5 µs), with GPU-optimized runtime kernels.

See Roadmap for details.

Learn More