MIC-B v2 Binary Format
MIC-B v2 is a compact binary format for Mind IR graphs, designed for efficient storage and fast parsing with direct memory mapping.
Key Features
- ~1.4-3x smaller than mic@2 text format (55 vs 78 bytes for residual block)
- ULEB128 varints for space-efficient integers
- String table deduplication for repeated identifiers
- Deterministic — same graph produces identical bytes
- Lossless roundtrip with mic@2
Wire Format Layout
┌─────────────────┬──────────────────────────────────┐
│ Offset          │ Content                          │
├─────────────────┼──────────────────────────────────┤
│ 0-3             │ Magic: "MICB" (4 bytes ASCII)    │
│ 4               │ Version: 0x02                    │
│ 5+              │ String Table                     │
│ ...             │ Symbol Table                     │
│ ...             │ Type Table                       │
│ ...             │ Value Table                      │
│ ...             │ Output (1 uleb128)               │
└─────────────────┴──────────────────────────────────┘
ULEB128 Encoding
Unsigned Little-Endian Base-128 encoding uses 7 bits per byte for data, with the MSB as a continuation flag:
| Value | Encoded Bytes |
|---|---|
| 0 | [0x00] |
| 127 | [0x7F] |
| 128 | [0x80, 0x01] |
| 16383 | [0xFF, 0x7F] |
| 16384 | [0x80, 0x80, 0x01] |
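As an illustration of the encoding above, here is a minimal Rust sketch of a ULEB128 writer and reader. The function names `write_uleb128` and `read_uleb128` are hypothetical and are not taken from the `mind` crate; the reader assumes values fit in 64 bits and rejects anything longer.

```rust
/// Append `value` to `out` as a minimal-length ULEB128 varint.
/// (Hypothetical helper, not part of the documented API.)
pub fn write_uleb128(mut value: u64, out: &mut Vec<u8>) {
    loop {
        // Low 7 bits carry data.
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation flag (MSB) stays clear.
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation flag.
        out.push(byte | 0x80);
    }
}

/// Decode one ULEB128 varint from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or the value does not fit in 64 bits.
pub fn read_uleb128(input: &[u8]) -> Option<(u64, usize)> {
    let mut result: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        let payload = u64::from(byte & 0x7F);
        // The tenth byte (shift 63) may only contribute a single bit.
        if shift > 63 || (shift == 63 && payload > 1) {
            return None;
        }
        result |= payload << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None // ran out of bytes before the final (flag-clear) byte
}
```

Because the writer stops as soon as the remaining value is zero, it never emits redundant trailing bytes. The reader, by contrast, does not reject padded encodings such as `[0x80, 0x00]`; a strict decoder would need an extra check for that.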
Zigzag Encoding
Signed integers use zigzag encoding before ULEB128 for efficient representation of small negative values:
// Zigzag mapping
 0 → 0
-1 → 1
 1 → 2
-2 → 3
 2 → 4
...

// Formula
encode(n) = (n << 1) ^ (n >> 63)
decode(z) = (z >> 1) ^ -(z & 1)
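The formulas translate to Rust almost directly. The sketch below assumes 64-bit signed integers, which the shift by 63 suggests. The names `zigzag_encode` and `zigzag_decode` are illustrative only. Note that `n >> 63` must be an arithmetic (sign-extending) shift, which is what Rust performs on `i64`, while `z >> 1` is a logical shift on `u64`.

```rust
/// Map a signed integer to an unsigned one so that values of small
/// magnitude (positive or negative) become small unsigned values.
pub fn zigzag_encode(n: i64) -> u64 {
    // `n >> 63` is all ones for negative n and all zeros otherwise.
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
pub fn zigzag_decode(z: u64) -> i64 {
    // The low bit holds the sign; the remaining bits hold the magnitude.
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}
```

The result of `zigzag_encode` would then be written as a ULEB128 varint, so `-1` occupies a single byte instead of the ten bytes a sign-extended 64-bit value would need.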
Table Structures
1. String Table
Interned strings for names and dimension tokens:
uleb128 count           # number of strings
repeat count:
  uleb128 byte_length   # UTF-8 byte length
  bytes data            # UTF-8 content (no null terminator)
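A reader for this layout could look like the following sketch. It is not the `mind` crate's parser: `read_uleb128` and `read_string_table` are hypothetical helpers, and errors are collapsed into `None` for brevity.

```rust
use std::convert::TryFrom;

/// Read one ULEB128 varint at `*pos`, advancing `*pos` past it.
fn read_uleb128(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?; // None on truncated input
        *pos += 1;
        let payload = u64::from(byte & 0x7F);
        if shift > 63 || (shift == 63 && payload > 1) {
            return None; // value would not fit in 64 bits
        }
        result |= payload << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Read a string table: a count, then `count` length-prefixed UTF-8 strings.
pub fn read_string_table(buf: &[u8], pos: &mut usize) -> Option<Vec<String>> {
    let count = read_uleb128(buf, pos)?;
    // Grow the vector as entries arrive instead of trusting `count`
    // for a pre-allocation.
    let mut strings = Vec::new();
    for _ in 0..count {
        let len = usize::try_from(read_uleb128(buf, pos)?).ok()?;
        let end = (*pos).checked_add(len)?;
        let bytes = buf.get(*pos..end)?; // None if the string runs past the buffer
        strings.push(String::from_utf8(bytes.to_vec()).ok()?);
        *pos = end;
    }
    Some(strings)
}
```

Later sections refer to strings by their position in the returned vector, so `name_str_idx` and `dim_str_idx` values can be used as plain indices into it.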
2. Symbol Table
References to symbolic dimension names:
uleb128 count           # number of symbols
repeat count:
  uleb128 string_idx    # index into string table
3. Type Table
Tensor type definitions:
uleb128 count             # number of types
repeat count:
  u8 dtype                # data type (see table)
  uleb128 rank            # number of dimensions
  repeat rank:
    uleb128 dim_str_idx   # index into string table
Data Type Encoding
| Byte | Type |
|---|---|
| 0 | f16 |
| 1 | f32 |
| 2 | f64 |
| 3 | bf16 |
| 4-7 | i8, i16, i32, i64 |
| 8-11 | u8, u16, u32, u64 |
| 12 | bool |
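One way to model this table in Rust is a fieldless enum whose discriminants match the byte values. The enum and method names below are hypothetical, and the sketch assumes the range rows (4-7 and 8-11) assign bytes in the order the types are listed.

```rust
/// Tensor element types, declared in byte-value order so that the
/// default discriminants (0, 1, 2, ...) match the encoding table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    F16, F32, F64, Bf16,
    I8, I16, I32, I64,
    U8, U16, U32, U64,
    Bool,
}

impl DType {
    /// Decode a dtype byte; unknown bytes yield `None` rather than a panic.
    pub fn from_byte(b: u8) -> Option<DType> {
        Some(match b {
            0 => DType::F16,
            1 => DType::F32,
            2 => DType::F64,
            3 => DType::Bf16,
            4 => DType::I8,
            5 => DType::I16,
            6 => DType::I32,
            7 => DType::I64,
            8 => DType::U8,
            9 => DType::U16,
            10 => DType::U32,
            11 => DType::U64,
            12 => DType::Bool,
            _ => return None,
        })
    }

    /// Encode back to the wire byte.
    pub fn to_byte(self) -> u8 {
        self as u8
    }
}
```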
4. Value Table
Values with implicit sequential IDs:
uleb128 count     # number of values
repeat count:
  u8 tag          # 0=Arg, 1=Param, 2=Node
  ... payload     # tag-specific data
Arg/Param Payload (tag 0 or 1)
uleb128 name_str_idx   # index into string table
uleb128 type_idx       # index into type table
Node Payload (tag 2)
u8 opcode              # opcode byte
... opcode_params      # opcode-specific parameters
uleb128 input_count    # number of inputs
repeat input_count:
  uleb128 input_id     # value ID (must be < current)
Opcode Encoding
| Byte | Opcode | Extra Params |
|---|---|---|
| 0 | Matmul | none |
| 1-4 | Add, Sub, Mul, Div | none |
| 5 | Relu | none |
| 6 | Softmax | sleb128 axis |
| 7-10 | Sigmoid, Tanh, GELU, LayerNorm | none |
| 11 | Transpose | uleb128 n, n × sleb128 |
| 12 | Reshape | none |
| 13-15 | Sum, Mean, Max | uleb128 n, n × sleb128 axes |
| 16 | Concat | sleb128 axis |
| 17 | Split | sleb128 axis, uleb128 count |
| 18 | Gather | sleb128 axis |
| 255 | Custom | uleb128 name_str_idx |
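Since the extra parameters sit between the opcode byte and `input_count`, a decoder has to know their shape before it can read the inputs. The sketch below classifies an opcode byte by the parameter layout from the table. The `ExtraParams` enum and the `extra_params` function are hypothetical names, and bytes the table does not list are treated as invalid.

```rust
/// Shape of the opcode-specific parameters that follow an opcode byte.
#[derive(Debug, PartialEq, Eq)]
pub enum ExtraParams {
    /// No extra parameters.
    NoParams,
    /// A single `sleb128 axis` (Softmax, Concat, Gather).
    Axis,
    /// `uleb128 n` followed by `n` sleb128 values (Transpose, Sum, Mean, Max).
    SignedList,
    /// `sleb128 axis` followed by `uleb128 count` (Split).
    AxisAndCount,
    /// `uleb128 name_str_idx` (Custom).
    CustomName,
}

/// Look up the parameter layout for an opcode byte.
/// Returns `None` for bytes that the opcode table does not define.
pub fn extra_params(opcode: u8) -> Option<ExtraParams> {
    match opcode {
        0..=5 | 7..=10 | 12 => Some(ExtraParams::NoParams),
        6 | 16 | 18 => Some(ExtraParams::Axis),
        11 | 13..=15 => Some(ExtraParams::SignedList),
        17 => Some(ExtraParams::AxisAndCount),
        255 => Some(ExtraParams::CustomName),
        _ => None,
    }
}
```

A node decoder would call this right after reading the opcode byte, consume the indicated parameters, and only then read `input_count` and the input IDs.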
Binary Example
Residual block Y = relu(X @ W + b) + X (~55 bytes vs 78 bytes mic@2 text):
4D 49 43 42 02 # Magic "MICB" + version 2
04 # 4 strings
03 31 32 38 # "128"
01 58 # "X"
01 57 # "W"
01 62 # "b"
00 # 0 symbols
02 # 2 types
00 02 00 00 # T0: f16 [128, 128]
00 01 00 # T1: f16 [128]
07 # 7 values
00 01 00 # Arg("X", T0)
01 02 00 # Param("W", T0)
01 03 01 # Param("b", T1)
02 00 02 00 01 # Node(Matmul, [0, 1])
02 01 02 03 02 # Node(Add, [3, 2])
02 05 01 04 # Node(Relu, [4])
02 01 02 05 00 # Node(Add, [5, 0])
06 # Output: 6
Determinism Rules
- String table uses first-seen insertion order
- All tables maintain graph definition order
- Varints use minimal encoding (no zero-padding)
- No padding bytes between sections
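The first-seen ordering rule can be met with a small interner that stores strings in a vector and uses a hash map only for lookup. This is a sketch under that assumption, not the crate's actual emitter; `StringInterner` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Deduplicating string table that preserves first-seen insertion order.
#[derive(Default)]
pub struct StringInterner {
    strings: Vec<String>,        // emitted in this order
    index: HashMap<String, u32>, // lookup only; never iterated
}

impl StringInterner {
    /// Return the index of `s`, inserting it at the end if it is new.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.index.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.index.insert(s.to_owned(), idx);
        idx
    }

    /// Strings in the order they should be written to the string table.
    pub fn strings(&self) -> &[String] {
        &self.strings
    }
}
```

The emission order comes from the vector, so the hash map's unspecified iteration order never reaches the output, and the same graph walked in the same order yields the same string table.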
Rust API
use mind::ir::compact::v2::{parse_micb, emit_micb, Graph, MicbError};
use std::io::Cursor;
// Parse MIC-B binary
let mut cursor = Cursor::new(bytes);
let graph = parse_micb(&mut cursor)?;
// Emit Graph to MIC-B binary
let mut output = Vec::new();
emit_micb(&graph, &mut output)?;
// Roundtrip is deterministic
let mut cursor2 = Cursor::new(&output);
assert!(graph.eq(&parse_micb(&mut cursor2)?));
Validation
Decoders MUST verify:
- Magic bytes are exactly "MICB"
- Version is 0x02
- All string indices are in bounds
- All type indices are in bounds
- Node inputs reference earlier values only
- Output references a valid value
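Two of these checks are sketched below: the header test and the rule that node inputs refer only to earlier values. The names `HeaderError`, `check_header`, and `inputs_reference_earlier` are hypothetical and do not come from the `mind` crate.

```rust
/// Reasons a header can be rejected.
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u8),
}

/// Check the 5-byte header and return the offset of the string table.
pub fn check_header(bytes: &[u8]) -> Result<usize, HeaderError> {
    if bytes.len() < 5 {
        return Err(HeaderError::Truncated);
    }
    if &bytes[..4] != b"MICB" {
        return Err(HeaderError::BadMagic);
    }
    if bytes[4] != 0x02 {
        return Err(HeaderError::UnsupportedVersion(bytes[4]));
    }
    Ok(5)
}

/// Value IDs are implicit and sequential, so a node with ID `current_id`
/// may only take inputs whose IDs are strictly smaller.
pub fn inputs_reference_earlier(current_id: u64, inputs: &[u64]) -> bool {
    inputs.iter().all(|&id| id < current_id)
}
```

The index checks for strings and types follow the same pattern: compare each decoded index against the length of the table it points into before using it.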
Error Handling
On invalid input, decoders SHOULD:
- Return an error with byte offset
- Not panic or crash
- Not allocate unbounded memory
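One way to satisfy the offset and memory recommendations together is to validate every decoded count before allocating. In the layouts above each table entry (string, symbol, type, value, node input) occupies at least one byte, so a count larger than the number of remaining bytes cannot be valid. The sketch below relies on that observation; `DecodeError`, `ErrorKind`, and `checked_count` are hypothetical names, distinct from the crate's `MicbError`.

```rust
/// What went wrong while decoding.
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorKind {
    UnexpectedEof,
    CountTooLarge,
}

/// A decode failure tagged with the byte offset where it was detected.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError {
    pub offset: usize,
    pub kind: ErrorKind,
}

/// Validate an element count read at `offset` in a buffer of `buf_len` bytes.
/// Returns the count as `usize` if it could possibly fit in the remaining input.
pub fn checked_count(count: u64, buf_len: usize, offset: usize) -> Result<usize, DecodeError> {
    let remaining = buf_len.saturating_sub(offset) as u64;
    if count > remaining {
        return Err(DecodeError {
            offset,
            kind: ErrorKind::CountTooLarge,
        });
    }
    // Safe: count <= remaining, and remaining came from a usize.
    Ok(count as usize)
}
```

With this check in place, a call such as `Vec::with_capacity(n)` is bounded by the input size, so a five-byte file claiming a billion strings is rejected instead of triggering a large allocation.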