MIC-B v2 Binary Format

MIC-B v2 is a compact binary format for Mind IR graphs, designed for efficient storage and fast parsing with direct memory mapping.

Key Features

  • ~1.4-3x smaller than mic@2 text format (55 vs 78 bytes for residual block)
  • ULEB128 varints for space-efficient integers
  • String table deduplication for repeated identifiers
  • Deterministic — same graph produces identical bytes
  • Lossless roundtrip with mic@2

Wire Format Layout

┌─────────────────┬──────────────────────────────────┐
│ Offset          │ Content                          │
├─────────────────┼──────────────────────────────────┤
│ 0-3             │ Magic: "MICB" (4 bytes ASCII)    │
│ 4               │ Version: 0x02                    │
│ 5+              │ String Table                     │
│ ...             │ Symbol Table                     │
│ ...             │ Type Table                       │
│ ...             │ Value Table                      │
│ ...             │ Output (1 uleb128)               │
└─────────────────┴──────────────────────────────────┘

ULEB128 Encoding

Unsigned Little-Endian Base-128 encoding uses 7 bits per byte for data, with the MSB as a continuation flag:

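┌─────────┬────────────────────┐
│ Value   │ Encoded Bytes      │
├─────────┼────────────────────┤
│ 0       │ [0x00]             │
│ 127     │ [0x7F]             │
│ 128     │ [0x80, 0x01]       │
│ 16383   │ [0xFF, 0x7F]       │
│ 16384   │ [0x80, 0x80, 0x01] │
└─────────┴────────────────────┘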
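
The encodings listed above can be reproduced with a short encoder and decoder. This is a minimal sketch, not the `mind` crate's implementation; the function names `write_uleb128` and `read_uleb128` are assumptions chosen for illustration.

```rust
/// Append `value` to `out` as a minimal ULEB128 varint.
/// (Hypothetical helper; the name is not part of the MIC-B API.)
pub fn write_uleb128(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8; // low 7 data bits
        value >>= 7;
        if value == 0 {
            out.push(byte); // MSB clear: this is the last byte
            return;
        }
        out.push(byte | 0x80); // MSB set: more bytes follow
    }
}

/// Decode one ULEB128 varint from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or the value does not fit in 64 bits.
pub fn read_uleb128(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None; // more than 10 bytes
        }
        let chunk = u64::from(byte & 0x7F);
        if shift == 63 && chunk > 1 {
            return None; // bits would be shifted out of a u64
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final byte
}
```

The encoder always emits the shortest form. The decoder in this sketch does not reject padded encodings such as `[0x80, 0x00]`; a stricter decoder could add that check.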

Zigzag Encoding

Signed integers use zigzag encoding before ULEB128 for efficient representation of small negative values:

// Zigzag mapping
0  →  0
-1 →  1
1  →  2
-2 →  3
2  →  4
...

// Formula (n is a signed 64-bit integer, z is unsigned)
encode(n) = (n << 1) ^ (n >> 63)    // arithmetic right shift
decode(z) = (z >> 1) ^ -(z & 1)     // logical right shift
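
The formulas translate directly to Rust, where `>>` is an arithmetic shift on `i64` and a logical shift on `u64`. The sketch below assumes 64-bit values; the function names are illustrative, not part of the MIC-B API.

```rust
/// Map a signed integer to an unsigned one so that small magnitudes,
/// positive or negative, become small unsigned values.
pub fn zigzag_encode(n: i64) -> u64 {
    // `n >> 63` is all ones for negative n and zero otherwise.
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
pub fn zigzag_decode(z: u64) -> i64 {
    // The low bit carries the sign; the remaining bits carry the magnitude.
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}
```

The result of `zigzag_encode` is then written as a ULEB128 varint, so -1 occupies a single byte instead of the ten bytes its two's-complement form would need.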

Table Structures

1. String Table

Interned strings for names and dimension tokens:

uleb128     count           # number of strings
repeat count:
  uleb128   byte_length     # UTF-8 byte length
  bytes     data            # UTF-8 content (no null terminator)
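
A decoder for this layout might look like the following. It is a sketch under stated assumptions: it reads from an in-memory slice, tracks its position in a caller-supplied `pos`, and uses hypothetical helper names (`read_uleb128`, `read_string_table`) that are not taken from the `mind` crate.

```rust
/// Read one ULEB128 varint at `*pos`, advancing `*pos` past it.
/// (Hypothetical helper for this sketch.)
fn read_uleb128(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?; // None on truncated input
        *pos += 1;
        if shift >= 64 {
            return None; // varint longer than 10 bytes
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Parse a string table starting at `*pos` and return its entries in order.
pub fn read_string_table(buf: &[u8], pos: &mut usize) -> Option<Vec<String>> {
    let count = read_uleb128(buf, pos)?;
    // Grow on demand instead of trusting `count` for a pre-allocation.
    let mut strings = Vec::new();
    for _ in 0..count {
        let len = read_uleb128(buf, pos)?;
        let remaining = (buf.len() - *pos) as u64;
        if len > remaining {
            return None; // entry runs past the end of the input
        }
        let end = *pos + len as usize;
        let text = std::str::from_utf8(&buf[*pos..end]).ok()?; // must be valid UTF-8
        strings.push(text.to_owned());
        *pos = end;
    }
    Some(strings)
}
```

An entry's position in the returned vector is the index that the other tables refer to as `string_idx`, `dim_str_idx`, or `name_str_idx`.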

2. Symbol Table

References to symbolic dimension names:

uleb128     count           # number of symbols
repeat count:
  uleb128   string_idx      # index into string table

3. Type Table

Tensor type definitions:

uleb128     count           # number of types
repeat count:
  u8        dtype           # data type (see table)
  uleb128   rank            # number of dimensions
  repeat rank:
    uleb128 dim_str_idx     # index into string table

Data Type Encoding

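┌─────────┬────────────────────┐
│ Byte    │ Type               │
├─────────┼────────────────────┤
│ 0       │ f16                │
│ 1       │ f32                │
│ 2       │ f64                │
│ 3       │ bf16               │
│ 4-7     │ i8, i16, i32, i64  │
│ 8-11    │ u8, u16, u32, u64  │
│ 12      │ bool               │
└─────────┴────────────────────┘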
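
One way to model the dtype byte in Rust is an enum with explicit discriminants. The `DType` name and its methods are assumptions made for this sketch; the actual type in the `mind` crate may differ.

```rust
/// Data types with the byte values from the table above.
/// (Illustrative type; not taken from the `mind` crate.)
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    F16 = 0,
    F32 = 1,
    F64 = 2,
    BF16 = 3,
    I8 = 4,
    I16 = 5,
    I32 = 6,
    I64 = 7,
    U8 = 8,
    U16 = 9,
    U32 = 10,
    U64 = 11,
    Bool = 12,
}

impl DType {
    /// Decode a dtype byte, returning `None` for unassigned values.
    pub fn from_byte(b: u8) -> Option<DType> {
        use DType::*;
        Some(match b {
            0 => F16,
            1 => F32,
            2 => F64,
            3 => BF16,
            4 => I8,
            5 => I16,
            6 => I32,
            7 => I64,
            8 => U8,
            9 => U16,
            10 => U32,
            11 => U64,
            12 => Bool,
            _ => return None,
        })
    }

    /// Encode this dtype as its wire byte.
    pub fn to_byte(self) -> u8 {
        self as u8
    }
}
```

Returning `None` for bytes 13 and above lets the caller turn an unknown dtype into a decode error rather than a panic.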

4. Value Table

Values are assigned implicit sequential IDs in table order, starting from 0:

uleb128     count           # number of values
repeat count:
  u8        tag             # 0=Arg, 1=Param, 2=Node
  ...       payload         # tag-specific data

Arg/Param Payload (tag 0 or 1)

uleb128     name_str_idx    # index into string table
uleb128     type_idx        # index into type table

Node Payload (tag 2)

u8          opcode          # opcode byte
...         opcode_params   # opcode-specific parameters
uleb128     input_count     # number of inputs
repeat input_count:
  uleb128   input_id        # value ID (must be < current)
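
As an illustration of the node layout, the sketch below encodes a node whose opcode takes no extra parameters and enforces the rule that inputs must refer to earlier values. The function `encode_simple_node` and its signature are hypothetical.

```rust
/// Append `v` as a minimal ULEB128 varint (hypothetical helper).
fn write_uleb128(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7F) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Encode a node for an opcode without extra parameters (e.g. Matmul, Add, Relu).
/// `id` is the implicit ID this node will receive in the value table.
pub fn encode_simple_node(
    opcode: u8,
    inputs: &[u64],
    id: u64,
    out: &mut Vec<u8>,
) -> Result<(), String> {
    // Every input must name a value defined before this node.
    if let Some(&bad) = inputs.iter().find(|&&i| i >= id) {
        return Err(format!("input {} is not defined before value {}", bad, id));
    }
    out.push(2); // tag: Node
    out.push(opcode);
    write_uleb128(inputs.len() as u64, out); // input_count
    for &input in inputs {
        write_uleb128(input, out); // input_id
    }
    Ok(())
}
```

Calling `encode_simple_node(0, &[0, 1], 3, &mut out)` appends `02 00 02 00 01`, a Matmul node reading values 0 and 1. Opcodes with extra parameters would write them between the opcode byte and `input_count`.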

Opcode Encoding

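┌─────────┬────────────────────────────────┬─────────────────────────────┐
│ Byte    │ Opcode                         │ Extra Params                │
├─────────┼────────────────────────────────┼─────────────────────────────┤
│ 0       │ Matmul                         │ none                        │
│ 1-4     │ Add, Sub, Mul, Div             │ none                        │
│ 5       │ Relu                           │ none                        │
│ 6       │ Softmax                        │ sleb128 axis                │
│ 7-10    │ Sigmoid, Tanh, GELU, LayerNorm │ none                        │
│ 11      │ Transpose                      │ uleb128 n, n × sleb128      │
│ 12      │ Reshape                        │ none                        │
│ 13-15   │ Sum, Mean, Max                 │ uleb128 n, n × sleb128 axes │
│ 16      │ Concat                         │ sleb128 axis                │
│ 17      │ Split                          │ sleb128 axis, uleb128 count │
│ 18      │ Gather                         │ sleb128 axis                │
│ 255     │ Custom                         │ uleb128 name_str_idx        │
└─────────┴────────────────────────────────┴─────────────────────────────┘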

Binary Example

Residual block Y = relu(X @ W + b) + X (55 bytes, versus 78 bytes as mic@2 text):

4D 49 43 42 02              # Magic "MICB" + version 2
04                          # 4 strings
03 31 32 38                 # "128"
01 58                       # "X"
01 57                       # "W"
01 62                       # "b"
00                          # 0 symbols
02                          # 2 types
00 02 00 00                 # T0: f16 [128, 128]
00 01 00                    # T1: f16 [128]
07                          # 7 values
00 01 00                    # Arg("X", T0)
01 02 00                    # Param("W", T0)
01 03 01                    # Param("b", T1)
02 00 02 00 01              # Node(Matmul, [0, 1])
02 01 02 03 02              # Node(Add, [3, 2])
02 05 01 04                 # Node(Relu, [4])
02 01 02 05 00              # Node(Add, [5, 0])
06                          # Output: 6

Determinism Rules

  • String table uses first-seen insertion order
  • All tables maintain graph definition order
  • Varints use minimal encoding (no zero-padding)
  • No padding bytes between sections
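
First-seen insertion order can be obtained by pairing a lookup map with an ordered vector. The following is a sketch of that idea, not the encoder's actual data structure; `StringInterner` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Assigns string table indices in first-seen order.
/// (Illustrative type; not taken from the `mind` crate.)
#[derive(Default)]
pub struct StringInterner {
    index: HashMap<String, u32>, // lookup only; never iterated
    strings: Vec<String>,        // defines the emitted order
}

impl StringInterner {
    /// Return the index of `s`, appending it if this is its first occurrence.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.index.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.index.insert(s.to_owned(), idx);
        idx
    }

    /// The strings in the order they should be written to the table.
    pub fn strings(&self) -> &[String] {
        &self.strings
    }
}
```

Because the table is written from the vector and the hash map is used only for lookups, the output does not depend on hash iteration order, which keeps the emitted bytes reproducible across runs.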

Rust API

use mind::ir::compact::v2::{parse_micb, emit_micb, Graph, MicbError};
use std::io::Cursor;

// Parse MIC-B binary
let mut cursor = Cursor::new(bytes);
let graph = parse_micb(&mut cursor)?;

// Emit Graph to MIC-B binary
let mut output = Vec::new();
emit_micb(&graph, &mut output)?;

// Roundtrip is lossless: re-parsing the emitted bytes yields an equal graph
let mut cursor2 = Cursor::new(&output);
assert!(graph == parse_micb(&mut cursor2)?);

Validation

Decoders MUST verify:

  • Magic bytes are exactly "MICB"
  • Version is 0x02
  • All string indices are in bounds
  • All type indices are in bounds
  • Node inputs reference earlier values only
  • Output references a valid value
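
Some of these checks can be sketched as standalone functions. The names `HeaderError`, `check_header`, and `check_value_refs` are assumptions for illustration, and the second function takes a simplified, hypothetical view of the value table (one list of input IDs per value).

```rust
/// Header problems a decoder might report (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u8),
}

/// Check magic and version; returns the offset of the first byte after the header.
pub fn check_header(buf: &[u8]) -> Result<usize, HeaderError> {
    if buf.len() < 5 {
        return Err(HeaderError::Truncated);
    }
    if !buf.starts_with(b"MICB") {
        return Err(HeaderError::BadMagic);
    }
    if buf[4] != 0x02 {
        return Err(HeaderError::UnsupportedVersion(buf[4]));
    }
    Ok(5)
}

/// Check that every input refers to an earlier value and that the output
/// refers to an existing value. `inputs_per_value[i]` holds the input IDs
/// of value `i` (empty for Arg and Param).
pub fn check_value_refs(inputs_per_value: &[Vec<u64>], output: u64) -> bool {
    let inputs_ok = inputs_per_value
        .iter()
        .enumerate()
        .all(|(id, inputs)| inputs.iter().all(|&i| i < id as u64));
    inputs_ok && output < inputs_per_value.len() as u64
}
```

String and type index checks follow the same pattern: compare each index against the length of the table it points into before using it.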

Error Handling

On invalid input, decoders SHOULD:

  • Return an error with byte offset
  • Not panic or crash
  • Not allocate unbounded memory
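
The three recommendations can be combined in a small reader that records its position and validates counts before any allocation. This is a sketch under assumptions: `DecodeError` and `Reader` are hypothetical types, not the crate's `MicbError`, and the count check relies on every table entry occupying at least one byte.

```rust
use std::fmt;

/// A decode failure with the byte offset where it was detected (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError {
    pub offset: usize,
    pub message: &'static str,
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} at byte offset {}", self.message, self.offset)
    }
}

impl std::error::Error for DecodeError {}

/// A cursor over an in-memory buffer that returns errors instead of panicking.
pub struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Reader { buf, pos: 0 }
    }

    pub fn read_u8(&mut self) -> Result<u8, DecodeError> {
        match self.buf.get(self.pos) {
            Some(&b) => {
                self.pos += 1;
                Ok(b)
            }
            None => Err(DecodeError { offset: self.pos, message: "unexpected end of input" }),
        }
    }

    /// Read a table count and reject values larger than the remaining input,
    /// so a caller can pre-allocate `count` slots without risking a huge allocation.
    pub fn read_count(&mut self) -> Result<usize, DecodeError> {
        let start = self.pos;
        let mut value = 0u64;
        let mut shift = 0u32;
        loop {
            let byte = self.read_u8()?;
            if shift >= 64 {
                return Err(DecodeError { offset: start, message: "varint too long" });
            }
            value |= u64::from(byte & 0x7F) << shift;
            if byte & 0x80 == 0 {
                break;
            }
            shift += 7;
        }
        let remaining = (self.buf.len() - self.pos) as u64;
        if value > remaining {
            return Err(DecodeError { offset: start, message: "count exceeds remaining input" });
        }
        Ok(value as usize)
    }
}
```

For example, a three-byte input that declares 65535 entries is rejected at offset 0 before any table storage is reserved. A stricter decoder could also reject non-minimal or overflowing varints.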