HMML: HyperMedia Markup Language

HMML Binary Format Specification - v1.0

HMML (HyperMedia Markup Language) is a binary container that stores HTML/CSS/SVG markup together with the binary resources it references (images, fonts). Markup is kept as text and may be compressed; resources are stored as raw bytes (no base64) and referenced from the markup via the hmml:<id> URI scheme.

Design goals: smaller than a self-contained HTML file (no base64 inflation), full layout freedom (the renderer is a browser), streamable, and forward-compatible.

All multi-byte integers are little-endian. Strings are UTF-8.


1. File layout

┌───────────────────────────────────────────────┐
│ Header                                          │
│   Signature   9 bytes                           │
│   Major       u8                                │
│   Minor       u8                                │
│   Codec       u8                                │
├───────────────────────────────────────────────┤
│ Chunk[0]                                        │
│ Chunk[1]                                        │
│ ...                                             │
│ Chunk[n]  (ENDF, conventionally last)           │
└───────────────────────────────────────────────┘

1.1 Header

Field      Size  Value
Signature  9     89 48 4D 4D 4C 0D 0A 1A 0A (\x89HMML\r\n\x1a\n)
Major      1     Major version. This document describes major 1.
Minor      1     Minor version. 0 here. Minor bumps are backward-readable.
Codec      1     Compression codec id used by compressed chunks (see §3).

The signature follows PNG's design: the high bit in 0x89 detects 7-bit transfer corruption; HMML is human-readable; \r\n catches CRLF-to-LF newline translation; \x1a (Ctrl-Z) stops the MS-DOS type command from dumping the body to a terminal; and the final \n catches LF-to-CRLF translation.
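As an illustration of the header layout, here is a minimal Python sketch of a header check. The function name parse_header and the returned dict shape are assumptions made for this example, not part of the format.

```python
import struct

SIGNATURE = b"\x89HMML\r\n\x1a\n"  # the 9 signature bytes from the header table
HEADER_SIZE = 12                   # 9 signature bytes + Major + Minor + Codec


def parse_header(buf: bytes) -> dict:
    """Validate the 12-byte header and return its three u8 fields."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("truncated header")
    if buf[:9] != SIGNATURE:
        raise ValueError("bad signature")
    major, minor, codec = struct.unpack_from("<BBB", buf, 9)
    if major != 1:
        # This sketch only understands the major version described here.
        raise ValueError(f"unsupported major version {major}")
    return {"major": major, "minor": minor, "codec": codec}
```

The first chunk then starts at byte offset 12.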


2. Chunks

Every chunk has the same framing:

Field    Size  Notes
Type     4     Four ASCII bytes identifying the chunk kind.
Flags    1     Bit field (see below).
Length   4     u32 - byte length of Payload (excludes Type/Flags/Length and CRC).
Payload  N     Length bytes.
CRC32    4     u32 - present iff FLAG_HAS_CRC is set. See §4.

Flags

Bit  Name             Meaning
0    FLAG_COMPRESSED  Payload was produced by the file's codec (§3).
1    FLAG_HAS_CRC     A 4-byte CRC32 trails the payload.
2-7  reserved         Must be 0 in v1.

A decoder must skip chunks whose Type it does not recognise: it advances past Length payload bytes, plus the 4-byte CRC32 if FLAG_HAS_CRC is set. This is the forward-compatibility mechanism.
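The framing above can be walked with a short loop. The following Python sketch is one way to do it; iter_chunks and its tuple layout are hypothetical names chosen for illustration, and the sketch assumes the 12-byte header has already been validated.

```python
import struct

FLAG_COMPRESSED = 0x01  # bit 0
FLAG_HAS_CRC = 0x02     # bit 1


def iter_chunks(buf: bytes, offset: int = 12):
    """Yield (type, flags, payload, crc_or_None) per chunk; stop after ENDF."""
    while offset < len(buf):
        if offset + 9 > len(buf):
            raise ValueError("truncated chunk framing")
        # Type (4 bytes), Flags (u8), Length (u32), little-endian, unpadded.
        ctype, flags, length = struct.unpack_from("<4sBI", buf, offset)
        start = offset + 9
        end = start + length
        trailer = 4 if flags & FLAG_HAS_CRC else 0
        if end + trailer > len(buf):
            raise ValueError("truncated chunk payload")
        crc = struct.unpack_from("<I", buf, end)[0] if trailer else None
        yield ctype, flags, buf[start:end], crc
        if ctype == b"ENDF":
            return
        # Advance past the payload and the optional CRC trailer.
        offset = end + trailer
```

A caller that only handles MARK, RSRC, META and ENDF can ignore every other yielded type; because the loop advances by Length (plus the CRC trailer), unknown chunks are skipped without being interpreted.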

2.1 MARK - markup

The HTML/CSS/SVG document, UTF-8 encoded. If FLAG_COMPRESSED is set, the payload is the codec output; otherwise it is raw UTF-8. Exactly one MARK chunk per file.

Resource references inside the markup use the scheme hmml:<id>, valid anywhere a URL is expected - src, srcset, CSS url(), SVG <image href>, etc.
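To make the hmml:<id> scheme concrete, the Python sketch below collects the ids referenced by a markup string and rewrites them through a caller-supplied table. The id grammar used here (letters, digits, '_', '-', '.') is an assumption, since this document does not define one, and find_refs and rewrite_refs are hypothetical helper names.

```python
import re

# ASSUMPTION: ids consist of letters, digits, '_', '-' and '.'.
_REF = re.compile(r"hmml:([A-Za-z0-9_.-]+)")


def find_refs(markup: str) -> set:
    """Return the set of resource ids referenced as hmml:<id>."""
    return set(_REF.findall(markup))


def rewrite_refs(markup: str, urls: dict) -> str:
    """Replace each hmml:<id> with urls[id]; leave unknown ids untouched."""
    return _REF.sub(lambda m: urls.get(m.group(1), m.group(0)), markup)
```

A plain text scan like this does not distinguish URL positions from ordinary text; a reader that needs that precision would walk the parsed document instead.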

2.2 RSRC - resource

One referenced resource. Stored uncompressed (image formats are already compressed). Payload:

Field    Size       Notes
idLen    u16        Byte length of id.
id       idLen      UTF-8 resource id (e.g. r0).
mimeLen  u16        Byte length of mime.
mime     mimeLen    UTF-8 MIME (e.g. image/webp).
data     remainder  Raw resource bytes.

data runs to the end of the payload (its length = Length − 4 − idLen − mimeLen). There may be any number of RSRC chunks; ids should be unique.
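The RSRC payload layout can be sketched in Python as a pair of helpers. The names build_rsrc_payload and parse_rsrc_payload are illustrative assumptions; only the field order and sizes come from the table above.

```python
import struct


def build_rsrc_payload(rid: str, mime: str, data: bytes) -> bytes:
    """Serialise idLen, id, mimeLen, mime, data in that order."""
    rid_b, mime_b = rid.encode("utf-8"), mime.encode("utf-8")
    return (struct.pack("<H", len(rid_b)) + rid_b
            + struct.pack("<H", len(mime_b)) + mime_b
            + data)


def parse_rsrc_payload(payload: bytes):
    """Return (id, mime, data); data is whatever follows the two strings."""
    if len(payload) < 2:
        raise ValueError("truncated idLen")
    (id_len,) = struct.unpack_from("<H", payload, 0)
    pos = 2
    if pos + id_len + 2 > len(payload):
        raise ValueError("truncated id or mimeLen")
    rid = payload[pos:pos + id_len].decode("utf-8")
    pos += id_len
    (mime_len,) = struct.unpack_from("<H", payload, pos)
    pos += 2
    if pos + mime_len > len(payload):
        raise ValueError("truncated mime")
    mime = payload[pos:pos + mime_len].decode("utf-8")
    pos += mime_len
    # Remaining length equals Length - 4 - idLen - mimeLen.
    return rid, mime, payload[pos:]
```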

2.3 META - metadata (optional)

A single JSON object, UTF-8 encoded, optionally compressed (FLAG_COMPRESSED). Used for title, author, timestamps, app-specific fields, etc. At most one per file.

2.4 ENDF - end of document

Zero-length payload. Marks the logical end; a decoder stops here. Optional but recommended (lets a reader detect truncation vs. a clean end).


3. Compression codecs

The header Codec byte names the algorithm used for any FLAG_COMPRESSED payload in the file. One codec per file. Reserved ids:

Id     Algorithm
0      store (no compression)
1      DEFLATE raw (RFC 1951)
2      gzip (RFC 1952)
3      zlib / DEFLATE (RFC 1950)
4-15   reserved for future built-ins
≥16    application-defined

A decoder may resolve ids 0-3 on its own using the platform's built-in compression support. For custom ids the caller must supply a matching codec. RSRC payloads are never compressed regardless of codec.
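As a sketch of how ids 0-3 might be resolved on one platform, the Python example below maps them onto the standard zlib module, whose wbits argument selects raw DEFLATE, gzip, or zlib framing. The function decode_payload and its custom parameter are hypothetical names for this illustration.

```python
import zlib

# zlib's wbits value selects the wrapper around the DEFLATE stream:
# -15 = raw DEFLATE, 31 (16 + 15) = gzip, 15 = zlib.
_WBITS = {1: -15, 2: 31, 3: 15}


def decode_payload(codec_id: int, payload: bytes, custom=None) -> bytes:
    """Decode a FLAG_COMPRESSED payload using the file's codec id."""
    if codec_id == 0:
        return payload  # store: bytes are kept as-is
    if codec_id in _WBITS:
        return zlib.decompress(payload, _WBITS[codec_id])
    if codec_id >= 16 and custom and codec_id in custom:
        # Application-defined codec supplied by the caller.
        return custom[codec_id](payload)
    raise ValueError(f"no decoder for codec id {codec_id}")
```

Ids 4-15 fall through to the error because this document reserves them without assigning an algorithm.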


4. CRC32

When FLAG_HAS_CRC is set, the trailing u32 is the IEEE CRC32 (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF) computed over the chunk's Type + Flags + Length + Payload bytes - i.e. everything in the chunk except the CRC field itself. CRC is opt-in per file: a writer sets FLAG_HAS_CRC on every chunk or on none.
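The CRC coverage can be illustrated with a short Python sketch. Python's zlib.crc32 implements the same IEEE CRC32, and the bitwise version is included only to show the parameters named above; chunk_crc and crc32_bitwise are hypothetical helper names.

```python
import struct
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Reference CRC32: reflected poly 0xEDB88320, init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def chunk_crc(ctype: bytes, flags: int, payload: bytes) -> int:
    """CRC over Type + Flags + Length + Payload, i.e. all but the CRC field."""
    framing = struct.pack("<4sBI", ctype, flags, len(payload))
    return zlib.crc32(framing + payload) & 0xFFFFFFFF
```

A reader would compare chunk_crc(...) against the trailing u32 and reject the chunk on mismatch; note that the Flags byte fed into the CRC already has FLAG_HAS_CRC set.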


5. Reference resolution

hmml:<id> references are resolved by the reader, not the format. Typical modes:


6. Worked example (hex)

A minimal file: markup <b>hi</b>, no resources, store codec, no CRC.

89 48 4D 4D 4C 0D 0A 1A 0A   signature
01                            major = 1
00                            minor = 0
00                            codec = store
4D 41 52 4B                   "MARK"
00                            flags = 0
09 00 00 00                   length = 9
3C 62 3E 68 69 3C 2F 62 3E    "<b>hi</b>"
45 4E 44 46                   "ENDF"
00                            flags = 0
00 00 00 00                   length = 0
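
The same 39 bytes can be produced programmatically. This Python sketch assembles the file from the fields listed above; the helper names chunk and build_minimal are assumptions for illustration.

```python
import struct

SIGNATURE = b"\x89HMML\r\n\x1a\n"


def chunk(ctype: bytes, payload: bytes = b"", flags: int = 0) -> bytes:
    """Frame one chunk without a CRC: Type, Flags, Length, Payload."""
    return struct.pack("<4sBI", ctype, flags, len(payload)) + payload


def build_minimal(markup: str) -> bytes:
    """Header (major 1, minor 0, codec 0 = store) + MARK + ENDF."""
    header = SIGNATURE + bytes([1, 0, 0])
    return header + chunk(b"MARK", markup.encode("utf-8")) + chunk(b"ENDF")


data = build_minimal("<b>hi</b>")
print(data.hex(" ").upper())
```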