The Big Postcard/Brief/defmt/CBOR comparison

Not so big actually. Also not a comparison. Maybe more:

Some place where people write their favorite examples and ask others to phrase them in their way

Examples

Sending text that the sender doesn’t want to format as strings all along

… should we ask defmt to join in?

defmt

Regular defmt, 13 bytes, schema-unaware, with type annotations

With some guesswork and ad-hoc syntax, as I’m writing this out by hand

00 # Interned "The gruffalo failed to eat: {}\n"
01 # Interned "Discouraged by {=str?}"
   # (treated as a plain formatting argument because there is 1 unfilled)
03 # String of length 3 (because there is 1 unfilled)
    666f78
02 # Interned "There will be {=int} more attempts.\n"
02 # Number 2
03 # Interned "The next will happen in {}s.\n"
03 # String of length 3 (rendered at production?)
    312e35

CBOR

With packed and known dictionary, 34 bytes and semantically readable

In CBOR we’ll assume that a packed dictionary has been set up, and reference into that dictionary. Because we can, we’ll also not convert to float seconds (even though that would be efficient on the wire), but rather skip the calculation and express the value as a millisecond timer reading.

We’ll also not use an outer array, but just stream CBOR items, so technically it’s a CBOR stream.

Some choices here are more for interoperability than for compactness: with this many interned strings, it may easily be more compact to use plain numbers for known values and annotate literal numbers instead. However, the approach taken here eases interoperability when all parties know how packed CBOR works. Compression points have not been particularly optimized; with the right dictionary setup, placing the most common items first, packed items can be as short as 1 byte. (Here, most are 3 bytes long, some 2.)

Diagnostic notation:

6(30)   # compressed for "The gruffalo failed to eat: "
6(4)    # compressed for "Discouraged by \""
"Fox"   # That was not in our list of known / interned strings
"\"\n"  # No point in compressing that
6(32)   # compressed for "There will be "
2
6(33)   # compressed for " more attempts.\n"
6(34)   # compressed for "The next will happen in "
        # tagged value indicating a duration, using a decimal fraction
        # 1500 x 10^-3
        # Note that 4([-3, 1500]) would have worked too, but then it
        # would not have indicated that the value is in seconds.
1002({4: [-3, 1500]})
"\n"    # Again, no point in compressing that

On the wire (34 bytes):

C6               # tag(6)
   18 1E         # unsigned(30)
C6               # tag(6)
   04            # unsigned(4)
63               # text(3)
   466F78        # "Fox"
62               # text(2)
   220A          # "\"\n"
C6               # tag(6)
   18 20         # unsigned(32)
02               # unsigned(2)
C6               # tag(6)
   18 21         # unsigned(33)
C6               # tag(6)
   18 22         # unsigned(34)
D9 03EA          # tag(1002)
   A1            # map(1)
      04         # unsigned(4)
      82         # array(2)
         22      # negative(2)
         19 05DC # unsigned(1500)
61               # text(1)
   0A            # "\n"
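
As a cross-check of the byte count, here is a minimal sketch that hand-assembles the stream above. The helper names (`head`, `integer`, `text`, `tag`, `array`, `single_entry_map`) are made up for this note and cover only the item types used here; a real application would use a CBOR library instead.

```python
def head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (shortest form)."""
    if value < 24:
        return bytes([major << 5 | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major << 5 | info]) + value.to_bytes(size, "big")
    raise ValueError("argument too large")


def integer(n: int) -> bytes:
    # Major type 0 for non-negative, major type 1 (storing -1 - n) for negative.
    return head(0, n) if n >= 0 else head(1, -1 - n)


def text(s: str) -> bytes:
    raw = s.encode("utf-8")
    return head(3, len(raw)) + raw


def tag(number: int, content: bytes) -> bytes:
    return head(6, number) + content


def array(*items: bytes) -> bytes:
    return head(4, len(items)) + b"".join(items)


def single_entry_map(key: bytes, value: bytes) -> bytes:
    return head(5, 1) + key + value


# The items in the same order as in the diagnostic notation above.
stream = b"".join([
    tag(6, integer(30)),
    tag(6, integer(4)),
    text("Fox"),
    text('"\n'),
    tag(6, integer(32)),
    integer(2),
    tag(6, integer(33)),
    tag(6, integer(34)),
    tag(1002, single_entry_map(integer(4), array(integer(-3), integer(1500)))),
    text("\n"),
])

print(len(stream), stream.hex())
```

The items are simply concatenated, which is what makes this a CBOR stream rather than one outer array.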

Naïvely replacing defmt's serialization, 13 bytes

If the defmt serialization just used CBOR instead, it would not gain the ability to seek to the next record, because you don’t know whether a number is a top-level interned string to be emitted, or part of a formatting argument (as an interned string or a number).

Note that this is almost identical binary (63 instead of 03 for the string leaders); things would diverge if we used indices or item lengths in 24..=127, or anything from 256 up, because CBOR only fits values up to 23 into the initial byte.
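
To make the point about diverging encodings concrete, here is a small sketch that prints how CBOR encodes a few unsigned values and text string leaders. `cbor_head` is a name invented for this note, and it stops at 16-bit arguments because nothing larger comes up here. The "one raw byte per small index" reading of the hand-written defmt form is guesswork, as stated above.

```python
def cbor_head(major: int, value: int) -> bytes:
    """Initial byte plus argument for a CBOR item, up to 16-bit arguments."""
    if value < 24:
        return bytes([major << 5 | value])  # value lives in the initial byte
    if value < 0x100:
        return bytes([major << 5 | 24, value])  # one extra byte
    if value < 0x10000:
        return bytes([major << 5 | 25]) + value.to_bytes(2, "big")
    raise ValueError("not needed for this sketch")


# Unsigned integers (major type 0): identical to a raw byte only up to 23.
for value in (2, 23, 24, 127, 255, 256):
    print(f"uint {value:>3}: {cbor_head(0, value).hex()}")

# Text string leader (major type 3): a length of 3 becomes 0x63, not 0x03.
print(f"text(3) leader: {cbor_head(3, 3).hex()}")
```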

On the wire:

00 # Interned "The gruffalo failed to eat: {}\n"
01 # Interned "Discouraged by {=str?}"
   # (treated as a plain formatting argument because there is 1 unfilled)
63 # String of length 3 (because there is 1 unfilled)
    666f78
02 # Interned "There will be {=int} more attempts.\n"
02 # Number 2
03 # Interned "The next will happen in {}s.\n"
63 # String of length 3 (rendered at production?)
    312e35

Packed CBOR with functional table entries, 19 bytes

This is some way between the prior two items. It is skippable (if you are at the start of a standalone item you can seek to the next one with a plain CBOR decoder), and looks progressively better the more you understand of CBOR. If the known interning tables mismatch, unpacking may produce wrong expansions (as in defmt or the plain packed example), but the error would show when the number of arguments mismatches at some point.

This assumes an application-specific unpacker in which Rust style string argument expansion can be the forward application of the reference. There is no use here for inverted references (not sure what they might be used for here).
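
A rough sketch of what such an application-specific unpacker could do, working on already-decoded items. Everything here is an assumption made for illustration: the `Tag` class, the `TABLE` contents keyed by tag numbers 216 to 219, the single-placeholder templates, and the very simplified stand-ins for Rust's `{}` and `{:?}` formatting.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Tag:
    number: int
    content: object


# Hypothetical shared table: functional tag number -> Rust-style template.
TABLE = {
    216: "The gruffalo failed to eat: {}\n",
    217: "Discouraged by {:?}",
    218: "There will be {} more attempts.\n",
    219: "The next will happen in {}s.\n",
}


def display(value) -> str:
    """Simplified stand-in for Rust's Display formatting."""
    if isinstance(value, Tag) and value.number in TABLE:
        return expand(value)
    if isinstance(value, Tag) and value.number == 4:
        exponent, mantissa = value.content  # decimal fraction: m * 10^e
        return str(Decimal(mantissa).scaleb(exponent))
    return str(value)


def debug(value) -> str:
    """Simplified stand-in for Rust's Debug formatting (quotes strings)."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return display(value)


def expand(tag: Tag) -> str:
    """Forward application: fill the template's placeholder with the content."""
    template = TABLE[tag.number]
    if "{:?}" in template:
        return template.replace("{:?}", debug(tag.content), 1)
    return template.replace("{}", display(tag.content), 1)


items = [
    Tag(216, Tag(217, "fox")),
    Tag(218, 2),
    Tag(219, Tag(4, [-3, 1500])),
]
rendered = "".join(expand(item) for item in items)
print(rendered, end="")
```

The decimal fraction is rendered without any floating point arithmetic, which matches the idea of leaving the timer value uncalculated on the sending side.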

Two properties of this setup are not shown because they don’t come up in the Gruffalo example:

Diagnostic notation (this is what an unaware receiver sees, minus the comments):

216( # Functional "The gruffalo failed to eat: {}\n"
    217( # Functional "Discouraged by {:?}"
        "fox"
    ))
218( # Functional "There will be {} more attempts.\n"
    2
    )
219( # Functional "The next will happen in {}s.\n"
    4([-3, 1500]) # 1500 * 10^-3, assuming that is how the timer
                  # handles time internally; no need to calculate
    )

Diagnostic notation (as a decoder that has the tables but doesn’t know how they are applied may show it):

216(          # CPArust("The gruffalo failed to eat: {}\n")
    217(      # CPArust("Discouraged by {:?}")
        "fox"
    ))
218(2)        # CPArust("There will be {} more attempts.\n")
219(fd'1.500) # CPArust("The next will happen in {}s.\n")

On the wire (19 bytes):

# CBOR sequence with 3 elements
D8 D8           # tag(216)
   D8 D9        # tag(217)
      63        # text(3)
         666F78 # "fox"
D8 DA           # tag(218)
   02           # unsigned(2)
D8 DB            # tag(219)
   C4            # tag(4)
      82         # array(2)
         22      # negative(2)
         19 05DC # unsigned(1500)
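
The "skippable" property can be sketched as follows: a decoder that knows nothing about tags 216 to 219 or the tables can still find the record boundaries in these 19 bytes. `read_head` and `skip_item` are names made up for this note; indefinite-length items are deliberately not handled.

```python
def read_head(buf: bytes, pos: int):
    """Return (major type, argument, position after the head)."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    raise ValueError("indefinite length / reserved values not handled here")


def skip_item(buf: bytes, pos: int) -> int:
    """Return the position just past the item that starts at pos."""
    major, arg, pos = read_head(buf, pos)
    if major in (2, 3):  # byte string / text string: skip the payload
        return pos + arg
    if major == 4:  # array: skip each element
        for _ in range(arg):
            pos = skip_item(buf, pos)
        return pos
    if major == 5:  # map: skip each key and each value
        for _ in range(2 * arg):
            pos = skip_item(buf, pos)
        return pos
    if major == 6:  # tag: skip the tagged content
        return skip_item(buf, pos)
    return pos  # integers, simple values, floats: the head is the whole item


wire = bytes.fromhex("d8d8 d8d9 63 666f78 d8da 02 d8db c4 82 22 1905dc")
boundaries = []
pos = 0
while pos < len(wire):
    pos = skip_item(wire, pos)
    boundaries.append(pos)
print(boundaries)
```

This only works when starting at the beginning of a standalone item, as noted above; there is no way to resynchronize from the middle of the stream.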

An ad-hoc struct is transported

CBOR

Extensible map

This picks a map-based approach, assuming that extensibility means more than just “we might add a 4th thing at some point”. Assuming further that some of the items should be safe for older receivers to ignore, the format picks the “negative is optional” convention.

{
    0 /greeting/: "hello",
    # Mapping enum items to numbers; if any of those are not C-style
    # enums, that might be using enum tags like 121("foo") for Other(str).
    1 /audience/: 2,
   -5 /bias    /: 4,
}

On the wire (12 bytes):

A3               # map(3)
   00            # unsigned(0)
   65            # text(5)
      68656C6C6F # "hello"
   01            # unsigned(1)
   02            # unsigned(2)
   24            # negative(4)
   04            # unsigned(4)
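
How a receiver might apply the “negative is optional” convention, as a sketch on an already-decoded map. The reading assumed here is that unknown negative keys may be skipped, while unknown non-negative keys must make the receiver reject the message; `KNOWN` and `interpret` are names invented for this note.

```python
# Keys this (hypothetical) receiver version understands.
KNOWN = {0: "greeting", 1: "audience", -5: "bias"}


def interpret(decoded: dict) -> dict:
    """Map numeric keys to field names, applying the sign convention."""
    result = {}
    for key, value in decoded.items():
        if key in KNOWN:
            result[KNOWN[key]] = value
        elif key < 0:
            continue  # unknown but optional: safe to ignore
        else:
            raise ValueError(f"unknown mandatory key {key}")
    return result


print(interpret({0: "hello", 1: 2, -5: 4}))
# A newer sender adds optional key -9; this receiver still accepts it.
print(interpret({0: "hello", 1: 2, -5: 4, -9: "extra"}))
```

An unknown non-negative key, say `7`, raises `ValueError` instead, so a sender can choose the sign of a new key depending on whether old receivers may safely carry on without it.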