5 Simple Binary Encoding Gotchas
Spending hours in frustration debugging SBE issues in your application? This post covers up to 80% of common usage issues related to simple binary encoding.
Spending hours in frustration debugging Simple Binary Encoding (SBE) issues in your application? You aren’t alone. I’ve been there before. This post hopes to alleviate some of your pain by covering what I believe to be up to 80% of the common usage issues with SBE.
Code examples: https://github.com/tommyqqt/sbe-gotchas.git
Let’s Recap
SBE is an ultra-fast codec commonly used in low latency financial applications such as FIX engines, pricing engines, etc. This post assumes that you are familiar with the basics.
If you are new to SBE, visit https://github.com/real-logic/simple-binary-encoding
This post refers to the specific SBE implementation in Java (version 1.19.0) developed by Real-Logic. It is not about the SBE FIX standard.
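An SBE message is laid out as a block of fixed-length fields, followed by repeating groups, followed by var-length fields. The same structure (block fields, repeating groups, var-length fields) can also be nested in each repeating group.
Fields in an SBE message have to be encoded and decoded sequentially. The one exception is when the limit sits at the beginning of a block: the fixed-length members of that block can then be accessed in any order.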
Now let’s jump to the common gotchas!
1. When Encoded Length Isn’t Encoded Length
There are times when we want to know the encoded length of an SBE message, such as when sending the message over the wire or persisting it to a file.
How would you get the encoded length of an SBE message? If we have just finished encoding the message and still have the encoder on hand, isn’t it simply a matter of calling encoder.encodedLength()? Let’s try it out.
final int encodedLength = nosEncoder.encodedLength();
final byte[] bytes = new byte[encodedLength];
buffer.getBytes(0, bytes);
final DirectBuffer readBuffer = new UnsafeBuffer(bytes);
wrapDecoder(headerDecoder, nosDecoder, readBuffer, 0);
System.out.println(nosDecoder);
java.lang.IndexOutOfBoundsException: index=262 length=22 capacity=276
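We get an exception because encoder.encodedLength() excludes the header length. Decoding needs the whole byte array, header included, not just the body, so with the encoders on hand the length to copy is headerEncoder.encodedLength() + nosEncoder.encodedLength().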
int encodedLengthFromDecoder = headerDecoder.encodedLength() + nosDecoder.encodedLength();
How do we determine the encoded length if we only have the encoded buffer? The same sum can be taken from the decoders, but a decoder’s encodedLength() only covers what it has read so far, so unfortunately the decoder has to traverse to the end of the message first. The other way is to remember the encoded length at the time the message was encoded and pass it along with the encoded buffer as a method parameter.
//Skip to the end
skipGroup(nosDecoder.allocations(), allocDec -> {
    skipGroup(allocDec.nestedParties(), partyDec -> {
        partyDec.nestedPartyDescription();
    });
    allocDec.allocDescription();
});
nosDecoder.traderDescription();
nosDecoder.orderDescription();
//decoder encoded length at end of message = actual encoded length
encodedLengthFromDecoder = headerDecoder.encodedLength() + nosDecoder.encodedLength();
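The other option, carrying the length along with the bytes, can be sketched with plain java.nio and no SBE classes at all. This is only an illustration under assumptions: the class LengthPrefixedFrame, its methods, and the 4-byte little-endian length prefix are made up for this example and are not part of SBE or of the example repository.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an SBE API): frames a message as [int32 length][payload].
final class LengthPrefixedFrame {
    /** Writes the length remembered at encode time, then exactly that many bytes. */
    static void writeFrame(final ByteBuffer out, final byte[] encodeBuffer, final int encodedLength) {
        out.putInt(encodedLength);
        out.put(encodeBuffer, 0, encodedLength);
    }

    /** Reads one frame and returns exactly the message bytes, no traversal needed. */
    static byte[] readFrame(final ByteBuffer in) {
        final int length = in.getInt();
        final byte[] message = new byte[length];
        in.get(message);
        return message;
    }

    public static void main(final String[] args) {
        // Stands in for a reused encode buffer that is larger than the message.
        final byte[] encodeBuffer = new byte[64];
        final byte[] message = "header+body".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(message, 0, encodeBuffer, 0, message.length);

        final ByteBuffer wire = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
        writeFrame(wire, encodeBuffer, message.length);
        wire.flip();

        final byte[] received = readFrame(wire);
        // prints "11 bytes: header+body"
        System.out.println(received.length + " bytes: " + new String(received, StandardCharsets.US_ASCII));
    }
}
```

In real code the length passed to writeFrame would be the header length plus the body length taken from the encoders right after encoding.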
2. The Moving Repeating Group
A habit many of us Java programmers have: when we need a method’s return value several times, and the value is supposed to stay the same between calls, we simply call the method again each time instead of assigning the result to a local variable.
What happens when we obtain a reference to the start of a repeating group multiple times, as in the code below? Note that we haven’t even attempted to traverse the repeating group yet (by calling next()).
//At start of repeating group, print the group count and current limit
System.out.println("Number of allocations: " + nosDecoder.allocations().count());
System.out.println("Current limit: " + nosDecoder.limit());
//Print the group count and limit again
System.out.println("Number of allocations: " + nosDecoder.allocations().count());
System.out.println("Current limit: " + nosDecoder.limit());
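Common sense says we should get the same count and limit both times, but it doesn’t work that way for SBE. Every call to allocations() wraps the group decoder at the current limit and then moves the limit past the group’s dimension header, so the second call reads its count from the wrong place.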
Number of allocations: 2
Current limit: 30
Number of allocations: 20291
Current limit: 34
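To see where a number like 20291 can come from, here is a toy model of the moving limit. It is a sketch under assumptions: ToyDecoder is invented for this example and is not the generated SBE code, and the layout (a 4-byte little-endian dimension header of blockLength and numInGroup, followed by the first entry starting with "ACCOUNT-1") is my reading of the example schema.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class MovingGroupDemo {
    /** Toy stand-in for a message decoder: one shared cursor (the "limit"). */
    static final class ToyDecoder {
        private final ByteBuffer buffer;
        private int limit;

        ToyDecoder(final ByteBuffer buffer, final int limit) {
            this.buffer = buffer;
            this.limit = limit;
        }

        int limit() {
            return limit;
        }

        /**
         * Mimics a group accessor: reads numInGroup from the dimension header
         * at the current limit, then moves the limit past the 4-byte header.
         */
        int allocationsCount() {
            final int count = buffer.getShort(limit + 2) & 0xFFFF;
            limit += 4;
            return count;
        }
    }

    /** Dimension header (blockLength=24, numInGroup=2) followed by the first entry's bytes. */
    static ByteBuffer sampleBuffer() {
        final ByteBuffer buffer = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort(0, (short) 24);
        buffer.putShort(2, (short) 2);
        final byte[] account = "ACCOUNT-1".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < account.length; i++) {
            buffer.put(4 + i, account[i]);
        }
        return buffer;
    }

    public static void main(final String[] args) {
        final ToyDecoder decoder = new ToyDecoder(sampleBuffer(), 0);
        // prints "count=2 limit=4"
        System.out.println("count=" + decoder.allocationsCount() + " limit=" + decoder.limit());
        // prints "count=20291 limit=8": the bytes 'C','O' of ACCOUNT-1 read as a uint16
        System.out.println("count=" + decoder.allocationsCount() + " limit=" + decoder.limit());
    }
}
```

The practical takeaway is the boring one: call the group accessor once, keep the result in a local variable, and work from that.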
3. Mutating Var Length Field
What if there is a field whose value we want to mutate after we have encoded the message? If we know in advance which field we will want to go back to, we can remember the limit just before encoding it and use that limit to backtrack later.
//I want to change trader description later so remember the limit here
final int limit = nosEncoder.limit();
nosEncoder.traderDescription("TRADER-1");
nosEncoder.orderDescription("ORDER DESC");
nosEncoder.limit(limit);
nosEncoder.traderDescription("TRADER-00001");
//Everything subsequent to the above needs to be encoded again
nosEncoder.orderDescription("ORDER DESC");
Unless the mutated field is a fixed-length field, every field after it needs to be encoded again, because a var-length value of a different size shifts the position of everything that follows.
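Why the re-encoding is needed can be shown with a toy layout in plain java.nio. This is a sketch, not SBE itself: putVar and getVar are hypothetical helpers that pack each value as a 4-byte little-endian length followed by ASCII bytes, which is how I assume varStringEncoding is laid out in the example schema.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class VarFieldRewriteDemo {
    /** Writes [uint32 length][ASCII bytes] at limit and returns the new limit. */
    static int putVar(final ByteBuffer buffer, final int limit, final String value) {
        final byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.putInt(limit, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(limit + 4 + i, bytes[i]);
        }
        return limit + 4 + bytes.length;
    }

    /** Reads the var-length value that starts at limit. */
    static String getVar(final ByteBuffer buffer, final int limit) {
        final byte[] bytes = new byte[buffer.getInt(limit)];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(limit + 4 + i);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(final String[] args) {
        final ByteBuffer buffer = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        final int traderLimit = 0; // remembered before encoding the field
        int limit = putVar(buffer, traderLimit, "TRADER-1");
        putVar(buffer, limit, "ORDER DESC");

        // Backtrack and write a longer value: it overwrites the old length prefix
        // of ORDER DESC, and the new limit lands inside its old data.
        limit = putVar(buffer, traderLimit, "TRADER-00001");
        System.out.println("stale length at new limit: " + buffer.getInt(limit));

        // Re-encode everything after the mutated field.
        putVar(buffer, limit, "ORDER DESC");
        // prints "TRADER-00001 / ORDER DESC"
        System.out.println(getVar(buffer, traderLimit) + " / " + getVar(buffer, limit));
    }
}
```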
4. The Semi-Forbidden Schema Evolution
Suppose “orderDescription” is a new var-length field that has just been added to the end of the schema like this:
<sbe:message name="NewOrderSingle" id="0001" description="Example NewOrderSingle">
    <field name="orderId" id="11" type="fixedStringEncoding16"/>
    <field name="tradeDate" id="75" type="uint16"/>
    <group name="allocations" id="78" dimensionType="groupSizeEncoding">
        <field name="allocAccount" id="79" type="fixedStringEncoding16"/>
        <field name="allocQty" id="80" type="double"/>
        <group name="nestedParties" id="539" dimensionType="groupSizeEncoding">
            <field name="nestedPartyID" id="524" type="fixedStringEncoding16"/>
            <field name="nestedPartyRole" id="538" type="fixedStringEncoding16"/>
            <data name="nestedPartyDescription" id="6051" type="varStringEncoding"/>
        </group>
        <data name="allocDescription" id="6052" type="varStringEncoding"/>
    </group>
    <data name="traderDescription" id="6053" type="varStringEncoding"/>
    <!-- new var length field -->
    <data name="orderDescription" id="6054" type="varStringEncoding"/>
</sbe:message>
If we are not using that field yet, can we get away without changing our code to encode/decode it? It’s at the end of the message anyway, and surely the regression tests won’t pick up anything!
What happens when the code below runs?
encodeOrder(nosEncoder, "ORDER-001", 20200701,
    "TRADER-0123456789",
    null,
    buffer);
//Flip the buffer to decoder
wrapDecoder(headerDecoder, nosDecoder, buffer, 0);
System.out.println(nosDecoder);
encodeOrder(nosEncoder, "ORDER-002", 20200701,
    "TRADER-0001", //Shorter trader desc, same buffer
    null,
    buffer);
//Flip the buffer to decoder
wrapDecoder(headerDecoder, nosDecoder, buffer, 0);
System.out.println(nosDecoder);
It explodes.
java.lang.IndexOutOfBoundsException: index=265 length=926299444 capacity=384
Code that uses SBE also tends to reuse buffers to reduce allocations. Even though we don’t care about the last field, the buffer may still contain bytes from the previous message that encroach on the new field when we encode the new message, and the decoder then reads those stale bytes as the length of orderDescription.
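The bogus length in the exception is not random. In a toy layout of [4-byte little-endian length][ASCII bytes], which is my assumption about varStringEncoding here, the characters "4567" left over from "TRADER-0123456789" sit exactly where the next length would be read, and as a little-endian int they are 926299444. The sketch below uses a hypothetical putVar helper rather than the generated encoder, and shows that explicitly encoding the field, even as an empty value, clears the problem.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class StaleBytesDemo {
    /** Writes [uint32 length][ASCII bytes] at limit and returns the new limit. */
    static int putVar(final ByteBuffer buffer, final int limit, final String value) {
        final byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.putInt(limit, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(limit + 4 + i, bytes[i]);
        }
        return limit + 4 + bytes.length;
    }

    public static void main(final String[] args) {
        // One buffer reused for both messages, as latency-sensitive code tends to do.
        final ByteBuffer buffer = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

        // Message 1: the new trailing field is never encoded.
        putVar(buffer, 0, "TRADER-0123456789");

        // Message 2: a shorter value in the same buffer, trailing field skipped again.
        final int limit = putVar(buffer, 0, "TRADER-0001");

        // A decoder would read the trailing field's length at this position.
        // prints "length read for the skipped field: 926299444"
        System.out.println("length read for the skipped field: " + buffer.getInt(limit));

        // Encoding the field explicitly, even as empty, overwrites the stale bytes.
        putVar(buffer, limit, "");
        // prints "after encoding it as empty: 0"
        System.out.println("after encoding it as empty: " + buffer.getInt(limit));
    }
}
```

So the schema change is only "semi-forbidden": adding the field is fine, as long as every encoder that reuses a buffer writes the new field too.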
5. Debugging and Testing with Base64 Encoding
How do we troubleshoot SBE issues in production?
A couple of ways:
- Replay the SBE messages in your test harness (if you employ the event sourcing pattern)
- Trawl through log files for problematic SBE messages
Let’s talk about the second option. Below is a string representation of an SBE message.
[NewOrderSingle](sbeTemplateId=1|sbeSchemaId=1|sbeSchemaVersion=1|sbeBlockLength=18):orderId=ORDER-001|tradeDate=15613|allocations=[(allocAccount=ACCOUNT-1|allocQty=100.0|nestedParties=[(nestedPartyID=Party-1|nestedPartyRole=|nestedPartyDescription='Party-1')]|allocDescription='ALLOCATION WITH ACCOUNT ACCOUNT-1'),(allocAccount=ACCOUNT-2|allocQty=200.0|nestedParties=[(nestedPartyID=Party-2|nestedPartyRole=|nestedPartyDescription='Party-2')]|allocDescription='ALLOCATION WITH ACCOUNT ACCOUNT-2')]|traderDescription='TRADER-0123456789'|orderDescription=''
Besides eyeballing it, is there a way to turn the text back into SBE bytes (similar to the Protobuf TextFormat parser)? You can write a parser of your own or look hard enough for SBE parsers on the internet. The only problem is that the SBE string representation is somewhat arbitrary. It is not a well-defined language like JSON, and SBE parsers can stop working if there is an extra pipe or parenthesis somewhere in the text.
Java 8’s Base64 encoding comes to the rescue. We can print the SBE bytes as a string anywhere, be it in log files or even in JUnit test cases, and easily reconstruct the bytes later on. We no longer need to worry about storing SBE messages as binary files. Yay!
final int encodedLength = headerEncoder.encodedLength() + nosEncoder.encodedLength();
final byte[] bytes = new byte[encodedLength];
buffer.getBytes(0, bytes);
final String base64EncStr = Base64.getEncoder().encodeToString(bytes);
System.out.println(base64EncStr);
final byte[] decoderBytes = Base64.getDecoder().decode(base64EncStr);
final DirectBuffer decoderBuffer = new UnsafeBuffer(decoderBytes);
wrapDecoder(headerDecoder, nosDecoder, decoderBuffer, 0);
final String decoderToString = nosDecoder.toString();
System.out.println(decoderToString);
SBE Base64 encoding string:
EgABAAEAAQBPUkRFUklELTAwMQAAAAAA/TwYAAIAQUNDT1VOVC0xAAAAAAAAAAAAAAAAAFlAIAABAFBhcnR5LTEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABwAAAFBhcnR5LTEhAAAAQUxMT0NBVElPTiBXSVRIIEFDQ09VTlQgQUNDT1VOVC0xQUNDT1VOVC0yAAAAAAAAAAAAAAAAAGlAIAABAFBhcnR5LTIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABwAAAFBhcnR5LTIhAAAAQUxMT0NBVElPTiBXSVRIIEFDQ09VTlQgQUNDT1VOVC0yCAAAAFRSQURFUi0xFgAAAERVTU1ZIE5FVyBPUkRFUiBTSU5HTEU=
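A Base64 string from a log can also be inspected without any generated classes. The sketch below decodes the first 12 characters of the string above and reads the message header by hand. It assumes the usual SBE header layout of four little-endian uint16 fields (blockLength, templateId, schemaId, version); Base64HeaderPeek and peekHeader are names made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

final class Base64HeaderPeek {
    /** Returns {blockLength, templateId, schemaId, version} from a Base64-encoded message. */
    static int[] peekHeader(final String base64) {
        final byte[] bytes = Base64.getDecoder().decode(base64);
        final ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        final int[] header = new int[4];
        for (int i = 0; i < header.length; i++) {
            header[i] = buffer.getShort(i * 2) & 0xFFFF; // each header field is a uint16
        }
        return header;
    }

    public static void main(final String[] args) {
        // First 12 characters of the logged string (9 bytes once decoded).
        final int[] header = peekHeader("EgABAAEAAQBP");
        // prints "blockLength=18 templateId=1 schemaId=1 version=1"
        System.out.println("blockLength=" + header[0] + " templateId=" + header[1]
            + " schemaId=" + header[2] + " version=" + header[3]);
    }
}
```

A quick peek like this is handy for checking that a logged message matches the template and schema version you expect before feeding it to a decoder.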
Best of luck on your SBE adventure and don’t forget to share the tips if you think they make your life easier!
Published at DZone with permission of Tommy Q. See the original article here.