Is Protobuf 5x Faster Than JSON? (Part 2)

Many articles out there claim that Protobuf is a better choice than JSON for performance reasons. But is that really true? The investigation continues here.

Tao Wen

Apr. 21, 17 · Opinion


In Part 1, we started looking at benchmarks to see whether Protobuf really is a better choice than JSON for performance reasons. Let's continue the discussion here.

Decode Object

We have shown that JSON is slow for numeric input. What about object binding itself? Is the benchmark result poor because binding is slow in JSON? Given that we are using 10 fields in the benchmark, let's find out.

To keep the comparison fair, we use a short, pure-ASCII string field this time. String copying performance should then be very similar across libraries, so any performance difference should come from the binding process.

message PbTestObject {
  string field1 = 1;
}

library	compared with Jackson	ns/op
Protobuf	2.52	68666.658
Thrift	2.74	63139.324
Jsoniter	5.78	29887.361
DSL-Json	5.32	32458.030
Jackson	1	172747.146

For 1 string field, Protobuf is actually slower than Jsoniter by 2.3x.

We can repeat the same test with 5 fields and 10 fields to see a pattern.

message PbTestObject {
  string field1 = 1;
  string field2 = 2;
  string field3 = 3;
  string field4 = 4;
  string field5 = 5;
}

library	compared with Jackson	ns/op
Protobuf	1.3	276972.857
Thrift	1.44	250016.572
Jsoniter	2.5	143807.401
DSL-Json	2.41	149261.728
Jackson	1	359868.351

For 5 string fields, Protobuf is only 1.3x faster than Jackson. If you think JSON object binding is slow and will dominate the performance, you are wrong.

message PbTestObject {
  string field1 = 1;
  string field2 = 2;
  string field3 = 3;
  string field4 = 4;
  string field5 = 5;
  string field6 = 6;
  string field7 = 7;
  string field8 = 8;
  string field9 = 9;
  string field10 = 10;
}

library	compared with Jackson	ns/op
Protobuf	1.22	462167.920
Thrift	1.12	503725.605
Jsoniter	2.04	277531.128
DSL-Json	1.84	307569.103
Jackson	1	564942.726

For 10 string fields, Protobuf is only 1.22x faster than Jackson. I think you get the idea. Protobuf binds fields with a switch on the wire tag:

boolean done = false;
while (!done) {
  int tag = input.readTag();
  switch (tag) {
    case 0:
      done = true;
      break;
    default: {
      if (!input.skipField(tag)) {
        done = true;
      }
      break;
    }
    case 10: {
      java.lang.String s = input.readStringRequireUtf8();
      field1_ = s;
      break;
    }
    case 18: {
      java.lang.String s = input.readStringRequireUtf8();
      field2_ = s;
      break;
    }
    case 26: {
      java.lang.String s = input.readStringRequireUtf8();
      field3_ = s;
      break;
    }
    case 34: {
      java.lang.String s = input.readStringRequireUtf8();
      field4_ = s;
      break;
    }
    case 42: {
      java.lang.String s = input.readStringRequireUtf8();
      field5_ = s;
      break;
    }
  }
}
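
The case labels 10, 18, 26, 34, and 42 are Protobuf wire tags: the field number shifted left by three bits, combined with the wire type (2 for length-delimited values such as strings). The sketch below recomputes them. WireTags, makeTag, and fieldNumber are names made up for illustration; they are not part of the Protobuf library.

```java
// Illustrative helper, not part of the Protobuf API.
final class WireTags {
    // Wire type 2 covers strings, bytes, nested messages, and packed lists.
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // A tag packs the field number into the high bits and the wire type into the low 3 bits.
    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Recovers the field number from a tag.
    static int fieldNumber(int tag) {
        return tag >>> 3;
    }

    public static void main(String[] args) {
        for (int field = 1; field <= 5; field++) {
            int tag = makeTag(field, WIRETYPE_LENGTH_DELIMITED);
            System.out.println("field" + field + " -> case " + tag);
        }
    }
}
```

Because the tags are small integers known at code-generation time, the switch can dispatch on them without hashing or comparing field names.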

This implementation is faster than a HashMap lookup, but only marginally. DSL-JSON implements binding with a similar dispatch on a hash of the field name:

switch (nameHash) {
    case 1212206434:
        _field1_ = com.dslplatform.json.StringConverter.deserialize(reader);
        nextToken = reader.getNextToken();
        break;
    case 1178651196:
        _field3_ = com.dslplatform.json.StringConverter.deserialize(reader);
        nextToken = reader.getNextToken();
        break;
    case 1195428815:
        _field2_ = com.dslplatform.json.StringConverter.deserialize(reader);
        nextToken = reader.getNextToken();
        break;
    case 1145095958:
        _field5_ = com.dslplatform.json.StringConverter.deserialize(reader);
        nextToken = reader.getNextToken();
        break;
    case 1161873577:
        _field4_ = com.dslplatform.json.StringConverter.deserialize(reader);
        nextToken = reader.getNextToken();
        break;
    default:
        nextToken = reader.skip();
        break;
}

The hashing function is FNV:

long hash = 0x811c9dc5;
while (ci < buffer.length) {
    final byte b = buffer[ci++];
    if (b == '"') break;
    hash ^= b;
    hash *= 0x1000193;
}
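
For experimenting outside the library, here is a minimal standalone sketch of a 32-bit FNV-1a hash over a field name, plus an exact comparison that can be run after the hash matches to rule out a collision. FieldHash, fnv1a, and matches are hypothetical names, and the sketch assumes the field name is terminated by a double quote; it is not DSL-JSON's actual code.

```java
// Standalone sketch; names are illustrative and not part of DSL-JSON.
final class FieldHash {
    // 32-bit FNV-1a over the bytes of a field name, stopping at the closing quote.
    static int fnv1a(byte[] buffer, int start) {
        int hash = 0x811c9dc5;          // FNV offset basis
        for (int i = start; i < buffer.length; i++) {
            byte b = buffer[i];
            if (b == '"') break;
            hash ^= (b & 0xff);         // treat the byte as unsigned
            hash *= 0x1000193;          // FNV prime; int overflow wraps modulo 2^32
        }
        return hash;
    }

    // Exact check: true only if the bytes at start spell `expected` followed by a quote.
    static boolean matches(byte[] buffer, int start, String expected) {
        int len = expected.length();
        if (start + len >= buffer.length || buffer[start + len] != '"') return false;
        for (int i = 0; i < len; i++) {
            if (buffer[start + i] != (byte) expected.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] input = "field1\":\"x\"".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println("hash = " + fnv1a(input, 0));
        System.out.println("exact match = " + matches(input, 0, "field1"));
    }
}
```

The hash alone is cheap but cannot distinguish two names that happen to collide; the extra comparison trades a little speed for certainty.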

Hashes can collide, so use this with caution. If the input may contain unknown fields, use the slower variant that compares the field name again once the hash has matched. Jsoniter has a decoding mode, DYNAMIC_MODE_AND_MATCH_FIELD_STRICTLY, which generates exact-matching code:

switch (field.len()) {
case 6:
    if (field.at(0) == 102 &&
            field.at(1) == 105 &&
            field.at(2) == 101 &&
            field.at(3) == 108 &&
            field.at(4) == 100) {
        if (field.at(5) == 49) {
            obj.field1 = (java.lang.String) iter.readString();
            continue;
        }
        if (field.at(5) == 50) {
            obj.field2 = (java.lang.String) iter.readString();
            continue;
        }
        if (field.at(5) == 51) {
            obj.field3 = (java.lang.String) iter.readString();
            continue;
        }
        if (field.at(5) == 52) {
            obj.field4 = (java.lang.String) iter.readString();
            continue;
        }
        if (field.at(5) == 53) {
            obj.field5 = (java.lang.String) iter.readString();
            continue;
        }
    }
    break;
}
iter.skip();

DSL-JSON can also be changed to compare the field name exactly after the hash matches. The numbers are:

library	compared with Jackson	ns/op
Jsoniter (hash mode)	2.13	274949.346
Jsoniter (strict mode)	1.95	300524.989
DSL-JSON (hash mode)	1.91	305812.208
DSL-JSON (strict mode)	1.71	343203.344
Jackson	1	585421.314

The conclusion for object binding: when field names are short, dispatching on numeric tags is not significantly faster than dispatching on names, even compared with Jackson.

Encode Object

For 1 field:

library	compared with Jackson	ns/op
Protobuf	1.22	57502.775
Thrift	0.86	137094.627
Jsoniter	2.06	57081.756
DSL-Json	2.46	47890.664
Jackson	1	117604.479

For 5 fields:

library	compared with Jackson	ns/op
Protobuf	1.68	127933.179
Thrift	0.46	467818.566
Jsoniter	2.54	84702.001
DSL-Json	2.68	80211.517
Jackson	1	214802.686

For 10 fields:

library	compared with Jackson	ns/op
Protobuf	1.72	194371.476
Thrift	0.38	888230.783
Jsoniter	2.59	129305.086
DSL-Json	2.56	130379.967
Jackson	1	334297.953

For object encoding, Protobuf is about 1.7x faster than Jackson, but it is slower than DSL-JSON.

The main optimization for object encoding is to emit as many control bytes as possible in a single write.

public void encode(Object obj, com.jsoniter.output.JsonStream stream) throws java.io.IOException {
    if (obj == null) { stream.writeNull(); return; }
    stream.write((byte)'{');
    encode_((com.jsoniter.benchmark.with_1_string_field.TestObject)obj, stream);
    stream.write((byte)'}');
}

public static void encode_(com.jsoniter.benchmark.with_1_string_field.TestObject obj, com.jsoniter.output.JsonStream stream) throws java.io.IOException {
    boolean notFirst = false;
    if (obj.field1 != null) {
        if (notFirst) { stream.write(','); } else { notFirst = true; }
        stream.writeRaw("\"field1\":", 9);
        stream.writeVal((java.lang.String)obj.field1);
    }
}

If we know the field is not nullable, even the opening quote of the string can be merged into the same raw write.

public void encode(Object obj, com.jsoniter.output.JsonStream stream) throws java.io.IOException {
    if (obj == null) { stream.writeNull(); return; }
    stream.writeRaw("{\"field1\":\"", 11);
    encode_((com.jsoniter.benchmark.with_1_string_field.TestObject)obj, stream);
    stream.write((byte)'\"', (byte)'}');
}

public static void encode_(com.jsoniter.benchmark.with_1_string_field.TestObject obj, com.jsoniter.output.JsonStream stream) throws java.io.IOException {
    com.jsoniter.output.CodegenAccess.writeStringWithoutQuote((java.lang.String)obj.field1, stream);
}

The conclusion of object encoding/decoding: Protobuf might be slightly faster than Jackson, but it is actually slower than DSL-JSON.

Decode Integer List

Protobuf has special support for packed integer lists.

22        // tag (field number 4, wire type 2)
06        // payload size (6 bytes)
03        // first element (varint 3)
8E 02     // second element (varint 270)
9E A7 05  // third element (varint 86942)
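
To make the byte dump above concrete, here is a minimal sketch of decoding the six payload bytes as base-128 varints. PackedVarints and decodePacked are hypothetical names, not Protobuf API, and the sketch assumes well-formed input holding small non-negative values (no truncation checks, no 10-byte negative int32 encodings).

```java
// Illustrative decoder for a packed varint payload; not the Protobuf library's code.
final class PackedVarints {
    // Decodes the bytes that follow the tag and the length prefix.
    static java.util.List<Integer> decodePacked(byte[] payload) {
        java.util.List<Integer> values = new java.util.ArrayList<>();
        int pos = 0;
        while (pos < payload.length) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = payload[pos++];
                value |= (b & 0x7f) << shift; // the low 7 bits carry data, least significant group first
                shift += 7;
            } while ((b & 0x80) != 0);        // a set high bit means another byte follows
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] payload = {0x03, (byte) 0x8E, 0x02, (byte) 0x9E, (byte) 0xA7, 0x05};
        System.out.println(decodePacked(payload)); // prints [3, 270, 86942]
    }
}
```

There are no separators between elements: each value ends at the first byte whose high bit is clear.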

message PbTestObject {
  repeated int32 field1 = 1 [packed=true];
}

library	compared with Jackson	ns/op
Protobuf	2.92	249888.105
Thrift	3.63	201439.691
Jsoniter	2.97	245837.298
DSL-Json	1.97	370897.998
Jackson	1	730450.607

Protobuf is about 3x faster than Jackson for integer list decoding. However, Jsoniter is as fast as Protobuf in this case.

The decoding loop in Jsoniter is unrolled:

public static java.lang.Object decode_(com.jsoniter.JsonIterator iter) throws java.io.IOException { 
    java.util.ArrayList col = (java.util.ArrayList)com.jsoniter.CodegenAccess.resetExistingObject(iter);
    if (iter.readNull()) { com.jsoniter.CodegenAccess.resetExistingObject(iter); return null; }
    if (!com.jsoniter.CodegenAccess.readArrayStart(iter)) {
        return col == null ? new java.util.ArrayList(0): (java.util.ArrayList)com.jsoniter.CodegenAccess.reuseCollection(col);
    }
    Object a1 = java.lang.Integer.valueOf(iter.readInt());
    if (com.jsoniter.CodegenAccess.nextToken(iter) != ',') {
        java.util.ArrayList obj = col == null ? new java.util.ArrayList(1): (java.util.ArrayList)com.jsoniter.CodegenAccess.reuseCollection(col);
        obj.add(a1);
        return obj;
    }
    Object a2 = java.lang.Integer.valueOf(iter.readInt());
    if (com.jsoniter.CodegenAccess.nextToken(iter) != ',') {
        java.util.ArrayList obj = col == null ? new java.util.ArrayList(2): (java.util.ArrayList)com.jsoniter.CodegenAccess.reuseCollection(col);
        obj.add(a1);
        obj.add(a2);
        return obj;
    }
    Object a3 = java.lang.Integer.valueOf(iter.readInt());
    if (com.jsoniter.CodegenAccess.nextToken(iter) != ',') {
        java.util.ArrayList obj = col == null ? new java.util.ArrayList(3): (java.util.ArrayList)com.jsoniter.CodegenAccess.reuseCollection(col);
        obj.add(a1);
        obj.add(a2);
        obj.add(a3);
        return obj;
    }
    Object a4 = java.lang.Integer.valueOf(iter.readInt());
    java.util.ArrayList obj = col == null ? new java.util.ArrayList(8): (java.util.ArrayList)com.jsoniter.CodegenAccess.reuseCollection(col);
    obj.add(a1);
    obj.add(a2);
    obj.add(a3);
    obj.add(a4);
    while (com.jsoniter.CodegenAccess.nextToken(iter) == ',') {
        obj.add(java.lang.Integer.valueOf(iter.readInt()));
    }
    return obj;
}

Encode Integer List

One would expect integer list encoding to be fast in Protobuf, since it does not need to write out all those commas.

library	compared with Jackson	ns/op
Protobuf	1.35	159337.360
Thrift	0.45	472555.572
Jsoniter	1.9	112770.811
DSL-Json	2.19	97998.250
Jackson	1	214409.223

Protobuf is only 1.35x faster than Jackson. Although integer object fields are faster in Protobuf, integer lists are not. There is no special optimization here. DSL-JSON is faster than Jackson because individual numbers are written out faster.

Decode Object List

Lists are more often used as containers for objects.

message PbTestObject {
  message ElementObject {
    string field1 = 1;
  }
  repeated ElementObject field1 = 1;
}

library	compared with Jackson	ns/op
Protobuf	1.26	1118704.310
Thrift	1.3	1078278.555
Jsoniter	2.91	483304.365
DSL-Json	2.22	635179.183
Jackson	1	1407116.476

Protobuf is about 1.3x faster than Jackson for object list decoding, but DSL-JSON is faster than Protobuf.

Encode Object List

library	compared with Jackson	ns/op
Protobuf	2.22	328219.768
Thrift	0.38	1885052.964
Jsoniter	3.63	200420.923
DSL-Json	3.87	187964.594
Jackson	1	727582.950

Protobuf is more than 2x faster than Jackson for object list encoding, but DSL-JSON is faster than Protobuf. It seems like Protobuf is not good at list encoding/decoding.

Decode Double Array

Java arrays are special: double[] is more efficient than List<Double>. It is very common to use an array of doubles to represent values or coordinates over a time interval. However, the Protobuf Java library does not speak double[]; it always uses List<Double>. We can expect a win for JSON here.

message PbTestObject {
  repeated double field1 = 1 [packed=true];
}

library	compared with Jackson	ns/op
Protobuf	5.18	207503.316
Thrift	6.12	175678.703
Jsoniter	3.52	305553.080
DSL-Json	2.8	383549.289
Jackson	1	1075423.265

Protobuf is more than 5x faster than Jackson for double array decoding, but compared with Jsoniter, Protobuf is only 1.47x faster. So, if you have a lot of double numbers but they sit in an array rather than in individual fields, the performance difference is smaller.

The loop is unrolled in Jsoniter and optimized for a small array.

public static java.lang.Object decode_(com.jsoniter.JsonIterator iter) throws java.io.IOException {
... // abbreviated
 nextToken = com.jsoniter.CodegenAccess.nextToken(iter);
 if (nextToken == ']') {
     return new double[0];
 }
 com.jsoniter.CodegenAccess.unreadByte(iter);
 double a1 = iter.readDouble();
 if (!com.jsoniter.CodegenAccess.nextTokenIsComma(iter)) {
     return new double[]{ a1 };
 }
 double a2 = iter.readDouble();
 if (!com.jsoniter.CodegenAccess.nextTokenIsComma(iter)) {
     return new double[]{ a1, a2 };
 }
 double a3 = iter.readDouble();
 if (!com.jsoniter.CodegenAccess.nextTokenIsComma(iter)) {
     return new double[]{ a1, a2, a3 };
 }
 double a4 = (double) iter.readDouble();
 if (!com.jsoniter.CodegenAccess.nextTokenIsComma(iter)) {
     return new double[]{ a1, a2, a3, a4 };
 }
 double a5 = (double) iter.readDouble();
 double[] arr = new double[10];
 arr[0] = a1;
 arr[1] = a2;
 arr[2] = a3;
 arr[3] = a4;
 arr[4] = a5;
 int i = 5;
 while (com.jsoniter.CodegenAccess.nextTokenIsComma(iter)) {
     if (i == arr.length) {
         double[] newArr = new double[arr.length * 2];
         System.arraycopy(arr, 0, newArr, 0, arr.length);
         arr = newArr;
     }
     arr[i++] = iter.readDouble();
 }
 double[] result = new double[i];
 System.arraycopy(arr, 0, result, 0, i);
 return result;
}

Encode Double Array

library	compared with Jackson	ns/op
Protobuf	15.63	107760.788
Thrift	0.54	3125678.472
Jsoniter (only 6 digit precision)	6.74	249945.866
DSL-Json	1.14	1478332.248
Jackson	1	1684935.837

Protobuf is more than 15x faster than Jackson for double array encoding, and 2.3x faster than Jsoniter with 6-digit precision. Again, double encoding is very slow in JSON.

Decode String

JSON strings may contain escaped characters, whereas Protobuf can decode a string with a simple array copy. The test string is 160 ASCII characters long.

syntax = "proto3";
option optimize_for = SPEED;
message PbTestObject {
  string field1 = 1;
}

library	compared with Jackson	ns/op
Protobuf	1.85	173680.548
Thrift	2.29	140635.170
Jsoniter	2.4	134067.924
DSL-Json	2.27	141419.108
Jackson	1	321406.155

Protobuf is 1.85x faster than Jackson for long string decoding. However, DSL-JSON is actually faster than Protobuf. There are a couple of interesting things happening here.

Fast Path

DSL-JSON implemented a fast path for ASCII.

for (int i = 0; i < chars.length; i++) {
    bb = buffer[ci++];
    if (bb == '"') {
        currentIndex = ci;
        return i;
    }
    // If we encounter a backslash, which is a beginning of an escape sequence
    // or a high bit was set - indicating an UTF-8 encoded multibyte character,
    // there is no chance that we can decode the string without instantiating
    // a temporary buffer, so quit this loop
    if ((bb ^ '\\') < 1) break;
    chars[i] = (char) bb;
}

This fast path avoids the cost of processing escape sequences and multibyte UTF-8.
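
The single test (bb ^ '\\') < 1 covers both exits: the XOR is 0 when the byte is a backslash and negative when the byte's high bit is set. A minimal standalone sketch of the same idea follows. AsciiFastPath and readAscii are hypothetical names, and returning null to signal "fall back to the slow path" is an assumption made for this example, not DSL-JSON's behavior.

```java
// Illustrative sketch of an ASCII fast path; not DSL-JSON's actual code.
final class AsciiFastPath {
    // Copies bytes into chars until the closing quote and returns the string.
    // Returns null when an escape sequence or a non-ASCII byte requires a slower decoder,
    // or when no closing quote is found.
    static String readAscii(byte[] buffer, int start) {
        char[] chars = new char[buffer.length - start];
        for (int i = 0, ci = start; ci < buffer.length; i++, ci++) {
            byte bb = buffer[ci];
            if (bb == '"') {
                return new String(chars, 0, i);
            }
            // 0 for a backslash, negative for any byte with the high bit set
            if ((bb ^ '\\') < 1) {
                return null;
            }
            chars[i] = (char) bb;
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] plain = "hello\",".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] escaped = "a\\nb\"".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(readAscii(plain, 0));   // prints hello
        System.out.println(readAscii(escaped, 0)); // prints null
    }
}
```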

JVM Hotspot Optimization

Before JDK 9, java.lang.String was char[]-based. The input is a UTF-8 encoded byte[], so we cannot copy directly from byte[] to char[]. In JDK 9, java.lang.String was changed to be byte[]-based. Take a look at the JDK 9 source code:

@Deprecated(since="1.1")
public String(byte ascii[], int hibyte, int offset, int count) {
    checkBoundsOffCount(offset, count, ascii.length);
    if (count == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS && (byte)hibyte == 0) {
        this.value = Arrays.copyOfRange(ascii, offset, offset + count);
        this.coder = LATIN1;
    } else {
        hibyte <<= 8;
        byte[] val = StringUTF16.newBytesFor(count);
        for (int i = 0; i < count; i++) {
            StringUTF16.putChar(val, i, hibyte | (ascii[offset++] & 0xff));
        }
        this.value = val;
        this.coder = UTF16;
    }
}

Using this deprecated but still available constructor, we can now construct a java.lang.String with Arrays.copyOfRange. However, testing shows that this is not faster than the DSL-JSON implementation.
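
For reference, calling that constructor looks like the sketch below. Latin1Copy and fromAscii are hypothetical names; the only real API used is the deprecated String(byte[], int, int, int) constructor, and the sketch assumes the bytes are plain ASCII.

```java
// Illustrative wrapper around the deprecated constructor; names are made up for this example.
final class Latin1Copy {
    // A hibyte of 0 selects the Arrays.copyOfRange branch on JDK 9+ when compact strings are enabled.
    @SuppressWarnings("deprecation")
    static String fromAscii(byte[] buffer, int offset, int count) {
        return new String(buffer, 0, offset, count);
    }

    public static void main(String[] args) {
        byte[] json = "{\"field1\":\"hello\"}".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // The value starts at index 11, right after the opening quote, and is 5 bytes long.
        System.out.println(fromAscii(json, 11, 5)); // prints hello
    }
}
```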

It seems that HotSpot is pattern-matching this kind of loop: when the loop is written this way, the string appears to be constructed by copying the byte[] directly. In theory, on JDK 9 with compact strings enabled (-XX:+CompactStrings), the byte[] to char[] to byte[] conversion should be slow. However, it turns out the DSL-JSON implementation is still the fastest.

If the input is mostly strings, this optimization is crucial. The art of parsing in Java is largely the art of copying bytes into a JVM-stupid java.lang.String. In a modern language like Go, a string is a UTF-8 byte sequence, which is wise.

Encode String

The problem is similar: we cannot copy char[] directly into byte[].

library	compared with Jackson	ns/op
Protobuf	0.96	262077.921
Thrift	0.99	252140.935
Jsoniter	1.5	166381.978
DSL-Json	1.38	181008.120
Jackson	1	250431.354

Protobuf is slightly slower than Jackson when encoding long strings. Again, the char[]-based string is the problem here.

Skip Structure

JSON is a format without a header. Without one, a JSON parser has to scan every byte to locate the field it needs, even if the other fields are never parsed.

message PbTestWriteObject {
  repeated string field1 = 1;
  message Field2 {
    repeated string field1 = 1;
    repeated string field2 = 2;
    repeated string field3 = 3;
  }
  Field2 field2 = 2;
  string field3 = 3;
}
message PbTestReadObject {
  string field3 = 3;
}

The message will be written using PbTestWriteObject and read by PbTestReadObject. field1 and field2 should be skipped.

library	compared with Jackson	ns/op
Protobuf	5.05	152194.483
Thrift	5.43	141467.209
Jsoniter	3.75	204704.100
DSL-Json	2.51	305784.845
Jackson	1	768840.597

Protobuf can skip the structure 5x faster than Jackson. When skipping a long string, the cost for JSON is linear in the string size, while the cost for Protobuf is constant.
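
The difference can be sketched as follows. SkipCost, skipLengthDelimited, and skipJsonString are hypothetical names, and both methods assume well-formed input; neither is taken from the Protobuf or JSON libraries benchmarked here.

```java
// Illustrative comparison of skipping a value in each format; not library code.
final class SkipCost {
    // Protobuf style: read the varint length prefix, then jump over the payload.
    // Returns the position just past the value without inspecting its bytes.
    static int skipLengthDelimited(byte[] buf, int pos) {
        int length = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos++];
            length |= (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return pos + length;
    }

    // JSON style: pos points just after the opening quote, and every byte
    // must be inspected to find the unescaped closing quote.
    static int skipJsonString(byte[] buf, int pos) {
        while (buf[pos] != '"') {
            if (buf[pos] == '\\') {
                pos++; // also step over the escaped character
            }
            pos++;
        }
        return pos + 1; // position just past the closing quote
    }

    public static void main(String[] args) {
        byte[] pb = {5, 'h', 'e', 'l', 'l', 'o'};
        byte[] json = "hello\"".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(skipLengthDelimited(pb, 0)); // prints 6
        System.out.println(skipJsonString(json, 0));    // prints 6
    }
}
```

Both calls land on the same position, but only the JSON version has to touch every byte of the string to get there.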

Summary

It is time to tally the scores!

scenario	Protobuf vs. Jackson	Protobuf vs. Jsoniter	Jsoniter vs. Jackson
Decode Integer	8.51	2.64	3.22
Encode Integer	2.9	1.44	2.02
Decode Double	13.75	4.4	3.13
Encode Double	12.71	1.96 (only 6 digits precision)	6.5
Decode Object	1.22	0.6	2.04
Encode Object	1.72	0.67	2.59
Decode Integer List	2.92	0.98	2.97
Encode Integer List	1.35	0.71	1.9
Decode Object List	1.26	0.43	2.91
Encode Object List	2.22	0.61	3.63
Decode Double Array	5.18	1.47	3.52
Encode Double Array	15.63	2.32 (only 6 digits precision)	6.74
Decode String	1.85	0.77	2.4
Encode String	0.96	0.63	1.5
Skip Structure	5.05	1.35	3.75

Worst case scenario for JSON:

Skip a very long string: the cost is proportional to the string length.
Decode a double field: Protobuf can be 4.4x faster.
Encode a double field with full precision: writing out all the digits can be 12.71x slower.
Decode an integer: Protobuf can be 2.64x faster.

If your real workload avoids the worst cases above and consists mostly of strings, the speedup from Protobuf should be under 2x (compared with Jsoniter); if you are really unlucky, Protobuf can even be slower.
