Just Drop to Binary if You're Going to Compress Your JSON

This post was written at 5:30AM.  I had this thought while doing research for another post, and I couldn’t really let it go.

XML, as a text-based format, wastes a lot of space. But that wasn't what really made it lose its shine. That happened when it became so complex that it stopped being human readable. For example, I give you:

 <?xml version="1.0" encoding="UTF-8" ?>

   <SOAP-ENV:Envelope

   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"

    xmlns:xsd="http://www.w3.org/1999/XMLSchema"

    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">

    <SOAP-ENV:Body>

        <ns1:getEmployeeDetailsResponse

         xmlns:ns1="urn:MySoapServices"

         SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

            <return xsi:type="ns1:EmployeeContactDetail">

                <employeeName xsi:type="xsd:string">Bill Posters</employeeName>

                <phoneNumber xsi:type="xsd:string">+1-212-7370194</phoneNumber>

                <tempPhoneNumber

                 xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"

                 xsi:type="ns2:Array"

                 ns2:arrayType="ns1:TemporaryPhoneNumber[3]">

                    <item xsi:type="ns1:TemporaryPhoneNumber">

                        <startDate xsi:type="xsd:int">37060</startDate>

                        <endDate xsi:type="xsd:int">37064</endDate>

                        <phoneNumber xsi:type="xsd:string">+1-515-2887505</phoneNumber>

                    </item>

                    <item xsi:type="ns1:TemporaryPhoneNumber">

                        <startDate xsi:type="xsd:int">37074</startDate>

                        <endDate xsi:type="xsd:int">37078</endDate>

                        <phoneNumber xsi:type="xsd:string">+1-516-2890033</phoneNumber>

                    </item>

                    <item xsi:type="ns1:TemporaryPhoneNumber">

                        <startDate xsi:type="xsd:int">37088</startDate>

                        <endDate xsi:type="xsd:int">37092</endDate>

                        <phoneNumber xsi:type="xsd:string">+1-212-7376609</phoneNumber>

                    </item>

                </tempPhoneNumber>

            </return>

        </ns1:getEmployeeDetailsResponse>

    </SOAP-ENV:Body>

   </SOAP-ENV:Envelope>

After XML was banished from the company of respectable folks, we had JSON show up and entertain us. It is smaller and more concise than XML, and so far it has resisted the efforts to make it into some sort of an uber-complex enterprise-y tool.

But today, I ran into quite a few efforts in the community that try to do strange things to JSON. I am talking about things like JSON DB (a compressed JSON format, not an actual JSON database), JSONH, json.hpack and other things. All of these projects are attempts to reduce the size of JSON documents.

Let's take an example. The following code is a JSON document representing one of RavenDB's builds:

 {

   "BuildName": "RavenDB Unstable v2.5",

   "IsUnstable": true,

   "Version": "2509-Unstable",

   "PublishedAt": "2013-02-26T12:06:12.0000000",

   "DownloadsIds": [],

   "Changes": [

     {

       "Commiter": {

         "Email": "david@davidwalker.org",

         "Name": "David Walker"

       },

       "Version": "17c661cb158d5e3c528fe2c02a3346305f0234a3",

       "Href": "/app/rest/changes/id:21039",

       "TeamCityId": 21039,

       "Username": "david walker",

       "Comment": "Do not save Has-Api-Key header to metadata\n",

       "Date": "2013-02-20T23:22:43.0000000",

       "Files": [

         "Raven.Abstractions/Extensions/MetadataExtensions.cs"

       ]

     },

     {

       "Commiter": {

         "Email": "david@davidwalker.org",

         "Name": "David Walker"

       },

       "Version": "5ffb4d61ad9102696948f6678bbecac88e1dc039",

       "Href": "/app/rest/changes/id:21040",

       "TeamCityId": 21040,

       "Username": "david walker",

       "Comment": "Do not save IIS Application Request Routing headers to metadata\n",

       "Date": "2013-02-20T23:23:59.0000000",

       "Files": [

         "Raven.Abstractions/Extensions/MetadataExtensions.cs"

       ]

     },

     {

       "Commiter": {

         "Email": "ayende@ayende.com",

         "Name": "Ayende Rahien"

       },

        "Version": "5919521286735f50f963824a12bf121cd1df4367",

       "Href": "/app/rest/changes/id:21035",

       "TeamCityId": 21035,

       "Username": "ayende rahien",

       "Comment": "Better disposal\n",

       "Date": "2013-02-26T10:16:45.0000000",

       "Files": [

         "Raven.Client.WinRT/MissingFromWinRT/ThreadSleep.cs"

       ]

     },

     {

       "Commiter": {

         "Email": "ayende@ayende.com",

         "Name": "Ayende Rahien"

       },

       "Version": "c93264e2a94e2aa326e7308ab3909aa4077bc3bb",

       "Href": "/app/rest/changes/id:21036",

       "TeamCityId": 21036,

       "Username": "ayende rahien",

       "Comment": "Will ensure that the value is always positive or zero (never negative).\nWhen using numeric calc, will div by 1,024 to get more concentration into buckets.\n",

       "Date": "2013-02-26T10:17:23.0000000",

       "Files": [

         "Raven.Database/Indexing/IndexingUtil.cs"

       ]

     },

     {

       "Commiter": {

         "Email": "ayende@ayende.com",

         "Name": "Ayende Rahien"

       },

       "Version": "7bf51345d39c3993fed5a82eacad6e74b9201601",

       "Href": "/app/rest/changes/id:21037",

       "TeamCityId": 21037,

       "Username": "ayende rahien",

       "Comment": "Fixing a bug where we wouldn't decrement reduce stats for an index when multiple values from the same bucket are removed\n",

       "Date": "2013-02-26T10:53:01.0000000",

       "Files": [

         "Raven.Database/Indexing/MapReduceIndex.cs",

         "Raven.Database/Storage/Esent/StorageActions/MappedResults.cs",

         "Raven.Database/Storage/IMappedResultsStorageAction.cs",

         "Raven.Database/Storage/Managed/MappedResultsStorageAction.cs",

         "Raven.Tests/Issues/RavenDB_784.cs",

         "Raven.Tests/Storage/MappedResults.cs",

         "Raven.Tests/Views/ViewStorage.cs"

       ]

     },

     {

       "Commiter": {

         "Email": "ayende@ayende.com",

         "Name": "Ayende Rahien"

       },

       "Version": "ff2c5b43eba2a8a2206152658b5e76706e12945c",

       "Href": "/app/rest/changes/id:21038",

       "TeamCityId": 21038,

       "Username": "ayende rahien",

       "Comment": "No need for so many repeats\n",

       "Date": "2013-02-26T11:27:49.0000000",

       "Files": [

         "Raven.Tests/Bugs/MultiOutputReduce.cs"

       ]

     },

     {

       "Commiter": {

         "Email": "ayende@ayende.com",

         "Name": "Ayende Rahien"

       },

       "Version": "0620c74e51839972554fab3fa9898d7633cfea6e",

       "Href": "/app/rest/changes/id:21041",

       "TeamCityId": 21041,

       "Username": "ayende rahien",

       "Comment": "Merge branch 'master' of https://github.com/cloudbirdnet/ravendb into 2.1\n",

       "Date": "2013-02-26T11:41:39.0000000",

       "Files": [

         "Raven.Abstractions/Extensions/MetadataExtensions.cs"

       ]

     }

   ],

   "ResolvedIssues": [],

   "Contributors": [

     {

       "FullName": "Ayende Rahien",

       "Email": "ayende@ayende.com",

       "EmailHash": "730a9f9186e14b8da5a4e453aca2adfe"

     },

     {

       "FullName": "David Walker",

       "Email": "david@davidwalker.org",

       "EmailHash": "4e5293ab04bc1a4fdd62bd06e2f32871"

     }

   ],

    "BuildTypeId": "bt8",

   "Href": "/app/rest/builds/id:588",

   "ProjectName": "RavenDB",

   "TeamCityId": 588,

   "ProjectId": "project3",

   "Number": 2509

 }

This document is 4.52KB in size. Running this through JSONH gives us the following:

 [

     14,

     "BuildName",

     "IsUnstable",

     "Version",

     "PublishedAt",

     "DownloadsIds",

     "Changes",

     "ResolvedIssues",

     "Contributors",

     "BuildTypeId",

     "Href",

     "ProjectName",

     "TeamCityId",

      "ProjectId",

     "Number",

     "RavenDB Unstable v2.5",

     true,

     "2509-Unstable",

     "2013-02-26T12:06:12.0000000",

     [

     ],

     [

         {

             "Commiter": {

                 "Email": "david@davidwalker.org",

                 "Name": "David Walker"

             },

             "Version": "17c661cb158d5e3c528fe2c02a3346305f0234a3",

             "Href": "/app/rest/changes/id:21039",

             "TeamCityId": 21039,

             "Username": "david walker",

             "Comment": "Do not save Has-Api-Key header to metadata\n",

             "Date": "2013-02-20T23:22:43.0000000",

             "Files": [

                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"

             ]

         },

         {

             "Commiter": {

                 "Email": "david@davidwalker.org",

                 "Name": "David Walker"

             },

             "Version": "5ffb4d61ad9102696948f6678bbecac88e1dc039",

             "Href": "/app/rest/changes/id:21040",

             "TeamCityId": 21040,

             "Username": "david walker",

             "Comment": "Do not save IIS Application Request Routing headers to metadata\n",

             "Date": "2013-02-20T23:23:59.0000000",

             "Files": [

                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"

             ]

         },

         {

             "Commiter": {

                 "Email": "ayende@ayende.com",

                 "Name": "Ayende Rahien"

             },

             "Version": "5919521286735f50f963824a12bf121cd1df4367",

             "Href": "/app/rest/changes/id:21035",

             "TeamCityId": 21035,

             "Username": "ayende rahien",

             "Comment": "Better disposal\n",

             "Date": "2013-02-26T10:16:45.0000000",

             "Files": [

                 "Raven.Client.WinRT/MissingFromWinRT/ThreadSleep.cs"

             ]

         },

         {

             "Commiter": {

                 "Email": "ayende@ayende.com",

                 "Name": "Ayende Rahien"

             },

             "Version": "c93264e2a94e2aa326e7308ab3909aa4077bc3bb",

              "Href": "/app/rest/changes/id:21036",

             "TeamCityId": "...bug where we wouldn't decrement reduce stats for an index when multiple values from the same bucket are removed\n",

             "Date": "2013-02-26T10:53:01.0000000",

             "Files": [

                 "Raven.Database/Indexing/MapReduceIndex.cs",

                 "Raven.Database/Storage/Esent/StorageActions/MappedResults.cs",

                 "Raven.Database/Storage/IMappedResultsStorageAction.cs",

                 "Raven.Database/Storage/Managed/MappedResultsStorageAction.cs",

                 "Raven.Tests/Issues/RavenDB_784.cs",

                 "Raven.Tests/Storage/MappedResults.cs",

                 "Raven.Tests/Views/ViewStorage.cs"

             ]

         },

         {

             "Commiter": {

                 "Email": "ayende@ayende.com",

                 "Name": "Ayende Rahien"

             },

             "Version": "ff2c5b43eba2a8a2206152658b5e76706e12945c",

             "Href": "/app/rest/changes/id:21038",

             "TeamCityId": 21038,

             "Username": "ayende rahien",

              "Comment": "No need for so many repeats\n",

             "Date": "2013-02-26T11:27:49.0000000",

             "Files": [

                 "Raven.Tests/Bugs/MultiOutputReduce.cs"

             ]

         },

         {

             "Commiter": {

                 "Email": "ayende@ayende.com",

                 "Name": "Ayende Rahien"

              },

             "Version": "0620c74e51839972554fab3fa9898d7633cfea6e",

             "Href": "/app/rest/changes/id:21041",

             "TeamCityId": 21041,

             "Username": "ayende rahien",

             "Comment": "Merge branch 'master' of https://github.com/cloudbirdnet/ravendb into 2.1\n",

             "Date": "2013-02-26T11:41:39.0000000",

             "Files": [

                 "Raven.Abstractions/Extensions/MetadataExtensions.cs"

             ]

         }

     ],

     [

     ],

     [

         {

             "FullName": "Ayende Rahien",

              "Email": "ayende@ayende.com",

             "EmailHash": "730a9f9186e14b8da5a4e453aca2adfe"

         },

         {

             "FullName": "David Walker",

             "Email": "david@davidwalker.org",

             "EmailHash": "4e5293ab04bc1a4fdd62bd06e2f32871"

         }

     ],

     "bt8",

     "/app/rest/builds/id:588",

     "RavenDB",

     588,

     "project3",

     2509

 ]

JSONH reduced the document size to 2.93KB! Awesome! That's about 35% smaller than the original. Except that this actually generates an utterly unreadable mess. Can you look at this and figure out what the hell is going on?

I thought not. At this point, we might as well use a binary format. I happen to have a zip tool at my disposal, so I checked what would happen if I threw this JSON through that. The end result was a 1.42KB file, and I had no more loss of readability than I did with the JSONH code.
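To make that comparison concrete, here is a minimal Python sketch. It is not the tooling behind the numbers above. The `pack` and `unpack` functions are hypothetical helpers that mimic the general idea of a JSONH-style layout for a list of same-shaped objects (key count, then the keys once, then the values row by row); they are not the JSONH library itself. The sample rows are made-up stand-ins for the change entries, and zlib's DEFLATE stands in for the zip tool.

```python
import json
import zlib


def pack(rows):
    """Flatten same-shaped dicts into [key_count, *keys, *values_row_by_row].

    Hypothetical helper for illustration; assumes every row has the same keys.
    """
    keys = list(rows[0]) if rows else []
    flat = [len(keys), *keys]
    for row in rows:
        flat.extend(row[k] for k in keys)
    return flat


def unpack(flat):
    """Reverse pack(): rebuild the list of dicts from the flattened list."""
    count = flat[0]
    if count == 0:
        return []
    keys = flat[1:1 + count]
    values = flat[1 + count:]
    return [dict(zip(keys, values[i:i + count]))
            for i in range(0, len(values), count)]


# Assumed sample data: 50 repetitive records shaped like the change entries.
rows = [
    {"Username": "ayende rahien",
     "Href": f"/app/rest/changes/id:{i}",
     "TeamCityId": i}
    for i in range(21035, 21085)
]

plain = json.dumps(rows).encode("utf-8")
packed = json.dumps(pack(rows)).encode("utf-8")
deflated = zlib.compress(plain)

print(f"plain JSON : {len(plain)} bytes")
print(f"packed JSON: {len(packed)} bytes")
print(f"deflated   : {len(deflated)} bytes")

# Both transformations are lossless, but only one needs custom code to undo.
assert unpack(json.loads(packed)) == rows
assert json.loads(zlib.decompress(deflated)) == rows
```

On repetitive data like this, the general-purpose compressor should come out well ahead of the packed layout, which is the same pattern as the 1.42KB versus 2.93KB result above.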

To be frank, I just don't get efforts like this. JSON is a text-based, human-readable format. If you lose the human-readable part of the format, you might as well drop directly to binary. It is likely to be more efficient, and you don't lose anything by doing it.

If you want to compress your data, it is probably better to use an actual compression tool. HTTP compression, for example, is practically free, since all servers and clients should be able to consume it by now, and any tool that you use should be able to see through it. Plus, it's likely to give you much better results on your JSON documents than a clever format like the one generated by JSONH.
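As a rough illustration of why HTTP compression leaves the JSON itself alone, here is a small Python sketch with no real network involved. The `encode_response` and `decode_response` functions are hypothetical names standing in for what a server and a client do with the standard `Accept-Encoding` and `Content-Encoding` headers; real servers and HTTP libraries handle this for you.

```python
import gzip
import json


def encode_response(doc, accept_encoding):
    """Serialize doc as JSON and gzip it only if the client advertised gzip.

    Hypothetical helper: returns (headers, body) the way a server might.
    Simplification: q-values such as "gzip;q=0" are ignored.
    """
    body = json.dumps(doc).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    encodings = [e.strip().split(";")[0] for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


def decode_response(headers, body):
    """Undo the transfer encoding, then parse the untouched JSON text."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


# Assumed sample payload, loosely based on the build document above.
doc = {"BuildName": "RavenDB Unstable v2.5", "IsUnstable": True, "Number": 2509}

headers, body = encode_response(doc, "gzip, deflate")
print(headers.get("Content-Encoding"), len(body), "bytes on the wire")

# After decoding, the client sees exactly the same readable document.
assert decode_response(headers, body) == doc
```

The document format never changes here. The compression lives entirely in the transport layer, so anything that understands `Content-Encoding` still shows you plain JSON.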




Published at DZone with permission of the author, a DZone MVB.

Opinions expressed by DZone contributors are their own.
