Using Jolt in Big Data Streams to Remove Nulls

Learn how to use Jolt code within your big data streams to remove null values with some example source data and JSON code.

In this article, we're going to learn about using Jolt code in your big data streams to remove null values.

Here's a Jolt specification, written in JSON, that you can use in a big data stream:

[
  {
    "operation": "default",
    "spec": {
      "address": "",
      "somesensorvalues[]": {
        "*": {
          "sensor1": false
        }
      },
      "startTime": "",
      "onStartTime": "",
      "markId": "",
      "markName": "",
      "stoppedTime": "",
      "startTime2": "",
      "powerSetting": "false",
      "speed": 0,
      "id": 0,
      "city": "",
      "state": ""
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": "&"
    }
  }
]

To help you understand, here's some example source data:

{
  "address" : "2000 Electric Avenue",
  "somesensorvalues" : [ {
    "sensor1" : null
  } ],
  "city" : "hightstown",
  "deviceId" : 5454545,
  "dateTime" : "2017-08-07 14:56:09",
  "id" : 6831491,
  "idle" : false,
  "startTime" : null,
  "onStartTime" : null,
  "markId" : null,
  "markName" : null,
  "zipCode" : "08520"
}

Yeah, sometimes you really don't want to see any nulls!

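The Jolt script above works in two steps. The default operation goes through the explicitly named fields, like speed, and wherever a field is null or missing it puts in the value on the right. So for speed, it will put in a 0 to represent a null value. The shift operation, with "*": "&", then copies every top-level value in the source JSON document to the destination under its original name, so fields the spec does not name (like deviceId) pass through unchanged.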

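For embedded lists of values (like sensor1 inside of the somesensorvalues array), the syntax is a bit different: the array's key takes a [] suffix, and a "*" entry applies the defaults to each element of the array.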
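
To make the two cases concrete, here is a rough Python analogue of what the default step does to a record. It is only an illustration and not Jolt itself: the function name fill_nulls and the trimmed sample input are made up for this sketch, and it assumes that null and missing fields are treated the same way and that an array absent from the input is simply left alone.

```python
import json


def fill_nulls(data, spec):
    """Return a copy of `data` with missing or None fields filled from `spec`.

    Hypothetical helper: a plain-Python approximation of the default step,
    not part of the Jolt library.
    """
    result = dict(data)
    for key, default in spec.items():
        if key.endswith("[]") and isinstance(default, dict):
            # Array key: apply the "*" sub-spec to every existing element.
            name = key[:-2]
            element_spec = default.get("*", {})
            if isinstance(result.get(name), list):
                result[name] = [
                    fill_nulls(item, element_spec) if isinstance(item, dict) else item
                    for item in result[name]
                ]
        elif result.get(key) is None:
            # Plain key: use the default only when the value is None or absent.
            result[key] = default
    return result


# Trimmed sample input and spec (illustrative, shortened from the article).
source = {
    "address": "2000 Electric Avenue",
    "somesensorvalues": [{"sensor1": None}],
    "id": 6831491,
    "startTime": None,
}
spec = {
    "address": "",
    "somesensorvalues[]": {"*": {"sensor1": False}},
    "startTime": "",
    "speed": 0,
    "id": 0,
}

cleaned = fill_nulls(source, spec)
print(json.dumps(cleaned, indent=2))
```

Values that are already present, such as address and id, are left untouched; only the null startTime, the null sensor1 inside the array, and the missing speed pick up their defaults.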
