Using Jolt in Big Data Streams to Remove Nulls

Learn how to use Jolt code within your big data streams to remove null values with some example source data and JSON code.

In this article, we're going to learn about using Jolt code in your big data streams to remove null values.

Here's a Jolt specification, written in JSON, that you can use in your big data streams:

[
  {
    "operation": "default",
    "spec": {
      "address": "",
      "somesensorvalues[]": {
        "*": {
          "sensor1": false
        }
      },
      "startTime": "",
      "onStartTime": "",
      "markId": "",
      "markName": "",
      "stoppedTime": "",
      "startTime2": "",
      "powerSetting": "false",
      "speed": 0,
      "id": 0,
      "city": "",
      "state": ""
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": "&"
    }
  }
]

To help you understand, here's some example source data:

{
  "address" : "2000 Electric Avenue",
  "somesensorvalues" : [ {
    "sensor1" : null
  } ],
  "city" : "hightstown",
  "deviceId" : 5454545,
  "dateTime" : "2017-08-07 14:56:09",
  "id" : 6831491,
  "idle" : false,
  "startTime" : null,
  "onStartTime" : null,
  "markId" : null,
  "markName" : null,
  "zipCode" : "08520"
}

Yeah, sometimes you really don't want to see any nulls!

The Jolt spec above copies all the values in a source JSON document to a destination. For explicitly named fields like speed, the default operation replaces a null (or missing) value with the value on the right-hand side of the spec. So for speed, it writes a 0 in place of a null.
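To make that behavior concrete, here is a rough Python emulation of what the default operation does to flat fields. This is only a sketch of the semantics described above, not Jolt itself or how Jolt is implemented; the function name apply_defaults is hypothetical, and the spec and data are trimmed-down versions of the ones in this article.

```python
import json


def apply_defaults(record, defaults):
    """Return a copy of record where null or missing fields take the default value."""
    result = dict(record)
    for key, default in defaults.items():
        # dict.get returns None both for a missing key and for an explicit null.
        if result.get(key) is None:
            result[key] = default
    return result


# Hypothetical, shortened versions of the spec and the source data.
defaults = {"startTime": "", "markId": "", "speed": 0, "city": ""}
source = {"city": "hightstown", "id": 6831491, "startTime": None, "markId": None}

cleaned = apply_defaults(source, defaults)
print(json.dumps(cleaned, indent=2))
```

Fields that already hold a value (city, id) pass through unchanged, while the null fields and the absent speed field pick up their defaults.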

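For embedded lists of values (like sensor1 inside the somesensorvalues array), the syntax is a bit different: the array name carries a [] suffix, and a "*" wildcard applies the default to each element of the array.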
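The same idea can be sketched in Python for a nested list. Again, this only illustrates the intent of the spec and is not Jolt's implementation; apply_array_defaults is a hypothetical helper, the sample record is made up for the example, and the sketch assumes every list element is a JSON object.

```python
import json


def apply_array_defaults(record, array_key, element_defaults):
    """Fill null or missing fields in each existing element of a nested list."""
    result = dict(record)
    # Assumption of this sketch: a null or missing list becomes an empty list.
    elements = result.get(array_key) or []
    filled = []
    for element in elements:
        patched = dict(element)  # assumes each element is a JSON object
        for key, default in element_defaults.items():
            if patched.get(key) is None:
                patched[key] = default
        filled.append(patched)
    result[array_key] = filled
    return result


# Hypothetical record using the array name from the spec.
source = {
    "address": "2000 Electric Avenue",
    "somesensorvalues": [{"sensor1": None}, {"sensor1": True}],
}

cleaned = apply_array_defaults(source, "somesensorvalues", {"sensor1": False})
print(json.dumps(cleaned, indent=2))
```

Only the element whose sensor1 is null gets the default of false; the element that already has a value is left alone.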
