Parsing Apache NiFi Records to HBase

Learn how to use SiteToSiteProvenanceReportingTask to send provenance events from one Apache NiFi server to another, then read and process that streaming provenance data in Apache NiFi.

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

We're eating our own provenance food!

It's almost comically easy to do this. On the server you want to report on, you set up a reporting task that sends its provenance data to your receiver. On the receiving server, you build a simple flow to ingest and process that data. I stored it in HBase as JSON, since HBase is a good place to put a lot of data quickly.

Send the data:

You need to create a SiteToSiteProvenanceReportingTask in Controller Settings > Reporting Tasks. It's pretty simple: set the destination URL to your receiving NiFi server and the input port name to a port that you have already created on that server.

Receive the data and process:

An individual JSON record:

Split the JSON array into individual records with this JsonPath expression:

$.[*]
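
To make the effect of that expression concrete, here is a minimal Python sketch of what `$.[*]` selects: every element of the top-level array, each emitted as its own record. It only illustrates the idea and is not how NiFi implements the split. The function name `split_records` and the sample batch are hypothetical; the field names are borrowed from the provenance columns shown in the scan output in this article, and `eventId` is an assumed identifier field.

```python
import json
from typing import List


def split_records(batch_json: str) -> List[str]:
    """Return one JSON string per element of a top-level JSON array,
    mirroring what the JsonPath expression $.[*] selects."""
    batch = json.loads(batch_json)
    if not isinstance(batch, list):
        raise ValueError("expected a top-level JSON array of provenance events")
    return [json.dumps(record) for record in batch]


# Hypothetical two-event batch (illustrative values, not real provenance data).
sample_batch = json.dumps([
    {"eventId": "event-1", "eventType": "ROUTE", "componentName": "RouteOnAttribute"},
    {"eventId": "event-2", "eventType": "ROUTE", "componentName": "RouteOnAttribute"},
])

records = split_records(sample_batch)
for record in records:
    print(record)
```

Each string in `records` corresponds to one flow file after the split, which is the shape a per-record processor such as PutHBaseJSON expects.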

To save to HBase (PutHBaseJSON), create a table.

hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.2.0-205, r5210d2ed88d7e241646beab51e9ac147a973bdcc, Sat Aug 26 09:33:50 UTC 2017
hbase(main):001:0> create 'PROVENANCE', 'event'
0 row(s) in 2.9900 seconds
=> Hbase::Table - PROVENANCE

scan 'PROVENANCE'

 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:previousAttributes, timestamp=1517159115042, value={"path":"./","filename":"humidity.583225-583284.log","s2s.address":"192.168.1.197:55032","s2s.host":"1
                                                          92.168.1.197","mime.type":"text/plain","uuid":"9006a1bb-d755-4272-b8d3-76e666c2a7c6","tailfile.original.path":"/opt/demo/logs/humidity.log"}
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:previousContentURI, timestamp=1517159115042, value=http://192.168.1.193:8080/nifi-api/provenance-events/61825/content/input
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:previousEntitySize, timestamp=1517159115042, value=59
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:processGroupId, timestamp=1517159115042, value=01611005-4e82-1491-ae5d-ca64f59491cb
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:processGroupName, timestamp=1517159115042, value=Process MiniFi Creator
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:timestamp, timestamp=1517159115042, value=2018-01-28T00:25:30.616Z
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:timestampMillis, timestamp=1517159115042, value=1517099130616
 ff91e204-05b0-48aa-a666-7942e3f109ab                     column=event:updatedAttributes, timestamp=1517159115042, value={"RouteOnAttribute.Route":"humidity"}
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:actorHostname, timestamp=1517159114898, value=192.168.1.193
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:application, timestamp=1517159114898, value=NiFi Flow
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:childIds, timestamp=1517159114898, value=[]
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:componentId, timestamp=1517159114898, value=3a25cda9-0161-1000-813c-631724a10585
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:componentName, timestamp=1517159114898, value=RouteOnAttribute
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:componentType, timestamp=1517159114898, value=RouteOnAttribute
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:contentURI, timestamp=1517159114898, value=http://192.168.1.193:8080/nifi-api/provenance-events/61701/content/output
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:durationMillis, timestamp=1517159114898, value=-1
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:entityId, timestamp=1517159114898, value=9b017666-7ce9-45c5-9d0a-2f81e56d6fa8
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:entitySize, timestamp=1517159114898, value=16
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:entityType, timestamp=1517159114898, value=org.apache.nifi.flowfile.FlowFile
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:eventOrdinal, timestamp=1517159114898, value=61701
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:eventType, timestamp=1517159114898, value=ROUTE
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:lineageStart, timestamp=1517159114898, value=1517084974341
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:parentIds, timestamp=1517159114898, value=[]
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:platform, timestamp=1517159114898, value=nifi
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:previousAttributes, timestamp=1517159114898, value={"path":"./","filename":"uv.164064-164080.log","s2s.address":"192.168.1.197:55032","s2s.host":"192.168
                                                          .1.197","mime.type":"text/plain","uuid":"9b017666-7ce9-45c5-9d0a-2f81e56d6fa8","tailfile.original.path":"/opt/demo/logs/uv.log"}
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:previousContentURI, timestamp=1517159114898, value=http://192.168.1.193:8080/nifi-api/provenance-events/61701/content/input
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:previousEntitySize, timestamp=1517159114898, value=16
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:processGroupId, timestamp=1517159114898, value=01611005-4e82-1491-ae5d-ca64f59491cb
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:processGroupName, timestamp=1517159114898, value=Process MiniFi Creator
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:timestamp, timestamp=1517159114898, value=2018-01-28T00:25:30.607Z
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:timestampMillis, timestamp=1517159114898, value=1517099130607
 ffde140c-3053-4b9d-89c6-14b68025384d                     column=event:updatedAttributes, timestamp=1517159114898, value={"RouteOnAttribute.Route":"uv"}
1830 row(s) in 11.7680 seconds
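
The scan output suggests how each JSON record ended up in the table: one row per event, one column per top-level field in the `event` family, with nested values such as `updatedAttributes` stored as JSON text. The Python sketch below reproduces that mapping for illustration only. It is not PutHBaseJSON's code; `record_to_cells` is a hypothetical helper, using `eventId` as the row identifier is an assumption based on the UUID-shaped row keys, and skipping null fields is also an assumption.

```python
import json
from typing import Dict, Tuple


def record_to_cells(record: dict,
                    row_id_field: str = "eventId",   # assumed row identifier field
                    family: str = "event") -> Tuple[str, Dict[str, str]]:
    """Map one flat JSON record to (row key, {"family:qualifier": text value})."""
    row_key = str(record[row_id_field])
    cells: Dict[str, str] = {}
    for name, value in record.items():
        if name == row_id_field or value is None:
            continue  # the row key is not repeated as a column; nulls are skipped (assumption)
        if isinstance(value, (dict, list)):
            # Nested values are kept as compact JSON text, as in the scan output.
            text = json.dumps(value, separators=(",", ":"))
        elif isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        cells[f"{family}:{name}"] = text
    return row_key, cells


# Sample record assembled from values visible in the scan output above.
event = {
    "eventId": "ffde140c-3053-4b9d-89c6-14b68025384d",
    "eventType": "ROUTE",
    "entitySize": 16,
    "durationMillis": -1,
    "childIds": [],
    "updatedAttributes": {"RouteOnAttribute.Route": "uv"},
}

row_key, cells = record_to_cells(event)
for column, value in sorted(cells.items()):
    print(row_key, column, value)
```

Keeping nested attributes as text makes the write simple, at the cost of having to parse the JSON again when you read a cell such as `event:previousAttributes` back.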

And that's it!

Topics:
apache nifi, hbase, big data, parsing, tutorial
