
Big Data DevOps (Part 2): Schemas!

Schemas, schemas, schemas. Know your records, know your data types, know your fields, and know your data.

Since we can process records in Apache NiFi, Streaming Analytics Manager, Apache Kafka, and any other tool that works with a schema, we have a real need for a Schema Registry. I have mentioned schema registries before. One important thing is to be able to automate the management of schemas. Today, we will be listing and exporting them for backup and migration purposes. We will also cover how to upload new schemas and new versions of existing schemas.
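As a quick preview of the upload side, here is a minimal sketch in Python (using requests) that registers a new schema and then adds a version of it. The host, port, and endpoint paths are assumptions based on the Schema Registry's Swagger documentation referenced below, and the single Avro field is made up for illustration; adjust everything for your cluster.

import json
import requests

# Assumption: Hortonworks Schema Registry reachable on its default port.
SR_BASE = "http://localhost:7788/api/v1/schemaregistry"

# Register the schema's metadata: name, type, group, and compatibility rules.
metadata = {
    "type": "avro",
    "schemaGroup": "Kafka",
    "name": "adsb",
    "description": "adsb",
    "compatibility": "BACKWARD",
    "evolve": True
}
requests.post(SR_BASE + "/schemas", json=metadata).raise_for_status()

# Add a new version of the schema by posting its Avro schema text.
avro_schema = {
    "type": "record",
    "name": "adsb",
    "fields": [{"name": "icao", "type": "string"}]  # hypothetical field
}
version = {"schemaText": json.dumps(avro_schema), "description": "initial version"}
resp = requests.post(SR_BASE + "/schemas/adsb/versions", json=version)
resp.raise_for_status()
print("Registered version:", resp.json())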

The steps to back up schemas with Apache NiFi 1.5+ are easy (a standalone sketch of the same flow follows the list).

  1. GetHTTP: Get the list of schemas from SR via GET.
  2. SplitJson: Turn the list into individual records.
  3. EvaluateJsonPath: Get the schema name.
  4. InvokeHTTP: Get the schema body.
  5. EvaluateJsonPath: Extract the schema text into a separate flow file.
  6. Rename and save both the full JSON record from the registry and the schema-only file.
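For reference, the same backup logic can be sketched outside of NiFi in a few lines of Python. The base URL, endpoint paths, and the schemaText field name are assumptions drawn from the SR Swagger documentation shown below; the script mirrors steps 1 through 6 of the flow.

import json
import requests

# Assumption: Hortonworks Schema Registry reachable at this base URL.
SR_BASE = "http://localhost:7788/api/v1/schemaregistry"

# Step 1: list all schemas registered in SR.
entities = requests.get(SR_BASE + "/schemas").json()["entities"]

# Steps 2 and 3: walk the individual records and pull out each schema name.
for entity in entities:
    name = entity["schemaMetadata"]["name"]

    # Step 4: get the latest version of the schema body by name.
    latest = requests.get(SR_BASE + "/schemas/" + name + "/versions/latest").json()

    # Steps 5 and 6: save the full registry record and the bare schema text separately.
    with open(name + ".registry.json", "w") as f:
        json.dump(latest, f, indent=2)
    with open(name + ".avsc", "w") as f:
        f.write(latest["schemaText"])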

NiFi flow:

Initial call to list all schemas:

Get the schema name:

Example schema with text:

An example of JSON schema text:

Build a new flow file from the schema text JSON:

Get the latest version of the schema text for this schema by name:

The list returned:

Swagger documentation for SR:

Schema list JSON formatting:

"entities" : [ { "schemaMetadata" : { "type" : "avro", "schemaGroup" : "Kafka", "name" : "adsb", "description" : "adsb", "compatibility" : "BACKWARD", "validationLevel" : "ALL", "evolve" : true }, "id" : 3, "timestamp" : 1520460239420

Get Schema List REST URL (GET).


If you wish, you can use the Confluent-style API against SR, or against the Confluent Schema Registry itself. The API is slightly different, but it is easy to adapt our REST calls to it; a sketch of the equivalent calls follows.
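As a rough sketch, the same export loop against the Confluent-style endpoints looks like this; the base URL is an assumption (the Confluent Schema Registry defaults to port 8081, and SR typically exposes its Confluent-compatible resource under a separate path such as /api/v1/confluent):

import requests

# Assumption: a Confluent-compatible Schema Registry endpoint.
BASE = "http://localhost:8081"

# The Confluent API returns a plain list of subject names rather than "entities".
for subject in requests.get(BASE + "/subjects").json():
    # The latest version carries the subject, version, id, and schema text.
    latest = requests.get(BASE + "/subjects/" + subject + "/versions/latest").json()
    with open(subject + ".avsc", "w") as f:
        f.write(latest["schema"])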


Topics:
big data, apache nifi, streaming analytics, big data analytics, apache kafka, tutorial
