Big Data DevOps (Part 2): Schemas!



Schemas, schemas, schemas. Know your records, know your data types, know your fields, and know your data.


Since we can process records in Apache NiFi, Streaming Analytics Manager (SAM), Apache Kafka, and any other tool that can work with a schema, we have a real need for a Schema Registry (SR). I have written about registries before. One important capability is automating schema management. Today, we will list and export schemas for backup and migration purposes. We will also cover how to upload new schemas and new versions of existing schemas.
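For the upload half, here is a minimal Python sketch that builds, but does not send, the two POST requests involved: one registering a new schema's metadata and one adding a new version of its schema text. The base URL (including port 7788), the `/schemas` and `/schemas/{name}/versions` paths, and the `schemaText`/`description` body fields are assumptions about the registry's REST API, and the helper names are hypothetical; check them against your registry's Swagger documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: base URL of the registry's REST API; adjust host/port for your cluster.
SR_BASE = "http://localhost:7788/api/v1/schemaregistry"


def _post_json(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_register_schema_request(base: str, metadata: dict) -> urllib.request.Request:
    """Register a new schema's metadata (type, schemaGroup, name, ...).

    ASSUMPTION: the registry accepts POST {base}/schemas with the metadata as the body.
    """
    return _post_json(f"{base}/schemas", metadata)


def build_add_version_request(base: str, name: str, schema_text: str,
                              description: str) -> urllib.request.Request:
    """Add a new version of the schema text for an existing schema `name`.

    ASSUMPTION: the registry accepts POST {base}/schemas/{name}/versions with
    a body of {"schemaText": ..., "description": ...}.
    """
    quoted = urllib.parse.quote(name, safe="")
    return _post_json(f"{base}/schemas/{quoted}/versions",
                      {"schemaText": schema_text, "description": description})


if __name__ == "__main__":
    meta = {"type": "avro", "schemaGroup": "Kafka", "name": "adsb",
            "description": "adsb", "compatibility": "BACKWARD", "evolve": True}
    avro = json.dumps({"type": "record", "name": "adsb",
                       "fields": [{"name": "icao", "type": "string"}]})
    for req in (build_register_schema_request(SR_BASE, meta),
                build_add_version_request(SR_BASE, "adsb", avro, "new version")):
        print(req.get_method(), req.full_url)
        # To actually send a request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to replay a directory of backed-up schemas into a different registry during a migration.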

The steps to back up schemas with Apache NiFi 1.5+ are easy:

  1. GetHTTP: Get the list of schemas from SR via a GET request.
  2. SplitJson: Split the list into individual records.
  3. EvaluateJsonPath: Extract the schema name.
  4. InvokeHTTP: Fetch the latest version of the schema body by name.
  5. EvaluateJsonPath: Extract the schema text into a separate flow file.
  6. Rename and save both the full JSON record from the registry and the schema only.
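The six steps above can be sketched as a short Python script, useful for checking the logic outside NiFi. This is a minimal sketch, assuming the registry exposes `GET {base}/schemas` for the listing and `GET {base}/schemas/{name}/versions/latest` for the newest version, and that the version record carries the schema in a `schemaText` field; the base URL, file naming, and helper names are illustrative assumptions. The fetch function is injectable so the loop can run without a live registry.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

# ASSUMPTION: base URL of the registry's REST API; adjust host/port for your cluster.
SR_BASE = "http://localhost:7788/api/v1/schemaregistry"


def http_get_json(url: str):
    """GET a URL and parse the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def backup_schemas(out_dir: Path, base: str = SR_BASE,
                   fetch: Callable[[str], object] = http_get_json) -> list:
    """Save each schema's full registry record and its bare schema text."""
    out_dir.mkdir(parents=True, exist_ok=True)
    listing = fetch(f"{base}/schemas")               # 1. GetHTTP: list all schemas
    saved = []
    for entity in listing["entities"]:               # 2. SplitJson: one record each
        meta = entity["schemaMetadata"]
        name = meta["name"]                          # 3. EvaluateJsonPath: schema name
        quoted = urllib.parse.quote(name, safe="")   # safe for URLs and file names
        latest = fetch(f"{base}/schemas/{quoted}/versions/latest")  # 4. InvokeHTTP
        schema_text = latest["schemaText"]           # 5. EvaluateJsonPath: schema text
        # 6. Save the full JSON record and the schema text as separate files.
        ext = ".avsc" if meta.get("type") == "avro" else ".schema"
        (out_dir / f"{quoted}.json").write_text(json.dumps(latest, indent=2))
        (out_dir / f"{quoted}{ext}").write_text(schema_text)
        saved.append(name)
    return saved
```

Passing a dictionary lookup as `fetch` (for example, `fetch=canned_responses.__getitem__`) gives an offline dry run. In production, the NiFi flow adds provenance, scheduling, and failure routing that this script does not.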

[Figure: NiFi flow]
[Figure: Initial call to list all schemas]
[Figure: Get the schema name]
[Figure: Example schema with text]
[Figure: An example of JSON schema text]
[Figure: Build a new flow file from the schema text JSON]
[Figure: Get the latest version of the schema text for this schema by name]
[Figure: The list returned]
[Figure: Swagger documentation for SR]

Schema list JSON formatting:

{
  "entities" : [ {
    "schemaMetadata" : {
      "type" : "avro",
      "schemaGroup" : "Kafka",
      "name" : "adsb",
      "description" : "adsb",
      "compatibility" : "BACKWARD",
      "validationLevel" : "ALL",
      "evolve" : true
    },
    "id" : 3,
    "timestamp" : 1520460239420
  } ]
}
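To see what SplitJson and EvaluateJsonPath do with this listing, here is a minimal Python equivalent, assuming the full response is a top-level object holding the `entities` array. Splitting on `$.entities` yields one record per schema, and `$.schemaMetadata.name` picks out the name; the function names are illustrative.

```python
import json

# The sample listing, wrapped in a top-level object so it parses as complete JSON.
LISTING = '''{ "entities" : [ { "schemaMetadata" : { "type" : "avro", "schemaGroup" : "Kafka",
  "name" : "adsb", "description" : "adsb", "compatibility" : "BACKWARD",
  "validationLevel" : "ALL", "evolve" : true }, "id" : 3, "timestamp" : 1520460239420 } ] }'''


def split_entities(listing_json: str) -> list:
    """Like SplitJson on $.entities: one dict per registered schema."""
    return json.loads(listing_json)["entities"]


def schema_name(entity: dict) -> str:
    """Like EvaluateJsonPath with $.schemaMetadata.name."""
    return entity["schemaMetadata"]["name"]


for entity in split_entities(LISTING):
    meta = entity["schemaMetadata"]
    print(schema_name(entity), meta["type"], meta["compatibility"], entity["id"])
# prints: adsb avro BACKWARD 3
```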

Get Schema List REST URL (GET).


If you prefer, you can use the Confluent-style API, both against SR and against the Confluent Schema Registry. The API is slightly different, but it is easy to change our REST calls to handle it.
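The differences are mostly in paths and field names. Below is a minimal Python sketch, assuming the Confluent-style `GET /subjects` (a plain JSON array of names) and `GET /subjects/{subject}/versions/latest` (schema string in a `schema` field), alongside the registry-native `/schemas` listing and `schemaText` field. The helper names and the `style` switch are hypothetical, and the base URL for SR's Confluent-compatible endpoint depends on your installation.

```python
import urllib.parse

# Hypothetical switch between the two API styles.
NATIVE, CONFLUENT = "native", "confluent"


def list_url(style: str, base: str) -> str:
    """URL that lists every schema (native) or subject (Confluent style)."""
    return f"{base}/subjects" if style == CONFLUENT else f"{base}/schemas"


def latest_url(style: str, base: str, name: str) -> str:
    """URL for the latest version of one schema or subject."""
    quoted = urllib.parse.quote(name, safe="")
    collection = "subjects" if style == CONFLUENT else "schemas"
    return f"{base}/{collection}/{quoted}/versions/latest"


def list_names(style: str, listing) -> list:
    """Extract schema names from a parsed listing response."""
    if style == CONFLUENT:
        return list(listing)  # already a JSON array of subject names
    return [e["schemaMetadata"]["name"] for e in listing["entities"]]


def schema_text(style: str, version_record: dict) -> str:
    """Pull the schema string out of a parsed 'latest version' response."""
    return version_record["schema" if style == CONFLUENT else "schemaText"]


print(latest_url(CONFLUENT, "http://localhost:8081", "adsb"))
# prints: http://localhost:8081/subjects/adsb/versions/latest
```

Isolating the style-specific parts in these helpers means the rest of a backup loop can stay unchanged when switching registries.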


Opinions expressed by DZone contributors are their own.
