Getting Started with Apache Avro: Part 1
In our previous post we got a basic idea of Avro. In this post we will use Avro to serialize and deserialize data.
We will look at three ways of using Avro for serialization/deserialization:
- Using Avro command line tools.
- Using Avro Java API without code generation.
- Using Avro Java API with code generation.
Sample Data
We will use the following sample data (StudentActivity.json), with one JSON record per line:
{"id":"A91D021BA58444B29D4D42CA5E39F7BF","student_id":100,"university_id":908,"course_details":{"course_id":100,"enroll_date":"2012-02-13 00:00:00.000000000","verb":"completed","result_score":0.9}}
{"id":"502A77CC99B241CB94CA356F5218F1A9","student_id":101,"university_id":112,"course_details":{"course_id":233,"enroll_date":"2011-06-08 00:00:00.000000000","verb":"started","result_score":0.65}}
{"id":"5D04CD5ABF014D6EBA237766F9B470DE","student_id":102,"university_id":340,"course_details":{"course_id":339,"enroll_date":"2012-03-06 00:00:00.000000000","verb":"started","result_score":0.57}}
Note that the JSON records are nested: course_details is itself an object with its own fields.
Defining a Schema
Avro schemas are defined in JSON. The Avro schema for our sample data is as follows (StudentActivity.avsc):
{
  "namespace": "com.rishav.avro",
  "type": "record",
  "name": "StudentActivity",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "student_id",
      "type": "int"
    },
    {
      "name": "university_id",
      "type": "int"
    },
    {
      "name": "course_details",
      "type": {
        "name": "Activity",
        "type": "record",
        "fields": [
          {
            "name": "course_id",
            "type": "int"
          },
          {
            "name": "enroll_date",
            "type": "string"
          },
          {
            "name": "verb",
            "type": "string"
          },
          {
            "name": "result_score",
            "type": "double"
          }
        ]
      }
    }
  ]
}
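As a quick way to see how this schema lines up with the sample data, the following Python sketch (standard library only) checks parsed JSON records against it. It is a simplified stand-in, not Avro's own validation logic: it handles only the record, string, int and double types used here, and the names SCHEMA, conforms and check_file are illustrative.

```python
import json

# StudentActivity.avsc written as a Python dict (namespace omitted; it plays no part in this check).
SCHEMA = {
    "type": "record", "name": "StudentActivity",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "student_id", "type": "int"},
        {"name": "university_id", "type": "int"},
        {"name": "course_details", "type": {
            "type": "record", "name": "Activity",
            "fields": [
                {"name": "course_id", "type": "int"},
                {"name": "enroll_date", "type": "string"},
                {"name": "verb", "type": "string"},
                {"name": "result_score", "type": "double"},
            ]}},
    ],
}

INT_MIN, INT_MAX = -(2**31), 2**31 - 1  # Avro "int" is a 32-bit signed integer


def conforms(value, schema) -> bool:
    """Simplified check that a parsed JSON value matches an Avro schema."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        # A record must be a JSON object with every declared field present and valid.
        # Extra keys are not checked by this sketch.
        return isinstance(value, dict) and all(
            f["name"] in value and conforms(value[f["name"]], f["type"])
            for f in schema["fields"]
        )
    if schema == "string":
        return isinstance(value, str)
    if schema == "int":
        # bool is a subclass of int in Python, so rule it out explicitly
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT_MIN <= value <= INT_MAX)
    if schema == "double":
        # this sketch accepts any JSON number for a double
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"type not handled by this sketch: {schema!r}")


def check_file(path):
    """Return the 1-based line numbers of a JSON-lines file that do not conform to SCHEMA."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if line.strip() and not conforms(json.loads(line), SCHEMA):
                bad.append(lineno)
    return bad
```

For example, check_file("StudentActivity.json") returns an empty list when every line matches the schema.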
1. Serialization/Deserialization using Avro command line tools
Avro provides a jar file named avro-tools-<version>.jar that bundles many command line tools. Running it without arguments lists them:
$ java -jar avro-tools-1.7.5.jar
Version 1.7.5 of Apache Avro
Copyright 2010 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
C JSON parsing provided by Jansson and
written by Petri Lehtinen. The original software is
available from http://www.digip.org/jansson/.
----------------
Available tools:
cat extracts samples from files
compile Generates Java code for the given schema.
concat Concatenates avro files without re-compressing.
fragtojson Renders a binary-encoded Avro datum as JSON.
fromjson Reads JSON records and writes an Avro data file.
fromtext Imports a text file into an avro data file.
getmeta Prints out the metadata of an Avro data file.
getschema Prints out schema of an Avro data file.
idl Generates a JSON schema from an Avro IDL file
idl2schemata Extract JSON schemata of the types from an Avro IDL file
induce Induce schema/protocol from Java class/interface via reflection.
jsontofrag Renders a JSON-encoded Avro datum as binary.
random Creates a file with randomly generated instances of a schema.
recodec Alters the codec of a data file.
rpcprotocol Output the protocol of a RPC service
rpcreceive Opens an RPC Server and listens for one message.
rpcsend Sends a single RPC message.
tether Run a tethered mapreduce job.
tojson Dumps an Avro data file as JSON, one record per line.
totext Converts an Avro data file to a text file.
totrevni Converts an Avro data file to a Trevni file.
trevni_meta Dumps a Trevni file's metadata as JSON.
trevni_random Create a Trevni file filled with random instances of a schema.
trevni_tojson Dumps a Trevni file as JSON.
To convert the JSON sample data to an Avro data file (binary format), use the "fromjson" tool; to get JSON back from an Avro data file, use the "tojson" tool.
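To give a feel for what "Avro binary format" means, here is a minimal Python sketch of how a single StudentActivity record is encoded according to the Avro specification: int and long values as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, doubles as 8 little-endian bytes, and a record as its fields concatenated in schema order. It covers only the raw datum; the data file that fromjson writes also adds a header (magic bytes, metadata including the schema, a sync marker) and groups records into blocks. The function names are illustrative, not part of any Avro library.

```python
import struct

# Simplified copy of StudentActivity.avsc as a Python dict.
SCHEMA = {
    "type": "record", "name": "StudentActivity",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "student_id", "type": "int"},
        {"name": "university_id", "type": "int"},
        {"name": "course_details", "type": {
            "type": "record", "name": "Activity",
            "fields": [
                {"name": "course_id", "type": "int"},
                {"name": "enroll_date", "type": "string"},
                {"name": "verb", "type": "string"},
                {"name": "result_score", "type": "double"},
            ]}},
    ],
}


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag encode, then write 7 bits per byte, low-order group first."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ... (valid for 64-bit n)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)  # final byte, high bit clear
    return bytes(out)


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data  # length prefix, then the UTF-8 bytes


def encode_double(x: float) -> bytes:
    return struct.pack("<d", x)  # 8-byte IEEE 754, little-endian


def encode_datum(value, schema) -> bytes:
    """Encode a value against the simplified schema: fields in order, no names or separators."""
    if isinstance(schema, dict) and schema["type"] == "record":
        return b"".join(encode_datum(value[f["name"]], f["type"]) for f in schema["fields"])
    if schema in ("int", "long"):
        return encode_long(value)
    if schema == "string":
        return encode_string(value)
    if schema == "double":
        return encode_double(value)
    raise ValueError(f"type not handled by this sketch: {schema!r}")
```

Encoding the first sample record this way takes 87 bytes: field names, braces and quotes are not stored, because a reader gets them from the schema.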
Command for Serializing JSON
Without any compression
java -jar avro-tools-1.7.5.jar fromjson --schema-file StudentActivity.avsc StudentActivity.json > StudentActivity.avro
With snappy compression
java -jar avro-tools-1.7.5.jar fromjson --codec snappy --schema-file StudentActivity.avsc StudentActivity.json > StudentActivity.snappy.avro
Command for Deserializing JSON
The same command works for both compressed and uncompressed files, because the codec is recorded in the data file's metadata:
java -jar avro-tools-1.7.5.jar tojson StudentActivity.avro
java -jar avro-tools-1.7.5.jar tojson StudentActivity.snappy.avro
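The reverse step can be sketched the same way. Because a binary Avro datum contains no field names or type tags, a reader has to walk the schema to know what comes next, which is why tojson relies on the schema stored in the data file. This hypothetical decoder (Python, standard library only, covering just the types in StudentActivity.avsc) reads one record from raw datum bytes; it does not parse the container file or decompress snappy blocks.

```python
import struct

# Simplified copy of StudentActivity.avsc as a Python dict.
SCHEMA = {
    "type": "record", "name": "StudentActivity",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "student_id", "type": "int"},
        {"name": "university_id", "type": "int"},
        {"name": "course_details", "type": {
            "type": "record", "name": "Activity",
            "fields": [
                {"name": "course_id", "type": "int"},
                {"name": "enroll_date", "type": "string"},
                {"name": "verb", "type": "string"},
                {"name": "result_score", "type": "double"},
            ]}},
    ],
}


def read_long(buf: bytes, pos: int):
    """Decode one zig-zag varint starting at pos; return (value, next_pos)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: this was the last byte of the number
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo the zig-zag mapping


def decode_datum(buf: bytes, pos: int, schema):
    """Decode one value of the given schema at pos; return (value, next_pos)."""
    if isinstance(schema, dict) and schema["type"] == "record":
        rec = {}
        for field in schema["fields"]:  # fields appear in schema order
            rec[field["name"]], pos = decode_datum(buf, pos, field["type"])
        return rec, pos
    if schema in ("int", "long"):
        return read_long(buf, pos)
    if schema == "string":
        n, pos = read_long(buf, pos)  # byte length, then UTF-8 data
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if schema == "double":
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    raise ValueError(f"type not handled by this sketch: {schema!r}")
```

Decoding with a different schema than the one used for writing would silently misread the bytes, which is why Avro keeps the writer's schema alongside the data.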
Since an Avro data file also stores its schema, we can retrieve it with the getschema command:
java -jar avro-tools-1.7.5.jar getschema StudentActivity.avro
java -jar avro-tools-1.7.5.jar getschema StudentActivity.snappy.avro
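The getschema command works because the schema sits in the file header. Per the Avro specification, an object container file starts with the bytes "Obj" followed by a byte with value 1, then a metadata map (string keys with bytes values, including avro.schema and usually avro.codec), then a 16-byte sync marker. The Python sketch below reads that header from an in-memory buffer; read_header and schema_and_codec are illustrative names, and treating a missing avro.codec as "null" follows the specification's default.

```python
import json

MAGIC = b"Obj\x01"  # four-byte magic at the start of every Avro object container file


def read_long(buf: bytes, pos: int):
    """Decode one zig-zag varint starting at pos; return (value, next_pos)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def read_bytes(buf: bytes, pos: int):
    n, pos = read_long(buf, pos)  # length prefix
    return buf[pos:pos + n], pos + n


def read_header(buf: bytes):
    """Return (metadata dict, sync marker, offset of the first data block)."""
    if buf[:4] != MAGIC:
        raise ValueError("not an Avro object container file")
    pos, meta = 4, {}
    while True:  # the metadata map is a series of blocks ending with a zero count
        count, pos = read_long(buf, pos)
        if count == 0:
            break
        if count < 0:  # negative count: absolute value is the count, followed by a block size
            count = -count
            _, pos = read_long(buf, pos)  # block size in bytes, not needed here
        for _ in range(count):
            key, pos = read_bytes(buf, pos)
            value, pos = read_bytes(buf, pos)
            meta[key.decode("utf-8")] = value
    return meta, buf[pos:pos + 16], pos + 16


def schema_and_codec(buf: bytes):
    meta, _, _ = read_header(buf)
    codec = meta.get("avro.codec", b"null").decode("utf-8")  # absent codec means "null"
    return json.loads(meta["avro.schema"]), codec
```

To try it on one of the files above, pass the file contents, for example open("StudentActivity.snappy.avro", "rb").read(); this sketch reads the whole file into memory, which is fine for small samples only.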
In our next post we will use the Avro Java API for serialization/deserialization.
Published at DZone with permission of Rishav Rohit, DZone MVB.