Getting Started with Apache Avro: Part 1

by Rishav Rohit · Feb. 24, 14 · Interview

In our previous post we covered the basics of Avro. In this post we will use Avro to serialize and deserialize data.

Avro can be used for serialization/deserialization in three ways; this post covers the first:

  1. Using Avro command line tools.
  2. Using Avro Java API without code generation.
  3. Using Avro Java API with code generation.

Sample Data

We will use the sample data below (StudentActivity.json):

{"id":"A91D021BA58444B29D4D42CA5E39F7BF","student_id":100,"university_id":908,"course_details":{"course_id":100,"enroll_date":"2012-02-13 00:00:00.000000000","verb":"completed","result_score":0.9}}
{"id":"502A77CC99B241CB94CA356F5218F1A9","student_id":101,"university_id":112,"course_details":{"course_id":233,"enroll_date":"2011-06-08 00:00:00.000000000","verb":"started","result_score":0.65}}
{"id":"5D04CD5ABF014D6EBA237766F9B470DE","student_id":102,"university_id":340,"course_details":{"course_id":339,"enroll_date":"2012-03-06 00:00:00.000000000","verb":"started","result_score":0.57}}

Note that each line is one JSON record, and that the records are nested: course_details is itself an object.
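
To make the nesting concrete, here is a minimal Python sketch (standard library only, not part of the original post) that parses the first sample line and reads a nested field. The variable names are illustrative.

```python
import json

# The first line of StudentActivity.json, split across string literals for readability.
line = (
    '{"id":"A91D021BA58444B29D4D42CA5E39F7BF","student_id":100,'
    '"university_id":908,"course_details":{"course_id":100,'
    '"enroll_date":"2012-02-13 00:00:00.000000000","verb":"completed",'
    '"result_score":0.9}}'
)

record = json.loads(line)

# Top-level fields are plain values; course_details is a nested object.
course = record["course_details"]
print(record["student_id"], course["verb"], course["result_score"])
# prints: 100 completed 0.9
```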


Defining a Schema

Avro schemas are defined in JSON. The Avro schema for our sample data (StudentActivity.avsc) is:

{
    "namespace": "com.rishav.avro",
    "type": "record",
    "name": "StudentActivity",
    "fields": [
        {
            "name": "id",
            "type": "string"
        },
        {
            "name": "student_id",
            "type": "int"
        },
        {
            "name": "university_id",
            "type": "int"
        },
        {
            "name": "course_details",
            "type": {
                "name": "Activity",
                "type": "record",
                "fields": [
                    {
                        "name": "course_id",
                        "type": "int"
                    },
                    {
                        "name": "enroll_date",
                        "type": "string"
                    },
                    {
                        "name": "verb",
                        "type": "string"
                    },
                    {
                        "name": "result_score",
                        "type": "double"
                    }
                ]
            }
        }
    ]
}
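
As a rough illustration of how the schema lines up with the sample data, the Python sketch below checks a parsed JSON record against it. This is a hand-written checker, not the Avro library: the function name conforms and the PRIMITIVES table are assumptions made for this example, it handles only records and the three primitive types used here, and it is stricter and simpler than Avro's own validation (for instance, it does not check that an int fits in 32 bits).

```python
import json

# Schema from StudentActivity.avsc, inlined so the sketch is self-contained.
SCHEMA = json.loads("""
{
  "namespace": "com.rishav.avro",
  "type": "record",
  "name": "StudentActivity",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "student_id", "type": "int"},
    {"name": "university_id", "type": "int"},
    {"name": "course_details", "type": {
      "name": "Activity",
      "type": "record",
      "fields": [
        {"name": "course_id", "type": "int"},
        {"name": "enroll_date", "type": "string"},
        {"name": "verb", "type": "string"},
        {"name": "result_score", "type": "double"}
      ]
    }}
  ]
}
""")

# Python type expected for each Avro primitive that this schema uses (assumption
# of this sketch: a double must already be a Python float).
PRIMITIVES = {"string": str, "int": int, "double": float}


def conforms(value, schema):
    """Return True if value matches schema (records and three primitives only)."""
    if isinstance(schema, dict):
        if schema["type"] != "record":
            # e.g. {"type": "string"}: unwrap and check the inner type.
            return conforms(value, schema["type"])
        if not isinstance(value, dict):
            return False
        fields = schema["fields"]
        # This sketch requires exactly the declared fields, no more and no fewer.
        if set(value) != {f["name"] for f in fields}:
            return False
        return all(conforms(value[f["name"]], f["type"]) for f in fields)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, PRIMITIVES[schema]) and not isinstance(value, bool)


# The second sample record from StudentActivity.json.
record = json.loads(
    '{"id":"502A77CC99B241CB94CA356F5218F1A9","student_id":101,'
    '"university_id":112,"course_details":{"course_id":233,'
    '"enroll_date":"2011-06-08 00:00:00.000000000","verb":"started",'
    '"result_score":0.65}}'
)
print(conforms(record, SCHEMA))  # True
```

Changing student_id to the string "101", or dropping a field from course_details, makes conforms return False, which mirrors the kind of mismatch that would stop a record from being written with this schema.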


1. Serialization/Deserialization using Avro command line tools

Avro ships a jar file named avro-tools-<version>.jar that bundles a number of command line tools. Running it without arguments lists them:

$ java -jar avro-tools-1.7.5.jar 
Version 1.7.5 of Apache Avro
Copyright 2010 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

C JSON parsing provided by Jansson and
written by Petri Lehtinen. The original software is
available from http://www.digip.org/jansson/.

----------------

Available tools:
  cat  extracts samples from files
  compile  Generates Java code for the given schema.
  concat  Concatenates avro files without re-compressing.
  fragtojson  Renders a binary-encoded Avro datum as JSON.
  fromjson  Reads JSON records and writes an Avro data file.
  fromtext  Imports a text file into an avro data file.
  getmeta  Prints out the metadata of an Avro data file.
  getschema  Prints out schema of an Avro data file.
  idl  Generates a JSON schema from an Avro IDL file
  idl2schemata  Extract JSON schemata of the types from an Avro IDL file
  induce  Induce schema/protocol from Java class/interface via reflection.
  jsontofrag  Renders a JSON-encoded Avro datum as binary.
  random  Creates a file with randomly generated instances of a schema.
  recodec  Alters the codec of a data file.
  rpcprotocol  Output the protocol of a RPC service
  rpcreceive  Opens an RPC Server and listens for one message.
  rpcsend  Sends a single RPC message.
  tether  Run a tethered mapreduce job.
  tojson  Dumps an Avro data file as JSON, one record per line.
  totext  Converts an Avro data file to a text file.
  totrevni  Converts an Avro data file to a Trevni file.
  trevni_meta  Dumps a Trevni file's metadata as JSON.
  trevni_random  Create a Trevni file filled with random instances of a schema.
  trevni_tojson  Dumps a Trevni file as JSON.

To convert the JSON sample data to Avro binary format, use the "fromjson" tool; to get JSON back from an Avro data file, use the "tojson" tool.

Command for Serializing JSON

Without any compression

java -jar avro-tools-1.7.5.jar fromjson --schema-file StudentActivity.avsc StudentActivity.json > StudentActivity.avro

With snappy compression

java -jar avro-tools-1.7.5.jar fromjson --codec snappy --schema-file StudentActivity.avsc StudentActivity.json > StudentActivity.snappy.avro

Command for Deserializing to JSON

The same command deserializes both compressed and uncompressed data, because the codec is recorded in the data file itself:

java -jar avro-tools-1.7.5.jar tojson StudentActivity.avro
java -jar avro-tools-1.7.5.jar tojson StudentActivity.snappy.avro

Because an Avro data file also contains the schema, we can retrieve it with these commands:

java -jar avro-tools-1.7.5.jar getschema StudentActivity.avro
java -jar avro-tools-1.7.5.jar getschema StudentActivity.snappy.avro
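
To show where the schema and codec actually live, the sketch below reads the header of an Avro data file using only the Python standard library. It follows the object container file layout from the Avro specification: a 4-byte magic ("Obj" followed by the byte 1), then a metadata map whose "avro.schema" and "avro.codec" entries hold the schema and the codec name. The function names (read_long, read_bytes, read_header, describe) are made up for this example; this is an illustration of the file layout, not how avro-tools itself is implemented.

```python
import json

MAGIC = b"Obj\x01"  # first four bytes of every Avro data file


def read_long(stream):
    """Decode one Avro long: a zigzag-encoded variable-length integer."""
    shift, accum = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of header")
        byte = chunk[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (also used for string keys)."""
    return stream.read(read_long(stream))


def read_header(stream):
    """Return the file metadata map (str -> bytes) from an Avro data file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro data file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count: a byte size follows, which we skip
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def describe(path):
    """Return (schema as a parsed JSON value, codec name) for the file at path."""
    with open(path, "rb") as f:
        meta = read_header(f)
    schema = json.loads(meta["avro.schema"].decode("utf-8"))
    # Per the specification, a missing avro.codec entry means "null" (no compression).
    codec = meta.get("avro.codec", b"null").decode("utf-8")
    return schema, codec
```

Calling describe("StudentActivity.snappy.avro") on a file written with the snappy codec should return the StudentActivity schema together with the codec name "snappy", which is the information tojson relies on to pick the right decompression.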

In our next post we will use the Avro Java API for serialization/deserialization.




Published at DZone with permission of Rishav Rohit, DZone MVB.
