CockroachDB CDC With Hadoop Ozone S3 Gateway and Docker Compose - Part 4

This is the fourth tutorial post on CockroachDB and Docker Compose. Today, we'll evaluate the Hadoop Ozone object store for CockroachDB object-store sink viability.

By Artem Ervits, DZone Core · Jan. 04, 22 · Tutorial

This is the fourth in a series of tutorials on CockroachDB and Docker Compose.

Today, we're going to evaluate the Hadoop Ozone object store for viability as a CockroachDB object-store sink. A word of caution: this article only explores the art of the possible, so use the ideas here at your own risk! First, some background: Hadoop Ozone is a new object store the Hadoop community is working on. It exposes an S3 API backed by HDFS and can scale to billions of files on-prem!

You can find the older posts here: Part 1, Part 2, Part 3.

  • Information on CockroachDB can be found here.
  • Information on Docker Compose can be found here.
  • Information on Hadoop Ozone can be found here.
  1. Download the Ozone 0.4.1 distribution.
wget -O hadoop-ozone-0.4.1-alpha.tar.gz https://www-us.apache.org/dist/hadoop/ozone/ozone-0.4.1-alpha/hadoop-ozone-0.4.1-alpha.tar.gz
tar xvzf hadoop-ozone-0.4.1-alpha.tar.gz


  2. Modify the compose file for Ozone to include CRDB.
cd ozone-0.4.1-alpha/compose


Notice the plethora of compose recipes available here!

We will focus on the ozones3 recipe, as we need the S3 gateway. As a homework exercise, try ozones3-haproxy once you're done with this tutorial; I can see a lot of interesting use cases with that!

cd ozones3


Edit the file and add Cockroach:

   crdb:
      image: cockroachdb/cockroach:v21.2.3
      container_name: crdb-1
      ports:
         - "26257:26257"
         - "8080:8080"
      command: start-single-node --insecure
      volumes:
         - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw


The whole docker-compose file should look like so now:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
   datanode:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
        - ../..:/opt/hadoop
      ports:
        - 9864
      command: ["ozone","datanode"]
      env_file:
        - ./docker-config
   om:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9874:9874
      environment:
         ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
      env_file:
          - ./docker-config
      command: ["ozone","om"]
   scm:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9876:9876
      env_file:
          - ./docker-config
      environment:
          ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
      command: ["ozone","scm"]
   s3g:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9878:9878
      env_file:
          - ./docker-config
      command: ["ozone","s3g"]
   crdb:
      image: cockroachdb/cockroach:v21.2.3
      container_name: crdb-1
      ports:
         - "26257:26257"
         - "8080:8080"
      command: start-single-node --insecure
      volumes:
         - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw


  3. Start Docker Compose with CRDB and Ozone.

By default, Ozone starts with a single data node; we're going to start it with three data nodes at once.

docker-compose up -d --scale=datanode=3
Creating network "ozones3_default" with the default driver
Creating ozones3_s3g_1      ... done
Creating ozones3_om_1       ... done
Creating ozones3_datanode_1 ... done
Creating ozones3_datanode_2 ... done
Creating ozones3_datanode_3 ... done
Creating crdb-1             ... done
Creating ozones3_scm_1      ... done


  4. Check the logs for om and s3g.
docker logs ozones3_s3g_1
docker logs ozones3_om_1


Make sure everything works and that both the S3 gateway and the Ozone Manager are up:

2020-01-06 16:30:42 INFO  BaseHttpServer:207 - HTTP server of S3GATEWAY is listening at http://0.0.0.0:9878
2020-01-06 16:30:50 INFO  BaseHttpServer:207 - HTTP server of OZONEMANAGER is listening at http://0.0.0.0:9874
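The two log lines above can take a little while to appear after startup. As a convenience, here is a small polling sketch (Python standard library only; the ports and timeouts are assumptions based on this compose file) that waits for an HTTP endpoint to answer before you move on:

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until the server answers with any HTTP status, or give up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=5)
            return True
        except urllib.error.HTTPError:
            return True  # the server responded, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval_s)
    return False

# Assumed ports from the compose file above:
# wait_for_http("http://localhost:9878")  # S3 gateway
# wait_for_http("http://localhost:9874")  # Ozone Manager UI
```

Treating any HTTP response (including 4xx/5xx) as "up" is deliberate: we only care that the listener is accepting connections.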


  5. Browse the UI.

Ozone exposes a few UIs via HTTP, specifically:

  • HDFS Storage Container Manager: http://localhost:9876/#!/
  • Gateway: http://localhost:9878/static/

After the bucket is created (next step), you can browse to it:

http://localhost:9878/ozonebucket?browser

  6. Create a bucket.
aws s3api --endpoint http://localhost:9878/ create-bucket --bucket=ozonebucket
{
    "Location": "http://localhost:9878/ozonebucket"
}


  7. Upload a file to the bucket.
touch test
aws s3 --endpoint http://localhost:9878 cp test s3://ozonebucket/test
upload: ./test to s3://ozonebucket/test


You can browse the bucket using the UI; hit refresh if necessary.

http://localhost:9878/ozonebucket?browser

You can also use the AWS CLI:

aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
2020-01-06 12:59:39          0 test
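Under the hood, aws s3 ls issues a GET on the bucket, and the gateway answers with a standard S3 ListBucketResult XML document. The following sketch parses such a response using only the Python standard library; the sample XML is hypothetical but follows the standard S3 schema, mirroring the single test object uploaded above:

```python
import xml.etree.ElementTree as ET

# Hypothetical but schema-conformant ListBucketResult, as a GET on
# http://localhost:9878/ozonebucket would return after the upload above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>ozonebucket</Name>
  <Contents><Key>test</Key><Size>0</Size></Contents>
</ListBucketResult>"""

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
root = ET.fromstring(SAMPLE)
keys = [c.findtext("s3:Key", namespaces=NS) for c in root.findall("s3:Contents", NS)]
print(keys)  # ['test']
```

Any S3-compatible client works the same way, which is exactly why an S3 gateway in front of Ozone is so convenient.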


  8. Set up a changefeed in CRDB to point to Ozone.

The steps here are not much different from the MinIO changefeed described in the previous post.

Access the cockroach CLI:

docker exec -it crdb-1 ./cockroach sql --insecure
SET CLUSTER SETTING cluster.organization = '<organization name>';

SET CLUSTER SETTING enterprise.license = '<secret>';

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE DATABASE cdc_demo;

SET DATABASE = cdc_demo;

CREATE TABLE office_dogs (
     id INT PRIMARY KEY,
     name STRING);

INSERT INTO office_dogs VALUES
   (1, 'Petee'),
   (2, 'Carl');

UPDATE office_dogs SET name = 'Petee H' WHERE id = 1;


  9. Create an Ozone-specific changefeed.
CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://ozonebucket/dogs?AWS_ACCESS_KEY_ID=dummy&AWS_SECRET_ACCESS_KEY=dummy&AWS_ENDPOINT=http://ozones3_s3g_1:9878' WITH updated;
        job_id
+--------------------+
  518597966522974209
(1 row)

Time: 20.3764ms


The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY parameters are set to dummy values just to make the changefeed work; Ozone needs to run in Kerberos mode to configure real AWS secrets (see the Ozone security documentation for details).
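For readability, the sink URI above can be assembled programmatically. The sketch below is a hypothetical helper, not part of the tutorial's setup; note that urlencode percent-encodes the endpoint URL, which is an equally valid form for query parameters:

```python
from urllib.parse import urlencode

def ozone_sink_uri(bucket: str, prefix: str, endpoint: str) -> str:
    """Build the experimental-s3 changefeed sink URI used above."""
    params = {
        "AWS_ACCESS_KEY_ID": "dummy",      # placeholder; not validated by Ozone here
        "AWS_SECRET_ACCESS_KEY": "dummy",  # placeholder; not validated by Ozone here
        "AWS_ENDPOINT": endpoint,
    }
    return f"experimental-s3://{bucket}/{prefix}?{urlencode(params)}"

uri = ozone_sink_uri("ozonebucket", "dogs", "http://ozones3_s3g_1:9878")
print(uri)
```

Note that the endpoint uses the compose service's container name (ozones3_s3g_1), since the changefeed runs inside the CockroachDB container on the same Docker network.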

At this point, go back to the S3 UI and make sure the dogs directory was created. Indeed, the directory is there, and if you browse to the deepest child directory, you will notice a JSON file.

Again, modifying rows in the table will produce new files in the bucket.

UPDATE office_dogs SET name = 'Beathoven' WHERE id = 1;


Clicking on the file will open a new browser tab with the following data:

{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}


We can also confirm the files are there with the CLI:

aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/
2020-01-06 13:05:45        191 202001061805395834869000000000000-aa12c96bd4b5919c-1-2-00000000-office_dogs-1.ndjson
2020-01-06 13:09:28         99 202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson
aws s3 cp --quiet --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson /dev/stdout
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}
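Each line of those .ndjson files is one change event. The updated value is CockroachDB's hybrid logical clock timestamp: nanoseconds since the Unix epoch, with the logical counter after the decimal point. A minimal decoding sketch using only the standard library:

```python
import json
from datetime import datetime, timezone

# One event line, copied from the changefeed output above.
line = '{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}'

event = json.loads(line)
nanos, _logical = event["updated"].split(".")
wall_clock = datetime.fromtimestamp(int(nanos) / 1e9, tz=timezone.utc)

print(event["after"]["name"])                        # Beathoven
print(wall_clock.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2020-01-06 18:09:26 UTC
```

The decoded time, 18:09:26 UTC, lines up with the 13:09 local (UTC-5) modification time shown in the bucket listing above.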


Hope you enjoyed this tutorial; come back for more! Please share your feedback in the comments.

This article only scratches the surface; for everything there is to learn about Hadoop and Ozone, navigate to their respective websites.

Published at DZone with permission of Artem Ervits. See the original article here.

