CockroachDB CDC With Hadoop Ozone S3 Gateway and Docker Compose - Part 4

This is the fourth tutorial post on CockroachDB and Docker Compose. Today, we'll evaluate the Hadoop Ozone object store for CockroachDB object-store sink viability.

By Artem Ervits · Jan. 04, 22 · Tutorial

This is the fourth in the series of tutorials on CockroachDB and Docker Compose.

Today, we're going to evaluate the Hadoop Ozone object store as a sink for CockroachDB changefeeds. A word of caution: this article only explores the art of the possible, so please use the ideas in it at your own risk! Hadoop Ozone is a new object store that the Hadoop community is working on. It exposes an S3-compatible API on top of Hadoop's distributed storage layer and is designed to scale to billions of objects on-prem.

You can find the older posts here: Part 1, Part 2, Part 3.

  • Information on CockroachDB can be found here.
  • Information on Docker Compose can be found here
  • Information on Hadoop Ozone can be found here
  1. Download the Ozone 0.4.1 distribution
wget -O hadoop-ozone-0.4.1-alpha.tar.gz https://www-us.apache.org/dist/hadoop/ozone/ozone-0.4.1-alpha/hadoop-ozone-0.4.1-alpha.tar.gz
tar xvzf hadoop-ozone-0.4.1-alpha.tar.gz


  1. Modify the compose file for Ozone to include CRDB
cd ozone-0.4.1-alpha/compose


Notice the plethora of compose recipes available here!

We will focus on the ozones3 recipe, since we need the S3 gateway. As a homework exercise, try ozones3-haproxy once you're done with this tutorial. I can see a lot of interesting use cases for that one!

cd ozones3


Edit the docker-compose file in this directory and add a CockroachDB service:

   crdb:
      image: cockroachdb/cockroach:v21.2.3
      container_name: crdb-1
      ports:
         - "26257:26257"
         - "8080:8080"
      command: start-single-node --insecure
      volumes:
         - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw


The whole docker-compose file should now look like this:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
   datanode:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
        - ../..:/opt/hadoop
      ports:
        - 9864
      command: ["ozone","datanode"]
      env_file:
        - ./docker-config
   om:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9874:9874
      environment:
         ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
      env_file:
          - ./docker-config
      command: ["ozone","om"]
   scm:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9876:9876
      env_file:
          - ./docker-config
      environment:
          ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
      command: ["ozone","scm"]
   s3g:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9878:9878
      env_file:
          - ./docker-config
      command: ["ozone","s3g"]
   crdb:
      image: cockroachdb/cockroach:v21.2.3
      container_name: crdb-1
      ports:
         - "26257:26257"
         - "8080:8080"
      command: start-single-node --insecure
      volumes:
         - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw


  1. Start docker-compose with CRDB and Ozone.

By default, Ozone starts with a single data node. We're going to start it with three data nodes at once.

docker-compose up -d --scale=datanode=3
Creating network "ozones3_default" with the default driver
Creating ozones3_s3g_1      ... done
Creating ozones3_om_1       ... done
Creating ozones3_datanode_1 ... done
Creating ozones3_datanode_2 ... done
Creating ozones3_datanode_3 ... done
Creating crdb-1             ... done
Creating ozones3_scm_1      ... done


  1. Check logs for om and s3g
docker logs ozones3_s3g_1
docker logs ozones3_om_1


Both the S3 gateway and the Ozone Manager should be up. Look for lines like these in the logs:

2020-01-06 16:30:42 INFO  BaseHttpServer:207 - HTTP server of S3GATEWAY is listening at http://0.0.0.0:9878
2020-01-06 16:30:50 INFO  BaseHttpServer:207 - HTTP server of OZONEMANAGER is listening at http://0.0.0.0:9874
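
If you'd rather script this check than read logs, here is a minimal Python sketch that polls the two ports published in the compose file (9878 for the S3 gateway, 9874 for the Ozone Manager). The helper name wait_for_port and its defaults are my own, not part of Ozone, and an open port only tells you the HTTP server is listening, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of Ozone or CockroachDB.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 9878) should return True once the s3g container is listening, and wait_for_port("localhost", 9874) does the same for om.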


  1. Browse the UI.

Ozone exposes a few UIs via HTTP, specifically:

  • Storage Container Manager (SCM): http://localhost:9876/#!/
  • Gateway: http://localhost:9878/static/

After the bucket is created in the next step, you can browse to it:

http://localhost:9878/ozonebucket?browser

  1. Create a bucket.
aws s3api --endpoint http://localhost:9878/ create-bucket --bucket=ozonebucket
{
    "Location": "http://localhost:9878/ozonebucket"
}


  1. Upload a file to the bucket.
touch test
aws s3 --endpoint http://localhost:9878 cp test s3://ozonebucket/test
upload: ./test to s3://ozonebucket/test


You can browse the bucket in the UI. Hit refresh if necessary.

http://localhost:9878/ozonebucket?browser

You can also list the bucket with the AWS CLI:

aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
2020-01-06 12:59:39          0 test


  1. Set up a changefeed in CRDB to point to Ozone.

The steps here are not much different from the MinIO changefeed described in the previous post.

Access the cockroach CLI.

docker exec -it crdb-1 ./cockroach sql --insecure
SET CLUSTER SETTING cluster.organization = '<organization name>';

SET CLUSTER SETTING enterprise.license = '<secret>';

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE DATABASE cdc_demo;

SET DATABASE = cdc_demo;

CREATE TABLE office_dogs (
     id INT PRIMARY KEY,
     name STRING);

INSERT INTO office_dogs VALUES
   (1, 'Petee'),
   (2, 'Carl');

UPDATE office_dogs SET name = 'Petee H' WHERE id = 1;


  1. Create an Ozone-specific changefeed.
CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://ozonebucket/dogs?AWS_ACCESS_KEY_ID=dummy&AWS_SECRET_ACCESS_KEY=dummy&AWS_ENDPOINT=http://ozones3_s3g_1:9878' WITH updated;
        job_id
+--------------------+
  518597966522974209
(1 row)

Time: 20.3764ms


AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set to dummy values only to make the changefeed work. Ozone needs to run in Kerberos mode before real AWS secrets can be configured; see the Ozone documentation for details.
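
The sink URI is easy to get wrong by hand, so here is a small Python sketch that assembles it from its parts. The function name ozone_sink_uri is hypothetical, and the bucket, prefix, endpoint and dummy credentials are simply the values used in this tutorial. Percent-encoding the credentials is my own precaution for keys that contain special characters; the dummy values pass through unchanged.

```python
from urllib.parse import quote


def ozone_sink_uri(bucket: str, prefix: str, endpoint: str,
                   access_key: str = "dummy", secret_key: str = "dummy") -> str:
    """Build an experimental-s3 changefeed sink URI for an S3-compatible endpoint.

    Hypothetical helper for illustration; defaults mirror this tutorial.
    """
    query = "&".join([
        # Credentials are fully percent-encoded so '+', '/', or '&' cannot break the query.
        "AWS_ACCESS_KEY_ID=" + quote(access_key, safe=""),
        "AWS_SECRET_ACCESS_KEY=" + quote(secret_key, safe=""),
        # ':' and '/' are left as-is so the endpoint stays readable, as in the statement above.
        "AWS_ENDPOINT=" + quote(endpoint, safe=":/"),
    ])
    return f"experimental-s3://{bucket}/{prefix}?{query}"


uri = ozone_sink_uri("ozonebucket", "dogs", "http://ozones3_s3g_1:9878")
print(f"CREATE CHANGEFEED FOR TABLE office_dogs INTO '{uri}' WITH updated;")
```

With the tutorial's values, this prints the same statement we ran in the SQL shell.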

At this point, go back to the S3 UI and make sure the dogs directory was created. Sure enough, the directory is there, and if you browse to the deepest child directory, you will find the JSON file.

Again, modifying rows in the table will produce new files in the bucket.

UPDATE office_dogs SET name = 'Beathoven' WHERE id = 1;


Clicking on the file will open a new browser tab with the following data:

{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}
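
To make sense of that record, here is a rough Python sketch that parses one line of the changefeed output. The "updated" field is a timestamp written as wall-clock nanoseconds since the Unix epoch, a dot, and a logical counter. The function name parse_changefeed_line and the shape of the returned dictionary are my own choices for illustration.

```python
import json
from datetime import datetime, timezone

# The record shown above, copied verbatim.
line = '{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}'


def parse_changefeed_line(raw: str) -> dict:
    """Parse one newline-delimited JSON changefeed record (hypothetical helper)."""
    record = json.loads(raw)
    # Split "<wall nanoseconds>.<logical counter>" into its two parts.
    wall_ns, _, logical = record["updated"].partition(".")
    seconds, nanos = divmod(int(wall_ns), 1_000_000_000)
    return {
        "key": record["key"],
        "after": record.get("after"),
        "updated_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "updated_nanos": nanos,
        "logical": int(logical or 0),
    }


event = parse_changefeed_line(line)
print(event["updated_at"].isoformat(), event["after"])
# prints: 2020-01-06T18:09:26+00:00 {'id': 1, 'name': 'Beathoven'}
```

The UTC time lines up with the file timestamps that the AWS CLI reports in local time.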


We can also confirm the files are there with CLI:

artem@Artems-MBP ozones3 % aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/
2020-01-06 13:05:45        191 202001061805395834869000000000000-aa12c96bd4b5919c-1-2-00000000-office_dogs-1.ndjson
2020-01-06 13:09:28         99 202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson
aws s3 cp --quiet --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson /dev/stdout
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}


Hope you enjoyed this tutorial and come back for more! Please share your feedback in the comments.

This article only scratches the surface. For everything there is to learn about Hadoop and Ozone, visit their respective websites.

Published at DZone with permission of Artem Ervits. See the original article here.
