Setting Up a CrateDB Cluster With Kubernetes to Store and Query Machine Data

Using open source CrateDB + Kubernetes for machine data.

By Mika Naylor · May 13, 2020 · Tutorial

Because of its horizontally scalable shared-nothing architecture, the CrateDB open source database is well-suited for working with Kubernetes. Setting up a CrateDB cluster with Kubernetes can be done in just a few steps, and scaling up and down is straightforward – making the cluster particularly flexible. This step-by-step tutorial will show you how to get CrateDB and Kubernetes working together.

CrateDB is used for real-time machine data processing, monitoring, and analytics. The open source database is suited for applications with high volumes of machine data (like anomaly detection), log data (like ecommerce), network data (like capacity planning), and IoT/IIoT data (like smart manufacturing, smart home products, and fitness gear). However, this database is probably not what you want to use if you require strong (ACID) transactional consistency or highly normalized schemas with many tables and joins.

Kubernetes: Pods, Controllers, and Services

(Skip ahead to the next section if you don’t need a lot of Kubernetes background.)

Container orchestration is the management, deployment, and scaling of containerized systems. Within a Kubernetes cluster, at least one node must act as the master; the number of worker nodes is arbitrary. Containers are distributed intelligently across all Kubernetes nodes, and the various Kubernetes components run on different servers depending on their function, with multiple instances of these components coordinating across machines. To define the state of a Kubernetes cluster, three concepts are particularly important: pods, controllers, and services.

Pods

A Kubernetes pod represents a single computing unit, and thus the basic building block of a Kubernetes system. A pod can be a single container or several that are closely linked. For example, if a web application is deployed, a pod executes a single instance of the application. Pods can be scaled up horizontally by adding replica pods, or scaled down by removing them. More complex applications often require more than one container. All containers in a pod share a common network interface, and each container has access to the storage volumes assigned to the pod. The official CrateDB Docker image is well suited to running as a single-container pod; combining several such pods creates a CrateDB cluster of any size.

Controllers

Controllers are used to create pods and perform management functions. A controller manages a set of pods according to a given specification, and Kubernetes provides several controllers for different purposes. For example, containers should ideally be stateless, so that nothing is lost if a container is destroyed, rebuilt, or moved to another server. Stateless containers are suitable for web applications that keep their state in an external database. Databases themselves, however, require persistent storage: data shouldn’t be lost just because a container is rescheduled. To solve this, Kubernetes provides the StatefulSet controller, which assigns each pod a fixed identity and fixed storage that are retained across restarts and rescheduling. The controller creates all pods within a stateful set from the same template, but they are not interchangeable.

Services

Since the pods can be stopped, started, and rescheduled to any Kubernetes node, their assigned IP addresses change over time. However, client applications shouldn’t have to deal with changing IP addresses. That's what Kubernetes services are for: they serve as static interfaces providing access to one or more pods. A typical service is a load balancer that distributes incoming queries across the entire cluster.

Understanding these Kubernetes concepts is foundational to understanding configurations for the CrateDB cluster.

Setting up a Kubernetes Cluster

Minikube provides a solution for running Kubernetes locally, giving you a simple and powerful way to get started with Kubernetes. Minikube can work with various hypervisors as a VM runtime and is set up by default for use with the popular cross-platform option VirtualBox. If a compatible hypervisor such as VirtualBox is installed on the system, Minikube detects it and sets up the VM automatically. In addition, kubectl, the standard command-line tool for controlling Kubernetes clusters, is required.

Once these three components have been installed, the system can be started. By default, Minikube allocates 1 GB of memory to the VM. This can be adjusted as required, as in the following example, which uses 4 GB (--memory 4096):

$ minikube start --memory 4096

Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Downloading Minikube ISO
 160.27 MB / 160.27 MB [======================================] 100.00% 0s
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.

To prepare the newly created Kubernetes cluster for use, Minikube now automatically configures kubectl. This can be checked with the following command:

$ kubectl get nodes

NAME       STATUS    ROLES     AGE       VERSION
minikube   Ready     master    4m        v1.10.0

With the help of namespaces, Kubernetes divides the physical cluster into several virtual areas. Strictly speaking, no extra namespace needs to be created for the CrateDB cluster, but it’s advisable so you can keep an overview of the resources. The following command creates a new namespace:

$ kubectl create namespace crate

namespace/crate created

Now, if you query the existing namespaces, the newly created “crate” appears. The default namespace is used if no other is specified, “kube-public” holds resources that are publicly readable, and “kube-system” holds the resources Kubernetes uses internally.

$ kubectl get namespaces

NAME          STATUS    AGE
default       Active    32m
kube-public   Active    31m
kube-system   Active    32m
crate         Active    59s

Setting up CrateDB Services

For CrateDB to function, each CrateDB node must be able to communicate with the other nodes in the cluster. To accomplish this, a Kubernetes service, defined in crate-internal-service.yaml, is created that targets all pods carrying the label “app: crate”. Labels are key/value pairs attached to objects (such as pods) to give them identifying attributes without changing their semantics. All CrateDB pods must therefore be given the “app: crate” label. In addition, the following configuration assigns the service a fixed cluster-internal IP address and makes it available on port 4300, the standard port CrateDB uses for communication between nodes.

Here is the configuration:

kind: Service
apiVersion: v1
metadata:
  name: crate-internal-service
  labels:
    app: crate
spec:
  # A static IP address is assigned to this service. This IP address is
  # only reachable from within the Kubernetes cluster.
  type: ClusterIP
  ports:
    # Port 4300 for inter-node communication.
  - port: 4300
    name: crate-internal
  selector:
    # Apply this to all pods with the `app: crate` label.
    app: crate

Now the service can be created:

$ kubectl create -f crate-internal-service.yaml --namespace crate

service/crate-internal-service created

Kubernetes generates SRV records, which can be used to propagate the services of the cluster via DNS. In a later step, these can be used to set up CrateDB Unicast Host Discovery.
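
If you’re curious what these records look like, you can query them from inside the cluster with a throwaway pod. This is a quick optional check, not a required step; the sketch below assumes the cluster DNS add-on that Minikube enables by default and uses the tutum/dnsutils image purely as an example of an image that ships dig. The response should show port 4300 and the DNS name of the internal service.

$ kubectl run dns-test --namespace crate --image=tutum/dnsutils --rm -it --restart=Never -- \
    dig +short SRV _crate-internal._tcp.crate-internal-service.crate.svc.cluster.local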

So that clients can also run queries against CrateDB, the pods must be reachable from outside the cluster. For this purpose, an external service (crate-external-service.yaml) is created. Like the internal service, it targets all pods with the “app: crate” label, but its type is LoadBalancer, which asks Kubernetes to create an externally reachable load balancer. Such a service is typically only available on hosted Kubernetes offerings, where Kubernetes provisions the provider’s load balancer; on Minikube, a small workaround is needed, which is shown further below.

This results in the following configuration:

kind: Service
apiVersion: v1
metadata:
  name: crate-external-service
  labels:
    app: crate
spec:
  # Create an externally reachable load balancer.
  type: LoadBalancer
  ports:
    # Port 4200 for HTTP clients.
  - port: 4200
    name: crate-web
    # Port 5432 for PostgreSQL wire protocol clients.
  - port: 5432
    name: postgres
  selector:
    # Apply this to all pods with the `app: crate` label.
    app: crate

Now the external service can be created:

$ kubectl create -f crate-external-service.yaml --namespace crate

service/crate-external-service created

Defining the CrateDB Controller

The services now provide the interfaces to the CrateDB cluster. Next, a controller is needed to assemble and manage the cluster itself. The configuration in crate-controller.yaml covers the following points:

  • The Kubernetes controller creates the pods: crate-0, crate-1, crate-2, and so on.
  • The controller creates a stateful set called “crate-set” comprising three CrateDB pods, each with a fixed identity and persistent storage.
  • Each pod carries the “app: crate” label so that it can be addressed by the previously created services.
  • Init containers (specialized containers that run inside a pod before the app containers start) are used to set the appropriate memory map limit so that CrateDB passes its bootstrap checks. These checks are carried out automatically in order to catch runtime problems early.
  • 512MB of memory is allocated to each pod, so the cluster uses 1.5GB of the 4GB total. This leaves room for growth.
  • The CrateDB container that runs in each pod is defined, using version 4.1.4 of the CrateDB Docker image.
  • Node discovery uses the SRV records created by crate-internal-service.
  • Each pod exposes several ports: port 4300 for communication between nodes, port 4200 for HTTP clients, and port 5432 for PostgreSQL wire protocol clients.
  • Environment variables are defined. Here, CRATE_HEAP_SIZE configures the usable heap as 256MB, or 50 percent of the available memory.
  • To facilitate a quick start, a RAM drive serves as temporary storage.

kind: StatefulSet
apiVersion: "apps/v1"
metadata:
  # This is the name used as a prefix for all pods in the set.
  name: crate
spec:
  serviceName: "crate-set"
  # Our cluster has three nodes.
  replicas: 3
  selector:
    matchLabels:
      # The pods in this cluster have the `app: crate` label.
      app: crate
  template:
    metadata:
      labels:
        app: crate
    spec:
      # InitContainers run before the main containers of a pod are
      # started, and they must terminate before the primary containers
      # are initialized. Here, we use one to set the correct memory
      # map limit.
      initContainers:
      - name: init-sysctl
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      # This final section is the core of the StatefulSet configuration.
      # It defines the container to run in each pod.
      containers:
      - name: crate
        # Use the CrateDB 4.1.4 Docker image.
        image: crate:4.1.4
        # Pass in configuration to CrateDB via command-line options.
        # The initial master nodes are given by their node names; this
        # is only needed when the cluster is created for the first time.
        command:
          - /docker-entrypoint.sh
          - -Cnode.name=${POD_NAME}
          - -Ccluster.name=${CLUSTER_NAME}
          - -Ccluster.initial_master_nodes=crate-0,crate-1,crate-2
          - -Cdiscovery.seed_providers=srv
          - -Cdiscovery.srv.query=_crate-internal._tcp.crate-internal-service.${NAMESPACE}.svc.cluster.local
          - -Cgateway.recover_after_nodes=2
          - -Cgateway.expected_nodes=${EXPECTED_NODES}
          - -Cpath.data=/data
        volumeMounts:
          # Mount the `/data` directory as a volume named `data`.
          - mountPath: /data
            name: data
        resources:
          limits:
            # How much memory each pod gets.
            memory: 512Mi
        ports:
          # Port 4300 for inter-node communication.
        - containerPort: 4300
          name: crate-internal
          # Port 4200 for HTTP clients.
        - containerPort: 4200
          name: crate-web
          # Port 5432 for PostgreSQL wire protocol clients.
        - containerPort: 5432
          name: postgres
        # Environment variables passed through to the container.
        env:
          # This variable is detected by CrateDB.
        - name: CRATE_HEAP_SIZE
          value: "256m"
          # The rest of these variables are used in the command-line
          # options.
        - name: EXPECTED_NODES
          value: "3"
        - name: CLUSTER_NAME
          value: "my-crate"
          # POD_NAME is consumed by -Cnode.name above; the downward API
          # exposes each pod's own name (crate-0, crate-1, ...).
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
        # Use a RAM drive for storage, which is fine for testing, but must
        # not be used for production setups!
        - name: data
          emptyDir:
            medium: "Memory"

After the configuration has been saved, the controller can be created:

$ kubectl create -f crate-controller.yaml --namespace crate

statefulset.apps/crate created

The StatefulSet controller creates the CrateDB pods one at a time. This process can be observed with the following command:

$ kubectl get pods --namespace crate

NAME      READY   STATUS            RESTARTS   AGE
crate-0   0/1     PodInitializing   0          36s

Finally, the CrateDB cluster is fully initialized:

$ kubectl get pods --namespace crate

NAME      READY   STATUS    RESTARTS   AGE
crate-0   1/1     Running   0          2m
crate-1   1/1     Running   0          1m
crate-2   1/1     Running   0          1m

Accessing the CrateDB Cluster

Before anyone can access CrateDB, the external service must be running:

$ kubectl get service --namespace crate

NAME                     TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
crate-external-service   LoadBalancer   10.96.227.26     <pending>     4200:31159/TCP,5432:31316/TCP   44m
crate-internal-service   ClusterIP      10.101.192.101   <none>        4300/TCP                        44m

The “PORT(S)” column shows that Kubernetes port 31159 is connected to CrateDB port 4200 (HTTP) and Kubernetes port 31316 is connected to CrateDB port 5432 (PostgreSQL Wire Protocol). Due to a peculiarity of Minikube, the status of the external IP is still indicated with "pending". This requires a workaround.

First, the Minikube services are queried separately:

$ minikube service list --namespace crate

|-----------|------------------------|-----------------------------|
| NAMESPACE |          NAME          |             URL             |
|-----------|------------------------|-----------------------------|
| crate     | crate-external-service | http://192.168.99.100:31159 |
|           |                        | http://192.168.99.100:31316 |
| crate     | crate-internal-service | No node port                |
|-----------|------------------------|-----------------------------|

Two URLs at 192.168.99.100 are displayed, both prefixed with http://, although only one of them is actually an HTTP endpoint; the other is the PostgreSQL port. For the example described here, the HTTP port is 31159, and it can be checked with a simple HTTP request. If the HTTP API response looks like this, everything works as expected:

$ curl 192.168.99.100:31159

{
  "ok" : true,
  "status" : 200,
  "name" : "Regenstein",
  "cluster_name" : "my-crate",
  "version" : {
    "number" : "4.1.4",
    "build_hash" : "6a9f8ebc5fefd63f666caa6f28e29b4b214ac7fc",
    "build_timestamp" : "2020-03-20T10:40:21Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0"
  }
}

The network address (in this tutorial, 192.168.99.100:31159) can now be opened in the browser, which brings up the CrateDB Admin UI. Clicking on the Cluster screen in the left navigation menu shows that the CrateDB cluster has three nodes, as expected.

In the "Getting Started" guide from Crate.io you can find more details on importing test data and creating queries.

Configuring Persistent Storage

In practice, you’ll want to be sure data in the cluster can survive typical power cycling scenarios (switching the hardware off and on again) without damage. So far, the last lines of the example controller file crate-controller.yaml look like this:

      volumes:
        # Use a RAM drive for storage, which is fine for testing, but must
        # not be used for production setups!
        - name: data
          emptyDir:
            medium: "Memory"

To set up persistent disk storage, Kubernetes provides the Persistent Volumes subsystem. It offers APIs for users and administrators that abstract the details of how storage is provided from how it is consumed. One of these APIs is the PersistentVolumeClaim, which instructs Kubernetes to request storage space from the underlying infrastructure; Kubernetes remains agnostic about the implementation details.
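
On Minikube, such claims are typically served by a default StorageClass backed by the host-path provisioner. As a quick, optional sanity check (the exact output depends on your Minikube and kubectl versions), you can confirm that a default StorageClass exists before applying the new configuration:

$ kubectl get storageclass

NAME                 PROVISIONER                AGE
standard (default)   k8s.io/minikube-hostpath   1h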

The part of the controller file shown above (from volumes: onward) must now be replaced with a new configuration. In the following example, 1 GB of persistent storage is requested per pod (in practice, other storage sizes can be selected). The following config section belongs at the same indentation level as serviceName: "crate-set", i.e. much further to the left:

  volumeClaimTemplates:
    # Use persistent storage.
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

Unfortunately, the existing pods cannot simply be updated in place, because the storage configuration is changing. In the course of this change, all data previously written to CrateDB is lost. The following command deletes and recreates the controller:

$ kubectl replace --force -f crate-controller.yaml --namespace crate

statefulset.apps "crate" deleted
statefulset.apps/crate replaced

The following command can be used to verify whether 1GB is available for the pods:

$ kubectl get pvc --namespace crate

NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-crate-0   Bound    pvc-281c14ef-a47e-11e8-a3df-080027220281   1Gi        RWO            standard       3m
data-crate-1   Bound    pvc-53ec50e1-a47e-11e8-a3df-080027220281   1Gi        RWO            standard       2m
data-crate-2   Bound    pvc-56d3433e-a47e-11e8-a3df-080027220281   1Gi        RWO            standard       2m

Scaling Horizontally to Five Nodes

The ready-made CrateDB package, available for download from the Crate.io website, is limited to three nodes; exceeding this limit leads to malfunctions. If you don’t want to use the more powerful (but paid) enterprise version, you can still expand your cluster by building the CrateDB Community Edition from source.

The following code can be used to build CrateDB: 

$ git clone https://github.com/crate/crate
$ cd crate
$ git submodule update --init
$ git checkout <TAG>
$ ./gradlew clean communityEditionDistTar

The Git tag corresponding to the desired version must be inserted in place of “<TAG>”. As soon as the gradlew command has completed successfully, the CrateDB CE release is stored as a compressed tarball in the app/build/distributions directory.
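
A quick way to confirm the build output is to list that directory; the exact archive name depends on the tag that was checked out:

$ ls app/build/distributions/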

Horizontal scaling is now easy to implement by increasing or decreasing the number of replicas used.

In this example, the controller configuration initially defined three replicas:

Java
 




xxxxxxxxxx
1


 
1
  # Our cluster has three nodes.
2
 
          
3
  replicas: 3


 

The number can be changed while the cluster is running. This is particularly useful if, for example, it’s necessary to adapt rapidly to traffic peaks. Note that this procedure is not ideal for making permanent changes – the CrateDB Admin UI will display a corresponding warning.
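
For a quick, temporary adjustment of this kind you don’t even have to edit the manifest; kubectl can scale the StatefulSet directly. This is only a sketch of the ad-hoc variant: it leaves the EXPECTED_NODES environment variable untouched, which is exactly why editing the file, as described next, is the cleaner route for anything permanent.

$ kubectl scale statefulset crate --replicas=5 --namespace crate

statefulset.apps/crate scaled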

The following changes are now made in the crate-controller.yaml file: the number of replicas is raised from 3 to 5:

  # Our cluster has five nodes.
  replicas: 5

Alongside the replica count, EXPECTED_NODES is set to 5 in the container section, and gateway.recover_after_nodes (and, on older CrateDB versions, minimum_master_nodes) is adjusted accordingly. These values should be at least as large as half the cluster size plus one; for this example, they are raised from 2 to 3.

Since this time only the replicas and container sections have been changed, and not the storage configuration, the controller configuration can be updated in place:

$ kubectl replace -f crate-controller.yaml --namespace crate

statefulset.apps/crate replaced

This process can also be observed with kubectl while it’s taking place. Kubernetes first terminates the running pods and then starts them again with the same identity and the same storage. Finally, the following result is visible:

$ kubectl get pods --namespace crate

NAME      READY   STATUS    RESTARTS   AGE
crate-0   1/1     Running   0          11m
crate-1   1/1     Running   0          11m
crate-2   1/1     Running   0          10m
crate-3   1/1     Running   0          2m
crate-4   1/1     Running   0          2m

All five nodes can now also be seen in the Admin UI.

Scaling Down: Removing a Node From the Cluster

As far as CrateDB is concerned, there’s no difference between a node being deliberately removed from the cluster and a node failing unexpectedly: in both cases, a node leaves the cluster and CrateDB handles the rest automatically. To test this, it’s advisable to load some test data into the system first. In the controller configuration, replicas and EXPECTED_NODES are set to 4, with everything else remaining as it is. The controller configuration is then updated:

$ kubectl replace -f crate-controller.yaml --namespace crate

statefulset.apps/crate replaced

Kubernetes now makes the changes pod by pod. While the cluster is in the middle of the roll-out, i.e. in an inconsistent state, some checks will fail. By default, replication is configured so that CrateDB can recover on its own if shards (horizontal partitions) need to be recreated. While this is in progress, the Admin UI shows some warnings; once the process is complete, everything should be back in order and the scale-down has concluded successfully.
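
A simple way to confirm that replication has caught up after the roll-out is to ask CrateDB itself, again via the HTTP endpoint and the example address used earlier in this tutorial; a severity of 1 with no missing or underreplicated shards means the data is fully available again:

$ curl -sS -H 'Content-Type: application/json' \
    -X POST 'http://192.168.99.100:31159/_sql' \
    -d '{"stmt": "SELECT severity, missing_shards, underreplicated_shards FROM sys.health"}'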

CrateDB and Kubernetes work well as a team and make it possible to quickly set up a flexibly scalable cluster. Experimenting with test data is a useful way to build experience and gradually grow more familiar with using these technologies together.

Opinions expressed by DZone contributors are their own.
