
Hazelcast Resilient to Kubernetes Zone Failures


Check out how features in Hazelcast, including the creation of automatic backups and the ZONE_AWARE feature, promote high availability in Kubernetes.


Data is valuable. Or, I should write, some data is valuable. You may think that if data is important to you, you must store it in persistent storage, like a database or a filesystem. That is obviously true. However, there are many use cases in which you don’t want to sacrifice the benefits of in-memory data stores. After all, no persistent database provides such fast data access or lets you combine data entries with such flexibility. So, how do you keep your in-memory data safe? That is what I’m going to present in this blog post.

High Availability

Hazelcast is distributed and highly available by nature. This is achieved by always keeping the backup of each data partition on another Hazelcast member. Let’s look at the diagram below.

Hazelcast diagram

Imagine you put some data into the Hazelcast cluster, for example, a key-value entry (“foo”, “bar”). It is placed into data partition 1, and this partition lives on member 1. Now, Hazelcast guarantees that the backup of any partition is kept on a different member. So, in our example, the backup of partition 1 could be placed on member 2 or member 3 (but never on member 1). Backups are also propagated synchronously, so strict consistency is preserved.

Imagine that member 1 crashes. The Hazelcast cluster detects it, promotes the backup data, and creates a new backup. This way, you can always be sure that if any single Hazelcast member crashes, you’ll never lose any data. That is what I call “high availability by nature.”

We can increase the backup-count property and propagate the backup data synchronously to multiple members at the same time. However, performance would suffer. In the extreme case, we could set backup-count to the number of members minus one; then, even if all members except one crash, no data is lost. Such an approach, however, would not only be very slow (because we have to propagate all data to all members synchronously), but it would also consume a lot of memory. That is why it’s not very common to increase backup-count. For the simplicity of this post, let’s say that we’ll always keep its value as 1.
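For reference, here is a minimal sketch of how the backup counts can be set declaratively for maps in hazelcast.yaml; the map name default and the values shown are only illustrative.

YAML

hazelcast:
  map:
    default:
      backup-count: 1        # synchronous backups kept on other members
      async-backup-count: 0  # additional asynchronous backups, if any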

High Availability on Kubernetes

Let’s move the terminology from the previous section to Kubernetes. We can say we’re sure that if one Hazelcast pod fails, we don’t experience any data loss. So far, so good. It sounds like we are highly available, right? Well…yes and no. Let’s look at the diagram below.

Kubernetes cluster

Kubernetes may schedule two of your Hazelcast member pods to the same node, as presented in the diagram. Now, if node 1 crashes, we experience data loss, because both the data partition and the data partition backup are effectively stored on the same machine. How do we solve this problem?

Luckily, Kubernetes is quite flexible, so we can ask it to schedule each pod on a different node. This can be done using the inter-pod anti-affinity feature. In our case, if we’d like to be sure that each Hazelcast member is scheduled on a separate node, assuming our application name is “hazelcast,” we need to add the following configuration.

YAML

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - hazelcast
      topologyKey: kubernetes.io/hostname

Note that with such a configuration, you must know how many nodes are available and make sure not to run more Hazelcast members than there are Kubernetes nodes. Otherwise, Kubernetes won’t schedule some of your pods. We could also use preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution, but then we’d have no guarantee that no two Hazelcast members end up on the same Kubernetes node.
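For completeness, here is a sketch of the softer variant using preferredDuringSchedulingIgnoredDuringExecution; the weight value is arbitrary and only expresses a scheduling preference, not a guarantee.

YAML

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100  # higher weight means a stronger (but still best-effort) preference
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - hazelcast
        topologyKey: kubernetes.io/hostname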

All in all, with some additional effort, we achieved high availability on Kubernetes. Let’s see what happens next.

Multi-zone High Availability on Kubernetes

At this point, we’re sure that if any of the Kubernetes nodes fail, we don’t lose any data. However, what happens if the whole availability zone fails? Let’s look at the diagram below.

Multi-zone availability

Kubernetes clusters can be deployed in one or many availability zones. For production environments, we should usually avoid having just a single availability zone, because any zone failure would result in the downtime of our system. If you use Google Cloud Platform, you can start a multi-zone Kubernetes cluster with one click (or one command). On AWS, you can easily install it with kops, and Azure offers multi-zone Kubernetes as part of AKS (Azure Kubernetes Service). Now, when you look at the diagram above, what happens if availability zone 1 goes down? We experience data loss, because both the data partition and the data partition backup are effectively stored inside the same zone.

Luckily, Hazelcast offers the ZONE_AWARE functionality, which forces Hazelcast to store each data partition backup on a member located in a different availability zone. With the ZONE_AWARE feature enabled, we end up with the following diagram.

Hazelcast Zone-Aware

Let me stress it again. Hazelcast guarantees that the data partition backup is stored in a different availability zone. So, even if a whole Kubernetes availability zone goes down (and all related Hazelcast members are terminated), we won’t experience any data loss. That is what real high availability on Kubernetes looks like, and it is how you should always configure Hazelcast. How to do it? Let’s now look into the configuration details.
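Before diving into the Kubernetes specifics, note that the Hazelcast-side part of the setup is small: it boils down to the partition-group section of the configuration, sketched below (the same setting appears inside the Helm values in the next section).

YAML

hazelcast:
  partition-group:
    enabled: true
    group-type: ZONE_AWARE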

Hazelcast ZONE_AWARE Kubernetes Configuration

One of the requirements for the Hazelcast ZONE_AWARE feature is an equal number of members in each availability zone. Starting from Kubernetes 1.16, you can achieve this by defining Pod Topology Spread Constraints. However, this feature is still in alpha and, what’s more, cloud platforms don’t offer Kubernetes 1.16 yet. That is why I’ll present how to achieve it by running two separate Helm installations, each in a different availability zone. They’ll both point to the same Kubernetes service, and therefore, all pods will form one Hazelcast cluster.
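For reference, once Pod Topology Spread Constraints become available on your cluster, the same idea would look roughly like the pod-spec fragment sketched below; the maxSkew value and the label selector are only illustrative.

YAML

topologySpreadConstraints:
- maxSkew: 1                                          # zones may differ by at most one Hazelcast pod
  topologyKey: failure-domain.beta.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: hazelcast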

First, you need to create a service that will be used for Hazelcast cluster discovery.

Shell

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: hazelcast-service
spec:
  type: ClusterIP
  selector:
    role: hazelcast
  ports:
  - protocol: TCP
    port: 5701
EOF

Then, you can start two Helm installations. Assuming your availability zones are named us-central1-a and us-central1-b, the first command looks as follows. Note that the command is for Helm version 2 (if you use Helm 3, you need to add the --generate-name parameter).

Shell

$ helm install hazelcast/hazelcast -f - <<EOF
hazelcast:
  yaml:
    hazelcast:
      network:
        join:
          kubernetes:
            service-name: hazelcast-service
      partition-group:
        enabled: true
        group-type: ZONE_AWARE
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-central1-a
EOF

Let’s comment on the configuration we just used. It contains three interesting parts:

  • Using hazelcast-service for discovery – otherwise, each Helm installation would use its own service, and the members would form two separate Hazelcast clusters
  • Enabling ZONE_AWARE partition grouping
  • Setting nodeAffinity – this forces all Hazelcast pods in this installation to be scheduled on nodes in the availability zone us-central1-a

To run the second Helm installation, you should run the same command, this time with the zone us-central1-b (sketched below). Then, you should see that all your members have formed one Hazelcast cluster.
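A sketch of that second installation, assuming the same chart and values as above with only the zone value changed:

Shell

$ helm install hazelcast/hazelcast -f - <<EOF
hazelcast:
  yaml:
    hazelcast:
      network:
        join:
          kubernetes:
            service-name: hazelcast-service
      partition-group:
        enabled: true
        group-type: ZONE_AWARE
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-central1-b
EOF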

Shell

$ kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
dandy-ladybird-hazelcast-0             1/1     Running   0          117s
dandy-ladybird-hazelcast-1             1/1     Running   0          77s
dandy-ladybird-hazelcast-2             1/1     Running   0          40s
romping-gibbon-hazelcast-0             1/1     Running   0          107s
romping-gibbon-hazelcast-1             1/1     Running   0          74s
romping-gibbon-hazelcast-2             1/1     Running   0          39s

$ kubectl logs dandy-ladybird-hazelcast-0
...
Members {size:6, ver:6} [
        Member [10.32.1.8]:5701 - 55dc5090-38c7-48e4-821e-abe35e370a8e this
        Member [10.32.4.4]:5701 - 11ba4bce-20ad-4af0-afef-f7670e44195d
        Member [10.32.3.4]:5701 - 16ad5d0b-6b88-4d24-b46b-e3e200c54626
        Member [10.32.2.6]:5701 - 56c87c59-f764-411a-8faf-d8618af3ed7c
        Member [10.32.1.9]:5701 - 8d00bb07-39f2-4b6b-8dc8-fbb7ab13751e
        Member [10.32.5.7]:5701 - 1212341a-d033-42b3-96c9-9c69ded9db94
]
...

What we just deployed is a Hazelcast cluster resilient to Kubernetes zone failures. Needless to say, if you want your cluster deployed across more zones, you can run further Helm installations. Note, however, that this does not mean the data survives two zones failing at the same time. What Hazelcast guarantees is only that each data partition backup is stored on a member located in a different availability zone.

Hazelcast Cloud

Last but not least, Hazelcast multi-zone deployments will soon be available in the managed version of Hazelcast. You can check it out at cloud.hazelcast.com. By ticking multiple zones in the web console, you can enable multi-zone high availability for your Hazelcast deployment. It’s no great secret that while implementing Hazelcast Cloud, we internally used exactly the same strategy as described above.

Conclusion

In the cloud era, multi-zone high availability usually becomes a must. Zone failures happen and we’re no longer safe just by having our services on different machines. That is why any production-ready deployment of Kubernetes should be regional in scope, and not only zonal. The same applies to Hazelcast. Enabling ZONE_AWARE is highly recommended, especially because Hazelcast is often used as a stateful backbone for stateless services. If your Kubernetes cluster, however, is deployed only in one availability zone, then please at least make sure Hazelcast partition backups are always effectively placed on a different Kubernetes node. Otherwise, your system is not highly available at all and any machine failure may result in data loss.

Further Reading

Is Multi-Cloud a Solution for High Availability?

A Brief History of High Availability


Published at DZone with permission of Rafał Leszko. See the original article here.

