
Successful GKE Node Pool Updates With Cloud Deployment Manager

Node pool updates are one of the few difficulties of Cloud Deployment Manager. Fortunately, we've got you covered.

Cloud Deployment Manager is actually a great tool, and we love using it.

But as with any other technology, some tasks are tricky, and node pool updates are one of them. Basic operations like upgrading the node version or changing the node count are straightforward, but changing the OAuth scopes or the machine type does not work out of the box, because it requires creating a new node pool.

Let’s see how to update a Google Kubernetes Engine (GKE) cluster with a new machine type.

Initial Setup

First, we create a GKE cluster with 2 nodes and machine type n1-standard-2. We use the following Deployment Manager configuration:

resources:
- name: np-playground
  type: container.v1.cluster
  properties:
    zone: "europe-west3-a"
    cluster:
      initialClusterVersion: "1.10.9-gke.5"
      ## Can be used to update the master version, even though the official documentation states that this field is read-only.
      ## ref: https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.zones.clusters
      currentMasterVersion: "1.10.9-gke.5"
      ## Initial NodePool config, change only for node count or node version changes.
      nodePools:
      - name: "np-playground-np"
        initialNodeCount: 2
        version: "1.10.9-gke.5"
        config:
          machineType: "n1-standard-2"
          oauthScopes:
            - https://www.googleapis.com/auth/logging.write
            - https://www.googleapis.com/auth/monitoring
            - https://www.googleapis.com/auth/ndev.clouddns.readwrite
          preemptible: true

## Duplicates the node pool config from the v1.cluster section so that it is explicitly managed.
- name: np-playground-np
  type: container.v1.nodePool
  properties:
    zone: europe-west3-a
    ## This is very important, as it actually controls the creation order by adding an implicit dependsOn constraint.
    ## ref: https://cloud.google.com/deployment-manager/docs/configuration/use-references
    ## ref: https://cloud.google.com/deployment-manager/docs/configuration/create-explicit-dependencies
    clusterId: $(ref.np-playground.name)
    nodePool:
      name: "np-playground-np"

The configuration file has two special characteristics. First, the standalone container.v1.nodePool resource works because of the default policy for adding resources, CREATE_OR_ACQUIRE: if Deployment Manager finds an existing resource that matches on name, type, and zone or region, the resource is acquired instead of created.

The second one is the use of a reference in the clusterId property. Since a node pool cannot be created without an existing cluster, the Deployment Manager command would otherwise fail, so we have to ensure that the node pool is created (actually acquired) only after the cluster becomes available.

Without references, Deployment Manager creates all resources in parallel, so there is no guarantee that dependent resources are created in the correct order.
Using references would enforce the order in which resources are created. (source: GCP documentation)
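
On later deployment updates, the acquire behavior can also be made explicit on the command line. A minimal sketch, assuming a gcloud version that exposes the policy flags (the exact values are listed by gcloud deployment-manager deployments update --help):

## create-or-acquire is already the default create policy; passing it only makes the intent explicit.
gcloud deployment-manager deployments update upgrade-test \
  --config dm.yaml \
  --create-policy create-or-acquire

With that in mind, we create the deployment: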
$ gcloud deployment-manager deployments create upgrade-test --config dm.yaml

NAME              TYPE                   STATE      ERRORS  INTENT
np-playground     container.v1.cluster   COMPLETED  []
np-playground-np  container.v1.nodePool  COMPLETED  []
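
To double-check that the standalone node pool resource was acquired rather than re-created, you can list the cluster's node pools (an optional verification step using a standard gcloud command):

## Should list exactly one node pool: np-playground-np
gcloud container node-pools list --cluster np-playground --zone europe-west3-a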

Adding a New Node Pool

At this point, we are going to create a new node pool with a different machine type, n1-highmem-2.

The following resource must be appended to the configuration from the previous section:

## New NodePool with desired config
- name: np-playground-np-highmem
  type: container.v1.nodePool
  properties:
    zone: europe-west3-a
    clusterId: $(ref.np-playground.name)
    nodePool:
      name: "np-playground-np-highmem"
      initialNodeCount: 2
      version: "1.10.9-gke.5"
      config:
        ## different machine type
        machineType: "n1-highmem-2"
        ## Scopes can be changed as well
        oauthScopes:
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          - https://www.googleapis.com/auth/ndev.clouddns.readwrite
        preemptible: true


Applying these changes results in the following:

$ gcloud deployment-manager deployments update upgrade-test --config dm.yaml

NAME                      TYPE                   STATE      INTENT
np-playground             container.v1.cluster   COMPLETED
np-playground-np          container.v1.nodePool  COMPLETED
np-playground-np-highmem  container.v1.nodePool  COMPLETED
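
Before migrating any workloads, it is worth confirming that the new pool really uses the desired machine type. An optional check with a standard gcloud command:

## The config.machineType field should show n1-highmem-2
gcloud container node-pools describe np-playground-np-highmem \
  --cluster np-playground --zone europe-west3-a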

Migrate the Workloads

After creating the new node pool, workloads are still running on the old one.
Kubernetes does not reschedule Pods as long as they are running and available. (Source: GCP documentation)

To migrate these Pods to the new node pool:

  1. Cordon the existing node pool: This operation marks the nodes in the old node pool as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.

  2. Drain the existing node pool: This operation evicts the workloads running on the nodes of the old node pool gracefully.

Cordon Old Node Pool

Connect to the K8s cluster:

gcloud container clusters get-credentials np-playground --zone europe-west3-a

Select all nodes from the old node pool:

kubectl get nodes -l cloud.google.com/gke-nodepool=np-playground-np

Cordon all nodes from the old node pool:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=np-playground-np -o=name); do
  kubectl cordon "$node";
done
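
The cordoned nodes should now report the status Ready,SchedulingDisabled, which can be verified with the same label selector:

## Cordoned nodes stay Ready but are marked SchedulingDisabled
kubectl get nodes -l cloud.google.com/gke-nodepool=np-playground-np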

Drain Old Node Pool

The following command iterates over each node in the old node pool and drains it by evicting Pods with a graceful termination period of 10 seconds:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=np-playground-np -o=name); do
  kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node";
done
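
Once the drain has finished, the workloads should be running on the new node pool. A quick way to verify this (the exact column layout may differ between kubectl versions):

## The NODE column should only list nodes from np-playground-np-highmem
kubectl get pods --all-namespaces -o wide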

Delete Old Node Pool

Finally, we only have to delete the old node pool. This can be achieved by deleting or commenting out the old node pool configuration.

Find the full YAML below:

resources:
- name: np-playground
  type: container.v1.cluster
  properties:
    zone: "europe-west3-a"
    cluster:
      initialClusterVersion: "1.10.9-gke.5"
      currentMasterVersion: "1.10.9-gke.5"
      ## Initial NodePool config, change only for node count or node version changes.
      nodePools:
      - name: "np-playground-np"
        initialNodeCount: 2
        version: "1.10.9-gke.5"
        config:
          machineType: "n1-standard-2"
          oauthScopes:
            - https://www.googleapis.com/auth/logging.write
            - https://www.googleapis.com/auth/monitoring
            - https://www.googleapis.com/auth/ndev.clouddns.readwrite
          preemptible: true

## Duplicates the node pool config from the v1.cluster section so that it is explicitly managed.
#- name: np-playground-np
#  type: container.v1.nodePool
#  properties:
#    zone: europe-west3-a
#    clusterId: $(ref.np-playground.name)
#    nodePool:
#      name: "np-playground-np"

## New NodePool with desired config
- name: np-playground-np-highmem
  type: container.v1.nodePool
  properties:
    zone: europe-west3-a
    clusterId: $(ref.np-playground.name)
    nodePool:
      name: "np-playground-np-highmem"
      initialNodeCount: 2
      version: "1.10.9-gke.5"
      config:
        machineType: "n1-highmem-2"
        oauthScopes:
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          - https://www.googleapis.com/auth/ndev.clouddns.readwrite
        preemptible: true

And apply:

$ gcloud deployment-manager deployments update upgrade-test --config dm.yaml

NAME                      TYPE                   STATE      INTENT
np-playground             container.v1.cluster   COMPLETED
np-playground-np-highmem  container.v1.nodePool  COMPLETED
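
The old node pool is actually removed because Deployment Manager's default delete policy for resources dropped from the configuration is DELETE (this can be changed with --delete-policy; see gcloud deployment-manager deployments update --help). To confirm that only the new pool remains:

## Only np-playground-np-highmem should remain
gcloud container node-pools list --cluster np-playground --zone europe-west3-a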

Conclusion

As you can see, performing node pool upgrades with Deployment Manager is actually not a big deal, even though we hope this will become a built-in feature in a future Deployment Manager version (as node version upgrades already are).

But this procedure has two main drawbacks. First, the update cannot be done in a single Deployment Manager run and requires manual steps, which prevents a fully automated cluster upgrade when the machine type changes.

The second issue is that if the configuration file from the last step (with the old node pool removed) is used to create a new cluster (e.g. re-creation for disaster recovery), a cluster with two node pools is created: the one configured in the nodePools section beneath cluster and the new node pool with highmem machines.
