How to Monitor Istio, the Kubernetes Service Mesh

Learn how to deploy and monitor Istio, a platform used to interconnect microservices, over a Kubernetes cluster.

In this article, we are going to deploy and monitor Istio over a Kubernetes cluster. Istio is a microservice mesh platform that offers advanced routing, balancing, security and high availability features, plus Prometheus-style metrics for your services out of the box.

What Is Istio?

Istio is a platform used to interconnect microservices. It provides advanced network features like load balancing, service-to-service authentication, and monitoring, without requiring any changes to the service code.

In the Kubernetes context, Istio deploys an Envoy proxy as a sidecar container inside every pod that provides a service.

These proxies mediate every connection, and from that position, they route the incoming/outgoing traffic and enforce the different security and network policies.

This dynamic group of proxies is managed by the Istio "control plane", a separate set of pods that orchestrate the routing, security, live ruleset updates, etc.

Detailed descriptions of each subsystem component are available in the Istio project docs.

Service Mesh Explained: The Rise of the "Service Mesh"

Containers are incredibly light and fast, so it's no surprise that their density is roughly one order of magnitude greater than that of virtual machines. Classical monolithic component interconnection diagrams are rapidly turning into highly dynamic, fault-tolerant, N-to-N communications with their own internal security rules, label-based routes, DNS and service directories, etc.: the famous microservice mesh.

This means that while autonomous software units (containers) are becoming simpler and more numerous, interconnecting them and troubleshooting distributed software behavior are actually getting harder.

And of course, we don't want to burden containers with this complexity; we want them to stay thin and platform agnostic.

Kubernetes already offers a basic abstraction layer separating the service itself from the server pods. Several software projects are striving to tame this complexity, offering visibility, traceability, and other advanced pod networking features. We already covered how to monitor Linkerd; let's now talk about Istio.

Istio Features Overview

  • Intelligent routing and load balancing: policies to map static service interfaces to different backend versions, allowing for A/B testing, canary deployments, gradual migration, etc. Istio also allows you to define routing rules based on HTTP-layer metadata like session tokens or the user agent string.
  • Network resilience and health checks: timeouts, retry budgets, health checks, and circuit breakers, ensuring that unhealthy pods can be quickly weeded out of the service mesh.
  • Policy enforcement: peer TLS authentication, precondition checking (whitelists and similar ACLs), and quota management to avoid service abuse and/or consumer starvation.
  • Telemetry, traceability, and troubleshooting: telemetry is automatically injected in any service pod, providing Prometheus-style network and L7 protocol metrics. Istio also dynamically traces the flow and chained connections of your microservices mesh.

How to Deploy Istio in Kubernetes

Istio Deployment Overview

Istio developers have made deploying the platform in a new or existing Kubernetes cluster simple enough.

Just make sure that your Kubernetes version is 1.7.3 or newer, with RBAC enabled, and that you don't have any older version of Istio already installed on the system.

Follow the installation instructions:

curl -L https://git.io/getLatestIstio | sh -
cd istio-<version>
export PATH=$PWD/bin:$PATH

Now you need to decide whether or not you want mutual TLS authentication between pods. If you choose to enable TLS, your Istio services won't be allowed to talk to non-Istio entities.

We will use the non-TLS version this time:

kubectl apply -f install/kubernetes/istio.yaml

Istio system services and pods will be ready in a few minutes:

$ kubectl get svc -n istio-system
NAME            CLUSTER-IP      EXTERNAL-IP        PORT(S)                                                             AGE
istio-ingress   10.15.240.67    some-external-ip   80:32633/TCP,443:31389/TCP                                          1d
istio-mixer     10.15.249.77    <none>             9091/TCP,15004/TCP,9093/TCP,9094/TCP,9102/TCP,9125/UDP,42422/TCP    1d
istio-pilot     10.15.243.227   <none>             15003/TCP,443/TCP                                                   1d

$ kubectl get pod -n istio-system
NAME                             READY     STATUS    RESTARTS   AGE
istio-ca-1363003450-vp7pt        1/1       Running   0          1d
istio-ingress-1005666339-w1gcs   1/1       Running   0          1d
istio-mixer-465004155-nncrd      3/3       Running   0          1d
istio-pilot-1861292947-zlt8w     2/2       Running   0          1d

Injecting the Istio "Envoy" Proxy in Your Existing Kubernetes Pods

As described in the architecture overview, every service pod needs to be bundled with the Envoy container. Your Kubernetes cluster can be instructed to do this automatically if the alpha cluster features are enabled and you deploy the Istio-initializer:

kubectl apply -f install/kubernetes/istio-initializer.yaml

Alternatively, you can rewrite your existing YAML definitions on the fly using istioctl:

kubectl create -f <(istioctl kube-inject -f <your-app-spec>.yaml)

Let's try it with a simple single-container deployment and service:

$ kubectl apply -f <(istioctl kube-inject -f flask.yaml)
$ kubectl logs flask-1027288086-lr4fm
a container name must be specified for pod flask-1027288086-lr4fm, choose one of: [flask istio-proxy]

OK, the error message lists two containers in the pod, flask and istio-proxy, so it looks like our pods and services have been correctly instrumented.

How to Monitor Istio Using Prometheus

One of the major benefits of tunneling your service traffic through the Istio Envoy proxies is that you automatically collect fine-grained metrics that also provide high-level application information, since they are reported for every service proxy.

These individual metrics are then forwarded to the Mixer component, which aggregates them for the entire mesh.

Mixer provides three Prometheus endpoints:

  1. istio-mesh (istio-mixer.istio-system:42422): all Mixer-generated mesh metrics.
  2. mixer (istio-mixer.istio-system:9093): all Mixer-specific metrics. Used to monitor Mixer itself.
  3. envoy (istio-mixer.istio-system:9102): raw stats generated by Envoy (and translated from statsd to Prometheus).
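
To give a rough idea of what a scraper reads from endpoints like these, here is a minimal sketch that parses a few lines of Prometheus text exposition format and sums a counter per label value. The sample payload and the label names (destination_service, response_code) are assumptions made for illustration, not output captured from a real Mixer.

```python
import re
from collections import defaultdict

# Hypothetical sample payload; a real scrape would be much longer
# and its label names may differ.
SAMPLE = '''\
# HELP istio_request_count request_count
# TYPE istio_request_count counter
istio_request_count{destination_service="reviews",response_code="200"} 120
istio_request_count{destination_service="reviews",response_code="500"} 3
istio_request_count{destination_service="ratings",response_code="200"} 45
'''

# metric_name{labels} value [timestamp]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        yield match.group('name'), labels, float(match.group('value'))


def sum_by_label(text, metric, label):
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


if __name__ == "__main__":
    print(sum_by_label(SAMPLE, "istio_request_count", "destination_service"))
```

In practice, Prometheus or the Sysdig agent does this parsing for you; the sketch only shows the shape of the data that flows out of the three ports.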

The Istio project also provides examples and documentation on configuring a Prometheus server to scrape and analyze the most relevant metrics.

kubectl apply -f install/kubernetes/addons/prometheus.yaml

Wait until the pod is ready, and forward the Prometheus server port to your local machine:

kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090 &

You can now access the Prometheus server UI by opening http://localhost:9090/ in your web browser.

There is also a Grafana deployment, already preconfigured and ready to test, in the Istio repository:

$ kubectl create -f install/kubernetes/addons/grafana.yaml

Again, wait for the pod and service to be up and running, and forward the Grafana service port:

kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000 &

You can access a pre-populated dashboard at http://localhost:3000/dashboard/db/istio-dashboard.

How to Monitor Istio With Sysdig Scraping Prometheus Metrics

It is very convenient that the Istio core services use the Prometheus metric format because, as you know, Sysdig will automatically detect and scrape Prometheus endpoints.

Let's edit the Sysdig agent configuration file (dragent.yaml) to configure which pods and ports should be scraped:

prometheus:
  enabled: true
  ...
    - include:
        kubernetes.pod.annotation.prometheus.io/scrape: true
        conf:
          port_filter:
            - include: 42422
            - include: 9102
            - include: 9093

Make sure that Prometheus is enabled, and then write an include filter. For this example, we use Kubernetes annotations; this way, we can easily keep adding hosts without changing the agent configuration again.

Let's annotate the Mixer pod (your specific serial number will vary):

kubectl annotate pod istio-mixer-465004155-nncrd -n istio-system prometheus.io/scrape=true

Logging into the Sysdig Monitor web console, we check that the new metrics are indeed flowing to our cloud platform (metricCount.prometheus).

We are now scraping the istio-mixer Prometheus endpoints. Time to monitor Istio!

Monitoring Istio: Reference Metrics and Dashboards

Let's start from the beginning: monitoring our services and application behavior.

Segmented by service and service version, these are a few of the usual metrics that you will want to monitor (and build dashboards for):

  • Number of requests: istio_request_count
  • Request duration: istio_request_duration.avg, istio_request_duration.count
  • Request size: http_request_size_bytes.count
  • The 90th and 99th percentiles of these metrics, to delimit your worst-case scenario:
    • http_request_size_bytes.90percentile
    • http_request_duration_microseconds.90percentile
    • http_request_size_bytes.99percentile
    • http_request_duration_microseconds.99percentile
  • HTTP error codes: response_code
  • Bandwidth and disk I/O consumption in your serving pods: net.bytes.total, file.iops.total
  • Top accessed URLs: net.http.url, net.http.method
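
To see why the percentiles matter more than the average when delimiting the worst case, here is a small sketch using the nearest-rank definition of a percentile. The duration samples are made up, and the monitoring backend may well use a different estimation method, so treat this as an illustration of the idea only.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request durations in milliseconds: mostly fast, two slow outliers.
durations_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]

if __name__ == "__main__":
    mean = sum(durations_ms) / len(durations_ms)
    print("mean:", mean)
    print("median:", percentile(durations_ms, 50))
    print("p90:", percentile(durations_ms, 90))
    print("p99:", percentile(durations_ms, 99))
```

With these samples, the median stays close to the typical request while the 90th and 99th percentiles expose the slow outliers that an average would blur.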

Using the Sysdig dashboard wizard, you can quickly assemble your custom service overview with the most important metrics.

Or just use the brand new Istio default dashboards.

Istio System Overview

Istio Service

If you are familiar with Sysdig or have read other articles related to Kubernetes and multiple HTTP services, you will soon realize that you already had similar HTTP/network metrics out of the box.

So, apart from the essentials, let's highlight some of the additional features that Istio brings to the table in terms of monitoring.

HTTP request and response sizes (in bytes) are measured for each HTTP connection individually, not just as aggregate network bandwidth.

There are specific metrics to monitor gRPC connections, segmented by method, code, service, type, etc. Istio is able to route HTTP/2 and gRPC traffic through its proxies.

Thanks to Istio connection traceability, you can also monitor the metrics mentioned above (request count, duration, etc.) not only from the destination but also from the source internal service (or a version thereof).

Monitoring Istio Internals

Apart from monitoring the services, you can use Istio and Sysdig aggregated metrics to monitor the health and performance of Istio's internal services.

Istio provides its own Ingress controller, which is a very relevant piece of our infrastructure to monitor. When your users are experiencing performance problems or errors, the edge router is one of the first points to check.

To assess the global health of your edge router connections, you can display its connections table, global HTTP response codes, resource usage, number of requests per service or URL, etc.

Istio's Mixer forwards information to several adapters. You can use the mixer_adapter_dispatch_count metric, segmented by adapter, to monitor these connections.

Mixer will also be contacted by the services to retrieve authorization and precondition info. You can monitor these connections (and their result codes) as well.

You can use the "adapter" metrics to monitor Istio's Mixer communication with the adapters (Prometheus, Kubernetes):

  • mixer_adapter_dispatch_count
  • mixer_adapter_dispatch_duration.avg
  • mixer_adapter_dispatch_duration.count

And the "resolve" metrics to monitor Istio's Mixer communication with the services:

  • mixer_config_resolve_actions.avg
  • mixer_config_resolve_actions.count
  • mixer_config_resolve_count
  • mixer_config_resolve_duration.avg
  • mixer_config_resolve_duration.count
  • mixer_config_resolve_rules.avg
  • mixer_config_resolve_rules.count
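
Count metrics such as these are usually more readable as a rate than as a raw total. As a minimal sketch, assuming the counts behave like monotonically increasing counters that restart from zero when the process restarts, the per-second rate between two scrapes could be derived like this (the function name and sample values are hypothetical):

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second increase between two scrapes of an increasing counter.

    Timestamps are in seconds. If the counter went backwards, we assume
    the process restarted and the counter began again from zero.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # counter reset: count only what accumulated since
    return delta / elapsed


if __name__ == "__main__":
    # 60 dispatches in 30 seconds
    print(counter_rate(100, 0, 160, 30))
    # counter reset between scrapes: 30 dispatches in 10 seconds
    print(counter_rate(500, 0, 30, 10))
```

Monitoring backends normally perform this calculation for you; the sketch is only meant to show what a sudden drop or spike in the dispatch rate actually measures.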

Monitoring Istio A/B Deployments and Canary Deployments

One of Istio's major features is the ability to establish intelligent routing based on service version.

The pods that provide the backend for a certain service will have different Kubernetes labels:

Labels: app=reviews
        pod-template-hash=3187719182
        version=v3

These different backends are transparent to the consumer (service or final user) but Istio can take advantage of this information to perform:

  • Content-based routing: For example, if the user-agent is a mobile phone, you can change the specific service that formats the final HTML template
  • A/B deployments: two similar versions of the service that you want to compare in production
  • Canary deployment: Experimental service version that will only be triggered by certain conditions (like some specific test users)
  • Traffic Shifting: Progressive migration to the new service version maintaining the old version fully functional
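
To make the difference between these strategies more concrete, here is a toy sketch of a routing decision that applies a content-based rule first and falls back to weighted traffic shifting. It is not Istio's rule syntax or implementation; the x-user header, the version names, and the route function are all assumptions made for the example.

```python
import random


def route(headers, weights, canary_users=frozenset(), canary_version="v3", rng=random):
    """Return the version label that should serve this request.

    headers:      dict of request headers (hypothetical "x-user" header)
    weights:      dict mapping version label -> relative weight
    canary_users: users that must always reach the canary version
    """
    # Canary rule: selected test users always reach the experimental version.
    if headers.get("x-user") in canary_users:
        return canary_version
    # Traffic shifting: pick a version with probability proportional to its weight.
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same split
    counts = {"v1": 0, "v2": 0}
    for _ in range(10000):
        counts[route({}, {"v1": 50, "v2": 50}, rng=rng)] += 1
    print(counts)
    print(route({"x-user": "tester"}, {"v1": 100}, canary_users={"tester"}))
```

Changing the weights from 50-50 to, say, 90-10 and then 0-100 is the progressive migration described in the last bullet, while the consumer keeps calling the same service name.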

By aggregating Istio and Sysdig metrics, you can supervise these service migrations with all the information you need to make further decisions.

For example, suppose we are comparing the alpha and beta service pods, which provide the same Kubernetes service. Using Istio traffic shifting, we decide to split ingress traffic 50-50.

As you can see, the number of requests and the duration of requests (two top graphs) are extremely similar, so we can assume it's a fair comparison in terms of load.

If you look at the two bottom graphs, it turns out that service alpha is suffering almost 3 times the number of HTTP errors, and its worst-case response time (99th percentile, bottom-right graph) is also significantly higher than that of service beta. Looks like our developers did a great job with the new version :).
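
The same comparison can be expressed as a simple calculation. The sketch below computes the HTTP error rate of each version from per-response-code request counts; the numbers are invented to mirror the roughly 3x difference described above and are not taken from the graphs.

```python
def error_rate(response_counts):
    """Fraction of requests that ended with an HTTP 4xx or 5xx status code.

    response_counts maps a status code (str or int) to a request count.
    """
    total = sum(response_counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in response_counts.items() if int(code) >= 400)
    return errors / total


# Hypothetical counts for two versions receiving the same share of traffic.
alpha = {"200": 940, "500": 60}
beta = {"200": 980, "500": 20}

if __name__ == "__main__":
    print("alpha error rate:", error_rate(alpha))
    print("beta error rate:", error_rate(beta))
    print("alpha/beta ratio:", error_rate(alpha) / error_rate(beta))
```

Comparing rates instead of raw error counts keeps the comparison fair even if the traffic split later moves away from 50-50.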

Conclusions

Istio solves the "mesh tangle" by adding a transparent proxy as a sidecar to your service-provider pods. From this vantage point, it can collect fine-grained metrics and dynamically modify the routing flow without interfering with the pod software at all.

This strategy nicely complements Sysdig's analogous non-intrusive, minimal-instrumentation approach to keeping your service pods simple and infrastructure agnostic (as they should be®).

The service mesh will be a hot topic for Kubernetes in 2018, and the jury is still out. We will keep an eye on the ecosystem to compare the family of service mesh infrastructures that are growing around the container stack.

Topics:
microservices ,monitoring ,service mesh ,istio ,kubernetes ,tutorial

Published at DZone with permission of
