Containers

Containers allow applications to run more quickly across many different development environments, and a single container encapsulates everything needed to run an application. Container technologies have exploded in popularity in recent years, leading to diverse use cases as well as new and unexpected challenges. This Zone offers insights into how teams can solve these challenges through its coverage of container performance, Kubernetes, testing, container orchestration, the use of containers to build and deploy microservices, and more.

Latest Refcards and Trend Reports
Trend Report: Kubernetes in the Enterprise
Refcard #233: Getting Started With Kubernetes
Refcard #344: Kubernetes Multi-Cluster Management and Governance

DZone's Featured Containers Resources

Dockerizing an Ansible Playbook, Part 1

By Gitanjali Sahoo
This is a two-part article series. In Part 1, we will go over the concept of Dockerization and then Dockerize an Ansible playbook to run in a Docker container. In Part 2, we will create a GitLab pipeline to register the Docker image of the Ansible playbook to a Docker registry and deploy it to Kubernetes.

Why Dockerize?

With the continuously increasing complexity of the business problem domain, building a solution requires a whole lot of dependent libraries and tools with different versions, along with managing their interdependencies and cross-dependencies. The most challenging part is porting an application to a hosting environment where some of the dependent libraries and packages, with specific versions, are already installed as part of other applications hosted on the system. The developer then has to deal with conflicting versions while building and running their application without impacting any existing ones. By Dockerizing, you can package your application with everything it needs, ship it as one package, and run it in an isolated environment called a container. The good part is that you do it once and run it anywhere consistently. Some of the other key benefits are:

- ROI: Docker dramatically reduces infrastructure costs because it is lightweight and needs fewer resources to run the same application. It allows multiple applications to run in their own containers while sharing the same underlying OS layer.
- Developer experience: Deployment is faster because the developer bundles only the libraries and tools the application needs, in a declarative manner. No time is spent setting up infrastructure, and once done, the setup can be reused again and again. Debugging is easier because a developer can reproduce the environment anywhere with the exact same set of dependencies, which makes promoting to a higher environment, such as UAT or PROD, easier and more reliable.
- Better security: Less access is needed to work with the code running inside a container.

The key to Dockerization is the Docker Engine, which interacts with the underlying OS. At the bottom sits the hardware infrastructure, with the OS on top of it and Docker installed on the OS. Docker manages the libraries and dependencies needed to run the application. Docker Engine is available on a variety of Linux platforms. It can also run on Windows 10 through Docker Desktop, which creates a Linux virtual machine on top of the Windows OS.

What Is an Ansible Playbook?

Ansible is an automation tool that automates provisioning, configuration management, application deployment, and orchestration of processes. Playbooks are YAML files that define configurations, deployment, and orchestration in Ansible and allow Ansible to perform operations on managed nodes.

Dockerize an Ansible Playbook and Run It in a Docker Container

In this tutorial, we will create a playbook that does the following:

- It logs in to localhost and prints OS info. Because the playbook is Dockerized, the Docker container itself is localhost.
- It connects to a remote host through passwordless SSH login and prints OS info.

Create an Ansible Playbook

For this tutorial, we will use Windows PowerShell to run all our commands. Create a working directory named playbooks and cd to that directory:

mkdir playbooks
cd playbooks

Create a file named hello.yml and copy the below content to it.
- name: This is sample hello program
  hosts: localhost
  gather_facts: no
  tasks:
    - name: print os version of the container
      shell: "uname -a"
      register: osverlocal
    - name: display os version
      debug:
        msg: "Hello localhost - {{ osverlocal.stdout }}"

- name: This is sample hello program on ansible server
  hosts: common1
  remote_user: user1
  gather_facts: no
  tasks:
    - name: print os version of the remote host
      shell: "hostname"
      register: osverremote
    - name: display os version
      debug:
        msg: "Hello common1 - {{ osverremote.stdout }}"

In this code snippet, we have two sections, one for each host machine: localhost and the remote host defined by the common1 group. The group will be defined in another file, called the Ansible inventory file, in the next step. The Ansible tasks are executed on localhost by the user logged in to the container, and on the remote host by user1. We will cover this later in the article.

Create an Inventory File

Create a file named inventory.txt and add the following entry there. Please ensure you replace <fully qualified hostname> with the remote host machine's FQDN:

[common1]
<fully qualified hostname>

Generate a Key Pair and Copy It to the Target Machine

For passwordless SSH login to a remote host, we need a private-public key pair. In Windows PowerShell, run the following command and follow the prompts to generate a key pair:

ssh-keygen

This will generate id_rsa and id_rsa.pub under the <user home directory>/.ssh folder. The id_rsa file contains the private key, and id_rsa.pub contains the public key. Create a .ssh folder under /home/user1 on the target machine if it does not exist. In this tutorial, I am using "user1" to connect to the remote host. You may use a different user (which must exist on the target machine). Copy id_rsa.pub to the /home/user1/.ssh folder on the target machine, then append the content of the id_rsa.pub file to the authorized_keys file located under /home/user1/.ssh on the target machine.

Install Docker on Your Local Workstation

Prerequisites

Docker Desktop for Windows requires WSL2 (Windows Subsystem for Linux 2) to be enabled. NOTE: This step will require a Windows reboot. To enable WSL2, start Windows PowerShell as an Administrator. You can do so by typing Windows PowerShell in the Cortana search bar in the taskbar and selecting the Run as Administrator option in the right-hand pane. This will bring up the Windows PowerShell terminal window. At the Windows PowerShell prompt, type the following command:

wsl --install

This single command does all that is required to enable WSL and install Ubuntu Linux on your Windows laptop. Once the installation is done, you will be prompted to reboot. Upon reboot, the Ubuntu installation/configuration will continue. You will be prompted to set up an account: type in a username of your choice (your name, for example), then enter and re-enter a password. Please keep these two handy. This completes the WSL setup.

Install Docker Desktop for Windows

NOTE: Installing Docker Desktop for Windows will require you to log off after the installation, so please save your work before you begin. Here are the steps: download Docker Desktop Installer.exe from the Docker website; it will land in your Downloads directory. Double-click the file to begin the installation and accept the defaults in the dialog box that comes up next. The installation will then begin.
Once the installation is completed, you will be prompted to sign out/log off. Once you log back in, Docker Desktop for Windows is ready to use.

Build and Run an Ansible Playbook in the Docker Container

Create a Dockerfile

Create a file named Dockerfile and paste the below content:

FROM python:3.10.8-slim-buster
ARG USERNAME="ansibleuser"
RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
    apt-get install -y sshpass git openssh-client openssh-server sudo && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
RUN echo "$USERNAME ALL=(ALL:ALL) NOPASSWD: ALL" >> /etc/sudoers
RUN useradd -ms /bin/bash $USERNAME
RUN mkdir /ansible
COPY ./requirements.txt /ansible/
COPY ./hello.yml /ansible/
COPY ./inventory.txt /ansible/
COPY ./id_rsa.pub /home/$USERNAME/.ssh/id_rsa.pub
RUN sudo /etc/init.d/ssh restart
RUN chown -R $USERNAME:$USERNAME /ansible
RUN chown $USERNAME:$USERNAME -R /home/$USERNAME/.ssh && \
    chmod 700 -R /home/$USERNAME/.ssh && \
    chmod 640 /home/$USERNAME/.ssh/id_rsa.pub
RUN ssh-keyscan <HOST-NAME> >> /home/$USERNAME/.ssh/known_hosts
RUN pip install -r /ansible/requirements.txt
USER $USERNAME
WORKDIR /ansible
ENTRYPOINT ["tail", "-f", "/ansible/hello.yml"]

In this Dockerfile, we list the tools and libraries that need to be installed to build and run our Ansible playbook. We create a user named "ansibleuser" in the container; this user runs the Ansible tasks on localhost (the container itself). The username defined in remote_user is used to log in to the remote host machine. In this tutorial that is user1; please make sure to use the same user whose authorized_keys file you updated with the public key in the previous step. Please replace <HOST-NAME> in the ssh-keyscan instruction with the FQDN of the remote host. This allows the client (in this case, the Docker container) to trust the remote host machine.

Create the requirements.txt File

This file lists the libraries that are installed by the pip command in the Dockerfile. They are needed to run the Ansible runner:

ansible==6.6.0
ansible_runner==2.3.0

Build a Docker Image

Start the Docker Desktop application that you installed. This starts a daemon process named dockerd (the Docker daemon), which listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes. Now open a Windows PowerShell prompt, cd to the playbooks folder (the folder where you created all the above files), and run the following command:

docker build . -t hwa

This builds an image named hwa. Verify that the image was created by running the command below; the output should list the hwa image:

docker image ls

Create a Docker Container

Run the below command to create a container from the hwa image. It will print the new container's ID; please note this ID somewhere to refer to later.

docker run -d hwa

Verify that the container is running:

docker container ls

Execute the Ansible Playbook in the Docker Container

Log in to the container by running the below command. Please make sure to replace <container-id> with your container ID.
docker container exec -it <container-id> bash

This logs you in to the Docker container and gives you a bash shell to interact with. If you run the ls command, you will find all your Ansible-related files in the /ansible working directory. Now run the Ansible playbook:

ansible-playbook hello.yml -i inventory.txt

This prints the OS version of the Docker container and, via passwordless SSH, the OS version of the remote host. Congratulations! You successfully Dockerized your Ansible playbook and executed it in a container. In Part 2, we will push this image to a Docker registry and then deploy it to Kubernetes.
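If you prefer not to keep an interactive shell open, the same playbook run can be triggered in one shot from the host. This is a minimal sketch, assuming the image built above, the container ID from the previous step, and that pip placed ansible-playbook on the container's PATH:

docker container exec <container-id> ansible-playbook /ansible/hello.yml -i /ansible/inventory.txt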
Visual Network Mapping Your K8s Clusters To Assess Performance

By Anton Lawrence
Building performant services and systems is at the core of every business. Tons of technologies emerge daily, promising capabilities that help you surpass your performance benchmarks. However, production environments are chaotic landscapes that exact a heavy performance toll when not maintained and monitored. Although Kubernetes is the de facto choice for container orchestration, many organizations struggle to implement it well. Growing organizations, in the process of upscaling their services, unintentionally introduce complexities into the system. Knowing how the infrastructure is set up and how clusters operate and communicate is crucial. Most infrastructure is split into a network of systems that communicate and share workloads, so being able to see how those systems are connected, and the factors underlying them, matters. Mapping the network using an efficient tool for visualization and assessment is essential for monitoring and maintaining services.

Introduction To Visual Network Mapping

Network mapping is the process of identifying and cataloging all the devices and connections within a network. A visual network map is a graphical representation of the network that displays the devices and the links between them. Visual network maps can provide a comprehensive understanding of a network's topology and identify potential problems or bottlenecks, allowing for modifications and expansion plans that can significantly improve troubleshooting, planning, analysis, and monitoring. Network scanning and security tools such as OpenVAS, Nmap, and Nessus can be used to conduct network mapping and generate visual network maps. Such tools are readily available, making them a cost-effective option for organizations looking to improve their network security, and many open-source tools also offer active community support, enabling users to share knowledge, tips, and best practices for using the tool to its full potential.

Benefits of Using Visual Network Maps

A visual network map is an effective tool for planning and developing new networks, expanding or modernizing existing networks, and analyzing network problems or issues. A proper setup of visual network maps can exponentially augment monitoring, tracking, and remediation capabilities. It can give you a clear and complete picture of the network, enabling you to pinpoint an issue's likely source and resolve it on the spot, or it can assist with real-time network monitoring and notify you of changes or problems before they escalate.

Introduction to Caretta and Grafana

Caretta is an open-source network visualization and monitoring tool that enables real-time network viewing and monitoring. Grafana is an open-source data visualization and monitoring platform that lets you create customized dashboards and alerts as well as examine and analyze data. Combining Caretta and Grafana creates an effective solution for understanding and managing your network.

How Caretta Uses eBPF and Grafana

Caretta's reason for existence is to help you understand the topology and the relationships between devices in distributed environments. It offers capabilities such as device discovery, real-time monitoring, alerts, notifications, and reporting. It uses Victoria Metrics to gather and publish its metrics, and any Prometheus-compatible dashboard can use the results. Caretta can also accept typical control-plane node taints by enabling tolerations.
Caretta gathers network information, such as device and connection details, using eBPF (extended Berkeley Packet Filter) functionality in the kernel, and then uses the Grafana platform to present the information as a visual map.

Grafana's Role in Visualizing Caretta's Network Maps

Grafana is designed to be a modular and flexible tool that integrates and onboards a wide range of data sources and custom applications with simplicity. Because it is so customizable, you can modify how the network map is presented using the Grafana dashboard, and you can pick from several visualization options to present the gathered data in an understandable and helpful way. Grafana is crucial both for showing the network data that Caretta has gathered and for giving users a complete picture of the network.

Using Caretta and Grafana To Create a Visual Network Map

To use Caretta and Grafana for creating a visual network map, you must install, integrate, and configure them. The main configuration item is the Caretta DaemonSet. Deploy the Caretta DaemonSet to the cluster of your choice so that it collects network metrics into a database, then point a Grafana data source at that database to see the network map.

Prerequisites and Requirements for Using Caretta and Grafana

Caretta is a modern tool with advanced features. It requires a Linux kernel version >= 4.16, an x64 system, and Helm to install its chart. Let's dive in and see how to install and configure this tool combination.

Steps for Installing and Configuring Caretta and Grafana

With an already pre-configured Helm chart, installing Caretta is just a few calls away. The recommendation is to install Caretta in a new, unique namespace:

helm repo add groundcover https://helm.groundcover.com/
helm repo update
helm install caretta --namespace caretta --create-namespace groundcover/caretta

The same approach applies to installing Grafana:

helm install --name my-grafana --set "adminPassword=secret" \
  --namespace monitoring -f custom-values.yaml stable/grafana

Our custom-values.yaml will look something like the below:

## Grafana configuration
grafana.ini:
  ## server
  server:
    protocol: http
    http_addr: 0.0.0.0
    http_port: 3000
    domain: grafana.local
  ## security
  security:
    admin_user: admin
    admin_password: password
    login_remember_days: 1
    cookie_username: grafana_admin
    cookie_remember_name: grafana_admin
    secret_key: hidden
  ## database
  database:
    type: rds
    host: mydb.us-west-2.rds.amazonaws.com
  ## session
  session:
    provider: memory
    provider_config: ""
    cookie_name: grafana_session
    cookie_secure: true
    session_life_time: 600

## Grafana data
persistence:
  enabled: true
  storageClass: "-"
  accessModes:
    - ReadWriteOnce
  size: 1Gi

Configuration

You can configure Caretta using Helm values. Values in Helm are a chart's setup choices: when the chart is installed, you can change the values listed in a file called values.yaml, which is part of the chart package, and customize the configuration based on the requirement at hand.
An example of configuration overwriting the default values is shown below:

pollIntervalSeconds: 15        # set metrics polling interval
tolerations:                   # set any desired tolerations
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
config:
  customSetting1: custom-value1
  customSetting2: custom-value2
victoria-metrics-single:
  server:
    persistentVolume:
      enabled: true            # set to true to use persistent volume
ebpf:
  enabled: true                # set to true to enable eBPF
  config:
    someOption: ebpf_options

pollIntervalSeconds sets the interval at which metrics are polled; in our case, every 15 seconds. The tolerations section specifies tolerations for the pods. In this example, the pods tolerate the node-role.kubernetes.io/control-plane taint with the NoSchedule effect, so they can also be scheduled on control-plane nodes. The config section lets us specify custom configuration options for the application. The victoria-metrics-single section configures the Victoria Metrics single-node server; here, it enables the persistent volume. The ebpf section enables eBPF and configures its options.

Creating a Visual Network Map With Caretta and Grafana

Caretta consists of two parts: the "Caretta Agent" and the "Caretta Server." Every node in the cluster runs the Caretta Agent Kubernetes DaemonSet, which collects information about the cluster's state.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: caretta-depoy-test
  namespace: caretta-depoy-test
spec:
  selector:
    matchLabels:
      app: caretta-depoy-test
  template:
    metadata:
      labels:
        app: caretta-depoy-test
    spec:
      containers:
        - name: caretta-depoy-test
          image: groundcover/caretta:latest
          command: ["/caretta"]
          args: ["-c", "/caretta/caretta.yaml"]
          volumeMounts:
            - name: config-volume
              mountPath: /caretta
      volumes:
        - name: config-volume
          configMap:
            name: caretta-config

Data from the Caretta Agent is received by the Caretta Server, a Kubernetes StatefulSet, which then saves it in a database.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: caretta-depoy-test
  labels:
    app: caretta-depoy-test
spec:
  serviceName: caretta-depoy-test
  replicas: 1
  selector:
    matchLabels:
      app: caretta-depoy-test
  template:
    metadata:
      labels:
        app: caretta-depoy-test
    spec:
      containers:
        - name: caretta-depoy-test
          image: groundcover/caretta:latest
          env:
            - name: DATABASE_URL
              value: mydb.us-west-2.rds.amazonaws.com
          ports:
            - containerPort: 80
              name: http

To view the data gathered by Caretta as a network map, you need to bring it into Grafana. To accomplish this, create a custom data source in Grafana that connects to Caretta's data, and then develop visualizations in Grafana to show that data.

[datasources]
  [datasources.caretta]
  name = caretta-deploy-test
  type = rds
  url = mydb.us-west-2.rds.amazonaws.com
  access = proxy
  isDefault = true

Customization Options for the Network Map and How to Access Them

The network map that Caretta and Grafana produce can be customized in a variety of ways. We can customize the following:

- Display options: control the layout of the map and the thickness and color of the connections and devices.
- Data options: select which information, including warnings, performance metrics, and device and connection details, is shown on the map.
- Alerting options: be informed of any network changes or problems, such as heavy traffic, sluggish performance, or connectivity issues.
- Visualization options: present the gathered data in an understandable and useful way.

Usually, you'll need to use the Grafana dashboard to access these and other customization options. Depending on the versions of Caretta and Grafana you are running and your particular setup and needs, you will have access to different options and settings.

Interpreting and Using the Visual Network Map

The primary goals of a visual network map made with Caretta and Grafana are aiding network topology comprehension, identifying possible bottlenecks or problems, and planning for and troubleshooting network issues. To interpret and use the visual network map, you must understand the various components of the map and what they stand for. Some of the types of information that may be displayed on the map are:

- Devices: the network's endpoints, including servers, switches, and routers.
- Connections: the connections between devices, such as network cables, wireless connectivity, or virtual connections, and sometimes the connectivity type.
- Data: performance indicators, alarms, and configuration information.

Tips for Using the Network Map To Assess Performance in Your K8s Cluster

Creating a curated, informative, and scalable network map is more challenging than it sounds, but with a proper tool set it becomes manageable. We have seen what we can accomplish using Caretta and Grafana together. Now, let's see what to consider when using network maps that showcase the performance metrics of your Kubernetes clusters. First and foremost, understand the network topology of the cluster, including the physical and virtual networks that your services run on. Next, ensure that the network plugin you are using is compatible with your application. Finally, define network policies to secure communication between pods, control ingress and egress traffic, and monitor and troubleshoot. Understand how pod-to-pod communication and pod networking actually happen.

Conclusion

Breaking down large systems into microservices, making systems distributed, and orchestrating them is the most common approach to boosting performance and uptime; Kubernetes and Docker are the market leaders here. As performant as this approach is, observability is a concern in large-scale distributed systems. We need to consider all the influencing outliers and anomalies to monitor and enhance the overall system with optimal performance in mind. New technologies make innovation and advancement easy but introduce unknown impediments to the system. You need an observability tool that can track all the network operations and present them in an efficient and informative way. Grafana is the leading tool in the monitoring space. By combining Caretta, an open-source network visualization and monitoring tool, with Grafana, we can unlock the true value of our infrastructure.
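As a quick sanity check after the Helm installs described earlier, you can confirm that the Caretta components are running before wiring up the Grafana data source. This is a minimal sketch, assuming Caretta was installed into the caretta namespace as in the install command above:

kubectl get daemonset -n caretta
kubectl get pods -n caretta -o wide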
How Do the Docker Client and Docker Servers Work?
By Eugenia Kuzmenko
Kubernetes Control Plane: 10 Tips for Airtight K8s Security
By Olesia Pozdniakova
What Is a Kubernetes CI/CD Pipeline?
By Jyoti Sahoo
Change Data Capture With QuestDB and Debezium

Modern data architecture has largely shifted away from the ETL (Extract-Transform-Load) paradigm to the ELT (Extract-Load-Transform) paradigm, where raw data is first loaded into a data lake before transformations (e.g., aggregations, joins) are applied for further analysis. Traditional ETL pipelines were hard to maintain and relatively inflexible with changing business needs. As new cloud technologies promised cheaper storage and better scalability, data pipelines could move away from pre-built extractions and batch uploads to a more streaming architecture.

Change data capture (CDC) fits nicely into this paradigm shift, where changes to data from one source can be streamed to other destinations. As the name implies, CDC tracks changes in data (usually a database) and provides plugins to act on those changes. For event-driven architectures, CDC is especially useful as a consistent data delivery mechanism between service boundaries (e.g., the Outbox Pattern). In a complex microservice environment, CDC helps to simplify data delivery logic by offloading the burden to the CDC systems.

To illustrate, let's take a reference architecture to stream stock updates from PostgreSQL into QuestDB. A simple Java Spring app polls stock prices by ticker symbol and updates the current price in a PostgreSQL database. The updates are then detected by Debezium (a popular CDC system) and fed to a Kafka topic. Finally, the Kafka Connect QuestDB connector listens to that topic and streams changes into QuestDB for analysis.

Structuring the data pipeline this way keeps the application simple. The Java Spring app only needs to fetch the latest stock data and commit it to PostgreSQL. Since PostgreSQL is an excellent OLTP (transactional) database, the app can rely on its ACID compliance to ensure that only committed data is seen by downstream services. The app developer does not need to worry about complicated retry logic or out-of-sync datasets. From the database standpoint, PostgreSQL can be optimized to do what it does best: transactional queries. Kafka can be leveraged to reliably feed data to other endpoints, and QuestDB can be used to store historical data for analytical queries and visualization. So without further ado, let's get to the example.

Prerequisites

- Git
- Docker Engine: 20.10+

Setup

To run the example locally, first clone the kafka-questdb-connector repo. Next, navigate to the stocks sample to build and run the Docker Compose files:

$ cd kafka-questdb-connector/kafka-questdb-connector-samples/stocks/
$ docker compose build
$ docker compose up

This builds the Docker images for the Java Spring app and the Kafka Connect image with the QuestDB connector, and pulls down the PostgreSQL (preconfigured with Debezium), Kafka/Zookeeper, QuestDB, and Grafana containers. Kafka and Kafka Connect take a bit to initialize; wait for the logs to stop by inspecting the connect container.

Start the Debezium Connector

At this point, the Java app is continuously updating the stock table in PostgreSQL, but the connectors have not been set up yet.
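Before creating the connectors, it can help to confirm that Kafka Connect has finished starting up. A minimal check, assuming the Compose service is named connect as in the sample, might look like this:

docker compose ps                 # confirm all services are up
docker compose logs -f connect    # watch Kafka Connect start; stop with Ctrl+C once the output settles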
Create the Debezium connector (i.e., PostgreSQL → Debezium → Kafka) by executing the following:

curl -X POST -H "Content-Type: application/json" -d '{"name":"debezium_source","config":{"tasks.max":1,"database.hostname":"postgres","database.port":5432,"database.user":"postgres","database.password":"postgres","connector.class":"io.debezium.connector.postgresql.PostgresConnector","database.dbname":"postgres","database.server.name":"dbserver1"}}' localhost:8083/connectors

Start the QuestDB Kafka Connect Sink

Finish the plumbing by creating the Kafka Connect side (i.e., Kafka → QuestDB sink):

curl -X POST -H "Content-Type: application/json" -d '{"name":"questdb-connect","config":{"topics":"dbserver1.public.stock","table":"stock", "connector.class":"io.questdb.kafka.QuestDBSinkConnector","tasks.max":"1","key.converter":"org.apache.kafka.connect.storage.StringConverter","value.converter":"org.apache.kafka.connect.json.JsonConverter","host":"questdb", "transforms":"unwrap", "transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState", "include.key": "false", "symbols": "symbol", "timestamp.field.name": "last_update"}' localhost:8083/connectors

Final Result

Now, all the updates written to the PostgreSQL table will also be reflected in QuestDB. To validate, navigate to localhost:19000 and select from the stock table:

select * from stock;

You can also run aggregations for a more complex analysis:

SELECT timestamp, symbol, avg(price), min(price), max(price)
FROM stock
WHERE symbol = 'IBM'
SAMPLE BY 1m ALIGN TO CALENDAR;

Finally, you can interact with the preconfigured Grafana dashboard for visualization. The visualization is a candle chart composed of the changes captured by Debezium; each candle shows the opening, closing, high, and low price in a given time interval. The time interval can be changed via the 'Interval' option at the top left of the dashboard.

Deep Dive

Now that we have the sample application up and running, let's take a deeper dive into each component of the stocks example. We will look at the following files:

kafka-questdb-connector/kafka-questdb-connector-samples/stocks/
├── Dockerfile-App
│       The Dockerfile to package our Java app
├── Dockerfile-Connect
│       The Dockerfile that combines the Debezium container image with the QuestDB Kafka connector
├── src/main/resources/schema.sql
│       The SQL that creates the stock table in PostgreSQL and populates it with initial data
├── src/main/java/com/questdb/kafka/connector/samples/StocksApplication.java
│       The Java Spring app that updates the stock table in PostgreSQL at regular intervals
...

Producer (Java App)

The producer is a simple Java Spring Boot app. It has two components:

1. The schema.sql file. This file is used to create the stock table in PostgreSQL and populate it with initial data. It's picked up by the Spring Boot app and executed on startup.
create table if not exists stock (
    id serial primary key,
    symbol varchar(10) unique,
    price float8,
    last_update timestamp
);
insert into stock (symbol, price, last_update) values ('AAPL', 500.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('IBM', 50.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('MSFT', 100.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('GOOG', 1000.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('FB', 200.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('AMZN', 1000.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('TSLA', 500.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('NFLX', 500.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('TWTR', 50.0, now()) ON CONFLICT DO NOTHING;
insert into stock (symbol, price, last_update) values ('SNAP', 10.0, now()) ON CONFLICT DO NOTHING;

The ON CONFLICT DO NOTHING clause is used to avoid duplicate entries in the table when the app is restarted.

2. Java code to update prices and timestamps with a random value. The updates are not perfectly random; the application uses a very simple algorithm to generate updates that roughly resemble stock price movements. In a real-life scenario, the application would fetch the price from some external source.

The producer is packaged into a minimal Dockerfile, Dockerfile-App, and linked to PostgreSQL:

FROM maven:3.8-jdk-11-slim AS builder
COPY ./pom.xml /opt/stocks/pom.xml
COPY ./src ./opt/stocks/src
WORKDIR /opt/stocks
RUN mvn clean install -DskipTests

FROM azul/zulu-openjdk:11-latest
COPY --from=builder /opt/stocks/target/kafka-samples-stocks-*.jar /stocks.jar
CMD ["java", "-jar", "/stocks.jar"]

Kafka Connect, Debezium, and the QuestDB Kafka Connector

Before we dive into the Kafka Connect, Debezium, and QuestDB Kafka connector configurations, let's look at their relationship with each other. Kafka Connect is a framework for building connectors to move data between Kafka and other systems. It supports two classes of connectors:

- Source connectors: read data from a source system and write it to Kafka.
- Sink connectors: read data from Kafka and write it to a sink system.

Debezium is a source connector for Kafka Connect that can monitor and capture row-level changes in databases. What does that mean? Whenever a row is inserted, updated, or deleted in a database, Debezium captures the change and writes it as an event to Kafka. On a technical level, Debezium is a Kafka Connect connector running inside the Kafka Connect framework. This is reflected in the Debezium container image, which packages Kafka Connect with the Debezium connectors pre-installed.

The QuestDB Kafka connector is also a Kafka Connect connector. It's a sink connector that reads data from Kafka and writes it to QuestDB. We add the QuestDB Kafka connector to the Debezium container image, and we get a Kafka Connect image that has both the Debezium and QuestDB Kafka connectors installed.
This is the Dockerfile we use to build that image:

FROM ubuntu:latest AS builder
WORKDIR /opt
RUN apt-get update && apt-get install -y curl wget unzip jq
RUN curl -s https://api.github.com/repos/questdb/kafka-questdb-connector/releases/latest | jq -r '.assets[]|select(.content_type == "application/zip")|.browser_download_url'| wget -qi -
RUN unzip kafka-questdb-connector-*-bin.zip

FROM debezium/connect:1.9.6.Final
COPY --from=builder /opt/kafka-questdb-connector/*.jar /kafka/connect/questdb-connector/

The Dockerfile downloads the latest release of the QuestDB Kafka connector, unzips it, and copies it into the Debezium container image. The resulting image has both the Debezium and QuestDB Kafka connectors installed, so the overall pipeline can be completed with a source connector and a sink connector.

Debezium Connector

We already know that Debezium is a Kafka Connect connector that can monitor and capture row-level changes in databases. We also have a Docker image that has both the Debezium and QuestDB Kafka connectors installed. However, at this point, neither of the connectors is running. We need to configure and start them. This is done via a curl command that sends a POST request to the Kafka Connect REST API:

curl -X POST -H "Content-Type: application/json" -d '{"name":"debezium_source","config":{"tasks.max":1,"database.hostname":"postgres","database.port":5432,"database.user":"postgres","database.password":"postgres","connector.class":"io.debezium.connector.postgresql.PostgresConnector","database.dbname":"postgres","database.server.name":"dbserver1"}}' localhost:8083/connectors

The request body contains the configuration for the Debezium connector. Let's break it down:

{
  "name": "debezium_source",
  "config": {
    "tasks.max": 1,
    "database.hostname": "postgres",
    "database.port": 5432,
    "database.user": "postgres",
    "database.password": "postgres",
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.dbname": "postgres",
    "database.server.name": "dbserver1"
  }
}

With this configuration, Debezium listens to changes in the PostgreSQL database and publishes them to Kafka. The topic name defaults to <server-name>.<schema>.<table>. In our example, it is dbserver1.public.stock, because the database server name is dbserver1, the schema is public, and the only table we have is stock. So after we send the request, Debezium will start listening to changes in the stock table and publishing them to the dbserver1.public.stock topic.

QuestDB Kafka Connector

At this point, we have a PostgreSQL table stock being populated with random stock prices and a Kafka topic dbserver1.public.stock that contains the changes. The next step is to configure the QuestDB Kafka connector to read from the dbserver1.public.stock topic and write the data to QuestDB.
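Before configuring the sink, you can optionally verify that change events are actually arriving on the topic. This is a rough sketch that assumes the Kafka Compose service is named kafka and that the standard Kafka console tools ship under /kafka/bin in that image; adjust the service name and path to match the sample's docker-compose.yml:

docker compose exec kafka /kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic dbserver1.public.stock \
  --from-beginning --max-messages 1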
Let's take a deeper look at the configuration of the QuestDB Kafka Connect sink:

{
  "name": "questdb-connect",
  "config": {
    "topics": "dbserver1.public.stock",
    "table": "stock",
    "connector.class": "io.questdb.kafka.QuestDBSinkConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "host": "questdb",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "include.key": "false",
    "symbols": "symbol",
    "timestamp.field.name": "last_update"
  }
}

The important things to note here are:

- table and topics: The QuestDB Kafka connector creates a QuestDB table named stock and writes the data from the dbserver1.public.stock topic to it.
- host: The QuestDB Kafka connector connects to QuestDB running on the questdb host. This is the name of the QuestDB container.
- connector.class: The QuestDB Kafka connector class name. This tells Kafka Connect to use the QuestDB Kafka connector.
- value.converter: The Debezium connector produces data in JSON format, which is why we configure the QuestDB connector to read it with the JSON converter: org.apache.kafka.connect.json.JsonConverter.
- symbols: Stock symbols are translated to the QuestDB symbol type, used for string values with low cardinality (e.g., enums).
- timestamp.field.name: Since QuestDB has great support for timestamps and partitioning based on them, we can specify the designated timestamp column.
- transforms: The unwrap transform uses the io.debezium.transforms.ExtractNewRecordState type to extract just the new data and not the metadata that Debezium emits. In other words, it is a filter that basically takes the payload.after portion of the Debezium data on the Kafka topic.

The ExtractNewRecordState transform is probably the least intuitive part of the configuration, so let's have a closer look at it. In short, for every change in the PostgreSQL table, Debezium emits a JSON message to a Kafka topic such as the following:

{
  "schema": {
    "comment": "this contains the Debezium message schema; it's not very relevant for this sample"
  },
  "payload": {
    "before": null,
    "after": {
      "id": 8,
      "symbol": "NFLX",
      "price": 1544.3357414199545,
      "last_update": 1666172978269856
    },
    "source": {
      "version": "1.9.6.Final",
      "connector": "postgresql",
      "name": "dbserver1",
      "ts_ms": 1666172978272,
      "snapshot": "false",
      "db": "postgres",
      "sequence": "[\"87397208\",\"87397208\"]",
      "schema": "public",
      "table": "stock",
      "txId": 402087,
      "lsn": 87397208,
      "xmin": null
    },
    "op": "u",
    "ts_ms": 1666172978637,
    "transaction": null
  }
}

Don't be scared if you feel overwhelmed by the sheer size of this message. Most of the fields are metadata and are not relevant to this sample. The important point is that we cannot push the whole JSON message to QuestDB, and we do not want all the metadata in QuestDB. We need to extract the payload.after portion of the message and push only that to QuestDB. This is exactly what the ExtractNewRecordState transform does: it turns the big message into a smaller one that contains only the payload.after portion, as if the message looked like this:

{
  "id": 8,
  "symbol": "NFLX",
  "price": 1544.3357414199545,
  "last_update": 1666172978269856
}

This is the message we can push to QuestDB. The QuestDB Kafka connector reads this message and writes it to the QuestDB table, creating the table if it does not exist.
The QuestDB table will have the same schema as the JSON message: each JSON field becomes a column in the QuestDB table.

QuestDB and Grafana

Once the data is written to QuestDB tables, working with the time-series data gets easier. Since QuestDB is compatible with the PostgreSQL wire protocol, we can use the PostgreSQL data source in Grafana to visualize the data. The preconfigured dashboard uses the following query:

SELECT
  $__time(timestamp),
  min(price) as low,
  max(price) as high,
  first(price) as open,
  last(price) as close
FROM stock
WHERE $__timeFilter(timestamp) and symbol = '$Symbol'
SAMPLE BY $Interval ALIGN TO CALENDAR;

We have created a system that continuously tracks and stores the latest prices for multiple stocks in a PostgreSQL table. These prices are then fed as events to Kafka through Debezium, which captures every price change. The QuestDB Kafka connector reads these events from Kafka and stores each change as a new row in QuestDB, allowing us to retain a comprehensive history of stock prices. This history can then be analyzed and visualized using tools such as Grafana, as demonstrated by the candle chart.

Conclusion

This sample project is a foundational reference architecture for streaming data from a relational database into an optimized time-series database. For existing projects that use PostgreSQL, Debezium can be configured to start streaming data to QuestDB and take advantage of time-series queries and partitioning. For databases that also store raw historical data, adopting Debezium may require some architectural changes. However, this is beneficial, as it is an opportunity to improve performance and establish service boundaries between a transactional database and an analytical, time-series database. This reference architecture can be extended so that Kafka Connect also streams to other data warehouses for long-term storage. After inspecting the data, QuestDB can also be configured to downsample the data for longer-term storage or even detach partitions to save space.

By Yitaek Hwang
Kubernetes vs Docker: Differences Explained

Containerization has existed for decades but has seen increasing adoption in recent years for application development and modernization. This article covers two container solutions and their uses: Docker, the container engine, together with its single-host orchestration tool Docker Compose and its cluster orchestration solution Docker Swarm; and Kubernetes, the alternative cluster orchestration solution, which is compared to Docker Swarm to help you choose the one that best meets your requirements.

What Is Containerization?

Containerization is a form of virtualization at the application level. It aims to package an application with all its dependencies, runtimes, libraries, and configuration files in one isolated executable package, which is called a container. The operating system (OS) is not included in the container, which makes it different from virtual machines (VMs), which are virtualized at the hardware level and include the OS. While the concept behind virtualization is the sharing of physical resources between several virtual machines, containers share the kernel of a single OS between several containers. Unlike virtual machines, containers are lightweight precisely because they don't contain the OS, which is why containers take seconds to boot. In addition, containers can easily be deployed on different operating systems (Windows, Linux, macOS) and in different environments (cloud, VM, physical server) without requiring any changes. In 2013, Docker Inc. introduced Docker in an attempt to standardize containers so they could be used widely and on different platforms. A year later, Google introduced Kubernetes as a solution to manage a cluster of container hosts. The definitions of the two solutions will show the difference between Kubernetes and Docker.

What Is Docker?

Docker is an open-source platform for packaging and running applications in standard containers that behave the same across different platforms. With Docker, containerized applications are isolated from the host, which offers the flexibility of delivering applications to any platform running any OS. The Docker Engine manages containers and allows them to run simultaneously on the same host. Docker has a client-server architecture and consists of client- and server-side components (the Docker client and the Docker daemon). The client and the daemon (dockerd) can run on the same system, or you can connect the client to a remote daemon. The daemon processes the API requests sent by the client and manages the other Docker objects (containers, networks, volumes, images, etc.). Docker Desktop installs the Docker client and daemon and includes other components like Docker Compose, the Docker CLI (command-line interface), and more. It can be installed on different platforms: Windows, Linux, and macOS.

Developers can design an application to run on multiple containers on the same host, which creates the need to manage multiple containers at the same time. For this reason, Docker Inc. introduced Docker Compose. Docker vs Docker Compose can be summarized as follows: Docker manages a single container, while Compose manages multiple containers on one host.

Docker Compose

Managing multi-containerized applications on the same host is a complicated and time-consuming task. Docker Compose, the orchestration tool for a single host, manages multi-containerized applications defined on one host using the Compose file format.
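To make the Compose file format mentioned above concrete, here is a minimal, hypothetical docker-compose.yml; the service names and images are illustrative only and not part of the original article:

version: "3.8"
services:
  web:                      # hypothetical front-end service
    image: nginx:alpine
    ports:
      - "8080:80"
    depends_on:
      - api
  api:                      # hypothetical application service
    image: example/api:1.0
    environment:
      - DB_HOST=db
  db:                       # backing database for the app
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=example

Running docker compose up -d starts all three containers with a single command, which is the workflow described next.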
Docker Compose allows running multiple containers at the same time by creating one YAML configuration file where you define all the containers. Compose allows you to split the application into several containers instead of building it in one container: you can split your application into sub-services called microservices and run each microservice in a container. Then you can start all the containers by running a single command through Compose.

Docker Swarm

Developers can also design an application to run on multiple containers on different hosts, which creates the need for an orchestration solution for a cluster of containers across different hosts. For this reason, Docker Inc. introduced Docker Swarm. Docker Swarm, or Docker in Swarm mode, is a cluster of Docker Engines that can be enabled after installing Docker. Swarm allows managing multiple containers on different hosts, unlike Compose, which only manages multiple containers on the same host.

What Is Kubernetes?

Kubernetes (K8s) is an orchestration tool that manages containers on one or more hosts. K8s clusters the hosts, whether they are on-premises, in the cloud, or in hybrid environments, and can integrate with Docker and other container platforms. Google initially developed and introduced Kubernetes to automate the deployment and management of containers. K8s provides several features to support resiliency, like container fault tolerance, load balancing across hosts, and automatic creation and removal of containers.

Kubernetes manages a cluster of one or more hosts, which are either master nodes or worker nodes. The master nodes contain the control plane components of Kubernetes, while the worker nodes contain the non-control-plane components (kubelet and kube-proxy). The recommendation is to have a cluster of at least four hosts: at least one master node and three worker nodes to run your tests.

Control Plane Components (Master Nodes)

The control plane can span multiple machines, although its components typically run together on a single machine, and it is recommended that you avoid running application containers on the master node. The master is responsible for managing the cluster: it responds to cluster events, makes cluster decisions, schedules operations with containers, starts up new Pods (a Pod is a group of containers on the same host and the smallest deployable unit in Kubernetes), runs control loops, etc.

- The API server (kube-apiserver) is the control plane frontend, which exposes an API to the other Kubernetes components. It handles access and authentication for the other components.
- etcd is a database that stores all cluster key/value data. Each master node should have a copy of etcd to ensure high availability.
- The kube-scheduler is responsible for assigning a node to newly created Pods.
- The kube-controller-manager is a set of controller processes that run in a single process to reduce complexity. A controller process is a control loop that watches the shared state of the cluster through the API server; when the state of the cluster changes, it takes action to bring it back to the desired state. The controller manager monitors the state of nodes, jobs, service accounts, tokens, and more.
- The cloud controller manager is an optional component that allows the cluster to communicate with the APIs of cloud providers. It separates the components that interact with the cloud from those that interact with the internal cluster.

Node Components (Worker Nodes)

The worker nodes are the non-master nodes. There are two node components: kubelet and kube-proxy.
They should run on each worker node in addition to container runtime software like Docker.

- kubelet is an agent that runs on the worker node to make sure that each container runs in a Pod. It manages the containers created by Kubernetes to ensure they are running in a healthy state.
- kube-proxy is a network proxy running on each worker node and is part of the Kubernetes network service. It allows communication between Pods and the cluster or the external network.

Other Components

A Service is a logical set of Pods that work together at a given time. Unlike Pods, the IP address of a Service is fixed. This solves the problem created when a Pod is deleted, because other Pods or objects can communicate with the Service instead. The set of Pods belonging to one Service is selected by assigning a policy to the Service that filters Pods based on labels. A label is a key/value pair of attributes that can be assigned to Pods, Services, or other objects. Labels allow querying objects based on common attributes and assigning tasks to the selection. Each object can have one or more labels, and a key can only be defined once per object.

Kubernetes vs Docker Swarm: Which Is Better?

Kubernetes and Docker are solutions with different scopes that can complement each other to make a powerful combination, so Docker vs Kubernetes is not really a correct comparison. Docker allows developers to package applications in isolated containers, and developers can deploy those containers to other machines without worrying about compatibility with operating systems. Developers can use Docker Compose to manage containers on one host. But Docker Compose vs Kubernetes is not an accurate comparison either, since the solutions have different scopes: the scope of Compose is limited to one host, while that of Kubernetes is a cluster of hosts.

When the number of containers and hosts becomes high, developers can use Docker Swarm or Kubernetes to orchestrate Docker containers and manage them in a cluster. Both Kubernetes and Docker Swarm are container orchestration solutions for a cluster setup. Kubernetes is more widely used than Swarm in large environments because it provides high availability, load balancing, scheduling, and monitoring for an always-on, reliable, and robust solution. The following points highlight the differences that make K8s a more robust solution to consider.

Installation

Swarm is included in the Docker Engine already, and it can easily be enabled using standard Docker CLI (command-line interface) commands. Kubernetes deployment is more complex, though, because you need to learn new non-standard commands to install and use it, as well as the specific deployment tools used with Kubernetes. The cluster nodes must be configured manually in Kubernetes, like defining the master, controller, scheduler, etc. Note: The complexity of Kubernetes installation can be overcome by using Kubernetes as a service (KaaS). Major cloud platforms offer KaaS, including Google Kubernetes Engine (GKE), which is part of Google Cloud Platform (GCP), and Amazon Elastic Kubernetes Service (EKS).

Scalability

Both solutions support scalability. However, it is easier to achieve scalability with Swarm, while Kubernetes is more flexible. Swarm uses the simple Docker APIs to scale containers and services on demand in an easier and faster way. Kubernetes, on the other hand, supports auto-scaling, which makes scalability more flexible.
But due to the unified APIs it uses, scaling is more complex in Kubernetes.

Load Balancing

Swarm has a built-in load-balancing feature that works automatically over the internal network: all requests to the cluster are load-balanced across hosts, and Swarm uses DNS to load-balance requests to service names, with no manual configuration required. Kubernetes must be configured manually to support load balancing. You should define policies for the Pods to be load-balanced, and thus Pods should be exposed as Services. Kubernetes uses Ingress for load balancing, an object that allows access to the Kubernetes Services from an external network.

High Availability

Both solutions natively support high-availability features. The Swarm manager monitors the cluster's state and takes action to reconcile the actual state with the desired state; whenever a worker node crashes, the Swarm manager recreates the containers on another running node. Kubernetes also automatically detects faulty nodes and seamlessly fails over to new nodes.

Monitoring

Swarm does not have built-in monitoring and logging tools; it requires third-party tools for this purpose, such as Riemann or Elasticsearch and Kibana (ELK). Kubernetes offers built-in monitoring of cluster state and integrates with the ELK stack, and a number of monitoring tools are supported to monitor other objects like nodes, containers, Pods, etc.

Conclusion

Docker is a containerization platform for building and deploying applications in containers independently of the operating system. It can be installed using Docker Desktop on Windows, Linux, or macOS and includes other solutions like Compose and Swarm. When multiple containers are created on the same host, managing them becomes more complicated; Docker Compose can be used in this case to easily manage multiple containers of one application on the same host. In large environments, a cluster of multiple nodes becomes necessary to ensure high availability and other advanced features, which is where a container orchestration solution like Docker Swarm or, alternatively, Kubernetes comes in. The comparison between the features of these two platforms shows that both support scalability, high availability, and load balancing. However, Swarm is easier to install and use, while Kubernetes supports auto-scaling and built-in monitoring tools. This explains why most large organizations use Kubernetes with Docker for applications that are distributed across hundreds of containers.
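To tie the earlier description of Pods, labels, and Services to something concrete, here is a minimal, hypothetical Kubernetes manifest (the names and image are illustrative, not from the article): a Deployment runs three replicas of a Pod labeled app: web, and a Service selects those Pods by that label and load-balances traffic to them.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web              # the label the Service selects on
    spec:
      containers:
        - name: web
          image: nginx:alpine   # hypothetical container image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                  # matches the Pod label above
  ports:
    - port: 80
      targetPort: 80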

By Alex Tray
Docker Use Cases: 15 Most Common Ways to Use Docker

Containerizing applications instead of hosting them on virtual machines is a concept that has been trending in the last few years, making container management popular. Docker sits at the heart of this transition, helping organizations seamlessly adopt containerization technology. Today, Docker use cases can be seen across all industries, regardless of size and nature. What Is Docker? Docker is a containerization technology that enables developers to package a service into a container along with its dependencies, libraries, and operating system. By separating the apps from the infrastructure, Docker allows you to seamlessly deploy and move apps across a variety of environments. Docker makes it very simple to create and manage containers using the following steps (a minimal sketch of this workflow appears at the end of this section): Create a Dockerfile and add the code Build a Docker image based on the Dockerfile Create a running instance from the Docker image Scale containers on-demand What Are Microservices? Traditionally, software was developed using a monolithic architecture wherein the entire application is built as a single entity, often following a waterfall development method. These monolithic architectures come with size, complexity, and scalability challenges. Microservices, or the microservices architecture, address these challenges by allowing developers to break down an application into smaller independent units that communicate with each other using REST APIs. Typically, each function can be developed as an independent service, which means each service can operate without affecting any of the others. Organizations can therefore accelerate release cycles, scale operations on demand, and make changes to code without application downtime. Migrating from a monolithic architecture to microservices is a popular Docker use case. What Are Containers? Containers are a natural building block of the microservices architecture. A container is a standard unit of software that isolates an application from its underlying infrastructure by packaging it with all of its dependencies and required resources. Unlike virtual machines, which virtualize hardware layers, containers only virtualize software layers above the OS level. We will discuss container management in more detail later. Business Benefits of Docker Docker has become synonymous with containerization because of its portability and ecosystem. All major cloud providers, such as AWS, GCP, and Azure, have incorporated Docker into their platforms and provide support for it. You can therefore run Docker containers in virtually any environment, including VirtualBox, Rackspace, and OpenStack. Scalability is one of the biggest benefits of Docker. By deploying multiple containers on a single host, organizations can significantly reduce operational costs. Moreover, Docker allows you to deploy services on commodity hardware, thus eliminating the cost of purchasing expensive servers. Docker's underlying philosophy favors fewer resources and smaller engineering teams: organizations can perform operations using fewer resources and therefore need fewer staff to monitor and manage them, which means cost savings and more ROI. Docker allows you to instantly create and manage containers with ease, which facilitates faster deployments. The ability to deploy and scale infrastructure using a simple YAML config file makes Docker easy to use while offering a faster time to market. Security is also strengthened, since each container is isolated. You will find the most common Docker use cases below. 
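As a minimal sketch of that four-step workflow, assuming a simple Python web service (image and file names are illustrative):

# Step 1: a Dockerfile that packages the code and its dependencies
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

# Step 2: build an image from the Dockerfile
docker build -t myorg/myservice:1.0 .
# Step 3: create a running instance (container) from the image
docker run -d -p 8080:8080 myorg/myservice:1.0
# Step 4: scale out by starting additional instances on other ports or hosts
docker run -d -p 8081:8080 myorg/myservice:1.0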
Docker Use Case 1: From Monolith to Microservices Architecture Gone are the days when software was developed using only a monolith approach (waterfall model) wherein the entire software was developed as a single entity. Although monolith architecture facilitates the building, testing, deploying and horizontal scaling of software, as the application gets bigger, management can become a challenge. Any bug in any function can affect the entire app. Furthermore, making a simple change requires rewriting, testing and deploying the entire application. As such, adopting new technologies isn’t flexible. On the other hand, Microservices break down the app into multiple independent and modular services which each possess their own database schema and communicate with each other via APIs. The microservices architecture suits the DevOps-enabled infrastructures as it facilitates continuous delivery. By leveraging Docker, organizations can easily incorporate DevOps best practices into the infrastructure allowing them to stay ahead of the competition. Moreover, Docker allows developers to easily share software along with its dependencies with operations teams and ensure that it runs the same way on both ends. For instance, administrators can use the Docker images created by the developers using Dockerfiles to stage and update production environments. As such, the complexity of building and configuring CI/CD pipelines is reduced allowing for a higher level of control over all changes made to the infrastructure. Load balancing configuration becomes easier too. Docker Use Case 2: Increased Productivity In a traditional development environment, the complexity usually lies in defining, building and configuring development environments using manual efforts without delaying the release cycles. The lack of portability causes inconsistent behavior in the apps. Docker allows you to build containerized development environments using Docker images and to easily set up and use the development environment, all while delivering consistent performance throughout its lifecycle. Moreover, it offers seamless support for all tools, frameworks, and technologies used in the development environment. Secondly, Docker environments facilitate automated builds, automated tests and Webhooks. This means you can easily integrate Bitbucket or GitHub repos with the development environment and create automatic builds from the source code and move them into the Docker Repo. A connected workflow between developers and CI/CD tools also means faster releases. Docker comes with a cloud-managed container registry eliminating the need to manage your own registry, which can get expensive when you scale the underlying infrastructure. Moreover, the complexity in configuration becomes a thing of the past. Implementing role-based access allows people across various teams to securely access Docker images. Also, Slack integration allows teams to seamlessly collaborate and coordinate throughout the product life cycle. Offering accelerated development, automated workflows, and seamless collaboration, there’s no doubt that Docker increases productivity. Docker Use Case 3: Infrastructure as Code The microservice architecture enables you to break down software into multiple service modules allowing you to work individually with each function. While this brings scalability and automation, there’s a catch: it leaves you with hundreds of services to monitor and manage. 
This is where Infrastructure as Code (IaC) comes to your rescue, enabling you to manage the infrastructure using code. Basically, it allows you to define the provisioning of resources for the infrastructure using config files and convert the infrastructure into software, thereby taking advantage of software best practices such as CI/CD processes, automation, reusability and versioning. Docker brings IaC into the development phase of the CI/CD pipeline as developers can use Docker-compose to build composite apps using multiple services and ensure that it works consistently across the pipeline. IaC is a typical example of a Docker use case. Docker Use Case 4: Multi-Environment Standardization Docker provides a production parity environment for all its members across the pipeline. Consider an instance wherein a software development team is evolving. When a new member joins the team, each member has to install/update the operating system, database, node, yarn, etc. It can take 1-2 days just to get the machines ready. Furthermore, it’s a challenge to ensure that everyone gets the same OS, program versions, database versions, node versions, code editor extensions, and configurations. For instance, if you use two different versions of a library for two different programs, you need to install two versions. In addition, custom environment variables should be specified before you execute these programs. Now, what if you make certain last minute changes to dependencies in the development phase and forget to make those changes in the production? Docker packages all the required resources into a container and ensures that there are no conflicts between dependencies. Moreover, you can monitor untracked elements that break your environment. Docker standardizes the environment ensuring that containers work similarly throughout the CI/CD pipeline. Docker Use Case 5: Loosely Coupled Architecture Gone are the days of the traditional waterfall software development model. Today, developers, enabled by the cloud and microservices architecture, are breaking applications into smaller units and easily building them as loosely coupled services that communicate with each other via REST APIs. Docker helps developers package each service into a container along with the required resources making it easy to deploy, move and update them. Telecom industries are leveraging the 5G technology and Docker’s support for software-defined network technology to build loosely coupled architectures. The new 5G technology supports network function virtualization allowing telecoms to virtualize network appliance hardware. As such, they can divide and develop each network function into a service and package it into a container. These containers can be installed on commodity hardware which allows telecoms to eliminate the need for expensive hardware infrastructure thus significantly reducing costs. The fairly recent entrance of public cloud providers into the telecom market has shrunk the profits of telecom operators and ISVs. They can now use Docker to build cost-effective public clouds with the existing infrastructure, thereby turning docker use cases into new revenue streams. Docker Use Case 6: For Multi-Tenancy Multi-tenancy is a cloud deployment model wherein a single installed application serves multiple customers with the data of each customer being completely isolated. Software-as-a-Service (SaaS) apps mostly use the multi-tenancy approach. 
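Before looking at the common multi-tenancy approaches, here is a minimal sketch of the kind of docker-compose.yaml mentioned in Use Case 3, which declares a multi-container application as code; service names, images, and credentials are illustrative:

version: "3.8"
services:
  web:
    build: .
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data: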
There are four common approaches to a multi-tenancy model: Shared database, isolated schema: all tenants' data is stored in a single database, with a separate schema for each tenant. The isolation level is medium. Shared database, shared schema: all tenants' data is stored in a single database, and each tenant's data is identified by a foreign key. The isolation level is low. Isolated database, shared app server: the data related to each tenant is stored in a separate database. The isolation level is high. Docker-based isolated tenants: a separate database stores each tenant's data, and each tenant is served by its own set of containers. While the tenant data is separated, the first three approaches still use the same application server for all tenants. Docker, by contrast, allows for complete isolation wherein each tenant's app code runs inside its own container. To do this, organizations can simply convert the app code into a Docker image to run containers and use a docker-compose.yaml file to define the configuration for multi-container, multi-tenant apps, enabling them to run containers for each tenant. A separate Postgres database and a separate app server run inside containers for each tenant; in effect, each tenant gets its own pair of servers, one database server and one app server. You can route requests to the right tenant's containers by adding an NGINX server container. Docker Use Case 7: Speed Up Your CI/CD Pipeline Deployments Unlike monolith applications, which take a few minutes to start up, containers launch within seconds because they are lightweight. As such, you can deploy code at lightning speed or rapidly make changes to codebases and libraries using containers in CI/CD pipelines. However, it's important to note that long build times can slow down CI/CD deployments. This occurs because the CI/CD pipeline must start from scratch every time, meaning dependencies must be pulled on each occasion. Luckily, Docker comes with a layer cache that makes it easy to overcome this build issue. That said, the cache only exists on the local machine and is therefore not automatically available on remote runner machines. To solve this, use the --cache-from flag to instruct docker build to reuse layers from an existing image as its cache (a short sketch of this pattern appears a little further below). If that image doesn't already exist locally, you can simply pull it just before executing the docker build command. It's important to note that this method only uses the latest image as the cache base; to benefit from caching in earlier build stages as well, you should push and pull an image for each stage. Docker Use Case 8: Isolated App Infrastructure One of Docker's key advantages is its isolated application infrastructure. Each container is packaged with all of its dependencies, so you don't need to worry about dependency conflicts. You can easily deploy and run multiple applications on one or several machines, regardless of the OS, platform, or version of the app. Consider an instance wherein two servers use different versions of the same application: by running them in independent containers, you eliminate dependency issues. Each isolated container can also run an SSH server for automation and debugging. Since each service or daemon is isolated, it's easy to monitor the applications and resources running inside a container and quickly identify errors. 
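Circling back to the build-cache tip in Use Case 7, here is a minimal sketch of the pull-then-build pattern on a remote CI runner; the image name is illustrative:

# Warm the cache on a fresh runner by pulling the last published image (ignore failures on the first run)
docker pull myorg/myapp:latest || true
# Reuse layers from the pulled image as the build cache, then publish the result
docker build --cache-from myorg/myapp:latest -t myorg/myapp:latest .
docker push myorg/myapp:latest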
Such isolation also allows you to run an immutable infrastructure, thereby minimizing downtime resulting from infrastructure changes. Docker Use Case 9: Portability – Ship Any Application Anywhere Portability is one of the top Docker use cases. Portability is the ability of a software application to run in any environment regardless of the host OS, plugins, or platform. Containers offer portability because they come packaged with all of the resources required to run an application, such as the code, runtime, system libraries, and configuration settings. Portability is also measured by the amount of tweaking needed for an application to move to another host environment. For example, Linux containers run on all Linux distributions but sometimes fail to work in Windows environments. Docker offers a high degree of portability, allowing you to move an app between various environments without making significant changes to its configuration. Docker has created a standard for containerization, so it's no surprise that its containers are highly portable. Moreover, Docker containers use the host machine's OS kernel and eliminate the need to ship a guest OS, which makes them lightweight and easy to move between different environments. This is especially useful when developers want to test an application on various operating systems and analyze the results: any discrepancies in code will only affect a single container and won't crash the entire operating system. Docker Use Case 10: Hybrid and Multi-Cloud Enablement According to Channel Insider, the top three drivers of Docker adoption in organizations are hybrid clouds, VMware costs, and pressure from testing teams. Although hybrid clouds are flexible and allow you to run customized solutions, distributing the load across multiple environments can be a challenge, and to facilitate seamless movement between clouds, cloud providers usually need to compromise on costs or feature sets. Docker eliminates these interoperability issues because its containers run the same way in both on-premises and cloud deployments. You can seamlessly move them between testing and production environments or between internal clouds built using multiple cloud vendor offerings, and the complexity of deployment processes is reduced. Thanks to Docker, organizations can build hybrid and multi-cloud environments comprising two or more public or private clouds from different vendors. Migrating from AWS to the Azure cloud becomes easier, and you can select services and distribute them across different clouds based on security protocols and service-level agreements. Docker Use Case 11: Reduce IT/Infrastructure Costs With virtual machines, you need to copy an entire guest operating system; this is not the case with Docker. Docker allows you to provision fewer resources, enabling you to run more apps and optimizing resource usage. For example, developer teams can consolidate resources onto a single server, thus reducing storage costs. Furthermore, Docker is highly scalable, allowing you to provision just the resources required at a given moment and automatically scale the infrastructure on demand, so you only pay for the resources you actually use. Moreover, apps running inside Docker deliver the same level of performance across the CI/CD pipeline, from development to testing, staging, and production. As such, bugs and errors are minimized. 
This environment parity enables organizations to manage the infrastructure with minimal staff and technical resources therefore saving considerably on maintenance costs. Basically, Docker enhances productivity which means you don’t need to hire as many developers as you would in a traditional software development environment. Docker also comes with the highest level of security and, most importantly, it’s open-source and free. Docker Use Case 12: Security Practices Docker containers are secure by default. When you create a container using Docker, it will automatically create a set of namespaces and isolate the container. Therefore, a container cannot access or affect processes running inside another container. Similarly, each container gets its own network stack which means it cannot gain privileged access to the network ports, sockets and interfaces of other containers unless certain permissions are granted. In addition to resource accounting and limiting, control groups handle the provisioning of memory, compute and disk I/O resources. Distributed-Denial-of-Service (DDoS) attacks are thus successfully mitigated seeing as a resource-exhausted container cannot crash the system. When a container launches, the Docker daemon activates a set of restriction capabilities, augmenting the binary root with fine-grained access controls. This provides higher security seeing as a lot of processes that run as root don’t need real root privileges. Therefore, they can operate with lesser privileges. Another important feature is the running signed images using the Docker Content Trust Signature Verification feature defined in the dockerd config file. If you want to add an extra layer of security and harden the Docker containers, SELinux, Apparmor and GRSEC are notable tools that can help you do so. Docker Use Case 13: Disaster Recovery While hybrid and multi-cloud environments offer amazing benefits to organizations, they also pose certain challenges. Maintaining resilience is a notable one. In order to ensure business continuity, your applications must withstand errors and failures without data losses. You can’t afford downtimes when a component fails, especially with critical applications. As such, we recommend that you remove single points of failure using redundant component resiliency and access paths for high availability. Applications should also possess self-healing abilities. Containers can help you in this regard. Nevertheless, for cases where unforeseen failures arise, you need a disaster recovery plan that reduces business impact during human-created or natural failures. Docker containers can be easily and instantly created or destroyed. When a container fails, it is automatically replaced by another one seeing as containers are built using the Docker images and based on dockerfile configurations. Before moving an image to another environment, you can commit data to existing platforms. You can also restore data in case of a disaster. All of this being said, it’s important to understand that the underlying hosts may be connected to other components. Therefore, your disaster recovery plan should involve spinning up a replacement host as well. In addition, you should consider things like stateful servers, network and VPN configurations, etc. Docker Use Case 14: Easy Infrastructure Scaling Docker augments the microservices architecture wherein applications are broken down into independent services and packaged into containers. 
Organizations are taking advantage of microservices and cloud architectures to build distributed applications. Docker enables you to instantly spin up identical containers for an application and horizontally scale the infrastructure. As the number of containers increases, you'll need a container orchestration tool such as Kubernetes or Docker Swarm. These tools come with smart scaling abilities that allow them to automatically scale the infrastructure up on demand. They also help you optimize costs, since they remove the need to run unnecessary containers. It's important to keep components fine-grained in order to make orchestration easier. In addition, stateless and disposable components enable you to monitor and manage the lifecycle of each container with ease. Docker Use Case 15: Dependency Management Isolation of dependencies is the strongest feature of containers. Consider an instance where you have two applications that use different third-party libraries. If the applications depend on different versions of the same library, it can be a challenge to keep tabs on the version differences throughout the product life cycle. You may also need to allow containers to talk to each other; for instance, an app may need to talk to a database associated with another app. When you move an application to a new machine, you have to remember all of its dependencies, and version and package conflicts can be painful. When trying to reproduce an environment, there are OS, language, and package dependencies to take care of. If you work with Python, you'll need dependency management tools such as virtualenv, venv, or pyenv. If the new environment doesn't have a tool like Git, you'll need a script to install the Git CLI, and that script keeps changing for different operating systems and OS versions, so every team member has to be aware of these tools, which isn't always easy. Be it OS, language, or CLI tool dependencies, Docker is the best tool for dependency management. By simply defining the configuration in the Dockerfile along with its dependencies, you can seamlessly move an app to another machine or environment without needing to remember the dependencies, worry about package conflicts, or keep track of user preferences and local machine configurations. Companies Powered by Docker Docker use cases are not limited by region or industry. PayPal is a leading US-based financial technology company that offers online payment services across the globe. The company processes around 200 payments per second across three different systems: PayPal, Venmo, and Braintree. Moving services between different clouds and architectures used to delay deployment and maintenance tasks, so PayPal implemented Docker and standardized its apps and operations across the infrastructure. To date, the company has migrated 700 apps to Docker and works with 4,000 software employees managing 200,000 containers and 8+ billion transactions per year, while achieving a 50% increase in productivity. Adobe also uses Docker for containerization tasks. For instance, ColdFusion is an Adobe web programming language and application server that facilitates communication between web apps and backend systems. Adobe uses Docker to containerize and deploy ColdFusion services, and it uses Docker Hub and Amazon Elastic Container Registry to host the Docker images. Users can therefore pull these images to a local machine and run them with standard Docker commands. 
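As a rough illustration of that workflow, pulling and running such a vendor-published image locally might look like the following; the image name, tag, and port are illustrative, and the real image may require additional environment variables (such as license acceptance) per its documentation:

# Pull a vendor-published image from a public registry (name is illustrative)
docker pull adobecoldfusion/coldfusion2021:latest
# Run it locally, exposing the application port
docker run -d -p 8500:8500 --name coldfusion adobecoldfusion/coldfusion2021:latest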
GE is one of the few companies that was bold enough to embrace the technology at its embryonic stage and has become a leader over the years. The company operates multiple legacy apps that used to delay its deployment cycle. GE turned to Docker and has since managed to considerably reduce development-to-deployment time. Moreover, it is now able to achieve higher application density than with VMs, which reduces operational costs. What's Next After Docker? Once you understand how Docker impacts different aspects of the business, the next thing to grasp is how to fully leverage Docker technology. As an organization's operations evolve, the need for thousands of containers arises. Thankfully, Docker is highly scalable, and you can easily scale services up and down while defining the number of replicas needed using the scale command: $ docker service scale frontend=50 You can also scale multiple services at once with the same docker service scale command. Container Management Systems As the business evolves, organizations need to scale operations on demand, and as container clusters grow, it becomes challenging to orchestrate them. Container management systems help you manage container tasks from creation and deployment all the way to scaling and destruction, applying automation wherever possible. Basically, they simplify container management. In addition to creating and removing containers, these systems handle other container-related tasks such as orchestration, security, scheduling, monitoring, storage, log management, load balancing, and network management. According to Datadog, organizations that use container management systems host 11.5 containers per host on average, compared with 6.5 containers per host in non-orchestrated environments. Popular Container Management Tools Here are some of the most popular container managers for your business. Kubernetes: Kubernetes is the most popular container orchestration tool, originally developed by Google. It did not take long for Kubernetes to become the de facto standard for container management and orchestration. Google donated the tool to the Cloud Native Computing Foundation (CNCF), which means it is now supported by industry giants such as IBM, Microsoft, Google, and Red Hat. It enables you to quickly package, test, deploy, and manage large clusters of containers with ease. It's also open-source, cost-effective, and cloud-agnostic. Amazon EKS: As Kubernetes became a standard for container management, cloud providers started to incorporate it into their platform offerings. Amazon Elastic Kubernetes Service (EKS) is a managed service for running Kubernetes on AWS. With EKS, organizations don't need to install and configure Kubernetes worker nodes or control planes, since the service handles that for them. In a nutshell, EKS acts as a container service and manages container orchestration for you. However, EKS only works with the AWS cloud. Amazon ECS: Amazon Elastic Container Service (ECS) is a fully managed container orchestration tool for AWS environments that helps organizations manage microservices and batch jobs with ease. ECS looks similar to EKS but differs in that it manages container clusters directly, unlike EKS, which only performs Kubernetes tasks. ECS is free, while EKS charges $0.10 per hour. That said, because EKS is based on open-source Kubernetes, it benefits from broader community support, whereas ECS is more of a proprietary tool. 
ECS is mostly useful for teams that don't have extensive DevOps resources or that find Kubernetes too complex. Also read: Amazon ECS vs EKS. Amazon Fargate: Amazon Fargate is a serverless container service that enables organizations to run containers without having to manage servers or container clusters. It is part of ECS but also works with EKS. While ECS offers better control over the infrastructure, it comes with some management complexity; if you want to run specific tasks without worrying about infrastructure management, Fargate is the recommended option. Azure Kubernetes Service: Azure Kubernetes Service (AKS) is a fully managed Kubernetes service offered by Microsoft for Azure environments. It is based on open-source Kubernetes and is mostly free, since you only pay for the associated resources. AKS is integrated with Azure Active Directory (AD) and offers a higher security level with role-based access controls. It integrates seamlessly with Microsoft solutions and is easy to manage using the Azure CLI or the Azure portal. Google Kubernetes Engine: Google Kubernetes Engine (GKE) is a managed Kubernetes service launched by Google in 2015 to manage Google Compute Engine instances running Kubernetes. GKE was the first managed Kubernetes service, followed by AKS and EKS, and it offers more features and automation than its competitors. Google charges $0.15 per hour per cluster. Conclusion In today's complex software development environments, comprising multiple operating systems, programming languages, plugins, frameworks, container management tools, and architectures, Docker creates a standardized workflow for every member of the team throughout the product life cycle. More importantly, Docker is open-source and supported by a strong and vibrant community that is available to help you with any issues. Failing to successfully leverage these Docker use cases will surely leave you behind your competitors.

By Alfonso Valdes
Developing Cloud-Native Applications With Containerized Databases

With the advent of microservices in Kubernetes, individual developer teams now manage their own data, middleware, and databases. Automated tests and CI/CD pipelines must be revisited to include these new requirements. This session demonstrates how to use Kustomize and Tekton to provide Kube-Native automated workflows taking into account new parameters such as database operators, StorageClass, and PVC. The demonstration focuses on building a comics cards web application using a Flask-based frontend and leveraging MongoDB as the database. So for people who don't know me, I'm Nic. I'm a developer advocate with Ondat. And I also happen to run DoK UK, London. So if you from the London area, don't hesitate to subscribe to the meetup group. So as usual for DoK London, we focus on hands on labs. So today what I'm going to do, I'm going to try to do a demo in 10 minutes. So I sacrificed a goat yesterday. So hopefully, it should be fine. I tested several time. So if it's not working, it's not my fault, I promise. So the talk for today is going to be focused on developing cloud-native applications with containerized databases. But really, it's about shifting data left with Kubernetes, with shifting left in development is to bring testing and the idea to start to test your, your code and your application as early as possible in the development process. And so when you test a new application, it also means if you want to test it in context, which is the right thing to do, you need to bring two things, you need to bring the application context. But also you need to bring in from infrastructure features; you need to bring your infrastructure context to the application as well. And typically, you can bring this into networking storage, things like operators as well, because you will need your database, you need to be able to configure your database for an operator to define it in a declarative way you need so your storage as well. So typically, you want to run a cast solution. So container attached storage, there's plenty of there, plenty of them in the market, including Ondat, of course. And then you can also bring in things like CNI, CSI, of course, Service Security. And the result of this is actually a declarative model for developers to consume infrastructure as code. But it's, it's better than bare, you know, infrastructure as code that you're used to with Terraform. Because it's Kubernetes-native, it's really native to Kubernetes. You don't need to run anything, but Kubernetes, including your pipelines, right, and this is the purpose of today, I'm going to show you how to run different pipelines, whether you're on your laptop, and you want to build your development environment, and then how can you go from there, to your production environment. And of course, you have benefits for DevOps, as I said, databases on-demand, you know, pretty much anything that is required for your application in terms of the infrastructure, reduce costs, accelerate software delivery, and finally, you end up with a better code quality. So just before jumping into the demo, this is the picture of what I'm going to show you today. There's a lot of moving parts. But what I want to focus on is from dev laptop, you can see there's a fork here, right? So you have your application code that you commit from your developer laptop. And then there's two ways, right, so either you can make to, to your local, you know, Docker, and then push your container image to Docker Hub. 
Or if you want to develop to deploy in production, then you just again, you push into your production repository, and then you will rely on some sort of pipeline to deploy it into update the image, still on the same Docker repository, and then if it's production, you will want to have a tool like, like, so getups, with flux that will pick up the new manifests, and directly deploy to production. If you're on your laptop, well, you need some sort of tool to update to update the manifest to update the application, and then run it on your development Kubernetes cluster. All right. So now I'm going to show you how to do this. There's essentially two main components you want to have when you're developing an application starting with your laptop, if you want to bring things left. So first, you need some infrastructure definition, right? And the infrastructure definition is going to tell you, okay, how you want to configure your database? Do you want to enable certain features like on the storage side? Such as, you know, replication, and encryption if you want to test performance of your application, these kind of things. So for this, usually you will use Kustomize. And the principle of Kustomize is to use overlays. Right, so you have the base overlays, that is going to define all your different manifests that are required to deploy your application. So in our case, we're going to be looking at an application, which is pretty famous. Now, it's my Marvel application I've been showing multiple times at DoK. So it's composed of a front end, which is just a Flask application. So Python, just as the front end, and then we have database which is using MongoDB this time I said Postgres in the title but yeah, this time is going to be a MongoDB. And it's going to be using the MongoDB community operator. So here, I'm just going to show you a couple of things you want to configure. So this is my dev environment. So you can see, I'm going to be using my dev overlay for my dev laptop. And for this, this is where I define my MongoDB configuration. So you can see the storage space, I want to allocate, you can see the roles the permission, and what is good with operators is that it's immutable. Meaning that if someone even in production is trying to change with command line, the different parameters there is going to be rewritten by the operator. So it's also a good way, using an operator is also a good way to guarantee mutability for your application configuration, your application context. Here, for example, I'm using an Ondat storage class. And this is where I'm going to find I'm going to define like the type of the file system, so xfs, which is the recommended one for MongoDB. And if I want to have, for example, in the overlay, I want to have number of replicas equal to zero, because this is a dev environment, and maybe encrypted, I don't want to encrypt it. But for example, if I want to do some testing, when enabling replication and enabling replication, well, I just have to declaratively, change these parameters' values and then run against my development environment. That's for the infrastructure part. Now, of course, we need some way to dynamically update things on your laptop, right when you're developing the application. So this is now my application repository. You can see here, I've got my Docker file, my Python scripts for Flask, and I've got my HTML page that is rendering my front end. And you can see here a scaffold configuration. 
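For reference, the Skaffold configuration being described might look roughly like this minimal sketch, with an illustrative image name and Kustomize overlay path rather than the exact files from the demo:

apiVersion: skaffold/v2beta29
kind: Config
build:
  artifacts:
    - image: myrepo/marvel-frontend   # illustrative image name
deploy:
  kustomize:
    paths:
      - k8s/overlays/dev              # illustrative path to the dev overlay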
And basically, I'm going to use scaffold, and I'm sure I mean, there are other tools on the market, like tilt that also does that. And the idea is this one. So as soon as gonna, as I'm going to save changes on my HTML page, scaffold is running in development mode, which means that as soon as I save it, is going to run the pipeline to build the image, and then use Kustomize to deploy it to my local Kubernetes cluster. So let's take a look at the Kubernetes cluster environment. This is my laptop configuration, which is running on k3s, yes, you can see that scaffold has already deployed my application, when I started to run a scaffold in dev mode, I did it prior to the demo, because it takes some time. So now the idea is I want to modify my code and see the results directly without doing anything but saving my HTML configuration. But first, let's just check the application. So here, I'm just gonna launch my application there in a browser, so localhost port 8080, you can see the wonderful Marvel app. And here I've got a typo, right, there's more than one comic. So I want to change this into a plural. Okay, so I'm killing this. Now, the only thing I need to do is fine comic here. And replace it with comics like this. Okay, so you can see here, it's been modified. Now, look at this, right, so here, you see, currently, it's logging, everything happened happening into scaffold. So as soon as going as I'm going to say this, right? So you can see detects a change, and now is going to build, again the image and redeploy my application. And it's the same thing. If I want to change any configuration on the infrastructure side is going to do exactly the same thing. So now if I go back here, I should see terminating, it's already done, right. So if you see here, this one is 16 seconds. So scaffold, basically, to redeploy my application into my development cluster. Right? So now let's just check that locally, at least the change has been applied in fine to this. OK, change back. Okay, so now it's the I'm happy with my development environment. So now, imagine that you run all the tests you want to deploy into production. So remember, I'm using a single repository, and then I'm going to pick that repository with flux and flux as soon as it detects the change will apply it into my production cluster. So my production cluster just to show you here, it's there. So you can see I've got same thing, I've got my mobile application that is those two, those four containers there 41 minutes. This is when I changed them before. So you can also see like a lot of completing tasks. This is because I'm using Tekton. So Tekton is a coop native pipeline that allows you to contain your pipelines within Kubernetes. Every tasks that you apply in Tekton is just a CRD. So a custom resource definition, and everything from so you're going to combine different tasks like building your images, building your manifest, all of this will correspond to as many tasks, right, so as many Kubernetes resources, so this is why you see there. So now I need to do a couple of things, right. So I'm still on my laptop here. The first thing I want to do is, of course, I want to commit my changes into the repository. So I'm going to commit updates, I'm going to push it but before pushing it, I want to show you a couple of things. So this is the environment where this is the repository where I have my targets manifests, that flux is going to pick up and update in my production cluster. So this manifest there will be updated by Tekton. 
So the role of Tekton, again, is to use this, the code of the application, and build the image and replace the image here in the manifest with the particular you know, shard, digests, etc., etc. And same thing, if I want to, this manifest represents both the application configuration, as well as the infrastructure configuration, which I can also change if I want. For example, here, again, I'm using on that as the underlying distributed storage, number of replicas to encryption is true for my production environment. So now, I'm going to push it and then I'm going to, I'm going to, okay, it's pushed. So now I'm going to move to Tekton. And what I'm going to do, I'm going to, I'm going to manually trigger my pipeline. So you can use like, also, you know, dynamic, I mean, you can use Evans in Tekton. But good luck with this Tekton is kind of difficult. The learning curve is quite steep. Without using Evans, if you want to use Tekton Evans, it gets even worse. But yeah, so the idea here to create to launch your pipeline; remember, we are in Kubernetes. So it's again, as easy as creating a manifest. So as soon as I'm going to create that particular manifest, I'm going to launch the pipeline. And again, the pipeline is going to be to build the image to update the manifest. And then flux is going to pick up the repository of what is on the repository, and with the GitOps pipeline, update the application in the production cluster. So just watch this, right, so I'm going to be monitoring the flux reconciliation. So at the moment, you can see the last reconciliation I had happened and what you know, some time ago, so as soon as, as I'm going to trigger the pipeline, then flux is going to pick it up. And here we should see some updates. So let's do this. Now, there's a command to monitor what's happening. So Tekton pipeline runs logs, and then it's going to chain different tasks. So it's going to take probably one minute during that time, I'm going to explain to you a couple of tasks. So the different task is going to basically build a Docker image, again, right in the same way I've done on my laptop locally, and then push to Docker Hub; this is Tekton that is going to build the image, this time not using a Docker the local Docker process. Remember, it's running in Kubernetes. in Kubernetes, you don't want to mount the Docker sockets; that's bad because you need to do it as root. So typically, you use something like Kaniko, which is just user space to build the image. And actually on the Tekton marketplace, or library, you can find a lot of useful tab main tasks that are pre-built for you. And actually Kaniko is one of them, right? So build the image. And then from the image, I'm going to be using what we call a workspace, which is represented as a volume in Tekton. And that workspace is going to be transferring data from one task to another. So here I'm going to be transferring the image ID into my manifests and as a side effect as well, Tekton as a limitation which allows you to, I mean, which limits you to one single word, I mean, one volume between all the tasks. But because we are running Ondat here, we kind of removing that limitation, because you can run your volume on any node, regardless of where you put the pod consuming that volume is. So basically, we help you run Tekton, which is great. So here, it's done. So the manifest has been updated by my last task here. So if I go back into my repository, here, I should see an update 31 seconds ago. 
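For reference, the manual trigger described here is just another Kubernetes manifest, a Tekton PipelineRun; a rough sketch with hypothetical pipeline and workspace names:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: marvel-app-run-         # hypothetical name prefix
spec:
  pipelineRef:
    name: build-and-update-manifest     # hypothetical Pipeline name
  workspaces:
    - name: shared-workspace
      persistentVolumeClaim:
        claimName: tekton-workspace-pvc   # hypothetical PVC backing the shared workspace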
So now what has changed basically, here, I kept the same infrastructure environment, the only thing that's changed is the image ID here. So now I should see flux that is trying to reconcile. And here reconciliation there applied version. So it should be already there. If we go back to the production cluster, you should see now 23 seconds, right 25 seconds. So effectively, it's redeployed the whole application within production with the production configuration, right? So the last step is to check this now. I'm going to run again, the application but this time from the production perspective. So let's do this. I think that's on the screen here. So that's the moment of truth. Did it work? Okay, it works. So now I've changed my production environment from commic to commics with an S. And that concludes our demo. So just to conclude, I hope you enjoyed this demo. And the goal was really to show you that actually developing an application from your laptop to production only take what 10 minutes. That's about it, right? You just have to learn Kubernetes. So thank you for watching, and I'll see you next time. Bye.

By Sylvain Kalache
Securing Your Containers—Top 3 Challenges

Containers, a cost-effective and less complex alternative to virtual machines, have revolutionized the application delivery approach. They have dramatically reduced the IT labor and resources needed to manage application infrastructure. Yet, when securing containers and containerized ecosystems, software teams run into many roadblocks, especially enterprise teams accustomed to more traditional network security processes and strategies. We have usually preached that containers offer better security because they isolate the application from the host system and from each other. Somewhere along the way we have made it sound like they are inherently secure and almost impenetrable to threats. But how far-fetched is this idea? Let's dive right into it. First, a high-level view of what the market looks like: according to Business Wire, the global container security market is projected to reach $3.9 billion by 2027, a 23.5% CAGR. Much like any software, containerized applications can fall prey to security vulnerabilities, including bugs, inadequate authentication and authorization, and misconfiguration. Container Threat Model Let's understand the container threat model below: Source: Container Security by Liz Rice Possible Vulnerable Factors in a Container External attackers (from outside) trying to access one of our deployments (think of Tesla getting crypto-jacked). Internal attackers who have some level of access to the production environment (not necessarily admin). Malicious internal actors, i.e., privileged internal users like developers and administrators who have access to the deployment. Inadvertent internal actors who might accidentally cause problems, for example by carelessly storing a secret key or certificate in the container image. In addition, when introducing a new service or reducing waiting times to enhance the customer experience, companies tend to open ports on their servers or firewall. If not thoroughly guarded, these can become an entry point for attackers. There are several routes that can compromise your container security (and the threat model above pretty much summarizes them). Let's discuss some of these factors more precisely. Challenges 1. The Fault in Container Images It's not all about malicious software. Poorly configured container images can themselves introduce vulnerabilities. The problem begins when people think they can spin up their own image, or download one from the cloud, and start using it straight away. We should be aware that new vulnerabilities appear every day, so every container image needs to be scanned individually to prove it is secure. Some Known Cases To Avoid What if the image launches an extraneous daemon or service that allows unauthorized network access? What if it's configured with more user privileges than necessary? Another danger to look out for is a secret key or credential stored within the image. Note: Keep in mind that Docker will always give priority to its own network configuration over your local network settings. Recommendations Pull images from trusted container registries, and make sure those registries are not themselves poorly configured. Trusted registries are usually, though not necessarily, private registries; what matters is that they use encrypted, authenticated connections with credentials federated with existing network security controls. The container registry should also undergo frequent maintenance to keep it free of stale images with lingering vulnerabilities. 
Software teams also need to establish shared practices, such as blue-green deployments or rollbacks of container changes, before pushing images into production. 2. Watch Out for Your Orchestration Security Popular orchestration tools like Kubernetes cannot be overlooked when addressing security issues; the orchestrator has become a prime attack surface. According to Salt Security, approximately 34% of organizations have no API security strategy in place at all. Adding to that, 27% say they have just a basic strategy involving minimal scanning and manual reviews of API security status, with no controls over it. When Kubernetes is handling multiple containers, it is, in a way, exposing a large attack surface. Following industry-standard practices such as field-level tokenization is not enough if we are not also securing the orchestrator's ecosystem, because it's only a matter of time before sensitive information gets decoded and exposed. Recommendations Ensure the administrative interface of the orchestrator is properly protected, which can include two-factor authentication and at-rest encryption of data. Separate network traffic into discrete virtual networks. This segregation should be based on the sensitivity of the traffic being transmitted: for example, public-facing web apps can be categorized as low-sensitivity workloads and something like tax reporting software as high-sensitivity workloads, and the two should be kept apart. The idea is to ensure each host runs containers of a given security level. A best practice is end-to-end encryption of all network traffic between cluster nodes, along with authenticated network connections between cluster members. We should aim to introduce nodes securely into clusters, maintaining a persistent identity for each node throughout its lifecycle, and be able to isolate or remove compromised nodes without affecting the cluster's security. 3. Prevent the "Container Escape" Scenario Popular container runtimes such as containerd, CRI-O, and rkt have hardened their security policies over time; however, there is still the possibility that they contain bugs. This is important to consider because such bugs can allow malicious code running inside a container to "escape" onto the host. Back in 2019, a vulnerability known as Runscape was discovered in runC. This bug (CVE-2019-5736) had the potential to let attackers break out of the sandbox environment and gain root access to the host servers, compromising an entire infrastructure. At first, researchers assumed a malicious Docker image was the cause, since there had to be a malicious process inside; after further testing, they realized it was a bug in runC itself. Security Needs to Shift Left When dealing with a microservices-based environment, best practice is to bring in automated deployments at every step. We are not agile if we are still performing deployments manually on a weekly or monthly cadence. To actually shift left in application delivery, we need a modern toolchain of security plugins and extensions throughout the pipeline. This is how it works: if any vulnerability is present in the image, the process should stop right there in the build stage. Periodic audits should be conducted on RBAC to monitor all access levels, and all tools and processes should align with the CIS benchmarks. 
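As one way to realize that "stop the build" gate, an open-source scanner such as Trivy can be wired into the pipeline; a minimal sketch, with an illustrative image name:

# Fail the CI job if the image contains high or critical vulnerabilities
trivy image --exit-code 1 --severity HIGH,CRITICAL myorg/myapp:1.0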
A good approach is to adopt security-as-code practices and write security manifests as Kubernetes-native YAML files and custom resource definitions. These are human-readable and declare the security state of the application at runtime. They can then be pushed into the production environment and protected with a zero-trust model, so no changes are ever made to the configuration outside the pipeline. Wrapping Up It's time to wrap up our thoughts on containerization and container security. My goal was to highlight some easily addressed, yet often neglected, areas of containerization practice. Today, automated security processes across CI/CD pipelines and declarative zero-trust security policies are the need of the hour: they enhance developer productivity and are part of DevOps best practices.

By Komal J Prabhakar
Comparing Styles of Container-Based Deployment for IBM App Connect Enterprise

When deploying IBM App Connect Enterprise integrations in containers there are options around how we deploy the runtime, the configurations, the integration flows and environment specific data. All the movable parts can either be “included in the image,” “configured at deploy time,” or a mixture of the two. In this article, we will explore these different deployment options, the benefits of each, and considerations for taking each approach. What Changes When We Deploy IBM App Connect Enterprise in Containers? From IBM App Connect v11, the integration server can be deployed on its own, without the need for an integration node to look after it. This is most relevant in a container scenario where the container platform itself takes on the management of the individual integration servers. Additionally, all the IBM App Connect Enterprise configuration required is defined alongside the container providing a more portable deployment of integrations. Container based deployment has many notable differences to a traditional deployment, such as: Deployment isolation: In container deployments we see groups of integrations being independently deployed into their own container runtime as opposed to a more traditional singular centralized "broker" deployment. This allows for greater isolation, protecting each deployment from the other integrations. Product runtime versioning: Since the container image also includes the product runtime this provides an opportunity to, for example, run different versions of IBM App Connect Enterprise in different containers allowing each integration to independently choose when and how to upgrade. Previously in a shared server environment all integrations were running on the very same version of the product and had to be upgraded at the same time, with costly regression testing, and greater risk during the single upgrade. Memory and CPU: Container platforms enable explicit/declarative ways to specify how much CPU and memory each (IBM App Connect Enterprise) container requires, ensuring the specific needs of the integrations can be taken into account. Dedicated Operating System Resources: With VM/server-based IBM App Connect Enterprise deployments there are a few areas where resources just could not easily be dedicated and managed, meaning unrelated workloads could potentially impact or have adverse effects on each other. An example of this would be the integration node level listener or the maximum number of threads or processes on the VM/server. Load balancing: In a traditional VM/server-based deployment, load balancing across VM replicas has to be explicitly set up and maintained. In containers this is replaced with the dynamic routing which is automatically set up and managed by the container platform as containers are deployed. TLS termination: In containers this would potentially be handled and configured by the platform (i.e. outside of the container) which in traditional VM/server-based deployments the configuration would be in a configurable service. This can be managed through a service mesh like Istio, which you can read about in the O’Reilly book "Istio Explained:" https://www.ibm.com/downloads/cas/XWN1WV9Q Declarative Configuration: The majority of the configuration is now declaratively defined within a configuration file (server.conf.yaml) rather than requiring post deployment IBM App Connect Enterprise commands to be run on the node and server. 
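To make the declarative model concrete, here is a minimal sketch of sizing an integration server container explicitly and supplying a server.conf.yaml through a Kubernetes ConfigMap; the image name, mount path, and configuration keys are illustrative rather than taken from the official deployment artifacts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ace-server-conf
data:
  server.conf.yaml: |
    # example override; consult your server.conf.yaml for the full set of keys
    forceServerHTTPS: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ace-integration
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ace-integration
  template:
    metadata:
      labels:
        app: ace-integration
    spec:
      containers:
        - name: ace
          image: my-registry/ace-with-bar:1.0    # hypothetical custom image
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
          volumeMounts:
            - name: server-conf
              mountPath: /home/aceuser/ace-server/overrides   # illustrative path
      volumes:
        - name: server-conf
          configMap:
            name: ace-server-conf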
This is linked with container-based objects like Kubernetes ConfigMap that are read on start-up, ensuring that the integration server’s state is not changed at runtime. This means configuration is more portable and repeatable as the documented YAML defines the true state of a redeployed container. What Is a Container Image and What Should Be Included in It? A container image is a read-only template which is used to create instances of containers. Anything not included in the image can then be sourced during the creation of the container. It is not part of the deployed image, but will be available in the running container. Therefore, when deploying IBM App Connect Enterprise into container environments there is a decision to be made on how much of the configuration (including external libraries, drivers, etc.), flows, and environment settings should be included as part of the image being deployed. Figure 1: Shows what is included in each container image type Sometimes, the following terms are associated with the scale shown in Figure 1: "baked" (furthest right column), as in everything is baked into the image "fried" (further left column), where everything is configured as it is needed Most organizations will fall somewhere between these two extremes, but most commonly we see people using the image types C and D from Figure 1. All options represented in the diagram are valid ways of building and deploying integration containers. Generally, the decision as to which approach to take relates to secondary considerations such as image management, skills within a team, non-functional requirements (testing, security), and management of the live environment. Using Pre-Built Container Images IBM provides a pre-built container image for IBM App Connect Enterprise in the IBM Entitled Registry. This image contains pre-integrated database drivers and the MQ Client. Note that this pre-built image can only be used in conjunction with the App Connect Operator. On deployment of the image, you can specify the location of a remote BAR file and links to configuration information, and the container will draw those in during start up. Used in this way, it is therefore an example of Image type B in Figure 1. It is also possible to build new images based on the pre-built image, adding server configuration, a BAR file, or even environment configuration in order to create image types C, D, or E. Whilst the pre-built image covers most scenarios, there will always be circumstances where teams would prefer to create a completely custom image. A sample is provided on GitHub for this purpose. However, there are many advantages to using the pre-built image with the App Connect Operator, as described in the following articles: Part 1 - What is an Operator and why did we create one for IBM App Connect? Part 2 - Exploring the IntegrationServer resource of the IBM App Connect Operator Images and Hierarchies New images can be based on existing images. Figure 2: Basing images on other images Image 1 could be deployed, adding the BAR file and server configuration at deployment time. Alternatively, further images could be built by layering configuration (e.g. server.conf.yaml) and code (e.g. the BAR file) on top of the first image as in Image 2 in Figure 2. In this case we would only have to add the environment specifics at deployment time. We could go even further and create an image for each environment that the integration will be deployed to as in Image 3 in Figure 2. 
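A rough Dockerfile sketch of that layering, with illustrative image names and paths (the real locations depend on the conventions of the base image you start from):

# Image 2: layer the integration (BAR file) and server configuration onto a base image
FROM my-registry/ace-base:12.0
COPY MyIntegration.bar /home/aceuser/bars/
COPY server.conf.yaml /home/aceuser/ace-server/overrides/

# Image 3: a further Dockerfile would then start FROM the Image 2 tag and add only
# the environment-specific configuration, for example:
#   FROM my-registry/ace-integration:1.0
#   COPY dev/credentials.yaml /home/aceuser/initial-config/   # illustrative path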
In this case we could simply deploy the image and it would automatically contain all the environment specific information required. These images are hierarchical, relying on each other, meaning a level of management is needed. Part of this management is the responsibility of ownership of the images, from the template owner through to the application teams owning environment specific images and containers. Having a single container with everything deployed into it doesn’t mean just one image to manage. All three images shown in Figure 2 will need to be managed and consideration given when making changes, i.e. how a change in Image 2 might impact the top level of images (Image 3) with environment specific configurations. The illustration below provides an example of an image hierarchy where images have been created down to the level of each environment. Let’s look at the implications of this: Figure 3: Example image hierarchy where environment specific images have been created We can see that making a change to an upstream image (such as “Rest Flow”) potentially affects many downstream images and could cause a cascade of image updates to be required. On the one hand, this might be considered good for governance as it enforces that all images share a known base. On the flip side, small changes to images at the base of the hierarchy could result in sweeping changes, which detracts from the isolation and decoupling we’re trying to achieve in more fine-grained deployment. Image Template Configurations Certain groups of integrations will share common aspects. The IBM App Connect image from the IBM Cloud Container Registry provides a good base template containing database drivers and the MQ Client. Further templates can be built on top of that to include any additional elements and can be used as "Enterprise Templates" to be managed and maintained by the organization for all future integrations of a given type. Such elements found in Templates are well-suited to be “included in the image” rather than “configured at deploy time.” Examples might include the addition of database drivers, certain client libraries and agents, and maybe even standardized sub-flows for things such as error handling. The core App Connect Enterprise runtime configuration is detailed in the server configuration YAML file (server.conf.yaml). Certain settings in that server configuration can also be included as part of an image template to implement organization wide standards and defaults. Examples might include administrative role mapping, configuration for offloading logs to an external system, and more. Figure 4: Image Templating IBM App Connect Enterprise Specific Considerations This section looks at a few key considerations for IBM App Connect Enterprise when deciding what to include within the container image. IBM App Connect Enterprise is a rich runtime with a lot of additional configurations that can be deployed to describe the integration server, the container itself, the integration flow, and environment variables for each of these. This exacerbates the potential for environmental drift when configuring entirely at deploy time and the complexity of the build when configuring during the build of the image. Implementing integrations inherently means that there are more configurations compared to traditional application development. For example, for each integrated system there are connectivity settings, clients, drivers, credentials, etc. 
Furthermore, getting those configurations exactly right is critically important to the functioning of the integration application. In Figure 1, we see different approaches for how much of these different types of configurations can be included in the image. Choosing an image strategy more towards the left of Figure 1 can lead to a very large number of live configurations to manage when configuring them all at deploy time. As we move toward the right of Figure 1, we include more of these configurations within the boundaries of the image. This reduces the potential for environmental drift and enables testing during image build time to catch errors and issues before deployment into each environment. However, the more configuration you build into the image, the more images you will have to manage, to the point where in the furthest right column of Figure 1, you would have a different image for each application for each environment. Observations from the Field As organizations begin this journey, we are often asked how other customers are performing their own container deployments. All the approaches shown in Figure 1, from using the certified container and configuring everything on top at deploy time, through each of the five approaches to having environment specific images, are valid deployment styles. Some of the customers we have worked with have been concerned around environmental drift and as such have included as much configuration into the image as possible, up to and including the BAR file. These customers tend to configure environmental specifics at deploy time in a bid to reduce the number of images they need to manage. Other customers have gone a step further and created images for each environment, however we have not seen enough customers doing this to see this as a general trend. Given the number of styles available and the number of implementation decisions to be made, there is no reason why the deployment style cannot be refined as the organization better understands the implications. Thoughts on Choosing an Image Deployment Strategy As organizations transition more into container-based deployment models, there is a significant change in how we approach the build and deployment of IBM App Connect integrations. There is no "one size fits all" when deciding which style of container-based deployment to adopt. It is important to understand how these changes will impact your organization, its needs, and ways of working. You need to explore questions such as the following: Categorizing your integrations: How many common integration “patterns” are present? This may indicate how many different image templates you may need. Credential Management: How will credentials be managed and accessed either during build or deploy time? Tooling: Do you have the right tools and skills in your organization to implement the building and deploying of images, configurations, and related artifacts? How do you manage and access artifacts? Stay tuned to this series for some practical examples of including configuration in images and building image hierarchies. Acknowledgement and thanks to Carsten Boernert as co-author of this article and Kim Clark, Rob Convery, and Ben Thompson for providing valuable input.

By Aiden Gallagher
OpenShift Container Platform 4.11 Cluster Setup

Introduction
In this article, I will walk you through setting up an OpenShift Container Platform 4.11 cluster on Linux. I will then create a namespace and a deployment on the OCP cluster.

Prerequisites
Register and create a Red Hat account using this link: Red Hat User Registration Page. Once the account has been created, log on to the Red Hat developer portal using this link: OCP Cluster Page.

Minimum hardware requirements for OpenShift Container Platform:
4 physical CPU cores
9 GB of free RAM
35 GB of storage space

Minimum OS requirement: RHEL 7, 8, or 9. OpenShift Container Platform installation is supported on multiple other platforms and OS versions; here is the official Red Hat link for supported platforms: Min-Sys-Requirement.

Creating OCP Cluster
Once all the prerequisites are satisfied, click on OCP Cluster Page, then click on create cluster. Then select the OpenShift cluster type to create; Red Hat offers cloud, on-premises bare metal, and local options. For our use case, we are selecting local, which will create a minimal cluster on your desktop/laptop for local development and testing. As highlighted in yellow in the above image, based on your local desktop/laptop operating system, select the supported operating system from Linux, Windows, or macOS. For this article, we are selecting Linux, since my local operating system is RHEL 9. After selecting the OS, click on “Download OpenShift Local,” which will download a package with the file name crc-linux-amd64.tar.xz. Then download or copy your pull secret; you'll be prompted for this information during installation. Copy the OpenShift Local archive for your operating system from the download directory to the target path (/opt/oclab) and extract it, placing the binary in your $PATH or in a directory such as /opt/oclab, as shown below.

Shell
sudo mkdir -p /opt/oclab
sudo chown -R ocadmin:ocadmin /opt/oclab
sudo su - ocadmin
cd /opt/oclab
cp -p /tmp/crc-linux-amd64.tar.xz .
tar -xvf crc-linux-amd64.tar.xz

The output will be as shown below. Extract the CRC package under /opt/oclab, then execute “crc setup” to set up your host operating system for the OpenShift Local virtual machine by running the commands below.

Shell
cd /opt/oclab
cd crc-linux-2.12.0-amd64/
./crc setup

The output will be as shown below. Then, run crc start to create a minimal OpenShift 4 cluster on your computer.

Shell
cd /home/ocadmin/.crc/bin
./crc start

During crc start execution, we can see it provisioning a CRC VM with the required configuration, creating a CRC VM for an OpenShift 4.11.18 cluster with the following operators: DNS, image-registry, network, openshift-controller-manager, and operator-lifecycle-manager-packageserver. Once this completes, the installation is finished, and crc-admin and crc-developer contexts are added to the kubeconfig for the OpenShift cluster.

Shell
echo "export PATH=$PATH:<crc-extracted-path>/.crc/cache/crc_libvirt_4.11.18_amd64/" >>.bash_profile
Example: echo "export PATH=$PATH:/home/ocadmin/.crc/cache/crc_libvirt_4.11.18_amd64/" >>.bash_profile

Once crc has started up, it will display the kubeadmin and developer credentials, along with the web GUI URL and an oc CLI command to log on to the OCP cluster. As mentioned above, the OpenShift cluster has started and is accessible through the oc command line and the web console.

OCP WEB-GUI URL

Log in to the OCP cluster master node and run the following commands to get the number of nodes, create a project, and schedule the deployment.
Shell
oc get nodes
oc create namespace cgt-app-ui
oc create deployment app-ui --image=nginx:latest --replicas=1 -n cgt-app-ui

The output for the above command execution will be as shown below.

Summary
Red Hat OpenShift Local is a great way to quickly spin up an OpenShift cluster for development purposes. It uses a single node that behaves as both a control plane and a worker node. For example, when migrating applications to microservices on containers, these containers can be orchestrated by Red Hat OpenShift.
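As a quick check on the example above (a minimal sketch; the namespace and deployment name follow the commands shown earlier, and the exact output will vary per cluster), the rollout can be verified with:

Shell
oc get pods -n cgt-app-ui
oc rollout status deployment app-ui -n cgt-app-ui
oc get deployment app-ui -n cgt-app-ui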

By Vamsi Kiran Naidu
Architectural Patterns for Microservices With Kubernetes

This is an article from DZone's 2022 Kubernetes in the Enterprise Trend Report.For more: Read the Report For some time, microservices have drawn interest across the architecture and software engineering landscape, and now, applications comprised of microservices have become commonplace. So what exactly is the definition of a microservice? That is somewhat of a loaded question as there is plenty of debate on granularity, segmentation, and what designates a microservice. For the purposes of this discussion, a microservices-based architecture is segmenting an application's units of work into discrete, interoperable components. This is a broad definition, but it is workable in that it identifies two foundational microservice concepts: discrete and interoperable. Along with the technical and business benefits, a microservices-based application architecture brings its own set of challenges. These challenges have been met with solutions ranging from new architectural patterns to the evolution of tech stacks themselves. Kubernetes has become one of the technologies in the tech stack evolution. Deploying microservices using Kubernetes enhances and enforces key principles and patterns while offering additional benefits. It's an Evolution, Not a Revolution As with any technical evolution, taking the next step improves upon what has already been shown to be successful while removing barriers to adoption or execution. Kubernetes is not going to address all microservices challenges, but it does address several pain points. Best Practices Remain In many cases, the development and packaging of microservices destined for Kubernetes deployment is no different than a non-Kubernetes deployment. Non-Kubernetes deployments include bare metal servers, virtual machines, and containerized applications. Applications already packaged for containerized deployment make the step to adopt Kubernetes-managed microservices straightforward. All key microservices patterns, development, and deployment best practices are applied. Application tech stacks and components are unchanged. Continuous integration/continuous delivery (deployment) systems remain intact. Operating system platforms and versions can be tightly controlled. Differences The differences between Kubernetes and non-Kubernetes microservices architectures focus less on the task performed by the microservices and more on the deployment of non-functional requirements. Satisfying non-functional requirements is not a new concept introduced by Kubernetes or even by a microservices architecture. However, through a combination of leveraging the services offered by Kubernetes itself as well as defining cross-cutting application support services, supporting many nonfunctional requirements becomes transparent to an application. The following are two examples. Kubernetes Ingress A Kubernetes Ingress is an example of a configurable service that auto-configures external access to microservices. When a microservice is deployed, it can define whether and how it is to be externally accessed. If a microservice specifies that it is to be externally accessible, the Ingress services within the Kubernetes cluster automatically configure external access, including details such as virtual host definitions and SSL certificates. Figure 1: An Ingress definition supporting two services Here, a Kubernetes Ingress accepts HTTP(S) requests external to the Kubernetes cluster and, based on the request path, routes requests to specific services within the cluster. 
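To make Figure 1 concrete, a minimal Ingress manifest of this kind might look like the following sketch; the host name and paths are illustrative assumptions, and the two backend services borrow the greensvc and bluesvc names used later in this article:

YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /green          # requests under /green route to greensvc
            pathType: Prefix
            backend:
              service:
                name: greensvc
                port:
                  number: 80
          - path: /blue           # requests under /blue route to bluesvc
            pathType: Prefix
            backend:
              service:
                name: bluesvc
                port:
                  number: 80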
Operators Kubernetes Operators are a Cloud Native Computing Foundation (CNCF) specification outlining a pattern that supports cross-cutting application services. They behave similarly to a Kubernetes Ingress in that a service is auto-configured based on application specification. The primary difference is that Kubernetes Operators present an abstraction where any type of service is automatically configured to extend the behavior of a Kubernetes cluster. There are Kubernetes Operators that connect applications to logging and metrics systems with the application knowing little of the specifics regarding those systems' implementation. There are also Kubernetes Operators that will build and deploy complete database instances. Figure 2: Kubernetes Operator flow In the diagram above, an application requests that a service be made available for its use. The Kubernetes Operator monitors and watches for requests. When a request is made, the Kubernetes Operator instructs the Kubernetes cluster to deploy or configure a cross-cutting service specific to the application's request. Abstractions Kubernetes provides and supports abstractions over many systems required to satisfy non-functional components. Successful Kubernetes microservices architectures are comprehensive beyond application architecture, considering a strategy to not only address interoperability across microservices but coordination with common services. Applying Kubernetes Constructs to a Microservices Architecture Kubernetes deploys container-based applications; this implies that an artifact of a microservice build and packaging process is a Docker (or suitable alternative) image. In Kubernetes, the basic deployment unit for an image is a Pod. Often there is a one-to-one relationship between a deployed image and a Pod. However, Kubernetes Pods can support multiple deployed images within a single Pod. While the deployed containers do not share a file system, they can reference each other using localhost. Within a Kubernetes cluster, deployed Pods can provide their services to other Pods. This is like a deployed microservice on bare metal or a virtual machine, although this deployment doesn't provide access to the Pod's service from an external resource, high availability, or scalability. As discussed, Kubernetes helps applications meet non-functional requirements. A general rule of thumb is when "-ility" is used to describe a function, it often means a non-functional requirement. Using high availability and scalability as examples, Kubernetes provides these with relative ease. There are a few Kubernetes constructs that support these functions. Two are presented here: Services and Deployments. Kubernetes provides a construct called a Service. A Kubernetes Service specifies ports that a microservice wishes to expose and how they are to be exposed. Services provide two powerful features. First, a Kubernetes Service integrates with the internal Kubernetes DNS service to provide a consistent hostname by which the microservices are accessed within the Kubernetes cluster. In addition, if there are multiple instances of the same microservice Pod, a Kubernetes Service can act as a load balancer across the Pod instances, providing high availability. While Pod instances can be individually deployed, manually monitoring their status is impractical. A common pattern for adding automation to Pod "-ilities" is Kubernetes Deployments. 
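Before moving on to Deployments, the Service construct described above can be sketched with a minimal manifest like the one below; the name, selector label, and target port are illustrative assumptions (port 80 mirrors the greensvc example discussed later in this article):

YAML
apiVersion: v1
kind: Service
metadata:
  name: greensvc
spec:
  selector:
    app: greensvc        # matches the labels on the microservice's Pods
  ports:
    - port: 80           # port other Pods in the cluster use to reach the service
      targetPort: 8080   # port the application container actually listens on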
Kubernetes Deployments specify details surrounding Pod definitions and provide several features that support the production deployment of microservices, including: The number of replicas to be maintained Updating the state of declared Pods Rollback to earlier versions Scaling up or down the number of Pods With Pod, Service, and Deployment definitions, a solid microservices architecture is in place. In this microcosm, one piece remains — that is, auto-scaling. With Deployments, scalability is available, but like direct Pod deployments, they are manually controlled. The final component to this architectural pattern is using a HorizontalPodAutoscaler to automatically scale the number of Pod instances based on certain criteria (e.g., CPU usage). This example demonstrates how Kubernetes can take any containerized microservice and, using Kubernetes constructs, satisfy the critical non-functional requirements that most applications require. Assembling the patterns discussed here, the following diagram presents a high-level visual of a Kubernetes microservices deployment pattern: Figure 3: Putting it all together The diagram portrays two microservices, "greensvc" and "bluesvc." Each microservice utilizes a Kubernetes Service to expose its functionality. In addition to providing high availability by load balancing multiple Kubernetes Pods per microservice, the Kubernetes Service maps expose Pod ports to port 80. The definition of a Kubernetes Service also creates DNS entries internal to the Kubernetes cluster (greensvc.ns.cluster.local and bluesvc.ns.cluster.local) that can allow microservices to interoperate. Both microservices are exposed outside the Kubernetes cluster through a Kubernetes Ingress. The configured ingress routes incoming requests to their respective services. Microservices Deployment Patterns Kubernetes provides many constructs and abstractions to support service and application Deployment. While applications differ, there are foundational concepts that help drive a well-defined microservices deployment strategy. Well-designed microservices deployment patterns play into an often-overlooked Kubernetes strength. Kubernetes is independent of runtime environments. Runtime environments include Kubernetes clusters running on cloud providers, in-house, bare metal, virtual machines, and developer workstations. When Kubernetes Deployments are designed properly, deploying to each of these and other environments is accomplished with the same exact configuration. In grasping the platform independence offered by Kubernetes, developing and testing the deployment of microservices can begin with the development team and evolve through to production. Each iteration contributes to the overall deployment pattern. A production deployment definition is no different than a developer's workstation configuration. This pattern provides a level of validation that is difficult to reproduce in any previous pattern and can lead to rapid maturity of an application's delivery cycle. The Kubernetes ecosystem offers tools that support these patterns. The most predominant tool is Helm, which orchestrates the definition, installation, and upgrade of Kubernetes applications. It's through tools such as Helm that the same deployment definition can be executed across multiple runtime environments by simply supplying a set of parameters specific to a runtime environment. 
These parameters don't change the deployment pattern; rather, they configure the deployment pattern to meet the runtime environment (e.g., configuring the amount of memory to allocate to a process). To learn more about Helm charts, check out the article, "Advanced Guide to Helm Charts for Package Management in Kubernetes." Microservices Deployment in Kubernetes Makes Sense Deploying microservices in Kubernetes is an evolution of microservices architectures. Kubernetes addresses many pain points and challenges in developing and deploying microservices-based applications. Being an evolution implies that it's not a revolution. It's not a rewrite. When designing microservices, in many ways, Kubernetes addresses the question that needs to be answered. Rather than waiting, good Kubernetes design and deployment patterns encourage tackling non-functional requirements early in the development process, leading to an application that will mature much faster. Whether it's Kubernetes or a different deployment platform, the same issues that need to be considered will need to be addressed upfront or later. In software engineering, it's almost always best to consider issues upfront. Kubernetes directly helps in addressing many microservices architectures and deployment challenges. This is an article from DZone's 2022 Kubernetes in the Enterprise Trend Report.For more: Read the Report

By Ray Elenteny CORE
Advancements in Cloud-Native and Kubernetes Observability

This is an article from DZone's 2022 Kubernetes in the Enterprise Trend Report. For more: Read the Report In today's world, it's more important than ever to have visibility into your system's performance and health. Modern applications rely on complex microservices architectures and cloud-native technologies, like Kubernetes. Observability helps us understand not just application behavior, but also infrastructure configuration changes and dependencies, as they happen in real-time. What Is Cloud-Native Observability? Cloud-native observability is the ability to understand the health and status of a system based on the external data exposed by its elements, like containers, microservices, Kubernetes, and serverless functions. Cloud-native observability is built on three pillars: monitoring, logging, and tracing. By understanding these pillars and their implications, you can begin to understand why observability is essential in modern DevOps practices. Figure 1: The three pillars of observability Monitoring, or metrics, measures the health and performance of applications and their infrastructure — these are quantifiable measurements. Metrics provide real-time alerts of resource statuses and are critical for collecting insights into how fast your application responds and/or detecting early indicators of performance problems. Another pillar of observability is logging. Logging captures detailed error messages as well as stack traces. Logs are records of events, warnings, and faults that occur inside a system. They are a great source of visibility as they contain information like the time an event took place and who or what endpoint was associated with the event. By combining logs and traces, you can get the full context of a system's availability. Tracing helps to investigate an issue or performance bottleneck in a containerized or microservices-based ecosystem by collecting data of the entire journey of an application request that moves through all the layers and nodes of that system. Why Is Observability Important? Observability is very important for developers and DevOps engineers because it provides the necessary information needed to track an entire system's performance and health, troubleshoot and diagnose issues quickly, and make informed decisions about the next steps needed to fix a problem. Observability in cloud-native environments, such as Kubernetes, can help to understand what's going on within the clusters, so you can optimize compute power without compromising performance. Another great use case where observability helps is cloud cost management. To avoid unnecessary compute costs, you need to monitor clusters and understand your application's resource usage and needs. Security issues and vulnerabilities can be detected quickly with good monitoring and observability tools in place. Common Challenges in Observability and Monitoring Kubernetes is one of the most popular open-source container orchestration platforms because of its versatility, reliability, and ability to provide abstraction among cloud providers. However, monitoring Kubernetes can be quite a challenging task for teams that require a diverse set of monitoring tools and solutions. High Quantity of Components and Metrics Working with Kubernetes brings its own set of challenges; it has more components than traditional infrastructures, like clusters, nodes, Pods, namespaces, Services, and more. 
These components produce their own metrics, which can be really helpful but also overwhelming if you don't know how to understand them. The Complexity of Microservices Modern applications can include numerous microservices, and Kubernetes needs to handle their availability and scalability by keeping track of all of them. Each Service can be distributed across multiple instances, forcing containers to move across your infrastructure as needed. Additionally, microservices need to be in constant communication with each other, which is also done inside the Kubernetes cluster. Dynamic Containers Depending on the organization, the container's life can be very short (from 10 seconds to 5 minutes). This creates something known as "pod churn" — the lifecycle through which new containers and Pods are created, used, destroyed, and recreated. Kubernetes manages these elements to make sure that there are available resources at any time and that they are allocated where needed. Figure 2: Layers of Kubernetes infrastructure With every new deployment, Kubernetes decides to move, recreate, or destroy Pods as necessary. If there is a downscaling need for the application, Pods or nodes can disappear forever. This means that once a container dies, all of the information inside is gone. Advancements in Kubernetes Observability Now that Kubernetes is more popular than ever, observability tools have also become more sophisticated. Some of the key advancements in Kubernetes observability include real-time monitoring, performance analytics, and application visibility. Emergence of Observability Tools In the traditional development lifecycle, monitoring and observability haven't been exactly part of the whole process. But, now as cloud-native systems grow, and infrastructures become more complex, observability is becoming the focus for organizations. The need to maintain system stability, reliability, and performance is what enables the growth and maturity of observability and analysis tools. In the past few years, cloud-native observability tools have made huge advancements. Many different tools have emerged for metrics monitoring, collecting data, and analyzing logs and traces from individual parts of your cluster. Achieving Cluster Visibility The need to aggregate and relate all of this observability data from different sources and get a holistic view of your entire system is much bigger now. This is why different open-source tools can be easily integrated with each other, to create complete visibility for developers. Cluster visibility tools can aggregate metrics from different microservices and applications in a data center, give insights into performance during deployments, understand how active services are behaving across the data center, pinpoint application-level issues, and provide real-time alerts to administrators. To achieve this goal, most of these tools offer native integration with monitoring systems so that you can get notified if any service goes down unexpectedly or is experiencing high load. Additionally, many of these products also have sophisticated analytics capabilities that allow analysts to drill down and understand what is happening at a microservices or application level. A Unified Observability Approach In order to achieve an even deeper level of observability, every unit of data needs to be contextualized. So every metric, trace, and log captured needs to have the complete context of what is happening in the system. 
Additionally, developers need to be able to monitor the whole application behavior, from the start of the delivery pipeline through the deployment. Tools that integrate seamlessly into the application environment are crucial to automate this whole observability journey. This unified observability approach has an exciting promise: to provide a deeper correlation between the three pillars and help teams define and track metrics that are important for their business requirements. The state of Kubernetes observability is constantly evolving, so it's important to stay up to date on the latest trends. This includes learning about the different types of monitoring tools that are available and choosing the best one for your needs.

Popular Tools for Cloud-Native Observability

In the last few years, the implementation of observability has become more accessible, as there are many open-source tools available that help developers introduce different aspects of observability into their systems. Let's look at the most popular ones:

POPULAR CLOUD-NATIVE OBSERVABILITY TOOLS

Prometheus
Description: The most adopted open-source observability tool for event monitoring and alerting
Advantages: Real-time metrics and data collection; Grafana integration; visualization for containerized applications
Disadvantages: Doesn't have built-in long-term storage; rudimental anomaly detection; handles only metrics, not logs or traces; has challenges with horizontal scaling

OpenTelemetry
Description: This project standardizes how you collect and send telemetry data to a backend platform, such as Prometheus or Jaeger
Advantages: Expands the fourth pillar called "profiling" to help better understand performance
Disadvantages: Doesn't provide backend storage; doesn't have a visualization layer

Kubernetes Dashboard
Description: The Kubernetes Dashboard can be used to visually track important info, like containers and Pods running in clusters
Advantages: Metrics about apps and app deployments running in the Pods, as well as resource utilization
Disadvantages: Doesn't provide a storage solution

Jaeger
Description: A popular choice for Kubernetes tracing; its main purpose is to monitor microservice-based systems and collect data in storage back ends of your choice
Advantages: Makes debugging easier by using tracing to analyze root causes and monitor distributed transactions
Disadvantages: Offers limited back-end integration

cAdvisor
Description: A Kubernetes built-in monitoring feature that exists by default on every node
Advantages: Collects data on a node level, such as resource usage and performance characteristics, and displays them on a web-based interface
Disadvantages: Doesn't offer rich functionality, as it collects only basic utilization metrics; doesn't offer any long-term storage; no deep analysis or trending metrics; handles only metrics, not logs or traces

Conclusion

Cloud-native observability has emerged as an important part of managing Kubernetes deployments in the cloud. By providing visibility into the performance of applications running on Kubernetes, cloud-native observability has helped to improve system performance, reliability, and security. With the advancements in open-source tools, we are lowering the barriers to implementing observability for organizations. Current offerings and standards will mature and become more established by focusing on the outcomes rather than technical implementation. Additionally, the trend of moving applications to Kubernetes is likely to increase the demand for observability solutions in the future. Kubernetes helps developers ship their apps to the cloud in a containerized manner, which is now deeply connected to the cloud-native way of working. 
More than just a technology, Kubernetes becomes a business decision that drives the value for the company that incorporates it, because it means that you develop and deliver apps in a cloud-native manner, making you ready for innovation. Kubernetes is most powerful when properly managed and monitored. That's why it's important to think about observability from the start. Implementing the right tools and standards for cloud-native observability can save your business valuable time and resources because it will help the organization detect incidents and resolve them in a timely manner. This is an article from DZone's 2022 Kubernetes in the Enterprise Trend Report.For more: Read the Report

By Marija Naumovska
Microservices Orchestration

This is an article from DZone's 2022 Microservices and Containerization Trend Report.For more: Read the Report Does your organization use a microservices-style architecture to implement its business functionality? What approaches to microservices communication and orchestration do you use? Microservices have been a fairly dominant application architecture for the last few years and are usually coupled with the adoption of a cloud platform (e.g., containers, Kubernetes, FaaS, ephemeral cloud services). Communication patterns between these types of services vary quite a bit. Microservices architectures stress independence and the ability to change frequently, but these services often need to share data and initiate complex interactions between themselves to accomplish their functionality. In this article, we'll take a look at patterns and strategies for microservices communication. Problems in the Network Communicating over the network introduces reliability concerns. Packets can be dropped, delayed, or duplicated, and all of this can contribute to misbehaving and unreliable service-to-service communication. In the most basic case — service A opening a connection to service B — we put a lot of trust in the application libraries and the network itself to open a connection and send a request to the target service (service B in this case). Figure 1: Simple example of service A calling service B But what happens if that connection takes too long to open? What if that connection times out and cannot be open? What if that connection succeeds but then later gets shut down after processing a request, but before a response? We need a way to quickly detect connection or request issues and decide what to do. Maybe if service A cannot communicate with service B, there is some reasonable fallback (e.g., return an error message, static response, respond with a cached value). Figure 2: More complicated example of calling multiple services In a slightly more complicated case, service A might need to call service B, retrieve some values from the service B response, and use it to call service C. If the call to service B succeeds but the call to service C fails, the fallback option may be a bit more complicated. Maybe we can fall back to a predefined response, retry the request, pull from a cache based on some of the data from the service B response, or maybe call a different service? Problems within the network that cause connection or request failures can, and do, happen intermittently and must be dealt with by the applications. These problems become more likely and more complicated with the more service calls orchestrated from a given service, as is seen in Figure 3. Figure 3: Trying to orchestrate multiple service calls across read/write APIs These problems become even more troubling when these calls between services aren't just "read" calls. For example, if service A calls service B, which performs some kind of data mutation that must be coordinated with the next call to service C (e.g., service A tells service B that customer Joe’s address is updated but must also tell service C to change the shipping because of the address change), then these failed calls are significant. This can result in inconsistent data and an inconsistent state across services. Network errors like this impact resilience, data consistency, and likely service-level objectives (SLOs) and service-level agreements (SLAs). 
We need a way to deal with these network issues while taking into account other issues that crop up when trying to account for failures. Helpful Network Resilience Patterns Building APIs and services to be resilient to network unreliability is not always easy. Services (including the frameworks and libraries used to build a service) can fail due to the network in sometimes unpredictable ways. A few patterns that go a long way to building resilient service communication are presented here but are certainly not the only ones. These three patterns can be used as needed or together to improve communication reliability (but each has its own drawbacks): Retry/backoff retry – if a call fails, send the request again, possibly waiting a period of time before trying again Idempotent request handling – the ability to handle a request multiple times with the same result (can involve de-duplication handling for write operations) Asynchronous request handling – eliminating the temporal coupling between two services for request passing to succeed Let’s take a closer look at each of these patterns. Retries With Backoff Handling Network unreliability can strike at any time and if a request fails or a connection cannot be established, one of the easiest things to do is retry. Typically, we need some kind of bounded number of retries (e.g., "retry two times" vs. just retry indefinitely) and potentially a way to backoff the retries. With backoffs, we can stagger the time we spend between a call that fails and how long to retry. One quick note about retries: We cannot just retry forever, and we cannot configure every service to retry the same number of times. Retries can contribute negatively to "retry storm" events where services are degrading and the calling services are retrying so much that it puts pressure on, and eventually takes down, a degraded service (or keeps it down as it tries to come back up). A starting point could be to use a small, fixed number of retries (say, two) higher up in a call chain and don’t retry the deeper you get into a call chain. Idempotent Request Handling Idempotent request handling is implemented on the service provider for services that make changes to data based on an incoming request. A simple example would be a counter service that keeps a running total count and increments the count based on requests that come in. For example, a request may come in with the value "5," and the counter service would increment the current count by 5. But what if the service processes the request (increments of 5), but somehow the response back to the client gets lost (network drops the packets, connection fails, etc.)? The client may retry the request, but this would then increment the count by 5 again, and this may not be the desired state. What we want is the service to know that it’s seen a particular request already and either disregard it or apply a "no-op." If a service is built to handle requests idempotently, a client can feel safe to retry the failed request with the service able to filter out those duplicate requests. Asynchronous Request Handling For the service interactions in the previous examples, we've assumed some kind of request/response interaction, but we can alleviate some of the pains of the network by relying on some kind of queue or log mechanism that persists a message in flight and delivers it to consumers. In this model, we remove the temporal aspect of both a sender and a receiver of a request being available at the same time. 
We can trust the message log or queue to persist and deliver the message at some point in the future. Retry and idempotent request handling are also applicable in the asynchronous scenario. If a message consumer can correctly apply changes that may occur in an "at-least once delivery" guarantee, then we don't need more complicated transaction coordination. Essential Tools and Considerations for Service-to-Service Communication To build resilience into service-to-service communication, teams may rely on additional platform infrastructure, for example, an asynchronous message log like Kafka or a microservices resilience framework like Istio service mesh. Handling tasks like retries, circuit breaking, and timeouts can be configured and enforced transparently to an application with a service mesh. Since you can externally control and configure the behavior, these behaviors can be applied to any/all of your applications — regardless of the programming language they've been written in. Additionally, changes can be made quickly to these resilience policies without forcing code changes. Another tool that helps with service orchestration in a microservices architecture is a GraphQL engine. GraphQL engines allow teams to fan out and orchestrate service calls across multiple services while taking care of authentication, authorization, caching, and other access mechanisms. GraphQL also allows teams to focus more on the data elements of a particular client or service call. GraphQL started primarily for presentation layer clients (web, mobile, etc.) but is being used more and more in service-to-service API calls as well. Figure 4: Orchestrating service calls across multiple services with a GraphQL engine GraphQL can also be combined with API Gateway technology or even service mesh technology, as described above. Combining these provides a common and consistent resilience policy layer — regardless of what protocols are being used to communicate between services (REST, gRPC, GraphQL, etc.). Conclusion Most teams expect a cloud infrastructure and microservices architecture to deliver big promises around service delivery and scale. We can set up CI/CD, container platforms, and a strong service architecture, but if we don’t account for runtime microservices orchestration and the resilience challenges that come with it, then microservices are really just an overly complex deployment architecture with all of the drawbacks and none of the benefits. If you’re going down a microservices journey (or already well down the path), make sure service communication, orchestration, security, and observability are at front of mind and consistently implemented across your services. This is an article from DZone's 2022 Microservices and Containerization Trend Report.For more: Read the Report

By Christian Posta

Top Containers Experts


Yitaek Hwang

Software Engineer,
NYDIG


Abhishek Gupta

Principal Developer Advocate,
AWS

I mostly work on open-source technologies including distributed data systems, Kubernetes and Go

Alan Hohn

Director, Software Strategy,
Lockheed Martin

I'm the author of The Book of Kubernetes, published in 2022 by No Starch Press. I've worked for over 25 years as a software developer, lead, architect, and manager. I've delivered real applications to production in Ada, Java, Python, and Go, amongst others, and have led multiple software teams in modernization efforts, incorporating cloud, microservice architecture, and containerization on complex programs. I'm an Agile and DevSecOps coach and an experienced trainer for Java, Ansible, containers, software architecture, and Kubernetes.

Marija Naumovska

Co-founder and Technical Product Manager,
Microtica


The Latest Containers Topics

Real-Time Stream Processing With Hazelcast and StreamNative
In this article, readers will learn about real-time stream processing with Hazelcast and StreamNative in a shorter time, along with demonstrations and code.
January 27, 2023
by Timothy Spann
· 1,897 Views · 2 Likes
Cloud Native London Meetup: 3 Pitfalls Everyone Should Avoid With Cloud Data
Explore this session from Cloud Native London that highlights top lessons learned as developers transitioned their data needs into cloud-native environments.
January 27, 2023
by Eric D. Schabell CORE
· 1,399 Views · 3 Likes
DevOps Roadmap for 2022
[Originally published February 2022] In this post, I will share some notes from my mentoring session that can help you - a DevOps engineer or platform engineer, learn where to focus.
January 26, 2023
by Anjul Sahu
· 17,974 Views · 6 Likes
Key Considerations When Implementing Virtual Kubernetes Clusters
In this article, readers will receive key considerations to examine when implementing virutal Kubernetes clusters, along with essential questions and answers.
January 25, 2023
by Hanumantha (Hemanth) Kavuluru
· 3,107 Views · 3 Likes
How Do the Docker Client and Docker Servers Work?
This article will help you deeply understand how Docker's client-server model works and give you more insights about the Docker system.
January 25, 2023
by Eugenia Kuzmenko
· 3,067 Views · 1 Like
The Future of Cloud Engineering Evolves
Central cloud engineering platform defines consistent workloads, architectures, and best practices.
January 25, 2023
by Tom Smith CORE
· 2,947 Views · 2 Likes
A Brief Overview of the Spring Cloud Framework
Readers will get an overview of the Spring Cloud framework, a list of its main packages, and their relation with the Microservice Architectural patterns.
January 25, 2023
by Mario Casari
· 4,879 Views · 1 Like
How To Check Docker Images for Vulnerabilities
Regularly checking for vulnerabilities in your pipeline is very important. One of the steps to execute is to perform a vulnerability scan of your Docker images. In this blog, you will learn how to perform the vulnerability scan, how to fix the vulnerabilities, and how to add it to your Jenkins pipeline. Enjoy! 1. Introduction In a previous blog from a few years ago, it was described how you could scan your Docker images for vulnerabilities. A follow-up blog showed how to add the scan to a Jenkins pipeline. However, Anchore Engine, which was used in the previous blogs, is not supported anymore. An alternative solution is available with grype, which is also provided by Anchore. In this blog, you will take a closer look at grype, how it works, how you can fix the issues, and how you can add it to your Jenkins pipeline. But first of all, why check for vulnerabilities? You have to stay up-to-date with the latest security fixes nowadays. Many security vulnerabilities are publicly available and therefore can be exploited quite easily. It is therefore a must-have to fix security vulnerabilities as fast as possible in order to minimize your attack surface. But how do you keep up with this? You are mainly focused on business and do not want to have a full-time job fixing security vulnerabilities. That is why it is important to scan your application and your Docker images automatically. Grype can help with scanning your Docker images. Grype will check operating system vulnerabilities but also language-specific packages such as Java JAR files for vulnerabilities and will report them. This way, you have a great tool that will automate the security checks for you. Do note that grype is not limited to scanning Docker images. It can also scan files and directories and can therefore be used for scanning your sources. In this blog, you will create a vulnerable Docker image containing a Spring Boot application. You will install and use grype in order to scan the image and fix the vulnerabilities. In the end, you will learn how to add the scan to your Jenkins pipeline. The sources used in this blog can be found on GitHub. 2. Prerequisites The prerequisites needed for this blog are: Basic Linux knowledge Basic Docker knowledge Basic Java and Spring Boot knowledge 3. Vulnerable Application Navigate to Spring Initializr and choose a Maven build, Java 17, Spring Boot 2.7.6, and the Spring Web dependency. This will not be a very vulnerable application because Spring already ensures that you use the latest Spring Boot version. Therefore, change the Spring Boot version to 2.7.0. The Spring Boot application can be built with the following command, which will create the jar file for you: Shell $ mvn clean verify You are going to scan a Docker image, therefore a Dockerfile needs to be created. You will use a very basic Dockerfile which just contains the minimum instructions needed to create the image. If you want to create production-ready Docker images, do read the posts Docker Best Practices and Spring Boot Docker Best Practices. Dockerfile FROM eclipse-temurin:17.0.1_12-jre-alpine WORKDIR /opt/app ARG JAR_FILE COPY target/${JAR_FILE} app.jar ENTRYPOINT ["java", "-jar", "app.jar"] At the time of writing, the latest eclipse-temurin base image for Java 17 is version 17.0.5_8. Again, use an older one in order to make it vulnerable. For building the Docker image, a fork of the dockerfile-maven-plugin of Spotify will be used. The following snippet is therefore added to the pom file. 
XML com.xenoamess.docker dockerfile-maven-plugin 1.4.25 mydeveloperplanet/mygrypeplanet ${project.version} ${project.build.finalName}.jar The advantage of using this plugin is that you can easily reuse the configuration. Creating the Docker image can be done by a single Maven command. Building the Docker image can be done by invoking the following command: Shell $ mvn dockerfile:build You are now all set up to get started with grype. 4. Installation Installation of grype can be done by executing the following script: Shell $ curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin Verify the installation by executing the following command: Shell $ grype version Application: grype Version: 0.54.0 Syft Version: v0.63.0 BuildDate: 2022-12-13T15:02:51Z GitCommit: 93499eec7e3ce2704755e9f51457181b06b519c5 GitDescription: v0.54.0 Platform: linux/amd64 GoVersion: go1.18.8 Compiler: gc Supported DB Schema: 5 5. Scan Image Scanning the Docker image is done by calling grype followed by docker:, indicating that you want to scan an image from the Docker daemon, the image, and the tag: Shell $ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT Application: grype Version: 0.54.0 Syft Version: v0.63.0 Vulnerability DB [updated] Loaded image Parsed image Cataloged packages [50 packages] Scanned image [42 vulnerabilities] NAME INSTALLED FIXED-IN TYPE VULNERABILITY SEVERITY busybox 1.34.1-r3 1.34.1-r5 apk CVE-2022-28391 High jackson-databind 2.13.3 java-archive CVE-2022-42003 High jackson-databind 2.13.3 java-archive CVE-2022-42004 High jackson-databind 2.13.3 2.13.4 java-archive GHSA-rgv9-q543-rqg4 High jackson-databind 2.13.3 2.13.4.1 java-archive GHSA-jjjh-jjxp-wpff High java 17.0.1+12 binary CVE-2022-21248 Low java 17.0.1+12 binary CVE-2022-21277 Medium java 17.0.1+12 binary CVE-2022-21282 Medium java 17.0.1+12 binary CVE-2022-21283 Medium java 17.0.1+12 binary CVE-2022-21291 Medium java 17.0.1+12 binary CVE-2022-21293 Medium java 17.0.1+12 binary CVE-2022-21294 Medium java 17.0.1+12 binary CVE-2022-21296 Medium java 17.0.1+12 binary CVE-2022-21299 Medium java 17.0.1+12 binary CVE-2022-21305 Medium java 17.0.1+12 binary CVE-2022-21340 Medium java 17.0.1+12 binary CVE-2022-21341 Medium java 17.0.1+12 binary CVE-2022-21360 Medium java 17.0.1+12 binary CVE-2022-21365 Medium java 17.0.1+12 binary CVE-2022-21366 Medium libcrypto1.1 1.1.1l-r7 apk CVE-2021-4160 Medium libcrypto1.1 1.1.1l-r7 1.1.1n-r0 apk CVE-2022-0778 High libcrypto1.1 1.1.1l-r7 1.1.1q-r0 apk CVE-2022-2097 Medium libretls 3.3.4-r2 3.3.4-r3 apk CVE-2022-0778 High libssl1.1 1.1.1l-r7 apk CVE-2021-4160 Medium libssl1.1 1.1.1l-r7 1.1.1n-r0 apk CVE-2022-0778 High libssl1.1 1.1.1l-r7 1.1.1q-r0 apk CVE-2022-2097 Medium snakeyaml 1.30 java-archive GHSA-mjmj-j48q-9wg2 High snakeyaml 1.30 1.31 java-archive GHSA-3mc7-4q67-w48m High snakeyaml 1.30 1.31 java-archive GHSA-98wm-3w3q-mw94 Medium snakeyaml 1.30 1.31 java-archive GHSA-c4r9-r8fh-9vj2 Medium snakeyaml 1.30 1.31 java-archive GHSA-hhhw-99gj-p3c3 Medium snakeyaml 1.30 1.32 java-archive GHSA-9w3m-gqgf-c4p9 Medium snakeyaml 1.30 1.32 java-archive GHSA-w37g-rhq8-7m4j Medium spring-core 5.3.20 java-archive CVE-2016-1000027 Critical ssl_client 1.34.1-r3 1.34.1-r5 apk CVE-2022-28391 High zlib 1.2.11-r3 1.2.12-r0 apk CVE-2018-25032 High zlib 1.2.11-r3 1.2.12-r2 apk CVE-2022-37434 Critical What does this output tell you? 
NAME: The name of the vulnerable package INSTALLED: Which version is installed FIXED-IN: In which version the vulnerability is fixed TYPE: The type of dependency, e.g., binary for the JDK, etc. VULNERABILITY: The identifier of the vulnerability; with this identifier, you are able to get more information about the vulnerability in the CVE database SEVERITY: Speaks for itself and can be negligible, low, medium, high, or critical. As you take a closer look at the output, you will notice that not every vulnerability has a confirmed fix. So what do you do in that case? Grype provides an option in order to show only the vulnerabilities with a confirmed fix. Adding the --only-fixed flag will do the trick. Shell $ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed Vulnerability DB [no update available] Loaded image Parsed image Cataloged packages [50 packages] Scanned image [42 vulnerabilities] NAME INSTALLED FIXED-IN TYPE VULNERABILITY SEVERITY busybox 1.34.1-r3 1.34.1-r5 apk CVE-2022-28391 High jackson-databind 2.13.3 2.13.4 java-archive GHSA-rgv9-q543-rqg4 High jackson-databind 2.13.3 2.13.4.1 java-archive GHSA-jjjh-jjxp-wpff High libcrypto1.1 1.1.1l-r7 1.1.1n-r0 apk CVE-2022-0778 High libcrypto1.1 1.1.1l-r7 1.1.1q-r0 apk CVE-2022-2097 Medium libretls 3.3.4-r2 3.3.4-r3 apk CVE-2022-0778 High libssl1.1 1.1.1l-r7 1.1.1n-r0 apk CVE-2022-0778 High libssl1.1 1.1.1l-r7 1.1.1q-r0 apk CVE-2022-2097 Medium snakeyaml 1.30 1.31 java-archive GHSA-3mc7-4q67-w48m High snakeyaml 1.30 1.31 java-archive GHSA-98wm-3w3q-mw94 Medium snakeyaml 1.30 1.31 java-archive GHSA-c4r9-r8fh-9vj2 Medium snakeyaml 1.30 1.31 java-archive GHSA-hhhw-99gj-p3c3 Medium snakeyaml 1.30 1.32 java-archive GHSA-9w3m-gqgf-c4p9 Medium snakeyaml 1.30 1.32 java-archive GHSA-w37g-rhq8-7m4j Medium ssl_client 1.34.1-r3 1.34.1-r5 apk CVE-2022-28391 High zlib 1.2.11-r3 1.2.12-r0 apk CVE-2018-25032 High zlib 1.2.11-r3 1.2.12-r2 apk CVE-2022-37434 Critical Note that the vulnerabilities for the Java JDK have disappeared, although there exists a more recent update for the Java 17 JDK. However, this might not be a big issue, because the other (non-java-archive) vulnerabilities show you that the base image is outdated. 6. Fix Vulnerabilities Fixing the vulnerabilities is quite easy in this case. First of all, you need to update the Docker base image. Change the first line in the Docker image: Dockerfile FROM eclipse-temurin:17.0.1_12-jre-alpine into: Dockerfile FROM eclipse-temurin:17.0.5_8-jre-alpine Build the image and run the scan again: Shell $ mvn dockerfile:build ... $ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed Vulnerability DB [no update available] Loaded image Parsed image Cataloged packages [62 packages] Scanned image [14 vulnerabilities] NAME INSTALLED FIXED-IN TYPE VULNERABILITY SEVERITY jackson-databind 2.13.3 2.13.4 java-archive GHSA-rgv9-q543-rqg4 High jackson-databind 2.13.3 2.13.4.1 java-archive GHSA-jjjh-jjxp-wpff High snakeyaml 1.30 1.31 java-archive GHSA-3mc7-4q67-w48m High snakeyaml 1.30 1.31 java-archive GHSA-98wm-3w3q-mw94 Medium snakeyaml 1.30 1.31 java-archive GHSA-c4r9-r8fh-9vj2 Medium snakeyaml 1.30 1.31 java-archive GHSA-hhhw-99gj-p3c3 Medium snakeyaml 1.30 1.32 java-archive GHSA-9w3m-gqgf-c4p9 Medium snakeyaml 1.30 1.32 java-archive GHSA-w37g-rhq8-7m4j Medium As you can see in the output, only the java-archive vulnerabilities are still present. The other vulnerabilities have been solved. Next, fix the Spring Boot dependency vulnerability. 
Next, fix the Spring Boot dependency vulnerability. Change the version of Spring Boot from 2.7.0 to 2.7.6 in the POM:
XML
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.6</version>
</parent>
Build the JAR file, build the Docker image, and run the scan again:
Shell
$ mvn clean verify
...
$ mvn dockerfile:build
...
$ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed
Vulnerability DB        [no update available]
Loaded image
Parsed image
Cataloged packages      [62 packages]
Scanned image           [10 vulnerabilities]
NAME              INSTALLED   FIXED-IN   TYPE          VULNERABILITY        SEVERITY
snakeyaml         1.30        1.31       java-archive  GHSA-3mc7-4q67-w48m  High
snakeyaml         1.30        1.31       java-archive  GHSA-98wm-3w3q-mw94  Medium
snakeyaml         1.30        1.31       java-archive  GHSA-c4r9-r8fh-9vj2  Medium
snakeyaml         1.30        1.31       java-archive  GHSA-hhhw-99gj-p3c3  Medium
snakeyaml         1.30        1.32       java-archive  GHSA-9w3m-gqgf-c4p9  Medium
snakeyaml         1.30        1.32       java-archive  GHSA-w37g-rhq8-7m4j  Medium
You got rid of the jackson-databind vulnerabilities, but not of the snakeyaml ones. So, in which dependency is snakeyaml 1.30 being used? You can find out with the dependency:tree Maven command. For brevity, only part of the output is shown here:
Shell
$ mvnd dependency:tree
...
com.mydeveloperplanet:mygrypeplanet:jar:0.0.1-SNAPSHOT
[INFO] +- org.springframework.boot:spring-boot-starter-web:jar:2.7.6:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter:jar:2.7.6:compile
[INFO] |  |  +- org.springframework.boot:spring-boot:jar:2.7.6:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.7.6:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.7.6:compile
[INFO] |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.11:compile
[INFO] |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.11:compile
[INFO] |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.17.2:compile
[INFO] |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.17.2:compile
[INFO] |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.36:compile
[INFO] |  |  +- jakarta.annotation:jakarta.annotation-api:jar:1.3.5:compile
[INFO] |  |  \- org.yaml:snakeyaml:jar:1.30:compile
...
The output shows that snakeyaml is a transitive dependency of the spring-boot-starter-web dependency. So, how do you solve this? Strictly speaking, Spring has to solve it, but if you do not want to wait for a solution, you can solve it yourself.
Solution 1: You do not need the dependency. This is the easiest fix and is low risk. Just exclude the dependency from the spring-boot-starter-web dependency in the POM:
XML
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Build the JAR file, build the Docker image, and run the scan again:
Shell
$ mvn clean verify
...
$ mvn dockerfile:build
...
$ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed
Vulnerability DB        [no update available]
Loaded image
Parsed image
Cataloged packages      [61 packages]
Scanned image           [3 vulnerabilities]
No vulnerabilities found
No vulnerabilities are found anymore.
Solution 2: You do need the dependency. You can override this transitive dependency by means of dependencyManagement in the POM. This is a bit trickier because the updated transitive dependency has not been tested with the spring-boot-starter-web dependency, so it is a trade-off whether you want to do this or not. Add the following section to the POM:
XML
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
            <version>1.32</version>
        </dependency>
    </dependencies>
</dependencyManagement>
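Whichever of the two solutions you choose, you can verify which snakeyaml version ends up on the classpath before building the image again. A small sketch using the standard Maven dependency plugin with an includes filter (nothing project-specific is assumed here):
Shell
# Show only the snakeyaml entries in the dependency tree
$ mvn dependency:tree -Dincludes=org.yaml:snakeyaml
With solution 1, snakeyaml should no longer appear at all; with solution 2, it should appear as version 1.32.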
Build the JAR file, build the Docker image, and run the scan again:
Shell
$ mvn clean verify
...
$ mvn dockerfile:build
...
$ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed
Vulnerability DB        [no update available]
Loaded image
Parsed image
Cataloged packages      [62 packages]
Scanned image           [3 vulnerabilities]
No vulnerabilities found
Again, no vulnerabilities are present anymore.
Solution 3: This is the solution when you cannot or do not want to change anything, or when it concerns a false positive. Create a .grype.yaml file in which you exclude the vulnerability with High severity and execute the scan with the --config flag followed by the path to the .grype.yaml file containing the exclusions (when the file is located in the directory where you run grype, it is picked up automatically, as in the example below). The .grype.yaml file looks as follows:
YAML
ignore:
  - vulnerability: GHSA-3mc7-4q67-w48m
Run the scan again:
Shell
$ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed
Vulnerability DB        [no update available]
Loaded image
Parsed image
Cataloged packages      [62 packages]
Scanned image           [10 vulnerabilities]
NAME              INSTALLED   FIXED-IN   TYPE          VULNERABILITY        SEVERITY
snakeyaml         1.30        1.31       java-archive  GHSA-98wm-3w3q-mw94  Medium
snakeyaml         1.30        1.31       java-archive  GHSA-c4r9-r8fh-9vj2  Medium
snakeyaml         1.30        1.31       java-archive  GHSA-hhhw-99gj-p3c3  Medium
snakeyaml         1.30        1.32       java-archive  GHSA-9w3m-gqgf-c4p9  Medium
snakeyaml         1.30        1.32       java-archive  GHSA-w37g-rhq8-7m4j  Medium
The High vulnerability is not shown anymore.
7. Continuous Integration
Now you know how to scan your Docker images manually. However, you probably want to scan the images as part of your continuous integration pipeline. In this section, a solution is provided when using Jenkins as the CI platform.
The first question to answer is how you will be notified when vulnerabilities are found. Up until now, you only noticed the vulnerabilities by looking at the standard output. This is not a solution for a CI pipeline: you want to be notified, and this can be done by failing the build. Grype has the --fail-on flag for this purpose. You probably do not want to fail the pipeline when a vulnerability with severity negligible has been found. Let's see what happens when you execute this manually. First of all, introduce the vulnerabilities again in the Spring Boot application and in the Docker image. Build the JAR file, build the Docker image, and run the scan with the --fail-on flag:
Shell
$ mvn clean verify
...
$ mvn dockerfile:build
...
$ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --only-fixed --fail-on high
...
1 error occurred:
    * discovered vulnerabilities at or above the severity threshold
Only the important part of the output is shown here. As you can see, at the end of the output, a message indicates that the scan has generated an error. This will cause your Jenkins pipeline to fail and, as a consequence, the developers are notified that something went wrong.
Several options exist for adding this to your Jenkins pipeline. Here, it is chosen to create the Docker image and execute the grype Docker scan from within Maven. There is no separate Maven plugin for grype, but you can use the exec-maven-plugin for this purpose. Add the following to the build-plugins section of the POM:
XML
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>3.1.0</version>
    <configuration>
        <executable>grype</executable>
        <arguments>
            <argument>docker:mydeveloperplanet/mygrypeplanet:${project.version}</argument>
            <argument>--scope</argument>
            <argument>all-layers</argument>
            <argument>--fail-on</argument>
            <argument>high</argument>
            <argument>--only-fixed</argument>
            <argument>-q</argument>
        </arguments>
    </configuration>
</plugin>
Two extra flags are added here:
--scope all-layers: This will scan all layers involved in the Docker image.
-q: This will use quiet logging and will show only the vulnerabilities and possible failures.
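The reason this fails the Maven build is grype's exit code: with --fail-on, grype exits with a non-zero status whenever a vulnerability at or above the chosen severity is found, and the exec-maven-plugin propagates that status. You can check this locally with a small sketch (the exact non-zero value may differ):
Shell
# Run the same command the plugin will execute and inspect the exit code
$ grype docker:mydeveloperplanet/mygrypeplanet:0.0.1-SNAPSHOT --scope all-layers --fail-on high --only-fixed -q
$ echo $?
# Non-zero when a High or Critical vulnerability with a fix is found; 0 otherwise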
You can invoke this with the following command:
Shell
$ mvnd exec:exec
You can add this to your Jenkinsfile inside the withMaven wrapper:
Plain Text
withMaven() {
    sh 'mvn dockerfile:build dockerfile:push exec:exec'
}
8. Conclusion
In this blog, you learned how to scan your Docker images by means of grype. Grype has some interesting, user-friendly features which allow you to add it efficiently to your Jenkins pipeline. Also, installing grype is quite easy. Grype is definitely a great improvement over Anchore Engine.
January 24, 2023
by Gunter Rotsaert CORE
· 3,250 Views · 2 Likes
How Observability Is Redefining Developer Roles
This article will explore observability and inform readers about the evolution of developer roles and how developers can stay ahead of the observability game.
January 24, 2023
by Hiren Dhaduk
· 3,564 Views · 1 Like
Distributed Stateful Edge Platforms
As companies move to compute and apps closer to where the data is being produced, they need to make their platforms easier and more cost-efficient to manage.
January 24, 2023
by Tom Smith CORE
· 3,027 Views · 1 Like
Using QuestDB to Collect Infrastructure Metrics
In this article, readers will learn how QuestDB uses its own database to power the monitoring system of QuestDB Cloud with guide code and helpful visuals.
January 24, 2023
by Steve Sklar
· 2,740 Views · 2 Likes
What Should You Know About Graph Database’s Scalability?
A how-to on graph database scalability, designing a distributed database system, and graph database query optimization.
January 20, 2023
by Ricky Sun
· 4,568 Views · 6 Likes
What Is a Kubernetes CI/CD Pipeline?
A Kubernetes CI/CD pipeline is different from a traditional CI/CD pipeline. The primary difference is the containerization process.
January 20, 2023
by Jyoti Sahoo
· 4,475 Views · 3 Likes
How To Create a Stub in 5 Minutes
Readers will learn how to create stubs in five minutes, which can be used for regression and load testing, debugging, and more, and how to configure them flexibly.
January 17, 2023
by Andrei Rogalenko
· 2,988 Views · 3 Likes
Visual Network Mapping Your K8s Clusters To Assess Performance
Visual network mapping is crucial for effective network management. Caretta and Grafana provide real-time network visualization and monitoring.
January 17, 2023
by Anton Lawrence CORE
· 3,076 Views · 2 Likes
Tutorial: Developing a Scala Application and Connecting It to ScyllaDB NoSQL
How to create a sample Scala app and connect it to ScyllaDB NoSQL using the Phantom library for Scala: a Scala-idiomatic wrapper over a standard Java driver.
January 16, 2023
by Guy Shtub
· 2,053 Views · 1 Like
Resolve Apache Kafka Starting Issue Installed on Single/Multi-Node Cluster
Without integrating Apache Zookeeper, Kafka alone won’t be able to form the complete Kafka cluster.
January 12, 2023
by Gautam Goswami CORE
· 2,244 Views · 1 Like
Dockerizing an Ansible Playbook, Part 1
In Part 1 of this two-part series, we will go over the concept of Dockerization and then Dockerize an Ansible playbook to run in a Docker container.
January 12, 2023
by Gitanjali Sahoo
· 3,281 Views · 4 Likes
Workload-Centric Over Infrastructure-Centric Development
This article takes a look at why a workload-centric approach is better for a developer's productivity in comparison to an infrastructure-centric approach.
January 12, 2023
by Susanne Tuenker
· 2,824 Views · 2 Likes
Change Data Capture With QuestDB and Debezium
This article goes into detail about streaming data into QuestDB with change data capture via Debezium and Kafka Connect with charts and guide code blocks.
January 10, 2023
by Yitaek Hwang CORE
· 1,876 Views · 1 Like