In this blog post, you will be using AWS Controllers for Kubernetes on an Amazon EKS cluster to put together a solution wherein data from an Amazon SQS queue is processed by an AWS Lambda function and persisted to a DynamoDB table. AWS Controllers for Kubernetes (also known as ACK) leverage Kubernetes Custom Resource and Custom Resource Definitions and give you the ability to manage and use AWS services directly from Kubernetes without needing to define resources outside of the cluster. The idea behind ACK is to enable Kubernetes users to describe the desired state of AWS resources using the Kubernetes API and configuration language. ACK will then take care of provisioning and managing the AWS resources to match the desired state. This is achieved by using Service controllers that are responsible for managing the lifecycle of a particular AWS service. Each ACK service controller is packaged into a separate container image that is published in a public repository corresponding to an individual ACK service controller. There is no single ACK container image. Instead, there are container images for each individual ACK service controller that manages resources for a particular AWS API. This blog post will walk you through how to use the SQS, DynamoDB, and Lambda service controllers for ACK. Prerequisites To follow along step-by-step, in addition to an AWS account, you will need to have AWS CLI, kubectl, and Helm installed. There are a variety of ways in which you can create an Amazon EKS cluster. I prefer using eksctl CLI because of the convenience it offers. Creating an EKS cluster using eksctl can be as easy as this: eksctl create cluster --name my-cluster --region region-code For details, refer to Getting started with Amazon EKS – eksctl. Clone this GitHub repository and change it to the right directory: git clone https://github.com/abhirockzz/k8s-ack-sqs-lambda cd k8s-ack-sqs-lambda Ok, let's get started! 
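Before installing the ACK controllers, it is worth a quick sanity check that kubectl is pointed at the new cluster. A minimal sketch, assuming the cluster name and region used in the eksctl command above:

```bash
# eksctl normally updates your kubeconfig; run this if it did not
aws eks update-kubeconfig --name my-cluster --region region-code

# All worker nodes should report a Ready status before you continue
kubectl get nodes
```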
Setup the ACK Service Controllers for AWS Lambda, SQS, and DynamoDB Install ACK Controllers Log into the Helm registry that stores the ACK charts: aws ecr-public get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin public.ecr.aws Deploy the ACK service controller for Amazon Lambda using the lambda-chart Helm chart: RELEASE_VERSION_LAMBDA_ACK=$(curl -sL "https://api.github.com/repos/aws-controllers-k8s/lambda-controller/releases/latest" | grep '"tag_name":' | cut -d'"' -f4) helm install --create-namespace -n ack-system oci://public.ecr.aws/aws-controllers-k8s/lambda-chart "--version=${RELEASE_VERSION_LAMBDA_ACK}" --generate-name --set=aws.region=us-east-1 Deploy the ACK service controller for SQS using the sqs-chart Helm chart: RELEASE_VERSION_SQS_ACK=$(curl -sL "https://api.github.com/repos/aws-controllers-k8s/sqs-controller/releases/latest" | grep '"tag_name":' | cut -d'"' -f4) helm install --create-namespace -n ack-system oci://public.ecr.aws/aws-controllers-k8s/sqs-chart "--version=${RELEASE_VERSION_SQS_ACK}" --generate-name --set=aws.region=us-east-1 Deploy the ACK service controller for DynamoDB using the dynamodb-chart Helm chart: RELEASE_VERSION_DYNAMODB_ACK=$(curl -sL "https://api.github.com/repos/aws-controllers-k8s/dynamodb-controller/releases/latest" | grep '"tag_name":' | cut -d'"' -f4) helm install --create-namespace -n ack-system oci://public.ecr.aws/aws-controllers-k8s/dynamodb-chart "--version=${RELEASE_VERSION_DYNAMODB_ACK}" --generate-name --set=aws.region=us-east-1 Now, it's time to configure the IAM permissions for the controller to invoke Lambda, DynamoDB, and SQS. Configure IAM Permissions Create an OIDC Identity Provider for Your Cluster For the steps below, replace the EKS_CLUSTER_NAME and AWS_REGION variables with your cluster name and region. 
export EKS_CLUSTER_NAME=demo-eks-cluster export AWS_REGION=us-east-1 eksctl utils associate-iam-oidc-provider --cluster $EKS_CLUSTER_NAME --region $AWS_REGION --approve OIDC_PROVIDER=$(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f2- | cut -d '/' -f2-) Create IAM Roles for Lambda, SQS, and DynamoDB ACK Service Controllers ACK Lambda Controller Set the following environment variables: ACK_K8S_SERVICE_ACCOUNT_NAME=ack-lambda-controller ACK_K8S_NAMESPACE=ack-system AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) Create the trust policy for the IAM role: read -r -d '' TRUST_RELATIONSHIP <<EOF { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": "system:serviceaccount:${ACK_K8S_NAMESPACE}:${ACK_K8S_SERVICE_ACCOUNT_NAME}" } } } ] } EOF echo "${TRUST_RELATIONSHIP}" > trust_lambda.json Create the IAM role: ACK_CONTROLLER_IAM_ROLE="ack-lambda-controller" ACK_CONTROLLER_IAM_ROLE_DESCRIPTION="IRSA role for ACK lambda controller deployment on EKS cluster using Helm charts" aws iam create-role --role-name "${ACK_CONTROLLER_IAM_ROLE}" --assume-role-policy-document file://trust_lambda.json --description "${ACK_CONTROLLER_IAM_ROLE_DESCRIPTION}" Attach IAM policy to the IAM role: # we are getting the policy directly from the ACK repo INLINE_POLICY="$(curl https://raw.githubusercontent.com/aws-controllers-k8s/lambda-controller/main/config/iam/recommended-inline-policy)" aws iam put-role-policy \ --role-name "${ACK_CONTROLLER_IAM_ROLE}" \ --policy-name "ack-recommended-policy" \ --policy-document "${INLINE_POLICY}" Attach ECR permissions to the controller IAM role. These are required since Lambda functions will be pulling images from ECR. aws iam put-role-policy \ --role-name "${ACK_CONTROLLER_IAM_ROLE}" \ --policy-name "ecr-permissions" \ --policy-document file://ecr-permissions.json Associate the IAM role to a Kubernetes service account: ACK_CONTROLLER_IAM_ROLE_ARN=$(aws iam get-role --role-name=$ACK_CONTROLLER_IAM_ROLE --query Role.Arn --output text) export IRSA_ROLE_ARN=eks.amazonaws.com/role-arn=$ACK_CONTROLLER_IAM_ROLE_ARN kubectl annotate serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME $IRSA_ROLE_ARN Repeat the steps for the SQS controller. 
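(Optional) Before moving on, you can confirm the IRSA annotation landed on the Lambda controller's service account; the namespace and service account names are the ones used above:

```bash
# Should print the eks.amazonaws.com/role-arn annotation pointing at the new IAM role
kubectl describe serviceaccount -n ack-system ack-lambda-controller | grep "eks.amazonaws.com/role-arn"
```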
ACK SQS Controller Set the following environment variables: ACK_K8S_SERVICE_ACCOUNT_NAME=ack-sqs-controller ACK_K8S_NAMESPACE=ack-system AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) Create the trust policy for the IAM role: read -r -d '' TRUST_RELATIONSHIP <<EOF { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": "system:serviceaccount:${ACK_K8S_NAMESPACE}:${ACK_K8S_SERVICE_ACCOUNT_NAME}" } } } ] } EOF echo "${TRUST_RELATIONSHIP}" > trust_sqs.json Create the IAM role: ACK_CONTROLLER_IAM_ROLE="ack-sqs-controller" ACK_CONTROLLER_IAM_ROLE_DESCRIPTION="IRSA role for ACK sqs controller deployment on EKS cluster using Helm charts" aws iam create-role --role-name "${ACK_CONTROLLER_IAM_ROLE}" --assume-role-policy-document file://trust_sqs.json --description "${ACK_CONTROLLER_IAM_ROLE_DESCRIPTION}" Attach IAM policy to the IAM role: # for sqs controller, we use the managed policy ARN instead of the inline policy (unlike the Lambda controller) POLICY_ARN="$(curl https://raw.githubusercontent.com/aws-controllers-k8s/sqs-controller/main/config/iam/recommended-policy-arn)" aws iam attach-role-policy --role-name "${ACK_CONTROLLER_IAM_ROLE}" --policy-arn "${POLICY_ARN}" Associate the IAM role to a Kubernetes service account: ACK_CONTROLLER_IAM_ROLE_ARN=$(aws iam get-role --role-name=$ACK_CONTROLLER_IAM_ROLE --query Role.Arn --output text) export IRSA_ROLE_ARN=eks.amazonaws.com/role-arn=$ACK_CONTROLLER_IAM_ROLE_ARN kubectl annotate serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME $IRSA_ROLE_ARN Repeat the steps for the DynamoDB controller. 
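(Optional) As a quick check on the IAM side, verify that the recommended managed policy is now attached to the SQS controller role created above:

```bash
# Lists the managed policies attached to the ACK SQS controller role
aws iam list-attached-role-policies --role-name ack-sqs-controller
```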
ACK DynamoDB Controller Set the following environment variables: ACK_K8S_SERVICE_ACCOUNT_NAME=ack-dynamodb-controller ACK_K8S_NAMESPACE=ack-system AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) Create the trust policy for the IAM role: read -r -d '' TRUST_RELATIONSHIP <<EOF { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": "system:serviceaccount:${ACK_K8S_NAMESPACE}:${ACK_K8S_SERVICE_ACCOUNT_NAME}" } } } ] } EOF echo "${TRUST_RELATIONSHIP}" > trust_dynamodb.json Create the IAM role: ACK_CONTROLLER_IAM_ROLE="ack-dynamodb-controller" ACK_CONTROLLER_IAM_ROLE_DESCRIPTION="IRSA role for ACK dynamodb controller deployment on EKS cluster using Helm charts" aws iam create-role --role-name "${ACK_CONTROLLER_IAM_ROLE}" --assume-role-policy-document file://trust_dynamodb.json --description "${ACK_CONTROLLER_IAM_ROLE_DESCRIPTION}" Attach IAM policy to the IAM role: # for dynamodb controller, we use the managed policy ARN instead of the inline policy (like we did for Lambda controller) POLICY_ARN="$(curl https://raw.githubusercontent.com/aws-controllers-k8s/dynamodb-controller/main/config/iam/recommended-policy-arn)" aws iam attach-role-policy --role-name "${ACK_CONTROLLER_IAM_ROLE}" --policy-arn "${POLICY_ARN}" Associate the IAM role to a Kubernetes service account: ACK_CONTROLLER_IAM_ROLE_ARN=$(aws iam get-role --role-name=$ACK_CONTROLLER_IAM_ROLE --query Role.Arn --output text) export IRSA_ROLE_ARN=eks.amazonaws.com/role-arn=$ACK_CONTROLLER_IAM_ROLE_ARN kubectl annotate serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME $IRSA_ROLE_ARN Restart ACK Controller Deployments and Verify the Setup Restart the ACK service controller Deployment using the following commands. It will update service controller Pods with IRSA environment variables. Get list of ACK service controller deployments: export ACK_K8S_NAMESPACE=ack-system kubectl get deployments -n $ACK_K8S_NAMESPACE Restart Lambda, SQS, and DynamoDB controller Deployments: DEPLOYMENT_NAME_LAMBDA=<enter deployment name for lambda controller> kubectl -n $ACK_K8S_NAMESPACE rollout restart deployment $DEPLOYMENT_NAME_LAMBDA DEPLOYMENT_NAME_SQS=<enter deployment name for sqs controller> kubectl -n $ACK_K8S_NAMESPACE rollout restart deployment $DEPLOYMENT_NAME_SQS DEPLOYMENT_NAME_DYNAMODB=<enter deployment name for dynamodb controller> kubectl -n $ACK_K8S_NAMESPACE rollout restart deployment $DEPLOYMENT_NAME_DYNAMODB List Pods for these Deployments. Verify that the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables exist for your Kubernetes Pod using the following commands: kubectl get pods -n $ACK_K8S_NAMESPACE LAMBDA_POD_NAME=<enter Pod name for lambda controller> kubectl describe pod -n $ACK_K8S_NAMESPACE $LAMBDA_POD_NAME | grep "^\s*AWS_" SQS_POD_NAME=<enter Pod name for sqs controller> kubectl describe pod -n $ACK_K8S_NAMESPACE $SQS_POD_NAME | grep "^\s*AWS_" DYNAMODB_POD_NAME=<enter Pod name for dynamodb controller> kubectl describe pod -n $ACK_K8S_NAMESPACE $DYNAMODB_POD_NAME | grep "^\s*AWS_" Now that the ACK service controller has been set up and configured, you can create AWS resources! Create SQS Queue, DynamoDB Table, and Deploy the Lambda Function Create SQS Queue In the file sqs-queue.yaml, replace the us-east-1 region with your preferred region as well as the AWS account ID. 
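One way to do that substitution from the shell, assuming the AWS_ACCOUNT_ID placeholder string shown in the manifest below (GNU sed syntax; on macOS use sed -i ''):

```bash
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

# Replace the account ID placeholder in the queue manifest; adjust the region the same way if needed
sed -i "s/AWS_ACCOUNT_ID/${AWS_ACCOUNT_ID}/g" sqs-queue.yaml
```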
This is what the ACK manifest for the SQS queue looks like: apiVersion: sqs.services.k8s.aws/v1alpha1 kind: Queue metadata: name: sqs-queue-demo-ack annotations: services.k8s.aws/region: us-east-1 spec: queueName: sqs-queue-demo-ack policy: | { "Statement": [{ "Sid": "__owner_statement", "Effect": "Allow", "Principal": { "AWS": "AWS_ACCOUNT_ID" }, "Action": "sqs:SendMessage", "Resource": "arn:aws:sqs:us-east-1:AWS_ACCOUNT_ID:sqs-queue-demo-ack" }] } Create the queue using the following command: kubectl apply -f sqs-queue.yaml # list the queue kubectl get queue Create DynamoDB Table This is what the ACK manifest for the DynamoDB table looks like: apiVersion: dynamodb.services.k8s.aws/v1alpha1 kind: Table metadata: name: customer annotations: services.k8s.aws/region: us-east-1 spec: attributeDefinitions: - attributeName: email attributeType: S billingMode: PAY_PER_REQUEST keySchema: - attributeName: email keyType: HASH tableName: customer You can replace the us-east-1 region with your preferred region. Create a table (named customer) using the following command: kubectl apply -f dynamodb-table.yaml # list the tables kubectl get tables Build Function Binary and Create Docker Image GOARCH=amd64 GOOS=linux go build -o main main.go aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws docker build -t demo-sqs-dynamodb-func-ack . Create a private ECR repository, tag and push the Docker image to ECR: AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com aws ecr create-repository --repository-name demo-sqs-dynamodb-func-ack --region us-east-1 docker tag demo-sqs-dynamodb-func-ack:latest $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest docker push $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest Create an IAM execution Role for the Lambda function and attach the required policies: export ROLE_NAME=demo-sqs-dynamodb-func-ack-role ROLE_ARN=$(aws iam create-role \ --role-name $ROLE_NAME \ --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{ "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}' \ --query 'Role.[Arn]' --output text) aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole Since the Lambda function needs to write data to DynamoDB and invoke SQS, let's add the following policies to the IAM role: aws iam put-role-policy \ --role-name "${ROLE_NAME}" \ --policy-name "dynamodb-put" \ --policy-document file://dynamodb-put.json aws iam put-role-policy \ --role-name "${ROLE_NAME}" \ --policy-name "sqs-permissions" \ --policy-document file://sqs-permissions.json Create the Lambda Function Update function.yaml file with the following info: imageURI - The URI of the Docker image that you pushed to ECR, e.g., <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest role - The ARN of the IAM role that you created for the Lambda function, e.g., arn:aws:iam::<AWS_ACCOUNT_ID>:role/demo-sqs-dynamodb-func-ack-role This is what the ACK manifest for the Lambda function looks like: apiVersion: lambda.services.k8s.aws/v1alpha1 kind: Function metadata: name: demo-sqs-dynamodb-func-ack annotations: services.k8s.aws/region: us-east-1 spec: architectures: - x86_64 
name: demo-sqs-dynamodb-func-ack packageType: Image code: imageURI: AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest environment: variables: TABLE_NAME: customer role: arn:aws:iam::AWS_ACCOUNT_ID:role/demo-sqs-dynamodb-func-ack-role description: A function created by ACK lambda-controller To create the Lambda function, run the following command: kubectl create -f function.yaml # list the function kubectl get functions Add SQS Trigger Configuration Add SQS trigger which will invoke the Lambda function when an event is sent to the SQS queue. Here is an example using AWS Console: Open the Lambda function in the AWS Console and click on the Add trigger button. Select SQS as the trigger source, select the SQS queue, and click on the Add button. Now you are ready to try out the end-to-end solution! Test the Application Send a few messages to the SQS queue. For the purposes of this demo, you can use the AWS CLI: export SQS_QUEUE_URL=$(kubectl get queues/sqs-queue-demo-ack -o jsonpath='{.status.queueURL}') aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user1@foo.com --message-attributes 'name={DataType=String, StringValue="user1"}, city={DataType=String,StringValue="seattle"}' aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user2@foo.com --message-attributes 'name={DataType=String, StringValue="user2"}, city={DataType=String,StringValue="tel aviv"}' aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user3@foo.com --message-attributes 'name={DataType=String, StringValue="user3"}, city={DataType=String,StringValue="new delhi"}' aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user4@foo.com --message-attributes 'name={DataType=String, StringValue="user4"}, city={DataType=String,StringValue="new york"}' The Lambda function should be invoked and the data should be written to the DynamoDB table. Check the DynamoDB table using the CLI (or AWS console): aws dynamodb scan --table-name customer Clean Up After you have explored the solution, you can clean up the resources by running the following commands: Delete SQS queue, DynamoDB table and the Lambda function: kubectl delete -f sqs-queue.yaml kubectl delete -f function.yaml kubectl delete -f dynamodb-table.yaml To uninstall the ACK service controllers, run the following commands: export ACK_SYSTEM_NAMESPACE=ack-system helm ls -n $ACK_SYSTEM_NAMESPACE helm uninstall -n $ACK_SYSTEM_NAMESPACE <enter name of the sqs chart> helm uninstall -n $ACK_SYSTEM_NAMESPACE <enter name of the lambda chart> helm uninstall -n $ACK_SYSTEM_NAMESPACE <enter name of the dynamodb chart> Conclusion and Next Steps In this post, we have seen how to use AWS Controllers for Kubernetes to create a Lambda function, SQS, and DynamoDB table and wire them together to deploy a solution. All of this (almost) was done using Kubernetes! I encourage you to try out other AWS services supported by ACK. Here is a complete list. Happy building!
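As a closing footnote to the trigger step above: if you prefer the CLI to the console, the SQS trigger can also be created as an event source mapping. A rough sketch, assuming SQS_QUEUE_URL is set as in the test section and the function name used throughout this walkthrough:

```bash
# Look up the queue ARN from its URL
SQS_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url "$SQS_QUEUE_URL" \
  --attribute-names QueueArn --query "Attributes.QueueArn" --output text)

# Wire the queue to the Lambda function created by ACK
aws lambda create-event-source-mapping \
  --function-name demo-sqs-dynamodb-func-ack \
  --event-source-arn "$SQS_QUEUE_ARN" \
  --batch-size 10
```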
In today's world of real-time data processing and analytics, streaming databases have become an essential tool for businesses that want to stay ahead of the game. These databases are specifically designed to handle data that is generated continuously and at high volumes, making them perfect for use cases such as the Internet of Things (IoT), financial trading, and social media analytics. However, with so many options available in the market, choosing the right streaming database can be a daunting task. This post helps you understand what SQL streaming is, the streaming database, when and why to use it, and discusses some key factors that you should consider when choosing the right streaming database for your business. Learning Objectives You will learn the following throughout the article: What streaming data is (event stream processing) What is the Streaming SQL method? Streaming Database features and use cases Top 5 streaming databases (both open-source and SaaS). Criteria for choosing a streaming database Let's review quickly some concepts like what’s streaming data, streaming SQL, and databases in the next few sections. What Is Streaming Data? A stream is a sequence of events/data elements made available over time. Data streaming is a method of processing and analyzing data in real-time as it's generated by various sources such as sensors, e-commerce purchases, web and mobile applications, social networks, and many more. It involves the continuous and persistent collection, processing, and delivery of data in the form of events or messages. You can ingest data from different data sources such as message brokers Kafka, Redpanda, Kinesis, Pulsar, or databases MySQL or PostgreSQL using their Change Data Capture (CDC), which is the process of identifying and capturing data changes. What Is Streaming SQL? Once you collect data, you can store this data in a streaming database (in the next section, it is explained), where it can be processed and analyzed using SQL queries with SQL streaming. It is a technique for processing real-time data streams using SQL queries. It allows businesses to use the same SQL language they use for batch processing to query and process data streams in real-time. The data can be transformed, filtered, and aggregated in real-time from the stream into more useful outputs like materialized view (CREATE MATERIALIZED VIEW) to provide insights and enable automated decision-making. Materialized views are typically used in situations where a complex query needs to be executed frequently or where the query result takes a long time to compute. By precomputing the result and storing it in a materialized view (virtual tables), queries can be executed more quickly and with less overhead. PostgreSQL, Microsoft SQL Server, RisingWave or Materialize support materialized views with automatic updates. One of the key benefits of SQL streaming is that it allows businesses to leverage their existing SQL skills and infrastructure to process real-time data. This can be more efficient than having to learn new programming languages such as Java, and Scala, or tools to work with data streams. What Is a Streaming Database? A streaming database, also known as a real-time database, is a database management system that is designed to handle a continuous stream of data in real-time. It is optimized for processing and storing large volumes of data that arrive in a continuous and rapid stream. 
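To make the materialized-view idea above concrete, here is a minimal, illustrative sketch. It assumes a Postgres-wire-compatible streaming database (RisingWave and Materialize both speak the Postgres protocol), an existing stream or table named orders, and hypothetical connection details; the exact source-creation syntax varies by product:

```bash
# Submit streaming SQL over the Postgres wire protocol (host, port, and database are assumptions)
psql -h localhost -p 4566 -d dev <<'SQL'
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;

-- The view is maintained incrementally as new events arrive
SELECT * FROM order_totals LIMIT 10;
SQL
```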
A streaming database uses the same declarative SQL and the same abstractions (tables, columns, rows, views, indexes) as a traditional database. Unlike a traditional database, where data is stored in tables matching the structure of the writes (inserts, updates) and all the computation work happens on read queries (selects), a streaming database operates on a continuous basis, processing data as it arrives and saving it to persistent storage in the form of a materialized view. This allows for immediate analysis and response to real-time events, enabling businesses to make decisions and take actions based on the most up-to-date information. Streaming databases typically use specialized data structures and algorithms that are optimized for fast and efficient data processing. They also support complex event processing (CEP) and other real-time analytics tools to help businesses gain insights and extract value from the data in real time. One of the features unique to streaming databases is the ability to update materialized views incrementally. What Can You Do With a Streaming Database? Here are some of the things you can do with a streaming database: Collect and transform data from different streams/data sources, such as Apache Kafka. Create materialized views for the data that needs to be incrementally aggregated. Query complex stream data with simple SQL syntax. After aggregating and analyzing real-time data streams, use real-time analytics to trigger downstream applications. 5 Top Streaming Databases There are various streaming databases available, each with its own feature set. Below are five of the top streaming databases (both open-source and SaaS); note that they are not listed in any particular order of popularity or use. RisingWave. Materialize. Amazon Kinesis. Confluent. Apache Flink. How To Select Your Streaming Database Choosing the right streaming data platform can be a challenging task, as there are several factors to consider. Here are some key considerations to keep in mind when selecting a streaming data platform: Data Sources: Consider the types of data sources that the platform can ingest and process, and make sure the platform can handle the sources you need. Kafka, Redpanda, Apache Pulsar, AWS Kinesis, and Google Pub/Sub are the most commonly used stream sources/message brokers, alongside databases such as PostgreSQL or MySQL. Scalability: Consider the ability of the platform to scale as your data needs grow. Some platforms may be limited in their ability to scale, while others can handle large volumes of data and multiple concurrent users. Make sure that the scaling process can be completed almost instantaneously without interrupting data processing. For example, the open-source project RisingWave dynamically partitions data across compute nodes using a consistent hashing algorithm; the nodes each compute their portion of the data and then exchange output with each other. Managed streaming platforms from cloud providers generally support auto-scaling out of the box, so this is less of a concern there. Integration: Consider the ability of the platform to integrate with other systems and tools, such as the BI and data analytics platforms you are currently using or plan to use in the future. Make sure the platform supports the protocols and APIs that you need to connect with your other systems. RisingWave, for example, integrates with many BI services, including Grafana, Metabase, and Apache Superset. 
Performance: Consider the speed and efficiency of the platform. Some platforms may perform better than others in terms of query speed, data processing, and analysis. Therefore, you need to select a streaming database that can extract, transform and load millions of records in seconds. The key performance indicators (KPIs) for streaming data platforms are event rate, throughput (event rate times event size), latency, reliability, and the number of topics (for pub-sub architectures). Sometimes compared to JVM-based systems, a platform designed with a low-level programming language such as Rust can be super fast. Security: Consider the security features of the platform, such as access controls, data encryption, and compliance certifications, to ensure your data is protected. Ease of Use: Consider the ease of use of the platform, including its user interface, documentation, and support resources. Make sure the platform is easy to use and provides adequate support for your team. Cost: Consider the cost of the platform, including licensing fees, maintenance costs, and any additional hardware or software requirements. Make sure the platform fits within your budget and provides a good return on investment. Summary In summary, streaming databases offer several unique features, including real-time data processing, event-driven architecture, continuous processing, low latency, scalability, support for various data formats, and flexibility. These features enable faster insights, better decision-making, and more efficient use of data in real-time applications. The best streaming database for your use case will depend on your specific requirements, such as supported data sources, volume and velocity, data structure, scalability, performance, integration, and cost. It's important to carefully evaluate each option based on these factors to determine the best fit for your organization.
When it comes to managing large amounts of data in a distributed system, Apache Cassandra and Apache Pulsar are two names that often come up. Apache Cassandra is a highly scalable NoSQL database that excels at handling high-velocity writes and queries across multiple nodes. It is an ideal solution for use cases such as user profile management, product catalogs, and real-time analytics. A platform for distributed messaging and streaming, called Apache Pulsar, was created to manage moving data. It can handle standard messaging workloads and more complex streaming use cases including real-time data processing and event-driven architectures. This article covers the main steps of building a Spring Boot and React-based web application that interacts with Pulsar and Cassandra, displaying stock data live as it is received. This is not a complete tutorial, it only covers the most important steps. You can find the complete source code for the application on GitHub. You will learn how to: Set up Cassandra and Pulsar instances using DataStax Astra DB and Astra Streaming. Publish and consume Pulsar messages in a Spring Boot application. Store Pulsar messages in Cassandra using a sink. Viewing live and stored data in React using the Hilla framework by Vaadin. Used Technologies and Libraries Apache Cassandra (with Astra DB) Apache Pulsar (with Astra Streaming) Spring Boot Spring for Apache Pulsar Spring Data for Apache Cassandra React Hilla AlphaVantage API Requirements Java 17 or newer Node 18 or newer Intermediate Java skills and familiarity with Spring Boot Storing Sensitive Data in Spring Boot Much of the setup for Cassandra and Pulsar is configuration-based. While it might be tempting to put the configuration in application.properties, it is not a smart idea as the file is under source control, and you may unintentionally reveal secrets. Instead, create a local config/local/application.properties configuration file and add it to .gitignore to ensure it does not leave your computer. The settings from the local configuration file will be automatically applied by Spring Boot: mkdir -p config/local touch config/local/application.properties echo " # Contains secrets that shouldn't go into the repository config/local/" >> .gitignore You may provide Spring Boot with the options as environment variables when using it in production. Setting Up Cassandra and Pulsar Using DataStax Astra Both Apache technologies used in this article are open-source projects and can be installed locally. However, using cloud services to set up the instances is a simpler option. In this article, we set up the data infrastructure required for our example web application using DataStax free tier services. Begin by logging in to your existing account or signing up for a new one on Astra DataStax’s official website, where you will be required to create the database and streaming service separately. Cassandra Setup Start by clicking “Create Database” from the official Astra DataStax website. Sinking data from a stream into Astra DB requires that both services are deployed in a region that supports both Astra Streaming and Astra DB: Enter the name of your new database instance. Select the keyspace name. (A keyspace stores your group of tables, a bit like schema in relational databases). Select a cloud Provider and Region.Note: For the demo application to work, you need to deploy the database service on a region that supports streaming too. 
Select “Create Database.” Cassandra: Connecting to the Service Once the initialization of the database service is created, you need to generate a token and download the “Secure Connection Bundle” that encrypts the data transfer between the app and the cloud database (mTLS). Navigate to the DB dashboard “Connect” tab sheet where you will find the button to generate a one-time token (please remember to download it) and the bundle download button: spring.cassandra.schema-action=CREATE_IF_NOT_EXISTS spring.cassandra.keyspace-name=<KEYSPACE_NAME> spring.cassandra.username=<ASTRADB_TOKEN_CLIENT_ID> spring.cassandra.password=<ASTRADB_TOKEN_SECRET> # Increase timeouts when connecting to Astra from a dev workstation spring.cassandra.contact-points=<ASTRADB_DATACENTER_ID> spring.cassandra.port=9042 spring.cassandra.local-datacenter=<ASTRADB_REGION> datastax.astra.secure-connect-bundle=<secure-connect-astra-stock-db.zip> Pulsar parameters for application.properties. Pulsar Set Up Start by clicking “Create Stream” from the main Astra DataStax page: Enter the name for your new streaming instance. Select a provider and region.Note: Remember to use the same provider and region you used to create the database service. Select “Create Stream.” Pulsar: Enabling Auto Topic Creation In addition to getting the streaming service up and running, you will also need to define the topic that is used by the application to consume and produce messages. You can create a topic explicitly using UI, but a more convenient way is to enable “Allow Auto Topic Creation” setting for the created instance: Click on the newly created stream instance and navigate to the “Namespace and Topics” tab sheet, and click “Modify Namespace.” Navigate to the “Settings” tab located under the default namespace (not the top-level “Settings” tab) and scroll all the way down. Change the “Allow Topic Creation” to “Allow Auto Topic Creation.” Changing this default setting will allow the application to create new topics automatically without any additional admin effort in Astra. With this, you have successfully established the infrastructure for hosting your active and passive data. Pulsar: Connecting to the Service Once the streaming instance has been set up, you need to create a token to access the service from your app. Most of the necessary properties are located on the “Connect” tab sheet of the “Streaming dashboard.” The “topic-name” input is found in the “Namespaces and Topics” tab sheet: ## Client spring.pulsar.client.service-url=<Broker Service URL> spring.pulsar.client.auth-plugin-class-name=org.apache.pulsar.client.impl.auth.AuthenticationToken spring.pulsar.client.authentication.token=<Astra_Streaming_Token> ## Producer spring.pulsar.producer.topic-name=persistent://<TENANT_NAME>/default/<TOPIC_NAME> spring.pulsar.producer.producer-name=<name of your choice> ## Consumer spring.pulsar.consumer.topics=persistent://<TENANT_NAME>/default/<TOPIC_NAME> spring.pulsar.consumer.subscription-name=<name of your choice> spring.pulsar.consumer.consumer-name=<name of your choice> spring.pulsar.consumer.subscription-type=key_shared Pulsar parameters for application.properties. Publishing Pulsar Messages From Spring Boot The Spring for Apache Pulsar library takes care of setting up Pulsar producers and consumers based on the given configuration. In the application, the StockPriceProducer component handles message publishing. To fetch stock data, it makes use of an external API call before publishing it to a Pulsar stream using a PulsarTemplate. 
Autowire the PulsarTemplate into the class and save it to a field: Java @Component public class StockPriceProducer { private final PulsarTemplate<StockPrice> pulsarTemplate; public StockPriceProducer(PulsarTemplate<StockPrice> pulsarTemplate) { this.pulsarTemplate = pulsarTemplate; } //... } Then use it to publish messages: Java private void publishStockPrices(Stream<StockPrice> stockPrices) { // Publish items to Pulsar with 100ms intervals Flux.fromStream(stockPrices) // Delay elements for the demo, don't do this in real life .delayElements(Duration.ofMillis(100)) .subscribe(stockPrice -> { try { pulsarTemplate.sendAsync(stockPrice); } catch (PulsarClientException e) { throw new RuntimeException(e); } }); } You need to configure the schema for the custom StockPrice type. In Application.java, define the following bean: Java @Bean public SchemaResolver.SchemaResolverCustomizer<DefaultSchemaResolver> schemaResolverCustomizer() { return (schemaResolver) -> schemaResolver.addCustomSchemaMapping(StockPrice.class, Schema.JSON(StockPrice.class)); } Consuming Pulsar Messages in Spring Boot The Spring for Apache Pulsar library comes with a @PulsarListener annotation for a convenient way of listening to Pulsar messages. Here, the messages are emitted to a Project Reactor Sink so the UI can consume them as a Flux: Java @Service public class StockPriceConsumer { private final Sinks.Many<StockPrice> stockPriceSink = Sinks.many().multicast().directBestEffort(); private final Flux<StockPrice> stockPrices = stockPriceSink.asFlux(); @PulsarListener private void stockPriceReceived(StockPrice stockPrice) { stockPriceSink.tryEmitNext(stockPrice); } public Flux<StockPrice> getStockPrices() { return stockPrices; } } Creating a Server Endpoint for Accessing Data From React The project uses Hilla, a full-stack web framework for Spring Boot. It manages websocket connections for reactive data types and allows type-safe server communication. The client may utilize the matching TypeScript methods created by the StockPriceEndpoint to fetch data: Java @Endpoint @AnonymousAllowed public class StockPriceEndpoint { private final StockPriceProducer producer; private final StockPriceConsumer consumer; private final StockPriceService service; StockPriceEndpoint(StockPriceProducer producer, StockPriceConsumer consumer, StockPriceService service) { this.producer = producer; this.consumer = consumer; this.service = service; } public List<StockSymbol> getSymbols() { return StockSymbol.supportedSymbols(); } public void produceDataForTicker(String ticker) { producer.produceStockPriceData(ticker); } public Flux<StockPrice> getStockPriceStream() { return consumer.getStockPrices(); } public List<StockPrice> findAllByTicker(String ticker) { return service.findAllByTicker(ticker); } } Displaying a Live-Updating Chart in React The DashboardView has an Apex Chart candle stick chart for displaying the stock data. It’s bound to a state of type ApexAxisChartSeries: TypeScript const [series, setSeries] = useState<ApexAxisChartSeries>([]); The view uses a React effect hook to call the server endpoint and subscribe to new data. It returns a disposer function to close the websocket when it is no longer needed: TypeScript useEffect(() => { const subscription = StockPriceEndpoint .getStockPriceStream() .onNext((stockPrice) => updateSeries(stockPrice)); return () => subscription.cancel(); }, []); The series is bound to the template. 
Because the backend and frontend are reactive, the chart is automatically updated any time a new Pulsar message is received: HTML <ReactApexChart type="candlestick" options={options} series={series} height={350} /> Persisting Pulsar Messages to Cassandra Sinking Pulsar messages to Astra DB can be useful in scenarios where you need a reliable, scalable, and secure platform to store event data from Pulsar for further analysis, processing, or sharing. Perhaps you need to retain a copy of event data for compliance and auditing purposes, need to store event data from multiple tenants in a shared database, or have some other use case. Astra Streaming offers numerous fully-managed Apache Pulsar connectors you can use to persist event data to various databases and third-party solutions, like Snowflake. In this article, we are persisting the stream data into Astra DB. Creating a Sink Start by selecting the "Sink" tab sheet from the Astra Streaming dashboard. Select the "default" namespace: From the list of available "Sink Types," choose "Astra DB." Give the sink a name of your choice. Select the "stock-feed" topic, which will be available once you have published messages to it from your app. After selecting the data stream input, select the database where you want to persist Pulsar messages: To enable table creation, paste the Astra DB token with valid roles. You'll see the available keyspaces after entering a valid token; choose the keyspace name that was used to create the database. Then enter the table name. Note: This needs to match the @Table("stock_price") annotation value you use in the StockPrice.java class to read back the data. Next, you need to map the properties from the Pulsar message to the database table columns. Property fields are automatically mapped in our demo application, so you can simply click "Create" to proceed. If you were, for instance, persisting only a portion of the data to the database, opening the schema definition would enable you to view the property names employed and create a custom mapping between the fields. After the sink is created, the initialization process will begin, after which the status will change to "active." Stock data is then automatically persisted into your database for easy access by the application. The sink dashboard provides access to sink log files in the event of an error. Displaying Cassandra Data in a Table The historical data that is stored in Cassandra is displayed in a data grid component. The DetailsView contains a Vaadin Grid component that is bound to an array of StockPrice objects which are kept in a state variable: TypeScript const [stockData, setStockData] = useState<StockPrice[]>([]); The view has a dropdown selector for selecting the stock you want to view. When the selection is updated, the view fetches the data for that stock from the server endpoint: TypeScript async function getDataFor(ticker?: string) { if (ticker) setStockData(await StockPriceEndpoint.findAllByTicker(ticker)); } The stockData array is bound to the grid in the template. GridColumns define the properties that columns should map to: HTML <Grid items={stockData} className="flex-grow"> <GridColumn path="time" ></GridColumn> <GridColumn path="open" ></GridColumn> <GridColumn path="high" ></GridColumn> <GridColumn path="low" ></GridColumn> <GridColumn path="close" ></GridColumn> <GridColumn path="volume" ></GridColumn> </Grid> Conclusion In this article, we showed how you can build a scalable real-time application using an open-source Java stack. 
You can clone the completed application and use it as a base for your own experiments.
In this article, I will look at Specification by Example (SBE) as explained in Gojko Adzic’s book of the same name. It’s a collaborative effort between developers and non-developers to arrive at textual specifications that are coupled to automatic tests. You may also have heard of it as behavior-driven development or executable specifications. These are not synonymous concepts, but they do overlap. It's a common experience in any large, complex project. Crucial features do not behave as intended. Something was lost in the translation between intention and implementation, i.e., business and development. Inevitably we find that we haven’t built quite the right thing. Why wasn’t this caught during testing? Obviously, we’re not testing enough, or the wrong things. Can we make our tests more insightful? Some enthusiastic developers and SBE adepts jump to the challenge. Didn’t you know you can write all your tests in plain English? Haven’t you heard of Gherkin's syntax? She demonstrates the canonical Hello World of executable specifications, using Cucumber for Java. Gherkin Scenario: Items priced 100 euro or more are eligible for 5% discount for loyal customers Given Jill has placed three orders in the last six months When she looks at an item costing 100 euros or more Then she is eligible for a 5% discount Everybody is impressed. The Product Owner greenlights a proof of concept to rewrite the most salient test in Gherkin. The team will report back in a month to share their experiences. The other developers brush up their Cucumber skills but find they need to write a lot of glue code. It’s repetitive and not very DRY. Like the good coders they are, they make it more flexible and reusable. Gherkin Scenario: discount calculator for loyal customers Given I execute a POST call to /api/customer/12345/orders?recentMonths=6 Then I receive a list of 3 OrderInfo objects And a DiscountRequestV1 message for customer 12345 is put on the queue /discountcalculator [ you get the message ] Reusable yes, readable, no. They’re right to conclude that the textual layer offers nothing, other than more work. It has zero benefits over traditional code-based tests. Business stakeholders show no interest in these barely human-readable scripts, and the developers quickly abandon the effort. It’s About Collaboration, Not Testing The experiment failed because it tried to fix the wrong problem. It failed because better testing can’t repair a communication breakdown between getting from the intended functionality to implementation. SBE is about collaboration. It is not a testing approach. You need this collaboration to arrive at accurate and up-to-date specifications. To be clear, you always have a spec (like you always have an architecture). It may not always be a formal one. It can be a mess that only exists in your head, which is only acceptable if you’re a one-person band. In all other cases, important details will get lost or confused in the handover between disciplines. The word handover has a musty smell to it, reminiscent of old-school Waterfall: the go-to punchbag for everything we did wrong in the past, but also an antiquated approach that few developers under the age of sixty have any real experience with it. Today we’re Agile and multi-disciplinary. We don’t have specialists who throw documents over the fence of their silos. It is more nuanced than that, now as well as in 1975. Waterfall didn’t prohibit iteration. You could always go back to an earlier stage. 
Likewise, the definition of a modern multi-disciplinary team is not a fungible collection of Jacks and Jills of all trades. Nobody can be a Swiss army knife of IT skills and business domain knowledge. But one enduring lesson from the past is that we can’t produce flawless and complete specifications of how the software should function, before writing its code. Once you start developing, specs always turn out over-complete, under-complete, and just plain wrong in places. They have bugs, just like code. You make them better with each iteration. Accept that you may start off incomplete and imprecise. You Always Need a Spec Once we have built the code according to spec (whatever form that takes), do we still need that spec, as an architect’s drawing after the house was built? Isn’t the ultimate truth already in the code? Yes, it is, but only at a granular level, and only accessible to those who can read it. It gives you detail, but not the big picture. You need to zoom out to comprehend the why. Here’s where I live: This is the equivalent of source code. Only people who have heard of the Dutch village of Heeze can relate this map to the world. It’s missing the context of larger towns and a country. The next map zooms out only a little, but with the context of the country's fifth-largest city, it’s recognizable to all Dutch inhabitants. The next map should be universal. Even if you can’t point out the Netherlands on a map, you must have heard of London. Good documentation provides a hierarchy of such maps, from global and generally accessible to more detailed, requiring more domain-specific knowledge. At every level, there should be sufficient context about the immediately connecting parts. If there is a handover at all, it’s never of the kind: “Here’s my perfect spec. Good luck, and report back in a month”. It’s the finalization of a formal yet flexible document created iteratively with people from relevant disciplines in an ongoing dialogue throughout the development process. It should be versioned, and tied to the software that it describes. Hence the only logical place is together with the source code repository, at least for specifications that describe a well-defined body of code, a module, or a service. Such specs can rightfully be called the ultimate source of truth about what the code does, and why. Because everybody was involved and invested, everybody understands it, and can (in their own capacity) help create and maintain it. However, keeping versioned specs with your software is no automatic protection against mismatches, when changes to the code don’t reflect the spec and vice versa. Therefore, we make the spec executable, by coupling it to testing code that executes the code that the spec covers and validates the results. It sounds so obvious and attractive. Why isn’t everybody doing it if there’s a world of clarity to be gained? There are two reasons: it’s hard and you don’t always need SBE. We routinely overestimate the importance of the automation part, which puts the onus disproportionally on the developers. It may be a deliverable of the process, but it’s only the collaboration that can make it work. More Art Than Science Writing good specifications is hard, and it’s more art than science. If there ever was a need for clear, unambiguous, SMART writing, executable specifications fit the bill. Not everybody has a talent for it. As a developer with a penchant for writing, I flatter myself that I can write decent spec files on my own. 
But I shouldn't, not without at least a good edit from a business analyst. For one, I don't know when my assumptions are off the mark, and I can't always keep technocratic wording from creeping in. A process that I favor and find workable is when a businessperson drafts acceptance criteria which form the input to features and scenarios. Together with a developer, they are refined: adding clarity and edge cases, and removing duplication and ambiguity. Only then can they be rigorous enough to be turned into executable spec files. Writing executable specifications can be tremendously useful for some projects and a complete waste of time for others. It's not at all like unit testing in that regard. Some applications are huge but computationally simple. These are the enterprise behemoths with their thousands of endpoints and hundreds of database tables. Their code is full of specialist concepts specific to the world of insurance, banking, or logistics. What makes these programs complex and challenging to grasp is the sheer number of components and the specialist domain they relate to. The math in Fintech isn't often that challenging: you add, subtract, multiply, and watch out for rounding errors. SBE is a good candidate to make the complexity of all these interfaces and edge cases manageable. Then there's software with a very simple interface behind which lurks some highly complex logic. Consider a hashing algorithm, or any cryptographic code, that needs to be secure and performant. Test cases are simple: you can tweak the input string, seed, and log rounds, but that's about it. Obviously, you should test for performance and resource usage, but all that is best handled in a code-based test, not Gherkin. This category of software is the world of libraries and utilities. Their concepts stay within the realm of programming and IT and relate less directly to concepts in the real world. As a developer, you don't need a business analyst to explain the why; you can be your own. No wonder so many open-source projects are of this kind.
Setting up a VPN server to allow remote connections can be challenging if you set this up for the first time. In this post, I will guide you through the steps to set up your own VPN Server and connect to it using a VPN Client. Additionally, I will also show how to set up a free Radius server and a plugin to implement multi-factor authentication for additional security. 1. Installation OpenVPN Server on Linux (Using a Centos Stream 9 Linux) # yum update # curl -O https://raw.githubusercontent.com/angristan/openvpn-install/master/openvpn-install.sh # chmod +x openvpn-install.sh # ./openvpn-install.sh Accept defaults for installation of OpenVPN and, in the end, provide a Client name e.g. demouser. I have chosen a passwordless client, but if you want, you can also add an additional password to protect your private key. PowerShell Client name: demouser Do you want to protect the configuration file with a password? (e.g. encrypt the private key with a password) 1) Add a passwordless client 2) Use a password for the client Select an option [1-2]: 1 ... ... The configuration file has been written to /root/demouser.ovpn. Download the .ovpn file and import it in your OpenVPN client. Finally, a client configuration file is ready to be imported into the VPN Client. 2. Installation of OpenVPN Client for Windows Download the OpenVPN Client software. Install the OpenVPN Client: Once the installation is finished, we can import the configuration file demouser.ovpn which was generated on the OpenVPN server, but before importing, we need to modify the IP address of our OpenVPN server within this file: client proto udp explicit-exit-notify remote 192.168.0.150 1194 dev tun resolv-retry infinite nobind persist-key persist-tun ... Normally the remote IP by default will be the address of your public IP which is normal if you have your VPN server on your local network and need remote access from outside this network. You can leave the public IP address in the config, but then you will have to open up the correct port and set the routing on your internet access point. Finally, we can test the VPN connection. The first time the connection will probably fail as the firewall on the OpenVPN Linux server is blocking the access. To quickly test this, we can just disable the firewall using the command: # systemctl stop firewalld Alternatively, configure Linux firewall for OpenVPN connectivity: # sudo firewall-cmd --add-service=openvpn # sudo firewall-cmd --permanent --add-service=openvpn # sudo firewall-cmd --add-masquerade # sudo firewall-cmd --permanent --add-masquerade # sudo firewall-cmd --permanent --add-port=1194/udp # sudo firewall-cmd --reload Now the connection should work: On the windows client, you should now also get an additional VPN adapter configured with a default IP address of 10.8.0.2 (this subnet is defined within the file /etc/openvpn/server.conf). 3. How To Use Radius With OpenVPN First, we will install the IBM Security Verify Gateway for Radius on a Windows machine. This package can be downloaded from the IBM Security AppExchange (you will need to use your IBMid to log in). Extract and run the installation using setup_radius.exe. Edit the Radius configuration file c:\Program Files\IBM\IbmRadius\IbmRadiusConfig.json: Find the clients section in the configuration file. The default file has three example client definitions. Delete these definitions and replace them with the single definition shown above. This definition will match any Radius client connecting from the network used by the test machines. 
The secret authenticates the client. Save the file and close the editor. JSON { "address":"::", "port":1812, /* "trace-file":"c:/tmp/ibm-auth-api.log", "trace-rollover":12697600, */ "ibm-auth-api":{ "client-id":"???????", "obf-client-secret":"???????", /* See IbmRadius -obf "the-secret" */ "protocol":"https", "host":"???????.verify.ibm.com", "port":443, "max-handles":16 }, "clients":[ { "name": "OpenVPN", "address": "192.168.0.0", "mask": "255.255.0.0", "secret": "Passw0rd", "auth-method": "password-and-device", "use-external-ldap": false, "reject-on-missing-auth-method": true, "device-prompt": "A push notification has been sent to your device:[%D].", "poll-device": true, "poll-timeout": 60 } ] } Complete the fields client-id, obf-client-secret and host with the correct information to point to your IBM Verify Saas API. Before we can do this, we will need to set up API access in IBM Verify Saas. Login to your environment or go for a trial account if you don’t have one. From the main menu, select Security > API Access > Add API client Create a new API Client : Specify the entitlements by selecting the check bow from the list: Authenticate any user Read authenticator registrations for all users Read users and groups Read second-factor authentication enrollment for all users Click next on the following screens and finally give the API client a name: e.g. MFA-Client A Client ID and Secret will automatically be created for you. Use this information to complete the Radius config. Use the c:\Program Files\IBM\IbmRadius\IbmRadius.exe -obf command to generate the obfuscated secret value. Finally, configure the IBM Radius service to startup automatically and start the service: Test Radius Authentication using the Radius tool : NTRadPing You should get a push notification on the IBM Verify app on the mobile device. (Make sure you test with a userid that is known in IBM Verify Saas and is enrolled for OTP) 4. Install OpenVPN Radius Plugin Log in to the Linux OpenVPN server and launch the following commands: # wget https://www.nongnu.org/radiusplugin/radiusplugin_v2.1a_beta1.tar.gz # tar -xvf radiusplugin_v2.1a_beta1.tar.gz # cd radiusplugin_v2.1a_beta1 # yum install libgcrypt libgcrypt-devel gcc-c++ # make Copy the Radius plugin files to /etc/openvpn # cp /root/radiusplugin_v2.1a_beta1/radiusplugin.cnf /etc/openvpn # cp /root/radiusplugin_v2.1a_beta1/radiusplugin.so /etc/openvpn Edit the file /etc/openvpn/server.conf and add the following line to activate the Radius plugin: plugin /etc/openvpn/radiusplugin.so /etc/openvpn/radiusplugin.cnf Edit the file /etc/openvpn/radiusplugin.cnf and modify the ip address of the Radius server and set the sharedsecret to Passw0rd (this is the secret that was also configured on the Radius server side). Make sure to set nonfatalaccounting=true because the Radius server does not support Radius accounting. C ... NAS-IP-Address=<IP Address of the OpenVPN Server> ... nonfatalaccounting=true ... Server { # The UDP port for RADIUS accounting. acctport=1813 # The UDP port for RADIUS authentication. authport=1812 # The name or ip address of the RADIUS server. name=<IP Address of the RADIUS Server> # How many times should the plugin send the if there is no response? retry=1 # How long should the plugin wait for a response? wait=60 # The shared secret. 
  sharedsecret=Passw0rd
}

Save the file and restart the OpenVPN server:

# systemctl restart openvpn-server@server.service

Finally, edit the OpenVPN client file demouser.ovpn and add the line auth-user-pass:

client
proto udp
auth-user-pass
explicit-exit-notify
remote 192.168.0.150 1194
dev tun
resolv-retry infinite
nobind
persist-key
persist-tun
...

This will prompt the user for a username and password when initiating the VPN connection. These credentials are authenticated against the IBM Verify SaaS directory, which should result in a challenge request in the IBM Verify mobile app. The wait=60 setting allows the plugin to wait for a response from the user, who has to approve the challenge using the IBM Verify app on their phone. If you prefer to use a TOTP challenge instead, modify the Radius configuration file on Windows (IbmRadiusConfig.json) and set the auth-method to password-and-totp. Then you can open the client VPN connection and use 123456:password (the current TOTP value followed by your password) instead of the normal password.
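If the connection does not prompt for credentials, or authentication fails silently, it helps to confirm that the plugin was actually loaded and that the OpenVPN server is talking to the Radius host. A minimal check from the Linux server, assuming the systemd unit name used above:

# Confirm the plugin line made it into the server configuration
grep -n "radiusplugin" /etc/openvpn/server.conf

# Check that the service restarted cleanly and look for Radius/plugin messages in the journal
sudo systemctl status openvpn-server@server.service
sudo journalctl -u openvpn-server@server.service --since "10 minutes ago" | grep -i -E "radius|plugin"

Any errors about loading radiusplugin.so or timeouts reaching the Radius server usually point to a wrong path in server.conf, a mismatched sharedsecret, or UDP port 1812 being blocked between the two machines.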
Secrets management in Docker is a critical security concern for any business. When using Docker containers, it is essential to keep sensitive data, such as passwords, API keys, and other credentials, secure. This article will discuss some best practices for managing secrets in Docker, including how to store them securely and minimize their exposure. We will explore multiple solutions: using Docker Secrets with Docker Swarm, Docker Compose, or Mozilla SOPS. Feel free to choose what's most appropriate for your use case. Most importantly, remember to never hard-code your Docker secrets in plain text in your Dockerfile! Following these guidelines ensures your organization's sensitive information remains safe even when running containerized services.

4 Ways To Store and Manage Secrets in Docker

1. Using Docker Secrets and Docker Swarm

Docker Secrets and Docker Swarm are two official and complementary tools that allow you to securely manage secrets when running containerized services. Docker Secrets provides a secure mechanism for storing and retrieving secrets from the system without exposing them in plain text. It enables users to keep their credentials safe by encrypting the data with a unique key before passing it to the system. Docker Swarm is a powerful tool for managing clusters of nodes for distributed applications. It provides an effective means of deploying containerized applications at scale. With this tool, you can easily manage multiple nodes within a cluster and automatically distribute workloads among them. This helps ensure your application has enough resources available at all times, even during peak usage periods or unexpected traffic spikes. Together, these two tools provide an effective way to ensure your organization's sensitive information remains safe despite ever-evolving security needs. Let's see how to create and manage an example secret.

Creating a Secret

To create a secret, we first need to initialize Docker Swarm:

docker swarm init

Once the service is initialized, we can use the docker secret create command to create the secret:

ssh-keygen -t rsa -b 4096 -N "" -f mykey
docker secret create my_key mykey
rm mykey

In these commands, we first create an SSH key using the ssh-keygen command and write it to mykey. Then, we use the docker secret create command to generate the secret. Ensure you delete the mykey file afterwards to avoid any security risks. You can use the following command to confirm the secret was created successfully:

docker secret ls

We can now use this secret in our Docker containers. One way is to pass the secret with the --secret flag when creating a service:

docker service create --name redis --secret my_key redis:latest

We can also reference the secret from a docker-compose.yml file. Let's take a look at an example file:

version: '3.7'
services:
  myapp:
    image: mydummyapp:latest
    secrets:
      - my_secret
    volumes:
      - type: bind
        source: my_secret_key
        target: /run/secrets/my_secret
        read_only: true
secrets:
  my_secret:
    external: true

In the example compose file, the top-level secrets section defines a secret named my_secret, marked as external (i.e., created earlier with docker secret create). The myapp service definition specifies that it requires my_secret and mounts it as a file at /run/secrets/my_secret in the container.

2. Using Docker Compose

Docker Compose is a powerful tool for defining and running multi-container applications with Docker.
A stack is defined by a docker-compose file, which lets you define and configure the services that make up your application, including their environment variables, networks, ports, and volumes. With Docker Compose, it is easy to set up an application in a single configuration file and deploy it quickly and consistently across multiple environments. Docker Compose provides an effective solution for managing secrets for organizations handling sensitive data such as passwords or API keys. You can read your secrets from an external file (like a TXT file). Just be careful not to commit this file with your code:

version: '3.7'
services:
  myapp:
    image: myapp:latest
    secrets:
      - my_secret
secrets:
  my_secret:
    file: ./my_secret.txt

3. Using a Sidecar Container

A typical strategy for maintaining and storing secrets in a Docker environment is to use sidecar containers. Secrets can be provided to the main application container via the sidecar container, which can run a secrets manager or another secure service. Let's understand this using a HashiCorp Vault sidecar for a MongoDB container:

First, create a Docker Compose (docker-compose.yml) file with two services: mongo and secrets.
In the secrets service, use an image containing your chosen secret management tool, such as Vault.
Mount a volume from the secrets container to the mongo container so the mongo container can access the secrets stored in the secrets container.
In the mongo service, use environment variables to set the credentials for the MongoDB database, and reference the secrets stored in the mounted volume.

Here is the example compose file:

version: '3.7'
services:
  mongo:
    image: mongo
    volumes:
      - secrets:/run/secrets
    environment:
      MONGO_INITDB_ROOT_USERNAME_FILE: /run/secrets/mongo-root-username
      MONGO_INITDB_ROOT_PASSWORD_FILE: /run/secrets/mongo-root-password
  secrets:
    image: vault
    volumes:
      - ./secrets:/secrets
    command: ["vault", "server", "-dev", "-dev-root-token-id=myroot"]
    ports:
      - "8200:8200"
volumes:
  secrets:

4. Using Mozilla SOPS

Mozilla SOPS (Secrets OPerationS) is an open-source tool that provides organizations with a secure and automated way to manage encrypted secrets in files. It offers a range of features designed to help teams share secrets in code in a safe and practical way. The following assumes you are already familiar with SOPS; if that's not the case, start here. Here is an example of how to use SOPS with docker-compose.yml:

version: '3.7'
services:
  myapp:
    image: myapp:latest
    environment:
      API_KEY: ${API_KEY}
    secrets:
      - mysecrets
  sops:
    image: mozilla/sops:latest
    command: ["sops", "--config", "/secrets/sops.yaml", "--decrypt", "/secrets/mysecrets.enc.yaml"]
    volumes:
      - ./secrets:/secrets
    environment:
      # Optional: specify the path to your PGP private key if you encrypted the file with PGP
      SOPS_PGP_PRIVATE_KEY: /secrets/myprivatekey.asc
secrets:
  mysecrets:
    external: true

In the above, the myapp service requires a secret called API_KEY. The secrets section uses a secret called mysecrets, which is expected to be stored in an external key/value store, such as Docker Swarm secrets or HashiCorp Vault. The sops service uses the official SOPS Docker image to decrypt the mysecrets.enc.yaml file, which is stored in the local ./secrets directory. The decrypted secrets are made available to the myapp service as environment variables. Note: Make sure to create the secrets directory and add the encrypted mysecrets.enc.yaml file and the sops.yaml configuration file (with your SOPS configuration) to that directory.
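For completeness, here is a rough sketch of how the encrypted file referenced above could be produced with the sops CLI before running the stack. The key fingerprint and file names are placeholders for illustration, not values from the original setup:

# Encrypt a plain-text secrets file with a PGP key (replace the fingerprint with your own)
sops --encrypt --pgp "<PGP key fingerprint>" secrets/mysecrets.yaml > secrets/mysecrets.enc.yaml

# Verify the file can be decrypted locally before wiring it into docker-compose
sops --decrypt secrets/mysecrets.enc.yaml

Only the encrypted mysecrets.enc.yaml should ever be committed to the repository; the plain-text source file must stay out of version control.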
Scan for Secrets in Your Docker Images

Hard-coding secrets in Docker is a significant security risk, making them vulnerable to attackers. We have seen different best practices to avoid hard-coding secrets in plain text in your Docker images, but security doesn't stop there.

You Should Also Scan Your Images for Secrets

All Dockerfiles start with a FROM directive that defines the base image. It's important to understand that when you use a base image, especially from a public registry like Docker Hub, you are pulling external code that may contain hardcoded secrets. More information is exposed than what is visible in your single Dockerfile: starting from your image, it is possible to retrieve a plain-text secret hard-coded in a previous layer. In fact, many public Docker images are affected: in 2021, we estimated that 7% of Docker Hub images contained at least one secret. Fortunately, you can easily detect them with ggshield (the GitGuardian CLI). For example:

ggshield secret scan docker ubuntu:22.04

Conclusion

Managing secrets in Docker is a crucial part of preserving the security of your containerized apps. Docker includes several built-in tools for managing secrets, such as Docker Secrets and Docker Compose files. Additionally, organizations can use third-party solutions, like HashiCorp Vault and Mozilla SOPS, to manage secrets in Docker. These technologies offer extra capabilities, like access control, encryption, and audit logging, to strengthen the security of your secret management. Finally, finding and limiting accidental or unintended exposure of sensitive information is crucial to handling secrets in Docker. Companies are invited to use secret scanning tools, such as GitGuardian, to scan the Docker images built in their CI/CD pipelines as a mitigation against supply-chain attacks. If you want to know more about Docker security, we also summarized some of the best practices in a cheat sheet.
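As a sketch of what that can look like in practice, the same scan can run right after an image is built in a pipeline; the image name and tag below are placeholders, not part of the original example:

# Build the image, then scan all of its layers for hard-coded secrets before pushing
docker build -t myorg/myapp:1.0.0 .
ggshield secret scan docker myorg/myapp:1.0.0

Failing the pipeline when the scan reports a finding keeps leaked credentials from ever reaching a registry.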
Three Hard Facts

First, the complexity of your software systems is through the roof, and you have more external dependencies than ever before. 51% of IT professionals surveyed by SolarWinds in 2021 selected IT complexity as the top issue facing their organization. Second, you must deliver faster than the competition, which is increasingly difficult as more open-source and reusable tools let small teams move extremely fast. Of the 950 IT professionals surveyed by Red Hat, only 1% indicated that open-source software was "not at all important." And third, reliability is slowing you down.

The Reliability/Speed Tradeoff

In the olden days of software, we could just test the software before a release to ensure it was good. We ran unit tests, made sure the QA team took a look, and then we'd carefully push a software update during a planned maintenance window, test it again, and hopefully get back to enjoying our weekend. By 2023 standards, this is a lazy pace! We expect teams to constantly push new updates (even on Fridays) with minimal dedicated manual testing. They must keep up with security patches, release the latest features, and ensure that bug fixes flow to production. The challenge is that pushing software faster increases the risk of something going wrong. If you took the old software delivery approach and simply sped it up, you'd have broken releases all the time. To solve this, modern tooling and cloud-native infrastructure make delivering software more reliable and safer, all while reducing the manual toil of releases. According to the 2021 State of DevOps report, more than 74% of organizations surveyed have a Change Failure Rate (CFR) greater than 16%. For organizations seeking to speed up software changes (see DORA metrics), many of these updates caused issues requiring additional remediation such as a hotfix or rollback. If your team hasn't invested in improving the reliability of software delivery tooling, you won't be able to achieve reliable releases at speed. In today's world, all your infrastructure, including dev/test infrastructure, is part of the production environment. To go fast, you also have to go safely. Smaller incremental changes, automated release and rollback procedures, high-quality metrics, and clearly defined reliability goals make fast and reliable software releases possible.

Defining Reliability

With clearly defined goals, you will know whether your system is reliable enough to meet expectations. What does it mean to be up or down? You have hundreds of thousands of services deployed in clouds worldwide in constant flux. Developers no longer coordinate releases and push software manually. Dependencies break for unexpected reasons. Security fixes force teams to rush updates to production to avoid costly data breaches and cybersecurity threats. You need a structured, interpreted language to encode your expectations, the limits of your systems, and automated corrective actions. Today, definitions are in code. Anything less is undefined. The alternative is manual intervention, which will slow you down. You can't work on delivering new features if you're constantly trying to figure out what's broken and fixing releases that have already gone out the door. The most precious resource in your organization is attention, and the only way to create more is to reduce distractions.

Speeding Up Reliably

Service level objectives (SLOs) are reliability targets that are precisely defined. SLOs include a pointer to a data source, usually a query against a monitoring or observability system.
They also have defined thresholds and targets that clearly establish pass or fail at any given time. SLOs include a time window (either rolling or calendar-aligned) to count errors against a budget. OpenSLO is the modern de facto standard for declaring your reliability targets. Once you have SLOs to describe your reliability targets across services, something changes. While SLOs don't improve reliability directly, they shine a light on the disconnect between expectations and reality. There is a lot of power in simply clarifying and publishing your goals. What was once a rough shared understanding becomes explicitly defined. We can debate the SLO and decide to raise, lower, redefine, split, combine, and modify it, with a paper trail in the commit history. We can learn from failures as well as successes. Whatever other investments you're making, SLOs help you measure and improve your service. Reliability is engineered, and you can't engineer a system without understanding its requirements and limitations. SLOs-as-code defines consistent reliability across teams, companies, implementations, clouds, languages, and more.
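To make "definitions are in code" a little more concrete, here is a minimal, illustrative sketch of an SLO declared as an OpenSLO-style YAML file. The service name, target, and field values are assumptions for illustration only; consult the OpenSLO specification for the exact schema:

cat <<'EOF' > checkout-availability-slo.yaml
# Illustrative OpenSLO-style declaration; values are placeholders
apiVersion: openslo/v1
kind: SLO
metadata:
  name: checkout-availability
spec:
  service: checkout
  description: 99.9% of checkout requests succeed over a rolling 28 days
  budgetingMethod: Occurrences
  timeWindow:
    - duration: 28d
      isRolling: true
  objectives:
    - displayName: Successful requests
      target: 0.999
      # indicator/indicatorRef (the query against your monitoring system) omitted for brevity
EOF

A file like this lives in version control next to the service it describes, which is what gives you the reviewable paper trail mentioned above.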
As with back-end development, observability is becoming increasingly crucial in front-end development, especially when it comes to troubleshooting. For example, imagine a simple e-commerce application that includes a mobile app, a web server, and a database. If a user reports that the app is freezing while attempting to make a purchase, it can be challenging to determine the root cause of the problem. That's where OpenTelemetry comes in. This article will dive into how front-end developers can leverage OpenTelemetry to improve observability and efficiently troubleshoot issues like this one.

Why Front-End Troubleshooting?

Similar to back-end development, troubleshooting is a crucial aspect of front-end development. For instance, consider a straightforward e-commerce application structure that includes a mobile app, a web server, and a database. Suppose a user reported that the app is freezing while attempting to purchase a dark-themed mechanical keyboard. Without front-end tracing, we wouldn't have enough information about the problem, since it could be caused by different factors: the front-end or the back-end, latency issues, and so on. We can try collecting logs to get some insight, but it's challenging to correlate client-side and server-side logs. We might attempt to reproduce the issue from the mobile application, but that can be time-consuming, or even impossible if the client-side conditions aren't available. And if the issue can't be reproduced, we need more information to identify the specific problem. This is where front-end tracing comes in handy: with its help, we can stop making assumptions and instead gain clarity on where the issue is located.

Front-End Troubleshooting With Distributed Tracing

Tracing data is organized in spans, which represent individual operations like an HTTP request or a database query. By displaying spans in a tree-like structure, developers can gain a comprehensive, real-time view of their system, including the specific issue they are examining. This allows them to investigate further and identify the cause of the problem, such as bottlenecks or latency issues. Tracing can be a valuable tool for pinpointing the root cause of an issue. The example below displays three simple components: a front-end, a back-end, and a database. When there is an issue, the trace encompasses spans from both the front-end app and the back-end service. By reviewing the trace, it's possible to identify the data that was transmitted between the components, allowing developers to follow the path from a specific user click in the front-end down to the DB query. Rather than relying on guesswork to identify the issue, with tracing you have a visual representation of it. For example, you can determine whether the request was sent out from the device, whether the back-end responded, whether certain components were missing from the response, and other factors that may have caused the app to become unresponsive. Suppose we need to determine whether a delay caused the problem. In Helios, there is functionality that displays each span's duration. Here's what it looks like: Now you can simply analyze the trace to pinpoint the bottleneck. In addition, each span in the trace is timestamped, allowing you to see exactly when each action took place and whether there were any delays in processing the request. Helios comes with a span explorer that was created explicitly for this purpose.
The explorer enables sorting spans based on their duration or timestamp: The trace visualization provides information on the time taken by each operation, which can help identify areas that require optimization. The default view available in Jaeger, which displays a trace breakdown, is also an effective way to explore bottlenecks.

Adding Front-End Instrumentation to Your Traces in OpenTelemetry: Advanced Use Cases

It's advised to include front-end instrumentation in your traces to enhance your ability to analyze bottlenecks. While many of the SDKs provided by OpenTelemetry are designed for back-end services, it's worth noting that OpenTelemetry has also developed an SDK for JavaScript, and it plans to release more client libraries in the future. Below, we will look at how to integrate these libraries.

Aggregating Traces

Aggregating multiple traces from different requests into one large trace can be useful for analyzing a flow as a whole. For instance, imagine a purchasing process that involves three REST requests, such as validating the user, billing the user, and updating the database. To see this flow as a single trace for all three requests, developers can create a custom span that encapsulates all three into one flow. This can be achieved using a code example like the one below.

const { createCustomSpan } = require('@heliosphere/web-sdk');

const purchaseFunction = () => {
  validateUser(user.id);
  chargeUser(user.cardToken);
  updateDB(user.id);
};

createCustomSpan("purchase", {'id': purchase.id}, purchaseFunction);

From now on, the trace will include all the spans generated under validateUser, chargeUser, and updateDB. This allows us to see the entire flow as a single trace rather than separate ones for each request.

Adding Span Events

Adding information about particular events can be beneficial when investigating and analyzing front-end bottlenecks. With OpenTelemetry, developers can utilize a feature called span events, which allows them to include a report about an event and associate it with a specific span. A span event is a message on a span that describes a specific event with no duration, identified by a single timestamp. It can be seen as a basic log and looks like this:

const activeSpan = opentelemetry.trace.getActiveSpan();
activeSpan.addEvent('User clicked Purchase button');

Span events can capture various data, such as clicks, device events, networking events, and so on.

Adding Baggage

Baggage is a useful feature provided by OpenTelemetry that allows adding contextual information to traces. This information can be propagated across all spans in a trace and is helpful for transferring user data, such as user identification, preferences, and Stripe tokens, among other things. This feature can benefit front-end development since user data is a crucial element in this area. You can find more information about Baggage right here.

Deploying Front-End Instrumentation

Deploying the instrumentation added to your traces is straightforward, just like deploying any other OpenTelemetry SDK. Additionally, you can use Helios's SDK to visualize and gain more insights without setting up your own infrastructure. To do this, simply visit the Helios website, register, and follow the steps to install the SDK and add the code snippet to your application.
The deployment instructions for the Helios front-end SDK are shown below:

Where to Go From Here: Next Steps for Front-End Developers

Enabling front-end instrumentation is a simple process that unlocks a plethora of new troubleshooting capabilities for full-stack and front-end developers. It allows you to map out a transaction, starting from a UI click and leading up to a specific database query or scheduled job, providing unique insights for bottleneck identification and issue analysis. Both OpenTelemetry and Helios support front-end instrumentation, making it even more accessible for developers. Begin utilizing these tools today to enhance your development workflow.
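If you would rather experiment with the vanilla OpenTelemetry JavaScript SDK than a vendor SDK, the browser tracing packages can be added to a web project roughly like this; the package selection is a typical minimal set chosen for illustration, not an official prescription:

# Core API plus the web tracer and two common browser instrumentations
npm install @opentelemetry/api \
            @opentelemetry/sdk-trace-web \
            @opentelemetry/instrumentation-document-load \
            @opentelemetry/instrumentation-fetch

From there, the web tracer provider is registered in the application's entry point and spans from page loads and fetch calls start flowing to whichever exporter you configure.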
In this latest article of the series about simulating and troubleshooting performance problems in Kotlin, let's discuss how to make threads go into a BLOCKED state. A thread will enter a BLOCKED state when it can't acquire a lock on an object because another thread already holds the lock on the same object and doesn't release it.

Kotlin BLOCKED Thread Program

Here is a sample program that makes threads go into a BLOCKED state.

package com.buggyapp

class BlockedApp {
    fun start() {
        println("BlockedApp:started")
        for (counter in 0..9) {
            // Launch 10 threads.
            AppThread().start()
        }
    }
}

class AppThread : Thread() {
    override fun run() {
        AppObject.something
    }
}

object AppObject {
    @get:Synchronized
    val something: Unit
        get() {
            while (true) {
                try {
                    Thread.sleep(6000000.toLong())
                } catch (e: Exception) {
                }
            }
        }
}

fun main() {
    println(BlockedApp().start())
}

The sample program contains the BlockedApp class. This class has a start() method that creates 10 new threads. The AppThread class has a run() method that invokes the getSomething() method on AppObject. In this getSomething() method, the thread is put to sleep continuously; i.e., it repeatedly sleeps for 6,000,000 milliseconds (100 minutes) in a loop. But notice that getSomething() is a synchronized method. Synchronized methods can be executed by only one thread at a time. If any other thread tries to execute getSomething() while the previous thread is still working on it, the new thread will be put in the BLOCKED state. In this case, 10 threads are launched to execute the getSomething() method, but only one thread will acquire the lock and execute it. The remaining 9 threads will be put in a BLOCKED state.

NOTE: If threads are in a BLOCKED state for a prolonged period, the application may become unresponsive.

How To Diagnose BLOCKED Threads

You can diagnose BLOCKED threads through either a manual or an automated approach.

Manual Approach

In the manual approach, you need to capture thread dumps as the first step. A thread dump shows all the threads that are in memory and their code execution paths. You can capture a thread dump using one of the 8 options mentioned here, but an important criterion is that you need to capture the thread dump right when the problem is happening (which might be tricky to do). Once the thread dump is captured, you need to import it from your production servers to your local machine and analyze it using thread dump analysis tools like fastThread or samurai.

Automated Approach

On the other hand, you can also use the yCrash open-source script, which captures 360-degree data (GC log, 3 snapshots of thread dumps, heap dump, netstat, iostat, vmstat, top, top -H, etc.) right when the problem surfaces in the application stack and analyzes it instantly to generate a root cause analysis report. We used the automated approach. Below is the root cause analysis report generated by the yCrash tool highlighting the source of the problem.

yCrash reporting the transitive dependency graph of 9 BLOCKED threads

yCrash prints a transitive dependency graph that shows which threads are getting BLOCKED and who is blocking them. In this transitive graph, you can see "Thread-0" blocking 9 other threads. If you click on the thread names in the graph, you can see the stack trace of that particular thread.
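(As an aside, if you are following the manual approach instead, a thread dump of the sample app can be captured with standard JDK tooling; a minimal sketch, where the process name and output path are placeholders:)

# Find the PID of the running JVM (the grep pattern is a placeholder for your main class or jar)
jps -l | grep buggyapp

# Capture a thread dump, including lock ownership details, for offline analysis
jstack -l <pid> > /tmp/threaddump-$(date +%s).txt

The resulting file can then be loaded into fastThread or samurai as described above.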
yCrash reporting the stack traces of the 9 threads that are in the BLOCKED state

Here is the screenshot showing the stack traces of the 9 threads that are in the BLOCKED state; it also points out exactly where in the code they are stuck. From the stack trace, you can observe that the threads are stuck on the com.buggyapp.blockedapp.AppObject#getSomething() method. Equipped with this information, one can easily identify the root cause of the BLOCKED threads.

Video

To see a visual walk-through of this post, click below: