Database Systems
This data-forward, analytics-driven world would be lost without its database and data storage solutions. As more organizations continue to transition their software to cloud-based systems, demand for database innovation and enhancement has climbed to new heights. We have entered the era of the "Modern Database," where databases must both store data and ensure that data is prepped and primed securely for insights and analytics, integrity and quality, and microservices and cloud-based architectures. In our 2023 Database Systems Trend Report, we explore these database trends, assess current strategies and challenges, and provide forward-looking assessments of the database technologies most commonly used today. Further, readers will find insightful articles — written by several of our very own DZone Community experts — that cover hand-selected topics, including what "good" database design is, database monitoring and observability, and how to navigate the realm of cloud databases.
Artificial intelligence (AI) is transforming various industries and changing the way businesses operate. Although Python is often regarded as the go-to language for AI development, Java provides robust libraries and frameworks that make it an equally strong contender for creating AI-based applications. In this article, we explore using Java and Gradle for AI development by discussing popular libraries, providing code examples, and demonstrating end-to-end working examples. Java Libraries for AI Development Java offers several powerful libraries and frameworks for building AI applications, including: Deeplearning4j (DL4J) - A deep learning library for Java that provides a platform for building, training, and deploying neural networks, DL4J supports various neural network architectures and offers GPU acceleration for faster computations. Weka - A collection of machine learning algorithms for data mining tasks, Weka offers tools for data pre-processing, classification, regression, clustering, and visualization. Encog - A machine learning framework supporting various advanced algorithms, including neural networks, support vector machines, genetic programming, and Bayesian networks Setting up Dependencies With Gradle To begin AI development in Java using Gradle, set up the required dependencies in your project by adding the following to your build.gradle file: Groovy dependencies { implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-M1.1' implementation 'nz.ac.waikato.cms.weka:weka-stable:3.8.5' implementation 'org.encog:encog-core:3.4' } Code Examples Building a Simple Neural Network With DL4J This example demonstrates creating a basic neural network using the Deeplearning4j (DL4J) library. The code sets up a two-layer neural network architecture consisting of a DenseLayer with 4 input neurons and 10 output neurons, using the ReLU activation function, and an OutputLayer with 10 input neurons and 3 output neurons, using the Softmax activation function and Negative Log Likelihood as the loss function. The model is then initialized and can be further trained on data and used for predictions. Java import org.deeplearning4j.nn.api.OptimizationAlgorithm; import org.deeplearning4j.nn.conf.MultiLayerConfiguration; import org.deeplearning4j.nn.conf.NeuralNetConfiguration; import org.deeplearning4j.nn.conf.layers.DenseLayer; import org.deeplearning4j.nn.conf.layers.OutputLayer; import org.deeplearning4j.nn.multilayer.MultiLayerNetwork; import org.deeplearning4j.nn.weights.WeightInit; import org.nd4j.linalg.activations.Activation; import org.nd4j.linalg.learning.config.Sgd; import org.nd4j.linalg.lossfunctions.LossFunctions; public class SimpleNeuralNetwork { public static void main(String[] args) { MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder() .seed(123) .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) .updater(new Sgd(0.01)) .list() .layer(0, new DenseLayer.Builder().nIn(4).nOut(10) .weightInit(WeightInit.XAVIER) .activation(Activation.RELU) .build()) .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD) .nIn(10).nOut(3) .weightInit(WeightInit.XAVIER) .activation(Activation.SOFTMAX) .build()) .pretrain(false).backprop(true) .build(); MultiLayerNetwork model = new MultiLayerNetwork(conf); model.init(); } } Classification Using Weka This example shows how to use the Weka library for classification on the Iris dataset. 
The code loads the dataset from an ARFF file, sets the class attribute (the attribute we want to predict) to be the last attribute in the dataset, builds a Naive Bayes classifier using the loaded data, and classifies a new instance. Java import weka.classifiers.bayes.NaiveBayes; import weka.core.Instance; import weka.core.Instances; import weka.core.converters.ConverterUtils.DataSource; public class WekaClassification { public static void main(String[] args) throws Exception { DataSource source = new DataSource("data/iris.arff"); Instances data = source.getDataSet(); data.setClassIndex(data.numAttributes() - 1); NaiveBayes nb = new NaiveBayes(); nb.buildClassifier(data); Instance newInstance = data.instance(0); double result = nb.classifyInstance(newInstance); System.out.println("Predicted class: " + data.classAttribute().value((int) result)); } } Conclusion Java, with its rich ecosystem of libraries and frameworks for AI development, is a viable choice for building AI-based applications. By leveraging popular libraries like Deeplearning4j, Weka, and Encog, and using Gradle as the build tool, developers can create powerful AI solutions using the familiar Java programming language. The provided code examples demonstrate the ease of setting up and configuring AI applications using Java and Gradle. The DL4J example shows how to create a basic deep learning model that can be applied to tasks such as image recognition or natural language processing. The Weka example demonstrates how to use Java and the Weka library for machine learning tasks, specifically classification, which can be valuable for implementing machine learning solutions in Java applications, such as predicting customer churn or classifying emails as spam or not spam. Happy Learning!!
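The DL4J example above initializes the network but stops short of the "further trained on data and used for predictions" step. The sketch below shows one way to do that with in-memory ND4J arrays; the feature values, one-hot label encoding, and epoch count are illustrative assumptions rather than part of the original example, and the model argument is the MultiLayerNetwork built in the SimpleNeuralNetwork class above.

```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

public class SimpleNeuralNetworkTraining {

    public static void train(MultiLayerNetwork model) {
        // Three hypothetical iris-style rows: 4 features per row, one-hot labels for 3 classes.
        // Note: on recent DL4J releases the pretrain()/backprop() builder calls used above are
        // unnecessary and may not compile; omit them if your version rejects them.
        INDArray features = Nd4j.create(new double[][] {
                {5.1, 3.5, 1.4, 0.2},
                {6.2, 2.9, 4.3, 1.3},
                {7.7, 3.0, 6.1, 2.3}
        });
        INDArray labels = Nd4j.create(new double[][] {
                {1, 0, 0},
                {0, 1, 0},
                {0, 0, 1}
        });
        DataSet data = new DataSet(features, labels);

        // Fit the network for a number of epochs (100 is an arbitrary choice for illustration)
        for (int epoch = 0; epoch < 100; epoch++) {
            model.fit(data);
        }

        // Use the trained network for predictions: one probability per class, per input row
        INDArray predictions = model.output(features, false);
        System.out.println(predictions);
    }
}
```

In a real project you would feed a DataSetIterator over a full training set rather than a handful of hard-coded rows.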
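Encog is listed among the libraries and included in the Gradle dependencies, but it is not demonstrated above. Here is a minimal, self-contained sketch (the classic XOR problem) using the encog-core dependency declared earlier; the layer sizes, error threshold, and iteration cap are arbitrary illustrative choices.

```java
import org.encog.Encog;
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class EncogXor {
    public static void main(String[] args) {
        // XOR truth table as training data
        double[][] input = { {0, 0}, {1, 0}, {0, 1}, {1, 1} };
        double[][] ideal = { {0}, {1}, {1}, {0} };
        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);

        // 2 inputs, one hidden layer of 3 sigmoid neurons, 1 sigmoid output
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 2));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 3));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
        network.getStructure().finalizeStructure();
        network.reset();

        // Train with resilient propagation until the error is small enough
        ResilientPropagation train = new ResilientPropagation(network, trainingSet);
        int epoch = 0;
        do {
            train.iteration();
            epoch++;
        } while (train.getError() > 0.01 && epoch < 5000);

        // Query the trained network
        for (MLDataPair pair : trainingSet) {
            MLData output = network.compute(pair.getInput());
            System.out.println(pair.getInput().getData(0) + "," + pair.getInput().getData(1)
                    + " -> " + output.getData(0));
        }
        Encog.getInstance().shutdown();
    }
}
```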
If you’re still building and delivering your software applications the traditional way, then you are missing out on a major innovation in the software development process or software development life cycle. To show you what I’m talking about, in this article, I will share how to create a CI/CD Pipeline with Jenkins, Containers, and Amazon ECS that deploys your application and overcomes the limitations of the traditional software delivery model. This innovation greatly affects deadlines, time to market, quality of the product, etc. I will take you through the whole step-by-step process of setting up a CI/CD Docker pipeline for a sample Node.js application. What Is a CI/CD Pipeline? A CI/CD Pipeline or Continuous Integration Continuous Delivery Pipeline is a set of instructions to automate the process of Software tests, builds, and deployments. Here are a few benefits of implementing CI/CD in your organization. Smaller code change: The ability of CI/CD Pipelines to allow the integration of a small piece of code at a time helps developers recognize any potential problem before too much work is completed. Faster delivery: Multiple daily releases or continual releases can be made a reality using CI/CD Pipelines. Observability: Having automation in place that generates extensive logs at each stage of the development process helps to understand if something goes wrong. Easier rollbacks: There are chances that the code that has been deployed may have issues. In such cases, it is very crucial to get back to the previous working release as soon as possible. One of the biggest advantages of using the CI/CD Pipelines is that you can quickly and easily roll back to the previous working release. Reduce costs: Having automation in place for repetitive tasks frees up the Developer and Operation guys’ time that could be spent on Product Development. Now, before we proceed with the steps to set up a CI/CD Pipeline with Jenkins, Containers, and Amazon ECS, let’s see, in short, what tools and technologies we will be using. CI/CD Docker Tool Stack GitHub: It is a web-based application or a cloud-based service where people or developers collaborate, store, and manage their application code using Git. We will create and store our sample Nodejs application code here. AWS EC2 Instance: AWS EC2 is an Elastic Computer Service provided by Amazon Web Services used to create Virtual Machines or Virtual Instances on AWS Cloud. We will create an EC2 instance and install Jenkins and other dependencies in it. Java: This will be required to run the Jenkins Server. AWS CLI: aws-cli i.e AWS Command Line Interface, is a command-line tool used to manage AWS Services using commands. We will be using it to manage AWS ECS Task and ECS Service. Node.js and NPM: Node.js is a back-end JavaScript runtime environment, and NPM is a package manager for Node. We will be creating a CI CD Docker Pipeline for the Node.js application. Docker: Docker is an open-source containerization platform used for developing, shipping, and running applications. We will use it to build Docker Images of our sample Node.js application and push/pull them to/from AWS ECR. Jenkins: Jenkins is an open-source, freely available automation server used to build, test, and deploy software applications. We will be creating our CI/CD Docker Pipeline to build, test, and deploy our Node.js application on AWS ECS using Jenkins AWS ECR: AWS Elastic Container Registry is a Docker Image Repository fully managed by AWS to easily store, share, and deploy container images. 
We will be using AWS ECR to store Docker Images of our sample Node.js application. AWS ECS: AWS Elastic Container Service is a container orchestration service fully managed by AWS to easily deploy, manage, and scale containerized applications. We will be using it to host our sample Node.js application. Architecture This is how our architecture will look like after setting up the CI/CD Pipeline with Docker. After the CI/CD Docker Pipeline is successfully set up, we will push commits to our GitHub repository, and in turn, GitHub Webhook will trigger the CI/CD Pipeline on Jenkins Server. Jenkins Server will then pull the latest code, perform unit tests, build a docker image, and push it to AWS ECR. After the image is pushed to AWS ECR, the same image will be deployed in AWS ECS by Jenkins. CI/CD Workflow and Phases Workflow CI and CD Workflow allows us to focus on Development while it carries out the tests, build, and deployments in an automated way. Continuous Integration: This allows the developers to push the code to the Version Control System or Source Code Management System, build & test the latest code pushed by the developer, and generate and store artifacts. Continuous Delivery: This is the process that lets us deploy the tested code to the Production whenever required. Continuous Deployment: This goes one step further and releases every single change without any manual intervention to the customer system every time the production pipeline passes all the tests. Phases The primary goal of the automated CI/CD pipeline is to build the latest code and deploy it. There can be various stages as per the need. The most common ones are mentioned below. Trigger: The CI/CD pipeline can do its job on the specified schedule when executed manually or triggered automatically on a particular action in the Code Repository. Code pull: In this phase, the pipeline pulls the latest code whenever the pipeline is triggered. Unit tests: In this phase, the pipeline performs tests that are there in the codebase. This is also referred to as unit tests. Build or package: Once all the tests pass, the pipeline moves forward and builds artifacts or docker images in case of dockerized applications. Push or store: In this phase, the code that has been built is pushed to the Artifactory or Docker Repository in case of dockerized applications. Acceptance tests: This phase or stage of the pipeline validates if the software behaves as intended. It is a way to ensure that the software or application does what it is meant to do. Deploy: This is the final stage in any CI/CD pipeline. In this stage, the application is ready for delivery or deployment. Deployment Strategy A deployment strategy is a way in which containers of the micro-services are taken down and added. There are various options available; however, we will only discuss the ones that are available and supported by ECS Rolling Updates In rolling updates, the scheduler in the ECS Service replaces the currently running tasks with new ones. The tasks in the ECS cluster are nothing but running containers created out of the task definition. Deployment configuration controls the number of tasks that Amazon ECS adds or removes from the service. The lower and the upper limit on the number of tasks that should be running is controlled by minimumHealthyPercent and maximumPercent, respectively. 
minimumHealthyPercent example: If the value of minimumHealthyPercent is 50 and the desired task count is four, then the scheduler can stop two existing tasks before starting two new tasks.

maximumPercent example: If the value of maximumPercent is 200 and the desired task count is four, then the scheduler can start four new tasks before stopping four existing tasks.

If you want to learn more about this, visit the official documentation here.

Blue/Green Deployment

The blue/green deployment strategy enables the developer to verify a new deployment before sending traffic to it by installing an updated version of the application as a new replacement task set. There are primarily three ways in which traffic can shift during a blue/green deployment. Canary — Traffic is shifted in two increments: you specify the percentage of traffic shifted to your updated task set in the first increment and the interval, in minutes, before the remaining traffic is shifted in the second increment. Linear — Traffic is shifted in equal increments: you specify the percentage of traffic shifted in each increment and the number of minutes between each increment. All-at-once — All traffic is shifted from the original task set to the updated task set at once. To learn more about this, visit the official documentation here. Out of these two strategies, we will be using the rolling-update deployment strategy in our demo application.

Dockerize the Node.js App

Now, let's get started and make our hands dirty. The Dockerfile for the sample Node.js application is as follows. There is no need to copy-paste this file; it is already available in the sample git repository that you will clone in a later step. Let's just try to understand the instructions of our Dockerfile.

FROM node:12.18.4-alpine - This will be our base image for the container.
WORKDIR /app - This will be set as the working directory in the container.
ENV PATH /app/node_modules/.bin:$PATH - The PATH variable is assigned the path /app/node_modules/.bin.
COPY package.json ./ - package.json will be copied into the working directory of the container.
RUN npm install - Install dependencies.
COPY . ./ - Copy files and folders with dependencies from the host machine to the container.
EXPOSE 3000 - Expose port 3000 of the container.
CMD ["node", "./src/server.js"] - Start the application.

This is the Dockerfile that we will use to create a Docker image.

Setup GitHub Repositories

Create a New Repository

Go to GitHub and create an account if you don't have one already; otherwise, log in to your account and create a new repository. You can name it as per your choice; however, I would recommend using the same name to avoid any confusion. You will get the screen as follows: copy the repository URL and keep it handy. Call this URL the GitHub Repository URL and note it down in the text file on your system. Note: Create a new text file on your system and note down all the details that will be required later.

Create a GitHub Token

This will be required for authentication purposes. It will be used instead of a password for Git over HTTP, or it can be used to authenticate to the API over Basic Authentication. Click on the user icon in the top-right, go to "Settings," then click on the "Developer settings" option in the left panel. Click on the "Personal access tokens" option and then "Generate new token" to create a new token. Tick the "repo" checkbox; the token will then have "full control of private repositories." You should see your token created now.
Clone the Sample Repository

Check your present working directory:
pwd
Note: You are in the home directory, i.e., /home/ubuntu.
Clone my sample repository containing all the required code:
git clone 
Create a new repository. This repository will be used for the CI/CD pipeline setup:
git clone 
Copy all the code from my node.js repository to the newly created demo-nodejs-app repository:
cp -r nodejs/* demo-nodejs-app/
Change your working directory:
cd demo-nodejs-app/
Note: For the rest of the article, do not change your directory. Stay in the same directory. Here it is /home/ubuntu/demo-nodejs-app/, and execute all the commands from there.
ls -l
git status

Push Your First Commit to the Repository

Check your present working directory. It should be the same. Here it is: /home/ubuntu/demo-nodejs-app/
pwd
Set a username for your git commit message:
git config user.name "Rahul"
Set an email for your git commit message:
git config user.email "<>"
Verify the username and email you set:
git config --list
Check the status to see files that have been changed or added to your git repository:
git status
Add files to the git staging area:
git add
Check the status to see files that have been added to the git staging area:
git status
Commit your files with a commit message:
git commit -m "My first commit"
Push the commit to your remote git repository:
git push

Setup the AWS Infrastructure

Create an IAM User With Programmatic Access

Create an IAM user with programmatic access in your AWS account and note down the access key and secret key in your text file for future reference. Provide administrator permissions to the user. We don't need admin access; however, to avoid permission issues and for the sake of the demo, let's proceed with administrator access.

Create an ECR Repository

Create an ECR repository in your AWS account and note its URL in your text file for future reference.

Create an ECS Cluster

Go to the ECS Console and click on "Get Started" to create a cluster. Click on the "Configure" button available in the "custom" option under "Container definition." Specify the container name as "nodejs-container," the ECR Repository URL in the "Image" text box, and port "3000" in the Port mappings section, and then click on the "Update" button. You can specify any name of your choice for the container. You can now see the details you specified under "Container definition." Click on the "Next" button to proceed. Select "Application Load Balancer" under "Define your service" and then click on the "Next" button. Keep the cluster name as "default" and proceed by clicking on the "Next" button. You can change the cluster name if you want. Review the configuration, and it should look as follows. If the configurations match, then click on the "Create" button. This will initiate the ECS cluster creation. After a few minutes, you should have your ECS cluster created, and the Launch Status should be something as follows.

Create an EC2 Instance for Setting up the Jenkins Server

Create an EC2 instance with the Ubuntu 18.04 AMI and open port 22 for your IP and port 8080 for 0.0.0.0/0 in its Security Group. Port 22 will be required for SSH and 8080 for accessing the Jenkins Server. Port 8080 is where the GitHub webhook will try to connect to the Jenkins Server, hence we need to allow it for 0.0.0.0/0.

Setup Jenkins on the EC2 Instance

After the instance is available, let's install the Jenkins Server on it along with all the dependencies.
Prerequisites of the EC2 Instance

Verify that the OS is Ubuntu 18.04 LTS:
cat /etc/issue
Check the RAM; a minimum of 2 GB is what we require:
free -m
The user that you use to log in to the server should have sudo privileges. "ubuntu" is the user available with sudo privileges for EC2 instances created using the "Ubuntu 18.04 LTS" AMI:
whoami
Check your present working directory; it will be your home directory:
pwd

Install Java, the JSON Processor jq, Node.js/NPM, and aws-cli on the EC2 Instance

Update your system by downloading package information from all configured sources:
sudo apt update
Search for and install Java 11:
sudo apt search openjdk
sudo apt install openjdk-11-jdk
Install the jq command, the JSON processor:
sudo apt install jq
Install Node.js 12 and NPM:
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt install nodejs
Install the aws cli tool:
sudo apt install awscli
Check the Java version:
java --version
Check the jq version:
jq --version
Check the Node.js version:
node --version
Check the NPM version:
npm --version
Check the aws cli version:
aws --version
Note: Make sure all your versions match the versions seen in the above image.

Install Jenkins on the EC2 Instance

Jenkins can be installed from the Debian repository:
wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list'
Update the apt package index:
sudo apt-get update
Install Jenkins on the machine:
sudo apt-get install jenkins
Check the service status to confirm it is running:
service jenkins status
You should have your Jenkins up and running now. You may refer to the official documentation here if you face any issues with the installation.

Install Docker on the EC2 Instance

Install packages to allow apt to use a repository over HTTPS:
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Set up the stable repository:
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Update the apt package index:
sudo apt-get update
Install the latest version of Docker Engine and containerd:
sudo apt-get install docker-ce docker-ce-cli containerd.io
Check the Docker version:
docker --version
Create a "docker" group; this group may already exist:
sudo groupadd docker
Add the "ubuntu" user to the "docker" group:
sudo usermod -aG docker ubuntu
Add the "jenkins" user to the "docker" group:
sudo usermod -aG docker jenkins
Test if you can create Docker objects using the "ubuntu" user:
docker run hello-world
Switch to the "root" user:
sudo -i
Switch to the "jenkins" user:
su jenkins
Test if you can create Docker objects using the "jenkins" user:
docker run hello-world
Exit from the "jenkins" user:
exit
Exit from the "root" user:
exit
Now you should be back in the "ubuntu" user. You may refer to the official documentation here if you face any issues with the installation.

Configure the Jenkins Server

After Jenkins has been installed, the first step is to extract its password:
sudo cat /var/lib/jenkins/secrets/initialAdminPassword
Hit the URL in the browser. Jenkins URL: http://<public-ip-of-the-ec2-instance>:8080
Select the "Install suggested plugins" option. Specify the username and password for the new admin user to be created. You can use this user as an admin user.
This URL field will be auto-filled. Click on the “Save and Finish” button to proceed. Your Jenkins Server is ready now. Here is what its Dashboard looks like. Install Plugins Let’s install all the plugins that we will need. Click on “Manage Jenkins” in the left panel. Here is a list of plugins that we need to install CloudBees AWS Credentials:Allows storing Amazon IAM credentials keys within the Jenkins Credentials API. Docker Pipeline:This plugin allows building, testing, and using Docker images from Jenkins Pipeline. Amazon ECR:This plugin provides integration with AWS Elastic Container Registry (ECR)Usage: AWS Steps:This plugin adds Jenkins pipeline steps to interact with the AWS API. In the “Available” tab, search all these plugins and click on “Install without restart.” You will see the screen as follows after the plugins have been installed successfully. Create Credentials in Jenkins CloudBees AWS Credentials plugin will come to the rescue here. Go to “Manage Jenkins,” and then click on “Manage Credentials." Click on “(global)” “Add credentials”. Select Kind as “AWS Credentials” and provide ID as “demo-admin-user.” This can be provided as per your choice. Keep a note of this ID in the text file. Specify the Access Key and Secret Key of the IAM user we created in the previous steps. Click on “OK” to store the IAM credentials. Follow the same step, and this time select Kind as “Username with password” to store the GitHub Username and Token we created earlier. Click on “Ok” to store the GitHub credentials. You should now have IAM and GitHub credentials in your Jenkins. Create a Jenkins Job Go to the main dashboard and click on “New Item” to create a Jenkins Pipeline. Select the “Pipeline” and name it “demo-job,” or provide a name of your choice. Tick the “GitHub project” checkbox under the “General” tab, and provide the GitHub Repository URL of the one we created earlier. Also, tick the checkbox “GitHub hook trigger for GitScm polling” under the “Build Trigger” tab. Under the “Pipeline” tab, select “Pipeline script from the SCM” definition, specify our repository URL, and select the credential we created for Github. Check the branch name if it matches the one you will be using for your commits. Review the configurations and click on “Save” to save your changes to the pipeline. Now you can see the pipeline we just created. Integrate GitHub and Jenkins The next step is to integrate Github with Jenkins so that whenever there is an event on the Github Repository, it can trigger the Jenkins Job. Go to the settings tab of the repository and click on “Webhooks” in the left panel. You can see the “Add webhook” button. Click on it to create a webhook. Provide the Jenkins URL with context as “/github-webhook/.” The URL will look as follows.Webhook URL: http://<Jenkins-IP>:8080/github-webhook/You can select the events of your choice; however, for the sake of simplicity, I have chosen “Send me everything.” Make sure the “Active” checkbox is checked. Click on “Add webhook” to create a webhook that will trigger the Jenkins job whenever there is any kind of event in the GitHub Repository. You should see your webhook. Click on it to see if it has been configured correctly or not. Click on the “Recent Deliveries” tab, and you should see a green tick mark. The green tick mark shows that the webhook was able to connect to the Jenkins Server. Deploy the Node.js Application to the ECS Cluster Before we trigger the Pipeline from GitHub Webhook, let's try to execute it manually. 
Build the Job Manually

Go to the job we created and build it. If you look at its logs, you will see that it failed. The reason is that we have not yet assigned values to the variables we have in our Jenkinsfile.

Push Your Second Commit

Reminder note: For the rest of the article, do not change your directory. Stay in the same directory, i.e., /home/ubuntu/demo-nodejs-app, and execute all the commands from there.

Assign values to the variables in the Jenkinsfile. To overcome the above error, you need to make some changes to the Jenkinsfile. We have variables in that file, and we need to assign values to those variables to deploy our application to the ECS cluster we created. Assign correct values to the variables having "CHANGE_ME":
cat Jenkinsfile
Here is the list of variables in the Jenkinsfile for your convenience (an illustrative sketch of a complete Jenkinsfile along these lines is shown at the end of this article):
AWS_ACCOUNT_ID="CHANGE_ME" - Assign your AWS account number here.
AWS_DEFAULT_REGION="CHANGE_ME" - Assign the region you created your ECS Cluster in.
CLUSTER_NAME="CHANGE_ME" - Assign the name of the ECS Cluster that you created.
SERVICE_NAME="CHANGE_ME" - Assign the Service name that got created in the ECS Cluster.
TASK_DEFINITION_NAME="CHANGE_ME" - Assign the Task name that got created in the ECS Cluster.
DESIRED_COUNT="CHANGE_ME" - Assign the number of tasks you want to be created in the ECS Cluster.
IMAGE_REPO_NAME="CHANGE_ME" - Assign the ECR Repository URL.
IMAGE_TAG="${env.BUILD_ID}" - Do not change this.
REPOSITORY_URI = "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/${IMAGE_REPO_NAME}" - Do not change this.
registryCredential = "CHANGE_ME" - Assign the name of the credentials you created in Jenkins to store the AWS Access Key and Secret Key.

Check the status to confirm that the file has been changed:
git status
cat Jenkinsfile
Add the file to the git staging area, commit it, and then push it to the remote GitHub repository:
git status
git add Jenkinsfile
git commit -m "Assigned environment specific values in Jenkinsfile"
git push

Error on Jenkins Server

After pushing the commit, the Jenkins pipeline will get triggered. However, you will see the error "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock" in your Jenkins job. The reason for this is that the "jenkins" user used by the Jenkins job is not allowed to create Docker objects. To give permission to the "jenkins" user, we added it to the "docker" group in the previous step; however, we did not restart the Jenkins service after that. I kept this deliberately so that I could show you the need to add the "jenkins" user to the "docker" group on your EC2 instance. Now you know what needs to be done to overcome the above error. Restart the Jenkins service:
sudo service jenkins restart
Check whether the Jenkins service has started:
sudo service jenkins status

Push Your Third Commit

Make some changes in README.md to commit, push, and test whether the pipeline gets triggered automatically:
vim README.md
Add, commit, and push the file:
git status
git diff README.md
git add README.md
git commit -m "Modified README.md to trigger the Jenkins job after restarting the Jenkins service"
git push
This time, you can observe that the job gets triggered automatically. Go to the Jenkins job and verify the same. This is what the Stage View looks like. It shows us the stages that we have specified in our Jenkinsfile.

Check the Status of the Task in the ECS Cluster

Go to the cluster, click on the "Tasks" tab, and then open the running "Task." Click on the "JSON" tab and verify the image.
The image tag should match the Jenkins Build number. In this case, it is “6,” and it matches my Jenkins Job Build number. Hit the ELB URL to check if the Nodejs application is available or not. You should get the message as follows in the browser after hitting the ELB URL. Push Your Fourth Commit Open the “src/server.js” file and make some changes in the display message to test the CI CD Pipeline again.vim src/server.js Check the files that have been changed. In this case, only one file can be seen as changed.git status Check the difference that your change has caused in the file.git diff src/server.js Add the file that you changed to the git staging area.git add src/server.js Check the status of the local repository.git status Add a message to the commit.git commit -m “Updated welcome message” Push your change to the remote repository.git push Go to the Task. This time, you will see two tasks running. One with the older revision and one with the newer revision. You see two tasks because of the rolling-update deployment strategy configured by default in the cluster. Wait for around 2-3 minutes, and you should only have one task running with the latest revision. Again, hit the ELB URL, and you should see your changes. In this case, we had changed the display message.Congratulations! You have a working Jenkins CI CD Pipeline to deploy your Nodejs containerized application on AWS ECS whenever there is a change in your source code. Cleanup the Resources We Created If you were just trying to set up a CI/CD pipeline to get familiar with it or for POC purposes in your organization and no longer need it, it is always better to delete the resources you created while carrying out the POC. As part of this CI/CD pipeline, we created a few resources. We created the below list to help you delete them. Delete the GitHub Repository Delete the GitHub Token Delete the IAM User Delete the EC2 Instance Delete the ECR Repository Delete the ECS Cluster Deregister the Task Definition Summary Finally, here is the summary of what you have to do to set up a CI/CD Docker pipeline to deploy a sample Node.js application on AWS ECS using Jenkins. Clone the existing sample GitHub Repository Create a new GitHub Repository and copy the code from the sample repository in it Create a GitHub Token Create an IAM User Create an ECR Repository Create an ECS Cluster Create an EC2 Instance for setting up the Jenkins Server Install Java, JSON processor jq, Node.js, and NPM on the EC2 Instance Install Jenkins on the EC2 Instance Install Docker on the EC2 Instance Install Plugins Create Credentials in Jenkins Create a Jenkins Job Integrate GitHub and Jenkins Check the deployment Cleanup the resources Conclusion A CI/CD Pipeline serves as a way of automating your software applications’ builds, tests, and deployments. It is the backbone of any organization with a DevOps culture. It has numerous benefits for software development, and it boosts your business greatly. In this blog, we demonstrated the steps to create a Jenkins CI/CD Docker Pipeline to deploy a sample Node.js containerized application on AWS ECS. We saw how GitHub Webhooks can be used to trigger the Jenkins pipeline on every push to the repository, which in turn deploys the latest docker image to AWS ECS. CI/CD Pipelines with Docker is best for your organization to improve code quality and deliver software releases quickly without any human errors. We hope this blog helped you learn more about the integral parts of the CI/CD Docker Pipeline.
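For reference, here is a sketch of what a Jenkinsfile along the lines described in this article might look like. This is not the file from the sample repository: the stage names, shell commands, and the use of the Docker Pipeline, Amazon ECR, and Pipeline: AWS Steps plugins are assumptions, and a real pipeline would typically also register a new task definition revision rather than only forcing a new deployment.

```groovy
pipeline {
    agent any

    environment {
        AWS_ACCOUNT_ID       = 'CHANGE_ME'
        AWS_DEFAULT_REGION   = 'CHANGE_ME'
        CLUSTER_NAME         = 'CHANGE_ME'
        SERVICE_NAME         = 'CHANGE_ME'
        TASK_DEFINITION_NAME = 'CHANGE_ME'
        DESIRED_COUNT        = 'CHANGE_ME'
        IMAGE_REPO_NAME      = 'CHANGE_ME'
        IMAGE_TAG            = "${env.BUILD_ID}"
        REPOSITORY_URI       = "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/${IMAGE_REPO_NAME}"
        registryCredential   = 'CHANGE_ME'
    }

    stages {
        stage('Unit Tests') {
            steps {
                // Assumes the Node.js project defines an npm test script
                sh 'npm install && npm test'
            }
        }
        stage('Build Docker Image') {
            steps {
                script {
                    // Docker Pipeline plugin builds the image from the Dockerfile in the workspace
                    dockerImage = docker.build("${REPOSITORY_URI}:${IMAGE_TAG}")
                }
            }
        }
        stage('Push to ECR') {
            steps {
                script {
                    // The Amazon ECR plugin provides the "ecr:<region>:<credentialsId>" credential syntax
                    docker.withRegistry("https://${REPOSITORY_URI}", "ecr:${AWS_DEFAULT_REGION}:${registryCredential}") {
                        dockerImage.push()
                    }
                }
            }
        }
        stage('Deploy to ECS') {
            steps {
                // Pipeline: AWS Steps plugin wraps the call with the stored IAM credentials
                withAWS(credentials: "${registryCredential}", region: "${AWS_DEFAULT_REGION}") {
                    sh "aws ecs update-service --cluster ${CLUSTER_NAME} --service ${SERVICE_NAME} --desired-count ${DESIRED_COUNT} --force-new-deployment"
                }
            }
        }
    }
}
```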
This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report The cloud is seamlessly integrated with almost all aspects of life, like business, personal computing, social media, artificial intelligence, Internet of Things, and more. In this article, we will dive into clouds and discuss their optimal suitability based on different types of organizational or individual needs. Public vs. Private Cloud Evaluation Mixing and matching cloud technologies provides a lot of ease and flexibility, but it comes with a lot of responsibilities, too. Sometimes it is difficult to make a choice between the types of cloud available, i.e., public, private, or hybrid cloud. An evaluation based on providers, cloud, and project demand is very crucial to selecting the right type. When evaluating a public and private cloud, it is important to consider the factors listed in Table 1: PUBLIC VS. PRIVATE CLOUD Public Cloud Private Cloud Best use cases Good for beginners Testing new features with minimal cost and setup Handling protected data and industry compliance Ensuring customized security Dedicated resources Cost Pay per usage Can be expensive Workload Suitable for fluctuating workloads Offers scalability and flexibility Suitable for predictable workloads Data confidentiality requirements (e.g., hospitals with confidential patient data) Infrastructure Shared infrastructure Can use existing on-premises investments Services Domain-specific services, including healthcare, education, finance, and retail Industry-specific customization options, including infrastructure, hardware, software stack, security, and access control Presence Global Reduces latency for geographically distributed customers Effective for limited geographic audiences and targeted client needs Table 1 Hybrid Cloud Both public and private clouds are useful in various ways, so it is possible to choose both to gain maximum advantages. This approach is achieved by adopting a hybrid cloud. Let's understand some of the key factors to consider: Hybrid clouds are suitable when the workload is both predicted or variable. A public cloud provides scalability and on-demand resources during peak seasons, while a private cloud handles base workload during off-seasons. To save money, public clouds can be shut off during non-peak seasons and store non-sensitive data. Private clouds generally cost more, but it is necessary for storing sensitive data. Private clouds are used for confidential data; non-regulated or non-sensitive information can be stored in a public cloud. Hybrid cloud is suitable for businesses operating in multiple regions. Private clouds serve specific regions, while public cloud providers offer global reach and accessibility for other services. Before adopting a hybrid approach, thorough analysis should be done, keeping factors such as workload patterns, budget, and compliance needs in consideration. Figure 1: Hybrid cloud combines features of public and private cloud Key Considerations for DBAs and Non-DBAs To achieve overall operational efficiency, it is essential for DBAs and non-DBAs to understand the key considerations related to database management. These considerations will help in effective collaboration, streamlined processes, and optimized data usage within an organization. Cost Optimization Cost is one of the major decision-making factors for everyone. It is very crucial to consider cost optimization strategies surrounding data, backup and archival strategies, and storage. 
Data is one of the most important factors when it comes to cost saving. It is always good to know your data patterns and to understand who the end user is so the classification of data is optimized for storage. No duplicates in data means no extra storage used. Also, an in-depth understanding of the types of storage available is required in order to get maximum benefit within budget. Classify your data into a structured or unstructured format. It is important for DBAs to analyze data that is no longer actively used but might be needed in the future. By moving this data to archival storage, DBAs can effectively save primary storage space. Implementing efficient backup strategies can help minimize redundant storage requirements, hence less cost. Data, storage, and cost are directly proportional to each other, so it is important to review these three for maximum benefits and performance with minimum costs from cloud providers. Available storage options include object storage, block storage, thin provisioning, and tiered storage. Figure 2: Data is directly proportional to the storage used and cost savings Performance To optimize storage speed, cloud providers use technologies like CDNs. Network latency can be reduced through strategies such as data compression, CDNs, P2P networking, edge computing, and geographic workload distribution. Larger memory capacity improves caching and overall performance. Computing power also plays a vital role. Factors like CPU-intensive tasks and parallel processing should be considered. GPUs or TPUs offer improved performance for intensive workloads in machine learning, data analytics, and video processing. Disaster Recovery Data should be available to recover after a disaster. If choosing a private cloud, be ready with backup and make sure you will be able to recover! It's important to distribute data, so if one area is affected, other locations can serve and business can run as usual. Security Cloud providers have various levels of data protection: With multi-factor authentication, data will be secure by adding an extra layer of verification. A token will be sent to a mobile device, via email, or by a preferred method like facial recognition. The right data should be accessible by the right consumer. To help these restrictions, technologies like identity and access management or role-based access control assign permission to the users based on assigned roles. Virtual private networks could be your savior. They provide secure private network connections over public networks and encrypted tunnels between the consumer's device and the cloud that will protect data from intruders. With encryption algorithms, clouds protect data at rest and in transit within infrastructure. However, it is always a good idea for a DBA and non-DBA to configure encryption settings within an app to achieve an organization's required security. Scalability When working with a cloud, it is important to understand how scalability is achieved: Cloud providers deploy virtual instances of servers, storage, and networks, which result in faster provisioning and allocation of virtual resources on demand. Serverless computing allows developers to focus on writing code. Infrastructure, scaling, and other resources required to handle incoming requests will be handled by cloud providers. Cloud providers suggest horizontal scaling instead of vertical scaling. 
By adding more servers instead of upgrading hardware in existing machines, the cloud distributes the workload, which increases capacity.

Vendor Lock-In

For organizations looking for flexibility and varying cloud deployments, the risk of vendor lock-in can be limiting. To minimize this risk, implementing a hybrid cloud approach enables the distribution of data, flexibility, and easy migration. Using multiple cloud providers through a hybrid model helps avoid dependence on a single vendor with diverse capabilities. Open data formats and vendor-independent storage solutions will help in easy porting. In addition, containerization technologies for applications allow flexible vendor selection. It is essential for organizations to consider exit strategies, including contractual support, to ensure smooth transitions between vendors and to reduce challenges.

Next Steps

Cloud computing is a huge revolution not only for the IT industry but also for individuals. Here are some next steps based on its features and extensive usage.

Extensive Acceptance

Cloud computing is a long-term deal. It offers flexibility and the ability for developers to focus on code alone, even if they don't have prior knowledge or a dedicated team to maintain infrastructure. Other benefits include increased innovation, since most of the hassle is taken care of by the cloud providers, and little to no downtime, which is great for businesses that operate 24/7.

Database Options To Mix and Match in the Cloud

When we talk about cloud computing, there are many databases available. The following are some popular database options (Table 2):

NoSQL - Purpose: structured and unstructured data. Pros: scalability, flexibility, pairs well with cloud computing. Cons: data consistency. Prevalence: will grow in popularity. Examples: MongoDB, Cassandra.
Relational - Purpose: structured data and complex queries. Pros: strong data consistency, robust query capabilities. Cons: scalability, fixed schema. Prevalence: will continue to stick around. Examples: MySQL, Postgres, Oracle.
Serverless - Purpose: unpredictable workloads. Pros: pay per usage, no server management. Cons: vendor lock-in. Prevalence: will grow in popularity. Examples: Google Cloud Firestore, Amazon Aurora Serverless.
Managed Database Services - Purpose: simplify database management. Pros: scaling, automated backups, maintenance, cost-effective. Cons: dependent on providers. Prevalence: will be a secure choice.

Table 2

Conclusion

It is important to note that there are scenarios where a private cloud might be preferred, such as when strict data security and compliance requirements exist, or when an organization needs maximum control over infrastructure and data. Each organization should evaluate its specific needs and consider a hybrid cloud approach, as well. Cloud providers often introduce new instances, types, updates, and features, so it is always good to review their documentation carefully for the most up-to-date information. By mindfully assessing vendor lock-in risks and implementing appropriate strategies, businesses can maintain flexibility and control over their cloud deployments while minimizing the challenges associated with switching cloud providers in the future. In this article, I have shared my own opinions and experiences as a DBA — I hope it offered additional insights and details about cloud options that can help to improve performance and cost savings based on individual objectives. This is an article from DZone's 2023 Database Systems Trend Report. For more: Read the Report
This is an article from DZone's 2023 Data Pipelines Trend Report.For more: Read the Report Data-driven design is a game changer. It uses real data to shape designs, ensuring products match user needs and deliver user-friendly experiences. This approach fosters constant improvement through data feedback and informed decision-making for better results. In this article, we will explore the importance of data-driven design patterns and principles, and we will look at an example of how the data-driven approach works with artificial intelligence (AI) and machine learning (ML) model development. Importance of the Data-Driven Design Data-driven design is crucial as it uses real data to inform design decisions. This approach ensures that designs are tailored to user needs, resulting in more effective and user-friendly products. It also enables continuous improvement through data feedback and supports informed decision-making for better outcomes. Data-driven design includes the following: Data visualization – Aids designers in comprehending trends, patterns, and issues, thus leading to effective design solutions. User-centricity – Data-driven design begins with understanding users deeply. Gathering data about user behavior, preferences, and challenges enables designers to create solutions that precisely meet user needs. Iterative process – Design choices are continuously improved through data feedback. This iterative method ensures designs adapt and align with user expectations as time goes on. Measurable outcomes – Data-driven design targets measurable achievements, like enhanced user engagement, conversion rates, and satisfaction. This is a theory, but let's reinforce it with good examples of products based on data-driven design: Netflix uses data-driven design to predict what content their customers will enjoy. They analyze daily plays, subscriber ratings, and searches, ensuring their offerings match user preferences and trends. Uber uses data-driven design by collecting and analyzing vast amounts of data from rides, locations, and user behavior. This helps them optimize routes, estimate fares, and enhance user experiences. Uber continually improves its services by leveraging data insights based on real-world usage patterns. Waze uses data-driven design by analyzing real-time GPS data from drivers to provide accurate traffic updates and optimal route recommendations. This data-driven approach ensures users have the most up-to-date and efficient navigation experience based on the current road conditions and user behavior. Common Data-Driven Architectural Principles and Patterns Before we jump into data-driven architectural patterns, let's reveal what data-driven architecture and its fundamental principles are. Data-Driven Architectural Principles Data-driven architecture involves designing and organizing systems, applications, and infrastructure with a central focus on data as a core element. Within this architectural framework, decisions concerning system design, scalability, processes, and interactions are guided by insights and requirements derived from data. Fundamental principles of data-driven architecture include: Data-centric design – Data is at the core of design decisions, influencing how components interact, how data is processed, and how insights are extracted. Real-time processing – Data-driven architectures often involve real-time or near real-time data processing to enable quick insights and actions. 
Integration of AI and ML – The architecture may incorporate AI and ML components to extract deeper insights from data. Event-driven approach – Event-driven architecture, where components communicate through events, is often used to manage data flows and interactions. Data-Driven Architectural Patterns Now that we know the key principles, let's look into data-driven architecture patterns. Distributed data architecture patterns include the data lakehouse, data mesh, data fabric, and data cloud. Data Lakehouse Data lakehouse allows organizations to store, manage, and analyze large volumes of structured and unstructured data in one unified platform. Data lakehouse architecture provides the scalability and flexibility of data lakes, the data processing capabilities, and the query performance of data warehouses. This concept is perfectly implemented in Delta Lake. Delta Lake is an extension of Apache Spark that adds reliability and performance optimizations to data lakes. Data Mesh The data mesh pattern treats data like a product and sets up a system where different teams can easily manage their data areas. The data mesh concept is similar to how microservices work in development. Each part operates on its own, but they all collaborate to make the whole product or service of the organization. Companies usually use conceptual data modeling to define their domains while working toward this goal. Data Fabric Data fabric is an approach that creates a unified, interconnected system for managing and sharing data across an organization. It integrates data from various sources, making it easily accessible and usable while ensuring consistency and security. A good example of a solution that implements data fabric is Apache NiFi. It is an easy-to-use data integration and data flow tool that enables the automation of data movement between different systems. Data Cloud Data cloud provides a single and adaptable way to access and use data from different sources, boosting teamwork and informed choices. These solutions offer tools for combining, processing, and analyzing data, empowering businesses to leverage their data's potential, no matter where it's stored. Presto exemplifies an open-source solution for building a data cloud ecosystem. Serving as a distributed SQL query engine, it empowers users to retrieve information from diverse data sources such as cloud storage systems, relational databases, and beyond. Now we know what data-driven design is, including its concepts and patterns. Let's have a look at the pros and cons of this approach. Pros and Cons of Data-Driven Design It's important to know the strong and weak areas of the particular approach, as it allows us to choose the most appropriate approach for our architecture and product. Here, I gathered some pros and cons of data-driven architecture: PROS AND CONS OF DATA-DRIVEN DESIGN Pros Cons Personalized experiences: Data-driven architecture supports personalized user experiences by tailoring services and content based on individual preferences. Privacy concerns: Handling large amounts of data raises privacy and security concerns, requiring robust measures to protect sensitive information. Better customer understanding: Data-driven architecture provides deeper insights into customer needs and behaviors, allowing businesses to enhance customer engagement. Complex implementation: Implementing data-driven architecture can be complex and resource-intensive, demanding specialized skills and technologies. 
Informed decision-making: Data-driven architecture enables informed and data-backed decision-making, leading to more accurate and effective choices. Dependency on data availability: The effectiveness of data-driven decisions relies on the availability and accuracy of data, leading to potential challenges during data downtimes. Table 1 Data-Driven Approach in ML Model Development and AI A data-driven approach in ML model development involves placing a strong emphasis on the quality, quantity, and diversity of the data used to train, validate, and fine-tune ML models. A data-driven approach involves understanding the problem domain, identifying potential data sources, and gathering sufficient data to cover different scenarios. Data-driven decisions help determine the optimal hyperparameters for a model, leading to improved performance and generalization. Let's look at the example of the data-driven architecture based on AI/ML model development. The architecture represents the factory alerting system. The factory has cameras that shoot short video clips and photos and send them for analysis to our system. Our system has to react quickly if there is an incident. Below, we share an example of data-driven architecture using Azure Machine Learning, Data Lake, and Data Factory. This is only an example, and there are a multitude of tools out there that can leverage data-driven design patterns. The IoT Edge custom module captures real-time video streams, divides them into frames, and forwards results and metadata to Azure IoT Hub. The Azure Logic App watches IoT Hub for incident messages, sending SMS and email alerts, relaying video fragments, and inferencing results to Azure Data Factory. It orchestrates the process by fetching raw video files from Azure Logic App, splitting them into frames, converting inferencing results to labels, and uploading data to Azure Blob Storage (the ML data repository). Azure Machine Learning begins model training, validating data from the ML data store, and copying required datasets to premium blob storage. Using the dataset cached in premium storage, Azure Machine Learning trains, validates model performance, scores against the new model, and registers it in the Azure Machine Learning registry. Once the new ML inferencing module is ready, Azure Pipelines deploys the module container from Container Registry to the IoT Edge module within IoT Hub, updating the IoT Edge device with the updated ML inferencing module. Figure 1: Smart alerting system with data-driven architecture Conclusion In this article, we dove into data-driven design concepts and explored how they merge with AI and ML model development. Data-driven design uses insights to shape designs for better user experiences, employing iterative processes, data visualization, and measurable outcomes. We've seen real-world examples like Netflix using data to predict content preferences and Uber optimizing routes via user data. Data-driven architecture, encompassing patterns like data lakehouse and data mesh, orchestrates data-driven solutions. Lastly, our factory alerting system example showcases how AI, ML, and data orchestrate an efficient incident response. A data-driven approach empowers innovation, intelligent decisions, and seamless user experiences in the tech landscape. This is an article from DZone's 2023 Data Pipelines Trend Report.For more: Read the Report
GitHub Actions has a large ecosystem of high-quality third-party actions, plus built-in support for executing build steps inside Docker containers. This means it's easy to run end-to-end tests as part of a workflow, often only requiring a single step to run testing tools with all the required dependencies. In this post, I show you how to run browser tests with Cypress and API tests with Postman as part of a GitHub Actions workflow. Getting Started GitHub Actions is a hosted service, so all you need to get started is a GitHub account. All other dependencies, like Software Development Kits (SDKs) or testing tools, are provided by the Docker images or GitHub Actions published by testing platforms. Running Browser Tests With Cypress Cypress is a browser automation tool that lets you interact with web pages in much the same way an end user would, for example by clicking on buttons and links, filling in forms, and scrolling the page. You can also verify the content of a page to ensure the correct results are displayed. The Cypress documentation provides an example first test which has been saved to the junit-cypress-test GitHub repo. The test is shown below: describe('My First Test', () => { it('Does not do much!', () => { expect(true).to.equal(true) }) }) This test is configured to generate a JUnit report file in the cypress.json file: { "reporter": "junit", "reporterOptions": { "mochaFile": "cypress/results/results.xml", "toConsole": true } } The workflow file below executes this test with the Cypress GitHub Action, saves the generated video file as an artifact, and processes the test results. You can find an example of this workflow in the junit-cypress-test repository: name: Cypress on: push: workflow_dispatch: jobs: build: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v1 - name: Cypress run uses: cypress-io/github-action@v2 - name: Save video uses: actions/upload-artifact@v2 with: name: sample_spec.js.mp4 path: cypress/videos/sample_spec.js.mp4 - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: cypress/results/results.xml reporter: java-junit fail-on-error: true The official Cypress GitHub action is called to execute tests with the default options: - name: Cypress run uses: cypress-io/github-action@v2 Cypress generates a video file capturing the browser as the tests are run. You save the video file as an artifact to be downloaded and viewed after the workflow completes: - name: Save video uses: actions/upload-artifact@v2 with: name: sample_spec.js.mp4 path: cypress/videos/sample_spec.js.mp4 The test results are processed by the dorny/test-reporter action. Note that test reporter has the ability to process Mocha JSON files, and Cypress uses Mocha for reporting, so an arguably more idiomatic solution would be to have Cypress generate Mocha JSON reports. Unfortunately, there is a bug in Cypress that prevents the JSON reporter from saving results as a file. Generating JUnit report files is a useful workaround until this issue is resolved: - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: cypress/results/results.xml reporter: java-junit fail-on-error: true Here are the results of the test: The video file artifact is listed in the Summary page: Not all testing platforms provide a GitHub action, in which case you can execute steps against a standard Docker image. This is demonstrated in the next section. 
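Before moving on, note that the spec used in this workflow only asserts that true equals true. A slightly more realistic Cypress spec, purely illustrative (the URL, selectors, and expected message below are hypothetical), shows the kind of browser interactions described at the start of this section; the same workflow, JUnit reporter configuration, and video artifact handling apply unchanged:

```javascript
describe('Login form', () => {
  it('shows an error for invalid credentials', () => {
    // Visit a hypothetical login page and fill in the form like an end user would
    cy.visit('https://example.com/login')
    cy.get('input[name="email"]').type('user@example.com')
    cy.get('input[name="password"]').type('wrong-password')
    cy.get('button[type="submit"]').click()

    // Verify the page shows the expected validation message
    cy.contains('Invalid email or password').should('be.visible')
  })
})
```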
Running API Tests With Newman Unlike Cypress, Postman does not provide an official GitHub action. However, you can use the postman/newman Docker image directly inside a workflow. You can find an example of the workflow in the junit-newman-test repository: name: Cypress on: push: workflow_dispatch: jobs: build: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v1 - name: Run Newman uses: docker://postman/newman:latest with: args: run GitHubTree.json --reporters cli,junit --reporter-junit-export results.xml - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: results.xml reporter: java-junit fail-on-error: true The uses property for a step can either be the name of a published action, or can reference a Docker image directly. In this example, you run the postman/newman docker image, with the with.args parameter defining the command-line arguments: - name: Run Newman uses: docker://postman/newman:latest with: args: run GitHubTree.json --reporters cli,junit --reporter-junit-export results.xml The resulting JUnit report file is then processed by the dorny/test-reporter action: - name: Report uses: dorny/test-reporter@v1 if: always() with: name: Cypress Tests path: results.xml reporter: java-junit fail-on-error: true Here are the results of the test: Behind the scenes, GitHub Actions executes the supplied Docker image with a number of standard environment variables relating to the workflow and with volume mounts that allow the Docker container to persist changes (like the report file) on the main file system. The following is an example of the command to execute a step in a Docker image: /usr/bin/docker run --name postmannewmanlatest_fefcec --label f88420 --workdir /github/workspace --rm -e INPUT_ARGS -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_RUN_ATTEMPT -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_ACTION_REPOSITORY -e GITHUB_ACTION_REF -e GITHUB_PATH -e GITHUB_ENV -e RUNNER_OS -e RUNNER_NAME -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/junit-newman-test/junit-newman-test":"/github/workspace" postman/newman:latest run GitHubTree.json --reporters cli,junit --reporter-junit-export results.xml This is a complex command, but there are a few arguments that we're interested in. The -e arguments define environment variables for the container. You can see that dozens of workflow environment variables are exposed. The --workdir /github/workspace argument overrides the working directory of the Docker container, while the -v "/home/runner/work/junit-newman-test/junit-newman-test":"/github/workspace" argument mounts the workflow workspace to the /github/workspace directory inside the container. 
This has the effect of mounting the working directory inside the Docker container, which exposes the checked-out files and allows any newly created files to persist once the container is shut down. Because every major testing tool provides a supported Docker image, the process you used to run Newman can be used to run most other testing platforms.

Conclusion
GitHub Actions has enjoyed widespread adoption among developers, and many platforms provide supported actions for use in workflows. For those cases where there is no suitable action available, GitHub Actions provides an easy way to execute a standard Docker image as part of a workflow. In this post, you learned how to run the Cypress action to execute browser-based tests and how to run the Newman Docker image to execute API tests. Happy deployments!
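One closing note for readers who have not worked with Postman collections before: the GitHubTree.json file referenced above is an exported Postman collection, and the assertions Newman evaluates are ordinary Postman test scripts attached to the collection's requests. The sketch below is a minimal, illustrative example of such a script; the response field it checks is hypothetical, and the pm object is supplied by Postman's sandbox at runtime (Postman executes it as JavaScript, but it is written here so it also parses as TypeScript).

TypeScript
// Minimal sketch of a Postman test script; it runs inside Postman/Newman, not as a standalone file.
// The declaration below only exists to keep the snippet valid TypeScript outside the sandbox.
declare const pm: any;

pm.test("Status code is 200", () => {
  pm.response.to.have.status(200);
});

pm.test("Response body contains a tree array", () => {
  const body = pm.response.json();
  // "tree" is a hypothetical field name, used purely for illustration.
  pm.expect(body.tree).to.be.an("array");
});

When Newman runs the collection with the junit reporter, each pm.test block surfaces as its own test case in the generated results.xml, which is exactly what dorny/test-reporter consumes.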
Seasoned software engineers long for the good old days when web development was simple. You just needed a few files and a server to get up and running. No complicated infrastructure, no endless amount of frameworks and libraries, and no build tools. Just some ideas and some code hacked together to make an app come to life. Whether or not this romanticized past was actually as great as we think it was, developers today agree that software engineering has gotten complicated. There are too many choices with too much setup involved. In response to this sentiment, many products are providing off-the-shelf starter kits and zero-config toolchains to try to abstract away the complexity of software development. One such startup is Zipper, a company that offers an online IDE where you can create applets that run as serverless TypeScript functions in the cloud. With Zipper, you don’t have to spend time worrying about your toolchain — you can just start writing code and deploy your app within minutes. Today, we’ll be looking at a ping pong ranking app I built — once in 2018 with jQuery, MongoDB, Node.js, and Express; and once in 2023 with Zipper. We’ll examine the development process for each and see just how easy it is to build a powerful app using Zipper. Backstory First, a little context: I love to play ping pong. Every office in which I’ve worked has had a ping pong table, and for many years ping pong was an integral part of my afternoon routine. It’s a great game to relax, blow off some steam, strengthen friendships with coworkers, and reset your brain for a half hour. Those who played ping pong every day began to get a feel for who was good and who wasn’t. People would talk. A handful of people were known as the best in the office, and it was always a challenge to take them on. Being both highly competitive and a software engineer, I wanted to build an app to track who was the best ping pong player in the office. This wouldn’t be for bracket-style tournaments, but just for recording the games that were played every day by anybody. With that, we’d have a record of all the games played, and we’d be able to see who was truly the best. This was 2018, and I had a background in the MEAN/MERN stack (MongoDB, Express, Angular, React, and Node.js) and experience with jQuery before that. After dedicating a week’s worth of lunch breaks and nights to this project, I had a working ping-pong ranking app. I didn’t keep close track of my time spent working on the app, but I’d estimate it took about 10–20 hours to build. Here’s what that version of the app looked like. There was a login and signup page: Office Competition Ranking System — Home page The login page asked for your username and password to authenticate: Office Competition Ranking System — Login page Once authenticated, you could record your match by selecting your opponent and who won: Office Competition Ranking System — Record game results page You could view the leaderboard to see the current office rankings. I even included an Elo rating algorithm like they use in chess: Office Competition Ranking System — Leaderboard page Finally, you could click on any of the players to see their specific game history of wins and losses: Office Competition Ranking System — Player history page That was the app I created back in 2018 with jQuery, MongoDB, Node.js, and Express. And, I hosted it on an AWS EC2 server. Now let’s look at my experience recreating this app in 2023 using only Zipper. About Zipper Zipper is an online tool for creating applets. 
It uses TypeScript and Deno, so JavaScript and TypeScript users will feel right at home. You can use Zipper to build web services, web UIs, scheduled jobs, and even Slack or GitHub integrations. Zipper even includes auth. In short, what I find most appealing about Zipper is how quickly you can take an idea from conception to execution. It’s perfect for side projects or internal-facing apps to quickly improve a business process. Demo App Here’s the ping-pong ranking app I built with Zipper in just three hours. And that includes time reading through the docs and getting up to speed with an unfamiliar platform! First, the app requires authentication. In this case, I’m requiring users to sign in to their Zipper account: Ping pong ranking app — Authentication page Once authenticated, users can record a new ping-pong match: Ping pong ranking app — Record a new match page They can view the leaderboard: Ping pong ranking app — Leaderboard page And they can view the game history for any individual player: Ping pong ranking app — Player history page Not bad! The best part is that I didn’t have to create any of the UI components for this app. All the inputs and table outputs were handled automatically. And, the auth was created for me just by checking a box in the app settings! You can find the working app hosted publicly on Zipper. Ok, now let’s look at how I built this. Creating a New Zipper App First, I created my Zipper account by authenticating with GitHub. Then, on the main dashboard page, I clicked the Create Applet button to create my first applet. Create your first applet Next, I gave my applet a name, which became its URL. I also chose to make my code public and required users to sign in before they could run the applet. Applet configuration Then I chose to generate my app using AI, mostly because I was curious how it would turn out! This was the prompt I gave it: “I’d like to create a leaderboard ranking app for recording wins and losses in ping pong matches. Users should be able to log into the app. Then they should be able to record a match showing who the two players were and who won and who lost. Users should be able to see the leaderboard for all the players, sorted with the best players displayed at the top and the worst players displayed at the bottom. Users should also be able to view a single player to see all of their recorded matches and who they played and who won and who lost.” Applet initialization I might need to get better at prompt engineering because the output didn’t include all the features or pages I wanted. The AI-generated code included two files: a generic “hello world” main.ts file, and a view-player.ts file for viewing the match history of an individual player. main.ts file generated by AI view-player.ts file generated by AI So, the app wasn’t perfect from the get-go, but it was enough to get started. Writing the Ping Pong App Code I knew that Zipper would handle the authentication page for me, so that left three pages to write: A page to record a ping-pong match A page to view the leaderboard A page to view an individual player’s game history Record a New Ping Pong Match I started with the form to record a new ping-pong match. Below is the full main.ts file. We’ll break it down line by line right after this. 
TypeScript type Inputs = { playerOneID: string; playerTwoID: string; winnerID: string; }; export async function handler(inputs: Inputs) { const { playerOneID, playerTwoID, winnerID } = inputs; if (!playerOneID || !playerTwoID || !winnerID) { return "Error: Please fill out all input fields."; } if (playerOneID === playerTwoID) { return "Error: PlayerOne and PlayerTwo must have different IDs."; } if (winnerID !== playerOneID && winnerID !== playerTwoID) { return "Error: Winner ID must match either PlayerOne ID or PlayerTwo ID"; } const matchID = Date.now().toString(); const matchInfo = { matchID, winnerID, loserID: winnerID === playerOneID ? playerTwoID : playerOneID, }; try { await Zipper.storage.set(matchID, matchInfo); return `Thanks for recording your match. Player ${winnerID} is the winner!`; } catch (e) { return `Error: Information was not written to the database. Please try again later.`; } } export const config: Zipper.HandlerConfig = { description: { title: "Record New Ping Pong Match", subtitle: "Enter who played and who won", }, }; Each file in Zipper exports a handler function that accepts inputs as a parameter. Each of the inputs becomes a form in UI, with the input type being determined by the TypeScript type that you give it. After doing some input validation to ensure that the form was correctly filled out, I stored the match info in Zipper’s key-value storage. Each Zipper applet gets its own storage instance that any of the files in your applet can access. Because it’s a key-value storage, objects work nicely for values since they can be serialized and deserialized as JSON, all of which Zipper handles for you when reading from and writing to the database. At the bottom of the file, I’ve added a HandlerConfig to add some title and instruction text to the top of the page in the UI. With that, the first page is done. Ping pong ranking app — Record a new match page Leaderboard Next up is the leaderboard page. I’ve reproduced the leaderboard.ts file below in full: TypeScript type PlayerRecord = { playerID: string; losses: number; wins: number; }; type PlayerRecords = { [key: string]: PlayerRecord; }; type Match = { matchID: string; winnerID: string; loserID: string; }; type Matches = { [key: string]: Match; }; export async function handler() { const allMatches: Matches = await Zipper.storage.getAll(); const matchesArray: Match[] = Object.values(allMatches); const players: PlayerRecords = {}; matchesArray.forEach((match: Match) => { const { loserID, winnerID } = match; if (players[loserID]) { players[loserID].losses++; } else { players[loserID] = { playerID: loserID, losses: 0, wins: 0, }; } if (players[winnerID]) { players[winnerID].wins++; } else { players[winnerID] = { playerID: winnerID, losses: 0, wins: 0, }; } }); return Object.values(players); } export const config: Zipper.HandlerConfig = { run: true, description: { title: "Leaderboard", subtitle: "See player rankings for all recorded matches", }, }; This file contains a lot more TypeScript types than the first file did. I wanted to make sure my data structures were nice and explicit here. After that, you see our familiar handler function, but this time without any inputs. That’s because the leaderboard page doesn’t need any inputs; it just displays the leaderboard. We get all of our recorded matches from the database, and then we manipulate the data to get it into an array format of our liking. Then, simply by returning the array, Zipper creates the table UI for us, even including search functionality and column sorting. 
No UI work is needed! Finally, at the bottom of the file, you’ll see a description setup that’s similar to the one on our main page. You’ll also see the run: true property, which tells Zipper to run the handler function right away without waiting for the user to click the Run button in the UI. Ping pong ranking app — Leaderboard page Player History Alright, two down, one to go. Let’s look at the code for the view-player.ts file, which I ended up renaming to player-history.ts: TypeScript type Inputs = { playerID: string; }; type Match = { matchID: string; winnerID: string; loserID: string; }; type Matches = { [key: string]: Match; }; type FormattedMatch = { matchID: string; opponent: string; result: "Won" | "Lost"; }; export async function handler({ playerID }: Inputs) { const allMatches: Matches = await Zipper.storage.getAll(); const matchesArray: Match[] = Object.values(allMatches); const playerMatches = matchesArray.filter((match: Match) => { return playerID === match.winnerID || playerID === match.loserID; }); const formattedPlayerMatches = playerMatches.map((match: Match) => { const formattedMatch: FormattedMatch = { matchID: match.matchID, opponent: playerID === match.winnerID ? match.loserID : match.winnerID, result: playerID === match.winnerID ? "Won" : "Lost", }; return formattedMatch; }); return formattedPlayerMatches; } export const config: Zipper.HandlerConfig = { description: { title: "Player History", subtitle: "See match history for the selected player", }, }; The code for this page looks a lot like the code for the leaderboard page. We include some types for our data structures at the top. Next, we have our handler function which accepts an input for the player ID that we want to view. From there, we fetch all the recorded matches and filter them for only matches in which this player participated. After that, we manipulate the data to get it into an acceptable format to display, and we return that to the UI to get another nice auto-generated table. Ping pong ranking app — Player history page Conclusion That’s it! With just three handler functions, we’ve created a working app for tracking our ping-pong game history. This app does have some shortcomings that we could improve, but we’ll leave that as an exercise for the reader. For example, it would be nice to have a dropdown of users to choose from when recording a new match, rather than entering each player’s ID as text. Maybe we could store each player’s ID in the database and then display those in the UI as a dropdown input type. Or, maybe we’d like to turn this into a Slack integration to allow users to record their matches directly in Slack. That’s an option too! While my ping pong app isn’t perfect, I hope the takeaway here is how easy it is to get up and running with a product like Zipper. You don’t have to spend time agonizing over your app’s infrastructure when you have a simple idea that you just want to see working in production. Just get out there, start building, and deploy!
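As a parting sketch, here is roughly what the player-dropdown improvement suggested above could start from: a small handler that registers players in the same Zipper key-value store used by the other handlers. This is an untested illustration rather than part of the original app; the "player:" key prefix is an assumption to keep player records apart from match records, and wiring the returned list into an actual dropdown input would still depend on how Zipper's input types are configured.

TypeScript
type Player = {
  playerID: string;
  displayName: string;
};

type Inputs = {
  playerID: string;
  displayName: string;
};

export async function handler({ playerID, displayName }: Inputs) {
  if (!playerID || !displayName) {
    return "Error: Please provide both a player ID and a display name.";
  }

  // The "player:" prefix is an assumed convention to separate player records
  // from the match records stored by main.ts.
  await Zipper.storage.set(`player:${playerID}`, { playerID, displayName });

  // Return every stored record that looks like a player, so the UI renders
  // the current roster as a table and other handlers can reuse the list.
  const everything: { [key: string]: any } = await Zipper.storage.getAll();
  const players: Player[] = Object.values(everything).filter(
    (value: any) => value && typeof value.displayName === "string"
  );
  return players;
}

export const config: Zipper.HandlerConfig = {
  description: {
    title: "Register Player",
    subtitle: "Store player IDs so match forms could offer a dropdown",
  },
};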
Developers craft software that both delights consumers and delivers innovative applications for enterprise users. This craft requires more than just churning out heaps of code; it embodies a process of observing, noticing, interviewing, brainstorming, reading, writing, and rewriting specifications; designing, prototyping, and coding to the specifications; reviewing, refactoring, and verifying the software; and a virtuous cycle of deploying, debugging, and improving. At every stage of this cycle, developers consume and generate two things: code and text. Code is text, after all.

Developer productivity is limited by real-world constraints: challenging timelines, unclear requirements, legacy codebases, and more. To overcome these obstacles and still meet the deadlines, developers have long relied on adding new tools to their toolbox: code generation tools such as compilers, UI generators, ORM mappers, and API generators, for example. Developers have embraced these tools without reservation, progressively evolving them to offer more intelligent functionality. Modern compilers do more than just translate; they rewrite and optimize the code automatically. SQL, developed fifty years ago as a declarative language with a set of composable English templates, continues to evolve and improve the data access experience and developer productivity. Developers have access to an endless array of tools to expand their toolbox.

The Emergence of GenAI
GenAI is a new, powerful tool for the developer toolbox. GenAI, short for Generative AI, is a subset of AI capable of taking prompts and then autonomously creating many forms of content — text, code, images, videos, music, and more — that imitate and often mirror the quality of human craftsmanship. Prompts are instructions in the form of expository writing. Better prompts produce better text and code. The seismic surge surrounding GenAI, supported by technologies such as ChatGPT and Copilot, positions 2023 to be heralded as the "Year of GenAI." GenAI's text generation capability is expected to revolutionize every aspect of developer experience and productivity.

Impact on Developers
Someone recently noted, "In 2023, natural language has emerged as the fastest programming language." While the previous generation of tools focused on incremental improvements to code-writing productivity and code quality, GenAI tools promise to revolutionize these and every other aspect of developer work. ChatGPT can summarize a long requirement specification, give you the delta of what changed between two versions, or help you come up with a checklist for a specific task. For coding, the impact is dramatic. Because these models, with billions of parameters, have been trained on much of the internet and trillions of tokens, they've seen a lot of code. By writing a good prompt, you can get it to write a large piece of code, design APIs, and refactor code. And in just one sentence, you can ask ChatGPT to rewrite everything in a brand-new language. All these possibilities were simply science fiction just a few years ago. GenAI makes mundane tasks disappear, hard tasks easier, and difficult tasks possible. Developers are relying more on ChatGPT to explain new concepts and clarify confusing ideas. Apparently, this trend has reduced traffic to Stack Overflow, a popular Q&A site for developers, by anywhere between 16% and 50%, depending on the measure! Developers choose the winning tool. But there's a catch. More than one, in fact.
The GenAI tools of the current generation, although promising, are unaware of your goals and objectives. These tools, developed through training on a vast array of samples, operate by predicting the succeeding token, one at a time, rooted firmly in the patterns they have previously encountered. Their answers are guided and constrained by the prompt. To harness their potential effectively, it becomes imperative to craft detailed, expository-style prompts. This nudges the technology to produce output that is closer to the intended goal, albeit with a style and creativity bounded by its training data. These tools excel at replicating styles they have been exposed to but fall short at inventing unprecedented ones. Multiple companies and groups are busy training LLMs for specific tasks to improve their content generation.

I recommend heeding the advice of Satya Nadella, Microsoft's CEO, who suggests it is prudent to treat the content generated by GenAI as a draft, requiring thorough review to ensure its clarity and accuracy. The onus falls on the developer to delineate between routine tasks and those demanding creativity — a discernment that remains beyond GenAI's reach, at least for now.

Despite these caveats, there is justifiable evidence that GenAI will improve developer experience and productivity. OpenAI's ChatGPT raced to 100 million users in record time. Your favorite IDEs have plugins to exploit it. Microsoft has promised to use GenAI in all its products, including its revitalized search offering, bing.com. Google has answered with its own suite of services and products; Facebook and others have released multiple models to help developers progress. It's a great time to be a developer. The revolution has begun promptly.

At Couchbase, we've introduced generative AI capabilities into our Database-as-a-Service, Couchbase Capella, to significantly enhance developer productivity and accelerate time to market for modern applications. The new capability, called Capella iQ, enables developers to write SQL++ and application-level code more quickly by delivering recommended sample code.
This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report Hearing the vague statement, "We have a problem with the database," is a nightmare for any database manager or administrator. Sometimes it's true, sometimes it's not, and what exactly is the issue? Is there really a database problem? Or is it a problem with networking, an application, a user, or another possible scenario? If it is a database, what is wrong with it? Figure 1: DBMS usage Databases are a crucial part of modern businesses, and there are a variety of vendors and types to consider. Databases can be hosted in a data center, in the cloud, or in both for hybrid deployments. The data stored in a database can be used in various ways, including websites, applications, analytical platforms, etc. As a database administrator or manager, you want to be aware of the health and trends of your databases. Database monitoring is as crucial as databases themselves. How good is your data if you can't guarantee its availability and accuracy? Database Monitoring Considerations Database engines and databases are systems hosted on a complex IT infrastructure that consists of a variety of components: servers, networking, storage, cables, etc. Database monitoring should be approached holistically with consideration of all infrastructure components and database monitoring itself. Figure 2: Database monitoring clover Let's talk more about database monitoring. As seen in Figure 2, I'd combine monitoring into four pillars: availability, performance, activity, and compliance. These are broad but interconnected pillars with overlap. You can add a fifth "clover leaf" for security monitoring, but I include that aspect of monitoring into activity and compliance, for the same reason capacity planning falls into availability monitoring. Let's look deeper into monitoring concepts. While availability monitoring seems like a good starting topic, I will deliberately start with performance since performance issues may render a database unavailable and because availability monitoring is "monitoring 101" for any system. Performance Monitoring Performance monitoring is the process of capturing, analyzing, and alerting to performance metrics of hardware, OS, network, and database layers. It can help avoid unplanned downtimes, improve user experience, and help administrators manage their environments efficiently. Native Database Monitoring Most, if not all, enterprise-grade database systems come with a set of tools that allow database professionals to examine internal and/or external database conditions and the operational status. These are system-specific, technical tools that require SME knowledge. In most cases, they are point-in-time performance data with limited or non-existent historical value. Some vendors provide additional tools to simplify performance data collection and analysis. With an expansion of cloud-based offerings (PaaS or IaaS), I've noticed some improvements in monitoring data collection and the available analytics and reporting options. However, native performance monitoring is still a set of tools for a database SME. Enterprise Monitoring Systems Enterprise monitoring systems (EMSs) offer a centralized approach to keeping IT systems under systematic review. Such systems allow monitoring of most IT infrastructure components, thus consolidating supervised systems with a set of dashboards. There are several vendors offering comprehensive database monitoring systems to cover some or all your monitoring needs. 
Such solutions can cover multiple database engines or be specific to a particular database engine or a monitoring aspect. For instance, if you only need to monitor SQL servers and are interested in the performance of your queries, then you need a monitoring system that identifies bottlenecks and contentions. Let's discuss environments with thousands of database instances (on-premises and in a cloud) scattered across multiple data centers across the globe. This involves monitoring complexity growth with a number of monitored devices, database type diversity, and geographical locations of your data centers and actual data that you monitor. It is imperative to have a global view of all database systems under one management and an ability to identify issues, preferably before they impact your users. EMSs are designed to help organizations align database monitoring with IT infrastructure monitoring, and most solutions include an out-of-the-box set of dashboards, reports, graphs, alerts, useful tips, and health history and trends analytics. They also have pre-set industry-outlined thresholds for performance counters/metrics that should be adjusted to your specific conditions. Manageability and Administrative Overhead Native database monitoring is usually handled by a database administrator (DBA) team. If it needs to be automated, expanded, or have any other modifications, then DBA/development teams would handle that. This can be efficiently managed by DBAs in a large enterprise environment on a rudimental level for internal DBA specific use cases. Bringing in a third-party system (like an EMS) requires management. Hypothetically, a vendor has installed and configured monitoring for your company. That partnership can continue, or internal personnel can take over EMS management (with appropriate training). There is no "wrong" approach — it solely depends on your company's operating model and is assessed accordingly. Data Access and Audit Compliance Monitoring Your databases must be secure! Unauthorized access to sensitive data could be as harmful as data loss. Data breaches, malicious activities (intentional or not) — no company would be happy with such publicity. That brings us to audit compliance and data access monitoring. There are many laws and regulations around data compliance. Some are common between industries, some are industry-specific, and some are country-specific. For instance, SOX compliance is required for all public companies in numerous countries, and US healthcare must follow HIPAA regulations. Database management teams must implement a set of policies, procedures, and processes to enforce laws and regulations applicable to their company. Audit reporting could be a tedious and cumbersome process, but it can and should be automated. While implementing audit compliance and data access monitoring, you can improve your database audit reporting, as well — it's virtually the same data set. What do we need to monitor to comply with various laws and regulations? These are normally mandatory: Access changes and access attempts Settings and/or objects modifications Data modifications/access Database backups Who should be monitored? 
Usually, access to make changes to a database or data is strictly controlled: Privileged accounts – usually DBAs; ideally, they shouldn't be able to access data, but that is not always possible in their job so activity must be monitored Service accounts – either database or application service accounts with rights to modify objects or data "Power" accounts – users with rights to modify database objects or data "Lower" accounts – accounts with read-only activity As with performance monitoring, most database engines provide a set of auditing tools and mechanisms. Another option is third-party compliance software, which uses database-native auditing, logs, and tracing to capture compliance-related data. It provides audit data storage capabilities and, most importantly, a set of compliance reports and dashboards to adhere to a variety of compliance policies. Compliance complexity directly depends on regulations that apply to your company and the diversity and size of your database ecosystem. While we monitor access and compliance, we want to ensure that our data is not being misused. An adequate measure should be in place for when unauthorized access or abnormal data usage is detected. Some audit compliance monitoring systems provide means to block abnormal activities. Data Corruption and Threats Database data corruption is a serious issue that could lead to a permanent loss of valuable data. Commonly, data corruption occurs due to hardware failures, but it could be due to database bugs or even bad coding. Modern database engines have built-in capabilities to detect and sometimes prevent data corruption. Data corruption will generate an appropriate error code that should be monitored and highlighted. Checking database integrity should be a part of the periodical maintenance process. Other threats include intentional or unintentional data modification and ransomware. While data corruption and malicious data modification can be detected by DBAs, ransomware threats fall outside of the monitoring scope for database professionals. It is imperative to have a bulletproof backup to recover from those threats. Key Database Performance Metrics Database performance metrics are extremely important data points that measure the health of database systems and help database professionals maintain efficient support. Some of the metrics are specific to a database type or vendor, and I will generalize them as "internal counters." Availability The first step in monitoring is to determine if a device or resource is available. There is a thin line between system and database availability. A database could be up and running, but clients may not be able to access it. With that said, we need to monitor the following metrics: Network status – Can you reach the database over the network? If yes, what is the latency? While network status may not commonly fall into the direct responsibility of a DBA, database components have configuration parameters that might be responsible for a loss of connectivity. Server up/down Storage availability Service up/down – another shared area between database and OS support teams Whether the database is online or offline CPU, Memory, Storage, and Database Internal Metrics The next important set of server components which could, in essence, escalate into an availability issue are CPU, memory, and storage. 
The following four performance areas are tightly interconnected and affect each other: Lack of available memory High CPU utilization Storage latency or throughput bottleneck Set of database internal counters which could provide more content to utilization issues For instance, lack of memory may force a database engine to read and write data more frequently, creating contention on the IO system. 100% CPU utilization could often cause an entire database server to stop responding. Numerous database internal counters can help database professionals analyze use trends and identify an appropriate action to mitigate potential impact. Observability Database observability is based on metrics, traces, and logs — what we supposedly collected based on the discussion above. There are a plethora of factors that may affect system and application availability and customer experience. Database performance metrics are just a single set of possible failure points. Supporting the infrastructure underneath a database engine is complex. To successfully monitor a database, we need to have a clear picture of the entire ecosystem and the state of its components while monitoring. Relevant performance data collected from various components can be a tremendous help in identifying and addressing issues before they occur. The entire database monitoring concept is data driven, and it is our responsibility to make it work for us. Monitoring data needs to tell us a story that every consumer can understand. With database observability, this story can be transparent and provide a clear view of your database estate. Balanced Monitoring As you could gather from this article, there are many points of failure in any database environment. While database monitoring is the responsibility of database professionals, it is a collaborative effort of multiple teams to ensure that your entire IT ecosystem is operational. So what's considered "too much" monitoring and when is it not enough? I will use DBAs' favorite phrase: it depends. Assess your environment – It would be helpful to have a configuration management database. If you don't, create a full inventory of your databases and corresponding applications: database sizes, number of users, maintenance schedules, utilization times — as many details as possible. Assess your critical systems – Outline your critical systems and relevant databases. Most likely those will fall into a category of maximum monitoring: availability, performance, activity, and compliance. Assess your budget – It's not uncommon to have a tight cash flow allocated to IT operations. You may or may not have funds to purchase a "we-monitor-everything" system, and certain monitoring aspects would have to be developed internally. Find a middle ground – Your approach to database monitoring is unique to your company's requirements. Collecting monitoring data that has no practical or actionable applications is not efficient. Defining actionable KPIs for your database monitoring is a key to finding a balance — monitor what your team can use to ensure systems availability, stability, and satisfied customers. Remember: Successful database monitoring is data-driven, proactive, continuous, actionable, and collaborative. This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report
In this blog, you will take a closer look at Podman Desktop, a graphical tool for working with containers. Enjoy!

Introduction
Podman is a container engine, just as Docker is. Podman commands are executed by means of a CLI (command line interface), but it would come in handy if a GUI were available. That is exactly the purpose of Podman Desktop! As stated on the Podman Desktop website: "Podman Desktop is an open source graphical tool enabling you to seamlessly work with containers and Kubernetes from your local environment." In the next sections, you will execute most of the commands from the two previous posts. If you are new to Podman, it is strongly advised to read those two posts first before continuing: Is Podman a Drop-in Replacement for Docker? and Podman Equivalent for Docker Compose. Sources used in this blog can be found on GitHub.

Prerequisites
Prerequisites for this blog are: basic Linux knowledge (Ubuntu 22.04 is used during this blog) and basic Podman knowledge (see the previous blog posts). Podman version 3.4.4 is used in this blog because that is the version available for Ubuntu, although the latest stable release is version 4.6.0 at the time of writing.

Installation and Startup
First of all, Podman Desktop needs to be installed, of course. Go to the downloads page. When using the Download button, a flatpak file will be downloaded. Flatpak is a framework for distributing desktop applications across various Linux distributions; however, this requires you to install Flatpak. A tar.gz file is also available for download, so use this one. After downloading, extract the file to /opt:

Shell $ sudo tar -xvf podman-desktop-1.2.1.tar.gz -C /opt/

In order to start Podman Desktop, you only need to double-click the podman-desktop file. The Get Started with Podman Desktop screen is shown. Click the Go to Podman Desktop button, which will open the Podman Desktop main screen. As you can see from the screenshot, Podman Desktop detects that Podman is running but also that Docker is running. This is already a nice feature because it means that you can use Podman Desktop for Podman as well as for Docker. At the bottom, a Docker Compatibility warning is shown, indicating that the Docker socket is not available and some Docker-specific tools will not function correctly. But this can be fixed, of course. In the left menu, you can find the following items from top to bottom: the dashboard, the containers, the pods, the images, and the volumes.

Build an Image
The container image you will try to build consists of a Spring Boot application. It is a basic application containing one REST endpoint, which returns a hello message. There is no need to build the application. You do need to download the jar file and put it into a target directory at the root of the repository. The Dockerfile you will be using is located in the directory podman-desktop. In the left menu, choose the Images tab. Also note that in the screenshot, both Podman images and Docker images are shown. Click the Build an Image button and fill it in as follows:
Containerfile path: select file podman-desktop/1-Dockerfile.
Build context directory: This is automatically filled out for you with the podman-desktop directory. However, you need to change this to the root of the repository; otherwise, the jar file is not part of the build context and cannot be found by Podman.
Image Name: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
Container Engine: Podman
Click the Build button.
This results in the following error: Shell Uploading the build context from <user directory>/mypodmanplanet...Can take a while... Error:(HTTP code 500) server error - potentially insufficient UIDs or GIDs available in user namespace (requested 262143:262143 for /var/tmp/libpod_builder2108531042/bError:Error: (HTTP code 500) server error - potentially insufficient UIDs or GIDs available in user namespace (requested 262143:262143 for /var/tmp/libpod_builder2108531042/build/.git): Check /etc/subuid and /etc/subgid: lchown /var/tmp/libpod_builder2108531042/build/.git: invalid argument This error sounds familiar because the error was also encountered in a previous blog. Let’s try to build the image via the command line: Shell $ podman build . --tag docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f podman-desktop/1-Dockerfile The image is built without any problem. An issue has been raised for this problem. At the time of writing, building an image via Podman Desktop is not possible. Start a Container Let’s see whether you can start the container. Choose in the left menu the Containers tab and click the Create a Container button. A choice menu is shown. Choose an Existing image. The Images tab is shown. Click the Play button on the right for the mypodmanplanet image. A black screen is shown, and no container is started. Start the container via CLI: Shell $ podman run -p 8080:8080 --name mypodmanplanet -d docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The running container is now visible in Podman Desktop. Test the endpoint, and this functions properly. Shell $ curl http://localhost:8080/hello Hello Podman! Same conclusion as for building the image. At the time of writing, it is not possible to start a container via Podman Desktop. What is really interesting is the actions menu. You can view the container logs. The Inspect tab shows you the details of the container. The Kube tab shows you what the Kubernetes deployment yaml file will look like. The Terminal tab gives you access to a terminal inside the container. You can also stop, restart, and remove the container from Podman Desktop. Although starting the container did not work, Podman Desktop offers some interesting features that make it easier to work with containers. Volume Mount Remove the container from the previous section. You will create the container again, but this time with a volume mount to a specific application.properties file, which will ensure that the Spring Boot application runs on port 8082 inside the container. Execute the following command from the root of the repository: Shell $ podman run -p 8080:8082 --volume ./properties/application.properties:/opt/app/application.properties:ro --name mypodmanplanet -d docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The container is started successfully, but an error message is shown in Podman Desktop. This error will show up regularly from now on. Restarting Podman Desktop resolves the issue. An issue has been filed for this problem. Unfortunately, the issue cannot be reproduced consistently. The volume is not shown in the Volumes tab, but that’s because it is an anonymous volume. Let’s create a volume and see whether this shows up in the Volumes tab. Shell $ podman volume create myFirstVolume myFirstVolume The volume is not shown in Podman Desktop. It is available via the command line, however. Shell $ podman volume ls DRIVER VOLUME NAME local myFirstVolume Viewing volumes is not possible with Podman Desktop at the time of writing. Delete the volume. 
Shell $ podman volume rm myFirstVolume myFirstVolume Create Pod In this section, you will create a Pod containing two containers. The setup is based on the one used for a previous blog. Choose in the left menu the Pods tab and click the Play Kubernetes YAML button. Select the YAML file Dockerfiles/hello-pod-2-with-env.yaml. Click the Play button. The Pod has started. Check the Containers tab, and you will see the three containers which are part of the Pod. Verify whether the endpoints are accessible. Shell $ curl http://localhost:8080/hello Hello Podman! $ curl http://localhost:8081/hello Hello Podman! The Pod can be stopped and deleted via Podman Desktop. Sometimes, Podman Desktop stops responding after deleting the Pod. After a restart of Podman Desktop, the Pod can be deleted without experiencing this issue. Conclusion Podman Desktop is a nice tool with some fine features. However, quite some bugs were encountered when using Podman Desktop (I did not create an issue for all of them). This might be due to the older version of Podman, which is available for Ubuntu, but then I would have expected that an incompatibility warning would be raised when starting Podman Desktop. However, it is a nice tool, and I will keep on using it for the time being.
This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report In today's rapidly evolving digital landscape, businesses across the globe are embracing cloud computing to streamline operations, reduce costs, and drive innovation. At the heart of this digital transformation lies the critical role of cloud databases — the backbone of modern data management. With the ever-growing volume of data generated for business, education, and technology, the importance of scalability, security, and cloud services has become paramount in choosing the right cloud vendor. In this article, we will delve into the world of primary cloud vendors, taking an in-depth look at their offerings and analyzing the crucial factors that set them apart: scalability, security, and cloud services for cloud databases. Armed with this knowledge, businesses can make informed decisions as they navigate the vast skies of cloud computing and select the optimal vendor to support their unique data management requirements. Scaling in the Cloud One of the fundamental advantages of cloud databases is their ability to scale in response to increasing demands for storage and processing power. Scalability can be achieved in two primary ways: horizontally and vertically. Horizontal scaling, also known as scale-out, involves adding more servers to a system, distributing the load across multiple nodes. Vertical scaling, or scale-up, refers to increasing the capacity of existing servers by adding more resources such as CPU, memory, and storage. Benefits of Scalability By distributing workloads across multiple servers or increasing the resources available on a single server, cloud databases can optimize performance and prevent bottlenecks, ensuring smooth operation even during peak times. Scalability allows organizations to adapt to sudden spikes in demand or changing requirements without interrupting services. By expanding or contracting resources as needed, businesses can maintain uptime and avoid costly outages. By scaling resources on-demand, organizations can optimize infrastructure costs, paying only for what they use. This flexible approach allows for more efficient resource allocation and cost savings compared to traditional on-premises infrastructure. Examples of Cloud Databases With Scalability Several primary cloud vendors offer scalable cloud databases designed to meet the diverse needs of organizations. The most popular releases encompass database platforms from licensed versions to open source, such as MySQL and PostgreSQL. In public clouds, there are three major players in the arena: Amazon, Microsoft Azure, and Google. The major cloud vendors offer managed cloud databases in various flavors of both licensed and open-source database platforms. These databases are easily scalable in storage and compute resources, but all controlled through service offerings. Scalability is about more power in the cloud, although some cloud databases are able to scale out, too. Figure 1: Scaling up behind the scenes in the cloud Each cloud vendor provides various high availability and scalability options with minimal manual intervention, allowing organizations to scale instances up or down and add replicas for read-heavy workloads or maintenance offloading. Securing Data in the Cloud As organizations increasingly embrace cloud databases to store and manage their sensitive data, ensuring robust security has become a top priority. 
While cloud databases offer numerous advantages, they also come with potential risks, such as data breaches, unauthorized access, and insider threats. In this section, we will explore the security features that cloud databases provide and discuss how they help mitigate these risks.

Common Security Risks
Data breaches aren't a question of if, but a question of when. Unauthorized access can expose sensitive data to those who should not see it, potentially resulting in reputational damage, financial losses, and regulatory penalties. It shouldn't surprise anyone that cloud databases can be targeted by cybercriminals attempting to gain unauthorized access to data. This risk makes it essential to implement strict access controls at all levels — cloud, network, application, and database. As much as we don't like to think about it, disgruntled employees or other insiders can pose a significant threat to an organization's data security: they may have legitimate access to the system but misuse it, whether maliciously or unintentionally.

Security Features in Cloud Databases
One of the largest benefits of a public cloud vendor is the numerous first-party and partner security offerings, which can strengthen the security of cloud databases. Cloud databases offer robust access control mechanisms, such as role-based access control (RBAC) and multi-factor authentication (MFA), to ensure that only authorized users can access data. These features help prevent unauthorized access and reduce the risk of insider threats.

Figure 2: Database security in the public cloud

The second most widely implemented protection method is encryption and data-level protection. To protect data from unauthorized access, cloud databases provide various encryption methods, and these different levels and layers of encryption help secure data throughout its lifecycle. Encryption comes in three main forms:
Encryption at rest protects data stored on disk by encrypting it using strong encryption algorithms.
Encryption in transit safeguards data as it travels between the client and the server or between different components within the database service.
Encryption in use encrypts data while it is being processed or used by the database, ensuring that data remains secure even when in memory.

Compliance and Regulations
Cloud database providers often adhere to strict compliance standards and regulations, such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI-DSS). Compliance with these regulations helps ensure that organizations meet their legal and regulatory obligations, further enhancing data security. Integrating cloud databases with identity and access management (IAM) services, such as AWS Identity and Access Management, Azure Active Directory, and Google Cloud Identity, helps enforce strict security and access control policies. This integration ensures that only authorized users can access and interact with the cloud database, enhancing overall security.

Cloud Services and Databases
Cloud databases not only provide efficient storage and management of data but can also be seamlessly integrated with various other cloud services to enhance their capabilities. By leveraging these integrations, organizations can access powerful tools for insights, analytics, security, and quality. In this section, we will explore some popular cloud services that can be integrated with cloud databases and discuss their benefits.
Cloud Machine Learning Services Machine learning services in the cloud enable organizations to develop, train, and deploy machine learning models using their cloud databases as data sources. These services can help derive valuable insights and predictions from stored data, allowing businesses to make data-driven decisions and optimize processes. With today's heavy investment in artificial intelligence (AI), no one should be surprised that Cloud Services for AI are at the top of the services list. AI services in the cloud, such as natural language processing, computer vision, and speech recognition, can be integrated with cloud databases to unlock new capabilities. These integrations enable organizations to analyze unstructured data, automate decision-making, and improve user experiences. Cloud Databases and Integration Integrating cloud databases with data warehouse solutions, such as Amazon Redshift, Google BigQuery, Azure Synapse Analytics, and Snowflake, allows organizations to perform large-scale data analytics and reporting. This combination provides a unified platform for data storage, management, and analysis, enabling businesses to gain deeper insights from their data. Along with AI and machine learning, cloud databases can be integrated with business intelligence (BI) tools like Tableau, Power BI, and Looker to create visualizations and dashboards. By connecting BI tools to cloud databases, organizations can easily analyze and explore data, empowering them to make informed decisions based on real-time insights. Data streaming and integrating cloud databases with services like Amazon Kinesis, Azure Stream Analytics, and Google Cloud Pub/Sub enable organizations to process and analyze data in real time, providing timely insights and improving decision-making. By integrating cloud databases with monitoring and alerting services, such as Amazon CloudWatch, Azure Monitor, and Google Cloud Monitoring, organizations can gain insights into the health and performance of their databases. These services allow businesses to set up alerts, monitor key performance indicators (KPIs), and troubleshoot issues in real time. Data Pipelines and ETL Services Data pipelines and ETL services are the final services from the category of integration, such as AWS Glue, Azure Data Factory, and Google Cloud Data Fusion, that can be integrated with relational cloud databases to automate data ingestion, transformation, and loading processes, ensuring seamless data flow between systems. Conclusion The scalability of cloud databases is an essential factor for organizations looking to manage their growing data needs effectively. Along with scalability, security plays a critical aspect of cloud databases, and it is crucial for organizations to understand the features and protections offered by their chosen provider. By leveraging robust access control, encryption, and compliance measures, businesses can significantly reduce the risks associated with data breaches, unauthorized access, and insider threats, ensuring that their sensitive data remains secure and protected in the cloud. Finally, to offer the highest return on investment, integrating cloud databases with other services unlocks the powerful analytics and insights available in the public cloud. By leveraging these integrations, organizations can enhance the capabilities of their cloud databases and optimize their data management processes, driving innovation and growth in the digital age. 
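To make one of these security recommendations concrete before closing, the sketch below shows the kind of automated check an organization might run against its cloud databases. It uses the AWS SDK for JavaScript (v3) to list RDS instances and flag any without encryption at rest enabled; the region is a placeholder, credentials are assumed to come from the environment, and pagination is omitted for brevity. Equivalent checks could be written against Azure or Google Cloud APIs or enforced through each vendor's policy tooling.

TypeScript
import { RDSClient, DescribeDBInstancesCommand } from "@aws-sdk/client-rds";

// Placeholder region; credentials are resolved from the environment or an assumed role.
const client = new RDSClient({ region: "us-east-1" });

async function auditEncryptionAtRest(): Promise<void> {
  // Pagination (the Marker field) is omitted to keep the sketch short.
  const { DBInstances = [] } = await client.send(new DescribeDBInstancesCommand({}));

  for (const db of DBInstances) {
    // StorageEncrypted reports whether encryption at rest is enabled for the instance.
    if (!db.StorageEncrypted) {
      console.warn(`Encryption at rest is NOT enabled for ${db.DBInstanceIdentifier}`);
    }
  }
}

auditEncryptionAtRest().catch((err) => {
  console.error("Audit failed:", err);
});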
This is an article from DZone's 2023 Database Systems Trend Report. For more: Read the Report