DZone Spotlight

Tuesday, June 13
A Practical Guide for Container Security
By Akanksha Pathak
This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report

As containerized architecture gains momentum, businesses are realizing the growing significance of container security. While containers undeniably offer profound benefits, such as portability, flexibility, and scalability, they also introduce unprecedented security challenges. In this report, we will address the fundamental principles and strategies of container security and delve into two specific methods — secrets management and patching. Additionally, we will examine tools and techniques for securing keys, tokens, and passwords.

Current Trends in Container Security

Developers and DevOps teams have embraced the use of containers for application deployment. In a report, Gartner stated, "By 2025, over 85% of organizations worldwide will be running containerized applications in production, a significant increase from less than 35% in 2019."

On the flip side, various statistics indicate that the popularity of containers has also made them a target for cybercriminals who have been successful in exploiting them. According to a survey released in the 2023 State of Kubernetes Security report by Red Hat, 67% of respondents stated that security was their primary concern when adopting containerization. Additionally, 37% reported that they had suffered revenue or customer loss due to a container or Kubernetes security incident. These data points emphasize the significance of container security, making it a critical and pressing topic for discussion among organizations that are currently using or planning to adopt containerized applications.

Strategies for Container Security

Container security is most effectively handled using a comprehensive multi-layer approach, with each layer involving different strategies and principles. Employing this approach minimizes the risk of exposure and safeguards the application against threats.

Figure 1: Multi-layer approach to container security

Application security focuses on securing the application that is executed within the container, which can be achieved by implementing input validation, secure coding practices, and encryption. In contrast, the container runtime environment should undergo regular vulnerability scans and patching. Finally, the host layer is considered the most critical security layer because it is responsible for running the containers. It can be secured by implementing baseline configurations to harden the host operating system, deploying firewalls, implementing network segmentation, and using intrusion detection and intrusion prevention systems.

Each layer of the container infrastructure provides an opportunity to apply a set of overarching principles and strategies for security. Below, we've outlined some key strategies to help provide a better understanding of how these principles can be put into action.

SECURING CONTAINERIZED ENVIRONMENTS: Core Principles and Strategies

Secure by design: least privilege, separation of duty, defense in depth
Risk assessment: vulnerability scanning and remediation, threat modeling, security policy
Access management: RBAC, MFA, centralized identity management
Runtime security: network segmentation, container isolation, intrusion detection and prevention
Incident management and response: log management, incident planning and response, continuous monitoring

Container Segmentation

To secure communication within a container segment, containers can be deployed as microservices, ensuring that only authorized connections are allowed. This is achieved using cloud-native container firewalls, container zones, service mesh technologies, etc., that control the traffic to the virtual network using granular policies. While network segmentation divides the physical network into sub-networks, container segmentation works on an overlay network to provide additional controls for resource-based identity.
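To make segmentation concrete, here is a minimal sketch of restricting traffic with a Kubernetes NetworkPolicy. It assumes a cluster whose network plugin enforces NetworkPolicy, and the namespace, labels, and port are illustrative rather than taken from any real deployment:

YAML
# Hedged example: only pods labeled app=frontend may reach the payments pods on port 8443.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-frontend
  namespace: payments          # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: payments            # the policy applies to these pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend    # only frontend pods are allowed in
      ports:
        - protocol: TCP
          port: 8443

Container firewalls and service meshes layer richer, identity-aware rules on top of this kind of baseline policy.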
Image Scanning

Before deploying containers, it is important to analyze the container base images, libraries, and packages for any vulnerabilities. This can be accomplished by utilizing image scanning tools, such as Anchore and Docker Scout.

Runtime Protection

To identify and respond to potential security incidents in real time, it is crucial to monitor activities within the container. Runtime security tools can assist in this task by identifying unauthorized access, malware, and anomalous behavior.

Access Control

To minimize the possibility of unauthorized access to the host machine, only authorized personnel should be granted access to the containerized environment. Strong authentication and authorization mechanisms, such as multifactor authentication, role-based access control (RBAC), and OAuth, can be deployed for this purpose.

Secrets Management in Container Security

Secrets management protects against both external and internal threats and simplifies credential management by centralizing it. It aims to protect sensitive information (keys, tokens, etc.) that controls access to various services, container resources, and databases. Ultimately, it ensures that sensitive data is kept secure and meets regulatory compliance requirements.

Due to the importance of secrets, they should always be encrypted and stored securely. Mishandling of this information can lead to data leakage, breach of intellectual property, and loss of customer trust. Common missteps include storing secrets in plain text, hardcoding them, or committing them to a source control repository.

Overview of Common Types of Secrets

To ensure the security of secrets, it's crucial to have a clear understanding of the various types:

Passwords are the most commonly used secret. They are used to authenticate users and provide access to web services, databases, and other resources in the container.
Keys serve multiple purposes, such as encrypting and decrypting data and providing authentication for devices and services. Common key types include SSH keys, API keys, and encryption keys.
Tokens are used to provide temporary access to resources or services. Authentication tokens, such as access tokens, OAuth tokens, and refresh tokens, are used to authenticate third-party services or APIs.
Database credentials include usernames, passwords, and connection strings that are used to access the database and database-specific secrets.
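To make proper handling of these secret types concrete, here is a minimal sketch (assuming Kubernetes; all names and values are illustrative) of supplying a database password to a container through a Secret instead of hardcoding it in the image or source:

YAML
# Hedged example: store a database password as a Kubernetes Secret...
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials           # illustrative name
type: Opaque
stringData:
  DB_PASSWORD: change-me         # placeholder; inject real values from your secrets manager
---
# ...and expose it to the application container as an environment variable.
apiVersion: v1
kind: Pod
metadata:
  name: orders-api
spec:
  containers:
    - name: orders-api
      image: registry.example.com/orders-api:1.4.2   # illustrative image
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: DB_PASSWORD

Keep in mind that Kubernetes Secrets are only base64-encoded by default, so enable encryption at rest and restrict access with RBAC, as the tool overview below also advises.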
Overview of Popular Secrets Management Tools

When evaluating a security solution, it's important to consider a range of factors, such as encryption, access control, integration capabilities, automation, monitoring, logging, and scalability. These are all desirable traits that can contribute to a robust and effective security posture. Conversely, pitfalls such as lack of transparency, limited functionality, poor integration, and cost should also be considered. In addition to the above-listed capabilities, a comprehensive evaluation of a security solution also takes into account the specific needs and requirements of a company's current infrastructure (AWS, Azure, Google Cloud, etc.) and compatibility with its existing tools to ensure the best possible outcome. Below is a list of some proven tools in the industry for your reference:

HashiCorp Vault – An open-source tool that provides features like centralized secrets management, secrets rotation, and dynamic secrets.
Kubernetes Secrets – A built-in secrets management mechanism within the Kubernetes environment that allows users to store sensitive information as Kubernetes objects. It is advised to use encryption, RBAC rules, and other security best practices for configuration when using Kubernetes Secrets.
AWS Secrets Manager – A cloud-based tool that is both scalable and highly available. It supports containers running on Amazon ECS and EKS, provides automatic secret rotation, and can integrate with AWS services, like Lambda.
Azure Key Vault – Usually used by containers running on Azure Kubernetes Service. It can support various key types and integrates with most Azure services.
Docker Secrets – A built-in secrets management tool that can store and manage secrets within Docker Swarm. Note that this tool is only available for Swarm services and not for standalone containers.

Short-Lived Secrets

An emerging trend in the field of secrets management is the use of short-lived secrets that have a limited lifespan, are automatically rotated at regular intervals, and are generated on demand. This is a response to the risk associated with long-lived, unencrypted secrets, as these new secrets typically only last for a matter of minutes or hours and are automatically deleted once they expire.

Patching in Container Security

To reduce exposure risk from known threats, it is important to ensure that containers are using the latest software versions. Patching ensures that the software is regularly updated to address any open vulnerabilities. If patching is not applied, malicious actors can exploit vulnerabilities and cause malware infections and data breaches. Mature organizations use automated patching tools to keep their container environments up to date.

Patching Tools

To keep container images up to date with the latest security patches, there are many tools available in the market. Kubernetes and Swarm are the most widely used orchestration tools that provide a centralized platform and allow users to automate container deployment. Ansible and Puppet are other popular automation tools used for automated deployment of patches for Docker images.

Best Practices for Implementing Patching

Applying patches in a container environment can significantly enhance the security posture of an organization, provided they follow industry best practices:

Scan containers on a periodic basis to identify vulnerabilities and keep the base images up to date.
Use an automatic patching process with automated tools as much as possible to reduce manual intervention.
Use official images and test patches in a testing environment before deploying into production.
Track the patching activity, monitor logs, and act on alerts or issues generated.
Create automated build pipelines for testing and deploying containers that are patched.
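Because images are immutable, patching in practice means rebuilding on a freshly patched base image, rescanning, and redeploying. A minimal sketch of such an automated build step might look like this (the registry and tag scheme are illustrative, and the scan line assumes the Docker Scout CLI plugin is installed, so substitute whichever scanner you actually use):

Shell
# Hedged sketch of an automated patch-and-rescan build step
TAG=registry.example.com/orders-api:$(date +%Y%m%d)   # illustrative registry and tag scheme

docker build --pull -t "$TAG" .      # --pull forces the latest patched base image
docker scout cves "$TAG"             # rescan the rebuilt image for known CVEs
docker push "$TAG"                   # publish; your CD pipeline rolls it out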
Conclusion

As more organizations adopt containerized environments, it's vital to understand the potential security risks and take a proactive approach to container security. This involves implementing core security principles and strategies, using available tools, and prioritizing the security of containerized data and applications. Strategies like multi-layered security, secrets management, and automated patching can help prevent security breaches. Additionally, integrating patching processes with CI/CD pipelines can improve efficiency. It's important for organizations to stay up to date with the latest container security trends and solutions to effectively protect their containerized environments.

This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report
From CPU to Memory: Techniques for Tracking Resource Consumption Over Time
By Denis Matveev
Sometimes, it is necessary to examine the behavior of a system to determine which process has utilized its resources, such as memory or CPU time. These resources are often scarce and may not be easily replenished, making it important for the system to record its status in a file. By doing so, it becomes feasible to identify the most resource-intensive process in the past. If the system has not encountered the Out-of-Memory (OOM) killer, whose activity can be found in the syslog, this recorded information can be used to pinpoint the problematic process.

Atop Tool: An Overview

There is a special tool that can be used both for real-time monitoring of system usage and for collecting system status into logs in the background. This is atop. With atop, you can gather information on CPU and memory usage, which can also be collected by other popular monitoring tools like top and htop. Additionally, atop provides insights into I/O and network usage, eliminating the need to install additional tools for network and I/O monitoring, such as iftop and iostat. In my opinion, atop is a versatile tool for many tasks. Atop is an open-source project and is available for most Linux distributions.

What Is Atop Used For?

Atop can be used for incident investigations in a Linux environment. Atop is a system resource monitor that can provide detailed information about system activity, including CPU, memory, and disk usage, as well as process-level activity. During an incident investigation, atop can help you identify which processes were running at the time of the incident, how many resources they were consuming, and whether there were any spikes in resource usage that may have contributed to the incident. You can also use atop to monitor specific system components, such as network activity, and track changes over time. Basic use cases are listed below:

Real-time resource monitoring
Incident analysis of system behavior
Capacity planning
Resource allocation

For most of the cases in the list, you can use modern monitoring systems like Zabbix and Prometheus. In my personal experience, I find atop to be a useful tool for troubleshooting and identifying the root cause of issues. While dedicated monitoring systems can provide consolidated data on resource usage, they may not be able to answer specific questions about which processes led to server inaccessibility. Atop, on the other hand, can provide detailed information on individual processes, making it easier to differentiate between them and understand their impact on system performance.

There are two general ways of working with atop:

Real-time monitoring
Incident investigation

The first approach can be helpful for debugging or profiling your application, providing insights into its behavior and performance. The second approach is more useful for incident investigations, allowing you to identify the root cause of system failures or performance issues.

Setting Up

To write logs, you should launch the daemon:

Shell
# systemctl start atop

It is recommended to change the interval for collecting data:

Shell
# vi /lib/systemd/system/atop.service

You can find the env variable:

Shell
LOGINTERVAL=60

Change this value (in seconds) and reload the systemd unit configuration:

Shell
# systemctl daemon-reload

Then restart the service:

Shell
# systemctl restart atop

After that, atop will write info into a log file every 60 seconds (as above).
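Editing the packaged unit file directly works, but package upgrades may overwrite it. A hedged alternative (paths and defaults vary by distribution) is to check the distribution's defaults file for the same variable and, if you do change the unit, to use a systemd drop-in override:

Shell
# On Debian/Ubuntu packages, the logging interval is usually defined in a defaults file:
grep LOGINTERVAL /etc/default/atop 2>/dev/null

# If you change the unit itself, prefer a drop-in override so upgrades don't revert it:
sudo systemctl edit atop
sudo systemctl daemon-reload
sudo systemctl restart atop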
Real-Time Monitoring: Practical Examples

Launching

1. To launch the utility, type the following in a terminal and track resource consumption:

Shell
# atop

2. In order to change the interval, press 'I' and enter the number in seconds. I prefer to set up an interval of 1-2 seconds.

3. In case the consumption of server resources reaches a critical value, it will be marked with a specific color:

Red if consumption is critical
Cyan if consumption is almost critical (80% of critical)

The amount considered critical varies for different resources:

90% utilization of CPU
70% usage of disk
90% of network bandwidth
90% of memory occupation
80% of SWAP

Of course, these parameters can be modified. Pay attention: this CPU has two cores, and you can see how utilization is distributed among these cores.

4. To kill a process, press 'k' and then type the PID of the process to be killed (it's similar to 'top'). Further, you can specify a signal to be sent to the process.

Output Options: Resource-Related Output

1. To show commands as they have been run, type 'c'.
2. If you would like to show everything about memory, use the 'm' key.
3. There is 'g' for showing generic output. It might be needed when you want to revert to the initial output. This is the default output.
4. For disk-related output, press 'd'.
5. For network-related output (UDP, TCP, and bandwidth), press 'n'. Please take into account that the kernel module netatop must be installed; otherwise, atop won't output network-related information. This module allows atop to show network activity per process. Refer to the official web page.

So, we have considered the basic options, which are enough for most cases. There are also other interesting options I recommend considering:

'y' — for showing output per thread. This is very useful functionality for examining the behavior of multi-threaded applications (or for debugging such apps).
'e' — shows GPU utilization
'o' — if you'd like to customize the output, it's possible in ~/.atoprc; then you can use your own output just by pressing 'o'
'z' — if you need to pause atop

Aggregation Functions: Top Resource Eaters

1. To switch to output accumulated per user, push 'u'.
2. For output per process, hit 'p'.
3. For output of processes accumulated per Docker container, there is the 'j' key, where 'host' means host-native processes. For observing only a specific container, use 'J'.

Sorting Options

1. For sorting by CPU usage, press shift + 'c' (or capital C). This is the default behavior.
2. To sort by memory usage, hit shift + 'm' (capital M).
3. To sort by disk usage, hit shift + 'd' (capital D).
4. For network utilization sorting, use shift + 'n' (capital N).
5. If you are tracking threads, there is the option 'Y' to aggregate threads by process.

Note: Sorting and output modifiers are different and should be used in combination.

Incident Examination (Looking to the Past)

All of the rules above for real-time monitoring also work when looking for events in logs. Initially, we need to start reading logs instead of the real-time status output:

Shell
# atop -r /var/log/atop/atop.log

This will read the log file.

Navigating

Navigate within the file using the t (forward) and shift+t (back) keys. This allows you to go to the next sample or go back to the previous one.

Time Limit

There are options to limit time:

Shell
# atop -r /var/log/atop/atop.log -b 1400

Opens atop from 14:00 of the current day to the end of the current log file.

Shell
# atop -r /var/log/atop/atop_20230523.log -b 1400

Opens the file written on 23 May 2023, starting at 14:00 and navigating until 23:59 of that day.

Shell
# atop -r /var/log/atop/atop_20230525 -b 14:00 -e 16:00

You'll see records from 14:00 until 16:00 written on 25 May 2023.

In case your system does not rotate logs, you can use atop's begin and end limitations in this form:

Shell
[-b [YYYYMMDD]hhmm ] [-e [YYYYMMDD]hhmm ]

As mentioned above, sorting, aggregating data, and showing output related to specific resources all work perfectly in this mode as well.

Other Atop Capabilities

Atop has a unique feature that allows users to create charts directly in their terminal. To use this feature, you need only Python and pip; install the atopsar-plot package, and you are able to visualize historical data. While this feature may not be particularly useful for modern systems that are already under monitoring, it's worth noting as an additional capability of the program.

Monitor a Process Resource Consumption

When it comes to monitoring a server, having the right tools in place is crucial to ensure optimal performance and identify potential issues. Two popular systems for server monitoring are Zabbix and Prometheus, both of which are capable of monitoring various process resource consumption metrics such as memory, CPU, and disk usage. These systems can extract information about a process from the /proc filesystem and send it to the server for storage. Keep in mind, however, that such monitoring systems typically report resource consumption either for specific, pre-configured processes or for all processes in total, with no further differentiation. Atop, in this case, is a powerful tool.

Atop vs. Top

While both atop and top are system performance monitoring tools, they differ in their capabilities and level of detail. Top is a simple command-line utility that provides a basic overview of the system's current processes and their resource usage. It is useful for quickly identifying processes that are consuming significant resources, but it does not provide detailed information on system activity. Atop, on the other hand, provides a more detailed report of system activity, including CPU usage, memory usage, and disk I/O. It can also monitor system activity over a period of time, making it useful for analyzing long-term trends and identifying patterns.

Conclusion

Atop is a powerful tool for system performance monitoring and analysis. It provides detailed information on system activity and can be used to diagnose and troubleshoot performance issues, plan for future capacity requirements, monitor security and compliance, and allocate resources effectively. While it may be more complex than traditional tools like top, it offers greater insight into system activity and can be an invaluable tool for system administrators and IT professionals.
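As a final practical shortcut, the log-reading options above can be combined into a small helper that jumps straight to a suspicious time window in yesterday's log. This is a sketch assuming the default daily log file naming shown above and GNU date:

Shell
#!/bin/sh
# Open yesterday's atop log between 14:00 and 16:00 (adjust the window as needed).
day=$(date -d yesterday +%Y%m%d)   # GNU date; on BSD/macOS use: date -v-1d +%Y%m%d
atop -r "/var/log/atop/atop_${day}" -b 14:00 -e 16:00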

Trend Report

Containers

The proliferation of containers in recent years has increased the speed, portability, and scalability of software infrastructure and deployments across all kinds of application architectures and cloud-native environments. Now, with more and more organizations having migrated to the cloud, what's next? The subsequent need to efficiently manage and monitor containerized environments remains a crucial task for teams. With organizations looking to better leverage their containers — and some still working to migrate out of their own monolithic environments — the path to containerization and architectural modernization remains a perpetual climb. In DZone's 2023 Containers Trend Report, we will explore the current state of containers, key trends and advancements in global containerization strategies, and constructive content for modernizing your software architecture. This will be examined through DZone-led research, expert community articles, and other helpful resources for designing and building containerized applications.


Refcard #335: Distributed SQL Essentials, by Andrew Oliver

Refcard #384: Advanced Cloud Security, by Samir Behara

More Articles

React Helpful Hints Series: Volume 2

React is a popular JavaScript framework that was developed by Facebook and has gained a massive following among developers due to its simplicity, modularity, and reusability. In this series of short "bloglets," our team will cover a wide array of React topics, including developer tips, issues, and experiences. We are hopeful that everyone from the beginner to the seasoned professional will find something useful in each post.

Using React Context

By Jeff Schuman

Developers new to React have a tendency to over-use property propagation to nested child components. The term for this is prop drilling. Prop drilling is generally frowned upon as it inhibits clean, reusable, and DRY code. One alternative to prop drilling is using the React Context API. Context allows for the sharing of data in a component hierarchy without passing the data down to each component through properties. This article will give you an overview of the Context API through an example.

A typical use case for Context is application theming. For our example, we'll give our user the ability to easily enlarge the text for each UI widget. Here's what our final product looks like: by selecting the dropdown (currently set to 'Normal'), the user can increase the size of the elements inside the box. Other options include 'Enlarged' and 'Gigantic.' Here's a breakdown of the various React components discussed below.

The first step in using React Context is to create the Context itself. It is generally a good idea to use a separate module to create the context so it can be reused. Here is our simple ThemeContext.js file:
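Something along these lines (a minimal sketch consistent with the description that follows, not necessarily the author's exact listing):

JavaScript
// ThemeContext.js: create a shared context for the theme.
// It is initialized with null; the Provider will supply a real value later.
import { createContext } from 'react';

export const ThemeContext = createContext(null);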
Note that we are using the React API createContext() function and initializing it with a null value. Later, we will ensure that the context has valid data.

In our AppContainer component, we'll create a state to capture the current theme. This state can be changed by manipulating the dropdown: the theme state is defined and updated, and the dropdown updates the theme state value.

Next, we need to provide our context to a set of components. We do this by importing the context and then using its Provider to encapsulate the components that have access to the Context data. We import the ThemeContext from our context module and then use the ThemeContext.Provider component to wrap the components that have access to the context. Note that we set the value of the context to our theme state data by using the value prop of the Provider component.

Let's take a look at how we can now read the context data in a component within the Provider's hierarchy. Our UserList component is fairly simple: note that we are not using Context in any way in this component. It simply instantiates a list of User names and then creates a User component for each user.

In the User component, we can see how to gain access to the Context data. We import the useContext hook and the ThemeContext module. We invoke the useContext hook, passing in the Context that we created in the module, and the return value is the context data. In our example, we simply assign the context value as the className of our <span> content. Our ButtonPair and Button components work similarly, as does the StaticText component. Add in some relatively simple CSS for each class and, putting it all together, you have a functioning application where modifying the theme affects all the visual elements within the Context Provider.

In Summary

The React Context API is one solution to prop drilling in React components. Start by creating a context object. Import the context object in your component and use its Provider component to establish the container of components (and their descendant components) that will have access to the context data. You'll also want to set (and provide a way to modify) your context data here. Wherever the context is needed, import the context object and use the useContext hook to gain access to the context data.

Using the Context API helps keep your components clean and DRY, but there can be drawbacks:

Component reusability can suffer when using the Context API. We've effectively created a dependency on the context data in any component within the hierarchy that uses that context. Attempting to use the component outside the Provider hierarchy is not advisable.
There can be performance concerns with using the Context API in a complicated and deep hierarchy. As the context changes, this change is broadcast throughout the hierarchy regardless of whether the component is dependent on the change or not.

An alternative to using Context is Component Composition — a topic for another time! I hope you have enjoyed this overview of the React Context API and an example of its use. It is a powerful feature that can help you write clean, DRY code and avoid the messiness of passing properties throughout your component hierarchy.

_____________________________________________________________________________________

Controlled Components: The Key To Consistent React Forms

By Jared Mohney

Controlled components are a key concept of React. With their state fully controlled by their parent component, their data remains in sync with the rest of our application. Without controlled components, we could find ourselves with multiple sources of truth, and that's no good! Let's quickly look at three simple examples we can run into when building forms.

Input Example

In this example, we have an input element that is fully controlled. Our parent component sets the initial state via useState and handles any updates to it thereafter via handleChange. When submitted, the parent component has access to the value of the input element and can do whatever it needs to with our pizza!
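A minimal sketch of that controlled input might look like this (illustrative names, condensed into a single component rather than the original parent/child split):

JavaScript
// Hedged sketch of a controlled input: the value lives in React state, not in the DOM.
import { useState } from 'react';

function PizzaOrderForm() {
  const [pizza, setPizza] = useState('');                 // initial state
  const handleChange = (e) => setPizza(e.target.value);   // keep state in sync with the input

  const handleSubmit = (e) => {
    e.preventDefault();
    console.log(`Ordering: ${pizza}`);                    // the component owns the current value
  };

  return (
    <form onSubmit={handleSubmit}>
      <input type="text" value={pizza} onChange={handleChange} />
      <button type="submit">Order</button>
    </form>
  );
}

export default PizzaOrderForm;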
Checkbox

Now this feels familiar. Here we have a clean-cut example of a basic checkbox, reading and updating the state of its parent instead of managing its own internally. Want to learn more about the useCallback being used by handleChange? Check out our article on rendering React reliably!

Select

Our final example is a multi-select dropdown, and our approach is identical. We want to hijack the internal state management of these elements so that we can establish reliable data flow within our application. I hope it's clear now: state control isn't scary (or difficult)!

TIP: If you find yourself tackling a larger form, reaching for libraries like Formik and React Hook Form can handle a lot of this boilerplate for you (and more).

In summary, by controlling the state of our form components with React, we can ensure that our data is consistent. This is important in larger applications where there may be multiple components that access the same data. With controlled components, we can avoid inconsistencies and maintain a single source of truth!

_____________________________________________________________________________________

React: Converting Class-Based Components Into Functional Ones. It's Not So Bad!

By Adam Boudion

Introduction

If you've worked with React for any length of time, you'll likely know that there are two different types of React components: class and functional. Class components are the older way and involve extending from the internal React class called Component. Functional components, the newer way, are simple JavaScript functions that return JSX. Let's say you find yourself working in a codebase of class-based components, and to keep things closer to the leading edge, you decide you want to update some of these class-based components to functional ones. It can be overwhelming if you're not familiar with functional components and hooks, but it's really not as daunting as it first seems.

Consider the following class component: nothing fancy for the sake of simplicity, but enough to allow us to understand what needs doing in order to make this into a functional component. This component takes in a single prop, name, which will be used to personalize a welcome message. It also contains a button that will use the state to keep track of how many times you click it and display that in the browser. Finally, it will display some messages in the console when the component mounts and unmounts so that we can keep track of its lifecycle.

Converting Props

First, we'll address the prop. In the class component, the props object is passed into the constructor on lines 4 and 5. The prop is then used within the component by getting its value off of this.props, as seen on line 18 in the example above. Functional components are a bit different. Instead of props being passed in the constructor, they're simply passed into the component function itself, as you would any other ES6 arrow function. At that point, you can simply reference the prop directly inside the component. It's worth noting that this particular example is destructuring the props; one could easily write the same two lines of code without destructuring, and they would still be functionally equivalent.

Converting State

Next, we need to talk about the differences in state. In the class component example, we can see the state being initialized in the constructor on line 7. A single state variable called count is declared and initialized with a value of zero. That state variable is then printed to the screen as part of line 19. Then, on line 20, the button click handler will increment the state variable by one and trigger a re-render via the setState method.

In functional components, the state is handled using the relatively new useState hook, though the core reactive behavior remains the same. To initialize our count variable in our new functional component, we'll need a call to the useState hook. The useState hook takes in a single argument, an initial value (zero, in this case), and returns two things. The first is a new state variable, called count in this case. The second is a method called setCount, which is used to change the value of this variable. So, instead of using the setState method as we did in the class-based example, we need to call setCount with the new value we'd like to set it to. So, lines 19 and 20 from above would become something like this:
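Along these lines (a hedged, self-contained sketch using the count/setCount names above):

JavaScript
// Hedged sketch: the counter state and click handler, rewritten with useState.
import { useState } from 'react';

function ClickCounter() {
  const [count, setCount] = useState(0);          // replaces this.state = { count: 0 }

  return (
    <div>
      <p>You clicked {count} times.</p>
      {/* replaces this.setState({ count: this.state.count + 1 }) */}
      <button onClick={() => setCount(count + 1)}>Click me</button>
    </div>
  );
}

export default ClickCounter;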
Rendering

Next, we need to take a look at the different ways HTML is rendered in functional components. In our class example above, we have our familiar render method on line 15, which is responsible for rendering the HTML of our component to the virtual DOM. In a functional component, there isn't a render method. Instead, the return statement of the component itself contains the content we'd like to render.

Lifecycle Considerations

Finally, we need to look at how to translate our familiar React class lifecycle methods into the world of functional components and hooks. This one is a little tricky, but once you understand the functional equivalent, it will start to make sense. Typically, in class-based components like the one above, there exists the componentDidMount method, which is used to run code that we'd like to execute when the component is rendered for the very first time. Conversely, there is a componentWillUnmount method, which runs when the component is about to be removed from the virtual DOM. Functional components operate quite a bit differently in this regard, in part because they utilize something called the useEffect hook. Let's walk through what that would look like.

At first glance, this looks markedly different and maybe a little daunting. But it really isn't! Let's break it down. The useEffect hook runs every time a component re-renders. But wait a second! We just want this to run once when the component is initialized, not after every render. Well, that's where the tricky little empty array on line 13 comes into play. This argument is optional, and when it's omitted, the useEffect hook will run every time the component renders, without conditions. If the array is passed but is empty, as in our example, the effect will only run once when the component is first mounted. That's the behavior we want for our example, but it's worth talking about what happens when you actually pass something into that array. If we passed in our count variable, for example, then this effect would be skipped on every render except for the ones where count has changed. This is powerful, as it allows the developer to optimize and cut down on excessive re-renders.

Then there's the return statement, which is where the code that was in componentWillUnmount now lives. This is a cleanup method, and it is also optional. It runs whenever the component is unmounted, which is what we want in this case. It's important to note that it will also run right before the same useEffect runs again, to clean up after the previous run. But since our array of dependencies is empty, the effect will only run once when first rendered; therefore, the cleanup method will only run once, when the component is unmounted.

Conclusion

So, with all of that said, let's take a look at the final product. This code is functionally equivalent (no pun intended) to the earlier class example and uses hooks to replace the old lifecycle methods. We all know that real-world class-based components are usually not this simple, but they can be updated to functional components using the same techniques listed here. While both methods are viable options for creating components, functional components are widely considered to be the path forward by the overall React community, including the creators of React themselves. For this reason, they have committed to maintaining backward compatibility for class components to avoid forcing rewrites of established code, but are focusing their attention on improving functional components going forward. Because of this, you may want to consider migrating at some point so you can take advantage of the fancy new features they're adding now and in the future.
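For reference, here is a hedged sketch of what the fully converted component described in this section might look like (illustrative names, not the author's exact listing):

JavaScript
// Hedged sketch of the converted functional component.
import { useState, useEffect } from 'react';

function Welcome({ name }) {                          // props arrive as function arguments
  const [count, setCount] = useState(0);              // state via the useState hook

  useEffect(() => {
    console.log('Component mounted');                 // previously componentDidMount
    return () => console.log('Component unmounted');  // cleanup, previously componentWillUnmount
  }, []);                                             // empty dependency array: run once on mount

  return (
    <div>
      <p>Welcome, {name}!</p>
      <p>You clicked {count} times.</p>
      <button onClick={() => setCount(count + 1)}>Click me</button>
    </div>
  );
}

export default Welcome;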

By Joel Nylund
Deploying Smart Contract on Ethereum Blockchain

In this tutorial, we will learn how to deploy and interact with an Ethereum Smart contract using Web3j and Spring.

What Is a Smart Contract?

A Smart contract is an algorithm for certain actions integrated into the blockchain code. It is deployed on the Ethereum blockchain network and automatically executes predefined actions when specific conditions in the contract are met.

What Is Solidity?

Solidity is an object-oriented programming language for creating Smart contracts. Learning and using Solidity is easy if you are already familiar with Java or C. Solidity has built-in features specifically related to the blockchain. They allow you to withdraw and send "money" (ETH), get the address of the person who invoked the Smart contract, and make calls to other Smart contracts using their addresses.

What Is Web3j?

Web3j is a lightweight Java library that allows you to work with the Ethereum blockchain, providing the ability to manage transactions and generate type-safe wrappers for Smart contracts.

1. Installing Solc and Web3j

Solc is the compiler for Solidity. To install it, run the following command:

Shell
npm install -g solc

To generate the wrapper code of a Smart contract, we need to install Web3j:

Shell
curl -L get.web3j.io | sh && source ~/.web3j/source.sh

2. Initializing the Spring Project

To quickly start a project, you can use Spring Initializr. Download the project by clicking the "Generate" button and open it with a convenient IDE. Add the Web3j dependency in the pom.xml:

XML
<dependency>
    <groupId>org.web3j</groupId>
    <artifactId>core</artifactId>
    <version>4.10.0</version>
</dependency>

3. Creating a Smart Contract

Let's create the Counter.sol file as a Smart contract in your project. The first line in a contract sets the version of the source code that will be taken into account by the compiler:

Plain Text
pragma solidity ^0.8.20;

For this example, we'll create a contract with the Counter name using the contract keyword:

Plain Text
contract Counter {
    uint private count = 0;
}

We have declared the count state variable with data type uint as an unsigned integer. With the private access modifier, as in Java, this field can only be accessed within the current contract. Next, let's add two functions for incrementing and decrementing the count variable. In addition, we add a getter method for the count variable with the view modifier, which ensures the contract's state won't be changed:

Plain Text
pragma solidity ^0.8.20;

contract Counter {
    uint private count = 0;

    function increment() public {
        count += 1;
    }

    function decrement() public {
        count--;
    }

    function getCount() public view returns (uint) {
        return count;
    }
}

4. Compiling the Smart Contract and Creating a Web3j Wrapper

To compile the listed Smart contract, we'll use the previously installed solc library:

Shell
solcjs Counter.sol --bin --abi --optimize -o ../artifacts

This will create two files with .abi and .bin extensions in the artifacts folder. To convert the generated .abi and .bin files into a Java class with method signatures from the contract, we'll utilize the Web3j generator:

Shell
web3j generate solidity -b Counter_sol_Counter.bin -a Counter_sol_Counter.abi -o ../java -p com.example.smartcontract

After execution, in the src/main/java/org/example directory, you should have a new class named Counter_sol_Counter. Let's rename it to CounterContract.

5. Generating Ethereum Address

To generate an Ethereum address and private key, you can use this website. Copy the private key and put it in application.properties as the ethereum.private-key property.

I'll be deploying our contract to the Sepolia testnet (you can also use Goerli or Kovan instead). In order to deploy a Smart contract and pay for transaction fees, we'll need a little bit of ETH on the Sepolia testnet. Here are a few faucets where you can receive test coins:

Sepolia Faucet
Faucet Sepolia
Sepolia PoW Faucet
Infura

Copy your Ethereum address and submit it to a faucet to receive ETH coins. Lastly, we'll add the URL of Sepolia's JSON-RPC endpoint in application.properties:

Properties files
ethereum.private-key=<your_ethereum_private_key>
ethereum.provider=https://eth-sepolia.g.alchemy.com/v2/demo

6. Java Configuration

Let's create the Web3jConfig class, in which we declare beans for org.web3j.protocol.Web3j and org.web3j.crypto.Credentials using the parameters from application.properties:

Java
package com.example.smartcontract;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.web3j.crypto.Credentials;
import org.web3j.protocol.Web3j;
import org.web3j.protocol.http.HttpService;

@Configuration
@Slf4j
public class Web3jConfig {

    @Value("${ethereum.provider}")
    private String ethereumProvider;

    @Value("${ethereum.private-key}")
    private String ethereumPrivateKey;

    @Bean
    public Web3j web3j() {
        return Web3j.build(new HttpService(ethereumProvider));
    }

    @Bean
    public Credentials credentials() {
        return Credentials.create(ethereumPrivateKey);
    }
}

Next, we declare a bean for the CounterContract, which will be deployed during initialization:

Java
@Bean
public CounterContract counterContract() {
    CounterContract counterContract;
    try {
        counterContract = CounterContract.deploy(web3j(), credentials(), new DefaultGasProvider()).send();
        // counter = Counter.load(counterContractAddress, web3j(), credentials(), new DefaultGasProvider());
    } catch (Exception e) {
        log.error("Error while deploying a contract", e);
        throw new RuntimeException(e);
    }
    log.info("Counter contract has been deployed: {}", counterContract.getContractAddress());
    return counterContract;
}

To use the functions of the deployed Smart contract, let's create the CounterContractService class with the CounterContract injected into it:

Java
package com.example.smartcontract;

import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.web3j.protocol.core.methods.response.TransactionReceipt;

import java.math.BigInteger;

@Service
@Slf4j
public class CounterContractService {

    @Autowired
    private CounterContract counterContract;

    @SneakyThrows
    public BigInteger getCount() {
        return counterContract.getCount().send();
    }

    @SneakyThrows
    public void increment() {
        TransactionReceipt transactionReceipt = counterContract.increment().send();
        log.info("increment transaction : {}", transactionReceipt.getTransactionHash());
    }

    @SneakyThrows
    public void decrement() {
        TransactionReceipt transactionReceipt = counterContract.decrement().send();
        log.info("decrement transaction : {}", transactionReceipt.getTransactionHash());
    }
}

7. Wrap Up

At this point, the basic implementation of a Smart contract is ready. After launching the application, you'll see the address of the deployed contract in the logs, which you can trace in the Etherscan Explorer for Sepolia. The source code is available on GitHub.
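To round this out, here is a hedged sketch, not part of the original tutorial, of exposing the service over HTTP with a standard Spring REST controller; it assumes spring-boot-starter-web is on the classpath and reuses the CounterContractService above:

Java
package com.example.smartcontract;

import java.math.BigInteger;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hedged sketch: a thin HTTP layer over CounterContractService.
@RestController
@RequestMapping("/counter")
public class CounterController {

    private final CounterContractService counterContractService;

    public CounterController(CounterContractService counterContractService) {
        this.counterContractService = counterContractService;   // constructor injection
    }

    @GetMapping
    public BigInteger getCount() {
        return counterContractService.getCount();               // read-only call, no transaction
    }

    @PostMapping("/increment")
    public void increment() {
        counterContractService.increment();                     // sends a transaction and waits for the receipt
    }

    @PostMapping("/decrement")
    public void decrement() {
        counterContractService.decrement();
    }
}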

By Alexander Makeev
Components of Container Management

This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report

Containers are a major catalyst for rapid cloud-native adoption across all kinds of enterprises because they help organizations quickly lift and shift legacy applications or break monoliths into microservices to move to the cloud. They also unlock system architecture to adopt a multi-cloud ecosystem by providing an abstraction between the application and the underlying platform. The benefits of containers are widely evident around the cloud-native world and its modernization journey, and enterprises on the cloud-native roadmap are adopting and running containers at scale.

Containers are not only about building and running images — a lot more goes on behind the scenes for container management, including all the tools and processes covering the complete lifecycle of containers. When enterprises start adopting containers, they will only have a handful of containers to look after. In this case, "container management" looks like little more than having docker build and docker run. Ignoring a container management strategy, however, can lead to developer and operator ineffectiveness, poor governance and compliance, and security challenges in the long term. Giving priority to strategizing and managing the container lifecycle can help boost productivity and the effectiveness of developers and teams. It also contributes toward solution agility and helps in reducing the blast radius and vulnerabilities. Enterprises need to holistically consider container management planning and lifecycle before accelerating container adoption.

Aspects of Container Management Strategy

Let's understand the various key parts of container management and its components.

Container and Image Supply Chain

Container images are the building blocks for running containers. An image supply chain consists of all the nuts and bolts that make an image executable on environments through pull, build, and run. An image supply chain also includes:

All the layers of images built on top of the base image, which include libraries and utilities that complement the containerized application package
CI/CD tools that test and scan your packaging as a container image
Static and runtime scanning for vulnerability detection and patching, plus signing or hashing of images to validate their sanctity in your registries or pipeline

Figure 1: Container management lifecycle - Container image supply chain

Container Infrastructure Handling

Once your container image supply chain has been established (see Figure 1), you next want to run and build your application on top of it. For this, you need something on which you can run or execute containers. This includes compute for running containers and software logistics to schedule and organize them. If you're working with just a few containers, you can still manually gauge and control where to run the containers and what else will be in the app's sidecars or supporting ecosystem components. Provisioning the right storage and networking for those containers can be handled manually or semi-automatically. At scale, however, it is almost unmanageable to handle a large workload without an intelligent orchestrator that orchestrates this infrastructure as well as other aspects of container execution.

Container Runtime Security and Policy Enforcement

It is equally important for your container management solution to perform security scans, compliance checks, and policy enforcement. A management solution enforces policy and compliance in parallel with a runtime security scan for vulnerabilities inside a container pipeline, and it scans running containers on host nodes.

Container Monitoring and Observability

Images and containers are fully packaged with all the dependencies and prerequisites of apps running on top of an identified compute. Now we need to understand containers' behavior and what they are up to. A containerization strategy — which covers monitoring and observability of logging, traces, and metrics collection — should include container workloads, orchestration, and the tooling that supports container execution. Container execution inside a cluster of managed infrastructure includes supporting tools and utilities for running containers. Orchestrators will also have their own logging and monitoring since containers are ephemeral in nature.

Planning Container Management Strategy

So far, we have discussed all major components of container management. Enterprises should address the following aspects while designing a container management lifecycle.

Figure 2: Container governance and policy compliance - Container management stages

Handling Image Supply Chain

Existing CI/CD tools can be leveraged to build container images after compiling code and base references. A few important things to handle while building your enterprise image supply chain are:

The ability to scan container images in an enterprise repository
Security and policy compliance
Hashing or signing the image to avoid any tampering
Scanning mirror images from a well-known and sanitized registry before bringing them into an enterprise repository
Tagging and attributing images with details of the teams owning them for better support, portability, and upkeep

Some mature enterprises handle redundancy and replication of an image repository and artifacts to ensure high availability across the DevOps cycle, followed by periodic backups and a recovery process. Elastic, highly available, and fault-tolerant systems are not just limited to an execution environment but are equally important for the end-to-end DevOps cycle.

Infrastructure and Orchestration Handling Strategy

Infrastructure and orchestration handling strategy is all about the allocation of compute, storage, networking, and backups for containers running at scale. Selecting the type and quantum of compute is very important for designing containers. Containers can only be truly portable if the underlying compute is elastic and supports X (horizontal) and Y (vertical) scalability. Storage requirements for containers can be a mix of OS usage as well as container persistence. This means that container operations require a well-planned storage supply with diverse options of file, block, and blob storage. Networking is an essential part of the connectivity and delivery of a solution alongside enterprise security. Using a mature orchestrator like Kubernetes, Docker Swarm, etc., provides different flavors of inter- and intra-container cluster connectivity. Backups are an important part of operating containerized environments, which consist of mounted storage that holds data required to persist. A well-managed backup strategy contributes toward resiliency, cross-regional recovery, and autoscaling. For example, you can use image and container backups to recreate immutable read-only containers, given their ephemerality.

Container Security Principles

You are only as secure as your most vulnerable container. One of the main advantages of containers is that they reduce the blast radius and attack surface. Regular scanning and re-scanning of a repository is a good starting point, as you can see in Figure 2. It is also vital to consider implementing container runtime scanning — most likely traditional, agent-based host scanning to detect runtime anomalies. Container images are immutable; hence, vulnerability patching should replace an old image with a new, properly scanned and tested image. Patching hundreds or thousands of running containers individually is cumbersome; instead, they should be replaced with new containers based on updated and patched base images.

Container Observability Planning

Looking inside a dense cluster of small, ephemeral containers is challenging, and they may grow out of control if not handled maturely. The 12-Factor App guides us through the critical aspect of externalizing your logs. Containers will come and go, but draining logs toward an external syslog gives you better insights via log aggregation and mining.

Figure 3: Container strategy phases and execution pipeline

Besides everything else, developer experience is crucial in enterprise container management planning. It's important to analyze the productivity and effectiveness that the container lifecycle is bringing to developers and operators working on a DevOps pipeline chain. Enterprises also need to evaluate whether DIY or managed services (like EKS, AKS, or GKE) are a better fit for them. The answer may depend on the enterprise's maturity around different aspects of infrastructure, networking, and security handling, as you see in Figure 3. Organizations' roadmaps for infrastructure (private vs. hybrid vs. multi-cloud architecture) should also be considered in the container management strategy.

Best Practices for Building an Optimized Container Ecosystem

Let's quickly review some best practices to help build better containers (a short Dockerfile sketch illustrating several of them follows after the conclusion):

Package a single app per container
Do not treat containers as VMs
Handle container PIDs and zombie processes carefully
Optimize the docker build cache
Remove unnecessary tools and utilities from images
Be cautious of using publicly sourced images vs. scanned enterprise-built images
Build on the smallest possible images
Properly tag your images for better lifecycle handling

Conclusion

Finally, I am containerizing and packaging a portable summary of an effective container management strategy (pun intended). The takeaway is to inspect how effectively your engineers and developers are managing a large containerized production environment. How agile will you be in responding to urgent vulnerabilities? How quickly can you respond to dynamic scalability requirements of compute and storage? The 12-Factor App is an effective gauge for measuring container ecosystem maturity. When choosing your tools, consider options that support the infrastructure requirements of today and tomorrow. Enterprises also need to determine whether to use DIY or managed services based on in-house maturity around container lifecycle stages. You can always strategize your plan around the re-use of tools and processes to manage containers as well as non-container components optimally.

This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report
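To ground the build-related best practices above, here is a hedged Dockerfile sketch that applies several of them (small base image, multi-stage build, cache-friendly layering, non-root user, no build tooling in the final image); the language, versions, and paths are illustrative:

Dockerfile
# Build stage: compile with the full toolchain
FROM golang:1.21-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download                      # cached layer: re-runs only when dependencies change
COPY . .
RUN go build -o /out/app ./cmd/app

# Runtime stage: smallest possible image, single app, no extra utilities
FROM alpine:3.19
RUN adduser -D -u 10001 appuser          # run as a non-root user
USER appuser
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["app"]

The resulting image would then be tagged with something meaningful, such as a version and the owning team, rather than relying on latest.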

By Pratik Prakash
Breaking Down the Monolith

This is an article from DZone's 2023 Containers Trend Report.For more: Read the Report Conventionally, software applications were developed using a monolithic architecture where all application components were tightly interwoven and deployed as a single unit. As software applications became more intricate and organizations started relying on distributed systems, the constraints of monolithic architecture started becoming more evident. Containerization was introduced as a solution to this and the growing demand for application scalability, agility, and resilience. The success of microservices further propelled the use of containers, which enabled organizations to break down monolithic applications into independently deployable services. This loosely coupled framework also enabled developers to deploy changes quickly, while achieving enhanced scalability and fault tolerance. In this article, we explore the limitations of monolithic architecture and demonstrate how containers and microservices can support modern application delivery. We also delve into the various strategies, stages, and key steps organizations can adopt in their journey towards containerization and learn how different strategies can support different use cases. Breaking a Monolith Into Microservices Compared to monolithic architecture, a microservices architecture comprises modular, loosely coupled components that communicate via well-defined APIs. This architecture promotes fault tolerance, as the failure of one service has limited impact on the overall system. Microservices also differ from monoliths by using polyglot persistence, which enables each service to select its ideal storage solution based on its requirements. However, before you transition from monolithic to microservices architecture, it's essential to understand the key differences between the two for making informed decisions about application design and choosing the right transformation strategy. The following table outlines these distinctions, offering insights into the unique benefits and characteristics of each approach: KEY DIFFERENCES BETWEEN MONOLITHIC AND MICROSERVICES ARCHITECTURES Aspect Monolithic Architectures Microservices Architectures Structure Single, large application Multiple, small services Deployment Deploy the entire application at once Deploy individual services independently Scalability Scale the entire application Scale specific services as needed Development Changes require coordination across the team Changes can be made independently Technology stack Typically uses a single, unified stack Each service can use a different stack Fault tolerance Failure in one component affects the entire app Failure in one service has limited impact Strategies for Migrating to Containers Strategies for migrating to containers vary depending on an organization's specific requirements and goals. These strategies help manage the transition from monolithic architectures to containerized microservices, allowing organizations to achieve increased agility, scalability, and resilience. Let's review some common approaches. Phased Approach This approach involves incrementally breaking down monoliths into microservices through containerization, beginning with components that will potentially realize maximum benefits first. It reduces risks while giving teams time to learn and adapt processes over time. When to use: The phased approach is considered best when you wish to minimize risk and disrupt ongoing operations. 
It is also suitable for organizations with limited resources or complex monolithic applications who would prefer a gradual transformation from monolithic to microservices. Hybrid Architecture The hybrid architecture approach combines monolithic and microservices components, where some remain within monolithic architecture while others migrate toward microservices architectures progressively. This approach balances the benefits of both architectures and is suitable for organizations that cannot fully commit to a complete migration. When to use: Adopt this approach when it isn't feasible or necessary to completely transition an application over to microservices. This strategy works especially well for organizations that wish to retain parts of a monolithic application while reaping some advantages of microservices for specific components. Big Bang Approach Redesign and implement an entire application using microservices from scratch. Although this strategy might require dedicated resources and may introduce greater risk, this allows organizations to fully exploit the advantages of microservices and containers. When to use: Choose this approach when your monolithic application has reached a point where a complete overhaul is necessary to meet current and future demands, or when your organization can afford a resource-intensive yet riskier transition to microservices and containers while reaping all their advantages. Re-Platforming This approach involves moving the monolithic application to a container platform without breaking it into microservices. Replatforming offers some benefits of containerization, such as improved deployment and scalability, without the complexities of a full migration to microservices. When to use: It's recommended to use re-platforming when the goal is to gain some of the advantages of containerization without breaking down the monolith into microservices. It is also ideal for organizations that are new to containerization and want to test the waters before committing to a full migration to microservices. Practical Steps to Embracing a Containerization Journey Embarking on a containerization journey signifies the transformation of monolithic architectures into streamlined, efficient, and scalable microservices. The following section explores various stages involved in migrating monoliths to containers, and it provides a comprehensive roadmap to successfully navigate the complexities of containerization. Stages of Migrating Monoliths to Containers The migration process from monoliths to containers typically goes through three stages. Each stage focuses on addressing specific challenges and gradually transforming the architecture to optimize efficiency, flexibility, and maintainability: Identify the monolith in an organization's application architecture. Look for large, tightly coupled systems that have a single codebase, shared resources, and limited modularity. Analyze the dependencies, data flow, and communication patterns to understand the complexities of your monolith. Define service boundaries. Perform domain-driven design (DDD) analysis to identify logical service boundaries within the monolith. Establish bounded contexts that encapsulate specific business functions or processes, enabling microservices to be designed around these contexts and reducing inter-service dependencies. Re-architect the application into smaller, more manageable pieces. 
Re-architect the monolithic application into microservices using patterns like API gateway, service registry, and circuit breaker. Implement an API-driven approach, with each microservice exposing a RESTful API or using message-based communication such as AMQP or Kafka. Figure 1: Migrating monoliths to containers Key Steps of a Containerization Journey Embracing containerization often involves a series of well-defined steps that may be tailored for individual use cases. Successful containerization adoption may vary based on each organization's use case; however, the following four steps provide general guidance as organizations navigate their container journey from identifying component applications and setting up robust management to administering security practices for containers. Identify Application Components Analyze your application's architecture using dependency graphs or architecture diagrams to identify individual components like databases, web servers, and background workers. Determine the components that can be containerized and identify related dependencies that should be resolved during containerization. Purpose: Provides clarity on the application's architecture Helps with efficient resource allocation Enables component isolation for greater modularity Helps with dependency management Ensures seamless communication between containerized components Containerize Application Components Create Dockerfiles for each component to enable the encapsulation of application components into isolated containers, which facilitates easier deployment, portability, and version control. Use multi-stage builds to optimize image sizes and employ best practices like using non-root users and minimizing the attack surface. Ensure proper versioning of images by tagging them and storing them in a container registry like Docker Hub, RedHat Quay, or GitHub Container registry. Purpose: Encapsulates components in containers to create isolated environments Enables easy transfer of components across different environments Facilitates better version control of components Container Orchestration Choose a container orchestration platform such as Kubernetes or Docker Swarm to manage the deployment, scaling, and management of containerized applications. Implement appropriate resource allocation by defining resource limits and requests in your deployment manifests or compose files. Create self-healing deployments using liveness and readiness probes to monitor container health and restart unhealthy containers. Purpose: Ensures optimal allocation of resources for each container Maintains high availability and fault tolerance of applications Facilitates rolling updates and rollbacks with minimal downtime Container Management and Security Use tools like Prometheus and Grafana for monitoring and logging, with custom alerts for critical events. Implement a CI/CD pipeline for container updates, automating the build, test, and deployment process. Employ security best practices such as scanning images for vulnerabilities, enforcing network segmentation, and applying the principle of least privilege. Additionally, it is also recommended to use container security solutions that help with the continuous monitoring of threats and enforce security policies. 
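To make the orchestration and security steps above more concrete, here is a minimal sketch of a Kubernetes Deployment manifest that combines resource requests and limits, liveness and readiness probes, and a non-root security context. The service name, image, endpoint paths, and threshold values are illustrative assumptions rather than values taken from this article:

YAML
# Illustrative Deployment for one service carved out of a monolith.
# All names, image references, and figures below are example assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-service
  template:
    metadata:
      labels:
        app: orders-service
    spec:
      containers:
        - name: orders-service
          image: registry.example.com/orders-service:1.0.0
          ports:
            - containerPort: 8080
          resources:              # requests/limits keep scheduling and scaling predictable
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:          # restart the container if it stops responding
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:         # route traffic only once the service is ready
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          securityContext:        # least privilege: no root, no privilege escalation
            runAsNonRoot: true
            allowPrivilegeEscalation: false

In practice, the probe endpoints and resource figures would be tuned to the specific component being migrated out of the monolith.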
For instance, the following configuration file can be considered as a pseudocode for managing and securing containers: # Example of container security best practices configuration container_security: - use_non_root_user: true - minimize_attack_surface: true - implement_network_segmentation: true - enforce_least_privilege: true As the next step, the following pseudocode, for instance, illustrates how to load the security best practices configuration and apply it to running containers using any container management tool: # Pseudocode to manage and secure containers import container_management_tool # Load security best practices configuration security_config = load_config("container_security") # Apply security best practices to running containers container_management_tool.apply_security_config(security_config) # Monitor container performance and resource usage container_management_tool.monitor_containers() # Implement logging and alerting container_management_tool.setup_logging_and_alerts() Purpose: Monitors container performance and resource usage Implements logging and alerting for better visibility into containerized applications Ensures container security by scanning images for vulnerabilities and applying best practices Enforces security policies and monitors threats using container security solutions Real-World Use Cases Some real-world examples of organizations that have successfully implemented containerization in their application architecture include Netflix, Spotify, and Uber. Netflix: Architecture Transformation for Agility and Scalability Netflix is one such company that has successfully leveraged containers and microservices architecture to address the massive scale of its streaming service. By investing in container-based workloads, Netflix was able to break down their monolithic app into separate, more focused services that could be deployed and managed independently, giving greater agility and scalability while accommodating large traffic spikes during peak times more easily. This provided greater flexibility as they handled increased traffic with greater ease. Spotify: Containerization for Better Developer Productivity Spotify also embraced containers and microservices to increase developer productivity. Before the transformation journey, they relied on a monolithic architecture that required extensive coordination among cross-functional teams to run and maintain. By breaking up this monolith with containerization and microservices, they created a modular architecture where developers were able to work independently on specific components or features of their application without interference from one team to the next. Containers also provided a consistent environment in which developers were able to build, test, and deploy iterative changes on application easily. Uber: Containers for Deployments and Traffic Spike Management At first, Uber was using a monolithic framework that was difficult to scale and required close team collaboration in order to function smoothly. They transitioned over to using Domain-Oriented Microservice Architecture (DOMA) for scaling their services based on demand. This platform utilized containers, which allowed more rapid deployment and improved the handling of traffic spikes. Uber took advantage of microservices and containers to scale its services independently, enabling it to adapt their offerings according to traffic or usage patterns. 
In addition, using microservices provided increased fault isolation and resilience that ensured its core services remained available even if one service failed. Conclusion Embarking on a containerization journey enables organizations to transform their monolithic applications into more agile, scalable, and resilient microservices-based systems. Despite the benefits, it's also essential to acknowledge that transitioning from monolithic applications to containerized microservices may introduce challenges, such as increased operational complexity and potential issues in managing distributed systems. As you reflect on your own organization's containerization journey, consider the following questions: Which components of your monolithic application will benefit the most from containerization? What strategy will you adopt for migrating to containers: phased approach, hybrid architecture, big bang approach, or re-platforming? How will you ensure effective container orchestration and management in your new architecture? With the above questions answered, the blueprint of the transformation journey is already defined. What comes next is implementing the chosen strategy, monitoring the progress, and iterating as needed to refine the process. This is an article from DZone's 2023 Containers Trend Report.For more: Read the Report

By Sudip Sengupta CORE
Superior Stream Processing: Apache Flink's Impact on Data Lakehouse Architecture

In the era of data-driven decision-making, the Data Lakehouse paradigm has emerged as a promising solution, bringing together the best of both data lakes and data warehouses. By combining the scalability of data lakes with the data management features of warehouses, Data Lakehouses offer a highly scalable, agile, and cost-effective data infrastructure. They provide robust support for both analytical and operational workloads, empowering organizations to extract more value from their data. In our previous articles, we've explored the concept of Data Lakehouses in depth. Data Lakehouses: The Future of Scalable, Agile, and Cost-Effective Data Infrastructure laid the groundwork by highlighting the key business benefits of lakehouses. A New Era of Data Analytics: Exploring the Innovative World of Data Lakehouse Architectures took a closer look at the architectural aspects of lakehouses, while Delta, Hudi, and Iceberg: The Data Lakehouse Trifecta focused on the three main lakehouse solutions: Delta Lake, Hudi, and Iceberg. As we delve into the world of Data Lakehouses, one technology that stands out for its potential is Apache Flink. Renowned for its superior stream processing capabilities, Flink can handle both batch and real-time data, making it a compelling choice for implementing Data Lakehouses. Furthermore, it boasts high processing speeds and fault tolerance, features that align well with the demands of modern, data-intensive applications. In this article, we aim to explore the intersection of Apache Flink and Data Lakehouses. We will delve into the capabilities of Flink, compare it with other technologies like Apache Spark, and illustrate how it can be leveraged in the context of a Data Lakehouse. By providing practical examples, we hope to illustrate the potential of Flink in this exciting realm and offer insights to those considering its adoption. Let's embark on this journey to understand how Flink could be a game-changer in the Data Lakehouse landscape. A Closer Look at Apache Flink Apache Flink, an open-source project under the Apache Software Foundation, is a potent stream-processing framework. With its ability to proficiently manage both real-time and batch data processing, Flink has made a significant impact in the Big Data landscape. Its unique capabilities, such as fault tolerance and event time processing, enable it to deliver fast and accurate results, marking it as a standout contender in the data processing realm. Although we won't dive deep into the intricacies of Flink's architecture, it's important to highlight its key features and how they differentiate it from other big data processing systems. Flink operates under a unique principle known as "Stream-Batch Unification," which treats batch processing as a subset of stream processing. This allows Flink to manage bounded (batch) and unbounded (stream) data with equal proficiency. The architectural design of Flink includes several vital components. The JobManager, equivalent to the master node in other distributed systems, orchestrates the distributed processing. TaskManagers, the worker nodes, are responsible for carrying out tasks, while the Source function allows data intake, and the Sink function facilitates results output. This structure allows Flink to effectively handle massive data quantities, scaling out as needed. When compared to other big data processing frameworks, Flink's unique strengths become apparent. Its high-speed and low-latency processing capabilities, even in large-scale operations, are noteworthy. 
Flink also provides strong consistency and fault tolerance through its asynchronous checkpointing mechanism. Moreover, its support for event time processing and windowing functions makes it particularly suitable for intricate event processing and time-series analysis. In the forthcoming section, we will delve into the role of Flink within Data Lakehouses and benchmark it against Apache Spark, another leading big data processing framework. The Role of Apache Flink in Data Lakehouses As organizations increasingly adopt the data lakehouse paradigm, the need for an efficient, flexible, and robust processing engine becomes paramount. Apache Flink, with its unique architecture and capabilities, is well-positioned to fill this role. The data lakehouse model seeks to bring together the best attributes of data lakes and data warehouses. It needs to handle vast volumes of structured and unstructured data, provide real-time insights, and offer robust data governance. Flink's architecture and features align remarkably well with these requirements. Flink's "Stream-Batch Unification" principle allows it to efficiently process both real-time (unbounded) and historical (bounded) data. This is particularly important in a data lakehouse setup, where real-time data ingestion and analysis can coexist with batch processing jobs. The high throughput and low-latency processing capabilities of Flink also enable the delivery of timely insights, a crucial aspect for data-driven decision-making. Furthermore, Flink's fault tolerance mechanism provides data consistency and reliability, critical for ensuring data integrity in a lakehouse environment. Its event time processing capability, in conjunction with windowing functions, enables sophisticated analytical operations, including complex event processing and time-series analysis. This is essential for extracting valuable insights from the data stored in a lakehouse. In essence, Flink's ability to handle high volumes of data, process real-time and batch data efficiently, and provide reliable and consistent data processing, aligns perfectly with the requirements of a data lakehouse. In the next section, we will explore how Flink stands against Apache Spark, another prominent data processing framework, in the context of data lakehouses. Flink vs. Spark: A Comparative Analysis in the Data Lakehouse Context In the big data processing landscape, Apache Spark has long been a front-runner, known for its versatility and efficiency. However, when it comes to implementing data lakehouses, Apache Flink presents a compelling case with its unique attributes. One of the key distinctions between Flink and Spark lies in their approach to data processing. Spark operates primarily as a batch processing system, with streaming capabilities built on top of its batch engine. In contrast, Flink is designed as a true streaming engine, with batch processing treated as a special case of streaming. This makes Flink more adept at handling real-time data, a critical aspect in many data lakehouse use cases. Flink's event-time processing is another feature that gives it an edge over Spark. While Spark also supports event-time processing, Flink's handling of late events and watermarks is more sophisticated, which is crucial for ensuring accurate real-time analytics. In terms of fault tolerance, both frameworks offer robust mechanisms. However, Flink's lightweight asynchronous checkpointing mechanism causes less performance impact compared to Spark's more resource-intensive approach. 
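To ground the checkpointing point, the following is a minimal sketch of how an application switches on and tunes Flink's periodic checkpoints; the interval, timeout, and pause values are arbitrary examples rather than recommendations:

Java
// Enable Flink's asynchronous, periodic checkpoints for fault tolerance.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Take a checkpoint every 10 seconds with exactly-once guarantees.
env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

// Allow each checkpoint up to two minutes, and keep at least five seconds
// between checkpoints so they stay lightweight next to normal processing.
env.getCheckpointConfig().setCheckpointTimeout(120_000);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5_000);

Because checkpoints are taken asynchronously alongside processing, configuration of this kind is usually a matter of tuning intervals rather than reworking the job itself.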
Despite these differences, it's important to remember that the choice between Flink and Spark isn't always a zero-sum game. Each has its strengths and is better suited to certain scenarios. A comprehensive understanding of their capabilities can help organizations make the best choice for their specific data lakehouse needs. In the following section, we'll present some practical examples of implementing data lakehouses with Flink. Practical Implementation of Data Lakehouses With Apache Flink Understanding Apache Flink's capabilities within a data lakehouse setup is greatly enhanced with practical examples. In this section, we'll discuss typical implementations and provide code snippets to give a clearer picture of how Flink can be utilized within a data lakehouse environment. Consider a data lakehouse architecture where Flink serves as the real-time data processing layer. It can consume data from diverse sources, such as Kafka or IoT devices, process it in real time, and store it in the data lakehouse for further use. The processed data can be directly channeled into real-time dashboards or used to trigger alerts. Here's a simplified Flink code snippet demonstrating data ingestion from Kafka, processing, and writing the results to a Sink: Java // Create a StreamExecutionEnvironment StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // Create a Kafka source FlinkKafkaConsumer<String> kafkaSource = new FlinkKafkaConsumer<>( "topic-name", new SimpleStringSchema(), kafkaProperties ); // Add the source to the environment DataStream<String> stream = env.addSource(kafkaSource); // Process the data DataStream<String> processedStream = stream.map(new ProcessingFunction()); // Write the processed data to a Sink (e.g., HDFS) StreamingFileSink<String> sink = StreamingFileSink .forRowFormat(new Path("hdfs://output-path"), new SimpleStringEncoder<String>("UTF-8")) .build(); processedStream.addSink(sink); // Execute the Flink job env.execute("Flink Data Lakehouse Job"); In the code above, we've ingested data from a Kafka topic, processed it using a hypothetical ProcessingFunction(), and finally written the processed data to a Hadoop Distributed File System (HDFS) sink. This example demonstrates how Flink can serve as an efficient data processing layer within a data lakehouse. Consider a use case in a retail business where Flink processes real-time customer activity data and feeds the insights into the lakehouse. These insights can then be utilized to customize customer experiences, adjust inventory, or enhance marketing tactics. Similarly, a financial institution could leverage Flink to process real-time transaction data. By executing complex event processing with Flink, the institution can detect fraudulent activities as they occur and take immediate action. The processed data, once stored in the lakehouse, can be used for generating detailed reports and further analysis. Furthermore, Flink's compatibility with popular storage systems like HDFS, S3 and databases like Cassandra or HBase ensures easy integration with existing data infrastructure. Its ability to interoperate with other big data tools, such as Hadoop and Spark, allows organizations to maximize their existing technology investments. In the concluding section, we'll encapsulate Flink's potential in data lakehouse implementations and discuss why it's a technology worth considering for your data strategy. 
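Before moving to the case studies, here is a small, hypothetical sketch of the fraud-style pattern described above: keyed, event-time windows that flag accounts with an unusually high number of transactions in a short period. The Transaction class, its getters, the TransactionSchema deserializer, and the threshold are assumptions made for illustration, not part of any reference implementation:

Java
import java.time.Duration;
import java.util.Properties;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class TransactionSpikeJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProperties = new Properties();
        kafkaProperties.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProperties.setProperty("group.id", "fraud-detection");

        // Transaction and TransactionSchema are hypothetical domain types for this sketch.
        DataStream<Transaction> transactions = env
            .addSource(new FlinkKafkaConsumer<>("transactions", new TransactionSchema(), kafkaProperties))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy
                    .<Transaction>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                    .withTimestampAssigner((tx, ts) -> tx.getEventTimeMillis()));

        // Count transactions per account in 5-minute event-time windows and keep
        // only accounts that exceed an arbitrary example threshold.
        transactions
            .map(tx -> Tuple2.of(tx.getAccountId(), 1L))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .sum(1)
            .filter(t -> t.f1 > 20)
            .print(); // in a real job, write alerts to the lakehouse or an alerts topic

        env.execute("Transaction Spike Detection");
    }
}

The same keyed-window structure could just as easily feed the retail personalization scenario; only the key, the window size, and the aggregation would change.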
Case Studies: Successful Implementations of Apache Flink in Data Lakehouses To further illustrate the practical usage of Apache Flink in data lakehouse architectures, let's delve into a few real-world case studies where organizations have successfully leveraged Flink's capabilities. Alibaba Group: Alibaba, the Chinese multinational conglomerate, uses Flink extensively for various real-time computing scenarios in its data lakehouse. They use Flink for real-time search indexing, online machine learning, and personalized recommendations. By adopting Flink, Alibaba has been able to process billions of events per day in real-time, significantly improving their business agility and customer experience. Uber: Uber uses Flink for processing real-time and historical data to power applications like dynamic pricing and supply positioning. Flink's ability to unify batch and stream processing and its robust fault tolerance mechanisms are some of the key reasons why Uber chose Flink. This has enabled Uber to deliver more accurate, real-time responses to market changes. Netflix: Netflix uses Flink as part of its data lakehouse to process billions of events daily for real-time analytics, anomaly detection, and machine learning. Flink's ability to handle massive scale and its sophisticated windowing functions have proven invaluable to Netflix's data strategy. These examples showcase Flink's potential in data lakehouse setups and how different businesses have capitalized on its unique strengths. As organizations continue to search for robust, scalable, and versatile data processing tools for their data lakehouses, Apache Flink emerges as a strong contender. In the final section, we'll summarize the potential of Flink in data lakehouse implementations and discuss why it's a technology worth considering. Conclusion: Why Apache Flink Is a Compelling Choice for Data Lakehouses In this era of data-driven decision-making, the importance of a robust, efficient, and versatile data infrastructure cannot be overstated. The data lakehouse concept, which combines the strengths of data lakes and data warehouses, has emerged as an attractive solution for managing complex and diverse data workloads. Apache Flink, with its unique architecture and capabilities, stands out as a promising technology for implementing data lakehouses. Its ability to handle both real-time and batch processing, robust fault tolerance, and event time processing capabilities aligns remarkably well with the requirements of a data lakehouse. Moreover, compared to other popular data processing frameworks like Apache Spark, Flink's true streaming nature and sophisticated handling of event time and watermarks offer a significant advantage, particularly for use cases requiring real-time insights and accurate event processing. The practical examples and case studies we discussed highlight the flexibility of Flink in accommodating diverse data workloads and its potential to deliver substantial business value. Whether real-time customer activity analysis in retail, fraud detection in financial transactions, or powering real-time machine learning applications, Flink is proving its worth in varied scenarios. In conclusion, as organizations continue to evolve their data strategies and seek to extract more value from their data, Apache Flink presents a compelling case for consideration. 
Its alignment with the data lakehouse model, coupled with its unique strengths in handling complex data processing tasks, makes it an exciting technology for the future of data infrastructure.

By Andrey Gusarov
IDE Changing as Fast as Cloud Native

Much has been said about how much containerization has changed how we deliver and operate software. But at the same time, tools such as IDEs have changed just as much. We've gone from heavyweight, thick-client tools running on our desktops, with every feature we might possibly need incorporated into the IDE, to a very lightweight core. Everything is a plug-in, and increasingly the browser is the delivery platform. We have also seen a trend of IDEs becoming free, if not open source, and you may be surprised to read that Oracle has been at the forefront of this in the Java space. You only have to look at the contributions to NetBeans as an Apache project. JDeveloper has been free, and Oracle has made plugins freely available for Eclipse for a very long time. This tradition continues with the code for the latest VS Code plugins being freely available (see the Oracle Samples GitHub repository). While these products have provided plugin frameworks, because they had a lot of the features for a language like Java built in, the contributor community hasn't been so broad. After all, the IDE out of the box contains a lot of tooling, and if you're not working with the IDE's language, then you've got a significant footprint overhead that you won't really utilize. Despite this, the more prominent software vendors built plugins for the likes of Eclipse and NetBeans; for example, Oracle built extensions to support work with PL/SQL and SOA Suite, and SonarSource has adapted its code quality tool for almost every IDE possible (more on this can be found here). Providing only the basics and plugging in everything else isn't new; we can look at Emacs as an example of such an approach. But the modern approach has really taken off with the likes of Sublime, Electron, and Eclipse Theia, which uses many of the same foundations as VS Code, such as the core Monaco editor (for more about the differences, check out this article). One of the most important details here is that Theia and VS Code can work with the same plugins (details here). This lighter nature, and the need for plugins to make the IDE more than just a fancy text editor, combined with the absence of the initial language bias that came with earlier generations of IDEs, has meant the pool of potential users and plugin contributors is enormous. The one outlier to this trend has been JetBrains with IntelliJ IDEA and its other language variants. But here, too, the IDE is packaged with the tooling for one language, with variants of the base platform for different languages. Power of Plugins Supporting plugins, and making the core tool freely available, has been central to the success of VS Code beyond the basic editor usability. As a result, a community of both independent developers and commercial vendors has developed a wealth of plugins. This is both a positive and a negative: as with all community-driven things, there are capability overlaps and varying quality, as some solutions may be no more than beta quality and others lack sufficient support to sustain and maintain them. While download and star ratings can be indicative of quality and sustainability, this does make it harder for new arrivals in the marketplace to get established. Fortunately, if you're working with a particular vendor, you can find their plugins grouped together in the Marketplace using /publisher/<name>, for example, Oracle (and Oracle Labs). Some of the open-source communities, such as Apache and the MicroProfile Community, are identified as publishers.
However, the Linux Foundation and subsidiary groups like the CNCF don't appear to have their tools published collectively. Plugins From Cloud Vendors One newer trend with IDEs is to have plugins that make it increasingly easy not only to develop but also to deploy into clouds such as Azure, GCP, and OCI. This means we can keep working within the IDE without having to drop out to command-line tools or web interfaces, removing the context switching that can be annoying, particularly when working with limited screen real estate or in environments that emphasize single-app contexts. This can sometimes also include process simplification by combining several steps, such as generating the right packaging at the same time as pushing code. IDE Divergence While Microsoft has made VS Code freely available, it is still subject to Microsoft licensing; data gathered on its use goes to Microsoft, and it carries Microsoft-specific configuration and branding. VS Codium is, fortunately, the same as VS Code, but without those metrics, Microsoft license details, and so on. A community has taken it upon itself to ensure the same source is packaged, deployable, and available through Brew, Chocolatey, Snap, and other package managers. You can, of course, build the binaries yourself. Not being bound to Microsoft's implementation is great for those who are wary of what data may be collected, or simply wary of the solution coming from a competitor, but it creates a new issue: the risk of divergence in the way things are done. Most crucially, this applies to the plugins, as they are the lifeblood of these types of IDEs. The Eclipse Foundation has recognized this and is key to supporting Theia as an IDE intended to be a foundation on which people can build their own IDEs (examples of this include Red Hat OpenShift Dev Spaces). This is achieved through its Open VSX Registry, which provides a repository of plugins that work across multiple IDEs, including VS Code, VS Codium, and any extension of Theia. Oracle, like all major developer tool providers, has gravitated to supporting VS Code but is also working to ensure that its plugins are compliant and visible on the Open VSX Registry. Releases will typically be published to the VS Code Marketplace first, as this reaches the biggest single community, but verification and registration on Open VSX should follow quickly afterward. OCI Cloud Shell Editor It is also reasonable to expect that these plugins will appear in the OCI Cloud Shell Editor at some point in the future, making development processes even easier in cloud environments. We can expect this trend to continue. Conclusion When it comes to IDEs, VS Code and related solutions (such as Theia, VS Codium, etc.) are here and established, and with GitHub Codespaces and similar cloud vendor IDE solutions (such as Oracle's Cloud Console Editor), they're going to be around for a good while. I think we'll see the trend for cloud service plugins like those illustrated continue. What we've not yet observed is whether services common to multiple cloud vendors, such as Kubernetes, OpenSearch, etc., will coalesce around a common code base, with differences coming down to naming conventions and authentication frameworks becoming pluggable. As to whether VS Code will become all dominant? Well, as we've discussed, VS Code has a strong open-source background, and there is standardization that aligns with this.
It is going to be influential: the convenience and name recognition, combined with its pluggability, will see it dominate. The adoption of common foundations into cloud hosting solutions is also going to help keep it well positioned. But we do have some options, in the form of VS Codium, if you're concerned about undue influence.

By Phil Wilkins
Introduction to Kanban Methodology

Kanban is a methodology that originated in Japan in the 1940s as a way to improve manufacturing efficiency. Today, it has evolved into a widely used approach to managing work in various industries, from software development to healthcare. Kanban is a lean method for managing and improving work across human systems. This method uses a visual system to manage work as it progresses through various stages of development. It is a simple but powerful tool that helps teams manage workflow and reduce waste. This article will provide an overview of the Kanban methodology, its benefits, and how to implement it. What Is Kanban? The Kanban methodology originated in the manufacturing industry in Japan in the 1940s. It was developed by Taiichi Ohno, an industrial engineer at Toyota, to improve manufacturing efficiency. The word "kanban" means "visual signal" in Japanese, and the method is based on using visual signals to communicate information about work. Kanban is a Japanese word that translates to "visual signal" or "card." The Kanban method is now widely used in software development, project management, and other industries. It is a framework for managing and improving work processes by visualizing the work, limiting work in progress, and continuously improving the process. The Kanban Methodology is a visual process management tool that is used to manage workflows. It is based on the principles of transparency, visual management, and continuous improvement. The methodology is designed to help teams visualize their work, limit work in progress, and optimize their workflow. The Kanban board is a key element of the Kanban methodology. It is a visual representation of the workflow that the team uses to manage their work. The board is divided into columns that represent the stages of the workflow, from "To Do" to "Done." Each column contains cards that represent individual tasks or work items. The cards include information such as the task description, the person responsible for the task, and the due date. Principles of Kanban The Kanban methodology is based on four principles: Visualize the workflow: Visualizing the workflow helps team members understand the work process, identify bottlenecks, and collaborate more effectively. Kanban boards are used to visualize the flow of work, from the initial request to the final delivery. Limit work in progress: Limiting work in progress (WIP) is a key element of Kanban. It ensures that team members are not overloaded with tasks, and that work flows smoothly through the system. By limiting the amount of work in progress, teams can focus on completing tasks before moving on to the next one. Manage flow: Managing flow means optimizing the flow of work through the system. Kanban teams use metrics such as lead time, cycle time, and throughput to measure and improve the flow of work. Make process policies explicit: The fourth principle of Kanban is to make process policies explicit. This involves documenting the process policies and making them available to all team members. This helps to ensure that everyone understands the process and follows it consistently. Visualize the Work The first principle of Kanban is to visualize the work. This means creating a visual representation of the work that needs to be done. The Kanban board is the most common tool used to visualize work. It provides a clear and concise view of the work that needs to be done, the work that is in progress, and the work that has been completed. 
The Kanban board is typically divided into columns that represent different stages of the workflow. For example, the columns might be "to do," "in progress," and "done." Each card on the board represents a task or work item. The card contains information about the task, such as its description, priority, and due date. Limit Work in Progress The second principle of Kanban is to limit work in progress. This means that the team should limit the number of tasks they are working on at any given time. This helps to prevent overloading the team and ensures that work is completed in a timely manner. The team can limit work in progress by setting a WIP (work in progress) limit for each column on the Kanban board. This limit represents the maximum number of tasks that can be in that column at any given time. For example, if the WIP limit for the "in progress" column is set to three, the team can only work on three tasks at a time. Managing Flow The third principle of Kanban is to manage flow. This means that the team should focus on ensuring that work flows smoothly through the workflow. The team should aim to identify and eliminate bottlenecks that slow down the workflow. The team can manage flow by monitoring the flow of work through the workflow and identifying areas where work is getting stuck. They can then take steps to address these bottlenecks and ensure that work flows smoothly. Make Process Policies Explicit The fourth principle of Kanban is to make process policies explicit. This means that the team should clearly define the policies and procedures that govern the workflow. This helps to ensure that everyone on the team understands the process and can follow it consistently. The team can make process policies explicit by documenting the process and sharing it with the team. This might include creating a process manual, developing training materials, or holding team meetings to discuss the process. Implementing Kanban Implementing Kanban is a straightforward process. Here are the steps: Visualize the Workflow The first step in implementing Kanban is to visualize the workflow. This can be done by creating a Kanban board that represents the different stages of the process. Each column on the board represents a different stage, and each work item is represented by a card or a sticky note. Define Work Item Types The next step is to define the different types of work items that will be represented on the board. This could include tasks, bugs, features, and other work items that are relevant to the process. Set WIP Limits The next step is to set work in progress (WIP) limits for each stage of the process. This will ensure that teams only work on a limited number of items at a time, which will help to prevent bottlenecks and improve efficiency. Implement Pull System The next step is to implement a pull system, which means that work items are pulled into the process as resources become available. This helps to ensure that work items are not pushed into the process too quickly, which can lead to bottlenecks. Manage Flow The next step is to manage flow. This means that teams should focus on ensuring that work items move smoothly and efficiently through the process. Teams should regularly review their processes and identify areas where bottlenecks occur or where work items are getting stuck. Once these bottlenecks have been identified, teams should work to eliminate them and improve the flow of work items. 
Continuous Improvement The final step in implementing Kanban is to focus on continuous improvement. Teams should regularly review their processes and look for ways to improve them. This could include making small changes to the workflow, adjusting WIP limits, or changing the way work items are prioritized. By continually improving their processes, teams can become more efficient and productive over time. Benefits of Kanban Implementing Kanban can provide a number of benefits, including: Improved Efficiency Kanban helps teams become more efficient by limiting work in progress and focusing on improving the flow of work items through the process. This leads to faster turnaround times and improved productivity. Increased Visibility Kanban provides real-time visibility into the status of work items. This helps teams identify bottlenecks and prioritize their work accordingly. Better Communication Kanban encourages better communication between team members by providing a clear, visual representation of the workflow. This helps to ensure that everyone is on the same page and working towards the same goals. Flexibility Kanban is a flexible methodology that can be adapted to meet the specific needs of different teams and industries. This makes it an ideal choice for organizations that are looking to improve their workflows and processes. Types of Kanban As mentioned earlier, there are several types of Kanban that can be used in software development and project management. Let's take a closer look at some of the most common types of Kanban: Task Kanban This is the most common type of Kanban used in software development. It is used to manage individual tasks and their progress through various stages of the workflow. Each task is represented by a card on the Kanban board, and the columns on the board represent the stages of the workflow. As the task moves through each stage, the card is moved to the corresponding column. Task Kanban is useful for tracking progress and identifying any bottlenecks in the workflow. Team Kanban This type of Kanban is used to manage the workflow of an entire team. It is used to visualize the work that needs to be done and ensure that team members are working together effectively. Team Kanban boards typically have columns for each team member, with cards representing the work that needs to be done. This allows team members to see what others are working on and identify any dependencies or potential conflicts. Portfolio Kanban Portfolio Kanban is used to manage a portfolio of projects or initiatives. It provides a high-level view of the status of each project and helps to prioritize work based on business objectives. The Kanban board typically has columns for each project, with cards representing the tasks or work items associated with each project. Portfolio Kanban is useful for managing complex projects with multiple stakeholders. Value Stream Kanban Value Stream Kanban is used to manage the entire value stream of a product or service. It involves mapping the entire process from concept to delivery and identifying any inefficiencies or areas for improvement. The Kanban board typically has columns representing each stage of the value stream, with cards representing the work items associated with each stage. Value Stream Kanban is useful for identifying areas for process improvement and optimizing the entire value stream. Continuous Delivery Kanban Continuous Delivery Kanban is used to manage the delivery of software in a continuous manner. 
It involves breaking down the delivery process into smaller, manageable pieces and using Kanban to track the progress of each piece. The Kanban board typically has columns representing each stage of the delivery process, with cards representing the features or changes associated with each stage. Continuous Delivery Kanban is useful for speeding up the delivery process and ensuring that software is delivered in a timely and efficient manner. Conclusion The Kanban methodology is a simple yet effective system for managing workflow and improving efficiency. By visualizing work, limiting work in progress, managing flow, and focusing on continuous improvement, teams can become more productive and efficient over time. Kanban is a flexible methodology that can be adapted to meet the specific needs of different teams and industries. Whether you are working in manufacturing, software development, or any other industry, Kanban can help you improve your processes and achieve better results. Kanban is a powerful methodology that can be used in a variety of settings to improve workflow efficiency, reduce waste, and increase productivity. By visualizing the workflow, limiting work in progress, managing flow, and making process policies explicit, Kanban helps teams to work more effectively and efficiently. With several different types of Kanban to choose from, teams can select the approach that best suits their needs and use it to manage their work more effectively. Whether you are working on software development, project management, or any other type of initiative, Kanban can help you achieve your goals and deliver value to your customers.

By Aditya Bhuyan
A Comprehensive Guide To Testing and Debugging AWS Lambda Functions

AWS Lambda functions are a powerful tool for running serverless applications in the cloud. But as with any code, bugs can occur that result in poor performance or even system crashes. Testing and debugging Lambda functions can help you identify potential issues before they become a problem. In this guide, we’ll cover everything from unit testing to automated tests to ensure that your Lambda functions are running smoothly. Testing and Debugging AWS Lambda Functions When it comes to AWS Lambda functions, testing and debugging are crucial steps in ensuring that the code runs smoothly. One of the essential concepts to understand is what a test event is in AWS Lambda. A test event is a simulated event that triggers the Lambda function. It allows you to test the function's response to different inputs. To create a test event, you need to define a JSON payload that resembles the actual event that will trigger the function. The payload should contain all the required parameters and data needed to execute the function successfully. You can then use this payload to test the function using the AWS Lambda console or any other testing framework. It's important to note that testing with different types of events is necessary to ensure that the Lambda function can handle various scenarios and inputs. By doing so, you can catch potential errors or bugs before pushing the code into production. In the next sections of this guide, we'll explore different methods for testing and debugging AWS Lambda functions to ensure they run as expected. Configure Test Event in AWS Lambda To configure a test event in AWS Lambda, you can follow a few simple steps. First, navigate to the AWS Management Console and select your Lambda function. In the Function code section, you should see a Test button. Clicking on this button will open up the Configure test event dialog box. From there, you can give your test event a name and then define the JSON payload that will be used to trigger the function. You can also use sample templates provided by AWS or upload your own JSON file. Once you've configured the test event, you can click the Create button to save it. Now, whenever you want to test your function, you can select the test event from the drop-down menu, and the function will be triggered using the specified payload. By configuring test events for different scenarios and inputs, you can ensure that your Lambda function is well-tested and performs as expected. Testing and Debugging AWS Lambda Locally via AWS SAM Another way to test and debug AWS Lambda functions is by using AWS SAM (Serverless Application Model) to run them locally. AWS SAM is an open-source framework that you can use to build serverless applications locally and deploy them to the cloud. With AWS SAM, you can write code in your favorite IDE (Integrated Development Environment), test it locally, and then deploy it to AWS Lambda. To get started with AWS SAM, you need to install it on your local machine and configure your development environment. Once you have done that, you can create a SAM template file that defines your Lambda function's configuration and dependencies. The SAM template file also specifies the events that will trigger your function. You can then use the sam local start-lambda command to test your function locally. This command simulates an event and passes it to your function for processing.
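Alongside local emulation with SAM, the handler code itself can be unit tested directly, feeding it the same JSON-shaped payload you would define in a console test event. The handler below is a hypothetical example created for this sketch; only the aws-lambda-java-core RequestHandler interface and JUnit 5 are assumed to be available:

Java
// GreetingHandler.java - a hypothetical handler used only for this example.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

public class GreetingHandler implements RequestHandler<Map<String, String>, String> {
    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        // The "name" key mirrors a field you might put in a console test event payload.
        String name = event.getOrDefault("name", "world");
        return "Hello, " + name;
    }
}

// GreetingHandlerTest.java - a JUnit 5 test that exercises the handler directly.
import static org.junit.jupiter.api.Assertions.assertEquals;
import java.util.Map;
import org.junit.jupiter.api.Test;

class GreetingHandlerTest {
    @Test
    void returnsGreetingForSuppliedName() {
        GreetingHandler handler = new GreetingHandler();
        // This handler never touches the Context, so passing null keeps the test simple.
        String result = handler.handleRequest(Map.of("name", "Lambda"), null);
        assertEquals("Hello, Lambda", result);
    }
}

Tests like this run in milliseconds and catch handler-level bugs before any console test event, SAM invocation, or deployment comes into play.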
You can also use sam local start-api to test your function as a REST API, which allows you to send HTTP requests to your function and receive responses. By testing and debugging AWS Lambda functions locally with AWS SAM, you can catch errors and bugs early on in the development process and streamline your deployment workflow. Setup Integration Tests With AWS Lambda Another important aspect of testing AWS Lambda functions is setting up integration tests. Integration tests are essential as they ensure that all the different components of your application work together seamlessly. AWS provides several tools and services that help you set up integration tests for your Lambda functions. One such tool is AWS Step Functions, which allows you to define a workflow that integrates multiple Lambda functions and other AWS services. You can use Step Functions to test the entire workflow end-to-end, including error handling and retries. Another tool is AWS CodePipeline, which provides a fully managed continuous delivery service that automates your release process for fast and reliable updates. In addition to these tools, you can also use third-party testing frameworks such as JUnit or TestNG to write integration tests for your Lambda functions. These frameworks allow you to create test suites that cover various scenarios and edge cases. When setting up integration tests, it is important to consider factors such as data consistency, error handling, and scalability. You should also ensure that your tests are repeatable and can be run in different environments. By setting up robust integration tests, you can catch any issues before they reach production and ensure that your Lambda function works as expected in real-world scenarios. How To Automate AWS Lambda Testing Automating AWS Lambda testing can save time and effort in the long run. One way to automate testing is by using AWS Lambda Layers. Lambda Layers is a distribution mechanism for libraries, custom runtimes, and other function dependencies. By creating a Lambda Layer for your testing framework and including it in your Lambda function, you can automate the testing process whenever there is an update to the function code. Another approach to automating AWS Lambda testing is by using AWS CodeBuild, a fully-managed continuous integration service that compiles source code, runs tests, and produces software packages. You can use AWS CodeBuild to automatically build and test your Lambda function whenever there is a new commit to the code repository. This ensures that any changes made to the codebase are tested thoroughly before being deployed to production. Finally, you can also use AWS CloudFormation to automate the deployment and testing of your Lambda function. AWS CloudFormation allows you to define infrastructure as code, including the Lambda function, its dependencies, and any associated resources. By defining a CloudFormation stack that includes your Lambda function and its tests, you can automate the entire deployment and testing process. In conclusion, automating AWS Lambda testing is crucial for ensuring the reliability and performance of your serverless applications. By using tools such as Lambda Layers, AWS CodeBuild, and AWS CloudFormation, you can streamline the testing process and catch any issues before they impact your users. Conclusion Testing and debugging are essential for ensuring that your AWS Lambda functions perform optimally. 
With the help of test events, you can create and manage different scenarios to check for bugs and errors before they become a problem. By using this proactive approach, you can rest assured that your Lambda functions will continue to run smoothly. Try out the steps outlined in this guide today to start testing and debugging your Lambda functions.

By Satrajit Basu CORE
How to Supplement SharePoint Site Drive Security With Java Code Examples

There are more than 250,000 companies/organizations around the world leaning on SharePoint to securely manage their most valuable documents, and more than 3 million total users. This widespread popularity makes the platform a market-leading document management solution - and this, by extension, makes it a worthwhile target for motivated threat actors. Bypassing SharePoint’s built-in security is an extremely difficult task, of course. The O365 environment provides tenants with powerful protection at every entry point, from exhaustive physical data center security up to leading-edge application security policies. Top-notch file encryption with SSL and TLS connections is applied to keep user data safe in transit, and BitLocker disk-level encryption with unique encryption keys is used to secure files at rest. Further, as infected file uploads have grown to become an extremely common attack vector, O365 provides built-in virus and malware detection policies (along with anti-phishing policies and various additional email link and attachment security measures) which can be customized extensively per individual or organizational tenants' needs. The list goes on, with each tenant's specific subscription level ultimately determining the extent of their built-in protection. As powerful as SharePoint's customizable built-in security policies are, however, no storage platform's policies are ever intended to be applied as a single point of protection for sensitive data. Document storage security, like any branch of cybersecurity, is a moving target requiring myriad solutions working together to jointly create a formidable defense against evolving attack vectors. In other words, any tenant’s threat profile can always be improved upon with selective layering of external security policies on top of built-in security policies. In the remainder of this article, I’ll demonstrate a free-to-use Virus Scanning API solution that can be integrated with a SharePoint Site Drive instance to scan files for viruses, malware, and a variety of non-malware content threats, working alongside O365's built-in asynchronous scanning to root out a wide range of file upload threat types. Demonstration The Advanced Virus Scan API below is intended to serve as a powerful layer of document storage security in conjunction with SharePoint's built-in customizable policies, directly scanning new file uploads in targeted Site Drive instances for a growing list of 17 million+ virus and malware signatures (including ransomware, spyware, trojans, etc.), while also performing full content verification to identify invalid file types and other non-malware threats hidden behind misleading file names and illegitimate file extensions. This API also allows developers to set custom restrictions against unwanted file types in the API request body, so various unnecessary and potentially threatening file types can be detected and deleted outright regardless of the legitimacy of their contents. For example, a Site Drive storing contract documents likely only requires common file types like .DOCX or .PDF: limiting files to these types helps minimize risks without compromising workflow efficiency. Below, I’ve outlined the information you’ll need to integrate this API with your SharePoint Online Site Drive instance, and I’ve provided ready-to-run Java code examples to help you structure your API call with ease. 
To start off, you’ll need to gather the following SharePoint information to satisfy mandatory parameters in the API request body: Client ID (Client ID access credentials; can be obtained from Azure Active Directory portal) Client Secret (Client Secret access credentials; also obtained from Azure Active Directory portal SharePoint Domain Name (i.e., yourdomain.sharepoint.com) Site ID (the specific SharePoint ID for the site drive you want to retrieve and scan files from) Optionally, you can also gather the following SharePoint information: Tenant ID (pertaining to your Azure Active Directory) File Path (path of a specific file within your Site Drive) Item ID (e.g., DriveItem ID) Once you’ve gotten all your mandatory information, you can start client SDK installation by adding the following reference to the repository in your Maven POM File (JitPack is used to dynamically compile the library): XML <repositories> <repository> <id>jitpack.io</id> <url>https://jitpack.io</url> </repository> </repositories> Then you can wrap up by adding the following reference to the dependency: XML <dependencies> <dependency> <groupId>com.github.Cloudmersive</groupId> <artifactId>Cloudmersive.APIClient.Java</artifactId> <version>v4.25</version> </dependency> </dependencies> At this point, you can add the imports and copy Java code examples to structure your API call: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.ScanCloudStorageApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); ScanCloudStorageApi apiInstance = new ScanCloudStorageApi(); String clientID = "clientID_example"; // String | Client ID access credentials; see description above for instructions on how to get the Client ID from the Azure Active Directory portal. String clientSecret = "clientSecret_example"; // String | Client Secret access credentials; see description above for instructions on how to get the Client Secret from the Azure Active Directory portal String sharepointDomainName = "sharepointDomainName_example"; // String | SharePoint Online domain name, such as mydomain.sharepoint.com String siteID = "siteID_example"; // String | Site ID (GUID) of the SharePoint site you wish to retrieve the file from String tenantID = "tenantID_example"; // String | Optional; Tenant ID of your Azure Active Directory String filePath = "filePath_example"; // String | Path to the file within the drive, such as 'hello.pdf' or '/folder/subfolder/world.pdf'. If the file path contains Unicode characters, you must base64 encode the file path and prepend it with 'base64:', such as: 'base64:6ZWV6ZWV6ZWV6ZWV6ZWV6ZWV'. String itemID = "itemID_example"; // String | SharePoint itemID, such as a DriveItem Id Boolean allowExecutables = true; // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended). 
Boolean allowInvalidFiles = true; // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
Boolean allowScripts = true; // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
Boolean allowPasswordProtectedFiles = true; // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
Boolean allowMacros = true; // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
Boolean allowXmlExternalEntities = true; // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
String restrictFileTypes = "restrictFileTypes_example"; // String | Specify a restricted set of file formats to allow as clean as a comma-separated list of file formats; for example, .pdf,.docx,.png would allow only PDF, Word document, and PNG files. All files must pass content verification against this list of file formats; if they do not, the result will be returned as CleanResult=false. Set the restrictFileTypes parameter to null or empty string to disable; default is disabled.

try {
    CloudStorageAdvancedVirusScanResult result = apiInstance.scanCloudStorageScanSharePointOnlineFileAdvanced(clientID, clientSecret, sharepointDomainName, siteID, tenantID, filePath, itemID, allowExecutables, allowInvalidFiles, allowScripts, allowPasswordProtectedFiles, allowMacros, allowXmlExternalEntities, restrictFileTypes);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ScanCloudStorageApi#scanCloudStorageScanSharePointOnlineFileAdvanced");
    e.printStackTrace();
}

To satisfy the request authentication parameter, you'll need to provide a free-tier API key, which will allow you to scan up to 800 files per month.

Within this request body, you can set Booleans to apply custom non-malware threat policies against files containing executables, invalid files, scripts, password-protected files, macros, XML external entities, insecure deserialization, and HTML, and you can provide a comma-separated list of acceptable file types in the restrictFileTypes parameter to disallow unwanted file extensions. Any files violating these policies will automatically receive a CleanResult: False value in the API response body, which is the same value assigned to files containing viruses and malware. The idea is to enact 360-degree content protection in a single request so you can quickly delete (or quarantine/analyze) files that may pose a serious risk to your system.
Below, I’ve provided a full example API response for your reference:

JSON

{
  "Successful": true,
  "CleanResult": true,
  "ContainsExecutable": true,
  "ContainsInvalidFile": true,
  "ContainsScript": true,
  "ContainsPasswordProtectedFile": true,
  "ContainsRestrictedFileFormat": true,
  "ContainsMacros": true,
  "VerifiedFileFormat": "string",
  "FoundViruses": [
    {
      "FileName": "string",
      "VirusName": "string"
    }
  ],
  "ErrorDetailedDescription": "string",
  "FileSize": 0,
  "ContentInformation": {
    "ContainsJSON": true,
    "ContainsXML": true,
    "ContainsImage": true,
    "RelevantSubfileName": "string"
  }
}

It’s worth noting that regardless of how you choose to set your custom threat rules, files containing JSON, XML, or embedded images will be labeled as such in the API response as well.

By Brian O'Neill CORE
Idempotent Liquibase Changesets

Abstract

“Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” [Resource 3].

The purpose of this article is to outline a few ways of creating idempotent changes when the database modifications are managed with Liquibase. Throughout the lifetime of a software product that has such a tier, various database modifications are applied as it evolves. The more robust the modifications are, the more maintainable the solution is. In order to accomplish such a way of working, it is usually a good practice to design the executed changesets to have zero side effects, that is, to be able to be run as many times as needed with the same end result.

The simple proof of concept built here aims to showcase how Liquibase changesets may be written to be idempotent. Moreover, the article explains in more depth what exactly happens when the application starts.

Set Up

• Java 17
• Spring Boot v.3.1.0
• Liquibase 4.20.0
• PostgreSQL Driver 42.6.0
• Maven 3.6.3

Proof of Concept

As PostgreSQL is the database used here, first and foremost one shall create a new schema — liquidempo. This operation is easy to accomplish by issuing the following SQL command once connected to the database:

SQL

create schema liquidempo;

At the application level:

• The Maven Spring Boot project is created and configured to use the PostgreSQL Driver, Spring Data JPA, and Liquibase dependencies.
• A simple entity is created — Human — with only one attribute, a unique identifier which is also the primary key at the database level.

Java

@Entity
@Table(name = "human")
@SequenceGenerator(sequenceName = "human_seq", name = "CUSTOM_SEQ_GENERATOR", allocationSize = 1)
public class Human {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO, generator = "CUSTOM_SEQ_GENERATOR")
    @Column(name = "id")
    private Long id;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }
}

For convenience, when entities are stored, their unique identifiers are generated using a database sequence called human_seq.

The data source is configured as usual in the application.properties file.

Properties files

spring.datasource.type=com.zaxxer.hikari.HikariDataSource
spring.datasource.url=jdbc:postgresql://localhost:5432/postgres?currentSchema=liquidempo&useUnicode=true&characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true
spring.datasource.username=postgres
spring.datasource.password=123456
spring.jpa.database-platform=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.hibernate.ddl-auto=none

The previously created schema is referred to in the connection URL. DDL handling is disabled, as the infrastructure and the data are intended to be persistent when the application is restarted.

As Liquibase is the database migration manager, the changelog path is configured in the application.properties file as well.

Properties files

spring.liquibase.change-log=classpath:/db/changelog/db.changelog-root.xml

For now, the db.changelog-root.xml file is empty. The current state of the project requires a few simple changesets, in order to create the database elements required by the Human entity — the table, the sequence, and the primary key constraint.
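As a reference point, the empty root changelog mentioned above is nothing more than the bare databaseChangeLog element. Below is a minimal sketch, assuming the same dbchangelog-4.17 schema declarations used by the other changelogs in this article; since it contains no changesets yet, running the application against it performs no database modifications.

XML

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch of an empty db.changelog-root.xml; the schema location is assumed
     to match the dbchangelog-4.17 XSD used by the other changelogs in this article. -->
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                       https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">
</databaseChangeLog>

The three initial changesets are grouped in a dedicated changelog file, db/changelog/human_init.xml: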
XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                       https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <changeSet author="horatiucd" id="100">
        <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
    </changeSet>

    <changeSet author="horatiucd" id="200">
        <createTable tableName="human">
            <column name="id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
    </changeSet>

    <changeSet author="horatiucd" id="300">
        <addPrimaryKey columnNames="id" constraintName="human_pk" tableName="human"/>
    </changeSet>

</databaseChangeLog>

In order for these to be applied, they need to be recorded as part of the db.changelog-root.xml file, as indicated below.

XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                       https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <include file="db/changelog/human_init.xml"/>

</databaseChangeLog>

When the application is restarted, the three changesets are executed in the order they are declared.

Plain Text

INFO 9092 --- [main] liquibase.database : Set default schema name to liquidempo
INFO 9092 --- [main] liquibase.lockservice : Successfully acquired change log lock
INFO 9092 --- [main] liquibase.changelog : Creating database history table with name: liquidempo.databasechangelog
INFO 9092 --- [main] liquibase.changelog : Reading from liquidempo.databasechangelog
Running Changeset: db/changelog/human_init.xml::100::horatiucd
INFO 9092 --- [main] liquibase.changelog : Sequence human_seq created
INFO 9092 --- [main] liquibase.changelog : ChangeSet db/changelog/human_init.xml::100::horatiucd ran successfully in 6ms
Running Changeset: db/changelog/human_init.xml::200::horatiucd
INFO 9092 --- [main] liquibase.changelog : Table human created
INFO 9092 --- [main] liquibase.changelog : ChangeSet db/changelog/human_init.xml::200::horatiucd ran successfully in 4ms
Running Changeset: db/changelog/human_init.xml::300::horatiucd
INFO 9092 --- [main] liquibase.changelog : Primary key added to human (id)
INFO 9092 --- [main] liquibase.changelog : ChangeSet db/changelog/human_init.xml::300::horatiucd ran successfully in 8ms
INFO 9092 --- [main] liquibase : Update command completed successfully.
INFO 9092 --- [main] liquibase.lockservice : Successfully released change log lock

Moreover, they are recorded as separate rows in the databasechangelog database table.
Plain Text

+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|id |author   |filename                   |dateexecuted              |orderexecuted|exectype|md5sum                            |description                                           |
+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|100|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.184239|1            |EXECUTED|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|200|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.193031|2            |EXECUTED|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|300|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.204184|3            |EXECUTED|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+

So far, everything is straightforward, nothing out of the ordinary — a simple Spring Boot application whose database changes are managed with Liquibase.

When examining the above human_init.xml file, one can easily identify the three scripts that result from the three changesets. None of them is idempotent: if they are executed again (although there is no reason for doing it here), errors will occur because the human_seq sequence, the human table, and the human_pk primary key already exist.

Idempotent Changesets

If the SQL code that results from the XML changesets had been written directly and aimed to be idempotent, it would have read as follows:

SQL

CREATE SEQUENCE IF NOT EXISTS human_seq
    INCREMENT 1
    MINVALUE 1
    MAXVALUE 99999999999;

CREATE TABLE IF NOT EXISTS human (
    id SERIAL CONSTRAINT human_pk PRIMARY KEY
);

If the two commands are executed several times, no errors occur and the outcome remains the same. After the first run, the sequence, the table, and the constraint are created, then every new execution leaves them in the same usable state.

The aim is to accomplish the same in the written Liquibase changesets (changelog). According to the Liquibase documentation [Resource 1]:

“Preconditions are tags you add to your changelog or individual changesets to control the execution of an update based on the state of the database. Preconditions let you specify security and standardization requirements for your changesets. If a precondition on a changeset fails, Liquibase does not deploy that changeset.”

These constructs may be configured in various ways, either at changelog or changeset level. For simplicity, the three changesets of this proof of concept will be made idempotent. Basically, whenever a changeset fails to execute because the entity (sequence, table, or primary key) already exists, it would be convenient to continue and not halt the execution of the entire changelog, which would otherwise leave the application unable to start. In this direction, Liquibase preconditions provide at least two options:

• Either skip over the changeset and continue with the changelog, or
• Skip over the changeset, but mark it as executed and continue with the changelog.

Either of the two can be configured by adding a preConditions tag in the changeset of interest and setting the onFail attribute to CONTINUE (the former case) or MARK_RAN (the latter case).
In pseudo-code, this looks as below:

XML

<changeSet author="horatiucd" id="100">
    <preConditions onFail="CONTINUE or MARK_RAN">
        ...
    </preConditions>
    ...
</changeSet>

This seems in line with the initial desire — execute the changeset only if the preconditions are met. Next, each of the two situations is analyzed.

onFail="CONTINUE"

The changelog file — human_init_idempo_continue.xml — becomes as below:

XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                       https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <changeSet author="horatiucd" id="101">
        <preConditions onFail="CONTINUE">
            <not>
                <sequenceExists sequenceName="human_seq"/>
            </not>
        </preConditions>
        <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
    </changeSet>

    <changeSet author="horatiucd" id="201">
        <preConditions onFail="CONTINUE">
            <not>
                <tableExists tableName="human"/>
            </not>
        </preConditions>
        <createTable tableName="human">
            <column name="id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
    </changeSet>

    <changeSet author="horatiucd" id="301">
        <preConditions onFail="CONTINUE">
            <not>
                <primaryKeyExists primaryKeyName="human_pk" tableName="human"/>
            </not>
        </preConditions>
        <addPrimaryKey columnNames="id" constraintName="human_pk" tableName="human"/>
    </changeSet>

</databaseChangeLog>

For each item, the precondition checks whether it does not already exist. When running the application, the log shows what is executed:

Plain Text

INFO 49016 --- [main] liquibase.database : Set default schema name to liquidempo
INFO 49016 --- [main] liquibase.changelog : Reading from liquidempo.databasechangelog
INFO 49016 --- [main] liquibase.lockservice : Successfully acquired change log lock
Running Changeset: db/changelog/human_init_idempo_continue.xml::101::horatiucd
INFO 49016 --- [main] liquibase.changelog : Continuing past: db/changelog/human_init_idempo_continue.xml::101::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::201::horatiucd
INFO 49016 --- [main] liquibase.changelog : Continuing past: db/changelog/human_init_idempo_continue.xml::201::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::301::horatiucd
INFO 49016 --- [main] liquibase.changelog : Continuing past: db/changelog/human_init_idempo_continue.xml::301::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
INFO 49016 --- [main] liquibase : Update command completed successfully.
INFO 49016 --- [main] liquibase.lockservice : Successfully released change log lock

As expected, all three preconditions failed and the execution of the changelog continued. The databasechangelog database table does not have any records in addition to the previous three, which means the changesets will be attempted again at the next startup of the application.

onFail="MARK_RAN"

The changelog file — human_init_idempo_mark_ran.xml — is similar to human_init_idempo_continue.xml. The only difference is the onFail attribute, which is set to onFail="MARK_RAN".
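For completeness, here is a sketch of what one of the changesets in human_init_idempo_mark_ran.xml might look like. It simply mirrors the CONTINUE variant shown above with the onFail value switched, per the description of the file as being otherwise identical; the changeset id (101) is taken from the databasechangelog rows shown further below.

XML

<!-- Sketch of the sequence changeset from human_init_idempo_mark_ran.xml; it mirrors
     the CONTINUE variant above, with onFail switched to MARK_RAN so that a failed
     precondition records the changeset as ran instead of retrying it on every startup. -->
<changeSet author="horatiucd" id="101">
    <preConditions onFail="MARK_RAN">
        <not>
            <sequenceExists sequenceName="human_seq"/>
        </not>
    </preConditions>
    <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
</changeSet>

The table and primary key changesets follow the same pattern.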
The db.changelog-root.xml root changelog now looks as below:

XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                       https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <include file="db/changelog/human_init.xml"/>
    <include file="db/changelog/human_init_idempo_continue.xml"/>
    <include file="db/changelog/human_init_idempo_mark_ran.xml"/>

</databaseChangeLog>

For this proof of concept, all three files were kept on purpose, in order to be able to observe the behavior in detail.

If the application is restarted, no errors are encountered and the log depicts the following:

Plain Text

INFO 38788 --- [main] liquibase.database : Set default schema name to liquidempo
INFO 38788 --- [main] liquibase.changelog : Reading from liquidempo.databasechangelog
INFO 38788 --- [main] liquibase.lockservice : Successfully acquired change log lock
INFO 38788 --- [main] liquibase.changelog : Reading from liquidempo.databasechangelog
Running Changeset: db/changelog/human_init_idempo_continue.xml::101::horatiucd
INFO 38788 --- [main] liquibase.changelog : Continuing past: db/changelog/human_init_idempo_continue.xml::101::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::201::horatiucd
INFO 38788 --- [main] liquibase.changelog : Continuing past: db/changelog/human_init_idempo_continue.xml::201::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::301::horatiucd
INFO 38788 --- [main] liquibase.changelog : Continuing past: db/changelog/human_init_idempo_continue.xml::301::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::101::horatiucd
INFO 38788 --- [main] liquibase.changelog : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::101::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::201::horatiucd
INFO 38788 --- [main] liquibase.changelog : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::201::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::301::horatiucd
INFO 38788 --- [main] liquibase.changelog : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::301::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': db/changelog/db.changelog-root.xml : Not precondition failed
INFO 38788 --- [main] liquibase : Update command completed successfully.
INFO 38788 --- [main] liquibase.lockservice : Successfully released change log lock

The changesets with onFail="CONTINUE" were attempted again, as this is a new run, while the ones with onFail="MARK_RAN" were marked accordingly in the databasechangelog table and will be passed over at the next start-up.
Plain Text

+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|id |author   |filename                                   |dateexecuted              |orderexecuted|exectype|md5sum                            |description                                           |
+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|100|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.184239|1            |EXECUTED|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|200|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.193031|2            |EXECUTED|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|300|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.204184|3            |EXECUTED|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
|101|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.453305|4            |MARK_RAN|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|201|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.463021|5            |MARK_RAN|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|301|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.475153|6            |MARK_RAN|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+

At the next run of the application, the log will be similar to the one where onFail was set to "CONTINUE".

One more observation is worth making at this point: in case of a changeset whose preconditions do not fail, it is executed normally and recorded with exectype = EXECUTED in the databasechangelog table.

Conclusions

This article presented two ways of writing idempotent Liquibase changesets, a practice that allows having more robust and easy-to-maintain applications. This was accomplished by leveraging the changeset preConditions tag inside the changelog files. While both onFail attribute values — CONTINUE and MARK_RAN — may be used depending on the actual operation performed, the latter seems more appropriate for this proof of concept, as it does not attempt to re-run the changesets at every start-up of the application.

Resources

1. Liquibase Documentation
2. Source code for the sample application
3. Idempotence

By Horatiu Dan
