Enterprise Security
This year has seen a rise in the sophistication and nuance of approaches to security that far surpasses the years prior, with software supply chains at the top of that list. Each year, DZone investigates the state of application security, and our global developer community is seeing both more automation and solutions for data protection and threat detection and a more common security-forward mindset that seeks to understand the "why." In our 2023 Enterprise Security Trend Report, we dive deeper into the greatest advantages of and threats to application security today, including the role of software supply chains, infrastructure security, threat detection, automation and AI, and DevSecOps. Featured in this report are insights from our original research and related articles written by members of the DZone Community — read on to learn more!
In the dynamic world of web development, Single Page Applications (SPAs) and frameworks like React, Angular, and Vue.js have emerged as the preferred approach for delivering seamless user experiences. With the evolution of the Kotlin language and its recent multiplatform capabilities, new options exist that are worthwhile to evaluate. In this article, we will explore Kotlin/JS for creating a web application that communicates with a Spring Boot backend which is also written in Kotlin. In order to keep it as simple as possible, we will not bring in any other framework. Advantages of Kotlin/JS for SPA Development As described in the official documentation, Kotlin/JS provides the ability to transpile Kotlin code, the Kotlin standard library, and any compatible dependencies to JavaScript (ES5). With Kotlin/JS we can manipulate the DOM and create dynamic HTML by taking advantage of Kotlin's conciseness and expressiveness, coupled with its compatibility with JavaScript. And of course, we do have the much needed type-safety, which reduces the likelihood of runtime errors. This enables developers to write client-side code with reduced boilerplate and fewer errors. Additionally, Kotlin/JS seamlessly integrates with popular JavaScript libraries (and frameworks), thus leveraging the extensive ecosystem of existing tools and resources. And, last but not least: this makes it easier for a backend developer to be involved with the frontend part as it looks more familiar. Moderate knowledge of "vanilla" JavaScript, the DOM, and HTML is of course needed; but especially when we are dealing with non-intensive apps (admin panels, back-office sites, etc.), one can get engaged rather smoothly. Sample Project The complete source code for this showcase is available on GitHub. The backend utilizes Spring Security for protecting a simple RESTful API with basic CRUD operations. We won't expand more on this since we want to keep the spotlight on the frontend part which demonstrates the following: Log in with username/password Cookie-based session Page layout with multiple tabs and top navigation bar (based on Bootstrap) Client-side routing (based on Navigo) Table with pagination, sorting, and filtering populated with data fetched from the backend (based on DataTables) Basic form with input fields including (dependent) drop-down lists (based on Bootstrap) Modals and loading masks (based on Bootstrap and spin.js) Usage of sessionStorage and localStorage Usage of Ktor HttpClient for making HTTP calls to the backend An architectural overview is provided in the diagram below: Starting Point The easiest way to start exploring is by creating a new Kotlin Multiplatform project from IntelliJ. The project's template must be "Full-Stack Web Application": This will create the following project structure: springMain: This is the module containing the server-side implementation. springTest: For the Spring Boot tests commonMain: This module contains "shared" code between the frontend and the backend; e.g., DTOs commonTest: For the unit tests of the "common" module jsMain: This is the frontend module responsible for our SPA. jsTest: For the Kotlin/JS tests The sample project on GitHub is based on this particular skeleton. Once you clone the project you may start the backend by executing: $ ./gradlew bootRun This will spin up the SpringBoot app, listening on port: 8090. 
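Since commonMain holds the code shared between springMain and jsMain, it is the natural home for the DTOs exchanged over the REST API. As a rough sketch of what such shared classes might look like (field names are inferred from the product table shown later in the article; the sample project's actual classes may differ, and the kotlinx.serialization annotations are an assumption about how JSON is handled):

Kotlin
import kotlinx.serialization.Serializable

// Shared between the Spring Boot backend and the Kotlin/JS frontend.
@Serializable
data class Category(val id: Long, val name: String)

@Serializable
data class Product(
    val id: Long,
    val name: String,
    val category: Category,
    val price: Double
)

Because both modules depend on commonMain, the backend serializes and the frontend deserializes the exact same types, so there is no hand-written mapping layer to drift out of sync.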
In order to start the frontend, execute: $ ./gradlew jsBrowserDevelopmentRun -t This will open up a browser window automatically navigating to http://localhost:8080 and presenting the user login page. For convenience, a couple of users are provisioned on the server (have a look at dev.kmandalas.demo.config.SecurityConfig for details). Once logged in, the user views a group of tabs with the main tab presenting a table (data grid) with items fetched from the server. The user can interact with the table (paging, sorting, filtering, data export) and add a new item (product) by pressing the "Add product" button. In this case, a form is presented within a modal with typical input fields and dependent drop-down lists with data fetched from the server. In fact, there is some caching applied on this part in order to reduce network calls. Finally, from the top navigation bar, the user can toggle the theme (this setting is preserved in the browser's local storage) and perform logout. In the next section, we will explore some low-level details for selected parts of the frontend module. The jsMain Module Let's start by having a look at the structure of the module: The naming of the Kotlin files should give an idea about the responsibility of each class. The "entrypoint" is of course the Main.kt class: Kotlin import home.Layout import kotlinx.browser.window import kotlinx.coroutines.MainScope import kotlinx.coroutines.launch fun main() { MainScope().launch { window.onload = { Layout.init() val router = Router() router.start() } } } Once the "index.html" file is loaded, we initialize the Layout and our client-side Router. Now, the "index.html" imports the JavaScript source files of the things we use (Bootstrap, Navigo, Datatables, etc.) and their corresponding CSS files. And of course, it imports the "transpiled" JavaScript file of our Kotlin/JS application. Apart from this, the HTML body part consists of some static parts like the "Top Navbar," and most importantly, our root HTML div tag. Under this tag, we will perform the DOM manipulations needed for our simple SPA. By importing the kotlinx.browser package in our Kotlin classes and singletons, we have access to top-level objects such as the document and window. The standard library provides typesafe wrappers for the functionality exposed by these objects (wherever possible) as described in the Browser and DOM API. So this is what we do at most parts of the module by writing Kotlin and not JavaScript or using jQuery, and at the same time having type-safety without using, e.g., TypeScript. 
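For instance, the theme toggle mentioned earlier needs little more than the localStorage handle exposed by kotlinx.browser. A minimal sketch follows (illustrative names only, not the sample project's exact code; the data-bs-theme attribute assumes Bootstrap 5.3-style theming):

Kotlin
import kotlinx.browser.document
import kotlinx.browser.localStorage

// Persist the chosen theme so it survives page reloads.
fun toggleTheme() {
    val next = if (localStorage.getItem("theme") == "dark") "light" else "dark"
    localStorage.setItem("theme", next)
    // Assumption: the page root switches themes via a Bootstrap data attribute.
    document.documentElement?.setAttribute("data-bs-theme", next)
}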
So for example we can create content like this: Kotlin private fun buildTable(products: List<Product>): HTMLTableElement { val table = document.createElement("table") as HTMLTableElement table.className = "table table-striped table-hover" // Header val thead = table.createTHead() val headerRow = thead.insertRow() headerRow.appendChild(document.createElement("th").apply { textContent = "ID" }) headerRow.appendChild(document.createElement("th").apply { textContent = "Name" }) headerRow.appendChild(document.createElement("th").apply { textContent = "Category" }) headerRow.appendChild(document.createElement("th").apply { textContent = "Price" }) // Body val tbody = table.createTBody() for (product in products) { val row = tbody.insertRow() row.appendChild(document.createElement("td").apply { textContent = product.id.toString() }) row.appendChild(document.createElement("td").apply { textContent = product.name }) row.appendChild(document.createElement("td").apply { textContent = product.category.name }) row.appendChild(document.createElement("td").apply { textContent = product.price.toString() }) } document.getElementById("root")?.appendChild(table) return table } Alternatively, we can use the Typesafe HTML DSL of the kotlinx.html library which looks pretty cool. Or we can load HTML content as "templates" and further process them. Seems that many possibilities exist for this task. Moving on, we can attach event-listeners thus dynamic behavior to our UI elements like this: Kotlin categoryDropdown?.addEventListener("change", { val selectedCategory = categoryDropdown.value // Fetch sub-categories based on the selected category mainScope.launch { populateSubCategories(selectedCategory) } }) Before talking about some "exceptions to the rule", it's worth mentioning that we use the Ktor HTTP client (see ProductApi) for making the REST calls to the backend. We could use the ported Fetch API for this task but going with the client looks way better. Of course, we need to add the ktor-client as a dependency to the build.gradle.kts file: Kotlin val jsMain by getting { dependsOn(commonMain) dependencies { implementation("io.ktor:ktor-client-core:$ktorVersion") implementation("io.ktor:ktor-client-js:$ktorVersion") implementation("io.ktor:ktor-client-content-negotiation:$ktorVersion") //... } } The client includes the JSESSIONID browser cookie received from the server upon successful authentication of the HTTP requests. If this is omitted, we will get back HTTP 401/403 errors from the server. These are also handled and displayed within Bootstrap modals. Also, a very convenient thing regarding the client-server communication is the sharing of common data classes (Product.kt and Category.kt, in our case) between the jsMain and springMain modules. Exception 1: Use Dependencies From npm For client-side routing, we selected the Navigo JavaScript library. This library is not part of Kotlin/JS, but we can import it in Gradle using the npm function: Kotlin val jsMain by getting { dependsOn(commonMain) dependencies { //... implementation(npm("navigo", "8.11.1")) } } However, because JavaScript modules are dynamically typed and Kotlin is statically typed, in order to manipulate Navigo from Kotlin we have to provide an "adapter." 
This is what we do within the Router.kt class: Kotlin @JsModule("navigo") @JsNonModule external class Navigo(root: String, resolveOptions: ResolveOptions = definedExternally) { fun on(route: String, handler: () -> Unit) fun resolve() fun navigate(s: String) } With this in place, the Navigo JavaScript module can be used just like a regular Kotlin class. Exception 2: Use JavaScript Code From Kotlin It is possible to invoke JavaScript functions from Kotlin code using the js() function. Here are two examples from the sample project: Kotlin // From ProductTable.kt: private fun initializeDataTable() { js("new DataTable('#$PRODUCTS_TABLE_ID', $DATATABLE_OPTIONS)") } // From ModalUtil.kt: val modalElement = document.getElementById(modal.id) as? HTMLDivElement modalElement?.let { js("new bootstrap.Modal(it).show()") } However, this should be used with caution, since it takes us outside Kotlin's type system. Takeaways In general, the best framework to choose depends on several factors, with one of the most important being "the one that the developer team is more familiar with." On the other hand, according to the Thoughtworks Technology Radar, the SPA-by-default approach is being questioned, meaning we should not blindly accept the complexity of SPAs and their frameworks when the business needs don't justify it. In this article, we provided an introduction to Kotlin Multiplatform with Kotlin/JS, which brings new things to the table. Taking into consideration the latest additions in the ecosystem - namely Kotlin Wasm and Compose Multiplatform - it becomes evident that these advancements offer not only a fresh perspective but also robust solutions for streamlined development.
Every organization dealing with information processing eventually faces the challenge of securely storing confidential data and preventing its leakage. The importance of this issue for a company depends on the potential damage a data breach could cause. The greater the risk of loss from a data leak, the more rigorous the protective measures should be. These measures can range from establishing internal policies and installing Data Loss Prevention (DLP) systems to adopting a Zero Trust approach or creating Air Gaps, which involves physically isolating critical network segments from external access. Isolating secure networks to prevent data exchange with other segments is crucial, particularly for industrial infrastructures and various process control systems like DCS, PLC, SCADA, state-owned companies handling regulated data, and commercial entities involved in innovative projects. However, the concept of an Air Gap is not entirely foolproof. This is mainly because even a fully isolated infrastructure must occasionally interact with the external world. For example, controller firmware needs regular updates, confidential commercial or government data requires refreshing, and outcomes of product designs often have to be presented to the public. One way to address this issue is by transferring data using physical media such as USB drives, memory cards, or hard drives. However, this method comes with its own significant challenges, the most notable being the lack of assurance that the information will not be moved in reverse. A flash drive used to deliver data to an isolated network segment could unintentionally become a vessel for confidential information to leave the company. The inefficiency and security risks associated with using flash drives raise serious concerns. Meanwhile, for about a decade, there has been a much more elegant and technologically advanced solution for one-way information transfer, the Data Diode. This solution is specifically designed to send packets of raw data in a single direction only. What sets Data Diodes apart from other unidirectional data transfer methods is their physical incapability to transmit data both ways. While Data Diodes have some limitations, they offer significant advantages over other options for setting up such connections in several key aspects. Data Diodes – Functionality and Implementation Options Hardware Diode Data diodes can serve as standalone network devices or part of a hardware and software system, offering specialized functionality for one-way data transfer. A hardware diode typically works by having either a transmitting or receiving component removed from a bidirectional communication system. Most data diodes are built with only one of two necessary fiber optic cables, and either the receiver or transmitter is omitted. There are also RS-232 based devices, but they are infrequent, and a notable drawback of this standard is the control lines that allow data to potentially return to the source network. The basic design of a data diode includes interfaces for connecting both to the receiving and transmitting networks, along with a power connector. To enhance functionality, manufacturers sometimes add features to their data diodes, like indicators showing packet transmission or settings to customize the device using a list of approved IP addresses. 
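Because the return path is physically absent, only connectionless, fire-and-forget transmission can cross such a link. The fragment below is a purely conceptual Kotlin/JVM sketch (not tied to any diode product, with a hypothetical receiver address) of what the sending side of a UDP-style stream looks like: a datagram is emitted toward the high side with no handshake and no expectation of an acknowledgement.

Kotlin
import java.net.DatagramPacket
import java.net.DatagramSocket
import java.net.InetAddress

fun main() {
    // Hypothetical address and port of the receiver on the isolated (high) side.
    val highSide = InetAddress.getByName("10.0.0.5")
    val payload = "sensor-reading:42".toByteArray()
    DatagramSocket().use { socket ->
        // UDP is connectionless: the packet is sent without a handshake,
        // and the sender never waits for a reply that could not arrive anyway.
        socket.send(DatagramPacket(payload, payload.size, highSide, 9000))
    }
}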
Unidirectional Gateway A hardware data diode is designed for the one-way transmission of streaming, unprocessed data, such as video camera signals that use specialized protocols like RTP or UDP. However, this becomes challenging with most common file transfer protocols like TCP, FTP, and HTTP, which require two-way communication to verify packet delivery and exchange other information. To enable file transfers using these standard protocols, a combination of hardware and software is necessary. This involves integrating the data diode with a set of proxy servers that convert and adapt the data packets, mimicking the functions of TCP, SMB, or similar protocols. This setup, known as a unidirectional gateway, uses proxy servers on both sides of the data diode, offering more capabilities than a hardware-only solution. Such a gateway not only facilitates data transfer but also adds layers of security, allowing for the monitoring and filtering of data and incorporating antivirus systems and other security tools. Software Data Diodes A significant limitation of hardware data diodes and unidirectional gateways is their relatively low information transfer speed. Often, manufacturers list device speeds ranging up to 100 Mbit/s. In some scenarios, this limited speed can become a bottleneck in a secure network infrastructure. Software data diodes present a solution to this issue. These network devices rely on the logic of their firmware, rather than hardware constraints, to manage information transfer. This allows for a significant increase in the throughput of a unidirectional channel. Generally, these systems are built around a secure operating system's microkernel, which facilitates the logical separation of networks without a return channel. They can achieve throughput rates up to 10 Gbps, support standard transport protocols, and offer advanced features like HTTP status code support. However, software data diodes come with their own drawbacks. There is a theoretical risk of information leakage through the return channel. Application Scenarios for Data Diodes and Unidirectional Gateways Data diodes are commonly used to transfer data from less secure (low) networks to more secure (high) ones. In secure networks, where sensitive data is stored, data diodes help prevent any data leakage. These unidirectional devices are typically employed for tasks such as receiving security updates, replicating databases, and broadcasting external video or audio feeds. However, it is important to understand that data diodes are not designed to protect a high-security network from modern cyberattacks. Their primary function is to stop data from leaking out. This means incoming data packets containing malicious payload could still reach the intended high-security system. Therefore, similar to a two-way connection, it is necessary to thoroughly inspect and "clean" the traffic passing through the data diode. Data can also move in reverse, from a secure network to a less secure one. This process typically involves extracting a limited set of data from a closed system without the ability to control that system. A typical example is using data diodes to transfer parameters from DCS, PLC, and SCADA devices, such as logic controllers, sensors, and other monitoring tools. Additionally, there is a hybrid approach to using data diodes. In this setup, two independent one-way channels are established: one channel sends information to the secure system, while the other sends information out. 
This method enables comprehensive data exchange, like sending emails, updates, and various logs, and greatly lowers the risk of response-based cyberattacks. Essentially, an attacker would have to breach both channels, overcoming each one's security measures. Data diodes are also valuable for bolstering Industrial Control System (ICS) protection by strictly controlling traffic at sensitive points. For example, a data server in a Demilitarized Zone (DMZ) might be one such point. Even though firewall settings usually let these intermediary devices pass traffic to the industrial network, installing a data diode before the data server and the ICS segment ensures that while critical devices can send status information to the server, no return traffic enters the secure network. Placing another diode between the data server and the corporate network can help preserve the integrity. In both instances, the more critical side of the diode connects to the less crucial components, safeguarding the ICS network from threats originating either from the corporate network or storage while maintaining the essential integrity and accessibility of the data. Conclusion To wrap up, let's review the main benefits and drawbacks of data diodes and unidirectional gateways. The standout feature of most data diodes is their design, which physically prevents two-way information transmission. This characteristic sets them apart from firewalls, as they are, in theory, impervious to being bypassed or hacked. As such, hardware data diodes are extremely reliable for maintaining the confidentiality of sensitive information. However, these advantages come with certain limitations: Data diodes, in their hardware form, do not inherently support traditional transport protocols. This necessitates using proxy servers to adapt/convert the data for transfer. Due to this design, activities like routing and parsing traffic directly through a diode are most often also impossible. The restrictions they impose, coupled with their relatively basic functionality, can make data diodes a costly option. Some of these hardware systems also have limited bandwidth capacity. In essence, data diodes have established themselves as effective tools for providing tangible, robust security for sensitive data. They excel in preventing data leaks and ensuring that only verified traffic is transmitted to secure network segments. These systems are particularly valuable in scenarios where reliability is paramount, such as handling state or commercial secrets and managing production networks, and in the military-industrial sector.
The official Ubuntu Docker image is the most downloaded image from Docker Hub. With over one billion downloads, Ubuntu has proven itself to be a popular and reliable base image on which to build your own custom Docker images. In this post, I show you how to make the most of the base Ubuntu images while building your own Docker images. An Example Dockerfile This is an example Dockerfile that includes the tweaks discussed in this post. I go through each of the settings to explain what value they add: Dockerfile FROM ubuntu:22.04 RUN echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf.d/00-docker RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf.d/00-docker RUN DEBIAN_FRONTEND=noninteractive \ apt-get update \ && apt-get install -y python3 \ && rm -rf /var/lib/apt/lists/* RUN useradd -ms /bin/bash apprunner USER apprunner Build the image with the command: Shell docker build . -t myubuntu Now that you've seen how to build a custom image from the Ubuntu base image, let's go through each of the settings to understand why they were added. Selecting a Base Image Docker images are provided for all versions of Ubuntu, including Long Term Support (LTS) releases such as 20.04 and 22.04, and normal releases like 19.04, 19.10, 21.04, and 21.10. LTS releases are supported for 5 years, and the associated Docker images are also maintained by Canonical during this period, as described on the Ubuntu release cycle page: These images are also kept up to date, with the publication of rolled up security updated images on a regular cadence, and you should automate your use of the latest images to ensure consistent security coverage for your users. When creating Docker images hosting production software, it makes sense to base your images on the latest LTS release. This allows DevOps teams to rebuild their custom images on top of the latest LTS base image, which automatically includes all updates but is also unlikely to include the kind of breaking changes that can be introduced between major operating system versions. I used the Ubuntu 22.04 LTS Docker image as the base for this image: Shell FROM ubuntu:22.04 Not Installing Suggested or Recommended Dependencies Some packages have a list of suggested or recommended dependencies that aren't required but are installed by default. These additional dependencies can add to the size of the final Docker image unnecessarily, as Ubuntu notes in their blog post about reducing Docker image sizes. To disable the installation of these optional dependencies for all invocations of apt-get, the configuration file at /etc/apt/apt.conf.d/00-docker is created with the following settings: Shell RUN echo 'APT::Install-Suggests "0";' >> /etc/apt/apt.conf.d/00-docker RUN echo 'APT::Install-Recommends "0";' >> /etc/apt/apt.conf.d/00-docker Installing Additional Packages Most custom images based on Ubuntu require you to install additional packages. For example, to run custom applications written in Python, PHP, Java, Node.js, or DotNET, your custom image must have the packages associated with those languages installed. On a typical workstation or server, packages are installed with a simple command like: Shell apt-get install python3 The process of installing new software in a Docker image is non-interactive, which means you don't have an opportunity to respond to prompts. 
This means you must add the -y argument to automatically answer "yes" to the prompt asking to continue with the package installation: Shell RUN apt-get install -y python3 Preventing Prompt Errors During Package Installation The installation of some packages attempts to open additional prompts to further customize installation options. In a non-interactive environment, such as during the construction of a Docker image, attempts to open these dialogs result in errors like: Shell unable to initialize frontend: Dialog These errors can be ignored as they don't prevent the packages from being installed. But the errors can be prevented by setting the DEBIAN_FRONTEND environment variable to noninteractive: Shell RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3 The Docker website provides official guidance on the use of the DEBIAN_FRONTEND environment variable. They consider it a cosmetic change and recommend against permanently setting the environment variable. The command above sets the environment variable for the duration of the single apt-get command, meaning any subsequent calls to apt-get will not have the DEBIAN_FRONTEND defined. Cleaning Up Package Lists Before any packages can be installed, you need to update the package list by calling: Shell RUN apt-get update However, the package list is of little value after the required packages have been installed. It's best practice to remove any unnecessary files from a Docker image to ensure the resulting image is as small as it can be. To clean up the package list after the required packages have been installed, the files under /var/lib/apt/lists/ are deleted. Here you update the package list, install the required packages, and clean up the package list as part of a single command, broken up over multiple lines with a backslash at the end of each line: Shell RUN DEBIAN_FRONTEND=noninteractive \ apt-get update \ && apt-get install -y python3 \ && rm -rf /var/lib/apt/lists/* Run as a Non-Root User By default, the root user is run in a Docker container. The root user typically has far more privileges than are required when running a custom application, so creating a new user without root privileges provides better security. The useradd command provides a non-interactive way to create new users. This isn't to be confused with the adduser command, which is a higher-level wrapper over useradd. After all configuration files have been edited and packages have been installed, you create a new user called apprunner: Shell RUN useradd -ms /bin/bash apprunner This user is then set as the default user for any further operations: Shell USER apprunner Conclusion It's possible to use the base Ubuntu Docker images with little customization beyond installing any required additional packages. But with a few tweaks to limit optional packages from being installed, cleaning up package lists after the packages are installed, and creating new users with limited permissions to run custom applications, you can create smaller and more secure images for your custom applications. Learn how to use other popular container images: Using the NGINX Docker image Using the Alpine Docker image Resources Ubuntu Docker image Dockerfile reference Happy deployments!
This is an article from DZone's 2023 Enterprise Security Trend Report. For more: Read the Report. Effective application security relies on well-defined processes and a diverse array of specialized tools to provide protection against unauthorized access and attacks. Security testing is a critical part of an application security strategy and should be seamlessly integrated into the secure software development lifecycle (SDLC), acting as a proactive and continuous defense against vulnerabilities throughout the software development process. Development teams are now delivering increasingly complex software using fast release cycles and continuous development and deployment practices. The need to identify and address potential vulnerabilities at an early stage in the SDLC has given rise to a security approach referred to as shift-left security. Within this article, we'll explore the inner workings of the essential security testing tools driving the shift-left security movement. Along the way, we'll also demystify runtime application self-protection (RASP) and compare it with the other security testing technologies. Static Application Security Testing Static application security testing (SAST) is a well-known and mature technology that is used to statically analyze source code for known potential vulnerabilities and insecure coding practices without executing it. SAST tools leverage techniques and technologies that are already in use in compilers, such as lexical and semantic analysis, type checking, control and data flow analysis, and taint tracking. In the last decade, these tools have rapidly evolved these compiler techniques into a set of much more comprehensive and complex methods for analyzing the security of the code. Figure 1: The main phases of SAST In modern DevSecOps, static analysis security testing is typically performed as early as possible by integrating SAST tools into developers' development environments and build pipelines. They are valuable security tools for establishing good code hygiene and overall secure development practices. Challenges of SAST Solutions Although modern SAST tools demonstrate high effectiveness in identifying coding flaws, they have limitations in identifying business logic or design flaws. Such flaws can only be found through security code review and threat modeling. Also, due to their lack of runtime visibility, SAST tools are not always accurate in their results, often producing false positives. One major challenge of SAST solutions is that modern apps do not consist only of application source code. In modern development environments, the "everything-as-code" paradigm is embraced. That includes not just application source code but also infrastructure, smart contracts, continuous integration and continuous delivery (CI/CD) pipelines, business process workflows, and declarative scripts. This diverse range of code artifacts poses a considerable challenge for SAST tools, which are traditionally designed to analyze application source code. Another challenge arises when analyzing large codebases, such as large monoliths, where a full scan might take several hours. However, if configured to scan only incremental changes, the time required to complete the static analysis is significantly reduced to just a few minutes. SAST Tools and Solutions In the last decade, a plethora of both open-source and commercial SAST tools have emerged, each with different sets of features and capabilities.
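Before surveying specific tools, here is a small illustration of the taint-tracking idea described above. This is a purely illustrative Kotlin snippet, not taken from or specific to any particular SAST product: untrusted input flows from a "source" into a SQL "sink" via string concatenation, which is exactly the kind of data flow a SAST engine flags, followed by the parameterized alternative such a tool would typically recommend.

Kotlin
import java.sql.Connection
import java.sql.ResultSet

// userInput is the "source": untrusted data, e.g., an HTTP request parameter.
fun findUserUnsafe(conn: Connection, userInput: String): ResultSet =
    // "Sink": the tainted value reaches a SQL statement through concatenation,
    // so taint tracking reports a potential SQL injection here.
    conn.createStatement().executeQuery("SELECT * FROM users WHERE name = '$userInput'")

fun findUserSafe(conn: Connection, userInput: String): ResultSet {
    // The typical remediation: bind the value as a parameter instead.
    val stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?")
    stmt.setString(1, userInput)
    return stmt.executeQuery()
}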
Free-to-use and open-source tool options that have gained significant popularity include, but are not limited to: Spotbugs, Bandit, Brakeman, Checkov, CodeQL, Semgrep, Snyk, and SonarQube. On the commercial front, SAST solutions offer comprehensive capabilities, often incorporating advanced features such as machine learning, analytics, secret scanning, software composition analysis, remediation recommendations, and integrations with development environments. Dynamic Application Security Testing Dynamic application security testing (DAST) stands as a well-established technology used to evaluate the security of web applications and APIs through simulated attacks. DAST tools invoke an application's entry points in the same way that attackers would. DAST tools don't depend on the source code. Instead, they employ reconnaissance techniques to dynamically discover the app's endpoints, generate an attack surface map, initiate probing by sending carefully crafted requests, and subsequently analyze the outputs for potential vulnerabilities. Figure 2: The main phases of DAST DAST tools provide features and techniques commonly employed in penetration testing or security assessments of applications. These techniques include vulnerability scanning, request/response analysis, brute force, exploit generation, attack surface discovery, and simulated attacks. Over the years, DAST tools have advanced these features, providing a more comprehensive and fine-grained approach to identifying vulnerabilities in web applications. Challenges of DAST Solutions Due to their lack of source code visibility, DAST tools are not always accurate in their results, often producing false positives. They also cannot identify the exact location of a vulnerability in the code. When assessing large-scale applications, DAST tools often take several hours to complete their security testing. Configuring DAST tools to scan only specific application scopes, APIs, or incremental changes can help alleviate this challenge, reducing the time required to conduct dynamic analysis. In modern DevSecOps, DAST is typically executed later in the development lifecycle, often in staging or production environments. Integrating a DAST tool into a CI/CD pipeline is also possible; however, it should be rolled out in phases after ensuring that the configuration of the deployed DAST tool does not produce false positives and does not delay the pipeline considerably more. For these reasons, many development teams have chosen not to integrate DAST tools into their CI/CD pipelines. On the other hand, penetration testers typically include DAST tools as standard tools in their toolkits. DAST Tools and Solutions Notable free-to-use and open-source options include, but are not limited to: ZAP, PortSwigger Burp Suite Community Edition, Nikto, and Wapiti. These tools provide basic scanning, testing, and reporting capabilities. On the other hand, commercial DAST solutions provide more advanced and powerful features with numerous configuration options, fine-grained customization, extensive payload libraries and generators, automated workflows, and professional reporting against security standards, such as OWASP Top 10 or PCI DSS. Interactive Application Security Testing Interactive application security testing (IAST) is an innovative approach to application security testing that examines vulnerabilities during the actual execution of the application with requests that originate from real users or automated tests. 
This newer technology utilizes instrumentation to monitor running applications, providing real-time visibility into security issues within the code. IAST solutions not only actively monitor HTTP requests and responses exchanged with the application, but they also gather runtime information on the running instance. This level of visibility allows the solution to provide context-driven insights that effectively eliminate any blind spots and improve the detection accuracy. Consequently, IAST tools demonstrate a notably low false positive rate and offer advanced insights into vulnerabilities by correlating identified issues with precise source code locations — something not achievable through DAST tools. IAST solutions typically require integration into the CI/CD environment in order to automatically run all the test suites that will exercise the system's execution paths. This integration enables the immediate delivery of vulnerability information and remediation guidance early in the SDLC. To employ an IAST solution, an instrumentation agent must be deployed on the runtime platform of the application. The goal of the IAST agent is to embed specially created sensor modules in the application code through instrumentation. These sensor modules track the application's runtime state while tests are running. Challenges of IAST Solutions It is important to highlight that the effectiveness of IAST solutions depends on whether all the code paths are being actively exercised or executed with the proper attack payloads during the testing period. In a typical scenario, end-to-end negative tests and DAST scanners are used to send attack payloads against the app. Meanwhile, the IAST tool monitors parts of the app while validating if the payloads uncover flaws or successfully exploit vulnerabilities. This is the reason why IAST tools may not deliver 100% code coverage and their results' effectiveness heavily depends on the coverage and capabilities of the security test suites and the accompanying DAST scanner. This limitation can result in gaps in vulnerability detection, especially in scenarios where certain parts of the application's code remain unexplored or untriggered during the testing process. Another downside of IAST tools is that they are programming language dependent due to their dependence in instrumenting the runtime platform. IAST Tools and Solutions Notably, the OWASP AppSensor project deserves attention as the first open-source IAST tool. Implemented as a library, its primary goal is to provide prescriptive guidance on incorporating runtime application intrusion detection and automated response mechanisms. It serves as a reference implementation and is not intended for direct deployment in its current state. It is important to note that its feature development appears to have ceased since 2019. Numerous commercial instrumentation-based security solutions have emerged, each contributing distinct features across all layers of the modern web application stack. Commercial IAST solutions provide unique security detection capabilities compared to other security testing tools that lack the visibility that runtime instrumentation provides. Most commercial IAST solutions provide integrations with integrated development environments, facilitating security analysis during the application development phase. This integration proves valuable in aiding developers with precise and efficient remediation efforts to address identified security issues. 
Certain IAST solutions broaden their integration functionalities to include SIEM systems, issue-tracking systems, and compliance dashboards, and some even offer automated generation of web application firewalls (WAF) or RASP rules. Runtime Application Self-Protection RASP extends the principles of IAST, utilizing the insights gained from runtime monitoring, but it goes a step further by incorporating self-protecting capabilities directly within the application. Effectively, this means that unlike SAST, DAST, and IAST, which are testing technologies used to identify vulnerabilities during various stages of the SDLC, RASP focuses on realtime self-protection. It actively monitors and analyzes the execution of the application and responds to security threats as they occur, making it a proactive defense mechanism rather than a testing tool. Similar to IAST, RASP uses instrumentation agents that hook into the runtime platform to monitor and analyze the application's behavior. Unlike other defensive solutions, such as WAF, RASP goes beyond analyzing inputs and outputs by actively monitoring and assessing the internal execution and state of the application. IAST utilizes instrumentation to monitor runtime execution for the identification of vulnerabilities and attacks, while RASP, not limited to detection, actively modifies the executing code to prevent attacks and protect the application. Figure 3: Conceptual diagram of IAST/RASP architecture in a tech stack Image source: Introduction to IAST, DZone Refcard Having said that, not all RASP solutions implement the same type of self-protection mechanisms or analyze the internal execution of the app at the same code-level granularity. Different RASP solutions may employ diverse approaches and detection algorithms, achieving different levels of accuracy and efficacy in terms of real-time threat detection, false positive rates, and self-protection capabilities. RASP tools leverage the unique insights provided by the runtime platform to go beyond traditional pattern recognition techniques, identifying anomalous behavior and actual security attacks that may not be covered by known patterns or signatures. Importantly, RASP solutions have the ability to provide self-protecting capabilities throughout the entire runtime stack and not only the application's business logic layer. This includes protection of the runtime environment, standard libraries, third-party libraries, frameworks, servers, and middleware. Additionally, certain RASP solutions offer on-the-fly virtual patching of vulnerable applications. This means that they can dynamically apply security controls or mitigations to protect against known vulnerabilities without requiring the application to be rebuilt or restarted. This way, RASP solutions help bridge the gap between the discovery of a zero-day vulnerability and the implementation of a permanent fix, providing an immediate layer of protection. Challenges of RASP Solutions The issue of false positives is crucial in RASP solutions as there is a risk that the tool may block legitimate and revenue-generating traffic, causing disruptions to the normal operation of the application. Given the potential impact on business operations and the user experience, and the risk of losing trust in the tool, minimizing false positives is a significant concern when choosing or implementing RASP solutions. 
Striking the right balance between effective threat detection, performance impact, and avoiding false positives with minimal configuration is essential for a successful RASP deployment into production environments. By instrumenting the runtime platform, RASP solutions have unique visibility of the code, which provides the runtime context that is essential for evaluating security intelligence and minimizing false positives. To achieve this unique visibility, RASP solutions inject specialized sensors in the runtime platform and perform a real-time analysis of the runtime insights. The challenge for RASP vendors is determining the appropriate injection points for these sensors and selecting which detection algorithms to analyze those runtime insights. These design decisions need to be made carefully in order to provide protection with as few false positives as possible while keeping the performance impact as low as possible. RASP Tools and Solutions The technical landscape of RASP is complex and marked by strong competition, which may be the underlying cause for the absence of actively maintained, enterprise-ready, open-source RASP solutions. The OpenRASP project is a notable exception. It is a plugin-based runtime security agent that supports Java and PHP, claiming to provide protection against the OWASP Top 10. However, the project's development is rather slow, with its last release in January 2022. Commercial RASP solutions provide tailored protection across different types of applications. Some are designed for general application security or web applications only, while others specialize in mobile application security, providing runtime protection for both Android and iOS platforms. When selecting a RASP product, users should take into consideration the trade-off that some vendors make between language support and the depth of security features. Some vendors opt for a more focused approach, offering advanced features for a smaller set of languages, while others prioritize broader language coverage, often with more basic protection capabilities. A Comparison of Application Security Testing and RASP Not all tools are the same, so developers and DevSecOps engineers must understand the key characteristics and differences of SAST, DAST, IAST, and RASP in order to make informed decisions about selecting and utilizing the most suitable tools for their specific needs, ultimately contributing to the improvement of the overall security posture in the development lifecycle. 
COMPARING KEY CHARACTERISTICS OF SAST, DAST, IAST, AND RASP

| Criteria | SAST | DAST | IAST | RASP |
|---|---|---|---|---|
| Purpose | Testing | Testing | Testing | Protection |
| Type of testing | White box | Black box | Gray box | N/A |
| How it works | Analyzes source code before compilation | Sends malicious requests and analyzes responses for vulnerability patterns | Instruments the application and analyzes data from both the source code and traffic | Embeds into the app, intercepts requests and responses, and analyzes executing code, aiming to detect and protect apps at runtime |
| Environment | Dev or Test | Dev or Test | Dev or Test | Prod |
| Remediation guidance | Detailed vulnerability information in the source code | Generic vulnerability information and best practices | Rich vulnerability information in the source code and running app | Detailed exploit information on the location and cause in the source code and running app |
| False positives rate | Low to moderate | Moderate to high | Low | Low to none |
| Programming language support | Programming language dependent | Language agnostic | Programming language dependent | Runtime platform dependent |
| Advantages | Finds vulnerabilities during development | Simulates real-world attacks | Provides more accurate results | Blocks attacks in real time |
| Disadvantages | Can miss vulnerabilities that manifest only at runtime | Can miss coding flaws and vulnerabilities that are not exposed by responses | Requires instrumentation agent and is language specific | Installing the agent may pose difficulties and affect performance; is language specific |

Conclusion
This article emphasizes the need for a comprehensive understanding of SAST, DAST, IAST, and RASP to make informed decisions in the development lifecycle. Despite the distinct roles of SAST, DAST, and IAST as testing tools, and RASP as a defensive tool, they share a common focus on proactive security and the shift-left approach. Each of these testing tools contributes to strengthening security measures by identifying vulnerabilities early in the SDLC, aligning with the proactive and preventative principles of the shift-left philosophy in DevSecOps practices. To protect applications against evolving threats, a comprehensive security testing strategy is required. There is no such thing as absolute security. Thus, it is crucial to implement a testing strategy that combines different security testing methods in all stages of the SDLC and implements defense in depth. This layered strategy focuses on implementing diverse security controls in both testing and production environments. Striking a balance between effective and thorough vulnerability detection, minimal performance impact, and false positive avoidance is key for a successful and resilient security testing and defense strategy. As the software landscape evolves, continuous adaptation of security practices becomes paramount for safeguarding applications and maintaining user trust. On a final note, the future of security testing and protection tools will be significantly influenced by generative AI, particularly large language models. This trend involves the increasing integration of AI-based copilot tools to enhance security practices across all layers of the technology stack and throughout the entire SDLC. We should anticipate the advent of AI-powered SAST, DAST, IAST, and RASP tools that will provide critical advantages in key areas, including security code reviews, pair programming, test coverage, design and threat model reviews, as well as vulnerability and anomaly detection and attack protection.
Data is widely considered the lifeblood of an organization; however, it is worthless—and useless—in its raw form. Help is needed to turn this data into life-giving information. Part of this help is to move this data from its source to its destination via data pipelines. However, managing the end-to-end data pipeline lifecycle is not easy: some pipelines are challenging to scale on demand, and others can result in delays in detecting and addressing issues or be difficult to control. These and other problems can be addressed with the declarative programming paradigm, and in this article, we will see how, using Terraform. Where Does This Data Come From? Data comes from multiple disparate sources and is typically loaded into datasets in data stores like ClickHouse via data pipelines. These pipelines play a critical role in this process by efficiently moving, transforming, and processing the data. For instance, imagine you own a data analytics startup that provides organizations with real-time data analytics. Your client owns an eBike hire startup that wants to integrate weather data into its existing data to inform its marketing messaging strategies. Undoubtedly, your data engineers will set up many different data pipelines, pulling data from many sources. The two we are concerned with are designed to extract data from your client’s CRM and the local weather station. The Layers of the Modern Data Stack A data stack is a collection of different technologies, logically stacked on top of each other, that provide end-to-end data processing capabilities. As described here, a data stack can have many different layers. The modern data stack consists of ingestion (data pipelines), storage (OLAP database), and business intelligence (data analytics) layers. Lastly, there is an orchestration layer—for example, Kubernetes or Docker Compose—that sits on top of these layers, orchestrating everything together so that the stack does what it is supposed to do; ergo, ingest raw data and egest advanced predictive analytics. Data pipelines are typically structured either as Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) pipelines. However, developing, monitoring, and managing these data pipelines, especially in international enterprises with many branches worldwide, can be highly challenging, especially if the end-to-end data pipeline monitoring process is manual. Developing Data Pipelines, the Traditional Way Data pipelines are broadly categorized into two groups: code-based, deployed in Directed Acyclic Graph (DAG) tools such as Airflow and Apache Flink, and non-code-based, which are often developed in a SaaS-based application via a drag-and-drop UI. Both have their own problems. Code-based data pipelines are challenging to scale on demand. The development/test/release cycle is typically too long, especially in critical environments with thousands of pipelines, loading petabytes of data into data stores daily. Additionally, they are typically manually implemented, deployed, and monitored by human data engineers, leaving room for errors to creep in and for significant downtime. With non-code-based data pipelines, engineers don’t have access to the backend code; there can be limited visibility and monitoring, resulting in delays in detecting and addressing issues and failures. Moreover, it is challenging to reproduce errors, as making an exact copy of the pipeline is impossible. Lastly, version control using a tool like Git is complex.
It is harder to control the evolution of a pipeline as the code is not stored in a repository such as a GitHub repo. The Solution: Improving Your Data Pipelines With Declarative, Reproducible, Modular Engineering The good news is that there is a sustainable solution to these challenges. It is important to remind ourselves of the difference between imperative and declarative programming. In imperative programming, you control how things happen. In declarative programming, you express the logic without specifying the control flow. The answer to the end-to-end data pipeline development, deployment, monitoring, and maintenance challenges is to utilize the declarative programming paradigm, that is, abstracting the computation logic. In practice, this is best achieved by using Terraform by HashiCorp — an open-source project that is an infrastructure-as-code development tool. Because modern data-intensive application architecture is very similar to the data stack the applications run on, Terraform provides the framework that will make your data stack declarative, reproducible, and modular. Let’s return to the data analytics startup example described above to elucidate these concepts further. Imagine your data analytics app is a web application whose infrastructure is defined with Terraform. As expected, the most significant part of the application is the ClickHouse database. We must build two data pipelines. The first ETLs data from the CRM system—stored in a PostgreSQL database—into the ClickHouse database, transforming and enriching the data before loading it into ClickHouse. The second pipeline extracts real-time weather data from the local weather service through an API, transforms it (ensuring there are no errors) and enriches it, and then loads it into ClickHouse. Using Terraform to Code the Data Pipelines Classic infrastructure for modern cloud-based applications looks like this: This is an oversimplification but a more or less solid way of building applications. You have a public subnet with a user-facing API/UI, which is connected to a private subnet where data is stored. To simplify things, let’s look at how to use Terraform to build the first data pipeline—to ETL data from your RDBMS (in this case, PostgreSQL) system into the ClickHouse database. This diagram visualizes the data pipeline that ELTs the data from the RDBMS database into the ClickHouse database and then exposes a connection to the visualization service. This workflow must be reproducible, as your company’s information technology architecture has at least three environments: dev, QA, and prod. Enter Terraform. Let’s look at how to implement this data pipeline in Terraform. Note: it is best practice to develop modular code, distilled down to the smallest unit, which can be reused many times. The first step is to start with a main.tf file, containing nothing more than the provider definitions.
JSON provider "doublecloud" { endpoint = "api.double.cloud:443" authorized_key = file(var.dc-token) } provider "aws" { profile = var.profile } The second step is to create a BYOA network: JSON data "aws_caller_identity" "self" {} data "aws_region" "self" {} # Prepare BYOC VPC and IAM Role module "doublecloud_byoc" { source = "doublecloud/doublecloud-byoc/aws" version = "1.0.2" providers = { aws = aws } ipv4_cidr = var.vpc_cidr_block } # Create VPC to peer with resource "aws_vpc" "peered" { cidr_block = var.dwh_ipv4_cidr provider = aws } # Get account ID to peer with data "aws_caller_identity" "peered" { provider = aws } # Create DoubleCloud BYOC Network resource "doublecloud_network" "aws" { project_id = var.dc_project_id name = "alpha-network" region_id = module.doublecloud_byoc.region_id cloud_type = "aws" aws = { vpc_id = module.doublecloud_byoc.vpc_id account_id = module.doublecloud_byoc.account_id iam_role_arn = module.doublecloud_byoc.iam_role_arn private_subnets = true } } # Create VPC Peering from DoubleCloud Network to AWS VPC resource "doublecloud_network_connection" "example" { network_id = doublecloud_network.aws.id aws = { peering = { vpc_id = aws_vpc.peered.id account_id = data.aws_caller_identity.peered.account_id region_id = var.aws_region ipv4_cidr_block = aws_vpc.peered.cidr_block ipv6_cidr_block = aws_vpc.peered.ipv6_cidr_block } } } # Accept Peering Request on AWS side resource "aws_vpc_peering_connection_accepter" "own" { provider = aws vpc_peering_connection_id = time_sleep.avoid_aws_race.triggers["peering_connection_id"] auto_accept = true } # Confirm Peering creation resource "doublecloud_network_connection_accepter" "accept" { id = doublecloud_network_connection.example.id depends_on = [ aws_vpc_peering_connection_accepter.own, ] } # Create ipv4 routes to DoubleCloud Network resource "aws_route" "ipv4" { provider = aws route_table_id = aws_vpc.peered.main_route_table_id destination_cidr_block = doublecloud_network_connection.example.aws.peering.managed_ipv4_cidr_block vpc_peering_connection_id = time_sleep.avoid_aws_race.triggers["peering_connection_id"] } # Sleep to avoid AWS InvalidVpcPeeringConnectionID.NotFound error resource "time_sleep" "avoid_aws_race" { create_duration = "30s" triggers = { peering_connection_id = doublecloud_network_connection.example.aws.peering.peering_connection_id } } This is preparing our Stage for later clusters. Architecture after terraform apply looks follows: Once we prepare a stage, we can create a cluster in this private-subnet: JSON resource "doublecloud_clickhouse_cluster" "alpha-clickhouse" { project_id = var.dc_project_id name = "alpha-clickhouse" region_id = var.aws_region cloud_type = "aws" network_id = doublecloud_network.aws.id resources { clickhouse { resource_preset_id = "s1-c2-m4" disk_size = 34359738368 replica_count = 1 } } config { log_level = "LOG_LEVEL_TRACE" max_connections = 120 } access { ipv4_cidr_blocks = [ { value = doublecloud_network.aws.ipv4_cidr_block description = "DC Network interconnection" }, { value = aws_vpc.tutorial_vpc.cidr_block description = "Peered VPC" }, { value = "${var.my_ip}/32" description = "My IP" } ] ipv6_cidr_blocks = [ { value = "${var.my_ipv6}/128" description = "My IPv6" } ] } } This is added to our Stage for simple Clickhouse Cluster. 
But this cluster is still empty, so we must enable a transfer between PostgreSQL and ClickHouse:

HCL
resource "doublecloud_transfer_endpoint" "pg-source" {
  name       = "chinook-pg-source"
  project_id = var.dc_project_id
  settings {
    postgres_source {
      connection {
        on_premise {
          tls_mode {
            ca_certificate = file("global-bundle.pem")
          }
          hosts = [
            aws_db_instance.tutorial_database.address
          ]
          port = 5432
        }
      }
      database = aws_db_instance.tutorial_database.db_name
      user     = aws_db_instance.tutorial_database.username
      password = var.db_password
    }
  }
}

data "doublecloud_clickhouse" "dwh" {
  name       = doublecloud_clickhouse_cluster.alpha-clickhouse.name
  project_id = var.dc_project_id
}

resource "doublecloud_transfer_endpoint" "dwh-target" {
  name       = "alpha-clickhouse-target"
  project_id = var.dc_project_id
  settings {
    clickhouse_target {
      connection {
        address {
          cluster_id = doublecloud_clickhouse_cluster.alpha-clickhouse.id
        }
        database = "default"
        user     = data.doublecloud_clickhouse.dwh.connection_info.user
        password = data.doublecloud_clickhouse.dwh.connection_info.password
      }
    }
  }
}

resource "doublecloud_transfer" "pg2ch" {
  name       = "postgres-to-clickhouse-snapshot"
  project_id = var.dc_project_id
  source     = doublecloud_transfer_endpoint.pg-source.id
  target     = doublecloud_transfer_endpoint.dwh-target.id
  type       = "SNAPSHOT_ONLY"
  activated  = false
}

This adds an ELT snapshot process between PostgreSQL and ClickHouse. As the last piece, we can connect the visualization service to this newly created cluster:

HCL
resource "doublecloud_workbook" "dwh-viewer" {
  project_id = var.dc_project_id
  title      = "DWH Viewer"

  config = jsonencode({
    "datasets" : [],
    "charts" : [],
    "dashboards" : []
  })

  connect {
    name = "main"
    config = jsonencode({
      kind          = "clickhouse"
      cache_ttl_sec = 600
      host          = data.doublecloud_clickhouse.dwh.connection_info.host
      port          = 8443
      username      = data.doublecloud_clickhouse.dwh.connection_info.user
      secure        = true
      raw_sql_level = "off"
    })
    secret = data.doublecloud_clickhouse.dwh.connection_info.password
  }
}

As you can see, most of the code is driven by variables. Therefore, setting up the different environments is easy: add a stage_name.tfvars file per environment and run terraform apply with it. Note: See the Terraform documentation for more information on .tfvars files.

HCL
// This variable is to set the
// AWS region that everything will be
// created in
variable "aws_region" {
  default = "eu-west-2" // london
}

// This variable is to set the
// CIDR block for the VPC
variable "vpc_cidr_block" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

// This variable holds the
// number of public and private subnets
variable "subnet_count" {
  description = "Number of subnets"
  type        = map(number)
  default = {
    public  = 1,
    private = 2
  }
}

// This variable contains the configuration
// settings for the EC2 and RDS instances
variable "settings" {
  description = "Configuration settings"
  type        = map(any)
  default = {
    "database" = {
      allocated_storage   = 10            // storage in gigabytes
      engine              = "postgres"    // engine type
      engine_version      = "15.4"        // engine version
      instance_class      = "db.t3.micro" // rds instance type
      db_name             = "chinook"     // database name
      identifier          = "chinook"     // database identifier
      skip_final_snapshot = true
    },
    "web_app" = {
      count         = 1          // the number of EC2 instances
      instance_type = "t3.micro" // the EC2 instance type
    }
  }
}

// This variable contains the CIDR blocks for
// the public subnets. I have only included 4
// for this tutorial, but if you need more you
// would add them here
variable "public_subnet_cidr_blocks" {
  description = "Available CIDR blocks for public subnets"
  type        = list(string)
  default = [
    "10.0.1.0/24",
    "10.0.2.0/24",
    "10.0.3.0/24",
    "10.0.4.0/24"
  ]
}

// This variable contains the CIDR blocks for
// the private subnets. I have only included 4
// for this tutorial, but if you need more you
// would add them here
variable "private_subnet_cidr_blocks" {
  description = "Available CIDR blocks for private subnets"
  type        = list(string)
  default = [
    "10.0.101.0/24",
    "10.0.102.0/24",
    "10.0.103.0/24",
    "10.0.104.0/24",
  ]
}

// This variable contains your IP address. This
// is used when setting up the SSH rule on the
// web security group
variable "my_ip" {
  description = "Your IP address"
  type        = string
  sensitive   = true
}

// This variable contains your IPv6 address. This
// is used when setting up the SSH rule on the
// web security group
variable "my_ipv6" {
  description = "Your IPv6 address"
  type        = string
  sensitive   = true
}

// This variable contains the database master user
// We will be storing this in a secrets file
variable "db_username" {
  description = "Database master user"
  type        = string
  sensitive   = true
}

// This variable contains the database master password
// We will be storing this in a secrets file
variable "db_password" {
  description = "Database master user password"
  type        = string
  sensitive   = true
}

// Stage 2 Variables
variable "dwh_ipv4_cidr" {
  type        = string
  description = "CIDR of a used vpc"
  default     = "172.16.0.0/16"
}

variable "dc_project_id" {
  type        = string
  description = "ID of the DoubleCloud project in which to create resources"
}

Conclusion There you have it: a small modern data stack that ETLs data from a PostgreSQL database into a ClickHouse database. The best news is that you can set up multiple data pipelines, as many as you need, using the code in this example. There is also a complete example that you can find here, with additional features like adding a peer connection to your existing VPC and creating a sample ClickHouse database with a replication transfer between the two databases. Have fun playing with your small modern data stack!
In the dynamic landscape of modern application development, efficient and seamless interaction with databases is paramount. HarperDB, with its NoSQL capabilities, provides a robust solution for developers. To streamline this interaction, the HarperDB SDK for Java offers a convenient interface for integrating Java applications with HarperDB. This article is a comprehensive guide to getting started with the HarperDB SDK for Java. Whether you're a seasoned developer or just diving into the world of databases, this SDK aims to simplify the complexities of database management, allowing you to focus on HarperDB's NoSQL features. Motivation for Using HarperDB SDK Before delving into the intricacies of the SDK, let's explore the motivations behind its usage. The SDK is designed to provide a straightforward pathway for Java applications to communicate with HarperDB via HTTP requests. By abstracting away the complexities of raw HTTP interactions, developers can concentrate on leveraging the NoSQL capabilities of HarperDB without dealing with the intricacies of manual HTTP requests. In the fast-paced realm of software development, time is a precious resource. The HarperDB SDK for Java is a time-saving solution designed to accelerate the integration of Java applications with HarperDB. Rather than reinventing the wheel by manually crafting HTTP requests and managing the intricacies of communication with HarperDB, the SDK provides a high-level interface that streamlines these operations. By abstracting away the complexities of low-level HTTP interactions, developers can focus their efforts on building robust applications and leveraging the powerful NoSQL capabilities of HarperDB. It expedites the development process and enhances code maintainability, allowing developers to allocate more time to core business logic and innovation. The motivation for utilizing HTTP as the communication protocol between Java applications and HarperDB is rooted in efficiency, security, and performance considerations. While SQL is a widely adopted language for querying and managing relational databases, the RESTful HTTP interface provided by HarperDB offers distinct advantages. The purpose of this guide is to shed light on the functionality of HarperDB in the context of supported SQL operations. It's essential to note that the SQL parser within HarperDB is an evolving feature, and not all SQL functionalities may be fully optimized or utilize indexes. As a result, the REST interface emerges as a more stable, secure, and performant option for interacting with data. The RESTful nature of HTTP communication aligns with modern development practices, providing a scalable and straightforward approach to data interaction. The stability and security inherent in the RESTful architecture make it an attractive choice for integrating Java applications with HarperDB. While the SQL functionality in HarperDB can benefit administrative ad-hoc querying and leveraging existing SQL statements, the guide emphasizes the advantages of the RESTful HTTP interface for day-to-day data operations. As features and functionality evolve, the guide will be updated to reflect the latest capabilities of HarperDB. The motivation for using the HarperDB SDK and opting for HTTP communication lies in the quest for efficiency, security, and a more streamlined development experience. 
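To see what the SDK abstracts away, consider what a single operation looks like when issued by hand over HTTP. The sketch below uses Java's built-in java.net.http client to POST an operation to a local HarperDB instance; the exact JSON body and endpoint details are illustrative assumptions in the spirit of HarperDB's operation-style API, not a reference, and the credentials match the Docker command shown in the hands-on session below.

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class RawHarperDbCall {

    public static void main(String[] args) throws Exception {
        // Basic auth header built by hand - one of the details the SDK hides
        String credentials = Base64.getEncoder()
                .encodeToString("root:password".getBytes());

        // An operation-style JSON body; the exact payload is illustrative
        String body = """
                {"operation": "create_schema", "schema": "beers"}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9925"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Status handling, retries, and JSON (de)serialization are all left
        // to the caller - exactly the boilerplate the SDK removes
        System.out.println(response.statusCode() + ": " + response.body());
    }
}

The Server and Template classes discussed below wrap this kind of request, so application code never deals with headers, payload encoding, or response parsing directly.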
This guide aims to empower developers to make informed choices and harness the full potential of HarperDB's NoSQL capabilities while navigating the evolving landscape of SQL functionality. We understand the motivation behind employing the HarperDB SDK for Java and choosing HTTP as the communication protocol, which lays a solid foundation for an efficient and streamlined development process. The SDK is a valuable tool to save time and simplify complex interactions with HarperDB, allowing developers to focus on innovation rather than the intricacies of low-level communication. As we embark on the hands-on session on the following topic, we will delve into practical examples and guide you through integrating the SDK into your Java project. Let's dive into the hands-on session to bring theory into practice and unlock the full potential of HarperDB for your Java applications. Hands-On Session: Building a Simple Java SE Application with HarperDB In this hands-on session, we'll guide you through creating a simple Java SE application that performs CRUD operations using the HarperDB SDK. Before we begin, ensure you have a running instance of HarperDB. For simplicity, we'll use a Docker instance with the following command: Shell docker run -d -e HDB_ADMIN_USERNAME=root -e HDB_ADMIN_PASSWORD=password -e HTTP_THREADS=4 -p 9925:9925 -p 9926:9926 harperdb/harperdb This command sets up a HarperDB instance with a root username and password for administration. The instance will be accessible on ports 9925 and 9926. Now, let's proceed with building our Java application. We'll focus on CRUD operations for a straightforward entity—Beer. Throughout this session, we'll demonstrate the seamless integration of the HarperDB SDK into a Java project. To kickstart our project, we’ll create a Maven project and include the necessary dependencies—HarperDB SDK for Java and DataFaker for generating beer data. Create a Maven Project Open your preferred IDE or use the command line to create a new Maven project. If you’re using an IDE, there is typically an option to create a new Maven project. If you’re using the command line, you can use the following command: Shell mvn archetype:generate -DgroupId=com.example -DartifactId=harperdb-demo -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false Replace com.example with your desired package name and harperdb-demo with the name of your project. Include dependencies in pom.xml: Open the pom.xml file in your project and include the following dependencies:

XML
<dependencies>
    <dependency>
        <groupId>expert.os.harpderdb</groupId>
        <artifactId>harpderdb-core</artifactId>
        <version>0.0.1</version>
    </dependency>
    <dependency>
        <groupId>net.datafaker</groupId>
        <artifactId>datafaker</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>

Create the Beer Entity In your src/main/java/com/example directory, create a new Java file named Beer.java. Define the Beer entity as a record, taking advantage of the immutability provided by records.
Additionally, include a static factory method to create a Beer instance using DataFaker:

Java
package com.example;

import net.datafaker.Faker;

public record Beer(String id, String name, String style, String brand) {

    static Beer of(Faker faker) {
        String id = faker.idNumber().valid();
        String name = faker.beer().name();
        String style = faker.beer().style();
        String brand = faker.beer().brand();
        return new Beer(id, name, style, brand);
    }
}

With these initial steps, you’ve set up a Maven project, included the required dependencies, and defined a simple immutable Beer entity using a record. The next phase involves leveraging the HarperDB SDK to perform CRUD operations with this entity, showcasing the seamless integration between Java and HarperDB. Let’s proceed to implement the interaction with HarperDB in the subsequent steps of our hands-on session. The Server and Template classes are fundamental components of the HarperDB SDK for Java, providing a seamless interface for integrating Java applications with HarperDB’s NoSQL database capabilities. Let’s delve into the purpose and functionality of each class. Server Class The Server class is the entry point for connecting with a HarperDB instance. It encapsulates operations related to server configuration, database creation, schema definition, table creation, and more. Using the ServerBuilder, users can easily set up the connection details, including the host URL and authentication credentials. Key features of the Server class: Database management: Create, delete, and manage databases. Schema definition: Define schemas within databases. Table operations: Create tables with specified attributes. Credential configuration: Set up authentication credentials for secure access. Template Class The Template class is a high-level abstraction for performing CRUD (Create, Read, Update, Delete) operations on Java entities within HarperDB. It leverages Jackson’s JSON serialization to convert Java objects to JSON, facilitating seamless communication with HarperDB via HTTP requests. Key features of the Template class: Entity operations: Perform CRUD operations on Java entities. ID-based retrieval: Retrieve entities by their unique identifiers. Integration with Server: Utilize a configured Server instance for database interaction. Type-safe operations: Benefit from type safety when working with Java entities. Together, the Server and Template classes provide a robust foundation for developers to integrate their Java applications with HarperDB effortlessly. In the subsequent sections, we’ll explore practical code examples to illustrate the usage of these classes in real-world scenarios, showcasing the simplicity and power of the HarperDB SDK for Java. Let’s delve into the code and discover the capabilities these classes bring to your Java projects. In this session, we’ll execute a comprehensive code example to demonstrate the functionality of the HarperDB SDK for Java. The code below showcases a practical scenario where we create a database, define a table, insert a beer entity, retrieve it by ID, delete it, and then confirm its absence.
Java
public static void main(String[] args) {
    // Create a Faker instance for generating test data
    Faker faker = new Faker();

    // Configure HarperDB server with credentials
    Server server = ServerBuilder.of("http://localhost:9925")
            .withCredentials("root", "password");

    // Create a database and table
    server.createDatabase("beers");
    server.createTable("beer").id("id").database("beers");

    // Obtain a Template instance for the "beers" database
    Template template = server.template("beers");

    // Generate a random beer entity
    Beer beer = Beer.of(faker);

    // Insert the beer entity into the "beer" table
    template.insert(beer);

    // Retrieve the beer by its ID and print it
    template.findById(Beer.class, beer.id()).ifPresent(System.out::println);

    // Delete the beer entity by its ID
    template.delete(Beer.class, beer.id());

    // Attempt to retrieve the deleted beer and print a message
    template.findById(Beer.class, beer.id())
            .ifPresentOrElse(
                System.out::println,
                () -> System.out.println("Beer not found after deletion")
            );
}

Explanation of the code: Faker instance: We use the Faker library to generate random test data, including the details of a beer entity. Server configuration: The Server instance is configured with the HarperDB server’s URL and authentication credentials (username: root, password: password). Database and table creation: We create a database named “beers” and define a table within it named “beer” with an “id” attribute. Template instance: The Template instance is obtained from the configured server, specifically for the “beers” database. Beer entity operations: Insertion: A randomly generated beer entity is inserted into the “beer” table. Retrieval: The inserted beer is retrieved by its ID and printed. Deletion: The beer entity is deleted by its ID. Confirmation of deletion: We attempt to retrieve the deleted beer entity and print a message confirming its absence. This code provides a hands-on exploration of the core CRUD operations supported by the HarperDB SDK for Java. By running this code, you’ll witness the seamless integration of Java applications with HarperDB, making database interactions straightforward and efficient. Let’s execute and observe the SDK in action! In this hands-on session, we executed a concise yet comprehensive code example that showcased the power and simplicity of the HarperDB SDK for Java. By creating a database, defining a table, and manipulating beer entities, we explored the SDK's capability to integrate Java applications with HarperDB's NoSQL features seamlessly. The demonstrated operations, including insertion, retrieval, and deletion, underscored the SDK's user-friendly approach to handling CRUD functionalities. This session offered Java developers a practical glimpse into the ease of use and effectiveness of the HarperDB SDK, making database interactions a seamless part of application development. As we proceed, we'll delve deeper into more advanced features and scenarios, building on this foundation to empower developers in leveraging HarperDB's capabilities within their Java projects. Conclusion In conclusion, this article has thoroughly explored the HarperDB SDK for Java, showcasing its capabilities in simplifying the integration of Java applications with HarperDB’s NoSQL database. From understanding the core classes like Server and Template to executing practical CRUD operations with a sample beer entity, we’ve witnessed the user-friendly nature of the SDK.
By choosing the HarperDB SDK, developers can streamline database interactions, focusing more on application logic and less on intricate database configurations. For those eager to dive deeper, the accompanying GitHub repository contains the complete source code used in the hands-on session. Explore, experiment, and adapt the code to your specific use cases. Additionally, the official HarperDB Documentation serves as an invaluable resource, offering in-depth insights into the NoSQL operations API, making it an excellent reference for further exploration. As you embark on your journey with HarperDB and Java, remember that this SDK empowers developers, providing a robust and efficient bridge between Java applications and HarperDB’s NoSQL capabilities. Whether you’re building a small-scale project or a large-scale enterprise application, the HarperDB SDK for Java stands ready to enhance your development experience.
I recently read an article, 9 Reasons SQL has Got to Go, suggesting that SQL has "got to go". This article is misguided, and the claims are patently false. I'll discuss the points the original author made, and why I believe they are off the mark. Tables Do Scale I live in Canada. The Bank of Canada estimates there are approximately 30 million financial transactions per day in this country. We have what we call the "Big 5" banks. If we assume they get the brunt of these transactions (let's say two-thirds), that would be 20 million transactions. If we assume an even distribution, that means they each handle about 4 million transactions per day. Banks have been handling this kind of load every day for years, and they use SQL. Banks have figured out how to scale SQL usage. I've never worked at a bank, so I do not know what they do to scale it. I suspect part of the solution is partitioning, most likely by date range - using separate disk space for transactions within a given date range, like a calendar month. If a bank averages 4 million transactions per day, at worst, a partition would contain 31 days * 4 million transactions = 124 million rows. I assume separate records would be required for withdrawing from one account and depositing to another. Assuming two rows for every transaction yields 248 million rows per month. For perspective, your average corporate craptop, which performs far worse than an actual server, can easily find a random row out of a million in 2 ms by primary key. I'm sure it's not that simple, and there is probably more data than that - banks have a fantastically reliable system, and my guess is this reliability comes at the cost of more data. Even if we double that amount again, to about 500 million rows per month, that is a manageable amount of data. If that amount of data is too slow, partitions could be more granular, like the first and second halves of the month, to cut the number of rows per partition roughly in half. Banks in Canada are required to store data for 7 years, so the process might work by running a job near the end of each month that does two things: Create new partition(s) for the upcoming month's date range. The partition(s) must exist before the new month, to ensure there are separate partition(s) to store the next month's data before the first transaction of the next month occurs. The safest bet is to run the job on the 28th, as every month has at least 28 days. Delete any partitions more than 7 years old. This saves space by discarding data that is no longer required to be stored. Nothing Is “Bolted On” The author keeps referring to features as "bolted on". To borrow an English expression - "bollocks". SQL vendors aren't stupid; they keep up with the times. Just as banks came up with email money transfers to make our lives easier, SQL vendors have added requested features like XML, JSON, window functions, partitioning, etc. Each of these features is well thought out, which is why they are so reliable.
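Partitioning is a good example of how little code such a strategy actually requires. Here is a minimal sketch, in Java with JDBC, of the monthly partition-maintenance job described above, against a hypothetical PostgreSQL transactions table partitioned by range on a date column. The table name, connection details, and naming scheme are assumptions for illustration, not a description of how any bank actually does it.

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDate;

/**
 * Monthly partition maintenance: create next month's partition ahead of time
 * and drop the partition that has aged past the seven-year retention window.
 * Table and partition names are illustrative assumptions.
 */
public class PartitionMaintenanceJob {

    public static void main(String[] args) throws SQLException {
        LocalDate nextMonth = LocalDate.now().plusMonths(1).withDayOfMonth(1);
        LocalDate expired = LocalDate.now().minusYears(7).withDayOfMonth(1);

        String createSql = String.format(
                "CREATE TABLE IF NOT EXISTS transactions_%d_%02d "
                + "PARTITION OF transactions FOR VALUES FROM ('%s') TO ('%s')",
                nextMonth.getYear(), nextMonth.getMonthValue(),
                nextMonth, nextMonth.plusMonths(1));

        String dropSql = String.format(
                "DROP TABLE IF EXISTS transactions_%d_%02d",
                expired.getYear(), expired.getMonthValue());

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/bank", "app", "secret");
             Statement stmt = conn.createStatement()) {
            stmt.execute(createSql); // partition exists before the month starts
            stmt.execute(dropSql);   // retention: remove data older than 7 years
        }
    }
}

Scheduled near the end of each month (the 28th is safe, since every month has at least 28 days), a job like this keeps one partition per month with no manual intervention.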
Take JSON for example: while different vendors implement JSON differently, they have common threads: A single type named JSON can internally represent an Object, Array, String, Number, Boolean, or Null value Operators to access particular object keys or array elements Functions to turn a JSON array into rows of JSON value elements, and vice-versa Functions to turn a JSON object into rows of string keys and JSON values, and vice-versa Operators/functions to dig through a JSON value with a path expression A function to determine which of the six valid types of JSON values a particular JSON value contains Indexes can be made on individual JSON properties, or maybe on all properties With really not that many lines of SQL, you can do things like the following: Turn an object into rows of key and value columns Use a where clause to eliminate any keys that match some pattern that represents internal storage details that should not be exposed to the application Rebuild a new object from the remaining rows In the above process, the important part is that the table of keys and values has the value column typed as JSON: an SQL table must have a single type for each column, yet the values of JSON keys can be any of the six different valid JSON values. This is why the single-type JSON can internally store any of the six value types, unlike how the application layer handles JSON. Effectively, SQL support for JSON is fairly consistent with SQL philosophy and patterns. There is good documentation explaining it, and it doesn't require being a genius to use it. There are plenty of examples to follow on common sites like SO, and god forbid, the vendor documentation pages. Every new feature SQL vendors offer has the same meticulous care taken to implement it in a way that makes sense and to consider what kind of operators, functions, aggregate functions, window functions, etc. might be needed for the feature. Each new feature works like it was well planned and reasonably similar to other features. And of course, due to user requests, sometimes additional use cases are considered, and additional operators and so on are added over time. Nothing is just thrown in as an afterthought. Ever. Marshaling Is Hard No, it isn't. Any language with reflection and SQL support is bound to have one or more ready-made ORM solutions to translate a row into an Object/Struct/whatever the language calls it, and vice-versa. Some solutions like Java JPA are incredibly complex, but in my view, that is a symptom of the Java community making mountains out of molehills for practically everything. Generally speaking, languages have low-level database support like the following, which these ORMs are built on: Open a connection Create a statement with a parameterized query Set the parameter values Execute the query in one of two ways: For write queries, did it succeed or fail For read queries, collect the results into one or more objects added to some kind of list/array/whatever it is called Like anything, if the ORMs you see are way too complicated, you can make a simple one. All you need, at a minimum, is to figure out the reflection details of translating SQL types into application types and vice-versa. This isn't as bad as it sounds, as most SQL driver implementations allow some flexibility, such as letting you read almost any column type as a string. An ORM doesn't have to be a beast that generates queries for you, caches results, and so on.
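As a minimal sketch of that idea, the Java example below walks through exactly those low-level steps with plain JDBC and maps rows to a record by hand, with no framework at all. The table and column names, connection URL, and the Account record are hypothetical and exist only to illustrate how small a hand-rolled mapper can be.

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SimpleMarshalling {

    // A tiny DTO; no annotations or ORM framework required
    record Account(long id, String owner, String balance) {}

    // The entire "ORM": one method that maps a ResultSet row to a DTO
    static Account toAccount(ResultSet rs) throws SQLException {
        return new Account(
                rs.getLong("id"),
                rs.getString("owner"),
                rs.getString("balance")); // most drivers let you read a column as a string
    }

    static List<Account> findByOwner(Connection conn, String owner) throws SQLException {
        String sql = "SELECT id, owner, balance FROM account WHERE owner = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) { // parameterized query
            ps.setString(1, owner);                               // set the parameter values
            try (ResultSet rs = ps.executeQuery()) {              // execute the read query
                List<Account> results = new ArrayList<>();
                while (rs.next()) {
                    results.add(toAccount(rs));                   // collect results into a list
                }
                return results;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/demo", "app", "secret")) { // open a connection
            findByOwner(conn, "alice").forEach(System.out::println);
        }
    }
}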
You can also write a code generator that takes some kind of specification - I would use TOML if I did this today - and generates Data Transfer Objects (DTOs) that represent the columns of a table, and Data Access Objects (DAOs) that have methods for operations like the following: Upsert one DTO Upsert many DTOs Select one row by id Select many rows by IDs Delete one row by id Delete many rows by IDs Helper methods Populate a statement from a DTO and vice-versa Collect query results into one DTO or a list of DTOs Any custom queries can be written as methods of subclasses of the generated DAOs in a different directory, and leverage the helper methods to translate SQL rows <-> DTOs. When the code generator is rerun later, it can begin by wiping the directory of generated classes to ensure that: any previously generated classes that are no longer relevant are removed custom subclasses are untouched Arguably, making your own solution sounds like time wasted, but if you have a situation that warrants it, why not? E.g., in Java, there are less popular, simpler solutions than JPA available, precisely because their authors, like me, figure JPA is an overblown memory hog. Someone had to write those simpler solutions. SQL Is Not Good at Real-Time In this case, the author may have a point - but that does not mean SQL should be abandoned. Just because you encounter a case SQL is not good at does not mean you should throw it out entirely. Instead, it means adding another solution based on the data in the SQL database. I haven't used real-time databases, but I'm sure there are ways to populate real-time databases from SQL data as SQL data is added, with some kind of replication. The SQL database might even have such replication built in, or it could be added - e.g., Postgres has Foreign Data Wrappers that might help in this case. In some cases, you should be able to just use SQL, by tuning it accordingly. You don't have to have one database to rule them all. Usually, you can do everything you need with SQL, but not always - just like everything else. Joins Are Hard No, they aren't. You only really have to understand a few key ideas to pick them apart: All joins of Table A to Table B effectively collect all columns of Table A and Table B into one flat list of columns Just using a comma means a Cartesian cross product - multiply every row of Table A by every row of Table B. This is most often used when the data selected from Table A and/or Table B is a single row so that you are only multiplying by one. The keywords CROSS JOIN can be used instead of a comma. A left join of Table A to Table B is optional - the corresponding row in B may not exist, in which case all columns of B are null for that specific row. This can be detected by checking if the primary key column of B is null. An inner join of Table A to Table B requires a corresponding row in B to exist; otherwise, the row in A is filtered out. It effectively acts as a where criterion and could be rewritten as a left join where the primary key of B is not null.
A full join of Table A to Table B provides three kinds of rows: A row exists in both Table A and Table B and all columns have data A row exists in Table A only, all Table B columns are null A row exists in Table B only, all Table A columns are null Columns Are Not a Waste of Space Since you can have columns of type JSON, you can use JSON to store fields for various counter-cases of the usual SQL table definition, such as: Rarely used fields that will only exist for a very small percentage of rows User-defined fields Fields whose type can vary depending on other values of some of the non-JSON columns and/or JSON properties You don't have to use only non-JSON or only JSON columns. You can mix and match for perfectly good reasons. Optimizers Don’t Always Help This is not quite true. Yes, some queries may not scale super well, but that is where proper application design comes into play. You should design the application to have a separate model, where only the model code knows you are using SQL, and contains all SQL queries. If a given query doesn't scale somehow, then the model can do anything necessary to speed up that query, such as: Use the EXPLAIN command to find out why the query is slow, and take actions, such as: Add another index Start using partitions Add materialized views - views backed by a table that stores the results, which you have to refresh periodically Write more efficient model code that could, for example, use multiple queries of less data per query and give the optimizer a better choice of indexes to use for them. Cache data with Redis I know some people will say "But we're using microservices and they each own their tables". Having each service own its tables is not very good for various reasons, such as: If a feature like row-level authorization is required, you have to implement it in every service If some tables need extra solutions like caching, it is hard to know which tables use these extra solutions A better idea is to have one microservice whose sole responsibility is to do all querying. It acts as a choke point, a single place to implement features like row-level authorizations It is a form of the Single Responsibility principle, where one service is just data, and every other service just handles the details, like validations, of one data type Just because it is popular to have each service own its table(s) does not automatically make it a good idea Denormalization Is Generally Not Needed I'm sure there are use cases for it, but I have personally never needed it for any project I've worked on. You don't have to choose normalized or denormalized - you can have a hybrid of both, where a view can be used to produce a denormalized view of multiple normalized tables. Such a view can be materialized to act as a cache to speed up results and has the added benefit of not having to keep replicating the same join conditions in multiple queries. This is why purists suggest always using views - it allows for manipulating views in any way needed over time, without always having to tweak application code. Your SQL vendor may support "INSTEAD OF" triggers, which apply only to views. Such a trigger can be added to the denormalized view to translate the denormalized data into the normalized tables. Somebody Has To Learn SQL The original article reads like it was written by someone who is not interested in learning SQL in any real depth. I keep encountering this throughout my career, but much more so in recent years.
It seems to me that the following events have occurred over the last 15 years: Software companies had DBAs when I started, who helped devs with advice on improving performance and on how to write certain kinds of queries Companies stopped hiring DBAs, leaving a gap of nobody who knows databases in-depth Companies started using the cloud, where most of the management of a database is offloaded to the cloud company ORMs like JPA that have a lot of complexity cropped up, and became de facto standard tools Developers got used to not having to write SQL As a result of this progression, a lot of developers out there today: Have probably never worked with a DBA Are unaccustomed to writing SQL queries of any real complexity Do not know what features are available for speeding up queries or scaling performance Do not know their SQL database can do any of the following: Full-text searching Graph queries Hierarchical queries This knowledge gap isn't due entirely to the developers; companies need to ensure they have some database developers. Maybe not every team needs a database developer, but companies need to have some available for those who need them. Average developers need to get in the habit of assuming that their SQL database will serve their needs until a more savvy database developer says otherwise. A little common sense goes a long way: If nobody understands how some queries work, how is that different from nobody understanding how some Java code works? The same solution can be used to solve both these problems: appropriate comments and documentation If you are using several different strategies in the database, document each strategy, including what problem is being solved and how it is being solved, with a concrete example CTEs are a good way of making your SQL readable in a top-down fashion, more procedurally, like application code Comment queries just like you would comment application code I am not an old, greybeard Gandalf DBA. I am a developer who happens to have a keen interest in databases, and our team is in the process of migrating an ETL from MSSQL Server to BigQuery. There are a lot of BQ functions and procedures simply because the original MSSQL solution was written that way, and it is the easiest way to translate the code. Others who aren't me and weren't involved in the original implementation can understand the approaches being used and can keep up with the kinds of queries we're writing. SQL should remain with us for a long time because it is an excellent language for manipulating data. Just like any other language, those who invest in learning it will find reasons to say it is a good solution, and those who don't will find reasons to say it isn't.
This is an article from DZone's 2023 Enterprise Security Trend Report.For more: Read the Report DevSecOps — a fusion of development, security, and operations — emerged as a response to the challenges of traditional software development methodologies, particularly the siloed nature of development and security teams. This separation often led to security vulnerabilities being discovered late in the development cycle, resulting in costly delays and rework. DevSecOps aims to break down these silos by integrating security practices into the entire software development lifecycle (SDLC), from planning and coding to deployment and monitoring. The evolution of DevSecOps has been driven by several factors, including the increasing complexity of software applications, the growing sophistication of cyberattacks, and the demand for faster software delivery. As organizations adopted DevOps practices to accelerate software delivery, they realized that security could no longer be an afterthought. Benefits of Modern DevSecOps Today, DevSecOps teams have a number of opportunities to improve their security posture, reduce the risk of data breaches and other security incidents, increase compliance, and deliver software products and services more quickly. Improved Security Posture DevSecOps can help to improve the security posture of organizations by: Automating security testing and scanning throughout the SDLC. This helps to identify and fix security vulnerabilities early in the development process before they can be exploited by attackers. Deploying security patches and updates quickly and efficiently. This helps to reduce the window of time in which organizations are vulnerable to known vulnerabilities. Building security into the software development process. This helps to ensure that security is considered at every stage of the development lifecycle, and that security requirements are met. Reduced Risk of Data Breaches and Other Security Incidents By integrating security into the SDLC, DevSecOps can help to reduce the risk of data breaches and other security incidents. This is because security vulnerabilities are more likely to be identified and fixed early in the development process. Additionally, DevSecOps can help to reduce the impact of security incidents by providing a more rapid and efficient response. Increased Compliance DevSecOps can help organizations comply with a variety of security regulations and standards. This is because DevSecOps helps to ensure that security is built into the software development process and that security requirements are met. Additionally, DevSecOps can help organizations demonstrate their commitment to security to their customers and partners. Faster Time to Market DevSecOps can help organizations deliver software products and services faster by automating security testing and scanning and by deploying security patches and updates efficiently. This allows organizations to focus on developing and delivering new features and functionality to their customers. Challenges of Modern DevSecOps Modern DevSecOps teams face a number of challenges, including: Cultural challenges – Collaboration between developers, security, and operations teams can be difficult as these teams often have different priorities and cultures. Additionally, security is not always seen as a priority by business stakeholders. Technical challenges – Integrating security tools into the development pipeline can be complex, and automating security testing and scanning can be challenging. 
Additionally, DevSecOps teams need to have the skills and training necessary to implement and manage DevSecOps practices effectively. Seamless Security Integration: How IaC and CI/CD Foster DevSecOps Excellence Infrastructure as Code (IaC) and CI/CD, when combined, create a powerful synergy that fosters DevSecOps excellence. IaC enables the codification of security configurations into infrastructure templates, while CI/CD facilitates the automation of security checks and scans throughout the development lifecycle. This seamless integration ensures that security is embedded into every stage of the DevOps process, from code development to infrastructure deployment, resulting in secure and reliable software delivery at scale. IaC and Infrastructure Management IaC is a modern approach to managing and provisioning infrastructure resources through code instead of manual processes. This codified approach offers several benefits, including increased agility, consistency, and repeatability. Security Risks Associated With IaC While IaC provides many advantages, it also introduces potential security risks if not implemented and managed carefully. Common security risks associated with IaC include: Overprivileged access – IaC templates may grant excessive access permissions to users or services, increasing the attack surface and potential for misuse. Misconfigurations – Errors in IaC templates can lead to misconfigured security settings, leaving vulnerabilities open to exploitation. Insecure code storage – Sensitive information, such as passwords or API keys, may be embedded directly into IaC templates, exposing them to unauthorized access. Lack of logging and monitoring – Inadequate audit logging and monitoring can make it difficult to detect and track unauthorized changes or security breaches. Best Practices for Secure IaC To mitigate security risks and ensure the safe adoption of IaC, consider these best practices: Least privilege – Grant only the minimum necessary permissions required for each user or service. Thorough code review – Implement rigorous code review processes to identify and rectify potential security flaws. Secure code storage – Store sensitive information securely using encryption or access control mechanisms. Comprehensive logging and auditing – Implement robust logging and auditing to track changes and identify suspicious activity. Utilize security tools – Leverage automated security scanning tools to detect vulnerabilities and misconfigurations. Security Considerations for Infrastructure Management Infrastructure management encompasses the provisioning, configuration, and monitoring of infrastructure resources. Security considerations for infrastructure management include: Hardening infrastructure – Implement security measures to harden infrastructure components against known vulnerabilities and attack vectors. Vulnerability management – Regularly scan infrastructure for vulnerabilities and prioritize remediation based on criticality. Access control – Enforce strict access control policies to limit access to infrastructure resources to only authorized personnel. Incident response – Establish a comprehensive incident response plan to effectively address security breaches. IaC Benefits for Security Checks and Scans IaC facilitates the integration of security checks and scans into the infrastructure management process, further enhancing security posture. 
By codifying infrastructure configurations, IaC enables consistent and repeatable security checks across environments, reducing the likelihood of human error and missed vulnerabilities. IaC for Consistent Security Configurations IaC promotes consistency in security configurations and reduces the risk of human errors by enforcing standard security settings across all infrastructure deployments. This consistency helps ensure that security measures are applied uniformly, minimizing the risk of vulnerabilities arising from misconfigurations. By automating infrastructure provisioning and configuration, IaC further empowers organizations to manage infrastructure securely. CI/CD Pipelines Continuous integration (CI) and continuous delivery (CD) are software development practices that allow teams to release software more frequently and reliably. CI pipelines automate the build and testing process, while CD pipelines automate the deployment process. Figure 1: CI/CD pipeline diagram Security Risks Associated With CI/CD Pipelines CI/CD pipelines, if not implemented and managed securely, can introduce potential security risks. Common security risks associated with CI/CD pipelines include: Vulnerable code – Malicious code can be introduced into the pipeline, either intentionally or unintentionally. Unsecured infrastructure – Infrastructure components used in the pipeline may be vulnerable to attack. Lack of access control – Unauthorized access to the pipeline can allow attackers to inject malicious code or modify configurations. Insufficient monitoring – Inadequate monitoring of pipeline activity can make it difficult to detect and respond to security incidents. Best Practices for Secure CI/CD Pipelines To mitigate security risks and ensure the secure operation of CI/CD pipelines, consider these best practices: Scan code for vulnerabilities – Regularly scan code for vulnerabilities using automated static and dynamic application security testing (SAST and DAST) tools. Harden infrastructure – Implement security measures to harden infrastructure components used in the pipeline. Enforce access control – Enforce strict access control policies to limit access to the pipeline and its components to only authorized personnel. Implement continuous monitoring – Continuously monitor pipeline activity for signs of suspicious activity or potential security incidents. Automate security patching – Automate the deployment of security patches and updates to minimize the time window for exploitation of vulnerabilities. Security Considerations for CI/CD Pipeline Management CI/CD pipeline management encompasses the planning, implementation, and maintenance of CI/CD pipelines. Security considerations for CI/CD pipeline management include: Security by design – Integrate security into the design and implementation of CI/CD pipelines from the outset. Use secure tools and processes – Employ security-focused tools and processes throughout the pipeline, from code scanning to deployment. Train and educate personnel – Provide training and education to developers, security professionals, and operations teams on secure CI/CD practices. Implement a vulnerability disclosure policy – Establish a clear and accessible vulnerability disclosure policy to encourage responsible reporting of security issues.
CI/CD Pipelines for Automated Security Testing and Scanning CI/CD pipelines can be used to automate security testing and scanning throughout the development lifecycle. By integrating security checks into the pipeline, organizations can identify and address vulnerabilities early in the development process, reducing the risk of them reaching production. CI/CD Pipelines for Rapid Patch Deployment CI/CD pipelines can also be used to deploy security patches and updates quickly and efficiently. By automating the deployment process, organizations can minimize the time window for exploitation of vulnerabilities, reducing the potential impact of security breaches. Enhancing CI/CD Pipeline Management With Security Tools and Practices The use of security tools and practices can significantly improve the security posture of CI/CD pipeline management. Tools such as SAST and DAST can automate vulnerability detection, while practices such as access control and continuous monitoring can prevent unauthorized access and identify suspicious activity. Conclusion DevSecOps is a rapidly evolving field that offers organizations the opportunity to improve their security posture, reduce the risk of data breaches, and deliver software products and services more quickly. While DevSecOps teams face some challenges, there are also many opportunities to improve the security of their software products and services by adopting best practices for secure IaC, CI/CD pipelines, and infrastructure management. The future of DevSecOps is bright, and we can expect to see further innovation in this field, as well as increased integration with other DevOps practices. This is an article from DZone's 2023 Enterprise Security Trend Report.For more: Read the Report
This is an article from DZone's 2023 Enterprise Security Trend Report. For more: Read the Report Today, safeguarding assets is not just a priority; it's the cornerstone of survival. The lurking threats of security breaches and data leaks loom larger than ever, carrying the potential for financial fallout, reputational ruin, legal trouble, and much more. Thus, in today's digital battleground, it's no longer sufficient to merely react to threats. Instead, organizations must proactively fortify their defenses and enter the era of security-first design — an avant-garde approach that transcends traditional security measures. Security-first design is about strategic, forward-thinking defense that transforms our vulnerabilities into invincible strengths. Key Principles of Security-First Design Security-first design is an approach that emphasizes integrating robust security measures into the design and development of software systems from the outset. By prioritizing security considerations early on, organizations can proactively mitigate risks and build resilient systems that protect valuable data and assets. Let's discuss the key principles of security-first design. Threat Modeling The security-first design journey begins with a strategic understanding of potential threats and risks. Threat modeling involves examining a system from the perspective of an attacker, with the goal of understanding how they might exploit weaknesses or vulnerabilities to gain unauthorized access or cause harm. Several key steps to threat modeling are featured in Figure 1 below: Figure 1: Steps to threat modeling Principle of Least Privilege The principle of least privilege emphasizes the importance of limiting access to sensitive data and resources. This is achieved by granting users and systems only the level of access required to perform their tasks, and nothing more. The main objective of least privilege is to reduce the risk of unauthorized access or misuse of sensitive data and resources. By limiting access, it becomes more difficult for attackers to compromise the system, and it also reduces the potential impact if an attack does occur. Defense in Depth The more layers of defense a system has, the less likely it is to fail as a whole. Defense in depth involves implementing multiple layers of security controls to protect against potential threats. The idea behind defense in depth is that different layers of controls can provide complementary protection against different types of threats, and by combining multiple layers, the overall security of the system can be significantly improved. Key steps involved in implementing defense in depth are to: Identify potential threats – Identify the types of threats that could potentially affect the system and assign a level of risk based on their potential impact and likelihood of occurrence. Implement multiple layers – Once potential threats have been identified, implement multiple layers of security controls to protect against them — e.g., firewalls, intrusion detection systems, antivirus software, access controls, encryption, physical security measures like cameras and locks, and other measures designed to prevent attacks and protect sensitive data. Test and validate – After implementing multiple layers of security controls, test them to validate that they work together as intended.
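To make the layering idea concrete in application code, here is a minimal Java sketch of defense in depth applied to a single request path: authentication, least-privilege authorization, input validation, and audit logging are separate checks, so a failure or bypass in one layer does not leave the operation unprotected. The class, role names, and limits are hypothetical and deliberately simplified; a real system would delegate these checks to its identity provider, policy engine, and logging stack.

Java
import java.util.Set;
import java.util.logging.Logger;

public class PaymentService {

    private static final Logger AUDIT = Logger.getLogger("audit");

    record User(String name, Set<String> roles, boolean authenticated) {}

    // Layer 1: authentication - is the caller who they claim to be?
    private static void requireAuthenticated(User user) {
        if (!user.authenticated()) {
            throw new SecurityException("not authenticated");
        }
    }

    // Layer 2: least privilege - does the caller hold exactly the role this action needs?
    private static void requireRole(User user, String role) {
        if (!user.roles().contains(role)) {
            throw new SecurityException("missing role: " + role);
        }
    }

    // Layer 3: input validation - reject malformed data even from authorized callers
    private static void validateAmount(long amountCents) {
        if (amountCents <= 0 || amountCents > 1_000_000_00L) {
            throw new IllegalArgumentException("amount out of range");
        }
    }

    public void transfer(User user, String toAccount, long amountCents) {
        requireAuthenticated(user);
        requireRole(user, "payments:write");
        validateAmount(amountCents);

        // Layer 4: audit logging - every sensitive action leaves a trace
        AUDIT.info(() -> user.name() + " transferred " + amountCents + " cents to " + toAccount);

        // ... actual transfer logic would go here ...
    }
}

The point is not the specific checks but their independence: a caller who slips past one layer still faces the others and leaves an audit trail behind.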
Secure Defaults The motivation behind secure defaults is to configure systems and applications to operate in a secure state by default rather than relying on manual configurations or user input to set security settings. This is achieved by implementing default security controls — such as strong passwords, encryption, and access controls — and by making sure that these controls are enabled automatically when the system or application is installed or configured. The main objective of secure defaults is to improve the overall security of the system by reducing the risk of human error or oversight in configuring security settings. By implementing default security controls, organizations can ensure that their systems and applications are protected against common threats and vulnerabilities out of the box. Regular Updates and Patching Regular updates and patching are critical security practices that involve keeping software, systems, and applications up to date with the latest security patches and updates. This is done to ensure that known vulnerabilities and weaknesses are addressed in a timely manner to prevent attackers from exploiting them. Implementing Security-First Design Implementing security-first design is a holistic approach that encompasses the entire software development lifecycle (SDLC). From requirements gathering to ongoing maintenance, the primary focus is on building and maintaining secure software systems. By integrating security principles and best practices at every stage, organizations can proactively identify and address security risks, safeguard sensitive data, and establish robust and resilient systems. Making security a top priority from the very beginning allows organizations to effectively protect their assets and mitigate potential threats. Secure SDLC Implementing security-first design in the SDLC is an art that involves enriching the process with security requirements. This enrichment allows organizations to proactively mitigate security risks, protect sensitive data, comply with regulations, maintain a trusted reputation, and build resilient and reliable software systems. By integrating security practices throughout the SDLC, organizations can save costs, enhance efficiency, and minimize the potential impact of security incidents. Prioritizing security from the outset reduces the likelihood of breaches, safeguards data, and fosters a culture of security awareness and responsibility. Ultimately, securing the SDLC is essential for long-term success, customer trust, and the protection of critical information assets. Secure Network Architecture Secure network architecture is the process of designing and implementing a network infrastructure to protect against threats and attacks. The goal of a secure network architecture is to provide a robust and secure environment for data transmission and communication between devices on the network. Figure 2: Identify risks and implement security practices Secure Authentication and Authorization Secure authentication and authorization involve verifying the identity of users and granting them access to the appropriate resources on a network or system. The goal of secure authentication and authorization is to ensure that only authorized users are allowed to access sensitive data and resources, and that their access is limited to only what is needed to perform their job functions.
Incident Response Planning Incident response planning is the process of developing and implementing a plan to respond to security incidents, such as data breaches, cyberattacks, or unauthorized access. The goal of incident response planning is to minimize the impact of a security incident by quickly identifying and containing the attack, assessing the damage, restoring affected systems and data, and preventing similar incidents in the future. Testing and Continuous Improvement Testing and continuous improvement are critical components of the security-first design approach, ensuring the effectiveness and ongoing resilience of secure software systems. These practices involve thorough security testing methodologies, monitoring and logging, and employee awareness. Security Testing In digital security, remember that practice is key. Each rehearsal in conducting regular security testing, orchestrating real-time vigilance, and nurturing the guardians of cyber-surveillance is a step closer to mastering and safeguarding digital landscapes. It's not just a one-time job but a continuous refinement — an ongoing practice that resonates in the core of the digital experience. Security Monitoring and Logging Security monitoring and logging is the process of collecting, analyzing, and storing data related to system and network activity in order to detect and respond to security incidents. The goal of security monitoring and logging is to provide visibility into system and network activity, identify potential threats or anomalies, and facilitate incident response. Employee Training and Awareness Employee training and awareness is an essential component of any security program. The goal of employee training and awareness is to educate employees about the importance of security and how to identify and respond to potential threats, as well as to promote a culture of security within the organization. Regular training sessions should be conducted for all employees on topics related to security, such as password management, phishing attacks, social engineering, safe browsing practices, and incident response procedures. These sessions should be conducted at regular intervals to ensure that employees are up to date with the latest security best practices and threats. Case Study: A Successful Security-First Design There is a company in Zurich, let's call it Company X, that deals with crypto assets and had an IPO in 2017. In July 2017, a security incident occurred, involving an attempt to sweep bitcoins from Company X's cryptocurrency holdings, which would have wiped out their assets by 60%. The incident was detected and thwarted due to the robust security measures implemented as part of the organization's security-first design. At 2 a.m. (local time), the company's security system flagged an unusual access pattern related to the company's cryptocurrency wallet. The system immediately triggered an alert to the security team, who promptly initiated an investigation. Upon closer examination, it was discovered that an insider, identified as Jakub, attempted to access the cryptocurrency wallet with unauthorized privileges. He had lower-level access since he was an existing employee of the company, but the action that he was trying to take could lead to an unauthorized transfer of Bitcoin and Ethereum to an external wallet. The success in preventing the unauthorized transfer was attributed to the security-first design philosophy implemented within the organization's infrastructure. 
Case Study: A Successful Security-First Design

There is a company in Zurich, let's call it Company X, that deals with crypto assets and had an IPO in 2017. In July 2017, a security incident occurred: an attempt to sweep bitcoins from Company X's cryptocurrency holdings that would have wiped out roughly 60% of its assets. The incident was detected and thwarted thanks to the robust security measures implemented as part of the organization's security-first design.

At 2 a.m. local time, the company's security system flagged an unusual access pattern related to the company's cryptocurrency wallet. The system immediately triggered an alert to the security team, who promptly initiated an investigation. Upon closer examination, it was discovered that an insider, identified as Jakub, had attempted to access the cryptocurrency wallet with unauthorized privileges. As an existing employee he held lower-level access, but the action he attempted could have led to an unauthorized transfer of Bitcoin and Ethereum to an external wallet. The success in preventing the unauthorized transfer was attributed to the security-first design philosophy implemented within the organization's infrastructure.

The following security measures were instrumental in detecting and responding to the incident:

Access logs monitoring – The company's security system continuously monitors access logs for any unusual or unauthorized activities. In this case, the system detected the anomalous access pattern to the cold wallets and triggered an immediate response.
The principle of least privilege – The insider attempted to use privileges beyond their authorized scope. The security-first design included strict controls on privilege escalation, preventing unauthorized access and transfer.

Upon detection of the incident, the security team took swift action to isolate the affected systems, revoke unauthorized privileges, and launch an internal investigation. The insider was identified, and appropriate disciplinary and legal actions were taken in accordance with company policies. The incident serves as a testament to the effectiveness of the security-first design philosophy implemented by the company. By proactively monitoring logs, enforcing strict access controls, and ensuring real-time alerts, the organization prevented the unauthorized transfer and demonstrated its commitment to safeguarding digital assets.

Conclusion

Moving forward, it is crucial to continue prioritizing security and embracing security-first design principles. By staying vigilant, regularly updating systems, conducting security testing, and fostering a culture of security awareness, both organizations and individuals can protect their assets, mitigate threats, and maintain customer trust. The ever-evolving software landscape requires ongoing adaptation and improvement to ensure the security of critical information assets. Embracing security-first design is a crucial step toward achieving long-term success in the digital age.

This is an article from DZone's 2023 Enterprise Security Trend Report. For more: Read the Report
In the world of enterprise technology, shared platforms like Kafka, RabbitMQ, Apache Flink clusters, data warehouses, and monitoring platforms are essential components of the robust infrastructure behind modern microservices architectures. Shared platforms act as mediators between microservices, aggregate their logs, provide cross-domain analytics, and deliver many other cross-cutting functionalities. In this blog, I will explore shared platforms from the perspectives of both platform owners and platform users, revealing best practices and strategies vital for a healthy technological ecosystem.

Shared Platforms: The Rationale Behind Setting Them Up

The existence of shared platforms in enterprise environments is born out of both necessity and strategic choice. Not every application can or should rely on its own isolated platform. Here's why shared platforms are a calculated decision for modern enterprises:

Specialized skillset requirements: The operation and maintenance of advanced platforms such as Kafka or Flink require a high level of expertise. Acquiring such specialized skills for every separate application is not practical or economical. Shared platforms allow for a pool of Subject Matter Experts (SMEs) who maintain and optimize these resources efficiently.
Centralized data sharing and management: In a microservices architecture, different applications often need to exchange data. Shared platforms facilitate this data exchange, acting as a central hub, which is more efficient than managing multiple points of integration between isolated platforms.
Cost optimization: The financial implications of licensing, infrastructure, and operational costs are significant when duplicated across multiple platforms. Shared platforms consolidate these costs, enabling businesses to benefit from economies of scale. By sharing resources, enterprises can optimize their investments and reduce overall expenses.
Resource utilization: Dedicated platforms can lead to underutilization of capacity, where servers and services sit idle because the demand from a single application does not match the provisioned capacity. Shared platforms ensure that resources are utilized more evenly and effectively, reducing waste and improving overall return on investment (ROI).
Agility and scalability: Shared platforms offer a flexible foundation that can quickly adapt to changing needs. As new applications come online or existing ones grow, shared platforms can scale to accommodate these demands without the lead time and costs associated with setting up new infrastructure.
Consistency and compliance: Ensuring compliance with industry regulations and standards can be complex and resource-intensive. Shared platforms can be designed to meet these requirements universally, providing a consistent and compliant environment for all applications.
Innovation and collaboration: Shared platforms can foster an environment of innovation and collaboration. They provide a common ground where different teams can work together, share insights, and develop new solutions that benefit from the shared platform's capabilities.
Disaster recovery and business continuity: Centralized shared platforms can be more easily managed for disaster recovery purposes. They allow for streamlined backup processes and quicker restoration of services in the event of a system outage or other disruptive incidents.
Ready integrations with other platforms: Shared platforms commonly come with a suite of integrations and connectors that facilitate interaction with other systems, including monitoring and alerting tools. This interconnectedness means that new applications can plug into a rich ecosystem of services without the need for additional integration work. The ease of connecting to established monitoring systems, for example, simplifies the monitoring of applications significantly.

Now that we have explored some advantages of shared platforms, it's important to talk about considerations for platform owners and users. This understanding will ensure that the benefits of shared platforms are fully realized while minimizing the potential for bottlenecks. The considerations for each group play a vital role in maintaining a healthy, efficient, and scalable shared platform ecosystem.

Protecting the Platform: The Owner's Perspective

Platform owners bear the crucial responsibility of safeguarding their platforms against potential abuses, such as a single application going rogue or the side effects of overprovisioning. Platform owners should also follow FinOps practices to keep their systems maintainable and sustainable from a cost perspective. To maintain a healthy, efficient, and cost-effective platform, several practices are essential:

Implementing quotas: Quotas are vital in preventing any single application from monopolizing resources. By setting limits on usage, platform owners ensure fair resource distribution among all consumers. RabbitMQ maximum queue sizes and Kafka throughput quotas are common examples we see in the enterprise (a minimal Kafka example appears at the end of this section).
Monitoring metrics: Continuous monitoring helps in identifying unusual patterns or potential issues before they escalate. Metrics provide insights into the platform's health and guide decision-making processes. Metrics also reveal usage and changes in demand, which is crucial for platform capacity planning.
Retention period policies: The retention period determines how long data is stored before being discarded. Implementing data retention policies is crucial to prevent disk space from filling up, which can lead to performance degradation or even system failures.
Emitting important updates: When there are updates or changes to the platform, such as a Flink cluster version upgrade or a maintenance window, it's important to inform all users so they can prepare for any potential impact on their data processing tasks. For instance, a version upgrade might introduce new features or deprecate old ones, and users need to be aware of these changes to adapt their applications accordingly. To implement this, platform owners can use a variety of tools and methods, such as email alerts, messaging services like Slack or Microsoft Teams, or even custom webhooks that integrate with users' own monitoring systems. The key is to ensure that the communication is timely, relevant, and actionable, thereby maintaining transparency and trust between the platform owners and the users.
The platform team: The platform team plays a critical role that extends beyond merely keeping the lights on; they are the custodians of the platform's integrity and efficiency. Governance is a key part of their responsibilities, encompassing tasks such as approving new publishers or consumers to ensure that those who use the platform contribute positively to the shared ecosystem. Maintaining an optimal team size is crucial to ensure agility and responsiveness. The team should be neither bloated, which can slow down processes, nor too lean, which can lead to burnout and oversight issues. Investing in continuous education for the team is equally important: a well-informed team stays ahead of the curve, adopting the latest best practices in technology and governance to streamline operations. Streamlining governance practices is also essential to avoid becoming a bottleneck for new applications looking to onboard. The goal is to establish clear, efficient processes that facilitate rather than slow down progress. This might involve automating certain approval processes or setting up self-service portals for routine requests, thereby freeing up the team to focus on more complex tasks that require their expertise. By optimizing governance procedures, the platform team not only enhances its own productivity but also drives the overall velocity of the enterprise's innovation and growth.
Cost concerns: In managing shared platforms, financial operations, or FinOps, is a critical discipline that platform owners must embrace. The essence of FinOps is to drive cost efficiency without compromising on performance or capability. This involves a continuous cycle of monitoring, optimization, and negotiation. Here's how platform owners can apply FinOps practices to their shared platforms:
Tiered storage: Implementing tiered storage solutions can lead to significant cost savings. By storing older, less frequently accessed data on cheaper storage while keeping hot data on higher-performance (but more expensive) storage, platform owners can optimize for both cost and performance.
Demand-driven scaling: Scaling in during periods of low demand helps reduce costs, while scaling out when demand spikes ensures performance isn't compromised. This elastic approach to infrastructure management is essential to maintaining a balance between cost and capability.
Negotiating cloud costs: Regular discussions with cloud account managers can reveal discount opportunities. Platform owners should be proactive in seeking out cost-saving measures through commitments or custom pricing packages.
Reserved instances: For predictable workloads with consistent demand, purchasing reserved instances can provide substantial savings over on-demand pricing. This guarantees a base level of resource availability and can be significantly more cost-effective.
Spot instances: Utilizing spot instances for non-critical or flexible workloads can further reduce costs. These instances are available at a fraction of the cost of on-demand resources but require the ability to handle possible interruptions.
Budgeting and reporting: Implementing a transparent budgeting and reporting process helps in tracking cloud spending. This should be done in real time where possible, to allow for immediate adjustments and prevent budget overruns.
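To make the quota practice above concrete, here is a minimal Java sketch, assuming Kafka's AdminClient quota API (available in Kafka 2.6 and later). The broker address, client id, and byte-rate limits are illustrative values, not recommendations.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

// Sets a per-client throughput quota so a single application cannot
// monopolize broker bandwidth. Client id and limits are illustrative.
public class KafkaQuotaExample {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Quota applies to any client connecting with client.id "orders-service".
            ClientQuotaEntity entity =
                    new ClientQuotaEntity(Map.of(ClientQuotaEntity.CLIENT_ID, "orders-service"));

            ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity, List.of(
                    new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0),   // ~1 MB/s in
                    new ClientQuotaAlteration.Op("consumer_byte_rate", 2_097_152.0))); // ~2 MB/s out

            admin.alterClientQuotas(List.of(alteration)).all().get();
            System.out.println("Throughput quota applied for client.id=orders-service");
        }
    }
}
```

RabbitMQ offers an analogous control through per-queue length limits and policies, which is the other example the quotas practice mentions.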
Protecting the Application: The User's Perspective

Platform users, including developers and consumer teams, must align their practices with platform constraints and capabilities. Writing software that can handle throttling, understanding resource limits, and maintaining open communication with platform owners are key. This approach ensures users can maximize the benefits of shared platforms without disrupting the overall system balance. As a developer or architect, while you engage with your platform teams and collaborate with their experts, it's crucial to study not only the strengths but also the limitations of the platform.
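Handling throttling gracefully is one concrete example of aligning with platform constraints. The sketch below retries a platform call with exponential backoff and jitter when the platform signals that a quota has been exceeded; PlatformClient and ThrottledException are hypothetical stand-ins for whatever SDK and error type your platform actually exposes.

```java
import java.util.concurrent.ThreadLocalRandom;

// Retry a platform call with exponential backoff and jitter when the
// shared platform throttles the request. PlatformClient and
// ThrottledException are hypothetical stand-ins for a real SDK.
public class ThrottleAwareCaller {

    interface PlatformClient {
        String publish(String payload) throws ThrottledException;
    }

    static class ThrottledException extends Exception {}

    static String publishWithBackoff(PlatformClient client, String payload) throws InterruptedException {
        long delayMs = 100;                       // initial backoff
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                return client.publish(payload);
            } catch (ThrottledException e) {
                if (attempt == 5) {
                    throw new IllegalStateException("Still throttled after " + attempt + " attempts", e);
                }
                long jitter = ThreadLocalRandom.current().nextLong(delayMs / 2 + 1);
                Thread.sleep(delayMs + jitter);   // back off before retrying
                delayMs *= 2;                     // exponential growth, bounded by the attempt limit
            }
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        PlatformClient flaky = payload -> {
            if (calls[0]++ < 2) throw new ThrottledException();   // throttled on the first two attempts
            return "accepted:" + payload;
        };
        System.out.println(publishWithBackoff(flaky, "hello"));
    }
}
```

In practice, honor any retry-after hint the platform returns, cap the total retry budget, and surface persistent throttling through your metrics so that capacity conversations with the platform team are grounded in data.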
Ensure that there's a seamless integration of your applications by thoroughly understanding the platform's SLAs and assessing whether they align with your business requirements. This due diligence is key to confirming that the platform can support your operational goals and drive your enterprise toward success.

Conclusion

In this blog, I've dived into the strategic details of shared platforms, highlighting their critical role in enterprise technology. From cost-saving efficiencies to fostering innovation, shared platforms are more than a choice; they are a necessity for agile and scalable growth. The next step is to be more proactive: actively reach out to your platform teams and establish a dialogue. Take the initiative to understand the service level agreements (SLAs) and complexities of the platforms you rely on. Assess how they match up with your business needs and where you might need to plan for adjustments. Together, as a unified “one team” of application and platform teams, you have the power to drive your enterprise to a new level of maturity in efficiency and innovation.