Containers
The proliferation of containers in recent years has increased the speed, portability, and scalability of software infrastructure and deployments across all kinds of application architectures and cloud-native environments. Now that more and more organizations have migrated to the cloud, what's next? Efficiently managing and monitoring containerized environments remains a crucial task for teams. With organizations looking to better leverage their containers, and some still working to migrate away from their monolithic environments, the path to containerization and architectural modernization remains a perpetual climb. In DZone's 2023 Containers Trend Report, we explore the current state of containers, key trends and advancements in global containerization strategies, and constructive content for modernizing your software architecture. This is examined through DZone-led research, expert community articles, and other helpful resources for designing and building containerized applications.
There are more than 250,000 companies and organizations around the world leaning on SharePoint to securely manage their most valuable documents, and more than 3 million total users. This widespread popularity makes the platform a market-leading document management solution, and this, by extension, makes it a worthwhile target for motivated threat actors.

Bypassing SharePoint's built-in security is an extremely difficult task, of course. The O365 environment provides tenants with powerful protection at every entry point, from exhaustive physical data center security up to leading-edge application security policies. Top-notch file encryption with SSL and TLS connections is applied to keep user data safe in transit, and BitLocker disk-level encryption with unique encryption keys is used to secure files at rest. Further, as infected file uploads have grown to become an extremely common attack vector, O365 provides built-in virus and malware detection policies (along with anti-phishing policies and various additional email link and attachment security measures) which can be customized extensively per individual or organizational tenants' needs. The list goes on, with each tenant's specific subscription level ultimately determining the extent of their built-in protection.

As powerful as SharePoint's customizable built-in security policies are, however, no storage platform's policies are ever intended to be applied as a single point of protection for sensitive data. Document storage security, like any branch of cybersecurity, is a moving target requiring myriad solutions working together to jointly create a formidable defense against evolving attack vectors. In other words, any tenant's threat profile can always be improved upon with selective layering of external security policies on top of built-in security policies.

In the remainder of this article, I'll demonstrate a free-to-use Virus Scanning API solution that can be integrated with a SharePoint Site Drive instance to scan files for viruses, malware, and a variety of non-malware content threats, working alongside O365's built-in asynchronous scanning to root out a wide range of file upload threat types.

Demonstration

The Advanced Virus Scan API below is intended to serve as a powerful layer of document storage security in conjunction with SharePoint's built-in customizable policies, directly scanning new file uploads in targeted Site Drive instances for a growing list of 17 million+ virus and malware signatures (including ransomware, spyware, trojans, etc.), while also performing full content verification to identify invalid file types and other non-malware threats hidden behind misleading file names and illegitimate file extensions.

This API also allows developers to set custom restrictions against unwanted file types in the API request body, so various unnecessary and potentially threatening file types can be detected and deleted outright regardless of the legitimacy of their contents. For example, a Site Drive storing contract documents likely only requires common file types like .DOCX or .PDF: limiting files to these types helps minimize risks without compromising workflow efficiency.

Below, I've outlined the information you'll need to integrate this API with your SharePoint Online Site Drive instance, and I've provided ready-to-run Java code examples to help you structure your API call with ease.
To start off, you'll need to gather the following SharePoint information to satisfy mandatory parameters in the API request body:

- Client ID (Client ID access credentials; can be obtained from the Azure Active Directory portal)
- Client Secret (Client Secret access credentials; also obtained from the Azure Active Directory portal)
- SharePoint Domain Name (i.e., yourdomain.sharepoint.com)
- Site ID (the specific SharePoint ID for the site drive you want to retrieve and scan files from)

Optionally, you can also gather the following SharePoint information:

- Tenant ID (pertaining to your Azure Active Directory)
- File Path (path of a specific file within your Site Drive)
- Item ID (e.g., DriveItem ID)

Once you've gathered all your mandatory information, you can start client SDK installation by adding the following reference to the repository in your Maven POM file (JitPack is used to dynamically compile the library):

XML

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Then you can wrap up by adding the following reference to the dependency:

XML

<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>

At this point, you can add the imports and copy the Java code examples to structure your API call:

Java

// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ScanCloudStorageApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ScanCloudStorageApi apiInstance = new ScanCloudStorageApi();
String clientID = "clientID_example"; // String | Client ID access credentials; see description above for instructions on how to get the Client ID from the Azure Active Directory portal.
String clientSecret = "clientSecret_example"; // String | Client Secret access credentials; see description above for instructions on how to get the Client Secret from the Azure Active Directory portal.
String sharepointDomainName = "sharepointDomainName_example"; // String | SharePoint Online domain name, such as mydomain.sharepoint.com
String siteID = "siteID_example"; // String | Site ID (GUID) of the SharePoint site you wish to retrieve the file from
String tenantID = "tenantID_example"; // String | Optional; Tenant ID of your Azure Active Directory
String filePath = "filePath_example"; // String | Path to the file within the drive, such as 'hello.pdf' or '/folder/subfolder/world.pdf'. If the file path contains Unicode characters, you must base64 encode the file path and prepend it with 'base64:', such as: 'base64:6ZWV6ZWV6ZWV6ZWV6ZWV6ZWV'.
String itemID = "itemID_example"; // String | SharePoint itemID, such as a DriveItem Id
Boolean allowExecutables = true; // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
Boolean allowInvalidFiles = true; // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
Boolean allowScripts = true; // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
Boolean allowPasswordProtectedFiles = true; // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
Boolean allowMacros = true; // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
Boolean allowXmlExternalEntities = true; // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
String restrictFileTypes = "restrictFileTypes_example"; // String | Specify a restricted set of file formats to allow as clean, as a comma-separated list of file formats; for example, .pdf,.docx,.png would allow only PDF, PNG, and Word document files. All files must pass content verification against this list of file formats; if they do not, the result will be returned as CleanResult=false. Set the restrictFileTypes parameter to null or an empty string to disable; default is disabled.

try {
    CloudStorageAdvancedVirusScanResult result = apiInstance.scanCloudStorageScanSharePointOnlineFileAdvanced(clientID, clientSecret, sharepointDomainName, siteID, tenantID, filePath, itemID, allowExecutables, allowInvalidFiles, allowScripts, allowPasswordProtectedFiles, allowMacros, allowXmlExternalEntities, restrictFileTypes);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ScanCloudStorageApi#scanCloudStorageScanSharePointOnlineFileAdvanced");
    e.printStackTrace();
}

To satisfy the request authentication parameter, you'll need to provide a free-tier API key, which will allow you to scan up to 800 files per month. Within this request body, you can set Booleans to apply custom non-malware threat policies against files containing executables, invalid files, scripts, password-protected files, macros, XML external entities, insecure deserialization, and HTML, and you can provide a comma-separated list of acceptable file types in the restrictFileTypes parameter to disallow unwanted file extensions. Any files violating these policies will automatically receive a CleanResult: False value in the API response body, which is the same value assigned to files containing viruses and malware. The idea is to enact 360-degree content protection in a single request so you can quickly delete (or quarantine/analyze) files that may pose a serious risk to your system.
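To make the custom policy concrete, here is a hedged sketch of how these request parameters might be set for the contract-document Site Drive scenario described earlier, replacing the placeholder assignments shown above. The values are illustrative assumptions for that scenario, not defaults of the API.

Java

// Illustrative parameter values for a Site Drive that should only contain contract documents.
// These are assumptions for the scenario above, not values mandated by the API.
Boolean allowExecutables = false;            // block program code outright
Boolean allowInvalidFiles = false;           // block files that fail content verification
Boolean allowScripts = false;                // block PHP, Python, and other script files
Boolean allowPasswordProtectedFiles = false; // block encrypted archives that cannot be scanned
Boolean allowMacros = false;                 // block embedded Office macros
Boolean allowXmlExternalEntities = false;    // block XXE payloads hidden in XML content
String restrictFileTypes = ".docx,.pdf";     // treat anything other than Word documents and PDFs as not clean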
Below, I've provided a full example API response for your reference:

JSON

{
  "Successful": true,
  "CleanResult": true,
  "ContainsExecutable": true,
  "ContainsInvalidFile": true,
  "ContainsScript": true,
  "ContainsPasswordProtectedFile": true,
  "ContainsRestrictedFileFormat": true,
  "ContainsMacros": true,
  "VerifiedFileFormat": "string",
  "FoundViruses": [
    {
      "FileName": "string",
      "VirusName": "string"
    }
  ],
  "ErrorDetailedDescription": "string",
  "FileSize": 0,
  "ContentInformation": {
    "ContainsJSON": true,
    "ContainsXML": true,
    "ContainsImage": true,
    "RelevantSubfileName": "string"
  }
}

It's worth noting that regardless of how you choose to set your custom threat rules, files containing JSON, XML, or embedded images will be labeled as such in the API response as well.
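In practice, CleanResult is the field most workflows branch on. The snippet below is a minimal sketch of acting on the response object from the try block shown earlier; it assumes the generated CloudStorageAdvancedVirusScanResult model exposes conventional getters such as getCleanResult() and getFoundViruses(), and the quarantineFile() helper is a hypothetical placeholder for your own remediation logic, not part of the Cloudmersive SDK.

Java

// Minimal sketch: branch on the scan verdict and report any named threats.
// quarantineFile() is a hypothetical helper, not part of the Cloudmersive SDK.
if (Boolean.TRUE.equals(result.getCleanResult())) {
    // The file passed virus, malware, and content-verification checks.
    System.out.println("Clean file: " + filePath);
} else {
    // Delete or quarantine the file before it is consumed elsewhere.
    quarantineFile(siteID, itemID);
    if (result.getFoundViruses() != null) {
        result.getFoundViruses().forEach(v ->
                System.err.println("Threat detected: " + v.getVirusName() + " in " + v.getFileName()));
    }
}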
Abstract

"Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application" [Resource 3]. The purpose of this article is to outline a few ways of creating idempotent changes when the database modifications are managed with Liquibase.

Throughout the lifetime of a software product that has such a tier, various database modifications are applied as it evolves. The more robust the modifications are, the more maintainable the solution is. In order to accomplish such a way of working, it is usually a good practice to design the executed changesets to have zero side effects, that is, to be able to be run as many times as needed with the same end result. The simple proof of concept built here aims to showcase how Liquibase changesets may be written to be idempotent. Moreover, the article explains in more depth what exactly happens when the application starts.

Set Up

- Java 17
- Spring Boot v.3.1.0
- Liquibase 4.20.0
- PostgreSQL Driver 42.6.0
- Maven 3.6.3

Proof of Concept

As PostgreSQL is the database used here, first and foremost one shall create a new schema, liquidempo. This operation is easy to accomplish by issuing the following SQL command once connected to the database.

SQL

create schema liquidempo;

At the application level:

- The Maven Spring Boot project is created and configured to use the PostgreSQL Driver, Spring Data JPA, and Liquibase dependencies.
- A simple entity is created, Human, with only one attribute: a unique identifier which is also the primary key at the database level.

Java

@Entity
@Table(name = "human")
@SequenceGenerator(sequenceName = "human_seq", name = "CUSTOM_SEQ_GENERATOR", allocationSize = 1)
public class Human {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO, generator = "CUSTOM_SEQ_GENERATOR")
    @Column(name = "id")
    private Long id;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }
}

For convenience, when entities are stored, their unique identifiers are generated using a database sequence called human_seq. The data source is configured as usual in the application.properties file.

Properties files

spring.datasource.type=com.zaxxer.hikari.HikariDataSource
spring.datasource.url=jdbc:postgresql://localhost:5432/postgres?currentSchema=liquidempo&useUnicode=true&characterEncoding=utf8&useSSL=false&allowPublicKeyRetrieval=true
spring.datasource.username=postgres
spring.datasource.password=123456
spring.jpa.database-platform=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.hibernate.ddl-auto=none

The previously created schema is referred to in the connection URL. DDL handling is disabled, as the infrastructure and the data are intended to be persistent when the application is restarted.

As Liquibase is the database migration manager, the changelog path is configured in the application.properties file as well.

Properties files

spring.liquibase.change-log=classpath:/db/changelog/db.changelog-root.xml

For now, the db.changelog-root.xml file is empty. The current state of the project requires a few simple changesets in order to create the database elements depicted around the Human entity: the table, the sequence, and the primary key constraint.
XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <changeSet author="horatiucd" id="100">
        <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
    </changeSet>

    <changeSet author="horatiucd" id="200">
        <createTable tableName="human">
            <column name="id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
    </changeSet>

    <changeSet author="horatiucd" id="300">
        <addPrimaryKey columnNames="id" constraintName="human_pk" tableName="human"/>
    </changeSet>

</databaseChangeLog>

In order for these to be applied, they need to be recorded as part of the db.changelog-root.xml file, as indicated below.

XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <include file="db/changelog/human_init.xml"/>

</databaseChangeLog>

When the application is restarted, the three changesets are executed in the order they are declared.

Plain Text

INFO 9092 --- [main] liquibase.database    : Set default schema name to liquidempo
INFO 9092 --- [main] liquibase.lockservice : Successfully acquired change log lock
INFO 9092 --- [main] liquibase.changelog   : Creating database history table with name: liquidempo.databasechangelog
INFO 9092 --- [main] liquibase.changelog   : Reading from liquidempo.databasechangelog
Running Changeset: db/changelog/human_init.xml::100::horatiucd
INFO 9092 --- [main] liquibase.changelog   : Sequence human_seq created
INFO 9092 --- [main] liquibase.changelog   : ChangeSet db/changelog/human_init.xml::100::horatiucd ran successfully in 6ms
Running Changeset: db/changelog/human_init.xml::200::horatiucd
INFO 9092 --- [main] liquibase.changelog   : Table human created
INFO 9092 --- [main] liquibase.changelog   : ChangeSet db/changelog/human_init.xml::200::horatiucd ran successfully in 4ms
Running Changeset: db/changelog/human_init.xml::300::horatiucd
INFO 9092 --- [main] liquibase.changelog   : Primary key added to human (id)
INFO 9092 --- [main] liquibase.changelog   : ChangeSet db/changelog/human_init.xml::300::horatiucd ran successfully in 8ms
INFO 9092 --- [main] liquibase             : Update command completed successfully.
INFO 9092 --- [main] liquibase.lockservice : Successfully released change log lock

Moreover, they are recorded as separate rows in the databasechangelog database table.
Plain Text

+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|id |author   |filename                   |dateexecuted              |orderexecuted|exectype|md5sum                            |description                                           |
+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|100|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.184239|1            |EXECUTED|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|200|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.193031|2            |EXECUTED|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|300|horatiucd|db/changelog/human_init.xml|2023-05-26 16:23:17.204184|3            |EXECUTED|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
+---+---------+---------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+

So far, everything is straightforward, nothing out of the ordinary: a simple Spring Boot application whose database changes are managed with Liquibase.

When examining the human_init.xml file above, one can easily identify the three scripts that result from the three changesets. None is idempotent. This means that if they are executed again (although there is no reason for doing so here), errors will occur because the human_seq sequence, the human table, and the human_pk primary key already exist.

Idempotent Changesets

If the SQL code that results from the XML changesets had been written directly and aimed to be idempotent, it would have read as follows:

SQL

CREATE SEQUENCE IF NOT EXISTS human_seq
    INCREMENT 1
    MINVALUE 1
    MAXVALUE 99999999999;

CREATE TABLE IF NOT EXISTS human (
    id SERIAL CONSTRAINT human_pk PRIMARY KEY
);

If the two commands are executed several times, no errors occur and the outcome remains the same. After the first run, the sequence, the table, and the constraint are created; every subsequent execution leaves them in the same usable state. The aim is to accomplish the same in the written Liquibase changesets (changelog).

According to the Liquibase documentation [Resource 1]: "Preconditions are tags you add to your changelog or individual changesets to control the execution of an update based on the state of the database. Preconditions let you specify security and standardization requirements for your changesets. If a precondition on a changeset fails, Liquibase does not deploy that changeset."

These constructs may be configured in various ways, either at changelog or changeset level. For simplicity, the three changesets of this proof of concept will be made idempotent. Basically, whenever a changeset fails to execute because the entity (sequence, table, or primary key) already exists, it would be convenient to continue rather than halt the execution of the entire changelog and prevent the application from starting. In this direction, Liquibase preconditions provide at least two options:

- skip over the changeset and continue with the changelog, or
- skip over the changeset, mark it as executed, and continue with the changelog.

Either of the two can be configured by adding a preConditions tag in the changeset of interest and setting the onFail attribute to CONTINUE (the former case) or MARK_RAN (the latter case).
In pseudo-code, this looks as below:

XML

<changeSet author="horatiucd" id="100">
    <preConditions onFail="CONTINUE or MARK_RAN">
        ...
    </preConditions>
    ...
</changeSet>

This seems in line with the initial desire: execute the changeset only if the preconditions are met. Next, each of the two situations is analyzed.

onFail="CONTINUE"

The changelog file, human_init_idempo_continue.xml, becomes as below:

XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <changeSet author="horatiucd" id="101">
        <preConditions onFail="CONTINUE">
            <not>
                <sequenceExists sequenceName="human_seq"/>
            </not>
        </preConditions>
        <createSequence sequenceName="human_seq" startValue="1" incrementBy="1"/>
    </changeSet>

    <changeSet author="horatiucd" id="201">
        <preConditions onFail="CONTINUE">
            <not>
                <tableExists tableName="human"/>
            </not>
        </preConditions>
        <createTable tableName="human">
            <column name="id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
    </changeSet>

    <changeSet author="horatiucd" id="301">
        <preConditions onFail="CONTINUE">
            <not>
                <primaryKeyExists primaryKeyName="human_pk" tableName="human"/>
            </not>
        </preConditions>
        <addPrimaryKey columnNames="id" constraintName="human_pk" tableName="human"/>
    </changeSet>

</databaseChangeLog>

For each item, the precondition checks that it does not already exist. When running the application, the log shows what is executed:

Plain Text

INFO 49016 --- [main] liquibase.database    : Set default schema name to liquidempo
INFO 49016 --- [main] liquibase.changelog   : Reading from liquidempo.databasechangelog
INFO 49016 --- [main] liquibase.lockservice : Successfully acquired change log lock
Running Changeset: db/changelog/human_init_idempo_continue.xml::101::horatiucd
INFO 49016 --- [main] liquibase.changelog   : Continuing past: db/changelog/human_init_idempo_continue.xml::101::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::201::horatiucd
INFO 49016 --- [main] liquibase.changelog   : Continuing past: db/changelog/human_init_idempo_continue.xml::201::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::301::horatiucd
INFO 49016 --- [main] liquibase.changelog   : Continuing past: db/changelog/human_init_idempo_continue.xml::301::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
INFO 49016 --- [main] liquibase             : Update command completed successfully.
INFO 49016 --- [main] liquibase.lockservice : Successfully released change log lock

As expected, all three preconditions failed and the execution of the changelog continued. The databasechangelog database table does not have any records in addition to the previous three, which means the changesets will be attempted again at the next startup of the application.

onFail="MARK_RAN"

The changelog file, human_init_idempo_mark_ran.xml, is similar to human_init_idempo_continue.xml. The only difference is the onFail attribute, which is set to onFail="MARK_RAN".
The db.changelog-root.xml root changelog now looks as below:

XML

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog https://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.17.xsd">

    <include file="db/changelog/human_init.xml"/>
    <include file="db/changelog/human_init_idempo_continue.xml"/>
    <include file="db/changelog/human_init_idempo_mark_ran.xml"/>

</databaseChangeLog>

For this proof of concept, all three files were kept on purpose, in order to be able to observe the behavior in detail. If the application is restarted, no errors are encountered and the log depicts the following:

Plain Text

INFO 38788 --- [main] liquibase.database    : Set default schema name to liquidempo
INFO 38788 --- [main] liquibase.changelog   : Reading from liquidempo.databasechangelog
INFO 38788 --- [main] liquibase.lockservice : Successfully acquired change log lock
INFO 38788 --- [main] liquibase.changelog   : Reading from liquidempo.databasechangelog
Running Changeset: db/changelog/human_init_idempo_continue.xml::101::horatiucd
INFO 38788 --- [main] liquibase.changelog   : Continuing past: db/changelog/human_init_idempo_continue.xml::101::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::201::horatiucd
INFO 38788 --- [main] liquibase.changelog   : Continuing past: db/changelog/human_init_idempo_continue.xml::201::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_continue.xml::301::horatiucd
INFO 38788 --- [main] liquibase.changelog   : Continuing past: db/changelog/human_init_idempo_continue.xml::301::horatiucd despite precondition failure due to onFail='CONTINUE': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::101::horatiucd
INFO 38788 --- [main] liquibase.changelog   : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::101::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::201::horatiucd
INFO 38788 --- [main] liquibase.changelog   : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::201::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': db/changelog/db.changelog-root.xml : Not precondition failed
Running Changeset: db/changelog/human_init_idempo_mark_ran.xml::301::horatiucd
INFO 38788 --- [main] liquibase.changelog   : Marking ChangeSet: "db/changelog/human_init_idempo_mark_ran.xml::301::horatiucd" as ran despite precondition failure due to onFail='MARK_RAN': db/changelog/db.changelog-root.xml : Not precondition failed
INFO 38788 --- [main] liquibase             : Update command completed successfully.
INFO 38788 --- [main] liquibase.lockservice : Successfully released change log lock

The changesets with onFail="CONTINUE" were attempted again, since each run counts as a new attempt, while the ones with onFail="MARK_RAN" were marked as such in the databasechangelog table and will be passed over at the next start-up.
Plain Text

+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|id |author   |filename                                   |dateexecuted              |orderexecuted|exectype|md5sum                            |description                                           |
+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+
|100|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.184239|1            |EXECUTED|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|200|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.193031|2            |EXECUTED|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|300|horatiucd|db/changelog/human_init.xml                |2023-05-26 16:23:17.204184|3            |EXECUTED|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
|101|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.453305|4            |MARK_RAN|8:db8c5fb392dc96efa322da2c326b5eba|createSequence sequenceName=human_seq                 |
|201|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.463021|5            |MARK_RAN|8:ed8e5e7df5edb17ed9a0682b9b640d7f|createTable tableName=human                           |
|301|horatiucd|db/changelog/human_init_idempo_mark_ran.xml|2023-05-29 16:40:26.475153|6            |MARK_RAN|8:a2d6eff5a1e7513e5ab7981763ae532b|addPrimaryKey constraintName=human_pk, tableName=human|
+---+---------+-------------------------------------------+--------------------------+-------------+--------+----------------------------------+------------------------------------------------------+

At the next run of the application, the log will be similar to the one where onFail was set to "CONTINUE". One more observation is worth making at this point: changesets whose preconditions do not fail are executed normally and recorded with exectype = EXECUTED in the databasechangelog table.

Conclusions

This article presented two ways of writing idempotent Liquibase changesets, a practice that makes applications more robust and easier to maintain. This was accomplished by leveraging the changeset preConditions tag inside the changelog files. While both onFail attribute values, CONTINUE and MARK_RAN, may be used depending on the actual performed operation, the latter seems more appropriate for this proof of concept, as it does not attempt to re-run the changesets at every start-up of the application.

Resources

1. Liquibase Documentation
2. Source code for the sample application
3. Idempotence
Minikube is probably the simplest and most approachable K8s cluster. As a lightweight K8s distribution designed to run with low resources, an effective Minikube setup doesn't require anything more than your own laptop. From this perspective, Minikube is a great choice for development environments, giving quick access to infrastructure elements like nodes, pods, deployments, services, and other K8s subtleties that are more difficult to work with in a full-scale scenario. As a K8s-native runtime, Quarkus supports various types of clusters, including but not limited to Minikube, Kind, OpenShift, EKS (Elastic Kubernetes Service), AKS (Azure Kubernetes Service), etc.

Packaging to Docker Images

Quarkus offers several choices to package a cloud-native application, based on different techniques and tools, as follows:

- Jib
- Docker
- S2I (Source to Image)

In our prototype, we're using Jib, which, as opposed to the other two methods, has the advantage of not requiring a Docker daemon running on the host machine. In order to take advantage of it, just include the following Maven dependency in the master pom.xml file:

XML

...
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-container-image-jib</artifactId>
</dependency>
...

Run a new build:

Shell

mvn -DskipTests -Dquarkus.container-image.build=true clean package

When finished, if there is a Docker daemon running locally, the container image creation may be checked as follows:

Shell

$ docker images
REPOSITORY                     TAG              IMAGE ID       CREATED      SIZE
aws-quarkus/aws-camelk-sqs     1.0.0-SNAPSHOT   63072102ba00   9 days ago   382MB
aws-quarkus/aws-camelk-jaxrs   1.0.0-SNAPSHOT   776a0f99c5d6   9 days ago   387MB
aws-quarkus/aws-camelk-s3      1.0.0-SNAPSHOT   003f0a987901   9 days ago   382MB
aws-quarkus/aws-camelk-file    1.0.0-SNAPSHOT   1130a9c3dfcb   9 days ago   382MB
...

Deploying to Minikube

Our Apache Camel microservices don't require any modification or refactoring in order to be deployed to Minikube. However, the build process, consisting of all the steps necessary for testing, packaging, and deploying the application to K8s, has to be adapted to become cloud-aware and to take advantage of the Minikube peculiarities. Hence, the first tweak we need to make is to add the quarkus-minikube Maven artifact to our master pom.xml file, as shown below:

XML

...
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-minikube</artifactId>
</dependency>
...

This artifact will generate Minikube-specific manifest files in the project's target/kubernetes directory. As everyone knows, everything in K8s is described in YAML (YAML Ain't Markup Language) notation. And while K8s historically requires a quite heavy YAML authoring and editing process, using this artifact has the advantage of automatically generating the required YAML files or, at least, a base skeleton that might be enriched later.

Performing a new build by running the mvn -DskipTests clean install command at the project's root level will produce two categories of files in the target/kubernetes directory for each Quarkus microservice:

- A kubernetes.yaml/json pair of files containing the manifest describing the microservice's general K8s resources
- A minikube.yaml/json pair of files containing the manifest describing the microservice's Minikube-specific resources

For example, for the aws-camelk-jaxrs microservice, going to aws-camelk-jaxrs/target/kubernetes and opening the minikube.yaml file, you'll see the following fragment:

YAML

...
spec:
  ports:
    - name: http
      nodePort: 30326
      port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    app.kubernetes.io/name: aws-camel-jaxrs
    app.kubernetes.io/part-of: aws-camelk
    app.kubernetes.io/version: 1.0.0-SNAPSHOT
  type: NodePort
...

This manifest fragment defines a K8s service of type NodePort that targets the container's TCP port 8080 and is exposed on the node's port 30326. This configuration is Minikube-specific: for other clusters like EKS, the type of the configured K8s service would be ClusterIP instead of NodePort. The selector paragraph defines the service name, version, and package, customized via the following properties:

Properties files

...
quarkus.kubernetes.part-of=aws-camelk
quarkus.kubernetes.name=aws-camel-jaxrs
quarkus.kubernetes.version=1.0.0-SNAPSHOT
...

Another important point to notice is the AWS credentials definition. Our microservices need access to AWS and, for that purpose, properties like the region name and the access key ID and secret should be defined. While the region name isn't a piece of sensitive information and may be defined as a clear-text property, this isn't the case for the access key-related properties, which require the use of K8s secrets. The following listing shows a fragment of the application.properties file:

Properties files

...
quarkus.kubernetes.env.vars.aws_region=eu-west-3
quarkus.kubernetes.env.secrets=aws-secret
...

Here, the region name is defined as eu-west-3 in plain text, while the AWS access key credentials are provided via a K8s secret named aws-secret.

Running on Minikube

We have just reviewed how to refactor our Maven-based build process in order to make it K8s-native. In order to run the microservices on Minikube, proceed as follows.

Start Minikube

Minikube should be installed, of course, on your box. That's a very easy operation; just follow the guide here. Once installed, you need to start Minikube:

Shell

$ minikube start
...
$ eval $(minikube -p minikube docker-env)

After starting Minikube, the last command in the listing above points the local Docker environment at the Docker daemon running inside Minikube, so that the newly generated images are published directly to it.

Clone the Project From GitHub

Run the following commands to clone the repository:

Shell

$ git clone https://github.com/nicolasduminil/aws-camelk.git
$ cd aws-camelk
$ git checkout minikube

Create a K8s Namespace and Secret

Run the following commands to create the K8s namespace and secret:

Shell

$ kubectl create namespace quarkus-camel
$ kubectl apply -f aws-secret.yaml --namespace quarkus-camel

Here, after creating a K8s namespace named quarkus-camel, we create a K8s secret in this same namespace by applying the config in the manifest file named aws-secret.yaml, shown below:

YAML

apiVersion: v1
kind: Secret
metadata:
  name: aws-secret
type: Opaque
data:
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...

The properties labeled AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are BASE64 encoded.

Start the Microservices

The same script (start-ms.sh) that we used in Part 2 to start the microservices may be used again for the same purpose. It has been modified as shown below:

Shell

#!/bin/sh
./delete-all-buckets.sh
./create-queue.sh
mvn -DskipTests -Dquarkus.kubernetes.deploy clean package
sleep 3
./copy-xml-file.sh

Here, we start by cleaning up the environment and deleting all the S3 buckets named "mys3*", if any. Then we create an SQS queue named "myQueue" if it doesn't exist already.
If it exists, we purge it by removing all the messages stored in it. The Maven command uses the quarkus.kubernetes.deploy property so that the newly generated Docker images are deployed to Minikube. Last but not least, copying an XML file into the input directory will trigger the Camel route named aws-camelk-file, which starts the pipeline.

Observe the Log Files

In order to follow the microservices execution, run the commands below:

Shell

$ kubectl get pods --namespace quarkus-camel
$ kubectl logs <pod-id> --namespace quarkus-camel

Stop the Microservices

In order to stop the microservices, run the command below:

Shell

./kill-ms.sh

Clean up the AWS Infrastructure

Don't forget to clean up your AWS infrastructure to avoid being billed:

Shell

$ ./delete-all-buckets.sh
$ ./purge-sqs-queue.sh
$ ./delete-sqs-queue.sh

Stop Minikube

Last but not least, stop Minikube:

Shell

$ eval $(minikube -p minikube docker-env --unset)
$ minikube stop

Enjoy!

Previous Posts in This Series:

- Microservices With Apache Camel and Quarkus (Part 1)
- Microservices With Apache Camel and Quarkus (Part 2)
In this article, we will briefly describe and roughly compare the performance of various request-handling approaches that could be used in Spring Boot applications. Efficient request handling plays a crucial role in developing high-performance back-end applications.

Traditionally, most developers use the Tomcat embedded in Spring Boot applications, with its default Thread Pool for processing requests under the hood. However, alternative approaches have been gaining popularity in recent times. WebFlux, which utilizes reactive request handling with an event loop, and Kotlin coroutines with their suspend functions have become increasingly favored. Additionally, Project Loom, which introduces virtual threads, is set to be released in Java 21. However, even though Java 21 is not released yet, we can already experiment with Project Loom since Java 19. Therefore, in this article, we will also explore the use of virtual threads for request handling. Furthermore, we will conduct a performance comparison of the different request-handling approaches using JMeter for high-load testing.

Performance Testing Model

We will compare request-handling approaches using the following model. The flow is simple:

- The client (JMeter) will send 500 requests in parallel. Each thread will await a response and then send another request, repeatedly. The request timeout is 10 seconds. The testing period will last for 1 minute.
- The Spring Boot application that we are testing will receive the requests from the client and await responses from the slow server.
- The slow server responds with a random timeout. The maximum response time is 4 seconds; the average response time is 2 seconds.

The more requests that are handled, the better the performance results are.

1. Thread Pool

By default, Tomcat uses a Thread Pool (sometimes called a connection pool) for request handling. The concept is simple: when a request arrives at Tomcat, it assigns a thread from the thread pool to handle the request. This allocated thread remains blocked until a response is generated and sent back to the client. By default, the Thread Pool contains up to 200 threads. That basically means that only 200 requests can be handled at any single point in time. But this and other parameters are configurable.

Default Thread Pool

Let's make performance measurements on a simple Spring Boot application with embedded Tomcat and the default Thread Pool. The Thread Pool holds 200 threads. On each request, the server makes a blocking call to another server with an average response time of two seconds, so we can anticipate a throughput of 100 requests per second.

Total requests | Throughput, req/sec | Avg response time, ms | Min, ms | 90% line, ms | Max, ms | Error rate, %
3356           | 91.2                | 4787                  | 155     | 6645         | 7304    | 0.00

Remarkably, the actual result is quite close, with a measured throughput of 91.2 requests per second.

Increased Thread Pool

Let's increase the maximum number of threads in the Thread Pool to 400 using application properties:

server:
  tomcat:
    threads:
      max: 400

And let's run the tests again:

Total requests | Throughput, req/sec | Avg response time, ms | Min, ms | 90% line, ms | Max, ms | Error rate, %
6078           | 176.7               | 2549                  | 10      | 4184         | 4855    | 0.00

It is expected that doubling the number of threads in the Thread Pool would improve the throughput by roughly two times, and the measured result confirms this. But note that increasing the number of threads in a Thread Pool without considering the system's capacity and resource limitations can have adverse effects on performance, stability, and overall system behavior.
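For reference, the request handler in both Thread Pool scenarios is just a plain blocking endpoint. The sketch below is an assumption of what it can look like; the article only states that a regular blocking call is made to the slow server (here via RestTemplate, with the slow server assumed at http://127.0.0.1:8000, matching the WebClient configuration shown later).

Java

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

// Minimal sketch of a blocking handler for the Thread Pool scenarios (an assumption,
// not the article's exact code). The Tomcat worker thread stays blocked here until
// the slow server responds (about 2 seconds on average, up to 4 seconds).
@RestController
public class BlockingController {

    private final RestTemplate restTemplate = new RestTemplate();

    @GetMapping("/")
    public String callSlowServer() {
        return restTemplate.getForObject("http://127.0.0.1:8000/", String.class);
    }
}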
It is crucial to carefully tune and optimize the Thread Pool size based on the specific requirements and capabilities of the system.

2. WebFlux

Instead of assigning a dedicated thread per request, WebFlux employs an event loop model with a small number of threads, typically referred to as an event loop group. This allows it to handle a high number of concurrent requests with a limited number of threads. Requests are processed asynchronously, and the event loop can efficiently handle multiple requests concurrently by leveraging non-blocking I/O operations. WebFlux is well-suited for scenarios that require high scalability, such as handling a large number of long-lived connections or streaming data.

Ideally, WebFlux applications should be fully written in a reactive way; sometimes, this is not that easy. But we have a simple application, and we can just use a WebClient for calls to the slow server.

Java

@Bean
public WebClient slowServerClient() {
    return WebClient.builder()
            .baseUrl("http://127.0.0.1:8000")
            .build();
}

In the context of Spring WebFlux, a RouterFunction is an alternative approach to request mapping and handling:

Java

@Bean
public RouterFunction<ServerResponse> routes(WebClient slowServerClient) {
    return route(GET("/"), (ServerRequest req) -> ok()
            .body(slowServerClient
                    .get()
                    .exchangeToFlux(resp -> resp.bodyToFlux(Object.class)),
                    Object.class
            ));
}

But traditional controllers could still be used instead. So, let's run the tests:

Total requests | Throughput, req/sec | Avg response time, ms | Min, ms | 90% line, ms | Max, ms | Error rate, %
7443           | 219.2               | 2068                  | 12      | 3699         | 4381    | 0.00

The results are even better than in the case of the increased Thread Pool. But it's important to note that both the Thread Pool and WebFlux have their strengths and weaknesses, and the choice depends on the specific requirements, the nature of the workload, and the expertise of the development team.

3. Coroutines and WebFlux

Kotlin coroutines can be effectively used for request handling, providing an alternative approach in a more concurrent and non-blocking manner. Spring WebFlux supports coroutines for request handling, so let's write such a controller:

Kotlin

@GetMapping
suspend fun callSlowServer(): Flow<Any> {
    return slowServerClient.get().awaitExchange().bodyToFlow(String::class)
}

A suspend function can perform long-running or blocking operations without blocking the underlying thread. The Kotlin Coroutine Fundamentals article describes the basics pretty well. So, let's run the tests again:

Total requests | Throughput, req/sec | Avg response time, ms | Min, ms | 90% line, ms | Max, ms | Error rate, %
7481           | 220.4               | 2064                  | 5       | 3615         | 4049    | 0.00

Roughly speaking, the results do not differ from those of the WebFlux application without coroutines. But the same WebFlux machinery was still used alongside the coroutines, so the test probably didn't fully reveal the potential of coroutines. Next time, it's worth trying Ktor.

4. Virtual Threads (Project Loom)

Virtual threads, or fibers, are a concept introduced by Project Loom. Virtual threads have a significantly lower memory footprint compared to native threads, allowing applications to create and manage a much larger number of threads without exhausting system resources. They can also be created and switched between more quickly, reducing the overhead associated with thread creation.
The switching of virtual thread execution is handled internally by the Java Virtual Machine (JVM) and can happen on:

- Voluntary suspension: A virtual thread can explicitly suspend its execution using methods like Thread.sleep() or CompletableFuture.get(). When a virtual thread suspends itself, its execution is temporarily paused, and the JVM can switch to executing another virtual thread.
- Blocking operations: When a virtual thread encounters a blocking operation, such as waiting for I/O or acquiring a lock, it can be automatically suspended. This allows the JVM to utilize the underlying native threads more efficiently by using them to execute other virtual threads that are ready to run.

If you're interested in the topic of virtual and carrier threads, read this nice article on DZone — Java Virtual Threads — Easy Introduction.

Virtual threads will finally be released in Java 21, but we can already test them since Java 19. We just have to explicitly specify the JVM option:

--enable-preview

Basically, all we have to do is replace the Tomcat Thread Pool with a virtual thread-based executor:

Java

@Bean
public TomcatProtocolHandlerCustomizer<?> protocolHandler() {
    return protocolHandler -> protocolHandler.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
}

So, instead of the Thread Pool executor, we've started using virtual threads. Let's run the tests:

Total requests | Throughput, req/sec | Avg response time, ms | Min, ms | 90% line, ms | Max, ms | Error rate, %
7427           | 219.3               | 2080                  | 12      | 3693         | 4074    | 0.00

The results are practically the same as in the case of WebFlux, but we didn't utilize any reactive techniques at all. Even for calls to the slow server, a regular blocking RestTemplate was used. All we did was replace the Thread Pool with a virtual thread executor.

Conclusion

Let's gather the test results into one table:

Request handler                     | Total requests | Throughput, req/sec | Avg response time, ms | Min, ms | 90% line, ms | Max, ms | Error rate, %
Thread Pool (200 threads)           | 3356           | 91.2                | 4787                  | 155     | 6645         | 7304    | 0.00
Increased Thread Pool (400 threads) | 6078           | 176.7               | 2549                  | 10      | 4184         | 4855    | 0.00
WebFlux                             | 7443           | 219.2               | 2068                  | 12      | 3699         | 4381    | 0.00
WebFlux + Coroutines                | 7481           | 220.4               | 2064                  | 5       | 3615         | 4049    | 0.00
Virtual Threads (Project Loom)      | 7427           | 219.3               | 2080                  | 12      | 3693         | 4074    | 0.00

The performance tests conducted in this article are quite superficial, but we can draw some preliminary conclusions:

- Thread Pool demonstrated inferior performance results. Increasing the number of threads might provide improvements, but it should be done considering the system's capacity and resource limitations. Nevertheless, a thread pool can still be a viable option, especially when dealing with numerous blocking operations.
- WebFlux demonstrated really good results. However, it is important to note that, to fully benefit from its performance, the whole code should be written in a reactive style.
- Using coroutines in conjunction with WebFlux yielded similar performance results. Perhaps we should have tried them with Ktor, a framework specifically designed to harness the power of coroutines.
- Using Virtual Threads (Project Loom) also yielded similar results. However, it's important to note that we didn't modify the code or employ any reactive techniques; the only change made was replacing the Thread Pool with a virtual thread executor. Despite this simplicity, the performance results were significantly improved compared to using a Thread Pool.
Therefore, we can tentatively conclude that the release of virtual threads in Java 21 will significantly transform the approach to request handling in existing servers and frameworks. The entire code, along with the JMeter test file, is available on GitHub as usual.
These days, distributed version control systems like Git have "won the war" of version control. One of the arguments I used to hear when DVCSs were gaining traction was around how easy it is to branch and merge with a VCS like Git. However, I'm a big fan of Trunk-Based Development (TBD), and I want to tell you why.

With trunk-based development, all developers work on a single branch (e.g., 'main'). You might have read or heard Martin Fowler or Dave Farley talking about it. It was when I was working with Dave (around the time that Git was rapidly becoming the "go-to" version control system) that I really saw the benefits of trunk-based development for the team, particularly in an environment that was pioneering continuous delivery (Dave was writing the book with Jez Humble while I worked with him).

In contrast, the branching model encourages developers to create separate branches for every feature, bug fix, or enhancement. Although branching may seem like a logical approach to isolate changes and reduce risk, several factors make me more comfortable with trunk-based development.

1. Speed and Efficiency

In trunk-based development, the entire team works on a single branch. This model allows for quicker integrations and fewer merge conflicts. This is literally "continuous integration (CI)", as originally suggested by the practices of Extreme Programming. While these days we tend to mean "run your build and tests on a team server every time you commit" when we say CI, what CI really meant was to actually integrate your code regularly. Code living on separate branches is, by definition, not integrated. And the longer these branches live, the more challenging it is to merge them back into the main codebase. It might seem fast to develop your fixes and improvements on a separate branch that isn't impacted by other developers' changes, but you still have to pay that cost at some point. Integrating small changes regularly into your code is usually less painful than a big merge at the end of a longer period of time.

2. Greater Code Stability

Trunk-based development encourages frequent commits, which leads to smaller and more manageable changes. How and why? For the same reason that we don't want big merges from long-lived branches: the longer we leave it to commit our changes, the higher the chances our commit will clash with someone else's changes. By frequently pulling in the other developers' changes, and frequently pushing small changes of working code, we know the codebase is stable and working. Of course, this assumption of "stable and working" is easier to check if we have a CI server that's running the build and tests for each of these commits. We also have to stop making commits if the build breaks at any time and focus on fixing that build. Continuously pushing small, frequent commits when the build is already broken isn't going to do anyone any favors.

In the branching model, large and infrequent merges can introduce bugs that are hard to identify and resolve due to the sheer size of the changes. Have you ever merged the trunk into your branch after someone else has merged their own big piece of work and found your code no longer works? It can take a LOT of time to track down why your tests are failing or the application isn't working the way you expect when you've made a whole bunch of changes and someone else has made a whole bunch of different, or overlapping, changes. And that's assuming you actually have reliable test coverage that can tell you there's a problem.
3. Enhanced Team Collaboration

My favorite way of sharing knowledge between team members is pair programming. I know not everyone is a fan or is in a position to do it (especially now that more people are working remotely; if that's you, check out JetBrains' Code With Me). If you're not pairing, then at least you want to be working on the same code, right? If you're all working on your own branches, you are not collaborating. You are competing. To see who can get their code in fastest. To avoid being stomped on by someone else's code changes. If you're all working on the same branch, you tend to have a better awareness of the changes being made. This approach fosters greater team collaboration and knowledge sharing. In contrast, branching can create a siloed work environment where you're all working independently, leading to knowledge gaps within the team.

4. Improved Continuous Integration and Delivery (CI/CD) Practices

Dave Farley's book, "Continuous Delivery," his blog posts, and his videos argue something along the lines of "trunk-based development is inherently compatible with Continuous Integration and Continuous Delivery (CI/CD) practices." In a trunk-based model, continuous integration becomes more straightforward because your code is committed frequently to trunk, and that's the branch your CI environment is running the build and tests on. Any failures there are seen and addressed promptly, reducing the risk of nasty surprises. It's usually easy to track down which changes caused the problem. If the issue can't be fixed immediately, you can back out the specific changes that caused it. By now we should know the importance of a quick feedback loop: when you find problems faster, you can locate the cause faster, and you can fix it faster. This improves your software's quality.

Continuous delivery also thrives in a trunk-based development environment. Successful continuous delivery hinges on the ability to have a codebase that is always in a deployable state. The trunk-based development approach ensures this by promoting frequent commits, frequent integrations, and tests on all of these integrations. The small number of changes being introduced at any one time makes the software easier to deploy and test.

In contrast, implementing effective CI/CD can be more complex and time-consuming with the branching model. While it's tempting to think, "Well, I run my build and all my tests on my branch," you're not actually integrating every time you commit. It's at merge (or rebase) time that you start to see any integration issues. All those tests you were running on your branch "in CI" were not testing any kind of integration at all. Merging and testing code from different branches can introduce delays and potential errors, which takes away some of the benefits of having a build pipeline in the first place.

5. Reduced Technical Debt

Long-lived branches often lead to 'merge hell', where the differences between one branch (like 'main') and another (for example, your feature branch) are so great that merging becomes a nightmare. This can result in technical debt, as you may resort to quick fixes to resolve merge conflicts or accept suggestions from your IDE that resolve the merge but that you don't fully understand. With trunk-based development, frequent merges and smaller changes make it easier to manage and reduce the build-up of technical debt.

In Conclusion

I personally think trunk-based development has clear advantages, and I have experienced them first-hand working in teams that have adopted this approach.
However, it requires a shift in mindset and culture within the development team. You need to frequently merge other developers' changes into your own code. You need to commit small changes frequently, which requires you to change only small sections of the code and make incremental changes, something that can be a difficult habit to get used to. Pair programming, comprehensive automated testing, and maybe code reviews are key practices for helping the whole team adopt the same approach and culture.

Trunk-based development, done in a disciplined way, streamlines the development process, enhances team collaboration, improves code stability, supports efficient CI/CD practices, and may result in less technical debt. While it may be challenging to adapt to this approach if you've been working with a branch-based model, the long-term benefits are worthwhile. If you want to move to working this way, you may also want to read Dave's article addressing barriers to trunk-based development.
Mermaid is a trendy diagramming tool. A year ago, it was integrated into the Markdown rendering of GitHub. It is also integrated into several editors (see "Include diagrams in your Markdown files with Mermaid" for more information). What can you do, however, if you use a different editor? What if you want to use your Markdown document in an environment that does not integrate Mermaid yet? What can you do if the diagram is not Mermaid but PlantUML, Graphviz, or any other diagramming tool? This article will show how you can integrate any diagram-as-code tool into your documents. The technique works for Markdown, Asciidoc, APT, or any other text-based markup language. However, before anything else, here is a demonstration image, which was created the way I will describe in this article. Problem Statement When documenting some systems, it is often necessary to include diagrams. Keeping diagrams in separate files has advantages, but also disadvantages. It is easier to keep the documentation consistent when the related parts are close together. The further apart two corresponding parts are, the more likely that one becomes stale when the other is updated. It also helps if you can parameterize the diagram, so you can avoid copy-pasting diagram parameters between the document, the documented code, and the diagram source. To solve these problems, more and more markup languages support embedding the markup of selected diagramming tools directly in the text. You can include Mermaid in Markdown documents if you target GitHub hosting for your document. You can include PlantUML diagrams in Asciidoc documents. What happens, however, if you want to include Mermaid in Asciidoc? What if you need PlantUML in Markdown? How do you solve the issue if you want to host your Markdown elsewhere besides GitHub? You can abandon your ideas, stick to the available tools, or wait for a solution. The latter approach, however, will always remain an issue: there will always be a new tool you want to use, and you will have to wait for support for that tool in your favorite markup language. The principal reason for this is an architectural mismatch. Document markup languages should be responsible for document structure and content and nothing else. Embedding a diagramming tool is not something these languages should implement; it is a separate concern, part of the document's programmability that automates document consistency. The solution is to use a meta markup above the document markup. This meta markup can be document markup agnostic and support all the diagramming tools you want to use. Ideas and Approach To Solve the Problem The basic idea is not new: separation of concerns. The document markup language should be responsible for the document structure and content. The diagramming tool should be responsible for the diagramming. The meta markup should be responsible for the integration. Since the meta markup is language agnostic, it can be used with any existing and future document markup languages. There is no need to wait for the support of the diagramming tool in the document markup language. The only question is the integration of the meta markup into the document markup language. The simplest and loosest way to integrate the meta markup is to use a preprocessor: the meta markup processor reads the source and generates a plain text file, and the document markup tool picks up where the meta markup processor left off.
It has no idea that the document markup was generated by a program rather than edited by hand. Strictly speaking, when you edit document markup, the editor is the program that generates the file. Technically, there is no difference. There are other possibilities. Most document markups support different editors to deliver some form of WYSIWYG editing. The meta markup preprocessor can be integrated into these editors. That way, the document markup enriched with the meta markup can seamlessly be edited in the WYSIWYG editor. The proposed meta markup and the implementing tool, Jamal, follow both approaches. Suggested Solutions/Tools The suggested solution is to use Jamal as the meta markup. Jamal is a general-purpose macro processor. There are other meta-markup processing tools, like PyMdown. These tools usually target a specific document markup and a specific purpose. Jamal is a general-purpose, Turing-complete macro processor with more than 200 macros for different purposes. These macros make your documents programmable, automating manual document maintenance tasks. The general saying is, "If you could give a task to an assistant to do, then you can automate it with Jamal." Jamal has a PlantUML module. PlantUML is written in Java, the development language I used to create Jamal. This makes integrating PlantUML into Jamal easy, and PlantUML diagrams embedded in the documentation can be converted during processing. Jamal, however, is not limited to using only tools written in Java. To demonstrate it, we will use the Mermaid diagramming tool, written in JavaScript and running with node. Since Mermaid is not a Java program, it cannot be executed inside the JVM. Instead, our documentation will execute Mermaid as a separate process. Other diagramming tools can be integrated similarly if executed on the document processing machine. Install Mermaid The first step is to install Mermaid. The steps are documented on the Mermaid site. I will not repeat the steps here because I do not want to create a document that gets outdated. On my machine, the installation creates the /usr/local/bin/mmdc executable. This file is a JavaScript script that starts the Mermaid diagramming tool. While executing it from Jamal, I realized the process has a different environment than my login script. To deal with that, I had to edit the file. Instead of using the env command to find the node interpreter, I specified the full path to the node executable. Other installations may be different, and it does not affect the rest of the article. It is a technical detail. Install Jamal We will use Jamal as the meta markup processor. The installation is detailed in the documentation of Jamal. You can start it from the command line, as a Maven plugin, using Jbang, and in many other ways. I recommend using it as a preprocessor integrated into the IntelliJ Asciidoctor plugin. It will provide you with WYSIWYG editing of your document in Markdown and Asciidoc enriched with Jamal macros. Also, the installation is nothing more than executing the command line: mvn com.javax0.jamal:jamal-maven-plugin:2.1.0:jamalize This will download version 2.1.0, the version used in this article (a pre-release at the time of writing), and copy all the needed files into your project's .asciidoctor/lib directory. It will make the macros available for the Asciidoctor plugin. What needs manual work is configuring IntelliJ to treat all .jam files as Asciidoc files. That way, the editor will invoke the Asciidoctor plugin and use the Jamal preprocessor.
It is the setup that I also use to write the articles. Create the Macros for Mermaid To have a Mermaid diagram inside the document, we should do three things using macros: Save the Mermaid text into a file. Execute the Mermaid tool to convert the text into an SVG file. Reference the SVG file as an image in the document. Later, we will see how to save on Mermaid processing, executing it only when the Mermaid text changes. We will use the io:write macro to save the Mermaid text into a file. This macro is in a package that is not loaded by default. We have to load it explicitly. To do that, we use the maven:load macro. {@maven:load com.javax0.jamal:jamal-io:2.1.0} Note This macro package has to be configured as safe for the document in the .jamal/settings.properties file as it is documented. The macros in this package can read and write files and execute the commands configured. To use a macro package like that from an untrusted source is a security risk. For this reason, every package loaded by the maven:load macro has to be configured as safe. The configuration specifies the package and the documents where it can be used. At the same time, the io package also needs configuration to be able to execute the mmdc command. To do that, the configuration file contains a line assigning a symbolic name to the command as follows: mermaid=/usr/local/bin/mmdc The io:exec macro will use this symbolic name to find the command to execute. When the macro package is loaded, we can use the io:write macro as in the following sample:
{#define CHART=flowchart TD
A[Christmas] -->|Get money| B(Go shopping)
B --> C{Let me think}
C -->|One| D[Laptop]
C -->|Two| E[iPhone]
C -->|Three| F[fa:fa-car Care]
}
{#io:write (output="aaa.mmd") {CHART}}
When the file is created, we can execute the Mermaid tool to convert it into an SVG file, as follows: {#io:exec command="""mermaid -i aaa.mmd -o aaa.svg """ cwd="." error="convert.err.txt" output="convert.out.txt" } By that, we have the file. Whenever the Mermaid text changes, the SVG file will be recreated. As a matter of fact, whenever the document changes, the SVG file will be recreated. It wastes resources when the diagram remains the same and the processing runs interactively. To help with that, we can use the hashCode macro. The macro hashCode calculates the hash code of the text. We will use the hash code to name the file. Whenever the diagram changes, the file’s name changes. Also, if the file exists, it should contain the diagram for the current text. To check that the file exists, we include it in the document. Because we do not want the SVG text in the document, we surround the include with the block macro. If the file does not exist, then an error will occur. The macro try will handle this error, and the execution will continue. However, the macro CREATE will be set to true in this case. If there is no error, i.e., the file already exists, the macro CREATE will be set to false. The if macro will check the value of the macro CREATE. If it is true, it will execute the io:write and io:exec macros to create the file. If it is false, then it will do nothing. {#define KEY={#hashCode {CHART}}}{@define CREATE=true} {@try {#block{#include images/{KEY}.svg}}{@define CREATE=false}} {#if `//`{CREATE}// {#io:write (mkdir output="images/aaa.mmd") {CHART}} {#io:exec command="""mermaid -i images/aaa.mmd -o images/{KEY}.svg """ cwd="."
error="convert.err.txt" output="convert.out.txt" }//} Summary and Takeaway This article discussed integrating Mermaid diagrams into your Asciidoc, Markdown, or any other markup document. We selected Mermaid for two reasons. First, usually, this is the tool people ask for. Second, this is an excellent example of a non-Java tool that can be integrated into document processing. The described way can be applied to any external tool capable of running as a process. The samples also demonstrate a complex structure of macros showing the power of the Jamal macro processor. Such complexity is rarely needed. In addition to the technology, I discussed, though only briefly, the separation of concerns for document handling and how the document formatting markup should be separated from the processing meta markup. If you want diagrams in your documentation, download Jamal and enhance your documents.
This is the third and final post about my OCP-17 preparation. In part one, I explained how playing a human virtual machine and refreshing your mastery of arcane constructs is not pointless, even if the OCP doesn’t — and doesn’t claim to — make you a competent developer. In the second part, I showed you how intrinsic motivation keeps itself going without carrots or sticks, provided you can find ways to make your practice fun and memorable. It's time to share some of these examples and tips. Make It Quality Time But first some advice about logistics and time management. As with physical exercise, short and frequent trumps long and sporadic. It’s more effective and more likely to become a habit, like brushing your teeth. Choose the time of day when you are most energetic and productive. The early morning works best for me because I’m a morning person. And there is a satisfaction in getting the daily dose crossed off your to-do list, even when it doesn’t feel like a chore. Strike a good balance between reading, practicing, and revising. Once you’ve worked through the entire textbook, you will need to refresh much of the first few chapters. That’s okay. Keep revising them, doing a few questions from each chapter each day. You’ll get there slowly but surely. Make It Practical and Productive Practice in the previous paragraph means writing novel code aimed at teaching yourself a certain language construct. It’s about what you produce yourself, so copying snippets from the book doesn’t count. If you’ve ever learned a foreign language the old-fashioned way, you will agree that cramming vocabulary and grammar rules does little for your oral skills. Only speaking can make you fluent, preferably among native speakers. It’s like swimming or playing the saxophone: you can’t learn it from a book. Never used the NIO2 API or primitive streams? Never done a comparison or binary search of arrays? Get your feet wet, preferably with autocomplete turned off. Better yet, scribble in a plaintext editor and paste it into your IDE when you’re done. Understand the Why While Java shows its age, its evolution is managed carefully so new additions don’t feel as if they were haphazardly tacked on. Decisions take a long time to mature and are made for a reason. When the book doesn’t explain the reasoning behind a certain API peculiarity, try to explain it to yourself instead of parroting a rule. Here’s a case in point from the concurrency API. The submit() method of an executor has two overloaded versions for a Runnable or Callable argument. It returns a Future. The void execute() method only takes a Runnable, not a Callable. Why does that make good sense? Well, a Callable yields a value and can throw an Exception. Since execute() acts in a fire-and-forget fashion, the result of a Callable would be inaccessible, so it’s not supported. Conversely, submitting a Runnable with a void result is fine. Its Future returns null. (A short code sketch of this distinction appears at the end of this article.) The memory athletes from my previous post, who memorized random stacks of cards, have it much tougher than you and I. Learning Java is about memorizing a lot of facts, but they’re not random. Making a Visual Story The ancient Greeks taught us how to construct mental memory palaces to store random facts for easy retrieval. Joshua Foer added a moonwalking Albert Einstein to jog his memory. You should make your code samples equally fun and memorable. Here’s how to illustrate the fundamental differences between an ArrayList and a LinkedList.
Imagine a movie theater with a fixed number of seats (the ArrayList) and a line of patrons (the LinkedList) at the ticket booth, who receive a numbered ticket. People arrive (offer(..) or add(..)) at the tail of the queue irregularly, while every ten seconds the first person in the queue can enter the theater (poll(), element()) and is shown to their seat (seats.set(number, patron)). Let’s add concurrency to the mix. Suppose there are two ticket booths, each with its own line, and a central ticket dispenser that increments a number. That’s right: getAndIncrement() in AtomicInteger to the rescue. I’d happily show you the code, but that wouldn’t teach you much. Or take access rights in class hierarchies. Subtypes may not impose stricter access rights or declare broader or new checked exceptions. Let’s put it less academically. Imagine a high-rise with multiple company offices (classes) and several floors (packages). Private access is limited to employees of one company. Package access extends to offices on the same floor. Public access is everybody: other floors as well as external visitors. The proprietor provides a public bathroom that clearly shows when it’s occupied. You can dress it up with scented towels and music through a subclass, but you must obey this contract: public void visitRestRoom(Person p) throws OccupiedException { .. } Every outside visitor is welcome to use it. You are not allowed to restrict access to only employees on your floor (package access), much less your own employees (private access). Neither may you bother visitors with a PaymentExpectedException. It violates the contract. Code samples in the exam are meant to confuse you. Your own examples should do the exact opposite. Use real-life examples (a public office restroom, the queue outside a movie theater) and combine them in a way that is easy to visualize and fun to remember. Mnemonics Sometimes there’s nothing for it but to commit stuff to memory, like the types you can use as a switch variable (byte, int, char, short, String, enum, var). You can string them together in a mnemonic like this one: In one short intense soundbite, the stringy character enumerated the seven variables for switch. Or how about the methods that operate on the front of a queue (element, push, peek, pop, poll, and remove)? Elmer pushed to the front of the queue to get a peek at the pop star, but he was pulled out and removed. Yes, it’s far-fetched, silly, and outlandish. That’s what makes them memorable. To me at least. Or try your hand at light verse. The educational benefit may not be as strong for you as a reader, but the time I spent crafting it made sure I won’t quickly confuse Comparable and Comparator again. The Incomparable Sonnet
You implement Comparable to sort
(in java.lang: no reason to import).
CompareTo runs through items with the aim
to see if they are different or the same.
If it returns a positive, it meant
that this was greater than the argument.
For smaller ones a minus is supplied,
a zero means the same, or “can’t decide”.
Comparator looks similar, but bear
in mind its logic is more self-contained.
It has a single method called compare
where difference of two args is ascertained.
A range of default methods supplement
your lambdas. Chain them to your heart’s content!
Some Closing Thoughts The aim of your practice is not to pass the exam as quickly as possible (or at all). It’s to become a more competent developer and have fun studying.
I mentioned that there is some merit in playing human compiler, but that doesn’t mean that I fully agree with the OCP’s line of questioning in its current form and the emphasis on API details. Being able to write code from scratch with only your limited memory to save you is not a must-have skill for a developer in the coming decade. She will need to acquire new skills to counter the relentless progress of AI in the field. If I needed to assess you as a new joiner to our team, and you showed me a 90% OCP passing grade, I’d be seriously impressed and a little jealous, but I will still not be convinced that you’re a great developer until I see some of your work. You could still be terrible. And you can be a competent programmer and fail the exam. That’s where the OCP is so different from, say, a driving test. If you’re a bad driver you should not get a license, no exceptions. And if you fail the test, you’re not a great driver. Full disclosure: it took me four tries. If the original C language was a portable toolbox and Java 1.1 a toolshed, then Java 17 SE is a warehouse with many advanced power tools. The great thing is that you don’t have to wonder what all the buttons do. The instructions are clearly printed on the tools themselves through autocomplete and Javadoc. It makes sense to know what tools the warehouse stocks and when you should use them. But learning the instructions by heart? I can think of a better use of my time, energy, and memory.
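As promised, here is a minimal Java sketch of the submit() versus execute() distinction from the "Understand the Why" section. It is only an illustration; the class name and the example tasks are mine, not the exam's:

Java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitVsExecute {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // A Callable yields a value and may throw a checked exception,
        // so submit() hands you back a Future to collect that result.
        Callable<Integer> answer = () -> 42;
        Future<Integer> future = pool.submit(answer);
        System.out.println(future.get()); // 42

        // Submitting a Runnable is fine too: the Future's get() simply
        // returns null once the task has completed.
        Runnable chore = () -> System.out.println("doing the chore");
        Future<?> done = pool.submit(chore);
        System.out.println(done.get()); // null

        // execute() is fire-and-forget: it accepts only a Runnable and
        // returns void, so a Callable's result would have nowhere to go.
        pool.execute(chore);

        pool.shutdown();
    }
}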
The Power of Ansible, Jinja2, and Go-Templates in Streamlining DevOps Workflow Streamlining DevOps workflows with Go service templates can be an intimidating task, especially when dealing with YAML files that contain Go-inspired string interpolation, as seen in tools like Prometheus, Loki, and Promtail. However, when these files are managed with Ansible, a versatile IT automation tool, and Jinja2, a powerful Python templating engine, these complex tasks can be made easier and more efficient for DevOps professionals. The Challenge: Distinguishing Between Ansible Variables and Go Template Expressions A common challenge when using Ansible is distinguishing between Ansible variables and Go template expressions within YAML files. It's critical to generate a YAML file that includes literal {{ mustaches }} that Ansible should not interpolate. This is where Jinja2's {% raw %}{% endraw %} tags come into play, telling Ansible not to treat the enclosed text as Jinja2 code, thus avoiding variable interpolation. The Solution: Leveraging Jinja2's {% raw %}{% endraw %} Tags When working with YAML files, especially those that contain Go-Templates such as Prometheus, Loki, and Promtail, you may encounter a syntax error when Ansible processes the template if you do not use {% raw %}{% endraw %}. This error occurs because Ansible uses Jinja2 to manage variables within templates. If your template contains expressions that Jinja2 interprets as variables or control structures (such as the double curly braces used in Go-Templates), Ansible will attempt to process them, causing an error because these expressions do not match the expected Jinja2 format. Shell $ ansible-playbook roles/promtail.yaml -l testing-host An exception occurred during task execution. To see the full traceback, use -vvv. The error was: . unexpected '.' fatal: [testing-host]: FAILED! => { "changed": false, "msg": "AnsibleError: template error while templating string: unexpected '.'. String: {{ansible_managed | comment(decoration='# ')}} --- ... pipeline_stages: - logfmt: mapping: timestamp: time level: - match: ... - template: source: environment template: '{{ if .Value }}{{ .Value }}{{ else }}default{{ end }}' - labels: timestamp: time environment: level: . unexpected '.'"} Common Errors: The Consequences of Not Using {% raw %}{% endraw %} in Ansible Jinja2 Templates The power of Jinja2 lies in its ability to handle these potential errors. By using the {% raw %}{% endraw %} tags, anything enclosed is considered a literal string and will not be processed by Jinja2. This feature is very effective when dealing with Ansible configurations that contain Go templates. Practical Example: Using {% raw %}{% endraw %} Tags in a Go-template with Ansible and Jinja2 To illustrate, consider the following example: /etc/promtail/config.yaml YAML pipeline_stages: - logfmt: mapping: timestamp: time level: - match: ... - match: selector: '{job="exporters"}' stages: - regex: source: service expression: .+-(?P<environment>(stable|testing|unstable)$) - labels: environment: - template: source: environment template: '{{ if .Value }}{{ .Value }}{{ else }}default{{ end }}' Ansible - roles/promtail/templates/promtail.yaml.j2 YAML pipeline_stages: - logfmt: mapping: timestamp: time level: - match: ...
- match: selector: '{job="exporters"}' stages: - regex: source: service expression: .+-(?P<environment>(stable|testing|unstable)$) - labels: environment: - template: source: environment template: '{% raw %}{{ if .Value }}{{ .Value }}{{ else }}default{{ end }}{% endraw %}' In the example above, the {% raw %}{% endraw %} tags are used to prevent Jinja2 from processing the Go template within the template field. This ensures that Ansible treats the content between these tags as a literal string, avoiding potential syntax errors. Conclusion: The Benefits of Jinja2 and Ansible in Managing Go-Templates By using Jinja2 and Ansible, DevOps professionals can easily manage configurations that include Go-Templates, improving their workflow and operational efficiency. However, as with all software tools and practices, it's important to keep exploring and learning to stay abreast of best practices and newer, more efficient methodologies.
The ReactAndGo project is used to compare a single page application frontend based on React and a Rest backend based on Go to Angular frontends and Spring Boot/Java backends. The goal of the project is to send out notifications to car drivers if the gas price falls below their target price. The gas prices are imported from a provider via MQTT messaging and stored in the database. For development, two test messages are provided that are sent to an Apache Artemis server to be processed in the project. The Apache Artemis server can be run as a Docker image, and the commands to download and run the image can be found in the 'docker-artemis.sh' file. As a database, Postgresql is used, and it can be run as a Docker image too. The commands can be found in the 'docker-postgres.sh' file. Architecture The system architecture looks like this: The React frontend uses the Rest interface that the Gin framework provides to communicate with the backend. The Apache Artemis Messaging Server is used in development to receive and send back the gas price test messages that are handled with the Paho-MQTT library. In production, the provider sends the MQTT messages. The Gorm framework is used to store the data in Postgresql. A push notification display is used to show the notification from the frontend if the target prices are reached. The open-source projects using Go have more of a domain-driven architecture that splits the code for each domain into packages. For the ReactAndGo project, the domain-driven architecture is combined with a layered architecture to structure the code. The common BaseController is needed to manage the routes and security of the application. The architecture is split between the gas station domain, the push notification domain, and the application user domain. The Rest request and response handling is in its own layer that includes the Rest client for the gas station import. The service layer contains the logic, database access, and other helper functions. Domain-independent functions like Cron Jobs, Jwt token handling, and message handling are implemented in separate packages that are in a utility role. Notifications From React Frontend to Go/Gin/Gorm Backend The ReactAndGo project is used to show how to display notifications with periodic requests to the backend and how to process rest requests in the backend in controllers and repositories. React Frontend In the front end, a dedicated worker is started after login that manages the notifications. The initWebWorker(...) function of the LoginModal.tsx starts the worker and handles the tokens: TypeScript-JSX const initWebWorker = async (userResponse: UserResponse) => { let result = null; if (!globalWebWorkerRefState) { const worker = new Worker(new URL('../webpush/dedicated-worker.js', import.meta.url)); if (!!worker) { worker.addEventListener('message', (event: MessageEvent) => { //console.log(event.data); if (!!event?.data?.Token && event?.data.Token?.length > 10) { setGlobalJwtToken(event.data.Token); } }); worker.postMessage({ jwtToken: userResponse.Token, newNotificationUrl: `/usernotification/new/${userResponse.Uuid}` } as MsgData); setGlobalWebWorkerRefState(worker); result = worker; } } else { globalWebWorkerRefState.postMessage({ jwtToken: userResponse.Token, newNotificationUrl: `/usernotification/new/${userResponse.Uuid}` } as MsgData); result = globalWebWorkerRefState; } return result; }; The React frontend uses the Recoil library for state management and checks if the globalWebWorkerRefState exists. 
If not, the worker in dedicated-worker.js gets created and the event listener for the Jwt tokens is created. The Jwt token is stored in a Recoil state to be used in all rest requests. Then the postMessage(...) method of the worker is called to start the requests for the notifications. Then the worker is stored in the globalWebWorkerRefState and the worker is returned. The worker is developed in the dedicated-worker.ts file. The worker is needed as .js file. To have the help of Typescript, the worker is developed in Typescript and then turned into Javascript in the Typescript Playground. That saves a lot of time for me. The refreshToken(...) function of the worker refreshes the Jwt tokens: TypeScript-JSX interface UserResponse { Token?: string Message?: string } let jwtToken = ''; let tokenIntervalRef: ReturnType<typeof setInterval>; const refreshToken = (myToken: string) => { if (!!tokenIntervalRef) { clearInterval(tokenIntervalRef); } jwtToken = myToken; if (!!jwtToken && jwtToken.length > 10) { tokenIntervalRef = setInterval(() => { const requestOptions = { method: 'GET', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${jwtToken}` }, }; fetch('/appuser/refreshtoken', requestOptions).then(response => response.json() as UserResponse) .then(result => { if ((!result.Message && !!result.Token && result.Token.length > 10)) { //console.log('Token refreshed.'); jwtToken = result.Token; /* eslint-disable-next-line no-restricted-globals */ self.postMessage(result); } else { jwtToken = ''; clearInterval(tokenIntervalRef); } }); }, 45000); } } The refreshToken(...) function first checks if another token interval has been started and stops it. Then the token is assigned and checked. If it passes the check a new interval is started to refresh the token every 45 seconds. The requestOptions are created with the token in the Authorization header field. Then the new token is retrieved with fetch(...) , and the response is checked, the token is set, and it is posted to the EventListener in the LoginModal.tsx. If the Jwt token has not been received, the interval is stopped, and the jwtToken is set to an empty string. The Eventlistener of the worker receives the token message and processes it as follows: TypeScript-JSX interface MsgData { jwtToken: string; newNotificationUrl: string; } let notificationIntervalRef: ReturnType<typeof setInterval>; /* eslint-disable-next-line no-restricted-globals */ self.addEventListener('message', (event: MessageEvent) => { const msgData = event.data as MsgData; refreshToken(msgData.jwtToken); if (!!notificationIntervalRef) { clearInterval(notificationIntervalRef); } notificationIntervalRef = setInterval(() => { if (!jwtToken) { clearInterval(notificationIntervalRef); } const requestOptions = { method: 'GET', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${jwtToken}` }, }; /* eslint-disable-next-line no-restricted-globals */ self.fetch(msgData.newNotificationUrl, requestOptions).then(result => result.json()).then(resultJson => { if (!!resultJson && resultJson?.length > 0) { /* eslint-disable-next-line no-restricted-globals */ self.postMessage(resultJson); //Notification //console.log(Notification.permission); if (Notification.permission === 'granted') { if(resultJson?.length > 0 && resultJson[0]?.Message?.length > 1 && resultJson[0]?.Title?.length > 1) { for(let value of resultJson) { new Notification(value?.Title, {body: value?.Message}); } } } } }); }, 60000); }); The addEventListener(...) 
method handles the MessageEvent messages with the MsgData. The jwtToken of the MsgData is used to start the refreshToken(...) function. Then it is checked to see if a notification interval has been started, and if so, it is stopped. Then a new interval is created that checks for new target-matching gas prices every 60 seconds. The jwtToken is checked, and if the check fails, the interval is stopped. Then the requestOptions are created with the Jwt token in the Authorization header field. Then fetch(...) is used to retrieve the new matching gas price updates. The result JSON is checked and posted back to the EventListener in the LoginModal.tsx. Notification.permission reflects whether the user has granted permission to show notifications; 'granted' means they agreed. The data for the notification is checked, and the notification is sent with new Notification(...). Backend To handle the frontend requests, the Go backend uses the Gin framework. The Gin framework provides the needed functions to handle Rest requests, like a router, context (URL-related data), TLS support, and JSON handling. The route is defined in the basecontroller.go: Go func Start(embeddedFiles fs.FS) { router := gin.Default() ... router.GET("/usernotification/new/:useruuid", token.CheckToken, getNewUserNotifications) ... router.GET("/usernotification/current/:useruuid", token.CheckToken, getCurrentUserNotifications) router.StaticFS("/public", http.FS(embeddedFiles)) router.NoRoute(func(c *gin.Context) { c.Redirect(http.StatusTemporaryRedirect, "/public") }) absolutePathKeyFile := strings.TrimSpace(os.Getenv("ABSOLUTE_PATH_KEY_FILE")) absolutePathCertFile := strings.TrimSpace(os.Getenv("ABSOLUTE_PATH_CERT_FILE")) myPort := strings.TrimSpace(os.Getenv("PORT")) if len(absolutePathCertFile) < 2 || len(absolutePathKeyFile) < 2 || len(myPort) < 2 { router.Run() // listen and serve on 0.0.0.0:3000 } else { log.Fatal(router.RunTLS(":"+myPort, absolutePathCertFile, absolutePathKeyFile)) } } The Start function gets the embedded files for the /public directory with the static frontend files. The line: Go router.GET("/usernotification/new/:useruuid", token.CheckToken, getNewUserNotifications) creates the route /usernotification/new/:useruuid with useruuid as a parameter. The CheckToken function in the token.go file handles the Jwt token validation. The getNewUserNotifications function in the uncontroller.go file handles the requests. The getNewUserNotifications(...) function: Go func getNewUserNotifications(c *gin.Context) { userUuid := c.Param("useruuid") myNotifications := notification.LoadNotifications(userUuid, true) c.JSON(http.StatusOK, mapToUnResponses(myNotifications)) } ... func mapToUnResponses(myNotifications []unmodel.UserNotification) []unbody.UnResponse { var unResponses []unbody.UnResponse for _, myNotification := range myNotifications { unResponse := unbody.UnResponse{ Timestamp: myNotification.Timestamp, UserUuid: myNotification.UserUuid, Title: myNotification.Title, Message: myNotification.Message, DataJson: myNotification.DataJson, } unResponses = append(unResponses, unResponse) } return unResponses } The getNewUserNotifications(…) function uses the Gin context to get the path parameter useruuid and then calls the LoadNotifications(…) function of the repository with it. The result is turned into UnResponse objects with the mapToUnResponses(…) function, which sends only the data needed by the frontend. The Gin context is used to return the HTTP status OK and to marshal the responses to JSON.
The function LoadNotifications(...) is in the unrepo.go file and loads the notifications from the database with the Gorm framework: Go func LoadNotifications(userUuid string, newNotifications bool) []unmodel.UserNotification { var userNotifications []unmodel.UserNotification if newNotifications { database.DB.Transaction(func(tx *gorm.DB) error { tx.Where("user_uuid = ? and notification_send = ?", userUuid, !newNotifications) .Order("timestamp desc").Find(&userNotifications) for _, userNotification := range userNotifications { userNotification.NotificationSend = true tx.Save(&userNotification) } return nil }) } else { database.DB.Transaction(func(tx *gorm.DB) error { tx.Where("user_uuid = ?", userUuid).Order("timestamp desc").Find(&userNotifications) var myUserNotifications []unmodel.UserNotification for index, userNotification := range userNotifications { if index < 10 { myUserNotifications = append(myUserNotifications, userNotification) continue } tx.Delete(&userNotification) } userNotifications = myUserNotifications return nil }) } return userNotifications } The LoadNotifications(...) function checks if only new notifications are requested. Then a database transaction is created, and the new UserNotifications (notification.go) of the user file are selected, ordered newest first. The send flag is set to true to mark them as no longer new, and the UserNotifications are saved to the database. The transaction is then closed, and the notifications are returned. If the current notifications are requested, a database transaction is opened, and the UserNotifications of the user are selected, ordered newest first. The first 10 notifications of the user are appended to the myUserNotification slice, and the others are deleted from the database. Then the transaction is closed, and the notifications are returned. Conclusion This is the first React frontend for me, and I share my experience developing this frontend. React is a much smaller library than the Angular Framework and needs more extra libraries like Recoil for state management. Features like interval are included in the Angular RxJs library. React has much fewer features and needs more additional libraries to achieve the same result. Angular is better for use cases where the frontend needs more than basic features. React has the advantage of simple frontends. A React frontend that grows to medium size will need more design and architecture work to be comparable to an Angular solution and might take more effort during development due to its less opinionated design. My impression is: React is the kitplane that you have to assemble yourself. Angular is the plane that is rolled out of the factory. The Go/Gin/Gorm backend works well. The Go language is much simpler than Java and makes reading it fast. Go can be learned in a relatively short amount of time and has strict types and a multi-threading concept that Project Loom tries to add to Java. The Gin framework offers the features needed to develop the controllers and can be compared to the Spring Boot framework in features and ease of development. The Gorm framework offers the features needed to develop the repositories for database access and management and can be compared to the Spring Boot framework in terms of features and ease of development. The selling point of Go is its lower memory consumption and fast startup because it compiles to a binary and does not need a virtual machine. Go and Java have garbage collection. 
Java can catch up with Project Graal (GraalVM native images) on startup time, but medium- to large-sized examples have to be available and analyzed before memory consumption can be compared. A decision can then be based on developer skills, the amount of memory saved, and the expected future of Project Graal.
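As an aside, the goroutine-style concurrency mentioned above is roughly what Project Loom's virtual threads bring to Java. Here is a minimal, illustrative sketch; it requires Java 21 or later, and the task count and sleep time are arbitrary:

Java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadsSketch {
    public static void main(String[] args) {
        // Each task runs on its own lightweight virtual thread,
        // similar in spirit to starting a goroutine per request in Go.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i -> executor.submit(() -> {
                Thread.sleep(10);
                return i;
            }));
        } // the executor waits for submitted tasks before closing
    }
}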
Sometimes you need to control access to the data in your databases in a very granular way - much more granular than most databases allow. For instance, you might want some database users to be able to read only the last few digits of some credit card number, or you may need certain columns of certain rows to be readable by certain users only. Or maybe you need to hide some rows from some users under specific circumstances. The data still needs to be stored in the database, we just need to restrict who can see certain parts of that data. This is called data masking, and I've already talked about the two main approaches: static vs. dynamic data masking in a previous article. In this article, I'll show you how to roll your own dynamic data masking solution for Cassandra and Cassandra-compatible databases such as AWS Keyspaces, Azure Cosmos DB, and DataStax DSE, using a couple of off-the-shelf tools. What Cassandra Can Do on Its Own When it comes to hiding data, Cassandra provides table-level GRANT permissions, but nothing more fine-grained than that. Other Cassandra-compatible products, such as DataStax DSE, do provide some row- and column-level security, but even that has significant limitations. To narrow down how people access some tables, most relational databases offer the concept of views. Cassandra has materialized views, which are tables that are derived from other tables. Unfortunately, for materialized views, among many other restrictions, Cassandra requires that the columns must be the naked columns from the base table. This, and the other restrictions, means that materialized views are only tangentially useful for data masking, and cannot cover the vast majority of use cases. You might think you're stuck at this point. The fine folks in the Cassandra team are in fact working on a data masking solution, but that's still some ways away, and in any case, it will be limited. There is another option: using a programmable database proxy to shape the queries and the corresponding result sets. How Would a Proxy Help? The idea is simple: we introduce a programmable proxy between the database clients and the database server(s). We can then define some simple logic in the proxy, which will enforce our data masking policies as the network traffic goes back and forth (through the proxy) between clients and servers: Standing up a database proxy is easy: it's just a Docker container, and we can set up the database connection in just a few clicks. The database clients then connect to the proxy instead of directly to the database. Other than that, the clients and the server will have no idea that they are talking to each other through a proxy. Because the proxy works at the network level, no changes are required to either side, and this works with any Cassandra-compatible implementations such as AWS Keyspaces and Azure CosmosDB. Once the proxy is in place, we can define filters in the proxy to shape the traffic. For data masking, we have three possibilities: Reject invalid queries Rewrite invalid queries Modify result sets Let's take a look at each approach. Rejecting Queries Just because a database client sends you a request doesn't mean that you have to execute it. The proxy can look at a query and decide to reject it if it does not obey our access requirements. 
There are two main ways to do this: limiting queries to a list of pre-approved queries (possibly depending on the user, the user's IP address, or any other factor), or examining the query before it gets executed and rejecting it if it does not satisfy certain criteria. Query Control Enforcing a list of pre-approved queries is called query control, and I have covered that topic in a previous article. It's a simple idea: you record all the queries during a period of time (like during testing), then you only allow those queries after that. Any query that is not on the list gets rejected (or given an empty result set if we want to be more sneaky). This is a solid approach, but it only works for some use cases. For instance, this is not a viable solution if your queries are not 100% predictable. Still, it's a good tool to have in your toolbox. Vetting Queries A more subtle approach consists of examining the queries and determining whether they are acceptable or not. This is of course trickier - people can be very clever - but Cassandra's query language CQL is not as complex as typical SQL languages, making this a practical solution. For instance, we could decide that we don't want certain users to have access to the phones column in our customers table. In that case, we could simply reject any queries on that table that either specify the phones column, or that try to use the * operator to get all columns. This is easily done thanks to Gallium Data's CQL parser service, which can parse any CQL command and tell us exactly what that command does, and which tables/columns are involved. In the proxy, our filter will: get the CQL from the query or prepared statement, send it to the parser service to analyze it, reject the request if the CQL refers to the phones column, and otherwise let the query proceed to Cassandra. See the hands-on tutorial for this article for all the details. Rewriting Queries A more flexible approach is to modify incoming queries so that they satisfy our criteria. For instance, let's say we still want to restrict access to the column phones in the table customers. Again, we can use the CQL parser service to determine whether an incoming query uses this column, or uses * to request all columns. If the query does use * to request all columns, the safest thing to do is to reject the query. It would be tempting to think that we can replace the asterisk with the names of the columns, but that is actually quite difficult to do correctly, as illustrated by this perfectly valid query: CQL SELECT /* all */ * FROM credit_card_txs If the query uses the phones column, we can replace it with something that will hide the data as we wish. Let's say we want to hide the phones column completely. You might think that you can rewrite the query from: CQL SELECT id, country, first_name, last_name, phones FROM customers to: CQL SELECT id, country, first_name, last_name, '****' as phones FROM customers That seems reasonable, but sadly, Cassandra does not support this: Shell InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot infer type for term '****' in selection clause (try using a cast to force a type)" Thankfully, there is a slightly ugly workaround: CQL SELECT id, country, first_name, last_name, blobAsText(textAsBlob('****')) as phones FROM customers We could do somewhat better using user-defined functions, if your Cassandra implementation supports them.
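The hands-on tutorial implements this rewrite as a filter inside the proxy, using the CQL parser service. Purely to illustrate the rewrite logic itself (this is not the proxy's filter API, and the regular expressions are a deliberately naive stand-in for real CQL parsing), a standalone sketch might look like this:

Java
import java.util.regex.Pattern;

public class PhonesColumnRewriter {

    // Reject SELECT * outright: expanding the asterisk reliably is harder
    // than it looks, as the commented-query example above shows.
    private static final Pattern SELECT_STAR =
            Pattern.compile("(?is)select\\s+(/\\*.*?\\*/\\s*)?\\*");

    // A bare reference to the phones column gets replaced with the
    // masking expression that Cassandra accepts.
    private static final Pattern PHONES_COLUMN =
            Pattern.compile("(?i)\\bphones\\b");

    public static String rewrite(String cql) {
        if (SELECT_STAR.matcher(cql).find()) {
            throw new IllegalArgumentException("SELECT * is not allowed on customers");
        }
        return PHONES_COLUMN.matcher(cql)
                .replaceAll("blobAsText(textAsBlob('****')) as phones");
    }

    public static void main(String[] args) {
        String query = "SELECT id, country, first_name, last_name, phones FROM customers";
        System.out.println(rewrite(query));
        // SELECT id, country, first_name, last_name, blobAsText(textAsBlob('****')) as phones FROM customers
    }
}

A real filter would rely on the parser service described earlier rather than regular expressions, so that references to phones inside strings, WHERE clauses, or other tables are not mangled.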
We can thus easily create a filter in the proxy that will rewrite the query to mask the value of the phones column (see the hands-on tutorial linked previously for all the details). Let's test that: Shell cqlsh:demo> SELECT id, country, first_name, last_name, phones FROM customers; id | country | first_name | last_name | phones ----+---------+------------+---------------+--------- 23 | WF | Wanda | Williams | **** 5 | ES | Eric | Edmunds | **** 10 | JP | Juliet | Jefferson | **** 16 | PE | Patricia | Pérez | **** etc... If you need to hide only portions of a column, and your Cassandra implementation does not allow for user-defined functions, your only option is to modify result sets - let's look at that now. Modifying Result Sets For the ultimate in power and flexibility, we can modify result sets on their way back to the database clients: We can modify individual columns in specific rows. We can remove entire rows from result sets. We can also insert new rows in result sets, change the shape of result sets, etc., but that's beyond the scope of this article. Changing a column in a row is usually trivial with a few lines of code in a filter, e.g.: JavaScript let phones = context.row.phones; if (phones && phones.home) { phones.home.phone_number = "####"; } Let's try it out: Shell cqlsh:gallium_demo> SELECT id, country, last_name, phones FROM customers; id | country | last_name | phones ----+---------+---------------+----------------------------------------------------- 23 | WF | Williams | {'home': {country_code: 123, phone_number: '####'}} 5 | ES | Edmunds | {'home': {country_code: 55, phone_number: '####'}} 16 | PE | Pérez | {'home': {country_code: 116, phone_number: '####'}} etc... Notice how much more precise this is: we're not blotting out the entire column, we're only hiding parts of it. Removing a row from a result set is also easy. It can be done either by setting parameters in the filter or, for more complex cases, in filter code, e.g.: JavaScript // Hide customers whose home phone number is in Japan let phones = context.row.phones; if (phones && phones.home && phones.home.country_code === 81) { context.row.remove(); } Again, you can see this in action in the hands-on tutorial for this article. Nothing has changed in the database: we're only affecting the data as it travels back to the Cassandra client. In Summary We've looked at three general techniques for hiding data in Cassandra with a proxy: rejecting queries that try to access secret data, modifying queries so they do not show secret data, and modifying result sets to hide secret data. Rejecting queries is a blunt but effective tool. It might be sufficient for many use cases. Modifying queries has the advantage of performance: only one packet (the query) has to be modified, and the rest can work as usual. However, this technique can only work for some types of data masking requirements. Modifying result sets, on the other hand, is slightly more expensive, but it gives you complete control: you can change literally anything in the result set, no matter how fine-grained the required changes are. These techniques are not mutually exclusive: many solutions will involve a combination, perhaps in conjunction with other approaches such as fine-grained security (if available) or the data masking solution that will someday be available in Cassandra. But for complete power and flexibility, you can't beat a programmable database proxy.