This guide delves into the steps of deploying a Spring MVC application on a local Tomcat server. This hands-on tutorial is designed to equip you with the skills essential for seamless deployment within your development environment. Follow along to enhance your proficiency in deploying robust and reliable Spring MVC apps, ensuring a smooth transition from development to production.

Introduction

In the preliminary stages, it's crucial to recognize the pivotal role of deploying a Spring MVC application on a local Tomcat server. This initial step grants developers the opportunity to rigorously test their applications in an environment that closely mirrors the production setup. The emphasis on local deployment sets the stage for a seamless transition, ensuring that the application, when deemed ready for release, aligns effortlessly with the intricacies of the production environment. This approach enhances reliability and mitigates potential challenges in the later stages of the development life cycle.

Prerequisites

To get started, ensure you have the necessary tools and software installed:

Spring MVC Project: A well-structured Spring MVC project.
Tomcat Server: Download and install Apache Tomcat, the popular servlet container.
Integrated Development Environment (IDE): Use your preferred IDE (Eclipse, IntelliJ, etc.) for efficient development.

Configuring the Spring MVC App

Initiating the deployment process entails careful configuration of your Spring MVC project. Navigate to the project in your IDE and focus on pivotal files such as `web.xml` and `dispatcher-servlet.xml`. These files house crucial configurations that dictate the behavior of your Spring MVC application. Pay close attention to details like servlet mappings and context configurations within these files. This configuration step is foundational, as it establishes the groundwork for the application's interaction with the servlet container, paving the way for a well-orchestrated deployment on the local Tomcat server.

1. Create the Spring Configuration Class

In a typical Spring MVC application, you create a Java configuration class to define the application's beans and configuration settings. Let's call this class 'AppConfig'.

Java

import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;

@Configuration
@EnableWebMvc
@ComponentScan(basePackages = "com.example.controller") // Replace with your actual controller package
public class AppConfig {
    // Additional configurations or bean definitions can go here
}

Explanation

'@Configuration': Marks the class as a configuration class.
'@EnableWebMvc': Enables Spring MVC features.
'@ComponentScan': Scans for Spring components (like controllers) in the specified package.

2. Create the DispatcherServlet Configuration

Create a class that extends 'AbstractAnnotationConfigDispatcherServletInitializer' to configure the DispatcherServlet.
Java

import org.springframework.web.servlet.support.AbstractAnnotationConfigDispatcherServletInitializer;

public class MyWebAppInitializer extends AbstractAnnotationConfigDispatcherServletInitializer {

    @Override
    protected Class<?>[] getRootConfigClasses() {
        return null; // No root configuration for this example
    }

    @Override
    protected Class<?>[] getServletConfigClasses() {
        return new Class[]{AppConfig.class}; // Specify your configuration class
    }

    @Override
    protected String[] getServletMappings() {
        return new String[]{"/"};
    }
}

Explanation

'getServletConfigClasses()': Specifies the configuration class (in this case, AppConfig) for the DispatcherServlet.
'getServletMappings()': Maps the DispatcherServlet to the root URL ("/").

Now you've configured the basic setup for a Spring MVC application. This includes setting up component scanning, enabling MVC features, and configuring the DispatcherServlet. Adjust the package names and additional configurations based on your application's structure and requirements.

Setting up Tomcat Server Locally

Transitioning to the next phase involves setting up a local Tomcat server. Start by downloading the latest version of Apache Tomcat from the official website and follow the installation instructions. Once the installation is complete, configure Tomcat within your IDE. If you're using Eclipse, for example, navigate to the Servers tab, add a new server, and select Tomcat from the available options. This localized setup ensures a synchronized and conducive environment for the upcoming deployment of your Spring MVC application.

Building the Spring MVC App

As you progress, verify that your Spring MVC project is ready for a clean build. Leverage build automation tools such as Maven or Gradle to expedite this process. Add the required dependencies to your project configuration file, such as the `pom.xml` for Maven users. Execute the build command to compile and assemble your project. This step ensures that your Spring MVC application is equipped with all the necessary components and dependencies, laying a solid foundation for the subsequent deployment on the local Tomcat server.

1. Project Structure

Ensure that your project follows the standard Maven directory structure:

Plain Text

project-root
├── src
│   └── main
│       ├── java
│       │   └── com
│       │       └── example
│       │           ├── controller
│       │           │   └── MyController.java
│       │           └── AppConfig.java
│       ├── resources
│       └── webapp
│           └── WEB-INF
│               ├── views
│               └── web.xml
└── pom.xml

2. MyController.java: Sample Controller

Create a simple controller that handles requests. This is a basic example; you can expand it based on your application requirements.

Java

package com.example.controller;

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class MyController {

    @RequestMapping("/hello")
    public String hello(Model model) {
        model.addAttribute("message", "Hello, Spring MVC!");
        return "hello"; // This corresponds to the view name
    }
}

3. View ('hello.jsp')

Create a simple JSP file under 'src/main/webapp/WEB-INF/views/hello.jsp'.

Java Server Pages

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
    <title>Hello Page</title>
</head>
<body>
    <h2>${message}</h2>
</body>
</html>
4. 'AppConfig.java': Configuration

Ensure that AppConfig.java scans the package where your controllers are located.

Java

package com.example;

import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;

@Configuration
@EnableWebMvc
@ComponentScan(basePackages = "com.example.controller")
public class AppConfig {
    // Additional configurations or bean definitions can go here
}

5. 'web.xml': Web Application Configuration

Configure the DispatcherServlet in web.xml:

XML

<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_4_0.xsd"
         version="4.0">

    <servlet>
        <servlet-name>dispatcher</servlet-name>
        <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>/WEB-INF/dispatcher-servlet.xml</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet-mapping>
        <servlet-name>dispatcher</servlet-name>
        <url-pattern>/</url-pattern>
    </servlet-mapping>

</web-app>

6. 'dispatcher-servlet.xml'

Create a dispatcher-servlet.xml file under src/main/webapp/WEB-INF/ to define additional Spring MVC configurations:

XML

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
                           http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc.xsd">

    <!-- Enables component scanning for the specified package -->
    <context:component-scan base-package="com.example.controller"/>

    <!-- Enables annotation-driven Spring MVC -->
    <mvc:annotation-driven/>

    <!-- Resolves views selected for rendering by @Controllers to .jsp resources in the /WEB-INF/views directory -->
    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
        <property name="prefix" value="/WEB-INF/views/"/>
        <property name="suffix" value=".jsp"/>
    </bean>

</beans>

7. Run the Application

Run your application (this depends on your IDE) and access the hello endpoint at http://localhost:8080/your-app-context/hello. You should see the "Hello, Spring MVC!" message. Remember to replace "your-app-context" with the actual context path of your deployed application.

War File Creation

Transitioning to the packaging phase, it's time to create a deployable Web Application Archive (WAR) file for your Spring MVC application. This file is the standard packaging format for a Java web application. Use a build tool such as Maven to automate this step; the generated archive typically ends up in the target directory. The WAR file encapsulates your Spring MVC app, ready to be deployed onto the local Tomcat server, a pivotal step towards running your application in a real web environment.
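A quick aside on configuration style before moving to deployment: if you would rather keep everything in Java configuration instead of dispatcher-servlet.xml, the same view resolution rule can be declared as a bean in AppConfig. The following is only a minimal sketch under that assumption, mirroring the prefix and suffix used above:

Java

package com.example;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;
import org.springframework.web.servlet.view.InternalResourceViewResolver;

@Configuration
@EnableWebMvc
@ComponentScan(basePackages = "com.example.controller")
public class AppConfig {

    // Java-config equivalent of the InternalResourceViewResolver bean in dispatcher-servlet.xml
    @Bean
    public InternalResourceViewResolver viewResolver() {
        InternalResourceViewResolver resolver = new InternalResourceViewResolver();
        resolver.setPrefix("/WEB-INF/views/");
        resolver.setSuffix(".jsp");
        return resolver;
    }
}

Either approach works; just avoid declaring the view resolver in both places at once.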
Deploying on Tomcat

The deployment phase itself is straightforward: copy the previously generated WAR file into the `webapps` directory of your Tomcat installation. This directory is the entry point for deploying web applications. Then start (or restart) the Tomcat server and watch as it automatically detects and deploys your Spring MVC application. This automated deployment mechanism streamlines the process, ensuring that your application is quickly up and running on the local Tomcat server, ready for testing and further development iterations.

Testing the Deployed App

Upon successful deployment, it's time to test your Spring MVC application. Open your web browser and enter the address `http://localhost:8080/your-app-context`, replacing `your-app-context` with the context path assigned to your deployed application. This step allows you to inspect and interact with your application in a real web environment. If all configurations align, you should see your Spring MVC app come to life, confirming the correct integration of your application with the local Tomcat server.

Tips for Efficient Development

To enhance your development workflow, consider the following tips:

Hot swapping: Leverage hot-swapping features in your IDE to avoid restarting the server after every code change.
Logging: Implement comprehensive logging to troubleshoot any issues during deployment.
Monitoring: Utilize tools like JConsole or VisualVM to monitor your application's performance metrics.

Conclusion

Congratulations! The successful deployment of your Spring MVC app on a local Tomcat server marks a significant milestone. This guide has given you a foundational understanding of the deployment process, a vital asset for a smooth transition to production environments. As you continue honing your development skills, keep in mind that sound deployment practices are instrumental in delivering robust and reliable applications. Well done!
Apache Airflow is an open-source platform that allows you to programmatically author, schedule, and monitor workflows. It uses Python as its programming language and offers a flexible architecture suited for both small-scale and large-scale data processing. The platform uses Directed Acyclic Graphs (DAGs) to define workflows, making it easy to visualize complex data pipelines.

One of the key features of Apache Airflow is its ability to schedule and trigger batch jobs, making it a popular choice for processing large volumes of data. It provides excellent support for integrating with various data processing technologies and frameworks such as Apache Hadoop and Apache Spark. By using Apache Airflow for batch processing, you can easily define and schedule your data processing tasks, ensuring that they are executed in the desired order and within the specified time constraints.

Batch processing is a common approach in big data processing that involves processing data in large volumes, typically at regular time intervals. It is well suited for scenarios where data can be collected over a period and processed together as a batch. Within the fintech sector, batch processing caters to a wide range of applications, including authorization and settlement processes, management of recurring payments, reconciliation operations, fraud detection and analytics, regulatory compliance, and changes to customer relationship management systems.

Let's explore a simple use case of processing an input file and writing back to an output file using Apache Airflow. To get started with Apache Airflow, you can follow the official documentation for installation and setup.

[Overview diagram illustrating the basic flow of a batch processing scenario]

Setting the Stage

Python

import logging
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

# Default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2021, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

The script begins by importing the necessary modules and defining default arguments for the DAG. These defaults include the DAG owner, start date, and retry settings.

Reading Function: Extracting Data

Python

def read_function(**kwargs):
    ti = kwargs["ti"]
    # Read from a file (example: input_file.txt)
    with open("path/to/file/input_file.txt", "r") as file:
        # Read the remaining lines
        lines = file.readlines()
    # Push each line to XCom storage
    for i, line in enumerate(lines):
        ti.xcom_push(key=f"line_{i}", value=line.strip())
    # Push the total number of lines to XCom storage
    ti.xcom_push(key="num_lines", value=len(lines))

The read_function simulates the extraction of data by reading lines from a file (`input_file.txt`). It then uses Airflow's XCom feature to push each line and the total number of lines into storage, making them accessible to subsequent tasks.

Sample Input File

Plain Text

CardNumber,TransactionId,Amount,TxnType,Recurring,Date
1,123456789,100.00,Debit,Monthly,2023-12-31
2,987654321,50.00,Credit,Weekly,2023-10-15
3,456789012,75.50,Debit,Monthly,2023-11-30
4,555111222,120.75,Credit,Daily,2023-09-30

The input file above represents a batch of recurring transactions to be processed.
Processing Function: Transforming Data

Python

def process_function(**kwargs):
    ti = kwargs["ti"]
    # Pull all lines from XCom storage
    lines = [ti.xcom_pull(task_ids="read", key=f"line_{i}")
             for i in range(ti.xcom_pull(task_ids="read", key="num_lines"))]
    # Process and log all lines
    for i, line in enumerate(lines):
        logging.info(f"Make Payment Transaction {i + 1}: {line}")

The process_function pulls all lines from XCom storage and simulates the transformation step by logging each line. This task demonstrates the flexibility of Airflow in handling data flow between tasks. The process_function can have multiple implementations, allowing it to either invoke a web service call to execute the transaction or trigger another DAG to follow a different flow.

Logs

Plain Text

[2023-11-28, 03:49:06 UTC] {taskinstance.py:1662} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='batch_processing_dag' AIRFLOW_CTX_TASK_ID='process' AIRFLOW_CTX_EXECUTION_DATE='2023-11-28T03:48:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-11-28T03:48:00+00:00'
[2023-11-28, 03:49:06 UTC] {batch_processing_dag.py:38} INFO - Make Payment Transaction 1: 1,123456789,100.00,Debit,Monthly,2023-12-31
[2023-11-28, 03:49:06 UTC] {batch_processing_dag.py:38} INFO - Make Payment Transaction 2: 2,987654321,50.00,Credit,Weekly,2023-10-15
[2023-11-28, 03:49:06 UTC] {batch_processing_dag.py:38} INFO - Make Payment Transaction 3: 3,456789012,75.50,Debit,Monthly,2023-11-30
[2023-11-28, 03:49:06 UTC] {batch_processing_dag.py:38} INFO - Make Payment Transaction 4: 4,555111222,120.75,Credit,Daily,2023-09-30
[2023-11-28, 03:49:06 UTC] {python.py:194} INFO - Done. Returned value was: None

Writing Function: Loading Data

Python

def write_function(**kwargs):
    ti = kwargs["ti"]
    # Pull all lines from XCom storage
    lines = [ti.xcom_pull(task_ids="read", key=f"line_{i}")
             for i in range(ti.xcom_pull(task_ids="read", key="num_lines"))]
    # Write all lines to an output file (example: processed.txt)
    with open("path/to/file/processed.txt", "a") as file:
        for i, line in enumerate(lines):
            processed_line = f"{line.strip()} PROCESSED"
            file.write(f"{processed_line}\n")

The write_function pulls all lines from XCom storage and writes them to an output file (`processed.txt`), appending a PROCESSED marker to each record.

Sample Output File After the Transactions Are Processed

Plain Text

1,123456789,100.00,Debit,Monthly,2023-12-31 PROCESSED
2,987654321,50.00,Credit,Weekly,2023-10-15 PROCESSED
3,456789012,75.50,Debit,Monthly,2023-11-30 PROCESSED
4,555111222,120.75,Credit,Daily,2023-09-30 PROCESSED

DAG Definition: Orchestrating the Workflow

Python

dag = DAG(
    'batch_processing_dag',
    default_args=default_args,
    description='DAG with Read, Process, and Write functions',
    schedule_interval='*/1 * * * *',  # Set the schedule interval according to your needs
    catchup=False,
)

The DAG is instantiated with the name batch_processing_dag, the previously defined default arguments, a description, a schedule interval (running every minute), and the catchup parameter set to False.
Task Definitions: Executing the Functions

Python

# Task to read from a file and push to XCom storage
read_task = PythonOperator(
    task_id='read',
    python_callable=read_function,
    provide_context=True,
    dag=dag,
)

# Task to process the data from XCom storage (log to console)
process_task = PythonOperator(
    task_id='process',
    python_callable=process_function,
    provide_context=True,
    dag=dag,
)

# Task to write the data back to an output file
write_task = PythonOperator(
    task_id='write',
    python_callable=write_function,
    provide_context=True,
    dag=dag,
)

Three tasks (read_task, process_task, and write_task) are defined using the PythonOperator. Each task is associated with one of the Python functions (read_function, process_function, and write_function). The provide_context=True parameter allows the functions to access the task instance and context information.

Defining Task Dependencies

Python

# Define task dependencies
read_task >> process_task >> write_task

The task dependencies are specified using the >> operator, indicating the order in which the tasks should be executed.

Conclusion

Apache Airflow is a flexible open-source tool that excels at managing workflows, especially batch processing. Dynamic workflow definition, support for Directed Acyclic Graphs (DAGs), precise task dependency management, built-in monitoring and logging, efficient parallel execution, and strong error handling make it a solid choice for organizations of all sizes. The straightforward batch processing scenario above highlights Apache Airflow's approachable programming model and its adaptability to a range of data processing needs.
I recently had to add UI tests for an application implemented with the Swing library for the Posmulten project. The GUI does not do any rocket science. It does what the Posmulten project was created for, generating DDL statements that make an RLS policy for a Postgres database, but with a user interface based on Swing components. Because Posmulten is an open-source project and the CI/CD process uses GitHub Actions, it is worth having tests that cover the UI application's functionality, tests that can be run in a headless environment.

Testing Framework

For testing purposes, I picked the AssertJ Swing library. It makes it effortless to mimic application users' actions, and checking the application state and its components takes no effort either. Below is an example of a simple test case that checks whether the correct panel shows up with the expected content after entering text and clicking the correct button (the creation of the window fixture is shown in a sketch at the end of this article).

Java

@Test
public void shouldDisplayCreationScriptsForCorrectConfigurationWhenClickingSubmitButton() throws SharedSchemaContextBuilderException, InvalidConfigurationException {
    // GIVEN
    String yaml = "Some yaml";
    ISharedSchemaContext context = mock(ISharedSchemaContext.class);
    Mockito.when(factory.build(eq(yaml), any(DefaultDecoratorContext.class))).thenReturn(context);
    List<SQLDefinition> definitions = asList(sqlDef("DEF 1", null), sqlDef("ALTER DEFINIT and Function", null));
    Mockito.when(context.getSqlDefinitions()).thenReturn(definitions);
    window.textBox(CONFIGURATION_TEXTFIELD_NAME).enterText(yaml);

    // WHEN
    window.button("submitBtn").click();

    // THEN
    window.textBox(CREATION_SCRIPTS_TEXTFIELD_NAME).requireText("DEF 1" + "\n" + "ALTER DEFINIT and Function");
    // Error panel should not be visible
    findJTabbedPaneFixtureByName(ERROR_TAB_PANEL_NAME).requireNotVisible();
}

You can find the complete test code here.

Posmulten

The library for which the GUI application was created is a simple DDL statement builder that creates RLS policies in a Postgres database. The generated RLS policies allow applications communicating with the Postgres database to work in a multi-tenant architecture with the shared schema strategy. For more info, please check the links below:

Posmulten GUI module
Shared Schema Strategy With Postgres
Multi-tenancy Architecture With Shared Schema Strategy in Webapp Application Based on Spring-boot, Thymeleaf, and Posmulten-hibernate

Maven Configuration

It is worth excluding UI tests from the unit tests. Although the tests might not be fully end-to-end, since some components are mocked, their execution can take noticeably longer than standard unit tests, so they should run separately.

XML

<profile>
    <id>swing-tests</id>
    <activation>
        <activeByDefault>false</activeByDefault>
    </activation>
    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.gmavenplus</groupId>
                <artifactId>gmavenplus-plugin</artifactId>
                <version>1.5</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.22.1</version>
                <configuration>
                    <includes>
                        <include>**/*SwingTest.java</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</profile>

Full Maven file.

To run the tests locally, on an environment with a graphics card, execute them with the Maven wrapper, like below.
Shell

./mvnw -pl :openwebstart '-DxvfbRunningTests=true' -P !unit-tests,swing-tests test

GitHub Action

Moving on to the GitHub Actions workflow: running the UI tests on an environment with a graphics card seems easy. However, there might be situations when a window with a WhatsApp or MS Teams notification appears on the desktop where the UI tests are executed, and the tests will fail. Tests should simply be repeated in such cases, and that is not the real problem. Many more problems occur when we try to execute the tests in a headless environment, which is probably the default environment for every CI/CD pipeline. We still need to run those tests and ensure they pass no matter where they are executed.

When you ask how to execute UI tests in a headless environment, the first suggestion on the internet is to use Xvfb. However, the contributors of AssertJ Swing suggest a different approach:

"Our tests maximize windows and do other stuff the default window manager of xvfb doesn't support. TightVNC makes it easy to use another window manager. Just add gnome-wm & (or the window manager of your choice) to ~/.vnc/xstartup and you're ready to run." (GitHub)

So, I followed the suggestion from the contributors and used tightvncserver. I had some problems with adding gnome-wm, so I used Openbox instead. Below, you can see the job that runs the UI tests. The full GitHub Actions file can be found here. The script files used to configure CI can be found here.

YAML

testing_swing_app:
  needs: [compilation_and_unit_tests, database_tests, testing_configuration_jar]
  runs-on: ubuntu-latest
  name: "Testing Swing Application"
  steps:
    - name: Git checkout
      uses: actions/checkout@v2
    # Install JDKs and maven toolchain
    - uses: actions/setup-java@v3
      name: Set up JDK 11
      id: setupJava11
      with:
        distribution: 'zulu' # See 'Supported distributions' for available options
        java-version: '11'
    - name: Set up JDK 1.8
      id: setupJava8
      uses: actions/setup-java@v1
      with:
        java-version: 1.8
    - uses: cactuslab/maven-toolchains-xml-action@v1
      with:
        toolchains: |
          [
            {"jdkVersion": "8", "jdkHome": "${{steps.setupJava8.outputs.path}}"},
            {"jdkVersion": "11", "jdkHome": "${{steps.setupJava11.outputs.path}}"}
          ]
    - name: Install tightvncserver
      run: sudo apt-get update && sudo apt install tightvncserver
    - name: Install openbox
      run: sudo apt install openbox
    - name: Copy xstartup
      run: mkdir $HOME/.vnc && cp ./swing/xstartup $HOME/.vnc/xstartup && chmod +x $HOME/.vnc/xstartup
    - name: Setting password for tightvncserver
      run: ./swing/setpassword.sh
    - name: Run Swing tests
      id: swingTest1
      continue-on-error: true
      run: ./mvnw -DskipTests --quiet clean install && ./swing/execute-on-vnc.sh ./mvnw -pl :openwebstart '-DxvfbRunningTests=true' -P !unit-tests,swing-tests test
    #https://www.thisdot.co/blog/how-to-retry-failed-steps-in-github-action-workflows/
    # https://stackoverflow.com/questions/54443705/change-default-screen-resolution-on-headless-ubuntu
    - name: Run Swing tests (Second time)
      id: swingTest2
      if: steps.swingTest1.outcome == 'failure'
      run: ./mvnw -DskipTests --quiet clean install && ./swing/execute-on-vnc.sh ./mvnw -pl :openwebstart '-DxvfbRunningTests=true' -P !unit-tests,swing-tests test

Retry Failed Steps in GitHub Action Workflows

As you probably saw in the GitHub Actions file, the last step, which executes the UI tests, is added twice. After correctly setting up tightvncserver and Openbox, I did not observe that the second step had to be executed, except at the beginning of development, when there were not a lot of UI components.
When I used Xvfb earlier, the CI sometimes passed only on the second attempt. So even if there is no problem with executing the tests on the first try right now, it is still worth executing them a second time in case of failure. To check whether the first step failed, we first have to give it an id; in this case, it is "swingTest1". In the second step, we use the "if" property like below:

YAML

if: steps.swingTest1.outcome == 'failure'

And that is generally all it takes to run the step a second time in case of failure. Check this resource if you want to see other ways to retry a step.

Summary

Setting up CI for UI tests might not be a trivial task, but it can benefit any project with a GUI. Not all things can be tested with unit tests.
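As a closing note, the window fixture used in the test at the beginning of this article is created in the test setup, which is not shown above. A minimal sketch of such a setup with AssertJ Swing could look like the following; SwingAppFrame is a hypothetical stand-in for the application's real main frame class:

Java

import javax.swing.JFrame;

import org.assertj.swing.edt.GuiActionRunner;
import org.assertj.swing.fixture.FrameFixture;
import org.assertj.swing.junit.testcase.AssertJSwingJUnitTestCase;

public class SwingAppSwingTest extends AssertJSwingJUnitTestCase {

    private FrameFixture window;

    @Override
    protected void onSetUp() {
        // Create the frame on the EDT and wrap it in a fixture driven by the test robot
        JFrame frame = GuiActionRunner.execute(() -> new SwingAppFrame()); // hypothetical main frame class
        window = new FrameFixture(robot(), frame);
        window.show(); // shows the frame under test
    }
}

Naming the class with the SwingTest suffix also keeps it matched by the surefire include pattern from the Maven profile above.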
Have you ever wondered how data warehouses are different from databases? And what are Data Lakes and Data Lake Houses? Let's understand these with a hypothetical example. Bookster.biz is the new sensation in selling books worldwide. Business is flourishing, and they need to keep track of a lot of data: a large catalog of millions of books, and millions of customers worldwide placing billions of orders to buy them. How do they keep track of all this data? How do they ensure their website and apps don't grind to a halt under all this load?

Databases to the Rescue

Databases are the workhorses of websites and mobile apps, handling all the data and millions of transactions. These databases come in many flavors (we will cover the different types of databases in a separate post), but the most popular ones are called Relational Databases (aka RDBMS), like MySQL, Postgres, Oracle, etc. Bookster would possibly have the following tables and schema (not exhaustive, for brevity):

BookCatalog: book ID, ISBN, title, authors, description, publisher, ...
BookInventory: book ID, number of books available for sale, ...
Users: user ID, user name, email, ...
Orders: order ID, book ID, user ID, payment information, order status, ...

When a user orders a book, Bookster will update two records simultaneously: reducing the book inventory and inserting a new entry in the Orders table. RDBMSs support transactions that make such operations atomic: either all of them succeed or all of them fail (a minimal JDBC sketch of this pattern appears a little later, before the Data Lakes discussion). Imagine if two or more users could order the last copy of a popular book. Without transaction support, all of them would place their orders, and Bookster would end up with many pissed-off customers and only one happy one. Similarly, if the database host crashes during processing, the data may be left inconsistent without transactions.

This type of database interaction is called Online Transaction Processing (aka OLTP), where read and write operations happen very fast on a small amount of data, i.e., precisely two rows in the previous example.

This is great. The customers are now happy, and they can order books fast. But the management wants to know what's going on with the business. Which books are the best-sellers in different categories? Which authors are trending, and which are not selling much? How many orders are coming from which geographies or demographics? These kinds of answers are not accessible with just the databases.

Data Warehouses Shine for Analytical Queries

Data Warehouses (DWs) can handle large amounts of data, e.g., billions of orders, millions of book entries, etc. Bookster can load the data from the database into the DW to answer the management's questions. Analytical queries read a lot of data and summarise it in some form, like listing the total number of orders for a particular book broken down by geography and demographics. Examples of popular DWs are AWS Redshift, GCP BigQuery, etc.

This type of database interaction is called Online Analytical Processing (aka OLAP), where most reads happen on a large amount of data. The data is uploaded to the DW in batches or can be streamed. The loading process is also known as ETL (Extract, Transform, and Load), which runs regularly to keep the DW in sync with database updates. DWs typically don't allow updating data in place but only add newer versions. Like an RDBMS, DWs also have a notion of schema where tables and schema are well defined, and the ETL process converts the data into the appropriate schema for loading.
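To make the OLTP discussion above concrete, here is a minimal JDBC sketch of the atomic order placement described earlier. The table and column names follow the hypothetical Bookster schema and are illustrative only, and the inventory check is deliberately simplified:

Java

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PlaceOrder {

    public static void placeOrder(String jdbcUrl, long bookId, long userId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false); // start a transaction
            try (PreparedStatement dec = conn.prepareStatement(
                         "UPDATE BookInventory SET available = available - 1 WHERE book_id = ? AND available > 0");
                 PreparedStatement ins = conn.prepareStatement(
                         "INSERT INTO Orders (book_id, user_id, order_status) VALUES (?, ?, 'PLACED')")) {
                dec.setLong(1, bookId);
                if (dec.executeUpdate() == 0) {
                    conn.rollback(); // no copies left: nothing is changed
                    throw new SQLException("Book " + bookId + " is out of stock");
                }
                ins.setLong(1, bookId);
                ins.setLong(2, userId);
                ins.executeUpdate();
                conn.commit(); // both changes become visible together
            } catch (SQLException e) {
                conn.rollback(); // either both operations succeed or neither does
                throw e;
            }
        }
    }
}

If the application or the database crashes anywhere before commit(), the inventory decrement is rolled back and no half-finished order is left behind.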
Some data doesn't fit a schema easily but can still be used by Machine Learning (ML) processes. For example, customers review different books as text or video reviews, and some rockstar ML engineers want to generate popular books by training an LLM on all books. So the data can't be forced into a strict schema anymore. Data Lakes help here by storing even larger amounts of data in different formats and allowing efficient processing.

Data Lakes and Data Lake Houses Are the Relatively New Kids on the Block

Data Lakes (DLs) remove the friction of converting data into a specific format irrespective of if and when it will be used. Vast amounts of data in different native formats like JSON, text, binary, images, videos, etc., can be stored in a DL and converted to a specific schema at read time, only when there is a need to process the data. The processing is flexible and scalable, as DLs can support big data processing frameworks like Apache Spark (see the sketch after the table below). On the flip side, such flexibility can become a drawback: if most of the ingested data is of low quality due to a lack of data quality checks or governance, the DL turns into a 'Data Swamp' instead.

That's where the clever people at Databricks combined the goodness of DWs with DLs to create Data Lake Houses (DLHs). DLHs are more flexible than DWs, allowing schema either at write time or at read time, as needed, but with stricter mechanisms for data quality checks and metadata management, aka Data Governance. DLHs also allow flexible big data processing like DLs.

The following table summarises the differences between these technologies:

Technology | Key Characteristics | Suitable For | Drawbacks | Examples
Database | Fast, small queries, transaction support | Online use cases (OLTP) | Not ideal for large analytical queries | RDBMS: MySQL
Data Warehouse | Slow, large queries, no updates after write | Analytics (OLAP) | Less flexible due to strict schema; lacks support for big data processing frameworks | AWS Redshift, Google BigQuery, *Snowflake
Data Lake | Unstructured data, schema on read, flexible, big data processing | Analytics (OLAP) | Data quality issues due to lack of Data Governance | *Snowflake, **AWS Lake Formation, **Databricks Delta Lake
Data Lake House | Structured or unstructured data, flexible with better Data Governance, supports big data processing | Analytics (OLAP) | More complex, lower performance, and more expensive compared to a DW | *Snowflake, **AWS Lake Formation, **Databricks Delta Lake

*Snowflake can be configured as a Data Warehouse, Data Lake, or Data Lake House.
**AWS Lake Formation and Databricks Delta Lake can be configured as either a Data Lake or a Data Lake House.
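To illustrate the schema-on-read idea, here is a minimal sketch using Apache Spark's Java API. The bucket path, the reviews dataset, and the column names are illustrative assumptions, not part of the article's example:

Java

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReviewsSchemaOnRead {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bookster-reviews")
                .getOrCreate();

        // The raw JSON review files sit in the data lake as-is; the schema is
        // inferred only now, at read time, not when the data was ingested.
        Dataset<Row> reviews = spark.read().json("s3://bookster-lake/raw/reviews/");

        // A simple schema-on-read aggregation: average rating per book
        reviews.groupBy("book_id")
               .avg("rating")
               .show();
    }
}

The same raw files could later be read with a different schema for a different question, which is exactly the flexibility a Data Lake trades against governance.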
Java, known for its versatility and robustness, has often faced criticism for its verbosity. However, it's essential to recognize that Java's perceived verbosity is not always a fault of the language itself but can be attributed to overengineering in code design. In this article, we'll explore the benefits of simplifying Java code by reducing unnecessary layers and interfaces, unlocking the power of simplicity for enhanced maintainability without sacrificing functionality.

The Pitfall of Unnecessary Interfaces

One common practice contributing to code complexity is the creation of interfaces without a clear purpose. Consider the classical case of having one interface for one implementation:

Java

public interface CreditCard {
    String payment();
}

public class CreditCardImpl implements CreditCard {
    @Override
    public String payment() {
        return "Payment done!";
    }
}

The first sign of an unnecessary interface is the generation of a non-meaningful name, going against the principles of Clean Code advocated by Robert Martin. Instead of creating separate interfaces and implementations, a more straightforward approach is to have a single class handling both:

Java

public class CreditCard {
    public String payment() {
        return "Payment done!";
    }
}

By eliminating the unnecessary interface, the code becomes more concise and adheres to the principles of clarity and simplicity.

Choosing Interfaces Wisely

Interfaces are potent tools in Java, but they should be used judiciously. One valid use case for interfaces is implementing design patterns like the strategy pattern. For instance, a payment system might support various strategies, such as credit card payments, debit card payments, and more. In such scenarios, interfaces can help define a common contract (a short usage sketch follows at the end of this article):

Java

public interface Payment {
    String payment();
}

public class CreditCard implements Payment {
    public String payment() {
        return "Credit card payment done!";
    }
}

public class DebitCard implements Payment {
    public String payment() {
        return "Debit card payment done!";
    }
}

Here, interfaces provide a unified structure for different payment strategies.

The Unnecessary Layer Conundrum

Another pitfall in code design involves the creation of unnecessary layers that act as mere pass-throughs, adding complexity without offering tangible benefits. Consider a scenario where an additional layer is introduced without any clear purpose:

Java

public class PaymentGateway {
    private CreditCard creditCard;

    public PaymentGateway(CreditCard creditCard) {
        this.creditCard = creditCard;
    }

    public String processPayment() {
        // Some processing logic
        return creditCard.payment();
    }
}

In cases where the added layer serves no meaningful purpose, it's advisable to remove it, simplifying the code and improving its clarity:

Java

public class PaymentProcessor {
    private CreditCard creditCard;

    public PaymentProcessor(CreditCard creditCard) {
        this.creditCard = creditCard;
    }

    public String processPayment() {
        // Processing logic directly in the class
        return creditCard.payment();
    }
}

Eliminating unnecessary layers makes the code more straightforward to maintain.

Embracing Simplicity for Maintainability

In conclusion, the key to unlocking the full potential of Java lies in embracing simplicity. Avoid unnecessary interfaces and layers that add complexity without providing clear benefits. Choose interfaces wisely, leveraging them for scenarios that enhance code structure, such as implementing design patterns. By simplifying your Java code, you make it more readable and maintainable, ensuring a more efficient and enjoyable development process.
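To round out the strategy-pattern point above, here is a short usage sketch showing what the Payment interface buys you: the calling code depends only on the abstraction, so new payment strategies can be added without touching it. The Checkout class is illustrative and not part of the original article:

Java

public class Checkout {

    private final Payment payment;

    public Checkout(Payment payment) {
        this.payment = payment; // any strategy: CreditCard, DebitCard, ...
    }

    public String completePurchase() {
        return payment.payment();
    }

    public static void main(String[] args) {
        // Swapping strategies requires no change to Checkout itself
        System.out.println(new Checkout(new CreditCard()).completePurchase());
        System.out.println(new Checkout(new DebitCard()).completePurchase());
    }
}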
In this article, I will show you how to use Cloudera DataFlow, powered by Apache NiFi, to interact with IBM WatsonX.AI foundation large language models in real time. We can work with any of the foundation models, such as Google FLAN T5 XXL or the IBM Granite models. I'll show you how easy it is to build a real-time data pipeline that feeds questions from your Slack-like and mobile applications directly to secure WatsonX.AI models running in IBM Cloud. We will handle all the security, management, lineage, and governance with Cloudera DataFlow. As part of decision-making, we can choose different WatsonX.AI models on the fly based on the type of prompt. For example, if we want to continue a sentence versus answer a question, we can pick different models. For question answering, Google FLAN T5 XXL works well. If I want to continue sentences, I would use one of the IBM Granite models.

You will notice how amazingly fast the WatsonX.AI models return the results we need. I do some quick enrichment and transformation and then send the results on to Cloudera Apache Kafka to be used for continuous analytics and distribution to many other applications, systems, platforms, and downstream consumers. We also return the answers to the original requester, which could be someone in a Slack channel or someone in an application. All of this happens in real time, with no code, full governance, lineage, data management, and security at any scale and on any platform. The power of IBM and Cloudera together in private, public, and hybrid cloud environments for real-time data and AI is just getting started. Try it today.

Step-By-Step Real-Time Flow

First, in Slack, I type a question: "Q: What is a good way to integrate Generative AI and Apache NiFi?"

[NiFi flow, top]

Once that question is typed, the Slack server sends these events to our registered service. This can be hosted anywhere that is publicly facing. (Click here for the Slack API link.)

[Slack API]

Once enabled, your server will start receiving JSON events for each Slack post. This is easy to receive and parse in NiFi. Cloudera DataFlow enables receiving secure HTTPS REST calls in the public cloud-hosted edition with ease, even in Designer mode.

[NiFi top flow 2]

In the first part of the flow, we receive the REST JSON POST, which is as follows.

Slackbot 1.0 (+https://api.slack.com/robots) application/json POST HTTP/1.1

{ "token" : "qHvJe59yetAp1bao6wmQzH0C", "team_id" : "T1SD6MZMF", "context_team_id" : "T1SD6MZMF", "context_enterprise_id" : null, "api_app_id" : "A04U64MN9HS", "event" : { "type" : "message", "subtype" : "bot_message", "text" : "==== NiFi to IBM <http://WatsonX.AI|WatsonX.AI> LLM Answers\n\nOn Date: Wed, 15 Nov 20

This is a very rich, detailed JSON document that we could push immediately, raw, to an Apache Iceberg open cloud lakehouse, a Kafka topic, or an object store as a JSON document (enhancement option). I am just going to parse what I need.

EvaluateJSONPath

We parse out the channel ID and the plain text of the post. I only want messages from the general channel ("C1SD6N197"). Then I copy the text to an inputs field, as is required for Hugging Face. We check our input: if it's about stocks or weather (more to come), we avoid calling the LLM.
SELECT * FROM FLOWFILE WHERE upper(inputs) like '%WEATHER%' AND not upper(inputs) like '%LLM SKIPPED%' SELECT * FROM FLOWFILE WHERE upper(inputs) like '%STOCK%' AND not upper(inputs) like '%LLM SKIPPED%' SELECT * FROM FLOWFILE WHERE (upper(inputs) like 'QUESTION:%' OR upper(inputs) like 'Q:%') and not upper(inputs) like '%WEATHER%' and not upper(inputs) like '%STOCK%' For Stocks processing: To parse what stock we need I am using my Open NLP processor to get it. So you will need to download the processor and the Entity extraction models. GitHub - tspannhw/nifi-nlp-processor: Apache NiFi NLP Processor Open NLP Example Apache NiFi Processor Then we pass that company name to an HTTP REST endpoint from AlphaVantage that converts the Company Name to Stock symbols. In free accounts, you only get a few calls a day, so if we fail we then bypass this step and try to just use whatever you passed in. Using RouteOnContent we filter an Error message out. Then we use a QueryRecord processor to convert from CSV to JSON and filter. SELECT name as companyName, symbol FROM FLOWFILE ORDER BY matchScore DESC LIMIT 1 We do a SplitRecord to ensure we are only one record. We then run EvaluateJsonPath to get our fields as attributes. In an UpdateAttribute we trim the symbol just in case. ${stockSymbol:trim()} We then pass that stock symbol to Twelve Data via InvokeHTTP to get our stock data. We then get a lot of stock data back. { "meta" : { "symbol" : "IBM", "interval" : "1min", "currency" : "USD", "exchange_timezone" : "America/New_York", "exchange" : "NYSE", "mic_code" : "XNYS", "type" : "Common Stock" }, "values" : [ { "datetime" : "2023-11-15 10:37:00", "open" : "152.07001", "high" : "152.08000", "low" : "151.99500", "close" : "152.00999", "volume" : "8525" }, { "datetime" : "2023-11-15 10:36:00", "open" : "152.08501", "high" : "152.12250", "low" : "152.08000", "close" : "152.08501", "volume" : "15204" } ... We then run EvaluateJSONPath to grab the exchange information. We fork the record to just get one record as this is just to return to Slack. We use UpdateRecord calls to enrich the stock data with other values. We then run a QueryRecord to limit us to 1 record to send to Slack. SELECT * FROM FLOWFILE ORDER BY 'datetime' DESC LIMIT 1 We run an EvaluateJsonPath to get the most value fields to display. We then run a PutSlack with our message. LLM Skipped. Stock Value for ${companyName} [${nlp_org_1}/${stockSymbol}] on ${date} is ${closeStockValue}. stock date ${stockdateTime}. stock exchange ${exchange} We also have a separate flow that is split from Company Name. In the first step, we call Yahoo Finance to get RSS headlines for that stock. https://feeds.finance.yahoo.com/rss/2.0/headline?s=${stockSymbol:trim()}®ion=US&lang=en-US We use QueryRecord to convert RSS/XML Records to JSON. We then run a SplitJSON to break out the news items. We run a SplitRecord to limit to 1 record. We use EvaluateJSONPath to get the fields we need for our Slack message. We then run UpdateRecord to finalize our JSON. We then send this message to Slack. LLM Skipped. Stock News Information for ${companyName} [${nlp_org_1}/${stockSymbol}] on ${date} ${title} : ${description}. ${guid} article date ${pubdate} For those who selected weather, we follow a similar route (we should add caching with Redis @ Aiven) to stocks. We use my OpenNLP processor to extract locations you might want to have weather on. The next step is taking the output of the processor and building a value to send to our Geoencoder. 
weatherlocation = ${nlp_location_1:notNull():ifElse(${nlp_location_1}, "New York City")} If we can’t find a valid location, I am going to say “New York City." We could use some other lookup. I am doing some work on loading all locations and could do some advanced PostgreSQL searches on that - or perhaps OpenSearch or a vectorized datastore. I pass that location to Open Meteo to find the geo via InvokeHTTP. https://geocoding-api.open-meteo.com/v1/search?name=${weatherlocation:trim():urlEncode()}&count=1&language=en&format=json We then parse the values we need from the results. { "results" : [ { "id" : 5128581, "name" : "New York", "latitude" : 40.71427, "longitude" : -74.00597, "elevation" : 10.0, "feature_code" : "PPL", "country_code" : "US", "admin1_id" : 5128638, "timezone" : "America/New_York", "population" : 8175133, "postcodes" : [ "10001", "10002", "10003", "10004", "10005", "10006", "10007", "10008", "10009", "10010", "10011", "10012", "10013", "10014", "10016", "10017", "10018", "10019", "10020", "10021", "10022", "10023", "10024", "10025", "10026", "10027", "10028", "10029", "10030", "10031", "10032", "10033", "10034", "10035", "10036", "10037", "10038", "10039", "10040", "10041", "10043", "10044", "10045", "10055", "10060", "10065", "10069", "10080", "10081", "10087", "10090", "10101", "10102", "10103", "10104", "10105", "10106", "10107", "10108", "10109", "10110", "10111", "10112", "10113", "10114", "10115", "10116", "10117", "10118", "10119", "10120", "10121", "10122", "10123", "10124", "10125", "10126", "10128", "10129", "10130", "10131", "10132", "10133", "10138", "10150", "10151", "10152", "10153", "10154", "10155", "10156", "10157", "10158", "10159", "10160", "10161", "10162", "10163", "10164", "10165", "10166", "10167", "10168", "10169", "10170", "10171", "10172", "10173", "10174", "10175", "10176", "10177", "10178", "10179", "10185", "10199", "10203", "10211", "10212", "10213", "10242", "10249", "10256", "10258", "10259", "10260", "10261", "10265", "10268", "10269", "10270", "10271", "10272", "10273", "10274", "10275", "10276", "10277", "10278", "10279", "10280", "10281", "10282", "10285", "10286" ], "country_id" : 6252001, "country" : "United States", "admin1" : "New York" } ], "generationtime_ms" : 0.92196465 } We then parse the results so we can call another API to get the current weather for that latitude and longitude via InvokeHTTP. https://api.weather.gov/points/${latitude:trim()},${longitude:trim()} The results are geo-json. 
{ "@context": [ "https://geojson.org/geojson-ld/geojson-context.jsonld", { "@version": "1.1", "wx": "https://api.weather.gov/ontology#", "s": "https://schema.org/", "geo": "http://www.opengis.net/ont/geosparql#", "unit": "http://codes.wmo.int/common/unit/", "@vocab": "https://api.weather.gov/ontology#", "geometry": { "@id": "s:GeoCoordinates", "@type": "geo:wktLiteral" }, "city": "s:addressLocality", "state": "s:addressRegion", "distance": { "@id": "s:Distance", "@type": "s:QuantitativeValue" }, "bearing": { "@type": "s:QuantitativeValue" }, "value": { "@id": "s:value" }, "unitCode": { "@id": "s:unitCode", "@type": "@id" }, "forecastOffice": { "@type": "@id" }, "forecastGridData": { "@type": "@id" }, "publicZone": { "@type": "@id" }, "county": { "@type": "@id" } } ], "id": "https://api.weather.gov/points/40.7143,-74.006", "type": "Feature", "geometry": { "type": "Point", "coordinates": [ -74.006, 40.714300000000001 ] }, "properties": { "@id": "https://api.weather.gov/points/40.7143,-74.006", "@type": "wx:Point", "cwa": "OKX", "forecastOffice": "https://api.weather.gov/offices/OKX", "gridId": "OKX", "gridX": 33, "gridY": 35, "forecast": "https://api.weather.gov/gridpoints/OKX/33,35/forecast", "forecastHourly": "https://api.weather.gov/gridpoints/OKX/33,35/forecast/hourly", "forecastGridData": "https://api.weather.gov/gridpoints/OKX/33,35", "observationStations": "https://api.weather.gov/gridpoints/OKX/33,35/stations", "relativeLocation": { "type": "Feature", "geometry": { "type": "Point", "coordinates": [ -74.0279259, 40.745251000000003 ] }, "properties": { "city": "Hoboken", "state": "NJ", "distance": { "unitCode": "wmoUnit:m", "value": 3906.1522008034999 }, "bearing": { "unitCode": "wmoUnit:degree_(angle)", "value": 151 } } }, "forecastZone": "https://api.weather.gov/zones/forecast/NYZ072", "county": "https://api.weather.gov/zones/county/NYC061", "fireWeatherZone": "https://api.weather.gov/zones/fire/NYZ212", "timeZone": "America/New_York", "radarStation": "KDIX" } } We use EvaluateJSONPath to grab a forecast URL. Then we call that forecast URL via invokeHTTP. That produces a larger JSON output that we will parse for the results we want to return to Slack. 
{ "@context": [ "https://geojson.org/geojson-ld/geojson-context.jsonld", { "@version": "1.1", "wx": "https://api.weather.gov/ontology#", "geo": "http://www.opengis.net/ont/geosparql#", "unit": "http://codes.wmo.int/common/unit/", "@vocab": "https://api.weather.gov/ontology#" } ], "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [ [ [ -74.025095199999996, 40.727052399999998 ], [ -74.0295579, 40.705361699999997 ], [ -74.000948300000005, 40.701977499999998 ], [ -73.996479800000003, 40.723667899999995 ], [ -74.025095199999996, 40.727052399999998 ] ] ] }, "properties": { "updated": "2023-11-15T14:34:46+00:00", "units": "us", "forecastGenerator": "BaselineForecastGenerator", "generatedAt": "2023-11-15T15:11:39+00:00", "updateTime": "2023-11-15T14:34:46+00:00", "validTimes": "2023-11-15T08:00:00+00:00/P7DT17H", "elevation": { "unitCode": "wmoUnit:m", "value": 2.1335999999999999 }, "periods": [ { "number": 1, "name": "Today", "startTime": "2023-11-15T10:00:00-05:00", "endTime": "2023-11-15T18:00:00-05:00", "isDaytime": true, "temperature": 51, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 2.2222222222222223 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 68 }, "windSpeed": "1 to 7 mph", "windDirection": "SW", "icon": "https://api.weather.gov/icons/land/day/bkn?size=medium", "shortForecast": "Partly Sunny", "detailedForecast": "Partly sunny, with a high near 51. Southwest wind 1 to 7 mph." }, { "number": 2, "name": "Tonight", "startTime": "2023-11-15T18:00:00-05:00", "endTime": "2023-11-16T06:00:00-05:00", "isDaytime": false, "temperature": 44, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 3.8888888888888888 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 82 }, "windSpeed": "8 mph", "windDirection": "SW", "icon": "https://api.weather.gov/icons/land/night/sct?size=medium", "shortForecast": "Partly Cloudy", "detailedForecast": "Partly cloudy, with a low around 44. Southwest wind around 8 mph." }, { "number": 3, "name": "Thursday", "startTime": "2023-11-16T06:00:00-05:00", "endTime": "2023-11-16T18:00:00-05:00", "isDaytime": true, "temperature": 60, "temperatureUnit": "F", "temperatureTrend": "falling", "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 5.5555555555555554 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 82 }, "windSpeed": "6 mph", "windDirection": "SW", "icon": "https://api.weather.gov/icons/land/day/few?size=medium", "shortForecast": "Sunny", "detailedForecast": "Sunny. High near 60, with temperatures falling to around 58 in the afternoon. Southwest wind around 6 mph." 
}, { "number": 4, "name": "Thursday Night", "startTime": "2023-11-16T18:00:00-05:00", "endTime": "2023-11-17T06:00:00-05:00", "isDaytime": false, "temperature": 47, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 6.1111111111111107 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 80 }, "windSpeed": "3 mph", "windDirection": "SW", "icon": "https://api.weather.gov/icons/land/night/few?size=medium", "shortForecast": "Mostly Clear", "detailedForecast": "Mostly clear, with a low around 47. Southwest wind around 3 mph." }, { "number": 5, "name": "Friday", "startTime": "2023-11-17T06:00:00-05:00", "endTime": "2023-11-17T18:00:00-05:00", "isDaytime": true, "temperature": 63, "temperatureUnit": "F", "temperatureTrend": "falling", "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": 20 }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 12.222222222222221 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 86 }, "windSpeed": "2 to 10 mph", "windDirection": "S", "icon": "https://api.weather.gov/icons/land/day/bkn/rain,20?size=medium", "shortForecast": "Partly Sunny then Slight Chance Light Rain", "detailedForecast": "A slight chance of rain after 1pm. Partly sunny. High near 63, with temperatures falling to around 61 in the afternoon. South wind 2 to 10 mph. Chance of precipitation is 20%." }, { "number": 6, "name": "Friday Night", "startTime": "2023-11-17T18:00:00-05:00", "endTime": "2023-11-18T06:00:00-05:00", "isDaytime": false, "temperature": 51, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": 70 }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 12.777777777777779 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 100 }, "windSpeed": "6 to 10 mph", "windDirection": "SW", "icon": "https://api.weather.gov/icons/land/night/rain,60/rain,70?size=medium", "shortForecast": "Light Rain Likely", "detailedForecast": "Rain likely. Cloudy, with a low around 51. Chance of precipitation is 70%. New rainfall amounts between a quarter and half of an inch possible." }, { "number": 7, "name": "Saturday", "startTime": "2023-11-18T06:00:00-05:00", "endTime": "2023-11-18T18:00:00-05:00", "isDaytime": true, "temperature": 55, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": 70 }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 11.111111111111111 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 100 }, "windSpeed": "8 to 18 mph", "windDirection": "NW", "icon": "https://api.weather.gov/icons/land/day/rain,70/rain,30?size=medium", "shortForecast": "Light Rain Likely", "detailedForecast": "Rain likely before 1pm. Partly sunny, with a high near 55. Chance of precipitation is 70%." 
}, { "number": 8, "name": "Saturday Night", "startTime": "2023-11-18T18:00:00-05:00", "endTime": "2023-11-19T06:00:00-05:00", "isDaytime": false, "temperature": 40, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 1.1111111111111112 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 65 }, "windSpeed": "12 to 17 mph", "windDirection": "NW", "icon": "https://api.weather.gov/icons/land/night/few?size=medium", "shortForecast": "Mostly Clear", "detailedForecast": "Mostly clear, with a low around 40." }, { "number": 9, "name": "Sunday", "startTime": "2023-11-19T06:00:00-05:00", "endTime": "2023-11-19T18:00:00-05:00", "isDaytime": true, "temperature": 50, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": -0.55555555555555558 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 65 }, "windSpeed": "10 to 14 mph", "windDirection": "W", "icon": "https://api.weather.gov/icons/land/day/few?size=medium", "shortForecast": "Sunny", "detailedForecast": "Sunny, with a high near 50." }, { "number": 10, "name": "Sunday Night", "startTime": "2023-11-19T18:00:00-05:00", "endTime": "2023-11-20T06:00:00-05:00", "isDaytime": false, "temperature": 38, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": -0.55555555555555558 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 67 }, "windSpeed": "13 mph", "windDirection": "NW", "icon": "https://api.weather.gov/icons/land/night/few?size=medium", "shortForecast": "Mostly Clear", "detailedForecast": "Mostly clear, with a low around 38." }, { "number": 11, "name": "Monday", "startTime": "2023-11-20T06:00:00-05:00", "endTime": "2023-11-20T18:00:00-05:00", "isDaytime": true, "temperature": 46, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": -1.6666666666666667 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 70 }, "windSpeed": "13 mph", "windDirection": "NW", "icon": "https://api.weather.gov/icons/land/day/sct?size=medium", "shortForecast": "Mostly Sunny", "detailedForecast": "Mostly sunny, with a high near 46." }, { "number": 12, "name": "Monday Night", "startTime": "2023-11-20T18:00:00-05:00", "endTime": "2023-11-21T06:00:00-05:00", "isDaytime": false, "temperature": 38, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": null }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": -1.1111111111111112 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 70 }, "windSpeed": "10 mph", "windDirection": "N", "icon": "https://api.weather.gov/icons/land/night/sct?size=medium", "shortForecast": "Partly Cloudy", "detailedForecast": "Partly cloudy, with a low around 38." 
}, { "number": 13, "name": "Tuesday", "startTime": "2023-11-21T06:00:00-05:00", "endTime": "2023-11-21T18:00:00-05:00", "isDaytime": true, "temperature": 49, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": 30 }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 2.7777777777777777 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 73 }, "windSpeed": "9 to 13 mph", "windDirection": "E", "icon": "https://api.weather.gov/icons/land/day/bkn/rain,30?size=medium", "shortForecast": "Partly Sunny then Chance Light Rain", "detailedForecast": "A chance of rain after 1pm. Partly sunny, with a high near 49. Chance of precipitation is 30%." }, { "number": 14, "name": "Tuesday Night", "startTime": "2023-11-21T18:00:00-05:00", "endTime": "2023-11-22T06:00:00-05:00", "isDaytime": false, "temperature": 46, "temperatureUnit": "F", "temperatureTrend": null, "probabilityOfPrecipitation": { "unitCode": "wmoUnit:percent", "value": 50 }, "dewpoint": { "unitCode": "wmoUnit:degC", "value": 7.7777777777777777 }, "relativeHumidity": { "unitCode": "wmoUnit:percent", "value": 86 }, "windSpeed": "13 to 18 mph", "windDirection": "S", "icon": "https://api.weather.gov/icons/land/night/rain,50?size=medium", "shortForecast": "Chance Light Rain", "detailedForecast": "A chance of rain. Mostly cloudy, with a low around 46. Chance of precipitation is 50%." } ] } } We parse the data with EvaluateJSONPath to get primary fields for the weather. We then format those fields to PutSlack. LLM Skipped. Read forecast on ${date} for ${weatherlocation} @ ${latitude},${longitude} Used ${forecasturl} ${icon} Temp: ${temperature} ${temperatureunit} - ${temperaturetrend} There is a wind ${winddirection} at ${windspeed}. ${detailedforecast} Slack Output If we do have an LLM question, let’s make sure it’s just one record. We use a few different models that are available at IBM WatsonX.AI on IBM Cloud to quickly be accessed by our REST prompts. I tested and built the prompts initially at IBM’s Prompt Lab and then copied the initial curl statement from there. Click here for supported foundation models available with IBM watsonx.ai. ibm/mpt-7b-instruct2meta-llama/llama-2–70b-chatibm/granite-13b-chat-v1 We have to send our unique secure key to IBM and they will give us a token to use in our next call. We parse out the question and then send it to WatsonX via REST API. We build a prompt to send to IBM as follows. { "model_id": "meta-llama/llama-2-70b-chat", "input": "${inputs:urlEncode()}", "parameters": { "decoding_method": "greedy", "max_new_tokens": 200, "min_new_tokens": 50, "stop_sequences": [], "repetition_penalty": 1 }, "project_id": "0ead8ec4-d137-4f9c-8956-50b0da4a7068" } We parse the generated text which is our Generative AI results plus some helpful metadata on timings. The result posted to Slack is as follows: “You can use Apache NiFi to integrate Generative AI models in several ways: Data Preprocessing: Use NiFi to preprocess data before feeding it into your Generative AI model. This can include data cleaning, transformation, and feature engineering. Model Training: Use NiFi to automate the training process of your Generative AI model. You can use NiFi’s PutFile and PutFile_SFTP processors to write the training data to a file, and then use a processor like ExecuteScript to run the training script. Model Deployment: Once your Generative AI model is trained, you can use NiFi to deploy it. 
You can create a NiFi flow that takes in input data, runs it through the Generative AI model, and then outputs the generated data. Real-time Inference: You can use NiFi’s StreamingJobs” After the Slackbot posted the results, it posted metrics and debugging information to the chat channel. All of the metadata is posted to another Slack channel for administrator monitoring. ==== NiFi to IBM WatsonX.AI LLM Answers On Date: Wed, 15 Nov 2023 15:43:29 GMT Created: 2023-11-15T15:43:29.248Z Prompt: Q: What is a good way to integrate Generative AI and Apache NiFi? Response: ) You can use Apache NiFi to integrate Generative AI models in several ways: 1. Data Preprocessing: Use NiFi to preprocess data before feeding it into your Generative AI model. This can include data cleaning, transformation, and feature engineering. 2. Model Training: Use NiFi to automate the training process of your Generative AI model. You can use NiFi's PutFile and PutFile_SFTP processors to write the training data to a file, and then use a processor like ExecuteScript to run the training script. 3. Model Deployment: Once your Generative AI model is trained, you can use NiFi to deploy it. You can create a NiFi flow that takes in input data, runs it through the Generative AI model, and then outputs the generated data. 4. Real-time Inference: You can use NiFi's StreamingJobs Token: 200 Req Duration: 8153 HTTP TX ID: 89d71099-da23-4e7e-89f9-4e8f5620c0fb IBM Msg: This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL. URL: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx IBM Msg ID: disclaimer_warning Model ID: meta-llama/llama-2-70b-chat Stop Reason: max_tokens Token Count: 38 TX ID: NGp0djg-c05f740f84f84b7c80f93f9da05aa756 UUID: da0806cb-6133-4bf4-808e-1fbf419c09e3 Corr ID: NGp0djg-c05f740f84f84b7c80f93f9da05aa756 Global TX ID: 20c3a9cf276c38bcdaf26e3c27d0479b Service Time: 478 Request ID: 03c2726a-dcb6-407f-96f1-f83f20fe9c9c File Name: 1a3c4386-86d2-4969-805b-37649c16addb Request Duration: 8153 Request URL: https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2023-05-29 cf-ray: 82689bfd28e48ce2-EWR ===== Make Your Own Slackbot Slack Output Kafka Distribute Apache Flink SQL Table Creation DDL CREATE TABLE `ssb`.`Meetups`.`watsonairesults` ( `date` VARCHAR(2147483647), `x_global_transaction_id` VARCHAR(2147483647), `x_request_id` VARCHAR(2147483647), `cf_ray` VARCHAR(2147483647), `inputs` VARCHAR(2147483647), `created_at` VARCHAR(2147483647), `stop_reason` VARCHAR(2147483647), `x_correlation_id` VARCHAR(2147483647), `x_proxy_upstream_service_time` VARCHAR(2147483647), `message_id` VARCHAR(2147483647), `model_id` VARCHAR(2147483647), `invokehttp_request_duration` VARCHAR(2147483647), `message` VARCHAR(2147483647), `uuid` VARCHAR(2147483647), `generated_text` VARCHAR(2147483647), `transaction_id` VARCHAR(2147483647), `tokencount` VARCHAR(2147483647), `generated_token` VARCHAR(2147483647), `ts` VARCHAR(2147483647), `advisoryId` VARCHAR(2147483647), `eventTimeStamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp', WATERMARK FOR `eventTimeStamp` AS `eventTimeStamp` - INTERVAL '3' SECOND ) WITH ( 'deserialization.failure.policy' = 'ignore_and_log', 'properties.request.timeout.ms' = '120000', 'format' = 'json', 'properties.bootstrap.servers' = 'kafka:9092', 'connector' = 'kafka', 'properties.transaction.timeout.ms' = 
'900000', 'topic' = 'watsonxaillmanswers', 'scan.startup.mode' = 'group-offsets', 'properties.auto.offset.reset' = 'earliest', 'properties.group.id' = 'watsonxaillmconsumer' ) CREATE TABLE `ssb`.`Meetups`.`watsonxresults` ( `date` VARCHAR(2147483647), `x_global_transaction_id` VARCHAR(2147483647), `x_request_id` VARCHAR(2147483647), `cf_ray` VARCHAR(2147483647), `inputs` VARCHAR(2147483647), `created_at` VARCHAR(2147483647), `stop_reason` VARCHAR(2147483647), `x_correlation_id` VARCHAR(2147483647), `x_proxy_upstream_service_time` VARCHAR(2147483647), `message_id` VARCHAR(2147483647), `model_id` VARCHAR(2147483647), `invokehttp_request_duration` VARCHAR(2147483647), `message` VARCHAR(2147483647), `uuid` VARCHAR(2147483647), `generated_text` VARCHAR(2147483647), `transaction_id` VARCHAR(2147483647), `tokencount` VARCHAR(2147483647), `generated_token` VARCHAR(2147483647), `ts` VARCHAR(2147483647), `eventTimeStamp` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp', WATERMARK FOR `eventTimeStamp` AS `eventTimeStamp` - INTERVAL '3' SECOND ) WITH ( 'deserialization.failure.policy' = 'ignore_and_log', 'properties.request.timeout.ms' = '120000', 'format' = 'json', 'properties.bootstrap.servers' = 'kafka:9092', 'connector' = 'kafka', 'properties.transaction.timeout.ms' = '900000', 'topic' = 'watsonxaillm', 'scan.startup.mode' = 'group-offsets', 'properties.auto.offset.reset' = 'earliest', 'properties.group.id' = 'allwatsonx1' ) Example Prompt {"inputs":"Please answer to the following question. What is the capital of the United States?"} IBM DB2 SQL alter table "DB2INST1"."TRAVELADVISORY" add column "summary" VARCHAR(2048); -- DB2INST1.TRAVELADVISORY definition CREATE TABLE "DB2INST1"."TRAVELADVISORY" ( "TITLE" VARCHAR(250 OCTETS) , "PUBDATE" VARCHAR(250 OCTETS) , "LINK" VARCHAR(250 OCTETS) , "GUID" VARCHAR(250 OCTETS) , "ADVISORYID" VARCHAR(250 OCTETS) , "DOMAIN" VARCHAR(250 OCTETS) , "CATEGORY" VARCHAR(4096 OCTETS) , "DESCRIPTION" VARCHAR(4096 OCTETS) , "UUID" VARCHAR(250 OCTETS) NOT NULL , "TS" BIGINT NOT NULL , "summary" VARCHAR(2048 OCTETS) ) IN "IBMDB2SAMPLEREL" ORGANIZE BY ROW; ALTER TABLE "DB2INST1"."TRAVELADVISORY" ADD PRIMARY KEY ("UUID") ENFORCED; GRANT CONTROL ON TABLE "DB2INST1"."TRAVELADVISORY" TO USER "DB2INST1"; GRANT CONTROL ON INDEX "SYSIBM "."SQL230620142604860" TO USER "DB2INST1"; SELECT "summary", TITLE , ADVISORYID , TS, PUBDATE FROM DB2INST1.TRAVELADVISORY t WHERE "summary" IS NOT NULL ORDER BY ts DESC Example Output Email GitHub README GitHub repo Video Source Code Source Code
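For readers who want to try the watsonx.ai call outside of NiFi, here is a minimal Java sketch of the same REST request. It assumes you have already exchanged your IBM Cloud API key for a bearer token (as described above) and it uses a placeholder project ID; the endpoint, model, and prompt payload mirror the flow shown earlier.

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WatsonxGenerationCall {

    // Endpoint and payload fields mirror the InvokeHTTP call used in the flow above.
    private static final String GENERATION_URL =
            "https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2023-05-29";

    public static void main(String[] args) throws Exception {
        String bearerToken = System.getenv("WATSONX_TOKEN"); // token obtained from the IAM key exchange
        String question = "What is a good way to integrate Generative AI and Apache NiFi?";

        // Prompt payload with the same structure as the flow's prompt template.
        String payload = """
                {
                  "model_id": "meta-llama/llama-2-70b-chat",
                  "input": "%s",
                  "parameters": {
                    "decoding_method": "greedy",
                    "max_new_tokens": 200,
                    "min_new_tokens": 50,
                    "stop_sequences": [],
                    "repetition_penalty": 1
                  },
                  "project_id": "YOUR_PROJECT_ID"
                }
                """.formatted(question);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(GENERATION_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + bearerToken)
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        // The JSON response carries generated_text plus the metadata fields parsed in the flow.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}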
In today's world of distributed systems and microservices, it is crucial to maintain consistency. Microservice architecture is considered almost a standard for building modern, flexible, and reliable high-loaded systems. But at the same time introduces additional complexities. Monolith vs Microservices In monolithic applications, consistency can be achieved using transactions. Within a transaction, we can modify data in multiple tables. If an error occurred during the modification process, the transaction would roll back and the data would remain consistent. Thus consistency was achieved by the database tools. In a microservice architecture, things get much more complicated. At some point, we will have to change data not only in the current microservice but also in other microservices. Imagine a scenario where a user interacts with a web application and creates an order on the website. When the order is created, it is necessary to reduce the number of items in stock. In a monolithic application, this could look like the following: In a microservice architecture, such tables can change within different microservices. When creating an order, we need to call another service using, for example, REST or Kafka. But there are many problems here: the request may fail, the network or the microservice may be temporarily unavailable, the microservice may stop immediately after creating a record in the orders table and the message will not be sent, etc. Transactional Outbox One solution to this problem is to use the transactional outbox pattern. We can create an order and a record in the outbox table within one transaction, where we will add all the necessary data for a future event. A specific handler will read this record and send the event to another microservice. This way we ensure that the event will be sent if we have successfully created an order. If the network or microservice is unavailable, then the handler will keep trying to send the message until it receives a successful response. This will result in eventual consistency. It is worth noting here that it is necessary to support idempotency because, in such architectures, request processing may be duplicated. Implementation Let's consider an example of implementation in a Spring Boot application. We will use a ready solution transaction-outbox. First, let's start PostgreSQL in Docker: Shell docker run -d -p 5432:5432 --name db \ -e POSTGRES_USER=admin \ -e POSTGRES_PASSWORD=password \ -e POSTGRES_DB=demo \ postgres:12-alpine Add a dependency to build.gradle: Groovy implementation 'com.gruelbox:transactionoutbox-spring:5.3.370' Declare the configuration: Java @Configuration @EnableScheduling @Import({ SpringTransactionOutboxConfiguration.class }) public class TransactionOutboxConfig { @Bean public TransactionOutbox transactionOutbox(SpringTransactionManager springTransactionManager, SpringInstantiator springInstantiator) { return TransactionOutbox.builder() .instantiator(springInstantiator) .initializeImmediately(true) .retentionThreshold(Duration.ofMinutes(5)) .attemptFrequency(Duration.ofSeconds(30)) .blockAfterAttempts(5) .transactionManager(springTransactionManager) .persistor(Persistor.forDialect(Dialect.POSTGRESQL_9)) .build(); } } Here we specify how many attempts should be made in case of unsuccessful request sending, the interval between attempts, etc. For the functioning of a separate thread that will parse records from the outbox table, we need to call outbox.flush() periodically. 
For this purpose, let's declare a component: Java @Component @AllArgsConstructor public class TransactionOutboxWorker { private final TransactionOutbox transactionOutbox; @Scheduled(fixedDelay = 5000) public void flushTransactionOutbox() { transactionOutbox.flush(); } } The execution time of flush should be chosen according to your requirements. Now we can implement the method with business logic. We need to create an Order in the database and send the event to another microservice. For demonstration purposes, I will not implement the actual call but will simulate the error of sending the event by throwing an exception. The method itself should be marked @Transactional, and the event sending should be done not directly, but using the TransactionOutbox object: Java @Service @AllArgsConstructor @Slf4j public class OrderService { private OrderRepository repository; private TransactionOutbox outbox; @Transactional public String createOrderAndSendEvent(Integer productId, Integer quantity) { String uuid = UUID.randomUUID().toString(); repository.save(new OrderEntity(uuid, productId, quantity)); outbox.schedule(getClass()).sendOrderEvent(uuid, productId, quantity); return uuid; } void sendOrderEvent(String uuid, Integer productId, Integer quantity) { log.info(String.format("Sending event for %s...", uuid)); if (ThreadLocalRandom.current().nextBoolean()) throw new RuntimeException(); log.info(String.format("Event sent for %s", uuid)); } } Here randomly the method may throw an exception. However, the key feature is that this method is not called directly, and the call information is stored in the Outbox table within a single transaction. Let's start the service and execute the query: Shell curl --header "Content-Type: application/json" \ --request POST \ --data '{"productId":"10","quantity":"2"}' \ http://localhost:8080/order {"id":"6a8e2960-8e94-463b-90cb-26ce8b46e96c"} If the method is successful, the record is removed from the table, but if there is a problem, we can see the record in the table: Shell docker exec -ti <CONTAINER ID> bash psql -U admin demo psql (12.16) Type "help" for help. demo=# \x Expanded display is on. demo=# SELECT * FROM txno_outbox; -[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ id | d0b69f7b-943a-44c9-9e71-27f738161c8e invocation | {"c":"orderService","m":"sendOrderEvent","p":["String","Integer","Integer"],"a":[{"t":"String","v":"6a8e2960-8e94-463b-90cb-26ce8b46e96c"},{"t":"Integer","v":10},{"t":"Integer","v":2}]} nextattempttime | 2023-11-19 17:59:12.999 attempts | 1 blocked | f version | 1 uniquerequestid | processed | f lastattempttime | 2023-11-19 17:58:42.999515 Here we can see the parameters of the method call, the time of the next attempt, the number of attempts, etc. According to your settings, the handler will try to execute the request until it succeeds or until it reaches the limit of attempts. This way, even if our service restarts (which is considered normal for cloud-native applications), we will not lose important data about the external service call, and eventually the message will be delivered to the recipient. Conclusion Transactional outbox is a powerful solution for addressing data consistency issues in distributed systems. It provides a reliable and organized approach to managing transactions between microservices. This greatly reduces the risks associated with data inconsistency. 
We have examined the fundamental principles of the transactional outbox pattern, its implementation, and its benefits in maintaining a coherent and synchronized data state. The project code is available on GitHub.
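For reference, the /order endpoint called in the curl example above could be exposed by a controller along these lines. This is a hypothetical sketch (the request DTO and response shape are assumptions based on the curl payload); the real controller lives in the linked GitHub project.

Java
import java.util.Map;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    // Request body matches the curl example: {"productId":"10","quantity":"2"}
    public record CreateOrderRequest(Integer productId, Integer quantity) {}

    @PostMapping("/order")
    public Map<String, String> createOrder(@RequestBody CreateOrderRequest request) {
        // Delegates to the @Transactional service method that also schedules the outbox event.
        String id = orderService.createOrderAndSendEvent(request.productId(), request.quantity());
        return Map.of("id", id);
    }
}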
In the dynamic landscape of web development, the choice of an API technology plays a pivotal role in determining the success and efficiency of a project. In this article, we embark on a comprehensive exploration of three prominent contenders: REST, gRPC, and GraphQL. Each of these technologies brings its own set of strengths and capabilities to the table, catering to different use cases and development scenarios. What Is REST? REST API, or Representational State Transfer Application Programming Interface, is a set of architectural principles and conventions for building web services. It provides a standardized way for different software applications to communicate with each other over the Internet. REST is often used in the context of web development to create scalable and maintainable APIs that can be easily consumed by a variety of clients, such as web browsers or mobile applications. Key characteristics of a REST API include: Statelessness: Each request from a client to a server contains all the information needed to understand and process the request. The server does not store any information about the client's state between requests. This enhances scalability and simplifies the implementation on both the client and server sides. Resource-based: REST APIs are centered around resources, which are identified by URLs (Uniform Resource Locators). These resources can represent entities like objects, data, or services. CRUD (Create, Read, Update, Delete) operations are performed on these resources using standard HTTP methods like GET, POST, PUT, and DELETE. Representation: Resources are represented in a format such as JSON (JavaScript Object Notation) or XML (eXtensible Markup Language). Clients can request different representations of a resource, and the server will respond with the data in the requested format. Uniform interface: REST APIs maintain a uniform interface, making it easy for developers to understand and work with different APIs. This uniformity is achieved through a set of constraints, including statelessness, resource-based representation, and standard HTTP methods. Stateless communication: Communication between the client and server is stateless, meaning that each request from the client contains all the information necessary for the server to fulfill that request. The server does not store any information about the client's state between requests. Client-server architecture: REST APIs follow a client-server architecture, where the client and server are independent entities that communicate over a network. This separation allows for flexibility and scalability, as changes to one component do not necessarily affect the other. Cacheability: Responses from the server can be explicitly marked as cacheable or non-cacheable, allowing clients to optimize performance by caching responses when appropriate. REST APIs are widely used in web development due to their simplicity, scalability, and compatibility with the HTTP protocol. They are commonly employed to enable communication between different components of a web application, including front-end clients and back-end servers, or to facilitate integration between different software systems. Pros and Cons of REST REST has several advantages that contribute to its widespread adoption in web development. One key advantage is its simplicity, as RESTful APIs are easy to understand and implement. This simplicity accelerates the development process and facilitates integration between different components of a system. 
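To make the resource-and-verb model described above concrete, here is a minimal Spring-style sketch for a hypothetical book resource. The resource, its fields, and the in-memory store are purely illustrative; a real service would back this with a repository or database.

Java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/books")
public class BookController {

    // Hypothetical resource representation, serialized to JSON by Spring.
    public record Book(Long id, String title, String author) {}

    private final Map<Long, Book> store = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    @GetMapping                       // Read the collection resource
    public Collection<Book> list() {
        return store.values();
    }

    @GetMapping("/{id}")              // Read a single resource identified by its URL
    public ResponseEntity<Book> get(@PathVariable Long id) {
        Book book = store.get(id);
        return book != null ? ResponseEntity.ok(book) : ResponseEntity.notFound().build();
    }

    @PostMapping                      // Create a new resource
    public Book create(@RequestBody Book book) {
        Long id = ids.incrementAndGet();
        Book saved = new Book(id, book.title(), book.author());
        store.put(id, saved);
        return saved;
    }

    @PutMapping("/{id}")              // Replace an existing resource
    public Book update(@PathVariable Long id, @RequestBody Book book) {
        Book saved = new Book(id, book.title(), book.author());
        store.put(id, saved);
        return saved;
    }

    @DeleteMapping("/{id}")           // Delete a resource
    public void delete(@PathVariable Long id) {
        store.remove(id);
    }
}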
The statelessness of RESTful communication allows for easy scalability, as each request from the client contains all the necessary information, and servers don't need to maintain client state between requests. REST's flexibility, compatibility with various data formats (commonly JSON), and support for caching enhance its overall performance. Its well-established nature and support from numerous tools and frameworks make REST a popular and accessible choice for building APIs. However, REST does come with certain disadvantages. One notable challenge is the potential for over-fetching or under-fetching of data, where clients may receive more information than needed or insufficient data, leading to additional requests. The lack of flexibility in data retrieval, especially in scenarios where clients require specific data combinations, can result in inefficiencies. Additionally, while REST is excellent for stateless communication, it lacks built-in support for real-time features, requiring developers to implement additional technologies or workarounds for immediate data updates. Despite these limitations, the advantages of simplicity, scalability, and widespread support make REST a robust choice for many web development projects. What Is gRPC? gRPC, which stands for "gRPC Remote Procedure Calls," is an open-source RPC (Remote Procedure Call) framework developed by Google. It uses HTTP/2 as its transport protocol and Protocol Buffers (protobuf) as the interface description language. gRPC facilitates communication between client and server applications, allowing them to invoke methods on each other as if they were local procedures, making it a powerful tool for building efficient and scalable distributed systems. Key features of gRPC include: Performance: gRPC is designed to be highly efficient, leveraging the capabilities of HTTP/2 for multiplexing multiple requests over a single connection. It also uses Protocol Buffers, a binary serialization format, which results in faster and more compact data transmission compared to traditional text-based formats like JSON. Language agnostic: gRPC supports multiple programming languages, enabling developers to build applications in languages such as Java, C++, Python, Go, Ruby, and more. This language-agnostic nature promotes interoperability between different components of a system. IDL (Interface Definition Language): gRPC uses Protocol Buffers as its IDL for defining the service methods and message types exchanged between the client and server. This provides a clear and structured way to define APIs, allowing for automatic code generation in various programming languages. Bidirectional streaming: One of gRPC's notable features is its support for bidirectional streaming. This means that both the client and server can send a stream of messages to each other over a single connection, providing flexibility in communication patterns. Code generation: gRPC generates client and server code based on the service definition written in Protocol Buffers. This automatic code generation simplifies the development process and ensures that the client and server interfaces are in sync. Strong typing: gRPC uses strongly typed messages and service definitions, reducing the chances of runtime errors and making the communication between services more robust. Support for authentication and authorization: gRPC supports various authentication mechanisms, including SSL/TLS for secure communication.
It also allows for the implementation of custom authentication and authorization mechanisms. gRPC is particularly well-suited for scenarios where high performance, scalability, and efficient communication between distributed systems are critical, such as in microservices architectures. Its use of modern protocols and technologies makes it a compelling choice for building complex and scalable applications. Pros and Cons of gRPC gRPC presents several advantages that contribute to its popularity in modern distributed systems. One key strength is its efficiency, as it utilizes the HTTP/2 protocol, enabling multiplexing of multiple requests over a single connection and reducing latency. This efficiency, combined with the use of Protocol Buffers for serialization, results in faster and more compact data transmission compared to traditional REST APIs, making gRPC well-suited for high-performance applications. The language-agnostic nature of gRPC allows developers to work with their preferred programming languages, promoting interoperability in heterogeneous environments. The inclusion of bidirectional streaming and strong typing through Protocol Buffers further enhances its capabilities, offering flexibility and reliability in communication between client and server components. While gRPC offers substantial advantages, it comes with certain challenges. One notable drawback is the learning curve associated with adopting gRPC, particularly for teams unfamiliar with Protocol Buffers and the concept of remote procedure calls. Debugging gRPC services can be more challenging due to the binary nature of Protocol Buffers, requiring specialized tools and knowledge for effective troubleshooting. Additionally, the maturity of the gRPC ecosystem may vary across different languages and platforms, potentially impacting the availability of third-party libraries and community support. Integrating gRPC into existing systems or environments that do not fully support HTTP/2 may pose compatibility challenges, requiring careful consideration before migration. Despite these challenges, the efficiency, flexibility, and performance benefits make gRPC a compelling choice for certain types of distributed systems. What Is GraphQL? GraphQL is a query language for APIs (Application Programming Interfaces) and a runtime for executing those queries with existing data. It was developed by Facebook in 2012 and later open-sourced in 2015. GraphQL provides a more efficient, powerful, and flexible alternative to traditional REST APIs by allowing clients to request only the specific data they need. Key features of GraphQL include: Declarative data fetching: Clients can specify the structure of the response they need, including nested data and relationships, in a single query. This eliminates over-fetching and under-fetching of data, ensuring that clients precisely receive the information they request. Single endpoint: GraphQL APIs typically expose a single endpoint, consolidating multiple RESTful endpoints into one. This simplifies the API surface and allows clients to request all the required data in a single query. Strong typing and schema: GraphQL APIs are defined by a schema that specifies the types of data that can be queried and the relationships between them. This schema provides a clear contract between clients and servers, enabling strong typing and automatic validation of queries. Real-time updates (subscriptions): GraphQL supports real-time data updates through a feature called subscriptions.
Clients can subscribe to specific events, and the server will push updates to the client when relevant data changes. Introspection: GraphQL APIs are self-documenting. Clients can query the schema itself to discover the types, fields, and relationships available in the API, making it easier to explore and understand the data model. Batched queries: Clients can send multiple queries in a single request, reducing the number of network requests and improving efficiency. Backend aggregation: GraphQL allows the backend to aggregate data from multiple sources, such as databases, microservices, or third-party APIs, and present it to the client in a unified way. GraphQL is often used in modern web development, particularly in single-page applications (SPAs) and mobile apps, where optimizing data transfer and minimizing over-fetching are crucial. It has gained widespread adoption and is supported by various programming languages and frameworks, both on the client and server sides. Deciding the Right API Technology Choosing between REST, gRPC, and GraphQL depends on the specific requirements and characteristics of your project. Each technology has its strengths and weaknesses, making them more suitable for certain use cases. Here are some considerations for when to choose REST, gRPC, or GraphQL: Choose REST when: Simplicity is key: REST is straightforward and easy to understand. If your project requires a simple and intuitive API, REST might be the better choice. Statelessness is sufficient: If statelessness aligns well with your application's requirements and you don't need advanced features like bidirectional streaming, REST is a good fit. Widespread adoption and compatibility: If you need broad compatibility with various clients, platforms, and tooling, REST is well-established and widely supported. Choose gRPC when: High performance is critical: gRPC is designed for high-performance communication, making it suitable for scenarios where low latency and efficient data transfer are crucial, such as microservices architectures. Strong typing is important: If you value strong typing and automatic code generation for multiple programming languages, gRPC's use of Protocol Buffers can be a significant advantage. Bidirectional streaming is needed: For applications that require bidirectional streaming, real-time updates, and efficient communication between clients and servers, gRPC provides a robust solution. Choose GraphQL when: Flexible data retrieval is required: If your application demands flexibility in data retrieval and allows clients to specify the exact data they need, GraphQL's query language provides a powerful and efficient solution. Reducing over-fetching and under-fetching is a priority: GraphQL helps eliminate over-fetching and under-fetching of data by allowing clients to request only the specific data they need. This is beneficial in scenarios where optimizing data transfer is crucial. Real-time updates are essential: If real-time features and the ability to subscribe to data updates are critical for your application (e.g., chat applications, live notifications), GraphQL's support for subscriptions makes it a strong contender. Ultimately, the choice between REST, gRPC, and GraphQL should be based on a careful evaluation of your project's requirements, existing infrastructure, and the specific features offered by each technology. Additionally, consider factors such as developer familiarity, community support, and ecosystem maturity when making your decision. 
It's also worth noting that hybrid approaches, where different technologies are used for different parts of an application, can be viable in certain scenarios. Conclusion The choice between REST, gRPC, and GraphQL is a nuanced decision that hinges on the specific requirements and objectives of a given project. REST, with its simplicity and widespread adoption, remains a solid choice for scenarios where ease of understanding and compatibility are paramount. Its statelessness and broad support make it an excellent fit for many web development projects. On the other hand, gRPC emerges as a powerful contender when high performance and efficiency are critical, particularly in microservices architectures. Its strong typing, bidirectional streaming, and automatic code generation make it well-suited for applications demanding low-latency communication and real-time updates. Meanwhile, GraphQL addresses the need for flexible data retrieval and the elimination of over-fetching and under-fetching, making it an optimal choice for scenarios where customization and optimization of data transfer are essential, especially in applications requiring real-time features. Ultimately, the decision should be guided by a careful assessment of project requirements, developer expertise, and the specific features offered by each technology, recognizing that a hybrid approach may offer a pragmatic solution in certain contexts.
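As a small illustration of the "ask only for what you need" idea discussed above, the sketch below posts a GraphQL query that selects just two fields of a user. The endpoint URL and schema are assumptions made for this example; the only convention relied on is that GraphQL servers typically accept an HTTP POST with a JSON body containing a query field.

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQLClientSketch {

    public static void main(String[] args) throws Exception {
        // Single endpoint: all queries go to one URL (hypothetical here).
        String endpoint = "https://example.com/graphql";

        // The client names exactly the fields it wants -- no over- or under-fetching.
        String query = "{ user(id: \"42\") { name email } }";
        String body = "{\"query\": \"" + query.replace("\"", "\\\"") + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response JSON contains only the requested fields under "data".
        System.out.println(response.body());
    }
}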
Improving an organization's overall data capabilities enables teams to operate more efficiently. Emerging technologies have brought real-time data closer to business users, which plays a critical role in effective decision-making. In data analytics, the "hot path" and "cold path" refer to two distinct processing routes for handling data. The hot path involves real-time or near-real-time processing of data, where information is analyzed and acted upon immediately as it arrives. This path is crucial for time-sensitive applications, enabling quick responses to emerging trends or events. On the other hand, the cold path involves the batch processing of historical or less time-sensitive data, allowing for in-depth analysis, long-term trends identification, and comprehensive reporting, making it ideal for strategic planning and retrospective insights in data analytics workflows. In typical analytics solutions, the integration of incoming telemetry data with corresponding meta-data related to entities such as devices, users, or applications is a prerequisite on the server side before effective visualization in an application can occur. In this article, we will explore innovative methodologies for seamlessly combining data from diverse sources so that an effective dashboard can be built. The Event-Driven Architecture for Real-Time Anomalies Let's explore a real-time dashboard wherein administrators meticulously monitor network usage. In this scenario, live data on network usage from each device is transmitted in real-time, undergoing aggregation on the server side, inclusive of associating the data with respective client names before refreshing the user's table. In such use cases, the implementation of Event-Driven architecture patterns emerges as the optimal approach for ensuring seamless data processing and real-time insights. Event-driven design seamlessly orchestrates data flow between disparate microservices, enabling the aggregation of critical data points. Through clearly defined events, information from two distinct microservices is aggregated, ensuring real-time updates. The culmination of this event-driven approach provides a comprehensive and up-to-date representation of key metrics and insights for informed decision-making. In the depicted scenario, the telemetry data is seamlessly transmitted to the service bus for integration into the Dashboard service. Conversely, device metadata exhibits infrequent changes. Upon receipt of new telemetry events, the Dashboard service dynamically augments each record with all relevant metadata, presenting a comprehensive dataset for consumption by APIs. This entire process unfolds in real-time, empowering administrators to promptly identify network anomalies and initiate timely corrective measures. This methodology proves effective for those real-time scenarios, characterized by frequent incremental data ingestion to the server and a resilient system for processing those events. The Materialized View Architecture for Historical Reports For a historical report dashboard, adopting an event-driven approach might entail unnecessary effort, given that real-time updates are not imperative. A more efficient strategy would involve leveraging PostgreSQL Materialized Views, which is particularly suitable for handling bursty data updates. This approach allows for scheduled data crunching at predefined intervals, such as daily, weekly, or monthly, aligning with the periodic nature of the reporting requirements. 
PostgreSQL Materialized Views provide a robust mechanism for persistently storing the results of complex joins between disparate tables as physical tables. One of the standout advantages of materialized views is their ability to significantly improve the efficiency of data retrieval operations in APIs, as a considerable portion of the data is pre-computed. The incorporation of materialized views within PostgreSQL represents a substantial performance boost for read queries, particularly beneficial when the application can tolerate older, stale data. This feature serves to reduce disk access and streamline complex query computations by transforming the result set of a view into a tangible physical table. Let’s look at the above example with Device telemetry and metadata tables. The mat view can be created by the command below in SQL. SQL CREATE MATERIALIZED VIEW device_health_mat AS SELECT t.bsod_count, t.storage_used, t.date FROM device_telemetry t INNER JOIN device d ON t.ID = d.ID WITH DATA; Materialized views are beneficial in data warehousing and business intelligence applications where complex queries, data transformation, and aggregations are the norms. You can leverage materialized views when you have complex queries powering user-facing visualizations that need to load quickly to provide a great user experience. The only bottleneck with them is that the refresh needs to be explicitly done when the underlying tables have new data and can be scheduled with the command below. SQL REFRESH MATERIALIZED VIEW device_health_mat; (or) REFRESH MATERIALIZED VIEW CONCURRENTLY device_health_mat; In conclusion, while both aforementioned use cases share a dashboard requirement, the selection of tools and design must be meticulously tailored to the specific usage patterns to ensure the effectiveness of the solution.
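If the application is already JVM-based, the REFRESH command shown above can be scheduled from the application itself instead of an external cron job. Below is a minimal sketch under that assumption, using Spring's scheduling support; the cron expression and DataSource wiring are illustrative, and the non-concurrent REFRESH is used because CONCURRENTLY requires a unique index on the materialized view.

Java
import javax.sql.DataSource;

import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
class SchedulingConfig {
    // Turns on Spring's @Scheduled support for the refresher below.
}

@Component
class DeviceHealthViewRefresher {

    private final JdbcTemplate jdbcTemplate;

    DeviceHealthViewRefresher(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Runs nightly at 02:00; switch to REFRESH MATERIALIZED VIEW CONCURRENTLY
    // if the view has a unique index and must stay readable during the refresh.
    @Scheduled(cron = "0 0 2 * * *")
    public void refreshDeviceHealthView() {
        jdbcTemplate.execute("REFRESH MATERIALIZED VIEW device_health_mat");
    }
}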
Let’s say you’ve got 8 people on one of your engineering squads. Your daily standup takes 15 minutes a day. That’s 75 minutes per week, or roughly 3750 minutes per person. You’ve got 8 people, so that’s 30,000 minutes each year for your team or 500 hours. That works out to about 12.5 weeks spent in daily standup for your team.If your average engineer is earning $75,000, then you’re spending about $18,000 each year on your daily standup (not to mention opportunity cost, which is a whole topic in itself). But if your daily standups take 30 minutes, it costs you $36,000. And if you have 10 squads like this in your company, that’s $360,000. So we sure as heck better make our standups count! Five Ways I See Companies Waste Time in Standups (There Are More) The daily standup is envisioned as a 15-minute pulse check, but it often morphs into a parade of distractions, irrelevancies, and just plain inefficiencies. Company A The back-end engineering team at Company A religiously congregates every morning around their Scrum board. The standup never exceeds 15 minutes. But their updates are a monotonous loop of "I did X, today I'll do Y" – an agile charade with zero alignment with sprint goals. The engineers don’t even have sprint goals. The product manager prioritises the backlog, but no two-way conversations are happening. Problem 1: No Alignment With Sprint Goals Anecdotally, an engineer at a FinTech firm recently told me, “Strategic alignment comes from our initiative. I take the initiative to ask about it, but not everyone does that”. Lack of alignment with sprint goals leads to a lack of direction and purpose, causing team members to operate in isolation rather than as a cohesive unit working toward a shared objective. This inefficiency wastes valuable resources and could derail other tasks that are critical to the project. It also impacts stakeholder satisfaction and puts the team out of sync with broader strategic objectives. The very foundation of the Scrum framework is to inspect progress toward the Sprint Goal. Daily Scrums are designed for task updates and as opportunities to adapt and pivot towards achieving that Sprint Goal. Without this focal point, the team misses out on critical chances to adjust course, while eroding cohesion and motivation. Company B Company B has a fully distributed, remote team scattered across 9 hours’ worth of time zones. Ada, from Poland, finds that the live standup in her afternoon takes her out of her flow at peak concentration time. Brent in San Francisco grumbles about needing to log in early while juggling kids who haven't yet been sent off to school. And Clem, who, due to caregiving responsibilities, often misses the live standup, has to catch up through meeting reruns and often misses the chance to contribute. Problem 2: Operational Inefficiencies The logistical gymnastics needed for live standups often means getting in the way of engineers’ personal lives or interrupting their workflow. Time-zone clashes? Check. Flow state broken? Double-check. Is everyone mildly irritated? Triple-check. Problem 3: Lack of Flexibility More like a one-size-fits-none. Failing to adapt the standup format to cater to diverse needs and lifestyles often leads to dissatisfaction and poor adoption. And when you're remote, that dissatisfaction can quietly fester into full-blown detachment. Company C For Company C, the “daily standup” is something of a Proustian endeavour. Complex topics are discussed ad nauseam, sometimes taking up to 45-60 minutes a day. 
Today, Luis and Zaynab spend 15 minutes dissecting a problem, leaving everyone else twiddling their thumbs or heading off to their second screen to multitask. Fifteen minutes turned out not to be long enough to fix it, and they decided to continue their chat later, but not until they'd wasted a collective 120 working minutes of engineering time across the team. A few of the team members are introverted, and the forum doesn't feel like a good one for them to share their thoughts. They get meeting fatigue alongside the inevitable information overload. Problem 4: Redundancy and Overload When standups turn into mini-hackathons or ad hoc troubleshooting sessions, important issues either get glossed over or spiral into time-consuming digressions. We all end up stuck in this labyrinth of futile discussion, staggering out with meeting fatigue and information overload. Problem 5: Engagement At this point, people are emotionally checked out. The more vocal become inadvertent meeting hoggers, while others feel their contributions being sidelined. It's a fertile ground for a toxic culture that breeds disengagement and kills morale. How Asynchronous Standups (Sort Of) Help None of the scenarios we covered above captures the intended spirit of a stand-up meeting, which is to streamline the workflow and foster cohesion among team members. It's time to part ways with practices that suck the productivity out of our workday. So let's consider asynchronous stand-ups. Traditional work schedules don't account for the natural ebb and flow of human energy, let alone the strain of managing personal lives alongside work commitments. Asynchronous stand-ups check a lot of boxes for the modern workforce. Where Asynchronous Stand-Ups Shine 1. Universal Participation: Time zones become irrelevant. Whether you're dialling in from Tokyo or Texas, you have an equal seat at the virtual table. 2. Your Pace, Your Place: These stand-ups play by your rules, allowing you to time your updates according to your work rhythm, not the other way around. 3. Efficiency: Bid farewell to meetings that drag on and interrupt your flow state. Asynchronous stand-ups are short and to the point. 4. Documented Progress: Think of it as an automatic diary of team updates. No need to chase minutes or compile reports; it's all there. 5. Quality Over Quantity: The chance to craft your update promotes clear and concise communication, rather than on-the-spot, often incoherent chatter. However, relying solely on asynchronous standups might not address all the complexities of collaboration effectively. In the current landscape, artificial intelligence stands prepared to integrate seamlessly into your existing communication tools. It can monitor and comprehend your projects and objectives, capturing the essence of the decisions and activities that mould them. Humans struggle to retain every detail and pull out the pertinent elements; AI is unhindered by these constraints. Our memory is finite, we lose sight of our goals, and we overlook events, all because our time and attention are limited. AI operates outside these confines, which makes it a far stronger candidate for succinctly summarising daily events or providing insight into ongoing matters. Getting Asynchronous Stand-Ups Right Asynchronous stand-ups work when everyone's on the same page about how to do them at your organisation. Always ensure everyone understands what's expected of them, and how to go about their asynchronous stand-up.
Stay Consistent: Insist on a clear format that focuses on completed tasks, work in progress, and any roadblocks. Flag for Follow-up: Asynchronous doesn't mean antisocial. Urgent matters should be highlighted for immediate attention. A virtual pow-wow can still happen if needed. Time Frames Matter: Establish a window within which all updates and responses should be posted to maintain momentum. Clear Paths for Emergencies: Identify a procedure for raising and addressing urgent issues promptly. Making AI Alternatives Work Human memory has its limitations. We're bound to forget, overlook, or downright miss out on crucial updates. AI doesn't share these shortcomings. To make the most out of this tool, just stick to some collaboration fundamentals: Open Dialogue: Transparency is key. Holding the majority of discussions in private channels or direct messages is generally not advisable unless there's a specific reason like confidentiality concerns or data protection. Use Your Tools Effectively: Whether it's crafting well-structured Git commits, keeping your Jira boards current, or meticulously documenting workflows in Notion, encourage correct tool usage across the board. The beauty of AI is its ability to draw smart conclusions from the data it's given. By actively participating in an AI-enhanced asynchronous stand-up, your team stands to gain significantly. AI tools can highlight important activities and provide clear answers to any queries about ongoing projects.