UK-US Data Bridge: Join TechnologyAdvice and OneTrust as they discuss the UK extension to the EU-US Data Privacy Framework (DPF).
Migrate, Modernize and Build Java Web Apps on Azure: This live workshop will cover methods to enhance your Java application development workflow.
CodeCraft: Agile Strategies for Crafting Exemplary Software
The Advantage of Using Cache to Decouple the Frontend Code
Observability and Application Performance
Making data-driven decisions about business-critical and technical considerations first comes down to the accuracy, depth, and usability of the data itself. To build the most performant and resilient applications, teams must stretch beyond monitoring into the world of data, telemetry, and observability. As a result, you'll gain a far deeper understanding of system performance, enabling you to tackle key challenges that arise from the distributed, modular, and complex nature of modern technical environments. Today, and moving into the future, it's no longer about monitoring logs, metrics, and traces alone; instead, it's more deeply rooted in a performance-centric team culture, end-to-end monitoring and observability, and the thoughtful usage of data analytics. In DZone's 2023 Observability and Application Performance Trend Report, we delve into emerging trends in our original research, covering everything from site reliability and app performance monitoring to observability maturity and AIOps. Readers will also find insights from members of the DZone Community, who cover a selection of hand-picked topics, including the benefits and challenges of managing modern application performance, distributed cloud architecture considerations and design patterns for resiliency, observability vs. monitoring and how to practice both effectively, SRE team scalability, and more.
Identity and Access Management
Core PostgreSQL
What Is a Micro Frontend? Micro frontends are an architectural approach that extends the concept of microservices to the front end of web applications. In a micro frontend architecture, a complex web application is broken down into smaller, independently deployable, and maintainable units called micro frontends. Each micro frontend is responsible for a specific part of the user interface and its related functionality. Key characteristics and concepts of micro frontends include: Independence: Micro frontends are self-contained and independently developed, tested, and deployed. This autonomy allows different teams to work on different parts of the application with minimal coordination. Technology agnostic: Each micro frontend can use different front-end technologies (e.g., Angular, React, Vue.js) as long as they can be integrated into the parent or Shell application. Isolation: Micro frontends are isolated from each other, both in terms of code and dependencies. This isolation ensures that changes in one micro frontend do not impact others. Integration: A container or Shell application is responsible for integrating and orchestrating the micro frontends. It provides the overall structure of the user interface and handles routing between micro frontends. Independent deployment: Micro frontends can be deployed independently, allowing for continuous delivery and faster updates. This reduces the risk of regression issues and accelerates the release cycle. Loose coupling: Micro frontends communicate through well-defined APIs and shared protocols, such as HTTP, allowing them to be loosely coupled. This separation of concerns simplifies development and maintenance. User Interface composition: The container application assembles the user interface by composing the micro frontends together. This composition can be done on the server-side (Server-Side Includes) or client-side (Client-Side Routing). Scaling and performance: Micro frontends enable the horizontal scaling of specific parts of an application, helping to optimize performance for frequently accessed areas. Decentralized teams: Different teams or development groups can work on individual micro frontends. This decentralization is advantageous for large or distributed organizations. Micro frontend architectures are particularly useful in large, complex web applications, where a monolithic approach might lead to development bottlenecks, increased complexity, and difficulties in maintaining and scaling the application. By using micro frontends, organizations can achieve greater flexibility, agility, and maintainability in their front-end development processes, aligning with the broader trend of microservices in the world of software architecture. Micro Frontends Hosted Into a Single Shell UI Let's look at how two Angular micro frontends can be hosted into a single Shell UI. To host two Angular micro frontends in a single Shell Angular UI, you can use a micro frontend framework like single-spa or qiankun to achieve this. These frameworks enable you to integrate multiple independently developed micro frontends into a single application Shell. Here’s a high-level overview of how to set up such an architecture: 1. Create the Shell Angular Application Set up your Shell Angular application as the main container for hosting the micro frontends. You can create this application using the Angular CLI or any other preferred method. 2. Create the Micro Frontends Create your two Angular micro frontends as separate Angular applications. 
Each micro frontend should have its own routing and functionality. 3. Configure Routing for Micro Frontends In each micro frontend application, configure the routing so that each micro frontend has its own set of routes. You can use Angular routing for this. 4. Use a Micro Frontend Framework Integrate a micro frontend framework like single-spa or qiankun into your Shell Angular application. Here’s an example of how to use single-spa in your Shell Angular application: Install single-spa: npm install single-spa Shell Angular Application Code In your Shell Angular application, configure the single-spa to load the micro frontends. import { registerApplication, start } from 'single-spa'; // Register the micro frontends registerApplication({ name: 'customer-app', app: () => System.import('customer-app'), // Load customer-app activeWhen: (location) => location.pathname.startsWith('/customer-app'), }); registerApplication({ name: 'accounts-app', app: () => System.import('accounts-app'), // Load accounts-app activeWhen: (location) => location.pathname.startsWith('/accounts-app'), }); // Start single-spa start(); 5. Host Micro Frontends Configure your Shell Angular’s routing to direct to the respective micro frontends based on the URL. For example, when a user accesses /customer-app, the shell should load the customer micro frontend, and for /accounts-app, load accounts micro frontend. 6. Development and Build Develop and build your micro frontends separately. Each should be a standalone Angular application. 7. Deployment Deploy the Shell Angular application along with the micro frontends, making sure they are accessible from the same domain. With this setup, your Shell Angular application will act as the main container for hosting the micro frontends, and you can navigate between the micro frontends within the shell’s routing. This allows you to have a single Angular UI that hosts multiple micro frontends, each with its own functionality.
On the 19th of September, 2023, Java 21 was released. It is time to take a closer look at the changes since the last LTS release, which is Java 17. In this blog, some of the changes between Java 17 and Java 21 are highlighted, mainly by means of examples. Enjoy! Introduction First of all, the short introduction above is not entirely correct, because Java 21 is mentioned in the same sentence as being an LTS release. An elaborate explanation is given in this blog post by Nicolai Parlog. In short, Java 21 is a set of specifications defining the behaviour of the Java language, the API, the virtual machine, etc. A reference implementation of Java 21 is provided by OpenJDK. Updates to the reference implementation are made in this OpenJDK repository. After the release, a fork, jdk21u, is created. This jdk21u fork is maintained and will receive updates for a longer time than the regular six-month cadence. Even with jdk21u, there is no guarantee that fixes will be made over a longer time period. This is where the different vendor implementations make a difference. They build their own JDKs and make them freely available, often with commercial support. So, it is better to say, "JDK 21 is a version for which many vendors offer support." What has changed between Java 17 and Java 21? A complete list of the JEPs (JDK Enhancement Proposals) can be found on the OpenJDK website. Here you can read the nitty-gritty details of each JEP. For a complete list of what has changed per release since Java 17, the Oracle release notes give a good overview. In the next sections, some of the changes are explained by example, but it is mainly up to you to experiment with these new features in order to get acquainted with them. Do note that no preview or incubator JEPs are considered here. The sources used in this post are available on GitHub. Check out an earlier blog if you want to know what has changed between Java 11 and Java 17. The last thing to mention in this introduction is the availability of a Java playground, where you can experiment with Java from within your browser. Prerequisites Prerequisites for this blog are: You must have JDK 21 installed; You need some basic Java knowledge. JEP444: Virtual Threads Let’s start with the most important new feature in JDK 21: virtual threads. Virtual threads are lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications. Until now, threads were implemented as wrappers around operating system (OS) threads. OS threads are costly, and if you send an HTTP request to another server, you block this thread until you have received the answer from the server. The processing part (creating the request and processing the answer) is just a small portion of the entire time the thread is blocked. Sending the request and waiting for the answer takes up much more time than the processing part. A way to circumvent this is to use an asynchronous style, but the disadvantage of that approach is a more complex implementation. This is where virtual threads come to the rescue: you are able to keep the implementation as simple as before and still have the scalability of the asynchronous style. The Java application PlatformThreads.java demonstrates what happens when creating 1.000, 10.000, 100.000 and 1.000.000 threads concurrently. The threads only wait for one second. Depending on your machine, you will get different results because the threads are bound to the OS threads.
Java public class PlatformThreads { public static void main(String[] args) { testPlatformThreads(1000); testPlatformThreads(10_000); testPlatformThreads(100_000); testPlatformThreads(1_000_000); } private static void testPlatformThreads(int maximum) { long time = System.currentTimeMillis(); try (var executor = Executors.newCachedThreadPool()) { IntStream.range(0, maximum).forEach(i -> { executor.submit(() -> { Thread.sleep(Duration.ofSeconds(1)); return i; }); }); } time = System.currentTimeMillis() - time; System.out.println("Number of threads = " + maximum + ", Duration(ms) = " + time); } } The output of running this application is the following: Shell Number of threads = 1000, Duration(ms) = 1094 Number of threads = 10000, Duration(ms) = 1625 Number of threads = 100000, Duration(ms) = 5292 [21,945s][warning][os,thread] Attempt to protect stack guard pages failed (0x00007f8525d00000-0x00007f8525d04000). # # A fatal error has been detected by the Java Runtime Environment: # Native memory allocation (mprotect) failed to protect 16384 bytes for memory to guard stack pages # An error report file with more information is saved as: # /home/<user_dir>/MyJava21Planet/hs_err_pid8277.log [21,945s][warning][os,thread] Attempt to protect stack guard pages failed (0x00007f8525c00000-0x00007f8525c04000). [thread 82370 also had an error] [thread 82371 also had an error] [21,946s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached. [21,946s][warning][os,thread] Failed to start the native thread for java.lang.Thread "pool-4-thread-32577" ... What do you see here? The application takes about 1s for 1.000 threads, 1.6s for 10.000 threads, 5.3s for 100.000 threads and it crashes with 1.000.000 threads. The boundary for the maximum number of OS threads on my machine lies somewhere between 100.000 and 1.000.000 threads. Change the application by replacing the Executors.newCachedThreadPool with the new Executors.newVirtualThreadPerTaskExecutor (VirtualThreads.java). Java try (var executor = Executors.newVirtualThreadPerTaskExecutor()) { IntStream.range(0, maximum).forEach(i -> { executor.submit(() -> { Thread.sleep(Duration.ofSeconds(1)); return i; }); }); } Run the application again. The output is the following: Shell Number of threads = 1000, Duration(ms) = 1020 Number of threads = 10000, Duration(ms) = 1056 Number of threads = 100000, Duration(ms) = 1106 Number of threads = 1000000, Duration(ms) = 1806 Number of threads = 10000000, Duration(ms) = 22010 The application takes about 1s for 1.000 threads (similar to the OS threads), 1s for 10.000 threads (better than OS threads), 1.1s for 100.000 threads (also better), 1.8s for 1.000.000 (does not crash) and even 10.000.000 threads are no problem, taking about 22s in order to execute. This is quite amazing and incredible, isn’t it? JEP431: Sequenced Collections Sequenced Collections fill the lack of a collection type that represents a sequence of elements with a defined encounter order. Besides that, a uniform set of operations were absent that apply such collections. There have been quite some complaints from the community about this topic and this is now solved by the introduction of some new collection interfaces. The overview is available in the following image which is based on the overview as created by Stuart Marks. Besides the new introduced interfaces, some unmodifiable wrappers are available now. 
Java Collections.unmodifiableSequencedCollection(sequencedCollection) Collections.unmodifiableSequencedSet(sequencedSet) Collections.unmodifiableSequencedMap(sequencedMap) The next sections will show these new interfaces based on the application SequencedCollections.java. SequencedCollection A sequenced collection is a Collection whose elements have a predefined encounter order. The new interface SequencedCollection is: Java interface SequencedCollection<E> extends Collection<E> { // new method SequencedCollection<E> reversed(); // methods promoted from Deque void addFirst(E); void addLast(E); E getFirst(); E getLast(); E removeFirst(); E removeLast(); } In the following example, a list is created and reversed. The first and last item are retrieved and a new first and last item are added. Java private static void sequencedCollection() { List<String> sc = Stream.of("Alpha", "Bravo", "Charlie", "Delta").collect(Collectors.toCollection(ArrayList::new)); System.out.println("Initial list: " + sc); System.out.println("Reversed list: " + sc.reversed()); System.out.println("First item: " + sc.getFirst()); System.out.println("Last item: " + sc.getLast()); sc.addFirst("Before Alpha"); sc.addLast("After Delta"); System.out.println("Added new first and last item: " + sc); } The output is: Shell Initial list: [Alpha, Bravo, Charlie, Delta] Reversed list: [Delta, Charlie, Bravo, Alpha] First item: Alpha Last item: Delta Added new first and last item: [Before Alpha, Alpha, Bravo, Charlie, Delta, After Delta] As you can see, no real surprises here, it just works. SequencedSet A sequenced set is a Set that is a SequencedCollection that contains no duplicate elements. The new interface is: Java interface SequencedSet<E> extends Set<E>, SequencedCollection<E> { SequencedSet<E> reversed(); // covariant override } In the following example, a SortedSet is created and reversed. The first and last item are retrieved and it is tried to add a new first and last item. Java private static void sequencedSet() { SortedSet<String> sortedSet = new TreeSet<>(Set.of("Charlie", "Alpha", "Delta", "Bravo")); System.out.println("Initial list: " + sortedSet); System.out.println("Reversed list: " + sortedSet.reversed()); System.out.println("First item: " + sortedSet.getFirst()); System.out.println("Last item: " + sortedSet.getLast()); try { sortedSet.addFirst("Before Alpha"); } catch (UnsupportedOperationException uoe) { System.out.println("addFirst is not supported"); } try { sortedSet.addLast("After Delta"); } catch (UnsupportedOperationException uoe) { System.out.println("addLast is not supported"); } } The output is: Shell Initial list: [Alpha, Bravo, Charlie, Delta] Reversed list: [Delta, Charlie, Bravo, Alpha] First item: Alpha Last item: Delta addFirst is not supported addLast is not supported The only difference with a SequencedCollection is that the elements are sorted alphabetically in the initial list and that the addFirst and addLast methods are not supported. This is obvious because you cannot guarantee that the first element will remain the first element when added to the list (it will be sorted again anyway). SequencedMap A sequenced map is a Map whose entries have a defined encounter order. 
The new interface is: Java interface SequencedMap<K,V> extends Map<K,V> { // new methods SequencedMap<K,V> reversed(); SequencedSet<K> sequencedKeySet(); SequencedCollection<V> sequencedValues(); SequencedSet<Entry<K,V>> sequencedEntrySet(); V putFirst(K, V); V putLast(K, V); // methods promoted from NavigableMap Entry<K, V> firstEntry(); Entry<K, V> lastEntry(); Entry<K, V> pollFirstEntry(); Entry<K, V> pollLastEntry(); } In the following example, a LinkedHashMap is created, and some elements are added and the list is reversed. The first and last elements are retrieved and new first and last items are added. Java private static void sequencedMap() { LinkedHashMap<Integer,String> hm = new LinkedHashMap<Integer,String>(); hm.put(1, "Alpha"); hm.put(2, "Bravo"); hm.put(3, "Charlie"); hm.put(4, "Delta"); System.out.println("== Initial List =="); printMap(hm); System.out.println("== Reversed List =="); printMap(hm.reversed()); System.out.println("First item: " + hm.firstEntry()); System.out.println("Last item: " + hm.lastEntry()); System.out.println(" == Added new first and last item =="); hm.putFirst(5, "Before Alpha"); hm.putLast(3, "After Delta"); printMap(hm); } The output is: Shell == Initial List == 1 Alpha 2 Bravo 3 Charlie 4 Delta == Reversed List == 4 Delta 3 Charlie 2 Bravo 1 Alpha First item: 1=Alpha Last item: 4=Delta == Added new first and last item == 5 Before Alpha 1 Alpha 2 Bravo 4 Delta 3 After Delta Also here no surprises. JEP440: Record Patterns Record patterns enhance the Java programming language in order to deconstruct record values. This will make it easier to navigate into the data. Let’s see how this works with application RecordPatterns.java. Assume the following GrapeRecord which consists out of a color and a number of pits. Java record GrapeRecord(Color color, Integer nbrOfPits) {} When you need to access the number of pits, you had to implicitely cast the GrapeRecord and you were able to access the nbrOfPits member using the grape variable. Java private static void singleRecordPatternOldStyle() { Object o = new GrapeRecord(Color.BLUE, 2); if (o instanceof GrapeRecord grape) { System.out.println("This grape has " + grape.nbrOfPits() + " pits."); } } With Record Patterns, you can add the record members as part of the instanceof check and access them directly. Java private static void singleRecordPattern() { Object o = new GrapeRecord(Color.BLUE, 2); if (o instanceof GrapeRecord(Color color, Integer nbrOfPits)) { System.out.println("This grape has " + nbrOfPits + " pits."); } } Introduce a record SpecialGrapeRecord which consists out of a record GrapeRecord and a boolean. Java record SpecialGrapeRecord(GrapeRecord grape, boolean special) {} You have created a nested record. Record Patterns also support nested records as can be seen in the following example: Java private static void nestedRecordPattern() { Object o = new SpecialGrapeRecord(new GrapeRecord(Color.BLUE, 2), true); if (o instanceof SpecialGrapeRecord(GrapeRecord grape, boolean special)) { System.out.println("This grape has " + grape.nbrOfPits() + " pits."); } if (o instanceof SpecialGrapeRecord(GrapeRecord(Color color, Integer nbrOfPits), boolean special)) { System.out.println("This grape has " + nbrOfPits + " pits."); } } JEP441: Pattern Matching for Switch Pattern matching for instanceof has been introduced with Java 17. Pattern matching for switch expressions will allow to test expressions against a number of patterns. 
This leads to several new and interesting possibilities as is demonstrated in application PatternMatchingSwitch.java. Pattern Matching Switch When you want to verify whether an object is an instance of a particular type, you needed to write something like the following: Java private static void oldStylePatternMatching(Object obj) { if (obj instanceof Integer i) { System.out.println("Object is an integer:" + i); } else if (obj instanceof String s) { System.out.println("Object is a string:" + s); } else if (obj instanceof FruitType f) { System.out.println("Object is a fruit: " + f); } else { System.out.println("Object is not recognized"); } } This is quite verbose and the reason is that you cannot test whether the value is of a particular type in a switch expression. With the introduction of pattern matching for switch, you can refactor the code above to the following, less verbose code: Java private static void patternMatchingSwitch(Object obj) { switch(obj) { case Integer i -> System.out.println("Object is an integer:" + i); case String s -> System.out.println("Object is a string:" + s); case FruitType f -> System.out.println("Object is a fruit: " + f); default -> System.out.println("Object is not recognized"); } } Switches and Null When the object argument in the previous example happens to be null, a NullPointerException will be thrown. Therefore, you need to check for null values before evaluating the switch expression. The following code uses pattern matching for switch, but if obj is null, a NullPointerException is thrown. Java private static void oldStyleSwitchNull(Object obj) { try { switch (obj) { case Integer i -> System.out.println("Object is an integer:" + i); case String s -> System.out.println("Object is a string:" + s); case FruitType f -> System.out.println("Object is a fruit: " + f); default -> System.out.println("Object is not recognized"); } } catch (NullPointerException npe) { System.out.println("NullPointerException thrown"); } } However, now it is possible to test against null and determine in your switch what to do when the value happens to be null. Java private static void switchNull(Object obj) { switch (obj) { case Integer i -> System.out.println("Object is an integer:" + i); case String s -> System.out.println("Object is a string:" + s); case FruitType f -> System.out.println("Object is a fruit: " + f); case null -> System.out.println("Object is null"); default -> System.out.println("Object is not recognized"); } } Case Refinement What if you need to add extra checks based on a specific FruitType in the previous example? This would lead to extra if-statements in order to determine what to do. Java private static void inefficientCaseRefinement(Object obj) { switch (obj) { case String s -> System.out.println("Object is a string:" + s); case FruitType f -> { if (f == FruitType.APPLE) { System.out.println("Object is an apple"); } if (f == FruitType.AVOCADO) { System.out.println("Object is an avocado"); } if (f == FruitType.PEAR) { System.out.println("Object is a pear"); } if (f == FruitType.ORANGE) { System.out.println("Object is an orange"); } } case null -> System.out.println("Object is null"); default -> System.out.println("Object is not recognized"); } } This type of problem is solved by allowing when-clauses in switch blocks to specify guards to pattern case labels. The case label is called a guarded case label and the boolean expression is called the guard. The above code becomes the following code, which is much more readable. 
Java private static void caseRefinement(Object obj) { switch (obj) { case String s -> System.out.println("Object is a string:" + s); case FruitType f when (f == FruitType.APPLE) -> { System.out.println("Object is an apple"); } case FruitType f when (f == FruitType.AVOCADO) -> { System.out.println("Object is an avocado"); } case FruitType f when (f == FruitType.PEAR) -> { System.out.println("Object is a pear"); } case FruitType f when (f == FruitType.ORANGE) -> { System.out.println("Object is an orange"); } case null -> System.out.println("Object is null"); default -> System.out.println("Object is not recognized"); } } Enum Constants Enum types can be used in switch expressions, but the evaluation is limited to the enum constants of the specific type. What if you want to evaluate based on multiple enum constants? Introduce a new enum CarType. Java public enum CarType { SUV, CABRIO, EV } Now that it is possible to use a case refinement, you could write something like the following. Java private static void inefficientEnumConstants(Object obj) { switch (obj) { case String s -> System.out.println("Object is a string:" + s); case FruitType f when (f == FruitType.APPLE) -> System.out.println("Object is an apple"); case FruitType f when (f == FruitType.AVOCADO) -> System.out.println("Object is an avocado"); case FruitType f when (f == FruitType.PEAR) -> System.out.println("Object is a pear"); case FruitType f when (f == FruitType.ORANGE) -> System.out.println("Object is an orange"); case CarType c when (c == CarType.CABRIO) -> System.out.println("Object is a cabrio"); case null -> System.out.println("Object is null"); default -> System.out.println("Object is not recognized"); } } This code would be more readable if you would have a separate case for every enum constant instead of having a lots of guarded patterns. This turns the above code into the following, much more readable code. Java private static void enumConstants(Object obj) { switch (obj) { case String s -> System.out.println("Object is a string:" + s); case FruitType.APPLE -> System.out.println("Object is an apple"); case FruitType.AVOCADO -> System.out.println("Object is an avocado"); case FruitType.PEAR -> System.out.println("Object is a pear"); case FruitType.ORANGE -> System.out.println("Object is an orange"); case CarType.CABRIO -> System.out.println("Object is a cabrio"); case null -> System.out.println("Object is null"); default -> System.out.println("Object is not recognized"); } } JEP413: Code Snippets Code snippets allow you to simplify the inclusion of example source code in API documentation. Code snippets are now often added by means of the <pre> HTML tag. See application Snippets.java for the complete source code. Java /** * this is an example in Java 17 * <pre>{@code * if (success) { * System.out.println("This is a success!"); * } else { * System.out.println("This is a failure"); * } * } * </pre> * @param success */ public void example1(boolean success) { if (success) { System.out.println("This is a success!"); } else { System.out.println("This is a failure"); } } Generate the javadoc: Shell $ javadoc src/com/mydeveloperplanet/myjava21planet/Snippets.java -d javadoc In the root of the repository, a directory javadoc is created. Open the index.html file with your favourite browser and click the snippets URL. The above code has the following javadoc. 
There are some shortcomings using this approach: no source code validation; no way to add comments because the fragment is already located in a comment block; no code syntax highlighting; etc. Inline Snippets In order to overcome these shortcomings, a new @snippet tag is introduced. The code above can be rewritten as follows. Java /** * this is an example for inline snippets * {@snippet : * if (success) { * System.out.println("This is a success!"); * } else { * System.out.println("This is a failure"); * } * } * * @param success */ public void example2(boolean success) { if (success) { System.out.println("This is a success!"); } else { System.out.println("This is a failure"); } } The generated javadoc is the following. You notice here that the code snippet is visible marked as source code and a copy source code icon is added. As an extra test, you can remove in the javadoc of methods example1 and example2 a semi-colon, introducing a compiler error. In example1, the IDE just accepts this compiler error. However, in example2, the IDE will prompt you about this compiler error. External Snippets An interesting feature is to move your code snippets to an external file. Create in package com.mydeveloperplanet.myjava21planet a directory snippet-files. Create a class SnippetsExternal in this directory and mark the code snippets by means of an @start tag and an @end tag. With the region parameter, you can give the code snippet a name to refer to. The example4 method also contains the @highlight tag which allows you highlight certain elements in the code. Many more formatting and highlighting options are available, it is too much to cover them all. Java public class SnippetsExternal { public void example3(boolean success) { // @start region=example3 if (success) { System.out.println("This is a success!"); } else { System.out.println("This is a failure"); } // @end } public void example4(boolean success) { // @start region=example4 if (success) { System.out.println("This is a success!"); // @highlight substring="println" } else { System.out.println("This is a failure"); } // @end } } In your code, you refer to the SnippetsExternal file and the region you want to include in your javadoc. Java /** * this is an example for external snippets * {@snippet file="SnippetsExternal.java" region="example3" }" * * @param success */ public void example3(boolean success) { if (success) { System.out.println("This is a success!"); } else { System.out.println("This is a failure"); } } /** * this is an example for highlighting * {@snippet file="SnippetsExternal.java" region="example4" }" * * @param success */ public void example4(boolean success) { if (success) { System.out.println("This is a success!"); } else { System.out.println("This is a failure"); } } When you generate the javadoc as before, you will notice in the output that the javadoc tool cannot find the SnippetsExternal file. Shell src/com/mydeveloperplanet/myjava21planet/Snippets.java:48: error: file not found on source path or snippet path: SnippetsExternal.java * {@snippet file="SnippetsExternal.java" region="example3" }" ^ src/com/mydeveloperplanet/myjava21planet/Snippets.java:62: error: file not found on source path or snippet path: SnippetsExternal.java * {@snippet file="SnippetsExternal.java" region="example4" }" You need to add the path to the snippet files by means of the --snippet-path argument. 
Shell $ javadoc src/com/mydeveloperplanet/myjava21planet/Snippets.java -d javadoc --snippet-path=./src/com/mydeveloperplanet/myjava21planet/snippet-files The javadoc for method example3 contains the defined snippet. The javadoc for method example4 contains the highlighted section. JEP408: Simple Web Server Simple Web Server is a minimal HTTP server for serving a single directory hierarchy. Goal is to provide a web server for computer science students for testing or prototyping purposes. Create in the root of the repository a httpserver directory, containing a simple index.html file. HTML Welcome to Simple Web Server You can start the web server programmatically as follows (see SimpleWebServer.java). The path to the directory must refer to the absolute path of the directory. Java private static void startFileServer() { var server = SimpleFileServer.createFileServer(new InetSocketAddress(8080), Path.of("/<absolute path>/MyJava21Planet/httpserver"), SimpleFileServer.OutputLevel.VERBOSE); server.start(); } Verify the output. Shell $ curl http://localhost:8080 Welcome to Simple Web Server You can change the contents of the index.html file on the fly and it will serve the new contents immediately after a refresh of the page. It is also possible to create a custom HttpHandler in order to intercept the response and change it. Java class MyHttpHandler implements com.sun.net.httpserver.HttpHandler { @Override public void handle(HttpExchange exchange) throws IOException { if ("GET".equals(exchange.getRequestMethod())) { OutputStream outputStream = exchange.getResponseBody(); String response = "It works!"; exchange.sendResponseHeaders(200, response.length()); outputStream.write(response.getBytes()); outputStream.flush(); outputStream.close(); } } } Start the web server on a different port and add a context path and the HttpHandler. Java private static void customFileServerHandler() { try { var server = HttpServer.create(new InetSocketAddress(8081), 0); server.createContext("/custom", new MyHttpHandler()); server.start(); } catch (IOException ioe) { System.out.println("IOException occured"); } } Run this application and verify the output. Shell $ curl http://localhost:8081/custom It works! Conclusion In this blog, you took a quick look at some features added since the last LTS release Java 17. It is now up to you to start thinking about your migration plan to Java 21 and a way to learn more about these new features and how you can apply them into your daily coding habits. Tip: IntelliJ will help you with that!
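To see how several of these features combine in practice, here is a small, self-contained sketch that ties JEP 440 (record patterns) and JEP 441 (pattern matching for switch, including guards) together. It is not taken from the article's repository; the Shape, Circle, and Square types are hypothetical examples introduced only for illustration.
Java
public class ShapeDescriber {

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    static String describe(Shape shape) {
        return switch (shape) {
            // A record pattern deconstructs the Circle; the when clause is a guard
            case Circle(double radius) when radius > 10 -> "large circle";
            case Circle(double radius) -> "circle with radius " + radius;
            case Square(double side) -> "square with side " + side;
            // No default branch needed: the sealed hierarchy makes the switch exhaustive
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Circle(12))); // prints: large circle
        System.out.println(describe(new Square(3)));  // prints: square with side 3.0
    }
}
Because the interface is sealed, the compiler can verify that every permitted subtype is covered, which is one of the main reasons these two JEPs work so well together.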
This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report From cultural and structural challenges within an organization to balancing daily work and dividing it between teams and individuals, scaling teams of site reliability engineers (SREs) comes with many challenges. However, fostering a resilient site reliability engineering (SRE) culture can facilitate the gradual and sustainable growth of an SRE team. In this article, we explore the challenges of scaling and review a successful scaling framework. This framework is suitable for guiding emerging teams and startups as they cultivate an evolving SRE culture, as well as for established companies with firmly entrenched SRE cultures. The Challenges of Scaling SRE Teams As teams scale, complexity may increase as it can be more difficult to communicate, coordinate, and maintain a team's coherence. Below is a list of challenges to consider as your team and/or organization grows: Rapid growth – Rapid growth leads to more complex systems, which can outpace the capacity of your SRE team, leading to bottlenecks and reduced reliability. Knowledge-sharing – Maintaining a shared understanding of systems and processes may become difficult, making it challenging to onboard new team members effectively. Tooling and automation – Scaling without appropriate tooling and automation can lead to increased manual toil, reducing the efficiency of the SRE team. Incident response – Coordinating incident responses can become more challenging, and miscommunications or delays can occur. Maintaining a culture of innovation and learning – This can be challenging as SREs may become more focused on solving critical daily problems and less focused on new initiatives. Balancing operational and engineering work – Since SREs are responsible for both operational tasks and engineering work, it is important to ensure that these teams have enough time to focus on both areas. A Framework for Scaling SRE Teams Scaling may come naturally if you do the right things in the right order. First, you must identify what your current state is in terms of infrastructure. How well do you understand the systems? Determine existing SRE processes that need improvement. For the SRE processes that are necessary but are not employed yet, find the tools and the metrics necessary to start. Collaborate with the appropriate stakeholders, use feedback, iterate, and improve. Step 1: Assess Your Current State Understand your system and create a detailed map of your infrastructure, services, and dependencies. Identify all the components in your infrastructure, including servers, databases, load balancers, networking equipment, and any cloud services you utilize. It is important to understand how these components are interconnected and dependent on each other — this includes understanding which services rely on others and the flow of data between them. It's also vital to identify and evaluate existing SRE practices and assess their effectiveness: Analyze historical incident data to identify recurring issues and their resolutions. Gather feedback from your SRE team and other relevant stakeholders. Ask them about pain points, challenges, and areas where improvements are needed. Assess the performance metrics related to system reliability and availability. Identify any trends or patterns that indicate areas requiring attention. Evaluate how incidents are currently being handled. Are they being resolved efficiently? 
Are post-incident reviews being conducted effectively to prevent recurrences? Step 2: Define SLOs and Error Budgets Collaborate with stakeholders to establish clear and meaningful service-level objectives (SLOs) by determining the acceptable error rate and creating error budgets based on the SLOs. SLOs and error budgets can guide resource allocation optimization. Computing resources can be allocated to areas that directly impact the achievement of the SLOs. SLOs set clear, achievable goals for the team and provide a measurable way to assess the reliability of a service. By defining specific targets for uptime, latency, or error rates, SRE teams can objectively evaluate whether the system is meeting the desired standards of performance. Using specific targets, a team can prioritize their efforts and focus on areas that need improvement, thus fostering a culture of accountability and continuous improvement. Error budgets provide a mechanism for managing risk and making trade-offs between reliability and innovation. They allow SRE teams to determine an acceptable threshold for service disruptions or errors, enabling them to balance the need for deploying new features or making changes to maintain a reliable service. Step 3: Build and Train Your SRE Team Identify talent according to the needs of each and every step of this framework. Look for the right skillset and cultural fit, and be sure to provide comprehensive onboarding and training programs for new SREs. Beware of the golden rule that culture eats strategy for breakfast: Having the right strategy and processes is important, but without the right culture, no strategy or process will succeed in the long run. Step 4: Establish SRE Processes, Automate, Iterate, and Improve Implement incident management procedures, including incident command and post-incident reviews. Define a process for safe and efficient changes to the system. Figure 1: Basic SRE process One of the cornerstones of SRE involves how to identify and handle incidents through monitoring, alerting, remediation, and incident management. Swift incident identification and management are vital in minimizing downtime, which can prevent minor issues from escalating into major problems. By analyzing incidents and their root causes, SREs can identify patterns and make necessary improvements to prevent similar issues from occurring in the future. This continuous improvement process is crucial for enhancing the overall reliability and performance whilst ensuring the efficiency of systems at scale. Improving and scaling your team can go hand in hand. Monitoring Monitoring is the first step in ensuring the reliability and performance of a system. It involves the continuous collection of data about the system's behavior, performance, and health. This can be broken down into: Data collection – Monitoring systems collect various types of data, including metrics, logs, and traces, as shown in Figure 2. Real-time observability – Monitoring provides real-time visibility into the system's status, enabling teams to identify potential issues as they occur. Proactive vs. reactive – Effective monitoring allows for proactive problem detection and resolution, reducing the need for reactive firefighting. Figure 2: Monitoring and observability Alerting This is the process of notifying relevant parties when predefined conditions or thresholds are met. It's a critical prerequisite for incident management. 
This can be broken down into: Thresholds and conditions – Alerts are triggered based on predefined thresholds or conditions. For example, an alert might be set to trigger when CPU usage exceeds 90% for five consecutive minutes. Notification channels – Alerts can be sent via various notification channels, including email, SMS, or pager, or even integrated into incident management tools. Severity levels – Alerts should be categorized by severity levels (e.g., critical, warning, informational) to indicate the urgency and impact of the issue. Remediation This involves taking actions to address issues detected through monitoring and alerting. The goal is to mitigate or resolve problems quickly to minimize the impact on users. Automated actions – SRE teams often implement automated remediation actions for known issues. For example, an automated scaling system might add more resources to a server when CPU usage is high. Playbooks – SREs follow predefined playbooks that outline steps to troubleshoot and resolve common issues. Playbooks ensure consistency and efficiency during remediation efforts. Manual interventions – In some cases, manual intervention by SREs or other team members may be necessary for complex or unexpected issues. Incident Management Effective communication, knowledge-sharing, and training are crucial during an incident, and most incidents can be reproduced in staging environments for training purposes. Regular updates are provided to stakeholders, including users, management, and other relevant teams. Incident management includes a culture of learning and continuous improvement: The goal is not only to resolve the incident but also to prevent it from happening again. Figure 3: Handling incidents A robust incident management process ensures that service disruptions are addressed promptly, thus enhancing user trust and satisfaction. In addition, by effectively managing incidents, SREs help preserve the continuity of business operations and minimize potential revenue losses. Incident management plays a vital role in the scaling process since it establishes best practices and promotes collaboration, as shown in Figure 3. As the system scales, the frequency and complexity of incidents are likely to increase. A well-defined incident management process enables the SRE team to manage the growing workload efficiently. Conclusion SRE is an integral part of the SDLC. At the end of the day, your SRE processes should be integrated into the entire process of development, testing, and deployment, as shown in Figure 4. Figure 4: Holistic view of development, testing, and the SRE process Iterating on and improving the steps above will inevitably lead to more work for SRE teams; however, this work can pave the way for sustainable and successful scaling of SRE teams at the right pace. By following this framework and overcoming the challenges, you can effectively scale your SRE team while maintaining system reliability and fostering a culture of collaboration and innovation. Remember that SRE is an ongoing journey, and it is essential to stay committed to the principles and practices that drive reliability and performance. This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report
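As a concrete illustration of Step 2 above, the relationship between an SLO and its error budget is simple arithmetic. The following minimal Java sketch uses an assumed 99.9% availability target over a 30-day window (illustrative values, not figures from the article) to show how an SLO translates into an allowed amount of downtime:
Java
public class ErrorBudgetCalculator {

    public static void main(String[] args) {
        double availabilitySlo = 0.999;        // illustrative SLO: 99.9% availability
        double windowMinutes = 30 * 24 * 60;   // 30-day rolling window, in minutes

        // The error budget is everything the SLO does not promise
        double errorBudgetMinutes = (1 - availabilitySlo) * windowMinutes;

        System.out.printf("Error budget over 30 days: %.1f minutes%n", errorBudgetMinutes);
        // Prints 43.2 minutes: once this budget is spent, teams typically pause
        // feature rollouts in favor of reliability work.
    }
}
The same calculation can be driven from monitoring data, so burn rate against the budget becomes an alerting signal rather than a spreadsheet exercise.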
The INSERT INTO ... RETURNING SQL clause inserts one or more records into a table and immediately retrieves specified values from these newly inserted rows or additional data from expressions. This is particularly useful when you need to get values generated by the database upon insertion, such as auto-incremented IDs, calculated fields, or default values. Is this useful? Are there any actual use cases for this SQL clause? Don't ORM frameworks make it obsolete? I don't have definitive answers to these questions. However, I recently found it useful when I created a demo to explain how read/write splitting works (see this article). I needed a SQL query that inserted a row and returned the "server ID" of the node that performed the write (this, to demonstrate that the primary node is always performing writes as opposed to the replicas). INSERT INTO ... RETURNING was perfect for this demo, and it got me thinking about other possible scenarios for this feature. After speaking with colleagues, it was clear that there actually are real-world use cases where INSERT INTO ... RETURNING is a good fit. These use cases include situations in which efficiency, simplicity, readability, direct access to the database, or database-specific features are needed, not to mention, when possible limitations in ORMs hit. Even though you might still feel the urge to implement this in application code, it's worth looking at how others use this SQL construct and evaluate whether it's useful in your project or not. Let's dig in. Case: E-Commerce Order Processing Scenario: Generating and retrieving an order ID during order placement. This is very likely handled by ORMs, but still useful in case of scripts, absence of ORM, or even limitations with the ORM. SQL Example: MariaDB SQL INSERT INTO orders (customer_id, product_id, quantity) VALUES (123, 456, 2) RETURNING order_id; Outcome: Instantly provides the unique order_id to the customer. Case: Inventory Management Scenario: Updating and returning the stock count after adding new inventory. SQL Example: MariaDB SQL INSERT INTO inventory (product_name, quantity_added) VALUES ('New Product', 50) RETURNING current_stock_count; Outcome: Offers real-time stock updates for effective tracking. Case: User Registration in Web Applications Scenario: Creating a new user account and returning a confirmation message plus user ID. Here, we are returning a string, but any other kind of computed data can be returned. This is similar to the use case that I found for my demo (returning MariaDB's @@server_id). SQL Example: MariaDB SQL INSERT INTO users (username, password, email) VALUES ('new_user', 'Password123!', 'user@example.com') RETURNING user_id, 'Registration Successful'; Outcome: Confirms account creation (or returns computed data instead of having to process it later in application code) and provides the user ID for immediate use. Never store passwords in plain text like in this example! Case: Personalized Welcome Messages in User Onboarding Scenario: Customizing a welcome message based on the user's profile information during account creation. This is a more elaborated use case similar to the one shown in the previous section. SQL Example: MariaDB SQL INSERT INTO users (username, favorite_genre) VALUES ('fantasyfan', 'Fantasy') RETURNING CONCAT('Welcome, ', username, '! Explore the latest in ', favorite_genre, '!'); Outcome: Produces a personalized welcome message for the user, enhancing the onboarding experience. 
The message (or some sort of message template) could be provided from outside the SQL sentence, of course. Case: Calculating and Displaying Order Discounts Scenario: Automatically calculating a discount on a new order based on, for example, customer loyalty points. SQL Example: MariaDB SQL INSERT INTO orders (customer_id, total_amount, loyalty_points) VALUES (123, 200, 50) RETURNING total_amount - (loyalty_points * 0.1) AS discounted_price; Outcome: Instantly provides the customer with the discounted price of their order, incentivizing loyalty. Obviously, let your boss know about this. Case: Aggregating Survey Responses for Instant Summary Scenario: Compiling survey responses and instantly providing a summary of the collective responses. It is worth mentioning at this point that even though the SQL examples show "hardcoded" values for IDs, they can be parameters for prepared statements instead. SQL Example: MariaDB SQL INSERT INTO survey_responses (question_id, response) VALUES (10, 'Very Satisfied') RETURNING ( SELECT CONCAT(COUNT(*), ' responses, ', ROUND(AVG(rating), 2), ' average rating') FROM survey_responses WHERE question_id = 10 ); Outcome: Offers a real-time summary of responses, fostering immediate insights. Case: Generating Custom Event Itineraries Scenario: Selecting sessions for a conference event and receiving a personalized itinerary. SQL Example: MariaDB SQL INSERT INTO event_selections (attendee_id, session_id) VALUES (789, 102) RETURNING (SELECT CONCAT(session_name, ' at ', session_time) FROM event_sessions WHERE session_id = 102); Outcome: Immediately create a custom itinerary for the attendees, improving the event experience right from the registration moment. Conclusion Get to know your database. In my case, the more I continue to explore MariaDB, the more I realize the many possibilities it has. The same applies to other databases. In application code, avoid implementing things at which databases excel — namely, handling data.
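For completeness, here is how the first order-processing example above could be consumed from application code using a prepared statement, as suggested earlier. This is a minimal, hypothetical JDBC sketch: the connection URL, credentials, and table are placeholders, and it assumes a driver that exposes the RETURNING rows as a result set (recent MariaDB Connector/J and PostgreSQL JDBC drivers do).
Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class InsertReturningExample {

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details
        String url = "jdbc:mariadb://localhost:3306/shop";
        String sql = "INSERT INTO orders (customer_id, product_id, quantity) "
                   + "VALUES (?, ?, ?) RETURNING order_id";

        try (Connection connection = DriverManager.getConnection(url, "app_user", "app_password");
             PreparedStatement statement = connection.prepareStatement(sql)) {

            statement.setInt(1, 123); // customer_id
            statement.setInt(2, 456); // product_id
            statement.setInt(3, 2);   // quantity

            // Because of RETURNING, the INSERT produces a result set we can read directly
            try (ResultSet rs = statement.executeQuery()) {
                if (rs.next()) {
                    System.out.println("New order id: " + rs.getLong("order_id"));
                }
            }
        }
    }
}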
In contemporary web development, a recurring challenge revolves around harmonizing the convenience and simplicity of using a database with a web application. My name is Viacheslav Aksenov, and in this article, I aim to explore several of the most popular approaches for integrating databases and web applications within the Kubernetes ecosystem. These examples are examined within the context of a testing environment, where constraints are more relaxed. However, these practices can serve as a foundation applicable to production environments as well. One Service, One Database. Why? Running a database alongside a microservice aligns with the principles outlined in the Twelve-Factor App methodology. One key factor is "Backing Services" (Factor IV), which suggests treating databases, message queues, and other services as attached resources that can be attached or detached seamlessly. By co-locating the database with the microservice, we adhere to the principle of having a single codebase that includes the application and its dependencies, making it easier to manage, scale, and deploy. Additionally, it promotes encapsulation and modularity, allowing the microservice to be self-contained and portable across different environments, following the principles of the Twelve-Factor App. This approach enhances the maintainability and scalability of the entire application architecture. For this task, you can leverage various tools, and one example is KubeDB. What Is KubeDB? KubeDB is an open-source project that provides a database management framework for Kubernetes, an open-source container orchestration platform. KubeDB simplifies the deployment, management, and scaling of various database systems within Kubernetes clusters. We benefited from the following features of this tool: Database operators: a Postgres operator that simplifies the process of deploying and managing database instances on Kubernetes. Monitoring and alerts: KubeDB integrates with monitoring and alerting tools like Prometheus and Grafana, enabling you to keep an eye on the health and performance of your database instances. Security: KubeDB helps you set up secure access to your databases using authentication mechanisms and secrets management. And it is very easy to set up the deployment.
deployment.yaml: YAML apiVersion: kubedb.com/v1alpha2 kind: PostgreSQL metadata: name: your-postgresql spec: version: "11" storageType: Durable storage: storageClassName: <YOUR_STORAGE_CLASS> accessModes: - ReadWriteOnce resources: requests: storage: 1Gi terminationPolicy: WipeOut databaseSecret: secretName: your-postgresql-secret databaseURLFromSecret: true replicas: 1 users: - name: <YOUR_DB_USER> passwordSecret: secretName: your-postgresql-secret passwordKey: password databaseName: <YOUR_DB_NAME> Then, you can use the credentials and properties of this database to connect your service's pod to it with deployment.yaml: YAML apiVersion: apps/v1 kind: Deployment metadata: name: your-microservice spec: replicas: 1 selector: matchLabels: app: your-microservice template: metadata: labels: app: your-microservice spec: containers: - name: your-microservice-container image: your-microservice-image:tag ports: - containerPort: 80 env: - name: DATABASE_URL value: "postgres://<YOUR_DB_USER>:<YOUR_DB_PASSWORD>@<YOUR_DB_HOST>:<YOUR_DB_PORT>/<YOUR_DB_NAME>" --- apiVersion: v1 kind: Service metadata: name: your-microservice-service spec: selector: app: your-microservice ports: - protocol: TCP port: 80 targetPort: 80 And if, for some reason, you are not ready to use KubeDB or don't require the full functional of their product, you can use the Postgresql container as a sidecar for your test environment. Postgres Container as a Sidecar In the context of Kubernetes and databases like PostgreSQL, a sidecar is a separate container that runs alongside the main application container within a pod. The sidecar pattern is commonly used to enhance or extend the functionality of the main application container without directly impacting its core logic. Let's see the example of a configuration for a small Spring Boot Kotlin service that handles cat names. deployment.yaml: YAML apiVersion: apps/v1 kind: Deployment metadata: name: cat-svc labels: app: cat-svc spec: replicas: 1 selector: matchLabels: app: cat-svc template: metadata: labels: app: cat-svc type: http spec: containers: - name: cat-svc image: cat-svc:0.0.1 ports: - name: http containerPort: 8080 protocol: TCP readinessProbe: httpGet: path: /actuator/health port: 8080 initialDelaySeconds: 30 timeoutSeconds: 10 periodSeconds: 10 livenessProbe: httpGet: path: /actuator/health port: 8080 initialDelaySeconds: 60 timeoutSeconds: 10 periodSeconds: 30 env: - name: PLACES_DATABASE value: localhost:5432/cats - name: POSTGRES_USER value: pwd - name: POSTGRES_PASSWORD value: postgres - name: cat-postgres image: postgres:11.1 ports: - name: http containerPort: 5432 protocol: TCP env: - name: POSTGRES_USER value: pwd - name: POSTGRES_PASSWORD value: postgres - name: POSTGRES_DB value: cats Dockerfile FROM gradle:8.3.0-jdk17 COPY . . EXPOSE 8080 CMD ["gradle", "bootRun"] And for local run, it is possible to use docker-compose with the following configuration. 
docker-compose.yaml:
YAML
version: '3.8'
services:
  cat-postgres:
    image: postgres:12.13
    restart: always
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres
      POSTGRES_DB: cats
#    volumes:
#      - ./init.sql:/docker-entrypoint-initdb.d/create_tables.sql - if you want to run any script before an app
#      - ./db-data/:/var/lib/postgresql/data/
  service:
    image: cat-svc:0.0.1
    restart: always
    ports:
      - '8080:8080'
    environment:
      SPRING_PROFILES_ACTIVE: prod
      PLACES_DATABASE: cat-postgres:5432/cats
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres
Migrations The big thing that has to be decided before using this approach is the migration question. The best option in this approach is to delegate the migration process to any tool that can work within your app infrastructure. For example, in the Java world, you could use Flyway or Liquibase. Flyway is a popular open-source database migration tool. It allows you to version control your database schema and apply changes in a structured manner. Flyway supports multiple databases, including PostgreSQL, MySQL, and Oracle. Liquibase is an open-source database migration tool that supports tracking, managing, and applying database changes. It provides a way to define database changes using XML, YAML, or SQL, and it supports various databases. Pros of Using a PostgreSQL Sidecar in Kubernetes Separation of concerns: Sidecars allow you to separate specific functionalities (e.g., database migrations, backups) from the main application logic. Compliance with microservice architecture. Simplified deployment: Sidecars can be deployed and managed alongside the main application using the same deployment configurations, simplifying the overall deployment process. You don't need to maintain a separate database for the test environment, which reduces the complexity of tests (you don't need to think about collisions when many CI runs execute tests against the same table). Cons of Using a PostgreSQL Sidecar in Kubernetes Resource overhead: Running additional containers consumes resources (CPU, memory) on the node, which may impact the overall performance and resource utilization of the Kubernetes cluster. It's best to use as few resources as possible. Startup order: The main application may become dependent on the sidecar for certain functionalities, potentially leading to issues if there are discrepancies or version mismatches between the main application and the sidecar. Arranging containers in a specific order without additional configuration can be somewhat challenging. However, this shouldn't pose a problem in test environments due to the quick startup of the PostgreSQL container. In most scenarios, the PostgreSQL container will initiate before any of your business applications. Even if the application attempts to run before PostgreSQL is ready, it will encounter a failure and be automatically restarted by the default Kubernetes mechanism until the database becomes available. Learning curve: Adopting the sidecar pattern may require a learning curve for development and operations teams, particularly if they are new to the concept of containerized sidecar architectures. Once the setup is complete, new team members should encounter no issues with this approach. Conclusion In conclusion, the choice between using KubeDB and the PostgreSQL sidecar approach for integrating web applications and databases in a test environment ultimately depends on your specific requirements and preferences.
KubeDB offers a comprehensive solution with Kubernetes-native features, streamlining the management of databases alongside web services. On the other hand, the PostgreSQL sidecar approach provides flexibility and fine-grained control over how databases and web applications interact. Whether you opt for the simplicity and seamless integration provided by KubeDB or the customization potential inherent in the sidecar pattern, both approaches lay a solid foundation for test environments. The key lies in understanding the unique demands of your project and selecting the method that aligns best with your development workflow, scalability needs, and overall application architecture. Whichever path you choose, the insights gained from exploring these approaches in a test setting can pave the way for a robust and efficient integration strategy in your production environment.
Amazon Elastic Compute Cloud (EC2) stands as a cornerstone of AWS's suite of cloud services, providing a versatile platform for computing on demand. Yet, the true power of EC2 lies in its diverse array of instance types, each meticulously crafted to cater to distinct computational requirements, underpinned by a variety of specialized hardware architectures. This article goes into detail, exploring the intricacies of these instance types and dissecting the hardware that drives them. Through this foundational approach, we aim to furnish a more profound comprehension of EC2's ecosystem, equipping you with the insights necessary to make the right decisions when selecting the most apt instance for your specific use case. Why Understand the Hardware Beneath the Instances? When diving into cloud computing, it's tempting to view resources like EC2 instances as abstracted boxes, merely serving our applications without much thought to their inner workings. However, having a fundamental understanding of the underlying hardware of your chosen EC2 instance is crucial. This knowledge not only empowers you to make more informed decisions, optimizing both performance and costs, but also ensures your applications run smoothly, minimizing unexpected disruptions. Just as a chef selects the right tools for a dish or a mechanic chooses the correct parts for a repair, knowing the hardware components of your EC2 instances can be the key to unlocking their full potential. In this article, we'll demystify the hardware behind the EC2 curtains, helping you bridge the gap between abstract cloud resources and tangible hardware performance. Major Hardware Providers and Their Backgrounds Intel For years, Intel has been the cornerstone of cloud computing, with its Xeon processors powering a vast majority of EC2 instances. Renowned for their robust general-purpose computing capabilities, Intel's chips excel in a wide array of tasks, from data processing to web hosting. Their Hyper-Threading technology allows for higher multi-tasking, making them versatile for varied workloads. However, premium performance often comes at a premium cost. AMD AMD instances, particularly those sporting the EPYC series of processors, have started gaining traction in the cloud space. They are often pitched as cost-effective alternatives to Intel without compromising much on performance. AMD's strength lies in providing a high number of cores, making them suitable for tasks that benefit from parallel processing. They can offer a balance between price and performance, particularly for businesses operating on tighter budgets. ARM (Graviton) ARM's Graviton and Graviton2 processors represent a departure from traditional cloud computing hardware. These chips are known for their energy efficiency, derived from ARM's heritage in mobile computing. As a result, Graviton-powered instances can deliver a superior price-performance ratio, especially for scale-out workloads that can distribute tasks across multiple servers. They're steadily becoming the go-to choice for businesses prioritizing efficiency and cost savings. NVIDIA When it comes to GPU-intensive tasks, NVIDIA stands uncontested. Their Tesla and A100 GPUs, commonly found in EC2's GPU instances, are designed for workloads that demand heavy computational power. Whether machine learning training, 3D rendering, or high-performance computing, NVIDIA-powered instances offer accelerated performance. 
However, the specialized nature of these instances means they might not be the best choice for general computing tasks and can be more expensive. In essence, while EC2 instance families provide a high-level categorization, the real differentiation in performance, cost, and suitability comes from these underlying hardware providers. By understanding the strengths and limitations of each, businesses can tailor their cloud deployments to achieve the desired balance of performance and cost. 1. General Purpose Instances Notable types: T3/T4g (Intel/ARM), M7i/M7g (Intel/ARM), etc. Primary use: Balancing compute, memory, and networking Practical application: Web servers: A standard web application or website that requires balanced resources can run seamlessly on general-purpose instances Developer environments: The burstable performance of t2 and t3 makes them ideal for development and testing environments where resource demand fluctuates. 2. Compute Optimized Instances Notable Types: C7i/C7g (Intel/ARM), etc. Primary Use: High computational tasks Practical application: High-performance web servers: Websites with massive traffic or services that require quick response times Scientific modeling: Simulating climate patterns, genomic research, or quantum physics calculations 3. Memory Optimized Instances Notable Types: R7i/R7g (Intel/ARM), X1/X1e (Intel), etc. Primary Use: Memory-intensive tasks Practical Application: Large-scale databases: Running applications like MySQL, PostgreSQL, or big databases like SAP HANA Real-time Big Data analytics: Analyzing massive data sets in real-time, such as stock market trends or social media sentiment analysis 4. Storage Optimized Instances Notable types: I3/I3en (Intel), D3/D3en (Intel), H1 (Intel), etc. Primary use: High random I/O access Practical Application: NoSQL databases: Deploying high-transaction databases like Cassandra or MongoDB Data warehousing: Handling and analyzing vast amounts of data, such as user data for large enterprises 5. Accelerated Computing Instances Notable types: P5 (NVIDIA/AMD), Inf1 (Intel), G5 (NVIDIA), etc. Primary use: GPU-intensive tasks Practical application: Machine Learning: Training complex models or neural networks Video rendering: Creating high-quality animation or special effects for movies 6. High-Performance Computing (HPC) Instances Notable types: Hpc7g, Hpc7a Primary use: Tasks requiring extremely high frequencies or hardware acceleration Practical Application: Electronic Design Automation (EDA): Designing and testing electronic circuits Financial simulations: Predicting stock market movements or calculating complex investment scenarios 7. Bare Metal Instances Notable types: m5.metal, r5.metal (Intel Xeon) Primary use: Full access to underlying server resources Practical application: High-performance databases: When databases like Oracle or SQL Server require direct access to server resources Sensitive workloads: Tasks that must comply with strict regulatory or security requirements Each EC2 instance family is tailored for specific workload requirements, and the underlying hardware providers further influence their performance. Users can achieve optimal performance and cost efficiency by aligning the workload with the appropriate instance family and hardware.
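To make the mapping from requirements to instance families more concrete, below is a minimal sketch using boto3 (the Python SDK for AWS) that shortlists current-generation instance types for a given CPU architecture, such as arm64 for Graviton-based instances. It assumes default AWS credentials are configured; the vCPU threshold is illustrative, and pricing and regional availability would still need to be checked separately.

Python
import boto3

def list_instance_types(architecture: str = "arm64", min_vcpus: int = 4):
    """Return current-generation EC2 instance types matching a CPU architecture."""
    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[
            {"Name": "processor-info.supported-architecture", "Values": [architecture]},
            {"Name": "current-generation", "Values": ["true"]},
        ]
    )
    matches = []
    for page in pages:
        for itype in page["InstanceTypes"]:
            vcpus = itype["VCpuInfo"]["DefaultVCpus"]
            mem_gib = itype["MemoryInfo"]["SizeInMiB"] / 1024
            if vcpus >= min_vcpus:
                matches.append((itype["InstanceType"], vcpus, round(mem_gib, 1)))
    return sorted(matches)

if __name__ == "__main__":
    for name, vcpus, mem in list_instance_types():
        print(f"{name}: {vcpus} vCPUs, {mem} GiB")

A shortlist like this is only a starting point; the workload characteristics described above (burstable, compute-bound, memory-bound, I/O-bound, or accelerated) should drive the final choice of family.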
This is an article from DZone's 2023 Observability and Application Performance Trend Report. For more: Read the Report

Employing cloud services can incur a great deal of risk if not planned and designed correctly. In fact, this is really no different than the challenges that are inherent within a single on-premises data center implementation. Power outages and network issues are common examples of challenges that can put your service — and your business — at risk. For AWS cloud services, we have seen large-scale regional outages that are documented on the AWS Post-Event Summaries page. To gain a broader look at other cloud providers and services, the danluu/post-mortems repository provides a more holistic view of the cloud in general. It's time for service owners relying (or planning to rely) on a single region to think hard about the best way to design resilient cloud services. While I will utilize AWS for this article, it is solely because of my level of expertise with the platform and not because one cloud platform should be considered better than another.

A Single-Region Approach Is Doomed to Fail

A cloud-based service implementation can be designed to leverage multiple availability zones. Think of availability zones as distinct locations within a specific region; they are isolated from other availability zones in that region. Consider the following cloud-based service running on AWS inside the Kubernetes platform:

Figure 1: Cloud-based service utilizing Kubernetes with multiple availability zones

In Figure 1, inbound requests are handled by Route 53, arrive at a load balancer, and are directed to a Kubernetes cluster. The controller routes requests to the service, which has three instances running, each in a different availability zone. For persistence, an Aurora Serverless database has been adopted. While this design protects from the loss of one or two availability zones, the service is considered at risk when a region-wide outage occurs, similar to the AWS outage in the US-EAST-1 region on December 7th, 2021. A common mitigation strategy is to implement stand-by patterns that can become active when unexpected outages occur. However, these stand-by approaches can lead to bigger issues if they are not consistently participating by handling a portion of all requests.

Transitioning to More Than Two

With single-region services at risk, it's important to understand how best to proceed. For that, we can draw upon the simple example of a trucking business. If you have a single driver who operates a single truck, your business is down when the truck or driver is unable to fulfill their duties. The immediate thought here is to add a second truck and driver. However, the better answer is to increase the fleet by two, which allows for a second unexpected issue to arise while the first is still being resolved. This is known as the "n + 2" rule, which becomes important when there are expectations set between you and your customers. For the trucking business, it might be a guaranteed delivery time. For your cloud-based service, it will likely be measured in service-level objectives (SLOs) and service-level agreements (SLAs). It is common to set SLOs as four nines, meaning your service is operating as expected 99.99% of the time.
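As a quick aside, the arithmetic behind an error budget is simple enough to sketch in a few lines of Python; the SLO target is the only input. The error budgets listed next follow directly from this calculation, with small differences depending on the month length you assume.

Python
# Allowed downtime (the error budget) for a given SLO target over a time window.
SECONDS_PER_DAY = 24 * 60 * 60

def error_budget_seconds(slo: float, days: float) -> float:
    """Allowed downtime, in seconds, over `days` for an SLO such as 0.9999."""
    return (1 - slo) * days * SECONDS_PER_DAY

# Published tables vary by a few seconds for the month, depending on whether
# 30 days or an average month (~30.44 days) is used.
for label, days in [("Month (30 days)", 30), ("Week", 7), ("Day", 1)]:
    minutes, seconds = divmod(error_budget_seconds(0.9999, days), 60)
    print(f"{label}: {int(minutes)} min {seconds:.1f} s")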
This translates to the following error budgets, or downtime, for the service:

Month = 4 minutes and 21 seconds
Week = 1 minute and 0.48 seconds
Day = 8.6 seconds

If your SLAs include financial penalties, the importance of implementing the n + 2 rule becomes critical to making sure your services are available in the wake of an unexpected regional outage. Remember, that December 7, 2021 outage at AWS lasted more than eight hours. The cloud-based service from Figure 1 can be expanded to employ a multi-region design:

Figure 2: Multi-region cloud-based service utilizing Kubernetes and multiple availability zones

With a multi-region design, requests are handled by Route 53 but are directed to the best region to handle the request. The ambiguous term "best" is used intentionally, as the criteria could be based upon geographical proximity, least latency, or both. From there, the in-region Kubernetes cluster handles the request — still with three different availability zones. Figure 2 also introduces the observability layer, which provides the ability to monitor cloud-based components and establish SLOs at the country and regional levels. This will be discussed in more detail shortly.

Getting Out of the Toil Game

Google Site Reliability Engineering's Eric Harvieux defined toil as noted below:

"Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows."

When designing services that run in multiple regions, the amount of toil grows dramatically compared to a single region. Consider the example of creating a manager-approved change request every time code is deployed into the production instance. In the single-region example, the change request might be a bit annoying, but it is something a software engineer is willing to tolerate. Now, with two additional regions, this translates to three times the number of change requests, all with at least one human-based approval being required. An attainable and desirable end state should still include change requests, but these requests should become part of the continuous delivery (CD) lifecycle and be created automatically. Additionally, the observability layer introduced in Figure 2 should be leveraged by the CD tooling in order to monitor deployments — rolling back in the event of any unforeseen circumstances. With this approach, the need for human-based approvals is diminished, and unnecessary toil is removed from both the software engineer requesting the deployment and the approving manager.

Harnessing the Power of Observability

Observability platforms measure a system's state by leveraging metrics, logs, and traces. This means that a given service can be measured by the outputs it provides. Leading observability platforms go a step further and allow for the creation of synthetic API tests that can be used to exercise resources for a given service. Tests can include assertions that introduce expectations — like a particular GET request will respond with an expected response code and payload within a given time period. Otherwise, the test will be marked as failed. SLOs can be attached to each synthetic test, and each test can be executed in multiple geographical locations, all monitored from the observability platform. Taking this approach gives service owners the ability to understand service performance from multiple entry points.
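To make the idea of a synthetic API test more concrete, here is a minimal sketch in Python using the requests library. The endpoint URL, the expected "status" field in the payload, and the latency threshold are illustrative assumptions rather than the conventions of any particular observability platform.

Python
import time
import requests

# Hypothetical endpoint, payload shape, and thresholds chosen for illustration only.
ENDPOINT = "https://api.example.com/actuator/health"
EXPECTED_STATUS = 200
MAX_LATENCY_SECONDS = 0.5

def run_synthetic_check() -> bool:
    """Issue one GET request and assert on status code, payload, and latency."""
    started = time.monotonic()
    try:
        response = requests.get(ENDPOINT, timeout=5)
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
        return False
    elapsed = time.monotonic() - started

    checks = {
        "status_code": response.status_code == EXPECTED_STATUS,
        "payload": response.json().get("status") == "UP",
        "latency": elapsed <= MAX_LATENCY_SECONDS,
    }
    for name, passed in checks.items():
        print(f"{name}: {'pass' if passed else 'fail'}")
    return all(checks.values())

if __name__ == "__main__":
    run_synthetic_check()

A hosted observability platform would schedule a check like this from multiple geographic locations, record each assertion, and attach SLOs to the results, as described above.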
With the multi-region model, tests can be created and performance thereby monitored at the regional and global levels separately, thus producing a high degree of certainty on the level of performance being produced in each region. In every case, the power of observability can remove the need for the manual, human-based change approvals noted above.

Bringing It All Together

From the 10,000-foot level, the multi-region service implementation from Figure 2 can be placed onto a United States map. In Figure 3, the database connectivity is mapped to demonstrate the inter-region communication, while the observability and cloud metrics data are gathered from AWS and the observability platform globally.

Figure 3: Multi-region service adoption placed near the respective AWS regions

By implementing the n + 2 rule, service owners have peace of mind that their service is fully functional in three regions. In this scenario, the implementation is prepared to survive two complete region outages. As an example, the eight-hour AWS outage referenced above would not have an impact on the service's SLOs/SLAs during the time when one of the three regions is unavailable.

Charting a Plan Toward Multi-Region

Implementing a multi-region footprint for your service without increasing toil is possible, but it does require planning. Some high-level action items are noted below:

Understand your persistence layer – Understanding your persistence layer early on is key. If multiple write regions are not a possibility, alternative approaches will be required.
Adopt Infrastructure as Code – The ability to define your cloud infrastructure via code is critical to eliminating toil and increasing the ability to adopt additional regions, or even zones.
Use containerization – The underlying service is best when containerized. Build the container you wish to deploy during the continuous integration stage and scan for vulnerabilities within every layer of the container for added safety.
Reduce time to deploy – Get into the habit of releasing often, as it only makes your team stronger.
Establish SLOs and synthetics – Take the time to set SLOs for your service and write synthetic tests to constantly measure your service — across every environment.
Automate deployments – Leverage observability during the CD stage to deploy when a merge-to-main event occurs. If a dev deploys and no alerts are emitted, move on to the next environment and continue all the way to production.

Conclusion

It's important to understand the limitations of the platform where your services are running. Leveraging a single region offered by your cloud provider is only successful when there are zero region-wide outages. Based upon prior history, that is no longer a safe assumption; region-wide outages are certain to happen again, and no cloud provider is ever going to be 100% immune from them. A better approach is to utilize the n + 2 rule and increase the number of regions your service runs in by two. In taking this approach, the service will still be able to respond to customer requests in the event of not only one regional outage but also any form of outage in a second region where the service is running. By adopting the n + 2 approach, there is a far better chance of meeting the SLAs set with your customers. Getting to this point will certainly present challenges but should also provide the opportunity to cut down (or even eliminate) toil within your organization.
In the end, your customers will benefit from increased service resiliency, and your team will benefit from significant productivity gains. Have a really great day!

Resources

AWS Post-Event Summaries, AWS
Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region, AWS
danluu/post-mortems, GitHub
"Identifying and Tracking Toil Using SRE Principles" by Eric Harvieux, 2020
"Failure Recovery: When the Cure Is Worse Than the Disease" by Guo et al., 2013

This is an article from DZone's 2023 Observability and Application Performance Trend Report. For more: Read the Report
Nowadays in the agile way of the software development lifecycle, continuous integration and continuous delivery enable software delivery workflows to include multiple teams and functions spanning over development, assurance, operations, and security. What Are Software Design Patterns? Software design patterns are best practices that are followed in order to resolve common problems in development. By following software patterns, a development team can follow the same practices to deliver, build, and deploy code in a much more efficient and systematic way. Software design anti-patterns are the malpractices that can cause harm in the way software development is being done. Continuous Integration Software Design Patterns and Anti-Patterns Continuous integration is an automated integration process that gets the source code from multiple branches to be merged into a main branch which is then used as a reference to deploy the development code to different environments. Using certain patterns cleanses the code to be made deployment-ready. CI Pipeline Patterns and Anti-Patterns Version Controlling the Source Code The definition of standards plays an important role in the continuous integration chain. Now, there are several conventions that make it possible to facilitate the understanding of an application's source code and software development lifecycle. Defining conventions, therefore, has a major impact on both the individual or team, and at the level of automated processes. Continuous Integration Version Control Patterns: Define better conventions to set better contexts for the development lifecycle. Build on every change done to a commit, branch, merge, and pull request. Add useful information to commit messages, use a proper branch naming convention, and standardize the application version. Use pre- and- post actions on commits, merges, and pull requests. Continuous Integration Version Control Anti-Patterns: Have limited builds per sprint, per week; cherry-picking commits. Use a non-relevant branch name and meaningless commit messages. Use different versions for different applications for build. Test the maximum of the source code manually after packaging or deploying it. Running Builds Periodically The build phase is the most important phase of the continuous integration cycle. In this phase, several validations are required and considerations are ensured to make sure that the application has been packaged properly for deployment. Related Tutorial: Azure DevOps and Deploying a Mule Application into Cloudhub Continuous Integration Build Patterns: Use a fresh isolated environment to build the application and control the allocated resources to avoid impacting other builds. Automatically release and deploy a new version on every new commit, branch, merge, or pull request. Test the weekly builds to identify potential issues proactively instead of waiting for a code update. Deploy a hotfix as soon as possible. Test the code in staging before moving it to production. Deploy the build free of any security vulnerabilities and sensitive data exposure; take action immediately if a severity is defined, and disassociate passwords from the source code. Lint and format code to make the source code more readable. Run a set of tests automatically on each build to run specific sets periodically. Run the tests in the same pattern across different platforms using the same set of test data to compare results. 
Continuous Integration Build Anti-Patterns: Always use the same environment without handling dependency issues; not optimizing resources for subsequent builds and potentially impacting other builds as well. Start a build manually after every sprint or week, depending upon task allocation. Schedule a hotfix directly to the production environment. Add in sensitive data, like usernames, passwords, tokens, etc., to configuration files. Not setting in code quality standards and semantics Run tests manually after deployment. Run a test framework that would fail because of the status of the infrastructure. Continuous Deployment Software Design Patterns and Anti-Patterns Continuous deployment enables operations and workflows. The goal is to safely deliver artifacts into the different environments in a repeated fashion with a lesser error rate. The continuous deployment process helps with automating the operational services of releasing, deploying, and monitoring applications. Validations and Release Management The delivery phase is an extension of the continuous integration phase where the system needs to handle the automated process of deploying all code changes to a stable test environment to qualify the working functionality of the source code and the working version of the source code before deployment. Learn more about the release pipeline using Azure DevOps. Continuous Deployment Validation and Release Management Patterns: Automate the verification and validation procedure of the released version of the software to include unit, integration, and regression testing. Define an enterprise standard release convention for version management and facilitate automation. Deploy a hotfix when necessary; test the code in a pre-prod environment before moving it to production. Continuous Deployment Validation and Release Management Anti-Patterns: Use manual tests to verify and validate software. Do not increment the application version to overwrite the previous existing versions. Schedule the deployment of a hotfix; test directly in production. Related Guide: Automation Testing in CI/CD Pipelines Deployment Deployment is done once the feature is tested in a pre-production environment for any regression issues or uncaught errors in the platform. Continuous Deployment Patterns: Run the build process once while deploying to multiple target environments. Deploy code to the production but limit the access to the codebase by enabling a feature flag. Utilize automated provisioning code or scripts to automatically start and destroy environments. Continuous Deployment Anti-Patterns: Run the software build in every stage of the deployment pipeline. Wait to commit the code until the feature development has been completed. Rollback Rollback becomes very important if there's a deployment failure, which essentially means getting back the system to the previous working state. Continuous Deployment Rollback Patterns: Provide a single command rollback of changes after an unsuccessful deployment. Keep the environmental configuration changes. Externalize all variable values from the application configuration as build/deployment properties. Continuous Deployment Rollback Anti-Patterns: Manually undo changes applied to rollback the deployed code. Hardcode values inside the source code based on the target environments. Documentation for Steps and Procedures Documentation is a significant component of the deployment process flow to stream the respective information across stakeholders at every team level. 
Continuous Deployment Documentation Pattern: Define a standard of documentation that can be understood by every team.

Continuous Deployment Documentation Anti-Pattern: Keep the documentation restricted to specific teams.

Conclusion

The CI/CD process is an integral part of the software delivery lifecycle. It gives an enterprise the power to release quality code that meets all standardization and compliance requirements into the production environment. The CI/CD software patterns and anti-patterns are important to understand, as they offer immense potential to standardize the quality of code delivery. If the CI/CD process is established with the right principles, it will help reduce failures and shorten the time-to-market of the product.

Additional Resources

"DevOps tech: Continuous delivery"
"DevOps Metrics: Why, what, and how to measure success in DevOps"
Progressive Delivery Patterns and Anti-Patterns Refcard
Introduction to DevSecOps Refcard
The Essentials of GitOps Refcard
Getting Started With Feature Flags Refcard
Getting Started With Log Management Refcard
End-to-End Testing Automation Essentials Refcard
This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report In today's digital landscape, the growing importance of monitoring and managing application performance cannot be overstated. With businesses increasingly relying on complex applications and systems to drive their operations, ensuring optimal performance has become a top priority. In essence, efficient application performance management can mean the difference between business success and failure. To better understand and manage these sophisticated systems, two key components have emerged: telemetry and observability. Telemetry, at its core, is a method of gathering and transmitting data from remote or inaccessible areas to equipment for monitoring. In the realm of IT systems, telemetry involves collecting metrics, events, logs, and traces from software applications and infrastructure. This plethora of data is invaluable as it provides insight into system behavior, helping teams identify trends, diagnose problems, and make informed decisions. In simpler terms, think of telemetry as the heartbeat monitor of your application, providing continuous, real-time updates about its health. Observability takes this concept one step further. It's important to note that while it does share some similarities with traditional monitoring, there are distinct differences. Traditional monitoring involves checking predefined metrics or logs for anomalies. Observability, on the other hand, is a more holistic approach. It not only involves gathering data but also understanding the "why" behind system behavior. Observability provides a comprehensive view of your system's internal state based on its external outputs. It helps teams understand the overall health of the system, detect anomalies, and troubleshoot potential issues. Simply put, if telemetry tells you what is happening in your system, observability explains why it's happening. The Emergence of Telemetry and Observability in Application Performance In the early days of information systems, understanding what a system was doing at any given moment was a challenge. However, the advent of telemetry played a significant role in mitigating this issue. Telemetry, derived from Greek roots tele (remote) and metron (measure), is fundamentally about measuring data remotely. This technique has been used extensively in various fields such as meteorology, aerospace, and healthcare, long before its application in information technology. As the complexity of systems grew, so did the need for more nuanced understanding of their behavior. This is where observability — a term borrowed from control theory — entered the picture. In the context of IT, observability is not just about collecting metrics, logs, and traces from a system, but about making sense of that data to understand the internal state of the system based on the external outputs. Initially, these concepts were applied within specific software or hardware components, but with the evolution of distributed systems and the challenges they presented, the application of telemetry and observability became more systemic. Nowadays, telemetry and observability are integral parts of modern information systems, helping operators and developers understand, debug, and optimize their systems. They provide the necessary visibility into system performance, usage patterns, and potential bottlenecks, enabling proactive issue detection and resolution. 
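As a minimal illustration of what emitting telemetry looks like in code, the sketch below uses the OpenTelemetry Python API to record a trace span and a counter metric around a hypothetical checkout operation. The service, span, and attribute names are invented for the example, and the SDK and exporter configuration that would ship this data to a backend is omitted; without it, the API calls are simply no-ops.

Python
from opentelemetry import metrics, trace

# Acquire a tracer and a meter from the globally configured providers.
# (In a real service, an SDK and exporter would be configured at startup.)
tracer = trace.get_tracer("checkout-service")
meter = metrics.get_meter("checkout-service")

orders_counter = meter.create_counter(
    "orders_processed",
    description="Number of checkout requests handled",
)

def process_checkout(order_id: str) -> None:
    # The span captures timing and context (a trace); the counter captures a metric.
    with tracer.start_as_current_span("process_checkout") as span:
        span.set_attribute("order.id", order_id)
        # ... business logic would go here ...
        orders_counter.add(1, {"outcome": "success"})

process_checkout("order-123")

Instrumentation like this is the telemetry half of the picture; observability comes from the backend that correlates these spans, metrics, and logs to explain why the system behaves the way it does.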
Emerging Trends and Innovations With cloud computing taking the center stage in the digital transformation journey of many organizations, providers like Amazon Web Services (AWS), Azure, and Google Cloud have integrated telemetry and observability into their services. They provide a suite of tools that enable users to collect, analyze, and visualize telemetry data from their workloads running on the cloud. These tools don't just focus on raw data collection but also provide features for advanced analytics, anomaly detection, and automated responses. This allows users to transform the collected data into actionable insights. Another trend we observe in the industry is the adoption of open-source tools and standards for observability like OpenTelemetry, which provides a set of APIs, libraries, agents, and instrumentation for telemetry and observability. The landscape of telemetry and observability has come a long way since its inception, and continues to evolve with technology advancements and changing business needs. The incorporation of these concepts into cloud services by providers like AWS and Azure has made it easier for organizations to gain insights into their application performance, thereby enabling them to deliver better user experiences. The Benefits of Telemetry and Observability The world of application performance management has seen a paradigm shift with the adoption of telemetry and observability. This section delves deep into the advantages provided by these emerging technologies. Enhanced Understanding of System Behavior Together, telemetry and observability form the backbone of understanding system behavior. Telemetry, which involves the automatic recording and transmission of data from remote or inaccessible parts of an application, provides a wealth of information about the system's operations. On the other hand, observability derives meaningful insights from this data, allowing teams to comprehend the internal state of the system from its external outputs. This combination enables teams to proactively identify anomalies, trends, and potential areas of improvement. Improved Fault Detection and Resolution Another significant advantage of implementing telemetry and observability is the enhanced ability to detect and resolve faults. There are tools that allow users to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in configuration. This level of visibility hastens the detection of any operational issues, enabling quicker resolution and reducing system downtime. Optimized Resource Utilization These modern application performance techniques also facilitate optimized resource utilization. By understanding how resources are used and identifying any inefficiencies, teams can make data-driven decisions to optimize resource allocation. An auto-scaling feature — which adjusts capacity to maintain steady, predictable performance at the lowest possible cost — is a prime example of this benefit. Challenges in Implementing Telemetry and Observability Implementing telemetry and observability into existing systems is not a straightforward task. It involves a myriad of challenges, stemming from the complexity of modern applications to the sheer volume of data that needs to be managed. Let's delve into these potential pitfalls and roadblocks. Potential Difficulties and Roadblocks The first hurdle is the complexity of modern applications. 
They are typically distributed across multiple environments — cloud, on-premises, hybrid, and even multi-cloud setups. This distribution makes it harder to understand system behavior, as the data collected could be disparate and disconnected, complicating telemetry efforts. Another challenge is the sheer volume, speed, and variety of data. Modern applications generate massive amounts of telemetry data. Collecting, storing, processing, and analyzing this data in real time can be daunting. It requires robust infrastructure and efficient algorithms to handle the load and provide actionable insights. Also, integrating telemetry and observability into legacy systems can be difficult. These older systems may not be designed with telemetry and observability in mind, making it challenging to retrofit them without impacting performance. Strategies To Mitigate Challenges Despite these challenges, there are ways to overcome them. For the complexity and diversity of modern applications, adopting a unified approach to telemetry can help. This involves using a single platform that can collect, correlate, and analyze data from different environments. To tackle the issue of data volume, implementing automated analytics and machine learning algorithms can be beneficial. These technologies can process large datasets in real time, identifying patterns and providing valuable insights. For legacy system integration issues, it may be worthwhile to invest in modernizing these systems. This could mean refactoring the application or adopting new technology stacks that are more conducive to telemetry and observability. Finally, investing in training and up-skilling teams on tools and best practices can be immensely beneficial. Practical Steps for Gaining Insights Both telemetry and observability have become integral parts of modern application performance management. They offer in-depth insights into our systems and applications, enabling us to detect and resolve issues before they impact end-users. Importantly, these concepts are not just theoretical — they're put into practice every day across services provided by leading cloud providers such as AWS and Google Cloud. In this section, we'll walk through a step-by-step guide to harnessing the power of telemetry and observability. I will also share some best practices to maximize the value you gain from these insights. Step-By-Step Guide The following are steps to implement performance management of a modern application using telemetry and observability on AWS, though this is also possible to implement using other cloud providers: Step 1 – Start by setting up AWS CloudWatch. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications, and services. Step 2 – Use AWS X-Ray for analyzing and debugging your applications. This service provides an end-to-end view of requests as they travel through your application, showing a map of your application's underlying components. Step 3 – Implement AWS CloudTrail to keep track of user activity and API usage. CloudTrail enhances visibility into user and resource activity by recording AWS Management Console actions and API calls. You can identify which users and accounts called AWS, the source IP address from which the calls were made, and when the calls occurred. Step 4 – Don't forget to set up alerts and notifications. 
AWS SNS (Simple Notification Service) can be used to send you alerts based on the metrics you define in CloudWatch. Figure 1: An example of observability on AWS Best Practices Now that we've covered the basics of setting up the tools and services for telemetry and observability, let's shift our focus to some best practices that will help you derive maximum value from these insights: Establish clear objectives – Understand what you want to achieve with your telemetry data — whether it's improving system performance, troubleshooting issues faster, or strengthening security measures. Ensure adequate training – Make sure your team is adequately trained in using the tools and interpreting the data provided. Remember, the tools are only as effective as the people who wield them. Be proactive rather than reactive – Use the insights gained from telemetry and observability to predict potential problems before they happen instead of merely responding to them after they've occurred. Conduct regular reviews and assessments – Make it a point to regularly review and update your telemetry and observability strategies as your systems evolve. This will help you stay ahead of the curve and maintain optimal application performance. Conclusion The rise of telemetry and observability signifies a paradigm shift in how we approach application performance. With these tools, teams are no longer just solving problems — they are anticipating and preventing them. In the complex landscape of modern applications, telemetry and observability are not just nice-to-haves; they are essentials that empower businesses to deliver high-performing, reliable, and user-friendly applications. As applications continue to evolve, so will the tools that manage their performance. We can anticipate more advanced telemetry and observability solutions equipped with AI and machine learning capabilities for predictive analytics and automated anomaly detection. These advancements will further streamline application performance management, making it more efficient and effective over time. This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report
This is an article from DZone's 2023 Observability and Application Performance Trend Report.For more: Read the Report Agile development practices must be supported by an agile monitoring framework. Overlooking the nuances of the system state — spanning infrastructure, application performance, and user interaction — is a risk businesses can't afford. This is particularly true when performance metrics and reliability shape customer satisfaction and loyalty, directly influencing the bottom line. Traditional application performance monitoring (APM) tools were initially designed for environments that were more static and predictable. These tools were designed to track neither the swift, iterative changes of microservice architectures nor the complexities of cloud-native applications. This led to the gradual evolution of the modern observability approach that leveraged the data collection principles of APM and extended them to provide deeper insights into a system's state. In this article, we delve into the core concepts of observability and monitoring while discussing how the modern observability approach differs from and complements traditional monitoring practices. Optimizing Application Performance Through Data Quality Performance metrics are only as reliable as the data feeding them. Diverse data sources, each with their own format and scale, can convolute the true picture of application performance. Given the "garbage in, garbage out" challenge, data normalization serves as the corrective measure where a dataset is reorganized to reduce redundancy and improve data integrity. The primary aim is to ensure that data is stored efficiently and consistently, which makes it easier to retrieve, manipulate, and make sense of. For APM, there are various normalization techniques that help bring heterogeneous data onto a common scale so that it can be compared and analyzed more effectively: Unit conversion – standardizing units of measure, like converting all time-based metrics to milliseconds Range scaling – adjusting metrics to a common range; useful for comparing metrics that originally existed on different scales Z-score normalization – converting metrics to a standard distribution, which is especially useful when dealing with outlier values Monitoring vs. Observability: Core Concepts In optimizing application performance, monitoring and observability play equally critical but distinct roles. Some often inaccurately use the terms interchangeably, but there's a nuanced difference. Monitoring follows a proactive approach of collecting data points based on predefined thresholds and setting up alarms to flag anomalies. This essentially answers the question, Is my system working as expected? On the other hand, observability allows for deep dives into system behavior, offering insights into issues that you didn't know existed. The approach helps you answer, Why isn't my system working as expected? Example Use Case: E-Commerce Platform For context, consider an e-commerce platform where application uptime and user experience are critical. To ensure everything is running smoothly, the right blend of monitoring and observability strategies can be broken down as follows. MONITORING vs. 
OBSERVABILITY FOR AN E-COMMERCE PLATFORM (Table 1)

Monitoring strategies:
Availability checks – regular pings to ensure the website is accessible
Latency metrics – measuring page load times to optimize user experience
Error rate tracking – flags raised if server errors like "404 Not Found" exceed a threshold
Transaction monitoring – automated checks for crucial processes like checkout

Observability strategies:
Log analysis – deep inspection of server logs to trace failed user requests
Distributed tracing – maps the path of a request through various services
Event tagging – custom tags in code for real-time understanding of user behavior
Query-driven exploration – ad hoc queries to examine system behavior

Synergy Between Monitoring and Observability

Monitoring and observability don't conflict; instead, they work hand in hand to develop an efficient APM framework. Integrating monitoring and observability allows you to realize numerous advantages, including those listed below:

Enhanced coverage – Monitoring identifies known issues while observability lets you explore the unknown. From system crashes to subtle performance degradations, everything gets covered here. In practical terms, this may mean not simply knowing that your server responded with a 500 error but also understanding why it occurred and what its effects are on the entire ecosystem.
Improved analysis – A blended approach enables you to pivot from what is happening to why it's happening. This is crucial for data-driven decision-making. You can allocate resources more effectively, prioritize bug fixes, or even discover optimization opportunities you didn't know existed. For example, you might find that certain API calls are taking longer only during specific times of the day and trace it back to another internal process hogging resources.
Scalability – As your system grows, its complexity often grows exponentially. The scalability of your APM can be significantly improved when both monitoring and observability work in sync. Monitoring helps you keep tabs on performance indicators, but observability allows you to fine-tune your system for optimal performance at scale. As a result, you achieve a scalable way not just to proactively identify bottlenecks and resource constraints but also to investigate and resolve them.

Figure 1: How observability and monitoring overlap

Creating a Cohesive System

Synergizing monitoring and observability is one of the most critical aspects of building a robust, scalable, and insightful APM framework. The key here is to build an environment where monitoring and observability are not just coexisting but are codependent, thus amplifying each other's efficacy in maintaining system reliability. While different use cases may require different approaches, consider the following foundational approaches to build a cohesive monitoring and observability stack.

Unified Data Storage and Retrieval

The first step toward creating a cohesive analytics pipeline is unified data storage. A single data storage and retrieval system enhances the speed and accuracy of your analytics. Your performance analysis stack should accommodate both fixed metrics from monitoring and dynamic metrics from observability. At its core, the underlying system architecture should be capable of handling different data types efficiently. Solutions like time series databases or data lakes can often serve these varied needs well.
However, it's crucial to consider the system's capability for data indexing, searching, and filtering, especially when dealing with large-scale, high-velocity data. Interoperability Between Specialized Tools An agile APM system relies on seamless data exchange between monitoring and observability tools. When each tool operates as a disjointed/standalone system, the chances of getting siloed data streams and operational blind spots increase. Consider building an interoperable system that allows you to aggregate data into a single, comprehensive dashboard. Opt for tools that adhere to common data formats and communication protocols. A more advanced approach of achieving this is to leverage a custom middleware to serve as a bridge between different tools. As an outcome, you can correlate monitoring KPIs with detailed logs and traces from your observability tools. Data-Driven Corrective Actions Knowing exactly what needs to be fixed allows for quicker remediation. This speed is vital in a live production environment where every minute of suboptimal performance can translate to lost revenue or user trust. When your monitoring system flags an anomaly, the logical next step is a deep dive into the underlying issue. For instance, a monitoring system alerts you about a sudden spike in error rates, but it doesn't tell you why. Integrating observability tools helps to correlate the layers. These tools can sift through log files, query databases, and analyze trace data, ultimately offering a more granular view. As a result, you're equipped to take targeted, data-driven actions. To streamline this further, consider establishing automated workflows. An alert from the monitoring system can trigger predefined queries in your observability tools, subsequently fast-tracking the identification of the root cause. Distinguishing Monitoring From Observability While the approach of monitoring and observability often intersect, their objectives, methods, and outcomes are distinct in the following ways. Metrics vs. Logs vs. Traces Monitoring primarily revolves around metrics. Metrics are predefined data points that provide quantifiable information about your system's state, indicating when predefined thresholds are breached. These are typically numerical values, such as CPU utilization, memory usage, or network latency. Observability, on the other hand, focuses typically on logs and traces. Logs capture specific events and information that are essential for deep dives when investigating issues. These contain rich sources of context and detail, allowing you to reconstruct events or understand the flow of a process. Traces additionally provide a broader perspective. They follow a request's journey through your system, tracking its path across various services and components. Traces are particularly useful in identifying bottlenecks, latency issues, and uncovering the root causes of performance problems. Reactive vs. Proactive Management Focusing on predefined thresholds through metrics, monitoring predominantly adopts a reactive management approach. When a metric breaches these predefined limits, it offers quick responses to support a broader performance analysis strategy. This reactive nature of monitoring is ideal for addressing known problems promptly but may not be well-suited for handling complex and novel issues that require a more proactive and in-depth approach. 
While monitoring excels at handling known issues with predefined thresholds, observability extends the scope to tackle complex and novel performance challenges through proactive and comprehensive analysis. This dynamic, forward-looking approach involves constantly analyzing data sources, looking for patterns and anomalies that might indicate performance issues, such as a subtle change in response times, a small increase in error rates, or any other deviation from the expected behavior. Observability then initiates a comprehensive investigation to understand the root causes and take corrective actions.

Fixed Dashboards vs. Ad Hoc Queries

Monitoring systems typically feature fixed dashboards that display a predefined set of metrics and performance indicators. Most modern monitoring tools can be configured with specific metrics and data points that are considered essential for tracking the system's well-being. The underlying metrics can be selected based on historical understanding of the system and industry best practices. Although fixed dashboards are optimized to answer known questions efficiently, they lack the flexibility to address unforeseen or complex problems and may not provide the necessary data points to investigate effectively. Conversely, observability offers a dynamic and real-time approach to querying your system's performance data. These ad hoc queries can be tailored to specific, context-sensitive issues. The technical foundation of such queries lies in their ability to analyze vast amounts of data from diverse sources, across a rich dataset that includes metrics, logs, and traces. This querying capability provides invaluable flexibility for troubleshooting new or unanticipated issues. When a previously unseen problem occurs, you can create custom queries to extract relevant data for detailed analysis.

The following comparative table emphasizes how each set of key performance indicators (KPIs) aligns with the underlying philosophy and how monitoring and observability contribute to system management:

MONITORING vs. OBSERVABILITY KPIs (Table 2)

Primary objective – Monitoring: ensure the system is functioning within set parameters; Observability: understand system behavior and identify anomalies
Nature of data – Monitoring: metrics; Observability: metrics, logs, traces
Key metrics – Monitoring: CPU usage, memory usage, network latency; Observability: error rates, latency distribution, user behavior
Data collection method – Monitoring: predefined data points; Observability: dynamic data points
Scope – Monitoring: reactive, addresses known issues; Observability: proactive, explores known and unknown issues
Visual representation – Monitoring: fixed dashboards; Observability: ad hoc queries, dynamic dashboards
Alerts – Monitoring: threshold-based; Observability: anomaly-based
Scale of measurement – Monitoring: usually single-dimension metrics; Observability: multi-dimensional metrics

Conclusion

The strength of a perpetually observable system is its proactive nature. To harness the full potential of observability, though, one must capture the right data — the kind that deciphers both predictable and unpredictable production challenges. Embrace a culture that emphasizes refining your application's instrumentation. A recommended approach is to set up a stack where any query about your app's performance gets its due response. It is also important to note that observability is an evolving process and not a one-time setup. As your application scales and changes, so should your instrumentation capabilities.
This approach ensures that queries — whether they probe routine operations or unexpected anomalies — receive the informed responses that only a finely tuned, responsive observability framework can provide.

To further strengthen your journey with observability, consider exploring these resources:

OpenTelemetry documentation
"Prioritizing Gartner's APM Model: The APM Conceptual Framework" by Larry Dragich
"Elevating System Management: The Role of Monitoring and Observability in DevOps" by Saurabh Pandey

This is an article from DZone's 2023 Observability and Application Performance Trend Report. For more: Read the Report