Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.
An In-Depth Guide to Threads in OpenAI Assistants API
Designing AI Multi-Agent Systems in Java
Guará is the Python implementation of the design pattern Page Transactions. It is more of a programming pattern than a tool. As a pattern, it can be bound to any driver other than Selenium, including the ones used for Linux, Windows, and mobile automation. The intent of this pattern is to simplify test automation. It was inspired by Page Objects, App Actions, and Screenplay. Page Transactions focus on the operations (transactions) a user can perform on an application, such as Login, Logout, or Submit Forms. This initiative was born to improve the readability, maintainability, and flexibility of automation testing code without requiring new automation tools or piles of abstractions to build human-readable statements. Another goal was to avoid binding the framework to specific automation tools, like Selenium, leaving testers free to choose their preferred ones. No new plugins or special knowledge are required to start using Guará with Helium, Dogtail, RPA Python, Playwright, or whatever tool you like or need. It is worth saying again: Guará is the Python implementation of the design pattern Page Transactions. It is more of a programming pattern than a tool. Guará leverages the Command Pattern (GoF) to group user interactions, like pressing buttons or filling in text fields, into transactions. Although I’m calling it a framework, it is not a new tool. Instead of focusing on UI elements like buttons, check boxes, or text areas, it emphasizes the user journey. The complexity is abstracted into these transactions, making the test statements feel like plain English. Testers also have the flexibility to extend assertions to custom ones that are not provided by the framework. The Framework in Action This simple implementation mimics the user switching languages on a web page: Python import home from selenium import webdriver from guara.transaction import Application from guara import it, setup def test_language_switch(): app = Application(webdriver.Chrome()) # Open the application app.at(setup.OpenApp, url="https://example.com/") # Change language and assert app.at(home.ChangeToPortuguese).asserts(it.IsEqualTo, "Conteúdo em Português") app.at(home.ChangeToEnglish).asserts(it.IsEqualTo, "Content in English") # Close the application app.at(setup.CloseApp) Each user transaction is grouped into its own class (e.g., ChangeToPortuguese) that inherits from AbstractTransaction. The tester just has to override the do method, and the framework does the work.
Python from selenium.webdriver.common.by import By from guara.transaction import AbstractTransaction class ChangeToPortuguese(AbstractTransaction): def do(self, **kwargs): self._driver.find_element(By.CSS_SELECTOR, ".btn-pt").click() return self._driver.find_element(By.CSS_SELECTOR, ".content").text The tester can check the transactions and assertions in the logs after running the tests: Shell test_demo.py::test_language_switch 2025-01-24 21:07:10 INFO Transaction: setup.OpenApp 2025-01-24 21:07:10 INFO url: https://example.com/ 2025-01-24 21:07:14 INFO Transaction: home.ChangeToPortuguese 2025-01-24 21:07:14 INFO Assertion: IsEqualTo 2025-01-24 21:07:14 INFO Actual Data: Conteúdo em Português 2025-01-24 21:07:14 INFO Expected: Conteúdo em Português 2025-01-24 21:07:14 INFO Transaction: home.ChangeToEnglish 2025-01-24 21:07:14 INFO Assertion: IsEqualTo 2025-01-24 21:07:14 INFO Actual Data: Content in English 2025-01-24 21:07:14 INFO Expected: Content in English 2025-01-24 21:07:14 INFO Transaction: setup.CloseApp The tester can also use fixtures like setup and teardown to initiate and finish the tests. Remember, it is not a new tool, so you can use pytest or unittest features without any problem. The Pattern Explained AbstractTransaction: This is the class from which all transactions inherit. The do method is implemented by each transaction. In this method, calls to WebDriver are placed. If the method returns something, like a string, the automation can use it for assertions. setup.OpenApp and setup.CloseApp are part of the framework and provide basic implementations to open and close the web application using Selenium WebDriver. IAssertion: This is the interface implemented by all assertion classes. The asserts method of each subclass contains the logic to perform validations. For example, the IsEqualTo subclass compares the result with the expected value provided by the tester. Testers can inherit from this interface to add new subclasses of validations that the framework does not natively support. The it module contains the concrete assertions. Application: This is the runner of the automation. It executes the do method of each transaction and validates the result using the asserts method. The asserts method receives a reference to an IAssertion instance. It implements the Strategy Pattern (GoF) to allow its behavior to change at runtime. Another important component of the application is the result property. It holds the result of the transaction, which can be used by asserts or inspected by the test using the native built-in assert method. Why Use Guará? Each class represents a complete user transaction, improving code reusability. Also, the code is written in plain English, making it easier for non-technical collaborators to review and contribute. Custom assertions can be created and shared by testers. Additionally, Guará can be integrated with any non-Selenium tool. Page Transactions can automate REST APIs, unit tests, desktop, and mobile tests. As a side effect of the Command Pattern, the framework can even be used in product development. Using Guará Setting up Guará is simple: 1. Install Guará with the command: Plain Text pip install guara 2. Build your transactions using the AbstractTransaction class. 3. Invoke the transactions using the application runner and its methods at and asserts. 4. Execute tests with detailed logging using pytest: Shell python -m pytest -o log_cli=1 --log-cli-level=INFO For more examples, check out the tutorial.
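Picking up the IAssertion point above, here is a rough sketch of what a custom assertion might look like. The import path and the exact asserts signature are assumptions based on the description in this article, not verified against the library, so treat it as illustrative and check the Guará documentation before using it:

Python

# Hypothetical custom assertion; the real import path of IAssertion may differ.
from guara.assertion import IAssertion

class Contains(IAssertion):
    # Passes when the expected text appears anywhere in the transaction result.
    def asserts(self, actual, expected):
        assert expected in actual, f"'{expected}' not found in '{actual}'"

Once defined, it would be used exactly like the built-in assertions, for example app.at(home.ChangeToPortuguese).asserts(Contains, "Português").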
Conclusion Guará is a new way testers can organize their code, making it simple to read, maintain, and integrate with any automation driver. It improves the collaboration between testers and non-technical members. Testers can also extend it by building and sharing new assertions. Try Guará today!
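Because Guará rides on plain pytest, the setup and teardown mentioned earlier can live in an ordinary fixture. Below is a minimal sketch that only reuses the Application, setup, it, and home pieces already shown in this article; the URL and transaction names are the same illustrative ones:

Python

import pytest
from selenium import webdriver
from guara.transaction import Application
from guara import it, setup

import home  # the module holding your transactions, as in the example above

@pytest.fixture
def app():
    # Setup: open the application once per test.
    application = Application(webdriver.Chrome())
    application.at(setup.OpenApp, url="https://example.com/")
    yield application
    # Teardown: close the application even if the test fails.
    application.at(setup.CloseApp)

def test_switch_to_portuguese(app):
    app.at(home.ChangeToPortuguese).asserts(it.IsEqualTo, "Conteúdo em Português")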
XML is one of the most widely used data formats; only JSON rivals it in popularity. Very often, this format is used as an intermediate representation of data that needs to be transferred between two information systems, and, like any intermediate representation, the XML usually ends up in a database. Usually, XPath is used to parse XML because it provides a set of functions that allow you to extract data from an XML tree. However, not all XML files are well formed, which creates great difficulties when using XPath. Typical Problems When Working With XPath Differences in node naming. You may have an array of documents with a similar logical structure, but they may have differences in the way node names are spelled. Missing nodes. If bad XML generators are used on the server side, they may skip some nesting levels for some of the data in the resulting XML. Object or array? XPath does not allow you to explicitly specify whether the contents of a particular node are an object or an array of objects. Inability to extend syntax. XPath is just a node traversal tool with no syntax extension capability. In this article, I will discuss a tool called SmartXML that solves these problems and allows you to upload complex XML documents to a database. Project Structure SmartXML uses an intermediate representation when processing data — SmartDOM. Unlike a traditional DOM, this structure controls the element hierarchy level and can fill in missing nodes. SmartDOM consists of the declarative description itself and sets of rules for its transformation. Three Examples of Documents With a Divergent Structure Example 1 The document has a relatively correct structure. All sections have correct nesting. Plain Text <doc> <commonInfo> <supplyNumber>100480</supplyNumber> <supplyDate>2025-01-20</supplyDate> </commonInfo> <lots> <lot> <objects> <object> <name>apples</name> <price>3.25</price> <currency>USD</currency> </object> <object> <name>oranges</name> <price>3.50</price> <currency>USD</currency> </object> </objects> </lot> <lot> <objects> <object> <name>bananas</name> <price>2.50</price> <currency>EUR</currency> </object> <object> <name>strawberries</name> <price>5.00</price> <currency>USD</currency> </object> <object> <name>grapes</name> <price>3.75</price> <currency>USD</currency> </object> </objects> </lot> </lots> </doc> Example 2 The nesting of sections is broken. The object node does not have an objects parent. Plain Text <doc> <commonInfo> <supplyNumber>100593</supplyNumber> <date>2025-01-21</date> </commonInfo> <lots> <lot> <object> <name>raspberry</name> <price>7.50</price> <currency>USD</currency> </object> </lot> </lots> </doc> Example 3 The nesting is correct, but the node names do not match the other sections. Plain Text <doc> <commonInfo> <supplyNumber>100601</supplyNumber> <date>2025-01-22</date> </commonInfo> <lots> <lot> <objects> <obj> <name>cherries</name> <price>3.20</price> <currency>EUR</currency> </obj> <obj> <name>blueberries</name> <price>4.50</price> <currency>USD</currency> </obj> <obj> <name>peaches</name> <price>2.80</price> <currency>USD</currency> </obj> </objects> </lot> </lots> </doc> As you can see, all three of these documents contain the same data but have different storage structures.
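To make the XPath pain points concrete before looking at SmartXML's approach, here is a rough Python sketch (using lxml; the file names are hypothetical) of what a hand-written parser for just these three documents would need — every naming variant and missing nesting level gets its own branch in the expression:

Python

from lxml import etree

def extract_items(path):
    tree = etree.parse(path)
    # One union branch per structural variant: <object> under <objects>,
    # a bare <object> directly under <lot>, and the <obj> spelling.
    nodes = tree.xpath(
        "//lots/lot/objects/object | //lots/lot/object | //lots/lot/objects/obj"
    )
    return [
        {
            "name": node.findtext("name"),
            "price": node.findtext("price"),
            "currency": node.findtext("currency"),
        }
        for node in nodes
    ]

for path in ["example1.xml", "example2.xml", "example3.xml"]:  # hypothetical file names
    print(extract_items(path))

Every new supplier format means another branch like this, which is exactly the maintenance burden the declarative rules below are meant to remove.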
Intermediate Data View The full structure of the SmartDOM view from data-templates.red: Plain Text #[ sample: #[ ; section name supply_sample: #[ ; subsection name supply_number: none supply_date: none delivery_items: [ item: [ name: none price: none currency: none ] ] ] ] ] Project Setup Create a project and set up a mapping between SmartDOM and XML tree nodes for each XML file. Now, we need to specify how XML nodes are mapped to SmartDOM. This can be done either in the interface on the Rules tab or in the configuration file grow-rules.red, making it look as follows: Plain Text sample: [ item: ["object" "obj"] ] For correct linking of tables, we also need to specify the name of the tag from the root element, which should be passed to the descendant nodes. Without this, it will be impossible to link two tables. Since we have a unique supply_number, it can be used as a similar key. To do this, let's add it to the injection-rules.red rule: Plain Text sample: [ inject-tag-to-every-children: [supply_number] enumerate-nodes: [] injection-tag-and-recipients: [] ] Now, it remains to create the necessary tables in the database and insert there the results of processing XML files: Plain Text PRAGMA foreign_keys = ON; CREATE TABLE supply_sample ( id INTEGER PRIMARY KEY AUTOINCREMENT, supply_number TEXT NOT NULL UNIQUE, supply_date TEXT NOT NULL ); CREATE TABLE delivery_items ( id INTEGER PRIMARY KEY AUTOINCREMENT, supply_number TEXT NOT NULL, name TEXT NOT NULL, price REAL NOT NULL, currency TEXT NOT NULL, FOREIGN KEY (supply_number) REFERENCES supply_sample(supply_number) ); Result The result of converting three XML files to SQL: Plain Text INSERT INTO supply_sample ("supply_number", "supply_date") VALUES ('100480', '2025-01-20'); INSERT INTO delivery_items ("supply_number", "name", "price", "currency") VALUES ('100480', 'apples', '3.25', 'USD'), ('100480', 'oranges', '3.50', 'USD'), ('100480', 'bananas', '2.50', 'EUR'), ('100480', 'strawberries', '5.00', 'USD'), ('100480', 'grapes', '3.75', 'USD'); -- INSERT INTO supply_sample ("supply_number", "supply_date") VALUES ('100593', '2025-01-21'); INSERT INTO delivery_items ("supply_number", "name", "price", "currency") VALUES ('100593', 'raspberry', '7.50', 'USD'); -- INSERT INTO supply_sample ("supply_number", "supply_date") VALUES ('100601', '2025-01-22'); INSERT INTO delivery_items ("supply_number", "name", "price", "currency") VALUES ('100601', 'cherries', '3.20', 'EUR'), ('100601', 'blueberries', '4.50', 'USD'), ('100601', 'peaches', '2.80', 'USD'); This is what the result looks like in tabular form: Conclusion So, we have demonstrated how you can parse and upload to a database quite complex XML files without writing program code. This solution can be useful for system analysts, as well as other people who often work with XML. In the case of parsing using popular programming languages such as Python, we would have to process each separate file with a separate script, which would require more code and time. You can learn more about the SmartXML project structure in the official documentation.
Visualizing complex digraphs often requires balancing clarity with interactivity. Graphviz is a great tool for generating static graphs with optimal layouts, ensuring nodes and edges don't overlap. On the flip side, Cytoscape.js offers interactive graph visualizations but doesn't inherently prevent overlapping elements, which can clutter the display. This article describes a method to convert Graphviz digraphs into interactive Cytoscape.js graphs. This approach combines Graphviz's layout algorithms with Cytoscape.js's interactive capabilities, resulting in clear and navigable visualizations. By extracting Graphviz's calculated coordinates and bounding boxes and mapping them into Cytoscape.js's format, we can recreate the same precise layouts in an interactive environment. This technique leverages concepts from computational geometry and graph theory. Why This Matters Interactive graphs allow users to engage with data more effectively, exploring relationships and patterns that static images can't convey. By converting Graphviz layouts to Cytoscape.js, we retain the benefits of Graphviz's non-overlapping, well-organized structures while enabling dynamic interaction. This enhances presentation, making complex graphs easier to work with. Technical Steps Here's an overview of the process to convert a Graphviz digraph into a Cytoscape.js graph: 1. Convert Graphviz Output to DOT Format Graphviz can output graphs in DOT format, which contains detailed information about nodes, edges, and their positions. Python import pygraphviz def convert_gviz_image(gviz): graph_dot = pygraphviz.AGraph(str(gviz)) image_str = graph_dot.to_string() return image_str 2. Parse the DOT File and Extract Elements Using libraries like networkx and json_graph, we parse the DOT file to extract nodes and edges along with their attributes. Python import networkx from networkx.readwrite import json_graph def parse_dot_file(dot_string): graph_dot = pygraphviz.AGraph(dot_string) graph_netx = networkx.nx_agraph.from_agraph(graph_dot) graph_json = json_graph.node_link_data(graph_netx) return graph_json 3. Transform Coordinates for Cytoscape.js Graphviz and Cytoscape.js use different coordinate systems. We need to adjust the node positions accordingly, typically inverting the Y-axis to match Cytoscape.js's system. Python def transform_coordinates(node): (x, y) = map(float, node['pos'].split(',')) node['position'] = {'x': x, 'y': -y} return node 4. Calculate Edge Control Points For edges, especially those with curves, we calculate control points to replicate Graphviz's edge paths in Cytoscape.js. This involves computing the distance and weight of each control point relative to the source and target nodes. 
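Before getting into the edge control points, a quick usage sketch of the step 3 coordinate transform above; the node record and its pos value are made up for illustration:

Python

# Hypothetical node record as produced by the DOT parsing step.
node = {'id': 'A', 'pos': '100.5,200.25'}

node = transform_coordinates(node)
print(node['position'])  # {'x': 100.5, 'y': -200.25}

The Y-axis flip matters because Graphviz places the origin at the bottom-left with Y increasing upward, while Cytoscape.js treats Y as increasing downward; reusing the raw coordinates would mirror the layout vertically.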
Edge Control Points Calculation Python def get_control_points(node_pos, edges): for edge in edges: if 'data' in edge: src = node_pos[edge['data']['source']] tgt = node_pos[edge['data']['target']] if src != tgt: cp = edge['data'].pop('controlPoints') control_points = cp.split(' ') d = '' w = '' for i in range(1, len(control_points) - 1): cPx = float(control_points[i].split(",")[0]) cPy = float(control_points[i].split(",")[1]) * -1 result_distance, result_weight = \ get_dist_weight(src['x'], src['y'], tgt['x'], tgt['y'], cPx, cPy) d += ' ' + result_distance w += ' ' + result_weight d, w = reduce_control_points(d[1:], w[1:]) edge['data']['point-distances'] = d edge['data']['point-weights'] = w return edges Python def convert_control_points(d, w): remove_list = [] d_tmp = d w_tmp = w for i in range(len(d)): d_tmp[i] = float(d_tmp[i]) w_tmp[i] = float(w_tmp[i]) if w_tmp[i] > 1 or w_tmp[i] < 0: remove_list.append(w_tmp[i]) d_tmp = [x for x, y in zip(d_tmp, w_tmp) if y not in remove_list] w_tmp = [x for x in w_tmp if x not in remove_list] d_check = [int(x) for x in d_tmp] if len(set(d_check)) == 1 and d_check[0] == 0: d_tmp = [0.0, 0.0, 0.0] w_tmp = [0.1, 0.5, 0.9] return d_tmp, w_tmp In the get_control_points function, we iterate over each edge, and if it connects different nodes, we process its control points: Extract control points: Split the control points string into a list.Calculate distances and weights: For each control point (excluding the first and last), calculate the distance (d) and weight (w) using the get_dist_weight function.Accumulate results: Append the calculated distances and weights to strings d and w.Simplify control points: Call reduce_control_points to simplify the control points for better performance and visualization.Update edge data: The calculated point-distances and point-weights are assigned back to the edge's data. The convert_control_points function ensures that control point weights are within the valid range (0 to 1). It filters out any weights that are outside this range and adjusts the distances accordingly. Distance and Weight Calculation Function The get_dist_weight function calculates the perpendicular distance from a control point to the straight line between the source and target nodes (d) and the relative position of the control point along that line (w): Python import math def get_dist_weight(sX, sY, tX, tY, PointX, PointY): if sX == tX: slope = float('inf') else: slope = (sY - tY) / (sX - tX) denom = math.sqrt(1 + slope**2) if slope != float('inf') else 1 d = (PointY - sY + (sX - PointX) * slope) / denom w = math.sqrt((PointY - sY)**2 + (PointX - sX)**2 - d**2) dist_AB = math.hypot(tX - sX, tY - sY) w = w / dist_AB if dist_AB != 0 else 0 delta1 = 1 if (tX - sX) * (PointY - sY) - (tY - sY) * (PointX - sX) >= 0 else -1 delta2 = 1 if (tX - sX) * (PointX - sX) + (tY - sY) * (PointY - sY) >= 0 else -1 d = abs(d) * delta1 w = w * delta2 return str(d), str(w) This function handles both vertical and horizontal lines and uses basic geometric principles to compute the distances and weights. 
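A tiny worked example helps check the geometry; the coordinates are invented purely for illustration:

Python

# Straight edge from (0, 0) to (10, 0) with a control point 3 units
# above its midpoint (after the Y-inversion has been applied).
d, w = get_dist_weight(0, 0, 10, 0, 5, 3)
print(d, w)  # 3.0 0.5 — returned as strings

The result reads as 3 units perpendicular to the source-target line and halfway along it, which is exactly the distance/weight pair Cytoscape.js expects for curved edges.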
Simplifying Control Points The reduce_control_points function reduces the number of control points to simplify the edge rendering in Cytoscape.js: Python def reduce_control_points(d, w): d_tmp = d.split(' ') w_tmp = w.split(' ') idx_list = [] d_tmp, w_tmp = convert_control_points(d_tmp, w_tmp) control_point_length = len(d_tmp) if control_point_length > 5: max_value = max(map(float, d_tmp), key=abs) max_idx = d_tmp.index(str(max_value)) temp_idx = max_idx // 2 idx_list = [temp_idx, max_idx, control_point_length - 1] elif control_point_length > 3: idx_list = [1, control_point_length - 2] else: return ' '.join(d_tmp), ' '.join(w_tmp) d_reduced = ' '.join(d_tmp[i] for i in sorted(set(idx_list))) w_reduced = ' '.join(w_tmp[i] for i in sorted(set(idx_list))) return d_reduced, w_reduced This function intelligently selects key control points to maintain the essential shape of the edge while reducing complexity. 5. Build Cytoscape.js Elements With nodes and edges prepared, construct the elements for Cytoscape.js, including the calculated control points. Python def build_cytoscape_elements(graph_json): elements = {'nodes': [], 'edges': []} for node in graph_json['nodes']: node = transform_coordinates(node) elements['nodes'].append({'data': node}) node_positions = {node['data']['id']: node['position'] for node in elements['nodes']} edges = graph_json['links'] edges = get_control_points(node_positions, edges) elements['edges'] = [{'data': edge['data']} for edge in edges] return elements 6. Apply Styling We can style nodes and edges based on attributes like frequency or performance metrics, adjusting colors, sizes, and labels for better visualization. Cytoscape.js offers extensive customization, allowing you to tailor the graph's appearance to highlight important aspects of your data. Conclusion This solution combines concepts from: Graph theory: Understanding graph structures, nodes, edges, and their relationships helps in accurately mapping elements between Graphviz and Cytoscape.js.Computational geometry: Calculating positions, distances, and transformations. Python programming: Utilizing libraries such as pygraphviz, networkx, and json_graph facilitates graph manipulation and data handling. By converting Graphviz digraphs to Cytoscape.js graphs, we achieve interactive visualizations that maintain the clarity of Graphviz's layouts. This approach can be extended to accommodate various types of graphs and data attributes. It's particularly useful in fields like bioinformatics, social network analysis, and any domain where understanding complex relationships is essential.
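As a closing illustration of step 6, here is a hedged sketch of how the computed values might be consumed on the browser side; the container ID, labels, and arrow shape are arbitrary choices, while curve-style: unbundled-bezier together with control-point-distances and control-point-weights are the standard Cytoscape.js edge properties these calculations target:

JavaScript

const cy = cytoscape({
  container: document.getElementById('cy'),   // arbitrary container element
  elements: elements,                         // output of build_cytoscape_elements
  layout: { name: 'preset' },                 // keep the Graphviz-computed positions
  style: [
    { selector: 'node', style: { label: 'data(id)' } },
    {
      selector: 'edge',
      style: {
        'curve-style': 'unbundled-bezier',
        'control-point-distances': 'data(point-distances)',
        'control-point-weights': 'data(point-weights)',
        'target-arrow-shape': 'triangle'
      }
    }
  ]
});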
Libraries can rise to stardom in months, only to crash and fade into obscurity just as quickly. We’ve all seen this happen in the software development world, and my own journey has been filled with “must-have” JavaScript libraries, each claiming to be more revolutionary than the one before. But over the years, I’ve come to realize that the tools we need have been with us all along, and in this article, I’ll explain why it’s worth sticking to the fundamentals, how new libraries can become liabilities, and why stable, proven solutions usually serve us best in the long run. The Allure of New Libraries I’ll be the first to admit that I’ve been seduced by shiny new libraries before. Back in 2018, I led a team overhaul of our front-end architecture. We added a number of trendy state management tools and UI component frameworks, certain that they would streamline our workflow. Our package.json ballooned with dependencies, each seemingly indispensable. At first, it felt like we were riding a wave of innovation. Then, about six months in, a pattern emerged. A few libraries became outdated; some were abandoned by their maintainers. Every time we audited our dependencies, it seemed we were juggling security patches and version conflicts far more often than we shipped new features. The headache of maintenance made one thing crystal clear: every new dependency is a promise you make to maintain and update someone else’s code. The True Cost of Dependencies When we adopt a new library, we’re not just adding functionality; we’re also taking on significant risks. Here are just some of the hidden costs that frequently go overlooked: Maintenance Overhead New libraries don’t just drop into your project and remain stable forever. They require patching for security vulnerabilities, updating for compatibility with other tools, and diligence when major releases introduce breaking changes. If you’re not on top of these updates, you risk shipping insecure or buggy code to production. Version Conflicts Even robust tools like npm and yarn can’t guarantee complete harmony among your dependencies. One library might require a specific version of a package that conflicts with another library’s requirements. Resolving these inconsistencies can be a maddening, time-consuming process. Performance Implications Front-end libraries can inflate your bundle size considerably. A single specialized library may add tens or hundreds of kilobytes to your final JavaScript payload, which means slower load times and worse user experiences. Security Vulnerabilities In a recent audit for a client, 60% of the app’s vulnerabilities came from third-party packages, often many layers deep in the dependency tree. Sometimes, to patch one library, multiple interdependent packages need to be updated, which is rarely an easy process. A colleague and I once needed a date picker for a project. The hip thing to do would have been to install some feature-rich library and quickly drop it in. Instead, we rolled our own lightweight date picker in vanilla JavaScript, using the native Date object. It was a fraction of the size, had zero external dependencies, and was completely ours to modify. That tiny decision spared us from possible library update headaches, conflicts, or abandonment issues months later. The Power of Vanilla JavaScript Modern JavaScript is almost unrecognizable from what it was ten years ago.
Many features that previously required libraries like Lodash or Moment are now part of the language — or can be replicated with a few lines of code. For example: JavaScript // Instead of installing Lodash to remove duplicates: const uniqueItems = [...new Set(items)]; // Instead of using a library for deep cloning: const clonedObject = structuredClone(complexObject); A deep familiarity with the standard library can frequently replace entire suites of utility functions. These days, JavaScript’s built-in methods handle most common tasks elegantly, making large chunks of external code unnecessary. When to Use External Libraries None of this is to say you should never install a third-party package. The key lies in discernment — knowing when a problem is big enough or specialized enough to benefit from a well-tested, well-maintained library. For instance: Critical complexity: Frameworks like React have proven their mettle for managing complex UI states in large-scale applications.Time-to-market: Sometimes, a short-term deliverable calls for a robust, out-of-the-box solution, and it makes sense to bring in a trusted library rather than build everything from scratch.Community and maintenance: Popular libraries with long track records and active contributor communities — like D3.js for data visualization — can be safer bets, especially if they’re solving well-understood problems. The key is to evaluate the cost-benefit ratio: Can this be done with native APIs or a small custom script?Do I trust this library’s maintainer track record?Is it solving a core problem or offering only minor convenience?Will my team actually use enough of its features to justify the extra weight? Strategies for Avoiding Unnecessary Dependencies To keep your projects lean and maintainable, here are a few best practices: 1. Evaluate Built-In Methods First You’d be surprised how many tasks modern JavaScript can handle without third-party code. Spend time exploring the newer ES features, such as array methods, Map/Set, async/await, and the Intl API for localization. 2. Document Your Choices If you do bring in a new library, record your reasoning in a few sentences. State the problem it solves, the alternatives you considered, and any trade-offs. Future maintainers (including your future self) will appreciate the context if questions arise later. 3. Regular Dependency Audits Re-scan your package.json every quarter or so. Is this library still maintained? Are you really using their features? Do a small cleanup of the project for removing dead weights that would reduce the potential for security flaws. 4. Aggressive Dependency vs. DevDependency Separation Throw build tooling, testing frameworks, other non-production packages into your devDependencies. Keep your production dependency listing lean in terms of just the things that you really need to function at runtime. The Case for Core Libraries A team I recently worked with had some advanced charting and visualization requirements. Although a newer charting library promised flashy animations and out-of-the-box UI components, we decided to use D3.js, a stalwart in the data visualization space. The maturity of the library, thorough documentation, and huge community made it a stable foundation for our custom charts. By building directly on top of D3’s fundamentals, we had full control over our final visualizations, avoiding the limitations of less established abstractions. 
That mindset — embracing a core, proven library rather than chasing every new offering — paid off in performance, maintainability, and peace of mind. Instead of spending time adapting our data to a proprietary system or debugging half-baked features, we could focus on real product needs, confident that D3 would remain stable and well-supported. Performance Gains Libraries aren’t just maintenance overhead; they affect your app’s performance too. In one recent project, we reduced the initial bundle size by 60% simply by removing niche libraries and replacing them with native code. The numbers told the story. Load time dropped from 3.2s to 1.4s. Time to interactive improved by nearly half. Memory usage fell by roughly 30%. These results didn’t come from advanced optimizations but from the simpler act of removing unnecessary dependencies. In an age of ever-growing user expectations, the performance benefits alone can justify a more minimal approach. Building for the Long Term Software is never static. Today’s must-have library may turn out to be tomorrow’s orphaned repository. Reliable, stable code tends to come from developers who favor well-understood, minimal solutions over ones that rely too heavily on external, fast-moving packages. Take authentication, for example: with the hundreds of packages that exist to handle user login flows, rolling a simple system with few dependencies may result in something easier to audit, more transparent, and less subject to churn from external libraries. The code might be a bit more verbose, but it’s also explicit, predictable, and directly under your control. Teaching and Team Growth One of the underrated benefits of using fewer libraries is how it fosters stronger problem-solving skills within your team. Having to implement features themselves forces developers to build a deep understanding of core concepts, which pays dividends when debugging, performance tuning, or even evaluating new technologies in the future. Relying too much on someone else’s abstractions can stunt that growth and turn capable coders into “framework operators.” Conclusion The next time you think about installing yet another trending package, reflect on whether it solves a pressing need or merely adds novelty. As experience has drummed into my head, each new dependency is a long-term commitment. Leaning on built-in capabilities, well-tried libraries, and a deep understanding of the fundamentals is how you end up with lightweight solutions that are secure and easier to maintain. Ultimately, “boring” but reliable libraries — and sometimes just vanilla JavaScript — tend to stand the test of time better than flashy newcomers. Balancing innovation with pragmatism is the hallmark of a seasoned developer. In an era of endless frameworks and packages, recognizing when you can simply reach for the tools you already have may be the most valuable skill of all.
Amazon is a well-known e-commerce platform with a large amount of data available in various formats on the web. This data can be invaluable for gaining business insights, particularly by analyzing product reviews to understand the quality of products provided by different vendors. In this guide, we will look into web scraping steps to extract Amazon reviews of a particular product and save them in Excel or CSV format. Since manually copying information online can be tedious, we’ll focus on scraping reviews from Amazon. This hands-on experience will enhance our practical understanding of web scraping techniques. Prerequisite Before we start, make sure you have Python installed on your system. You can do that from this link. The process is very simple — just install it like you would install any other application. Now that everything is set, let’s proceed. How to Scrape Amazon Reviews Using Python Install Anaconda through this link. Be sure to follow the default settings during installation. For more guidance, you can watch this video: We can use various IDEs, but to keep it beginner-friendly, let’s start with Jupyter Notebook in Anaconda. You can watch the video linked above to understand and get familiar with the software. Steps for Web Scraping Amazon Reviews Create a new notebook and save it. Step 1: Import Necessary Modules Let’s start importing all the modules needed using the following code: Python import requests from bs4 import BeautifulSoup import pandas as pd Step 2: Define Headers To avoid getting your IP blocked, define custom headers. Note that you can replace the User-agent value with your own user agent, which you can find by searching "my user agent" on Google. Python custom_headers = { "Accept-language": "en-GB,en;q=0.9", "User-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15", } Step 3: Fetch Webpage Create a Python function to fetch the webpage, check for errors, and return a BeautifulSoup object for further processing. Python # Function to fetch the webpage and return a BeautifulSoup object def fetch_webpage(url): response = requests.get(url, headers=custom_headers) if response.status_code != 200: print("Error in fetching webpage") exit(-1) page_soup = BeautifulSoup(response.text, "lxml") return page_soup Step 4: Extract Reviews Inspect the page to find the element and attribute from which we want to extract data. Let's create another function, extract_reviews, that selects the review div elements. It identifies review-related elements on the webpage but doesn’t yet extract the actual review content. You would need to add code to extract the relevant information from these elements (e.g., review text, ratings, etc.). Python # Function to extract reviews from the webpage def extract_reviews(page_soup): review_blocks = page_soup.select('div[data-hook="review"]') reviews_list = [] Step 5: Process Review Data The code below processes each review element, extracts the customer’s name (if available), and stores it in the customer variable. If no customer information is found, customer remains None.
Python for review in review_blocks: author_element = review.select_one('span.a-profile-name') customer = author_element.text if author_element else None rating_element = review.select_one('i.review-rating') customer_rating = rating_element.text.replace("out of 5 stars", "") if rating_element else None title_element = review.select_one('a[data-hook="review-title"]') review_title = title_element.text.split('stars\n', 1)[-1].strip() if title_element else None content_element = review.select_one('span[data-hook="review-body"]') review_content = content_element.text.strip() if content_element else None date_element = review.select_one('span[data-hook="review-date"]') review_date = date_element.text.replace("Reviewed in the United States on ", "").strip() if date_element else None image_element = review.select_one('img.review-image-tile') image_url = image_element.attrs["src"] if image_element else None Step 6: Process Scraped Reviews Still inside the loop, we gather the extracted values (customer, customer_rating, review_title, review_content, review_date, and image_url) into a dictionary, append it to reviews_list, and return the full list once every review block has been processed. Python review_data = { "customer": customer, "customer_rating": customer_rating, "review_title": review_title, "review_content": review_content, "review_date": review_date, "image_url": image_url } reviews_list.append(review_data) return reviews_list Step 7: Initialize Review URL Now, let's initialize a review_page_url variable with an Amazon product review page URL. Python def main(): review_page_url = "https://www.amazon.com/BERIBES-Cancelling-Transparent-Soft-Earpads-Charging-Black/product-reviews/B0CDC4X65Q/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews" page_soup = fetch_webpage(review_page_url) scraped_reviews = extract_reviews(page_soup) Step 8: Verify Scraped Data Now, let’s print the scraped review data (stored in the scraped_reviews variable) to the console for verification purposes. Python # Print the scraped data to verify print("Scraped Data:", scraped_reviews) Step 9: Create a DataFrame Next, create a DataFrame from the scraped reviews, which will help organize the data into tabular form. Python # create a DataFrame and export it to a CSV file reviews_df = pd.DataFrame(data=scraped_reviews) Step 10: Export DataFrame to CSV Now, export the DataFrame to a CSV file in the current working directory. Python reviews_df.to_csv("reviews.csv", index=False) print("CSV file has been created.") Step 11: Ensure Standalone Execution The code construct below acts as a protective measure. It ensures that certain code runs only when the script is directly executed as a standalone program rather than being imported as a module by another script. Python # Ensuring the script runs only when executed directly if __name__ == '__main__': main() Result Why Scrape Amazon Product Reviews? Scraping Amazon product reviews can provide valuable insights for businesses. Here’s why they do it: Feedback Collection Every business needs feedback to understand customer requirements and implement changes to improve product quality. Scraping reviews allows businesses to gather large volumes of customer feedback quickly and efficiently. Sentiment Analysis Analyzing the sentiments expressed in reviews can help identify positive and negative aspects of products, leading to informed business decisions.
Competitor Analysis Scraping allows businesses to monitor competitors’ pricing and product features, helping them stay competitive in the market. Business Expansion Opportunities By understanding customer needs and preferences, businesses can identify opportunities for expanding their product lines or entering new markets. Manually copying and pasting content is time-consuming and error-prone. This is where web scraping comes in. Using Python to scrape Amazon reviews can automate the process, reduce manual errors, and provide accurate data. Benefits of Scraping Amazon Reviews Efficiency: Automate data extraction to save time and resources.Accuracy: Reduce human errors with automated scripts.Large data volume: Collect extensive data for comprehensive analysis.Informed decision-making: Use customer feedback to make data-driven business decisions. Conclusion Now that we’ve covered how to scrape Amazon reviews using Python, you can apply the same techniques to other websites by inspecting their elements. Here are some key points to remember: Understanding HTML Familiarize yourself with the HTML structure. Knowing how elements are nested and how to navigate the Document Object Model (DOM) is crucial for finding the data you want to scrape. CSS Selectors Learn how to use CSS selectors to accurately target and extract specific elements from a webpage. Python Basics Understand Python programming, especially how to use libraries like requests for making HTTP requests and BeautifulSoup for parsing HTML content. Inspecting Elements Practice using browser developer tools (right-click on a webpage and select “Inspect” or press Ctrl+Shift+I) to examine the HTML structure. This helps you find the tags and attributes that hold the data you want to scrape. Error Handling Add error handling to your code to deal with possible issues, like network errors or changes in the webpage structure. Legal and Ethical Considerations Always check a website’s robots.txt file and terms of service to ensure compliance with legal and ethical rules of web scraping. By mastering these areas, you’ll be able to confidently scrape data from various websites, allowing you to gather valuable insights and perform detailed analyses.
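To make the error-handling advice above concrete, here is a hedged variant of the fetch_webpage function from earlier; the retry count, timeout, and backoff are arbitrary illustrative choices rather than values from the original walkthrough:

Python

import time
import requests
from bs4 import BeautifulSoup

def fetch_webpage_safely(url, headers, retries=3, timeout=10):
    # Retry transient network failures instead of exiting on the first error.
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()
            return BeautifulSoup(response.text, "lxml")
        except requests.RequestException as exc:
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(2 * attempt)  # simple backoff before retrying
    raise RuntimeError(f"Could not fetch {url} after {retries} attempts")

Swapping it in only requires passing custom_headers explicitly when calling it from main.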
As the world becomes increasingly conscious of energy consumption and its environmental impact, software development is joining the movement to go green. Surprisingly, even the choice of runtime environments and how code is executed can affect energy consumption. This brings us to the world of Java Virtual Machines (JVMs), an integral part of running Java applications, and the rising star in the JVM world, GraalVM. In this article, we will explore how code performance and energy efficiency intersect in the JVM ecosystem and why GraalVM stands out in this domain. Understanding Energy and Performance in the JVM Context To grasp why energy efficiency matters in JVMs, we first need to understand what JVMs do. A JVM is the engine that powers Java applications, converting platform-independent Java bytecode into machine-specific instructions. While this flexibility is a major strength, it also means JVMs carry some overhead, especially compared to compiled languages like C++. Now, energy consumption in software isn't just about the hardware running hotter or consuming more electricity. It's tied to the performance of the software itself. When code is slow or inefficient, it takes longer to execute, which directly correlates with more CPU cycles, increased power draw, and greater energy usage. This connection between performance and energy efficiency is at the heart of what makes optimizing JVMs so critical. Studies like those by Leeds Beckett University (2023) and HAL Open Science (2021) reveal how JVM optimizations and configurations significantly impact energy use. As newer JVMs improve performance through better garbage collection, just-in-time (JIT) compilation, and other optimizations, they reduce not just runtime but also energy costs. Yet, even within these advancements, there’s a standout contender reshaping how we think about energy-efficient Java: GraalVM. What Makes GraalVM Different? GraalVM is a high-performance runtime designed to improve the efficiency of applications written in Java and other JVM-compatible languages. Unlike traditional JVM implementations, GraalVM incorporates advanced optimization techniques that make it unique in both speed and energy usage. Its native image capability allows applications to be compiled ahead-of-time (AOT) into standalone executables. Traditional JVMs rely heavily on JIT compilation, which compiles bytecode into machine code at runtime. While this approach allows for adaptive optimizations (learning and optimizing as the program runs), it introduces a delay in startup time and consumes energy during execution. GraalVM’s AOT compilation eliminates this runtime overhead by pre-compiling the code, significantly reducing the startup time and resource consumption. Furthermore, GraalVM supports polyglot programming, which enables developers to mix languages like JavaScript, Python, and Ruby in a single application. This reduces the need for multiple runtime environments, simplifying deployment and cutting down on the energy costs associated with maintaining diverse infrastructures. Energy Efficiency in Numbers The question many might ask is: does GraalVM truly make a difference in energy terms? The combined studies offer some clarity. For example, Leeds Beckett University (2023) and HAL Open Science (2021) benchmarked GraalVM against traditional JVMs like OpenJDK, Amazon Corretto, and Azul Zulu, using diverse workloads. 
Both studies showed that GraalVM, particularly in its native image configuration, consumed less energy and completed tasks faster across the majority of scenarios. Interestingly, the energy consumption gains are not linear across all benchmarks. While GraalVM excelled in data-parallel tasks like Alternating Least Squares (ALS), it underperformed in certain highly parallel tasks like Avrora. This suggests that the workload type significantly influences the runtime's energy efficiency. Moreover, the researchers observed that while newer JVMs like HotSpot 15 generally offered better energy performance than older versions like HotSpot 8, GraalVM consistently stood out. Even when compared to JVMs optimized for long-running tasks, GraalVM delivered lower energy costs due to its AOT compilation, which minimized runtime overhead. Insights from JVM Configuration Studies Beyond runtime optimizations, how you configure a JVM can have profound effects on energy consumption. Both studies emphasized the role of garbage collection (GC) and JIT compiler settings. For instance, HAL Open Science found that default GC settings were energy-efficient in only half of the experiments. Alternative GC strategies, such as ParallelGC and SerialGC, sometimes outperformed default configurations like G1GC. Similarly, tweaking JIT compilation levels could improve energy efficiency, but such adjustments often required detailed performance evaluations. One of the most striking observations was the variability in energy savings based on application characteristics. For data-heavy tasks like H2 database simulations, energy savings were most pronounced when using GraalVM’s default configurations. However, for highly concurrent applications like Reactors, specific configurations of JIT threads delivered significant improvements. Carbon Footprint Reduction The environmental implications of these energy savings are immense. Using standardized energy-to-carbon conversion factors, the studies highlighted that GraalVM reduced carbon dioxide emissions more effectively than traditional JVMs. These reductions were particularly significant in cloud environments, where optimizing runtime efficiency lowered operational costs and reduced the carbon footprint of large-scale deployments. Broader Implications for Software Development The findings from Leeds Beckett University (2023) and HAL Open Science (2021) are clear: energy efficiency is no longer just about hardware; it’s about making smarter software choices. By adopting greener JVMs like GraalVM, developers can contribute directly to sustainability goals without compromising on performance. However, the road to greener software isn’t just about choosing a runtime. It involves understanding the nuances of workload types, runtime configurations, and application behaviors. Tools like J-Referral, introduced in HAL Open Science’s study, can help developers select the most energy-efficient JVM configurations for their specific needs, simplifying the path to sustainable computing. Conclusion The correlation between code performance and energy efficiency is clear: faster, optimized software consumes less energy. JVMs have long been at the heart of this discussion, and while traditional JVMs continue to evolve, GraalVM offers a leap forward. By combining high performance, energy efficiency, and versatility, it stands out as a powerful tool for modern developers looking to build applications that are not only fast but also environmentally conscious. 
With studies confirming its efficiency across a broad range of scenarios, GraalVM represents a shift in how we think about software sustainability. The journey to greener software begins with choices like these — choices that balance performance, cost, and environmental responsibility. References Vergilio, TG and Do Ha, L and Kor, A-LG (2023) Comparative Performance and Energy Efficiency Analysis of JVM Variants and GraalVM in Java Applications.Zakaria Ournani, Mohammed Chakib Belgaid, Romain Rouvoy, Pierre Rust, Joel Penhoat. Evaluating the Impact of Java Virtual Machines on Energy Consumption.
Let's discuss an important question: how do we monitor our services if something goes wrong? On the one hand, we have Prometheus with alerts and Kibana for dashboards and other helpful features. We also know how to gather logs — the ELK stack is our go-to solution. However, simple logging isn’t always enough: it doesn’t provide a holistic view of a request’s journey across the entire ecosystem of components. You can find more info about ELK here. But what if we want to visualize requests? What if we need to correlate requests traveling between systems? This applies to both microservices and monoliths — it doesn’t matter how many services we have; what matters is how we manage their latency. Indeed, each user request might pass through a whole chain of independent services, databases, message queues, and external APIs. In such a complex environment, it becomes extremely difficult to pinpoint exactly where delays occur, identify which part of the chain acts as a performance bottleneck, and quickly find the root cause of failures when they happen. To address these challenges effectively, we need a centralized, consistent system to collect telemetry data — traces, metrics, and logs. This is where OpenTelemetry and Jaeger come to the rescue. Let's Look at the Basics There are two main terms we have to understand: Trace ID A Trace ID is a 16-byte identifier, often represented as a 32-character hexadecimal string. It’s automatically generated at the start of a trace and stays the same across all spans created by a particular request. This makes it easy to see how a request travels through different services or components in a system. Span ID Every individual operation within a trace gets its own Span ID, which is typically a randomly generated 64-bit value. Spans share the same Trace ID, but each one has a unique Span ID, so you can pinpoint exactly which part of the workflow each span represents (like a database query or a call to another microservice). How Are They Related? Trace ID and Span ID complement each other. When a request is initiated, a Trace ID is generated and passed to all involved services. Each service, in turn, creates a span with a unique Span ID linked to the Trace ID, enabling you to visualize the full lifecycle of the request from start to finish. Okay, so why not just use Jaeger? Why do we need OpenTelemetry (OTEL) and all its specifications? That’s a great question! Let’s break it down step by step. Find more about Jaeger here. TL;DR Jaeger is a system for storing and visualizing distributed traces. It collects, stores, searches, and displays data showing how requests “travel” through your services.OpenTelemetry (OTEL) is a standard (and a set of libraries) for collecting telemetry data (traces, metrics, logs) from your applications and infrastructure. It isn’t tied to any single visualization tool or backend. Put simply: OTEL is like a “universal language” and set of libraries for telemetry collection.Jaeger is a backend and UI for viewing and analyzing distributed traces. Why Do We Need OTEL if We Already Have Jaeger? 1. A Single Standard for Collection In the past, there were projects like OpenTracing and OpenCensus. OpenTelemetry unifies these approaches to collecting metrics and traces into one universal standard. 2. Easy Integration You write your code in Go (or another language), add OTEL libraries for auto-injecting interceptors and spans, and that’s it. 
Afterward, it doesn’t matter where you want to send that data—Jaeger, Tempo, Zipkin, Datadog, a custom backend—OpenTelemetry takes care of the plumbing. You just swap out the exporter. 3. Not Just Traces OpenTelemetry covers traces, but it also handles metrics and logs. You end up with a single toolset for all your telemetry needs, not just tracing. 4. Jaeger as a Backend Jaeger is an excellent choice if you’re primarily interested in distributed tracing visualization. But it doesn’t provide the cross-language instrumentation by default. OpenTelemetry, on the other hand, gives you a standardized way to collect data, and then you decide where to send it (including Jaeger). In practice, they often work together: Your application uses OpenTelemetry → communicates via OTLP protocol → goes to the OpenTelemetry Collector (HTTP or grpc) → exports to Jaeger for visualization. Tech Part System Design (A Little Bit) Let's quickly sketch out a couple of services that will do the following: Purchase Service – processes a payment and records it in MongoDBCDC with Debezium – listens for changes in the MongoDB table and sends them to KafkaPurchase Processor – consumes the message from Kafka and calls the Auth Service to look up the user_id for validationAuth Service – a simple user service In summary: 3 Go servicesKafkaCDC (Debezium)MongoDB Code Part Let’s start with the infrastructure. To tie everything together into one system, we’ll create a large Docker Compose file. We’ll begin by setting up telemetry. Note: All the code is available via a link at the end of the article, including the infrastructure. YAML services: jaeger: image: jaegertracing/all-in-one:1.52 ports: - "6831:6831/udp" # UDP port for the Jaeger agent - "16686:16686" # Web UI - "14268:14268" # HTTP port for spans networks: - internal prometheus: image: prom/prometheus:latest volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro ports: - "9090:9090" depends_on: - kafka - jaeger - otel-collector command: --config.file=/etc/prometheus/prometheus.yml networks: - internal otel-collector: image: otel/opentelemetry-collector-contrib:0.91.0 command: ['--config=/etc/otel-collector.yaml'] ports: - "4317:4317" # OTLP gRPC receiver volumes: - ./otel-collector.yaml:/etc/otel-collector.yaml depends_on: - jaeger networks: - internal We’ll also configure the collector — the component that gathers telemetry. Here, we choose gRPC for data transfer, which means communication will happen over HTTP/2: YAML receivers: # Add the OTLP receiver listening on port 4317. otlp: protocols: grpc: endpoint: "0.0.0.0:4317" processors: batch: # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/memorylimiterprocessor memory_limiter: check_interval: 1s limit_percentage: 80 spike_limit_percentage: 15 extensions: health_check: {} exporters: otlp: endpoint: "jaeger:4317" tls: insecure: true prometheus: endpoint: 0.0.0.0:9090 debug: verbosity: detailed service: extensions: [health_check] pipelines: traces: receivers: [otlp] processors: [memory_limiter, batch] exporters: [otlp] metrics: receivers: [otlp] processors: [memory_limiter, batch] exporters: [prometheus] Make sure to adjust any addresses as needed, and you’re done with the base configuration. We already know OpenTelemetry (OTEL) uses two key concepts — Trace ID and Span ID — that help track and monitor requests in distributed systems. Implementing the Code Now, let’s look at how to get this working in your Go code. 
We need the following imports: Go "go.opentelemetry.io/otel" "go.opentelemetry.io/otel/exporters/otlp/otlptrace" "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc" "go.opentelemetry.io/otel/sdk/resource" "go.opentelemetry.io/otel/sdk/trace" semconv "go.opentelemetry.io/otel/semconv/v1.17.0" Then, we add a function to initialize our tracer in main() when the application starts: Go func InitTracer(ctx context.Context) func() { exp, err := otlptrace.New( ctx, otlptracegrpc.NewClient( otlptracegrpc.WithEndpoint(endpoint), otlptracegrpc.WithInsecure(), ), ) if err != nil { log.Fatalf("failed to create OTLP trace exporter: %v", err) } res, err := resource.New(ctx, resource.WithAttributes( semconv.ServiceNameKey.String("auth-service"), semconv.ServiceVersionKey.String("1.0.0"), semconv.DeploymentEnvironmentKey.String("stg"), ), ) if err != nil { log.Fatalf("failed to create resource: %v", err) } tp := trace.NewTracerProvider( trace.WithBatcher(exp), trace.WithResource(res), ) otel.SetTracerProvider(tp) return func() { err := tp.Shutdown(ctx) if err != nil { log.Printf("error shutting down tracer provider: %v", err) } } } With tracing set up, we just need to place spans in the code to track calls. For example, if we want to measure database calls (since that’s usually the first place we look for performance issues), we can write something like this: Go tracer := otel.Tracer("auth-service") ctx, span := tracer.Start(ctx, "GetUserInfo") defer span.End() tracedLogger := logging.AddTraceContextToLogger(ctx) tracedLogger.Info("find user info", zap.String("operation", "find user"), zap.String("username", username), ) user, err := s.userRepo.GetUserInfo(ctx, username) if err != nil { s.logger.Error(errNotFound) span.RecordError(err) span.SetStatus(otelCodes.Error, "Failed to fetch user info") return nil, status.Errorf(grpcCodes.NotFound, errNotFound, err) } span.SetStatus(otelCodes.Ok, "User info retrieved successfully") We have tracing at the service layer — great! But we can go even deeper, instrumenting the database layer: Go func (r *UserRepository) GetUserInfo(ctx context.Context, username string) (*models.User, error) { tracer := otel.Tracer("auth-service") ctx, span := tracer.Start(ctx, "UserRepository.GetUserInfo", trace.WithAttributes( attribute.String("db.statement", query), attribute.String("db.user", username), ), ) defer span.End() var user models.User // Some code that queries the DB... // err := doDatabaseCall() if err != nil { span.RecordError(err) span.SetStatus(codes.Error, "Failed to execute query") return nil, fmt.Errorf("failed to fetch user info: %w", err) } span.SetStatus(codes.Ok, "Query executed successfully") return &user, nil } Now, we have a complete view of the request journey. Head to the Jaeger UI, query for the last 20 traces under auth-service, and you’ll see all the spans and how they connect in one place. Now, everything is visible. If you need it, you can include the entire query in the tags. However, keep in mind that you shouldn’t overload your telemetry — add data deliberately. I’m simply demonstrating what’s possible, but including the full query, this way isn’t something I’d generally recommend. gRPC client-server If you want to see a trace that spans two gRPC services, it’s quite straightforward. All you need is to add the out-of-the-box interceptors from the library. 
For example, on the server side: Go server := grpc.NewServer( grpc.StatsHandler(otelgrpc.NewServerHandler()), ) pb.RegisterAuthServiceServer(server, authService) On the client side, the code is just as short: Go shutdown := tracing.InitTracer(ctx) defer shutdown() conn, err := grpc.Dial( "auth-service:50051", grpc.WithInsecure(), grpc.WithStatsHandler(otelgrpc.NewClientHandler()), ) if err != nil { logger.Fatal("error", zap.Error(err)) } That’s it! Ensure your exporters are configured correctly, and you’ll see a single Trace ID logged across these services when the client calls the server. Handling CDC Events and Tracing Want to handle events from the CDC as well? One simple approach is to embed the Trace ID in the object that MongoDB stores. That way, when Debezium captures the change and sends it to Kafka, the Trace ID is already part of the record. For instance, if you’re using MongoDB, you can do something like this: Go func (r *mongoPurchaseRepo) SavePurchase(ctx context.Context, purchase entity.Purchase) error { span := r.handleTracing(ctx, purchase) defer span.End() // Insert the record into MongoDB, including the current span's Trace ID _, err := r.collection.InsertOne(ctx, bson.M{ "_id": purchase.ID, "user_id": purchase.UserID, "username": purchase.Username, "amount": purchase.Amount, "currency": purchase.Currency, "payment_method": purchase.PaymentMethod, // ... "trace_id": span.SpanContext().TraceID().String(), }) return err } Debezium then picks up this object (including trace_id) and sends it to Kafka. On the consumer side, you simply parse the incoming message, extract the trace_id, and merge it into your tracing context: Go // If we find a Trace ID in the payload, attach it to the context newCtx := ctx if traceID != "" { log.Printf("Found Trace ID: %s", traceID) newCtx = context.WithValue(ctx, "trace-id", traceID) } // Create a new span tracer := otel.Tracer("purchase-processor") newCtx, span := tracer.Start(newCtx, "handler.processPayload") defer span.End() if traceID != "" { span.SetAttributes( attribute.String("trace.id", traceID), ) } // Parse the "after" field into a Purchase struct... var purchase model.Purchase if err := mapstructure.Decode(afterDoc, &purchase); err != nil { log.Printf("Failed to map 'after' payload to Purchase struct: %v", err) return err } Alternative: Using Kafka Headers Sometimes, it’s easier to store the Trace ID in Kafka headers rather than in the payload itself. For CDC workflows, this might not be available out of the box, since Debezium can limit what’s added to headers. But if you control the producer side (or if you’re using a standard Kafka producer), you can do something like this with Sarama: Injecting a Trace ID into Headers Go // saramaHeadersCarrier is a helper to set/get headers in a Sarama message.
type saramaHeadersCarrier []sarama.RecordHeader func (c *saramaHeadersCarrier) Get(key string) string { for _, h := range *c { if string(h.Key) == key { return string(h.Value) } } return "" } func (c *saramaHeadersCarrier) Set(key string, value string) { *c = append(*c, sarama.RecordHeader{ Key: []byte(key), Value: []byte(value), }) } // Before sending a message to Kafka: func produceMessageWithTraceID(ctx context.Context, producer sarama.SyncProducer, topic string, value []byte) error { span := trace.SpanFromContext(ctx) traceID := span.SpanContext().TraceID().String() headers := make([]sarama.RecordHeader, 0) carrier := (*saramaHeadersCarrier)(&headers) carrier.Set("trace-id", traceID) msg := &sarama.ProducerMessage{ Topic: topic, Value: sarama.ByteEncoder(value), Headers: headers, } _, _, err := producer.SendMessage(msg) return err } Extracting a Trace ID on the Consumer Side Go for message := range claim.Messages() { // Extract the trace ID from headers var traceID string for _, hdr := range message.Headers { if string(hdr.Key) == "trace-id" { traceID = string(hdr.Value) } } // Now continue your normal tracing workflow if traceID != "" { log.Printf("Found Trace ID in headers: %s", traceID) // Attach it to the context or create a new span with this info } } Depending on your use case and how your CDC pipeline is set up, you can choose the approach that works best:
Embed the Trace ID in the database record so it flows naturally via CDC.
Use Kafka headers if you have more control over the producer side or you want to avoid inflating the message payload.
Either way, you can keep your traces consistent across multiple services, even when events are asynchronously processed via Kafka and Debezium. Conclusion Using OpenTelemetry and Jaeger provides detailed request traces, helping you pinpoint where and why delays occur in distributed systems. Adding Prometheus completes the picture with metrics, the key indicators of performance and stability. Together, these tools form a comprehensive observability stack, enabling faster issue detection and resolution, performance optimization, and overall system reliability. I can say that this approach significantly speeds up troubleshooting in a microservices environment and is one of the first things we implement in our projects. Links
Infra code
OTEL Getting Started
OTEL SQL
OTEL Collector
Go gRPC
Go Reflection
Kafka
Debezium MongoDB Connector Docs
Unwrap MongoDB SMT Example
To set the groundwork for this article, let's first understand what Pytest is. Pytest is a popular testing framework for Python that simplifies the process of writing scalable and maintainable test cases. It supports fixtures, parameterized testing, and detailed test reporting, making it a powerful choice for both unit and functional testing. Pytest's simplicity and flexibility have made it a go-to framework for developers and testers alike. How to Install Pytest Pytest requires Python 3.8+ or PyPy3. 1. Run the following command in your command line: Plain Text pip install -U pytest OR pip3 install -U pytest 2. Check that you have installed the correct version: Plain Text $ pytest --version Purpose of Fixtures Fixtures in Pytest provide a robust way to manage setup and teardown logic for tests. They are particularly useful for initializing resources, mocking external services, or performing setup steps that are shared across multiple tests. By using fixtures, you can avoid repetitive code and ensure a consistent testing environment. Common Use Cases
Initializing web drivers for browser interactions
Navigating to a specific URL before running tests
Cleaning up resources after tests are completed
Example of Using Pytest Fixtures in Selenium Test Scenario We want to verify the login functionality of the Sauce Labs Demo website. The steps include:
Open a browser and navigate to the login page.
Perform login operations.
Verify successful login.
Close the browser after the test.
Code Implementation Python import pytest from selenium import webdriver from selenium.webdriver.common.by import By @pytest.fixture(scope="module") def browser(): # Setup: Initialize the WebDriver driver = webdriver.Chrome() # Replace with your WebDriver (Chrome, Firefox, etc.) driver.get("https://www.saucedemo.com/") # Navigate to the login page yield driver # Provide the WebDriver instance to the tests # Teardown: Close the browser after tests driver.quit() def test_login_success(browser): # Use the WebDriver instance provided by the fixture browser.find_element(By.ID, "user-name").send_keys("standard_user") browser.find_element(By.ID, "password").send_keys("secret_sauce") browser.find_element(By.ID, "login-button").click() # Verify successful login by checking the presence of a product page element assert "Products" in browser.page_source Explanation Fixture Definition Python @pytest.fixture(scope="module") def browser(): driver = webdriver.Chrome() driver.get("https://www.saucedemo.com/") yield driver driver.quit()
@pytest.fixture(scope="module"): Defines a fixture named browser with a scope of "module," meaning it will be set up once per module and shared among all tests in that module
Setup: Initializes the WebDriver and navigates to the Sauce Labs Demo login page
Yield: Provides the WebDriver instance to the test functions
Teardown: Closes the browser after all tests are completed
Using the Fixture Python def test_login_success(browser): browser.find_element(By.ID, "user-name").send_keys("standard_user") browser.find_element(By.ID, "password").send_keys("secret_sauce") browser.find_element(By.ID, "login-button").click() assert "Products" in browser.page_source
The test_login_success function uses the browser fixture.
It interacts with the login page by sending credentials and clicking the login button.
Finally, it asserts the login's success by verifying the presence of the "Products" page element.
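A common next step, once a fixture like browser works in a single file, is to move it into a conftest.py so every test module in the directory can use it without imports; pytest discovers conftest.py fixtures automatically. The sketch below assumes the same Sauce Labs demo site and a hypothetical second test module (test_login_page.py); it is an illustrative layout, not part of the original example.
Python
# conftest.py: fixtures defined here are visible to all test modules in this directory
import pytest
from selenium import webdriver


@pytest.fixture(scope="module")
def browser():
    # Setup: start the WebDriver and open the login page
    driver = webdriver.Chrome()
    driver.get("https://www.saucedemo.com/")
    yield driver  # hand the driver to the tests
    # Teardown: close the browser once the module's tests have finished
    driver.quit()


# test_login_page.py: a hypothetical second module that reuses the shared fixture
from selenium.webdriver.common.by import By


def test_login_button_is_present(browser):
    # The fixture is injected by name; no import of conftest.py is required
    assert browser.find_element(By.ID, "login-button").is_displayed()
Keeping the fixture in conftest.py also makes it easier to change its scope later (for example, to "session") without touching the individual test files.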
How to Use a Parameterized Fixture in Pytest A parameterized fixture in Pytest allows you to run the same test function multiple times with different input values provided by the fixture. This is done by setting the params argument in the @pytest.fixture decorator. Here’s an example of a parameterized fixture: Python import pytest # Define a parameterized fixture @pytest.fixture(params=["chrome", "firefox", "safari"]) def browser(request): # The 'request' object gives access to the parameter value return request.param # Use the parameterized fixture in a test def test_browser_launch(browser): print(f"Launching browser: {browser}") # Simulate browser testing logic assert browser in ["chrome", "firefox", "safari"] Explanation 1. Definition
The @pytest.fixture decorator uses the params argument to provide a list of values (["chrome", "firefox", "safari"]).
For each value in the list, Pytest will call the fixture once and pass the current value to the test function.
2. Usage
In the test test_browser_launch, the fixture browser is passed as an argument.
The request.param in the fixture gives the current parameter value being used.
3. Test Execution Pytest will run the test_browser_launch test three times, once for each browser:
browser = "chrome"
browser = "firefox"
browser = "safari"
Output When you run the test: Plain Text pytest -s test_file.py You’ll see: Plain Text Launching browser: chrome Launching browser: firefox Launching browser: safari This approach is particularly useful for testing the same logic or feature across multiple configurations, inputs, or environments. How to Pass Data Dynamically into a Fixture In Python, you can use a fixture to pass data dynamically to your test cases with the help of the Pytest framework. A fixture in Pytest allows you to set up data or resources that can be shared across multiple test cases. You can create dynamic test data inside the fixture and pass it to your test functions by including it as a parameter in the test. Here's how you can use a fixture to pass dynamic data to your test cases: Example Python import pytest import time # Define a fixture that generates dynamic data @pytest.fixture def dynamic_data(): # You can generate dynamic data based on various factors, such as timestamps or random values return "Dynamic Data " + str(time.time()) # Use the fixture in your test case def test_using_dynamic_data(dynamic_data): print(dynamic_data) # This prints dynamically generated data assert "Dynamic Data" in dynamic_data # Your test logic using dynamic data Explanation 1. Fixture Creation The @pytest.fixture decorator is used to create a fixture. The fixture function (dynamic_data) generates dynamic data (in this case, appending the current timestamp to the string). 2. Using the Fixture in a Test In the test function (test_using_dynamic_data), the fixture is passed as a parameter. Pytest automatically detects that the test function needs dynamic_data, so it calls the fixture and provides the generated data to the test. 3. Dynamic Data Each time the test runs, the fixture generates fresh, dynamic data (based on the current timestamp), making the test run with different data. Benefits of Using Fixtures
Code reusability. Define setup and teardown logic once and reuse it across multiple tests.
Readability. Keep test logic focused on assertions rather than setup/teardown details.
Consistency. Ensure each test starts with a clean state.
Closing Thoughts Pytest fixtures are a powerful tool for managing test resources efficiently.
Adopting fixtures lets you write clean, reusable, and maintainable test code. Try implementing them in your test automation projects, starting with simple examples like the one above.
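One more pattern worth knowing, since it combines the parameterized fixture and dynamic data ideas above, is indirect parametrization: @pytest.mark.parametrize with indirect=True routes each parameter through the fixture first, so the fixture can transform the raw value before the test sees it. The following is a minimal sketch using made-up environment names and a hypothetical URL scheme; it is an illustration, not code from the article.
Python
import pytest


@pytest.fixture
def api_base_url(request):
    # request.param holds the value supplied by the parametrize marker below;
    # the fixture can enrich or transform it before handing it to the test.
    env = request.param
    return f"https://{env}.example.com/api"  # hypothetical URL pattern


# indirect=True sends each parameter through the api_base_url fixture
@pytest.mark.parametrize("api_base_url", ["dev", "staging"], indirect=True)
def test_base_url_uses_https(api_base_url):
    assert api_base_url.startswith("https://")
The test runs once per environment, mirroring the browser parameterization shown earlier.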
Due to the rapid growth of the API industry, client-server interaction must be seamless for everyone. Dependence on live APIs during development, however, may cause delays, particularly if the APIs are still being built or undergoing frequent changes. A mock client can be helpful for developers aiming for efficiency and reliability. In this article, we focus on the concept of mock clients, their significance, and how to use them for development. What Is a Mock Client? A mock client is a simulated client that interacts with an API based on its defined specifications without sending actual requests to a live server. It mimics the behavior of the API, allowing developers to test and develop client-side code in isolation. This approach is invaluable during the development and testing phases, as it eliminates dependencies on the actual API implementation, leading to more streamlined and error-free code integration. Benefits of Having Mock Clients Mock clients are not just a technical enhancement for a code generator tool; they are a strategic way to support development efficiency and software quality. Here’s why mock clients are beneficial: 1. Early API Integration Testing Mock clients allow developers to begin integrating and testing their code against the API even before the API is fully developed. This early testing ensures that client-side functionality is validated upfront, reducing the risk of encountering significant issues later in the development cycle and strengthening the overall software quality process. 2. Rapid Prototyping and Iteration Developers can quickly prototype features and iterate based on immediate feedback using mock clients. This agility is beneficial in today’s dynamic development environments, where requirements can change rapidly. 3. Enhanced Reliability Mock clients reduce dependence on live APIs, minimizing issues caused by API downtime or instability during development. Demo of Using a Mock Client Note: Here, I have used Ballerina Swan Lake 2201.10.3 for demonstration. For this demonstration, I chose the Ballerina language to implement my app. Ballerina provides the Ballerina OpenAPI tool, which can generate a mock client based on an OpenAPI contract. The Ballerina OpenAPI tool has a primary mode that generates client code from a given OpenAPI specification, and mock client generation is a sub-mode that can be executed within the client mode. If your OpenAPI contract includes examples for operation responses, a mock client can be generated from the OAS contract. Step 1 First, I’m going to create a Ballerina package and a module for the mock client using the Organize Ballerina code guide.
Create your Ballerina package using the bal new demo-mock command.
Create a module for the mock client using the bal add mclient command.
Step 2 With the Ballerina OpenAPI tool, you can generate mock clients if your OpenAPI contract includes examples for particular operation responses. I use the following OpenAPI contract to generate the mock client. This contract includes example responses.
OpenAPI Contract YAML openapi: 3.0.1 info: title: Convert version: 0.1.0 servers: - url: "{server}:{port}/convert" variables: server: default: http://localhost port: default: "9000" paths: /rate/{fromCurrency}/{toCurrency}: get: operationId: getRate parameters: - in: path name: fromCurrency schema: type: string required: true - in: path name: toCurrency schema: type: string required: true responses: "200": description: Created content: application/json: schema: type: object examples: json1: value: toAmount: 60000 fromCurrency: EUR toCurrency: LKR fromAmount: 200 timestamp: 2024-07-14 "202": description: Accepted "400": description: BadRequest content: application/json: schema: $ref: '#/components/schemas/ErrorPayload' components: schemas: ErrorPayload: required: - message - method - path - reason - status - timestamp type: object properties: timestamp: type: string status: type: integer format: int64 reason: type: string message: type: string path: type: string method: type: string Then, execute the mock client generation command inside the created module mclient: Plain Text bal openapi -i <yaml> --mode client --mock Example for Generated Ballerina Mock Client IDL // AUTO-GENERATED FILE. DO NOT MODIFY. // This file is auto-generated by the Ballerina OpenAPI tool. public isolated client class Client { # Gets invoked to initialize the `connector`. # # + config - The configurations to be used when initializing the `connector` # + serviceUrl - URL of the target service # + return - An error if connector initialization failed public isolated function init(ConnectionConfig config = {}, string serviceUrl = "http://localhost:9000/convert") returns error? { return; } # + headers - Headers to be sent with the request # + return - Created resource isolated function get rate/[string fromCurrency]/[string toCurrency](map<string|string[]> headers = {}) returns record {}|error { return {"fromCurrency": "EUR", "toCurrency": "LKR", "fromAmount": 200, "toAmount": 60000, "timestamp": "2024-07-14"}; } } Step 3 Now, you can use the generated sample mock client for the app implementation, as I used in the main.bal file. main.bal File IDL import ballerina/io; import demo_mock.mclient as mc; public function main() returns error? { mc:Client mclient = check new(); record {} mappingResult = check mclient->/rate/["EUR"]/["LKR"](); io:println(mappingResult); // remain logic can be address here } More mock client samples can be found here. Conclusion Whether you’re building complex client-side applications or managing API integrations, using mock clients can transform your development experience to be seamless. As APIs continue to evolve and play an increasingly central role in software ecosystems, tools like OpenAPI’s mock client generation are essential for staying ahead in the competitive landscape. Thank you for reading!
JSON (JavaScript Object Notation) is a collection of key-value pairs that can be easily parsed and generated by applications. It is a subset of the JavaScript Programming Language Standard ECMA-262. Parsing JSON is required in most applications, such as RESTful APIs or applications that need data serialization. In the Java ecosystem, the two most popular libraries for handling JSON data are Jackson and Gson. Both are widely used and offer unique advantages. This article uses edge-case examples to explore the features of both libraries on different parameters. Brief Overview of Jackson and Gson Jackson Jackson was developed by FasterXML and is used in enterprise applications and frameworks such as Spring Boot. It offers parsing, serialization, and deserialization of JSON data. The following features make this library popular among developers:
Jackson is the default JSON processing library in Spring Boot, which eliminates manual configuration in most cases.
It facilitates JSON deserialization into generic types using TypeReference or JavaType.
It provides different annotations to customize serialization and deserialization behavior. For example, @JsonProperty(name) makes the mapping between the incoming key and the actual Java POJO field seamless.
It provides extensive and robust support for bidirectional data binding (JSON to POJO and vice versa), a streaming API (incremental, token-by-token reading and writing of JSON), and tree model parsing (an in-memory tree representation of the JSON document).
The Jackson library offers high performance by minimizing memory overhead and optimizing serialization/deserialization (from JSON to POJO and vice versa).
Jackson supports additional modules, such as XML and YAML processing and Kotlin- and Scala-specific enhancements.
Annotations such as @JsonTypeInfo and @JsonSubTypes handle polymorphic types.
It handles missing or additional fields in JSON data due to its backward and forward compatibility.
Jackson provides support for immutable objects and classes with constructors, including those using builder patterns.
The ObjectMapper class is thread-safe and therefore enables efficient use in multithreaded applications.
Gson Gson was developed by Google and designed for converting JSON to Java objects (POJOs) and vice versa. It is simple and ideal for smaller applications that need quick implementations. The open-source library offers the following key features:
Gson has minimal external dependencies; therefore, it is easy to integrate.
It supports nested objects and complex data types such as lists, maps, and custom classes.
It can deserialize JSON into generic collections like List<T> and Map<K,V> using TypeToken.
Gson's JsonSerializer and JsonDeserializer interfaces allow custom serialization and deserialization logic.
Null values are excluded from the JSON output by default and can be included when required.
The @SerializedName annotation maps JSON keys to Java fields with different names.
Gson objects are thread-safe and can therefore be used in multithreaded applications.
The GsonBuilder class can apply custom naming policies for fields. For example, FieldNamingPolicy.IDENTITY is the default policy, meaning the field name is unchanged.
Edge Cases Considered in This Comparison
Feature | Jackson | Gson
Extra fields | Ignored by default, configurable | Ignored by default
Null values | Supports @JsonInclude | Requires .serializeNulls()
Circular references | Supported using @JsonIdentityInfo | Not supported directly
Date handling | Supports the Java 8 Date API with modules | Requires custom type adapters
Polymorphism | Built-in with @JsonTypeInfo | Needs custom deserialization logic
The input JSON considered for the comparison of the Jackson and Gson libraries is available on GitHub, and the model class representation of the JSON is also on GitHub. Jackson Implementation The above JSON is converted to a Java object using the Jackson dependencies below: XML <!-- Jackson START--> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.18.2</version> </dependency> <dependency> <groupId>com.fasterxml.jackson.datatype</groupId> <artifactId>jackson-datatype-jsr310</artifactId> <version>2.18.2</version> </dependency> <!-- Jackson END--> The main class for JSON parsing with the Jackson library: Java public class JacksonJsonMain { public static void main(String[] args) throws IOException { ObjectMapper mapper = new ObjectMapper(); //Jackson support for LocalDate using jackson-datatype-jsr310 mapper.registerModule(new JavaTimeModule()); //Configuration to ignore extra fields mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); // Deserialize the JSON EmployeeModelData employeeModelData = mapper.readValue(json, EmployeeModelData.class); Employee employee = employeeModelData.getEmployee(); // Display JSON fields System.out.println("Jackson Library parsing output"); System.out.println("Employee Name: " + employee.getName()); System.out.println("Department Name: " + employee.getDepartment().getName()); System.out.println("Skills: " + employee.getSkills()); System.out.println("Team Members Count: " + employeeModelData.getTeamMembers().size()); } } Running this class prints the employee name, department name, skills, and team member count parsed by Jackson. Gson Implementation The Gson dependency used to convert the above JSON to a Java object is below: XML <!--GSON START --> <dependency> <groupId>com.google.code.gson</groupId> <artifactId>gson</artifactId> <version>2.11.0</version> </dependency> <!--GSON END --> The main class for JSON parsing with the Gson library: Java public class GsonJsonMain { public static void main(String[] args) { Gson gson = new GsonBuilder() .registerTypeAdapter(LocalDate.class, new LocalDateAdapter()) // Register LocalDate adapter .serializeNulls() // Handle null values .setPrettyPrinting() // Pretty print JSON .create(); // Deserialize the JSON EmployeeModelData data = gson.fromJson(json, EmployeeModelData.class); // Print Employee information System.out.println("GSON Library parsing output"); System.out.println("Employee Name: " + data.getEmployee().getName()); System.out.println("Department Name: " + data.getEmployee().getDepartment().getName()); System.out.println("Skills: " + data.getEmployee().getSkills()); System.out.println("Team Members Count: " + data.getTeamMembers().size()); } } Running this class prints the same fields as parsed by Gson. Which One Should I Choose? Jackson offers higher performance and is the better choice for projects that involve complex data structures or large datasets, whereas Gson is a good fit for smaller datasets with simple data structures. Conclusion Both libraries handle the above dataset effectively and are excellent choices for JSON parsing in Java. The comparison above helps you choose the right library based on project requirements. The code snippets mentioned above are available in the GitHub repository. A detailed comparison between Jackson and Gson is available on Baeldung. The official Jackson documentation offers in-depth information on Jackson's features and configuration.
Similarly, the official Gson documentation provides a detailed implementation guide.