Generative AI
AI technology is now more accessible, more intelligent, and easier to use than ever before. Generative AI, in particular, has transformed nearly every industry, creating a lasting impact driven by its delivered promises of cost savings, manual task reduction, and a slew of other benefits that improve overall productivity and efficiency. The applications of GenAI are expansive, and thanks to the democratization of large language models, AI is reaching every industry worldwide. Our focus for DZone's 2025 Generative AI Trend Report is on the trends surrounding GenAI models, algorithms, and implementation, paying special attention to GenAI's impacts on code generation and software development as a whole. Featured in this report are key findings from our research and thought-provoking content written by everyday practitioners from the DZone Community, with topics including organizations' AI adoption maturity, the role of LLMs, AI-driven intelligent applications, agentic AI, and much more. We hope this report serves as a guide to help readers assess their own organization's AI capabilities and how they can better leverage those in 2025 and beyond.
I still remember the day our CTO walked into the engineering huddle and declared, "We're moving everything to Kubernetes." It was 2017, and like many teams caught in the container hype cycle, we dove in headfirst with more excitement than wisdom. What followed was a sobering 18-month journey of steep learning curves, 3 AM incident calls, and the gradual realization that we'd traded one set of operational headaches for another. Fast forward to today, I'm deploying containerized applications without managing a single node. No upgrades. No capacity planning. No security patching. Yet, I still have the full power of Kubernetes' declarative API at my fingertips. The serverless Kubernetes revolution is here, and it's changing everything about how we approach container orchestration. The Evolution I've Witnessed Firsthand Having worked with Kubernetes since its early days, I've lived through each phase of its management evolution: Phase 1: The DIY Era (2015-2018) Our first production Kubernetes cluster was a badge of honor — and an operational nightmare. We manually set up everything: etcd clusters, multiple master nodes for high availability, networking plugins that mysteriously failed, and storage integrations that tested the limits of our patience. We became experts by necessity, learning Kubernetes internals in painful detail. I filled three notebooks with command-line incantations, troubleshooting flows, and architecture diagrams. New team members took weeks to ramp up. We were doing cutting-edge work, but at a staggering operational cost. Phase 2: Managed Control Planes (2018-2020) When GKE, EKS, and AKS matured, it felt like a revelation. "You mean we don't have to manage etcd backups anymore?" The relief was immediate — until we realized we still had plenty of operational responsibilities. Our team still agonized over node sizing, Kubernetes version upgrades, and capacity management. I spent countless hours tuning autoscaling parameters and writing Terraform modules. We eliminated some pain, but our engineers were still spending 20-30% of their time on infrastructure rather than application logic. Phase 3: Advanced Management Tooling (2020-2022) As our company expanded to multiple clusters across different cloud providers, we invested heavily in management layers. Rancher became our control center, and we built automation for standardizing deployments. Tools improved, but complexity increased. Each new feature or integration point added cognitive load. Our platform team grew to five people — a significant investment for a mid-sized company. We were more sophisticated, but not necessarily more efficient. Phase 4: The Serverless Awakening (2022-Present) My epiphany came during a late-night production issue. After spending hours debugging a node-level problem, I asked myself: "Why are we still dealing with nodes in 2022?" That question led me down the path to serverless Kubernetes, and I haven't looked back. What Makes Kubernetes Truly "Serverless"? Through trial and error, I've developed a practical definition of what constitutes genuine serverless Kubernetes: You never think about nodes. Period. No sizing, scaling, patching, or troubleshooting. If you're SSHing into a node, it's not serverless.You pay only for what you use. Our bill now scales directly with actual workload usage. Last month, our dev environment cost dropped 78% because it scaled to zero overnight and on weekends.Standard Kubernetes API. The critical feature that separates this approach from traditional PaaS. 
My team uses the same YAML, kubectl commands, and CI/CD pipelines we've already mastered.Instant scalability. When our product hit the front page of Product Hunt, our API scaled from handling 10 requests per minute to 3,000 in seconds, without any manual intervention.Zero operational overhead. We deleted over 200 runbooks and automation scripts that were dedicated to cluster maintenance. Real Architectural Approaches I've Evaluated When exploring serverless Kubernetes options, I found four distinct approaches, each with unique strengths and limitations: 1. The Virtual Kubelet Approach We first experimented with Azure Container Instances (ACI) via Virtual Kubelet. The concept was elegant — a virtual node that connected our cluster to a serverless backend. This worked well for batch processing workloads but introduced frustrating latency when scaling from zero. Some of our Kubernetes manifests needed modifications, particularly those using DaemonSets or privileged containers. 2. Control Plane + Serverless Compute Our team later moved some workloads to Google Cloud Run for Anthos. I appreciated maintaining a dedicated control plane (for familiarity) while offloading the compute layer. This hybrid approach provided excellent Kubernetes compatibility. The downside? We still paid for the control plane even when idle, undermining the scale-to-zero economics. 3. On-Demand Kubernetes For our development environments, we've recently adopted an on-demand approach, where the entire Kubernetes environment — control plane included — spins up only when needed. The cost savings have been dramatic, but we've had to architect around cold start delays. We've implemented clever prewarming strategies for critical environments before high-traffic events. 4. Kubernetes-Compatible API Layers I briefly tested compatibility layers that provide Kubernetes-like APIs on top of other orchestrators. While conceptually interesting, we encountered too many edge cases where standard Kubernetes features behaved differently. Platform Experiences: What Actually Worked for Us Rather than providing generic platform overviews, let me share my team's real experiences with these technologies: AWS Fargate for EKS After running Fargate for 14 months, here's my honest assessment: What I loved: The seamless integration with existing EKS deployments lets us migrate workloads gradually. Our developers continued using familiar tools while we eliminated node management behind the scenes. The per-second billing granularity provided predictable costs.What caused headaches: Our monitoring stack relied heavily on DaemonSets, requiring significant rearchitecting. Storage limitations forced us to migrate several stateful services to managed alternatives. Cold starts occasionally impacted performance during low-traffic periods.Pro tip: Create separate Fargate profiles with appropriate sizing for different workload types — we reduced costs by 23% after segmenting our applications this way. Google Cloud Run for Anthos We deployed a new microservice architecture using this platform last year: What worked brilliantly: The sub-second scaling from zero consistently impressed us. The Knative Foundation provided an elegant developer experience, particularly for HTTP services. Traffic splitting for canary deployments became trivially easy.Where we struggled: Building effective CI/CD pipelines required additional work. Some of our batch processing workloads weren't ideal fits for the HTTP-centric model. 
Cost visibility was initially challenging.Real-world insight: Invest time in setting up detailed monitoring for Cloud Run services. We missed several performance issues until implementing custom metrics dashboards. Azure Container Apps For our .NET-based services, we evaluated Azure Container Apps: Standout features: The built-in KEDA-based autoscaling worked exceptionally well for event-driven workloads. The revisions concept for deployment management simplified our release process.Limitations we encountered: The partial Kubernetes API implementation meant we couldn't directly port all our existing manifests. Integration with legacy on-premises systems required additional networking configuration.Lesson learned: Start with greenfield applications rather than migrations to minimize friction with this platform. Implementation Lessons from the Trenches After transitioning multiple environments to serverless Kubernetes, here are the pragmatic lessons that don't typically make it into vendor documentation: Application Architecture Reality Check Not everything belongs in serverless Kubernetes. Our journey taught us to be selective: Perfect fits. Our API gateways, web frontends, and event processors thrived in serverless environments.Problematic workloads. Our ML training jobs, which needed GPU access and ran for hours, remained on traditional nodes. A database with specific storage performance requirements stayed on provisioned infrastructure.Practical adaptation. We created a "best of both worlds" architecture, using serverless for elastic workloads while maintaining traditional infrastructure for specialized needs. The Cost Model Shift That Surprised Us Serverless dramatically changed our cost structure: Before: Predictable but inefficient monthly expenses regardless of traffic.After: Highly efficient but initially less predictable costs that closely tracked usage.How we adapted: We implemented ceiling limits on autoscaling to prevent runaway costs. We developed resource request guidelines for teams to prevent over-provisioning. Most importantly, we built cost visibility tooling so teams could see the direct impact of their deployment decisions. Developer Experience Transformation Transitioning to serverless required workflow adjustments: Local development continuity. We standardized on kind (Kubernetes in Docker) for local development, ensuring compatibility with our serverless deployments.Troubleshooting changes. Without node access, we invested in enhanced logging and tracing. Distributed tracing, in particular, became essential rather than optional.Deployment pipeline adjustments. We built staging environments that closely mimicked production serverless configurations to catch compatibility issues early. Security Model Adaptation Security practices evolved significantly: Shared responsibility clarity. We documented clear boundaries between provider responsibilities and our security obligations.IAM integration. We moved away from Kubernetes RBAC for some scenarios, leveraging cloud provider identity systems instead.Network security evolution. Traditional network policies gave way to service mesh implementations for fine-grained control. Real-World Outcomes From Our Transition The impact of our serverless Kubernetes adoption went beyond technical architecture: Team Structure Transformation Our platform team of five shrunk to two people, with three engineers reallocated to product development. The remaining platform engineers focused on developer experience rather than firefighting. 
The on-call rotation, once dreaded for its 3 AM Kubernetes node issues, now primarily handles application-level concerns. Last quarter, we had zero incidents related to infrastructure. Business Agility Improvements Product features that once took weeks to deploy now go from concept to production in days. Our ability to rapidly scale during demand spikes allowed the marketing team to be more aggressive with promotions, knowing the platform would handle the traffic. Perhaps most significantly, we reduced our time-to-market for new initiatives by 40%, giving us an edge over competitors still managing their own Kubernetes infrastructure. Economic Impact After full adoption of serverless Kubernetes: Development environment costs decreased by 78%Overall infrastructure spend reduced by 32%Engineer productivity increased by approximately 25%Time spent on infrastructure maintenance dropped by over 90% Honest Challenges You'll Face No transformation is without its difficulties. These are the real challenges we encountered: Debugging complexity. Without node access, some troubleshooting scenarios became more difficult. We compensated with enhanced observability but still occasionally hit frustrating limitations.Ecosystem compatibility gaps. Several of our favorite Kubernetes tools didn't work as expected in serverless environments. We had to abandon some tooling and adapt others.The cold start compromise. We implemented creative solutions for cold start issues, including keepalive mechanisms for critical services and intelligent prewarming before anticipated traffic spikes.Migration complexity. Moving existing applications required more effort than we initially estimated. If I could do it again, I'd allocate 50% more time for the migration phase. Where Serverless Kubernetes Is Heading Based on industry connections and my own observations, here's where I see serverless Kubernetes evolving: Cost Optimization Tooling The next frontier is intelligent, automated cost management. My team is already experimenting with tools that automatically adjust resource requests based on actual usage patterns. Machine learning-driven resource optimization will likely become standard. Developer Experience Convergence The gap between local development and serverless production environments is narrowing. New tools emerging from both startups and established vendors are creating seamless development experiences that maintain parity across environments. Edge Computing Expansion I'm particularly excited about how serverless Kubernetes is enabling edge computing scenarios. Projects we're watching are bringing lightweight, serverless Kubernetes variants to edge locations with centralized management and zero operational overhead. Hybrid Architectures Standardization The most practical approach for many organizations will be hybrid deployments — mixing traditional and serverless Kubernetes. Emerging patterns and tools are making this hybrid approach more manageable and standardized. Final Thoughts When we started our Kubernetes journey years ago, we accepted operational complexity as the cost of admission for container orchestration benefits. Serverless Kubernetes has fundamentally changed that equation. Today, our team focuses on building products rather than maintaining infrastructure. We deploy with confidence to environments that scale automatically, cost-efficiently, and without operational burden. 
For us, serverless Kubernetes has delivered on the original promise of containers: greater focus on applications rather than infrastructure. Is serverless Kubernetes right for every workload? Absolutely not. Is it transforming how forward-thinking teams deploy applications? Without question.

References
- Kubernetes Virtual Kubelet documentation
- CNCF Serverless Landscape
- AWS Fargate for EKS
- Google Cloud Run for Anthos
- Azure Container Apps
- Knative documentation
When you only have a few data sources (e.g., PDFs, JSON) that are required in your generative AI application, building RAG might not be worth the time and effort. In this article, I'll show how you can use Google Gemini to retrieve context from three data sources. I'll also show how you can combine the context and ground results using Google search. This enables the end user to combine real-time information from Google Search with their internal data sources. Application Overview I'll only cover the code needed for Gemini and getting the data rather than building the entire application. Please note that this code is for demonstration purposes only. If you want to implement it, follow best practices such as using a key management service for API keys, error handling, etc. This application can answer any question related to events occurring in Philadelphia (I'm only using Philadelphia as an example because I found some good public data.) The data sources I used to send context to Gemini were a Looker report that has a few columns related to car crashes in Philadelphia for 2023, Ticketmaster events occurring for the following week, and weather for the following week. Parts of the code below were generated using Gemini 1.5 Pro and Anthropic Claude Sonnet 3.5. Data Sources I have all my code in three different functions for the API calls to get data in a file called api_handlers. App.py imports from api_handlers and sends the data to Gemini. Let's break down the sources in more detail. Application files Looker Looker is Google's enterprise BI capability. Looker is an API-first platform. Almost anything you can do in the UI can be achieved using the Looker SDK. In this example, I'm executing a Looker report and saving the results to JSON. Here's a screenshot of the report in Looker. Looker report Here's the code to get data from the report using the Looker SDK. Python def get_crash_data(): import looker_sdk from looker_sdk import models40 as models import os import json sdk = looker_sdk.init40("looker.ini") look_id = "Enter Look ID" try: response = sdk.run_look(look_id=look_id, result_format="json") print('looker done') return json.loads(response) except Exception as e: print(f"Error getting Looker data: {e}") return [] This code imports looker_sdk, which is required to interact with Looker reports, dashboards, and semantic models using the API. Looker.ini is a file where the Looker client ID and secret are stored. This document shows how to get API credentials from Looker. You get the look_id from the Looker's Look URL. A Look in Looker is a report with a single visual. After that, the run_look command executes the report and saves the data to JSON. The response is returned when this function is called. Ticketmaster Here's the API call to get events coming from Ticketmaster. 
Python def get_philly_events(): import requests from datetime import datetime, timedelta base_url = "https://app.ticketmaster.com/discovery/v2/events" start_date = datetime.now() end_date = start_date + timedelta(days=7) params = { "apikey": "enter", "city": "Philadelphia", "stateCode": "PA", "startDateTime": start_date.strftime("%Y-%m-%dT%H:%M:%SZ"), "endDateTime": end_date.strftime("%Y-%m-%dT%H:%M:%SZ"), "size": 50, "sort": "date,asc" } try: response = requests.get(base_url, params=params) if response.status_code != 200: return [] data = response.json() events = [] for event in data.get("_embedded", {}).get("events", []): venue = event["_embedded"]["venues"][0] event_info = { "name": event["name"], "date": event["dates"]["start"].get("dateTime", "TBA"), "venue": event["_embedded"]["venues"][0]["name"], "street": venue.get("address", {}).get("line1", "") } events.append(event_info) return events except Exception as e: print(f"Error getting events data: {e}") return [] I'm using the Ticketmaster Discovery API to get the name, date, venue, and street details for the next 7 days. Since this is an HTTP GET request, you can use the requests library to make the GET request. If the result is successful, the response gets saved as JSON to the data variable. After that, the code loops through the data, and puts the information in a dictionary called events_info, which gets appended to the events list. The final piece of data is weather. Weather data comes from NOAA weather API, which is also free to use. Python def get_philly_weather_forecast(): import requests from datetime import datetime, timedelta import json lat = "39.9526" lon = "-75.1652" url = f"https://api.weather.gov/points/{lat},{lon}" try: # Get API data response = requests.get(url, headers={'User-Agent': 'weatherapp/1.0'}) response.raise_for_status() grid_data = response.json() forecast_url = grid_data['properties']['forecast'] # Get forecast data forecast_response = requests.get(forecast_url) forecast_response.raise_for_status() forecast_data = forecast_response.json() weather_data = { "location": "Philadelphia, PA", "forecast_generated": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "data_source": "NOAA Weather API", "daily_forecasts": [] } # Process forecast data - take 14 periods to get 7 full days periods = forecast_data['properties']['periods'][:14] # Get 14 periods (7 days × 2 periods per day) # Group periods into days current_date = None daily_data = None for period in periods: period_date = period['startTime'][:10] # Get just the date part of period is_daytime = period['isDaytime'] # If we're starting a new day if period_date != current_date: # Save the previous day's data if it exists if daily_data is not None: weather_data["daily_forecasts"].append(daily_data) # Start a new daily record current_date = period_date daily_data = { "date": period_date, "forecast": { "day": None, "night": None, "high_temperature": None, "low_temperature": None, "conditions": None, "detailed_forecast": None } } # Update the daily data based on whether it's day or night period_data = { "temperature": { "value": period['temperature'], "unit": period['temperatureUnit'] }, "conditions": period['shortForecast'], "wind": { "speed": period['windSpeed'], "direction": period['windDirection'] }, "detailed_forecast": period['detailedForecast'] } if is_daytime: daily_data["forecast"]["day"] = period_data daily_data["forecast"]["high_temperature"] = period_data["temperature"] daily_data["forecast"]["conditions"] = period_data["conditions"] 
daily_data["forecast"]["detailed_forecast"] = period_data["detailed_forecast"] else: daily_data["forecast"]["night"] = period_data daily_data["forecast"]["low_temperature"] = period_data["temperature"] # Append the last day's data if daily_data is not None: weather_data["daily_forecasts"].append(daily_data) # Keep only 7 days of forecast weather_data["daily_forecasts"] = weather_data["daily_forecasts"][:7] return json.dumps(weather_data, indent=2) except Exception as e: print(f"Error with NOAA API: {e}") return json.dumps({ "error": str(e), "location": "Philadelphia, PA", "forecast_generated": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "daily_forecasts": [] }, indent=2) The API doesn't require a key but it does require latitude and longitude in the request. The API request is made and saved as JSON in forecast_data. The weather data is broken out by two periods in a day: day and night. The code loops through 14 times times and keeps only 7 days of forecast. I'm interested in temperature, forecast details, and wind speed. It also gets the high and low temperatures. Bringing It All Together Now that we have the necessary code to get our data, we will have to execute those functions and send them to Gemini as the initial context. You can get the Gemini API key from Google AI Studio. The code below adds the data to Gemini's chat history. Python from flask import Flask, render_template, request, jsonify import os from google import genai from google.genai import types from api_handlers import get_philly_events, get_crash_data, get_philly_weather_forecast from dotenv import load_dotenv # Load environment variables load_dotenv() app = Flask(__name__) # Initialize Gemini client client = genai.Client( api_key='Enter Key Here', ) # Global chat history chat_history = [] def initialize_context(): try: # Get API data events = get_philly_events() looker_data = get_crash_data() weather_data = get_philly_weather_forecast() # Format events data events_formatted = "\n".join([ f"- {event['name']} at {event['venue']} {event['street']} on {event['date']}" for event in events ]) # Create system context system_context = f"""You are a helpful AI assistant focused on Philadelphia. You have access to the following data that was loaded when you started: Current Philadelphia Events (Next 7 Days): {events_formatted} Crash Analysis Data: {looker_data} Instructions: 1. Use this event and crash data when answering relevant questions 2. For questions about events, reference the specific events listed above 3. For questions about crash data, use the analysis provided 4. For other questions about Philadelphia, you can provide general knowledge 5. Always maintain a natural, conversational tone 6. Use Google Search when needed for current information not in the provided data Remember: Your events and crash data is from system initialization and represents that point in time.""" # Add context to chat history chat_history.append(types.Content( role="user", parts=[types.Part.from_text(text=system_context)] )) print("Context initialized successfully") return True except Exception as e: print(f"Error initializing context: {e}") return False The final step is to get the message from the user and call Gemini's Flash 2.0 model. Notice how the model also takes a parameter called tools=[types.Tool(google_search=types.GoogleSearch())]. This is the parameter that uses Google search to ground results. If the answer isn't in one of the data sources provided, Gemini will do a Google search to find the answer. 
This is useful if you had information, such as events that weren't in Ticketmaster, but you wanted to know about them. I used Gemini to help get a better prompt to give during the initial context initialization. Python from flask import Flask, render_template, request, jsonify import os from google import genai from google.genai import types from api_handlers import get_philly_events, get_crash_data, get_philly_weather_forecast from dotenv import load_dotenv # Load environment variables load_dotenv() app = Flask(__name__) # Initialize Gemini client client = genai.Client( api_key='Enter Key Here', ) # Global chat history chat_history = [] def initialize_context(): """Initialize context with events and Looker data""" try: # Get initial data events = get_philly_events() looker_data = get_crash_data() weather_data = get_philly_weather_forecast() # Format events data to present better events_formatted = "\n".join([ f"- {event['name']} at {event['venue']} {event['street']} on {event['date']}" for event in events ]) # Create system context system_context = f"""You are a helpful AI assistant focused on Philadelphia. You have access to the following data that was loaded when you started: Philadelphia Events for the next 7 Days: {events_formatted} Weather forecast for Philadelphia: {weather_data} Crash Analysis Data: {looker_data} Instructions: 1. Use this events, weather, and crash data when answering relevant questions 2. For questions about events, reference the specific events listed above 3. For questions about crash data, use the analysis provided 4. For questions about weather, use the data provided 5. For other questions about Philadelphia, you can provide general knowledge 6. Use Google Search when needed for current information not in the provided data Remember: Your events and crash data is from system initialization and represents that point in time.""" # Add context to chat history chat_history.append(types.Content( role="user", parts=[types.Part.from_text(text=system_context)] )) print("Context initialized successfully") return True except Exception as e: print(f"Error initializing context: {e}") return False @app.route('/') def home(): return render_template('index.html') @app.route('/chat', methods=['POST']) def chat(): try: user_message = request.json.get('message', '') if not user_message: return jsonify({'error': 'Message required'}), 400 # Add user message to history chat_history.append(types.Content( role="user", parts=[types.Part.from_text(text=user_message)] )) # Configure generation settings generate_content_config = types.GenerateContentConfig( temperature=0.9, top_p=0.95, top_k=40, max_output_tokens=8192, tools=[types.Tool(google_search=types.GoogleSearch())], ) # Generate response using full chat history response = client.models.generate_content( model="gemini-2.0-flash", contents=chat_history, config=generate_content_config, ) # Add assistant response to history chat_history.append(types.Content( role="assistant", parts=[types.Part.from_text(text=response.text)] )) return jsonify({'response': response.text}) except Exception as e: print(f"Error in chat endpoint: {e}") return jsonify({'error': str(e)}), 500 if __name__ == '__main__': # Initialize context before starting print("Initializing context...") if initialize_context(): app.run(debug=True) else: print("Failed to initialize context") exit(1) Final Words I'm sure there are other ways to initialize context rather than using RAG. This is just one approach that also grounds Gemini using Google search.
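Once the Flask app is running, you can sanity-check the assistant from a separate terminal. The snippet below is a minimal sketch, assuming the app is served locally on Flask's default port (5000) and exposes the /chat route shown above; the example questions are only illustrative.

Python
import requests

# Minimal smoke test for the /chat endpoint defined above.
# Assumes the Flask app is running locally on the default port 5000.
BASE_URL = "http://127.0.0.1:5000"

questions = [
    "What events are happening in Philadelphia this weekend?",  # answered from the Ticketmaster context
    "Will the weather be good for any outdoor events next week?",  # forces the model to combine events + weather
]

for question in questions:
    reply = requests.post(f"{BASE_URL}/chat", json={"message": question}, timeout=60)
    reply.raise_for_status()
    print(question)
    print(reply.json()["response"])
    print("-" * 40)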
Disclaimer The stock data used in this article is entirely fictitious. It is purely for demo purposes. Please do not use this data for making any financial decisions. In a previous article, we saw the benefits of using Ollama locally for a RAG application. In this article, we'll extend our evaluation of Ollama by testing natural language (NL) queries against a database system, using LangChain's SQLDatabaseToolkit. SQL will serve as the baseline system for comparison as we explore the quality of results provided by OpenAI and Ollama. The notebook files used in this article are available on GitHub. Introduction LangChain's SQLDatabaseToolkit is a powerful tool designed to integrate NL processing capabilities with relational database systems. It enables users to query databases using NL inputs, using the capabilities of large language models (LLMs) to generate SQL queries dynamically. This makes it especially useful for applications where non-technical users or automated systems need to interact with structured data. A number of LLMs are well supported by LangChain. LangChain also provides support for Ollama. In this article, we'll evaluate how well LangChain integrates with Ollama and the feasibility of using the SQLDatabaseToolkit in a local setup. Create a SingleStore Cloud Account A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Free Shared Tier. Selecting the Starter Workspace > Connect > CLI Client will give us the details we need later, such as username, password, host, port and database. Create Database Tables For our test environment, we'll use SingleStore running in the Cloud as our target database system, and we'll connect securely to this environment using Jupyter notebooks running in a local system. From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio > Open SQL Editor. We'll create three tables, as follows: SQL CREATE TABLE IF NOT EXISTS tick ( symbol VARCHAR(10), ts DATETIME SERIES TIMESTAMP, open NUMERIC(18, 2), high NUMERIC(18, 2), low NUMERIC(18, 2), price NUMERIC(18, 2), volume INT, KEY(ts) ); CREATE TABLE IF NOT EXISTS portfolio ( symbol VARCHAR(10), shares_held INT, purchase_date DATE, purchase_price NUMERIC(18, 2) ); CREATE TABLE IF NOT EXISTS stock_sentiment ( headline VARCHAR(250), positive FLOAT, negative FLOAT, neutral FLOAT, url TEXT, publisher VARCHAR(30), ts DATETIME, symbol VARCHAR(10) ); We'll load the portfolio table with the following fictitious data: SQL INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES ('AAPL', 100, '2022-01-15', 150.25), ('MSFT', 50, '2021-12-10', 305.50), ('GOOGL', 25, '2021-11-05', 2800.75), ('AMZN', 10, '2020-07-20', 3200.00), ('TSLA', 40, '2022-02-18', 900.60), ('NFLX', 15, '2021-09-01', 550.00); For the stock_sentiment table, we'll download the stock_sentiment.sql.zip file and unpack it. We'll load the data into the table using a MySQL client, as follows: Shell mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql We'll use the values for <username>, <password>, <host>, <port> and <database> that we saved earlier. 
Finally, for the tick table, we'll create a pipeline: SQL CREATE PIPELINE tick AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker' BATCH_INTERVAL 45000 INTO TABLE tick FIELDS TERMINATED BY ',' (symbol,ts,open,high,low,price,volume); We'll adjust to get the earliest data: SQL ALTER PIPELINE tick SET OFFSETS EARLIEST; And test the pipeline: SQL TEST PIPELINE tick LIMIT 1; Example output: Plain Text +--------+---------------------+--------+--------+--------+--------+--------+ | symbol | ts | open | high | low | price | volume | +--------+---------------------+--------+--------+--------+--------+--------+ | MMM | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 | 38299 | +--------+---------------------+--------+--------+--------+--------+--------+ And then we'll start the pipeline: SQL START PIPELINE tick; After a few minutes, we'll check the quantity of data loaded so far: SQL SELECT COUNT(*) FROM tick; Local Test Environment From a previous article, we'll follow the same steps to set up our local test environment as described in these sections: Introduction. Use a Virtual Machine or venv.Create a SingleStore Cloud account. This step was completed above.Create a database. The Free Shared Tier already provides a database and we just need to note down the database name.Install Jupyter. Plain Text pip install notebook Install Ollama. Plain Text curl -fsSL https://ollama.com/install.sh | sh Environment variables. Plain Text export SINGLESTOREDB_URL="<username>:<password>@<host>:<port>/<database>" Replace <username>, <password>, <host>, <port> and <database> with the values for your environment. Plain Text export OPENAI_API_KEY="<OpenAI API Key>" Replace <OpenAI API Key> with your key.Launch Jupyter. Plain Text jupyter notebook We'll use the Jupyter notebooks from GitHub. These notebooks are configured to use OpenAI and Ollama. For Ollama, we'll use one of the LLMs listed with Tools support. We'll test the following four queries. First Query SQL SQL SELECT symbol, (MAX(high) - MIN(low)) AS volatility FROM tick GROUP BY symbol ORDER BY volatility ASC LIMIT 1; Natural Language Plain Text "For each stock symbol, calculate the volatility as the difference\n" "between the highest recorded price and the lowest recorded price over time.\n" "Which stock symbol has the least volatility?" Results SQL Plain Text +--------+------------+ | symbol | volatility | +--------+------------+ | FTR | 0.55 | +--------+------------+ OpenAI Plain Text The stock symbol with the least volatility is FTR, with a volatility of 0.55. Ollama Plain Text To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list. However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that. If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. 
These platforms can give you the most up-to-date information on stock prices.` For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE Second Query SQL SQL SELECT COUNT(*) FROM tick; Natural Language Plain Text "How many rows are in the tick table?" Results SQL Plain Text +----------+ | COUNT(*) | +----------+ | 22367162 | +----------+ OpenAI Plain Text There are 22,367,162 rows in the tick table. Ollama Plain Text The "tick" table has 3 rows. Third Query SQL SQL -- Step 1: Get the latest price for each symbol WITH latest_prices AS ( SELECT symbol, price FROM tick t1 WHERE ts = ( SELECT MAX(ts) FROM tick t2 WHERE t2.symbol = t1.symbol ) ) -- Step 2: Calculate the total portfolio value SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value FROM portfolio p, latest_prices lp WHERE p.symbol = lp.symbol; Natural Language Plain Text "Taking all the stock symbols from the portfolio table,\n" "and using the latest value for each stock symbol from the tick table,\n" "calculate the grand total value of all the shares listed in the portfolio table." Results SQL Plain Text +-----------------------+ | total_portfolio_value | +-----------------------+ | 44540.60 | +-----------------------+ OpenAI Plain Text $44,540.60 Ollama Plain Text I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question. Fourth Query SQL SQL SELECT ( SELECT ts FROM stock_sentiment WHERE symbol = 'AAPL' ORDER BY positive DESC LIMIT 1 ) AS sentiment_date, ( SELECT positive FROM stock_sentiment WHERE symbol = 'AAPL' ORDER BY positive DESC LIMIT 1 ) AS most_positive_sentiment, ( SELECT price FROM tick WHERE symbol = 'AAPL' ORDER BY ts DESC LIMIT 1 ) AS current_best_price; Natural Language Plain Text "Using the symbol AAPL, show me the date and the most\n" "positive sentiment in the stock sentiment table and the\n" "current best price for this symbol from the tick table." Results SQL Plain Text +---------------------+-------------------------+--------------------+ | sentiment_date | most_positive_sentiment | current_best_price | +---------------------+-------------------------+--------------------+ | 2020-05-28 00:00:00 | 0.331509 | 116.27 | +---------------------+-------------------------+--------------------+ OpenAI Plain Text On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27. Ollama Plain Text The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.` For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE Summary Analyzing the results, we see that SQL and OpenAI produce consistent outputs across all four queries. However, Ollama presents clear issues. 
A discussion thread on GitHub highlights that while an LLM should support tool calling, this functionality is not natively available in Ollama. If you are able to get this LangChain functionality working with Ollama in one of the supported LLMs, please send me a message, and I'll update the article and acknowledge your help.
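For readers who want to try reproducing the comparison, the basic wiring looks roughly like the sketch below. Treat it as a minimal, hedged outline rather than the exact notebook code: it assumes the SingleStore SQLAlchemy dialect (singlestoredb) is installed, that the SINGLESTOREDB_URL and OPENAI_API_KEY environment variables are set as described above, and that the model names are placeholders for whichever OpenAI model or tool-capable Ollama model you choose.

Python
import os
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama

# SINGLESTOREDB_URL is the "<username>:<password>@<host>:<port>/<database>" string set earlier.
db = SQLDatabase.from_uri("singlestoredb://" + os.environ["SINGLESTOREDB_URL"])

# Pick one backend at a time to compare results.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)   # placeholder OpenAI model
# llm = ChatOllama(model="llama3.1", temperature=0)    # placeholder Ollama model with Tools support

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

# Natural language question; the agent generates and runs the SQL itself.
result = agent.invoke({"input": "How many rows are in the tick table?"})
print(result["output"])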
Java is a programming language with many language features, specifications, and APIs. Even among experienced Java developers, being aware of all of these is quite rare. If a study were conducted, we might come across Java developers who have never worked with Threads, never used JPA, or never developed custom annotations. However, is there a Java developer who has worked with Java 8 or later but has never used the Stream API? I highly doubt it. Gatherers is a powerful extension of the Stream API that introduces support for customized intermediate operations. Initially introduced as a preview feature in JDK 22, it became a standard feature in JDK 24. What Are Gatherers? Gatherers were developed to model intermediate operations in the Stream API. Just as a collector models a terminal operation, a gatherer is an object that models an intermediate operation. Gatherers support the characteristics of intermediate operations — they can push any number of elements to the stream they produce, maintain an internal mutable state, short-circuit a stream, delay consumption, be chained, and execute in parallel. For this reason, as stated in JEP 485: In fact every stream pipeline is, conceptually, equivalent to source.gather(…).gather(…).gather(…).collect(…) Java public interface Gatherer<T, A, R> { … } T represents the input element.A represents the potential mutable state object.R represents the output that will be pushed downstream. A gatherer is built upon four key elements: Java Supplier<A> initializer(); Integrator<A, T, R> integrator(); BinaryOperator<A> combiner(); BiConsumer<A, Downstream<? super R>> finisher(); Initializer – A function that produces an instance of the internal intermediate state.Integrator – Integrates a new element into the stream produced by the Gatherer.Combiner – A function that accepts two intermediate states and merges them into one. Supporting parallel execution.Finisher – A function that allows performing a final action at the end of input elements. Among these four elements, only the integrator is mandatory because it has the role of integrating a new element into the stream produced by the Gatherer. The other elements may or may not be required, depending on the operation you intend to model, making them optional. Creating a Gatherer Gatherers are created using factory methods, or you can implement the Gatherer interface. Depending on the operation you want to model, you can use the overloaded variants of Gatherer.of and Gatherer.ofSequential. Java var uppercaseGatherer = Gatherer.<String, String>of((state, element, downstream) -> downstream.push(element.toUpperCase())); The example gatherer above calls toUpperCase on an input element of type String and pushes the result downstream. This gatherer is equivalent to the following map operation. Java Stream.of("a", "b", "c", "d", "e", "f", "g") .map(String::toUpperCase) .forEach(System.out::print); The Stream interface now includes a method called gather(), which accepts a Gatherer parameter. We can use it by passing the gatherer we created. Java Stream.of("a", "b", "c", "d", "e", "f", "g") .gather(uppercaseGatherer) .forEach(System.out::print); Built-In Gaterers The java.util.stream.Gatherers class is a factory class that contains predefined implementations of the java.util.stream.Gatherer interface, defining five different gatherers. windowFixed. It is a many-to-many gatherer that groups input elements into lists of a supplied size, emitting the windows downstream when they are full.windowSliding. 
It is a many-to-many gatherer that groups input elements into lists of a supplied size. After the first window, each subsequent window is created from a copy of its predecessor by dropping the first element and appending the next element from the input stream.fold. It is a many-to-one gatherer that constructs an aggregate incrementally and emits that aggregate when no more input elements exist.scan. It is a one-to-one gatherer that applies a supplied function to the current state and the current element to produce the next element, which it passes downstream.mapConcurrent. It is a one-to-one gatherer that invokes a supplied function for each input element concurrently, up to a supplied limit. The function executes in Virtual Thread. All of the above gatherers are stateful. Fold and Scan are very similar to the Stream reduce operation. The key difference is that both can take an input of type T and produce an output of type R, and their identity element is mandatory, not optional. Create Your Own Gatherer Let’s see how we can write our custom gatherer using a real-world scenario. Imagine you are processing a system’s log stream. Each log entry represents an event, and it is evaluated based on certain rules to determine whether it is anomalous. The rule and scenario are as follows: Rule. An event (log entry) is considered anomalous if it exceeds a certain threshold or contains an error.Scenario. If an error occurs and is immediately followed by several anomalous events (three in a row, e.g), they might be part of a failure chain. However, if a “normal” event appears in between, the chain is broken. In this case, we can write a gatherer that processes a log stream and returns only the uninterrupted anomalous events. INFO, ERROR, ERROR, INFO, WARNING, ERROR, ERROR, ERROR, INFO, DEBUG Let’s assume that the object in our log stream is structured as follows. Java class LogWrapper { enum Level{ INFO, DEBUG, WARNING, ERROR } private Level level; private String details; } The object has a level field representing the log level. The details field represents the content of the log entry. We need a stateful gatherer because we must retain information about past events to determine whether failures occur consecutively. To achieve this, the internal state of our gatherer can be a List<LogWrapper> Java static Supplier<List<LogWrapper>> initializer() { return ArrayList::new; } The object returned by the initializer() corresponds to the second parameter explained earlier in the type parameters of the Gatherer interface. Java static Integrator<List<LogWrapper>, LogWrapper, String> integrator(final int threshold) { return ((internalState, element, downstream) -> { if(downstream.isRejecting()){ return false; } if(element.getLevel().equals(LogWrapper.Level.ERROR)){ internalState.add(element); } else { if(internalState.size() >= threshold){ internalState.stream().map(LogWrapper::getDetails).forEach(downstream::push); } internalState.clear(); } return true; }); } The integrator will be responsible for integrating elements into the produced stream. The third parameter of the integrator represents the downstream object. We check whether more elements are needed by calling the isRejecting(), which determines if the next stage no longer wants to receive elements. If this condition is met, we return false. 
If the integrator returns false, it performs a short-circuit operation similar to intermediate operations like allMatch, anyMatch, and noneMatch in the Stream API, indicating that no more elements will be integrated into the stream. If isRejecting() returns false, we check whether the level value of our stream element, LogWrapper, is ERROR. If the level is ERROR, we add the object to our internal state. If the level is not ERROR, we then check the size of our internal state. If the size exceeds or is equal to the threshold, we push the LogWrapper objects stored in the internal state downstream. If not, we don’t. I want you to pay attention to two things here. Pushing an element downstream or not, as per the business rule, is similar to what filter() does. Accepting an input of type LogWrapper and producing an output of type String is similar to what map() does. After that, according to our business rule, we clear the internal state and return true to allow new elements to be integrated into the stream. Java static BinaryOperator<List<LogWrapper>> combiner() { return (_, _) -> { throw new UnsupportedOperationException("Cannot be parallelized"); }; } To prevent our gatherer from being used in a parallel stream, we define a combiner, even though it is not strictly required. This is because our gatherer is inherently designed to work as expected only in a sequential stream. Java static BiConsumer<List<LogWrapper>, Downstream<? super String>> finisher(final int threshold) { return (state, downstream) -> { if(!downstream.isRejecting() && state.size() >= threshold){ state.stream().map(LogWrapper::getDetails).forEach(downstream::push); } }; } Finally, we define a finisher to push any remaining stream elements that have not yet been emitted downstream. If isRejecting() returns false, and the size of the internal state is greater than or equal to the threshold, we push the LogWrapper objects stored in the internal state downstream. When we use this gatherer on data: Plain Text ERROR, Process ID: 191, event details ... INFO, Process ID: 216, event details ... DEBUG, Process ID: 279, event details ... ERROR, Process ID: 312, event details ... WARNING, Process ID: 340, event details ... ERROR, Process ID: 367, event details ... ERROR, Process ID: 389, event details ... INFO, Process ID: 401, event details ... ERROR, Process ID: 416, event details ... ERROR, Process ID: 417, event details ... ERROR, Process ID: 418, event details ... WARNING, Process ID: 432, event details ... ERROR, Process ID: 444, event details ... ERROR, Process ID: 445, event details ... ERROR, Process ID: 446, event details ... ERROR, Process ID: 447, event details ... Similar to the one above, we get the following result: Plain Text Process ID: 416, event details … Process ID: 417, event details … Process ID: 418, event details … Process ID: 444, event details … Process ID: 445, event details … Process ID: 446, event details … Process ID: 447, event details … The code example is accessible in the GitHub repository. Conclusion Gatherers is a new and powerful API that enhances the Stream API by modeling intermediate operations and allowing the definition of custom intermediate operations. A gatherer supports the features that intermediate operations have; it can push any number of elements to the resulting stream, maintain an internal mutable state, short-circuit a stream, delay consumption, be chained, and execute in parallel. References JEP 485cr.openjdk.org
In Couchbase, memory management in the Query Service is key to keeping the service efficient and responsive, especially as the service handles an increasing number of queries simultaneously. Without proper memory management, things can go awry — greedy queries can hog memory, and the combined memory usage of multiple concurrent queries can overwhelm the service, leading to degraded performance. Fortunately, the Query Service has several features that allow users to manage the memory usage of queries and the overall service. This blog will explore these features in detail: Per Request Memory QuotaSoft Memory LimitNode-wide Document Memory Quota Per Request Memory Quota A significant portion of memory usage of the Query Service comes from transient values, which can include documents or computed values. The memory used by these transient values will be referred to as "document memory" in the blog. The Query Service receives documents from the data service as an encoded byte stream. However, the memory used by the value associated with the document can be much larger than the size of the original stream. This is because the Query Service decodes the stream into a structure that can be large as it must store all fields, values, and any nested objects. The Query Service is optimized for performance and not for compactness. What happens if a resource-intensive query comes along and starts consuming a large amount of document memory? It can end up hogging memory and cause other queries to stall. How do we prevent a "greedy" query from affecting the execution of other active queries? This is exactly where the per-request memory quota feature comes in! Since Couchbase 7.0, the Query Service provides a setting called "memory quota" to limit the maximum amount of document memory that a query request can use at any given time during its execution. This per-request memory quota works by terminating a query if it exceeds its quota, while allowing all other active queries to continue execution. This ensures that only the greedy query is stopped, preventing it from affecting the performance of the other queries. The memory quota does not correspond to OS memory use. It only accounts for document memory usage and not for any memory used in the heap, stack, execution operators, etc. How Does Memory Quota Work? The per-request memory quota can be thought of as configuring a document memory pool for a query request. The size of the pool is determined by the value of the query’s memory quota. When the query requires a document/ value, it allocates the size of the document/value from this pool. When the value/ document is no longer needed, the allocated size is returned back to the pool for reuse by the request. At any given moment, the total amount of document memory being used by the query request cannot exceed the size of its pool, i.e., its memory quota. If the query tries to use more document memory than what is available in its pool, the request will be terminated, and an error will be returned. It is important to note that the Query Service is highly parallelized, and operators can run simultaneously. This means that whether a query exceeds its memory quota can vary between runs. This is because, depending on the specifics of each run, the amount of document memory that is being used ( and hence allocated from its request pool ) can vary, even at the same stage of execution. How to Configure the Memory Quota? The per-request memory quota can be set at a cluster, node, and request level. 
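Before walking through each level in detail below, here is a minimal sketch of what setting the quota looks like in practice through the REST endpoints. Everything concrete in the sketch — the host name, the default ports (8091 for cluster-level settings, 8093 for the Query Service), the credentials, the quota values, and the example bucket — is an assumption about a default installation, not taken from this article.

Python
import requests
from requests.auth import HTTPBasicAuth

# Hedged sketch: setting the per-request memory quota at each level.
auth = HTTPBasicAuth("Administrator", "password")  # placeholder credentials

# Cluster level: queryMemoryQuota (MiB) is persisted for every query node.
requests.post(
    "http://localhost:8091/settings/querySettings",
    data={"queryMemoryQuota": 1024},
    auth=auth,
).raise_for_status()

# Node level: memory-quota (MiB) applies to this query node only and is not persisted.
requests.post(
    "http://localhost:8093/admin/settings",
    json={"memory-quota": 1024},
    auth=auth,
).raise_for_status()

# Request level: memory_quota applies to this single query request.
resp = requests.post(
    "http://localhost:8093/query/service",
    data={
        "statement": "SELECT COUNT(*) FROM `travel-sample`;",  # placeholder query
        "memory_quota": 128,
    },
    auth=auth,
)
print(resp.json())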
Unit: MiB. Default: 0, i.e., there is no limit on how much document memory a request can use.

Cluster Level
Set the memory quota for every query node in the cluster with the queryMemoryQuota cluster-level setting. The value at the cluster level is persisted and, when set, overwrites the node-level setting for every query node. Learn how to set a cluster-level setting here.

Node Level
Set the memory quota for a particular query node with the memory-quota node-level setting. The value set at the node level is the default memory quota for all query requests executed on the node. The node-level value is not persisted and is overwritten when the cluster-level setting is modified. Learn how to set a node-level setting here.

Request Level
Set the memory quota for a particular query request with the memory_quota parameter. The request-level parameter overrides the value of the node-level setting. However, if the node-level setting is greater than zero, the request-level value is limited by the node-level value. Learn how to set a request-level parameter here.

Soft Memory Limit of the Query Service
Now that we have explored how to limit the document memory usage of a query, you might be wondering: is there a way to limit the memory usage of the Query Service? The Query Service has no setting to enforce a hard limit on the memory usage of the service. This is because the programming language used to develop SQL++ does not provide a mechanism to enforce a hard limit on its runtime memory usage. But it does provide a mechanism to adjust the soft memory limit… Hence, in Couchbase 7.6.0, the "node quota" setting was introduced to adjust the soft memory limit of the Query Service! Since this is a soft limit, there is no guarantee that the memory usage of the Query Service will always strictly stay below it or that out-of-memory conditions will not occur. However, an effort is made to maintain the Query Service's memory usage below this limit by running the garbage collector (GC) more frequently when this limit is crossed or approached closely.

Important Note
If the memory usage stays close to the soft limit, the GC runs aggressively, which can cause high CPU utilization.

How to Configure the Node Quota?
The node quota can be set at the cluster and node level.

Unit: MiB. Default: 0. Minimum: 1. While the minimum value of the node quota is 1 MiB, please set the node quota to practical values depending on the workloads and the system's capabilities.

Cluster Level
Set the node quota for every query node in the cluster with the queryNodeQuota cluster-level setting. The value at the cluster level is persisted and, when set, overwrites the node-level setting for every query node. Learn how to set a cluster-level setting here.

Important Note
One way of configuring this setting cluster-wide is by using the Couchbase Web Console. In the Web Console, this can be configured under "Memory Quota per server node" on the Settings page. This section is specifically for configuring the Query Service's cluster-level node quota and must not be confused with setting the cluster-level memory quota setting.

Node Level
Set the node quota for a particular query node with the node-quota node-level setting. The value set at the node level applies only to that query node. The node-level value is not persisted and is overwritten when the cluster-level setting is modified. Learn how to set a node-level setting here.

How to Configure the Soft Memory Limit?
How to Configure the Soft Memory Limit? The soft memory limit of the Query Service is set using the value of the node quota. If the node quota is not set, a default value is calculated. Node Quota If the node quota is set for a node, this is the soft memory limit. The soft memory limit will be capped at a maximum allowable value, which is calculated using these steps: 1. The difference between the total system RAM and 90% of the total system RAM is calculated. Plain Text Total System RAM - (0.9 * Total System RAM) 2. If the difference is greater than 8 GiB, the maximum soft memory limit will be: Plain Text Total System RAM - 8 GiB 3. If the difference is 8 GiB or less, the maximum soft memory limit will be set to 90% of the total system RAM. If the node quota exceeds the calculated maximum, then the soft memory limit is silently set to the maximum. Default If the node quota setting is not set for a node, the default soft memory limit is simply the maximum allowable value calculated with the same steps above. Node-Wide Document Memory Quota What if a workload has a query that requires a large amount of memory to execute? Enforcing a per-request memory quota might not be ideal, as this query might frequently be terminated for exceeding its quota. How can this query successfully execute while still protecting the Query Service from excessive memory usage? Consider another scenario with multiple queries executing concurrently, each with a per-request memory quota set. In this scenario, the memory usage of the service has become very high, but the document memory use of the queries remains below their respective quotas, so no query is terminated. As a result, the overall memory usage of the Query Service remains high, causing problems. How can this be addressed? Starting in Couchbase 7.6.0, the Query Service has a mechanism to limit the cumulative amount of document memory that active queries can use! The introduction of a node-wide document memory quota attempts to address these challenges. How Does Node-Wide Quota Work? The node-wide quota can be thought of as configuring a document memory pool for the entire Query Service on a node. When the node quota is set, a "memory session" is created for each request. By default, this session starts with an initial size of 1 MiB. When the node-wide quota is configured, 1 MiB is allocated for every servicer and subtracted from the node-wide pool. This default allocation guarantees that each servicer has at least a minimum amount of reserved space, ensuring that incoming requests can always be serviced. When a request requires a value/document, the size of the value/document is allocated from its session. If the session does not have enough memory for this allocation, it will grow in minimum increments of 1 MiB to accommodate the allocation request. The additional memory required for this growth is allocated from the node-wide pool. If an active request's memory session attempts to grow beyond the available remaining memory in the node-wide pool, the request will be stopped, and an error will be returned. Once the request no longer needs the value/document, it returns the allocated size back to its session.
The session's memory (excluding the 1 MiB of the initial servicer reservation) is only returned to the node-wide pool once the request's execution completes. At any time, the total size of all memory sessions cannot exceed the size of the node-wide quota. It is important to understand that this memory session is not to be confused with the per-request pool that is configured when the memory quota is set for a request. The two are not the same. Both a node-wide quota and a per-request memory quota can be configured; read the "Configuring Both Per-Request Memory Quota and Node-Wide Document Quota" section below to understand more. In this way, the node-wide quota places a limit on the amount of document memory that is being used by all active requests. The node-wide document quota can only be configured when the node quota setting is explicitly set for a node. The size of this quota is calculated using two Query settings: "node quota" (explored in an earlier section) and "node quota value percent." How to Configure Node Quota Value Percent? The node quota value percent is the percentage of the node quota dedicated to tracked value content memory/"document memory" across all active requests. The node quota value percent can be set at the cluster and the node level. Unit: percent. Default: 67. Minimum: 0. Maximum: 100. Cluster Level Set the node quota value percentage for every query node in the cluster with the queryNodeQuotaValPercent cluster-level setting. The value at the cluster level is persisted and, when set, overwrites the node-level setting for every query node. Learn how to set a cluster-level setting here. Node Level Set the node quota value percentage for a particular query node with the node-quota-val-percent node-level setting. The value set at the node level applies only to that query node. It is not persisted and is overwritten when the cluster-level setting is modified. Learn how to set a node-level setting here. How to Configure the Node-Wide Document Memory Quota? The size of the node-wide document memory quota is calculated relative to the node quota. The node quota must be set for the node-wide document memory quota to be configured. The size of the node-wide pool is calculated using the following steps: 1. Calculate the percentage of the node quota dedicated to tracking document memory across all active queries using the following formula: Plain Text node-quota * node-quota-val-percent / 100 2. Calculate the minimum allowable value for the node-wide document memory quota. The execution of SQL++ statements is handled by "servicers." When a query is to be executed, it is assigned to a servicer thread that is responsible for its execution. The Query Service is configured with a number of servicers to handle incoming requests. There are two types of servicers: "unbounded servicers" and "plus servicers." The Query engine reserves 1 MiB of document memory for each servicer. Hence, the default initial value of each request's memory session is 1 MiB. This means that the baseline document memory usage will be the total number of unbounded and plus servicers, measured in MiB. Therefore, the size of the node-wide document memory quota must be at least equal to the number of servicers, measured in MiB. Formula 1 Plain Text Quota reserved for servicers = (number of unbounded servicers + number of plus servicers) MiB Learn more about unbounded servicers here and plus servicers here. 3.
The size of the node-wide document memory quota is calculated using the following formula: Formula 2 Plain Text Size of node-wide document memory quota = MAX(node-quota * node-quota-val-percent / 100, Quota reserved for servicers) The quota reserved for the servicers is calculated using Formula 1. This is the maximum allowable size of all memory sessions across active requests and includes the initial reservation for each servicer. Calculating Available Quota in the Pool for Document Memory Growth The initial reservation for the servicers is deducted from the node-wide document memory quota for the node. Any remaining space in the node-wide memory pool can be used by each active request to grow its document memory usage beyond its initial 1 MiB reservation. This remaining quota available for document memory growth is calculated using the following formula: Formula 3 Plain Text Size of node-wide document memory quota available for memory sessions of active requests to grow = Size of node-wide document memory quota - Quota reserved for servicers The size of the node-wide document memory quota is calculated in Formula 2.The quota reserved for the servicers is calculated using Formula 1. It is important to set appropriate node-quota and node-quota-val-percent values that are practical and suitable for workloads. The next section explores an example to illustrate the importance of this. Example Consider a query node with 32 unbounded servicers and 128 plus servicers. The Administrator sets the node quota to 10 MiB. The node-quota-val-percent is the default value of 67. Using Formula 2 to calculate the size of the node-wide document memory quota: Plain Text Size of node-wide document memory quota = MAX(node-quota * node-quota-val-percent / 100, Quota reserved for servicers) = MAX( node-quota * node-quota-val-percent / 100, (number of unbounded servicers + number of plus servicers) MiB ) = MAX ( 10 * 67 / 100 MiB, (32+128) MiB ) = MAX ( 6.7 MiB, 160 MiB) = 160 MiB Using Formula 3 to calculate the amount of document memory available in the node-wide pool available for requests’ memory growth: Plain Text = Size of node-wide document memory quota - (number of unbounded servicers + number of plus servicers) MiB = 160 MiB - (32+128) MiB = 160 MiB - 160 MiB = 0 MiB This means that there is no room for document memory growth of requests beyond their 1 MiB initial reservation. In other words, each request is limited to using a maximum of 1 MiB of document memory. Additionally, the node quota of 10 MiB is very small, and garbage collection will likely be forced to run frequently, causing high CPU utilization. Reporting Document Memory Figures If the memory quota was set for a request or a node-wide document memory pool configured, information about the same will be reported in several SQL++ features which will be explored below. This information is helpful for debugging. 1. Response Output Metrics The usedMemory field in the metrics section of the query’s response reports the high-water mark (HWM) document memory usage of the query in bytes. The Query Service is highly parallelized, and operators can run simultaneously. As a result, the usedMemory figures can vary between runs for the same query. This is because, depending on the specifics of each run, the HWM document memory usage can be different. 
A sample metrics section of a query response: JSON "metrics": { "elapsedTime": "19.07875ms", "executionTime": "18.909916ms", "resultCount": 10000, "resultSize": 248890, "serviceLoad": 2, "usedMemory": 341420 } Controls Section If the controls Query setting is enabled, and the memory quota configured for the request, the memoryQuota field in the controls section of the query's response reports the value of the memory quota set. A sample controls section of a query response: JSON "controls": { "scan_consistency": "unbounded", "use_cbo": "true", "memoryQuota": "25", "n1ql_feat_ctrl": "0x4c", "disabledFeatures":[ "(Reserved for future use) (0x40)", "Encoded plans (0x4)", "Golang UDFs (0x8)" ], "stmtType": "SELECT" } Learn more about the controls setting here. 2. System Keyspaces In a request's entry in the system:completed_requests and system:active_requests system keyspaces: The usedMemory field is the HWM document memory usage of the query in bytes. The Query Service is highly parallelized, and operators can run simultaneously. As a result, the usedMemory figures can vary between runs for the same query. This is because, depending on the specifics of each run, the HWM document memory usage can be different. The memoryQuota field is the value of the memory quota set for the request. Learn more about system:completed_requests here, and system:active_requests here. Configuring Both Per-Request Memory Quota and Node-Wide Document Quota As described in the "Per Request Memory Quota" section, if a request has a memory quota configured, the maximum amount of document memory that it can use at any given time during its execution is limited by the memory quota. Additionally, as explained in the "Node-Wide Document Memory Quota" section, when the node quota and a node-wide document memory quota are configured, each request gets its own "memory session." Any growth in the size of these sessions is allocated from the node-wide document memory quota. If a node-wide document memory quota is configured and a request has a memory quota set, the document memory usage of the query request is limited by both quotas. How Would a Document/Value Allocation be Performed? When the request requires a document/value, the following steps are performed during the allocation process: 1. Memory Session Allocation The request first tries to allocate memory for the document from its memory session. If there is enough space in the session, the allocation is successful. If there is insufficient space in the session, the session attempts to grow its size by allocating from the node-wide document memory quota (i.e., from the "node-wide document memory pool"). If there is not enough space for the session's growth in the node-wide pool, the request will be stopped, and an error will be returned. 2. Request Memory Quota Allocation If the session allocation is successful, the request will attempt to allocate memory for the document from its memory quota (i.e., from its "request memory pool"). If there is sufficient space left in its memory quota, the allocation succeeds, and the request proceeds. If there is not enough remaining space in the memory quota, the request will fail, and an error will be returned. Monitoring With system:vitals The system keyspace system:vitals contains important information about each query node in the cluster, including information related to memory and CPU usage, garbage collection, and much more. Users can use this system keyspace to monitor the health and vitals of the query nodes.
There are two ways to access this information: 1. Query the system:vitals keyspace using SQL++. SQL SELECT * FROM system:vitals; 2. Accessing the vitals per node using the Query Service's /admin/vitals endpoint. Plain Text curl -u $USER:$PASSWORD $QUERY_NODE_URL/admin/vitals Below is a sample of a record in system:vitals for a query node. JSON { "bucket.IO.stats": { "travel-sample": { "reads": 52090 } }, "cores": 12, "cpu.sys.percent": 0.005, "cpu.user.percent": 0.0056, "ffdc.total": 0, "gc.num": 64352224, "gc.pause.percent": 0, "gc.pause.time": "4.479336ms", "healthy": true, "host.memory.free": 321028096, "host.memory.quota": 10485760000, "host.memory.total": 38654705664, "host.memory.value_quota": 7025459200, "load": 0, "loadfactor": 6, "local.time": "2024-12-05T17:21:25.609+05:30", "memory.system": 584662408, "memory.total": 3884613696, "memory.usage": 25302328, "node": "127.0.0.1:8091", "node.allocated.values": 613916, "node.memory.usage": 251658240, "process.memory.usage": 0, "process.percore.cpupercent": 0, "process.rss": 629309440, "process.service.usage": 0, "request.active.count": 1, "request.completed.count": 41, "request.per.sec.15min": 0.0221, "request.per.sec.1min": 0.0136, "request.per.sec.5min": 0.017, "request.prepared.percent": 0, "request.queued.count": 0, "request_time.80percentile": "63.969209ms", "request_time.95percentile": "74.865437ms", "request_time.99percentile": "150.904625ms", "request_time.mean": "37.019323ms", "request_time.median": "39.115791ms", "servicers.paused.count": 0, "servicers.paused.total": 0, "temp.hwm": 0, "temp.usage": 0, "total.threads": 411, "uptime": "12m48.431007375s", "version": "7.6.0-N1QL" } Learn more about the Vitals here and the system:vitals keyspace here. How Does the Query Service Trigger the Garbage Collector? Starting in 7.6.0, the Query Service routinely checks if the garbage collector (GC) has run in the last 30 seconds. If it has not, the GC is triggered to run. During this check, the amount of free system memory is also monitored. If the amount of free memory is less than 25%, an attempt is made to return as much memory to the OS as possible. Run Garbage Collector on Demand Starting in Couchbase 7.6.0, the Query Service provides a REST endpoint /admin/gc that can be invoked to run the garbage collector. This endpoint can be invoked to trigger a GC run in an attempt to reduce memory utilization. To force a GC run, issue a GET request to the API. Plain Text curl -u $USER:$PASSWORD $QUERY_NODE_URL/admin/gc To force a GC run and attempt to return as much memory to the OS as possible, issue a POST request to the API. Plain Text curl -X POST -u $USER:$PASSWORD $QUERY_NODE_URL/admin/gc Learn more about this endpoint here. Important Note Aggressively running the garbage collector can cause high CPU utilization. Helpful References Couchbase blog on per-request memory quotaCouchbase documentation for system keyspacesCouchbase documentation for configuring cluster, node, request level Query settings
If you would like to run your own solution without any costs and get familiar with serverless architecture, I'd encourage you to look at Cloudflare. Recently, I tried to assemble all the pieces to make Next.js application authorization work on Cloudflare and faced a lot of issues. The examples provided on the official site and demos from other sources worked only partially. After some digging and combining the pieces together, it finally started working. You could also look at the source code and working demo. For the rest who would like to know what to do to get your own setup up and running, let's begin. First, create your account on Cloudflare. It's completely free and very straightforward to do. Now, you can clone the demo project with: Plain Text git clone You can find the instructions on what to do in the Readme.md file. There are multiple ways to run the project. If you want to run it with hot reload, you'd better use: Plain Text pnpm run dev But be aware that if the project runs with this command, it doesn't guarantee it will work on Cloudflare. To be sure that it will work when you deploy, you can verify it locally with: Plain Text pnpm run preview But in this mode, you can't enjoy hot-reloading capabilities, and the build time is significantly longer. So, I would not use it during active development. In Cloudflare, you can run your applications as Pages and also as Workers. I chose to run it as a Page because Workers have a limitation of 3 MB on the free tier. With a Next.js application, it's very easy to reach this threshold. It also means that if you'd like to add a backend API, you have to create a Worker application and create a binding between the Worker and the Page. Now, about the quirks you may come across when trying to create the same application from other sources: In auth.ts, you'll find an example that configures it as: TypeScript const authResult = async (): Promise<NextAuthResult> => { return NextAuth({ providers: [ Resend({ apiKey: (await getCloudflareContext()).env.AUTH_RESEND_KEY, from: (await getCloudflareContext()).env.AUTH_EMAIL_FROM, }), ], adapter: D1Adapter((await getCloudflareContext()).env.DB), }); }; Notice the line: Plain Text D1Adapter((await getCloudflareContext()).env.DB), This will not work! The latest version of NextAuth doesn't allow this, as it treats it as an invocation at the root of the file. Also, be careful with getting the context. In the example above: For Workers, it is getCloudflareContext(). For Pages, it is getRequestContext(). You can't mix them; otherwise, it will be borked. The next issue is with NextAuth when you deal with signing in the user and redirecting them to a page. I've seen many suggestions on Stack Overflow, but what seems to work best is: TypeScript try { await signIn("credentials", { email: formData.get("email") as string, }); } catch (error) { if (error?.hasOwnProperty("digest")) { throw error; } } What's going on here? signIn actually always throws an exception! It doesn't really matter whether the user signed in successfully or not. Re-throwing the error that carries a digest (Next.js implements redirects as thrown errors with a digest property) lets a successful sign-in redirect the page as it should. In the case of failure, we use extra logic instead. As this is a server-side component, there are not many options for displaying the error to users. I chose to pass the information via a cookie, because useState, useEffect, and other hooks work only in client-side components.
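To make that concrete, below is a minimal sketch of the approach described above, assuming a server action called signInAction, a cookie named auth_error, and a signIn export from the project's auth setup ("@/auth"); none of these names come from the demo repo, and the exact cookie handling may differ from what the demo does.
TypeScript
"use server";

import { cookies } from "next/headers";
import { signIn } from "@/auth"; // assumption: the signIn exported by the project's NextAuth setup

export async function signInAction(formData: FormData) {
  try {
    await signIn("credentials", { email: formData.get("email") as string });
  } catch (error) {
    // Next.js implements redirects as thrown errors carrying a digest,
    // so re-throw those to let a successful sign-in redirect as intended.
    if (error instanceof Error && "digest" in error) {
      throw error;
    }
    // Otherwise, surface the failure to the UI via a short-lived cookie
    // that a client component can read and display.
    (await cookies()).set("auth_error", "Sign-in failed", { maxAge: 60, path: "/" });
  }
}
A client component can then read the auth_error cookie (for example, via document.cookie or a small cookie helper) and render the message, since React hooks are not available in server components.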
One more thing to be aware of: I was very surprised to see that it matters which package manager you use. When I used npm, I saw very weird errors. Also, do not mix invocations of different package managers (first pnpm, then npm or yarn); it can leave your application in a broken state afterward. Happy coding, and let me know if you'd like to share your experience with this tech stack and the challenges you faced along the way!
DZone events bring together industry leaders, innovators, and peers to explore the latest trends, share insights, and tackle industry challenges. From Virtual Roundtables to Fireside Chats, our events cover a wide range of topics, each tailored to provide you, our DZone audience, with practical knowledge, meaningful discussions, and support for your professional growth. DZone Events Happening Soon Below, you'll find upcoming events that you won't want to miss. Unpacking the 2025 Developer Experience Trends Report: Insights, Gaps, and Putting it into Action Date: March 19, 2025; Time: 1:00 PM ET Register for Free! We've just seen the 2025 Developer Experience Trends Report from DZone, and while it shines a light on important themes like platform engineering, developer advocacy, and productivity metrics, there are some key gaps that deserve attention. Join Cortex Co-founders Anish Dhar and Ganesh Datta for a special webinar, hosted in partnership with DZone, where they'll dive into what the report gets right—and challenge the assumptions shaping the DevEx conversation. Their take? Developer experience is grounded in clear ownership. Without ownership clarity, teams face accountability challenges, cognitive overload, and inconsistent standards, ultimately hampering productivity. Don't miss this deep dive into the trends shaping your team's future. Accelerating Software Delivery: Unifying Application and Database Changes in Modern CI/CD Date: March 25, 2025; Time: 1:00 PM ET Register for Free! Want to speed up your software delivery? It's time to unify your application and database changes. Join us for Accelerating Software Delivery: Unifying Application and Database Changes in Modern CI/CD, where we'll teach you how to seamlessly integrate database updates into your CI/CD pipeline. Petabyte Scale, Gigabyte Costs: Mezmo's ElasticSearch to Quickwit Evolution Date: March 27, 2025; Time: 1:00 PM ET Register for Free! For Mezmo, scaling their infrastructure meant facing significant challenges with ElasticSearch. That's when they made the decision to transition to Quickwit, an open-source, cloud-native search engine designed to handle large-scale data efficiently. This is a must-attend session for anyone looking for insights on improving search platform scalability and managing data growth. Best Practices for Building Secure Data Pipelines with Apache Airflow® Date: April 15, 2025; Time: 1:00 PM ET Register for Free! Security is a critical but often overlooked aspect of data pipelines. Effective security controls help teams protect sensitive data, meet compliance requirements with confidence, and ensure smooth, secure operations. Managing credentials, enforcing access controls, and ensuring data integrity across systems can become overwhelming—especially while trying to keep Airflow environments up-to-date and operations running smoothly. Whether you're working to improve access management, protect sensitive data, or build more resilient pipelines, this webinar will provide the knowledge and best practices to enhance security in Apache Airflow. Generative AI: The Democratization of Intelligent Systems Date: April 16, 2025; Time: 1:00 PM ET Register for Free! Join DZone, alongside industry experts from Cisco and Vertesia, for an exclusive virtual roundtable exploring the latest trends in GenAI.
This discussion will dive into key insights from DZone's 2025 Generative AI Trend Report, focusing on advancements in GenAI models and algorithms, their impact on code generation, and the evolving role of AI in software development. We’ll examine AI adoption maturity, intelligent search capabilities, and how organizations can optimize their AI strategies for 2025 and beyond. What's Next? DZone has more in store! Stay tuned for announcements about upcoming Webinars, Virtual Roundtables, Fireside Chats, and other developer-focused events. Whether you’re looking to sharpen your skills, explore new tools, or connect with industry leaders, there’s always something exciting on the horizon. Don’t miss out — save this article and check back often for updates!
As the energy and climate footprint of digital technologies keeps accelerating, frugal AI emerges as an alternative that seeks to reconcile innovation with environmental responsibility. What is this approach? Can it truly establish itself against current ultra-powerful yet energy-intensive models? What are the challenges and opportunities associated with its adoption? Let's break it down. At a global scale, digital technology today accounts for nearly 4% of greenhouse gas emissions, a figure that is steadily increasing, with an annual growth rate of 6%. This footprint is now comparable to that of the civil aviation sector. Around the world, digital energy consumption is following an exponential trajectory. But how can we balance technological innovation with environmental responsibility? And could frugal AI be a viable solution? Why Frugal AI Matters: A Revolution in Efficiency and Innovation Frugal artificial intelligence is an approach aimed at maximizing the efficiency of AI systems while reducing their resource consumption. Unlike conventional models, which are often highly demanding in terms of data and computing power, frugal AI is based on optimization strategies designed to minimize its environmental impact while maintaining acceptable performance levels. This approach emerges as AI, particularly generative AI, is becoming a major driver of energy consumption. Training a large language model, like those developed by Anthropic, AWS, DeepSeek, Google, IBM, Meta, and OpenAI, to name a few, consumes hundreds of thousands of kWh, equivalent to millions of miles driven by a car. In France, for example, digital technology already accounts for 10% of national electricity consumption, and this share could continue to grow with the massive expansion of AI and the projected development of data centers. Globally, digital infrastructures use as much electricity as an entire industrialized nation: according to the International Energy Agency, their consumption could exceed 1,000 TWh by 2026, equivalent to Japan’s total energy consumption. The revised Energy Efficiency Directive from the European Commission includes energy efficiency metrics and good practices that promote efficiency, along with carbon emissions reductions." (IEA, Electricity 2024: Analysis and Forecast to 2026) Smarter Algorithms, Greener AI: The Optimization Factor Faced with these energy and environmental challenges, it becomes clear that technological approaches must strike a balance between performance and energy efficiency. In this context, algorithmic optimization aims to reduce the size and complexity of models. This is achieved through compression techniques, decreasing the number of parameters, and leveraging learning techniques that require less data. These approaches help maintain satisfactory performance while significantly reducing computational and storage requirements. Frugal AI relies on low-power processors and encourages executing computations directly on the user's device rather than systematically relying on data centers. This approach not only reduces overall energy consumption but also improves system latency and responsiveness. Reducing the carbon footprint is another key priority. Frugal AI seeks to limit the reliance on heavy infrastructure and promotes extending the lifespan of equipment. The goal is to avoid frequent hardware renewal, as manufacturing, transportation, and disposal of digital equipment have a significant environmental impact. 
Finally, frugal AI aligns with the principles of trustworthy and explainable AI. The objective is to ensure that models remain interpretable, robust, and compliant with ethical and regulatory principles while optimizing the resources they utilize. It is about striking a balance between performance, environmental responsibility, and transparency so that AI can be sustainably integrated into a more efficient and resilient technological framework. In France, Ecolab, in collaboration with AFNOR (*), has published a reference document (AFNOR Spec 2314) that enables digital stakeholders to assess the environmental impact of their AI projects and communicate their frugality. (*) Association Française de Normalisation. This is the French organization responsible for developing and disseminating voluntary standards in France. Beyond the Hype: The Real Impact of Frugal AI AI, and particularly generative AI, is part of a broader movement where ecological transition and regulation are playing an increasingly significant role in the strategies of businesses and public administrations. While its potential for innovation is immense, its energy and environmental footprint raises critical concerns. We can clearly see that as resource consumption accelerates, decision-makers in both the public and private sectors may be forced to choose between technological performance and ecological responsibility. In this context, initiatives are emerging to regulate this evolution: in France, for example, Ecolab and AFNOR have established a general framework for frugal AI, allowing projects to be assessed and certified based on their environmental impact. Similarly, European regulations like the AI Act are beginning to incorporate energy efficiency criteria into AI system evaluations. This regulation aims to ensure that fundamental rights, democracy, the rule of law and environmental sustainability are protected from high risk AI, while boosting innovation and making Europe a leader in the field. The rules establish obligations for AI based on its potential risks and level of impact. - Artificial Intelligence Act: deal on comprehensive rules for trustworthy AI - European Parliament Beyond regulations, social and economic pressures are pushing companies to rethink their strategies. A more resource-efficient AI model is not just an ecological necessity; it can also become a competitive advantage. A well-designed, frugal AI system can optimize energy costs, adapt to hardware constraints (especially in embedded systems and edge computing), and enhance infrastructure resilience in response to energy supply pressures. It is evident that frugal AI is not merely a technological approach. It is increasingly shaping up as a strategic response to the challenges faced by companies and public administrations. The questions it raises prompt a broader reflection on how we conceive artificial intelligence and technological innovations in a world where resources are finite. What Makes Frugal AI Different — And Why It’s Hard to Get Right Departing from traditional approaches, frugal AI relies on several technical and methodological principles aimed at reducing its ecological footprint and dependency on massive resources. As previously mentioned, algorithmic optimization involves model compression, parameter reduction, and the integration of techniques such as low-shot learning. These approaches allow models to learn effectively even with a limited amount of data. 
The key benefit lies in significantly reducing the energy needed for training. Along these lines, frugal AI also promotes lower energy consumption by prioritizing models that require less computational power and integrating solutions such as edge computing. This approach enables data processing closer to users, limiting the systematic reliance on data centers and thereby reducing their energy impact. When discussing edge computing and embedded or mobile systems, we can summarize by saying that it involves processing data directly where it is generated — close to users or within the devices themselves. Think of smartphones, connected vehicles, or industrial sensors: today, each has computing power that is often overlooked or underutilized. Embedded systems are precisely these small autonomous systems integrated into connected objects, capable of performing calculations independently without systematically depending on a remote data center. This proximity reduces energy consumption, streamlines data exchanges, and makes artificial intelligence more responsive while preserving the environment. Regulating Frugal AI: The Fine Line Between Progress and Compliance Frugal AI is part of a regulatory and responsible approach. Faced with new European regulations aimed at controlling the environmental impact of digital technology, frugal AI must ensure transparency, compliance, and explainability. Companies and researchers must, therefore, develop models aligned with existing standards to guarantee more energy-efficient, ethical, and sustainable AI. Its development is supported by several legislative and normative initiatives. In 2024, a general framework for frugal AI was published by AFNOR and the Ministry of Ecological Transition. It establishes a methodological framework designed to assess and reduce the environmental impact of AI projects. At the same time, the European Union has adopted guidelines on sustainable AI, integrating energy efficiency criteria and environmental performance indicators into the AI Act. In this context, France, less advanced than other countries in generative AI, is betting on a strategy of frugal and trustworthy AI, with demonstrators aiming to promote more environmentally respectful, ethical, and sustainable uses. Conclusion The Future of AI: Can We Afford to Ignore Frugality? Frugal AI could establish itself as a strong response to the ecological, technological, and societal challenges posed by the explosion of digital usage. Perhaps we are approaching a stage where reducing AI's energy consumption and carbon footprint will no longer be an option but a strategic imperative. Yet, it is not that simple. This transition requires rethinking models in a way that combines algorithmic optimization, data reduction, and more efficient use of infrastructures. Above all, it must be based on a regulatory framework that continuously evolves. More than a constraint, frugal AI represents an opportunity for innovation and competitiveness for those who can anticipate this shift. If every watt and every piece of data matters, then its relevance becomes all the more evident in shaping a more efficient and sustainable digital landscape. The real question isn’t if we should adopt it, but how to make it work. One More Thing Hype or Transformation? The Real Future of Frugal AI? 
If this approach appears to address environmental and energy challenges, the question remains: is it truly possible to achieve comparable performance to traditional models while drastically reducing resource consumption? Furthermore, the adoption of frugal AI by businesses is not yet a given. It requires awareness, substantial R&D investments, and a cultural shift in an industry dominated by energy-intensive models. If some sectors are beginning to integrate these principles, the widespread adoption of optimized AI will require strong incentives from society, regulatory initiatives, and economic pressure mechanisms. One final question arises: could regulation itself become a barrier to its adoption? The emergence of stringent standards and environmental benchmarks may both ensure responsible practices and, unfortunately, create new challenges for companies required to comply. Is frugal AI merely a trend, or will it become the foundation of a more sustainable digital future? Only time will tell, but those who anticipate this transformation may well become the technological leaders of tomorrow.
Web automation testing demands both efficient processes and precise outcomes. Microsoft developed Playwright as an open-source framework, which gained recognition for its powerful browser automation across Chromium, Firefox, and WebKit. Playwright consistently releases new features through ongoing updates that optimize workflows for both developers and testers. The new “Copy prompt” feature introduced in version 1.51 enhances debugging processes through the capabilities of large language models (LLMs). This blog will explain this feature’s functionality and workings before demonstrating its practical application through an example. What Is the “Copy Prompt” Feature? Playwright introduced the “Copy prompt” feature, which streamlines the debugging process. The feature shows up as a button within the Playwright HTML report, the trace viewer, and UI mode. The button becomes accessible next to error details whenever a test fails and produces an error. The button action generates a clipboard-ready prompt containing both the error message and the surrounding context for troubleshooting. Developers can achieve faster results through LLM platforms such as ChatGPT or Grok by using this prompt, which eliminates the need to create queries manually. Why This Feature Matters Debugging is often one of the most time-consuming aspects of test automation. A failing test might throw an error like TimeoutError: waiting for selector “button.submit” failed, leaving you to figure out whether the selector is wrong, the page didn’t load correctly, or some other issue is at play. Traditionally, you’d need to: Read the error message and stack trace. Investigate the test code and the application under test. Find the cause and search for solutions — perhaps by Googling or asking a colleague. Iterate until the issue is resolved. With “Copy prompt,” Playwright eliminates much of this manual effort. The pre-filled prompt includes not just the error but also relevant metadata — like the test name, browser context, and potentially even snippets of the failing code. This enriched context allows an LLM to provide more accurate and specific suggestions, transforming a tedious process into a single click-and-paste operation. How It Works Let’s break down where and how you can use this feature: HTML report. After running your tests with the command npx playwright test --reporter=html, an HTML report is generated. If a test fails, you can open the report, locate the error, and find the “Copy prompt” button next to it. Trace viewer. Playwright’s trace viewer (Command: npx playwright show-trace trace.zip) provides a detailed timeline of actions, network requests, and errors. When an error is highlighted, the button appears, ready to copy a prompt based on the trace data. UI mode. Running the command npx playwright test --ui launches an interactive interface where you can execute and debug tests. Errors here also come with the “Copy prompt” option, integrating seamlessly into the visual workflow. Once clicked, the button copies a structured prompt to your clipboard. You can then paste it into an LLM interface — say, Claude, ChatGPT, or Grok — and get a response tailored to your specific issue. A Practical Example To illustrate how “Copy prompt” works, let’s walk through a real-world scenario. Suppose you’re writing a Playwright test to verify a login and logout feature on a sample website.
Here’s the test code: TypeScript import { test, expect } from '@playwright/test'; test('Sauce test', async ({ page }) => { await page.goto('https://www.saucedemo.com/'); await page.locator('[data-test="username"]').click(); await page.locator('[data-test="username"]').fill('standard_user'); await page.locator('[data-test="password"]').click(); await page.locator('[data-test="password"]').fill('secret_sauce'); await expect(page.locator('form')).toMatchAriaSnapshot(` - textbox "Username": standard_user - textbox "Password": secret_sauce - button "Login" `); await page.locator('[data-test="login-button"]').click(); await page.locator('button', { name: 'Open Menu' }).click(); await page.locator('[data-test="logout-sidebar-link"]').click(); }); You run the test with npx playwright test tests/sauce.spec.js --reporter=html, and it fails. The HTML report shows an error. Next to this error, you see the “Copy prompt” button. Clicking it copies something like this to your clipboard. TypeScript # Instructions - Following Playwright test failed. - Explain why, be concise, respect Playwright best practices. - Provide a snippet of code with the fix, if possible. # Test info - Name: Sauce test - Location: /Users/xxx.xx/Documents/pwdemo/tests/sauce.spec.js:3:5 # Error details ``` Error: locator.click: Error: strict mode violation: locator('button') resolved to 8 elements: 1) <button type="button" id="react-burger-menu-btn">Open Menu</button> aka getByRole('button', { name: 'Open Menu' }) 2) <button type="button" tabindex="-1" id="react-burger-cross-btn">Close Menu</button> aka getByText('Close Menu') 3) <button id="add-to-cart-sauce-labs-backpack" name="add-to-cart-sauce-labs-backpack" data-test="add-to-cart-sauce-labs-backpack" class="btn btn_primary btn_small btn_inventory ">Add to cart</button> aka locator('[data-test="add-to-cart-sauce-labs-backpack"]') 4) <button id="add-to-cart-sauce-labs-bike-light" name="add-to-cart-sauce-labs-bike-light" data-test="add-to-cart-sauce-labs-bike-light" class="btn btn_primary btn_small btn_inventory ">Add to cart</button> aka locator('[data-test="add-to-cart-sauce-labs-bike-light"]') 5) <button id="add-to-cart-sauce-labs-bolt-t-shirt" name="add-to-cart-sauce-labs-bolt-t-shirt" data-test="add-to-cart-sauce-labs-bolt-t-shirt" class="btn btn_primary btn_small btn_inventory ">Add to cart</button> aka locator('[data-test="add-to-cart-sauce-labs-bolt-t-shirt"]') 6) <button id="add-to-cart-sauce-labs-fleece-jacket" name="add-to-cart-sauce-labs-fleece-jacket" class="btn btn_primary btn_small btn_inventory " data-test="add-to-cart-sauce-labs-fleece-jacket">Add to cart</button> aka locator('[data-test="add-to-cart-sauce-labs-fleece-jacket"]') 7) <button id="add-to-cart-sauce-labs-onesie" name="add-to-cart-sauce-labs-onesie" data-test="add-to-cart-sauce-labs-onesie" class="btn btn_primary btn_small btn_inventory ">Add to cart</button> aka locator('[data-test="add-to-cart-sauce-labs-onesie"]') 8) <button class="btn btn_primary btn_small btn_inventory " id="add-to-cart-test.allthethings()-t-shirt-(red)" name="add-to-cart-test.allthethings()-t-shirt-(red)" data-test="add-to-cart-test.allthethings()-t-shirt-(red)">Add to cart</button> aka locator('[data-test="add-to-cart-test\\.allthethings\\(\\)-t-shirt-\\(red\\)"]') Call log: - waiting for locator('button') at /Users/xx.xx/Documents/pwdemo/tests/sauce.spec.js:15:55 ``` # Page snapshot ```yaml - button "Open Menu" - img "Open Menu" - text: Swag Labs Products Name (A to Z) - combobox: - option "Name (A to Z)" [selected] - option
"Name (Z to A)" - option "Price (low to high)" - option "Price (high to low)" - link "Sauce Labs Backpack": - img "Sauce Labs Backpack" - link "Sauce Labs Backpack" - text: carry.allTheThings() with the sleek, streamlined Sly Pack that melds uncompromising style with unequaled laptop and tablet protection. $29.99 - button "Add to cart" - link "Sauce Labs Bike Light": - img "Sauce Labs Bike Light" - link "Sauce Labs Bike Light" - text: A red light isn't the desired state in testing but it sure helps when riding your bike at night. Water-resistant with 3 lighting modes, 1 AAA battery included. $9.99 - button "Add to cart" - link "Sauce Labs Bolt T-Shirt": - img "Sauce Labs Bolt T-Shirt" - link "Sauce Labs Bolt T-Shirt" - text: Get your testing superhero on with the Sauce Labs bolt T-shirt. From American Apparel, 100% ringspun combed cotton, heather gray with red bolt. $15.99 - button "Add to cart" - link "Sauce Labs Fleece Jacket": - img "Sauce Labs Fleece Jacket" - link "Sauce Labs Fleece Jacket" - text: It's not every day that you come across a midweight quarter-zip fleece jacket capable of handling everything from a relaxing day outdoors to a busy day at the office. $49.99 - button "Add to cart" - link "Sauce Labs Onesie": - img "Sauce Labs Onesie" - link "Sauce Labs Onesie" - text: Rib snap infant onesie for the junior automation engineer in development. Reinforced 3-snap bottom closure, two-needle hemmed sleeved and bottom won't unravel. $7.99 - button "Add to cart" - link "Test.allTheThings() T-Shirt (Red)": - img "Test.allTheThings() T-Shirt (Red)" - link "Test.allTheThings() T-Shirt (Red)" - text: This classic Sauce Labs t-shirt is perfect to wear when cozying up to your keyboard to automate a few tests. Super-soft and comfy ringspun combed cotton. $15.99 - button "Add to cart" - contentinfo: - list: - listitem: - link "Twitter" - listitem: - link "Facebook" - listitem: - link "LinkedIn" - text: © 2025 Sauce Labs. All Rights Reserved. Terms of Service | Privacy Policy ``` # Test source ```ts 1 | import { test, expect } from '@playwright/test'; 2 | 3 | test('Sauce test', async ({ page }) => { 4 | await page.goto('https://www.saucedemo.com/'); 5 | await page.locator('[data-test="username"]').click(); 6 | await page.locator('[data-test="username"]').fill('standard_user'); 7 | await page.locator('[data-test="password"]').click(); 8 | await page.locator('[data-test="password"]').fill('secret_sauce'); 9 | await expect(page.locator('form')).toMatchAriaSnapshot(` 10 | - textbox "Username": standard_user 11 | - textbox "Password": secret_sauce 12 | - button "Login" 13 | `); 14 | await page.locator('[data-test="login-button"]').click(); > 15 | await page.locator('button', { name: 'Open Menu' }).click(); | ^ Error: locator.click: Error: strict mode violation: locator('button') resolved to 8 elements: 16 | await page.locator('[data-test="logout-sidebar-link"]').click(); 17 | }); ``` Paste the above error message in LLM, e.g., (ChatGPT). Once we paste the issue and ask to fix it to the LLM, the LLM will automatically fix the code and provide the updated code. Let's see what the issue was and how the issue is fixed. Issue Explanation The error occurs because the locator page.locator('button') resolves to 8 different buttons on the page, violating Playwright’s strict mode, which requires a unique element match. 
Root Cause The test attempts to click on a button using { name: 'Open Menu' }, but the syntax is incorrect. The correct way to target the “Open Menu” button is by using getByRole('button', { name: 'Open Menu' }) or its unique attribute selector. Fix Modify the selector to target the “Open Menu” button explicitly using Playwright best practices. TypeScript await page.getByRole('button', { name: 'Open Menu' }).click(); // OR await page.locator('#react-burger-menu-btn').click(); Here’s an updated version of the test, where you can see we have replaced the hamburger menu locator with await page.locator('#react-burger-menu-btn'). TypeScript import { test, expect } from '@playwright/test'; test('Sauce test', async ({ page }) => { await page.goto('https://www.saucedemo.com/'); await page.locator('[data-test="username"]').click(); await page.locator('[data-test="username"]').fill('standard_user'); await page.locator('[data-test="password"]').click(); await page.locator('[data-test="password"]').fill('secret_sauce'); await expect(page.locator('form')).toMatchAriaSnapshot(` - textbox "Username": standard_user - textbox "Password": secret_sauce - button "Login" `); await page.locator('[data-test="login-button"]').click(); await page.locator('#react-burger-menu-btn').click(); await page.locator('[data-test="logout-sidebar-link"]').click(); }); Execute the above code again, and you will see that the test case passes. Conclusion The “Copy prompt” feature makes debugging more seamless: generating a detailed, context-rich prompt with a single click empowers developers to leverage LLMs for quick, actionable solutions, cutting down on manual troubleshooting time. Embrace “Copy prompt” on your next Playwright project to simplify your debugging and elevate your testing experience.
How can we, software developers of the 2020s, stay competitive in the world of automation? Would LLM-based code-generating AI (GAI), like OpenAI services, take over our jobs? Would template-based code generators/software robots (TSR), like UiPath, Blue Prism, Strapi, etc., make developers irrelevant? The short answer is no! We should embrace automation whenever reasonable and focus on "business valuable" hard-to-automate skills. To be specific, consider a common distributed microservice information system (IS) in Fig. 1. There are domain microservices, data science (DS)/operations research (OR) microservices, and a composer service to aggregate data from the microservices (See Appendix A for the terminology). Every service (or a group of services) is implemented with a language, a tech stack, and a database (DB) optimal for the service's specific task. Also, the services communicate with each other via frameworks, message brokers, and protocols best for their tasks. Fig. 1. System overview Here, GAI helps us to create microservice starters on common languages and frameworks, generate code snippets, and learn languages and frameworks we are less experienced with. On the other hand, it is harder for GAI to generate workable code and conflict-free imports for less common frameworks or use cases like GraphQL streaming. Finally, DS/OR functionality, if done right, provides a high business value but is even harder to automate (Appendix B). So, we, as developers, should efficiently and conveniently integrate such functionality into our IS. We should also closely cooperate with a DS group whenever possible. On the other hand, TSR can generate code for simple CRUD microservices with predefined business logic via a WYSIWYG GUI. Some TSRs, like Strapi, allow us to manually edit generated code for more complex and efficient solutions. Many real-world use cases, however, can't be programmed this way; even when they can, there are glitches, like non-deterministic outputs. As I demonstrate in the sequel, our MERN NestJS monorepo tools are generally better than TSR for most but the simplest of use cases. To illustrate how to start building such IS (Fig. 1), I came up with the following starter that is based on hexagonal microservices in different languages (Java, Typescript, Python), different frameworks (Spring Boot Camel, MERN with NestJS monorepo, Flask), different databases (MongoDB and PostgreSQL). The microservices communicate with each other and the front-end single-page app (ReactJS-Vite) via REST, GraphQL, gRPC, and AMQP. Here, the MERN monorepo is used for domain microservices and the front-end app, which includes gRPC, AMQP, GraphQL, and REST servers. DS/OR functionality is implemented in Python with the same server techs. The composer is implemented with Spring Boot Camel with REST and GraphQL servers and gRPC, AMQP, and REST clients. Why so many diverse communication techs? If you are unsure which one works best for your project, try all of them! The hexagonal architecture allows you to easily build microservices with different communication techs, the same core, and databases. Also, during the presentation, I commented on what functionality we, developers, can do better than TSR. Here is the code. Let's go. System Architecture As was already mentioned, the starter has a hexagonal microservices architecture (Fig. 2). Every inbound adapter contains an entry point to a microservice (a main method in Spring boot, an app.listen() in NestJS, etc.). 
Every inbound adapter imports a core logic module that, in turn, imports outbound adapter clients. Fig. 2. System architecture Here, two services are fully implemented (Composer, Users). In every service, its business logic is placed in the center; the business logic interacts with the outside world by means of controllers via ports (green). GQL=GraphQL. See Appendix A, Fig A for the notation. Some of the microservices interact directly (REST, GraphQL, gRPC), and some via message brokers (AMQP and RabbitMQ in our case). The former register their deployment host:port in a Eureka registration service. A Spring Cloud Gateway, in contact with the registration service, helps to call microservices by their registered aliases. RabbitMQ is a good choice for this system to intelligently route fewer but more complex messages. For example, you may have a custom scheduler to route computationally intensive tasks to microservices to process in parallel. RabbitMQ can route a message by a pattern in the message's routing key (topic routing). A more versatile but computationally intensive option is to route a message by parsing its header (header routing). See this series of posts for details. As was mentioned in the introduction, there are three kinds of "productive" microservices in the starter: domain, DS/OR, and composer. Each domain service is frequently called and is responsible for a single domain. DS/OR services are far less frequently called, but have complex input/output data formats and a long computation time. Finally, composers aggregate data obtained from the domain and DS/OR services in complex scenarios. Let's see how the domain services are implemented. Domain Microservices This starter's domain services follow the hexagonal pattern outlined above. The services are implemented with a NestJS monorepo (MERN stack). The service communicates with ReactJS-Vite (also in the same monorepo). Also, the service shares DTOs and verifiers with the front-end app. Every inbound adapter is an entry point. If, for example, REST and GraphQL adapters are active, these are two separate servers on different host:ports. The adapters import the core functionality as a module. Also, the core functionality imports the DAO module. The service implements a CQRS pattern (Fig. 3). Fig. 3. User domain service Here, light green squares are inbound and outbound adapters. User-core and User-DAO are the same in both services. C-1...6 are the steps the service executes to create a new User. Also, GU-1...3 are the steps to update a group. See Appendix A, Fig A for the notation. The CQRS pattern (Appendix A) is useful here for three reasons. First, it efficiently deals with a high number of requests since the pattern is asynchronous. Second, aggregates maintain data consistency. Third, the pattern uses a separate "read" DB to "join" data from different domains as soon as the data becomes available. Of course, the CAP theorem limits the view data consistency. Our data model is shown in Fig. 4 A. The aggregate is the User in Fig. 4 B. The view is UserView. The view contains user aggregates with extra user group data; this data comes from a group domain microservice (not implemented in this starter) by means of GlobalGroupUpdateEvents (Fig. 4 B) via an AMQP message broker. Fig. 4. Data model of Users service (A). User Aggregate, User View, Group Update Event (B) To create a new user (Fig. 3), the REST controller receives a request (C-1). The controller sends a CreateUserCommand via a command bus to the AggregateService (C-2). The AggregateService calls the AggregateDAO to create and persist a new Aggregate (C-3); if positive, the AggregateService publishes a UserCreatedEvent with the new aggregate data. The User event listener gets the event (C-4), broadcasts the event to the message broker, and calls QueryService directly (C-5) to update the UserView (C-6). When the AMQP controller receives a GlobalGroupUpdatedEvent (Fig. 3, GU-1), the controller calls QueryService (GU-2) directly to update the UserView (GU-3). A minimal code sketch of the command side of this flow is shown below.
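To make the command side concrete, here is a minimal sketch using @nestjs/cqrs. The class names follow the figure (CreateUserCommand, UserCreatedEvent), but the AggregateDao shape and the handler wiring are assumptions for illustration, not the starter's actual code.
TypeScript
import { CommandHandler, ICommandHandler, EventBus } from '@nestjs/cqrs';

export class CreateUserCommand {
  constructor(public readonly email: string, public readonly name: string) {}
}

export class UserCreatedEvent {
  constructor(public readonly userId: string, public readonly email: string) {}
}

// Assumed DAO abstraction; in the starter, the User-DAO module (MongoDB) plays this role.
export abstract class AggregateDao {
  abstract createUser(email: string, name: string): Promise<{ id: string; email: string }>;
}

@CommandHandler(CreateUserCommand)
export class CreateUserHandler implements ICommandHandler<CreateUserCommand> {
  constructor(
    private readonly dao: AggregateDao,   // C-3: create and persist the aggregate
    private readonly eventBus: EventBus,  // C-4: publish the domain event
  ) {}

  async execute(command: CreateUserCommand): Promise<string> {
    const user = await this.dao.createUser(command.email, command.name);
    this.eventBus.publish(new UserCreatedEvent(user.id, user.email));
    return user.id;
  }
}
An event handler subscribed to UserCreatedEvent would then broadcast the event to the message broker and update the UserView (C-5, C-6).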
Notice how an aggregate is updated in our case. Usually [MsPs, Ch. 5], to update an aggregate as a whole, a transaction is created. Within the transaction, an old aggregate is read, and the constraints of the old and the new aggregates are checked. If OK, the aggregate gets updated, and the transaction is committed. If, however, it is permissible to send a whole aggregate to the front-end, then we can modify it there, check the constraints in the AggregateService, and finally, update the aggregate in the DB as a single document (with a $set:{...user} command). No (explicitly created) transaction is needed, and all the aggregate's constraints are satisfied. Let's summarize why the MERN stack with a NestJS monorepo is very good for domain microservices: Typescript and the monorepo make it easier to share the same DTO types, ts-rest contracts, code-first GraphQL annotated classes, and Zod-like schemas between the Typescript front-end (ReactJS) and back-end NestJS apps. To my knowledge, TSR doesn't do this. MongoDB is great for CRUD on nested objects, including maps; this is equally useful for dealing with both aggregates and views. For other scenarios, like audit, a graph DB can be better (Appendix C). CQRS efficiently deals with a lot of commands and queries. To my knowledge, TSR uses a simple three-layer architecture and a single read/write DB for CRUD operations. An extensive ecosystem offers various communication technology modules (GraphQL, gRPC, RabbitMQ, Kafka, ...) and architectural styles (CQRS) out of the box. Extensive testing and mocking utilities. The NestJS CLI is useful for generating controller, module, CRUD resource, etc., boilerplate code and so possesses some TSR code generation capabilities. Finally, let's point out that domain services are the ones called most often from front-end apps. Some developers use LLMs to convert text and/or voice commands into API calls to domain services. I believe, however, that this will not pay off: a thoughtfully made conventional UI with modern tech, like GraphQL, will work better than a hyped LLM functionality that regularly misinterprets requests. Let's move to our DS/OR microservices. Data Science/Operations Research Microservices As was already mentioned, DS/OR microservices are less frequently called but have a complex input/output data format and unpredictably long computation time. So, we don't have to worry about running out of controller threads and can use a simpler three-layer architecture. The logic core methods are called directly, and a database is used to store intermediate results (later to train an ML-assisted mixed integer solver, for example). Fig. 5. DS/OR microservice architecture. The notation is the same as before. Basically, DS/OR microservices have to process two kinds of scenarios: request-response and (near) real-time stream processing; the latter requires input-output stream RPCs. Consider hexagonal Python microservices (Fig. 5), where every inbound adapter, as before, has an entry point if __name__ == '__main__'.
Let's move on to our DS/OR microservices.

Data Science/Operations Research Microservices

As was already mentioned, DS/OR microservices are called less frequently but have complex input/output data formats and unpredictably long computation times. So we don't have to worry about running out of controller threads and can use a simpler three-layer architecture: the logic core methods are called directly, and a database is used to store intermediate results (later used, for example, to train an ML-assisted mixed-integer solver).

Fig. 5. DS/OR microservice architecture. The notation is the same as before.

Basically, DS/OR microservices have to handle two kinds of scenarios: request-response and (near) real-time stream processing; the latter requires input/output stream RPCs. Consider the hexagonal Python microservices (Fig. 5), where every inbound adapter, as before, has an entry point (if __name__ == '__main__'). Every inbound adapter imports a logic core that, in turn, imports a DAO module. Such apps may be coupled with each other via a common database. A request first comes to an inbound adapter (T-1), then a logic method is called synchronously (T-2), and intermediate results are persisted (T-3); finally, the logic method returns its result to the inbound adapter.

I found gRPC to be the most useful communication technology for a DS/OR inbound adapter because it can deal with complex input/output data formats via the gRPC DSL and .proto files. Also, a gRPC input/output stream server is easy to program and set up. Inbound adapters for other communication technologies (GraphQL, REST, AMQP) are also implemented in the starter and may be useful in certain situations, for example, when a RabbitMQ infrastructure is already available. These less common (compared to REST) communication technologies allow us to conveniently and efficiently integrate our DS/OR functionality into a common IS. Let's take a closer look at the composer.

Composer

The composer follows the same hexagonal architecture outlined above. There is an inbound adapter to receive requests and core logic to process the requests in multiple steps (and to pull data from multiple sources); results are returned to the adapter and then to the user (Fig. 6). Since the requests are processed in multiple steps, it is nice to report progress status or intermediate results back to the user.

Fig. 6. Composer architecture. The notation is the same as before.

There are REST and GraphQL inbound adapters (separate Spring Boot apps, each with its own main method) and a core logic module. When an inbound adapter receives a new request, the adapter calls a new-task route. The route may "check the health" of the required microservices and compose an execution plan (graph). To build a plan, the new-task route may call a specialized service to parse the request and build an execution graph on the fly. Camel's conditional routing capabilities then allow us to execute the task (its simplest form is demonstrated here). To my knowledge, TSR can't build execution plans dynamically.

The plan is then executed step by step by other routes. In our case, the steps are REST, AMQP, gRPC, Result, and Stop. At every step, a piece of data is received from an outbound adapter and processed. Then, an intermediate result message is sent to the inbound adapter via a Reactor sink. These messages are disparate, so every route has a sink converter. The inbound adapter receives the messages from the sink and streams them to the browser.

You might wonder whether it would be easier to use a wiretap pattern here instead of the more complex Reactor sink. Indeed, such a composer would be more straightforward to build. However, in that case, the browser has to poll the composer periodically to obtain the task status.

The Camel framework is especially useful for this kind of scenario. Among other things, it offers extensive data conversion, async processing, and routing tools, and its documentation is very handy. Notice that since the composer calls a number of external RPCs, those calls need to be made asynchronously so that the system doesn't consume CPU cycles while it waits for an RPC to reply. The builder, template method, and strategy design patterns make it easier to reuse code for the different clients (REST, AMQP, and gRPC in our case). Let's see how exceptions are handled.

Exception Handling

Every real-life system faces problems and so must have a way to handle them. Here, I implemented the following scheme (Fig. 7) whenever reasonable: every module (controller, logic, DAO) has its own set of exceptions, and every module converts the exceptions of its child modules into its own exceptions.

Fig. 7. Exception handling scheme of the starter

With a monorepo architecture, it is easier to maintain a common exception set. Also, for microservices, we need to obtain the correlation ID of every request and pass it into each exception's message.
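The starter's exact exception classes are not reproduced in this post, but in the NestJS part of the system the wrapping idea might look roughly like the following sketch; the class names, fields, and the DAO signature are assumptions made for illustration.

```typescript
// Illustrative sketch: each module owns its exception type, wraps exceptions coming
// from its child module, and keeps the request's correlation ID in the message.
// Names are assumptions, not the starter's actual code.

export class DaoException extends Error {
  constructor(message: string, public readonly correlationId: string, public readonly inner?: Error) {
    super(`[${correlationId}] DAO: ${message}`);
  }
}

export class LogicException extends Error {
  constructor(message: string, public readonly correlationId: string, public readonly inner?: Error) {
    super(`[${correlationId}] logic: ${message}`);
  }
}

// The logic module converts a child-module (DAO) exception into its own exception type.
export async function createUserOrFail(
  dao: { createUser(name: string): Promise<{ id: string }> },
  name: string,
  correlationId: string,
): Promise<{ id: string }> {
  try {
    return await dao.createUser(name);
  } catch (e) {
    throw new LogicException('failed to create user', correlationId, e as Error);
  }
}
```

A controller would repeat the same move one level up, converting a LogicException into its own exception (or an HTTP error) while preserving the correlation ID.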
Let's see how to containerize the starter.

Dockerization

Let's elaborate on how to dockerize the starter. Recall that every inbound adapter of every microservice is an entry point (with a main method) that imports a logic module as a library, which, in turn, imports a DAO module (also as a library). Every such application is built as a separate image and deployed in its own container (Fig. 8). Note that apps sharing the same logic and DAO modules can also be deployed in a single container. All the containers are run with Docker Compose (DC) on its default network. As usual with DC, we use a container's DC service name as the hostname to communicate with that service.

Fig. 8. How to dockerize the starter

Finally, let's point out that every microservice (Spring Boot, NestJS, ReactJS-Vite, Python frameworks) requires its own procedure to build a Docker image. ReactJS-Vite is especially quirky because the Vite builder writes the deployment parameters (like hosts and ports) directly into the compiled main.js file, so these parameters are essentially fixed at build time.

Conclusion

In this post, I presented a starter for a distributed, multi-language analytics and information system with domain, data science, and composer services. Specifically, the starter demonstrates how to efficiently integrate data science and operations research functionality with a common information system. Thanks to its versatile hexagonal architecture, the starter can readily be extended with other communication technologies and databases. I also pointed out how this system compares with LLM and TSR technologies.

Appendix A. Some Microservice Terminology

Here I follow [MsPs]:

Aggregate: A graph of objects that can be treated as a unit [MsPs, Ch. 5]. An aggregate contains a root entity and, possibly, one or more other entities and value objects; C/U/D functionality deals with aggregates. Aggregates are referenced only by their roots, inter-aggregate references must use primary keys instead of object references, and aggregates should be updated as a whole.

Composer Pattern (or API Composer) [MsPs, Ch. 7]: Implements a query operation by invoking the services that own the data and combining the results. Notice how a composer differs from a saga [MsPs, Ch. 6]: the latter is a sequence of local transactions (that can roll back if unsuccessful).

CQRS Pattern [MsPs, Ch. 7]: Separate parts of the system, with separate DBs, deal with Read (Query) and Create/Update/Delete (Command) requests.

Domain: A noun (possibly with child nouns) in user scenarios [MsPs, Ch. 2].

Domain Event: Notifies subscribers of changes to aggregates [MsPs, Ch. 5].

Entity: An object with a persistent identity.

Hexagonal Architecture [MsPs, Ch. 2]: This architectural style has one or more inbound adapters that handle requests from the outside by invoking the business logic (Fig. A). Similarly, the system has one or more outbound adapters that are invoked by the business logic and invoke external applications. Also, the business logic has one or more ports, that is, sets of operations through which the business logic interacts with adapters (in Java, ports may be Java interfaces).
The business logic doesn't depend on the adapters, but the adapters do depend on the logic. This means that although an inbound adapter usually contains the entry point of a microservice (a main method in Spring Boot, an app.listen() in NestJS), the adapter converts its data to "fit" the business logic's port.

Value Object: A collection of values.

View: A read-optimized DB or a virtual table (as in PostgreSQL).

Fig. A

Appendix B. Why OR and Predictive/Prescriptive Analytics Are Hard to Automate

According to [ZF], an OR pipeline is as follows:

1. Problem identification and definition
2. Parameter generation
3. Model formulation
4. Model optimization
5. Interpretation and validation

The hardest step to automate here is 3. As [ZF] points out, people currently try to use LLMs to convert text-based problem descriptions into mathematical models. The results on textbook problems are as follows: general-purpose LLMs reach 24-82% accuracy, while an OR-tailored LLM (NL4OPT) reaches up to 90% accuracy. So, for real-world problems, LLM formulations can serve only as a starting point for OR experts to create mathematical models.

An often overlooked fact is that linear and (even more so) mixed-integer programming problems are ill-posed (the output is highly sensitive to the input) [TA, Ch. 9]. So we can't afford LLM formulation mistakes; every OR equation and constraint must be carefully examined. Special regularization algorithms can alleviate this problem, but only to an extent. See, for example, [Pan, Ch. 7] and [Vas, Ch. 4-5].

These LLM limitations for OR (and for advanced analytical problems in general) are no accident. According to [Bub], GPT-4 "can also make very basic [math] mistakes and occasionally produces incoherent output which may be interpreted as a lack of true understanding [of math]."

Appendix C. Document vs. Relational vs. Graph Databases

I find the following mental picture helpful when choosing among document (DDB), relational (RDB), and graph (GDB) databases (Fig. C). If your data model resembles "connected pyramids," then consider a DDB. In Fig. C A, a document of the Group collection refers to a document of the User collection. MongoDB, a DDB, is especially good at dealing with inheritance (via the discriminator mechanism). MongoDB is also a good choice when a schema is not fully known in advance or is very complex, for example, when you save intermediate results of complex computations. Finally, MongoDB now fully supports ACID transactions and scales horizontally via its sharding mechanism.

Fig. C. A simple mental picture of document (A), relational (B), and graph (C) databases

On the other hand, GDBs, like Neo4j, treat individual entities as graph nodes (graph edges can also carry information) (Fig. C C). These databases are especially useful for "graph problems," such as finding indirectly connected entities, cycles, etc. (see, for example, this post on audit functionality). Neo4j is also ACID-compliant and scales horizontally. A downside of Neo4j is that it takes an order of magnitude more disk space.

Finally, RDBs, like PostgreSQL, are still the most widely used; they are constantly improving, have a huge community, and are usually chosen by default unless your project has special scenarios (like those described above). Of course, the CAP theorem limits the ability of any database with hard consistency constraints to scale horizontally.
Literature

[Bub]: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang, "Sparks of Artificial General Intelligence: Early Experiments with GPT-4."
[MsPs]: Chris Richardson, "Microservices Patterns."
[Pan]: Ping-Qi Pan, "Linear Programming Computation."
[TA]: A. N. Tikhonov, V. Ya. Arsenin, "Solutions of Ill-Posed Problems."
[Vas]: F. P. Vasilyev, A. Yu. Ivanitskiy, "In-Depth Analysis of Linear Programming."
[ZF]: Zhenan Fan, Bissan Ghaddar, Xinglu Wang, Linzi Xing, Yong Zhang, Zirui Zhou, "Artificial Intelligence for Operations Research: Revolutionizing the Operations Research Process."