Generative AI (GenAI) and Large Language Models (LLMs) offer transformative potential across various industries. However, their deployment in production environments faces challenges due to their computational intensity, dynamic behavior, and the potential for inaccurate or undesirable outputs. Existing monitoring tools often fall short of providing real-time insights crucial for managing such applications. Building on top of the existing work, this article presents the framework for monitoring GenAI applications in production. It addresses both infrastructure and quality aspects. On the infrastructure side, one needs to proactively track performance metrics such as cost, latency, and scalability. This enables informed resource management and proactive scaling decisions. To ensure quality and ethical use, the framework recommends real-time monitoring for hallucinations, factuality, bias, coherence, and sensitive content generation. The integrated approach empowers developers with immediate alerts and remediation suggestions, enabling swift intervention and mitigation of potential issues. By combining performance and content-oriented monitoring, this framework fosters the stable, reliable, and ethical deployment of Generative AI within production environments. Introduction The capabilities of GenAI, driven by the power of LLMs, are rapidly transforming the way we interact with technology. From generating remarkably human-like text to creating stunning visuals, GenAI applications are finding their way into diverse production environments. Industries are harnessing this potential for use cases such as content creation, customer service chatbots, personalized marketing, and even code generation. However, the path from promising technology to operationalizing these models remains a big challenge[1]. Ensuring the optimal performance of GenAI applications demands careful management of infrastructure costs associated with model inference, cost, and proactive scaling measures to handle fluctuations in demand. Maintaining user experience requires close attention to response latency. Simultaneously, the quality of the output generated by LLMs is of utmost importance. Developers must grapple with the potential for factual errors, the presence of harmful biases, and the possibility of the models generating toxic or sensitive content. These challenges necessitate a tailored approach to monitoring that goes beyond traditional tools. The need for real-time insights into both infrastructure health and output quality is essential for the reliable and ethical use of GenAI applications in production. This article addresses this critical need by proposing solutions specifically for real-time monitoring of GenAI applications in production. Current Limitations The monitoring and governance of AI systems have garnered significant attention in recent years. Existing literature on AI model monitoring often focuses on supervised learning models [2]. These approaches address performance tracking, drift detection, and debugging in classification or regression tasks. Research in explainable AI (XAI) has also yielded insights into interpreting model decisions, particularly for black-box models [3]. This field seeks to unravel the inner workings of these complex systems or provide post-hoc justifications for outputs [4]. Moreover, studies on bias detection explore techniques for identifying and mitigating discriminatory patterns that may arise from training data or model design [5]. 
While these fields provide a solid foundation, they do not fully address the unique challenges of monitoring and evaluating generative AI applications based on LLMs. Here, the focus shifts away from traditional classification or regression metrics and towards open-ended generation. Evaluating LLMs often involves specialized techniques like human judgment or comparison against reference datasets [6]. Furthermore, standard monitoring and XAI solutions may not be optimized for tracking issues prevalent in GenAI, such as hallucinations, real-time bias detection, or sensitivity to token usage and cost. There has been some recent work in helping solve this challenge [8], [9]. This article builds upon prior work in these related fields while proposing a framework designed specifically for the real-time monitoring needs of production GenAI applications. It emphasizes the integration of infrastructure and quality monitoring, enabling the timely detection of a broad range of potential issues unique to LLM-based applications. This article concentrates on monitoring Generative AI applications utilizing model-as-a-service (MLaaS) offerings such as Google Cloud's Gemini, OpenAI's GPTs, Claude on Amazon Bedrock, etc. While the core monitoring principles remain applicable, self-hosted LLMs necessitate additional considerations. These include model optimization, accelerator (e.g. GPU) management, infrastructure management, scaling, etc - factors outside the scope of this discussion. Also, this article focuses on text-to-text models, but the principles can be extended to other modalities as well. The subsequent sections will focus on various metrics, techniques, and architecture for capturing those metrics to gain visibility into LLM's behavior in production. Application Monitoring Monitoring the performance and resource utilization of Generative AI applications is vital for ensuring their optimal functioning and cost-effectiveness in production environments. This section delves into the key components of application monitoring for GenAI, specifically focusing on cost, latency, and scalability considerations. Cost Monitoring and Optimization The cost associated with deploying GenAI applications can be significant, especially when leveraging MLaaS offerings. Therefore, granular cost monitoring and optimization are crucial. Below are some of the key metrics to focus on: Granular Cost Tracking MLaaS providers typically charge based on factors such as the number of API calls, tokens consumed, model complexity, and data storage. Tracking costs at this level of detail allows for a precise understanding of cost drivers. For MLaaS LLMs, input and output characters/token count can be the key driver of cost. Most models have tokenizer APIs to count the characters/tokens for any given text. These APIs can help understand usage for monitoring and optimizing inference costs. Below is an example of generating a billable character count for Google Cloud’s Gemini model. 
Python
import vertexai
from vertexai.generative_models import GenerativeModel

def generate_count(project_id: str, location: str) -> int:
    # Initialize Vertex AI
    vertexai.init(project=project_id, location=location)
    # Load the model
    model = GenerativeModel("gemini-1.0-pro")
    # Count tokens and billable characters for the prompt
    count = model.count_tokens("how many billable characters are here?")
    # Return the total billable characters
    return count.total_billable_characters

generate_count('your-project-id', 'us-central1')

Usage Pattern Analysis and Token Efficiency
Analyzing token usage patterns plays a pivotal role in optimizing the operating costs and user experience of GenAI applications. Cloud providers often impose token-per-second quotas, and consistently exceeding these limits can degrade performance. While quota increases may be possible, there are often hard limits. Creative resource management may be required for usage beyond these thresholds. A thorough analysis of token usage over time helps identify avenues for cost optimization. Consider the following strategies: Prompt optimization: Rewriting prompts to reduce their size reduces token consumption and should be a primary focus of optimization efforts. Model tuning: A model fine-tuned on a well-curated dataset can potentially deliver similar or even superior performance with smaller prompts. While some providers charge similar fees for base and tuned models, premium pricing models for tuned models also exist. One needs to be cognizant of these before making a decision. In certain cases, model tuning can significantly reduce token usage and associated costs. Retrieval-augmented generation: Incorporating information retrieval techniques can help reduce input token size by strategically limiting the data fed into the model, potentially reducing costs. Smaller model utilization: When a smaller model is used in tandem with high-quality data, not only can it achieve comparable performance to a larger model, but it also offers a compelling cost-saving strategy. The token count analysis code example provided earlier in the article can be instrumental in understanding and optimizing token usage. It's worth noting that pricing models for tuned models vary across MLaaS providers, highlighting the importance of careful pricing analysis during the selection process.
Latency Monitoring
In the context of GenAI applications, latency refers to the total time elapsed between a user submitting a request and receiving a response from the model. Ensuring minimal latency is crucial for maintaining a positive user experience, as delays can significantly degrade perceived responsiveness and overall satisfaction. This section delves into the essential components of robust latency monitoring for GenAI applications.
Real-Time Latency Measurement
Real-time tracking of end-to-end latency is fundamental. This entails measuring the following components: Network latency: Time taken for data to travel between the user's device and the cloud-based MLaaS service. Model inference time: The actual time required for the LLM to process the input and generate a response. Pre/post-processing overhead: Any additional time consumed for data preparation before model execution and formatting responses for delivery.
Impact on User Experience
Understanding the correlation between latency and user behavior is essential for optimizing the application. Key user satisfaction metrics to analyze include: Bounce rate: The percentage of users who leave a website or application after a single interaction.
Session duration: The length of time a user spends actively engaged with the application. Conversion rates: (When applicable) The proportion of users who complete a desired action, such as a purchase or sign-up. Identifying Bottlenecks Pinpointing the primary sources of latency is crucial for targeted fixes. Potential bottleneck areas warranting investigation include: Network performance: Insufficient bandwidth, slow DNS resolution, or network congestion can significantly increase network latency. Model architecture: Large, complex models may have longer inference times. Many times using smaller models, with higher quality data and better prompts can help yield necessary results. Inefficient input/output processing: Unoptimized data handling, encoding, or formatting can add overhead to the overall process. MLaaS platform factors: Service-side performance fluctuations on the MLaaS platform can impact latency. Proactive latency monitoring is vital for maintaining the responsiveness and user satisfaction of GenAI applications in production environments. By understanding the components of latency, analyzing its impact on user experience, and strategically identifying bottlenecks, developers can make informed decisions to optimize their applications. Scalability Monitoring Production-level deployment of GenAI applications necessitates the ability to handle fluctuations in demand gracefully. Regular load and stress testing are essential for evaluating a system's scalability and resilience under realistic and extreme traffic scenarios. These tests should simulate diverse usage patterns, gradual load increases, peak load simulations, and sustained load. Proactive scalability monitoring is critical, particularly when leveraging MLaaS platforms with hard quota limits for LLMs. This section outlines key metrics and strategies for effective scalability monitoring within these constraints. Autoscaling Configuration Leveraging the autoscaling capabilities provided by MLaaS platforms is crucial for dynamic resource management. Key considerations include: Metrics: Identify the primary metrics that will trigger scaling events (e.g., response time, API requests per second, error rates). Set appropriate thresholds based on performance goals. Scaling policies: Define how quickly resources should be added or removed in response to changes in demand. Consider factors like the time it takes to spin up additional model instances. Cooldown periods: Implement cooldown periods after scaling events to prevent "thrashing" (rapid scaling up and down), which can lead to instability and increased costs. Monitoring Scaling Metrics During scaling events, meticulously monitor these essential metrics: Response time: Ensure that response times remain within acceptable ranges, even when scaling, as latency directly impacts user experience. Throughput: Track the system's overall throughput (e.g., requests per minute) to gauge its capacity to handle incoming requests. Error rates: Monitor for any increases in error rates due to insufficient resources or bottlenecks that can arise during scaling processes. Resource utilization: Observe CPU, memory, and GPU utilization to identify potential resource constraints. MLaaS platforms' hard quota limits pose unique challenges for scaling GenAI applications. Strategies to address this include: Caching: Employ strategic caching of model outputs for frequently requested prompts to reduce the number of model calls. 
Batching: Consolidate multiple requests and process them in batches to optimize resource usage. Load balancing: Distribute traffic across multiple model instances behind a load balancer to maximize utilization within available quotas. Hybrid deployment: Consider a hybrid approach where less demanding requests are served by MLaaS models, and those exceeding quotas are handled by a self-hosted deployment (assuming the necessary expertise). Proactive application monitoring, encompassing cost, latency, and scalability aspects, underpins the successful deployment and cost-effective operation of GenAI applications in production. By implementing the strategies outlined above, developers and organizations can gain crucial insights, optimize resource usage, and ensure the responsiveness of their applications for enhanced user experiences. Content Monitoring Ensuring the quality and ethical integrity of GenAI applications in production requires a robust content monitoring strategy. This section addresses the detection of hallucinations, accuracy issues, harmful biases, lack of coherence, and the generation of sensitive content. Hallucination Detection Mitigating the tendency of LLMs to generate plausible but incorrect information is paramount for their ethical and reliable deployment in production settings. This section delves into grounding techniques and strategies for leveraging multiple LLMs to enhance the detection of hallucinations. Human-In-The-Loop To address the inherent issue of hallucinations in LLM-based applications, the human-in-the-loop approach offers two key implementation strategies: End-user feedback: Incorporating direct feedback mechanisms, such as thumbs-up/down ratings and options for detailed textual feedback, provides valuable insights into the LLM's output. This data allows for continuous model refinement and pinpoints areas where hallucinations may be prevalent. End-user feedback creates a collaborative loop that can significantly enhance the LLM's accuracy and trustworthiness over time. Human review sampling: Randomly sampling a portion of LLM-generated outputs and subjecting them to rigorous human review establishes a quality control mechanism. Human experts can identify subtle hallucinations, biases, or factual inconsistencies that automated systems might miss. This process is essential for maintaining a high standard of output, particularly in applications where accuracy is paramount. Implementing these HITL strategies fosters a symbiotic relationship between humans and LLMs. It leverages human expertise to guide and correct the LLM, leading to progressively more reliable and factually sound outputs. This approach is particularly crucial in domains where accuracy and the absence of misleading information are of utmost importance. Grounding in First-Party and Trusted Data Anchoring the output of GenAI applications in reliable data sources offers a powerful method for hallucination detection. This approach is essential, especially when dealing with domain-specific content or scenarios where verifiable facts are required. Techniques include: Prompt engineering with factual constraints: Carefully construct prompts that incorporate domain-specific knowledge, reference external data, or explicitly require the model to adhere to a known factual context. For example, a prompt for summarizing a factual document could include instructions like, "Restrict the summary to information explicitly mentioned in the document. 
Retrieval Augmented Generation: Augment LLMs using trusted datasets that prioritize factual accuracy and adherence to provided information. This can help reduce the model's overall tendency to fabricate information. Incorporating external grounding sources: Utilize APIs or services designed to access and process first-party data, trusted knowledge bases, or real-world information. This allows the system to cross-verify the model's output and flag potential discrepancies. For instance, a financial news summarization task could be coupled with an API that provides up-to-date stock market data for accuracy validation. LLM-based output evaluation: The unique capabilities of LLMs can be harnessed to evaluate the factual consistency of the generated text. Strategies include: Self-consistency check: This can be achieved through multi-step generation, where a task is broken into smaller steps, and later outputs are checked for contradictions against prior ones. For instance, asking the model to first outline key points of a document and then generate a full summary allows for verification that the summary aligns with those key points. Alternatively, rephrasing the original prompt in different formats and comparing the resulting outputs can reveal inconsistencies indicative of fabricated information. Cross-model comparison: Feed the output of one LLM as a prompt into a different LLM with potentially complementary strengths. Analyze any inconsistencies or contradictions between the subsequent outputs, which may reveal hallucinations. Metrics for tracking hallucinations: Accurately measuring and quantifying hallucinations generated by LLMs remains an active area of research. While established metrics from fields such as information retrieval and classification offer a foundation, the unique nature of hallucination detection necessitates the adaptation of existing metrics and the development of novel ones. This section proposes a multi-faceted suite of metrics, including standard metrics creatively adapted for this context as well as novel metrics specifically designed to capture the nuances of hallucinated text. Importantly, I encourage practitioners to tailor these metrics to the specific sensitivities of their business domains. Domain-specific knowledge is essential in crafting a metric set that aligns with the unique requirements of each GenAI deployment. Considerations and Future Directions Specificity vs. Open-Endedness Grounding techniques can be highly effective in tasks requiring factual precision. However, in more creative domains where novelty is expected, strict grounding might hinder the model's ability to generate original ideas. Data Quality The reliability of any grounding strategy depends on the quality and trustworthiness of the external data sources used. Verification against curated first-party data or reputable knowledge bases is essential. Computational Overhead Fact-checking, data retrieval, and multi-model evaluation can introduce additional latency and costs that need careful consideration in production environments. Evolving Evaluation Techniques Research into the use of LLMs for semantic analysis and consistency checking is ongoing. More sophisticated techniques for hallucination detection leveraging LLMs are likely to emerge, further bolstering their utility in this task. Grounding and cross-model evaluation provide powerful tools to combat hallucinations in GenAI outputs. 
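As a concrete illustration of the cross-model comparison idea, the sketch below asks a second, independent model to review the first model's summary against its source. This is a minimal sketch: generator_llm and reviewer_llm are hypothetical callables standing in for whichever MLaaS SDKs are in use, and the review prompt is illustrative rather than a finished verifier.
Python
def cross_check(source: str, generator_llm, reviewer_llm) -> dict:
    """generator_llm and reviewer_llm are hypothetical callables wrapping two
    different hosted models (e.g., one for generation, one for review); each
    takes a prompt string and returns the model's text response."""
    # Step 1: generate the candidate output with the primary model
    summary = generator_llm(f"Summarize the following text:\n{source}")
    # Step 2: ask a second model to check the output against the source
    review_prompt = (
        "You are a fact-checking assistant. Compare the summary against the source. "
        "Reply CONSISTENT or INCONSISTENT and list any unsupported claims.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}"
    )
    verdict = reviewer_llm(review_prompt)
    # Step 3: flag outputs the reviewer considers inconsistent for further handling
    return {
        "summary": summary,
        "review": verdict,
        "flagged": "INCONSISTENT" in verdict.upper(),
    }
Flagged outputs can be routed to a human reviewer or logged as a hallucination metric alongside the infrastructure metrics discussed earlier.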
Used strategically, these techniques bolster the factual accuracy and trustworthiness of these applications, promoting their robust deployment in real-world scenarios.
Bias Monitoring
The issue of bias in LLMs is a complex and pressing concern, as these models have the potential to perpetuate or amplify harmful stereotypes and discriminatory patterns present in their training data. Proactive bias monitoring is crucial for ensuring the ethical and inclusive deployment of GenAI in production. This section explores data-driven, actionable strategies for bias detection and mitigation.
Fairness Evaluation Toolkits
Specialized libraries and toolkits offer a valuable starting point for bias assessment in LLM outputs. While not all are explicitly designed for LLM evaluation, many can be adapted and repurposed for this context. Consider the following tools: Aequitas: Provides a suite of metrics and visualizations for assessing group fairness and bias across different demographics. This tool can be used to analyze model outputs for disparities based on sensitive attributes like gender, race, etc. FairTest: Enables the identification and investigation of potential biases in model outputs. It can analyze the presence of discriminatory language or differential treatment of protected groups.
Real-Time Analysis
In production environments, real-time bias monitoring is essential. Strategies include: Keyword and phrase tracking: Monitor outputs for specific words, phrases, or language patterns historically associated with harmful biases or stereotypes. Tailor these lists to sensitive domains and potential risks related to your application. Dynamic prompting for bias discovery: Systematically test the model with carefully constructed inputs designed to surface potential biases. For example, modify prompts to vary gender, ethnicity, or other attributes while keeping the task consistent, and observe whether the model's output exhibits prejudice (a minimal sketch appears at the end of this section).
Mitigation Strategies
When bias is detected, timely intervention is critical. Consider the following actions: Alerting: Implement an alerting system to flag potentially biased outputs for human review and intervention. Calibrate the sensitivity of these alerts based on the severity of bias and its potential impact. Filtering or modification: In sensitive applications, consider automated filtering of highly biased outputs or modification to neutralize harmful language. These measures must be balanced against the potential for restricting valid and unbiased expressions. Human-in-the-loop: Integrate human moderators for nuanced bias assessment and for determining appropriate mitigation steps. This can include re-prompting the model, providing feedback for fine-tuning, or escalating critical issues.
Important Considerations
Evolving standards: Bias detection is context-dependent, and definitions of harmful speech evolve over time. Monitoring systems must remain adaptable. Intersectionality: Biases can intersect across multiple axes (e.g., race, gender, sexual orientation). Monitoring strategies need to account for this complexity. Bias monitoring in GenAI applications is a multifaceted and ongoing endeavor. By combining specialized toolkits, real-time analysis, and thoughtful mitigation strategies, developers can work towards more inclusive and equitable GenAI systems.
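The sketch below, referenced in the dynamic-prompting item above, shows one minimal way to probe for biased outputs: swap demographic attributes in an otherwise identical prompt and scan the responses against a watch-list. The generate_fn wrapper, the prompt template, and the flagged terms are illustrative assumptions, not a complete fairness audit.
Python
TEMPLATE = "Write a short performance review for a {attr} software engineer."
ATTRIBUTES = ["male", "female", "non-binary"]
FLAGGED_TERMS = {"aggressive", "emotional", "bossy"}  # illustrative watch-list only

def probe_bias(generate_fn, template=TEMPLATE, attributes=ATTRIBUTES):
    """generate_fn is a hypothetical callable wrapping your MLaaS endpoint:
    it takes a prompt string and returns the model's text response."""
    results = {}
    for attr in attributes:
        # Same task, different demographic attribute
        output = generate_fn(template.format(attr=attr))
        # Flag any watch-list terms that appear in the response
        hits = sorted(term for term in FLAGGED_TERMS if term in output.lower())
        results[attr] = {"output": output, "flagged_terms": hits}
    return results

# Compare outputs across attributes (length, tone, flagged terms) and route
# material divergences to a human reviewer or an alerting pipeline.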
Coherence and Logic Assessment
Ensuring the internal consistency and logical flow of GenAI output is crucial for maintaining user trust and avoiding nonsensical results. This section offers techniques for unsupervised coherence and logic assessment, applicable to a variety of LLM-based tasks at scale.
Semantic Consistency Checks
Semantic Similarity Analysis
Calculate the semantic similarity between different segments of the generated text (e.g., sentences, paragraphs). Low similarity scores can indicate a lack of thematic cohesion or abrupt changes in topic.
Implementation
Leverage pre-trained sentence embedding models (e.g., Sentence Transformers) to compute similarity scores between text chunks.
Python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('paraphrase-distilroberta-base-v2')
generated_text = "The company's stock price surged after the earnings report. Cats are excellent pets."
# Split into sentences and drop empty fragments
sentences = [s.strip() for s in generated_text.split(".") if s.strip()]
embeddings = model.encode(sentences)
# Cosine similarity between consecutive sentences
similarity_score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(similarity_score)  # A low score indicates potential incoherence

Topic Modeling
Apply topic modeling techniques (e.g., LDA, NMF) to extract latent topics from the generated text. Inconsistent topic distribution across the output may suggest a lack of a central theme or focus.
Implementation
Utilize libraries like Gensim or scikit-learn for topic modeling.
Logical Reasoning Evaluation
Entailment and Contradiction Detection
Assess whether consecutive sentences within the generated text exhibit logical entailment (one sentence implies the other) or contradiction. This can reveal inconsistencies in reasoning.
Implementation
Employ entailment models (e.g., BERT-based models fine-tuned on Natural Language Inference datasets like SNLI or MultiNLI). These techniques can be packaged into user-friendly functions or modules, shielding users without deep ML expertise from the underlying complexities.
Sensitive Content Detection
With GenAI's ability to produce remarkably human-like text, it's essential to be proactive about detecting potentially sensitive content within its outputs. This is necessary to avoid unintended harm, promote responsible use, and maintain trust in the technology. The following section explores modern techniques specifically designed for sensitive content detection within the context of large language models. These scalable approaches will empower users to safeguard the ethical implementation of GenAI across diverse applications. Perspective API integration: Google's Perspective API offers a pre-trained model for identifying toxic comments. It can be integrated into LLM applications to analyze generated text and provide a score for the likelihood of containing toxic content. The Perspective API can be accessed through a REST API. Here's an example using Python:
Python
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # the Perspective API requires an API key

def analyze_text(text):
    client = discovery.build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
        static_discovery=False,
    )
    analyze_request = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=analyze_request).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

text = "This is a hateful comment."
toxicity_score = analyze_text(text)
print(f"Toxicity score: {toxicity_score}")

The API returns a score between 0 and 1, indicating the likelihood of toxicity. Thresholds can be set to flag or filter content exceeding a certain score.
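As a simple follow-on, the sketch below wraps the analyze_text helper above in a threshold check. The 0.8 cutoff is an arbitrary illustrative value and should be tuned to each application's risk tolerance and audience.
Python
TOXICITY_THRESHOLD = 0.8  # illustrative cutoff; tune per application

def moderate(text: str) -> dict:
    # Reuses the analyze_text helper defined in the previous snippet
    score = analyze_text(text)
    return {
        "text": text,
        "toxicity": score,
        "action": "block" if score >= TOXICITY_THRESHOLD else "allow",
    }

print(moderate("This is a hateful comment."))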
LLM-based safety filter: Major MLaaS providers like Google offer first-party safety filters integrated into their LLM offerings. These filters use internal LLM models trained specifically to detect and mitigate sensitive content. When using Google's Gemini API, the safety filters are automatically applied. You can access different creative text formats with safety guardrails in place. They also provide a second level of safety filters that users can leverage to apply additional filtering based on a set of metrics. For example, Google Cloud’s safety filters are mentioned here. Human-in-the-loop evaluation: Integrating human reviewers in the evaluation process can significantly improve the accuracy of sensitive content detection. Human judgment can help identify nuances and contextual factors that may be missed by automated systems. A platform like Amazon Mechanical Turk can be used to gather human judgments on the flagged content. Evaluator LLM: This involves using a separate LLM (“Evaluator LLM”) specifically to assess the output of the generative LLM for sensitive content. This Evaluator LLM can be trained on a curated dataset labeled for sensitive content. Training an Evaluator LLM requires expertise in deep learning. Open-source libraries like Hugging Face Transformers provide tools and pre-trained models to facilitate this process. An alternative is to use general-purpose LLMs such as Gemini or GPT with appropriate prompts to discover sensitive content. The language used to express sensitive content constantly evolves, requiring continuous updates to the detection models. By combining these scalable techniques and carefully addressing the associated challenges, we can build robust systems for detecting and mitigating sensitive content in LLM outputs, ensuring responsible and ethical deployment of this powerful technology. Conclusion Ensuring the reliable, ethical, and cost-effective deployment of Generative AI applications in production environments requires a multifaceted approach to monitoring. This article presented a framework specifically designed for real-time monitoring of GenAI, addressing both infrastructure and quality considerations. On the infrastructure side, proactive tracking of cost, latency, and scalability is essential. Tools for analyzing token usage, optimizing prompts, and leveraging auto-scaling capabilities play a crucial role in managing operational expenses and maintaining a positive user experience. Content monitoring is equally important for guaranteeing the quality and ethical integrity of GenAI applications. This includes techniques for detecting hallucinations, such as grounding in reliable data sources and incorporating human-in-the-loop verification mechanisms. Strategies for bias mitigation, coherence assessment, and sensitive content detection are vital for promoting inclusivity and preventing harmful outputs. By integrating the monitoring techniques outlined in this article, developers can gain deeper insights into the performance, behavior, and potential risks associated with their GenAI applications. This proactive approach empowers them to take informed corrective actions, optimize resource utilization, and ultimately deliver reliable, trustworthy, and ethical AI-powered experiences to users. While we have focused on MLaaS offerings, the principles discussed can be adapted to self-hosted LLM deployments. The field of GenAI monitoring is rapidly evolving. 
Researchers and practitioners should remain vigilant regarding new developments in hallucination detection, bias mitigation, and evaluation techniques. Additionally, it's crucial to recognize the ongoing debate around the balance between accuracy restrictions and creativity in generative models.
References
M. Korolov, "For IT leaders, operationalized gen AI is still a moving target," CIO, Feb. 28, 2024.
O. Simeone, "A Very Brief Introduction to Machine Learning With Applications to Communication Systems," IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 648-664, Dec. 2018, doi: 10.1109/TCCN.2018.2881441.
F. Doshi-Velez and B. Kim, "Towards A Rigorous Science of Interpretable Machine Learning," arXiv, 2017. [Online].
A. B. Arrieta et al., "Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Information Fusion 58 (2020): 82-115.
P. Saleiro et al., "Aequitas: A Bias and Fairness Audit Toolkit," arXiv, 2018. [Online].
E. Bender and A. Koller, "Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data," Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
S. Mousavi et al., "Enhancing Large Language Models with Ensemble of Critics for Mitigating Toxicity and Hallucination," OpenReview.
X. Amatriain, "Measuring And Mitigating Hallucinations In Large Language Models: A Multifaceted Approach," Mar. 2024. [Online].
In the realm of modern enterprise integration, the harmonious interaction between different systems and platforms is paramount. Salesforce, being one of the leading CRM platforms, often necessitates seamless integration with other systems to leverage its data and functionalities. MuleSoft, on the other hand, is a powerful integration platform that facilitates connecting various applications and APIs. One common scenario in Salesforce integration is invoking APEX REST methods from MuleSoft to interact with Salesforce data. In this blog post, we'll walk through the process of invoking an APEX REST method in MuleSoft and fetching account information from Salesforce using APEX code snippets.
Why Use APEX REST Methods?
Salesforce APEX REST methods provide a flexible way to expose custom functionalities or access Salesforce data through RESTful APIs. Leveraging APEX REST methods allows for fine-grained control over what data is exposed and how it's accessed, making it a preferred choice for integrating Salesforce with external systems like MuleSoft.
Prerequisites
Before we dive into the integration process, ensure you have the following prerequisites in place: Salesforce developer account: You'll need a Salesforce Developer account to create and test APEX REST methods. MuleSoft Anypoint platform account: Sign up for MuleSoft Anypoint Platform to create Mule applications.
Integration Steps
1. Create APEX REST Method in Salesforce
First, let's create an APEX REST method in Salesforce to fetch account information. Here's a simple example:
Apex
@RestResource(urlMapping='/accountInfo/*')
global with sharing class AccountInfoRestController {
    @HttpGet
    global static Account getAccountInfoById() {
        RestRequest req = RestContext.request;
        String accountId = req.requestURI.substring(req.requestURI.lastIndexOf('/') + 1);
        // Fetch Account information by Id
        Account acc = [SELECT Id, Name, Industry, Phone, BillingCity FROM Account WHERE Id = :accountId];
        return acc;
    }
}
In this code snippet, we define an APEX class AccountInfoRestController annotated with @RestResource to expose a REST endpoint /accountInfo/. The getAccountInfoById() method retrieves account information based on the provided Account ID.
2. Deploy APEX Class in Salesforce
Deploy the AccountInfoRestController class to your Salesforce organization.
3. Create MuleSoft Application
Create a new MuleSoft application in Anypoint Studio. File --> New --> Mule Project
4. Configure HTTP Connector
Add an HTTP Listener to your Mule flow to receive incoming HTTP requests. Configure the listener with the appropriate host and port.
5. Invoke the APEX REST Method in MuleSoft
There are two ways to invoke the APEX REST method in MuleSoft.
5.1. Using HTTP Requester
Add an HTTP Request component after the HTTP Listener and configure it to make a GET request to the Salesforce APEX REST endpoint: URL: Specify the URL of the APEX REST method in the format https://<Salesforce_Instance_URL>/services/apexrest/accountInfo/<Account_Id>. Method: Set the method to GET. Headers: Include any required headers, such as authentication tokens.
5.2.
Using Invoke APEX Rest API Connector Add an Invoke APEX Rest API Connector after the HTTP listener and configure it to make a GET request to the Salesforce APEX REST endpoint: APEX class: Specify the name of the class AccountInfoRestController APEX method: This is the combination of 4 attributes: methodName^urlMapping^httpMethod^returnType, getAccountInfoById^/accountInfo^HttpGet^Account Salesforce configuration: Basic Authentication Username: Your login user ID Password: Your login password Security Token: Generate from Profile --> Settings --> Reset My Security Token 6. Parse Response After invoking the APEX REST method, parse the response body to extract account information. Use transform message to send payload in required structure to calling flow. JSON { "Id": "xxxxxxxxxxxxxxx", "Name": "Example Account", "Industry": "Technology", "Phone": "(555) 555-5555", "BillingCity": "San Francisco" } 7. Handle Errors and Transform Data Implement error handling and data transformation as per your application requirements. 8. Complete the Mule Flow Complete the Mule flow with the necessary components for further processing or response generation. Conclusion Integrating Salesforce APEX REST methods in MuleSoft enables seamless interaction between Salesforce and other systems. By following the steps outlined in this guide, you can effectively invoke APEX REST methods from MuleSoft and retrieve account information from Salesforce. This integration approach empowers organizations to unlock the full potential of their Salesforce data within their broader application ecosystem.
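Before or after wiring up the Mule flow, it can be useful to smoke-test the Apex endpoint directly. The snippet below is a minimal sketch using Python's requests library; the instance URL, record Id, and OAuth access token are hypothetical placeholders to be replaced with your own values.
Python
import requests

# Hypothetical values: supply your own instance URL, record Id, and OAuth access token.
INSTANCE_URL = "https://your-instance.my.salesforce.com"
ACCOUNT_ID = "001XXXXXXXXXXXXXXX"
ACCESS_TOKEN = "YOUR_OAUTH_ACCESS_TOKEN"

# Call the Apex REST endpoint exposed by AccountInfoRestController
response = requests.get(
    f"{INSTANCE_URL}/services/apexrest/accountInfo/{ACCOUNT_ID}",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
response.raise_for_status()
account = response.json()
print(account["Name"], account["BillingCity"])
The JSON returned should match the structure shown in step 6 above, which is what the Transform Message component then reshapes for the calling flow.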
In today's dynamic and complex cloud environments, observability has become a cornerstone for maintaining the reliability, performance, and security of applications. Kubernetes, the de facto standard for container orchestration, hosts a plethora of applications, making the need for an efficient and scalable observability framework paramount. This article delves into how OpenTelemetry, an open-source observability framework, can be seamlessly integrated into a Kubernetes (K8s) cluster managed by KIND (Kubernetes IN Docker), and how tools like Loki, Tempo, and the kube-prometheus-stack can enhance your observability strategy. We'll explore this setup through the lens of a practical example, utilizing custom values from a specific GitHub repository.
The Observability Landscape in Kubernetes
Before diving into the integration, let's understand the components at play: KIND offers a straightforward way to run K8s clusters within Docker containers, ideal for development and testing. Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. Tempo is a high-volume, minimal-dependency trace aggregator, providing a robust way to store and query distributed traces. kube-prometheus-stack bundles Prometheus together with Grafana and other tools to provide a comprehensive monitoring solution out-of-the-box. OpenTelemetry Operator simplifies the deployment and management of OpenTelemetry collectors in K8s environments. Promtail is responsible for gathering logs and sending them to Loki. Integrating these components within a K8s cluster orchestrated by KIND not only streamlines the observability but also leverages the strengths of each tool, creating a cohesive and powerful monitoring solution.
Setting up Your Kubernetes Cluster With KIND
Firstly, ensure you have KIND installed on your machine. If not, you can easily install it using the following commands:
Shell
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-$(uname)-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
Once KIND is installed, you can create a cluster by running:
Shell
kind create cluster --config kind-config.yaml
kubectl create ns observability
kubectl config set-context --current --namespace observability
kind-config.yaml should be tailored to your specific requirements. It's important to ensure your cluster has the necessary resources (CPU, memory) to support the observability tools you plan to deploy.
Deploying Observability Tools With HELM
HELM, the package manager for Kubernetes, simplifies the deployment of applications. Here's how you can install Loki, Tempo, and the kube-prometheus-stack using HELM:
Add the necessary HELM repositories:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Install Loki, Tempo, and kube-prometheus-stack: For each tool, we'll use a custom values file available in the provided GitHub repository. This ensures a tailored setup aligned with specific monitoring and tracing needs.
Loki: helm upgrade --install loki grafana/loki --values https://raw.githubusercontent.com/brainupgrade-in/kubernetes/main/observability/opentelemetry/01-loki-values.yaml Tempo: helm install tempo grafana/tempo --values https://raw.githubusercontent.com/brainupgrade-in/kubernetes/main/observability/opentelemetry/02-tempo-values.yaml kube-prometheus-stack: helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --values https://raw.githubusercontent.com/brainupgrade-in/kubernetes/main/observability/opentelemetry/03-grafana-helm-values.yaml Install OpenTelemetry Operator and Promtail: The OpenTelemetry Operator and Promtail can also be installed via HELM, further streamlining the setup process. OpenTelemetry Operator: helm install opentelemetry-operator open-telemetry/opentelemetry-operator Promtail: helm install promtail grafana/promtail --set "loki.serviceName=loki.observability.svc.cluster.local" Configuring OpenTelemetry for Optimal Observability Once the OpenTelemetry Operator is installed, you'll need to configure it to collect metrics, logs, and traces from your applications. OpenTelemetry provides a unified way to send observability data to various backends like Loki for logs, Prometheus for metrics, and Tempo for traces. A sample OpenTelemetry Collector configuration might look like this: YAML apiVersion: opentelemetry.io/v1alpha1 kind: OpenTelemetryCollector metadata: name: otel namespace: observability spec: config: | receivers: filelog: include: ["/var/log/containers/*.log"] otlp: protocols: grpc: endpoint: 0.0.0.0:4317 http: endpoint: 0.0.0.0:4318 processors: memory_limiter: check_interval: 1s limit_percentage: 75 spike_limit_percentage: 15 batch: send_batch_size: 1000 timeout: 10s exporters: # NOTE: Prior to v0.86.0 use `logging` instead of `debug`. debug: prometheusremotewrite: endpoint: "http://prometheus-kube-prometheus-prometheus.observability:9090/api/v1/write" loki: endpoint: "http://loki.observability:3100/loki/api/v1/push" otlp: endpoint: http://tempo.observability.svc.cluster.local:4317 retry_on_failure: enabled: true tls: insecure: true service: pipelines: traces: receivers: [otlp] processors: [memory_limiter, batch] exporters: [debug,otlp] metrics: receivers: [otlp] processors: [memory_limiter, batch] exporters: [debug,prometheusremotewrite] logs: receivers: [otlp] processors: [memory_limiter, batch] exporters: [debug,loki] mode: daemonset This configuration sets up the collector to receive data via the OTLP protocol, process it in batches, and export it to the appropriate backends. To enable auto-instrumentation for java apps, you can define the following. YAML apiVersion: opentelemetry.io/v1alpha1 kind: Instrumentation metadata: name: java-instrumentation namespace: observability spec: exporter: endpoint: http://otel-collector.observability:4317 propagators: - tracecontext - baggage sampler: type: always_on argument: "1" java: env: - name: OTEL_EXPORTER_OTLP_ENDPOINT value: http://otel-collector.observability:4317 Leveraging Observability Data for Insights With the observability tools in place, you can now leverage the collected data to gain actionable insights into your application's performance, reliability, and security. Grafana can be used to visualize metrics and logs, while Tempo allows you to trace distributed transactions across microservices. Visualizing Data With Grafana Grafana offers a powerful platform for creating dashboards that visualize the metrics and logs collected by Prometheus and Loki, respectively. 
You can create custom dashboards or import existing ones tailored to Kubernetes monitoring. Tracing With Tempo Tempo, integrated with OpenTelemetry, provides a detailed view of traces across microservices, helping you pinpoint the root cause of issues and optimize performance. Illustrating Observability With a Weather Application Example To bring the concepts of observability to life, let's walk through a practical example using a simple weather application deployed in our Kubernetes cluster. This application, structured around microservices, showcases how OpenTelemetry can be utilized to gather crucial metrics, logs, and traces. The configuration for this demonstration is based on a sample Kubernetes deployment found here. Deploying the Weather Application Our weather application is a microservice that fetches weather data. It's a perfect candidate to illustrate how OpenTelemetry captures and forwards telemetry data to our observability stack. Here's a partial snippet of the deployment configuration. Full YAML is found here. YAML apiVersion: apps/v1 kind: Deployment metadata: labels: app: weather tier: front name: weather-front spec: replicas: 1 selector: matchLabels: app: weather tier: front template: metadata: labels: app: weather tier: front app.kubernetes.io/name: weather-front annotations: prometheus.io/scrape: "true" prometheus.io/port: "8888" prometheus.io/path: /actuator/prometheus instrumentation.opentelemetry.io/inject-java: "true" # sidecar.opentelemetry.io/inject: 'true' instrumentation.opentelemetry.io/container-names: "weather-front" spec: containers: - image: brainupgrade/weather:metrics imagePullPolicy: Always name: weather-front resources: limits: cpu: 1000m memory: 2048Mi requests: cpu: 100m memory: 1500Mi env: - name: APP_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.labels['app.kubernetes.io/name'] - name: NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: OTEL_SERVICE_NAME value: $(NAMESPACE)-$(APP_NAME) - name: spring.application.name value: $(NAMESPACE)-$(APP_NAME) - name: spring.datasource.url valueFrom: configMapKeyRef: name: app-config key: spring.datasource.url - name: spring.datasource.username valueFrom: secretKeyRef: name: app-secret key: spring.datasource.username - name: spring.datasource.password valueFrom: secretKeyRef: name: app-secret key: spring.datasource.password - name: weatherServiceURL valueFrom: configMapKeyRef: name: app-config key: weatherServiceURL - name: management.endpoints.web.exposure.include value: "*" - name: management.server.port value: "8888" - name: management.metrics.web.server.request.autotime.enabled value: "true" - name: management.metrics.tags.application value: $(NAMESPACE)-$(APP_NAME) - name: otel.instrumentation.log4j.capture-logs value: "true" - name: otel.logs.exporter value: "otlp" ports: - containerPort: 8080 This deployment configures the weather service with OpenTelemetry's OTLP (OpenTelemetry Protocol) exporter, directing telemetry data to our OpenTelemetry Collector. It also labels the service for clear identification within our telemetry data. Visualizing Observability Data Once deployed, the weather service starts sending metrics, logs, and traces to our observability tools. Here's how you can leverage this data. Trace the request across services using Tempo datasource Metrics: Prometheus, part of the kube-prometheus-stack, collects metrics on the number of requests, response times, and error rates. 
These metrics can be visualized in Grafana to monitor the health and performance of the weather service. For example, grafana dashboard (ID 17175) can be used to view Observability for Spring boot apps Logs: Logs generated by the weather service are collected by Promtail and stored in Loki. Grafana can query these logs, allowing you to search and visualize operational data. This is invaluable for debugging issues, such as understanding the cause of an unexpected spike in error rates. Traces: Traces captured by OpenTelemetry and stored in Tempo provide insight into the request flow through the weather service. This is crucial for identifying bottlenecks or failures in the service's operations. Gaining Insights With the weather application up and running, and observability data flowing, we can start to gain actionable insights: Performance optimization: By analyzing response times and error rates, we can identify slow endpoints or errors in the weather service, directing our optimization efforts more effectively. Troubleshooting: Logs and traces help us troubleshoot issues by providing context around errors or unexpected behavior, reducing the time to resolution. Scalability decisions: Metrics on request volumes and resource utilization guide decisions on when to scale the service to handle load more efficiently. This weather service example underscores the power of OpenTelemetry in a Kubernetes environment, offering a window into the operational aspects of applications. By integrating observability into the development and deployment pipeline, teams can ensure their applications are performant, reliable, and scalable. This practical example of a weather application illustrates the tangible benefits of implementing a comprehensive observability strategy with OpenTelemetry. It showcases how seamlessly metrics, logs, and traces can be collected, analyzed, and visualized, providing developers and operators with the insights needed to maintain and improve complex cloud-native applications. Conclusion Integrating OpenTelemetry with Kubernetes using tools like Loki, Tempo, and the kube-prometheus-stack offers a robust solution for observability. This setup not only simplifies the deployment and management of these tools but also provides a comprehensive view of your application's health, performance, and security. With the actionable insights gained from this observability stack, teams can proactively address issues, improve system reliability, and enhance the user experience. Remember, the key to successful observability lies in the strategic implementation and continuous refinement of your monitoring setup. Happy observability!
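One gap worth noting: the weather example relies on the operator's Java auto-instrumentation, but services in other languages can export to the same collector with a few lines of the OpenTelemetry SDK. Below is a minimal Python sketch, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed and the collector endpoint configured earlier; the service name is a hypothetical placeholder.
Python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the exporter at the collector service created by the OpenTelemetry Operator
exporter = OTLPSpanExporter(
    endpoint="http://otel-collector.observability:4317", insecure=True
)
provider = TracerProvider(
    resource=Resource.create({"service.name": "observability-weather-py"})
)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("fetch-weather"):
    pass  # call the downstream weather API here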
In an era where instant access to data is not just a luxury but a necessity, distributed caching has emerged as a pivotal technology in optimizing application performance. With the exponential growth of data and the demand for real-time processing, traditional methods of data storage and retrieval are proving inadequate. This is where distributed caching comes into play, offering a scalable, efficient, and faster way of handling data across various networked resources. Understanding Distributed Caching What Is Distributed Caching? Distributed caching refers to a method where information is stored across multiple servers, typically spread across various geographical locations. This approach ensures that data is closer to the user, reducing access time significantly compared to centralized databases. The primary goal of distributed caching is to enhance speed and reduce the load on primary data stores, thereby improving application performance and user experience. Key Components Cache store: At its core, the distributed cache relies on the cache store, where data is kept in-memory across multiple nodes. This arrangement ensures swift data retrieval and resilience to node failures. Cache engine: This engine orchestrates the operations of storing and retrieving data. It manages data partitioning for balanced distribution across nodes and load balancing to maintain performance during varying traffic conditions. Cache invalidation mechanism: A critical aspect that keeps the cache data consistent with the source database. Techniques such as time-to-live (TTL), write-through, and write-behind caching are used to ensure timely updates and data accuracy. Replication and failover processes: These processes provide high availability. They enable the cache system to maintain continuous operation, even in the event of node failures or network issues, by replicating data and providing backup nodes. Security and access control: Integral to protecting the cached data, these mechanisms safeguard against unauthorized access and ensure the integrity and confidentiality of data within the cache. Why Distributed Caching? Distributed caching is a game-changer in the realm of modern applications, offering distinct advantages that ensure efficient, scalable, and reliable software solutions. Speed and performance: Think of distributed caching as having express checkout lanes in a grocery store. Just as these lanes speed up the shopping experience, distributed caching accelerates data retrieval by storing frequently accessed data in memory. This results in noticeably faster and more responsive applications, especially important for dynamic platforms like e-commerce sites, real-time analytics tools, and interactive online games. Scaling with ease: As your application grows and attracts more users, it's like a store becoming more popular. You need more checkout lanes (or in this case, cache nodes) to handle the increased traffic. Distributed caching makes adding these extra lanes simple, maintaining smooth performance no matter how busy things get. Always up, always available: Imagine if one express lane closes unexpectedly – in a well-designed store, this isn’t a big deal because there are several others open. Similarly, distributed caching replicates data across various nodes. So, if one node goes down, the others take over without any disruption, ensuring your application remains up and running at all times. Saving on costs: Finally, using distributed caching is like smartly managing your store’s resources. 
It reduces the load on your main databases (akin to not overstaffing every lane) and, as a result, lowers operational costs. This efficient use of resources means your application does more with less, optimizing performance without needing excessive investment in infrastructure. How Distributed Caching Works Imagine you’re in a large library with lots of books (data). Every time you need a book, you must ask the librarian (the main database), who then searches through the entire library to find it. This process can be slow, especially if many people are asking for books at the same time. Now, enter distributed caching. Creating a mini-library (cache modes): In our library, we set up several small bookshelves (cache nodes) around the room. These mini-libraries store copies of the most popular books (frequently accessed data). So, when you want one of these books, you just grab it from the closest bookshelf, which is much faster than waiting for the librarian. Keeping the mini-libraries updated (cache invalidation): To ensure that the mini-libraries have the latest versions of the books, we have a system. Whenever a new edition comes out, or a book is updated, the librarian makes sure that these changes are reflected in the copies stored on the mini bookshelves. This way, you always get the most current information. Expanding the library (scalability): As more people come to the library, we can easily add more mini bookshelves or put more copies of popular books on existing shelves. This is like scaling the distributed cache — we can add more cache nodes or increase their capacity, ensuring everyone gets their books quickly, even when the library is crowded. Always open (high availability): What if one of the mini bookshelves is out of order (a node fails)? Well, there are other mini bookshelves with the same books, so you can still get what you need. This is how distributed caching ensures that data is always available, even if one part of the system goes down. In essence, distributed caching works by creating multiple quick-access points for frequently needed data, making it much faster to retrieve. It’s like having speedy express lanes in a large library, ensuring that you get your book quickly, the library runs efficiently, and everybody leaves happy. Caching Strategies Distributed caching strategies are like different methods used in a busy restaurant to ensure customers get their meals quickly and efficiently. Here’s how these strategies work in a simplified manner: Cache-aside (lazy loading): Imagine a waiter who only prepares a dish when a customer orders it. Once cooked, he keeps a copy in the kitchen for any future orders. In caching, this is like loading data into the cache only when it’s requested. It ensures that only necessary data is cached, but the first request might be slower as the data is not preloaded. Write-through caching: This is like a chef who prepares a new dish and immediately stores its recipe in a quick-reference guide. Whenever that dish is ordered, the chef can quickly recreate it using the guide. In caching, data is saved in the cache and the database simultaneously. This method ensures data consistency but might be slower for write operations. Write-around caching: Consider this as a variation of the write-through method. Here, when a new dish is created, the recipe isn’t immediately put into the quick-reference guide. It’s added only when it’s ordered again. 
In caching, data is written directly to the database and only written to the cache if it's requested again. This reduces the cache being filled with infrequently used data but might make the first read slower. Write-back caching: Imagine the chef writes down new recipes in the quick-reference guide first and updates the main recipe book later when there’s more time. In caching, data is first written to the cache and then, after some delay, written to the database. This speeds up write operations but carries a risk if the cache fails before the data is saved to the database. Each of these strategies has its pros and cons, much like different techniques in a restaurant kitchen. The choice depends on what’s more important for the application – speed, data freshness, or consistency. It's all about finding the right balance to serve up the data just the way it's needed! Consistency Models Understanding distributed caching consistency models can be simplified by comparing them to different methods of updating news on various bulletin boards across a college campus. Each bulletin board represents a cache node, and the news is the data you're caching. Strong consistency: This is like having an instant update on all bulletin boards as soon as a new piece of news comes in. Every time you check any board, you're guaranteed to see the latest news. In distributed caching, strong consistency ensures that all nodes show the latest data immediately after it's updated. It's great for accuracy but can be slower because you have to wait for all boards to be updated before continuing. Eventual consistency: Imagine that new news is first posted on the main bulletin board and then, over time, copied to other boards around the campus. If you check a board immediately after an update, you might not see the latest news, but give it a little time, and all boards will show the same information. Eventual consistency in distributed caching means that all nodes will eventually hold the same data, but there might be a short delay. It’s faster but allows for a brief period where different nodes might show slightly outdated information. Weak consistency: This is like having updates made to different bulletin boards at different times without a strict schedule. If you check different boards, you might find varying versions of the news. In weak consistency for distributed caching, there's no guarantee that all nodes will be updated at the same time, or ever fully synchronized. This model is the fastest, as it doesn't wait for updates to propagate to all nodes, but it's less reliable for getting the latest data. Read-through and write-through caching: These methods can be thought of as always checking or updating the main news board (the central database) when getting or posting news. In read-through caching, every time you read data, it checks with the main database to ensure it's up-to-date. In write-through caching, every time you update data, it updates the main database first before the bulletin boards. These methods ensure consistency between the cache and the central database but can be slower due to the constant checks or updates. Each of these models offers a different balance between ensuring data is up-to-date across all nodes and the speed at which data can be accessed or updated. The choice depends on the specific needs and priorities of your application. Use Cases E-Commerce Platforms Normal caching: Imagine a small boutique with a single counter for popular items. 
This helps a bit, as customers can quickly grab what they frequently buy. But when there's a big sale, the counter gets overcrowded, and people wait longer. Distributed caching: Now think of a large department store with multiple counters (nodes) for popular items, scattered throughout. During sales, customers can quickly find what they need from any nearby counter, avoiding long queues. This setup is excellent for handling heavy traffic and large, diverse inventories, typical in e-commerce platforms. Online Gaming Normal caching: It’s like having one scoreboard in a small gaming arcade. Players can quickly see scores, but if too many players join, updating and checking scores becomes slow. Distributed caching: In a large gaming complex with scoreboards (cache nodes) in every section, players anywhere can instantly see updates. This is crucial for online gaming, where real-time data (like player scores or game states) needs fast, consistent updates across the globe. Real-Time Analytics Normal caching: It's similar to having a single newsstand that quickly provides updates on certain topics. It's faster than searching through a library but can get overwhelming during peak news times. Distributed caching: Picture a network of digital screens (cache nodes) across a city, each updating in real-time with news. For applications analyzing live data (like financial trends or social media sentiment), this means getting instant insights from vast, continually updated data sources. Choosing the Right Distributed Caching Solution When selecting a distributed caching solution, consider the following: Performance and latency: Assess the solution's ability to handle your application’s load, especially under peak usage. Consider its read/write speed, latency, and how well it maintains performance consistency. This factor is crucial for applications requiring real-time responsiveness. Scalability and flexibility: Ensure the solution can horizontally scale as your user base and data volume grow. The system should allow for easy addition or removal of nodes with minimal impact on ongoing operations. Scalability is essential for adapting to changing demands. Data consistency and reliability: Choose a consistency model (strong, eventual, etc.) that aligns with your application's needs. Also, consider how the system handles node failures and data replication. Reliable data access and accuracy are vital for maintaining user trust and application integrity. Security features: Given the sensitive nature of data today, ensure the caching solution has robust security features, including authentication, authorization, and data encryption. This is especially important if you're handling personal or sensitive user data. Cost and total ownership: Evaluate the total cost of ownership, including licensing, infrastructure, and maintenance. Open-source solutions might offer cost savings but consider the need for internal expertise. Balancing cost with features and long-term scalability is key for a sustainable solution. Implementing Distributed Caching Implementing distributed caching effectively requires a strategic approach, especially when transitioning from normal (single-node) caching. Here’s a concise guide: Assessment and Planning Normal caching: Typically involves setting up a single cache server, often co-located with the application server. Distributed caching: Start with a thorough assessment of your application’s performance bottlenecks and data access patterns. 
Plan for multiple cache nodes, distributed across different servers or locations, to handle higher loads and ensure redundancy. Choosing the Right Technology Normal caching: Solutions like Redis or Memcached can be sufficient for single-node caching. Distributed caching: Select a distributed caching technology that aligns with your scalability, performance, and consistency needs. Redis Cluster, Apache Ignite, or Hazelcast are popular choices. Configuration and Deployment Normal caching: Configuration is relatively straightforward, focusing mainly on the memory allocation and cache eviction policies. Distributed caching: Requires careful configuration of data partitioning, replication strategies, and node discovery mechanisms. Ensure cache nodes are optimally distributed to balance load and minimize latency. Data Invalidation and Synchronization Normal caching: Less complex, often relying on TTL (time-to-live) settings for data invalidation. Distributed caching: Implement more sophisticated invalidation strategies like write-through or write-behind caching. Ensure synchronization mechanisms are in place for data consistency across nodes. Monitoring and Maintenance Normal caching: Involves standard monitoring of cache hit rates and memory usage. Distributed caching: Requires more advanced monitoring of individual nodes, network latency between nodes, and overall system health. Set up automated scaling and failover processes for high availability. Security Measures Normal caching: Basic security configurations might suffice. Distributed caching: Implement robust security protocols, including encryption in transit and at rest, and access controls. Challenges and Best Practices Challenges Cache invalidation: Ensuring that cached data is updated or invalidated when the underlying data changes. Data synchronization: Keeping data synchronized across multiple cache nodes. Best Practices Regularly monitor cache performance: Use monitoring tools to track hit-and-miss ratios and adjust strategies accordingly. Implement robust cache invalidation mechanisms: Use techniques like time-to-live (TTL) or explicit invalidation. Plan for failover and recovery: Ensure that your caching solution can handle node failures gracefully. Conclusion Distributed caching is an essential component in the architectural landscape of modern applications, especially those requiring high performance and scalability. By understanding the fundamentals, evaluating your needs, and following best practices, you can harness the power of distributed caching to elevate your application's performance, reliability, and user experience. As technology continues to evolve, distributed caching will play an increasingly vital role in managing the growing demands for fast and efficient data access.
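To make the cache-aside strategy described earlier concrete, below is a minimal sketch in Java. It is illustrative only: CacheClient and Database are hypothetical interfaces standing in for your cache client (for example, a Redis or Memcached client) and your primary data store, and the ten-minute TTL is an arbitrary example value.

Java

import java.time.Duration;
import java.util.Optional;

// Hypothetical abstraction over the distributed cache (e.g., a Redis client).
interface CacheClient {
    Optional<String> get(String key);
    void put(String key, String value, Duration ttl);
}

// Hypothetical abstraction over the primary data store.
interface Database {
    String loadProductById(String id);
}

class ProductService {
    private final CacheClient cache;
    private final Database database;

    ProductService(CacheClient cache, Database database) {
        this.cache = cache;
        this.database = database;
    }

    // Cache-aside (lazy loading): check the cache first, fall back to the
    // database on a miss, then populate the cache with a TTL so stale entries
    // eventually expire and are refreshed on a later read.
    String getProduct(String id) {
        return cache.get(id).orElseGet(() -> {
            String value = database.loadProductById(id);
            cache.put(id, value, Duration.ofMinutes(10)); // example TTL
            return value;
        });
    }
}

The same skeleton extends naturally to write-through (write to the cache and the database in the same operation) and write-around (skip the cache on writes and let the next read repopulate it).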
This article presents an in-depth analysis of the service mesh landscape, focusing specifically on Istio, one of the most popular service mesh frameworks. A service mesh is a dedicated infrastructure layer for managing service-to-service communication in the world of microservices. Istio, built to seamlessly integrate with platforms like Kubernetes, provides a robust way to connect, secure, control, and observe services. This journal explores Istio’s architecture, its key features, and the value it provides in managing microservices at scale. Service Mesh A Kubernetes service mesh is a tool that improves the security, monitoring, and reliability of applications on Kubernetes. It manages communication between microservices and simplifies the complex network environment. By deploying network proxies alongside application code, the service mesh controls the data plane. This combination of Kubernetes and service mesh is particularly beneficial for cloud-native applications with many services and instances. The service mesh ensures reliable and secure communication, allowing developers to focus on core application development. A Kubernetes service mesh, like any service mesh, simplifies how distributed applications communicate with each other. It acts as a layer of infrastructure that manages and controls this communication, abstracting away the complexity from individual services. Just like a tracking and routing service for packages, a Kubernetes service mesh tracks and directs traffic based on rules to ensure reliable and efficient communication between services. A service mesh consists of a data plane and a control plane. The data plane includes lightweight proxies deployed alongside application code, handling the actual service-to-service communication. The control plane configures these proxies, manages policies, and provides additional capabilities such as tracing and metrics collection. With a Kubernetes service mesh, developers can separate their application's logic from the infrastructure that handles security and observability, enabling secure and monitored communication between microservices. It also supports advanced deployment strategies and integrates with monitoring tools for better operational control. Istio as a Service Mesh Istio is a popular open-source service mesh that has gained significant adoption among major tech companies like Google, IBM, and Lyft. It leverages the data plane and control plane architecture common to all service meshes, with its data plane consisting of envoy proxies deployed as sidecars within Kubernetes pods. The data plane in Istio is responsible for managing traffic, implementing fault injection for specific protocols, and providing application layer load balancing. This application layer load balancing differs from the transport layer load balancing in Kubernetes. Additionally, Istio includes components for collecting metrics, enforcing access control, authentication, and authorization, as well as integrating with monitoring and logging systems. It also supports encryption, authentication policies, and role-based access control through features like TLS authentication. Find the Istio architecture diagram below: Below, find the configuration and data flow diagram of Istio: Furthermore, Istio can be extended with various tools to enhance its functionality and integrate with other systems. This allows users to customize and expand the capabilities of their Istio service mesh based on their specific requirements. 
Traffic Management Istio offers traffic routing features that have a significant impact on performance and facilitate effective deployment strategies. These features allow precise control over the flow of traffic and API calls within a single cluster and across clusters. Within a single cluster, Istio's traffic routing rules enable efficient distribution of requests between services based on factors like load balancing algorithms, service versions, or user-defined rules. This ensures optimal performance by evenly distributing requests and dynamically adjusting routing based on service health and availability. Routing traffic across clusters enhances scalability and fault tolerance. Istio provides configuration options for traffic routing across clusters, including round-robin, least connections, or custom rules. This capability allows traffic to be directed to different clusters based on factors such as network proximity, resource utilization, or specific business requirements. In addition to performance optimization, Istio's traffic routing rules support advanced deployment strategies. A/B testing enables the routing of a certain percentage of traffic to a new service version while serving the majority of traffic to the existing version. Canary deployments involve gradually shifting traffic from an old version to a new version, allowing for monitoring and potential rollbacks. Staged rollouts incrementally increase traffic to a new version, enabling precise control and monitoring of the deployment process. Furthermore, Istio simplifies the configuration of service-level properties like circuit breakers, timeouts, and retries. Circuit breakers prevent cascading failures by redirecting traffic when a specified error threshold is reached. Timeouts and retries handle network delays or transient failures by defining response waiting times and the number of request retries. In summary, Istio's traffic routing capabilities provide a flexible and powerful means to control traffic and API calls, improving performance and facilitating advanced deployment strategies such as A/B testing, canary deployments, and staged rollouts. The following is a code sample that demonstrates how to use Istio's traffic routing features in Kubernetes using Istio VirtualService and DestinationRule resources: In the code below, we define a VirtualService named my-service with a host my-service.example.com. We configure traffic routing by specifying two routes: one to the v1 subset of the my-service destination and another to the v2 subset. We assign different weights to each route to control the proportion of traffic they receive. The DestinationRule resource defines subsets for the my-service destination, allowing us to route traffic to different versions of the service based on labels. In this example, we have subsets for versions v1 and v2. Code Sample YAML # Example VirtualService configuration apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: my-service spec: hosts: - my-service.example.com http: - route: - destination: host: my-service subset: v1 weight: 90 - destination: host: my-service subset: v2 weight: 10 # Example DestinationRule configuration apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: my-service spec: host: my-service subsets: - name: v1 labels: version: v1 - name: v2 labels: version: v2 Observability As the complexity of services grows, it becomes increasingly challenging to comprehend their behavior and performance. 
Istio addresses this challenge by automatically generating detailed telemetry for all communications within a service mesh. This telemetry includes metrics, distributed traces, and access logs, providing comprehensive observability into the behavior of services. With Istio, operators can easily access and analyze metrics that capture various aspects of service performance, such as request rates, latency, and error rates. These metrics offer valuable insights into the health and efficiency of services, allowing operators to proactively identify and address performance issues. Distributed tracing in Istio enables the capturing and correlation of trace spans across multiple services involved in a request. This provides a holistic view of the entire request flow, allowing operators to understand the latency and dependencies between services. With this information, operators can pinpoint bottlenecks and optimize the performance of their applications. Full access logs provided by Istio capture detailed information about each request, including headers, payloads, and response codes. These logs offer a comprehensive audit trail of service interactions, enabling operators to investigate issues, debug problems, and ensure compliance with security and regulatory requirements. The telemetry generated by Istio is instrumental in empowering operators to troubleshoot, maintain, and optimize their applications. It provides a deep understanding of how services interact, allowing operators to make data-driven decisions and take proactive measures to improve performance and reliability. Furthermore, Istio's telemetry capabilities are seamlessly integrated into the service mesh without requiring any modifications to the application code, making it a powerful and convenient tool for observability. Istio automatically generates telemetry for all communications within a service mesh, including metrics, distributed traces, and access logs. Here's an example of how you can access metrics and logs using Istio: Commands in Bash # Access metrics: istioctl dashboard kiali # Access distributed traces: istioctl dashboard jaeger # Access access logs: kubectl logs -l istio=ingressgateway -n istio-system In the code above, we use the istioctl command-line tool to access Istio's observability dashboards. The istioctl dashboard kiali command opens the Kiali dashboard, which provides a visual representation of the service mesh and allows you to view metrics such as request rates, latency, and error rates. The istioctl dashboard jaeger command opens the Jaeger dashboard, which allows you to view distributed traces and analyze the latency and dependencies between services. To access access logs, we use the kubectl logs command to retrieve logs from the Istio Ingress Gateway. By filtering logs with the label istio=ingressgateway and specifying the namespace istio-system, we can view detailed information about each request, including headers, payloads, and response codes. By leveraging these observability features provided by Istio, operators can gain deep insights into the behavior and performance of their services. This allows them to troubleshoot issues, optimize performance, and ensure the reliability of their applications. Security Capabilities Microservices have specific security requirements, such as protecting against man-in-the-middle attacks, implementing flexible access controls, and enabling auditing tools. Istio addresses these needs with its comprehensive security solution. 
Istio's security model follows a "security-by-default" approach, providing in-depth defense for deploying secure applications across untrusted networks. It ensures strong identity management, authenticating and authorizing services within the service mesh to prevent unauthorized access and enhance security. Transparent TLS encryption is a crucial component of Istio's security framework. It encrypts all communication within the service mesh, safeguarding data from eavesdropping and tampering. Istio manages certificate rotation automatically, simplifying the maintenance of a secure communication channel between services. Istio also offers powerful policy enforcement capabilities, allowing operators to define fine-grained access controls and policies for service communication. These policies can be dynamically enforced and updated without modifying the application code, providing flexibility in managing access and ensuring secure communication. With Istio, operators have access to authentication, authorization, and audit (AAA) tools. Istio supports various authentication mechanisms, including mutual TLS, JSON Web Tokens (JWT), and OAuth2, ensuring secure authentication of clients and services. Additionally, comprehensive auditing capabilities help operators track service behavior, comply with regulations, and detect potential security incidents. In summary, Istio's security solution addresses the specific security requirements of microservices, providing strong identity management, transparent TLS encryption, policy enforcement, and AAA tools. It enables operators to deploy secure applications and protect services and data within the service mesh. Code Sample YAML # Example DestinationRule for mutual TLS authentication apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: my-service spec: host: my-service trafficPolicy: tls: mode: MUTUAL clientCertificate: /etc/certs/client.pem privateKey: /etc/certs/private.key caCertificates: /etc/certs/ca.pem # Example AuthorizationPolicy for access control apiVersion: security.istio.io/v1beta1 kind: AuthorizationPolicy metadata: name: my-service-access spec: selector: matchLabels: app: my-service rules: - from: - source: principals: ["cluster.local/ns/default/sa/my-allowed-service-account"] to: - operation: methods: ["*"] In the code above, we configure mutual TLS authentication for the my-service destination using a DestinationRule resource. We set the mode to MUTUAL to enforce mutual TLS authentication between clients and the service. The clientCertificate, privateKey, and caCertificates fields specify the paths to the client certificate, private key, and CA certificate, respectively. We also define an AuthorizationPolicy resource to control access to the my-service based on the source service account. In this example, we allow requests from the my-allowed-service-account service account in the default namespace by specifying its principal in the principals field. By applying these configurations to an Istio-enabled Kubernetes cluster, you can enhance the security of your microservices by enforcing mutual TLS authentication and implementing fine-grained access controls. Circuit Breaking and Retry Circuit breaking and retries are crucial techniques in building resilient distributed systems, especially in microservices architectures. Circuit breaking prevents cascading failures by stopping requests to a service experiencing errors or high latency. 
Istio's circuit-breaking support, configured through the trafficPolicy of a DestinationRule, allows you to define thresholds for failed requests and other error conditions, ensuring that the circuit opens and stops further degradation when these thresholds are crossed. This isolation protects other services from being affected. Additionally, Istio's retry policy, defined on a VirtualService route, enables automatic retries of failed requests, with customizable backoff strategies, timeout periods, and triggering conditions. By retrying failed requests, transient failures can be handled effectively, increasing the chances of success. Combining circuit breaking and retries enhances the resilience of microservices, isolating failing services and providing resilient handling of intermittent issues. In Istio, retries and fault injection are configured within the VirtualService resource, while circuit-breaking thresholds are set on the companion DestinationRule, allowing for customization based on specific requirements. Overall, leveraging these features in Istio is essential for building robust and resilient microservices architectures, protecting against failures, and maintaining system reliability. In the code below, we configure retries and fault injection for my-service using the VirtualService resource, and circuit breaking using a DestinationRule. The retries section specifies that failed requests should be retried up to 3 times with a per-try timeout of 2 seconds. The retryOn field specifies the conditions under which retries should be triggered, such as 5xx server errors or connect failures. The fault section configures fault injection for the service. In this example, we introduce a fixed delay of 5 seconds for 50% of the requests and abort 10% of the requests with a 503 HTTP status code. The DestinationRule defines the circuit-breaking thresholds for the service through its connection pool and outlier detection settings. The example configuration sets the maximum number of TCP connections to 100, the maximum number of concurrent HTTP requests to 100, the maximum number of pending HTTP requests to 10, a base ejection time of 5 seconds, and an outlier detection interval of 10 seconds. By applying this configuration to an Istio-enabled Kubernetes cluster, you can enable circuit breaking and retries for your microservices, enhancing resilience and preventing cascading failures. Code Sample YAML apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: my-service spec: hosts: - my-service http: - route: - destination: host: my-service subset: v1 retries: attempts: 3 perTryTimeout: 2s retryOn: 5xx,connect-failure fault: delay: fixedDelay: 5s percentage: value: 50 abort: httpStatus: 503 percentage: value: 10 --- apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: my-service spec: host: my-service subsets: - name: v1 labels: version: v1 trafficPolicy: connectionPool: tcp: maxConnections: 100 http: http2MaxRequests: 100 http1MaxPendingRequests: 10 outlierDetection: interval: 10s baseEjectionTime: 5s Canary Deployments Canary deployments with Istio offer a powerful strategy for releasing new features or updates to a subset of users or traffic while minimizing the risk of impacting the entire system. With Istio's traffic management capabilities, you can easily implement canary deployments by directing a fraction of the traffic to the new version or feature. Istio's VirtualService resource allows you to define routing rules based on percentages, HTTP headers, or other criteria to selectively route traffic. By gradually increasing the traffic to the canary version, you can monitor its performance and gather feedback before rolling it out to the entire user base. Istio also provides powerful observability features, such as distributed tracing and metrics collection, allowing you to closely monitor the canary deployment and make data-driven decisions.
In case of any issues or anomalies, you can quickly roll back to the stable version or implement other remediation strategies, minimizing the impact on users. Canary deployments with Istio provide a controlled and gradual approach to releasing new features, ensuring that changes are thoroughly tested and validated before impacting the entire system, thus improving the overall reliability and stability of your applications. To implement canary deployments with Istio, we can use the VirtualService resource to define routing rules and gradually shift traffic to the canary version. Code Sample YAML apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: my-service spec: hosts: - my-service http: - route: - destination: host: my-service subset: stable weight: 90 - destination: host: my-service subset: canary weight: 10 In the code above, we configure the VirtualService to route 90% of the traffic to the stable version of the service (subset: stable) and 10% of the traffic to the canary version (subset: canary). The weight field specifies the distribution of traffic between the subsets. By applying this configuration, you can gradually increase the traffic to the canary version and monitor its behavior and performance. Istio's observability features, such as distributed tracing and metrics collection, can provide insights into the canary deployment's behavior and impact. If any issues or anomalies are detected, you can quickly roll back to the stable version by adjusting the traffic weights or implementing other remediation strategies. By leveraging Istio's traffic management capabilities, you can safely release new features or updates, gather feedback, and mitigate risks before fully rolling them out to your user base. Autoscaling Istio seamlessly integrates with Kubernetes' Horizontal Pod Autoscaler (HPA) to enable automated scaling of microservices based on various metrics, such as CPU or memory usage. By configuring Istio's metrics collection and setting up the HPA, you can ensure that your microservices scale dynamically in response to increased traffic or resource demands. Istio's metrics collection capabilities allow you to gather detailed insights into the performance and resource utilization of your microservices. These metrics can then be used by the HPA to make informed scaling decisions. The HPA continuously monitors the metrics and adjusts the number of replicas for a given microservice based on predefined scaling rules and thresholds. When the defined thresholds are crossed, the HPA automatically scales up or down the number of pods, ensuring that the microservices can handle the current workload efficiently. This automated scaling approach eliminates the need for manual intervention and enables your microservices to adapt to fluctuating traffic patterns or resource demands in real time. By leveraging Istio's integration with Kubernetes' HPA, you can achieve optimal resource utilization, improve performance, and ensure the availability and scalability of your microservices. Code Sample YAML apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: my-service-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: my-service minReplicas: 1 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 50 In the example above, the HPA is configured to scale the my-service deployment based on CPU usage. The HPA will maintain an average CPU utilization of 50% across all pods.
By applying this configuration, Istio will collect metrics from your microservices, and the HPA will automatically adjust the number of replicas based on the defined scaling rules and thresholds. With this integration, your microservices can dynamically scale up or down based on traffic patterns and resource demands, ensuring optimal utilization of resources and improved performance. It’s important to note that the Istio integration with Kubernetes' HPA may require additional configuration and tuning based on your specific requirements and monitoring setup. Implementing Fault Injection and Chaos Testing With Istio Chaos fault injection with Istio is a powerful technique that allows you to test the resilience and robustness of your microservices architecture. Istio provides built-in features for injecting faults and failures into your system, simulating real-world scenarios, and evaluating how well your system can handle them. With Istio's Fault Injection feature, you can introduce delays, errors, aborts, or latency spikes to specific requests or services. By configuring VirtualServices and DestinationRules, you can selectively apply fault injection based on criteria such as HTTP headers or paths. By combining fault injection with observability features like distributed tracing and metrics collection, you can closely monitor the impact of injected faults on different services in real time. Chaos fault injection with Istio helps you identify weaknesses, validate error handling mechanisms, and build confidence in the resilience of your microservices architecture, ensuring the reliability and stability of your applications in production environments. Securing External Traffic Using Istio's Ingress Gateway Securing external traffic using Istio's Ingress Gateway is crucial for protecting your microservices architecture from unauthorized access and potential security threats. Istio's Ingress Gateway acts as the entry point for external traffic, providing a centralized and secure way to manage inbound connections. By configuring Istio's Ingress Gateway, you can enforce authentication, authorization, and encryption protocols to ensure that only authenticated and authorized traffic can access your microservices. Istio supports various authentication mechanisms such as JSON Web Tokens (JWT), mutual TLS (mTLS), and OAuth, allowing you to choose the most suitable method for your application's security requirements. Additionally, Istio's Ingress Gateway enables you to define fine-grained access control policies based on source IP, user identity, or other attributes, ensuring that only authorized clients can reach specific microservices. By leveraging Istio's powerful traffic management capabilities, you can also enforce secure communication between microservices within your architecture, preventing unauthorized access or eavesdropping. Overall, Istio's Ingress Gateway provides a robust and flexible solution for securing external traffic, protecting your microservices, and ensuring the integrity and confidentiality of your data and communications. Code Sample YAML apiVersion: networking.istio.io/v1alpha3 kind: Gateway metadata: name: my-gateway spec: selector: istio: ingressgateway servers: - port: number: 80 name: http protocol: HTTP hosts: - "*" In this example, we define a Gateway named my-gateway that listens on port 80 and accepts HTTP traffic from any host. The Gateway's selector is set to istio: ingressgateway, which ensures that it will be used as the Ingress Gateway for external traffic. 
Best Practices for Managing and Operating Istio in Production Environments When managing and operating Istio in production environments, there are several best practices to follow. First, it is essential to carefully plan and test your Istio deployment before production rollout, ensuring compatibility with your specific application requirements and infrastructure. Properly monitor and observe your Istio deployment using Istio's built-in observability features, including distributed tracing, metrics, and logging. Regularly review and update Istio configurations to align with your evolving application needs and security requirements. Implement traffic management cautiously, starting with conservative traffic routing rules and gradually introducing more advanced features like traffic splitting and canary deployments. Take advantage of Istio's traffic control capabilities to implement circuit breaking, retries, and timeout policies to enhance the resilience of your microservices. Regularly update and patch your Istio installation to leverage the latest bug fixes, security patches, and feature enhancements. Lastly, establish a robust backup and disaster recovery strategy to mitigate potential risks and ensure business continuity. By adhering to these best practices, you can effectively manage and operate Istio in production environments, ensuring the reliability, security, and performance of your microservices architecture. Conclusion In the evolving landscape of service-to-service communication, Istio, as a service mesh, has surfaced as an integral component, offering a robust and flexible solution for managing complex communication between microservices in a distributed architecture. Istio's capabilities extend beyond merely facilitating communication to providing comprehensive traffic management, enabling sophisticated routing rules, retries, failovers, and fault injections. It also addresses security, a critical aspect in the microservices world, by implementing it at the infrastructure level, thereby reducing the burden on application code. Furthermore, Istio enhances observability in the system, allowing organizations to effectively monitor and troubleshoot their services. Despite the steep learning curve associated with Istio, the multitude of benefits it offers makes it a worthy investment for organizations. The control and flexibility it provides over microservices are unparalleled. With the growing adoption of microservices, the role of service meshes like Istio is becoming increasingly pivotal, ensuring reliable, secure operation of services, and providing the scalability required in today's dynamic business environment. In conclusion, Istio holds a significant position in the service mesh realm, offering a comprehensive solution for managing microservices at scale. It represents the ongoing evolution in service-to-service communication, driven by the need for more efficient, secure, and manageable solutions. The future of Istio and service mesh, in general, appears promising, with continuous research and development efforts aimed at strengthening and broadening their capabilities. References "What is a service mesh?" (Red Hat) "Istio - Connect, secure, control, and observe services." (Istio) "What is Istio?" (IBM Cloud) "Understanding the Basics of Service Mesh" (Container Journal)
I blogged about Java stream debugging in the past, but I skipped an important method that's worthy of a post of its own: peek. This blog post delves into the practicalities of using peek() to debug Java streams, complete with code samples and common pitfalls. Understanding Java Streams Java Streams represent a significant shift in how Java developers work with collections and data processing, introducing a functional approach to handling sequences of elements. Streams facilitate declarative processing of collections, enabling operations such as filter, map, reduce, and more in a fluent style. This not only makes the code more readable but also more concise compared to traditional iterative approaches. A Simple Stream Example To illustrate, consider the task of filtering a list of names to only include those that start with the letter "J" and then transforming each name into uppercase. Using the traditional approach, this might involve a loop and some "if" statements. However, with streams, this can be accomplished in a few lines: List<String> names = Arrays.asList("John", "Jacob", "Edward", "Emily"); // Convert list to stream List<String> filteredNames = names.stream() // Filter names that start with "J" .filter(name -> name.startsWith("J")) // Convert each name to uppercase .map(String::toUpperCase) // Collect results into a new list .collect(Collectors.toList()); System.out.println(filteredNames); Output: [JOHN, JACOB] This example demonstrates the power of Java streams: by chaining operations together, we can achieve complex data transformations and filtering with minimal, readable code. It showcases the declarative nature of streams, where we describe what we want to achieve rather than detailing the steps to get there. What Is the peek() Method? At its core, peek() is a method provided by the Stream interface, allowing developers a glance into the elements of a stream without disrupting the flow of its operations. The signature of peek() is as follows: Stream<T> peek(Consumer<? super T> action) It accepts a Consumer functional interface, which means it performs an action on each element of the stream without altering them. The most common use case for peek() is logging the elements of a stream to understand the state of data at various points in the stream pipeline. To understand peek, let's look at a sample similar to the previous one: List<String> collected = Stream.of("apple", "banana", "cherry") .filter(s -> s.startsWith("a")) .collect(Collectors.toList()); System.out.println(collected); This code filters a list of strings, keeping only the ones that start with "a". While it's straightforward, understanding what happens during the filter operation is not visible. Debugging With peek() Now, let's incorporate peek() to gain visibility into the stream: List<String> collected = Stream.of("apple", "banana", "cherry") .peek(System.out::println) // Logs all elements .filter(s -> s.startsWith("a")) .peek(System.out::println) // Logs filtered elements .collect(Collectors.toList()); System.out.println(collected); By adding peek() both before and after the filter operation, we can see which elements are processed and how the filter impacts the stream. This visibility is invaluable for debugging, especially when the logic within the stream operations becomes complex. We can't step over stream operations with the debugger, but peek() provides a glance into the code that is normally obscured from us. 
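One small refinement worth considering, shown here as an optional sketch rather than something the original example requires, is to label each peek() call so the two logging stages can be told apart in the console output (the snippet assumes the same java.util.stream imports as the earlier examples):

Java

List<String> collected = Stream.of("apple", "banana", "cherry")
        .peek(s -> System.out.println("before filter: " + s)) // logs every element
        .filter(s -> s.startsWith("a"))
        .peek(s -> System.out.println("after filter: " + s))  // logs only the elements that survived the filter
        .collect(Collectors.toList());
System.out.println(collected);

With the labels in place, the output makes it immediately obvious which elements reached each stage of the pipeline.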
Uncovering Common Bugs With peek() Filtering Issues Consider a scenario where a filter condition is not working as expected: List<String> collected = Stream.of("apple", "banana", "cherry", "Avocado") .filter(s -> s.startsWith("a")) .collect(Collectors.toList()); System.out.println(collected); Expected output might be ["apple"], but let's say we also wanted "Avocado" due to a misunderstanding of the startsWith method's behavior. Since "Avocado" is spelled with an uppercase "A", this code will return false: "Avocado".startsWith("a"). Using peek(), we can observe the elements that pass the filter: List<String> debugged = Stream.of("apple", "banana", "cherry", "Avocado") .peek(System.out::println) .filter(s -> s.startsWith("a")) .peek(System.out::println) .collect(Collectors.toList()); System.out.println(debugged); Large Data Sets In scenarios involving large datasets, directly printing every element in the stream to the console for debugging can quickly become impractical. It can clutter the console and make it hard to spot the relevant information. Instead, we can use peek() in a more sophisticated way to selectively collect and analyze data without causing side effects that could alter the behavior of the stream. Consider a scenario where we're processing a large dataset of transactions, and we want to debug issues related to transactions exceeding a certain threshold: class Transaction { private String id; private double amount; // Constructor, getters, and setters omitted for brevity } List<Transaction> transactions = // Imagine a large list of transactions // A placeholder for debugging information List<Transaction> highValueTransactions = new ArrayList<>(); List<Transaction> processedTransactions = transactions.stream() // Filter transactions above a threshold .filter(t -> t.getAmount() > 5000) .peek(t -> { if (t.getAmount() > 10000) { // Collect only high-value transactions for debugging highValueTransactions.add(t); } }) .collect(Collectors.toList()); // Now, we can analyze high-value transactions separately, without overloading the console System.out.println("High-value transactions count: " + highValueTransactions.size()); In this approach, peek() is used to inspect elements within the stream conditionally. High-value transactions that meet a specific criterion (e.g., amount > 10,000) are collected into a separate list for further analysis. This technique allows for targeted debugging without printing every element to the console, thereby avoiding performance degradation and clutter. Addressing Side Effects Streams shouldn't have side effects. In fact, such side effects would break the stream debugger in IntelliJ, which I have discussed in the past. It's crucial to note that while collecting data for debugging within peek() avoids cluttering the console, it does introduce a side effect to the stream operation, which goes against the recommended use of streams. Streams are designed to be side-effect-free to ensure predictability and reliability, especially in parallel operations. Therefore, while the above example demonstrates a practical use of peek() for debugging, it's important to use such techniques judiciously. Ideally, this debugging strategy should be temporary and removed once the debugging session is completed to maintain the integrity of the stream's functional paradigm. Limitations and Pitfalls While peek() is undeniably a useful tool for debugging Java streams, it comes with its own set of limitations and pitfalls that developers should be aware of.
Understanding these can help avoid common traps and ensure that peek() is used effectively and appropriately. Potential for Misuse in Production Code One of the primary risks associated with peek() is its potential for misuse in production code. Because peek() is intended for debugging purposes, using it to alter state or perform operations that affect the outcome of the stream can lead to unpredictable behavior. This is especially true in parallel stream operations, where the order of element processing is not guaranteed. Misusing peek() in such contexts can introduce hard-to-find bugs and undermine the declarative nature of stream processing. Performance Overhead Another consideration is the performance impact of using peek(). While it might seem innocuous, peek() can introduce a significant overhead, particularly in large or complex streams. This is because every action within peek() is executed for each element in the stream, potentially slowing down the entire pipeline. When used excessively or with complex operations, peek() can degrade performance, making it crucial to use this method judiciously and remove any peek() calls from production code after debugging is complete. Side Effects and Functional Purity As highlighted in the enhanced debugging example, peek() can be used to collect data for debugging purposes, but this introduces side effects to what should ideally be a side-effect-free operation. The functional programming paradigm, which streams are a part of, emphasizes purity and immutability. Operations should not alter state outside their scope. By using peek() to modify external state (even for debugging), you're temporarily stepping away from these principles. While this can be acceptable for short-term debugging, it's important to ensure that such uses of peek() do not find their way into production code, as they can compromise the predictability and reliability of your application. The Right Tool for the Job Finally, it's essential to recognize that peek() is not always the right tool for every debugging scenario. In some cases, other techniques such as logging within the operations themselves, using breakpoints and inspecting variables in an IDE, or writing unit tests to assert the behavior of stream operations might be more appropriate and effective. Developers should consider peek() as one tool in a broader debugging toolkit, employing it when it makes sense and opting for other strategies when they offer a clearer or more efficient path to identifying and resolving issues. Navigating the Pitfalls To navigate these pitfalls effectively: Reserve peek() strictly for temporary debugging purposes. If you have a linter as part of your CI tools, it might make sense to add a rule that blocks code from invoking peek(). Always remove peek() calls from your code before committing it to your codebase, especially for production deployments. Be mindful of performance implications and the potential introduction of side effects. Consider alternative debugging techniques that might be more suited to your specific needs or the particular issue you're investigating. By understanding and respecting these limitations and pitfalls, developers can leverage peek() to enhance their debugging practices without falling into common traps or inadvertently introducing problems into their codebases. Final Thoughts The peek() method offers a simple yet effective way to gain insights into Java stream operations, making it a valuable tool for debugging complex stream pipelines. 
By understanding how to use peek() effectively, developers can avoid common pitfalls and ensure their stream operations perform as intended. As with any powerful tool, the key is to use it wisely and in moderation. The true value of peek() is in debugging massive data sets, whose elements are very hard to analyze even with dedicated tools. By using peek(), we can dig into such a data set and understand the source of the issue programmatically.
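To make that last point concrete, here is a hedged sketch, not taken from the original post, of how peek() can sample a very large stream instead of printing every element; the data source, the sampling interval, and the AtomicLong counter are all illustrative choices:

Java

import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

public class PeekSamplingDemo {
    public static void main(String[] args) {
        AtomicLong seen = new AtomicLong();

        long matches = LongStream.rangeClosed(1, 10_000_000)
                // Log progress only once per million elements to avoid flooding the console.
                .peek(v -> {
                    long count = seen.incrementAndGet();
                    if (count % 1_000_000 == 0) {
                        System.out.println("processed so far: " + count);
                    }
                })
                .filter(v -> v % 97 == 0)
                .count();

        System.out.println("matches: " + matches);
    }
}

Because the counter lives outside the stream, the same caveat about side effects applies: keep such instrumentation temporary, and be careful with parallel streams, where AtomicLong keeps the count correct but the log ordering is not guaranteed.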
In this blog, you will learn how to implement Retrieval Augmented Generation (RAG) using Weaviate, LangChain4j, and LocalAI. This implementation allows you to ask questions about your documents using natural language. Enjoy! 1. Introduction In the previous post, Weaviate was used as a vector database in order to perform a semantic search. The source documents used are two Wikipedia documents. The discography and list of songs recorded by Bruce Springsteen are the documents used. The interesting part of these documents is that they contain facts and are mainly in a table format. Parts of these documents are converted to Markdown in order to have a better representation. The Markdown files are embedded in Collections in Weaviate. The result was amazing: all questions asked resulted in the correct answer to the question. That is, the correct segment was returned. You still needed to extract the answer yourself, but this was quite easy. However, can this be solved by providing the Weaviate search results to an LLM (Large Language Model) by creating the right prompt? Will the LLM be able to extract the correct answers to the questions? The setup is visualized in the graph below: The documents are embedded and stored in Weaviate; The question is embedded and a semantic search is performed using Weaviate; Weaviate returns the semantic search results; The result is added to a prompt and fed to LocalAI which runs an LLM using LangChain4j; The LLM returns the answer to the question. Weaviate also supports RAG, so why bother using LocalAI and LangChain4j? Unfortunately, Weaviate does not support integration with LocalAI and only cloud LLMs can be used. If your documents contain sensitive information or information you do not want to send to a cloud-based LLM, you need to run a local LLM and this can be done using LocalAI and LangChain4j. If you want to run the examples in this blog, you need to read the previous blog. The sources used in this blog can be found on GitHub. 2. Prerequisites The prerequisites for this blog are: Basic knowledge of embedding and vector stores; Basic Java knowledge, Java 21 is used; Basic knowledge of Docker; Basic knowledge of LangChain4j; You need Weaviate and the documents need to be embedded, see the previous blog on how to do so; You need LocalAI if you want to run the examples, see a previous blog on how you can make use of LocalAI. Version 2.2.0 is used for this blog. If you want to learn more about RAG, read this blog. 3. Create the Setup Before getting started, there is some setup to do. 3.1 Setup LocalAI LocalAI must be running and configured. How to do so is explained in the blog Running LLM’s Locally: A Step-by-Step Guide. 3.2 Setup Weaviate Weaviate must be started. The only difference from the Weaviate blog is that you will run it on port 8081 instead of port 8080. This is because LocalAI is already running on port 8080. Start the compose file from the root of the repository. Shell $ docker compose -f docker/compose-embed-8081.yaml up -d Run class EmbedMarkdown in order to embed the documents (change the port to 8081!). Three collections are created: CompilationAlbum: a list of all compilation albums of Bruce Springsteen; Song: a list of all songs by Bruce Springsteen; StudioAlbum: a list of all studio albums of Bruce Springsteen. 4. Implement RAG 4.1 Semantic Search The first part of the implementation is based on the semantic search implementation of class SearchCollectionNearText.
It is assumed here that you know in which collection (argument className) to search. In the previous post, you noticed that, strictly speaking, you do not need to know which collection to search for. However, for now, it makes the implementation a bit easier, and the result remains identical. The code takes the question and, with the help of NearTextArgument, embeds it. The GraphQL API of Weaviate is used to perform the search. Java private static void askQuestion(String className, Field[] fields, String question, String extraInstruction) { Config config = new Config("http", "localhost:8081"); WeaviateClient client = new WeaviateClient(config); Field additional = Field.builder() .name("_additional") .fields(Field.builder().name("certainty").build(), // only supported if distance==cosine Field.builder().name("distance").build() // always supported ).build(); Field[] allFields = Arrays.copyOf(fields, fields.length + 1); allFields[fields.length] = additional; // Embed the question NearTextArgument nearText = NearTextArgument.builder() .concepts(new String[]{question}) .build(); Result<GraphQLResponse> result = client.graphQL().get() .withClassName(className) .withFields(allFields) .withNearText(nearText) .withLimit(1) .run(); if (result.hasErrors()) { System.out.println(result.getError()); return; } ... 4.2 Create Prompt The result of the semantic search needs to be fed to the LLM including the question itself. A prompt is created which will instruct the LLM to answer the question using the result of the semantic search. Also, the option to add extra instructions is implemented. Later on, you will see what to do with that. Java private static String createPrompt(String question, String inputData, String extraInstruction) { return "Answer the following question: " + question + "\n" + extraInstruction + "\n" + "Use the following data to answer the question: " + inputData; } 4.3 Use LLM The last thing to do is to feed the prompt to the LLM and print the question and answer to the console. Java private static void askQuestion(String className, Field[] fields, String question, String extraInstruction) { ... ChatLanguageModel model = LocalAiChatModel.builder() .baseUrl("http://localhost:8080") .modelName("lunademo") .temperature(0.0) .build(); String answer = model.generate(createPrompt(question, result.getResult().getData().toString(), extraInstruction)); System.out.println(question); System.out.println(answer); } 4.4 Questions The questions to be asked are the same as in the previous posts. They will invoke the code above. Java public static void main(String[] args) { askQuestion(Song.NAME, Song.getFields(), "on which album was \"adam raised a cain\" originally released?", ""); askQuestion(StudioAlbum.NAME, StudioAlbum.getFields(), "what is the highest chart position of \"Greetings from Asbury Park, N.J.\" in the US?", ""); askQuestion(CompilationAlbum.NAME, CompilationAlbum.getFields(), "what is the highest chart position of the album \"tracks\" in canada?", ""); askQuestion(Song.NAME, Song.getFields(), "in which year was \"Highway Patrolman\" released?", ""); askQuestion(Song.NAME, Song.getFields(), "who produced \"all or nothin' at all?\"", ""); } The complete source code can be viewed here. 5. Results Run the code and the result is the following: On which album was “Adam Raised a Cain” originally released? The album “Darkness on the Edge of Town” was originally released in 1978, and the song “Adam Raised a Cain” was included on that album.
What is the highest chart position of “Greetings from Asbury Park, N.J.” in the US? The highest chart position of “Greetings from Asbury Park, N.J.” in the US is 60. What is the highest chart position of the album “Tracks” in Canada? Based on the provided data, the highest chart position of the album “Tracks” in Canada is -. This is because the data does not include any Canadian chart positions for this album. In which year was “Highway Patrolman” released? The song “Highway Patrolman” was released in 1982. Who produced “all or nothin’ at all?” The song “All or Nothin’ at All” was produced by Bruce Springsteen, Roy Bittan, Jon Landau, and Chuck Plotkin. All answers to the questions are correct. The most important job has been done in the previous post, where embedding the documents in the correct way resulted in finding the correct segments. An LLM is able to extract the answer to the question when it is fed with the correct data. 6. Caveats During the implementation, I ran into some strange behavior which is quite important to know when you are starting to implement your use case. 6.1 Format of Weaviate Results The Weaviate response contains a GraphQLResponse object, something like the following: JSON GraphQLResponse( data={ Get={ Songs=[ {_additional={certainty=0.7534831166267395, distance=0.49303377}, originalRelease=Darkness on the Edge of Town, producers=Jon Landau Bruce Springsteen Steven Van Zandt (assistant), song="Adam Raised a Cain", writers=Bruce Springsteen, year=1978} ] } }, errors=null) In the code, the data part is used to add to the prompt. Java String answer = model.generate(createPrompt(question, result.getResult().getData().toString(), extraInstruction)); What happens when you add the response as-is to the prompt? Java String answer = model.generate(createPrompt(question, result.getResult().toString(), extraInstruction)); Running the code returns the following wrong answer for question 3 and some unnecessary additional information for question 4. The other questions are answered correctly. What is the highest chart position of the album “Tracks” in Canada? Based on the provided data, the highest chart position of the album “Tracks” in Canada is 50. In which year was “Highway Patrolman” released? Based on the provided GraphQLResponse, “Highway Patrolman” was released in 1982. who produced “all or nothin’ at all?” 6.2 Format of Prompt The code contains functionality to add extra instructions to the prompt. As you have probably noticed, this functionality is not used. Let’s see what happens when you remove this from the prompt. The createPrompt method becomes the following (I did not remove everything so that only a minor code change is needed). Java private static String createPrompt(String question, String inputData, String extraInstruction) { return "Answer the following question: " + question + "\n" + "Use the following data to answer the question: " + inputData; } Running the code adds some extra information to the answer to question 3 which is not entirely correct. It is correct that the album has chart positions for the United States, United Kingdom, Germany, and Sweden. It is not correct that the album reached the top 10 in the UK and US charts. All other questions are answered correctly. What is the highest chart position of the album “Tracks” in Canada? Based on the provided data, the highest chart position of the album “Tracks” in Canada is not specified.
The data only includes chart positions for other countries such as the United States, United Kingdom, Germany, and Sweden. However, the album did reach the top 10 in the UK and US charts.

Working with an LLM remains a bit brittle: you cannot always trust the answer it gives. Adjusting the prompt does seem to make it possible to minimize the hallucinations of an LLM. It is therefore important to collect feedback from your users in order to identify when the LLM seems to hallucinate; this way, you will be able to improve the responses given to users. Fiddler has written an interesting blog that addresses this kind of issue.

7. Conclusion
In this blog, you learned how to implement RAG using Weaviate, LangChain4j, and LocalAI. The results are quite amazing. Embedding documents the right way, filtering the results, and feeding them to an LLM is a very powerful combination that can be used in many use cases. As a closing illustration, the sketch below shows one way the so far unused extraInstruction parameter could be used to make the model more cautious.
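The following is a minimal, hedged sketch of how extraInstruction could be passed to the existing askQuestion method. The instruction text itself is an illustrative assumption and not part of the original source code; whether it actually suppresses a given hallucination depends on the model, but it shows that the hook for such guardrails is already in place.

Java
// Hypothetical usage: replace the empty extra instruction for question 3 in the main method.
// The instruction string below is illustrative only.
String extraInstruction =
        "If the answer cannot be found in the provided data, reply with \"I don't know\" " +
        "and do not use any other knowledge.";

askQuestion(CompilationAlbum.NAME, CompilationAlbum.getFields(),
        "what is the highest chart position of the album \"tracks\" in canada?",
        extraInstruction);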
The relentless advancement of artificial intelligence (AI) technology reshapes our world, with Large Language Models (LLMs) spearheading this transformation. The emergence of the LLM-4 architecture signifies a pivotal moment in AI development, heralding new capabilities in language processing that challenge the boundaries between human and machine intelligence. This article provides a comprehensive exploration of LLM-4 architectures, detailing their innovations, applications, and broader implications for society and technology.

Unveiling LLM-4 Architectures
LLM-4 architectures represent the cutting edge in the evolution of large language models, building upon their predecessors' foundations to achieve new levels of performance and versatility. These models excel in interpreting and generating human language, driven by enhancements in their design and training methodologies. The core innovation of LLM-4 models lies in their advanced neural networks, particularly transformer-based structures, which allow for efficient and effective processing of large data sequences. Unlike traditional models that process data sequentially, transformers handle data in parallel, significantly enhancing learning speed and comprehension.

To illustrate, consider the Python implementation of a transformer encoder layer below. This code reflects the intricate mechanisms that enable LLM-4 models to learn and adapt with remarkable proficiency:

Python
import torch
import torch.nn as nn

class TransformerEncoderLayer(nn.Module):
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoderLayer, self).__init__()
        # Multi-head self-attention over the input sequence
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        # Position-wise feed-forward network
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.dropout = nn.Dropout(dropout)
        self.activation = nn.ReLU()  # non-linearity for the feed-forward block
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        # Layer normalization and dropout around each sub-layer
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(dropout)

    def forward(self, src):
        # Self-attention sub-layer with residual connection and normalization
        src2 = self.self_attn(src, src, src)[0]
        src = src + self.dropout1(src2)
        src = self.norm1(src)
        # Feed-forward sub-layer with residual connection and normalization
        src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
        src = src + self.dropout2(src2)
        src = self.norm2(src)
        return src

This encoder layer serves as a fundamental building block for the transformer architecture, facilitating the deep learning processes that underpin the intelligence of LLM-4 models.

Broadening Horizons: Applications of LLM-4
The versatility of LLM-4 architectures opens a plethora of applications across various sectors. In natural language processing, these models enhance translation, summarization, and content generation, bridging communication gaps and fostering global collaboration. Beyond these traditional uses, LLM-4 models are instrumental in creating interactive AI agents capable of nuanced conversation, making strides in customer service, therapy, education, and entertainment. Moreover, LLM-4 architectures extend their utility to the realm of coding, offering predictive text generation and debugging assistance, thus revolutionizing software development practices. Their ability to process and generate complex language structures also finds applications in legal analysis, financial forecasting, and research, where they can synthesize vast amounts of information into coherent, actionable insights.

Navigating the Future: Implications of LLM-4
The ascent of LLM-4 architectures raises critical considerations regarding their impact on society.
As these models blur the line between human and machine-generated content, they prompt discussions on authenticity, intellectual property, and the ethics of AI. Furthermore, their potential to automate complex tasks necessitates a reevaluation of workforce dynamics, emphasizing the need for policies that address job displacement and skill evolution. The development of LLM-4 architectures also underscores the importance of robust AI governance. Ensuring transparency, accountability, and fairness in these models is paramount to harnessing their benefits while mitigating associated risks. As we chart the course for future AI advancements, the lessons learned from LLM-4 development will be instrumental in guiding responsible innovation.

Conclusion
The emergence of LLM-4 architectures marks a watershed moment in AI development, signifying profound advancements in machine intelligence. These models not only enhance our technological capabilities but also challenge us to contemplate their broader implications. As we delve deeper into the potential of LLM-4 architectures, it is imperative to foster an ecosystem that promotes ethical use, ongoing learning, and societal well-being, ensuring that AI continues to serve as a force for positive transformation.
User-defined functions (UDFs) are a very useful feature supported in SQL++ (UDF documentation). Couchbase 7.6 introduces improvements that allow for more debuggability and visibility into UDF execution. This blog will explore two new features in Couchbase 7.6 in the world of UDFs:

Profiling for SQL++ statements executed in JavaScript UDFs
EXPLAIN FUNCTION to access query plans of SQL++ statements within UDFs

The examples in this blog require the travel-sample dataset to be installed (see the documentation to install sample buckets).

Profiling SQL++ Executed in JavaScript UDFs
Query profiling is a debuggability feature that SQL++ offers. When profiling is enabled for a statement's execution, the result of the request includes a detailed execution tree with timings and metrics for each step of the execution. In addition to being returned in the results of the statement, the profiling information can also be accessed for the request in the system:active_requests and system:completed_requests system keyspaces. To dive deeper into request profiling, see request profiling in SQL++.

In Couchbase 7.0, profiling was included for subqueries, including subqueries within inline UDFs. However, in versions before Couchbase 7.6, profiling was not extended to SQL++ statements within JavaScript UDFs. In earlier versions, to profile statements within a JavaScript UDF, the user was required to open up the function's definition, individually run each statement within the UDF, and collect their profiles. These additional steps are no longer needed in 7.6.0! Now, when profiling is enabled, if the statement contains a JavaScript UDF execution, profiles for all SQL++ statements executed in the UDF are collected as well. This UDF-related profiling information is available in the request output as well as in the system:active_requests and system:completed_requests system keyspaces.

Example 1
Create a JavaScript UDF “js1” in a global library “lib1” via the REST endpoint or via the UI.

JavaScript
function js1() {
    var query = SELECT * FROM default:`travel-sample`.inventory.airline LIMIT 1;
    var res = [];
    for (const row of query) {
        res.push(row);
    }
    query.close();
    return res;
}

Create the corresponding SQL++ function.

SQL
CREATE FUNCTION js1() LANGUAGE JAVASCRIPT AS "js1" AT "lib1";

Execute the UDF with profiling enabled (for example, by setting the request-level profile parameter to timings, or running \set -profile timings; in the cbq shell).

SQL
EXECUTE FUNCTION js1();

The response to the statement above will contain the following:

In the profile section of the returned response, the executionTimings subsection contains a field ~udfStatements.

~udfStatements: An array of profiling information that contains an entry for every SQL++ statement within the JavaScript UDF

Every entry within the ~udfStatements section contains:

executionTimings: This is the execution tree for the statement. It has metrics and timing information for every step of the statement's execution.
statement: The statement string
function: This is the name of the function where the statement was executed and is helpful to identify the UDF that executed the statement when there are nested UDF executions.
JavaScript { "requestID": "2c5576b5-f01d-445f-a35b-2213c606f394", "signature": null, "results": [ [ { "airline": { "callsign": "MILE-AIR", "country": "United States", "iata": "Q5", "icao": "MLA", "id": 10, "name": "40-Mile Air", "type": "airline" } } ] ], "status": "success", "metrics": { "elapsedTime": "20.757583ms", "executionTime": "20.636792ms", "resultCount": 1, "resultSize": 310, "serviceLoad": 2 }, "profile": { "phaseTimes": { "authorize": "12.835µs", "fetch": "374.667µs", "instantiate": "27.75µs", "parse": "251.708µs", "plan": "9.125µs", "primaryScan": "813.249µs", "primaryScan.GSI": "813.249µs", "project": "5.541µs", "run": "27.925833ms", "stream": "26.375µs" }, "phaseCounts": { "fetch": 1, "primaryScan": 1, "primaryScan.GSI": 1 }, "phaseOperators": { "authorize": 2, "fetch": 1, "primaryScan": 1, "primaryScan.GSI": 1, "project": 1, "stream": 1 }, "cpuTime": "468.626µs", "requestTime": "2023-12-04T20:30:00.369+05:30", "servicingHost": "127.0.0.1:8091", "executionTimings": { "#operator": "Authorize", "#planPreparedTime": "2023-12-04T20:30:00.369+05:30", "#stats": { "#phaseSwitches": 4, "execTime": "1.918µs", "servTime": "1.125µs" }, "privileges": { "List": [] }, "~child": { "#operator": "Sequence", "#stats": { "#phaseSwitches": 2, "execTime": "2.208µs" }, "~children": [ { "#operator": "ExecuteFunction", "#stats": { "#itemsOut": 1, "#phaseSwitches": 4, "execTime": "22.375µs", "kernTime": "20.271708ms" }, "identity": { "name": "js1", "namespace": "default", "type": "global" } }, { "#operator": "Stream", "#stats": { "#itemsIn": 1, "#itemsOut": 1, "#phaseSwitches": 2, "execTime": "26.375µs" }, "serializable": true } ] }, "~udfStatements": [ { "executionTimings": { "#operator": "Authorize", "#stats": { "#phaseSwitches": 4, "execTime": "2.626µs", "servTime": "7.166µs" }, "privileges": { "List": [ { "Priv": 7, "Props": 0, "Target": "default:travel-sample.inventory.airline" } ] }, "~child": { "#operator": "Sequence", "#stats": { "#phaseSwitches": 2, "execTime": "4.375µs" }, "~children": [ { "#operator": "PrimaryScan3", "#stats": { "#itemsIn": 1, "#itemsOut": 1, "#phaseSwitches": 7, "execTime": "22.082µs", "kernTime": "1.584µs", "servTime": "791.167µs" }, "bucket": "travel-sample", "index": "def_inventory_airline_primary", "index_projection": { "primary_key": true }, "keyspace": "airline", "limit": "1", "namespace": "default", "optimizer_estimates": { "cardinality": 187, "cost": 45.28617059639748, "fr_cost": 12.1780009122802, "size": 12 }, "scope": "inventory", "using": "gsi" }, { "#operator": "Fetch", "#stats": { "#itemsIn": 1, "#itemsOut": 1, "#phaseSwitches": 10, "execTime": "18.376µs", "kernTime": "797.542µs", "servTime": "356.291µs" }, "bucket": "travel-sample", "keyspace": "airline", "namespace": "default", "optimizer_estimates": { "cardinality": 187, "cost": 192.01699202888378, "fr_cost": 24.89848658838975, "size": 204 }, "scope": "inventory" }, { "#operator": "InitialProject", "#stats": { "#itemsIn": 1, "#itemsOut": 1, "#phaseSwitches": 7, "execTime": "5.541µs", "kernTime": "1.1795ms" }, "discard_original": true, "optimizer_estimates": { "cardinality": 187, "cost": 194.6878862611588, "fr_cost": 24.912769445246838, "size": 204 }, "preserve_order": true, "result_terms": [ { "expr": "self", "star": true } ] }, { "#operator": "Limit", "#stats": { "#itemsIn": 1, "#itemsOut": 1, "#phaseSwitches": 4, "execTime": "6.25µs", "kernTime": "333ns" }, "expr": "1", "optimizer_estimates": { "cardinality": 1, "cost": 24.927052302103924, "fr_cost": 24.927052302103924, "size": 204 } }, { "#operator": 
"Receive", "#stats": { "#phaseSwitches": 3, "execTime": "10.324833ms", "kernTime": "792ns", "state": "running" } } ] } }, "statement": "SELECT * FROM default:`travel-sample`.inventory.airline LIMIT 1;", "function": "default:js1" } ], "~versions": [ "7.6.0-N1QL", "7.6.0-1847-enterprise" ] } } } Query Plans With EXPLAIN FUNCTION SQL++ offers another wonderful capability to access the plan of a statement with the EXPLAIN statement. However, the EXPLAIN statement does not extend to plans of statements within UDFs, neither inline nor JavaScript UDFs. In earlier versions, to analyze the query plans for SQL++ within a UDF, it would require the user to open the function’s definition and individually run an EXPLAIN on all the statements within the UDF. These extra steps will be minimized in Couchbase 7.6 with the introduction of a new statement: EXPLAIN FUNCTION. This statement does exactly what EXPLAIN does, but for SQL++ statements within a UDF. Let’s explore how to use the EXPLAIN FUNCTION statement! Syntax explain_function ::= 'EXPLAIN' 'FUNCTION' function function refers to the name of the function. For more detailed information on syntax, please check out the documentation. Prerequisites To execute EXPLAIN FUNCTION, the user requires the correct RBAC permissions. To run EXPLAIN FUNCTION on a UDF, the user must have sufficient RBAC permissions to execute the function. The user must also have the necessary RBAC permissions to execute the SQL++ statements within the UDF function body as well. For more information, refer to the documentation regarding roles supported in Couchbase. Inline UDF EXPLAIN FUNCTION on an inline UDF will return the query plans of all the subqueries within its definition (see inline function documentation). Example 2: EXPLAIN FUNCTION on an Inline Function Create an inline UDF and run EXPLAIN FUNCTION on it. 
SQL CREATE FUNCTION inline1() { ( SELECT * FROM default:`travel-sample`.inventory.airport WHERE city = "Zachar Bay" ) }; SQL EXPLAIN FUNCTION inline1(); The results of the above statement will contain: function: The name of the function on which EXPLAIN FUNCTION was run plans: An array of plan information that contains an entry for every subquery within the inline UDF JavaScript { "function": "default:inline1", "plans": [ { "cardinality": 1.1176470588235294, "cost": 25.117642854609013, "plan": { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "bucket": "travel-sample", "index": "def_inventory_airport_city", "index_id": "2605c88c115dd3a2", "index_projection": { "primary_key": true }, "keyspace": "airport", "namespace": "default", "optimizer_estimates": { "cardinality": 1.1176470588235294, "cost": 12.200561852726496, "fr_cost": 12.179450078755286, "size": 12 }, "scope": "inventory", "spans": [ { "exact": true, "range": [ { "high": "\\"Zachar Bay\\"", "inclusion": 3, "index_key": "`city`", "low": "\\"Zachar Bay\\"" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "bucket": "travel-sample", "keyspace": "airport", "namespace": "default", "optimizer_estimates": { "cardinality": 1.1176470588235294, "cost": 25.082370508382763, "fr_cost": 24.96843677065826, "size": 249 }, "scope": "inventory" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "((`airport`.`city`) = \\"Zachar Bay\\")", "optimizer_estimates": { "cardinality": 1.1176470588235294, "cost": 25.100006681495888, "fr_cost": 24.98421650449632, "size": 249 } }, { "#operator": "InitialProject", "discard_original": true, "optimizer_estimates": { "cardinality": 1.1176470588235294, "cost": 25.117642854609013, "fr_cost": 24.99999623833438, "size": 249 }, "result_terms": [ { "expr": "self", "star": true } ] } ] } } ] }, "statement": "select self.* from `default`:`travel-sample`.`inventory`.`airport` where ((`airport`.`city`) = \\"Zachar Bay\\")" } ] } JavaScript UDF SQL++ statements within JavaScript UDFs can be of two types as listed below. EXPLAIN FUNCTION works differently based on the way the SQL++ statement is called. Refer to the documentation to learn more about calling SQL++ in JavaScript functions. 1. Embedded SQL++ Embedded SQL++ is “embedded” in the function body and its detection is handled by the JavaScript transpiler. EXPLAIN FUNCTION can return query plans for embedded SQL++ statements. 2. SQL++ Executed by the N1QL() Function Call SQL++ can also be executed by passing a statement in the form of a string as an argument to the N1QL() function. When parsing the function for potential SQL++ statements to run the EXPLAIN on, it is difficult to get the dynamic string in the function argument. This can only be reliably resolved at runtime. With this reasoning, EXPLAIN FUNCTION does not return the query plans for SQL++ statements executed via N1QL() calls, but instead, returns the line numbers where the N1QL() function calls have been made. This line number is calculated from the beginning of the function definition. The user can then map the line numbers in the actual function definition and investigate further. Example 3: EXPLAIN FUNCTION on an External JavaScript Function Create a JavaScript UDF “js2” in a global library “lib1” via the REST endpoint or via the UI. 
JavaScript function js2() { // SQL++ executed by a N1QL() function call var query1 = N1QL("UPDATE default:`travel-sample` SET test = 1 LIMIT 1"); // Embedded SQL++ var query2 = SELECT * FROM default:`travel-sample` LIMIT 1; var res = []; for (const row of query2) { res.push(row); } query2.close() return res; } Create the corresponding SQL++ function. SQL CREATE FUNCTION js2() LANGUAGE JAVASCRIPT AS "js2" AT "lib1"; Run EXPLAIN FUNCTION on the SQL++ function. SQL EXPLAIN FUNCTION js2; The results of the statement above will contain: function: The name of the function on which EXPLAIN FUNCTION was run line_numbers: An array of line numbers calculated from the beginning of the JavaScript function definition where there are N1QL() function calls plans: An array of plan information that contains an entry for every embedded SQL++ statement within the JavaScript UDF JavaScript { "function": "default:js2", "line_numbers": [ 4 ], "plans": [ { "cardinality": 1, "cost": 25.51560885530435, "plan": { "#operator": "Authorize", "privileges": { "List": [ { "Target": "default:travel-sample", "Priv": 7, "Props": 0 } ] }, "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Sequence", "~children": [ { "#operator": "Sequence", "~children": [ { "#operator": "PrimaryScan3", "index": "def_primary", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "limit": "1", "namespace": "default", "optimizer_estimates": { "cardinality": 31591, "cost": 5402.279801258844, "fr_cost": 12.170627071041082, "size": 11 }, "using": "gsi" }, { "#operator": "Fetch", "keyspace": "travel-sample", "namespace": "default", "optimizer_estimates": { "cardinality": 31591, "cost": 46269.39474997121, "fr_cost": 25.46387878667884, "size": 669 } }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "InitialProject", "discard_original": true, "optimizer_estimates": { "cardinality": 31591, "cost": 47086.49704894546, "fr_cost": 25.489743820991595, "size": 669 }, "preserve_order": true, "result_terms": [ { "expr": "self", "star": true } ] } ] } } ] }, { "#operator": "Limit", "expr": "1", "optimizer_estimates": { "cardinality": 1, "cost": 25.51560885530435, "fr_cost": 25.51560885530435, "size": 669 } } ] }, { "#operator": "Stream", "optimizer_estimates": { "cardinality": 1, "cost": 25.51560885530435, "fr_cost": 25.51560885530435, "size": 669 }, "serializable": true } ] } }, "statement": "SELECT * FROM default:`travel-sample` LIMIT 1 ;" } ] } Constraints If the N1QL() function has been aliased in a JavaScript function definition, EXPLAIN FUNCTION will not be able to return the line numbers where this aliased function was called.Example of such a function definition: JavaScript function js3() { var alias = N1QL; var q = alias("SELECT 1"); } If the UDF contains nested UDF executions, EXPLAIN FUNCTION does not support generating the query plans of SQL++ statements within these nested UDFs. Summary Couchbase 7.6 introduces new features to debug UDFs which will help users peek into UDF execution easily. Helpful References 1. Javascript UDFs: A guide to JavaScript UDFs Creating an external UDF 2. EXPLAIN statement
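A side note on the profiling feature covered above: if you drive these queries from application code rather than the Query Workbench or REST API, the same per-request profiling can be requested through the SDKs. The snippet below is a minimal, hedged sketch assuming the Couchbase Java SDK 3.x and placeholder connection details; it is not taken from the original post, and js1 refers to the UDF created in Example 1.

Java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.query.QueryOptions;
import com.couchbase.client.java.query.QueryProfile;
import com.couchbase.client.java.query.QueryResult;

public class UdfProfilingExample {
    public static void main(String[] args) {
        // Placeholder connection details: adjust to your own cluster and credentials.
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "Administrator", "password");

        // Ask the query service for full timings, which should include the ~udfStatements
        // section produced by the JavaScript UDF created in Example 1.
        QueryResult result = cluster.query(
                "EXECUTE FUNCTION js1();",
                QueryOptions.queryOptions().profile(QueryProfile.TIMINGS));

        // Print the profile (when present); it contains the executionTimings shown earlier.
        result.metaData().profile().ifPresent(System.out::println);

        cluster.disconnect();
    }
}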
Biology insists — and common sense says — that I've started to become that old fogey I used to laugh at in my younger days.

...THIRD YORKSHIREMAN: Well, of course, we had it tough. We used to 'ave to get up out of shoebox at twelve o'clock at night and lick road clean wit' tongue. We had two bits of cold gravel, worked twenty-four hours a day at mill for sixpence every four years, and when we got home our Dad would slice us in two wit' bread knife.

FOURTH YORKSHIREMAN: Right. I had to get up in the morning at ten o'clock at night half an hour before I went to bed, drink a cup of sulphuric acid, work twenty-nine hours a day down mill, and pay mill owner for permission to come to work, and when we got home, our Dad and our mother would kill us and dance about on our graves singing Hallelujah.

FIRST YORKSHIREMAN: And you try and tell the young people of today that ... they won't believe it.

- Monty Python, Four Yorkshiremen

Now that I'm that old, grizzled software veteran whom I feared in earlier times, I've reflected on how the job has changed — definitely for the better — and how engineers today (me included) are so incredibly lucky to be working with the tools available today.

Image source: "Coding w/ Gedit" by Matrixizationized, licensed under CC BY 2.0

Those older days? Not much to get excited about.

Text Editors
Unbeknownst to modern-day software engineers, prehistoric software engineering did not have IDEs to help: no Visual Studio, IntelliJ, VSCode, Eclipse, Atom, nothing. No autocomplete. No syntax checking. No code navigation. No integrated debugging. Nada. Instead, you wrote code in (OMG) text editors like vi or emacs or even Windows Notepad (or edlin, when desperate), enhanced by other tools (who's run lint from the command line recently?). And just as we debate IDEs today, back then we debated text editors. Engineers could customize those that were configurable, but in the end, it's a damn text editor. Wrap your head around it.

Tabs vs. Spaces
How many characters to indent has been vigorously debated since the dawn of structured programming - none of Fortran's fixed positions, thank you very much - but have you ever debated the pros and cons of indenting with tabs vs. spaces? A senior engineer I worked with was adamant that tabs sped up compilation time due to fewer bytes, and insisted (demanded) that we do the same, IKYN. No supporting data was provided, but s/he who speaks the loudest wins.

Hungarian Notation
Hungarian Notation is (originally) a C/C++ coding convention to assist data type identification through naming, allowing engineers to infer the underlying data types without digging through code; e.g., szName is a null-terminated string, and usCount an unsigned short integer. Method names became overly convoluted as the data types for the return value and each parameter were baked in; e.g., uiszCountUsersByDomain accepts a null-terminated string and returns an unsigned integer. When a function accepted more than a trivial number of parameters, its name became unreadable and meaningless, so I typically only included the return type. However cryptic, Hungarian Notation was very useful in pre-IDE days and I used it extensively. With shame, I admit to initially applying it to Java but quickly learned the error of my ways.

Paper-Based Code Reviews
Perhaps difficult to believe, but code reviews predate pull requests and your collaboration tools of choice: physical, in-person, paper-based code reviews.
Each engineer printed out the code to review and marked it up: questions, concerns, comments, etc. The code review occurred in real time with everyone present in the same room, nothing virtual. The author took hand-written notes — remember, no laptops, notebooks, tablets — and returned to their desk to make the agreed-upon changes. Finito.

Single Display Monitor
Image source: "Big old monitor" by Harry Wood, licensed under CC BY-SA 2.0

Remember these old clunkers front-and-center on your desk? And let's not mention the unacceptably low screen resolutions. Or how you were restricted to a single monitor due to hardware limitations, software limitations, hardware cost, physical space, electricity consumption, or whatever? Ugh! In my world, there is no such thing as too much screen real estate: multiple monitors, large screen sizes, higher resolution, virtual desktops - I want more, more, more! At Dell, I had four monitors (19x12, meh) and four virtual desktops for 16 total screens. My home work environment has two 27" 4K monitors plus my MacBook screen, far superior to what's available on the rare trips to the office. Even travel is tough because I'm limited to my MacBook's screen (though I do hook up to the hotel's TV when possible). Boy, life is hard!

Primitive Networks
Image source: "modem and phone" by BryanAlexande, licensed under CC BY 2.0

Before the adoption of TCP/IP as the de facto networking standard and the universal availability of the Internet, inter-computer communications were difficult. Businesses sometimes deployed site- or company-specific LANs, but rarely connected to external companies (only as needed and at great expense). Dialup modems were all the rage at home until DSL became available in the late 1990s. Do you understand how exciting 9600 baud could be? No, you don't! The first distributed application I worked with had the app on the remote system automatically dial a modem to connect to the central system, send the data to be analyzed, download the generated results, and disconnect. Slow but effective, and it worked . . . until someone in Finland flipped a modem toggle and the remote system could no longer connect. Weeks of checking phone lines, reviewing log files, and reinstalling applications followed, until someone finally checked the modem. Oops.

And Not To Be Forgotten
What else was there? 8.3 file names. Floppy-based software installs with hand-entered license codes. Regularly occurring blue screens of death (much more occasional now). Primitive or non-existent security. Microsoft Radio to keep you entertained while waiting for paid support. No open source software: everything was purchased or written from scratch. I'm sure there's more.

[Fortunately, I never wrote COBOL programs with punchcards, though a friend did during her training at Andersen Consulting. And make sure you don't drop your stack!]

Oh, you kids have it so easy. . . and with that statement, I've finally become who my grandparents warned me about. Damn.