Traditional internal developer platforms (IDPs) have transformed how organizations manage code and infrastructure. By standardizing workflows through tools like CI/CD pipelines and Infrastructure as Code (IaC), these platforms have enabled rapid deployments, reduced manual errors, and improved developer experience. However, their focus has primarily been on operational efficiency, often treating data as an afterthought. This omission becomes critical in today's AI-driven landscape. While traditional IDPs excel at managing infrastructure, they fall short when it comes to the foundational elements required for scalable and compliant AI innovation:

Governance: Ensuring data complies with policies and regulatory standards is often a manual or siloed effort.
Traceability: Tracking data lineage and transformations across workflows is inconsistent, if not entirely missing.
Quality: Validating data for reliability and AI readiness lacks automation and standardization.

To meet these challenges, data must be elevated to a first-class citizen within the IDP. A data-first IDP goes beyond IaC, directly embedding governance, traceability, quality, and Policy as Code (PaC) into the platform's core. This approach transforms traditional automation into a comprehensive framework that operationalizes data workflows alongside infrastructure, enabling Data Products as Code (DPaC). This architecture supports frameworks like the Open Data Product Specification (ODPS) and the Open Data Contract (ODC), which standardize how data products are defined and consumed. While resource identifiers (RIDs) are critical in enabling traceability and interoperability, the heart of the data-first IDP lies in meta-metadata, which provides the structure, rules, and context necessary for scalable and compliant data ecosystems.

The Data-First Approach: Extending Automation

Templates and recipes are critical technologies that enable the IDP to achieve a high level of abstraction and componentize the system landscape. A recipe is a parameterized configuration (IaC) that defines how specific resources or workloads are provisioned, deployed, or managed within the platform. Recipes are customized and reusable to fit particular contexts or environments, ensuring standardization while allowing flexibility for specific use cases. A template is a group of recipes forming a "Golden Path" for developers. Given an architectural design pattern, such as a data ingestion pattern for Streaming, API, or File, the template creates a manifest, which is built, validated, and executed in the delivery plane. A data-first IDP adds the "Data Product" specification as a component, a resource, and, therefore, a recipe to the IDP; this could be a parameterized version of the ODPS and ODC. A minimal sketch of such a recipe follows below.

The lifecycle and management of software are far more mature than that of data. The concept of DPaC goes a long way toward changing this; it aligns the maturity of data management with the well-established principles of software engineering. DPaC transforms data management by treating data as a programmable, enforceable asset, aligning its lifecycle with proven software development practices. By bridging the maturity gap between data and software, DPaC empowers organizations to scale data-driven operations with confidence, governance, and agility. As IaC revolutionized infrastructure, DPaC is poised to redefine how we manage and trust our data.
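To make the recipe idea concrete, here is a minimal, hypothetical sketch of a Data Products as Code definition expressed in Python. The class, field names, and manifest keys are illustrative assumptions loosely inspired by the spirit of ODPS/ODC, not part of any actual platform or specification.

Python

from dataclasses import dataclass, field

# Hypothetical, illustrative DPaC recipe -- field names are assumptions,
# not taken from the ODPS or ODC specifications.
@dataclass
class DataProductRecipe:
    name: str
    version: str
    owner: str
    ingestion_pattern: str                      # e.g., "Streaming", "API", or "File"
    quality_checks: list = field(default_factory=list)
    governance_policies: list = field(default_factory=list)

    def to_manifest(self) -> dict:
        """Render the parameterized recipe into a manifest the delivery plane could build and validate."""
        return {
            "apiVersion": "dataproduct/v1",     # illustrative value
            "kind": "DataProduct",
            "metadata": {"name": self.name, "version": self.version, "owner": self.owner},
            "spec": {
                "ingestion": self.ingestion_pattern,
                "quality": self.quality_checks,
                "policies": self.governance_policies,
            },
        }

# Example: a customer-transactions data product provisioned via the Streaming golden path
recipe = DataProductRecipe(
    name="customer-transactions",
    version="v1.0",
    owner="sales-domain",
    ingestion_pattern="Streaming",
    quality_checks=["not_null:transaction_id", "freshness:24h"],
    governance_policies=["pii-masking", "retention-7y"],
)
print(recipe.to_manifest())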
The Data Marketplace, discussed in the previous article, is a component, a resource, and a recipe, which may rely on other services such as observability, a data quality service, and a graph database; these are also components and part of the CI/CD pipeline.

Governance and Engineering Baseline

Governance and engineering baselines can be codified into policies that are managed, versioned, and enforced programmatically through PaC. By embedding governance rules and engineering standards into machine-readable formats (e.g., YAML, JSON, Rego), compliance is automated and consistency is enforced across resources.

Governance policies: Governance rules define compliance requirements, access controls, data masking, retention policies, and more. These ensure that organizational and regulatory standards are consistently applied.
Engineering baselines: Baselines establish the minimum technical standards for infrastructure, applications, and data workflows, such as resource configurations, pipeline validation steps, and security protocols.

The Role of RIDs

While meta-metadata drives the data-first IDP, RIDs operationalize its principles by providing unique references for all data-related resources. RIDs ensure the architecture supports traceability, quality, and governance across the ecosystem.

Facilitating lineage: RIDs are unique references for data products, storage, and compute resources, allowing external tools to trace dependencies and transformations.
Simplifying observability: RIDs allow objects to be tracked across the landscape.

Example RID Format

rid:<context>:<resource-type>:<resource-name>:<version>

Data product RID: rid:customer-transactions:data-product:erp-a:v1.0
Storage RID: rid:customer-transactions:storage:s3-bucket-a:v1.0

Centralized Management and Federated Responsibility With Community Collaboration

A data-first IDP balances centralized management, federated responsibility, and community collaboration to create a scalable, adaptable, and compliant platform. Centralized governance provides the foundation for consistency and control, while federated responsibility empowers domain teams to innovate and take ownership of their data products. Integrating a community-driven approach results in a dynamically evolving framework that meets real-world needs, leveraging collective expertise to refine policies, templates, and recipes.

Centralized Management: A Foundation for Consistency

Centralized governance defines global standards, such as compliance, security, and quality rules, and manages critical infrastructure like unique RIDs and metadata catalogs. This layer provides the tools and frameworks that enable decentralized execution.

Standardized Policies

Global policies are codified using PaC and integrated into workflows for automated enforcement.

Federated Responsibility: Shift-Left Empowerment

Responsibility and accountability are delegated to domain teams, enabling them to customize templates, define recipes, and manage data products closer to their sources.
This shift-left approach ensures compliance and quality are applied early in the lifecycle while maintaining flexibility:

Self-service workflows: Domain teams use self-service tools to configure resources, with policies applied automatically in the background.
Customization within guardrails: Teams can adapt central templates and policies to fit their context, such as extending governance rules for domain-specific requirements.
Real-time validation: Automated feedback ensures non-compliance is flagged early, reducing errors and fostering accountability.

Community Collaboration: Dynamic and Adaptive Governance

The environment encourages collaboration to evolve policies, templates, and recipes based on real-world needs and insights. This decentralized innovation layer ensures the platform remains relevant and adaptable:

Contributions and feedback: Domain teams contribute new recipes or propose policy improvements through version-controlled repositories or pull requests.
Iterative improvement: Cross-domain communities review and refine contributions, ensuring alignment with organizational goals.
Recognition and incentives: Teams are incentivized to share best practices and reusable artifacts, fostering a culture of collaboration.

Automation as the Enabler

Automation ensures that governance and standards are consistently applied across the platform, preventing deviation over time. Policies and RIDs are managed programmatically, enabling:

Compliance at scale: New policies are integrated seamlessly, validated early, and enforced without manual intervention.
Measurable outcomes

Extending the Orchestration and Adding the Governance Engine

A data-first IDP extends the orchestration engine to automate data-centric workflows and introduces a governance engine to enforce compliance and maintain standards dynamically.

Orchestration Enhancements

Policy integration: Validates governance rules (PaC) during workflows, blocking non-compliant deployments.
Resource awareness: Uses RIDs to trace and enforce lineage, quality, and compliance.
Data automation: Automates schema validation, metadata enrichment, and lineage registration.

Governance Engine

Centralized policies: Defines compliance rules as PaC and applies them automatically.
Dynamic enforcement: Monitors and remediates non-compliance, preventing drift from standards.
Real-time feedback: Provides developers with actionable insights during deployment.

Together, these engines ensure proactive compliance, scalability, and developer empowerment by embedding governance into workflows, automating traceability, and maintaining standards over time.

The Business Impact

Governance at scale: Meta-metadata and ODC ensure compliance rules are embedded and enforced across all data products.
Improved productivity: Golden paths reduce cognitive load, allowing developers to deliver faster without compromising quality or compliance.
Trust and transparency: ODPS and RIDs ensure that data products are traceable and reliable, fostering stakeholder trust.
AI-ready ecosystems: The framework enables reliable AI model training and operationalization by reducing data prep and commoditizing data with all the information that adds value and resilience to the solution.

The success of a data-first IDP hinges on meta-metadata, which provides the foundation for governance, quality, and traceability.
Supported by frameworks like ODPS and ODC and operationalized through RIDs, this architecture reduces complexity for developers while meeting the business's needs for scalable, compliant data ecosystems. The data-first IDP is ready to power the next generation of AI-driven innovation by embedding smart abstractions and modularity.
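As a closing illustration of the RID format described above (rid:<context>:<resource-type>:<resource-name>:<version>), here is a minimal sketch, in Python, of how a platform tool might parse and validate identifiers of that shape. The helper and its validation rules are assumptions for illustration only, not part of ODPS, ODC, or any particular IDP.

Python

import re
from typing import NamedTuple

# Hypothetical helper: parses the RID shape shown earlier in this article.
# rid:<context>:<resource-type>:<resource-name>:<version>
RID_PATTERN = re.compile(
    r"^rid:(?P<context>[a-z0-9-]+):(?P<resource_type>[a-z0-9-]+)"
    r":(?P<resource_name>[a-z0-9-]+):(?P<version>v[0-9]+\.[0-9]+)$"
)

class RID(NamedTuple):
    context: str
    resource_type: str
    resource_name: str
    version: str

def parse_rid(rid: str) -> RID:
    """Split an RID string into its components, raising if the shape is wrong."""
    match = RID_PATTERN.match(rid)
    if not match:
        raise ValueError(f"Not a valid RID: {rid}")
    return RID(**match.groupdict())

# The two example RIDs from the article parse cleanly:
print(parse_rid("rid:customer-transactions:data-product:erp-a:v1.0"))
print(parse_rid("rid:customer-transactions:storage:s3-bucket-a:v1.0"))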
When creating a new app or service, what begins as learning just one new tool can quickly turn into needing a whole set of tools and frameworks. For Python devs, jumping into HTML, CSS, and JavaScript to build a usable app can be daunting. For web devs, many Python-first backend tools work in JavaScript but are often outdated. You’re left with a choice: Stick with JavaScript or switch to Python for access to the latest features. FastHTML bridges the gap between these two groups. For Python devs, it makes creating a web app straightforward — no JavaScript required! For web devs, it makes creating a Python app quick and easy, with the option to extend using JavaScript — you’re not locked in. As a web developer, I’m always looking for ways to make Python dev more accessible. So, let’s see how quickly we can build and deploy a FastHTML app. I’ll follow the image generation tutorial and then deploy it to Heroku. Let’s go! Intro to FastHTML Never heard of FastHTML before? Here’s how FastHTML describes itself: FastHTML is a new next-generation web framework for fast, scalable web applications with minimal, compact code. It’s designed to be: Powerful and expressive enough to build the most advanced, interactive web apps you can imagine.Fast and lightweight, so you can write less code and get more done.Easy to learn and use, with a simple, intuitive syntax that makes it easy to build complex apps quickly. FastHTML promises to enable you to generate usable, lightweight apps quickly. Too many web apps are bloated and heavy, requiring a lot of processing and bandwidth for simple tasks. Most web apps just need something simple, beautiful, and easy to use. FastHTML aims to make that task easy. You may have heard of FastAPI, designed to make creating APIs with Python a breeze. FastHTML is inspired by FastAPI’s philosophy, seeking to do the same for front-end applications. Opinionated About Simplicity and Ease of Use Part of the FastHTML vision is to “make it the easiest way to create quick prototypes, and also the easiest way to create scalable, powerful, rich applications.” As a developer tool, FastHTML seems to be opinionated about the right things — simplicity and ease of use without limiting you in the future. FastHTML gets you up and running quickly while also making it easy for your users. It does this by selecting key core technologies such as ASGI and HTMX. The 'foundations page' from FastHTML introduces these technologies and gives the basics (though you don’t need to know about these to get started). Get Up and Running Quickly The tutorials from FastHTML offer several examples of different apps, each with its own use case. I was curious about the Image Generation App tutorial and wanted to see how quickly I could get a text-to-image model into a real, working app. The verdict? It was fast. Really fast. In less than 60 lines of code, I created a fully functioning web app where a user can type in a prompt and receive an image from the free Pollinations text-to-image model. Here’s a short demo of the tutorial app: In this tutorial app, I got a brief glimpse of the power of FastHTML. I learned how to: Submit data through a formInteract with external APIsDisplay some loading text while waiting What’s impressive is that it only took one tiny Python file to complete this, and the final app is lightweight and looks good. 
Here's the file I ended up with:

Python

from fastcore.parallel import threaded
from fasthtml.common import *
import os, uvicorn, requests, replicate
from PIL import Image

app = FastHTML(hdrs=(picolink,))

# Store our generations
generations = []
folder = f"gens/"
os.makedirs(folder, exist_ok=True)

# Main page
@app.get("/")
def home():
    inp = Input(id="new-prompt", name="prompt", placeholder="Enter a prompt")
    add = Form(Group(inp, Button("Generate")), hx_post="/", target_id='gen-list', hx_swap="afterbegin")
    gen_list = Div(id='gen-list')
    return Title('Image Generation Demo'), Main(H1('Magic Image Generation'), add, gen_list, cls='container')

# A pending preview keeps polling this route until we return the image preview
def generation_preview(id):
    if os.path.exists(f"gens/{id}.png"):
        return Div(Img(src=f"/gens/{id}.png"), id=f'gen-{id}')
    else:
        return Div("Generating...", id=f'gen-{id}', hx_post=f"/generations/{id}", hx_trigger='every 1s', hx_swap='outerHTML')

@app.post("/generations/{id}")
def get(id:int): return generation_preview(id)

# For images, CSS, etc.
@app.get("/{fname:path}.{ext:static}")
def static(fname:str, ext:str): return FileResponse(f'{fname}.{ext}')

# Generation route
@app.post("/")
def post(prompt:str):
    id = len(generations)
    generate_and_save(prompt, id)
    generations.append(prompt)
    clear_input = Input(id="new-prompt", name="prompt", placeholder="Enter a prompt", hx_swap_oob='true')
    return generation_preview(id), clear_input

# URL (for image generation)
def get_url(prompt):
    return f"https://image.pollinations.ai/prompt/{prompt.replace(' ', '%20')}?model=flux&width=1024&height=1024&seed=42&nologo=true&enhance=true"

@threaded
def generate_and_save(prompt, id):
    full_url = get_url(prompt)
    Image.open(requests.get(full_url, stream=True).raw).save(f"{folder}/{id}.png")
    return True

if __name__ == '__main__':
    uvicorn.run("app:app", host='0.0.0.0', port=int(os.getenv("PORT", default=5000)))

Looking for more functionality? The tutorial continues, adding some CSS styling, user sessions, and even payment tracking with Stripe. While I didn't go through it all the way, the potential is clear: lots of functionality and usability without a lot of boilerplate or using both Python and JavaScript.

Deploy Quickly to Heroku

Okay, so now that I have a pure Python app running locally, what do I need to do to deploy it? Heroku makes this easy. I added a single file called Procfile with just one line in it:

Shell

web: python app.py

This simple text file tells Heroku how to run the app. With the Procfile in place, I can use the Heroku CLI to create and deploy my app. And it's fast… from zero to done in less than 45 seconds. With two commands, I created my project, built it, and deployed it to Heroku. And let's just do a quick check. Did it actually work? And it's up for the world to see!

Conclusion

When I find a new tool that makes it easier and quicker to build an app, my mind starts spinning with the possibilities. If it's that easy, then maybe next time I need to spin up something, I can do it this way and integrate it with this tool and that other thing. So much of programming is assembling the right tools for the job. FastHTML has opened the door to a whole set of Python-based applications for me, and Heroku makes it easy to get those apps off my local machine and into the world. That said, several of the foundations of FastHTML are new to me, and I look forward to understanding them more deeply as I use it more. I hope you have fun with FastHTML and Heroku! Happy coding!
Snowflake Cortex enables seamless integration of Generative AI (GenAI) capabilities within the Snowflake Data Cloud. It allows organizations to use pre-trained large language models (LLMs) and create applications for tasks like content generation, text summarization, sentiment analysis, and conversational AI — all without managing external ML infrastructure.

Prerequisites for Snowflake Cortex Setup

Snowflake Environment
Enterprise Edition or higher is required as a baseline for using advanced features like External Functions and Snowpark.

Cortex Licensing
Specific license: Snowflake Cortex requires an additional license or subscription. Ensure the Cortex license is included as part of your Snowflake agreement.

External Integration and Data Preparation
Set up secure API access to LLMs (e.g., OpenAI or Hugging Face) for embedding and text generation.
Prepare clean data in Snowflake tables and configure networking for secure external function calls.

Key Features of Snowflake Cortex for GenAI

Pre-trained LLMs: Access to pre-trained models for text processing and generation, like OpenAI's GPT models or Snowflake's proprietary embeddings.
Text embeddings: Generate high-dimensional vector embeddings from textual data for semantic search, clustering, and contextual understanding.
Vector support: Native VECTOR data type to store embeddings, perform similarity comparisons, and optimize GenAI applications.
Integration with SQL: Leverage Cortex functions (e.g., EMBEDDINGS, MATCH, MATCH_SCORE) directly in SQL queries.

Use Case: Build a Product FAQ Bot With GenAI

Develop a GenAI-powered bot to answer product-related questions using Snowflake Cortex.

Step 1: Create a Knowledge Base Table

Start by storing your FAQs in Snowflake.

SQL

CREATE OR REPLACE TABLE product_faq (
    faq_id INT,
    question STRING,
    answer STRING,
    question_embedding VECTOR(768)
);

Step 2: Insert FAQ Data

Populate the table with sample questions and answers.

SQL

INSERT INTO product_faq (faq_id, question, answer) VALUES
    (1, 'How do I reset my password?', 'You can reset your password by clicking "Forgot Password" on the login page.'),
    (2, 'What is your return policy?', 'You can return products within 30 days of purchase with a receipt.'),
    (3, 'How do I track my order?', 'Use the tracking link sent to your email after placing an order.');

Step 3: Generate Question Embeddings

Generate vector embeddings for each question using Snowflake Cortex.

SQL

UPDATE product_faq
SET question_embedding = EMBEDDINGS('cortex_default', question);

What this does:
Converts each question into a 768-dimensional vector using Cortex's default LLM.
Stores the vector in the question_embedding column.

Step 4: Query for Answers Using Semantic Search

When a user asks a question, match it to the most relevant FAQ in the database.

SQL

SELECT question, answer,
       MATCH_SCORE(question_embedding, EMBEDDINGS('cortex_default', 'How can I reset my password?')) AS relevance
FROM product_faq
ORDER BY relevance DESC
LIMIT 1;

Explanation:
The user's query ('How can I reset my password?') is converted into a vector.
MATCH_SCORE calculates the similarity between the query vector and the FAQ embeddings.
The query returns the most relevant answer.

Step 5: Automate Text Generation

Use GenAI capabilities to auto-generate answers for queries the FAQ table does not cover.

SQL

SELECT GENERATE_TEXT('cortex_default', 'How do I update my email address?') AS generated_answer;

What this does:
Generates a text response for the query using the cortex_default LLM.
The response can be stored back in the FAQ table for future use.
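If you want to call the FAQ lookup from an application rather than a worksheet, a small Python client is enough. The following is a minimal sketch assuming the snowflake-connector-python package, the default %s parameter binding, and the same EMBEDDINGS/MATCH_SCORE functions used above; the connection parameters are placeholders.

Python

import snowflake.connector

# Placeholder credentials -- replace with your account details.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
    database="your_database",
    schema="your_schema",
)

FAQ_LOOKUP_SQL = """
SELECT question, answer,
       MATCH_SCORE(question_embedding,
                   EMBEDDINGS('cortex_default', %s)) AS relevance
FROM product_faq
ORDER BY relevance DESC
LIMIT 1
"""

def answer_question(user_question: str):
    """Return the best-matching FAQ answer for a user question."""
    with conn.cursor() as cur:
        cur.execute(FAQ_LOOKUP_SQL, (user_question,))
        row = cur.fetchone()
    return {"question": row[0], "answer": row[1], "relevance": row[2]} if row else None

print(answer_question("How can I reset my password?"))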
Advanced Use Cases

Document Summarization

Summarize lengthy product manuals or policy documents for quick reference.

SQL

SELECT GENERATE_TEXT('cortex_default', 'Summarize: Return policy allows refunds within 30 days...') AS summary;

Personalized Recommendations

Combine vector embeddings with user preferences to generate personalized product recommendations.

SQL

SELECT product_name,
       MATCH_SCORE(product_embedding, EMBEDDINGS('cortex_default', 'Looking for lightweight gaming laptops')) AS relevance
FROM product_catalog
ORDER BY relevance DESC
LIMIT 3;

Chatbot Integration

Integrate Cortex-powered GenAI into chat applications using frameworks like Streamlit or API connectors.

Best Practices

Optimize Embedding Generation
Use cleaned, concise text to improve embedding quality.
Preprocess input text to remove irrelevant data.

Use VECTOR Indexes
Speed up similarity searches for large datasets:

SQL

CREATE VECTOR INDEX faq_index
USING cortex_default
ON product_faq (question_embedding)

Monitor Model Performance
Track MATCH_SCORE to assess query relevance.
Fine-tune queries or improve data quality for low-confidence results.

Secure Sensitive Data
Limit access to tables and embeddings containing sensitive or proprietary information.

Batch Processing for Scalability
Process embeddings and queries in batches for high-volume use cases (see the sketch at the end of this article).

Benefits of Snowflake Cortex for GenAI

No infrastructure overhead: Use pre-trained LLMs directly within Snowflake without managing external systems.
Seamless integration: Combine GenAI capabilities with Snowflake's data analytics features.
Scalability: Handle millions of embeddings or GenAI tasks with Snowflake's scalable architecture.
Flexibility: Build applications like chatbots, recommendation engines, and content generators.
Cost-effective: Leverage on-demand GenAI capabilities without investing in separate ML infrastructure.

Next Steps

Extend: Add advanced use cases like multi-lingual support or real-time chat interfaces.
Explore: Try other Cortex features like clustering, sentiment analysis, and real-time text generation.
Integrate: Use external tools like Streamlit or Flask to build user-facing applications.

Snowflake Cortex makes it easy to bring the power of GenAI into your data workflows. Whether you're building a chatbot, summarizing text, or creating personalized recommendations, Cortex provides a seamless, scalable platform to achieve your goals.
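The batch-processing recommendation above can be as simple as splitting an embedding backfill into id ranges. Here is a minimal sketch, again assuming the snowflake-connector-python package and the EMBEDDINGS function from the examples; the batch size and connection details are placeholders.

Python

import snowflake.connector

# Placeholder connection -- same assumptions as the earlier client sketch.
conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="your_warehouse", database="your_database", schema="your_schema",
)

BATCH_SIZE = 10_000  # illustrative; tune to your warehouse size

def embed_in_batches():
    """Backfill question embeddings in id-range batches instead of one large UPDATE."""
    with conn.cursor() as cur:
        cur.execute("SELECT MIN(faq_id), MAX(faq_id) FROM product_faq")
        lo, hi = cur.fetchone()
        for start in range(lo, hi + 1, BATCH_SIZE):
            cur.execute(
                """
                UPDATE product_faq
                SET question_embedding = EMBEDDINGS('cortex_default', question)
                WHERE faq_id BETWEEN %s AND %s
                  AND question_embedding IS NULL
                """,
                (start, start + BATCH_SIZE - 1),
            )

embed_in_batches()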
GenAI Logic, built on ApiLogicServer, has recently introduced a workflow integration using n8n.io. The tool has over 250 existing integrations, and the developer community supplies prebuilt solutions called templates (over 1,000), including AI integrations to build chatbots. GenAI Logic can build the API transaction framework from a prompt and use natural language rules (and rule suggestions) to help get the user started on a complete system. Eventually, most systems require additional tooling to support features like email, push notifications, payment systems, or integration into corporate data stores. While ApiLogicServer is an existing API platform, writing 250 integration endpoints with all the nuances of security, transformations, logging, and monitoring — not to mention the user interface — would require a huge community effort. ApiLogicServer found the solution in n8n.io (one of many workflow engines on the market). What stands out is that n8n.io offers a community version using a native Node.js solution for local testing (npx n8n) as well as a hosted cloud version.

N8N Workflow

In n8n, you create a "Webhook from ApiLogicServer" node, which exposes a URL that can accept an HTTP GET, POST, PUT, or DELETE, with basic authentication added (user: admin, password: p) to test the webhook. The Convert to JSON block transforms the body (a string) into a JSON object using JavaScript. The Switch block allows routing based on different JSON payloads. The If Inserted block decides whether the Employee was an insert or an update (which is passed in the header). The SendGrid blocks register a SendGrid API key and format an email to send (selecting the email address from the JSON using drag-and-drop). Finally, the Respond to Webhook block returns a status code of 200 to the ApiLogicServer event. Employees, Customers, and Orders are all sent to the same Webhook.

Configuration

There are two parts to the configuration. The first is the installation of the workflow engine n8n.io (either on-premise, Docker, or cloud), and then the creation of the webhook object in the workflow diagram (http://localhost:5678). This will generate a unique name and path that are passed to the ApiLogicServer project in config/config.py; in this example, with simple basic authorization (user/password).

Note: In an ApiLogicServer project's integration/n8n folder, this sample JSON file is available to import this example into your own n8n project!

Webhook Output

ApiLogicServer Logic and Webhook

The real power of this is the ability to add a business logic rule to trigger the webhook, adding some configuration information (n8n server, port, key, and path, plus authorization). The actual rule (after_flush_row_event) is called any time an insert event occurs on an API endpoint. The implementation is simply a call to Python code that posts the payload (e.g., requests.post(url=n8n_webhook_url, json=payload, headers=headers)); see the sketch at the end of this article.

Configuration to call the n8n webhook in config/config.py:

Python

wh_scheme = "http"
wh_server = "localhost"  # or cloud.n8n.io...
wh_port = 5678
wh_endpoint = "webhook-test"  # from n8n Webhook URL
wh_path = "002fa0e8-f7aa-4e04-b4e3-e81aa29c6e69"  # from n8n Webhook URL
token = "YWRtaW46cA=="  # base64 encoding of user/password admin:p

N8N_PRODUCER = {"authorization": f"Basic {token}",
                "n8n_url": f"{wh_scheme}://{wh_server}:{wh_port}/{wh_endpoint}/{wh_path}"}

# Or enter the n8n_url directly:
N8N_PRODUCER = {"authorization": f"Basic {token}",
                "n8n_url": "http://localhost:5678/webhook-test/002fa0e8-f7aa-4e04-b4e3-e81aa29c6e69"}

# N8N_PRODUCER = None  # comment out to enable N8N producer

Call a business rule (after_flush_row_event) on the API entity:

Python

def call_n8n_workflow(row: Employee, old_row: Employee, logic_row: LogicRow):
    """ Webhook Workflow: When Employee is inserted = post to n8n webhook """
    if logic_row.is_inserted():
        status = send_n8n_message(logic_row=logic_row)
        logic_row.log(status)

Rule.after_flush_row_event(on_class=models.Employee, calling=call_n8n_workflow)

Declarative Logic (Rules)

ApiLogicServer is an open-source platform based on the SQLAlchemy ORM and Flask. SQLAlchemy provides a hook (before flush) that allows LogicBank (another open-source tool) to let developers declare "rules." These rules fall into three categories: derivations, constraints, and events. Derivations are similar to spreadsheet rules in that they operate on a selected column (cell): formula, sums, counts, and copy. Constraints operate on the API entity to validate the row and will roll back a multi-table event if the constraint test does not pass. Finally, the events (early, row, commit, and flush) allow the developer to call "user-defined functions" to execute code during the lifecycle of the API entity. The WebGenAI feature (a chatbot to build applications) was trained on these rules to use natural language prompts (this can also be done in the IDE using Copilot). Notice that the rules are declared and unordered. New rules can be added or changed and are not actually processed until a state change of the API or attribute is detected. Further, these rules can impact other API endpoints (e.g., sums, counts, or formula), which in turn can trigger constraints and events. Declarative rules can easily be 40x more concise than code.

Natural language rules generated by WebGenAI:

Use LogicBank to enforce the Check Credit requirement:
1. The Customer's balance is less than the credit limit
2. The Customer's balance is the sum of the Order amount_total where date_shipped is null
3. The Order's amount_total is the sum of the Item amount
4. The Item amount is the quantity * unit_price
5. The Item unit_price is copied from the Product unit_price

These become the following rules in logic/declare_logic.py:

Python

# ApiLogicServer: basic rules - 5 rules vs. 200 lines of code:
# logic design translates directly into rules

Rule.constraint(validate=Customer,
                as_condition=lambda row: row.Balance <= row.CreditLimit,
                error_msg="balance ({round(row.Balance, 2)}) exceeds credit ({round(row.CreditLimit, 2)})")

# adjust iff AmountTotal or ShippedDate or CustomerID changes
Rule.sum(derive=Customer.Balance,
         as_sum_of=Order.AmountTotal,
         where=lambda row: row.ShippedDate is None and row.Ready == True)

# adjust iff Amount or OrderID changes
Rule.sum(derive=Order.AmountTotal, as_sum_of=OrderDetail.Amount)

Rule.formula(derive=OrderDetail.Amount,
             as_expression=lambda row: row.UnitPrice * row.Quantity)

# get Product Price (e.g., on insert, or ProductId change)
Rule.copy(derive=OrderDetail.UnitPrice, from_parent=Product.UnitPrice)

SendGrid Email

N8N has hundreds of integration features that follow the same pattern. Add a node to your diagram and attach the input, configure the settings (here, a SendGrid API key is added), and test to see the output. SendGrid will respond with a messageId (which can be returned to the caller or stored in a database or Google Sheet). Workflows can be downloaded and stored in GitHub or uploaded into the cloud version.

SendGrid input and output (use drag and drop to build the email message)

AI Integration: A Chatbot Example

The community contributes workflow "templates" that anyone can pick up and use in their own workflow. One template has the ability to take documents from S3 and feed them to Pinecone (a vector data store). Then, use the AI block to link this to ChatGPT — the template even provides the code to insert into your webpage to make this a seamless end-to-end chatbot integration. Imagine taking your product documentation in Markdown and trying this out on a new website to help users understand how to chat and get answers to questions.

AI workflow to build a chatbot

Summary

GenAI Logic is the new kid on the block. It combines the power of AI chat, natural language rules, and an API automation framework to instantly deliver running applications. The source is easily downloaded into a local IDE, and the work for the dev team begins. With the API in place, the UI/UX team can use the Ontimize (Angular) framework to "polish" the front end. The developer team can add logic and security to handle the business requirements. Finally, the integration team can build the workflows to meet the business use case requirements.

ApiLogicServer also has a Kafka integration for producers and consumers. This extends real-time workflow integration: ApiLogicServer can produce a Kafka message, and a consumer can start the workflow (and log, track, and retry if needed). N8N provides an integration space that gives ApiLogicServer new tools to meet most system integration needs. I have also tested a Zapier webhook (a cloud-based solution), which works the same way. Try WebGenAI for free to get started building apps and logic from prompts.
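For completeness, here is the sketch referenced earlier: a minimal, hypothetical version of the send_n8n_message helper that the after_flush_row_event handler calls. It assumes the N8N_PRODUCER settings shown in config/config.py and the requests library; the payload shape, column names, and error handling are illustrative, not the platform's actual implementation.

Python

import requests

# Hypothetical helper -- a minimal sketch of posting a row event to the n8n webhook.
# N8N_PRODUCER is assumed to be the dict defined in config/config.py above.
from config.config import N8N_PRODUCER

def send_n8n_message(logic_row) -> str:
    """Post the inserted row to the n8n webhook and return a status string for logging."""
    payload = {
        "entity": logic_row.row.__class__.__name__,  # e.g., "Employee"
        "data": {c: getattr(logic_row.row, c, None) for c in ("Id", "Name", "Email")},  # illustrative columns
    }
    headers = {
        "Authorization": N8N_PRODUCER["authorization"],
        "Content-Type": "application/json",
    }
    response = requests.post(url=N8N_PRODUCER["n8n_url"], json=payload, headers=headers, timeout=10)
    return f"n8n webhook returned {response.status_code}"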
The industry's increasing focus on secure container images is undeniable. Companies like Chainguard — specializing in delivering container images free of CVEs — have demonstrated the demand by recently raising an impressive $140 million at a $1.1 billion valuation. In the open-source ecosystem, Cloud Native Buildpacks, an incubating CNCF project, and its vibrant communities deliver a comparable value proposition by automating the creation of optimized and secure container images. In this article, I'll explore Buildpacks' core concepts, comparing them with Docker to illustrate their functionality and highlight how they provide a community-driven alternative to the value Chainguard brings to container security.

What Are Buildpacks?

Buildpacks automate the process of preparing your application for deployment: detecting dependencies, building runtime artifacts, and packaging everything into a container image. They abstract away the manual effort of building images efficiently. In other words, if Docker allows you to define explicitly how a container is built through a Dockerfile, Buildpacks operate at a higher level of abstraction. They offer opinionated defaults that help developers ship production-ready images quickly.

Comparing a Few Concepts

Buildpacks do containerization differently and more efficiently. For those unfamiliar with the technology, let's review a few key Docker concepts and see how they translate to the Buildpacks world.

Entrypoint and Start Commands

In a Dockerfile, the ENTRYPOINT or CMD specifies the command that runs when the container starts. For example:

Dockerfile

CMD ["java", "-jar", "app.jar"]

Buildpacks abstract this step; you have nothing to do. They automatically detect the appropriate start command for your application based on the runtime and build process. For example, when using a Java Buildpack, the resulting image includes logic to start your application with java -jar app.jar or a similar command. You don't need to configure it explicitly; Buildpacks "just know" how to start applications based on best practices.

Writing a Dockerfile

The concept of not doing anything goes even further; you don't even need to write the equivalent of a Dockerfile. Buildpacks take care of everything needed to containerize your application into an OCI image.

Multi-Stage Builds

That abstraction does not come at the cost of optimization. For example, multi-stage builds are a common technique in Docker to create lean images by separating the build environment from the runtime environment. For instance, you might compile a Java binary in one stage and copy it to a minimal base image in the final stage:

Dockerfile

# Build stage
FROM maven:3.8-openjdk-11 as builder
WORKDIR /app
COPY . .
RUN mvn package

# Runtime stage
FROM openjdk:11
COPY --from=builder /app/target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]

Buildpacks handle the equivalent of multi-stage builds behind the scenes. During the build process, they:

Detect your application's dependencies
Build artifacts (e.g., compiled binaries for Java)
Create a final image with only the necessary runtime components

This is, again, done automatically, requiring no explicit configuration.

About Security

Let's jump into the security part and explore a few ways that the Buildpacks ecosystem can be seen as an OSS alternative to Chainguard.

Non-Root Containers

Running containers as non-root users is a best practice to improve security. In Dockerfiles, this typically involves creating a new user and configuring permissions.
Buildpacks enforce non-root execution by default. The resulting container image is configured to run as an unprivileged user, with no extra effort required from the developer.

CVEs

Security is a significant focus for open-source Buildpacks communities like Paketo Buildpacks and Google Cloud. What these communities offer can be seen as the open-source alternative to Chainguard. By default, Buildpacks use pre-configured, community-maintained base images that are regularly updated to eliminate known vulnerabilities (CVEs). For example, Paketo Buildpacks stacks (build image and run image) are rebuilt whenever a package is patched to fix a CVE, and every stack is rebuilt weekly to ensure packages without CVEs are also kept up to date. The community releases stack updates that fix high and critical CVEs within 48 hours of the patch release, and within two weeks for low and medium CVEs.

SBOM

Buildpacks can provide an SBOM to describe the dependencies that they provide, and they support three ways to report SBOM data: CycloneDX, SPDX, or Syft. Paketo Buildpacks also uses SBOM generation to provide a detailed record of all dependencies in the images it provides, making it easier to track and audit components for vulnerabilities. A short sketch of consuming such an SBOM follows this section.

A Solid OSS Chainguard Alternative

Buildpacks offer a simple, secure, and standardized way to create production-ready container images, making them a potential cornerstone of a platform engineering strategy. By automating tasks like dependency management, non-root execution, and security updates, Buildpacks provide a community-driven alternative to commercial security solutions like Chainguard. For teams looking to streamline workflows and enhance container security without the complexity of Dockerfiles and the cost and limitations of Chainguard, Buildpacks can be a solid starting point.
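To show what working with one of those SBOM formats can look like in practice, here is a minimal Python sketch that lists the components recorded in a CycloneDX JSON document. The file name is a placeholder, and the fields read here (components, name, version, purl) are the common CycloneDX JSON fields; this is an illustration for auditing purposes, not part of the Buildpacks tooling itself.

Python

import json

# Placeholder path -- e.g., an SBOM exported for an application image.
SBOM_PATH = "sbom.cdx.json"

def list_components(path: str):
    """Print name, version, and purl for each component in a CycloneDX JSON SBOM."""
    with open(path) as f:
        sbom = json.load(f)
    for component in sbom.get("components", []):
        name = component.get("name", "<unknown>")
        version = component.get("version", "<unknown>")
        purl = component.get("purl", "")
        print(f"{name} {version} {purl}")

if __name__ == "__main__":
    list_components(SBOM_PATH)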
Performance tuning is a critical responsibility for Oracle database administrators, ensuring that SQL queries run efficiently across various environments. This guide details how to copy an SQL execution plan from one Oracle 19c database to another, a practical solution when a query performs inconsistently across environments. For example, if a query runs efficiently in a staging environment but poorly in production, transferring the execution plan can resolve performance issues without modifying the SQL code. Below are the steps to copy SQL execution plans.

Source Database Operations

Step 1: Identify the Plan Hash Value

To begin, identify the PLAN_HASH_VALUE of the SQL query in the source database where it performs well.

SQL

SELECT DISTINCT plan_hash_value
FROM v$sql
WHERE sql_id = 'abcd1234xyz';

Example output:

PLAN_HASH_VALUE
3456789012

Performance validation:
Query execution time in staging: ~0.5 seconds.
Query execution time in production: ~3.2 seconds.

Step 2: Load the Plan into SQL Plan Management (SPM)

Load the execution plan into the SPM repository using the DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE procedure.

SQL

DECLARE
  ret binary_integer;
BEGIN
  ret := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
    sql_id          => 'abcd1234xyz',
    plan_hash_value => 3456789012,
    fixed           => 'YES',
    enabled         => 'YES'
  );
END;
/

Verify the loaded plan:

SQL

SELECT sql_handle, plan_name FROM dba_sql_plan_baselines;

Example output:

SQL_HANDLE            PLAN_NAME
SQL_12345abcde67890   SQL_PLAN_xyz09876abcd

Step 3: Create a Staging Table

Create a table to store the plan for export.

SQL

BEGIN
  DBMS_SPM.CREATE_STGTAB_BASELINE(
    table_name      => 'STAGING_PLAN_TABLE',
    table_owner     => 'APPUSER',
    tablespace_name => 'USERS'
  );
END;
/

Step 4: Pack the Execution Plan

Pack the plan into the staging table.

SQL

DECLARE
  my_plans NUMBER;
BEGIN
  my_plans := DBMS_SPM.PACK_STGTAB_BASELINE(
    table_name  => 'STAGING_PLAN_TABLE',
    table_owner => 'APPUSER',
    plan_name   => 'SQL_PLAN_xyz09876abcd',
    sql_handle  => 'SQL_12345abcde67890'
  );
END;
/

Step 5: Export the Staging Table

Export the staging table using Oracle Data Pump:

Shell

expdp appuser/password@source_db \
  tables=APPUSER.STAGING_PLAN_TABLE \
  dumpfile=plan_export.dmp \
  logfile=plan_export.log

Transfer the plan_export.dmp file to the target database.

Target Database Operations

Step 6: Import the Staging Table

Import the table into the target database.

Shell

impdp appuser/password@target_db \
  tables=APPUSER.STAGING_PLAN_TABLE \
  dumpfile=plan_export.dmp \
  logfile=plan_import.log

Step 7: Unpack the Execution Plan

Unpack the execution plan into the SPM repository in the target database.

SQL

DECLARE
  plans_unpacked PLS_INTEGER;
BEGIN
  plans_unpacked := DBMS_SPM.UNPACK_STGTAB_BASELINE(
    table_name  => 'STAGING_PLAN_TABLE',
    table_owner => 'APPUSER'
  );
  DBMS_OUTPUT.PUT_LINE('Plans Unpacked: ' || plans_unpacked);
END;
/

Verify the unpacked plan:

SQL

SELECT sql_handle, plan_name, enabled, accepted, fixed
FROM dba_sql_plan_baselines;

Step 8: Fix the Execution Plan in Oracle 19c

Ensure the SQL optimizer consistently uses the imported execution plan by marking it as FIXED. A fixed plan tells the optimizer to prioritize it over other plans for the same SQL query, ensuring stable and predictable performance.
Code for Fixing the Plan

SQL

DECLARE
  plans_altered PLS_INTEGER;
BEGIN
  plans_altered := DBMS_SPM.ALTER_SQL_PLAN_BASELINE(
    sql_handle      => 'SQL_12345abcde67890',
    plan_name       => 'SQL_PLAN_xyz09876abcd',
    attribute_name  => 'fixed',
    attribute_value => 'YES'
  );
  DBMS_OUTPUT.PUT_LINE('Plans Altered: ' || plans_altered);
END;
/

Explanation of Input Parameters

1. sql_handle
Definition: A unique identifier for the SQL statement associated with the execution plan in the SPM repository.
Example value: 'SQL_12345abcde67890'. This corresponds to the SQL query whose execution plan you imported.
How to find it: Query the DBA_SQL_PLAN_BASELINES table to get the SQL_HANDLE:

SQL

SELECT sql_handle, sql_text
FROM dba_sql_plan_baselines
WHERE sql_text LIKE '%<your_query>%';

2. plan_name
Definition: The unique identifier for the specific execution plan you want to fix.
Example value: 'SQL_PLAN_xyz09876abcd'. This corresponds to the imported execution plan.
How to find it: Query the DBA_SQL_PLAN_BASELINES table to get the PLAN_NAME:

SQL

SELECT plan_name, sql_handle, enabled, accepted, fixed
FROM dba_sql_plan_baselines
WHERE sql_handle = 'SQL_12345abcde67890';

3. attribute_name
Definition: The attribute of the plan that you want to modify.
Allowed values: 'fixed', 'enabled', 'accepted', etc.
Example value: 'fixed'. In this context, it specifies that you want to modify the fixed status of the plan.

4. attribute_value
Definition: The new value for the attribute being modified.
Allowed values: 'YES', 'NO'.
Example value: 'YES'. This marks the execution plan as fixed, prioritizing it over other plans for the same query.

Expected Output

When executed, the procedure updates the specified plan's fixed attribute and outputs the number of plans altered. For example:

Plans Altered: 1

Verification

To confirm the plan is fixed, run the following query:

SQL

SELECT sql_handle, plan_name, fixed
FROM dba_sql_plan_baselines
WHERE sql_handle = 'SQL_12345abcde67890';

Expected output:

SQL_HANDLE            PLAN_NAME               FIXED
SQL_12345abcde67890   SQL_PLAN_xyz09876abcd   YES

When to Use Fixed Plans

Fixing a plan is particularly useful in scenarios like:
Stabilizing query performance in production environments.
Ensuring the optimizer does not deviate from a known efficient plan.
Addressing inconsistent performance across environments.

By carefully fixing execution plans, you can maintain predictable query behavior while mitigating performance risks.

Step 9: Test the Query

Run the query in the target database to confirm the plan is applied:

SQL

SELECT DISTINCT plan_hash_value
FROM v$sql
WHERE sql_id = 'abcd1234xyz';

Example output:

PLAN_HASH_VALUE
3456789012

Performance validation:
Query execution time after plan transfer: ~0.5 seconds.
Performance improvement: ~84.4% faster.

Conclusion

By following these steps, you can transfer SQL execution plans between Oracle 19c databases to address performance issues without altering SQL code. This method ensures consistent query behavior across environments, significantly improving performance.

Summary of Results

Environment                     | Query Execution Time | Improvement (%)
Staging                         | ~0.5 seconds         | Baseline
Production                      | ~3.2 seconds         | N/A
Production (Post Plan Transfer) | ~0.5 seconds         | ~84.4%
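If you repeat this transfer across many databases, the verification in Steps 8 and 9 can be scripted. Below is a minimal sketch using the python-oracledb driver; the connection details are placeholders, and the queries are the same verification queries shown above.

Python

import oracledb

# Placeholder connection details for the target database.
conn = oracledb.connect(user="appuser", password="password", dsn="target_db_host/service_name")

def verify_fixed_plan(sql_handle: str, sql_id: str):
    """Check that the imported baseline is FIXED and that the query now uses the expected plan."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT plan_name, fixed FROM dba_sql_plan_baselines WHERE sql_handle = :h",
            h=sql_handle,
        )
        for plan_name, fixed in cur:
            print(f"baseline {plan_name}: fixed={fixed}")

        cur.execute(
            "SELECT DISTINCT plan_hash_value FROM v$sql WHERE sql_id = :s",
            s=sql_id,
        )
        for (plan_hash_value,) in cur:
            print(f"current plan_hash_value: {plan_hash_value}")

verify_fixed_plan("SQL_12345abcde67890", "abcd1234xyz")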
SQL Server is a powerful relational database management system (RDBMS), but as datasets grow in size and complexity, optimizing their performance becomes critical. Leveraging AI can revolutionize query optimization and predictive maintenance, ensuring the database remains efficient, secure, and responsive. In this article, we will explore how AI can assist in these areas, providing code examples to tackle complex queries.

AI for Query Optimization

Complex queries can be slow due to inefficient execution plans or poor indexing strategies. AI can analyze query execution metrics, identify bottlenecks, and provide suggestions for optimization.

Example: Complex Query Optimization

Let's start with a slow-running query:

MS SQL

SELECT p.ProductID, SUM(o.Quantity) AS TotalQuantity
FROM Products p
JOIN Orders o ON p.ProductID = o.ProductID
WHERE o.OrderDate >= '2023-01-01'
GROUP BY p.ProductID
HAVING SUM(o.Quantity) > 1000
ORDER BY TotalQuantity DESC;

This query suffers from performance issues because of:
Unoptimized indexes on OrderDate and ProductID.
A high volume of unnecessary data being scanned.

Solution: AI-Based Query Plan Analysis

Using tools like SQL Server Query Store and integrating AI-based analytics, you can identify inefficiencies:

1. Enable Query Store

MS SQL

ALTER DATABASE AdventureWorks SET QUERY_STORE = ON;

2. Capture Query Performance Metrics

Use Python with a library like pyodbc and AI frameworks to analyze the query's execution statistics.

Python

import pyodbc
import pandas as pd
from sklearn.ensemble import IsolationForest

# Connect to SQL Server
conn = pyodbc.connect(
    "Driver={SQL Server};"
    "Server=your_server_name;"
    "Database=AdventureWorks;"
    "Trusted_Connection=yes;"
)

# Retrieve query execution stats
query = """
SELECT TOP 1000
    qs.query_id,
    qs.execution_type,
    qs.total_duration,
    qs.cpu_time,
    qs.logical_reads,
    qs.physical_reads
FROM sys.query_store_runtime_stats qs
"""
df = pd.read_sql(query, conn)

# Use AI for anomaly detection (e.g., identifying slow queries)
model = IsolationForest(n_estimators=100, contamination=0.1)
model.fit(df[['total_duration', 'cpu_time', 'logical_reads']])
df['anomaly'] = model.predict(df[['total_duration', 'cpu_time', 'logical_reads']])
print(df[df['anomaly'] == -1])  # Anomalous slow queries

3. Optimize the Query

Based on the analysis, add proper indexing:

MS SQL

CREATE NONCLUSTERED INDEX IDX_Orders_OrderDate_ProductID
ON Orders(OrderDate, ProductID);

Here is the updated query after applying the AI suggestions, which reduces unnecessary scans:

MS SQL

SELECT p.ProductID, SUM(o.Quantity) AS TotalQuantity
FROM Products p
JOIN Orders o ON p.ProductID = o.ProductID
WHERE o.OrderDate >= '2023-01-01'
  AND EXISTS (
      SELECT 1
      FROM Orders o2
      WHERE o2.ProductID = p.ProductID
        AND o2.Quantity > 1000
  )
GROUP BY p.ProductID
ORDER BY TotalQuantity DESC;

AI for Predictive Maintenance

AI can predict system issues before they occur, such as disk I/O bottlenecks or query timeouts.

Example: Predicting Performance Bottlenecks

1. Collect Performance Metrics

Use SQL Server's DMVs (Dynamic Management Views) to retrieve metrics.

MS SQL

SELECT
    database_id,
    io_stall_read_ms,
    io_stall_write_ms,
    num_of_reads,
    num_of_writes
FROM sys.dm_io_virtual_file_stats(NULL, NULL);
2. Analyze Metrics With AI

Predict bottlenecks using Python and a regression model:

Python

from sklearn.linear_model import LinearRegression
import numpy as np

# Example I/O data
io_data = {
    'read_stall': [100, 150, 300, 500, 800],
    'write_stall': [80, 120, 280, 480, 750],
    'workload': [1, 2, 3, 4, 5]  # Hypothetical workload levels
}

X = np.array(io_data['workload']).reshape(-1, 1)
y = np.array(io_data['read_stall'])

# Train model
model = LinearRegression()
model.fit(X, y)

# Predict for future workload levels
future_workload = np.array([6]).reshape(-1, 1)
predicted_stall = model.predict(future_workload)
print(f"Predicted read stall for workload 6: {predicted_stall[0]} ms")

3. Proactive Maintenance

Schedule optimizations based on predicted workloads.
Add resources (e.g., disk I/O capacity) or rebalance workloads to mitigate future issues (see the sketch after the conclusion of this article).

Analysis of SQL Server Before and After AI-Driven Query Optimization

Metric                            | Before Optimization   | After Optimization with AI | Improvement
Dataset Size                      | 50 million rows       | 50 million rows            | No change
Query Execution Time              | 120 seconds           | 35 seconds                 | ~70% reduction
CPU Utilization (%)               | 85%                   | 55%                        | ~35% reduction
I/O Read Operations (per query)   | 1,500,000             | 850,000                    | ~43% reduction
Logical Reads (pages)             | 120,000               | 55,000                     | ~54% reduction
Index Utilization                 | Minimal               | Fully optimized            | Improved indexing strategy
Latency for Concurrent Queries    | High (queries queued) | Low (handled in parallel)  | Significant reduction in wait time
Resource Contention               | Frequent              | Rare                       | Better query and resource management
Overall Throughput (queries/hour) | 20                    | 60                         | 3x improvement
Error Rate (timeouts or failures) | 5%                    | 1%                         | 80% reduction

Key Observations

1. Query Execution Time
Using AI to analyze execution plans and recommend indexes significantly reduced execution time for complex queries.

2. CPU and I/O Efficiency
Optimized indexing and improved query structure reduced resource consumption.

3. Concurrency Handling
Enhanced indexing and optimized execution plans improved the ability to handle concurrent queries, reducing latency.

4. Throughput
With reduced execution time and better resource utilization, the system processed more queries per hour.

5. Error Rate
AI-driven optimization reduced query timeouts and failures by minimizing resource contention and improving execution plans.

Conclusion

Incorporating AI-driven solutions into the optimization of SQL Server significantly enhances the management and querying of extensive datasets, particularly when dealing with millions of rows. A comparative analysis of performance metrics before and after optimization reveals marked improvements in execution times, resource efficiency, and overall system throughput. By utilizing AI tools for query optimization, indexing methodologies, and predictive analytics, organizations can achieve reduced latency, improved concurrency, and fewer errors, thereby ensuring a dependable and efficient database environment. The adoption of sophisticated indexing techniques and AI-based query analysis has led to a reduction in execution times by approximately 70%, a decrease in CPU and I/O resource consumption, and a tripling of query throughput. Furthermore, predictive maintenance has facilitated proactive resource management, significantly mitigating the potential for bottlenecks and system downtime. These enhancements improve performance and foster scalability and resilience for future expansion.
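As a follow-up to the proactive-maintenance step above, here is a minimal sketch that turns the regression prediction into a simple go/no-go signal. The threshold value and the alert action are illustrative assumptions, not recommendations from the article.

Python

from sklearn.linear_model import LinearRegression
import numpy as np

# Same toy I/O data as in the prediction example above.
workload = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
read_stall = np.array([100, 150, 300, 500, 800])

model = LinearRegression().fit(workload, read_stall)

STALL_THRESHOLD_MS = 600  # illustrative threshold for acceptable read stalls

def check_future_workload(level: int) -> None:
    """Flag workload levels whose predicted read stall exceeds the threshold."""
    predicted = model.predict(np.array([[level]]))[0]
    if predicted > STALL_THRESHOLD_MS:
        print(f"Workload {level}: predicted stall {predicted:.0f} ms -- schedule maintenance or add I/O capacity")
    else:
        print(f"Workload {level}: predicted stall {predicted:.0f} ms -- within threshold")

for level in (5, 6, 7):
    check_future_workload(level)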
Despite their remarkable capabilities in generating text, answering complex questions, and performing a wide range of tasks, Large Language Models (LLMs) have notable limitations that hinder their real-world applicability. One significant challenge is their inability to consistently provide precise, up-to-date responses. This issue is especially critical in fields like healthcare, law, and finance, where the accuracy and explainability of information are paramount. For instance, imagine a financial analyst querying the latest market trends or a doctor seeking updated medical guidelines. Retrieval-augmented generation (RAG) addresses these limitations by combining the strengths of LLMs with information retrieval systems, ensuring more accurate, reliable, and contextually grounded outputs. Limitations of LLMs and How RAG Helps Hallucination LLMs sometimes generate content that is "nonsensical or unfaithful to the provided source content" (Ji et al., 2023). This phenomenon, known as hallucination, occurs because the models rely on patterns in their training data rather than a solid understanding of the underlying facts. For example, they may produce inaccurate historical dates, fictitious citations, or nonsensical scientific explanations. RAG mitigates hallucination by grounding the generation process in trusted external sources. By retrieving information from verifiable knowledge bases or databases, RAG ensures that outputs are aligned with reality, reducing the occurrence of spurious or incorrect details. Outdated Knowledge LLMs are limited by the static nature of their training data, meaning their knowledge is frozen at the time of training. They lack the ability to access new information or keep up with fast-changing fields like technology, medicine, or global affairs. RAG overcomes this limitation by integrating dynamic retrieval mechanisms that allow the model to query up-to-date sources in real time. For instance, when integrated with APIs or live databases, RAG systems can provide accurate responses even in domains with frequent updates, such as real-time stock market analysis or the latest medical guidelines. Opaque and Untraceable Reasoning The reasoning process of LLMs is often non-transparent, making it difficult for users to understand or trust how answers are derived. This lack of traceability can be problematic in high-stakes scenarios where accountability is essential. RAG addresses this by incorporating citations and source traceability into its outputs. By linking retrieved information to authoritative sources, RAG not only enhances transparency but also builds user trust. This feature is particularly beneficial in legal, academic, or compliance-focused applications. How RAG Works A simple RAG system, as illustrated in the image below, consists of two main components: the Retriever and the Augmented Generator. Retriever Simple RAG Architecture Efficient retrieval methods are at the core of RAG systems, enabling them to identify the most relevant information from a knowledge base to contextualize and ground the outputs of the language model. This knowledge base can include structured or unstructured data sources, such as internal documents, document databases, or the broader internet. The retrieval process involves: Preprocessing and Indexing Chunking: Large documents are segmented into smaller, logically coherent chunks ensuring each chunk captures a meaningful context. 
For example, a 100-page technical report might be divided into 1-page sections, each covering a distinct topic.Knowledge representation: This step converts the chunks into mathematical representations that facilitate retrieval and comparison. It includes both dense and sparse approaches: Dense representations: Chunks are converted into dense vector representations (embeddings) using embedding models like Sentence-BERT, OpenAI's embedding models, or other transformer-based approaches.Sparse representations: Alternatively, sparse representations such as TF-IDF or BM25 can be computed to create a term-based representation of the chunks.Indexing: Indexing organizes the processed representations of document chunks into specialized data structures, enabling efficient and scalable retrieval.Different indexing techniques are used depending on the type of representation: Vector store indexing: Dense embeddings are stored in a vector database (e.g., FAISS, Pinecone, Weaviate) for similarity-based retrieval.Sparse indexing: TF-IDF or BM25 representations are indexed using traditional inverted indices to support lexical matching.Query embedding: The user's query is processed into a sparse or dense vector using the same model used for the knowledge base embeddings. This ensures both reside in the same vector space for meaningful comparisons.Similarity search: The query embedding is compared against the stored embeddings using similarity metrics such as cosine similarity or inner product to retrieve relevant data from the knowledge base. Augmented Generation The retrieved information is combined with the original user query and provided as input to a language model. The generation process ensures the response is: Contextual: Incorporates the user’s query and retrieved-context to generate outputs relevant to the question.Grounded: Based on reliable, retrieved knowledge rather than speculative extrapolation.Explainable: Traces back to the retrieved sources, enabling users to verify the response. Advanced RAG Techniques The simple RAG architecture presented earlier had several potential points of failure that can degrade the quality of the system’s output. One of the first challenges occurs at the user query stage, where the query might not be clearly articulated or precisely framed, leading to ineffective retrieval. Additionally, even when the query is well-formed, the query embedding process may fail to capture the full intent behind the query, especially when the query is complex. This can cause a mismatch between the query and the relevant information retrieved from the knowledge base. Furthermore, the retrieval stage may introduce additional challenges. The system may return irrelevant or low-confidence documents due to limitations in the retrieval model. If the retrieved context is not highly relevant to the query, the information passed to the LLM is likely to be misleading, incomplete, or out of date. This can lead to hallucinations or a generation of responses that are not grounded in the correct context, undermining the reliability of the system. To address these challenges, a strong and reliable RAG system must intervene at each of these stages to ensure optimal performance. This is where the concept of an Advanced RAG system comes into play. In an Advanced RAG framework, sophisticated mechanisms are introduced at every stage of the pipeline, from query formulation and embedding generation to retrieval and context utilization. 
These interventions are designed to mitigate the risks of poor query formulation, inaccurate embeddings, and irrelevant or outdated context, resulting in a more robust system that delivers highly relevant, accurate, and trustworthy outputs. By addressing these points of failure proactively, Advanced RAG ensures that the language model generates responses that are not only contextually grounded but also transparent and reliable. The Advanced RAG architecture introduces key improvements, as shown in the diagram below. Compared to the simple RAG model, Advanced RAG incorporates additional steps to address the potential points of failure. Advanced RAG Architecture Key Enhancements in Advanced RAG Data Preprocessing for LLM-Based RAG Systems Data preprocessing is the foundational step in a RAG system. It involves transforming diverse data formats, such as PDFs, Word documents, web pages, or code files, into a consistent structure that can be efficiently processed by the RAG pipeline. Proper preprocessing is critical for enhancing accuracy, efficiency, and relevance in information retrieval and response generation. Key steps in data preprocessing include: Tokenization and Text Cleaning Tokenization and cleaning ensure data is structured in a form suitable for the model. Key considerations include: Consistent Tokenization The tokenizer associated with the LLM being used should be employed to ensure compatibility and efficiency (Lewis et al., 2020). For example, GPT models use specialized tokenization schemes optimized for their architecture (OpenAI, 2024). Text Cleaning Irrelevant characters and noise, such as excessive whitespace or non-meaningful special symbols, should be removed. Semantically relevant characters, such as those in mathematical or programming contexts, should be preserved. Chunking for Efficient Retrieval Chunking involves segmenting large documents into smaller, meaningful units to address token limits and context window constraints inherent in LLMs. By organizing similar information into compact chunks, the retrieval process becomes more efficient and precise. Various chunking strategies can be adopted depending on the type of data and specific use case (Weights & Biases, 2024): Fixed-length chunking: This method divides text into segments based on a fixed number of tokens or characters. While straightforward, it risks disrupting semantic flow by splitting content arbitrarily.Semantic chunking: Text is divided along natural thematic breaks, much like separating a book into chapters. This method enhances interpretability but may require advanced NLP techniques for implementation (Lewis et al., 2020).Content-based chunking: Segmentation is tailored to the structure of the document. For instance, code files can be split by function definitions, and web pages can be divided based on HTML elements, ensuring relevance to the document’s format and purpose. Query Enhancement Advanced RAG systems employ LLMs and a variety of NLP techniques to enhance the understanding of user queries. These include: Intent Classification This helps the system identify the user’s intent, allowing it to better understand the purpose behind the query. For instance, is the query seeking factual information, a product feature comparison, or technical support? Intent classification ensures the retrieval process is customized to meet the user’s specific needs (Weights & Biases, 2024). Query Decomposition For complex or multi-faceted queries, the system breaks the query into smaller, more focused subqueries.
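As a rough illustration, subquery generation can be as simple as asking the model to split the question and retrieving evidence for each part. The llm() and retrieve() callables below are placeholders for an LLM client and the retriever sketched earlier, not a prescribed interface.

Python
# Sketch of LLM-based query decomposition; llm() and retrieve() are assumed callables.
def decompose(query, llm):
    prompt = (
        "Break the user question into at most three self-contained subquestions, "
        "one per line. If it is already simple, return it unchanged.\n\n"
        f"Question: {query}"
    )
    lines = llm(prompt).splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]

def retrieve_for_all(query, llm, retrieve, k=3):
    evidence = []
    for sub in decompose(query, llm):
        evidence.extend(retrieve(sub, k))       # gather context per subquestion
    return list(dict.fromkeys(evidence))        # de-duplicate, preserving order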
This approach ensures that the retrieval process comprehensively addresses all aspects of the user’s request and retrieves highly specific and relevant information (Rackauckas, 2024). Conversation History Utilization This technique enables the RAG system to enhance queries by leveraging past interactions. By maintaining context across multiple exchanges, the system analyzes and incorporates previous queries and responses. Through chat history condensation, key details from prior interactions are distilled to reduce noise and preserve relevance. Additionally, contextual query reformulation refines the current query by integrating historical context, allowing the system to retrieve more accurate and context-aware information. This ensures the delivery of coherent and tailored responses, even in complex or evolving conversational scenarios (Weights & Biases, 2024). Hypothetical Document Embeddings (HyDE) HyDE utilizes LLMs to create hypothetical documents that are relevant to the original query. Although these generated documents may include factual inaccuracies and are not real, they capture what a relevant answer should look like; their embeddings are then used in place of the raw query embedding to retrieve the actual documents (Gao et al., 2022). Advanced Retrieval In addition to utilizing the enhanced queries, the retrieval stage itself can draw on several families of information retrieval techniques: Classical Information Retrieval Classical information retrieval techniques focus on scoring and ranking documents based on their relevance to a given query. Two prominent methods are Term Frequency-Inverse Document Frequency (TF-IDF) and Best Matching 25 (BM25). While these approaches are effective for matching tokens, they often fall short when it comes to understanding the content or context of the query and document, relying primarily on surface-level token similarity rather than deeper semantic relationships. Neural Information Retrieval Neural information retrieval (NIR) methods represent documents using language models, producing dense numeric representations (embeddings) that capture the semantic and contextual meaning of the documents. These dense representations allow for more accurate and nuanced matching between queries and documents compared to traditional approaches. NIR techniques include (Khattab and Zaharia, 2020): Bi-Encoder Model In a Bi-Encoder model, the query and document are independently encoded into single fixed-length vector representations using separate neural networks, such as a pre-trained BERT model. These representations capture the overall semantic meaning of the query and document rather than token-level details. The similarity between the query and document is then computed using a similarity function, such as the dot product or cosine similarity, to identify relevant documents. This strategy can be further enhanced by fine-tuning the pre-trained BERT model to maximize the similarity between query and document embeddings. Fine-tuning is typically achieved by optimizing the negative log-likelihood of positive document pairs, as outlined by Karpukhin et al. (2020). This approach improves the alignment of query and document representations, leading to more accurate retrieval. The Bi-Encoder design is highly efficient for large-scale retrieval, as document embeddings can be precomputed and indexed, enabling fast nearest-neighbor searches.
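The fine-tuning objective mentioned above is often implemented with in-batch negatives: each query in a training batch is scored against every document in the batch, and the matching document is treated as the correct class. The sketch below uses PyTorch with illustrative shapes; it is a simplification of the training setup described by Karpukhin et al. (2020), not a faithful reproduction.

Python
# In-batch negatives: query i's positive passage is document i; all other
# documents in the batch act as negatives. Assumes PyTorch.
import torch
import torch.nn.functional as F

def in_batch_negative_loss(q_emb, d_emb):
    """q_emb, d_emb: [batch, dim] embeddings from the query and document encoders."""
    scores = q_emb @ d_emb.T                  # [batch, batch] similarity matrix
    targets = torch.arange(q_emb.size(0))     # the diagonal holds the positives
    return F.cross_entropy(scores, targets)   # negative log-likelihood of positives

# Example shapes: a batch of 8 query/document pairs with 768-dim embeddings.
loss = in_batch_negative_loss(torch.randn(8, 768), torch.randn(8, 768))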
However, the independent encoding of the query and document limits the ability to capture intricate interactions between them, which can reduce the retrieval quality, particularly for more complex queries. Cross-Encoder Model In contrast, a Cross-Encoder model processes both the query and document simultaneously by concatenating them and passing the combined input through a single transformer model. This allows for more complex interactions between the query and document, resulting in potentially higher-quality relevance scores. The fine-grained attention between the query and document typically leads to more accurate relevance assessments. However, this approach is computationally expensive, as every query-document pair must be passed through the model at inference time; scores cannot be precomputed. As a result, it is less scalable for large datasets. ColBERT ColBERT (contextualized late interaction over BERT) (Khattab and Zaharia, 2020) is an advanced retrieval model designed to strike a balance between the efficiency of Bi-Encoders and the accuracy of Cross-Encoders. Its core innovation lies in computing a matrix of similarity scores between query tokens and document tokens, enabling a fine-grained assessment of relevance. For each document, the overall relevance score is the sum, over query tokens, of each query token's maximum similarity to any document token. This token-wise interaction allows ColBERT to capture nuanced relationships while maintaining scalability. ColBERT achieves high efficiency because, like Bi-Encoder models, it allows documents to be pre-encoded and stored in advance. During query processing, only the query needs to be encoded, after which the maximum similarity scores are computed on the fly. This approach significantly reduces computational overhead while delivering high retrieval performance. This combination of pre-encoded document representations and late interaction makes ColBERT an effective choice for large-scale information retrieval tasks. Re-Ranker Neural information retrieval models offer significant advancements in information retrieval but come with considerable computational costs, particularly due to the need for forward inference on transformer-based architectures such as BERT. This is especially challenging in scenarios with stringent latency requirements. To make this practical, these models can be effectively utilized as re-rankers in a two-step retrieval pipeline: An initial retrieval phase uses efficient term-based models, such as BM25, to identify the top K candidate documents. These K documents are then re-ranked using neural information retrieval models, which assess their relevance through contextual scoring. In the re-ranking stage, advanced scoring mechanisms such as cross-encoders or ColBERT evaluate and reorder the retrieved documents based on their contextual relevance and quality. By incorporating this step, the re-ranking process reduces irrelevant information in the retrieved set, thereby improving the contextual input passed to downstream language models. This methodology not only mitigates computational overheads but also enhances the quality of responses generated in subsequent processing stages. Response Synthesis and Prompt Engineering Response synthesis is a critical component that bridges the gap between raw data retrieval and user-friendly output in the RAG system.
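Before turning to prompt construction, the late-interaction (MaxSim) scoring at the heart of ColBERT is compact enough to sketch directly. The PyTorch snippet below uses random, L2-normalized token embeddings as stand-ins for real ColBERT output, so it only illustrates the scoring rule, not the full model.

Python
# ColBERT-style late interaction (MaxSim): for each query token, take its best
# match among the document tokens, then sum those maxima. Assumes PyTorch and
# L2-normalized token embeddings, so dot product equals cosine similarity.
import torch

def maxsim_score(q_tokens, d_tokens):
    """q_tokens: [q_len, dim], d_tokens: [d_len, dim], both normalized."""
    sim = q_tokens @ d_tokens.T          # [q_len, d_len] token-level similarities
    return sim.max(dim=1).values.sum()   # best match per query token, then sum

# Illustrative usage: score one query against two pre-encoded documents.
q = torch.nn.functional.normalize(torch.randn(12, 128), dim=-1)
docs = [torch.nn.functional.normalize(torch.randn(n, 128), dim=-1) for n in (80, 120)]
scores = [maxsim_score(q, d) for d in docs]   # rank documents by these scores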
Prompt engineering plays a crucial role in maximizing the performance of such systems, enabling the model to generate accurate, contextually appropriate, and reliable responses (Brown et al., 2020; Bender et al., 2021). By systematically providing clear instructions and examples and leveraging advanced language models, the quality, accuracy, and trustworthiness of generated responses can be significantly improved. Key Techniques for Effective Prompt Engineering Define the Role Clearly define the role the language model (LM) should play in the given task (e.g., "You are a helpful assistant"). This helps set expectations for how the model should behave and respond. Define the Goal Explicitly state the objective of the response (e.g., "Answer the following question"). A well-defined goal ensures that the LM's output aligns with user expectations and task requirements. Provide Context Contextual information is crucial for guiding the model’s output. Include relevant background data, domain-specific knowledge, or specific constraints that the model should consider while generating the response. Add Instructions Specify detailed instructions on how the model should generate the output (e.g., "Use bullet points to list the steps"). Clear guidance on the structure or format can improve the clarity and usability of the generated response. Few-Shot Learning Incorporating a small set of high-quality, diverse examples can significantly enhance the performance and adaptability of LMs in RAG systems (Brown et al., 2020). Few-shot prompting provides the model with a middle ground between zero-shot learning (where no examples are provided) and fine-tuning (where the model is retrained on a specific dataset). By embedding a few representative examples, the model learns the desired output behavior, improving its response accuracy. Representative Samples Choose examples that are reflective of the most common query types and desired response formats the RAG system will handle. Specificity and Diversity Include examples that balance specificity to your use case with diversity to address a wide range of queries. Dynamic Example Set As the system evolves, regularly update the set of examples to align with new query types or business needs. Balancing Performance and Token Usage LLMs have limitations on the amount of text they can process in a single prompt (context window). It's essential to find the right balance between including enough examples to guide the model effectively and not overloading the context window, which can degrade performance. Incorporate Model Reasoning To enhance the transparency of the RAG system, request the model to explain its thought process (Bender et al., 2021). This helps improve the trustworthiness of the model’s outputs and can assist users in understanding the reasoning behind specific responses, especially when dealing with complex or uncertain queries. By employing these strategies in prompt engineering, one can significantly improve the functionality, performance, and reliability of a RAG system, ensuring that the generated responses are not only accurate but also aligned with the user’s needs. Response Validation Before the response is presented to the user, an additional validation layer assesses the output generated by the language model. This step acts as a quality control mechanism to ensure that the response is accurate, appropriate, and grounded in the retrieved context. 
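One lightweight way to implement such a check is to ask a second model whether every claim in the draft answer is supported by the retrieved context, and to fall back when it is not. The judge prompt and the llm() callable below are illustrative assumptions, not part of any specific framework.

Python
# Sketch of a groundedness check before returning a RAG answer to the user.
# llm() is a placeholder for whatever model client is used as the "judge".
def validate_response(answer, context, llm):
    prompt = (
        "You are a strict fact checker. Reply with exactly SUPPORTED or UNSUPPORTED.\n"
        "Is every factual claim in the ANSWER supported by the CONTEXT?\n\n"
        f"CONTEXT:\n{context}\n\nANSWER:\n{answer}"
    )
    return llm(prompt).strip().upper() == "SUPPORTED"

def safe_answer(query, answer, context, llm):
    if validate_response(answer, context, llm):
        return answer
    # Fall back rather than returning an ungrounded response.
    return "I could not verify this answer against the retrieved sources."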
It can involve tasks like factual consistency checks, appropriateness scoring, and cross-referencing with trusted sources. Conclusion The evolution of RAG systems marks a pivotal shift in how we harness LLMs to provide precise, context-aware, and reliable information. By integrating advanced techniques such as intelligent preprocessing, enhanced query understanding, sophisticated retrieval mechanisms, and effective response synthesis, modern RAG systems address the complexities of real-world applications with remarkable efficiency. As RAG continues to mature, its role in bridging the gap between vast unstructured data sources and actionable insights becomes increasingly indispensable. Whether applied to domains like customer support, research, or decision-making, RAG systems are poised to redefine how we interact with and benefit from AI-driven knowledge systems. References 1. Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y.J., Madotto, A. and Fung, P., 2023. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), pp.1-38. 2. Lewis, P., Oguz, B., Rinott, R., Riedel, S. and Stenetorp, P., 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, pp.9459-9474. https://proceedings.neurips.cc 3. OpenAI, 2024. Tokenization and context window. [online] Available at: https://platform.openai.com 4. Weights & Biases, 2024. RAG in Production. [online] Available at: https://www.wandb.courses/courses/rag-in-production [Accessed 5 Oct. 2024]. 5. Gao, L., Ma, X., Lin, J. and Callan, J., 2022. Precise zero-shot dense retrieval without relevance labels. arXiv preprint arXiv:2212.10496. 6. Rackauckas, Z., 2024. RAG-Fusion: A new take on retrieval-augmented generation. arXiv preprint arXiv:2402.03367. 7. Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D. and Yih, W.T., 2020. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906. 8. Khattab, O. and Zaharia, M., 2020, July. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 39-48). 9. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and Agarwal, S., 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, pp.1877-1901. 10. Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021, March. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).
Metaprogramming is a powerful programming paradigm that allows code to dynamically manipulate its behavior at runtime. JavaScript, with the introduction of Proxies and the Reflect API in ES6, has taken metaprogramming capabilities to a new level, enabling developers to intercept and redefine core object operations like property access, assignment, and function invocation. This blog post dives deep into these advanced JavaScript features, explaining their syntax, use cases, and how they work together to empower dynamic programming. What Are Proxies? A Proxy in JavaScript is a wrapper that allows developers to intercept and customize fundamental operations performed on an object. These operations include getting and setting properties, function calls, property deletions, and more. Proxy Syntax JavaScript const proxy = new Proxy(target, handler); target: The object being proxied.handler: An object containing methods, known as traps, that define custom behaviors for intercepted operations. Example: Logging Property Access JavaScript const user = { name: 'Alice', age: 30 }; const proxy = new Proxy(user, { get(target, property) { console.log(`Accessing property: ${property}`); return target[property]; } }); console.log(proxy.name); // Logs: Accessing property: name → Output: Alice Key Proxy Traps (trap name: operation intercepted) get: Accessing a property (obj.prop or obj['prop']); set: Assigning a value to a property (obj.prop = value); deleteProperty: Deleting a property (delete obj.prop); has: Checking property existence (prop in obj); apply: Function invocation (obj()); construct: Creating new instances with new (new obj()) Advanced Use Cases With Proxies 1. Input Validation JavaScript const user = { age: 25 }; const proxy = new Proxy(user, { set(target, property, value) { if (property === 'age' && typeof value !== 'number') { throw new Error('Age must be a number!'); } target[property] = value; return true; } }); proxy.age = 30; // Works fine proxy.age = '30'; // Throws Error: Age must be a number! In this example, the set trap ensures type validation before allowing assignments. 2. Reactive Systems (Similar to Vue.js Reactivity) JavaScript const data = { price: 5, quantity: 2 }; let total = 0; const proxy = new Proxy(data, { set(target, property, value) { target[property] = value; total = target.price * target.quantity; console.log(`Total updated: ${total}`); return true; } }); proxy.price = 10; // Logs: Total updated: 20 proxy.quantity = 3; // Logs: Total updated: 30 This code dynamically recalculates values whenever dependent properties are updated, mimicking the behavior of modern reactive frameworks. What Is Reflect? The Reflect API complements Proxies by providing methods that perform default behaviors for object operations, making it easier to integrate them into Proxy traps. Key Reflect Methods (method: description) Reflect.get(target, prop): Retrieves the value of a property. Reflect.set(target, prop, val): Sets a property value. Reflect.has(target, prop): Checks property existence (prop in obj). Reflect.deleteProperty(target, prop): Deletes a property. Reflect.apply(func, thisArg, args): Calls a function with a specified this context. Reflect.construct(target, args): Creates a new instance of a constructor.
Example: Using Reflect for Default Behavior JavaScript const user = { age: 25 }; const proxy = new Proxy(user, { set(target, property, value) { if (property === 'age' && typeof value !== 'number') { throw new Error('Age must be a number!'); } return Reflect.set(target, property, value); // Default behavior } }); proxy.age = 28; // Sets successfully console.log(user.age); // Output: 28 Using Reflect simplifies the code by maintaining default operations while adding custom logic. Real-World Use Cases Security wrappers: Restrict access to sensitive properties.Logging and debugging: Track object changes.API data validation: Ensure strict rules for API data. Conclusion Metaprogramming with Proxies and Reflect enables developers to dynamically control and modify application behavior. Master these tools to elevate your JavaScript expertise. Happy coding!
Welcome to 2025! A new year is the perfect time to learn new skills or refine existing ones, and for software developers, staying ahead means continuously improving your craft. Software design is not just a cornerstone of creating robust, maintainable, and scalable applications but also vital for your career growth. Mastering software design helps you write code that solves real-world problems effectively, improves collaboration with teammates, and showcases your ability to handle complex systems — a skill highly valued by employers and clients alike. Understanding software design equips you with the tools to: Simplify complexity in your projects, making code easier to understand and maintain.Align your work with business goals, ensuring the success of your projects.Build a reputation as a thoughtful and practical developer prioritizing quality and usability. To help you on your journey, I’ve compiled my top five favorite books on software design. These books will guide you through simplicity, goal-oriented design, clean code, practical testing, and mastering Java. 1. A Philosophy of Software Design This book is my top recommendation for understanding simplicity in code. It dives deep into how to write simple, maintainable software while avoiding unnecessary complexity. It also provides a framework for measuring code complexity with three key aspects: Cognitive Load: How much effort and time are required to understand the code?Change Amplification: How many layers or parts of the system need to be altered to achieve a goal?Unknown Unknowns: What elements of the code or project are unclear or hidden, making changes difficult? The book also discusses the balance between being strategic and tactical in your design decisions. It’s an insightful read that will change the way you think about simplicity and elegance in code. Link: A Philosophy of Software Design 2. Learning Domain-Driven Design: Aligning Software Architecture and Business Strategy Simplicity alone isn’t enough — your code must achieve client or stakeholders' goals. This book helps you bridge the gap between domain experts and your software, ensuring your designs align with business objectives. This is the best place to start if you're new to domain-driven design (DDD). It offers a practical and approachable introduction to DDD concepts, setting the stage for tackling Eric Evans' original work later. Link: Learning Domain-Driven Design 3. Clean Code: A Handbook of Agile Software Craftsmanship Once you’ve mastered simplicity and aligned with client goals, the next step is to ensure your code is clean and readable. This classic book has become a must-read for developers worldwide. From meaningful naming conventions to object-oriented design principles, “Clean Code” provides actionable advice for writing code that’s easy to understand and maintain. Whether new to coding or a seasoned professional, this book will elevate your code quality. Link: Clean Code 4. Effective Software Testing: A Developer’s Guide No software design is complete without testing. Testing should be part of your “definition of done.” This book focuses on writing practical tests that ensure your software meets its goals and maintains high quality. This book covers techniques like test-driven development (TDD) and data-driven testing. It is a comprehensive guide for developers who want to integrate testing seamlessly into their workflows. It’s one of the best software testing resources available today. Link: Effective Software Testing 5. 
Effective Java (3rd Edition) For Java developers, this book is an essential guide to writing effective and idiomatic Java code. From enums and collections to encapsulation and concurrency, “Effective Java” provides in-depth examples and best practices for crafting elegant and efficient Java programs. Even if you’ve been writing Java for years, you’ll find invaluable insights and tips to refine your skills and adopt modern Java techniques. Link: Effective Java (3rd Edition) Bonus: Head First Design Patterns: Building Extensible and Maintainable Object-Oriented Software As a bonus, I highly recommend this book to anyone looking to deepen their understanding of design patterns. In addition to teaching how to use design patterns, this book explains why you need them and how they contribute to building extensible and maintainable software. With its engaging and visually rich style, this book is an excellent resource for developers of any level. It makes complex concepts approachable and practical. Link: Head First Design Patterns These five books and the bonus recommendation provide a roadmap to mastering software design. Whether you’re just starting your journey or looking to deepen your expertise, each offers a unique perspective and practical advice to take your skills to the next level. Happy learning and happy coding!