AI-Native Platforms: The Unstoppable Alliance of GenAI and Platform Engineering


By Graziano Casto
Let's be honest. Building developer platforms, especially for AI-native teams, is a complex art, a constant challenge. It's about finding a delicate balance: granting maximum autonomy to development teams without spiraling into chaos, and providing incredibly powerful, cutting-edge tools without adding superfluous complexity to their already dense workload. Our objective as Platform Engineers has always been to pave the way, remove obstacles, and accelerate innovation. But what if the next, inevitable phase of platform evolution wasn't just about what we build and provide, but what Generative AI can help us co-build, co-design, and co-manage? We're not talking about a mere incremental improvement, a minor optimization, or a marginal new feature. We're facing a genuine paradigm shift, a conceptual earthquake where artificial intelligence is no longer merely the final product of our efforts, the result of our development toils, but becomes the silent partner, the tireless ally that is already reimagining, rewriting, and redefining our entire development experience. This is the real gamble, the challenge that awaits us: transforming our platforms from simple toolsets, however sophisticated, into intelligent, dynamic, and self-optimizing ecosystems. A place where productivity isn't just high, but exceptionally high, and innovation flows frictionlessly. What if We Unlock 100% of Our Platform’s Potential? Your primary goal, like that of any good Platform Engineer, is already to make developers' lives simpler, faster, and, let's admit it, significantly more enjoyable. Now, imagine endowing your platform with genuine intelligence, with the ability to understand, anticipate, and even generate. GenAI, in this context, isn't just an additional feature that layers onto existing ones; it's the catalyst that is already fundamentally redefining the Developer Experience (DevEx), exponentially accelerating the entire software development lifecycle, and, even more fascinating, creating new, intuitive, and natural interfaces for interacting with the platform's intrinsic capabilities. Let's momentarily consider the most common and frustrating pain points that still afflict the average developer: the exhaustive and often fruitless hunt through infinite and fragmented documentation, the obligation to memorize dozens, if not hundreds, of specific and often cryptic CLI commands, or the tedious and repetitive generation of boilerplate code. With the intelligent integration of GenAI, your platform magically evolves into a true intelligent co-pilot. Imagine a developer who can simply express a request in natural language, as if speaking to an expert colleague: "Provision a new staging environment for my authentication microservice, complete with a PostgreSQL database, a dedicated Kafka topic, and integration with our monitoring system." The GenAI-powered platform not only understands the deep meaning and context of the request, not only translates the intention into a series of technical actions, but executes the operation autonomously, providing immediate feedback and magically configuring everything needed. This isn't mere automation, which we already know; it's a conversational interaction, deep and contextual, that almost completely zeroes out the developer's cognitive load, freeing their mind and creative energies to focus on innovation, not on the complex and often tedious infrastructural "plumbing". But the impact extends far beyond simple commands. 
GenAI can act as an omnipresent expert, an always-available and incredibly informed figure, providing real-time, contextual assistance. Imagine being stuck on a dependency error, a hard-to-diagnose configuration problem, or a security vulnerability. Instead of spending hours searching forums or asking colleagues, you can ask the platform directly. And it, magically, suggests practical solutions, directs you to relevant internal best practices (perhaps your own guides, finally usable in an intelligent way!), or even proposes complete code patches to solve the problem. It can proactively identify potential security vulnerabilities in the code you've just generated or modified, suggest intelligent refactorings to improve performance, or even scaffold entire new modules or microservices based on high-level descriptions. This drastically accelerates the entire software development lifecycle, making best practices inherent to the process and transforming bottlenecks into opportunities for automation. Your platform is no longer a mere collection of passive tools, but an intelligent and proactive partner at every single stage of the developer's workflow, from conception to implementation, from testing to deployment. Crucially, for this to work, the GenAI model must be fed with the right platform context. By ingesting all platform documentation, internal APIs, service catalogs, and architectural patterns, the AI becomes an unparalleled tool for discoverability of platform items. Developers can now query in natural language to find the right component, service, or golden path for their needs. Furthermore, this contextual understanding allows the AI to interrogate and access all data and assets within the platform itself, as well as from the applications being developed on it, providing insights and recommendations in real-time. This elevates the concept of a composable architecture, already enabled by your platform, to an entirely new level. With an AI co-pilot that not only knows all available platform items but also understands how to use them optimally and how others have used them effectively, the development of new composable applications or rapid Proofs of Concept (PoCs) becomes faster than ever before. The new interfaces enabled by GenAI go beyond mere suggestion. Think of natural language chatbot interfaces for giving commands, where the platform responds like a virtual assistant. Crucially, thanks to advancements like Model Context Protocol (MCP) or similar tool-use capabilities, the GenAI-powered platform can move beyond just "suggesting" and actively "doing". It can execute complex workflows, interact with external APIs, and trigger actions within your infrastructure. This fosters a true cognitive architecture where the model isn't just generating text but is an active participant in your operations, capable of generating architectural diagrams, provisioning resources, or even deploying components based on a simple natural language description. The vision is that of a "platform agent" or an "AI persona" that learns and adapts to the specific needs of the team and the individual developer, constantly optimizing their path and facilitating the adoption of best practices. Platforms: The Launchpad for Ai-Powered Applications This synergy is two-way, a deep symbiotic relationship. 
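To make the "platform agent" idea above a little more concrete, here is a minimal, illustrative Python sketch of the tool-use pattern the article describes: the platform exposes a catalog of vetted actions, and whatever model or protocol layer you use (MCP or otherwise) resolves a natural-language request into one of those actions with structured arguments. Everything here — the provision_environment function, the action names, the structured call — is a hypothetical placeholder, not a real platform API.

Python

# Illustrative sketch of a platform "tool registry" a GenAI agent could call.
# All names (provision_environment, PLATFORM_ACTIONS) are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ToolCall:
    # What a model/MCP layer would hand back after interpreting a natural-language request
    action: str
    arguments: Dict[str, str]

def provision_environment(service: str, environment: str, database: str = "postgres") -> str:
    # Placeholder: a real platform would invoke its provisioning workflows here
    return f"Provisioned {environment} environment for {service} with {database}"

PLATFORM_ACTIONS: Dict[str, Callable[..., str]] = {
    "provision_environment": provision_environment,
}

def execute(call: ToolCall) -> str:
    # The platform stays in control: only registered, vetted actions can run
    handler = PLATFORM_ACTIONS.get(call.action)
    if handler is None:
        raise ValueError(f"Unknown platform action: {call.action}")
    return handler(**call.arguments)

# Example: the model has mapped "Provision a new staging environment for my
# authentication microservice with a PostgreSQL database" to this structured call.
print(execute(ToolCall(
    action="provision_environment",
    arguments={"service": "auth-service", "environment": "staging", "database": "postgresql"},
)))

The point of the sketch is the separation of concerns: the model interprets intent, but execution flows through a registry the platform team owns and audits.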
If, on one hand, GenAI infuses new intelligence and vitality into platforms, on the other, your Internal Developer Platforms are, and will increasingly become, the essential launchpad for the unstoppable explosion of AI-powered applications. The complex and often winding journey of an artificial intelligence model—from the very first phase of experimentation and prototyping, through intensive training, to serving in production and scalable inference—is riddled with often daunting infrastructural complexities. Dedicated GPU clusters, specialized Machine Learning frameworks, complex data pipelines, and scalable, secure, and performant serving endpoints are by no means trivial for every single team to manage independently. And this is where your platform uniquely shines. It has the power to abstract away all the thorny and technical details of AI infrastructure, providing self-service and on-demand provisioning of the exact compute resources (CPU, various types of GPUs), storage (object storage, data lakes), and networking required for every single phase of the model's lifecycle. Imagine a developer who has just finished training a new model and needs to deploy an inference service. Instead of interacting with the Ops team for days or weeks, they simply request it through an intuitive self-service portal on the platform, and within minutes, the platform automatically provisions the necessary hardware (perhaps a dedicated GPU instance), deploys the model to a scalable endpoint (e.g., a serverless service or a container on a dedicated cluster), and, transparently, even generates a secure API key for access and consumption. This process eliminates days or weeks of manual configuration, of tickets and waiting times, transforming a complex and often frustrating MLOps challenge into a fluid, instant, and completely self-service operation. The platform manages not only serving but the entire lifecycle: from data preparation, to training clusters, to evaluation and A/B testing phases, all the way to post-deployment monitoring. Furthermore, platforms provide crucial golden paths for AI application development at the application layer. There's no longer a need for every team to reinvent the wheel for common AI patterns. Your platform can offer pre-built templates and codified best practices for integrating Large Language Models (LLMs), implementing patterns like Retrieval-Augmented Generation (RAG) with connectors to your internal data sources, or setting up complete pipelines for model monitoring and evaluation. Think of robust libraries and opinionated frameworks for prompt engineering, for managing model and dataset versions, for specific AI model observability (e.g., tools for bias detection, model interpretation, or drift management). The platform becomes a hub for collaboration on AI assets, facilitating the sharing and reuse of models, datasets, and components, including the development of AI agents. By embedding best practices and pre-integrating the most common and necessary AI services, every single developer, even one without a deep Machine Learning background, is empowered to infuse their applications with intelligent, cutting-edge capabilities. This not only democratizes AI development across the organization but unlocks unprecedented innovation that was previously limited to a few specialized teams. The Future Is Symbiotic: Your Next Move The era of AI-native development isn't an option; it's an imminent reality, and it urgently demands AI-native platforms. 
The marriage of GenAI and Platform Engineering isn't just an evolutionary step; it's a revolutionary leap destined to redefine the very foundations of our craft. GenAI makes platforms intrinsically smarter, more intuitive, more responsive, and consequently, incredibly more powerful. Platforms, in turn, provide the robust, self-service infrastructure and the well-paved roads necessary to massively accelerate the adoption and deployment of AI across the enterprise, transforming potential into reality. Are you ready to stop building for AI and start building with AI? Now is the time to act. Identify the most painful bottlenecks in your current DevEx and think about how GenAI could transform them. Prioritize the creation of self-service capabilities for AI infrastructure, making model deployment as simple as that of a traditional microservice. Cultivate a culture of "platform as a product", where AI is not just a consumer, but a fundamental feature of the platform itself. The future of software development isn't just about AI-powered applications; it's about an AI-powered development experience that completely redefines the concepts of productivity, creativity, and the very act of value creation. Embrace this unstoppable alliance, and unlock the next fascinating frontier of innovation. The time of static platforms is over. The era of intelligent platforms has just begun. More
Misunderstanding Agile: Bridging The Gap With A Kaizen Mindset


By Pabitra Saikia
In recent years, Agile has become closely associated with modern software development, promoting customer-focused value delivery, regular feedback loops, and empowered teams. However, beneath the familiar terminology, many technical professionals are beginning to question whether Agile is achieving its intended outcomes or simply adding complexity. Many experienced developers and engineers voice discontent with excessive processes, poorly executed rituals, and a disconnect between Agile principles and the realities of their daily work. As organizations push for broader Agile adoption, understanding the roots of this discontent is crucial — not only for improving team morale but also for ensuring that Agile practices genuinely add value rather than becoming just another management fad. The Agile Manifesto The Agile Manifesto defines a set of values and principles that guide software development (and other products). It inspires various frameworks and methods to support iterative delivery, early and continuous value creation, team collaboration, and continuous improvement through regular feedback and adaptation. Teams may misinterpret their core purpose when implementing Agile methodologies that do not adhere to their foundational principles. This misinterpretation can distort the framework’s adaptability and focus on customer-centric value delivery. The sooner we assess the health of Agile practices and take corrective action, the greater the benefits for business outcomes and team morale. Feedback on Agile Practices Here are some common feedback themes based on Scrum teams' perceptions of their experience with Agile practices. 1. Disconnect Between Agile Theory and Practice The Agile Manifesto sounds excellent, but real-world Agile feels like “Agile theater” with ceremonies and buzzwords. Cause: Many teams adopt Agile practices solely to undergo the process without embracing its values. Change in perception: Recognize the difference between doing vs. being Agile. Foster a culture of self-organized teams delivering value with continuous improvement to customers. 2. Lack of Autonomy Agile can feel prescriptive, with strict roles and rituals that constrain engineers. Cause: An overly rigid application of Agile can stifle creativity and reduce a sense of ownership. Engineers thrive when given the freedom to solve problems rather than being confined to a prescriptive approach. Change in perception: Agile teams are empowered to make decisions. They don’t dwell on obstacles—they take ownership, lead through collaboration, and focus on delivering solutions with achievable delivery commitments. 3. Misuse of Agile as a Management Tool Agile is used for micromanagement to track velocity and demand commitments. Cause: Agile is sometimes misunderstood to focus on metrics over outcomes. When velocity is prioritized over value, the purpose gets lost. Change in perception: Focus on principles and purpose, not just processes. Processes aren’t about restriction, but repeatable and reliable success. Agile processes support the team by reinforcing what works and making success scalable. 4. Lack of Visible Improvement Despite Agile processes, teams still face delays, unclear requirements, or poor decisions. Cause: When teams struggle to show visible improvement, foundational elements — like a clear roadmap and meaningful engagement with engineers around the product vision — are often missing. 
Change in perception: Anchor Agile practices to tangible outcomes, such as faster feedback loops, improved quality, and reduced defects. Continuously inspect and adapt the process and product direction, ensuring both evolve together to drive meaningful progress. How to Bridge the Gap With Kaizen The disconnect between Agile’s theoretical benefits and practical execution can undermine empowerment and autonomy for a self-organized team, ultimately producing outcomes antithetical to the methodology’s intent of delivering iterative, user-focused solutions. Without proper contextualization and leadership buy-in, such implementations risk reducing Agile to a superficial process rather than a cultural shift toward continuous improvement. As the Japanese philosophy of Kaizen reminds us, meaningful change happens incrementally. Agile retrospectives embody this mindset. When the process isn't working, the team must come together — not to assign blame but to reflect, realign, and evolve. Leveraging the Power of Retrospective for Continuous Improvement Misalignment with the value statement is a core reason Agile processes fail. Agile teams should go beyond surface-level issues and explore more profound, value-driven questions in the retrospective to get the most out of them. Some of the recommended core areas for effective Agile retrospectives: Value Alignment What does “value” mean to us in this sprint or project? Are we clear on what our customer truly needs right now? Flow and Process Efficiency Where did work get blocked and delayed, and is the team aware of the communication path to seek support? Are our ceremonies (stand-ups, planning, reviews) meaningful, valuable, or just rituals? Commitment and Focus Were our sprint goals clear and achievable? Did we commit to too much or too little? Customer Centricity Did we receive or act on honest feedback from users or stakeholders? Do we know how the work impacted the customer? Suggested Template for Agile Retrospective Takeaways Use this template to capture and communicate the outcomes of your retrospective. It helps ensure accountability, transparency, and alignment going forward. A structured retrospective framework for teams to reflect on performance and improve workflows. 1. Keep doing what’s working well: Practical and valuable habits and Practices. What reinforces team strengths and morale? Examples: Effective and outcome-based meeting Collaboration for efficient dependency management 2. Do less of what we are doing too much of: Process overdose. Encourage balance and efficiency. Overused activities are not always valuable. Examples: Too many long meetings drain team morale and disrupt daily progress. Excessive code reviews on trivial commits delay code merge and integration. 3. Stop doing what’s not working and should be eliminated: Identify waste or negative patterns. Break unhealthy habits that reduce productivity or hurt team morale. Examples: Starting work before stories and the Definition of Done are fully defined - action before understanding purpose, business value, and success criteria Skipping retrospectives - detached from improvement 4. Start doing what new practices or improvements we should try: Encourages innovation, experimentation, and growth. A great place to introduce ideas that the team hasn't tried yet. Examples: Add a mid-sprint check-in Start using sprint goals more actively Conclusion Agile is based on the principle of progressing through continuous improvement and incrementally delivering value. 
Retrospective meetings are crucial in this process, as they allow teams to pause, reflect, and realign themselves to ensure they are progressing in the right direction. This approach aligns with the Kaizen philosophy of ongoing improvement.

Trend Report

Generative AI

AI technology is now more accessible, more intelligent, and easier to use than ever before. Generative AI, in particular, has transformed nearly every industry exponentially, creating a lasting impact driven by its (delivered) promises of cost savings, manual task reduction, and a slew of other benefits that improve overall productivity and efficiency. The applications of GenAI are expansive, and thanks to the democratization of large language models, AI is reaching every industry worldwide.

Our focus for DZone's 2025 Generative AI Trend Report is on the trends surrounding GenAI models, algorithms, and implementation, paying special attention to GenAI's impacts on code generation and software development as a whole. Featured in this report are key findings from our research and thought-provoking content written by everyday practitioners from the DZone Community, with topics including organizations' AI adoption maturity, the role of LLMs, AI-driven intelligent applications, agentic AI, and much more.

We hope this report serves as a guide to help readers assess their own organization's AI capabilities and how they can better leverage those in 2025 and beyond.


Refcard #158

Machine Learning Patterns and Anti-Patterns

By Tuhin Chattopadhyay

Refcard #269

Getting Started With Data Quality

By Miguel Garcia

More Articles

Automating Sentiment Analysis Using Snowflake Cortex

In this hands-on tutorial, you'll learn how to automate sentiment analysis and categorize customer feedback using Snowflake Cortex, all through a simple SQL query, without needing to build heavy and complex machine learning algorithms. No MLOps is required. We'll work with sample data simulating real customer feedback comments about a fictional company, "DemoMart," and classify each customer feedback entry using Cortex's built-in function. We'll determine sentiment (positive, negative, neutral) and label the feedback into different categories.

The goal is to:

- Load a sample dataset of customer feedback into a Snowflake table.
- Use the built-in LLM-powered classification (CLASSIFY_TEXT) to tag each entry with a sentiment and classify the feedback into a specific category.
- Automate this entire workflow to run weekly using a Snowflake Task.
- Generate insights from the classified data.

Prerequisites

- A Snowflake account with access to Snowflake Cortex
- Role privileges to create tables, tasks, and procedures
- Basic SQL knowledge

Step 1: Create Sample Feedback Table

We'll use a sample dataset of customer feedback that covers products, delivery, customer support, and other areas. Let's create a table in Snowflake to store this data. Here is the SQL for creating the required table to hold customer feedback.

SQL

CREATE OR REPLACE TABLE customer.csat.feedback (
    feedback_id INT,
    feedback_ts DATE,
    feedback_text STRING
);

Now, you can load the data into the table using Snowflake's Snowsight interface. The sample data "customer_feedback_demomart.csv" is available in the GitHub repo. You can download and use it.

Step 2: Use Cortex to Classify Sentiment and Category

Let's read and process each row from the feedback table. Here's the magic. This single query classifies each piece of feedback for both sentiment and category:

SQL

SELECT
    feedback_id,
    feedback_ts,
    feedback_text,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(feedback_text, ['positive', 'negative', 'neutral']):label::STRING AS sentiment,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        feedback_text,
        ['Product', 'Customer Service', 'Delivery', 'Price', 'User Experience', 'Feature Request']
    ):label::STRING AS feedback_category
FROM customer.csat.feedback
LIMIT 10;

I have used the CLASSIFY_TEXT function available within Snowflake Cortex to derive the sentiment based on the feedback_text, and to further classify it into the specific category the feedback is associated with, such as 'Product', 'Customer Service', 'Delivery', and so on.

P.S.: You can change the categories based on your business needs.

Step 3: Store Classified Results

Let's store the classified results in a separate table for further reporting and analysis purposes. For this, I have created a table with the name feedback_classified, as shown below. It includes a processed_timestamp column, which the bulk load and the incremental task in Step 4 populate and filter on.

SQL

CREATE OR REPLACE TABLE customer.csat.feedback_classified (
    feedback_id INT,
    feedback_ts DATE,
    feedback_text STRING,
    sentiment STRING,
    feedback_category STRING,
    processed_timestamp TIMESTAMP -- used by the incremental load in Step 4
);

Initial Bulk Load

Now, let's do an initial bulk classification for all existing data before moving on to the incremental processing of newly arriving data.
SQL

-- Initial Load
INSERT INTO customer.csat.feedback_classified
SELECT
    feedback_id,
    feedback_ts,
    feedback_text,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(feedback_text, ['positive', 'negative', 'neutral']):label::STRING,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        feedback_text,
        ['Product', 'Customer Service', 'Delivery', 'Price', 'User Experience', 'Feature Request']
    ):label::STRING AS feedback_label,
    CURRENT_TIMESTAMP AS PROCESSED_TIMESTAMP
FROM customer.csat.feedback;

Once the initial load is completed successfully, let's build a SQL statement that fetches only incremental data based on the processed_timestamp column value. For the incremental load, we need fresh customer feedback, so let's insert ten new records into our raw table customer.csat.feedback:

SQL

INSERT INTO customer.csat.feedback (feedback_id, feedback_ts, feedback_text)
VALUES
    (5001, CURRENT_DATE, 'My DemoMart order was delivered to the wrong address again. Very disappointing.'),
    (5002, CURRENT_DATE, 'I love the new packaging DemoMart is using. So eco-friendly!'),
    (5003, CURRENT_DATE, 'The delivery speed was slower than promised. Hope this improves.'),
    (5004, CURRENT_DATE, 'The product quality is excellent, I’m genuinely impressed with DemoMart.'),
    (5005, CURRENT_DATE, 'Customer service helped me cancel and reorder with no issues.'),
    (5006, CURRENT_DATE, 'DemoMart’s website was down when I tried to place my order.'),
    (5007, CURRENT_DATE, 'Thanks DemoMart for the fast shipping and great support!'),
    (5008, CURRENT_DATE, 'Received a damaged item. This is the second time with DemoMart.'),
    (5009, CURRENT_DATE, 'DemoMart app is very user-friendly. Shopping is a breeze.'),
    (5010, CURRENT_DATE, 'The feature I wanted is missing. Hope DemoMart adds it soon.');

Step 4: Automate Incremental Data Processing With a Task

Now that we have newly added (incremental) fresh data in our raw table, let's create a task that picks up only new data and classifies it automatically. We will schedule this task to run every Sunday at midnight UTC.

SQL

-- Creating the task
CREATE OR REPLACE TASK CUSTOMER.CSAT.FEEDBACK_CLASSIFIED
    WAREHOUSE = COMPUTE_WH
    SCHEDULE = 'USING CRON 0 0 * * 0 UTC' -- Run every Sunday at midnight UTC
AS
INSERT INTO customer.csat.feedback_classified
SELECT
    feedback_id,
    feedback_ts,
    feedback_text,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(feedback_text, ['positive', 'negative', 'neutral']):label::STRING,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        feedback_text,
        ['Product', 'Customer Service', 'Delivery', 'Price', 'User Experience', 'Feature Request']
    ):label::STRING AS feedback_label,
    CURRENT_TIMESTAMP AS PROCESSED_TIMESTAMP
FROM customer.csat.feedback
WHERE feedback_ts > (SELECT COALESCE(MAX(PROCESSED_TIMESTAMP), '1900-01-01') FROM CUSTOMER.CSAT.FEEDBACK_CLASSIFIED);

Note that a newly created task is suspended by default, so resume it (ALTER TASK CUSTOMER.CSAT.FEEDBACK_CLASSIFIED RESUME) before expecting it to fire. Once resumed, it will automatically run every Sunday at midnight UTC, process any newly arrived customer feedback, and classify it.
Step 5: Visualize Insights

You can now build dashboards in Snowsight to see weekly trends using a simple query like this:

SQL

SELECT
    feedback_category,
    sentiment,
    COUNT(*) AS total
FROM customer.csat.feedback_classified
GROUP BY feedback_category, sentiment
ORDER BY total DESC;

Conclusion

With just a few lines of SQL, you:

- Ingested raw feedback into a Snowflake table.
- Used Snowflake Cortex to classify customer feedback and derive sentiment and feedback categories.
- Automated the process to run weekly.
- Built insights into the classified feedback for business users and the leadership team to act upon by category and sentiment.

This approach is ideal for support teams, product teams, and leadership, as it allows them to continuously monitor customer experience without building or maintaining ML infrastructure.

GitHub

I have created a GitHub page with all the code and sample data. You can access it freely. The whole dataset generator and SQL scripts are available on GitHub.
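If you prefer to pull these weekly aggregates into an external script or dashboard rather than Snowsight, a minimal sketch using the Snowflake Python connector (the snowflake-connector-python package) might look like the following. The account, user, and password values are placeholders you would replace with your own connection details.

Python

# Minimal sketch: fetching the sentiment/category counts with the Snowflake
# Python connector. Connection parameters below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your_account_identifier>",
    user="<your_user>",
    password="<your_password>",
    warehouse="COMPUTE_WH",
    database="CUSTOMER",
    schema="CSAT",
)

try:
    cur = conn.cursor()
    cur.execute(
        """
        SELECT feedback_category, sentiment, COUNT(*) AS total
        FROM customer.csat.feedback_classified
        GROUP BY feedback_category, sentiment
        ORDER BY total DESC
        """
    )
    for category, sentiment, total in cur.fetchall():
        print(f"{category:20} {sentiment:10} {total}")
finally:
    conn.close()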

By Rajanikantarao Vellaturi
Converting List to String in Terraform

In Terraform, you will often need to convert a list to a string when passing values to configurations that require a string format, such as resource names, cloud instance metadata, or labels. Terraform uses HCL (HashiCorp Configuration Language), so handling lists requires functions like join() or format(), depending on the context.

How to Convert a List to a String in Terraform

The join() function is the most effective way to convert a list into a string in Terraform. It concatenates list elements using a specified delimiter, making it especially useful when formatting data for use in resource names, cloud tags, or dynamically generated scripts. The join(", ", var.list_variable) function, where list_variable is the name of your list variable, merges the list elements with ", " as the separator. Here's a simple example:

HCL

variable "tags" {
  default = ["dev", "staging", "prod"]
}

output "tag_list" {
  value = join(", ", var.tags)
}

The output would be:

Plain Text

"dev, staging, prod"

Example 1: Formatting a Command-Line Alias for Multiple Commands

In DevOps and development workflows, it's common to run multiple commands sequentially, such as updating repositories, installing dependencies, and deploying infrastructure. Using Terraform, you can dynamically generate a shell alias that combines these commands into a single, easy-to-use shortcut.

HCL

variable "commands" {
  default = ["git pull", "npm install", "terraform apply -auto-approve"]
}

output "alias_command" {
  value = "alias deploy='${join(" && ", var.commands)}'"
}

Output:

Plain Text

"alias deploy='git pull && npm install && terraform apply -auto-approve'"

Example 2: Creating an AWS Security Group Description

Imagine you need to generate a security group rule description listing allowed ports dynamically:

HCL

variable "allowed_ports" {
  default = [22, 80, 443]
}

resource "aws_security_group" "example" {
  name        = "example_sg"
  description = "Allowed ports: ${join(", ", [for p in var.allowed_ports : tostring(p)])}"

  dynamic "ingress" {
    for_each = var.allowed_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}

The join() function, combined with a for expression, generates a dynamic description like "Allowed ports: 22, 80, 443". This ensures the security group documentation remains in sync with the actual rules.

Alternative Methods

For most use cases, the join() function is the best choice for converting a list into a string in Terraform, but the format() and jsonencode() functions can also be useful in specific scenarios.

1. Using format() for Custom Formatting

The format() function helps control the output structure while joining list items. It does not directly convert lists to strings, but it can be used in combination with join() to achieve custom formatting.

HCL

variable "ports" {
  default = [22, 80, 443]
}

output "formatted_ports" {
  value = format("Allowed ports: %s", join(" | ", var.ports))
}

Output:

Plain Text

"Allowed ports: 22 | 80 | 443"

2. Using jsonencode() for JSON Output

When passing structured data to APIs or Terraform modules, you can use the jsonencode() function, which converts a list into a JSON-formatted string.

HCL

variable "tags" {
  default = ["dev", "staging", "prod"]
}

output "json_encoded" {
  value = jsonencode(var.tags)
}

Output:

Plain Text

"["dev", "staging", "prod"]"

Unlike join(), this format retains the structured array representation, which is useful for JSON-based configurations.
Creating a Literal String Representation in Terraform

Sometimes you need to convert a list into a literal string representation, meaning the output should preserve the exact structure as a string (e.g., including brackets, quotes, and commas like a JSON array). This is useful when passing data to APIs, logging structured information, or generating configuration files. For most cases, jsonencode() is the best option due to its structured formatting and reliability in API-related use cases. However, if you need a simple comma-separated string without additional formatting, join() is the better choice.

Common Scenarios for List-to-String Conversion in Terraform

Converting a list to a string in Terraform is useful in multiple scenarios where Terraform requires string values instead of lists. Here are some common use cases:

- Naming resources dynamically: When creating resources with names that incorporate multiple dynamic elements, such as environment, application name, and region, these components are often stored as a list for modularity. Converting them into a single string allows for consistent and descriptive naming conventions that comply with provider or organizational naming standards.
- Tagging infrastructure with meaningful identifiers: Tags are often key-value pairs where the value needs to be a string. If you're tagging resources based on a list of attributes (like team names, cost centers, or project phases), converting the list into a single delimited string ensures compatibility with tagging schemas and improves downstream usability in cost analysis or inventory tools.
- Improving documentation via descriptions in security rules: Security groups, firewall rules, and IAM policies sometimes allow for free-form text descriptions. Providing a readable summary of a rule's purpose, derived from a list of source services or intended users, can help operators quickly understand the intent behind the configuration without digging into implementation details.
- Passing variables to scripts (e.g., user_data in EC2 instances): When injecting dynamic values into startup scripts or configuration files (such as a shell script passed via user_data), you often need to convert structured data like lists into strings. This ensures the script interprets the input correctly, particularly when using loops or configuration variables derived from Terraform resources.
- Logging and monitoring, ensuring human-readable outputs: Terraform output values are often used for diagnostics or integration with logging/monitoring systems. Presenting a list as a human-readable string improves clarity in logs or dashboards, making it easier to audit deployments and troubleshoot issues by conveying aggregated information in a concise format.

Key Points

Converting lists to strings in Terraform is crucial for dynamically naming resources, structuring security group descriptions, formatting user data scripts, and generating readable logs. Using join() for readable concatenation, format() for creating formatted strings, and jsonencode() for structured output ensures clarity and consistency in Terraform configurations.

By Mariusz Michalowski
How to Install and Set Up Jenkins With Docker Compose

Jenkins is an open-source CI/CD tool written in Java that is used for organising CI/CD pipelines. Currently, at the time of writing this blog, it has 24k stars and 9.1k forks on GitHub. With over 2000 plugin support, Jenkins is a well-known tool in the DevOps world.

The following are multiple ways to install and set up Jenkins:

- Using the Jenkins Installer package for Windows
- Using Homebrew for macOS
- Using the Generic Java Package (war)
- Using Docker
- Using Kubernetes
- Using apt for Ubuntu/Debian Linux OS

In this tutorial blog, I will cover the step-by-step process to install and set up Jenkins using Docker Compose for an efficient and seamless CI/CD experience. Using Docker with Jenkins allows users to set up a Jenkins instance quickly with minimal manual configuration. It ensures portability and scalability, as with Docker Compose users can easily set up Jenkins and its required services, such as volumes and networks, using a single YAML file. This allows users to easily manage and replicate the setup in different environments.

Installing Jenkins Using Docker Compose

Installing Jenkins with Docker Compose makes the setup process simple and efficient, and allows us to define configurations in a single file. This approach removes the complexity and difficulty faced while installing Jenkins manually, and ensures easy deployment, portability, and quick scaling.

Prerequisite

As a prerequisite, Docker Desktop needs to be installed, up, and running on the local machine. Docker Compose is included in Docker Desktop along with Docker Engine and Docker CLI.

Jenkins With Docker Compose

Jenkins can be instantly set up by running the following docker compose command in the terminal:

Plain Text

docker compose up -d

This command should be run from the folder where the Docker Compose file is placed. So, let's create a new folder jenkins-demo, and inside this folder, let's create another new folder jenkins-configuration and a new file docker-compose.yaml. The following is the folder structure:

Plain Text

jenkins-demo/
├── jenkins-configuration/
└── docker-compose.yaml

The following content should be added to the docker-compose.yaml file:

YAML

# docker-compose.yaml
version: '3.8'
services:
  jenkins:
    image: jenkins/jenkins:lts
    privileged: true
    user: root
    ports:
      - 8080:8080
      - 50000:50000
    container_name: jenkins
    volumes:
      - /Users/faisalkhatri/jenkins-demo/jenkins-configuration:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock

Decoding the Docker Compose File

The first line in the file is a comment. The services block starts from the second line, which includes the details of the Jenkins service. The Jenkins service block contains the image, user, and port details. The Jenkins service will run the LTS Jenkins image with root privileges and name the container jenkins. The ports are responsible for mapping container ports to the host machine. The details of these ports are as follows:

- 8080:8080: This maps port 8080 inside the container to port 8080 on the host machine. It is important, as it is required for accessing the Jenkins web interface. It lets us access Jenkins in the browser by navigating to http://localhost:8080.
- 50000:50000: This maps port 50000 inside the container to port 50000 on the host machine. It is the JNLP (Java Network Launch Protocol) agent port, which is used for connecting Jenkins build agents to the Jenkins Controller instance. It is important for distributed Jenkins setups, where remote build agents connect to the Jenkins Controller instance.

The privileged: true setting will grant the container full access to the host system and allow running the process as the root user on the host machine. This will enable the container to perform the following actions:

- Access all the host devices
- Modify the system configurations
- Mount file systems
- Manage network interfaces
- Perform admin tasks that a regular container cannot perform

These actions are important, as Jenkins may require permissions to run specific tasks while interacting with the host system, like managing Docker containers, executing system commands, or modifying files outside the container.

Any data stored inside the container is lost when the container stops or is removed. To overcome this issue, volumes are used in Docker to persist data beyond the container's lifecycle. We will use Docker Volumes to keep the Jenkins data intact, as it is needed every time we start Jenkins. Jenkins data will be stored in the jenkins-configuration folder on the local machine. The /Users/faisalkhatri/jenkins-demo/jenkins-configuration folder on the host is mapped to /var/jenkins_home in the container. The changes made inside the container in the respective folder will reflect on the folder on the host machine and vice versa.

The line /var/run/docker.sock:/var/run/docker.sock mounts the Docker socket from the host into the container, allowing the Jenkins container to directly communicate with the Docker daemon running on the host machine. This enables Jenkins, which is running inside the container, to manage and run Docker commands on the host, allowing it to build and run other Docker containers as a part of CI/CD pipelines.

Installing Jenkins With Docker Compose

Let's run the installation process step by step as follows:

Step 1 — Running Jenkins Setup

Open a terminal, navigate to the jenkins-demo folder, and run the following command:

Plain Text

docker compose up -d

After the command is successfully executed, open any browser on your machine and navigate to http://localhost:8080; you should see the Unlock Jenkins screen as shown in the screenshot below.

Step 2 — Finding the Jenkins Password From the Docker Container

The password to unlock Jenkins can be found by navigating to the jenkins container (remember, we gave the name jenkins to the container in the Docker Compose file) and checking its logs by running the following command in the terminal:

Plain Text

docker logs jenkins

Copy the password from the logs, paste it into the Administrator password field on the Unlock Jenkins screen in the browser, and click on the Continue button.

Step 3 — Setting up Jenkins

The "Getting Started" screen will be displayed next, which will prompt us to install plugins to set up Jenkins. Select Install suggested plugins and proceed with the installation. It will take some time for the installations to complete.

Step 4 — Creating a Jenkins User

After the installation is complete, Jenkins will show the next screen to update the user details. It is recommended to update the user details with a password and click on Save and Continue. This username and password can then be used to log in to Jenkins.

Step 5 — Instance Configuration

In this window, we can update the Jenkins accessible link so it can be further used to navigate and run Jenkins. However, we can leave it as it is for now — http://localhost:8080. Click on the Save and Finish button to complete the setup.

With this, the Jenkins installation and setup are complete; we are now ready to use Jenkins.

Summary

Docker is the go-to tool for instantly spinning up a Jenkins instance. Using Docker Compose, we installed Jenkins successfully in just five simple steps. Once Jenkins is up and running, we can install the required plugins and set up CI/CD workflows as required. Using Docker Volumes allows us to use Jenkins seamlessly, as it saves the instance data between restarts. In the next tutorial, we will learn about installing and setting up Jenkins agents that will help us run Jenkins jobs.
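As an optional sanity check once the container is up, you can query the Jenkins REST API from Python. This is a minimal sketch that assumes the admin user you created in Step 4 and an API token generated for that user (both values below are placeholders):

Python

# Optional sanity check: confirm Jenkins is reachable and print its version.
# The username and API token are placeholders for the user created in Step 4.
import requests

JENKINS_URL = "http://localhost:8080"
USER = "<your-username>"
API_TOKEN = "<your-api-token>"

response = requests.get(f"{JENKINS_URL}/api/json", auth=(USER, API_TOKEN), timeout=10)
response.raise_for_status()

# Jenkins reports its version in the X-Jenkins response header
print("Jenkins version:", response.headers.get("X-Jenkins"))
print("Jobs configured:", len(response.json().get("jobs", [])))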

By Faisal Khatri
Beyond Java Streams: Exploring Alternative Functional Programming Approaches in Java

Few concepts have changed how we approach writing code in Java as much as Java Streams. They provide a clean, declarative way to process collections and have thus become a staple in modern Java applications. However, for all their power, Streams present their own challenges, especially where flexibility, composability, and performance optimization are priorities.

What if your programming needs more expressive functional paradigms? What if you are looking for laziness and safety beyond what Streams provide and want to explore functional composition at a lower level? In this article, we will explore other functional programming techniques you can use in Java that do not involve the Streams API.

Java Streams: Power and Constraints

Java Streams are built on a simple premise: declaratively process collections of data using a pipeline of transformations. You can map, filter, reduce, and collect data with clean syntax. They eliminate boilerplate and allow chaining operations fluently. However, Streams fall short in some areas:

- They are not designed for complex error handling.
- They offer limited lazy evaluation capabilities.
- They don't integrate well with asynchronous processing.
- They lack persistent and immutable data structures.

One of our fellow DZone members wrote a very good article on "The Power and Limitations of Java Streams," which describes both the advantages and limitations of what you can do using Java Streams. I agree that Streams provide a solid basis for functional programming, but I suggest looking around for something even more powerful. The following alternatives are discussed in the remainder of this article, expanding upon points introduced in the referenced piece.

Vavr: A Functional Java Library

Why Vavr?

- Provides persistent and immutable collections (e.g., List, Set, Map)
- Includes Try, Either, and Option types for robust error handling
- Supports advanced constructs like pattern matching and function composition

Vavr is often referred to as a "Scala-like" library for Java. It brings in a strong functional flavor that bridges Java's verbosity and the expressive needs of functional paradigms.

Example:

Java

Option<String> name = Option.of("Bodapati");
String result = name
    .map(n -> n.toUpperCase())
    .getOrElse("Anonymous");

System.out.println(result); // Output: BODAPATI

Using Try, developers can encapsulate exceptions functionally without writing try-catch blocks:

Java

Try<Integer> safeDivide = Try.of(() -> 10 / 0);
System.out.println(safeDivide.getOrElse(-1)); // Output: -1

Vavr's value becomes even more obvious in concurrent and microservice environments where immutability and predictability matter.

Reactor and RxJava: Going Asynchronous

Reactive programming frameworks such as Project Reactor and RxJava provide more sophisticated functional processing streams that go beyond what Java Streams can offer, especially in the context of asynchrony and event-driven systems.

Key Features:

- Backpressure control and lazy evaluation
- Asynchronous stream composition
- Rich set of operators and lifecycle hooks

Example:

Java

Flux<Integer> numbers = Flux.range(1, 5)
    .map(i -> i * 2)
    .filter(i -> i % 3 == 0);

numbers.subscribe(System.out::println);

Use cases include live data feeds, user interaction streams, and network-bound operations. In the Java ecosystem, Reactor is heavily used in Spring WebFlux, where non-blocking systems are built from the ground up.
RxJava, on the other hand, has been widely adopted in Android development, where UI responsiveness and multithreading are critical. Both libraries teach developers to think reactively, replacing imperative patterns with a declarative flow of data.

Functional Composition with Java's Function Interface

Even without Streams or third-party libraries, Java offers the Function<T, R> interface, which supports method chaining and composition.

Example:

Java

Function<Integer, Integer> multiplyBy2 = x -> x * 2;
Function<Integer, Integer> add10 = x -> x + 10;

Function<Integer, Integer> combined = multiplyBy2.andThen(add10);
System.out.println(combined.apply(5)); // Output: 20

This simple pattern is surprisingly powerful. For example, in validation or transformation pipelines, you can modularize each logic step, test the steps independently, and chain them without side effects. This promotes clean architecture and easier testing.

JEP 406 — Pattern Matching for Switch

Pattern matching for switch, introduced in Java 17 as a preview feature, continues to evolve and simplify conditional logic. It allows type-safe extraction and handling of data.

Example:

Java

static String formatter(Object obj) {
    return switch (obj) {
        case Integer i -> "Integer: " + i;
        case String s -> "String: " + s;
        default -> "Unknown type";
    };
}

Pattern matching isn't just syntactic sugar. It introduces a safer, more readable approach to decision trees. It reduces the number of nested conditions, minimizes boilerplate, and enhances clarity when dealing with polymorphic data. Future versions of Java are expected to enhance this capability further with deconstruction patterns and sealed class integration, bringing Java closer to pattern-rich languages like Scala.

Recursion and Tail Call Optimization Workarounds

Recursion is fundamental in functional programming. However, Java doesn't optimize tail calls, unlike languages such as Haskell or Scala. That means recursive functions can easily overflow the stack. Vavr offers a workaround via trampolines:

Java

static Trampoline<Integer> factorial(int n, int acc) {
    return n == 0
        ? Trampoline.done(acc)
        : Trampoline.more(() -> factorial(n - 1, n * acc));
}

System.out.println(factorial(5, 1).result());

Trampolining ensures that recursive calls don't consume additional stack frames. Though slightly verbose, this pattern enables functional recursion in Java safely.

Conclusion: More Than Just Streams

"The Power and Limitations of Java Streams" offers a good overview of what to expect from Streams, and I like how it starts with a discussion on efficiency and other constraints. Java functional programming, though, is more than just Streams: there is room to adopt libraries like Vavr, frameworks like Reactor and RxJava, function composition, pattern matching, and recursion techniques. To keep pace with the evolution of the Java enterprise platform, pursuing hybrid patterns of functional programming allows software architects to create systems that are more expressive, testable, and maintainable. Adopting these tools doesn't require abandoning Java Streams; it means extending your toolbox.

What's Next?

Interested in even more expressive power? Explore JVM-based functional-first languages like Kotlin or Scala. They offer stronger FP constructs, full TCO, and tighter integration with functional idioms. Want to build smarter, more testable, and concurrent-ready Java systems? Time to explore functional programming beyond Streams. The ecosystem is richer than ever, and it is evolving fast.
What are your thoughts about functional programming in Java beyond Streams? Let’s talk in the comments!

By Rama Krishna Prasad Bodapati
Serverless IAM: Implementing IAM in Serverless Architectures with Lessons from the Security Trenches

When I first began working with serverless architectures in 2018, I quickly discovered that my traditional security playbook wasn't going to cut it. The ephemeral nature of functions, the distributed service architecture, and the multiplicity of entry points created a fundamentally different security landscape. After several years of implementing IAM strategies for serverless applications across various industries, I've compiled the approaches that have proven most effective in real-world scenarios. This article shares these insights, focusing on practical Python implementations that address the unique security challenges of serverless environments.

The Shifting Security Paradigm in Serverless

Traditional security models rely heavily on network perimeters and long-running servers where security agents can monitor activity. Serverless computing dismantles this model through several key characteristics:

- Execution lifetime measured in milliseconds: Functions that spin up, execute, and terminate in the blink of an eye make traditional agent-based security impractical
- Highly distributed components: Instead of monolithic services, serverless apps often comprise dozens or hundreds of small functions
- Multiple ingress points: Traffic arrives at many independent endpoints rather than funneling through a single application gateway
- Complex service-to-service communication patterns: With functions frequently calling other services
- Performance sensitivity: Where security overhead can significantly impact cold start times

During a financial services project last year, we learned this lesson the hard way when our initial security approach added nearly 800ms to function cold starts—unacceptable for an API that needed to respond in under 300ms total.

Core Components of Effective Serverless IAM

Through trial and error across multiple projects, I've found that serverless IAM strategies should address four key areas:

1. User and Service Authentication

Authenticating users and services in a serverless context requires approaches optimized for stateless, distributed execution:

- JWT-based authentication: These stateless tokens align perfectly with the ephemeral nature of serverless functions
- OpenID Connect (OIDC): For standardized authentication flows that work across service boundaries
- API keys and client secrets: When service-to-service authentication is required
- Federated identity: Leveraging identity providers to offload authentication complexity

2. Authorization and Access Control

After verifying identity, you need robust mechanisms to control access:

- Role-based access control (RBAC): Assigning permissions based on user roles
- Attribute-based access control (ABAC): More dynamic permissions based on user attributes and context
- Policy enforcement points: Strategic locations within your architecture where access decisions occur

3. Function-Level Permissions

The functions themselves need careful permission management:

- Principle of least privilege: Granting only the minimal permissions required
- Function-specific IAM roles: Applying tailored permissions to each function
- Resource-based policies: Controlling which identities can invoke your functions

4. Secrets Management

Secure handling of credentials and sensitive information:

- Managed secrets services: Cloud-native solutions for storing and accessing secrets
- Environment variables: For injecting configuration at runtime
- Parameter stores: For less sensitive configuration information

Provider-Specific Implementation Patterns

Having implemented serverless security across major cloud providers, I've developed practical patterns for each platform. These examples reflect real-world implementations with necessary simplifications for clarity.

AWS: Pragmatic IAM Approaches

AWS offers several robust options for serverless authentication.

Authentication with Amazon Cognito

Here's a streamlined example of validating Cognito tokens in a Lambda function, with performance optimizations I've found effective in production:

Python

# Example: Validating Cognito tokens in a Lambda function
import json
import os
import boto3
import jwt
import requests
from jwt.algorithms import RSAAlgorithm

# Cache of JWKs - crucial for performance
jwks_cache = {}

def lambda_handler(event, context):
    try:
        # Extract token from Authorization header
        auth_header = event.get('headers', {}).get('Authorization', '')
        if not auth_header or not auth_header.startswith('Bearer '):
            return {
                'statusCode': 401,
                'body': json.dumps({'message': 'Missing or invalid authorization header'})
            }

        token = auth_header.replace('Bearer ', '')

        # Verify the token
        decoded_token = verify_token(token)

        # Process authenticated request with user context
        user_id = decoded_token.get('sub')
        user_groups = decoded_token.get('cognito:groups', [])

        # Your business logic here, using the authenticated user context
        response_data = process_authorized_request(user_id, user_groups, event)

        return {
            'statusCode': 200,
            'body': json.dumps(response_data)
        }
    except jwt.ExpiredSignatureError:
        return {
            'statusCode': 401,
            'body': json.dumps({'message': 'Token expired'})
        }
    except Exception as e:
        print(f"Authentication error: {str(e)}")
        return {
            'statusCode': 401,
            'body': json.dumps({'message': 'Authentication failed'})
        }

def verify_token(token):
    # Decode the token header
    header = jwt.get_unverified_header(token)
    kid = header['kid']

    # Get the public keys if not cached
    region = os.environ['AWS_REGION']
    user_pool_id = os.environ['USER_POOL_ID']

    if not jwks_cache:
        keys_url = f'https://cognito-idp.{region}.amazonaws.com/{user_pool_id}/.well-known/jwks.json'
        jwks = requests.get(keys_url).json()
        jwks_cache.update(jwks)

    # Find the key that matches the kid in the token
    key = None
    for jwk in jwks_cache['keys']:
        if jwk['kid'] == kid:
            key = jwk
            break

    if not key:
        raise Exception('Public key not found')

    # Construct the public key
    public_key = RSAAlgorithm.from_jwk(json.dumps(key))

    # Verify the token
    payload = jwt.decode(
        token,
        public_key,
        algorithms=['RS256'],
        audience=os.environ['APP_CLIENT_ID']
    )

    return payload

This pattern has performed well in production, with the key caching strategy reducing token verification time by up to 80% compared to our initial implementation.
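Building on the groups extracted from the Cognito token above, a lightweight way to apply the role-based access control discussed earlier inside the function itself is a decorator that checks the caller's groups before the business logic runs. This is a minimal sketch; the group names and the wrapped handler are illustrative placeholders, not part of the original implementation.

Python

# Minimal RBAC sketch: gate a handler on Cognito group membership.
# Group names and the wrapped handler are illustrative placeholders.
import json
from functools import wraps

def require_groups(*allowed_groups):
    def decorator(handler):
        @wraps(handler)
        def wrapper(decoded_token, event, context):
            user_groups = set(decoded_token.get('cognito:groups', []))
            if user_groups.isdisjoint(allowed_groups):
                return {
                    'statusCode': 403,
                    'body': json.dumps({'message': 'Insufficient permissions'})
                }
            return handler(decoded_token, event, context)
        return wrapper
    return decorator

@require_groups('admins', 'billing')
def get_invoices(decoded_token, event, context):
    # Business logic runs only for callers in an allowed group
    return {
        'statusCode': 200,
        'body': json.dumps({'invoices': [], 'requested_by': decoded_token.get('sub')})
    }

Keeping the check in a decorator keeps the authorization decision close to the code it protects while leaving the token verification logic unchanged.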
Secrets Management with AWS Secrets Manager After securing several high-compliance applications, I've found this pattern for secrets management to be both secure and performant: Python # Example: Using AWS Secrets Manager in Lambda with caching import json import boto3 import os from botocore.exceptions import ClientError # Cache for secrets to minimize API calls secrets_cache = {} secrets_ttl = {} SECRET_CACHE_TTL = 300 # 5 minutes in seconds def lambda_handler(event, context): try: # Get the secret - using cache if available and not expired api_key = get_secret('payment-api-key') # Use secret for external API call result = call_payment_api(api_key, event.get('body', {})) return { 'statusCode': 200, 'body': json.dumps({'transactionId': result['id']}) } except ClientError as e: print(f"Error retrieving secret: {e}") return { 'statusCode': 500, 'body': json.dumps({'message': 'Internal error'}) } def get_secret(secret_id): import time current_time = int(time.time()) # Return cached secret if valid if secret_id in secrets_cache and secrets_ttl.get(secret_id, 0) > current_time: return secrets_cache[secret_id] # Create a Secrets Manager client secrets_manager = boto3.client('secretsmanager') # Retrieve secret response = secrets_manager.get_secret_value(SecretId=secret_id) # Parse the secret if 'SecretString' in response: secret_data = json.loads(response['SecretString']) # Cache the secret with TTL secrets_cache[secret_id] = secret_data secrets_ttl[secret_id] = current_time + SECRET_CACHE_TTL return secret_data else: raise Exception("Secret is not a string") The caching strategy here has been crucial in high-volume applications, where we've seen up to 95% reduction in Secrets Manager API calls while maintaining reasonable security through controlled TTL. 
Azure Serverless IAM Implementation When working with Azure Functions, I've developed these patterns for robust security: Authentication with Azure Active Directory (Entra ID) For enterprise applications on Azure, this pattern has provided a good balance of security and performance: Python # Example: Validating AAD token in Azure Function import json import os import jwt import requests import azure.functions as func from jwt.algorithms import RSAAlgorithm import logging from datetime import datetime, timedelta # Cache for JWKS with TTL jwks_cache = {} jwks_timestamp = None JWKS_CACHE_TTL = timedelta(hours=24) # Refresh keys daily def main(req: func.HttpRequest) -> func.HttpResponse: try: # Extract token auth_header = req.headers.get('Authorization', '') if not auth_header or not auth_header.startswith('Bearer '): return func.HttpResponse( json.dumps({'message': 'Missing or invalid authorization header'}), mimetype="application/json", status_code=401 ) token = auth_header.replace('Bearer ', '') # Validate token start_time = datetime.now() decoded_token = validate_token(token) validation_time = (datetime.now() - start_time).total_seconds() # Log performance for monitoring logging.info(f"Token validation completed in {validation_time} seconds") # Process authenticated request user_email = decoded_token.get('email', 'unknown') user_name = decoded_token.get('name', 'User') return func.HttpResponse( json.dumps({ 'message': f'Hello, {user_name}', 'email': user_email, 'authenticated': True }), mimetype="application/json", status_code=200 ) except Exception as e: logging.error(f"Authentication error: {str(e)}") return func.HttpResponse( json.dumps({'message': 'Authentication failed'}), mimetype="application/json", status_code=401 ) def validate_token(token): global jwks_cache, jwks_timestamp # Decode without verification to get the kid header = jwt.get_unverified_header(token) kid = header['kid'] # Get tenant ID from environment tenant_id = os.environ['TENANT_ID'] # Get the keys if not cached or expired current_time = datetime.now() if not jwks_cache or not jwks_timestamp or current_time - jwks_timestamp > JWKS_CACHE_TTL: keys_url = f'https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys' jwks = requests.get(keys_url).json() jwks_cache = jwks jwks_timestamp = current_time logging.info("JWKS cache refreshed") # Find the key matching the kid key = None for jwk in jwks_cache['keys']: if jwk['kid'] == kid: key = jwk break if not key: raise Exception('Public key not found') # Construct the public key public_key = RSAAlgorithm.from_jwk(json.dumps(key)) # Verify the token client_id = os.environ['CLIENT_ID'] issuer = f'https://login.microsoftonline.com/{tenant_id}/v2.0' payload = jwt.decode( token, public_key, algorithms=['RS256'], audience=client_id, issuer=issuer ) return payload The key implementation detail here is the TTL-based JWKS cache, which has dramatically improved performance while ensuring keys are periodically refreshed. 
Google Cloud Serverless IAM Implementation For Google Cloud Functions, these patterns have proven effective in production environments: Authentication with Firebase This approach works well for consumer-facing applications with Firebase Authentication: Python # Example: Validating Firebase Auth token in Cloud Function import json import firebase_admin from firebase_admin import auth from firebase_admin import credentials import time import logging from functools import wraps # Initialize Firebase Admin SDK (with exception handling for warm instances) try: app = firebase_admin.get_app() except ValueError: cred = credentials.ApplicationDefault() firebase_admin.initialize_app(cred) def require_auth(f): @wraps(f) def decorated_function(request): # Performance tracking start_time = time.time() # Get the ID token auth_header = request.headers.get('Authorization', '') if not auth_header or not auth_header.startswith('Bearer '): return json.dumps({'error': 'Unauthorized - Missing token'}), 401, {'Content-Type': 'application/json'} id_token = auth_header.split('Bearer ')[1] try: # Verify the token decoded_token = auth.verify_id_token(id_token) # Check if token is issued in the past auth_time = decoded_token.get('auth_time', 0) if auth_time > time.time(): return json.dumps({'error': 'Invalid token auth time'}), 401, {'Content-Type': 'application/json'} # Track performance validation_time = time.time() - start_time logging.info(f"Token validation took {validation_time*1000:.2f}ms") # Add user info to request request.user = { 'uid': decoded_token['uid'], 'email': decoded_token.get('email'), 'email_verified': decoded_token.get('email_verified', False), 'auth_time': auth_time } # Continue to the actual function return f(request) except Exception as e: logging.error(f'Error verifying authentication token: {e}') return json.dumps({'error': 'Unauthorized'}), 401, {'Content-Type': 'application/json'} return decorated_function @require_auth def secure_function(request): # The function only executes if auth is successful user = request.user return json.dumps({ 'message': f'Hello, {user["email"]}!', 'userId': user['uid'], 'verified': user['email_verified'] }), 200, {'Content-Type': 'application/json'} The decorator pattern has been particularly valuable, standardizing authentication across dozens of functions in larger projects. Hard-Earned Lessons and Best Practices After several years of implementing serverless IAM in production, I've learned these critical lessons: 1. Implement Least Privilege with Precision One of our earlier projects granted overly broad permissions to Lambda functions. This came back to haunt us when a vulnerability in a dependency was exploited, giving the attacker more access than necessary. Now, we religiously follow function-specific permissions: YAML # AWS SAM example with precise permissions Resources: ProcessPaymentFunction: Type: AWS::Serverless::Function Properties: Handler: payment_handler.lambda_handler Runtime: python3.9 Policies: - DynamoDBReadPolicy: TableName: !Ref CustomerTable - SSMParameterReadPolicy: ParameterName: /prod/payment/api-key - Statement: - Effect: Allow Action: - secretsmanager:GetSecretValue Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:payment/* 2. Implement Smart Caching for Performance Authentication processes can significantly impact cold start times. Our testing showed that a poorly implemented token validation flow could add 300-500ms to function execution time. 
This optimized caching approach has been effective in real-world applications: Python # Example: Smart caching for token validation import json import jwt import time from functools import lru_cache import threading # Thread-safe token cache with TTL class TokenCache: def __init__(self, ttl_seconds=300): self.cache = {} self.lock = threading.RLock() self.ttl = ttl_seconds def get(self, token_hash): with self.lock: cache_item = self.cache.get(token_hash) if not cache_item: return None expiry, user_data = cache_item if time.time() > expiry: # Token cache entry expired del self.cache[token_hash] return None return user_data def set(self, token_hash, user_data): with self.lock: expiry = time.time() + self.ttl self.cache[token_hash] = (expiry, user_data) # Initialize cache token_cache = TokenCache() def get_token_hash(token): # Create a hash of the token for cache key import hashlib return hashlib.sha256(token.encode()).hexdigest() def validate_token(token): # Check cache first token_hash = get_token_hash(token) cached_user = token_cache.get(token_hash) if cached_user: print("Cache hit for token validation") return cached_user print("Cache miss - validating token") # Actual token validation logic here decoded = jwt.decode(token, verify=False) # Placeholder for actual verification # Extract user data user_data = { 'sub': decoded.get('sub'), 'email': decoded.get('email'), 'roles': decoded.get('roles', []) } # Cache the result token_cache.set(token_hash, user_data) return user_data In high-volume applications, intelligent caching like this has improved average response times by 30-40%. 3. Implement Proper Defense in Depth During a security audit of a serverless financial application, we discovered that while our API Gateway had authentication enabled, several functions weren't verifying the JWT token payload. This created a vulnerability where valid but expired tokens could be reused. We now implement defense in depth consistently: Python # Example: Multiple validation layers def process_order(event, context): try: # 1. Verify authentication token (already checked by API Gateway, but verify again) auth_result = verify_token(event) if not auth_result['valid']: return { 'statusCode': 401, 'body': json.dumps({'error': auth_result['error']}) } user = auth_result['user'] # 2. Validate input data structure body = json.loads(event.get('body', '{}')) validation_errors = validate_order_schema(body) if validation_errors: return { 'statusCode': 400, 'body': json.dumps({'errors': validation_errors}) } # 3. Verify business-level authorization auth_result = check_order_authorization(user, body) if not auth_result['authorized']: return { 'statusCode': 403, 'body': json.dumps({'error': auth_result['reason']}) } # 4. Process with proper input sanitization processed_data = sanitize_order_input(body) # 5. Execute with error handling result = create_order(user['id'], processed_data) # 6. Return success with minimal information return { 'statusCode': 200, 'body': json.dumps({'orderId': result['id']}) } except Exception as e: # Log detailed error internally but return generic message log_detailed_error(e) return { 'statusCode': 500, 'body': json.dumps({'error': 'An unexpected error occurred'}) } This approach has proven effective in preventing various attack vectors. 4. Build Secure Service-to-Service Communication One of the more challenging aspects of serverless security is function-to-function communication. 
In a recent project, we implemented this pattern for secure internal communication: Python # Example: Service-to-service communication with JWT import json import jwt import time import os import requests def generate_service_token(service_name, target_service): # Create a signed JWT for service-to-service auth secret = os.environ['SERVICE_JWT_SECRET'] payload = { 'iss': service_name, 'sub': f'service:{service_name}', 'aud': target_service, 'iat': int(time.time()), 'exp': int(time.time() + 60), # Short-lived token (60 seconds) 'scope': 'service' } return jwt.encode(payload, secret, algorithm='HS256') def call_order_service(customer_id, order_data): service_token = generate_service_token('payment-service', 'order-service') # Call the order service with the token response = requests.post( os.environ['ORDER_SERVICE_URL'], json={ 'customerId': customer_id, 'orderDetails': order_data }, headers={ 'Authorization': f'Bearer {service_token}', 'Content-Type': 'application/json' } ) if response.status_code != 200: raise Exception(f"Order service error: {response.text}") return response.json() This pattern ensures that even if one function is compromised, the attacker has limited time to exploit the service token. 5. Implement Comprehensive Security Monitoring After a security incident where unauthorized token usage went undetected for days, we implemented enhanced security monitoring: Python # Example: Enhanced security logging for authentication import json import time import logging from datetime import datetime import traceback def log_auth_event(event_type, user_id, ip_address, success, details=None): """Log authentication events in a standardized format""" log_entry = { 'timestamp': datetime.utcnow().isoformat(), 'event': f'auth:{event_type}', 'userId': user_id, 'ipAddress': ip_address, 'success': success, 'region': os.environ.get('AWS_REGION', 'unknown'), 'functionName': os.environ.get('AWS_LAMBDA_FUNCTION_NAME', 'unknown') } if details: log_entry['details'] = details # Log in JSON format for easy parsing logging.info(json.dumps(log_entry)) def authenticate_user(event): try: # Extract IP from request context ip_address = event.get('requestContext', {}).get('identity', {}).get('sourceIp', 'unknown') # Extract and validate token auth_header = event.get('headers', {}).get('Authorization', '') if not auth_header or not auth_header.startswith('Bearer '): log_auth_event('token_missing', 'anonymous', ip_address, False) return {'authenticated': False, 'error': 'Missing authentication token'} token = auth_header.replace('Bearer ', '') # Track timing for performance monitoring start_time = time.time() try: # Validate token (implementation details omitted) decoded_token = validate_token(token) validation_time = time.time() - start_time user_id = decoded_token.get('sub', 'unknown') # Log successful authentication log_auth_event('login', user_id, ip_address, True, { 'validationTimeMs': round(validation_time * 1000), 'tokenExpiry': datetime.fromtimestamp(decoded_token.get('exp')).isoformat() }) return { 'authenticated': True, 'user': { 'id': user_id, 'email': decoded_token.get('email'), 'roles': decoded_token.get('roles', []) } } except jwt.ExpiredSignatureError: # Extract user ID from expired token for logging try: expired_payload = jwt.decode(token, options={'verify_signature': False}) user_id = expired_payload.get('sub', 'unknown') except: user_id = 'unknown' log_auth_event('token_expired', user_id, ip_address, False) return {'authenticated': False, 'error': 'Authentication token expired'} except 
Exception as e: log_auth_event('token_invalid', 'unknown', ip_address, False, { 'error': str(e), 'tokenFragment': token[:10] + '...' if len(token) > 10 else token }) return {'authenticated': False, 'error': 'Invalid authentication token'} except Exception as e: # Unexpected error in authentication process error_details = { 'error': str(e), 'trace': traceback.format_exc() } log_auth_event('auth_error', 'unknown', 'unknown', False, error_details) return {'authenticated': False, 'error': 'Authentication system error'} This comprehensive logging approach has helped us identify suspicious patterns and potential attacks before they succeed. Advanced Patterns from Production Systems As our serverless systems have matured, we've implemented several advanced patterns that have proven valuable: 1. Fine-Grained Authorization with OPA For a healthcare application with complex authorization requirements, we implemented Open Policy Agent: Python # Example: Using OPA for authorization in AWS Lambda import json import requests import os def check_authorization(user, resource, action): """Check if user is authorized to perform action on resource using OPA""" # Create authorization query auth_query = { 'input': { 'user': { 'id': user['id'], 'roles': user['roles'], 'department': user.get('department'), 'attributes': user.get('attributes', {}) }, 'resource': resource, 'action': action, 'context': { 'environment': os.environ.get('ENVIRONMENT', 'dev'), 'timestamp': datetime.utcnow().isoformat() } } } # Query OPA for authorization decision try: opa_url = os.environ['OPA_URL'] response = requests.post( f"{opa_url}/v1/data/app/authz/allow", json=auth_query, timeout=0.5 # Set reasonable timeout ) # Parse response if response.status_code == 200: result = response.json() is_allowed = result.get('result', False) # Log authorization decision log_auth_event( 'authorization', user['id'], 'N/A', is_allowed, { 'resource': resource.get('type') + ':' + resource.get('id'), 'action': action, 'allowed': is_allowed } ) return { 'authorized': is_allowed, 'reason': None if is_allowed else "Not authorized for this operation" } else: # OPA service error log_auth_event( 'authorization_error', user['id'], 'N/A', False, { 'statusCode': response.status_code, 'response': response.text } ) # Fall back to deny by default return { 'authorized': False, 'reason': "Authorization service error" } except Exception as e: # Error communicating with OPA log_auth_event( 'authorization_error', user['id'], 'N/A', False, {'error': str(e)} ) # Default deny on errors return { 'authorized': False, 'reason': "Authorization service unavailable" } This approach has allowed us to implement complex authorization rules that would be unwieldy to code directly in application logic. 2. 
Multi-Tenant Security Pattern For SaaS applications with multi-tenant requirements, we've developed this pattern: Python # Example: Multi-tenant request handling in AWS Lambda import json import boto3 import os from boto3.dynamodb.conditions import Key def lambda_handler(event, context): try: # Authenticate user auth_result = authenticate_user(event) if not auth_result['authenticated']: return { 'statusCode': 401, 'body': json.dumps({'error': auth_result['error']}) } user = auth_result['user'] # Extract tenant ID from token or path parameter requested_tenant_id = event.get('pathParameters', {}).get('tenantId') user_tenant_id = user.get('tenantId') # Security check: User can only access their assigned tenant if not user.get('isAdmin', False) and requested_tenant_id != user_tenant_id: log_auth_event( 'tenant_access_denied', user['id'], get_source_ip(event), False, { 'requestedTenant': requested_tenant_id, 'userTenant': user_tenant_id } ) return { 'statusCode': 403, 'body': json.dumps({'error': 'Access denied to this tenant'}) } # Create tenant-specific DynamoDB client dynamodb = boto3.resource('dynamodb') table = dynamodb.Table(os.environ['DATA_TABLE']) # Query with tenant isolation to prevent data leakage result = table.query( KeyConditionExpression=Key('tenantId').eq(requested_tenant_id) ) # Audit the data access log_data_access( user['id'], requested_tenant_id, 'query', result['Count'] ) return { 'statusCode': 200, 'body': json.dumps({ 'items': result['Items'], 'count': result['Count'] }) } except Exception as e: # Log the error but return generic message log_error(str(e), event) return { 'statusCode': 500, 'body': json.dumps({'error': 'Internal server error'}) } This pattern has successfully prevented tenant data leakage even in complex multi-tenant systems. Conclusion: Security is a Journey, Not a Destination Implementing IAM in serverless architectures requires a different mindset from traditional application security. Rather than focusing on perimeter security, the emphasis shifts to identity-centric, fine-grained permissions that align with the distributed nature of serverless applications. Through my journey implementing serverless security across various projects, I've found that success depends on several key factors: Designing with least privilege from the start - It's much harder to reduce permissions later than to grant them correctly initiallyBalancing security with performance - Intelligent caching and optimization strategies are essentialBuilding defense in depth - No single security control should be your only line of defenseMonitoring and responding to security events - Comprehensive logging and alerting provides visibilityContinuously adapting security practices - Serverless security is evolving rapidly as the technology matures The serverless paradigm has fundamentally changed how we approach application security. By embracing these changes and implementing the patterns described in this article, you can build serverless applications that are both secure and scalable. Remember that while cloud providers secure the underlying infrastructure, the security of your application logic, authentication flows, and data access patterns remains your responsibility. The shared responsibility model is especially important in serverless architectures where the division of security duties is less clear than in traditional deployments. 
As serverless adoption continues to grow, expect to see more sophisticated security solutions emerge that address the unique challenges of highly distributed, ephemeral computing environments. By implementing the practices outlined here, you'll be well-positioned to leverage these advancements while maintaining strong security fundamentals.

By Mahesh Vaijainthymala Krishnamoorthy
Defining Effective Microservice Boundaries - A Practical Approach To Avoiding The Most Common Mistakes
Defining Effective Microservice Boundaries - A Practical Approach To Avoiding The Most Common Mistakes

Have you found yourself staring at an entire whiteboard filled with boxes and arrows, pondering whether this would become the next awesome microservices architecture or the worst distributed monolith that ever existed? Same here, and more often than I would like to admit. Last month, I was talking to one of my cofounder friends, and he mentioned, “We have 47 services!” with pride. Then two weeks later, I was going through their docs and found out that to deploy a simple feature, I would need to make changes in six of their services. What I thought was their “microservices” architecture turned out to be a monolith split into pieces, with distribution complexity but no benefits whatsoever.
Perhaps the most critically important and most underappreciated step in this architectural style is correctly partitioning the microservices. Get it right, and you gain independent deployability, fault isolation, and swifter team operations. Mess it up, and welcome to a distributed system that is a thousand times harder to maintain than the monolith you wanted to remove.
The Anti-Patterns: How Boundaries Fail
The Distributed Monolith: Death by a Thousand Cuts
An application made up of multiple interdependent services is an example of a pattern I encounter frequently, known as the “distributed monolith.” This occurs when the complexity of distribution exists, but not the advantages. Here are some indicators that your distributed monolith is operating below peak efficiency:
A single modification prompts changes across multiple services.
Taking one service’s dependency offline breaks other services.
Release planning requires complex cross-team coordination.
A team I recently interacted with had to coordinate a deployment across eight services just to add a field to their user profile. That is not microservices; that is just an unnecessarily intricate web of self-inflicted torture.
The Shared Database Trap
“We need to use the same data!” is the alarm bell that usually precedes this trap. Having many services access the same database tables creates hidden coupling that eliminates the isolation your architecture is supposed to provide. I saw a retail company suffer four hours of downtime on Black Friday when their inventory service changed a database schema that their order service relied on.
Nanoservice Growth: Over-Indulging on the Good Stuff
This can also go in the opposite direction. Sometimes I refer to it as “nanoservice madness.” You create an endless number of services, and your architecture turns into something resembling spaghetti. One of the gaming companies I consulted for was creating individual microservices for user achievements, user preferences, user friends, and even user authentication and profile. Each of these services had its own deployment pipeline, database, and even an on-call rotation. The operational overhead was too much for their small team.
A Defined Scenario: Redesigning an E-Commerce Boundary
Let me show you an actual scenario from last year.
I was consulting for an e-commerce business that had a typical case of a “distributed monolith.” Their initial architecture was something along the lines of this: YAML # Original architecture with poor boundaries services: product-service: responsibilities: - Management of product catalogs - Inventory management - Rules associated with pricing - Discount calculations database: shared_product_db dependencies: - user-service - order-service order-service: responsibilities: - Management and creation of orders - Processing of payments - Coordination of shipping database: shared_order_db dependencies: - product-service - user-service user-service: responsibilities: - User profiles - Authentication - Authorization - User preferences database: shared_user_db dependencies: - product-service It was obvious what the problems were. Services did have an appropriate amount of responsibilities but were overloaded with circular dependencies and too much knowledge of each other. Changes required coordinating at minimum three separate teams which is a disaster waiting to happen. Their business professionals were with us for a week. By the end of day one, the sticky notes had taken over the walls. The product team was in a heated debate with the inventory folks over who “owned” the concept of a product being “in stock.” It was chaotic, but by the end of the week, we had much clearer boundaries. The end result is as follows: YAML services: catalog-service: responsibilities: - Product information - Categorization - Search database: catalog_db dependencies: [] inventory-service: responsibilities: - Stock tracking - Reservations database: inventory_db dependencies: [] pricing-service: responsibilities: - Base prices - Discounts - Promotions database: pricing_db dependencies: [] order-service: responsibilities: - Order creation - Tracking - History database: order_db dependencies: - catalog-service - inventory-service - pricing-service (all async) payment-service: responsibilities: - Payment processing - Refunds database: payment_db dependencies: [] user-profile-service: responsibilities: - Profile management - Preferences database: user_profile_db dependencies: [] auth-service: responsibilities: - Authentication - Authorization database: auth_db dependencies: [] I understand your initial thoughts, “You went from 3 services to 7? That is increasing complexity, not decreasing it,” right? The thing is, every service now has one, dedicated responsibility. The dependencies are reduced and mostly asynchronous. Each service is fully in control of its data. The outcome was drastic. The average time to implement new features decreased by 60%, while deployment frequency went up by 300%. Their Black Friday sale was the real test for us six months later. Each service scaled on its load patterns rather than overstocking resources like the previous year. While the catalog service required 20 instances, payment only needed five. In the middle of the night, their CTO texted me a beer emoji, the universal sign of a successful launch. A Practical Strategy Finding The Right Boundaries Start With Domain-Driven Design (But Make It Work) As much as Domain-Driven Design (DDD) purists would like to disagree, you don’t need to be a purist to benefit from DDD’s tools for exploring boundaries. Start with Event Storming. This is a workshop approach where you gather domain experts and developers to construct business process models using sticky notes representing domain events, commands, and aggregates. 
This type of collaboration often exposes boundaries that already exist naturally in your domain.
The “Two Pizza Team” Rule Still Works
Amazon continues to enforce its famous rule that a team should be small enough to be fed by two pizzas. The same thinking applies to microservices. If a service grows so complicated that it takes more than 5-8 engineers to maintain it, that's often an indication it should be split. But the inverse is also true: if you have 20 services and only 5 engineers, there is an increasing likelihood you’ve become too fine-grained.
The Cognitive Load Test: A 2025 Introduction
An interesting approach I adopted in 2025 is what I like to refer to as 'the cognitive load test' for boundary determination, and it has proven very effective. It’s very straightforward: can a new team member understand the goals, duties, and functions of the service within a day? If not, your service might have too many responsibilities or be too fragmented.
Actionable Insights for 2025 and Beyond
The Strangler Fig Pattern: Expand Your Horizons
When remodeling an existing system, don’t sweat the boundaries on the first attempt. Implement the Strangler Fig pattern, which gradually replaces parts of a monolithic architecture with well-structured microservices (the pattern is named after a vine that gradually overtakes its host tree). A healthcare client of mine tried to design the perfect microservices architecture for 18 months without writing a single line of code. By the end of that long, tangled process, changing business requirements had made the design completely obsolete.
The Newest Pattern: Boundary Observability
A trend that I've started noticing in 2025 is something I'm calling “boundary-testing observability”—essentially, monitoring cross-service dependencies and data consistency. Tools such as ServiceMesh and BoundaryGuard will notify you when services are getting too talkative or when data redundancy is posing a consistency threat.
Concluding Remarks: Boundaries Are a Journey, Not a Destination
After assisting countless businesses with adopting microservices, I've learned that domain boundaries shift as business needs change. Getting them right is never a finished task. Where there’s strong value in doing so, start with coarse-grained services and progress from there. Boundaries are subjective. There is a fine line between data that should be shared and data that should be duplicated, so be reasonable. Most importantly, pay attention to the problems and pain your teams face; there is a strong chance they are pointing at boundary issues.
As my mentor used to say, “the best microservices architecture isn’t the one that looks prettiest on a whiteboard—it’s the one that lets your teams ship features quickly and safely without wanting to quit every other Tuesday.”

By Mohit Menghnani
Software Specs 2.0: Evolving Requirements for the AI Era (2025 Edition)
Software Specs 2.0: Evolving Requirements for the AI Era (2025 Edition)

Any form of data that we can use to make decisions for writing code, be it requirements, specifications, user stories, and the like, must have certain qualities. In agile development, for example, we have the INVEST qualities. More specifically, a user story must be Independent of all others and Negotiable, i.e., not a specific contract for features. It must be Valuable (or vertical) and Estimable (to a good approximation). It must also be Small to fit within an iteration and Testable (in principle, even if there isn’t a test for it yet). This article goes beyond agile, waterfall, rapid application development, and the like. I will summarise a set of general and foundational qualities as a blueprint for software development. To effectively leverage AI for code generation, while the fundamental principles of software requirements remain, their emphasis and application must adapt. This ensures the AI, which lacks human intuition and context, can produce code that is not only functional but also robust, maintainable, and aligned with project constraints. For each fundamental quality, I first explain its purpose. Its usefulness and applicability when code is generated by AI are also discussed. The level of detail at which I want to cover this topic necessitates two articles. This article summarizes the "what" we should do. A follow-up article gives an elaborate example of "how" we can do that.
Documented
Software requirements must be documented and should not just exist in our minds. Documentation may be as lightweight as possible as long as it’s easy to maintain. After all, documentation's purpose is to be a single source of truth. When we say requirements must be "Documented" for human developers, we mean they need to be written down somewhere accessible (wiki, requirements doc, user stories, etc.). If they only exist in someone's head or if they are scattered across chat messages, they probably won't be very effective. This ensures alignment, provides a reference point, and helps with onboarding. While lightweight documentation is often preferred (like user stories), there's usually an implicit understanding that humans can fill in gaps through conversation, experience, and shared context. For AI code generation, the "Documented" quality takes on a more demanding role:
The documentation is the primary input: AI-code assistants don't attend planning meetings. They may not ask clarifying questions in real-time (though some tools allow interactive refinement). Currently, they lack the years of contextual experience a human developer has. The written requirement document could be the most direct and often sole instruction set the AI receives for a specific task. Its quality can directly dictate the quality of the generated code.
Need for machine interpretability: While we can understand natural language fairly well, even with some ambiguity, AI models perform best with clear, structured, and unambiguous input. This means that the format and precision of the documentation could be a game-changer. Vague language can lead to unpredictable or incorrect assumptions by the AI.
Structured formats aid consistency: We could use Gherkin for BDD, specific prompt templates, or even structured data formats like JSON/YAML for configuration-like requirements. Using predefined structures or templates for requirements can be very useful. This way, the necessary details (like error handling, edge cases, and non-functional requirements) are consistently considered and provided to the AI.
This can lead to more predictable and reliable code generation.
Single source of truth is paramount: Because the document is the spec fed to the AI, ensuring it's the definitive, up-to-date version is critical. Changes must be reflected in the documentation before regeneration is attempted.
Correct
We must understand correctly what is required from the system and what is not required. This may seem simple, but how many times have we implemented requirements that were wrong? The Garbage In, Garbage Out (GIGO) rule applies here. For AI code generation, the importance of correctness becomes clear if we consider that:
AI executes literally: AI code generators are powerful tools for translating instructions (requirements) into code. However, they typically lack deep domain understanding. Currently, they lack the "common sense" to question whether the instructions themselves are logical or align with broader business goals. If you feed an AI a requirement that is clearly stated but functionally wrong, the AI will likely generate code that perfectly implements that wrong functionality.
Reduced opportunity for implicit correction: We might read a requirement and, based on our experience or understanding of the project context, spot a logical flaw or something that contradicts a known business rule. We might say, "Are you sure this is right? Usually, we do X in this situation." This provides a valuable feedback loop to catch incorrect requirements early. An AI is much less likely to provide this kind of proactive, context-aware sanity check. It usually assumes the requirements it receives are the intended truth.
Validation is key: The burden of ensuring correctness falls heavily on the requirement definition and validation process before the AI gets involved. The people defining and reviewing the requirements must be rigorous in confirming that what they are asking for is truly what is needed.
Complete
This is about having no missing attributes or features. While incomplete requirements are an issue, again, we may infer missing details, ask clarifying questions, or rely on implicit knowledge. This is not always the case, however, even for us humans! Requirements may remain incomplete even after hours of meetings and discussions. In the case of AI-generated code, I've seen AI assistants go both ways. There are cases where AI assistants generate only what is explicitly stated; the resulting gaps led to incomplete features or the AI making potentially incorrect assumptions. There are also cases where the AI assistant spotted the missing attributes and made suggestions. In any case, for completeness, I think it's still worth being as explicit as we can be. Requirements must detail not just the "happy path" but also:
Edge cases: Explicitly list known edge cases.
Error handling: Specify how errors should be handled (e.g., specific exceptions, return codes, logging messages).
Non-functional requirements (NFRs): Performance targets, security constraints (input validation, output encoding, authentication/authorization points), scalability considerations, and data handling/privacy rules must be stated clearly.
Assumptions: Explicitly list any assumptions being made.
Unambiguous
When we read the requirements, we should all understand the same thing. Ambiguous requirements may lead to misunderstandings, long discussions, and meetings for clarification. They may also lead to rework and bugs. In the worst case, requirements may be interpreted differently and we may develop something different than what was expected.
For AI assistants, ambiguity looks particularly dangerous.
Patterns and rules: AI models process the input text according to the patterns and rules they've learned. They don't inherently "understand" the underlying business goal or possess common sense in the human way. If a requirement can be interpreted in multiple ways, the AI might arbitrarily choose one interpretation based on its training data. This may not be the one intended by the stakeholders.
Unpredictable results: Ambiguity leads directly to unpredictability in the generated code. You might run the same ambiguous requirement through the AI (or slightly different versions of it) and get vastly different code implementations. Each time you generate the code, the AI assistant may handle the ambiguity in a different way.
Consistent
Consistency in requirements means using the same terminology for the same concepts. It means that statements don't contradict each other and maintain a logical flow across related requirements. For human teams, minor inconsistencies can often be resolved through discussion or inferred context. In the worst case, inconsistency can also lead to bugs and rework. However, for AI code generators, consistency is vital for several reasons:
Pattern recognition: The AI assistants will try to extract patterns from your requirements. Because LLMs lack an internal semantic model of the system, they won’t infer that ‘Client’ and ‘User’ refer to the same entity unless that connection is made explicit. This can lead to generating separate, potentially redundant code structures, data fields, or logic paths, or to a failure to correctly link related functionalities.
Inability to resolve contradictions: AI models struggle with logical contradictions. If one requirement states "Data must be deleted after 30 days," and another related requirement states "User data must be archived indefinitely," the AI may not ask for clarification or determine the correct business rule. It might implement only one rule (often the last one it processed), try to implement both (leading to errors), or fail unpredictably.
Impact on code quality: Consistency in requirements often translates to consistency in the generated code. If requirements consistently use specific naming conventions for features or data elements, the AI is more likely to follow those conventions in the generated code (variables, functions, classes). Inconsistent requirements can lead to inconsistently structured and named code. This makes it harder to understand and maintain.
Logical flow: Describing processes or workflows requires a consistent logical sequence. Jumbled or contradictory steps in the requirements can confuse the AI about the intended order of operations.
Testable
We must have an idea about how to test that the requirements are fulfilled. A requirement is testable if there are practical and objective ways to determine whether the implemented solution meets it. Testability is paramount for both human-generated and AI-generated code. Our confidence must primarily come from verifying code behavior. Rigorous testing against clear, testable requirements is the primary mechanism to ensure that the code is reliable and fit for purpose. Testable requirements provide the blueprint for verification. Testability calls for smallness, observability, and controllability. A small requirement here implies that it results in a small unit of code under test. This is where decomposability, simplicity, and modularity become important.
Smaller, well-defined, and simpler units of code with a single responsibility are inherently easier to understand, test comprehensively, and reason about than large, monolithic, and complex components. If an AI generates a massive, tangled function, even if it "works" for the happy path, verifying all its internal logic and edge cases is extremely difficult. You can't be sure what unintended behaviours might lurk within. For smallness, decompose large requirements into smaller, more manageable sub-requirements. Each sub-requirement should ideally describe a single, coherent piece of functionality with its own testable outcomes.
Observability is the ease with which you can determine the internal state of a component and its outputs, based on its inputs. This holds true before, during, and after a test execution. Essentially, can you "see" what the software is doing and what its results are? To test, we need to be able to observe behaviour or state. If the effects of an action are purely internal and not visible, testing is difficult. For observability, we need clear and comprehensive logging, exposing relevant state via getters or status endpoints. We need to return detailed and structured error messages, implement event publishing, or use debuggers effectively. This way we can verify intermediate steps, understand the flow of execution, and diagnose why a test might be failing.
Describe external behavior: Focus on what the system does that can be seen, not how it does it internally (unless the internal "how" impacts an NFR like performance that needs constraint).
Specify outputs: Detail the format, content, and destination of any outputs (UI display, API responses, file generation, database entries, logged messages). Example: Upon successful registration, the system MUST return an HTTP 201 response with a JSON body containing user_id and email.
Define state changes: If a state change is an important outcome, specify how that state can be observed. Example: After order submission, the order status MUST be 'PENDING_PAYMENT' and this status MUST be retrievable via the /orders/{orderId}/status endpoint.
Require logging for key events: Log key state changes and decision points at INFO level. Example: The system MUST log an audit event with event_type='USER_LOGIN_SUCCESS' and user_id upon successful login.
Controllability is the ease with which we can "steer" a component into specific states or conditions. How easily can we provide a component with the necessary inputs (including states of dependencies) to execute a test and isolate it from external factors that are not part of the test? We can achieve this through techniques like dependency injection (DI), designing clear APIs and interfaces, using mock objects or stubs for dependencies, and providing configuration options. This allows us to easily set up specific scenarios, test individual code paths in isolation, and create deterministic tests.
Problems Caused by Poor Controllability
Hardcoded Dependencies
They can force you to test your unit along with its real dependencies. This turns unit tests into slow, potentially unreliable integration tests. You can't easily simulate error conditions from the dependency.
Reliance on Global State
If a component reads or writes to global variables or singletons, it's hard to isolate tests. One test might alter the global state, causing subsequent tests to fail or behave unpredictably. Resetting the global state between tests can be complex.
Lack of Clear Input Mechanisms
If a component's behaviour is triggered by intricate internal state changes or relies on data from opaque sources rather than clear input parameters, it's difficult to force it into the specific state needed for a particular test.
Consequences
Slow tests: Tests that need to set up databases, call real APIs, or wait for real timeouts run slowly, discouraging frequent execution.
Flaky tests: Tests relying on external systems or shared state can fail intermittently due to factors outside the code under test (e.g., network issues, API rate limits).
Difficult to write and maintain: Complex setups and non-deterministic behaviour lead to tests that are hard to write, understand, and debug when they fail. The "Arrange" phase of a test becomes a huge effort.
A small test sketch that ties observability and controllability together appears just before the wrap-up below.
Traceable
Traceability in software requirements means being able to follow the life of a requirement both forwards and backwards. You should be able to link a specific requirement to the design elements, code modules, and test cases that implement and verify it. Conversely, looking at a piece of code or a test case, you should be able to trace it back to the requirement(s) it fulfills. Traceability tells us why that code exists and what business rule or functionality it's supposed to implement. Without this link, code can quickly become opaque "magic" that developers are hesitant to touch.
Debugging and root cause analysis: When AI-generated code exhibits a bug or unexpected behavior, tracing it back to the source requirement is often the first step. Was the requirement flawed? Did the AI misinterpret a correct requirement? Traceability guides the debugging process.
Maintenance and change impact analysis: Requirements inevitably change. If REQ-123 is updated, traceability allows you to quickly identify the specific code sections (potentially AI-generated) and tests associated with REQ-123 that will need review, modification, or regeneration. Without traceability, finding all affected code sections becomes a time-consuming and error-prone manual search.
Verification and coverage: Traceability helps verify that our requirements have code and tests. You can check if any requirements have been missed or if any generated code doesn't trace back to a valid requirement.
Viable
A requirement is "Viable" if it can realistically be implemented within the project's given constraints. These constraints typically include available time, budget, personnel skills, existing technology stack, architectural patterns, security policies, industry regulations, performance targets, and the deployment environment.
Need for explicit constraints: To ensure that AI assistants generate viable code, the requirements must explicitly state the relevant constraints. These act as guardrails, guiding the AI towards solutions that are not just technically possible but also practical and appropriate for your specific project context. Perhaps your company has standardized on the FastAPI framework for Python microservices. Maybe direct database access from certain services is forbidden by the security policy. Maybe your deployment target is a low-memory container environment, or maybe a specific external (paid) API suggested by the AI exceeds the project budget.
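To make the testability discussion concrete, here is a minimal sketch (not from the original article) of how a requirement written for observability and controllability becomes directly verifiable by a test. The names register_user and FakeUserRepository, and the registration requirement itself, are illustrative assumptions; the point is that the dependency is injected and the outcome is observable.
Python
# Illustrative sketch: a controllable dependency plus an observable outcome
# make the registration requirement directly testable (e.g., with pytest).
from dataclasses import dataclass, field

@dataclass
class FakeUserRepository:
    """In-memory stand-in injected instead of a real database (controllability)."""
    users: dict = field(default_factory=dict)

    def save(self, email: str) -> str:
        user_id = f"user-{len(self.users) + 1}"
        self.users[email] = user_id
        return user_id

def register_user(email: str, repo) -> dict:
    """Returns an observable result shaped like the stated requirement:
    status 201 with a body containing user_id and email."""
    user_id = repo.save(email)
    return {"status": 201, "body": {"user_id": user_id, "email": email}}

def test_successful_registration_returns_201_with_user_id_and_email():
    repo = FakeUserRepository()          # controllable, isolated dependency
    response = register_user("new.user@example.com", repo)
    assert response["status"] == 201     # observable outcome from the requirement
    assert "user_id" in response["body"]
    assert response["body"]["email"] == "new.user@example.com"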
Wrapping Up
When writing requirements for AI-generated code, the fundamental principles remain, but the emphasis shifts towards:
Extreme explicitness: Cover edge cases, errors, and NFRs meticulously.
Unambiguity and precision: Use clear, machine-interpretable language.
Constraint definition: Guide the AI by specifying architecture, tech stack, patterns, and NFRs.
Testability: Define clear, measurable acceptance criteria. Smallness, observability, and controllability are important.
Structured input: Format requirements for optimal AI consumption.
In essence, writing requirements for AI code generation means being more deliberate, detailed, and directive. It's about providing the AI with a high-fidelity blueprint that minimizes guesswork. A blueprint that maximizes the probability of generating correct, secure, efficient, and maintainable code. Code that aligns with project goals and technical standards. This involves amplifying the importance of qualities like completeness, unambiguity, and testability. It also involves evolving the interpretation of understandability to suit an AI "developer." Currently, it seems that carefully crafting software requirements can also reduce hallucinations in AI-generated code. However, requirements alone are not expected to eliminate hallucinations entirely. The quality and structure of the input prompt (including the requirements) significantly influence how prone the AI is to hallucinate details. Hallucinations also stem from model limitations, training data artifacts, and prompt-context boundaries. Such factors are beyond the scope of this article.

By Stelios Manioudakis, PhD DZone Core CORE
Kung Fu Code: Master Shifu Teaches Strategy Pattern to Po – The Functional Way
Kung Fu Code: Master Shifu Teaches Strategy Pattern to Po – The Functional Way

"There is no good or bad code. But how you write it… that makes all the difference.” - Master Shifu The sun had just touched the tips of the Valley of Peace. Birds chirped, the wind whispered tales of warriors, and Po—the Dragon Warrior—was busy trying to write some Java code. Yes, you read that right. Master Shifu stood behind him, watching, amused and concerned. Po (scratching his head): “Master Shifu, I’m trying to make this app where each Kung Fu move is chosen based on the enemy. But the code is… bloated. Classes everywhere. If OOP was noodles, this is a full buffet.” Shifu (calmly sipping tea): “Ah, the classic Strategy Pattern. But there’s a better way, Po… a functional way. Let me show you the path.” The Traditional (OOP) Strategy Pattern – Heavy Like Po’s Lunch Po wants to choose a fighting strategy based on his opponent. Java // Strategy Interface interface FightStrategy { void fight(); } // Concrete Strategies class TigerFightStrategy implements FightStrategy { public void fight() { System.out.println("Attack with swift tiger strikes!"); } } class MonkeyFightStrategy implements FightStrategy { public void fight() { System.out.println("Use agile monkey flips!"); } } // Context class Warrior { private FightStrategy strategy; public Warrior(FightStrategy strategy) { this.strategy = strategy; } public void fight() { strategy.fight(); } public void setStrategy(FightStrategy strategy) { this.strategy = strategy; } } Usage Java Warrior po = new Warrior(new TigerFightStrategy()); po.fight(); // Output: Attack with swift tiger strikes! po.setStrategy(new MonkeyFightStrategy()); po.fight(); // Output: Use agile monkey flips! Why This Is a Problem (and Why Po Is Annoyed) Po: “So many files, interfaces, boilerplate! All I want is to change moves easily. This feels like trying to meditate with a noodle cart passing by!” Indeed, OOP Strategy pattern works, but it's verbose, rigid, and unnecessarily class-heavy. It violates the spirit of quick Kung Fu adaptability! Enter Functional Programming – The Way of Inner Simplicity Shifu (nodding): “Po, what if I told you… that functions themselves can be passed around like scrolls of wisdom?” Po: “Whoa... like… JScrolls? Shifu: “No, Po. Java lambdas.” In functional programming, functions are first-class citizens. You don’t need classes to wrap behavior. You can pass behavior directly. Higher-Order Functions are functions that take other functions as parameters or return them. Po, In Java 8 onwards, we can do that easily with the help of lambda. Lambda can wrap the functionality and can be passed to another method as a parameter. Strategy Pattern – The Functional Way in Java Java import java.util.function.Consumer; class Warrior { private Consumer<Void> strategy; public Warrior(Consumer<Void> strategy) { this.strategy = strategy; } public void fight() { strategy.accept(null); } public void setStrategy(Consumer<Void> strategy) { this.strategy = strategy; } } But there’s a better, cleaner way with just lambdas and no class at all. 
Java
public class FunctionalStrategy {

    public static void main(String[] args) {
        // Each strategy is just a lambda
        Runnable tigerStyle = () -> System.out.println("Attack with swift tiger strikes!");
        Runnable monkeyStyle = () -> System.out.println("Use agile monkey flips!");
        Runnable pandaStyle = () -> System.out.println("Roll and belly-bounce!");

        // The fighter is a higher-order function executor
        executeStrategy(tigerStyle);
        executeStrategy(monkeyStyle);
        executeStrategy(pandaStyle);
    }

    static void executeStrategy(Runnable strategy) {
        strategy.run();
    }
}
Shifu (with a gentle tone): “Po, in the art of code—as in Kung Fu—not every move needs a name, nor every master a title. In our example, we summoned the ancient scroll of Runnable… a humble interface with but one method—run(). In Java 8, we call it a Functional Interface. Think of it as a silent warrior—it expects no inputs (parameters), demands no rewards (return type), and yet, performs its duty when called. Each fighting style—tiger, monkey, panda—was not wrapped in robes of classes, but flowed freely as lambdas. And then, we had the executeStrategy() method… a higher-order sensei. It does not fight itself, Po. It simply receives the wisdom of a move—a function—and executes it when the time is right. This… is the way of functional composition. You do not command the move—you invite it. You do not create many paths—you simply choose the next step.”
Benefits – As Clear As The Sacred Pool of Tears
No extra interfaces or classes
Easily switch behaviors at runtime
More readable, composable, and flexible
Promotes the power of behavior as data
Real-World Example: Choosing Payment Strategy in an App
Java
Map<String, Runnable> paymentStrategies = Map.of(
    "CARD", () -> System.out.println("Processing via Credit Card"),
    "UPI", () -> System.out.println("Processing via UPI"),
    "CASH", () -> System.out.println("Processing via Cash")
);

String chosen = "UPI";
paymentStrategies.get(chosen).run(); // Output: Processing via UPI
Po: “This is amazing! It’s like picking dumplings from a basket, but each dumpling is a deadly move.”
Shifu: “Exactly. The Strategy was never about the class, Po. It was about choosing the right move at the right moment… effortlessly.”
One move = one lambda. The good part is, this lambda only holds the move details—nothing else. So any warrior can master these moves and use them when needed, without having to rely on some bounded object that wrapped the move inside a bulky, boilerplate class.
Final Words of Wisdom
“The strength of a great developer lies not in how many patterns they know… but in how effortlessly they flow between object thinking and function weaving to craft code that adapts like water, yet strikes like steel.” - Master Shifu, on the Tao of Design Patterns.
Coming Up in the Series
Code of Shadows: Master Shifu and Po Use Functional Java to Solve the Decorator Pattern Mystery
Kung Fu Commands: Shifu Teaches Po the Command Pattern with Java Functional Interfaces

By Shamik Mitra
Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights

PostgreSQL employs sophisticated techniques for data storage and indexing to ensure efficient data management and fast query performance. This guide explores PostgreSQL's mechanisms, showcases practical examples, and includes simulated performance metrics to illustrate the impact of indexing.

Data Storage in PostgreSQL

Table Structure and TOAST (The Oversized-Attribute Storage Technique)

Table structure: PostgreSQL stores table data in a format known as a heap. Each table's heap contains one or more pages (blocks), where each page is typically 8KB in size; this size can be changed only when compiling PostgreSQL from source. Rows exceeding a page size are handled using TOAST, which compresses and stores oversized attributes in secondary storage.

Example: Managing Large Text Data

Consider a documents table:

SQL

CREATE TABLE documents (
    doc_id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT
);

Scenario: Storing a document with 10MB of content.
• Without TOAST: The entire document resides in the table, slowing queries.
• With TOAST: The content is compressed and stored separately, leaving a pointer in the main table.

Expected Performance Improvement

Metric               | Without TOAST | With TOAST
Query Execution Time | ~4.2 seconds  | ~2.1 seconds (50% faster)

TOAST significantly reduces table size, enhancing read and write efficiency.

MVCC (Multi-Version Concurrency Control)

• Consistency with row versions: PostgreSQL uses MVCC to ensure data consistency and support concurrent transactions. Each transaction sees a snapshot of the database, isolating it from others and preventing locks during long queries.
• Transaction management with XIDs: Each row version includes Transaction IDs (XIDs) to indicate when it was created and when it expired. This enables PostgreSQL to manage concurrency and recovery efficiently.

For example, while an inventory item is edited during sales report generation, MVCC ensures the sales report sees the original data while the update operates independently.

Indexing in PostgreSQL

Indexes in PostgreSQL optimize queries by reducing the need for full-table scans. Below are examples showcasing indexing techniques, their use cases, and expected improvements.

B-Tree Index: Default for Range Queries

B-tree indexes are efficient for equality and range queries.

Example: Product Price Filtering

Given a products table:

SQL

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    price NUMERIC
);

Query Without Index

SQL

SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~8.3 seconds (full scan on 1 million rows).

Query With B-Tree Index

SQL

CREATE INDEX idx_price ON products(price);
SELECT * FROM products WHERE price BETWEEN 50 AND 100;

Execution Time: ~0.6 seconds (direct row access).

Performance Improvement

Metric               | Without Index | With Index   | Improvement (%)
Query Execution Time | ~8.3 seconds  | ~0.6 seconds | ~92.8% faster

Hash Index: Fast Equality Searches

Hash indexes are ideal for simple equality searches.

Example: User Email Lookup

Given a users table:

SQL

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT UNIQUE
);

Query Without Index

SQL

SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~4.5 seconds (scans 500,000 rows).

Query With Hash Index

SQL

CREATE INDEX idx_email_hash ON users USING hash(email);
SELECT * FROM users WHERE email = '[email protected]';

Execution Time: ~0.3 seconds.
Performance Improvement

Metric               | Without Index | With Index   | Improvement (%)
Query Execution Time | ~4.5 seconds  | ~0.3 seconds | ~93.3% faster

GiST Index: Handling Spatial Data

GiST indexes are designed for complex data types, such as geometric or spatial queries.

Example: Store Locator

Given a locations table:

SQL

CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name TEXT,
    coordinates GEOMETRY(Point, 4326)
);

Query Without Index

SQL

SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~6.7 seconds.

Query With GiST Index

SQL

CREATE INDEX idx_coordinates_gist ON locations USING gist(coordinates);
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(40.748817, -73.985428), 5000);

Execution Time: ~1.2 seconds.

Performance Improvement

Metric               | Without Index | With Index   | Improvement (%)
Query Execution Time | ~6.7 seconds  | ~1.2 seconds | ~82% faster

GIN Index: Arrays, JSON, and Full-Text Search

GIN indexes optimize composite or multi-value data types, such as arrays or JSON.

Example: Tag Search

Given an articles table:

SQL

CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title TEXT,
    tags TEXT[]
);

Query Without Index

SQL

SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~9.4 seconds.

Query With GIN Index

SQL

CREATE INDEX idx_tags_gin ON articles USING gin(tags);
SELECT * FROM articles WHERE tags @> ARRAY['technology'];

Execution Time: ~0.7 seconds.

Performance Improvement

Metric               | Without Index | With Index   | Improvement (%)
Query Execution Time | ~9.4 seconds  | ~0.7 seconds | ~92.6% faster

BRIN Index: Large Sequential Datasets

BRIN indexes summarize blocks of data, making them suitable for massive, naturally ordered datasets.

Example: Log File Queries

Given a logs table:

SQL

CREATE TABLE logs (
    log_id SERIAL PRIMARY KEY,
    log_time TIMESTAMP,
    message TEXT
);

Query Without Index

SQL

SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~45 seconds.

Query With BRIN Index

SQL

CREATE INDEX idx_log_time_brin ON logs USING brin(log_time);
SELECT * FROM logs WHERE log_time BETWEEN '2023-01-01' AND '2023-01-31';

Execution Time: ~3.2 seconds.

Performance Improvement

Metric               | Without Index | With Index   | Improvement (%)
Query Execution Time | ~45 seconds   | ~3.2 seconds | ~92.9% faster

Performance Considerations

• Impact on writes: Indexes can slow down INSERT, UPDATE, and DELETE operations because every write must also update the associated indexes. Balancing the number and type of indexes is crucial. For example, an orders table with multiple indexes may see slower insert speeds and require careful optimization.
• Index maintenance: Over time, indexes can become bloated and degrade in performance. Regular maintenance with commands like REINDEX can restore efficiency:

SQL

REINDEX INDEX idx_salary;

• Using execution plans: Analyze queries with EXPLAIN to understand index usage and identify performance bottlenecks (a short follow-up sketch appears after the conclusion):

SQL

EXPLAIN SELECT * FROM employees WHERE salary BETWEEN 50000 AND 70000;

Conclusion

PostgreSQL employs effective storage and indexing strategies, such as the TOAST mechanism for handling oversized data and various specialized index types, to significantly enhance query performance. This guide provides examples and performance metrics that showcase the tangible benefits of using indexes in various scenarios. By applying these techniques, database engineers can optimize both read and write operations, leading to robust and scalable database systems.
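As a short practical follow-up to the maintenance and EXPLAIN tips above: before dropping or rebuilding an index, it helps to confirm whether the planner actually uses it. The sketch below relies on standard PostgreSQL catalog views and reuses the illustrative products table and idx_price index from this article; treat it as a starting point rather than a prescription.

SQL

-- How often is each index on the products table actually scanned, and how big is it?
SELECT indexrelname                                 AS index_name,
       idx_scan                                     AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'products';

-- Confirm the planner chooses idx_price for the range query, with measured timings.
EXPLAIN ANALYZE
SELECT * FROM products WHERE price BETWEEN 50 AND 100;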

By arvind toorpu DZone Core CORE
TFVC to Git Migration: Step-by-Step Guide for Modern DevOps Teams

The Challenge

Our organization has maintained a large monolithic codebase in Team Foundation Version Control (TFVC) for over a decade. As development velocity has increased and teams have moved toward agile methodologies, microservices, and cloud-native architectures, the limitations of TFVC have become increasingly apparent. The centralized version control model hinders collaboration, branching, and automation, and our existing classic build and release pipelines in TFS are tightly coupled with legacy tooling that no longer aligns with modern DevOps practices.

We have observed significant bottlenecks in:

• Managing concurrent feature development across teams
• Implementing flexible CI/CD workflows
• Integrating with cloud-based infrastructure and tools
• Adopting containerized, microservice-oriented deployments

To enable a scalable, collaborative, and DevOps-friendly environment, we must migrate our TFVC repositories to Git, which is better suited for distributed development, supports lightweight branching, and integrates seamlessly with modern CI/CD pipelines and platforms like Azure DevOps, GitHub, and Kubernetes.

Overview

While TFVC has served enterprises for years, its centralized nature and complex branching model make it less suitable for modern development paradigms. In contrast, Git, a distributed version control system, empowers teams to move faster, collaborate more effectively, and align with industry-standard CI/CD practices. In this blog, we will walk through:

• Why we should migrate from TFVC to Git
• Key challenges during migration
• A step-by-step guide using Azure DevOps
• An example use case
• Post-migration best practices

Why Migrate from TFVC to Git?

1. Align With Modern Tooling

Git integrates seamlessly with tools like GitHub, GitLab, Bitbucket, Azure DevOps Repos, Kubernetes, Docker, and more. TFVC is limited mostly to older Visual Studio versions and TFS.

2. Distributed Workflows

Git allows every developer to work independently with a local copy of the entire codebase, enabling offline work, faster operations, and streamlined collaboration.

3. Agile and DevOps Support

Git's branching and merging strategies suit agile and trunk-based development better than TFVC's heavyweight model.

4. Cloud-Native and Microservices Ready

Microservices require isolated, independently deployable repositories. Git supports this easily with its lightweight branching, tagging, and submodule capabilities.

Challenges We May Face

While Git offers substantial benefits, the migration is not trivial, especially in large enterprises:

Challenge             | Description
Repository Size       | TFVC projects can be large, with extensive history
History Preservation  | We may want to retain commit history, comments, and metadata
User Mapping          | Historical TFVC users must be mapped to Git commit authors
Tool Familiarity      | Developers may need Git training
Pipeline Dependencies | Existing TFS build/release pipelines may break post-migration

Step-by-Step Migration from TFVC to Git (Using Azure DevOps)

Azure DevOps provides native tools to facilitate TFVC-to-Git migrations. Let's walk through a real-world example.

Scenario

A legacy monolithic application is stored in a TFVC repository in Azure DevOps Server 2019. The organization wants to modernize development by migrating this codebase to Git and starting to use YAML pipelines.

Step 1: Prepare the Environment

You will need:

• Git-TFS
• .NET Framework 4.7.2+
• Git

Step 2: Install Git-TFS

Git-TFS is a .NET tool that allows you to clone a TFVC repository and convert it into a Git repository.
Shell

choco install gittfs

Or manually download from this link.

Step 3: Clone the TFVC Repository With Git History

Now we will create a Git repository by fetching history from TFVC:

Shell

git tfs clone http://your-tfs-url:8080/tfs/DefaultCollection $/YourProject/MainBranch --branches=all

Notes:

• --branches=all will attempt to migrate TFVC branches to Git branches.
• $ is the root symbol for TFVC paths.

We can also limit the history to a certain number of changesets for performance:

Shell

git tfs clone http://your-tfs-url:8080/tfs/DefaultCollection $/YourProject/MainBranch --changeset=10000

Step 4: Push to a Git Repository in Azure DevOps

Create a new Git repo in Azure DevOps:

• Go to Project > Repos > New Repository.
• Select Git and name it appropriately.

Then push your migrated Git repo:

Shell

cd <<YOUR-GIT-REPO>>
git remote add origin https://dev.azure.com/your-org/YourProject/_git/YourProject-Git
git push -u origin --all

Step 5: Validate and Set Up CI/CD

• Ensure all branches and tags are present.
• Recreate pipelines using Azure Pipelines (YAML) or any Git-based CI/CD system.
• Define branch policies, pull request templates, and protection rules.

Example Use Case

Let's assume we are migrating a healthcare management system developed on the .NET Framework and hosted in TFVC.

Before Migration

• A single monolithic TFVC repository.
• Classic release pipelines in TFS.
• Developers struggle with branching and rollback.

After Migration

• A Git repository with main, feature/*, and release/* branches.
• Developers create pull requests for features and hotfixes.
• Azure YAML pipelines automate builds and deployments.

Sample Git Branching Strategy

Shell

main
│
├── feature/add-enrollment-integration
├── feature/optimize-db-calls
├── release/v1.0

Sample Azure DevOps (YAML) CI/CD Pipeline

YAML

trigger:
  branches:
    include:
      - main
      - release/*

pool:
  vmImage: 'windows-latest'

steps:
  - task: UseDotNet@2
    inputs:
      packageType: 'sdk'
      version: '6.x'
  - script: dotnet build
  - script: dotnet test
  - script: dotnet publish -c Release

Best Practices Post-Migration

• Train the team on Git commands and workflows.
• Automate branching policies and PR reviews.
• Archive or decommission TFVC repositories to avoid confusion.
• Use semantic versioning, tagging, and GitHub flow based on team size.
• Monitor Git performance and use tools like Git Large File Storage (LFS) if needed.

Understanding Branch Migration With Git-TFS

When we run a basic git tfs clone command, Git-TFS only clones the main branch (trunk) and its history. To migrate all branches, we must add the --branches=all option:

Shell

git tfs clone http://tfs-url:8080/tfs/Collection $/YourProject/MainBranch --branches=all

This:

• Identifies TFVC branches as defined in the TFS repository
• Attempts to map them into Git branches
• Tries to preserve the merge relationships, if any

Migrate Selective TFVC Branches to Git

1. Identify TFVC Branch Paths

Find the full TFVC paths for the branches we care about:

Shell

$/YourProject/Main
$/YourProject/Dev
$/YourProject/Release/1.0
$/YourProject/Release/2.0

We can use Visual Studio or the tf branches command to list these.

2. Clone the Main Branch First

Shell

git tfs clone http://tfs-server:8080/tfs/DefaultCollection $/YourProject/Main --with-branches --branches=none --debug

• This creates a Git repository tracking the Main TFVC branch and avoids pulling in unwanted branches.
• --branches=none ensures only this branch is cloned (avoids automatic detection of others).
• --with-branches initializes Git-TFS to track additional branches later.
3. Add Additional Branches

Add the additional desired branches using:

Shell

cd YourProject
git tfs branch -i $/YourProject/Dev
git tfs branch -i $/YourProject/Release/1.0

4. Fetch All the Branches

Now download the full changeset history for all the added branches:

Shell

git tfs fetch

5. Verify Git Branches

List the available branches in the local Git repo:

Shell

git branch -a

Expected output:

Shell

* main
  remotes/tfs/Dev
  remotes/tfs/Release-1.0

6. Create Local Branches (Optional)

If we want to work locally on these branches:

Shell

git checkout -b dev remotes/tfs/Dev
git checkout -b release/1.0 remotes/tfs/Release-1.0

7. Commit (Only If Modifications Are Made Locally)

If any manual changes are made to the working directory, don't forget to commit:

Shell

git add .
git commit -m "Post-migration cleanup or updates"

8. Push to the Git Remote (e.g., Azure DevOps or GitHub)

First, add your remote Git repository:

Shell

git remote add origin https://dev.azure.com/your-org/YourProject/_git/YourProject-Git

Then push your branches (a quick pre-push sanity check is sketched after the conclusion below):

Shell

git push -u origin main
git push -u origin dev
git push -u origin release/1.0

Other Methods

While Git-TFS is one common approach, there are multiple ways to migrate from TFVC to Git, each with trade-offs depending on your goals: whether you need full history, multiple branches, scalability for large repos, or simplicity for new development. Below are the main options.

1. Shallow Migration (No History, Clean Slate)

This is best for:

• Teams that want a fresh start in Git
• Rewriting the architecture toward microservices
• Repositories with bloated or irrelevant TFVC history

Steps:

• Create a Git repo.
• Export the latest code snapshot from TFVC (e.g., using tf get).
• Add, commit, and push to Git.

Shell

tf get $/YourProject/Main
git init
git add .
git commit -m "Initial commit from TFVC snapshot"
git remote add origin <GitRepoURL>
git push -u origin main

Challenges:

• We lose the historical commit history.
• We can't track file-level changes made before the migration.

2. Manual Branch-by-Branch Migration

This is best for:

• Large monoliths being broken down into microservices
• A controlled, phased migration

Steps:

• Identify key branches (e.g., main, dev, release).
• Export them one by one using git-tfs clone.
• Push each to a separate Git repo or branch.

Challenges:

• Requires effort to maintain consistency across branches
• Risk of missing context between branches

Conclusion

Migrating from TFVC to Git is not just a source control update — it's a strategic step toward modernization. Git enables the speed, agility, and scalability in software development that centralized systems like TFVC cannot match. By adopting Git, you not only align with current development trends but also lay the foundation for DevOps, microservices, and scalable delivery pipelines.

Whether you're handling a single project or thousands of TFVC branches, start small, validate your process, and iterate. With the right tooling and planning, the transition to Git can be smooth and incredibly rewarding.
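Before pushing the converted repository (Step 4 above, or step 8 in the selective-branch flow), a quick sanity check helps catch the history-preservation and user-mapping issues listed in the challenges table. This is a minimal sketch using plain Git commands; the repository path matches the illustrative YourProject example used throughout this guide.

Shell

cd YourProject

# Rough check that the converted history looks complete: commit count and earliest commits.
git log --oneline | wc -l
git log --reverse --date=short --format='%ad %s' | head -5

# Review the author identities carried over from TFVC so they can be mapped or corrected.
git log --format='%an <%ae>' | sort -u

# Estimate repository size before pushing; very large objects may justify Git LFS.
git count-objects -vH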

By Thiyagarajan Mani Chettier

