DZone

The Latest Software Design and Architecture Topics

TPU vs GPU: Real-World Performance Testing for LLM Training on Google Cloud
H100 GPUs are best for flexibility, fast iteration, and custom CUDA work. TPU v5p wins on GCP for large-scale LLM training with better cost efficiency and scaling.
January 30, 2026
by Jubin Abhishek Soni (DZone Core)
· 3,487 Views
Designing Irreversible Security Release at Hyper-Scale: Lessons Learned From Things You Can’t Undo
Engineers rely on rollback to keep systems stable—but sometimes it isn’t possible. This article explores irreversible changes and why baking and testing matter.
January 30, 2026
by Arun Anbumani
· 1,010 Views
Reliable AI Agent Architecture for Mobile: Timeouts, Retries, and Idempotent Tool Calls
Ship reliable mobile agents: timeout everything, retry by error class, persist steps across restarts, and require idempotency keys for write tools.
January 29, 2026
by Mohan Sankaran
· 1,838 Views · 6 Likes
5 Technical Strategies for Scaling SaaS Applications
In this article, I want to take a closer look at the pitfalls of popular SaaS scaling strategies, drawing on my own experience, and share the lessons learned.
January 29, 2026
by Mykhailo Kopyl
· 1,850 Views · 1 Like
Cognitive Load-Aware DevOps: Improving SRE Reliability
SRE reliability depends on human cognition as much as infrastructure. Reducing cognitive load is key to resilient systems.
January 29, 2026
by Oreoluwa Omoike
· 2,191 Views
Automating AWS Glue Infra and Code Reviews With RAG and Amazon Bedrock
Automate AWS Glue reviews with infra-first RAG governance, enforcing enterprise standards, reducing manual work, and shifting checks left.
January 29, 2026
by pooja chhabra
· 1,835 Views
Cloud Systems Drift: What Happens When Exceptions Become the System
Cloud systems drift when exceptions accumulate, and decisions lose connection to original objectives. Clear requirements and early security design prevent sprawl.
January 29, 2026
by Kevin Maki
· 3,938 Views
Why Terraform Pipeline Failures Still Take 30 Minutes — and How We Cut Them to 2
An AI system cuts Terraform pipeline failure resolution from 30 minutes to two with automated analysis and human-approved fixes.
January 29, 2026
by Shamsher Khan (DZone Core)
· 2,018 Views
2 Hidden Bottlenecks in Large-Scale Azure Migrations
Moving a massive on-premises system to the cloud isn't just about copying VMs. Here is how to overcome the two hidden performance killers.
January 28, 2026
by Dippu Kumar Singh
· 2,104 Views
AI-Powered DevSecOps: Automating Security with Machine Learning Tools
AI-driven development is outpacing security teams. This piece examines where AI-powered security tools actually help, where they fail, and how teams can use them responsibly.
January 28, 2026
by Igboanugo David Ugochukwu (DZone Core)
· 2,080 Views · 1 Like
From Monolith to Modular Monolith: A Smarter Alternative to Microservices
Microservices introduce distributed-systems complexity most teams underestimate: failures, coordination drag, observability sprawl, and ballooning costs.
January 28, 2026
by David Iyanu Jonathan
· 3,016 Views · 3 Likes
Zero Trust for Agents: Implementing Context Lineage in the Enterprise Data Mesh
Combining an agent's identity with its audit history enforces zero-trust access based on both who the agent is and how it has behaved, making agent access more secure and reliable.
January 28, 2026
by Anshul Pathak
· 1,963 Views · 1 Like
The Serverless Ceiling: Designing Write-Heavy Backends With Aurora Limitless
Break the single-writer bottleneck by aligning AWS Lambda, RDS Proxy, and the Aurora Limitless router into a cohesive architecture.
January 28, 2026
by Nabin Debnath
· 3,845 Views · 2 Likes
GraphQL vs REST — Which Is Better?
Which API design paradigm should you choose for your application: GraphQL or REST? This article looks at the pros and cons of each.
January 27, 2026
by Ananth Iyer
· 6,780 Views · 7 Likes
Cost-Aware GenAI Architecture: Caching, Model Routing, and Token Budgets That Don’t Explode
Keep GenAI cheap and fast: cache aggressively, route models by confidence, cap tokens and tools, compress context, and monitor cost per successful outcome.
January 27, 2026
by Mohan Sankaran
· 2,855 Views · 5 Likes
Designing Mathematical Software for Humans
Mathematical software should mirror how people reason, not just compute. Design APIs that express ideas clearly and make exploration intuitive.
January 27, 2026
by Dhyey Mavani
· 2,217 Views
Versioning Lies: A Date Contract Is a Promise That Never Breaks
Switch URI-based API versioning to date-based versions to ease operations, guarantee immutability, and separate core logic from API responses.
January 27, 2026
by Akash Rasal
· 1,117 Views · 1 Like
Edge-First AI Architecture: Designing Low-Latency, Offline-Capable Intelligence
Most Android AI features stall on flaky networks; an edge-first architecture runs key models on-device, with cloud used only as an optional upgrade.
January 27, 2026
by Mohan Sankaran
· 2,483 Views · 5 Likes
An Introduction to the Four Pillars of Observability
This blog introduces the four pillars of observability, AWS and Azure cloud-native services, and ROI to aid architects and engineers in their quest for system clarity.
January 27, 2026
by Akash Lomas
· 1,490 Views
Building an AI Agent Traffic Management Platform: APISIX AI Gateway in Practice
A global appliance leader uses APISIX AI Gateway for unified LLM traffic control, hybrid cloud routing, and multi-tenant isolation.
January 26, 2026
by Yilia Lin
· 1,231 Views