Shamsher Khan

CORE

Sr. Engineer at GlobalLogic Inc - A Hitachi Company

Company website: https://globallogic.com

Tampa, US

Joined Sep 2025

https://opscart.com

About

Cloud & DevOps Engineer with expertise in Kubernetes, security, and AI-driven automation. IEEE Senior Member with a focus on delivering scalable architectures and advancing cloud-native engineering practices through research and hands-on contribution to the community.

Open Source Projects

opscart-k8s-watcher

Jan 2026 - Current

Kubernetes operational triage dashboard. Surfaces CrashLoops, security gaps, orphaned resources, and cost waste — tells you what to fix first. Read-only, no agents, no cloud credentials.

Docker & Container Engineering

Aug 2025 - Current

A hands‑on, lab‑based toolkit for securing containerised environments. Progress from core audit techniques to advanced AI‑enabled runtime security, with clear examples, scripts, and best practices.

Stats

Reputation:	1134
Pageviews:	50.4K
Articles:	16
Comments:	6

Articles

From Bash Script to Operational Triage: What Eight Months of Kubernetes Debugging Taught Me

Finding Kubernetes failures is easy. Knowing where to start is the hard part. Here's what eight months of building taught me.

July 9, 2026

· 2,097 Views

Building Production-Safe Agentic Remediation With Docker MCP Gateway: Lessons From 43% to 100% Accuracy

We built an AI Docker remediation system on MCP Gateway. First version: 43% correct. After 9 engineering fixes: 100%. Here's what changed.

June 29, 2026

· 2,280 Views

Your AI Coding Agent Can't Steal What It Never Had: The Docker Sandbox Isolation Story

Docker Sandbox runs AI agents in microVMs. The API key never enters the sandbox — the host proxy authenticates on the agent's behalf.

June 19, 2026

· 2,298 Views · 1 Like

Docker Hardened Images Are Free Now — Here's What You Still Need to Build

Docker Hardened Images solve the CVE problem. But CVEs aren't why containers fail in production — governance gaps are. Here's the trust architecture that closes them.

May 27, 2026

· 4,482 Views

The Pod Prometheus Never Saw: Kubernetes' Sampling Blind Spot

Prometheus sampling gaps are irreducible: reducing the scrape interval just moves the threshold. The Kubernetes watch API eliminates the gap entirely.

April 23, 2026

· 2,349 Views · 1 Like

Docker Secrets Management: From Development to Production

Why environment variables leak, how Docker Swarm secrets work, when to use HashiCorp Vault, and building a layered approach to secrets in production containers.

April 7, 2026

· 3,248 Views · 1 Like

When Kubernetes Says "All Green" But Your System Is Already Failing

Learn how standard cluster observability misses the failure signals that matter most during real incidents, outages, and postmortems.

March 26, 2026

· 3,322 Views

Hands-On With Kubernetes 1.35

Tested K8s 1.35's four key features on an Azure VM: zero-downtime pod resizing, gang scheduling, structured auth, and node capabilities. All scripts and configs on GitHub.

March 6, 2026

· 3,154 Views · 1 Like

When Kubernetes Forgets: The 90-Second Evidence Gap

Kubernetes heals too fast, losing diagnostic context. Engineers reconstruct incidents manually. Time-bounded queries, correlation, and intent tracking preserve evidence.

February 18, 2026

· 2,507 Views · 2 Likes

Why Terraform Pipeline Failures Still Take 30 Minutes — and How We Cut Them to 2

AI system cuts Terraform pipeline failure resolution from 30 minutes to two with automated analysis and human-approved fixes.

January 29, 2026

· 2,061 Views

Docker Runtime Escape: Why Mounting docker.sock Is Worse Than Running Privileged Containers

Tested mounting docker.sock in a container. Five minutes later: full host root access, all secrets stolen, backdoors installed. Here's how.

January 23, 2026

· 1,741 Views · 2 Likes

Advanced Docker Security: From Supply Chain Transparency to Network Defense

A practical guide to implementing SBOM generation and multi-tier network security in containerized environments. Includes real-world examples and CI/CD integration.

December 11, 2025

· 2,252 Views · 7 Likes

How I Cut Kubernetes Debugging Time by 80% With One Bash Script

Built a bash script that analyzes Kubernetes clusters in 60 seconds, generating HTML/JSON/Markdown reports. Saved 70 minutes daily across 8 clusters.

November 20, 2025

· 5,707 Views · 4 Likes

From Agent AI to Agentic AI: Building Self-Healing Kubernetes Clusters That Learn

Evolving from reactive Agent AI to autonomous Agentic AI: a self-healing Kubernetes system that learns from fixes and applies patterns automatically.

November 17, 2025

· 3,419 Views · 1 Like

Docker Security: 6 Practical Labs From Audit to AI Protection

Master Docker security with six practical labs that take you from basic configuration audits to advanced AI workload protection.

November 10, 2025

· 5,079 Views · 6 Likes

AI-Assisted Kubernetes Diagnostics: A Practical Implementation

Proof-of-concept tool using GPT-4 to detect failing Kubernetes pods, analyze logs and events, and suggest fixes with human approval for common issues.

October 10, 2025

· 4,367 Views · 5 Likes

Comments

Mastering Azure Kubernetes Service: The Ultimate Guide to Scaling, Security, and Cost Optimization

Apr 23, 2026 · Jubin Abhishek Soni

Thanks for the clarification—that’s really helpful. Makes sense that the core behavior is similar since the underlying components are the same.

Mastering Azure Kubernetes Service: The Ultimate Guide to Scaling, Security, and Cost Optimization

Apr 07, 2026 · Jubin Abhishek Soni

Great article—really appreciate the clear breakdown of AKS capabilities, especially around scalability.
I’m curious, how does AKS autoscaling (cluster autoscaler + HPA) typically behave under sudden traffic spikes compared to self-managed Kubernetes clusters?

Hands-On With Kubernetes 1.35

Mar 10, 2026 · Shamsher Khan

I appreciate that! Stay tuned for more deep dives like this.

The Slow/Fast Call Orchestration: Parallelizing for Perception

Nov 24, 2025 · VIVEK KATARYA

Great insights, Vivek!

AI-Assisted Kubernetes Diagnostics: A Practical Implementation

Oct 21, 2025 · Shamsher Khan

The tool uses kubectl get pods to detect unhealthy pods, including probe failures. No OpenAI needed for that.

OpenAI is used for diagnosis. When a pod fails, the tool sends probe configs, logs, and events to GPT-4 to determine why it's failing.

For probe failures: kubectl shows the probe failed, but GPT-4 analyzes whether it's slow startup (increase initialDelaySeconds), wrong endpoint, timeout settings, or an actual app problem.

Detection = kubectl

Diagnosis = GPT-4

You can skip the LLM and read kubectl output yourself - that's traditional troubleshooting. The LLM reads the same data and suggests fixes.
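
The detection/diagnosis split described above could look roughly like the sketch below. It is not the tool's actual code: the function names, the fields kept in each finding, the prompt wording, and the `ask_llm` callable are all assumptions made for illustration. Detection only parses the output of `kubectl get pods -o json`; diagnosis hands the same evidence to whatever model client you pass in.

```python
from __future__ import annotations

import json
import subprocess
from typing import Callable


def list_pods_json(namespace: str = "default") -> dict:
    """Fetch the raw pod list with kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def find_unhealthy(pod_list: dict) -> list[dict]:
    """Detection: flag containers that are waiting or not ready. No LLM involved."""
    findings = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue  # completed jobs are not failures
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting or not cs.get("ready", False):
                findings.append({
                    "pod": pod["metadata"]["name"],
                    "container": cs["name"],
                    "reason": (waiting or {}).get("reason", "NotReady"),
                    "restarts": cs.get("restartCount", 0),
                })
    return findings


def build_prompt(finding: dict, probe_config: str, logs: str, events: str) -> str:
    """Diagnosis input: bundle the evidence an engineer would read by hand."""
    return (
        f"Pod {finding['pod']} (container {finding['container']}) is unhealthy: "
        f"{finding['reason']}, {finding['restarts']} restarts.\n"
        f"Probe config:\n{probe_config}\n\nLogs:\n{logs}\n\nEvents:\n{events}\n\n"
        "Is this slow startup, a wrong endpoint, a timeout setting, "
        "or an application problem? Suggest a fix."
    )


def diagnose(finding: dict, probe_config: str, logs: str, events: str,
             ask_llm: Callable[[str], str]) -> str:
    """Diagnosis: ask_llm is a hypothetical wrapper around your model client."""
    return ask_llm(build_prompt(finding, probe_config, logs, events))
```

Keeping `find_unhealthy` free of any model call means detection still works with the LLM switched off, which matches the point that the LLM only reads the same data a person would.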

AI-Assisted Kubernetes Diagnostics: A Practical Implementation

Oct 13, 2025 · Shamsher Khan

Thanks for reading. I'm curious how others use AI in Kubernetes diagnostics.