Platform Engineering Golden Paths: Stop Building Developer Portals, Start Shipping Code

Platform engineering is backward: 80% portal building, 20% path paving. Flip it. Golden paths reach 95% adoption by making the right thing the easiest.

Dinesh Elumalai

Jan. 08, 26 · Tutorial

Here’s the uncomfortable truth: if your platform team is spending 80% of its time building portals and only 20% paving paths, you’re doing platform engineering backward. The revolution isn’t about prettier UIs — it’s about invisible automation that makes the right thing the easiest thing.

The Portal Problem Nobody Talks About

Platform teams are solving the wrong problem. They’re building museums of infrastructure when developers need highways to production. I’ve seen this pattern repeat at companies ranging from scrappy Series A startups to multinational corporations: hire a platform team, mandate Backstage or Humanitec, spend six months integrating everything, launch with fanfare — and then watch adoption plateau at 30% while developers continue cowboy-coding in production.

The issue isn’t the tools — Backstage is actually pretty good. The problem is thinking that a portal is the platform. It’s like believing that building a fancy airport terminal will make planes fly faster. The terminal is nice, but the real value is in the air traffic control, the runways, and the flight paths that get passengers from New York to London safely and efficiently.

What Golden Paths Actually Look Like

Golden paths aren’t documentation. They’re not templates. They’re pre-paved highways where developers can merge onto production traffic at full speed without thinking about infrastructure details. When I talk about golden paths, I’m talking about making deployment so boring and automated that it becomes invisible.

At one of the companies I worked with, we replaced a 47-page deployment wiki (which was always out of date) with a single command: make deploy. That Makefile called a Terraform module that handled VPC setup, security groups, load balancers, auto-scaling, monitoring dashboards, log aggregation, secret management, and deployment — all templated with sensible defaults that teams could override if needed.

The result? Deployment time dropped from 3.5 hours to 12 minutes. More importantly, the number of production incidents caused by misconfiguration dropped by 73% because the golden path encoded our security and reliability best practices. Developers didn’t have to remember to enable encryption or configure health checks — it happened automatically.

The Golden Path Principle: Make the secure, observable, scalable option the path of least resistance. If developers have to read documentation to do the right thing, you’ve already lost.

The Three Pillars of Effective Golden Paths

1. Automation Over Documentation

Every time you write a wiki page explaining how to deploy something, you’re admitting that your platform isn’t automated enough. Documentation rots the moment you publish it. Code doesn’t (well, it does — but at least you can test it).

Here’s what I mean in practice. Instead of documenting “How to Create a New Microservice,” create a Terraform module that generates the entire stack:

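HCL

# Excerpt: the IAM roles, log group, SSM parameter, security group, subnets,
# ECS service, ECR repository, and the aws_region / aws_caller_identity data
# sources referenced below are assumed to be defined elsewhere in the module.
variable "service_name" {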
  description = "Name of the microservice"
  type        = string
}

variable "team" {
  description = "Owning team for tagging and access control"
  type        = string
}

variable "runtime" {
  description = "Runtime environment: nodejs, python, go"
  type        = string
  default     = "nodejs"
}

# Opinionated defaults that encode best practices
locals {
  common_tags = {
    ManagedBy   = "Platform-Engineering"
    Team        = var.team
    Service     = var.service_name
    Environment = terraform.workspace
  }
  
  # Security defaults
  enable_encryption    = true
  enable_audit_logging = true
  enable_waf          = terraform.workspace == "prod"
  
  # Observability defaults
  metrics_retention_days = 30
  log_retention_days     = 90
  enable_tracing        = true
}

# ECS Task Definition with sensible defaults
resource "aws_ecs_task_definition" "service" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode            = "awsvpc"
  cpu                     = "256"
  memory                  = "512"
  execution_role_arn      = aws_iam_role.execution.arn
  task_role_arn           = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name  = var.service_name
    image = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${data.aws_region.current.name}.amazonaws.com/${var.service_name}:latest"
    
    portMappings = [{
      containerPort = 8080
      protocol      = "tcp"
    }]
    
    # Automatic logging to CloudWatch
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.service.name
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = var.service_name
      }
    }
    
    # Health check defaults (assumes curl is installed in the container image)
    healthCheck = {
      command     = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
    
    # Environment variables from Parameter Store
    secrets = [{
      name      = "DATABASE_URL"
      valueFrom = aws_ssm_parameter.db_url.arn
    }]
  }])

  tags = local.common_tags
}

# Application Load Balancer (the HTTPS listener and certificate are not shown)
resource "aws_lb" "service" {
  name               = "${var.service_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets           = data.aws_subnets.public.ids

  enable_deletion_protection = terraform.workspace == "prod"
  enable_http2              = true
  
  tags = local.common_tags
}

# CloudWatch Dashboard automatically created
resource "aws_cloudwatch_dashboard" "service" {
  dashboard_name = "${var.service_name}-${terraform.workspace}"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
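          # ECS service metrics are published per cluster and service, so both
          # dimensions are required. "platform-services" is the cluster name
          # used by the deployment workflow.
          metrics = [
            ["AWS/ECS", "CPUUtilization", "ClusterName", "platform-services", "ServiceName", aws_ecs_service.service.name],
            [".", "MemoryUtilization", ".", ".", ".", "."]
          ]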
          period = 300
          stat   = "Average"
          region = data.aws_region.current.name
          title  = "Resource Utilization"
        }
      },
      {
        type = "metric"
        properties = {
          metrics = [
            # Response time is averaged; the two counters use the widget-level Sum
            ["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", aws_lb.service.arn_suffix, { stat = "Average" }],
            [".", "HTTPCode_Target_5XX_Count", ".", "."],
            [".", "RequestCount", ".", "."]
          ]
          period = 60
          stat   = "Sum"
          region = data.aws_region.current.name
          title  = "Application Metrics"
        }
      }
    ]
  })
}

# Outputs for CI/CD integration
output "service_url" {
  value = "https://${aws_lb.service.dns_name}"
}

output "ecr_repository" {
  value = aws_ecr_repository.service.repository_url
}

output "deployment_role_arn" {
  value = aws_iam_role.github_actions.arn
}
  

Developers don’t need to know how to configure an ALB or set up CloudWatch dashboards. They just run:

terraform apply -var="service_name=payment-api" -var="team=payments"

And they get a production-ready service with monitoring, logging, auto-scaling, and security baked in.

2. Opinionated Defaults with Escape Hatches

The best golden paths are opinionated but not restrictive. They should handle 90% of use cases perfectly and provide clear override mechanisms for the other 10%. This is where most platform teams fail — they either build something so rigid that teams route around it, or so flexible that it’s basically infrastructure as a service with extra steps.

I learned this lesson the hard way. My first attempt at building a golden path for database provisioning gave teams 47 configuration options. Guess how many teams actually used it? Three. The rest went directly to the AWS Console because our “flexible” solution was more complex than doing it manually.

The second version had exactly two options: a small database (dev/test) and a production database (with all the bells and whistles). If you needed something custom, there was a custom_config map where you could override anything. Usage went from three teams to 87 teams in two months.
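
As a rough illustration of that two-option design (a sketch, not the module described above), the Terraform below offers one size switch plus an escape hatch. Only the custom_config name comes from the text. The tier names, instance classes, storage sizes, and the effective_db_config output are assumptions chosen for the example.

```hcl
# Hypothetical sketch: two opinionated tiers and one override map.

variable "size" {
  description = "Database tier: small (dev/test) or production"
  type        = string
  default     = "small"

  validation {
    condition     = contains(["small", "production"], var.size)
    error_message = "The size value must be \"small\" or \"production\"."
  }
}

variable "custom_config" {
  description = "Escape hatch: keys set here override the tier defaults"
  type        = any
  default     = {}
}

locals {
  # Illustrative defaults only; tune these to your own policies.
  tier_defaults = {
    small = {
      instance_class          = "db.t4g.micro"
      allocated_storage       = 20
      multi_az                = false
      backup_retention_period = 1
    }
    production = {
      instance_class          = "db.r6g.large"
      allocated_storage       = 100
      multi_az                = true
      backup_retention_period = 14
    }
  }

  # merge() gives precedence to later arguments, so anything in
  # custom_config wins over the tier default for the same key.
  db_config = merge(local.tier_defaults[var.size], var.custom_config)
}

output "effective_db_config" {
  value = local.db_config
}
```

A team that needs one non-standard setting would pass something like custom_config = { allocated_storage = 500 } and keep every other default. The common case stays a single-line choice, and the unusual case does not force anyone back to the console.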

3. Observability Built In, Not Bolted On

If developers have to set up monitoring after deployment, they won’t do it — or they’ll do it wrong. Your golden path should automatically create CloudWatch dashboards, configure log aggregation, set up distributed tracing, and establish reasonable alerting thresholds.

At my current company, every service deployed through our golden path automatically gets a Grafana dashboard with the RED metrics (Rate, Errors, Duration), a PagerDuty integration for critical alerts, and log correlation across application logs, infrastructure logs, and traces. Developers don’t configure any of this — it just appears when their service goes live.
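
To make “reasonable alerting thresholds” concrete, here is a minimal sketch of an alarm that could sit next to the load balancer in the module excerpt shown earlier. It reuses var.service_name, aws_lb.service, and local.common_tags from that excerpt. The alert_topic_arn variable, the threshold of 10 errors per minute, and the five-minute evaluation window are assumptions, not values from the original setup.

```hcl
# Hypothetical input: an SNS topic that forwards to your paging tool.
variable "alert_topic_arn" {
  description = "SNS topic ARN that receives alarm notifications"
  type        = string
}

# Fires when targets return more than 10 5XX responses per minute
# for five consecutive minutes. Thresholds are illustrative.
resource "aws_cloudwatch_metric_alarm" "target_5xx" {
  alarm_name          = "${var.service_name}-${terraform.workspace}-target-5xx"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10

  # No traffic means no datapoints; treat that as healthy.
  treat_missing_data = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.service.arn_suffix
  }

  alarm_actions = [var.alert_topic_arn]
  tags          = local.common_tags
}
```

Because the alarm is declared in the module itself, every service created from the golden path gets it without a developer writing or even seeing this resource.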

The GitHub Actions Integration Nobody Builds (But Should)

Here’s where it gets interesting. Most platform teams stop at Terraform modules or CLI tools. But the real magic happens when you integrate golden paths directly into developers’ existing workflows. If your developers use GitHub (and most do), that means GitHub Actions.

Instead of asking developers to run Terraform commands locally or SSH into some deployment server, why not make deployment automatic on merge to main? Here’s a complete GitHub Actions workflow that deploys using our golden path:

YAML

name: Deploy to Production

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  AWS_REGION: us-east-1
  SERVICE_NAME: ${{ github.event.repository.name }}
  TEAM: ${{ github.repository_owner }}

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOYMENT_ROLE }}
          aws-region: ${{ env.AWS_REGION }}
      
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      
      - name: Build, scan, test, and push container image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Build image
          docker build -t "$ECR_REGISTRY/$SERVICE_NAME:$IMAGE_TAG" .
          docker tag "$ECR_REGISTRY/$SERVICE_NAME:$IMAGE_TAG" "$ECR_REGISTRY/$SERVICE_NAME:latest"

          # Security scanning with Trivy
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy image --severity HIGH,CRITICAL \
            --exit-code 1 "$ECR_REGISTRY/$SERVICE_NAME:$IMAGE_TAG"

          # Run tests in the same step: ECR_REGISTRY and IMAGE_TAG are
          # step-scoped, and a failing test should block the push
          docker run --rm "$ECR_REGISTRY/$SERVICE_NAME:$IMAGE_TAG" npm test

          # Push only if the scan and tests pass
          docker push "$ECR_REGISTRY/$SERVICE_NAME:$IMAGE_TAG"
          docker push "$ECR_REGISTRY/$SERVICE_NAME:latest"
      
      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          # Disable the wrapper so $(terraform output -raw ...) captures only the value
          terraform_wrapper: false
      
      - name: Terraform Init
        run: |
          cat > backend.tf << EOF
          terraform {
            backend "s3" {
              bucket = "platform-terraform-state"
              key    = "services/$SERVICE_NAME/terraform.tfstate"
              region = "$AWS_REGION"
            }
          }
          EOF
          
          cat > main.tf << EOF
          module "service" {
            source = "git::https://github.com/your-org/terraform-golden-path.git//modules/service-scaffold?ref=v2.1.0"
            
            service_name = "$SERVICE_NAME"
            team         = "$TEAM"
            runtime      = "nodejs"
            
            # Environment-specific overrides
            environment_config = {
              prod = {
                min_capacity = 2
                max_capacity = 10
                cpu          = "512"
                memory       = "1024"
              }
            }
          }
          
          output "service_url" {
            value = module.service.service_url
          }
          EOF
          
          terraform init
      
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan
      
      - name: Update service with new image
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          aws ecs update-service \
            --cluster platform-services \
            --service $SERVICE_NAME \
            --force-new-deployment \
            --region $AWS_REGION
      
      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster platform-services \
            --services $SERVICE_NAME \
            --region $AWS_REGION
      
      - name: Run smoke tests
        id: smoke
        run: |
          SERVICE_URL=$(terraform output -raw service_url)
          # Expose the URL to later steps; shell substitutions are not
          # evaluated inside a `with:` payload
          echo "service_url=$SERVICE_URL" >> "$GITHUB_OUTPUT"
          curl -f "$SERVICE_URL/health" || exit 1

      - name: Notify deployment success
        if: success()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "${{ env.SERVICE_NAME }} deployed to production",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Deployment Successful*\n\n*Service:* ${{ env.SERVICE_NAME }}\n*Commit:* ${{ github.sha }}\n*Author:* ${{ github.actor }}\n*URL:* ${{ steps.smoke.outputs.service_url }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          # Required by the v1 action when posting to an incoming webhook
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
      
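      - name: Rollback on failure
        if: failure()
        run: |
          echo "Deployment failed, initiating rollback..."
          # deployments[1] is the previous deployment while a new one is rolling
          # out; taskDefinition is already a full ARN, so pass it through as-is
          PREVIOUS_TASK_DEF=$(aws ecs describe-services \
            --cluster platform-services --services "$SERVICE_NAME" \
            --query 'services[0].deployments[1].taskDefinition' \
            --output text --region "$AWS_REGION")
          if [ -z "$PREVIOUS_TASK_DEF" ] || [ "$PREVIOUS_TASK_DEF" = "None" ]; then
            echo "No previous deployment found; nothing to roll back."
            exit 0
          fi
          aws ecs update-service --cluster platform-services \
            --service "$SERVICE_NAME" \
            --task-definition "$PREVIOUS_TASK_DEF" --region "$AWS_REGION"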
  

With this workflow, developers get:

Automatic security scanning on every build
Infrastructure provisioning that happens once and updates intelligently
Zero-downtime deployments with automatic health checks
Automatic rollback if health checks fail
Slack notifications for visibility
Full traceability from commit to production

And here’s the kicker: they don’t have to maintain any of this. The platform team owns the golden path module and the reusable workflow. When you need to update security policies or add new compliance requirements, you release a new module version, and services inherit the improvements as soon as their pinned version (ref=v2.1.0 in the workflow above) is bumped.

Measuring What Actually Matters

Portal teams love to measure “portal engagement” metrics — page views, catalog entries, number of plugins installed. These are vanity metrics. They tell you whether people are clicking around your portal, not whether you’re actually making them more productive.

Golden path teams measure different things: time to deploy, adoption rate, and the number of incidents caused by misconfiguration.

The chart above shows real data from a company that transitioned from a portal-heavy approach to golden paths in June 2024. Notice how deployment times remained stubbornly high for the first six months (the portal era), then dropped precipitously once golden paths were introduced. By December, average deployment time had decreased from 195 minutes to just 12 minutes — a 94% improvement.

The Hard Parts Nobody Warns You About

Building golden paths isn’t all sunshine and roses. There are real challenges you’ll face, and I’d be doing you a disservice if I didn’t mention them.

Challenge #1: The “But We’re Special” Problem

Every team thinks its use case is unique and requires special treatment. Ninety percent of the time, they’re wrong. The remaining 10% of the time, they’re right — but still shouldn’t get custom infrastructure. Your job is to build a golden path that handles the common case brilliantly and provides clear escape hatches for legitimate edge cases.

Challenge #2: Keeping Defaults Current

Your golden path encodes today’s best practices — but best practices change. You need a strategy for updating the path without breaking existing services. We handle this through versioned modules and progressive rollouts: new services get the latest version automatically, existing services can opt in to upgrades, and we force upgrades for security-critical changes.
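
A minimal sketch of how that opt-in model can look from the consumer side, assuming the module is published as tagged Git releases as in the workflow above. The v2.2.0 tag and the checkout-api service are hypothetical. In practice each service would have its own root configuration; the two blocks appear together here only for comparison.

```hcl
# New service: scaffolded against the latest release at creation time.
# (v2.2.0 is a hypothetical newer tag.)
module "checkout_api" {
  source = "git::https://github.com/your-org/terraform-golden-path.git//modules/service-scaffold?ref=v2.2.0"

  service_name = "checkout-api"
  team         = "checkout"
}

# Existing service: stays on the release it was created with until the
# owning team opts in by changing ref and re-running terraform init.
module "payment_api" {
  source = "git::https://github.com/your-org/terraform-golden-path.git//modules/service-scaffold?ref=v2.1.0"

  service_name = "payment-api"
  team         = "payments"
}
```

Pinning to a tag keeps an upgrade visible as a one-line diff in a pull request. A forced security upgrade is then the same diff, opened by the platform team instead of the service owner.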

Challenge #3: The Portal People Will Fight You

If you’ve already invested in a developer portal, there will be people (probably senior people) who have a lot of ego and budget tied up in that investment. They’ll argue that portals and golden paths are complementary. They’re not wrong — but they’re not right either. A portal can be useful for discovery and documentation, but it shouldn’t be in the critical path for deployment.

My advice? Start small, prove value, and let adoption speak for itself. When 90% of your developers are using the golden path and bypassing the portal, the conversation shifts from “Should we do this?” to “How do we expand this?”

Implementation Roadmap: Your First 90 Days

If I were starting a golden path initiative tomorrow at a new company, here’s exactly what I’d do:

Days 1–14: Research & Validation

Interview 10–15 developers about their deployment pain points
Shadow a team through their entire deployment process
Document every manual step, every “tribal knowledge” requirement, and every “just SSH in and fix it” moment
Identify the one service type that 70%+ of teams need (usually a stateless API)

Days 15–45: Build the MVP Golden Path

Create a Terraform module for that one service type
Encode security, observability, and scaling best practices as defaults
Build a GitHub Actions workflow that uses the module
Deploy exactly one service using it (preferably something internal and low-risk)
Measure everything: time to deploy, number of manual steps, error rate

Days 46–60: Pilot Program

Get 3–5 teams to migrate existing services to the golden path
Sit with them during migration and document every question and pain point
Iterate rapidly based on feedback
Measure adoption metrics and improvements in deployment time

Days 61–90: Scale & Evangelize

Create clear documentation (but keep it minimal — the path should be self-documenting)
Present results to engineering leadership with hard metrics
Make the golden path the default for all new services
Start planning golden paths for other service types (databases, batch jobs, etc.)

The Future Is Paths, Not Portals

I’m not saying developer portals are completely useless. They have a place — for service discovery, documentation, ownership tracking, and organizational visibility. But they’re not the platform. They’re a layer on top of the platform.

The real platform is the golden path — the automated, opinionated, batteries-included way to go from code to production safely and quickly. It’s the infrastructure-as-code modules, the CI/CD pipelines, the security scanning, the automatic observability, and the guardrails that prevent developers from shooting themselves in the foot.

When you shift your focus from building portals to paving paths, something magical happens. Developers stop seeing the platform team as a blocker and start seeing it as a force multiplier. Deployment becomes boring (in the best way). Production incidents decrease. Onboarding time shrinks. And most importantly, your organization ships code faster.

That’s the promise of platform engineering done right — not another dashboard to click through, but invisible automation that makes excellence the path of least resistance.

Get Started: Complete Implementation

Want to implement this at your organization? I’ve created a complete, production-ready golden path implementation you can use as a starting point. The repository includes:

Terraform modules for common service patterns (APIs, workers, scheduled jobs)
GitHub Actions reusable workflows with security scanning and deployment
Example services showing different configurations
Documentation and migration guides
Observability dashboards and alerting templates

Download the complete implementation from the accompanying GitHub repository. Adapt it to your infrastructure, customize the defaults to match your policies, and start shipping code faster.

GitHub Repo: https://github.com/dinesh-k-elumalai/golden-path-repo

Opinions expressed by DZone contributors are their own.