
In Defense of YAML

YAML is great as a data format. But if you try to use it as a programming language, it'll give you nightmares.

By Rod Johnson · Mar. 28, 19 · Tutorial

If you follow me on Twitter, you may think I hate YAML.

I'm not against YAML, just against the abuse of YAML. I want to help prevent people from abusing YAML and being cruel to themselves and their coworkers in the process.

YAML's strength is as a structured data format. Yes, it has issues. Whitespace is a minefield. Its syntax is surprisingly complex. It has gotchas: "Anyone who uses YAML long enough will eventually get burned when attempting to abbreviate Norway." But YAML is human readable and supports comments: two key benefits that drive its popularity.
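
The Norway gotcha comes from YAML 1.1's implicit typing: an unquoted `NO` is resolved as a boolean rather than a string. The sketch below is a simplified Python model of that resolution rule, not a real YAML parser. `resolve_scalar` is a hypothetical helper written for illustration; the word lists follow the YAML 1.1 boolean type, and individual parsers may honor only part of them.

```python
# Simplified model of YAML 1.1 implicit boolean resolution.
# This is NOT a YAML parser; resolve_scalar is a hypothetical helper.

YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_scalar(text: str, quoted: bool = False):
    """Return the value a YAML 1.1 style loader would assign to a scalar."""
    if quoted:
        return text  # quoted scalars always stay strings
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    return text


# Unquoted ISO country codes, as they might appear in a YAML list.
countries = [resolve_scalar(code) for code in ["SE", "NO", "DK"]]
print(countries)  # ['SE', False, 'DK']
```

Quoting the value (`"NO"`) sidesteps the problem, which in this model corresponds to `resolve_scalar("NO", quoted=True)`.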

Where it can go wrong is where we use YAML to describe behavior.

Consider some examples from the CI domain. This isn't the only domain in which YAML is abused this way, but it's among the worst offenders.

Take GitLab's pipeline definition for delivering itself: an 1170(!) line YAML file rife with sections like this:

gitlab:assets:compile:
  <<: *dedicated-no-docs-pull-cache-job
  image: dev.gitlab.org:5005/gitlab/gitlab-build-images:ruby-2.5.3-git-2.18-chrome-71.0-node-8.x-yarn-1.12-graphicsmagick-1.3.29-docker-18.06.1
  dependencies:
    - setup-test-env
  services:
    - docker:stable-dind
  variables:
    NODE_ENV: "production"
    RAILS_ENV: "production"
    SETUP_DB: "false"
    SKIP_STORAGE_VALIDATION: "true"
    WEBPACK_REPORT: "true"
    # we override the max_old_space_size to prevent OOM errors
    NODE_OPTIONS: --max_old_space_size=3584
    DOCKER_DRIVER: overlay2
    DOCKER_HOST: tcp://docker:2375
  script:
    - node --version
    - yarn install --frozen-lockfile --production --cache-folder .yarn-cache
    - free -m
    - bundle exec rake gitlab:assets:compile
    - time scripts/build_assets_image
    - scripts/clean-old-cached-assets
  artifacts:
    name: webpack-report
    expire_in: 31d
    paths:
      - webpack-report/
      - public/assets/

Note the script block containing a list of shell scripts. Does this look like data? Is this the right model for specifying execution?

There are many similar cases. Here is a fragment from an example of Tekton, a newish Kubernetes-based delivery solution:

apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: build-push
spec:
  inputs:
    resources:
    - name: workspace
      type: git
    params:
    - name: pathToDockerFile
      description: The path to the dockerfile to build
      default: /workspace/workspace/Dockerfile
    - name: pathToContext
      description: The build context used by Kaniko (https://github.com/GoogleContainerTools/kaniko#kaniko-build-contexts)
      default: /workspace/workspace
  outputs:
    resources:
    - name: builtImage
      type: image
  steps:
  - name: build-and-push
    image: gcr.io/kaniko-project/executor
    command:
    - /kaniko/executor
    args:
    - --dockerfile=${inputs.params.pathToDockerFile}
    - --destination=${outputs.resources.builtImage.url}
    - --context=${inputs.params.pathToContext}

Ouch. Variables. Qualified names. Arguments. This is not structured data. This is programming masquerading as configuration.

Haven't we met concepts like variables and successive instructions before? Why clumsily reinvent imperative programming? What about modularity and testability? What about toolability, which we'd get for free with a programming language? Why reinvent exception handling, which is rigorously defined in modern languages? What about logical operations, let alone more advanced and elegant FP or OOP concepts?
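
To make the contrast concrete, here is a minimal Python sketch of the Tekton step's argument assembly written as an ordinary function. The names `KanikoBuild` and `kaniko_args` and the example registry URL are hypothetical and not part of Tekton or Kaniko; only the three flags and the default paths come from the YAML above. Because this is plain code, validation is an ordinary exception and the logic can be unit tested without running a pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KanikoBuild:
    """Hypothetical parameters mirroring the Tekton task's inputs and outputs."""
    destination: str
    dockerfile: str = "/workspace/workspace/Dockerfile"
    context: str = "/workspace/workspace"


def kaniko_args(build: KanikoBuild) -> List[str]:
    """Build the executor argument list; fail early on a missing destination."""
    if not build.destination:
        raise ValueError("destination image URL must not be empty")
    return [
        f"--dockerfile={build.dockerfile}",
        f"--destination={build.destination}",
        f"--context={build.context}",
    ]


# The registry URL is a placeholder chosen for this example.
args = kaniko_args(KanikoBuild(destination="registry.example.com/app:latest"))
print(args)
```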

The best argument in favor of such YAML-based syntax is that it's an external DSL, enforcing a beneficial structure. However, even this doesn't stack up, for several reasons:

  • The prescriptive structure is largely an illusion. The bulk of the work is pushed into shell scripts, such as the scripts/build_assets_image and scripts/clean-old-cached-assets calls in the GitLab example, which have no structure beyond the environment. In practice, it's the Wild West.
  • If a step is missing in the design of the DSL, you hit a wall. For example, CI tools typically model delivery phases as YAML stanzas. If you need a unique phase, you're probably out of luck.
  • YAML is a poor format for an external DSL, just as XML was. The popular configuration format du jour is always misused this way.

You probably don't want an external DSL, anyway: something we learned the hard way at Atomist.

External DSLs...are like puppies, they all start out cute and happy, but without exception turn into vicious beasts as they grow up.

Modern programming languages are flexible enough to make internal DSLs more and more compelling, with far superior tooling and extensibility.
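
As a rough illustration of an internal DSL, the Python sketch below models delivery phases as plain functions registered on a pipeline object. `Pipeline`, `phase`, and the phase names are all hypothetical and do not correspond to any real CI tool. The point is only that adding a unique phase means writing one more function, not changing the DSL's grammar.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]


class Pipeline:
    """Hypothetical internal DSL: phases are ordinary functions run in order."""

    def __init__(self) -> None:
        self._phases: List[Callable[[Context], None]] = []

    def phase(self, fn: Callable[[Context], None]) -> Callable[[Context], None]:
        # Used as a decorator; registration order is execution order.
        self._phases.append(fn)
        return fn

    def run(self, ctx: Context) -> List[str]:
        completed = []
        for fn in self._phases:
            fn(ctx)  # any exception stops the pipeline, as in normal code
            completed.append(fn.__name__)
        return completed


pipeline = Pipeline()


@pipeline.phase
def build(ctx: Context) -> None:
    ctx["artifact"] = f"{ctx['name']}.tar.gz"


@pipeline.phase
def license_audit(ctx: Context) -> None:
    # A "unique" phase the DSL designer never anticipated: just another function.
    ctx["audited"] = "yes"


ctx: Context = {"name": "app"}
print(pipeline.run(ctx))  # ['build', 'license_audit']
```

Each phase is also callable on its own, so it can be tested in isolation with a hand-built context.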

Trying to use a data format as a programming language is wrong. Calling it out has nothing to do with the merits of the data format for what it was designed for.

YAML as a data format is defensible. YAML as a programming language is not. If you're programming, use a programming language. You owe it to Turing, Hopper, Dijkstra, and the countless other computer scientists and practitioners who've built our discipline. And you owe it to yourself.

Published at DZone with permission of Rod Johnson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
