In Defense of YAML

YAML is great as a data format. But if you try to use it as a programming language, it'll give you nightmares.

If you follow me on Twitter, you may think I hate YAML.

I'm not against YAML, just against abuse of YAML. I want to help prevent people abusing YAML and being cruel to themselves and their coworkers in the process.

YAML's strength is as a structured data format. Yes, it has issues. Whitespace is a minefield. Its syntax is surprisingly complex. It has gotchas: "Anyone who uses YAML long enough will eventually get burned when attempting to abbreviate Norway." But YAML is human readable and supports comments: two key benefits that drive its popularity.
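The Norway gotcha comes from YAML 1.1's boolean rules, under which an unquoted NO is read as false. Here is a minimal Python sketch of that resolution rule, assuming the spellings listed in the YAML 1.1 bool type. The `resolve_plain_scalar` helper is hypothetical, written only for illustration; it is not any parser's API, and real parsers differ in which spellings they honor.

```python
import re

# Plain scalars that the YAML 1.1 bool type treats as booleans.
YAML_11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUTHY = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text: str):
    """Hypothetical helper: return a bool for YAML 1.1 boolean spellings,
    otherwise return the text unchanged."""
    if YAML_11_BOOL.match(text):
        return text.lower() in TRUTHY
    return text


# Country codes as they might appear, unquoted, in a YAML list.
countries = ["SE", "DK", "NO"]
resolved = [resolve_plain_scalar(code) for code in countries]
print(resolved)  # ['SE', 'DK', False]
```

Quoting the value ("NO") is the usual way to keep it a string.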

Things go wrong when we use YAML to describe behavior.

Consider some examples from the CI domain. This isn't the only domain in which YAML is abused this way, but it's among the worst offenders.

Take GitLab's pipeline definition for delivering itself: an 1170(!) line YAML file rife with sections like this:

  <<: *dedicated-no-docs-pull-cache-job
  image: dev.gitlab.org:5005/gitlab/gitlab-build-images:ruby-2.5.3-git-2.18-chrome-71.0-node-8.x-yarn-1.12-graphicsmagick-1.3.29-docker-18.06.1
  dependencies:
    - setup-test-env
  services:
    - docker:stable-dind
  variables:
    NODE_ENV: "production"
    RAILS_ENV: "production"
    SETUP_DB: "false"
    WEBPACK_REPORT: "true"
    # we override the max_old_space_size to prevent OOM errors
    NODE_OPTIONS: --max_old_space_size=3584
    DOCKER_DRIVER: overlay2
    DOCKER_HOST: tcp://docker:2375
  script:
    - node --version
    - yarn install --frozen-lockfile --production --cache-folder .yarn-cache
    - free -m
    - bundle exec rake gitlab:assets:compile
    - time scripts/build_assets_image
    - scripts/clean-old-cached-assets
  artifacts:
    name: webpack-report
    expire_in: 31d
    paths:
      - webpack-report/
      - public/assets/

Note the script block containing a list of shell commands. Does this look like data? Is this the right model for specifying execution?

There are many similar cases. Here is a fragment from an example of Tekton, a newish Kubernetes-based delivery solution:

apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: build-push
spec:
  inputs:
    resources:
    - name: workspace
      type: git
    params:
    - name: pathToDockerFile
      description: The path to the dockerfile to build
      default: /workspace/workspace/Dockerfile
    - name: pathToContext
      description: The build context used by Kaniko (https://github.com/GoogleContainerTools/kaniko#kaniko-build-contexts)
      default: /workspace/workspace
  outputs:
    resources:
    - name: builtImage
      type: image
  steps:
  - name: build-and-push
    image: gcr.io/kaniko-project/executor
    command:
    - /kaniko/executor
    args:
    - --dockerfile=${inputs.params.pathToDockerFile}
    - --destination=${outputs.resources.builtImage.url}
    - --context=${inputs.params.pathToContext}

Ouch. Variables. Qualified names. Arguments. This is not structured data. This is programming masquerading as configuration.

Haven't we met concepts like variables and successive instructions before? Why clumsily reinvent imperative programming? What about modularity and testability? What about toolability, which we'd get for free with a programming language? Why reinvent exception handling, which is rigorously defined in modern languages? What about logical operations, let alone more advanced and elegant FP or OOP concepts?
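To make the contrast concrete, here is a rough Python sketch of the same three Kaniko arguments built by an ordinary function. Only the three flags and the default paths come from the Tekton fragment; the `KanikoBuild` class, the `kaniko_args` function, and the registry URL are hypothetical names invented for illustration. The point is that parameters become typed fields, a bad input raises a normal exception, and the whole thing can be unit tested.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KanikoBuild:
    """Hypothetical parameter object mirroring the fragment's inputs and outputs."""
    image_url: str
    dockerfile: str = "/workspace/workspace/Dockerfile"
    context: str = "/workspace/workspace"


def kaniko_args(build: KanikoBuild) -> List[str]:
    """Build the executor argument list, failing early on a missing destination."""
    if not build.image_url:
        raise ValueError("image_url must not be empty")
    return [
        f"--dockerfile={build.dockerfile}",
        f"--destination={build.image_url}",
        f"--context={build.context}",
    ]


def test_kaniko_args_uses_defaults() -> None:
    """An ordinary unit test: no cluster and no pipeline run required."""
    args = kaniko_args(KanikoBuild(image_url="registry.example.com/app:1.0"))
    assert args == [
        "--dockerfile=/workspace/workspace/Dockerfile",
        "--destination=registry.example.com/app:1.0",
        "--context=/workspace/workspace",
    ]


test_kaniko_args_uses_defaults()
```

Nothing here is specific to Python. Any general-purpose language gives you the same variables, functions, and tests without reinventing them inside string templates.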

The best argument in favor of such YAML-based syntax is that it's an external DSL, enforcing a beneficial structure. However, even this doesn't stack up, for several reasons:

  • The prescriptive structure is largely an illusion. The bulk of the work is pushed into shell scripts like those in the GitLab example, which have no structure beyond the environment. In practice, it's the Wild West.
  • If a step is missing in the design of the DSL, you hit a wall. For example, CI tools typically model delivery phases as YAML stanzas. If you need a unique phase, you're probably out of luck.
  • YAML is a poor format for an external DSL, just as XML was. The popular configuration format du jour is always misused this way.

You probably don't want an external DSL, anyway: something we learned the hard way at Atomist.

External DSLs...are like puppies, they all start out cute and happy, but without exception turn into vicious beasts as they grow up.

Modern programming languages are flexible enough to make internal DSLs more and more compelling, with far superior tooling and extensibility.
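As a small sketch of what an internal DSL can look like, here is a pipeline expressed in plain Python. Everything in it (`Pipeline`, `phase`, `run`, and the two phase functions) is hypothetical and invented for this example; it is not Atomist's API or any CI tool's. It assumes a phase is simply a function over a shared context.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Phase = Callable[[Context], None]


@dataclass
class Pipeline:
    """Hypothetical internal DSL: a pipeline is an ordered list of named phases."""
    phases: List[Tuple[str, Phase]] = field(default_factory=list)

    def phase(self, name: str, action: Phase) -> "Pipeline":
        self.phases.append((name, action))
        return self  # returning self allows fluent chaining

    def run(self, context: Context) -> List[str]:
        """Run phases in order and return the names of those that completed."""
        completed: List[str] = []
        for name, action in self.phases:
            try:
                action(context)
            except Exception as error:
                # Ordinary exception handling, with the failing phase named.
                raise RuntimeError(f"phase '{name}' failed: {error}") from error
            completed.append(name)
        return completed


def compile_assets(context: Context) -> None:
    context["assets"] = "compiled"


def license_audit(context: Context) -> None:
    # A unique phase no YAML schema anticipated: it is just another function.
    if context.get("license") != "approved":
        raise ValueError("license not approved")


pipeline = (
    Pipeline()
    .phase("compile-assets", compile_assets)
    .phase("license-audit", license_audit)
)
print(pipeline.run({"license": "approved"}))  # ['compile-assets', 'license-audit']
```

Adding a phase the DSL designer never thought of means writing a function, and the editor, type checker, and debugger all keep working.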

Trying to use a data format as a programming language is wrong. Calling it out has nothing to do with the merits of the data format for what it was designed for.

YAML as a data format is defensible. YAML as a programming language is not. If you're programming, use a programming language. You owe it to Turing, Hopper, Dijkstra, and the countless other computer scientists and practitioners who've built our discipline. And you owe it to yourself.

Published at DZone with permission of Rod Johnson, DZone MVB. See the original article here.
