When YAML Breeds: Distributed Denial of Productivity

As services multiply, so do the YAML files, and so do duplication and drift. This holds us back.

By Rod Johnson · May 15, 2019 · Tutorial

My last post discussed the danger of programming in YAML, with examples from the CI domain, where YAML is the norm for defining delivery pipelines. Worse still is YAML programming at scale. As services multiply, so do the YAML files, and so do duplication and drift. This holds us back.

Proliferation, Duplication, and Drift

Consider Microsoft's GitHub organization. To ensure an apples-to-apples comparison, I focused on one CI tool (Travis) and one stack (Node).

Out of 90 repositories, there are 40 different Travis script stanzas. Some lint, some don't. Some don't even use npm. 35 are one-offs, some with elaborate script blocks. There are 20 different before_install scripts and 6 different after_success scripts, some reporting test coverage one way, others another way, and most not at all.
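To make the drift concrete, here is a sketch of the kind of divergence such a survey turns up: two hypothetical .travis.yml files for Node repositories in the same organization. The repository details and script contents are invented for illustration; only the Travis keys (language, node_js, script, before_install, after_success) are real.

    # Hypothetical repo A: lints, tests, and uploads coverage
    language: node_js
    node_js:
      - "10"
    before_install:
      - npm install -g npm@latest
    script:
      - npm run lint
      - npm test
    after_success:
      - npm run coverage

    # Hypothetical repo B: same stack, but no linting, no coverage, no before_install
    language: node_js
    node_js:
      - "8"
    script:
      - npm test

The two files do the same job, yet share nothing; every difference between them has to be discovered and reasoned about by reading each file in full.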

Both the inconsistency and duplication are problematic. Why perform the same logical steps differently or in a different order? Why say something more than once? If this were application code, we'd consider this a smell.

There's no sense of organizational best practice. We either care about linting and test coverage or we don't. Unlike, say, dependencies expressed in a package.json file, such concerns are unlikely to vary by repository.

Delivery across this organization fails a basic test: it would be hard to add or change functionality. For example, the realistic requirement to "add test coverage reporting to all repos" would require over 80 repos to be modified. Adding CVE scanning would touch every repo, and the changes required would differ on a repo-by-repo basis. Such important concerns affect all repositories, and each policy should be updatable in one place.
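As a rough sketch of what that means in practice, "add test coverage reporting to all repos" under the one-file-per-repo model amounts to pasting something like the fragment below into more than 80 .travis.yml files, adjusted by hand for each repository's test setup. The specific commands here are hypothetical.

    # Hypothetical change to be repeated, with variations, across 80+ .travis.yml files
    after_success:
      # Coverage upload step; the exact tool and command vary by repository
      - npm run coverage
      - npx codecov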

The technical solution does not match the problem. Modeling the delivery of Microsoft's Node projects as 90 distinct YAML files does not reflect the actual requirements.

Wait: What Are We Trying to Accomplish?

Let's step back and consider what we're trying to do in delivery at scale.

A problem statement might be something like: "We have many projects and need to deliver them safely," where "safely" means built successfully, with no known CVEs, meeting organizational standards for code quality and formatting, and passing all tests. The definition of "safely" changes over time. We may discover additional checks we need to run, additional teams or systems we need to notify of progress, or ways to optimize or correct the invocation of compilers and other tools.

The problem statement would certainly not be: "We have many repositories, each of which should have a distinct build pipeline." Yet this is what we do by default today. When behavior is scattered across hundreds of repositories, it's impractical to evolve it. We get inconsistency and errors we can't easily fix.

A model of one pipeline definition per repository doesn't reflect the real problem. We need delivery policies at team level, not repository level.

The traditional default of defining behaviors at the repository level, with only clumsy ways of sharing common steps, is the wrong way around. We should share behaviors by default, with the ability to specialize by repository or logical group where needed. Delivery actions are cross-cutting concerns.
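To illustrate the inversion, imagine the policy defined once at the organization level, with small per-repository overlays only where a repository genuinely differs. The syntax below is purely hypothetical (it is not Travis or any other real tool); it only sketches the share-by-default shape.

    # Hypothetical org-level delivery policy, defined once for all Node repositories
    node-delivery:
      phases:
        - lint
        - build
        - test
        - coverage
        - cve-scan

    # Hypothetical per-repository overlay: specialize only where needed
    overrides:
      legacy-reporting-service:
        skip:
          - lint

With this shape, "add coverage reporting everywhere" or "add CVE scanning everywhere" is a change in one place, not a change to 80+ files.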

Escaping One Pipeline per Repository

CI was a big step forward for our industry. CI files were once our friends. When we had a small number of large projects, one pipeline per repository worked fine. Now that we have many smaller projects, we need to think at the organizational level.

To achieve organizational policy we need more context. In particular:

  • Delivery needs to be smarter. We need a domain model to help us work with projects and their delivery, grouping behavior as necessary. For example, we should be able to treat Node projects differently from Java projects, and TypeScript projects differently from JavaScript projects.
  • We need a richer concept of stages. In place of a fill-in-the-blanks model such as Travis's (with hooks like script, before_install, and after_success), we need to model meaningful phases such as lint/fix, build, test, and deploy, and allow meaningful custom stages to be defined.
  • We need a more sophisticated way of expressing behavior. Logic belongs in a programming language, not YAML lists of scripts.

We need to rethink the notion of a static pipeline defined up front. In my next post, I'll show how an event-driven approach backed by a rich domain model is more powerful and flexible, avoids duplication and drift, and enables us to evolve our delivery in real time. We can apply modern software engineering principles to make our delivery solutions match modern requirements.

Thanks to Chris Swan for the phrase "distributed denial of productivity."


Published at DZone with permission of Rod Johnson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
