DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • Software Delivery at Scale: Centralized Jenkins Pipeline for Optimal Efficiency
  • Concourse CI/CD Pipeline: Webhook Triggers
  • Automating Data Pipelines: Generating PySpark and SQL Jobs With LLMs in Cloudera
  • Optimize Deployment Pipelines for Speed, Security and Seamless Automation

Trending

  • Agile’s Quarter-Century Crisis
  • Creating a Web Project: Caching for Performance Optimization
  • Modern Test Automation With AI (LLM) and Playwright MCP
  • Efficient API Communication With Spring WebClient
  1. DZone
  2. Testing, Deployment, and Maintenance
  3. DevOps and CI/CD
  4. Refactoring with Loops and Collection Pipelines: Part 1

Refactoring with Loops and Collection Pipelines: Part 1

By 
Martin Fowler user avatar
Martin Fowler
·
Jul. 07, 15 · Tutorial
Likes (0)
Comment
Save
Tweet
Share
3.8K Views

Join the DZone community and get the full member experience.

Join For Free

 The loop is the classic way of processing collections, but with the greater adoption of first-class functions in programming languages the collection pipeline is an appealing alternative. In this article I look at refactoring loops to collection pipelines with a series of small examples. 

I'm publishing this article in installments. This adds an example of refactoring a loop that summarizes flight delay data for each destination airport.

A common task in programming is processing a list of objects. Most programmers naturally do this with a loop, as it's one of the basic control structures we learn with our very first programs. But loops aren't the only way to represent list processing, and in recent years more people are making use of another approach, which I call the collection pipeline. This style is often considered to be part of functional programming, but I used it heavily in Smalltalk. As OO languages support lambdas and libraries that make first class functions easier to program with, then collection pipelines become an appealing choice.


Refactoring a Simple Loop into a Pipeline

I'll start with a simple example of a loop and show the basic way I refactor one into a collection pipeline.

Let's imagine we have a list of authors, each of which has the following data structure.

class Author...

  public string Name { get; set; }
  public string TwitterHandle { get; set;}
  public string Company { get; set;}

This example uses C#

Here is the loop.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    foreach (Author a in authors) {
      if (a.Company == company) {
        var handle = a.TwitterHandle;
        if (handle != null)
          result.Add(handle);
      }
    }
    return result;
  }

My first step in refactoring a loop into a collection pipeline is to apply Extract Variable on the loop collection.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors;
    foreach (Author a in loopStart) {
      if (a.Company == company) {
        var handle = a.TwitterHandle;
        if (handle != null)
          result.Add(handle);
      }
    }
    return result;
  }

This variable gives me a starting point for pipeline operations. I don't have a good name for it right now, so I'll use one that makes sense for the moment, expecting to rename it later.

I then start looking at bits of behavior in the loop. The first thing I see is a conditional check, I can move this to the pipeline with a .

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors
      .Where(a => a.Company == company);
    foreach (Author a in loopStart) {
      if (a.Company == company) {
        var handle = a.TwitterHandle;
        if (handle != null)
          result.Add(handle);
      }
    }
    return result;
  }

I see the next part of the loop operates on the twitter handle, rather than the author, so I can use a a .

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle);
    foreach (string handle in loopStart) {
      var handle = a.TwitterHandle;
      if (handle != null)
        result.Add(handle);
     }
    return result;
  }

Next in the loop as another conditional, which again I can move to a filter operation.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle)
      .Where (h => h != null);
    foreach (string handle in loopStart) {
      if (handle != null)
        result.Add(handle);
    }
    return result;
  }

All the loop now does is add everything in its loop collection into the result collection, so I can remove the loop and just return the pipeline result.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    return authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle)
      .Where (h => h != null);
    foreach (string handle in loopStart) {
      result.Add(handle);
    }
    return result;
  }

Here's the final state of the code

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle)
      .Where (h => h != null);
  }

What I like about collection pipelines is that I can see the flow of logic as the elements of the list pass through the pipeline. For me it reads very closely to how I'd define the outcome of the loop "take the authors, choose those who have a company, and get their twitter handles removing any null handles".

Furthermore, this style of code is familiar even in different languages who have different syntaxes and different names for pipeline operators.

Java

  public List<String> twitterHandles(List<Author> authors, String company) {
    return authors.stream()
            .filter(a -> a.getCompany().equals(company))
            .map(a -> a.getTwitterHandle())
            .filter(h -> null != h)
            .collect(toList());
  }

Ruby

  def twitter_handles authors, company
    authors
      .select {|a| company == a.company}
      .map    {|a| a.twitter_handle}
      .reject {|h| h.nil?}
  end

while this matches the other examples, I would replace the final reject with compact

Clojure

  (defn twitter-handles [authors company]
    (->> authors
         (filter #(= company (:company %)))
         (map :twitter-handle)
         (remove nil?)))

F#

  let twitterHandles (authors : seq<Author>, company : string) = 
       authors
           |> Seq.filter(fun a -> a.Company = company)
           |> Seq.map(fun a -> a.TwitterHandle)
           |> Seq.choose (fun h -> h)

again, if I wasn't concerned about matching the structure of the other examples I would combine the map and choose into a single step

I've found that once I got used to thinking in terms of pipelines I could apply them quickly even in an unfamiliar language. Since the fundamental approach is the same it's relatively easy to translate from even unfamiliar syntax and function names.

Refactoring within the Pipeline, and to a Comprehension

Once you have some behavior expressed as a pipeline, there are potential refactorings you can do by reordering steps in the pipeline. One such move is that if you have a map followed by a filter, you can usually move the filter before the map like this.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return authors
      .Where(a => a.Company == company)
      .Where (a => a.TwitterHandle != null)
      .Select(a => a.TwitterHandle);
  }

When you have two adjacent filters, you can combine them using a conjunction.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return authors
      .Where(a => a.Company == company && a.TwitterHandle != null)
      .Select(a => a.TwitterHandle);
  }

Once I have a C# collection pipeline in the form of a simple filter and map like this, I can replace it with a Linq expression

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return from a in authors where a.Company == company && a.TwitterHandle != null select a.TwitterHandle;
  }

I consider Linq expressions to be a form of , and similarly you can do something like this with any language that supports list comprehensions. It's a matter of taste whether you prefer the list comprehension form, or the pipeline form (I prefer pipelines). In general pipelines are more powerful, in that you can't refactor all pipelines into comprehensions.



Pipeline (software)

Opinions expressed by DZone contributors are their own.

Related

  • Software Delivery at Scale: Centralized Jenkins Pipeline for Optimal Efficiency
  • Concourse CI/CD Pipeline: Webhook Triggers
  • Automating Data Pipelines: Generating PySpark and SQL Jobs With LLMs in Cloudera
  • Optimize Deployment Pipelines for Speed, Security and Seamless Automation

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!