Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Refactoring with Loops and Collection Pipelines: Part 1

DZone's Guide to

Refactoring with Loops and Collection Pipelines: Part 1

· Big Data Zone
Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

The loop is the classic way of processing collections, but with the greater adoption of first-class functions in programming languages the collection pipeline is an appealing alternative. In this article I look at refactoring loops to collection pipelines with a series of small examples.

I'm publishing this article in installments. This adds an example of refactoring a loop that summarizes flight delay data for each destination airport.

A common task in programming is processing a list of objects. Most programmers naturally do this with a loop, as it's one of the basic control structures we learn with our very first programs. But loops aren't the only way to represent list processing, and in recent years more people are making use of another approach, which I call the collection pipeline. This style is often considered to be part of functional programming, but I used it heavily in Smalltalk. As OO languages support lambdas and libraries that make first class functions easier to program with, then collection pipelines become an appealing choice.


Refactoring a Simple Loop into a Pipeline

I'll start with a simple example of a loop and show the basic way I refactor one into a collection pipeline.

Let's imagine we have a list of authors, each of which has the following data structure.

class Author...

  public string Name { get; set; }
  public string TwitterHandle { get; set;}
  public string Company { get; set;}

This example uses C#

Here is the loop.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    foreach (Author a in authors) {
      if (a.Company == company) {
        var handle = a.TwitterHandle;
        if (handle != null)
          result.Add(handle);
      }
    }
    return result;
  }

My first step in refactoring a loop into a collection pipeline is to apply Extract Variable on the loop collection.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors;
    foreach (Author a in loopStart) {
      if (a.Company == company) {
        var handle = a.TwitterHandle;
        if (handle != null)
          result.Add(handle);
      }
    }
    return result;
  }

This variable gives me a starting point for pipeline operations. I don't have a good name for it right now, so I'll use one that makes sense for the moment, expecting to rename it later.

I then start looking at bits of behavior in the loop. The first thing I see is a conditional check, I can move this to the pipeline with a filter operation.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors
      .Where(a => a.Company == company);
    foreach (Author a in loopStart) {
      if (a.Company == company) {
        var handle = a.TwitterHandle;
        if (handle != null)
          result.Add(handle);
      }
    }
    return result;
  }

I see the next part of the loop operates on the twitter handle, rather than the author, so I can use a a map operation .

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle);
    foreach (string handle in loopStart) {
      var handle = a.TwitterHandle;
      if (handle != null)
        result.Add(handle);
     }
    return result;
  }

Next in the loop as another conditional, which again I can move to a filter operation.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    var loopStart = authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle)
      .Where (h => h != null);
    foreach (string handle in loopStart) {
      if (handle != null)
        result.Add(handle);
    }
    return result;
  }

All the loop now does is add everything in its loop collection into the result collection, so I can remove the loop and just return the pipeline result.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    var result = new List<String> ();
    return authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle)
      .Where (h => h != null);
    foreach (string handle in loopStart) {
      result.Add(handle);
    }
    return result;
  }

Here's the final state of the code

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return authors
      .Where(a => a.Company == company)
      .Select(a => a.TwitterHandle)
      .Where (h => h != null);
  }

What I like about collection pipelines is that I can see the flow of logic as the elements of the list pass through the pipeline. For me it reads very closely to how I'd define the outcome of the loop "take the authors, choose those who have a company, and get their twitter handles removing any null handles".

Furthermore, this style of code is familiar even in different languages who have different syntaxes and different names for pipeline operators.

Java

  public List<String> twitterHandles(List<Author> authors, String company) {
    return authors.stream()
            .filter(a -> a.getCompany().equals(company))
            .map(a -> a.getTwitterHandle())
            .filter(h -> null != h)
            .collect(toList());
  }

Ruby

  def twitter_handles authors, company
    authors
      .select {|a| company == a.company}
      .map    {|a| a.twitter_handle}
      .reject {|h| h.nil?}
  end

while this matches the other examples, I would replace the final reject with compact

Clojure

  (defn twitter-handles [authors company]
    (->> authors
         (filter #(= company (:company %)))
         (map :twitter-handle)
         (remove nil?)))

F#

  let twitterHandles (authors : seq<Author>, company : string) = 
       authors
           |> Seq.filter(fun a -> a.Company = company)
           |> Seq.map(fun a -> a.TwitterHandle)
           |> Seq.choose (fun h -> h)

again, if I wasn't concerned about matching the structure of the other examples I would combine the map and choose into a single step

I've found that once I got used to thinking in terms of pipelines I could apply them quickly even in an unfamiliar language. Since the fundamental approach is the same it's relatively easy to translate from even unfamiliar syntax and function names.

Refactoring within the Pipeline, and to a Comprehension

Once you have some behavior expressed as a pipeline, there are potential refactorings you can do by reordering steps in the pipeline. One such move is that if you have a map followed by a filter, you can usually move the filter before the map like this.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return authors
      .Where(a => a.Company == company)
      .Where (a => a.TwitterHandle != null)
      .Select(a => a.TwitterHandle);
  }

When you have two adjacent filters, you can combine them using a conjunction.

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return authors
      .Where(a => a.Company == company && a.TwitterHandle != null)
      .Select(a => a.TwitterHandle);
  }

Once I have a C# collection pipeline in the form of a simple filter and map like this, I can replace it with a Linq expression

class Author...

  static public IEnumerable<String> TwitterHandles(IEnumerable<Author> authors, string company) {
    return from a in authors where a.Company == company && a.TwitterHandle != null select a.TwitterHandle;
  }

I consider Linq expressions to be a form of list comprehension, and similarly you can do something like this with any language that supports list comprehensions. It's a matter of taste whether you prefer the list comprehension form, or the pipeline form (I prefer pipelines). In general pipelines are more powerful, in that you can't refactor all pipelines into comprehensions.


Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.

Topics:
big data ,nosql

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}