Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Filter First, Map Later

DZone's Guide to

Filter First, Map Later

Java Streams can learn a lesson from SQL to help manage performance and algorithmic complexity. When using imperative languages, remember to filter first, map later.

· Java Zone ·
Free Resource

How do you break a Monolith into Microservices at Scale? This ebook shows strategies and techniques for building scalable and resilient microservices.

In recent days, I’ve seen a bit too much of this:

someCollection
    .stream()
    .map(e -> someFunction(e))
    .collect(Collectors.toList())
    .subList(0, 2);


Something is very wrong with the above example. Can you see it? No? Let me rename those variables for you.

hugeCollection
    .stream()
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList())
    .subList(0, 2);


Better now? Exactly. The above algorithm is O(N) when it could be O(1):

hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());


(Let’s assume the lack of explicit ordering is irrelevant.)

I’m working mostly with SQL and helping companies tune their SQL and as such, I’m always very eager to reduce the algorithmic complexity of queries. To some people, this seems like wizardry, but honestly, most of the time, it just boils down to adding a well-placed index.

Why?

Because an index reduces the algorithmic complexity of a query algorithm from O(N) (scanning the entire table for matches) to O(log N) (scanning a B-tree index for the same matches).

Sidenote: Reducing algorithmic complexity is almost never premature optimization. As your data set grows, bad complexity will always bite you!

The same is true for the above examples. Why would you ever traverse and transform an entire list of a huge amount of elements (N) with an expensive operation (superExpensiveMapping), when you really only need to do this only for the first two values?

Conclusion

SQL is a declarative language where the query optimizer gets this right automatically for you: It will (almost) always filter the data first (WHERE clause), and only then transform it (JOIN, GROUP BY, SELECT, etc).

Just as in SQL, when we hand-write our queries using Streams (and also in imperative programming), always:

Filter First, Map Later

Your production system will thank you.

Note, another interesting case with flatMap() is documented here.

How do you break a Monolith into Microservices at Scale? This ebook shows strategies and techniques for building scalable and resilient microservices.

Topics:
imperative programming ,query ,algorithm ,java ,java performance ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}