Stream Summary Statistics

Execute multiple operations on a Java Stream at once to avoid repeated traversal. Note that the Stream becomes invalid after the terminal operation.

By Horatiu Dan · Jul. 18, 23 · Analysis

To leverage the capabilities of Java Streams, one should first understand two general concepts – the stream and the stream pipeline. A Stream in Java is a sequential flow of data. A stream pipeline, on the other hand, represents a series of steps applied to that data, a series that ultimately produces a result.

My family and I recently visited the Legoland Resort in Germany – a great place, by the way – and there, among other attractions, we had the chance to observe in detail a sample of the brick-building process. Briefly, everything starts from the granular plastic that is melted, modeled accordingly, assembled, painted, stenciled if needed, and packed up in bags and boxes. All the steps are part of an assembly factory pipeline.

What is worth mentioning is the fact that the next step cannot be done until the previous one has been completed and also that the number of steps is finite. Moreover, at every step, each Lego element is touched to perform the corresponding operation, and then it moves only forward, never backward, so that the next step is done. The same applies to Java streams.

In stream terminology, the steps are called stream operations, and they fall into three categories – one that starts the job (source), one that ends it and produces the result (terminal), and zero or more intermediate ones in between.

As a last consideration, it’s worth mentioning that the intermediate operations can transform the stream into another one, but they never run until the terminal operation runs (they are lazily evaluated). Finally, once the result is produced and the initial goal is achieved, the stream is no longer valid.
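Both properties can be observed with a small experiment. The sketch below is only an illustration (the class and method names are hypothetical, not part of the article's project): it records when the intermediate `peek()` step actually runs and checks what happens when a consumed stream is used again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StreamLifecycleDemo {

    // Records the order of events to show that intermediate steps run only
    // once the terminal operation starts.
    static List<String> lazyEvaluationLog() {
        List<String> log = new ArrayList<>();

        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .peek(i -> log.add("peek " + i)) // intermediate: nothing happens yet
                .map(i -> i * 2);                // intermediate: nothing happens yet

        log.add("pipeline built");

        // Terminal operation: only now are the elements traversed.
        int sum = pipeline.mapToInt(Integer::intValue).sum();
        log.add("sum " + sum);
        return log;
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean reuseFails() {
        Stream<Integer> stream = Stream.of(1, 2, 3);
        stream.count(); // first terminal operation consumes the stream
        try {
            stream.count(); // the stream has already been operated upon
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lazyEvaluationLog());
        System.out.println("reuse fails: " + reuseFails());
    }
}
```

In the recorded log, the "pipeline built" entry comes before any "peek" entry, and the second `count()` call ends in an `IllegalStateException`.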

Abstract

Starting from the fact that a Java Stream is no longer valid once its terminal operation is done, this article presents a way of computing multiple results in a single stream traversal. It does so by leveraging the Java summary statistics objects (in particular IntSummaryStatistics), which have been available since version 1.8.

Proof of Concept

The small project, built specifically to showcase the statistics computation, uses the following:

  • Java 17
  • Maven 3.6.3
  • JUnit Jupiter Engine v.5.9.3

As a domain, there is one straightforward entity – a parent.

Java
 
public record Parent(String name, int age) { }


It is modeled by two attributes – a name and an age. While the name is present only to distinguish the parents from one another, the age is the attribute of interest here.

The purpose is to be able to compute a few age statistics on a set of parents, that is:

  • The total sample count.
  • The ages of the youngest and the oldest parent.
  • The age range of the group.
  • The average age.
  • The total number of years the parents accumulate.

The results are encapsulated into a ParentStats structure and represented as a record as well.

Java
 
public record ParentStats(long count,
                          int youngest,
                          int oldest,
                          int ageRange,
                          double averageAge,
                          long totalYearsOfAge) { }


In order to accomplish this, an interface is defined. 

Java
 
public interface Service {
 
    ParentStats getStats(List<Parent> parents);
}


For now, it has only one method, which receives a list of Parents as input and returns the desired statistics.

Initial Implementation

As the problem is trivial, an initial and imperative implementation of the service might be as below:

Java
 
public class InitialService implements Service {
 
    @Override
    public ParentStats getStats(List<Parent> parents) {
        int count = parents.size();
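        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE; // mirror of min, so the first age always replaces it
        long sum = 0;                // long, matching ParentStats.totalYearsOfAge
        for (Parent parent : parents) {
            int age = parent.age();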
            if (age < min) {
                min = age;
            }
            if (age > max) {
                max = age;
            }
            sum += age;
        }
 
        return new ParentStats(count, min, max, max - min, (double) sum/count, sum);
    }
}


The code works, but it focuses on the how rather than on the what; the intent gets lost in the loop bookkeeping, which makes the code harder to read than it needs to be.

As the functional style and streams are now part of most Java developers' toolbox, the next service implementation is the one most likely to be chosen.

Java
 
public class StreamService implements Service {
 
    @Override
    public ParentStats getStats(List<Parent> parents) {
        int count = parents.size();
 
        int min = parents.stream()
                .mapToInt(Parent::age)
                .min()
                .orElseThrow(RuntimeException::new);
 
        int max = parents.stream()
                .mapToInt(Parent::age)
                .max()
                .orElseThrow(RuntimeException::new);
 
        int sum = parents.stream()
                .mapToInt(Parent::age)
                .sum();
 
        return new ParentStats(count, min, max, max - min, (double) sum/count, sum);
    }
}


The code is more readable now; the downside, though, is the redundant stream traversal needed to compute all the desired stats – three times in this particular case. As stated at the beginning of the article, once a terminal operation – min, max, or sum – is done, the stream is no longer valid. It would be convenient to compute the desired statistics without having to traverse the list of parents multiple times.

Summary Statistics Implementation

Java provides a family of summary statistics classes in the java.util package, one for each primitive stream type – IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics.

According to the JavaDoc, IntSummaryStatistics is “a state object for collecting statistics such as count, min, max, sum and average. The class is designed to work with (though does not require) streams”. 
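The "though does not require" part of that description can be illustrated without any stream at all. The following is a minimal sketch (the class name, helper method, and sample ages are made up for illustration) that feeds values into an `IntSummaryStatistics` instance one by one through its `accept(int)` method.

```java
import java.util.IntSummaryStatistics;

class ManualStatsDemo {

    // Hypothetical helper: accumulates the given ages without using a stream.
    static IntSummaryStatistics collect(int... ages) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        for (int age : ages) {
            stats.accept(age); // updates count, min, max, and sum in one step
        }
        return stats;
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = collect(31, 34, 36);
        System.out.println("count   = " + stats.getCount());
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
        System.out.println("sum     = " + stats.getSum());
        System.out.println("average = " + stats.getAverage());
    }
}
```

Each call to `accept` updates all the tracked values at once, which is why a single pass over the data is enough.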

It is a good candidate for the initial purpose; thus, the following implementation of the Service seems the preferred one.

Java
 
public class StatsService implements Service {
 
    @Override
    public ParentStats getStats(List<Parent> parents) {
        IntSummaryStatistics stats = parents.stream()
                .mapToInt(Parent::age)
                .summaryStatistics();
 
        return new ParentStats(stats.getCount(),
                stats.getMin(),
                stats.getMax(),
                stats.getMax() - stats.getMin(),
                stats.getAverage(),
                stats.getSum());
    }
}


There is only one stream of parents, the statistics are computed in a single traversal, and the code is far more readable this time.
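One behavior worth keeping in mind with this approach is the empty input. While `min().orElseThrow(...)` fails on an empty stream, an `IntSummaryStatistics` that has recorded no values reports `Integer.MAX_VALUE` as minimum, `Integer.MIN_VALUE` as maximum, and `0.0` as average, so subtracting the two overflows. The sketch below shows one possible guard; the `ageRange` helper and the class name are hypothetical, not part of the article's project.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.OptionalInt;

class EmptyStatsDemo {

    static IntSummaryStatistics statsOf(List<Integer> ages) {
        return ages.stream()
                .mapToInt(Integer::intValue)
                .summaryStatistics();
    }

    // Hypothetical guard: the range is defined only if at least one value was recorded.
    static OptionalInt ageRange(IntSummaryStatistics stats) {
        return stats.getCount() == 0
                ? OptionalInt.empty()
                : OptionalInt.of(stats.getMax() - stats.getMin());
    }

    public static void main(String[] args) {
        IntSummaryStatistics empty = statsOf(List.of());
        System.out.println("min on empty input: " + empty.getMin());
        System.out.println("max on empty input: " + empty.getMax());
        System.out.println("range on empty input: " + ageRange(empty));
        System.out.println("range for 31 and 36: " + ageRange(statsOf(List.of(31, 36))));
    }
}
```

Whether an empty list should produce an exception, an empty result, or default values is a design decision that depends on the callers of the service.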

In order to check all three implementations, the following abstract base unit test is used.

Java
 
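import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

abstract class ServiceTest {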
 
    private Service service;
 
    private List<Parent> mothers;
    private List<Parent> fathers;
    private List<Parent> parents;
 
    protected abstract Service setupService();
 
    @BeforeEach
    void setup() {
        service = setupService();
 
        mothers = IntStream.rangeClosed(1, 3)
                .mapToObj(i -> new Parent("Mother" + i, i + 30))
                .collect(Collectors.toList());
 
        fathers = IntStream.rangeClosed(4, 6)
                .mapToObj(i -> new Parent("Father" + i, i + 30))
                .collect(Collectors.toList());
 
        parents = new ArrayList<>(mothers);
        parents.addAll(fathers);
    }
 
    private void assertParentStats(ParentStats stats) {
        Assertions.assertNotNull(stats);
        Assertions.assertEquals(6, stats.count());
        Assertions.assertEquals(31, stats.youngest());
        Assertions.assertEquals(36, stats.oldest());
        Assertions.assertEquals(5, stats.ageRange());
 
        final int sum = 31 + 32 + 33 + 34 + 35 + 36;
 
        Assertions.assertEquals((double) sum/6, stats.averageAge());
        Assertions.assertEquals(sum, stats.totalYearsOfAge());
    }
 
    @Test
    void getStats() {
        final ParentStats stats = service.getStats(parents);
        assertParentStats(stats);
    }
}


As the stats are computed for all the parents, the mothers and fathers are first put together in the same parents list (we will see later why there were two lists in the first place).

The particular unit test for each implementation is trivial – it sets up the service instance.

Java
 
class StatsServiceTest extends ServiceTest {
 
    @Override
    protected Service setupService() {
        return new StatsService();
    }
}


Combining Statistics

In addition to the already used methods – getMin(), getMax(), getCount(), getSum(), getAverage() – IntSummaryStatistics provides a way to combine the state of another similar object into the current one. 

Java
 
void combine(IntSummaryStatistics other)


As we saw in the above unit test, initially, there are two source lists – mothers and fathers. It would be convenient to be able to directly compute the statistics without first merging them.

In order to accomplish this, the Service is enriched with the following method.

Java
 
default ParentStats getCombinedStats(List<Parent> mothers, List<Parent> fathers) {
    final List<Parent> parents = new ArrayList<>(mothers);
    parents.addAll(fathers);
    return getStats(parents);
}


The first two implementations – InitialService and StreamService – are not of interest here; thus, a default implementation was provided for convenience. It is overridden only by the StatsService.

Java
 
@Override
public ParentStats getCombinedStats(List<Parent> mothers, List<Parent> fathers) {
    Collector<Parent, ?, IntSummaryStatistics> collector = Collectors.summarizingInt(Parent::age);
 
    IntSummaryStatistics stats = mothers.stream().collect(collector);
    stats.combine(fathers.stream().collect(collector));
 
    return new ParentStats(stats.getCount(),
            stats.getMin(),
            stats.getMax(),
            stats.getMax() - stats.getMin(),
            stats.getAverage(),
            stats.getSum());
}


By leveraging the combine() method, the statistics can be merged directly when the data comes from different source lists, without building a merged list first.
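The same idea extends beyond two lists. As a rough sketch (the class name, the `combineAll` helper, and the use of plain integer lists instead of Parent objects are assumptions made to keep the example self-contained), partial statistics of any number of groups can be folded into one accumulator:

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class CombineDemo {

    // Hypothetical helper: merges the statistics of several groups of ages.
    static IntSummaryStatistics combineAll(List<List<Integer>> groups) {
        IntSummaryStatistics total = new IntSummaryStatistics();
        for (List<Integer> group : groups) {
            IntSummaryStatistics partial = group.stream()
                    .collect(Collectors.summarizingInt(Integer::intValue));
            total.combine(partial); // folds the group's state into the running total
        }
        return total;
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = combineAll(List.of(
                List.of(31, 32, 33),   // e.g. the mothers' ages
                List.of(34, 35, 36))); // e.g. the fathers' ages
        System.out.println(stats);
    }
}
```

Each group is still traversed exactly once; only the small state objects are merged afterwards.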

The corresponding unit test is straightforward.

Java
 
@Test
void getCombinedStats() {
    final ParentStats stats = service.getCombinedStats(mothers, fathers);
    assertParentStats(stats);
}


Having seen the above Collector, the initial getStats() method may be written even more concisely.

Java
 
@Override
public ParentStats getStats(List<Parent> parents) {
    IntSummaryStatistics stats = parents.stream()
            .collect(Collectors.summarizingInt(Parent::age));
 
    return new ParentStats(stats.getCount(),
            stats.getMin(),
            stats.getMax(),
            stats.getMax() - stats.getMin(),
            stats.getAverage(),
            stats.getSum());
}


Conclusion

Depending on the data types used, IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics are convenient out-of-the-box structures for quickly computing simple statistics in a single traversal, which lets one focus on writing more readable and maintainable code.
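For completeness, the double-based variant follows the same pattern. The snippet below is only a sketch with made-up sample values and a hypothetical class name; it uses `Collectors.summarizingDouble` to obtain a `DoubleSummaryStatistics`.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class DoubleStatsDemo {

    // Hypothetical helper: summarizes a list of double values in one traversal.
    static DoubleSummaryStatistics statsOf(List<Double> values) {
        return values.stream()
                .collect(Collectors.summarizingDouble(Double::doubleValue));
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = statsOf(List.of(1.5, 2.5, 4.0));
        System.out.println("count   = " + stats.getCount());
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
        System.out.println("sum     = " + stats.getSum());
        System.out.println("average = " + stats.getAverage());
    }
}
```

`Collectors.summarizingLong` and `LongSummaryStatistics` work analogously for long values.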

Published at DZone with permission of Horatiu Dan. See the original article here.

Opinions expressed by DZone contributors are their own.