Making Pivot Table Using Java Streams
Pivot tables are great for summarizing data. Fortunately, you can use Streams to prepare and sort your data so you can put it into a more readable format.
Join the DZone community and get the full member experience.
Join For FreeToday, let's see how we can implement a pivot table using Java 8 Streams. Raw data by itself does not deliver much insight to humans. We need some kind of data aggregation to discern patterns in raw data. A pivot table is one such instrument. Other more visual methods of aggregation include graphs and charts.
In previous articles, we showed how to use Java 8 Streams to perform SQL-like slicing and dicing of raw data. Today’s article will build on some of those examples, so if the material here appears a bit dense to you, I suggest you browse those two articles first.
If you do not like this “raw data” approach to creating a pivot table and would like Excel to be your medium of choice, we have you covered.
CSV Data Represented as a POJO
We use the following POJO representing baseball players and their salaries.
public class Player {
private int year;
private String teamID;
private String lgID;
private String playerID;
private int salary;
// defined getters and setters here
}
The data comes from a simple CSV file – no quoted fields, no multi-line fields, and no commas inside a field. We can use a simple Regex pattern to parse the CSV and load the data into a List. The data looks like this and contains about 26428 rows:
yearID,teamID,lgID,playerID,salary
1985,ATL,NL,barkele01,870000
1985,ATL,NL,bedrost01,550000
1985,ATL,NL,benedbr01,545000
1985,ATL,NL,campri01,633333
1985,ATL,NL,ceronri01,625000
...
The CSV data is loaded using Streams with code similar to:
Pattern pattern = Pattern.compile(",");
try (BufferedReader in = new BufferedReader(new FileReader(filename));){
List<Player> players = in
.lines()
.skip(1)
.map(line -> {
String[] arr = pattern.split(line);
return new Player(Integer.parseInt(arr[0]),
arr[1],
arr[2],
arr[3],
Integer.parseInt(arr[4]));
})
.collect(Collectors.toList());
}
Define the Pivot Columns Class
The following class defines the class that we use as the container for the pivot columns. These are the columns that the data is grouped by. If using SQL, these columns would appear in the “GROUP BY” clause.
public class YearTeam
{
public int year;
public String teamID;
public YearTeam(int year,String teamID) {
this.year = year;
this.teamID = teamID;
}
@Override
public boolean equals(Object other)
{
if ( other == null ) return false;
if ( this == other ) return true;
if ( other instanceof YearTeam ) {
YearTeam yt = (YearTeam)other;
if ( year == yt.year && teamID.equals(yt.teamID) )
return true;
}
return false;
}
@Override
public int hashCode()
{
int hash = 1;
hash = hash * 17 + year;
hash = hash * 31 + teamID.hashCode();
return hash;
}
@Override
public String toString()
{
StringBuilder sbuf = new StringBuilder();
sbuf.append('[').append(year).append(", ").append(teamID)
.append(']');
return sbuf.toString();
}
}
The fields are defined with the “public” accessor merely for convenience. For your application, you can make it private and add getters and/or setters as needed.
The class overrides equals() and hashCode(), since it will be used as the key when storing in a Map.
Grouping Data Using Streams
And here we read the CSV data, create a POJO for each row, and group the data using the pivot columns class defined above.
Map<YearTeam,List<Player>> grouped = in
.lines()
.skip(1)
.map(line -> {
String[] arr = pattern.split(line);
return new Player(Integer.parseInt(arr[0]),
arr[1],
arr[2],
arr[3],
Integer.parseInt(arr[4]));
})
.collect(Collectors.groupingBy(x-> new YearTeam(x.getYear(), x.getTeamID())));
At this point, the data has been collected in a Map properly grouped by the specified columns.
Printing the Pivot Table as a CSV
Let us print the data in the pivot table as a CSV so we can load it into Excel for comparison. While printing the data, we apply the aggregate function summingLong(). Java 8 Streams also provide averagingLong() which gives you the average. Need more, or all of it at once? summarizingLong() dumps all of it in your face. Enjoy!
CSV Column Headers
We use values of the teamID as column headers. Collect and print them as shown. We use a TreeSethere to have it sorted alphabetically.
Set<String> teams = grouped
.keySet()
.stream()
.map(x -> x.teamID)
.collect(Collectors.toCollection(TreeSet::new));
System.out.print(',');
teams.stream().forEach(t -> System.out.print(t + ","));
System.out.println();
Printing the Data
And here is the complete pivot table being built and printed. For each year and team, we extract the list of players, perform a summing operation and print the total.
Set<Integer> years = grouped
.keySet()
.stream()
.map(x -> x.year)
.collect(Collectors.toSet());
years
.stream()
.forEach(y -> {
System.out.print(y + ",");
teams.stream().forEach(t -> {
YearTeam yt = new YearTeam(y, t);
List<Player> players = grouped.get(yt);
if ( players != null ) {
long total = players
.stream()
.collect(Collectors.summingLong(Player::getSalary));
System.out.print(total);
}
System.out.print(',');
});
System.out.println();
});
Comparing Output in Excel
Loading the CSV output in Excel shows the data as follows:
Comparing it with Excel’s own pivot table, it shows the data to be identical. (For some reason, column “MON
” has appeared first in Excel as shown below. Maybe it is one of the many “features” available. The values are identical though.)
And that, my friends, is one way you can prepare a pivot table from inside Java using plain Collections. Go find some cool uses for it!
Summary
A pivot table is a useful data summarizing tool. It is available in most data analytics software including Excel., Here we learned how to build the same data structure using Java 8 Streams. We used grouping-by and summing to achieve the functionality.
Published at DZone with permission of Jay Sridhar, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments