JDK 12: Merging Collectors and the Challenge of Naming
In JDK 12, we will likely see a new feature in the Stream API. Read on to find out more about naming merging Collectors in the JDK.
It appears likely that a new method will be available on the java.util.stream.Collectors class in JDK 12 that will, according to the new method's proposed Javadoc-based documentation, "return a Collector that passes the input elements to two specified collectors and merges their results with the specified merge function." The currently proposed name of this new Collectors method is pairing, but that name has been the source of discussion.
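To make that Javadoc description concrete, here is a minimal sketch of how such a collector might be written against the existing Collector API. It is not the code from the proposed webrev: the class name PairingSketch, the Holder helper, the parameter names, and the average example are my own assumptions for illustration, and the method is called pairing only because that is the currently proposed name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Illustrative sketch only; names here are hypothetical, not the JDK's.
final class PairingSketch {

    // Mutable holder for the two downstream accumulation containers.
    private static final class Holder<A1, A2> {
        A1 first;
        A2 second;

        Holder(A1 first, A2 second) {
            this.first = first;
            this.second = second;
        }
    }

    // Passes every element to both collectors, then merges the two results.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> pairing(
            Collector<? super T, A1, R1> c1,
            Collector<? super T, A2, R2> c2,
            BiFunction<? super R1, ? super R2, R> finisher) {
        return Collector.<T, Holder<A1, A2>, R>of(
                // one fresh container per downstream collector
                () -> new Holder<>(c1.supplier().get(), c2.supplier().get()),
                // each element goes to both downstream accumulators
                (holder, element) -> {
                    c1.accumulator().accept(holder.first, element);
                    c2.accumulator().accept(holder.second, element);
                },
                // combine partial results pairwise (used by parallel streams)
                (left, right) -> {
                    left.first = c1.combiner().apply(left.first, right.first);
                    left.second = c2.combiner().apply(left.second, right.second);
                    return left;
                },
                // finish both downstream results, then merge them
                holder -> finisher.apply(
                        c1.finisher().apply(holder.first),
                        c2.finisher().apply(holder.second)));
    }

    // Example use: sum and count in a single pass, merged into an average.
    static double average(List<Integer> values) {
        return values.stream().collect(
                pairing(
                        Collectors.summingInt(Integer::intValue),
                        Collectors.counting(),
                        (sum, count) -> sum / (double) count));
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2, 3, 4, 5))); // prints 3.0
    }
}
```

The stream is traversed once, each collector sees every element, and only the two finished results (here a sum and a count) are handed to the merge function.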
The naming of this method has prompted wide discussion on the OpenJDK core-libs-dev mailing list. Although it would be easy to label this as an example of bike-shedding (or Parkinson's Law of Triviality), my experience has been that proper naming can be more important than it might seem at first glance. I've seen many situations in which there was nothing wrong with the logic of a particular implementation, but problems ensued in the use of that implementation because of miscommunication or bad assumptions tied to poorly named code constructs. For a major API in the JDK, it's not so surprising that the name of this method would be so seriously considered.
The discussion began with Peter Levart's post "BiCollector" (published on June 17), in which he opened with the question, "have you ever wanted to perform a collection of the same Stream into two different targets using two Collectors?" Levart included an example of an implementation of a BiCollector and asked if this was the type of thing that might be added to the JDK. Not to anyone's surprise, it turns out that this implementation is desired by others, and some alternate existing implementations, including Kirk Pepperdine's and Tagir Valeev's streamex implementation, were mentioned.
After discussion regarding the multiple implementations of the "BiCollector," Tagir Valeev created an OpenJDK "preliminary webrev of my own implementation" and put it out for review. In that post, Valeev specifically called out that he had made up the name "pairing" for the method and added, "as I'm not a native English speaker, I cannot judge whether it's optimal, so better ideas are welcome." That "opened the floodgates!"
Although there was some interesting and significant discussion surrounding other implementation details of the proposed "BiCollector" (now in proposed code as "Collectors.pairing()"), the naming of the method received many contributions. In a June 21 post, Valeev summarized the proposed names with accompanying comments about each recommendation, and I have reproduced that list (without the insightful comments) here:
- tee or teeing
- bifurcate (or bifurcating?)
- fanout or fanningOut
For those interested in arguments for and against these proposed names, I recommend viewing Valeev's original post. Most of the posts linked above with the name suggestions provide arguments for their favored name, and there is some interesting insight into what OpenJDK contributors think about the aspects of a method name that might aid or hinder understanding the functionality of the method.
After the excitement of naming the method, discussion died down for a while on this addition to the Collectors API, until Valeev posted a "ping message" with a link to the latest webrev for review (changes @since 11 to @since 12). In response to this "ping" message, there is feedback regarding the name of the last argument to the proposed method (currently named "finisher"), which is another reminder of the importance of naming.
Other posts on this topic on the core-libs-dev mailing list remind us that for this new method to be added to the Collectors public API, a few things still need to happen. These include a sponsor volunteering to review and sponsor the changeset, as well as the need for a CSR (Compatibility & Specification Review) and "a couple of reviewers that are fully aware of Stream's design."
A Brian Goetz post on this thread summarizes why naming this proposed method is so difficult:
The essential challenge in naming here is that this Collector does two (or maybe three) things: it duplicates the stream into two identical streams ("tee"), sends each element to the two collectors ("collecting"), and then combines the results ("finishing"). So, all the one-word names (pairing, teeing, unzipping, biMapping) only emphasize one half of the operation, and names that capture the full workflow accurately (teeingAndCollectingAndThen) are unwieldy.
That same Goetz post argues against "merging" (or its derivatives) for the method's name, because "names along the lines of 'merging' may incorrectly give the idea that the merge is happening elementwise, rather than duplicating the streams, collecting and merging the results."
I find several of the proposed method names to be reasonable, but there are a couple that I believe (hope) were made out of an attempt at humor.
JDK-8205461 ["Create Collector which merges results of two other collectors"] is the "Enhancement" or "bug" describing this issue. Its description currently begins with "add a new Collector into the Collectors class, which merges results of two other collectors" before explicitly stating "one API method should be added (the name is subject to discussion)." If you've ever wanted to name a method in a public JDK API, this might be your opportunity!
I have used this blog post to accomplish two things:
- Create awareness of this method that is likely to be available in the public API as of JDK 12
- Present an example of why naming is important and why it can be as difficult as the technical implementation
Proper naming can be difficult for anyone, even those of us who are native English speakers!
Published at DZone with permission of Dustin Marx, DZone MVB. See the original article here.