Over a million developers have joined DZone.

What if Every Object was an Array? No More NullPointerExceptions!

· Java Zone

Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code! Brought to you in partnership with ZeroTurnaround.

To NULL or not to NULL? Programming language designers inevitably have to decide whether they support NULLs or not. And they’ve proven to have a hard time getting this right. NULL is not intuitive in any language, because NULL is an axiom of that language, not a rule that can be derived from lower-level axioms. Take Java for instance, where

// This yields true:
null == null

// These throw an exception (or cannot be compiled)
int value = (Integer) null;

It’s not like there weren’t any alternatives. SQL, for instance, implements a more expressive but probably less intuitive three-value logic, which most developers get wrong in subtle ways once in a while.

At the same time, SQL doesn’t know “NULL” results, only “NULL” column values. From a set theory perspective, there are only empty sets, not NULL sets.

Other languages allow for dereferencing null through special operators, letting the compiler generate tedious null checks for you, behind the scenes. An example for this is Groovy with its null-safe dereferencing operator. This solution is far from being generally accepted, as can be seen in thisdiscussion about a Scala equivalent. Scala uses Option, which Java 8 will imitate using Optional (or @Nullable).

Let’s think about a much broader solution

To me, nullability isn’t a first-class citizen. I personally dislike the fact that Scala’s Option[T] type pollutes my type system by introducing a generic wrapper type (even if it seems to implement similar array-features through the traversable trait). I don’t want to distinguish the types of Option[T] and T. This is specifically true when reasoning about types from a reflection API perspective, where Scala’s (and Java’s) legacy will forever keep me from accessing the type of T at runtime.

But much worse, most of the times, in my application I don’t really want to distinguish between “option” references and “some” references. Heck, I don’t even want to distinguish between having 1 reference and having dozens. jQuery got this quite right. One of the main reasons why jQuery is so popular is because everything you do, you do on a set of wrapped DOM elements. The API never distinguishes between matching 1 or 100 div’s. Check out the following code:

// This clearly operates on a single object or none
$('div#unique-id').html('new content')
                  .click(function() { ... });

// This possibly operates on several objects or none
$('div.any-class').html('new content')
                  .click(function() { ... });

This is possible because JavaScript allows you to override the prototype of the JavaScript Array type, modifying arrays in general, at least for the scope of the jQuery library. How more awesome can it get? .html() and .click() are actions performed on the array as a whole, no matter if you have zero, one, or 100 elements in your match. What would a more typesafe language look like, where everything behaves like an array (or an ArrayList)? Think about the following model:

class Customer {
  String firstNames;  // Read as String[] firstNames
  String lastName;    // Read as String[] lastName
  Order orders;       // Read as Order[] orders

class Order {
  int value;          // Read as int[] value
  boolean shipped() { // Read as boolean[] shipped

Don’t rant (just yet). Let’s assume this wouldn’t lead to memory or computation overhead. Let’s continue thinking about the advantages of this. So, I want to see if a Customer’s orders have been shipped. Easy:

Customer customer = // ...
boolean shipped = customer.orders.shipped();

This doesn’t look spectacular (yet). But beware of the fact that a customer can have several orders, and the above check is really to see if all orders have been shipped. I really don’t want to write the loop, I find it quite obvious that I want to perform the shipped() check on every order. Consider:

// The length pseudo-field would still be
// present on orders

// In fact, the length pseudo-field is also
// present on customer, in case there are several

// Let's add an order to the customer:
customer.orders.add(new Order());

// Let's reset order

// Let's calculate the sum of all values
// OO-style:
// Functional style:

Of course there would be a couple of caveats and the above choice of method names might not be the best one. But being able to deal with single references (nullable or non-nullable) or array references (empty, single-valued, multi-valued) in the same syntactic way is just pure syntax awesomeness. Null-checks would be replaced by length checks, but mostly you don’t even have to do those, because each method would always be called on every element in the array. The current single-reference vs. multi-reference semantics would be documented by naming conventions. Clearly, naming something “orders” indicates that multi-references are possible, whereas naming something “customer” indicates that multi-references are improbable.

As users have commented, this technique is commonly referred to as array programming, which is implemented in Matlab or R.


I’m curious to hear your thoughts!

The Java Zone is brought to you in partnership with ZeroTurnaround. Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code!


Published at DZone with permission of Lukas Eder, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}