Why Empty Strings are Not the Same as Null
Join the DZone community and get the full member experience.
Join For FreeNull
is an important, but sometimes hard concept. What’s the difference between an empty string and a null
string? One of my first Stack Overflow questions was (NOT) NULL for NVARCHAR columns.
Some people claim that using null
is always wrong or is some kind of voodoo.
If you feel the urge to use Nullable<int> or Nullable<bool> stop! You're doing it wrong...
This time Mike is wrong. There is a place for null
, including Nullable<bool>
and Nullable<int>
.
Null means no value
Null
is a special value that means no value. For plain C pointers it’s just a name for the magic number 0 (I know that according to the specification it can have another numerical representation than 0, but in reality it doesn’t). In C# null
is a special value that’s not part of the reference value space. The same is true for nullable value types (Nullable<int>
or int?
for short) where an int?
can take any permitted value for an int
or be null
. For a SQL column the same is true, a nullable int column can take any possible int
value plus or be null
.
Empty string or Null
For strings things get a bit more complicated. A SQL NVARCHAR() NULL
can be either empty or null
. If you allow the string to be null
you’d better have a strict definition of how null
is different to an empty string. There might be cases where null
means unspecified while an empty string means specified as empty. They are unusual (I even failed to come up with an example). In most cases I find it best to use empty strings instead of null
. Unfortunately C# doesn’t allow non-null strings.
Empty values should be null
In Nullability Voodoo Mike argues why using null
is wrong.
Rather than using the nullability of EndDate to mean that the task hasn’t completed, consider giving the task a status instead.
He is right that EndDate
being null
is a bad way of marking a task as not completed. Especially if there are many different states that are dependent on different fields it can quickly get hard to find the state. I prefer an explicit state field. It might be implemented as an in memory only, calculated field, on the entity corresponding to the database row. That keeps the database normalized.
Even if Mike is right that having only a null
value for EndDate
is a bad marker for a “not completed” state, that’s still not a reason to not have EndDate
nullable. If the table indeed has a state field, which clearly marks a row as “not completed”, what’s the right value to put in for EndDate
? As the task is not yet completed there is no end date. It is undefined. Undefined is represented as null
.
Exclude Undefined values from Calculations
Using null
for undefined values effectively excludes them from calculations which is good. When a manager comes running, asking for a quick ad hoc report shoving the average number of items shipped for each order you don’t want to include non shipped orders (with incomplete data) in the calculation. If the ShippedItemCount
column isn’t nullable, all non shipped orders have a 0 value. In the ad hoc report a filter has to be applied to ignore those 0 values in the calculation.
If null
is instead used for ShippedItemCount
until the order is actually shipped, those values are automatically excluded from the calculation.
Mike is Right and Wrong
Mike is right in that null
should be used with care. null
is a powerful tool that should only be used where appropriate. In fact, Anders Hejlsberg regrets that non-nullable reference types are not available in C#. null
should be an opt-in for where it is appropriate. Not mandatory as it is now.
Published at DZone with permission of Anders Abel, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
8 Data Anonymization Techniques to Safeguard User PII Data
-
How to Optimize CPU Performance Through Isolation and System Tuning
-
Mainframe Development for the "No Mainframe" Generation
-
HashMap Performance Improvements in Java 8
Comments