Why Empty Strings are Not the Same as Null
Null is an important but sometimes difficult concept. What’s the difference between an empty string and a null string? One of my first Stack Overflow questions was “(NOT) NULL for NVARCHAR columns”.
Some people claim that using null is always wrong, or is some kind of voodoo:
“If you feel the urge to use Nullable<int> or Nullable<bool> stop! You're doing it wrong...”
This time Mike is wrong. There is a place for null.
Null means no value
Null is a special value that means no value. For plain C pointers it’s just a name for the magic number 0 (I know that according to the specification its representation doesn’t have to be 0, but on mainstream platforms it is). In C#, null is a special value that’s not part of the reference value space. The same is true for nullable value types (Nullable<int>, or int? for short): an int? can take any permitted value for an int, or be null. A SQL column works the same way: a nullable int column can take any possible int value, or be null.
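To make the value-space idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module with an in-memory SQLite database standing in for SQL Server, and the orders table and its column names are hypothetical. Python’s None is passed to the database as SQL NULL. The nullable column accepts NULL alongside ordinary integers, while the NOT NULL column rejects it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical orders table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " ordered_items INTEGER NOT NULL,"  # any int value, but never NULL
        " shipped_items INTEGER"            # any int value, or NULL (the default)
        ")"
    )
    return conn


def try_insert(conn: sqlite3.Connection, ordered, shipped) -> bool:
    """Return True if the row was stored, False if a NOT NULL constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO orders (ordered_items, shipped_items) VALUES (?, ?)",
            (ordered, shipped),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(try_insert(conn, 5, 5))     # True: plain int values are fine in both columns
print(try_insert(conn, 5, None))  # True: the nullable column also accepts NULL
print(try_insert(conn, None, 0))  # False: NOT NULL rejects NULL
```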
Empty string or Null
For strings, things get a bit more complicated. A SQL NVARCHAR() NULL column can hold either an empty string or null. If you allow the string to be null, you’d better have a strict definition of how null differs from an empty string. There might be cases where null means “unspecified” while an empty string means “specified as empty”, but they are unusual (I couldn’t even come up with an example). In most cases I find it best to use empty strings instead of null. Unfortunately, C# doesn’t allow non-nullable strings.
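Why a nullable string column needs that strict definition is easiest to see in a query. The sketch below again uses Python’s sqlite3 with an in-memory SQLite database standing in for SQL Server, and the people table and middle_name column are hypothetical. It shows that '' and NULL match different predicates and are counted differently. As a result, every query that means “no middle name” has to remember to handle both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# middle_name is a hypothetical nullable text column, like NVARCHAR(...) NULL.
conn.execute("CREATE TABLE people (name TEXT NOT NULL, middle_name TEXT)")
conn.executemany(
    "INSERT INTO people (name, middle_name) VALUES (?, ?)",
    [("Ann", "Marie"), ("Bob", ""), ("Cid", None)],
)


def names(where: str) -> list[str]:
    """Names of rows matching a WHERE clause (trusted literals only), sorted."""
    rows = conn.execute(f"SELECT name FROM people WHERE {where} ORDER BY name")
    return [name for (name,) in rows]


print(names("middle_name = ''"))                # ['Bob']: empty, but not null
print(names("middle_name IS NULL"))             # ['Cid']: null, but not empty
print(names("COALESCE(middle_name, '') = ''"))  # ['Bob', 'Cid']: both mean "none"
# COUNT(column) counts the empty string but skips the null.
print(conn.execute("SELECT COUNT(middle_name) FROM people").fetchone()[0])  # 2
```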
Empty values should be null
In Nullability Voodoo, Mike argues that using null is wrong:
“Rather than using the nullability of EndDate to mean that the task hasn’t completed, consider giving the task a status instead.”
He is right that null is a bad way of marking a task as not completed. Especially when there are many states that depend on different fields, it can quickly become hard to work out the state. I prefer an explicit state field. It might be implemented as an in-memory-only calculated field on the entity corresponding to the database row, which keeps the database normalized.
Even if Mike is right that a null EndDate on its own is a bad marker for a “not completed” state, that’s still no reason to make EndDate non-nullable. If the table does have a state field that clearly marks a row as “not completed”, what’s the right value to put in EndDate? As the task is not yet completed, there is no end date. It is undefined, and undefined is represented as null.
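Here is a minimal sketch of that situation, written in Python rather than C#. The Task class, the TaskStatus values and the field names are all hypothetical. The explicit status says whether the task is completed. The end_date field is Optional, with None playing the role of NULL, because an unfinished task simply has no end date. A check in __post_init__ keeps the two fields consistent.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    # Hypothetical states; the point is only that "completed" is explicit.
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"


@dataclass
class Task:
    title: str
    status: TaskStatus
    # None stands for NULL: a task that isn't completed has no end date.
    end_date: Optional[date] = None

    def __post_init__(self) -> None:
        # The status marks completion; end_date only records when it happened,
        # so it must be set exactly when the task is completed.
        completed = self.status is TaskStatus.COMPLETED
        if completed != (self.end_date is not None):
            raise ValueError("end_date must be set if and only if the task is completed")


open_task = Task("Write report", TaskStatus.IN_PROGRESS)
done_task = Task("Ship release", TaskStatus.COMPLETED, date(2013, 5, 6))
print(open_task.end_date)  # None: undefined, not a made-up sentinel date
```

With a non-nullable column, the alternative would be a sentinel such as date.min, which every query and report then has to know to ignore.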
Exclude Undefined values from Calculations
Using null for undefined values effectively excludes them from calculations, which is a good thing. Suppose a manager comes running and asks for a quick ad hoc report showing the average number of items shipped per order. You don’t want to include unshipped orders (with incomplete data) in that calculation. If the ShippedItemCount column isn’t nullable, all unshipped orders have a value of 0, and the ad hoc report has to apply a filter to ignore those 0 values. If null is instead used for ShippedItemCount until the order is actually shipped, those values are automatically excluded, because SQL aggregate functions such as AVG ignore nulls.
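The report example can be sketched with Python’s sqlite3 module, with an in-memory SQLite database standing in for the real one. The two tables are hypothetical, and their shipped_item_count column is a hypothetical version of ShippedItemCount. Each table holds three shipped orders (4, 6 and 8 items) and one unshipped order. The unshipped order is stored as 0 in one table and as NULL in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one stores 0 for unshipped orders, the other NULL.
conn.execute("CREATE TABLE orders_zero (id INTEGER PRIMARY KEY, shipped_item_count INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders_null (id INTEGER PRIMARY KEY, shipped_item_count INTEGER)")

# Three shipped orders (4, 6 and 8 items) plus one order not yet shipped.
conn.executemany("INSERT INTO orders_zero (shipped_item_count) VALUES (?)", [(4,), (6,), (8,), (0,)])
conn.executemany("INSERT INTO orders_null (shipped_item_count) VALUES (?)", [(4,), (6,), (8,), (None,)])


def scalar(sql: str):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# 0 is a real value, so the unshipped order drags the average down...
print(scalar("SELECT AVG(shipped_item_count) FROM orders_zero"))  # 4.5
# ...unless the ad hoc report remembers to filter it out.
print(scalar("SELECT AVG(shipped_item_count) FROM orders_zero WHERE shipped_item_count <> 0"))  # 6.0
# AVG skips NULLs, so the unshipped order is excluded automatically.
print(scalar("SELECT AVG(shipped_item_count) FROM orders_null"))  # 6.0
```

Note that the <> 0 filter would also discard an order that genuinely shipped zero items, which is exactly the ambiguity null avoids. SQLite’s AVG returns a floating-point value. SQL Server’s AVG over an int column returns an int, so cast the column first there if you need the fractional part.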
Mike is Right and Wrong
Mike is right that null should be used with care. It is a powerful tool that should only be used where appropriate. In fact, Anders Hejlsberg regrets that non-nullable reference types are not available in C#. Null should be opt-in where it is appropriate, not mandatory as it is now.
Published at DZone with permission of Anders Abel, DZone MVB.