Why do C#, the .NET Framework, and the CLR need value types and reference types? Why two categories of types? Why the added complexity in training developers to understand why and when to use each type of type?
There are many answers, but very few get to the crux of the matter. You could try to justify the need for two types of types by looking at the semantic differences C# affords each. For example, you know that by default, instances of value types are copied when passed to a function, but instances of reference types are not -- only the references are copied. Or, you could say that by default, the `Equals` method compares whether two instances of a reference type are identical (point to the same memory location), but for instances of value types it compares their contents. There are many other superficial semantic differences, too. But do they justify having two types of types?
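These default semantics are easy to demonstrate. Here is a minimal C# sketch (the `PointStruct` and `PointClass` types are illustrative, not from any library):

```csharp
using System;

// A value type: assignment and parameter passing copy the whole instance.
struct PointStruct { public int X, Y; }

// A reference type: assignment copies only the reference.
class PointClass { public int X, Y; }

static class CopySemanticsDemo
{
    static void Main()
    {
        var s1 = new PointStruct { X = 1, Y = 2 };
        var s2 = s1;                 // full copy of the contents
        s2.X = 42;
        Console.WriteLine(s1.X);     // 1 -- the original is untouched

        var c1 = new PointClass { X = 1, Y = 2 };
        var c2 = c1;                 // only the reference is copied
        c2.X = 42;
        Console.WriteLine(c1.X);     // 42 -- both variables alias one object

        // Default Equals: contents for value types, identity for reference types.
        Console.WriteLine(s1.Equals(new PointStruct { X = 1, Y = 2 }));   // True
        Console.WriteLine(c1.Equals(new PointClass { X = 42, Y = 2 }));   // False
    }
}
```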
It seems that the standard reasons of "stack vs. heap", "by value vs. by reference", "identity vs. contents" are not by themselves enough to justify the associated language and implementation complexity of having two categories of types. Here is an attempt at an alternative explanation.
Consider C#, Java, and C++ for a moment. All three languages have types that are more lightweight than others:
| Language | Lightweight Types | Heavyweight Types |
|----------|-------------------|-------------------|
| Java | Primitive types | Object types (classes) |
| C# | Value types | Reference types |
| C++ | Structs/classes without virtual methods | Structs/classes with virtual methods |
What do the "heavyweight" types have in common? Instances of these types -- in all three languages -- are provided some additional services by the compiler/runtime/execution environment at the expense of additional overhead. These services, and that overhead, are the reason for having two types of types.
What are those services, then? Although it somewhat depends on the language and environment, all three examples above afford heavyweight types support for polymorphism, namely virtual method invocation. Virtual methods in Java, C#, and C++ rely on a method table stored in memory and pointed to from the header of each object instance. This pointer ("vfptr" in C++, "method table pointer" in the CLR) is the overhead, the cost heavyweight types must pay for a service that lightweight types do not have access to.
```
+---Ref Type Instance---+
| Object Header Word    |
| Method Table Pointer ------> +---Method Table (Simplified)---+
| ... object fields ... |      | Ptr to Base MT                |
+-----------------------+      | Ptr to Object.Equals          |
                               | Ptr to Object.ToString        |
                               | ... additional methods ...    |
                               +-------------------------------+
```
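What the method table pointer buys is dynamic dispatch: the call site consults the instance's method table at runtime rather than binding to one method at compile time. A minimal C# sketch (the `Shape` and `Circle` types are my own illustration):

```csharp
using System;

class Shape
{
    // virtual: the call site consults the method table at runtime
    public virtual string Describe() => "some shape";
}

class Circle : Shape
{
    public override string Describe() => "a circle";
}

static class DispatchDemo
{
    static void Main()
    {
        Shape s = new Circle();
        // The static type is Shape, but the method table pointer stored in
        // the Circle instance routes the call to Circle.Describe.
        Console.WriteLine(s.Describe());   // a circle
    }
}
```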
In the CLR and the JVM, reference types enjoy additional services on top of virtual method invocation. For example, reference types participate in monitor synchronization: you can use the C# `lock` or Java `synchronized` keyword to synchronize on an arbitrary reference type instance. Additionally, both the CLR and the JVM offer garbage collection for heap objects. Both of these services require additional memory overhead associated with each reference type instance.
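In C#, for instance, any reference type instance can serve as a monitor. A short sketch (the `Counter` type and `_gate` field are my own naming conventions, not prescribed by the runtime):

```csharp
using System;
using System.Threading.Tasks;

class Counter
{
    // Any reference type instance carries the machinery needed for monitor
    // synchronization; a plain object is the conventional lock target.
    private readonly object _gate = new object();
    private int _count;

    public void Increment()
    {
        lock (_gate)          // compiles to Monitor.Enter / Monitor.Exit
        {
            _count++;
        }
    }

    public int Count => _count;
}

static class MonitorDemo
{
    static void Main()
    {
        var counter = new Counter();
        Parallel.For(0, 100_000, _ => counter.Increment());
        Console.WriteLine(counter.Count);   // 100000 -- no lost updates
    }
}
```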
It is not the semantic difference in copying vs. passing a reference or comparing identity vs. comparing contents that explains why two types of types are so prevalent. The additional services -- supporting virtual methods, object synchronization, garbage collection, finalization -- make the overhead necessary for reference types. This very overhead is not acceptable for small, primitive types of which millions of instances are likely to be required. Integers, floats, characters, Booleans, two-dimensional points, and rectangle coordinates cannot afford to waste 4-16 bytes of overhead per instance.
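One way to watch that overhead materialize in C# is boxing: storing an `int` as `object` forces the runtime to wrap it in a heap object that carries the header and method table pointer. A sketch (the size figure in the comment is approximate and platform-dependent):

```csharp
using System;

static class BoxingDemo
{
    static void Main()
    {
        // A million unboxed integers: 4 bytes each, no per-instance overhead.
        int[] raw = new int[1_000_000];

        // Boxing each one turns it into a heap object with an object header
        // and a method table pointer in addition to the 4-byte payload --
        // roughly 24 bytes per box on a 64-bit CLR, plus the 8-byte
        // reference stored in each array slot.
        object[] boxed = new object[raw.Length];
        for (int i = 0; i < raw.Length; i++)
            boxed[i] = raw[i];

        Console.WriteLine(boxed[0] is int);   // True -- still an int, but now
                                              // paying reference type overhead
    }
}
```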
This is why C#, Java, and C++ have two categories of types -- even if you don't think of them as different categories. And this is also why you should consider using value types: not because they make it easier to copy objects by value or compare their contents, but because they do not pay the cost of services you will not require of them.