Serialization Breaks Encapsulation
Recently I ran into an interesting article in which the author, another enthusiastic blogger, suggests a set of cool features we would all love to have in Java: collection literals, support for tuples and records, and many more. I agree with most of the proposed ideas, since I, too, would love to see most of them in Java. However, I also found a few ideas that I consider more controversial. One of them was making all Java objects serializable by default. In the past I have faced the difficulties of evolving serializable classes, which is why I am reluctant to consider this a good proposal. So, with the sole purpose of fostering a healthy discussion on the subject, I decided to write this post.
My previous post was precisely about the importance of encapsulation: how it helps us hide complexity and isolate the sources of change, which matters because we all expect software to change over time. Serialization, however, has always been a headache, because by default it writes out every non-transient field of an object, private ones included, and so it works directly against the benefits we gain through encapsulation.
With encapsulation we intend that nothing is revealed about the internal representation of an object, and we interact with our components only through their public interfaces. We usually exploit this later, when we want to change the internal representation of data in a component without breaking its users' code.
Conversely, serialization implies exposing the internal state of an object by transforming that state into some other format that can be stored and resurrected later. This means that, once instances have been serialized, the internal structure of their class cannot be changed without putting that resurrection at risk.
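A minimal sketch of that round trip, using the standard java.io object streams. The class Account and the helper names are hypothetical, made up only for illustration. Besides storing and resurrecting the object, the sketch checks that the names of the private fields travel inside the byte stream, which is the exposure discussed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SerializedStateDemo {

    // Hypothetical class for illustration: all of its state is private.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String owner;
        private final int balanceInCents;

        Account(String owner, int balanceInCents) {
            this.owner = owner;
            this.balanceInCents = balanceInCents;
        }

        String describe() {
            return owner + ":" + balanceInCents;
        }
    }

    // Turns an object into bytes using default Java serialization.
    static byte[] serialize(Object obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resurrects an object from bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // ISO-8859-1 maps every byte to one char, so ASCII field names can be searched directly.
    static boolean streamMentions(byte[] data, String fieldName) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(fieldName);
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Account("alice", 1250));
        Account copy = (Account) deserialize(data);
        System.out.println(copy.describe());
        System.out.println(streamMentions(data, "balanceInCents"));
    }
}
```

Run as written, this should print alice:1250 followed by true: the private field name balanceInCents is part of the stored form, even though no public method of Account reveals it.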
The problems with serialization appear not only in open systems but also in distributed systems that rely on it. For example, when we stop our application server, it may choose to serialize the objects in the current sessions in order to resurrect them when the server restarts. But if we redeploy our application with new versions of our serializable classes, will the stored objects still be compatible when the server attempts to resurrect them? In a distributed system it is also common to use code mobility: sets of classes are kept in a central repository so that clients and servers can share common code. Since objects are serialized to be passed between clients and servers, do we run the risk of breaking anything when we update the serializable classes in that repository?
Consider, for example, a class Person as follows:
    public class Person {
        private String firstName;
        private String lastName;
        private boolean isMale;
        private int age;

        public boolean isMale() { return this.isMale; }
        public int getAge() { return this.age; }

        // more getters and setters
    }
Let’s say that we released the first version of our API with this abstraction of a Person. For the second version, though, we would like to introduce two changes. First, we discovered that it would be better to store a person's date of birth instead of the age as an integer. Second, the class Person may have been defined when Java did not yet have enumerations, and we would now like to use one to represent the gender of a person.
Evidently, since the fields are properly encapsulated, we can change the inner workings of the class without affecting its public interface. Something like this:
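    import java.util.Calendar;
    import java.util.Date;

    // Assumes an enum Gender { MALE, FEMALE } defined elsewhere.
    public class Person {
        private String firstName;
        private String lastName;
        private Gender gender;
        private Date dateOfBirth;

        public boolean isMale() { return this.gender == Gender.MALE; }

        public int getAge() {
            Calendar today = Calendar.getInstance();
            Calendar birth = Calendar.getInstance();
            birth.setTime(this.dateOfBirth);
            int age = today.get(Calendar.YEAR) - birth.get(Calendar.YEAR);
            // Subtract a year if this year's birthday has not happened yet.
            if (today.get(Calendar.MONTH) < birth.get(Calendar.MONTH)
                    || (today.get(Calendar.MONTH) == birth.get(Calendar.MONTH)
                        && today.get(Calendar.DAY_OF_MONTH) < birth.get(Calendar.DAY_OF_MONTH))) {
                age--;
            }
            return age;
        }

        // the rest of getters and setters
    }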
By making the changes shown above we can be sure that preexisting clients will not break: even though we changed the internal representation of the object's state, we kept the public interface unchanged.
However, suppose that the class Person were serializable by default. If ours is an open system, there could be thousands of lines of code out there relying on being able to resurrect serialized objects based on the original class, or even clients who serialized subclasses that extend the original version of the class. Some of these objects may have been serialized to binary form, or to some other format, by users of our API who would now like to move to the second version of our code.
If we then made the changes shown in our second example, we would immediately break some of those clients. Anyone who has stored objects based on the original version of the class, with a field called age of type int holding the age of a person and a field named isMale of type boolean holding the gender, is likely to see deserialization fail, because the new class definition uses new fields and new data types.
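To make the mismatch concrete, here is a small sketch that asks the JDK which fields each version would write to a stream, using java.io.ObjectStreamClass. PersonV1 and PersonV2 are hypothetical stand-ins for the two releases (in a real upgrade both would simply be called Person), and they implement Serializable explicitly to mimic the "serializable by default" premise. The helper names are also my own.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.Date;
import java.util.Set;
import java.util.TreeSet;

class SerializedFormDiff {

    enum Gender { MALE, FEMALE }

    // Stand-in for the first release of Person.
    static class PersonV1 implements Serializable {
        private String firstName;
        private String lastName;
        private boolean isMale;
        private int age;
    }

    // Stand-in for the second release of Person.
    static class PersonV2 implements Serializable {
        private String firstName;
        private String lastName;
        private Gender gender;
        private Date dateOfBirth;
    }

    // Lists "type name" for every field that default serialization writes for the class.
    static Set<String> serializedFields(Class<?> type) {
        Set<String> fields = new TreeSet<>();
        for (ObjectStreamField f : ObjectStreamClass.lookup(type).getFields()) {
            fields.add(f.getType().getSimpleName() + " " + f.getName());
        }
        return fields;
    }

    // Fields present in the old serialized form that the new class no longer declares.
    static Set<String> lostFields(Class<?> oldType, Class<?> newType) {
        Set<String> lost = serializedFields(oldType);
        lost.removeAll(serializedFields(newType));
        return lost;
    }

    public static void main(String[] args) {
        System.out.println(lostFields(PersonV1.class, PersonV2.class));
    }
}
```

This should print [boolean isMale, int age], the two pieces of stored state that the second version has no place for. How the failure shows up depends on the class. If it declares no explicit serialVersionUID, the default one is computed from the class structure, so it changes along with the fields and reading an old stream throws InvalidClassException. If both versions declare the same serialVersionUID, the old values are silently discarded and gender and dateOfBirth come back as null.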
Clearly our problem here is that serialization has exposed the internal details of our objects, and now we cannot freely change anything, not even what we thought was encapsulated, because through serialization everything has been exposed publicly.
Now consider a scenario in which every single class in the JDK API were serializable by default. The designers of Java simply could not evolve the Java APIs without the risk of breaking many applications. They would be forced to assume that somebody out there may have a serialized instance of any of the classes in the JDK.
There are ways to deal with the evolution of serializable classes (I will leave that for another post). The important point here is that, for the sake of encapsulation, we would like to keep the number of serializable classes as small as possible, and for the classes that we really do need to serialize, we must ponder the implications of every scenario in which we may attempt to resurrect an object using an evolved version of its class.
Published at DZone with permission of Edwin Dalorzo. See the original article here.