Bean Validation for the Rest of Us – Emmanuel Bernard on JSR 303
DZone recently interviewed Emmanuel Bernard, JBoss lead developer on four Hibernate projects - Annotations, EntityManager, Validator, and Hibernate Search. Emmanuel is a member of the JPA 2.0 expert group and the spec lead of JSR-303: Bean Validation. He is also the author of the recently released Manning book, "Hibernate Search in Action."
DZone: Emmanuel, can you tell us a little bit about what you're up to these days?
Emmanuel Bernard: Sure. I basically finished “Hibernate Search in Action” in late 2008. So that was probably one of my biggest projects of the year. For people that are considering writing a book, I wouldn’t recommend it -- it's a lot of work.
My main work at the moment is really to polish the Bean Validation work the expert group has done over the last year. We really started working very hard on the spec in 2008, led by me; I proposed various features, and we're trying to get community involvement and expert group involvement on that.
And because, at the time we record this, the (Java) EE 6 deadline is very close, I'm working really hard to try and get Bean Validation into the EE 6 spec.
DZone: What were some of the motivations behind the Bean Validation spec?
Emmanuel: Sure. Let me first start with why you need validation. If you look at validation, there are several areas where it is useful in a software system.
The first one is to give user feedback. "I'm sorry, but the email you're trying to give me seems wrong, probably you made some kind of typo here." The second use case is to make sure that the service is going to behave properly. And if the input you give on the service is not appropriate to what it is designed to, it should fail fast and you say, "I'm sorry, but what you gave me, I will not be able to process that." So I would rather raise an exception and say that I cannot work on it.
The last point, which is probably the most important one, is to keep your data as clean and useful as possible. Let's take the case where people don't care about validation at all. The process of entering data becomes the process of entering some kind of blob into your data storage, and if you try to extract information from that, you will have a lot of issues. If phone numbers are not normalized, for example, you will probably not be able to properly extract a phone number and then connect that to your phone system, and so on.
So people have been adding validation at different levels in their systems to work around that. If you take your classic Java application, you can see validation rules at the DAO levels where the data is about to be persisted in the database; at the business layer, where the parameters you receive are not correct according to your needs, you will raise exception; and of course the presentation layer where you have the interaction between the user and your system.
The problem with that is every layer, somehow every framework, came with its own validation logic and validation framework. When you, as an application developer, try to validate this, you will have lots of declaration duplications. So you will say that your property is not null in your presentation layer, and so you have to also say your property must not be null at the DAO layer, and so on and so forth.
If you change it in one place, you also have to remember to change it on the other layer on the other framework. So you have some kind of duplication in the way you declare the constraints.
And that is assuming there is no bug in the way validation is done, and that the rules to validate an email are the same across the different frameworks and layers. Even then, you still have this declaration duplication.
The other problem is some kind of inconsistency in the way you defined the constraints. First of all, each framework has its own way of declaring and defining the constraint. The second problem that could arise is well, maybe the email validation at the DAO level is different than the email validation at the presentation layer. And now you have some kind of inconsistency.
The last problem is, when you have validation errors or constraint violations, how are they reported to your system? One framework will use an exception; another one, let's say JSF, will use a Faces message and pass it to the UI. So you have a lot of inconsistent ways to receive validation error reports.
DZone: Does Bean Validation then really provide a way to “re-factor” all the different validation methodologies into one central place in the business tier?
Emmanuel: In a way, yeah. You are correct. The thing is, if you look at validation constraints, they are most of the time based on the domain model - the object model that is representing your data.
Say I have an address: the city in the address should never be null and never be longer than 50 characters. That’s a constraint that you would like to apply at all different levels. So Bean Validation comes with the idea of standardizing the way a user declares the constraints in his system.
It has three goals. The first one is standardizing the way you declare the constraints. So an application developer will just know one way to declare constraints, and can forget about all the different models.
The second goal of Bean Validation is to provide a default runtime engine that is used to validate the constraints, in Java. The third point is to expose some metadata to the user or to the framework. So you will be able to say, well, for a given property, let me get the list of constraints that are exposed.
Any framework that needs to apply validation can either use the runtime engine of Bean Validation or go extract the metadata and play with that. So the user declares constraints once, in one place, and every framework in the Java space can either apply the validation routine, or extract the validation metadata out of Bean Validation and play with it.
DZone: How do you know what type of validation to put in the presentation tier versus the business tier, or in the back-end?
Emmanuel: Well, yes, I need to do that because if I don't, basically, someone hacking my Web app will be able to inject any kind of data without the security of the data being validated. And if we push that even to the lowest level, my database is actually shared by more than one application. So should I use validation at the application level and forget about the database? Or should I also try to put the constraints at the database level? Well, if I put the constraint at the database then I'm sure that my data will be valid, regardless of the application using the database and inserting data into it.
So really, if you think about it, what I'm saying is that you should put your constraints at the lowest level possible. The database is really where you store your data, so that's where you should put them. But the problem with that is the lower you are in the system, the less contextual information you have with regard to your data and constraint logic.
Say I'm in the process of validating a user profile. When I'm in my business tier, I have some knowledge that it's actually a user profile with a given set of constrained data. Java is rich enough to apply very complex validation logic that I would not be able to do at the database level. So it's interesting to expose and implement the constraint at the business level.
If you look at the presentation layer, the business level will, basically, raise an exception saying, 'I'm sorry. The credit card is not good.' But if you look at your user, wouldn't it be more interesting to give to the user a very short message that says, 'Sorry, it seems to me that your credit card is wrong,' without even reaching the server? It provides feedback to the user very quickly and efficiently. So also you really need to apply the validation at the presentation layer.
The key here is that I need the same constraints to be applied at all layers of my application: all the way down to the database, by doing DDL operations like ‘this column is not the email,’ or something like that;
at my business layer, with all the richness of Java, where I can make sure that my data is not corrupt; and at the presentation layer, because I want to provide as useful and meaningful feedback to the user as possible.
DZone: What does a developer need to do concretely, in order to put Bean Validation on the domain model?
Emmanuel: One of the main goals of Bean Validation was to use annotations as the primary language to express constraints. There are a few reasons.
As I explained earlier, the domain model is where you want to express your constraints and make sure they are there to be validated. By using annotations, you put the constraints exactly where the property is in your code. In one look, you know exactly which constraints apply to a given property.
If you think about a constraint, a constraint is really some kind of an extension of the type system. Let's say I want to receive an email. I could receive an object, but that would not be the natural way to do it in Java. In Java, you would want to say, well, I want a string which represents my email. You should think even further: I don't really want a string, I want a special kind of string that makes sure that the string looks like an email.
So a constraint is really some kind of a type extension. Using annotations for that is very natural.
The way you declare a constraint on your domain model is basically by using an annotation. Say this property 'getUserName' should never be null: I will put a 'not null' annotation on my field or on my getter. I can also apply constraints on the bean itself. Say I have got an address, and the address is validated in a more complex way than just a property-level validation. I can put a constraint on the address object itself. In this case I receive the address object and I can do any kind of validation.
So that's how basically you validate a given object type. But what if you want to validate an object graph? What if your user is having addresses and credit cards and you want to make sure the whole user is valid, even the associated objects?
Bean Validation lets you do that using the @Valid annotation. Basically saying, 'Well, when you validate this object, make sure the associated object is validated as well.' By the way, even if Bean Validation is primarily annotation focused, there is the equivalent in XML. So for people that are really averse to annotations, don't worry, you can use XML files.
I am personally very averse to using XML files for any kind of code-centric metadata. But it’s a matter of taste I guess.
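To make the declaration style described above concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not the Bean Validation API: the annotations `NotNull`, `MaxLength` and `Cascade`, the `validate` method, and the `User` and `Address` classes are all hypothetical stand-ins defined inside the example (`Cascade` plays the role Emmanuel gives to @Valid). The tiny reflection-based engine exists only to show that constraints declared once on the model can be checked generically, including across an association.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: these annotations and this engine are hypothetical
// stand-ins, not the javax.validation API.
class ConstraintSketch {

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface NotNull {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface MaxLength { int value(); }

    // Plays the role of @Valid: validate the associated object as well.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Cascade {}

    static class Address {
        @NotNull @MaxLength(50) String city;
        Address(String city) { this.city = city; }
    }

    static class User {
        @NotNull String userName;
        @Cascade Address address;
        User(String userName, Address address) {
            this.userName = userName;
            this.address = address;
        }
    }

    // Returns one message per violated constraint; an empty list means valid.
    static List<String> validate(Object bean) {
        List<String> violations = new ArrayList<>();
        collect(bean, bean.getClass().getSimpleName(), violations);
        return violations;
    }

    // Note: no cycle detection, so this sketch only suits acyclic graphs.
    private static void collect(Object bean, String path, List<String> out) {
        for (Field field : bean.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(bean);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            String where = path + "." + field.getName();
            if (field.isAnnotationPresent(NotNull.class) && value == null) {
                out.add(where + " must not be null");
            }
            MaxLength max = field.getAnnotation(MaxLength.class);
            if (max != null && value instanceof String
                    && ((String) value).length() > max.value()) {
                out.add(where + " must be at most " + max.value() + " characters");
            }
            if (field.isAnnotationPresent(Cascade.class) && value != null) {
                collect(value, where, out); // follow the association
            }
        }
    }

    public static void main(String[] args) {
        User user = new User(null, new Address(null));
        validate(user).forEach(System.out::println);
    }
}
```

The constraints sit next to the fields they describe, and the caller only ever asks to validate the root object; whether an associated object is checked is decided by the declaration on the model, not by the calling code.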
DZone: What are groups and how are they useful?
Emmanuel: Good question. Remember, I told you, you need to define your constraint on your domain model. That can really mean that every constraint in the domain model should be validated and valid at a given time for an object to be valid. But there are several use cases where you don't really want to validate all the constraints that you describe in your domain model. Let me give you a couple of use cases.
The first one is when your user gives you partial data that is not really complete, so your object or object graph is not completely valid. But that's the first step of a wizard, which puts this object graph in memory. And then you go to the second step of the wizard to fill in the additional information, and so on.
What would be interesting is to say, "Well, I'm at the first step of the wizard," so make sure all of the data that you have entered so far is valid. And I know that the object itself as a whole is not valid. So somehow I need to define a subset of constraints that will be validated at a given time. OK, that's the first use case - the wizard-style validation.
The second one is, some constraints validation is actually quite complex or costly. Maybe it's an expensive CPU-based validation or maybe every time you validate a credit card number, it costs you some money.
So you want to make sure that all the other constraints on a given property are validated before executing this costly validation. You need a way to say, "Validate the first group, and if it fails, stop here. And if this group is not failing, then move to the next group, which is going to do my more costly validation." That's the second use case.
The third use case is sometimes you want to validate an object or object graph in a given state. Not specifically a valid state, but some kind of state. If you look at Amazon, you can buy in one click, right? And you can buy in one click only if you have a default credit card associated to your account and you have a default address associated to your account. Otherwise it will literally not be able to validate and process your order in one click.
So what Amazon could do is attach to the user object some constraints describing that the credit card is not null and valid, and that the default shipping and billing addresses are also not null. And you can define a subset of the constraints named, I don't know, "buy in one click," that will make sure that the object is actually in a valid state for buying in one click. If that's the case, then Amazon will display the "buy in one click" button. If it's not, then the button will just not be displayed.
The way we solve that is to let the user describe the groups a given constraint is assigned to. Groups are basically interfaces, so it's kind of a type-safe way to define a group, and you can put Javadoc on it. It's an easy way to document it.
And groups can be inherited. So let's imagine, to buy in one click of course the customer needs to be billable, so maybe I've got a "billable" group that will be a super-interface of "buy in one click." So when I'm validating "buy in one click," I'm also validating "billable."
And we can also define what is called a "group sequence." A group sequence is here to make sure that when we validate a given subset, a given group, if this group is valid we go and validate the next group. Remember the complex validation that we don't want to do unless the first, more basic validations are correct? That's how you solve this kind of problem.
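The three ideas in this answer (groups as interfaces, group inheritance, and group sequences) can be sketched in plain Java as follows. This is a hedged illustration rather than the spec's API: the `NotNull` annotation with its `groups` attribute, the `Default`, `Billable` and `BuyInOneClick` interfaces, the `Account` class, and the `validate` and `validateInSequence` methods are all hypothetical names defined inside the example. The sketch assumes that a constraint declared without a group belongs to a default group.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of validation groups; not the javax.validation API.
class GroupSketch {

    // Groups are plain marker interfaces, so they are type safe and documentable.
    interface Default {}
    interface Billable {}
    /** A customer who can buy in one click must also be billable. */
    interface BuyInOneClick extends Billable {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface NotNull { Class<?>[] groups() default {}; }

    static class Account {
        @NotNull String email;                                     // Default group
        @NotNull(groups = Billable.class) String creditCard;
        @NotNull(groups = BuyInOneClick.class) String defaultAddress;
    }

    // Checks only the constraints that belong to one of the requested groups.
    static List<String> validate(Object bean, Class<?>... groups) {
        Class<?>[] requested = groups.length == 0 ? new Class<?>[] {Default.class} : groups;
        List<String> violations = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            NotNull notNull = field.getAnnotation(NotNull.class);
            if (notNull == null) continue;
            Class<?>[] declared = notNull.groups().length == 0
                    ? new Class<?>[] {Default.class} : notNull.groups();
            if (applies(declared, requested) && read(field, bean) == null) {
                violations.add(field.getName() + " must not be null");
            }
        }
        return violations;
    }

    // A constraint applies when a requested group is its group or a sub-interface of it.
    private static boolean applies(Class<?>[] declared, Class<?>[] requested) {
        for (Class<?> d : declared) {
            for (Class<?> r : requested) {
                if (d.isAssignableFrom(r)) return true;
            }
        }
        return false;
    }

    private static Object read(Field field, Object bean) {
        try {
            field.setAccessible(true);
            return field.get(bean);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Group sequence: validate group by group and stop at the first failing group,
    // so later (possibly costly) groups never run on data that is already invalid.
    static List<String> validateInSequence(Object bean, Class<?>... sequence) {
        for (Class<?> group : sequence) {
            List<String> violations = validate(bean, group);
            if (!violations.isEmpty()) return violations;
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.email = "someone@example.com";
        System.out.println(validate(account, BuyInOneClick.class));
    }
}
```

Because the group is passed as a class rather than a string, a typo in a group name fails at compile time, and the inheritance between `BuyInOneClick` and `Billable` is expressed with ordinary interface inheritance.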
DZone: How can groups work across various tiers of the application?
Emmanuel: That's the beauty of Bean Validation. Bean Validation is never tied to any layer specifically. Most validation frameworks are actually associated to the presentation layer - let's say the Struts validation logic is of course dedicated to Struts, if you use the metadata pack from JPA it's kind of associated to JPA, et cetera, et cetera.
Bean Validation is really independent of the Java tier you're talking to. So a group being an interface, the way you define the group you want to validate is basically just by providing to the validate method the list of groups you want, the list of interfaces you want to validate.
So let's say I want to validate a user and make sure it's a buy-in-one-click user: I will call validate on the user and pass the ‘buy-in-one-click’ group class. And I can pass more than one group, by the way.
What's more interesting is when you, instead of explicitly calling Bean Validation to validate an object graph, which is typically what you would do in your business layer, what's interesting is when other frameworks call Bean Validation for you and validate the object graph.
JSF can do that and JPA can do that, they can call Bean Validation transparently for you just by you declaring stuff. And we use group in that way as well to define the kind of group JPA will trigger and validate instead of you having to manually go and say, "I want to validate, I need to call .validate for this object."
So that's how groups are actually used across tiers. It's not really standardized, if you think about it, at least not standardized in Bean Validation, but the goal here is to let other frameworks using validation declare how they want to express the list of groups the user is willing to validate. So in JSF, it's really a string which is the fully qualified class name that you put in your JSF page. In JPA, it's a property that you define with the list of groups you want to validate.
It's really the frameworks using Bean Validation's responsibility to choose how it will expose groups to the user.
DZone: Is there any difference in how it integrates into the runtime environments in, say, Java SE versus Java EE?
Emmanuel: Good question. There's no fundamental difference, in the sense that we wanted Bean Validation to be as agnostic as possible, as I was saying. However, I deeply believe in the goodness, if I can say that, of Bean Validation - in the goodness of being able to declare your constraints in one way and then let them be used across all of the frameworks out there. And in particular, the frameworks that people are using are the standard ones. So we are talking about JPA, we are talking about JSF, we are talking about Java EE and EJB. So I and the expert group have been working very hard with other expert groups, like the JPA 2.0 expert group and the JSF 2.0 expert group, to properly and naturally integrate Bean Validation into those two frameworks.
The integration between Bean Validation and JPA is actually specified inside the JPA specification. The same for JSF - the way you express the constraint is actually specified. So in a way, in EE, you just have to sit down and enjoy the show. Everything has been taken care of for you inside the EE space.
In SE, you have all the hooks to do the same kind of integration. Let's say a proprietary container wants to do the same kind of integration between its presentation layer and its persistence layer by using Bean Validation.
The same hook that is being used by EE can be used in the SE environment. So, we have a few SPI interfaces that we expose to the Bean Validation users. One of them is how you actually interpolate messages. Maybe you want to extract them from a policy file. Maybe you are not able to expose internationalization to your user. The way you create the constraints validator, which is the logic that is validating a given constraint, it’s pluggable. Also the way you traverse an object graph is pluggable.
The reason for making the object graph traversal algorithm pluggable is that, if you use a JPA object graph, Bean Validation should never go and traverse a lazy association and trigger the load of the association as a side effect. That would be very bad behavior. So JPA has a way to put some limits on the object graph that Bean Validation will actually validate.
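The pluggable traversal described here can be sketched as a callback that the engine consults before it touches a field. This is a simplified illustration under assumptions, not the SPI of the spec: the `TraversalFilter` interface, its `isReachable` method signature, the `NotNull` and `Cascade` annotations, and the `User` and `Address` classes are hypothetical names defined inside the example. A persistence provider could supply a filter that answers "no" for associations that are not loaded yet.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of a pluggable graph traversal; not the javax.validation SPI.
class TraversalSketch {

    // Hypothetical hook: may the engine read this field of this object?
    interface TraversalFilter {
        boolean isReachable(Object owner, String fieldName);
    }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface NotNull {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Cascade {}

    static class Address {
        @NotNull String city;
    }

    static class User {
        @NotNull String name = "emmanuel";
        @Cascade Address address = new Address(); // imagine this is a lazy association
    }

    static List<String> validate(Object bean, TraversalFilter filter) {
        List<String> violations = new ArrayList<>();
        collect(bean, filter, violations);
        return violations;
    }

    private static void collect(Object bean, TraversalFilter filter, List<String> out) {
        for (Field field : bean.getClass().getDeclaredFields()) {
            // Ask BEFORE reading the value, so an unreachable field is never touched.
            if (!filter.isReachable(bean, field.getName())) continue;
            Object value;
            try {
                field.setAccessible(true);
                value = field.get(bean);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (field.isAnnotationPresent(NotNull.class) && value == null) {
                out.add(bean.getClass().getSimpleName() + "." + field.getName()
                        + " must not be null");
            }
            if (field.isAnnotationPresent(Cascade.class) && value != null) {
                collect(value, filter, out);
            }
        }
    }

    public static void main(String[] args) {
        User user = new User();
        // Everything reachable: the missing city is reported.
        System.out.println(validate(user, (owner, field) -> true));
        // "address" treated as not loaded: the engine never looks inside it.
        System.out.println(validate(user, (owner, field) -> !field.equals("address")));
    }
}
```

The design point is that the validation engine stays ignorant of persistence: it only knows that some other component decides which parts of the graph may be visited.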
DZone: Emmanuel, is there any reference implementation available for the spec?
Emmanuel: It is a work in progress. The reference implementation is essentially going to be Hibernate Validator. At the time we are recording, we haven't yet released a milestone version; it should be there next week or the week after. People can already go to the Hibernate SVN, into /validator/trunk, and they will see all the work we are doing, including the RI (the reference implementation) and the TCK.
DZone: Are you fairly optimistic about JSR 303 being included in the Java EE 6.0 release?
Emmanuel: I am. I have spent too much time on it. Yes, I am fairly optimistic. The spec has not been evolving for like a month and a half, almost two months, because I was exclusively focusing on getting the best integration possible inside JPA and the best integration possible inside JSF. My colleagues at JBoss and I worked very hard to get the three expert groups working in coordination to get it done in the most efficient way.
So there is no real technical barrier to integrating Bean Validation inside Java EE 6.0. Hopefully, the EE expert group will recognize that and we will get a nice spec added to the EE 6.0 specification.
DZone: Emmanuel, are there any final words of advice you would offer to our developers that are building complex enterprise systems requiring data validation across multiple tiers of an application?
Emmanuel: I guess two things. Go use Bean Validation, or Hibernate Validator. I have looked into the state of the art in Java. I have looked into the state of the art in other platforms, like Ruby on Rails and so on, and I believe in Bean Validation. Whether or not I manage to get Bean Validation inside EE, I think Bean Validation is the most extensible and useful validation framework out there. I am talking about all platforms - .Net, Ruby on Rails, Java, etc.
The second piece of advice I would give is, really trust declarations. Just use an annotation, a very expressive annotation, to say, 'Hey, this string is actually an order number.' So I use an @OrderNumber annotation, and then delegate the actual validation to an order number validator.
Oh, by the way, that's something we didn't talk about. You are not limited to a given set of constraints that you can apply. Bean Validation is extensible, so you can write, and you are very welcome to write, your own validations. So you can write your own order number validation, which is specific to your company.
The last piece of advice I would give is, there is a fine line between business rules, business logic and validation. It is very easy to put everything into validation. But remember we have a business layer in traditional applications, and that's for a reason. The imperative approach is still valid in many cases. So don't try to put too many things into validation. Use it wisely. Every time there is something logical, like, 'Hey, that's an order number,' then go for it.
Just don't have a very complex validation involving multiple properties, or maybe even multiple objects. Try to step back and ask: should it really be declarative, or should it really be imperative?
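The order number example from this answer can be sketched as a custom constraint whose annotation points at the class that checks it. This is an illustration under assumptions, not the spec's extension API: the `Check` interface, the `ValidatedBy` meta-annotation, the `OrderNumber` annotation, and in particular the `ORD-` plus six digits format are all invented for the example.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration of a user-defined constraint; not the javax.validation API.
class CustomConstraintSketch {

    // Hypothetical contract for the logic behind a constraint.
    interface Check {
        boolean isValid(Object value);
    }

    // Meta-annotation linking a constraint annotation to its checking logic.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.ANNOTATION_TYPE)
    @interface ValidatedBy { Class<? extends Check> value(); }

    // The expressive, company-specific constraint.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @ValidatedBy(OrderNumberCheck.class)
    @interface OrderNumber {}

    static class OrderNumberCheck implements Check {
        // Assumed format for the example: "ORD-" followed by six digits.
        private static final Pattern FORMAT = Pattern.compile("ORD-\\d{6}");

        @Override
        public boolean isValid(Object value) {
            // null is accepted here; a separate not-null constraint would reject it.
            if (value == null) return true;
            return value instanceof String && FORMAT.matcher((String) value).matches();
        }
    }

    static class Order {
        @OrderNumber String number;
        Order(String number) { this.number = number; }
    }

    // Generic engine: it knows nothing about order numbers, only about ValidatedBy.
    static List<String> validate(Object bean) {
        List<String> violations = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            for (Annotation annotation : field.getAnnotations()) {
                ValidatedBy link = annotation.annotationType().getAnnotation(ValidatedBy.class);
                if (link == null) continue; // not a constraint annotation
                try {
                    field.setAccessible(true);
                    java.lang.reflect.Constructor<? extends Check> ctor =
                            link.value().getDeclaredConstructor();
                    ctor.setAccessible(true);
                    if (!ctor.newInstance().isValid(field.get(bean))) {
                        violations.add(field.getName() + " violates @"
                                + annotation.annotationType().getSimpleName());
                    }
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(validate(new Order("ORD-123456")));
        System.out.println(validate(new Order("12")));
    }
}
```

The domain class only says what the field is; how an order number is recognized lives in one small class, which keeps this kind of check declarative while leaving multi-object business rules to imperative code in the business layer.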
DZone: Emmanuel, I want to thank you for giving time to chat with us today.
Emmanuel: Thank you. Thank you, Nitin, and thank you to your listeners and readers.