Sanitize: Good for Beer, Good for Data
Beer sanitization and data sanitization have a lot in common. With best practices in both kinds of sanitization, we can all enjoy better applications and better beer.
Did someone say beer?!
When I’m not working, there’s a real good chance I’m thinking about beer. I know you’re saying to yourself, “Duh, me too.” Well, I’m not just thinking about what I’ve got in the fridge or what’s on tap down the road (though I am interested in that, too); I am also thinking about brewing my own, growing hops, and just about all aspects of beer and brewing. I may be a little obsessed.
When it comes to brewing, one of the most critical considerations is sanitization. The same is true of development. In this blog, I am going to discuss the importance of data sanitization in development, with parallels to sanitization in brewing.
When to Think About Sanitization
For brewing, sanitization is important in every step of the process. It needs to be considered and carefully managed throughout the brew day, during any hop (or other) additions during fermentation, while transferring between fermentation vessels, when bottling or kegging, and even during storage and serving. Ever had a beer on tap that just didn't taste right? That was most likely due to the delivery system not being properly cleaned and/or sanitized.
In development, the same step-by-step approach applies to data sanitization. Data sanitization needs to be considered at every step where data is passed along. There may not be an action to take at each step, but each one should be considered.
Often, we consider the areas of user input as the places where this needs to be handled. This step is indeed critical. However, the condition of data should at least be considered at each point where it will be inserted into a new record.
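A common way to protect the point where data is inserted into a record is to let the database driver bind values, rather than building SQL strings by hand. Strictly speaking, this complements sanitization and does not replace it. The sketch below uses Python's built-in `sqlite3` module. The `users` table and the `insert_user` helper are hypothetical and exist only for illustration.

```python
import sqlite3


def insert_user(conn: sqlite3.Connection, name: str, email: str) -> int:
    """Insert a row into a hypothetical users table using bound parameters."""
    # The "?" placeholders make sqlite3 bind the values, so quotes or SQL
    # fragments inside the input are stored as plain text, never executed.
    cur = conn.execute(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        (name, email),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# A hostile-looking value is stored verbatim instead of altering the query.
row_id = insert_user(conn, "Robert'); DROP TABLE users;--", "bobby@example.com")
stored = conn.execute("SELECT name FROM users WHERE id = ?", (row_id,)).fetchone()[0]
print(stored)  # Robert'); DROP TABLE users;--
```

The same pattern applies at every later hand-off where the value is written into a new record, not only at the first one.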
For user input in development, an important and complementary step to sanitization is using validation rules to attempt to control what data is passed in.
In brewing, you can compare validation to making sure you have the right ingredients before getting started. In both development and brewing, this topic could fill a separate discussion of its own.
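Here is a minimal sketch of validation rules. It assumes a hypothetical sign-up form with `username`, `age`, and `bio` fields. The function checks each value against an allowlist pattern, a numeric range, or a length limit, then reports which fields failed. The specific patterns and limits are placeholders, and real rules depend on your data.

```python
import re
from dataclasses import dataclass

# Hypothetical rules for a sign-up form; real rules depend on your data.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")
AGE_RE = re.compile(r"[0-9]{1,3}")
MAX_BIO_LENGTH = 500


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def validate_signup(username: str, age: str, bio: str) -> ValidationResult:
    errors: list[str] = []
    # Allowlist: the entire username must consist of permitted characters.
    if not USERNAME_RE.fullmatch(username):
        errors.append("username")
    # Type and range: ASCII digits only, then a plausible numeric range.
    if not AGE_RE.fullmatch(age) or not 13 <= int(age) <= 120:
        errors.append("age")
    # Length limit: reject oversized free text.
    if len(bio) > MAX_BIO_LENGTH:
        errors.append("bio")
    return ValidationResult(ok=not errors, errors=errors)


print(validate_signup("brew_master", "34", "Homebrewer."))
print(validate_signup("x", "abc", "Hi"))
```

Validation like this decides whether to accept a value at all. Any sanitization still happens afterward, on the values that pass.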
After user input has been validated, you can then sanitize the values passed before submitting them to your application. The exact method used should depend on your platform and the tools available, but you will want to make sure that all risks that could be introduced in the data have been checked and resolved.
You will want to check for illegal characters, injected code, and any additional entries that could pose risk to your data or application stability. Depending on what is found, you can remove, escape, and/or replace the unwanted information and submit the data, or you can return to the user to have them alter the data.
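A single sanitizer can combine the remove, escape, and replace options, and can reject input that should go back to the user. The sketch below assumes a free-text comment field that will be shown on an HTML page. The names `sanitize_comment` and `RejectedInput` are hypothetical. For data headed to a different context, such as SQL, a URL, or a shell command, the escaping step would be different.

```python
import html
import unicodedata


class RejectedInput(ValueError):
    """Raised when input should go back to the user instead of being cleaned."""


def sanitize_comment(raw: str, max_length: int = 500) -> str:
    # Replace: normalize Windows line endings to "\n".
    text = raw.replace("\r\n", "\n")
    # Remove: drop control characters (Unicode category "Cc"), such as NUL
    # or the ESC character that starts terminal escape sequences, while
    # keeping newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    text = text.strip()
    # Return to the user: some problems should not be fixed silently.
    if not text:
        raise RejectedInput("empty")
    if len(text) > max_length:
        raise RejectedInput("too_long")
    # Escape: neutralize HTML so injected markup is displayed as plain text.
    return html.escape(text)


print(sanitize_comment("  Great <b>IPA</b>!\x00 "))  # Great &lt;b&gt;IPA&lt;/b&gt;!
```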
When returning to the user to have the data altered, it is important that you do not expose too much information about your application that could assist a user who is attempting to expose vulnerabilities.
This step often needs to be balanced against providing information that could help in a support situation. Here, error codes that support staff can look up are a better approach than detailed application information. If the data needs to be displayed back to the user in an error situation, consider whether it must be escaped or altered before the user sees it.
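One way to strike that balance, assuming a web application, has three parts:

* Log the detailed reason on the server.
* Return only a short error code to the user.
* Escape any submitted value before echoing it back.

The `ERROR_CODES` table and the `user_error_response` function below are hypothetical.

```python
import html
import logging

logger = logging.getLogger("app")

# Hypothetical mapping from internal rejection reasons to public error codes.
ERROR_CODES = {
    "empty": "E1001",
    "too_long": "E1002",
}


def user_error_response(reason: str, submitted: str) -> dict[str, str]:
    """Build a user-facing error that reveals a code, not application internals."""
    code = ERROR_CODES.get(reason, "E9999")  # unknown reasons get a generic code
    # The full detail stays in the server-side log, where support staff can find it.
    logger.warning("input rejected: code=%s reason=%s", code, reason)
    return {
        "message": f"We couldn't accept that input. Please contact support with code {code}.",
        # Escape the submission before it is rendered back into an HTML page.
        "echo": html.escape(submitted),
    }


print(user_error_response("too_long", "<script>alert(1)</script>"))
```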
With both brewing and development, all of this sanitization should be considered from the very beginning in the planning and architecting stages. You need to make sure you have all necessary tools in place and readily available for use.
You will often write some of your sanitization tools within your application itself. Many languages and frameworks also provide functions that reduce the effort involved.
My best advice is to review the libraries you intend to use (or are already using) for their recommended best practices and for any sanitization functions or tools they include.
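Python's standard library shows what such a review can turn up. It provides context-specific escaping functions:

* `html.escape` for HTML
* `urllib.parse.quote` for URL components
* `shlex.quote` for arguments to a POSIX shell

The sample value below is made up. The point is that the same input needs different treatment depending on where it is going.

```python
import html
import shlex
from urllib.parse import quote

# A made-up value containing characters that are special in several contexts.
user_value = "pale ale & <stout>; rm -rf /"

# HTML text or attribute: escapes &, <, > and (by default) both quote marks.
as_html = html.escape(user_value)
# URL path or query component: safe="" percent-encodes "/" as well.
as_url = quote(user_value, safe="")
# POSIX shell argument: wraps the value so the shell sees one literal word.
as_shell = shlex.quote(user_value)

for label, value in (("html", as_html), ("url", as_url), ("shell", as_shell)):
    print(f"{label:>5}: {value}")
```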
There are multiple options for sanitization in brewing, and the choice often comes down to availability or brewer preference. The same is true of development: many methods are available, and the choice often comes down to personal preference or what the current platform provides.
Making these decisions at the beginning, when possible, is critical to increasing the chances that every point is protected. In development, you can add sanitization later in the process, but doing so will most likely introduce unnecessary risk and extra work. Brewing is even less forgiving: if you miss something, you typically cannot go back and sanitize after the fact.
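One way to make these decisions up front is a fail-closed registry. You declare a sanitizer for every accepted field in one place, and the input boundary rejects any field that has no registered sanitizer. That way, a field added later cannot quietly skip sanitization. This is one illustrative design choice, not a prescribed pattern, and the field names below are hypothetical.

```python
from typing import Callable

Sanitizer = Callable[[str], str]

# Hypothetical registry, agreed on at design time: every field the
# application accepts is listed here together with its sanitizer.
FIELD_SANITIZERS: dict[str, Sanitizer] = {
    "beer_name": lambda v: v.strip()[:100],    # trim and cap length
    "style": lambda v: v.strip().lower(),      # normalize case
    "abv": lambda v: str(round(float(v), 1)),  # must parse as a number
}


def sanitize_record(record: dict[str, str]) -> dict[str, str]:
    # Fail closed: a field with no registered sanitizer raises an error
    # instead of slipping through unsanitized.
    unknown = sorted(set(record) - set(FIELD_SANITIZERS))
    if unknown:
        raise ValueError(f"no sanitizer registered for: {', '.join(unknown)}")
    return {name: FIELD_SANITIZERS[name](value) for name, value in record.items()}


print(sanitize_record({"beer_name": "  Backyard Hop  ", "style": "IPA", "abv": "6.84"}))
```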
Skipping this important step, or not giving data sanitization the attention it deserves, can have consequences in both development and brewing, ranging from minor to catastrophic.
In brewing, you can introduce unintended flavors, create a lesser end product, or completely ruin your hard work. In development, you can introduce inaccurate or bad data, cause or expose errors, leave the application open to attack, and create security holes.
With best practices in sanitization, we can all enjoy better applications and better beer.
Published at DZone with permission of Jeremy Gard, DZone MVB.
Opinions expressed by DZone contributors are their own.