Failing to properly validate input data is behind at least half of all application security problems. In order to properly validate input data, you have to start by first ensuring that all data is in the same standard, simple, consistent format – a canonical form. This is because of all the wonderful flexibility in internationalization and data formatting and encoding that modern platforms and especially the Web offer. Wonderful capabilities that attackers can take advantage of to hide malicious code inside data in all sorts of sneaky ways.
Canonicalization is a conceptually simple idea: take data inputs, and convert all of it into a single, simple, consistent normalized internal format before you do anything else with it. But how exactly do you do this, and how do you know that it has been done properly? What are the steps that programmers need to take to properly canonicalize data? And how do you test for it? This is where things get fuzzy as hell.
To read my latest post on canonicalization problems (and the search for solutions), go to the SANS Application Security blog.