Actual Password Distribution Follows Power Law
If you're curious about how many people actually have strong passwords, or how many don't, this analysis will help shed some light on things.
According to this paper, the empirical distribution of real passwords follows a power law or, in the authors' terms, a "Zipf-like distribution." The frequency of the rth most common password is proportional to something like 1/r. More precisely,
$f_r = C r^{-s}$
where s is on the order of 1. The value of s that best fits the data depends on the set of passwords, but the authors' estimates of s varied from 0.46 to 0.91.
This means that the most common passwords are very common and easy to guess.
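To get a feel for the formula, here is a minimal sketch that evaluates f_r = C r^(-s) for the two extreme values of s quoted above. The number of distinct passwords (100,000) is an arbitrary assumption chosen for illustration, not a figure from the paper, and C is simply picked so that the frequencies sum to 1.

```python
def zipf_frequencies(num_passwords: int, s: float) -> list[float]:
    """Return normalized frequencies f_r = C * r**(-s) for r = 1..num_passwords."""
    weights = [r ** -s for r in range(1, num_passwords + 1)]
    total = sum(weights)  # this is 1/C, so the frequencies sum to 1
    return [w / total for w in weights]


# 100_000 distinct passwords is a hypothetical population size.
for s in (0.46, 0.91):
    freqs = zipf_frequencies(100_000, s)
    ratio = freqs[0] / freqs[9]  # how much more common rank 1 is than rank 10
    print(f"s = {s}: f_1 = {freqs[0]:.4%}, f_1 / f_10 = {ratio:.2f}")
```

Whatever the population size, the ratio between rank 1 and rank 10 is 10^s, so a larger s means a steeper drop from the most common passwords to the rest.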
If passwords come from an alphabet of size A and have length n, then there are $A^n$ possibilities. For example, if a password has length 10 and consists of uppercase and lowercase English letters and digits, there are
$62^{10} = 839{,}299{,}365{,}868{,}340{,}224$
possible such passwords. If users chose passwords randomly from this set, brute force password attacks would be impractical. But brute force attacks are practical because passwords are not chosen uniformly from this large space of possibilities, far from it.
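The arithmetic is easy to check. The sketch below recomputes the size of the space and estimates how long an exhaustive search would take on average if passwords really were uniformly random. The guessing rate of one billion guesses per second is purely an assumed figure for illustration.

```python
ALPHABET_SIZE = 26 + 26 + 10   # uppercase + lowercase + digits
LENGTH = 10
GUESSES_PER_SECOND = 1e9       # hypothetical attacker speed, not a measured value

space = ALPHABET_SIZE ** LENGTH
print(f"{space:,} possible passwords")

# For a uniformly random password, an attacker expects to search
# about half of the space before hitting the right one.
expected_guesses = space / 2
seconds_per_year = 365.25 * 24 * 3600
years = expected_guesses / GUESSES_PER_SECOND / seconds_per_year
print(f"about {years:.1f} years at {GUESSES_PER_SECOND:.0e} guesses per second")
```

Under these assumptions a single uniformly random password would take years to find, which is why the non-uniformity of real passwords matters so much.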
Attackers do not randomly try passwords. They start with the most common passwords and work their way down the list. In other words, attackers use Pareto's rule.
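A rough sketch of why guessing in order of popularity pays off: under a Zipf-like model, the fraction of accounts that fall to the first k guesses is the sum of the k largest frequencies. The population size and the values of k below are assumptions for illustration only. Setting s = 0 recovers the uniform case for comparison.

```python
def cracked_fraction(num_passwords: int, s: float, guesses: int) -> float:
    """Fraction of accounts matched by trying the `guesses` most common passwords,
    assuming frequencies proportional to r**(-s)."""
    weights = [r ** -s for r in range(1, num_passwords + 1)]
    return sum(weights[:guesses]) / sum(weights)


NUM_PASSWORDS = 100_000  # hypothetical number of distinct passwords
for k in (10, 100, 1000):
    zipf = cracked_fraction(NUM_PASSWORDS, 0.91, k)
    uniform = cracked_fraction(NUM_PASSWORDS, 0.0, k)  # s = 0 is the uniform case
    print(f"top {k:>4} guesses: Zipf-like {zipf:.2%} vs uniform {uniform:.2%}")
```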
Rules requiring, say, one uppercase letter don't help much because most users will respond by using exactly one uppercase letter, probably the first letter. If passwords must have one special character, most people will use exactly one special character, most likely at the end of the word. Expanding the alphabet size A exponentially increases the number of possible passwords, but it does little to increase the number of passwords people actually use.
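To put numbers on this, here is a small sketch comparing the theoretical space for 8-character passwords over printable ASCII with the space that results if everyone complies in the predictable way described above: one uppercase letter first, six lowercase letters, one special character last. The length and the exact pattern are assumptions for illustration. Real users vary more than this, but the gap is the point.

```python
import string

LENGTH = 8  # hypothetical password length

upper = len(string.ascii_uppercase)    # 26
lower = len(string.ascii_lowercase)    # 26
digits = len(string.digits)            # 10
special = len(string.punctuation)      # 32

# Every position may be any of the 94 allowed characters.
theoretical = (upper + lower + digits + special) ** LENGTH

# Assumed predictable pattern: Uppercase, then lowercase letters, then one special.
predictable = upper * lower ** (LENGTH - 2) * special

print(f"theoretical space: {theoretical:.3e}")
print(f"predictable space: {predictable:.3e}")
print(f"the predictable pattern covers 1 in {theoretical // predictable:,} of the space")
```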
What's interesting about the power law distribution is that there's not a dichotomy between naive and sophisticated users. If there were, there would be a lot of common passwords, and all the rest uniformly distributed. Instead, there's a continuum between the most naive and most sophisticated. That means a lot of people are not exactly naive, but not as secure as they think they are.
If you need to come up with a password, randomly generated passwords are best but hard to remember. So, people either use weak but memorable passwords or use strong passwords and don't try to remember them. The latter varies in sophistication from password management software down to Post-it notes stuck on a monitor.
One compromise is to concatenate a few randomly chosen words. Something like "thebestoftimes" would be weak because it consists of consecutive words from a famous novel. Something like "orangemarbleplungersoap" would be better.
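A minimal sketch of the random-words idea, using Python's `secrets` module for the random choice. The eight-word list here is a toy assumption to keep the example short. A usable list would need thousands of words, and the entropy estimate only holds when each word is drawn uniformly and independently, which is exactly what "thebestoftimes" fails to do.

```python
import math
import secrets

# Toy word list for illustration only; far too small for real use.
WORDS = ["orange", "marble", "plunger", "soap", "violin", "cactus", "ladder", "pebble"]


def make_passphrase(words: list[str], count: int = 4) -> str:
    """Concatenate `count` words chosen uniformly and independently at random."""
    return "".join(secrets.choice(words) for _ in range(count))


def entropy_bits(list_size: int, count: int) -> float:
    """Entropy in bits of `count` independent uniform draws from `list_size` words."""
    return count * math.log2(list_size)


print(make_passphrase(WORDS))
print(f"toy list: {entropy_bits(len(WORDS), 4):.1f} bits")
print(f"7,776-word list: {entropy_bits(7776, 4):.1f} bits")
```

The 7,776 figure is the size of a list indexed by five dice rolls (6^5), used here only as an example of a realistically sized list.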
Another compromise, one that takes more effort than most people are willing to expend, is to use Manuel Blum's mental hash function.
Published at DZone with permission of John Cook, DZone MVB. See the original article here.