
Kill Your Users Table

It's time to rethink how we store user information. You don't need as much as you think you do.

Consider...
Reddit's database was stolen a little over a year ago (yes, I know, old news, but there's a point here). This is from the team blog (emphasis mine):

...a backup of a portion of the reddit database was stolen recently. Although the media did not contain any personally identifiable information about our users ... we wanted to alert you to the possibility that your username, password, and in some cases e-mail address may have been compromised. If you use the user name and/or password for other purposes, we suggest that you change them in those other uses as soon as possible – just in case.

This is a response from spez, one of the developers.

[Password encryption] is [easy to implement], and I'll go ahead and do it now that everyone has decided to weigh in. Personally, I prefer the convenience of being having my passwords emailed to me when I forget, which happens from time to time since I use difference passwords everywhere. Not hashing was a design decision we made in the beginning, and it didn't stem from irresponsibility-- it stemmed from a decision to provide functionality that I liked. It bit us in the ass this time, and we are truly sorry for it. The irresponsibility (and there is some) was allowing our data to get nabbed.

We can beat on the Reddit guys for this all we want. However, in a lighter form, most applications do the same thing: they still retain a lot more user information than they used to, and perhaps it's time to change that.
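For reference, here is roughly what "not storing the password" can look like. This is a minimal sketch in Python using only the standard library, not what Reddit actually implemented; the function names and the iteration count are my own assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Illustrative work factor (an assumption, not a tuned recommendation).
PBKDF2_ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values get stored."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

With something like this in place, a stolen backup holds salts and digests rather than passwords that can be typed straight into someone's webmail. The trade-off is the one spez mentions: the site can no longer email you your password, because it no longer knows it.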

Think Like a Hacker

Mr. Savvy Hacker knows that it's not too hard to imagine a way to get at some good information with the data from Reddit:

// loop over the millions of users we just stole
// and see if we can log in to their email
foreach (User u in RedditHackedDatabase)
{
    string email = u.Email;
    string password = u.Password;

    // set up login spoof for web-based email, like
    // Google, LiveMail, Yahoo, etc.
    // ....
}

If we assume that Reddit has a million users, odds are that some percentage of those users use the same password for their email account. It happens - it's people mechanics, and people are too trusting.

Let's say the net of this harvest is small - 1000 emails compromised. It took maybe 4-6 hours to run this looping routine, and right now Reddit doesn't even know their data is gone. Once they find out, they'll probably run damage control which will take another 6-8 hours, and then they'll let their users know. The Savvy Hacker has a day or more to go nuts with these email accounts.

Things like looking for forgotten password emails. From your bank.

Rethink It

Today I was on a call with Scott Hanselman who is leading a charge to re-imagine and redo the Northwind Database. This was a Skypecast and we talked about all kinds of things, and he asked me about my experiences with Membership and the MVC Storefront.

I suggested we kill the Users table.

This devolved into some back and forth where my thoughts were challenged greatly (by many), and I can sum up the challenges thus:

  • The application needs to know at least the first/last name of the user so I can recognize them. All sites do this.
  • What about [featureX]? There's no way I can do this without knowing who the user is.
  • How's my application going to send information to the user?

These are very good points, and I think that there is room here to rethink "how it's been done" and come up with a way that protects our users better. Perhaps we can start by understanding this information is not something we need in our system, in just about every case.

User Identification As A Service

Enter services like OpenID and Passport/LiveID. These services will store information about the User for you. You don't need it. Really.

There are some things where you might want to know a user's first/last name - but ask yourself why. Does the application really need it?

  • Billing and Shipping Addresses: Billing records are kept at the payment processor, and shipping is likely kept in a waybill somewhere - both accessible by your application when needed. You don't need this in your database.
  • Names of people who review/comment in my application: You can get this using OpenID and LiveID, or you can ask for a reference when they add the review/comment. Scott's blog is a great example of this.

There are always exceptions to this (social applications - like forums); even then there are ways to minimize what you know about your users. The point about the application contacting the user is a good one, but I might suggest using an "Opt-in" - where you ask the user if they want to be contacted. I never do - but I always have to turn that feature off.
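To make that concrete, here is a rough sketch of how little an application might keep if an identity service handles the login. It is written in Python and everything in it is hypothetical: the `Visitor` and `VisitorStore` names, the fields, and the assumption that the provider hands back a stable identifier string after a successful login. The identifier is hashed only so the raw value is not kept verbatim; that is not strong anonymization.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def local_key(claimed_identifier: str) -> str:
    """Derive an opaque local key from the identifier the provider returns."""
    return hashlib.sha256(claimed_identifier.encode("utf-8")).hexdigest()


@dataclass
class Visitor:
    key: str                     # opaque key, no name/email/password
    wants_contact: bool = False  # opt-in: off unless the user asks for it
    reviews: List[str] = field(default_factory=list)


class VisitorStore:
    """In-memory stand-in for whatever replaces the Users table."""

    def __init__(self) -> None:
        self._visitors: Dict[str, Visitor] = {}

    def on_login(self, claimed_identifier: str) -> Visitor:
        # Called after the identity service has verified the login.
        key = local_key(claimed_identifier)
        if key not in self._visitors:
            self._visitors[key] = Visitor(key=key)
        return self._visitors[key]
```

If this store were stolen, the thief would get a list of hashes, a contact flag, and some reviews. A display name for a review can be requested from the provider (or from the user) at the moment it is needed rather than stored up front.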

Information storage is a trend, however, and it's often easier to "just ask" so we have it when we need it; in other words we opt for convenience. This design philosophy echoes what spez (Reddit Developer) said above:

Not hashing was a design decision we made in the beginning, and it didn't stem from irresponsibility-- it stemmed from a decision to provide functionality that I liked.

This design decision wasn't thought out all the way. The ramifications were clearly not fully understood. It's time to change this way of thinking.
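It's worth noting that the convenience spez wanted doesn't strictly require readable passwords. One common alternative is a short-lived, single-purpose reset token. The sketch below is in Python and is only an illustration under my own assumptions: the function names, the 15-minute lifetime, and the dictionary used as the stored record are all hypothetical, and delivering the token to the user is left out entirely.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 15 * 60  # illustrative lifetime (an assumption)


def issue_reset_token(now: Optional[float] = None) -> Tuple[str, Dict]:
    """Return (token to send to the user, record to store server-side)."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    record = {
        # Store only a hash, so a stolen backup can't be used to reset.
        "token_hash": hashlib.sha256(token.encode("utf-8")).hexdigest(),
        "expires_at": now + TOKEN_TTL_SECONDS,
    }
    return token, record


def redeem_reset_token(token: str, record: Dict,
                       now: Optional[float] = None) -> bool:
    """True if the token matches the stored hash and has not expired."""
    now = time.time() if now is None else now
    if now > record["expires_at"]:
        return False
    candidate = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["token_hash"])
```

A real implementation would also delete the record once it has been redeemed so the token works only once. The point is simply that "I forgot my password" can be handled without the site ever being able to read the password back to you.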

Ultimately people will think that the responsibility sits with the user to provide a unique password and "take care of themselves". This is a comment on the Reddit site, after spez's above:

Access credentials are a mutual decision (I pick my username/password, you store them), so it is up to both of us to decide how important it is that someone can't pose as me or access my data on this particular website. If I'm particularly worried about these things, I'll pick a password that is hard to break, and ask you about how you store passwords. If I'm not particularly worried, I'll just use a different password than the sites I am worried about people breaking into.

This is a reasonable point, although I might argue that it's usually pretty darn important that users don't pose as other users - else what's the point? Perhaps notifying the user is something we could do:

When you login to our site, please be aware that we don't take strong measures to protect your data and some or all of your information may be compromised at some point in the future if our data is stolen. As such, we recommend using a username and password that you deem appropriate to mitigate our design decision.

The problem here is that when something goes wrong, it's always the developer that is blamed. It's up to us to demand something better for ourselves.

A Challenge

This is a challenge to give privacy a bigger consideration in your application - in the same way you might be challenging yourself right now to use TDD or Agile. Shift away from the information arrogance that makes us think we can ask people for their email account and password (with apologies to Jeff Atwood):

[Image]

Never forget that people are after your data, all the time. I get reminded of this daily. All I have to do is turn on my FTP server for 60 seconds, and I get some great pings from people trying to hack my server:

[Image]

I'd love to hear your thoughts on this. Are you willing to Kill Your Users Table?

Original Author

Original article by Rob Conery

Published at DZone with permission of Schalk Neethling. See the original article here.

Opinions expressed by DZone contributors are their own.
