Udi Dahan posted about the myth of infinite scalability. It is a good post, and I recommend reading it. I have my own 2 cents to add, though.
When building a system, I am always assuming an order of magnitude increase in the work the system has to do (usually, we are talking about # of requests / time period).
In other words, if we have 50,000 users, and we have roughly 5,000 active users during work hours, we typically have 100,000 requests per hour. My job is to make sure that we can handle up to 1,000,000 requests per hour. (Just to give you the numbers, we are talking about moving from ~30 requests per second to ~300 requests per second).
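The arithmetic above can be sketched in a few lines; the numbers are the post's hypothetical figures, not measurements:

```python
# Rough capacity math from the post; hypothetical numbers, not a benchmark.

def requests_per_second(requests_per_hour: float) -> float:
    """Convert an hourly request count into an average per-second rate."""
    return requests_per_hour / 3600

normal_load = 100_000           # requests/hour with ~5,000 active users
target_load = normal_load * 10  # one order of magnitude of headroom

print(round(requests_per_second(normal_load)))  # 28, which the post rounds to ~30
print(round(requests_per_second(target_load)))  # 278, which the post rounds to ~300
```

Note that these are averages; peak per-second rates during busy hours will be higher.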
Why do we need that buffer?
There are several reasons to want to do that. First, we assume that the system is going to be in production for a relatively long time, so the # of users or their activity is going to grow. Second, in most systems, we are usually talking about some % of the users being active, but there are usually times when significantly more users are active. For example, at the end of the tax year, an accounting system can be expected to see greater utilization, as it can at the end of every month.
And within that buffer, we don’t need to change the system architecture, we just need to add more resources to the system.
And why a full order of magnitude?
Put simply, because it is relatively easy to get there. Let us assume the end of the tax year again, and now we have 15,000 active users (vs. the “normal” 5,000). But the amount of work we do isn’t directly related to the number of users; it is related to how active they are and what operations they perform. In such a scenario, the users are also likely to be more active, stressing the system even more.
Finally, there is another aspect. Usually, moving a single order of magnitude up is no problem. In fact, I feel comfortable running on a single server (maybe replicated for HA) from 1 request per second to 50 or so. That is 3,600 requests per hour to 180,000 requests per hour. But beyond that, we are going to start talking about multiple servers. We can handle a million requests per hour on a web farm without much trouble, but moving to the next level is likely to require more changes, and when you get beyond that, you require more changes still.
Note, those changes are not changes to the code, they are architectural and system changes. Where before you had a single database, now you have many. Where before you could use ACID, now you have to use BASE. You need to push a lot more tasks to the background, the user interaction changes, etc.
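The rough tiers described above can be sketched as a simple lookup. The thresholds are the post's ballpark figures (50 req/s for a single server, ~300 req/s for a web farm), not hard limits:

```python
# Illustrative sketch of the post's scaling tiers.
# Thresholds are the post's rough figures, not hard limits.

def deployment_tier(req_per_sec: float) -> str:
    """Map an average request rate to the rough architecture it implies."""
    if req_per_sec <= 50:    # up to ~180,000 requests/hour
        return "single server (maybe replicated for HA)"
    if req_per_sec <= 300:   # up to ~1,000,000 requests/hour
        return "web farm"
    return "distributed architecture (multiple databases, BASE, background work)"

print(deployment_tier(30))   # single server (maybe replicated for HA)
print(deployment_tier(280))  # web farm
```

The point of the sketch is that each boundary crossing is an architectural change, not just more hardware.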
Of course, you can probably design your application for 10,000,000 requests per hour from the start. That is big enough that you can move to a hundred million requests per hour easily enough. But, and this is important, is this something that is valuable? Is this really something that you can justify to the guys writing the checks?
Scalability is a cost function, as Udi has mentioned. And you need to be aware that this cost is incurred both during development and during production. If your expected usage can be handled (including the buffer) with a simpler architecture, go for it. If you grow beyond that order of magnitude, three things will happen:
- You have the money to deploy a new system for the new load.
- You now have a better idea of the actual usage, rather than just guessing about the future.
- You are going to change how the system works anyway, from the UI to the internals.
The last one is important. I am guessing that many people would say “but we can do it right the first time”. And the problem is that you really can’t. You don’t know what users will do, what they will like, etc. Whatever you are guessing now, you are bound to be wrong. I have had that happen to me on multiple systems.
More than that, remember that time to market is a big feature as well.