Infrastructure is hard to build. This is true when putting together compute clusters, or when dealing with roads or power lines. Typically this involves both increases in operating expenses and capital expenses, and a small mistake can be quite costly.
All organizations have goals. Sometimes these goals are built around reliability, and sometimes they are build around budgets, but most of the time both are important. In large organizations a few extra servers, don’t usually carry a material cost impact, but in a small organization one extra servers can double the cost of a project. If you’re missing a large budget it can make some reliability goals quite challenging.
Engineering is the art of making things as weak as they need to be to survive. When putting together infrastructure in a small environment its helpful to really give someone the job of reliability engineering. They should look at your application, and outline what is required to provide the basic redundancy your organization needs. Then they should see how they can line up the budget, and the requirements and get you a solution that meets your up-time needs as well as your pocketbook.
This is batted around often when discussing the redundancy. In very large organization this n may equal 1000, so n + 1 is 1001, but in a small organization N is often 1, making N + 1 equal to 2. This is often a hard problem to work around when you may only be allowed to buy 2 severs for a three tiered application, but you can work around it. Virtualization can really help out, but it increases the planning demands. You will need to insure that you have the capacity in each piece of physical equipment to meet your needs, and that you’ve made sure that system roles always exist such that they can fail independently. While this sounds simple, it really needs an owner to keep track of this, just to make sure you don’t loose your primary and backup service at the same time.
The second issue with n + 1 redundancy, when N equals 1 you need to plan capacity carefully. The best solution in this case is to use an active-passive setup. If you use active-active setups you need to be careful that you don’t exceed 50% of your total capacity, since a failure will remove 50% of your capacity.
Wrapping it Up
Infrastructure is one of the harder things to get right in a small org. Take you time and think about it. Always keep an eye on your budget, and reliability goals.