We’ve all been there. The power goes out, or someone digs up some fiber and you lose connectivity. Either you can’t get your work done, or the phone starts ringing with users who can’t. All of us in the IT industry work hard to make sure that the applications and workspaces we support are up and available to the users who need them, when they need them. What basic steps do we take to make sure our systems are up when they need to be? How do we balance availability and uptime against the IT budget? Where does redundancy fit into our disaster recovery planning? After all, as they said in the movie The Right Stuff, “No bucks, no Buck Rogers!”
Full power redundancy
Those fancy servers of ours do nothing without power. That means dual power supplies in each physical box, and if one fails, the remaining power supply needs to be large enough to run the server or appliance on its own. In addition, these power supplies need to be on separate electrical circuits: it does little good to have two power supplies if a single circuit or UPS failure will take down both. Speaking of Uninterruptible Power Supplies (UPS), a truly redundant system needs failover paths for these as well. You don’t necessarily need two UPS units, but there should be a clear path to power in the event that a UPS fails.
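To make the sizing and circuit rules concrete, here is a minimal Python sketch that checks whether a box stays powered after losing any single failure domain. The `PowerSupply` class, its fields, and the wattage figures are hypothetical; the sketch assumes each supply's rated capacity is what it can deliver on its own. Passing a different `domain` function lets the same check cover either a failed supply or a failed circuit/UPS.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Callable, Hashable, Sequence


@dataclass(frozen=True)
class PowerSupply:
    name: str              # hypothetical label, e.g. "PSU1"
    capacity_watts: float  # what this supply can deliver on its own (assumption)
    circuit: str           # hypothetical label for the circuit/UPS feeding it


def survives_any_single_failure(
    supplies: Sequence[PowerSupply],
    load_watts: float,
    domain: Callable[[PowerSupply], Hashable],
) -> bool:
    """True if losing every supply in any one failure domain still leaves
    enough capacity to carry load_watts."""
    domains = {domain(p) for p in supplies}
    if len(domains) < 2:
        return False  # everything shares one domain: a single point of failure
    for lost in domains:
        remaining = sum(p.capacity_watts for p in supplies if domain(p) != lost)
        if remaining < load_watts:
            return False
    return True


# Hypothetical server drawing 600 W with two 750 W supplies.
split = [PowerSupply("PSU1", 750, "circuit-A"), PowerSupply("PSU2", 750, "circuit-B")]
shared = [PowerSupply("PSU1", 750, "circuit-A"), PowerSupply("PSU2", 750, "circuit-A")]

by_psu = attrgetter("name")         # failure domain = one power supply
by_circuit = attrgetter("circuit")  # failure domain = one circuit or UPS

print(survives_any_single_failure(split, 600, by_psu))       # True
print(survives_any_single_failure(shared, 600, by_circuit))  # False
```

For example, two 750 W supplies feeding a 600 W load pass both checks when they are on different circuits, but fail the circuit check as soon as both are plugged into the same one.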
Your servers and applications don’t do users any good if no one can reach them. That means there should be multiple paths from users to each workload. Physical path redundancy starts with multiple NICs in each enclosure, cabled to multiple switches that each have their own paths to the core. Multiple demarcation points (demarcs) leading to physically diverse carrier connections are also important. Of course, these multiple paths get expensive, so use your best judgment about the return on investment for each option.
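One rough way to weigh these extra paths against their cost is standard series/parallel availability arithmetic: components in a single chain multiply their availabilities, while redundant paths only fail when all of them fail at once. This sketch assumes failures are independent, an assumption that a shared conduit or a single upstream carrier can quietly violate, and the availability figures are illustrative rather than measured.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component in the chain must be up (e.g. NIC -> switch -> core uplink)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant path must be up; assumes independent failures."""
    return 1 - prod(1 - a for a in availabilities)


# Illustrative (not measured) availabilities for one NIC -> switch -> uplink path.
one_path = series(0.999, 0.999, 0.99)
two_paths = parallel(one_path, one_path)

print(f"single path: {one_path:.5f}")   # single path: 0.98802
print(f"dual paths:  {two_paths:.5f}")  # dual paths:  0.99986
```

Doubling the path in this made-up example cuts the expected unavailability from roughly 1.2% to roughly 0.014%, which is the kind of figure you can set against the cost of the second switch, NIC, and carrier.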
With your physical hosts covered and multiple paths to the data in place, the next step is system redundancy. Solutions range from failover clusters, which can keep applications running with little or no downtime, to high-availability (HA) configurations, which may incur a short outage but automatically restart virtual machines on another host if the original host fails. These are two different ways to meet the Recovery Time Objective (RTO) in your disaster recovery plan. As with most things, the smaller the downtime window, the larger the price tag.
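To see why a tighter downtime window costs more, it can help to translate an availability target into an annual downtime budget and ask how many recoveries of a given length that budget can absorb. This is a minimal sketch; the 365-day year and any recovery times you plug in are assumptions, not figures from a particular clustering or HA product.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Annual downtime budget for an availability target such as 0.999."""
    return (1 - availability) * MINUTES_PER_YEAR


def outages_tolerated(availability: float, recovery_minutes: float) -> int:
    """How many outages of recovery_minutes each fit in the annual budget."""
    if recovery_minutes <= 0:
        raise ValueError("recovery_minutes must be positive")
    return math.floor(allowed_downtime_minutes(availability) / recovery_minutes)


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%}: {allowed_downtime_minutes(target):.1f} min/year")
```

At 99.99% the budget is about 52.6 minutes a year, so an HA restart that takes a hypothetical 5 minutes could happen only 10 times before the target is missed, while a failover cluster that recovers in seconds leaves far more headroom.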
And of course, there is always the option to outsource. Having a company host your servers or moving to a cloud solution are both viable routes to redundancy. Whether these services host all of your computing infrastructure or serve as part of your failover plan, they are tools you can keep in your redundancy toolchest. Large cloud providers can spread the cost of massive redundancy among many clients, which can make it very affordable to use.
Double up on things
So, we have gone over a few of the most common measures IT staff use to keep their environments up and available. Obviously, you know the specific needs of your environment best, and any decisions will need to be weighed against your budget and management’s risk appetite. This article is intended as a jumping-off point for redundancy planning in your network. What are you doing in your environment?