I got a call from a customer not long ago asking me questions about Disaster Recovery (DR) planning. Now, we’ve all developed DR plans, and a quick search on the interwebs will get even the novice started on the basics. But three of the things we went over dredged up old memories, and I thought I would share them here: prioritizing your servers (or functions), doing the math, and asking somebody to help.
A Place for Everything and Everything in Its Place
When I worked as a systems engineer at a large company, we developed a divisional DR plan. We started out assuming that everything had to be recovered right away, and did a lot of busy work before accepting that this was not going to happen without a project budget we could denominate in gold bars. Eventually we recognized that not every server or business function needed to be up immediately. There was a method to recovering systems, and we decided to group everything into three classes, each with its own recovery point objective (RPO, how much data you can afford to lose) and recovery time objective (RTO, how long you can afford to be down).
The most important systems were classified as Class “A” systems, and were the first to be recovered. These were the business-critical boxes that needed to be up ASAP: systems that directly supported the business lines and the areas of the business that were visible to the customer.
When the Class “A” systems were recovered, or well underway, we could focus on the Class “B” systems. They were not as critical as the Class “A” systems, but they were still important to the business: internal systems that the business needed to run but that were not immediately visible to the public, or systems that could wait until the Class “A” boxes were up.
The last class was the Class “C” systems. These needed to be up eventually, but had the longest RTO and the greatest RPO of the bunch. They were the “nice to have” systems. By categorizing our recovery, we could get the important stuff done first, and then work on the rest.
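To make the three-class idea concrete, here is a minimal sketch of how a recovery order could fall out of the classification. The RTO and RPO hours and the system names are made up for illustration; they are not the numbers from our plan, and yours should come from the business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryClass:
    name: str
    rto_hours: int  # how long the system may stay down
    rpo_hours: int  # how much data (in hours) may be lost


# Hypothetical objectives, chosen only to illustrate the ordering.
CLASSES = {
    "A": RecoveryClass("A", rto_hours=4, rpo_hours=1),
    "B": RecoveryClass("B", rto_hours=24, rpo_hours=12),
    "C": RecoveryClass("C", rto_hours=72, rpo_hours=24),
}


def recovery_order(systems):
    """Return system names sorted by their class RTO, then by name.

    `systems` maps a system name to its class letter ("A", "B" or "C").
    """
    return sorted(
        systems,
        key=lambda name: (CLASSES[systems[name]].rto_hours, name),
    )


if __name__ == "__main__":
    inventory = {"wiki": "C", "email": "A", "payroll": "B", "storefront": "A"}
    for system in recovery_order(inventory):
        print(system, inventory[system])
```

Sorting on the RTO rather than on the class letter means the order still holds if you later add a fourth class or rename the existing ones.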
Do the Math is a simple concept
Just like your professors told you back in school, do the work and show it. Go ahead and run those storage studies so you know how much you will be recovering. Do a growth test and see what the “delta” (the amount of changed information) is on your local systems on a daily and weekly basis. Then plan around that. Run the numbers and build a spreadsheet. Are your WAN connections big enough to replicate your data within your time allowance? How often can you take snapshots with the available space on the SAN or NAS? Do you have enough hardware at the site to recover what you plan to recover there? Even something as simple as this: can you still read your tapes? (Boy, am I dating myself!)
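The WAN question is the easiest one to put numbers on. The sketch below is the back-of-the-envelope version, assuming decimal units (1 GB = 8,000 megabits) and a guessed "efficiency" factor for protocol overhead and link sharing. The 0.7 default is my assumption, not a measured value, so replace it with what your own links actually deliver.

```python
def replication_hours(delta_gb, link_mbps, efficiency=0.7):
    """Hours needed to ship `delta_gb` of changed data over a WAN link.

    delta_gb   -- changed data per replication cycle, in decimal gigabytes
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- usable fraction of the link (assumed, not measured)
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = delta_gb * 8 * 1000          # GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(delta_gb, link_mbps, window_hours, efficiency=0.7):
    """True if the delta can be replicated inside the time allowance."""
    return replication_hours(delta_gb, link_mbps, efficiency) <= window_hours


if __name__ == "__main__":
    # Example: 200 GB nightly delta, 8-hour replication window.
    for mbps in (50, 100):
        hours = replication_hours(200, mbps)
        print(f"{mbps} Mbps: {hours:.1f} h, fits: {fits_window(200, mbps, 8)}")
```

With those example numbers, a 100 Mbps link moves the nightly delta in a bit over six hours and fits the window, while a 50 Mbps link needs more than twelve and does not. The same kind of calculation works for snapshot space: daily delta times the number of snapshots you want to keep, compared against what is free on the SAN or NAS.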
Another Do the Math concept is to activate your plan with a limited scope. Nothing will show you where your plan is weak like trying to recover a small amount of data. It doesn’t have to be a full-on test; just activate your plan for a single server. Send someone over to the DR site and have them try to recover the email server from last night’s backup, or the HR system, or only a small portion of a system. Pick 100 records to restore, just enough to tell you where your plan needs more work and where you can improve.
If your company is big enough to have one, invite the audit department to tag along. Nothing impresses the audit folks and regulators (if you are in that line of business) like testing your plan and working to improve it. Nothing is perfect the first time around or even the seventh, so do the math to improve it.
Ask somebody. But not just anybody
The Beatles were on to something there. There are people in your organization who can help you out. When we classified the systems in the organization into different classes, we didn’t just pick those systems at random. We asked for help. The IT Department sent out questionnaires to department heads and had them rank the systems we had identified by importance and impact. We also asked them to list any systems that we had missed. You would be amazed at the systems WE thought were important versus the systems the BUSINESS thought were important.
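If you collect those questionnaires in a spreadsheet, comparing the two views is a small exercise. Here is a rough sketch, assuming each department head returns a rank per system (1 = most important). The system names, the averaging rule, and the "two positions apart" threshold are all my own illustrative choices, not what we actually used.

```python
from statistics import mean


def business_ranking(responses):
    """Order systems by their average rank across all responses.

    `responses` is a list of dicts mapping system name -> rank (1 = top).
    A system only needs to appear in one response to be included.
    """
    ranks = {}
    for response in responses:
        for system, rank in response.items():
            ranks.setdefault(system, []).append(rank)
    return sorted(ranks, key=lambda s: (mean(ranks[s]), s))


def surprises(it_order, business_order, threshold=2):
    """Systems IT missed, or placed `threshold`+ positions away from the business."""
    it_pos = {system: i for i, system in enumerate(it_order)}
    return [
        system
        for i, system in enumerate(business_order)
        if system not in it_pos or abs(it_pos[system] - i) >= threshold
    ]


if __name__ == "__main__":
    responses = [
        {"email": 1, "chat": 2, "erp": 3},
        {"chat": 1, "email": 2, "erp": 3, "badge": 4},
    ]
    it_guess = ["erp", "email", "chat"]
    print(surprises(it_guess, business_ranking(responses)))
```

Anything that lands on the surprise list is worth a conversation with the department that ranked it before you assign it a class.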
Remember that your DR plan is not an end product. It is designed to let the IT assets of your company help recover the business lines of your company. Of course, information is vital to your company, but how long will you be in business if the widgets don’t get made? If Accounting needs the company chat system to be up first, then the chat system needs to be up first. And no matter what anyone says, email is a Class “A” system. If management doesn’t believe that, turn it off for an hour and see how the phones light up.
Nothing that I have said in this article is rocket science. It is just a few lessons learned from building a plan and then working to test it. Technology changes, so the tools used to implement your plan will vary over time, but the fundamentals of prioritizing your servers, doing the math, and asking the business lines for help remain pertinent today.