Disaster Recovery Plan – Is Your Business Prepared?
Disasters happen. Without a sufficient recovery plan in place – one that is written for your business, updated regularly, and tested and rehearsed – the only thing protecting your business, your employees, and your customers is luck.
Brian: Welcome to another episode of the Fearless Paranoia podcast, where we seek to demystify cybersecurity and make it understandable and easier to apply so you can keep yourself safe. My name is Brian. I’m a cybersecurity attorney. His name is Ryan. He is an IT specialist. He also apparently builds things and flies things and has Kerbals all over the place and is just an expert in the field. And as always, I’m incredibly thankful to have him here and have him as an advisor and as a friend, and it saves me a lot of headaches.
Today, we’re going to talk about probably one of the bigger headaches that you can possibly go through as a business. We are in the middle of hurricane season, and we just saw Florida get absolutely drowned. So, in an episode that is not entirely random given that series of events, we’re going to be discussing keeping your business and your data safe in the event of a disaster: how to protect yourself from the catastrophic consequences that come with losing everything because you didn’t protect against what most insurance policies refer to as an act of God. Now, disasters themselves come in many forms. The most common one we think of now is probably, depending on which coast you live on, a hurricane or a wildfire. But all a disaster really requires is an unforeseen event that causes potentially randomized and not necessarily predictable levels of damage that interfere with your ability to do business. This can be anything from a lightning strike to a hurricane, a flood, or a fire. It can be an unexplained loss of telecommunications capabilities. One of the more bizarre recent ones was a couple years ago, when British Airways lost the ability to transmit data between their planes for three days. There was no clear, understandable cause of the problem, but it did cause immeasurable damage to their business, beyond just the dollars and cents that come from having to reshuffle flights and deal with angry customers; the reputational damage has lasted for years. So a disaster is anything that’s unforeseen in your immediate sphere. But the most important thing is that unforeseen doesn’t necessarily mean unpredictable. Disaster recovery applies to a lot of different things. But probably the biggest area to start from is that it comes from an analysis of the risks your business could face, how likely those risks are, and the level of damage that could occur if those risks materialized.
It is basic statistical analysis. It’s a basic understanding of how likely it is and how bad it would be if it happened, right? And you have done a lot of work with a lot of companies regarding cybersecurity. And I do know just from my work that disaster recovery is one of those things that people who are asking for cybersecurity help don’t think of, but for the people who are providing it, it’s first in line. It’s one of the first things that every IT and cybersecurity consulting group and every emergency services company goes to: disaster recovery. Tell us why it’s so important.
Ryan: Well, it’s important exactly why you mentioned it’s important. And it’s important primarily because, first of all, a lot of people don’t understand why it needs to be front and center until you actually lay it out in front of them. And once you make the business case for it, next thing you know, it becomes front and center for everybody from the executive level all the way down. To me, the biggest place to start from is two of my favorite acronyms inside the disaster recovery space, and that’s RTO and RPO. Those are where I always kind of go back to and start the conversation with a business, when they ask: Where are we going to spend time? Why are we gonna spend effort building a disaster recovery plan? What do we need this for? Are we just ticking insurance boxes? What are we doing? The first thing I ask them is, okay, what are your critical systems? If I went and shut down your whole infrastructure today, what are the things you couldn’t live without and couldn’t operate the business without? And we generate a list. Usually it starts out as a very small list of core services, and eventually expands out to quite a big list once you start to look at the ancillary services that are required to support those services. So you come up with a good list of the things you can’t live without, right? Same kind of stuff. What if this were to light on fire? What are you going to do? How are you going to survive without this data? So then you hit the RTO/RPO conversation, which is a recovery point objective and a recovery time objective. So now you’ve got this core services list. How much time can you survive as a business without these core services? That’s your RTO point. So the first thing is determining what order these need to come up in.
You break down each of those individual services and say, how long can we survive without these? And then you start building your plan to recover those services based on the amount of time that you have, trying to recover within that time period.
Brian: One other basic way I’ve heard RTO described is this: recovery time objective, the amount of time it takes to recover all of your applications. And it sounds to me like you’ve actually taken that a step further. It’s not just how long it takes to recover all of your applications. It’s how long it takes to recover your most critical applications in the order that they need to be recovered. Because there are, quite frankly, going to be certain systems that need to be back in place before other systems can be restored. Sometimes they’ll just be more important. Sometimes other systems will be dependent on those initial systems. So you take it a bit further than just the basic definition of how long.
Ryan: Yeah, that’s exactly it. Like do you need your core backup services running in order to operate your business? Probably not like not effectively running right at the moment in order to do business like, can I sell a product without my backup stuff running? Sure, probably. Now, does that become part of your DR Plan? Absolutely. And it’s a required piece, but it has a different RPO and RTO than something like a point of sale system would, because if you’re trying to sell a product, if your point of sale systems down, your business isn’t selling product, you’re not generating revenue, you’re not doing anything at that point. So to me, if your point of sale system is down, that’s a high one, if you’re a retail company, or if you’re a hospital, what if all of your access to your X rays and your data and your exam rooms and stuff goes down? That’s a high critical RTO RPO event that needs to be dealt with. But then again, access to your legacy data from 10 years ago in an old database, probably pretty low, when it comes to critical nature as far as operating the business. So I think you can classify RTO RPO as like business wide full core services, if you want to just put together a very loose DR plan. And for a small business that might be all you need is maybe you’ve defined your four or five core services, and you just put RTO RPO, wrapped around all those because you don’t need to really segregate it break it down further. But for a large business, especially large enterprises that have got a really distributed network, you kind of have to classify as part of your DR plan, staged RTO RPO stage disaster recovery. 
So you get core critical services up first, then you do support services beyond that, and then you go into ancillary and then like non critical services and you really want to kind of break it down to that level to make sure that you’re not looking at the whole thing as one big event that will now take a week or longer maybe to really kind of restore everything, you can break it down into consumable chunks, get the business operating again, even if they’re just limping along, you’re operating the business again, and then you can work on bringing it back up to you know full, full service recovery.
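The staged recovery described here (core critical services first, then support, ancillary, and non-critical, while respecting which systems depend on which) can be sketched in a few lines of Python. This is only an illustration under assumptions: the service names, tier numbers, RTO hours, and the `Service` and `recovery_order` names are all hypothetical and do not come from the episode.

```python
import heapq
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Service:
    name: str
    tier: int          # assumed scale: 1 = core critical ... 4 = non-critical
    rto_hours: float   # how long the business can survive without it
    depends_on: tuple = ()


def recovery_order(services):
    """Return service names in restore order.

    A service is never restored before the things it depends on; among
    services that are ready, the lowest tier and shortest RTO go first.
    """
    by_name = {s.name: s for s in services}
    sorter = TopologicalSorter({s.name: set(s.depends_on) for s in services})
    sorter.prepare()  # raises CycleError if dependencies are circular

    ready, order = [], []

    def push_ready():
        for name in sorter.get_ready():
            svc = by_name[name]
            heapq.heappush(ready, (svc.tier, svc.rto_hours, name))

    push_ready()
    while ready:
        _, _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)   # unlocks anything that was waiting on this service
        push_ready()
    return order


# Hypothetical retail example
services = [
    Service("point_of_sale", tier=1, rto_hours=4, depends_on=("network", "database")),
    Service("network", tier=1, rto_hours=2),
    Service("database", tier=1, rto_hours=4, depends_on=("network",)),
    Service("backup_service", tier=2, rto_hours=24, depends_on=("network",)),
    Service("legacy_archive", tier=4, rto_hours=168, depends_on=("database",)),
]
print(recovery_order(services))
```

For a small business with four or five core services, a plain ordered list would probably do the same job; the dependency handling only starts to pay off in larger, distributed environments.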
Brian: You’re listening to the Fearless Paranoia podcast. We’re here to help make the complex language of cybersecurity understandable. So if there are topics or issues that you’d like Ryan and me to break down in an episode, send us an email at email@example.com or reach out to us on Facebook or Twitter. For more information about today’s episode, be sure to check out Fearless Paranoia.com. You’ll find a post for this episode containing links to all the sources, research, and information that we have cited to you. And also check out our older posts and podcasts, as well as additional helpful resources for learning about cybersecurity. Now, back to the show.
Brian: That’s, I think, an excellent point to understand when you’re starting this out: the RTO and the RPO. I know you’re going to discuss the RPO, the recovery point objective, in greater detail in just a minute; I’m not as fluent on that side, because it’s a much more technical thing. The RTO is how much time it’s going to take, and the basics come down to how much time you have to dedicate to putting together your plan. Now let’s be very clear. This entire episode in basic terms is about why you need to establish a disaster recovery plan and how that disaster recovery plan can be established. So the more time you’re willing to dedicate to creating your disaster recovery plan, which means various things, including identifying the systems that are critical to run and the systems that are nice-to-haves but not needed, the more you can narrow that down. The tighter you can get that list, the better your plan is going to be, because the shorter your RTO is going to be, the less time it’s going to take you to get back. Now if you don’t have the time to dedicate to creating this huge and very detailed disaster recovery plan, that’s okay. It means that you’re going to have a longer recovery time, and that’s not necessarily a bad thing. The important thing is that you know that you’ve established something and you have a recovery time objective, so that you have an idea of how long it’s going to take to restore the systems that you need to run your business. And again, the longer you spend on analyzing them, the narrower your focus can be, but you don’t have to spend a ton of time in order to have an effective plan. So talk about recovery point objective. What does that mean?
Ryan: Yeah, so recovery point is the hand-in-hand piece with your recovery time. Knowing the amount of time it’s going to take you to effectively recover is a huge point. But then again, in most of these incidents, there’s an opportunity for data loss also, which is another critical point. Let’s just say you don’t have immediate backups; a lot of backup systems aren’t taking backups at every single delta in the data. They’re not, you know, looking at the data every minute and then saving a new version of the backup. Usually it’s scheduled. Your backups are occurring every four hours, once a day, once a week, once a year, whatever, based on the critical nature of the data that you’re backing up. So recovery point objective comes down to each of these critical core systems. And again, if you’ve got a small environment, you can do it as one RPO across all your systems and just classify all of them. If you’re in a big, distributed enterprise environment, you’re going to want to do RPO individually for each of your critical systems that has data. And again, if it doesn’t have data, it probably doesn’t fall into the RPO. But RPO would be, for your datasets, how much data can you effectively lose? So now let’s say your whole network gets ransomware, or let’s just say your data center gets wiped out. Let’s say a hurricane, whatever, Hurricane Brian and Ryan, comes through and just flattens your whole data center, and you don’t have your stuff in a second data center somewhere. Maybe you’ve got some tape backups, maybe whatever. Either way, you’re gonna have to institute your disaster recovery plan. And one of the first things is: What does our backup set look like? What does our current data set that still exists look like? And how can we bring that back into play? And what’s the delta between where that data set is and where we were at the moment of disaster?
So what did we lose in between? And so your RPO comes down to how much you can safely lose and still operate your business. Now, obviously, every business owner is gonna go, well, zero, I can’t afford to lose any data, because that’s just the general nature of it, right? You don’t want to lose anything. So I guess it comes down to how much data you can comfortably lose, or uncomfortably lose, and still operate your business. Now, again, if you lose your trade secrets, game over, business is done. But if you lose a day’s worth of sales, or a day’s worth of orders, something like that might not be the end of the world, right? You might piss off a few customers, you might lose a little bit of revenue. But the fact of the matter is, as long as you can still sell products and things within a day or so, you’re up and running, and you are generating business again. And you will overcome that inconvenience, that small hurdle. But the critical point is, can you get back to operating after losing that six hours’ worth of data? I’ve worked in the past at a hedge fund. Our recovery point objective there was as close to zero as possible, because again, you’re making millions of dollars in trades every few minutes, and if you start losing a lot of that data, that cost compounds very, very quickly. So then you start looking at some of those trading platforms, and you have to do things like immediate log shipping. So we got to a point where, for every single trade that occurred, you are posting that log to a database, and then you are shipping that new updated database over to an off-site location, like a hot site DR backup site, with every transaction. That is very resource intensive and very costly, but it also means that if our core site goes down, I can flip on the hot site, and within minutes we are back up and running again with a relatively accurate data set that has almost zero loss. Most businesses don’t want to take on that kind of cost.
So again, you have to make that compromise of how much can you comfortably or uncomfortably lose, but still get back to the point of being operational. And you have to find that middle ground, as uncomfortable as it is you have to find that compromise. And then that really is where you set your RPO at and then that becomes your target in a disaster recovery scenario.
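The relationship between a backup schedule and an RPO can be made concrete with a little arithmetic. The sketch below assumes the worst case is a disaster that strikes just before the next scheduled backup, so the most you can lose is roughly one full backup interval. The function names are hypothetical, and the four-hour and six-hour figures are only borrowed from the examples in the conversation.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """The delta between the last good backup and the moment of disaster."""
    if disaster < last_backup:
        raise ValueError("disaster time is earlier than the last backup")
    return disaster - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss (one full backup interval) fits the RPO."""
    return backup_interval <= rpo


rpo = timedelta(hours=6)                      # "we can lose six hours of data"
print(meets_rpo(timedelta(hours=4), rpo))     # backups every four hours
print(meets_rpo(timedelta(days=1), rpo))      # backups once a day

lost = data_loss_window(datetime(2022, 10, 1, 8, 0), datetime(2022, 10, 1, 11, 30))
print(lost)
```

An RPO close to zero, like the hedge fund example, cannot be met by shortening a schedule alone; that is where continuous approaches such as log shipping come in.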
Brian: The amount of data loss that you’re willing or able to survive, not just as a result of the disaster, but during recovery from that disaster. And you’re not suggesting that it’s lost business, you’re talking about lost data, things that as you mentioned, if you lose a day’s worth of sales, will you have to call up your customers, the ones you can identify and re execute the sales, or are you gonna have to wait for them to call you and everything like that the end results that can be fixed, but the fixing them is a part of that situation. If you lose one day’s worth of sales, you’re gonna have one day’s worth of customers pissed off, and you may not be able to identify those customers right off the bat. So you may be waiting for them to contact you. But another thing you said was I thought very critical is that you need to determine what is the pain point you were able to take and then find a solution based on that pain point. And even though there has to be some sort of balancing of priorities when it comes to cost, especially for small businesses, it still seems to me like the very first question is always what is your maximum pain point, where’s the point where you can no longer tolerate anymore, and at which point you have to work back from there and say, okay, regardless how much it’s going to cost me I need to set up a system that is going to get me there. And as far as cost goes, I’m gonna go back to one of my old talking points about cybersecurity is that for every expensive solution, there is an inexpensive solution that will most likely get you to the same place, but will probably take a lot more, you know, a lot more work on your end a lot more understanding on your end for you know. 
When it comes to data backups, you can go with a very expensive data backup solution, one that stores your data in four different data centers around the world backs up four or five versions of every document, or you can plug in a hard drive and use a backup software, one of them is going to cost a lot more, and you’re not going to have to do much, the other one is going to be pretty damn cheap. But you’re gonna have to do the work. The solutions are there. You don’t necessarily need to spend a ton of money, but you have to meet that maximum pain point. If you’ve determined you can’t live with a data loss beyond a certain point, your only real option is to find a way to stop that loss at that point.
Ryan: And that’s why disaster readiness and recovery comes down to one critical word. One word sums up the whole thing, and it is planning. You absolutely need to have plans in place to deal with these things. And that’s why, going into some of my favorite acronyms for disaster stuff, DRP, IRP, BCP, at the end of every one of those is a P, right? Disaster recovery planning, disaster readiness planning, incident response planning, business continuity planning. Planning. Plan, plan, plan. You know what? None of those solutions are going to do you a damn bit of good without a proper plan in place. And all those solutions are doing is costing you money if you don’t have a plan for what those solutions are actually doing. So if you don’t know why you’re buying a solution, if you’re buying it to check an insurance box, you’re probably wasting money. And at that point, I’d recommend just going with the lowest-cost solution, because if you don’t have a plan for what you’re doing with your EDR, how to keep it modern, how to report on it, how to guarantee compliance with it, and all you’re doing is checking a box, buy the cheapest thing you can, because you’re just checking a box. All you need to do that is a pen, effectively, and a small tool with a name on it. But if you have a proper plan in place to actually deal with these events, it comes back to RTO and RPO: making sure you’ve got the right tools in place for when disaster recovery comes into play, like good backups; making sure that you’ve got good visibility and good logging in place for when you hit an incident, so that you can enact those plans; and making sure that you’ve got the right people set up, so that when a certain type of incident or a certain type of disaster hits, you’re not scrambling at the ground level and trying to figure out, okay, how do we deal with this?
You have a recorded plan of how to deal with this, you have a recorded list of people that are responsible parties that are supposed to deal with different pieces of this because again, a disaster can be something like a datacenter getting wiped out. Who’s responsible for backups? Who’s responsible for system access? Who’s going to notify customers? Who’s going to notify our people? Who’s going to be the project manager, the leader, that’s going to lead the response effort? If your disaster is DLP, or ransomware, or something to that effect, major breach, who’s going to be the one to do the marketing and the PR work that’s involved in that who’s going to contact legal counsel who’s going to get a hold of whatever third parties are needed for incident response?
You’ve got to have all these things planned and regimented, because all of these disasters occur at a moment’s notice. You will never fully be ready for one that’s coming, because rarely, if ever, do you get any sort of lead time that these events are coming. And businesses don’t have a pause button. You can’t just come back later; you need to be ready to go at a moment’s notice. So having these plans in place, and the people ready and the tools ready, is critical to just surviving as a business nowadays. And these threats and disasters are growing in their type and nature. The more that we keep distributing, the more that we keep heading out into the internet for these types of services, the more we rely on different applications and different service providers, the more every one of those things becomes a potential fail point along the way. What if somebody finds a way to go into Google and just wipe out Google’s backups, Google’s data, Google’s everything? Or Apple, or Microsoft, for that matter? God forbid, what if somebody actually got in and was able to wipe out Microsoft’s whole database and all of the backups that they’ve got? Think of what that would do to businesses around the world. Or take just one facility of Amazon’s hardware setups: even taking one of their regions down would be enough to cripple numerous businesses for a day. And for a lot of those businesses, if they’re deep into Amazon, in a lot of cases their backups are in Amazon and their recovery systems are in Amazon. And so again, you have to kind of determine what level of outage you’re willing to deal with. And even though you may be one of the smaller players in that, if an Amazon goes down, at least you’re not going to be feeling the pain alone. It’s not going to be just, you know, Ma and Pa’s liquor store hurting today because Amazon’s down. No, everybody’s going to feel that pain together. So at least, I mean, misery loves company, I suppose.
Yeah, you’re sharing it, but, you know, again, if you want to be fully distributed, you need to look beyond just those individual critical fail points and critical control points. And you need to account for, unfortunately, all of them, or as many as you can.
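The "who's responsible for what" questions raised above lend themselves to a simple written roster that can be checked for gaps before a disaster rather than during one. The sketch below is a hypothetical illustration: the role list is paraphrased from the questions in the conversation, and the names and the `unassigned_roles` helper are made up for the example.

```python
# Roles paraphrased from the questions raised in the conversation.
REQUIRED_ROLES = (
    "response lead",
    "backups",
    "system access",
    "customer notification",
    "staff notification",
    "public relations",
    "legal counsel contact",
    "third-party incident response contact",
)


def unassigned_roles(roster: dict) -> list:
    """Return every required role that has no named owner in the roster."""
    return [role for role in REQUIRED_ROLES if not roster.get(role, "").strip()]


# Hypothetical roster with gaps
roster = {
    "response lead": "Alex",
    "backups": "Sam",
    "system access": "Sam",
    "customer notification": "",
}
for role in unassigned_roles(roster):
    print(f"No owner recorded for: {role}")
```

Running a check like this whenever staff change is one low-cost way to keep the recorded list of responsible parties from going stale.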
Brian: You’re listening to the Fearless Paranoia podcast for more information on keeping yourself your family and your company protected against cyber threats, check out the Resilience Cybersecurity and Data Privacy blog. If you’re enjoying this podcast, please like and subscribe using any of your favorite podcast platforms.
Brian: And I do like when our conversations actually dip into an area where I have some expertise. It’s kind of fun. But yeah, on planning a disaster response plan: you can imagine that you’re safe, but do you have your plan set up so that it identifies who gets contacted, and how? There is a different method of communication when you’ve been hacked, because you don’t necessarily want hackers to be able to eavesdrop on your emergency communications. Then there is communication during a hurricane, when you can’t necessarily rely on all methods of communication being functional. So your disaster response plan has to account for those different variables. You need a succession plan, one that identifies who has what roles and, when someone can’t be reached within that plan, who takes over for them and has the authority to make the important decisions that role involves. You need to have your criticality of services list – your most important services that need to be restored immediately. You need to have them in order, you need to have them listed in advance, and you need to functionally know what your business can operate without before you’re forced to do so. Your data backup and restoration plan needs to be in place, and it needs to be instituted based on the possibility of multiple different types of disasters taking place. There are the physical disasters that occur just at your office, and the ones that wipe out your whole town. There are the ones that wipe out data of a certain type, if they go after one particular cloud service. So the impact can be very distributed, affecting many different people in many different ways.
You need to have your media management who’s going to discuss with the public what happened to your information if it’s necessary, and trust me speaking as a cybersecurity attorney, it is more likely every day that anytime you have anything that affects your data, it’s going to be necessary to tell people about it and probably the biggest aspect that I push every single time it comes up, have you practiced it?
Ryan: Yeah, I’m glad you said it. Because if you didn’t, I was gonna bring it up. Because to me, if you’re not actually testing and executing those plans, they’re not going to do you a hell of a lot of good in the end. Because again, all you’re doing is producing a piece of paper. And in a disaster, if you haven’t actually tested and executed that plan beforehand, it’s going to be about as good as toilet paper for you in the end. So absolutely, absolutely put together good planning. Even if it has to be lightweight, make it as extensive as it needs to be for your business. Don’t overcomplicate it, because that’s not going to help you effectively execute it. And then test, test, test, and test frequently. In enterprise environments, anyone that’s not doing tabletop exercises around disaster recovery, incident response, stuff like that, is behind the game. And I apologize, I’m gonna offend somebody, but you are behind the game if you’re not doing testing on stuff like this. 80% of businesses in the last year have dealt with some sort of either compromise or ransomware event. 80%. That means the heavy majority. We have now effectively gotten to a point where our threat actors have met the threshold of being 80% effective. That’s a scary thing. So, well.
Brian: And the other thing about tabletop exercises, I wouldn’t want to say it was that anyone in cybersecurity will tell you how important they are. But from someone who deals with bringing businesses into a world of being resilient, capable company that can handle dealing, even with setbacks, as a part of general education, you need to have exercises that are proper for your business that address the issues you face. But that are also engaging and interesting. There was a great story about a technology company that during a seminar, basically, in Las Vegas, they had a company retreat, it was bunch of programmers, and they broke them into groups, and they gave them their security exercises. But the way that it was put together was both a collaborative and Competitive Enterprise, there was a leaderboard up and you could actually help other teams improve their scores, and they could help you improve your scores based on how things were done. It can be done in a manner that is engaging, and that is memorable. There’s no point in teaching someone something if they’re not going to listen to you. And it’s not really all that worthwhile teaching them if they listen to you and don’t remember. So get exercises and tests from people who know how to teach this stuff. Think back to school, the differences between the teachers you liked and the teachers you didn’t and the relative amount you learned in those classes. Disaster recovery should be top of your list as a small business, it’s something that you should plan for to the extent you can. You need to have a basic plan in place. If that’s all you can do and the more time you can spend on it, the better it’s going to be provided you test it and make sure it actually works. One of the things that you can find on our website and then on resilience is an example disaster recovery guide a checklist of how to put together your disaster recovery setup. 
I can’t give you the entire disaster recovery plan and all accompanying plans because then I would be broke and you don’t have any reason to hire me but the information is available and it’s something you should definitely take a look at. Thank you for tuning into Fearless Paranoia. We enjoy talking about this stuff. We enjoy helping people out in getting past some of the dense and hard to understand and sometimes hard to appreciate elements of cybersecurity. I’m Brian.
Ryan: And I’m Ryan, and we will see you next time to make cybersecurity understandable and digestible, and to guide you through understanding what you and your business need to focus on in order to get the most benefit for your cybersecurity spend.
©2022 Fearless Paranoia