Traditionally, organizations have focused on protecting themselves against natural disasters, power outages, major telecommunications outages and unexpected data loss events caused by human errors. In recent years, their focus has also been pulled toward concerns about imminent cyberattacks, malware, ransomware and other major threats to their organization’s data and the continuous operation of their business services.
Fortunately, there are technologies, policies and procedures that help organizations mitigate these risks and quickly recover from potentially crippling downtime and data loss to ensure always-on operations. Establishing the appropriate Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for your systems, applications and data is integral to any comprehensive and successful Disaster Recovery (DR) plan—and there’s a science behind arriving at these two metrics.
RTO and RPO are two of the most important metrics used when establishing the protection methodology and recovery procedures for mission-critical business systems and applications. Formalizing the appropriate recovery objectives for your organization will help minimize service disruption and data loss, and optimizing the approach, technologies and procedures helps achieve those outcomes at the lowest possible cost. Both metrics influence the overall cost of your DR strategy, and in some cases costs increase as the objectives get closer to zero. However, not all technologies become dramatically more expensive as RTOs and RPOs are reduced. This is why it’s important for organizations to carefully weigh all the options when establishing their cost/protection ratio.
Veristor’s Michael J. Stolarczyk offers his insights into how to ensure corporate data remains intact.
Calculating Your RTO
RTO is often misinterpreted as the actual time it takes to restore a system. Simply put, RTO represents the maximum amount of time allowed to restore mission-critical systems and applications before the business suffers irreparable damage. In other words, it is the window of time when end users cannot perform their jobs, but the company can still recover.
There are two key factors to consider when devising an acceptable RTO. First, the metric should focus on individual applications rather than entire servers, so the company can assign the right levels of priority. For example, email systems may need to be recovered in less than an hour so employees can stay connected with customers and partners, whereas an accounting program may be able to wait until the next day to be restored without an actual financial impact to the business. Second, the RTO clock should start ticking the moment vital systems or applications no longer function, not when the disaster is identified.
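As a minimal sketch of both factors, the Python snippet below tracks RTO per application and starts the clock at the moment of failure rather than at detection. The RTO values mirror the email and accounting example above; the timestamps and the names `RTO_BY_APP` and `rto_status` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical per-application RTOs, echoing the email/accounting example.
RTO_BY_APP = {
    "email": timedelta(hours=1),
    "accounting": timedelta(hours=24),
}


def rto_status(app, failed_at, now):
    """Return (elapsed, remaining, breached) for one application.

    The clock starts at failed_at, the moment the application stopped
    functioning, not at the moment the outage was detected.
    """
    elapsed = now - failed_at
    remaining = RTO_BY_APP[app] - elapsed
    return elapsed, remaining, remaining < timedelta(0)


# Illustrative timeline (made-up timestamps).
failed_at = datetime(2024, 1, 15, 9, 0)     # application stops working
detected_at = datetime(2024, 1, 15, 9, 20)  # outage noticed 20 minutes later
now = datetime(2024, 1, 15, 9, 50)

undetected = detected_at - failed_at
elapsed, remaining, breached = rto_status("email", failed_at, now)
print(f"time already spent before detection: {undetected}")
print(f"email: elapsed={elapsed}, remaining={remaining}, breached={breached}")
```

In this timeline, 20 of the 60 available minutes are gone before anyone notices the outage, which is why detection speed belongs in the RTO discussion.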
These factors make RTO a very business-specific metric, which must be calculated carefully. Each business is unique, as are its applications and systems. Therefore, it is important to establish a thorough understanding of how the business operates and which systems are most vital to critical functions such as processing orders and transactions, moving money and delivering services to third-party partners and customers. Only then can a business truly identify its RTO needs and develop a successful recovery plan.
Calculating Your RPO
RPO is the allowable amount of data loss, typically expressed as a span of time, that systems and applications may suffer during a disaster without permanently jeopardizing business operations. Thus, the RPO dictates the frequency with which backups need to be performed to defend against crippling data loss. While an RPO of zero is often desired, the optimal solution limits data loss while keeping vital operations running at a reasonable cost. The right RPO can then be used to determine which technologies best support the overall DR and data protection strategy.
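To illustrate how an RPO translates into backup frequency, here is a small Python sketch. It assumes a simplified model that is not taken from any particular product: backups start at a fixed interval, each backup captures the data as of its start time, and it only becomes usable once it completes. Under that assumption the worst-case loss is the interval plus the backup duration. All function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration=timedelta(0)):
    """Worst case under the simplified model: the failure hits just before
    the next backup completes, so the newest usable copy is one full
    interval plus one backup duration old."""
    return backup_interval + backup_duration


def max_backup_interval(rpo, backup_duration=timedelta(0)):
    """Longest interval between backup starts that still satisfies the RPO."""
    interval = rpo - backup_duration
    if interval <= timedelta(0):
        raise ValueError("backups this slow cannot satisfy the RPO")
    return interval


def meets_rpo(rpo, backup_interval, backup_duration=timedelta(0)):
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hypothetical figures: a 4-hour RPO and backups that take 30 minutes.
rpo = timedelta(hours=4)
duration = timedelta(minutes=30)

print(max_backup_interval(rpo, duration))            # longest allowed interval
print(meets_rpo(rpo, timedelta(hours=24), duration))  # nightly backups
print(meets_rpo(rpo, timedelta(hours=3), duration))   # backups every 3 hours
```

With these made-up numbers, nightly backups fall well short of a 4-hour RPO, while a 3-hour schedule fits. An RPO near zero pushes the interval toward continuous replication rather than scheduled backups.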
Determining Acceptable Loss
If you ask your end users how long they can perform their responsibilities without some of the core business applications or how long they can wait between backups, you’ll likely get answers that range from 15 seconds to 72 hours. As well-intentioned as it may be, factoring in the end user perspective alone can drive DR costs to unacceptable levels. Therefore, it’s helpful to take the following steps:
- Create a list of all vital systems and applications that are involved in business operations
- Note the functions each system and application performs and who would be affected if it were unexpectedly offline, such as employees, customers and partners
- Consult application-specific change management documentation and existing DR runbooks, which can be an excellent reference for this information
- Estimate the potential losses and business impact for each system or application, such as reduced sales, customer attrition, damaged public perception and lost employee productivity
- Consider whether the sales cycle or time of year affects any of the above
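The steps above can be sketched as a simple inventory in Python. Everything in this example is an assumption made for illustration: the systems, the dollar figures, the `peak_multiplier` used to model seasonality, and the idea that losses grow linearly with downtime. Real impact estimates are rarely that tidy, but even a rough ranking helps prioritize.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str
    function: str
    affected: list[str]           # who is hit if the system goes offline
    hourly_loss: float            # estimated loss per hour of downtime
    peak_multiplier: float = 1.0  # seasonal or sales-cycle adjustment


def estimated_loss(system, hours_down, peak_season=False):
    """Linear loss estimate; an assumption, not a law of nature."""
    multiplier = system.peak_multiplier if peak_season else 1.0
    return system.hourly_loss * hours_down * multiplier


def rank_by_impact(systems, hours_down, peak_season=False):
    """Return (name, loss) pairs, largest estimated loss first."""
    losses = [(s.name, estimated_loss(s, hours_down, peak_season)) for s in systems]
    return sorted(losses, key=lambda item: item[1], reverse=True)


# Hypothetical inventory with made-up figures.
inventory = [
    SystemImpact("email", "communication", ["employees", "customers", "partners"], 800.0),
    SystemImpact("order processing", "sales", ["customers"], 5000.0, peak_multiplier=3.0),
    SystemImpact("accounting", "finance", ["employees"], 100.0),
]

ranked = rank_by_impact(inventory, hours_down=4)
for name, loss in ranked:
    print(f"{name}: estimated loss over 4 hours = {loss:,.0f}")
```

Re-running the ranking with `peak_season=True` shows how the same outage can cost far more at the wrong time of year.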
The next step in determining your acceptable downtime is figuring out exactly when the above losses become irreparable. The following is a sample of questions that you might consider:
- Do our customers need real-time, continuous access to specific data and services?
- Are we responsible for protecting our customers’ data and guaranteeing it is accessible to them at all times?
- What systems and applications rely on each other?
- What systems and applications will create financial losses if they are unexpectedly offline or irreparably destroyed?
- How long can these systems be unavailable without a noticeable impact to the business?
- What would be the estimated revenue impact if these systems are offline for a longer period than the established RTO or if the RPO is not met and more data is lost than expected?
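The question of which systems rely on each other has a direct effect on recovery objectives: a shared dependency has to come back no later than the most urgent application that needs it. The sketch below uses Python's standard `graphlib.TopologicalSorter` to derive a recovery order and tighten each dependency's RTO accordingly. The dependency map and RTO hours are hypothetical, and the result is only an upper bound, since the dependent application also needs time to restart.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each system -> the systems it depends on.
DEPENDS_ON = {
    "order processing": {"database", "authentication"},
    "email": {"authentication"},
    "accounting": {"database"},
    "database": set(),
    "authentication": set(),
}

# Hypothetical RTOs (in hours) as stated by each system's owner.
STATED_RTO_HOURS = {
    "order processing": 2,
    "email": 1,
    "accounting": 24,
    "database": 8,
    "authentication": 12,
}


def recovery_order(depends_on):
    """Dependencies come before the systems that rely on them."""
    return list(TopologicalSorter(depends_on).static_order())


def effective_rto(depends_on, stated_rto):
    """Tighten each dependency's RTO to its most urgent dependent."""
    effective = dict(stated_rto)
    # Walk dependents before their dependencies so limits propagate down.
    for system in reversed(recovery_order(depends_on)):
        for dependency in depends_on.get(system, ()):
            effective[dependency] = min(effective[dependency], effective[system])
    return effective


order = recovery_order(DEPENDS_ON)
effective = effective_rto(DEPENDS_ON, STATED_RTO_HOURS)
print(order)
print(effective)
```

In this example the database's stated 8-hour RTO is not realistic, because order processing needs it back within 2 hours, and authentication inherits email's 1-hour target.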
By asking these questions, you can get a better sense of the acceptable recovery time and necessary backup frequency for each of your systems and applications. This approach to calculating the metrics that work best for your organization also makes the DR plan more cost-effective, since you won’t be spending money to achieve extremely low RPOs and RTOs that may not provide measurable value.
It’s an Art and a Science
One of the most difficult aspects of establishing your RTO and RPO is finding the right solution provider to support them. Solution providers often define their offerings and set their prices using various methods that can be complicated and, at times, confusing. This can make the process of choosing the right partner a challenge.
Some DR providers ask prospective customers to fill out questionnaires and surveys to summarize their needs and objectives. Considering the if/then scenario questions included above, it becomes clear that a simple form will not sufficiently collect critical information and address specific and unique business requirements. Providers may also charge an additional premium for how close the RPO is to zero; this tends to be an arbitrary number instead of a metric based on actual information.
Unfortunately, another common practice is for providers to lock customers into contracts before providing evidence that they can deliver as advertised and meet the agreed-upon RTOs and RPOs. Lastly, some providers will charge a lower contractual rate on an annual or monthly basis, but then make up for it with unusually high “failover” fees or “testing” fees. These practices are potential red flags, which should encourage businesses to continue evaluating potential DR providers to find the best match for their needs and requirements.
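A quick way to see through a low headline rate is to total everything you expect to pay in a year. The Python sketch below does that arithmetic for two hypothetical providers; the rates, fees and expected number of tests and failovers are made-up figures, not quotes from any real vendor.

```python
def annual_cost(monthly_rate, test_fee, tests_per_year, failover_fee, expected_failovers):
    """Total yearly spend: contract rate plus per-event testing and failover fees."""
    return (
        12 * monthly_rate
        + test_fee * tests_per_year
        + failover_fee * expected_failovers
    )


# Hypothetical Provider A: higher monthly rate, testing and failover included.
provider_a = annual_cost(monthly_rate=2000, test_fee=0, tests_per_year=2,
                         failover_fee=0, expected_failovers=1)

# Hypothetical Provider B: lower monthly rate, but per-event fees.
provider_b = annual_cost(monthly_rate=1200, test_fee=3000, tests_per_year=2,
                         failover_fee=10000, expected_failovers=1)

print(f"Provider A: {provider_a:,}")
print(f"Provider B: {provider_b:,}")
```

With these assumed numbers, the provider with the cheaper monthly rate ends up costing more once two tests and a single failover are included, and per-test fees also create an incentive to test less often.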
The successful execution of your DR plan requires your provider to understand both the art and the science of this important job. Whichever partner you choose should have the proper infrastructure, processes and people dedicated to supporting the service, maintaining your RTOs, RPOs and overall DR readiness, and protecting your business 24x7x365. Regular testing and execution of the DR runbook are essential, as are post-test debriefings and runbook updates that reflect any changes observed since the last test. Most importantly, the relationship between a business and its DR provider should be ongoing, collaborative and based on mutual success.
To learn more on how to defend against disasters and protect your data-driven business, please visit: https://veristor.com/services/managed-services/