We are looking at resiliency capabilities for SaaS applications. Our internal BIAs reveal business recovery requirements such as 4 hours, 12 hours, 24 hours, etc. When we review vendor offerings in our contracts or other publicly available documents, we find that SaaS providers do not talk about recovery times but about availability (e.g. 99.999% uptime). We are trying to figure out a good way to reconcile recovery time requirements against availability statistics. Has anyone done any work in this space?
We recently pushed a SaaS vendor to agree to a 30-minute RTO. This was only possible because the vendor's product was deployed on dedicated cloud infrastructure rather than a shared tenant. In addition, the product ran on an open-source, scalable architecture on AWS, with MongoDB and Elasticsearch components that run independently, which gives flexibility to scale.
In the SaaS space, even Microsoft doesn’t offer an RTO commitment of less than 8 hours. All will vouch for availability with multi-region deployments.
Honestly, it really depends on the product architecture and whether it can support that aggressive a recovery time. Most SaaS vendors run multi-tenant environments with logical separation, which adds complexity. Some vendors may commit to a 4-hour RTO but will charge accordingly.

Availability and recovery time should not be combined into one metric because they measure different things. Availability reflects the provider's uptime, while recovery time measures how fast your team can recover after an incident, and combining them would blur the responsibility and focus of each metric.
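
To make the difference concrete, here is a rough sketch of the arithmetic. It converts an availability percentage into the downtime budget it implies over a period, then checks whether a single outage could run longer than your RTO without the vendor breaching the availability figure. The function names and the 365-day year are my own assumptions for illustration, not anything from a vendor contract, and real SLAs often measure per month and exclude planned maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget (in hours) implied by an availability percentage."""
    return period_hours * (1 - availability_pct / 100)


def single_outage_can_exceed_rto(availability_pct: float,
                                 rto_hours: float,
                                 period_hours: float = HOURS_PER_YEAR) -> bool:
    """True if one continuous outage could exceed the RTO while the
    vendor still meets the stated availability over the period."""
    return allowed_downtime_hours(availability_pct, period_hours) > rto_hours


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        budget = allowed_downtime_hours(pct)
        print(f"{pct}% -> {budget:.4f} h/year, "
              f"can exceed 4h RTO: {single_outage_can_exceed_rto(pct, 4)}")
```

Under these assumptions, 99.9% allows roughly 8.76 hours of downtime per year, so one outage could blow through a 4-hour RTO with the SLA still met, while 99.999% allows only about 5 minutes. Even then, an availability figure is a statistical target, usually backed by service credits, and not a commitment to restore service within a given time after a disaster.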