What are some best-practice disaster recovery approaches?

7.8k views, 1 Upvote, 6 Comments

Director Certifications in Education, 6 years ago

Test regularly, and again whenever significant changes are made to critical applications and infrastructure.

CTO in Software, 6 years ago

For web applications that need a disaster recovery plan, there are many options beyond doing things manually.

1. The traditional way is to keep backups locally with you (I really don't prefer this).
2. Automate the backups.
2.1 We generally use AWS S3 for this. We have written bash scripts that back up the complete code and data and put the backups in S3 buckets in different regions, so that we can quickly redeploy from backup in case of a disaster.
2.2 For the code we use a code repository with a CI/CD tool, so at least we have a complete code backup from the start, plus automated DB backups on RDS.
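To make point 2.1 concrete, here is a minimal Python sketch of the same idea (the scripts described above are bash; this is not them). It packs a directory into a timestamped archive and hands the same archive to one bucket per region. The function names, the `backups/` key prefix and the `uploader_for` callback are all assumptions made for illustration; the upload itself is injected so the sketch runs without AWS credentials.

```python
import tarfile
from datetime import datetime
from pathlib import Path
from typing import Callable

# Hypothetical upload callback: (local_file, bucket, key) -> None.
Uploader = Callable[[Path, str, str], None]


def make_archive(source_dir: Path, out_dir: Path, now: datetime) -> Path:
    """Pack source_dir into a timestamped .tar.gz and return its path.

    `now` is assumed to be a UTC datetime, hence the literal "Z" suffix.
    """
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    archive = out_dir / f"{source_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return archive


def replicate(archive: Path, buckets_by_region: dict[str, str],
              uploader_for: Callable[[str], Uploader]) -> list[str]:
    """Upload the same archive to one bucket per region.

    Returns the regions that were uploaded to, in order. An exception from
    any uploader stops the loop, so a partial replication is not silent.
    """
    done = []
    for region, bucket in buckets_by_region.items():
        upload = uploader_for(region)
        upload(archive, bucket, f"backups/{archive.name}")  # key prefix is an assumption
        done.append(region)
    return done
```

With boto3 installed, `uploader_for` could return something like `lambda path, bucket, key: boto3.client("s3", region_name=region).upload_file(str(path), bucket, key)`. Keeping the upload behind a callback also makes it easy to rehearse the script against a fake uploader before pointing it at real buckets.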

There are lots of tools for recovery. You can get an idea from this table: https://xebialabs.com/periodic-table-of-devops-tools/

CIO in Software, 6 years ago

Hi,
First, my assumption is that we are referring to DR and not BCP, which addresses people and operations on top of DR, which is mostly technical.

My recommendation is to adopt the assumption that not all services are born equal, meaning you don't need to approach the whole ecosystem in the same way. This lets you set priorities from both the solution and the budget point of view. I believe the solution should be driven by the time in which you wish to recover; then ask yourself what percentage of normal operation you are willing to accept. DR does not mean 100% operational with the same SLA as on normal days: it does not happen every day, and budget is a big factor. Second, ask yourself which point in time is acceptable to be back up (again, not all services are born equal).

Another aspect is your SaaS solutions. Despite what many think, many of them do not have a 100% DR solution. In some cases it depends on the support tier, and in some cases there is no SLA associated with availability in a DR event, so reviewing and signing off on those third-party solutions is extremely important.

Tech-wise, there are many solutions today, including SaaS ones, which I would consider. Unless you are already operating from an active-active ecosystem, I would consider not building one by default but exploring third-party providers.

Another point, and maybe the most important: the focus is usually on backup, mirroring data and so on, but the real service in DR is the restore, or the switch to the mirrored cluster. Therefore the most important tip I have is to test yourself once or twice a year. You can divide the services into two groups and test a different group in each drill. Have a war room, document it all, and hold a lessons-learned session after the drill, as I am sure you will find dependencies that many did not consider in the initial design phase. Define KPIs to tell the story of the DR: how much was automated versus manual, time to recover, knowledge in place.

One last thing that is very important to look at is the "back to normal" phase. This must be planned very well. Confirm that both the DR plans and the return to normal are aligned with business needs and with customer contracts, in case you have an appendix that addresses availability and service levels. Hope this was useful to you and maybe others.
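As a rough illustration of the drill KPIs mentioned above (how much was automated versus manual, time to recover), a sketch in Python might look like the following. The `DrillResult` fields and the KPI names are assumptions chosen for the example, not a standard.

```python
from dataclasses import dataclass


@dataclass
class DrillResult:
    service: str
    target_minutes: int    # recovery time the business agreed to for this service
    actual_minutes: int    # how long the restore or failover took in the drill
    automated_steps: int
    manual_steps: int


def drill_kpis(results: list[DrillResult]) -> dict[str, float]:
    """Summarise one drill into a few numbers that can be tracked over time."""
    if not results:
        raise ValueError("no drill results to summarise")
    within = sum(r.actual_minutes <= r.target_minutes for r in results)
    auto = sum(r.automated_steps for r in results)
    total = auto + sum(r.manual_steps for r in results)
    return {
        "within_target_pct": 100 * within / len(results),
        "automation_pct": 100 * auto / total if total else 0.0,
        "slowest_recovery_minutes": max(r.actual_minutes for r in results),
    }
```

Because not all services are born equal, each service carries its own target rather than one global number. Comparing these figures from drill to drill is one way to show whether the lessons learned are being acted on.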

IT-chef / Director IT in Energy and Utilities, 6 years ago

... whoops... also have a system in place for incident reporting; analyze and learn from it to improve your swift recovery.

IT-chef / Director IT in Energy and Utilities, 6 years ago

A redundant site, as pointed out above, is very good and a hygiene factor. However, when it happens, the following is good:

1. Have your data classified in order of criticality.
2. Define the master application/system.
3. If the application/system is on-prem, know which server it is on.
4. Rank the apps on each server, so that if it is completely down everything is traceable and you know which application/system holds critical data and should be started before everything else.
5. Keep a matrix for reporting recovery progress at set times (4 hrs, 8 hrs, etc.), with a named person responsible for each report (who is doing what, when).

Depending on the type of business or on national security concerns, naturally also have a communication device that works when all else is down. Also, actually train your staff for this, as it will happen at various levels of disaster.
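The ranking and reporting ideas above can be kept as plain data and sorted when needed. The following Python sketch assumes a simple numeric criticality (1 = most critical) and a per-server rank; the `App` fields and both function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    server: str          # where the app runs if it is on-prem
    criticality: int     # 1 = holds the most critical data, larger = less critical
    rank_on_server: int  # start order among the apps sharing that server


def recovery_order(apps: list[App]) -> list[str]:
    """Most critical first; ties broken by rank on the server, then by name."""
    ordered = sorted(apps, key=lambda a: (a.criticality, a.rank_on_server, a.name))
    return [a.name for a in ordered]


def report_schedule(checkpoints_hours: list[int], owner: str) -> list[str]:
    """One progress-report line per checkpoint after the incident starts."""
    return [f"T+{h}h: status report by {owner}" for h in sorted(checkpoints_hours)]
```

Keeping this inventory somewhere that survives the outage itself (printed, or on the redundant site) matters as much as the ordering logic.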
