Krescendo Disaster Recovery Test

On Sunday, February 23rd, 2014, Krescendo executed its annual Disaster Recovery (DR) test. The goal of the test was to simulate a complete outage of the primary data center.

What does Krescendo’s DR infrastructure consist of?
Krescendo has a fully redundant set of servers in a secondary data center. MySQL database servers are kept in sync by replication through an SSH tunnel, MongoDB nodes by MongoDB clustering over SSL, and the file systems by mirror scripts run over SSH sessions. File attachments managed through our File Manager and events logged through our Event Horizon service required no special action: both services are clustered, replicate themselves securely through HTTPS REST APIs, and have their secondary node in the DR site.
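
As an illustration of what "kept in sync" can mean for the MySQL replicas, here is a minimal sketch of a health check. It assumes the replica status has already been read into a dictionary using the field names reported by MySQL's SHOW SLAVE STATUS statement. The function name and the 60-second lag threshold are hypothetical and are not taken from Krescendo's tooling.

```python
from typing import Any, Dict


def replica_is_healthy(status: Dict[str, Any], max_lag_seconds: int = 60) -> bool:
    """Return True if a MySQL replica looks in sync.

    `status` is assumed to hold one row of SHOW SLAVE STATUS output.
    `max_lag_seconds` is an illustrative threshold, not a documented SLA.
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    # MySQL reports NULL (None here) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    return io_ok and sql_ok and lag is not None and lag <= max_lag_seconds


if __name__ == "__main__":
    sample = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": 0,
    }
    print(replica_is_healthy(sample))
```

A check like this would typically run before a planned switch to the secondary site, so that a stale replica is caught before traffic is routed to it.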

How was the DR test executed?
1. Shutting down services in the primary site
2. Making DNS changes so that application/service access was routed to the secondary site
3. Running a set of test cases
4. User testing
5. Repeating steps 1-4 in the opposite direction to return to the primary site
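
The sequence above can be sketched as a small ordered runbook. This is only an illustration under stated assumptions: the Step class, the run_switchover function and the placeholder action are hypothetical names, and the real shutdown, DNS and testing work is replaced by a stand-in that always succeeds. The return to the primary site is modeled as the same four steps with the two sites swapped, which is one reading of step 5.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    # Hypothetical callable: takes (from_site, to_site), returns True on success.
    action: Callable[[str, str], bool]


def run_switchover(steps: List[Step], from_site: str, to_site: str) -> Tuple[List[str], bool]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the completed steps and an overall success flag.
    """
    completed: List[str] = []
    for step in steps:
        if not step.action(from_site, to_site):
            return completed, False
        completed.append(step.name)
    return completed, True


def placeholder(from_site: str, to_site: str) -> bool:
    # Stand-in for real work (service shutdown, DNS change, tests).
    return True


STEPS = [
    Step("shut down services", placeholder),
    Step("update DNS", placeholder),
    Step("run test cases", placeholder),
    Step("user testing", placeholder),
]

if __name__ == "__main__":
    # Fail over to the secondary site, then return to the primary site.
    print(run_switchover(STEPS, "primary", "secondary"))
    print(run_switchover(STEPS, "secondary", "primary"))
```

Stopping at the first failed step keeps the sketch conservative: DNS is never changed if the shutdown did not complete, and user testing never starts if the automated test cases failed.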

Was the DR test a success?
On the whole, the test was a success.
Krescendo was able to confirm that all applications and services were up to date and fully operational in the DR site within the agreed SLAs. Testing highlighted non-critical areas where configuration can be adjusted to improve efficiency and controls during a disaster recovery event, confirming the importance of performing realistic simulations on a planned basis.

If you are interested in finding out more, contact us!