Big data implementations that produce value drive lower costs and higher profits, and as they do, they become mission-critical. Hence, at some point you must implement a disaster recovery plan. Since this requirement may come with little warning, the database administrator and other support staff should take proactive steps during the first big data implementation.

Review storage needs, network capacity, hardware capabilities and software license requirements at the beginning of your implementation. Have this data published and available to management before it becomes critical. This allows your enterprise to budget and plan for its needs in advance.

Both application designers and database administrators sometimes take the simplistic view that regular backups of application data are sufficient for any recovery need. A strategy of weekend-only backups can easily backfire. Backup methods that meet the application’s and the enterprise’s needs start with a sound recovery strategy, and that strategy must be applied from the beginning, starting with the design of the big data database and its applications.

Two factors drive which recovery options are used for a big data application:

The recovery time objective (RTO) -- During a recovery scenario, how long can the application data (or portions of the data) be unavailable?
The recovery point objective (RPO) -- During a recovery scenario, to what point must data be recovered? To a specific date/time? To the end of the most recently completed transaction?
For a big data implementation, the choice of recovery point is straightforward. The most common situation is a period of extract, transform, and load (ETL) of operational data from legacy systems into the big data store, followed by multiple analytical applications that query the data. The most commonly chosen recovery point is immediately after loading is complete.
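
To make these objectives concrete, here is a minimal sketch in Python of the nightly flow described above, assuming a 24-hour RTO and a recovery point set at the end of the ETL load. The names (RecoveryObjectives, record_recovery_point) are hypothetical, chosen for illustration only:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RecoveryObjectives:
    """Recovery objectives agreed on with the business."""
    rto: timedelta         # how long the analytics data may be unavailable
    rpo_marker: datetime   # the point to which data must be recovered

# Hypothetical nightly flow: the recovery point is recorded immediately
# after the ETL load completes, per the strategy described above.
def record_recovery_point(load_finished_at: datetime) -> RecoveryObjectives:
    return RecoveryObjectives(
        rto=timedelta(hours=24),      # data must be queryable within ~24 hours
        rpo_marker=load_finished_at,  # recover to the end of the nightly load
    )

if __name__ == "__main__":
    objectives = record_recovery_point(datetime.now())
    print(f"RTO: {objectives.rto}, recovery point: {objectives.rpo_marker}")
```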

Backup and recovery strategies are driven by this choice. For example, if the preferred method of backup is database image copies, these can be scheduled to begin at the time of the recovery point. These backups will not interfere with applications because analytics involves querying, not updating. Of course, the database administrator must ensure that all backups complete within a reasonable time; backups taking more than 24 hours will interfere with the next day’s processing.
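
As a sketch of this scheduling logic, the Python fragment below launches an image-copy backup at the recovery point and raises an alert when the run threatens the next day’s processing. The backup command itself is a placeholder; substitute your platform’s image-copy utility:

```python
import subprocess
import time

BACKUP_WINDOW_HOURS = 24  # a longer backup collides with the next day's ETL load

def run_image_copy(backup_command: list[str]) -> None:
    """Start the image-copy backup at the recovery point and time it."""
    started = time.monotonic()
    subprocess.run(backup_command, check=True)  # blocks until the copy finishes
    elapsed_hours = (time.monotonic() - started) / 3600
    if elapsed_hours > BACKUP_WINDOW_HOURS:
        # Backups taking more than 24 hours will interfere with tomorrow's load.
        raise RuntimeError(
            f"Backup ran {elapsed_hours:.1f}h, exceeding the "
            f"{BACKUP_WINDOW_HOURS}h window"
        )

# Usage (hypothetical utility name): run_image_copy(["image_copy", "--all"])
```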

Recovery time requirements are also easily defined. The recovery process must make data available for analytics within about 24 hours. Any longer, and the recovery site may not be able to catch up with the additional daily operational data that must now be loaded.

Database administrators should elicit basic recovery time and recovery point objectives for any big data implementation as early as possible. Then they should review backup and recovery options, choose methods and procedures that meet the objectives, and document the results for future reference. As applications mature and the enterprise big data store grows, the designation of your big data as mission-critical won’t catch you unprepared.


Another common objection to doing disaster recovery planning for big data is the sheer size of the data store. Infrastructure staff believe that such a huge volume of data will take forever to back up, forever to recover, and consume immense quantities of backup storage. Luckily, several recent technical advances in hardware can mitigate these worries.


Most modern disk storage equipment has an optional disk mirroring facility. Mirroring is a process in which changes made on one disk drive are automatically applied to a corresponding drive in another location. This allows support staff to implement disk copying and backup, data replication, and publish-subscribe applications using available hardware features, without the need to code applications.

For backup and recovery purposes, the storage administrator designates a set of disks or a disk array to be the mirror (or backup) of another. The primary disks can be those of the big data store, with an array of backup disks in a secure location used as the mirrors. When the primary disks are updated during the ETL process, the hardware automatically makes those updates on the mirrors. In case of a disaster, the storage administrator defines the mirror disks to the operating system as the primary ones. At that point, the data is restored and the application is available.
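
The exact commands for that failover are vendor-specific, but the logic can be sketched in Python. Everything here (the device names and the split/promote/mount calls) is a hypothetical stand-in for the storage array’s own management interface:

```python
from dataclasses import dataclass

@dataclass
class MirrorPair:
    primary: str  # disks backing the big data store
    mirror: str   # backup array at a secure location

def fail_over(pair: MirrorPair, storage_api) -> MirrorPair:
    """Promote the mirror to primary after a disaster at the primary site.

    `storage_api` stands in for the vendor's management interface; the
    split/promote/mount calls are hypothetical placeholders.
    """
    storage_api.split(pair.primary, pair.mirror)  # stop replicating to the mirror
    storage_api.promote(pair.mirror)              # present the mirror as primary
    storage_api.mount(pair.mirror)                # define it to the operating system
    # Roles are now reversed; the data is restored and the application is available.
    return MirrorPair(primary=pair.mirror, mirror=pair.primary)
```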


