Powerful Backup Strategies for Publisher Digital Content

22 May 2017

backup, strategies, incremental backup, full backup, local backup, remote backup, disaster recovery, data survival

When working in a digital environment, backing up your valuable content is as important as creating it in the first place. We discuss our strategies to ensure valuable publisher content is always safe and ready for business.


As with any backup, it is important to consider which backup type is best suited to your own organization's needs. Backup and recovery is essential for any digital content management system. Not having a verified backup and recovery procedure puts your data at risk of loss.

It is essential to have your backup and disaster recovery strategy in place before you need it. After all, a live system has real data changing constantly.

Ensure you have a systems disaster recovery strategy in place that protects your data, so that a disaster is only a short-term inconvenience. People often learn this lesson only after a major catastrophe, when their data is lost for good. Any attempt at recovering lost data will consume a lot of both your time and your money.

You need a powerful and reliable backup strategy to minimize lost time from disasters such as server failure, or from human error such as accidental file deletion. Protect your business, avoid downtime, and save money and time.

Types of backups

There are a number of backup types used when it comes to backing up your valuable digital content. We outline these below and concentrate on the types of backup used by Infogrid Pacific.

Full backup

Full backup is a method of backup in which all the files and folders selected for the backup are backed up. When subsequent backups are run, the entire list of files is backed up again. The advantage of this method is that restores are fast and easy, as the complete list of files is stored each time. The disadvantage is that each backup run is time-consuming, as the entire list of files is copied again. Full backups also take up a lot of storage space compared to incremental or differential backups. A full backup is the starting point of every new backup strategy for a software deployment.
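A full backup of a content directory can be sketched with standard tools. This is a minimal illustration, not IGP's actual procedure; the paths and file names below are demo placeholders:

```shell
#!/bin/sh
# Minimal full-backup sketch: every run archives the complete file set,
# so a restore needs only the single most recent archive.
SRC=/tmp/demo-content          # stand-in for the real content directory
DEST=/tmp/demo-backups         # stand-in for the backup location
rm -rf "$SRC" "$DEST"
mkdir -p "$SRC" "$DEST"
echo "chapter one" > "$SRC/book.txt"   # demo content

# Archive everything under SRC into one dated, compressed file.
tar -czf "$DEST/full-$(date +%Y%m%d).tar.gz" -C "$SRC" .
```

Because every run copies everything, the archive grows with the content set; that is the storage and time cost described above.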

Incremental backup

An incremental backup is a backup of all changes made since the last backup. With incremental backups, one full backup is done first, and subsequent backup runs record only the changes made since the last backup. The result is a much faster run than a full backup each time. Storage space used is much less than with a full backup, and less than with differential backups. Restores are slower than with a full or a differential backup. Note that with incremental backups, data is retained in the backup location even when it is no longer present on the original server.
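With GNU tar, the same idea can be sketched using a snapshot file, which records what earlier runs have already backed up. This is an illustration, not our production script; all paths are demo placeholders:

```shell
#!/bin/sh
# Incremental-backup sketch with GNU tar: the first run is a full backup;
# later runs store only files changed since the snapshot file was updated.
SRC=/tmp/demo-incr-src
DEST=/tmp/demo-incr-backups
SNAP="$DEST/state.snar"        # tar's record of previous backup state
rm -rf "$SRC" "$DEST"
mkdir -p "$SRC" "$DEST"

echo "v1" > "$SRC/a.txt"
tar -czf "$DEST/level0.tar.gz" --listed-incremental="$SNAP" -C "$SRC" .

echo "v2" > "$SRC/b.txt"       # simulate a change since the last run
tar -czf "$DEST/level1.tar.gz" --listed-incremental="$SNAP" -C "$SRC" .
```

A restore replays level0 first and then each later level in order, which is why incremental restores are slower than restoring a single full archive.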

Backup storage location

Local backups - All Infogrid Pacific local servers are incrementally backed up on a daily and weekly basis onto an external hard drive.

Local backups are any kind of backup where the storage medium is kept close at hand. Because the backups are always close at hand, they are fast and convenient to restore.

Remote backups - If you are using Amazon EC2 or a cloud service as we do, then all servers are incrementally backed up on a daily basis onto an Amazon S3 location. It is not a hard-and-fast rule that IGP deployments must be done on AWS only; they can be done on any other hosted environment or physical server, depending on the end client's needs and requirements.
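On AWS, a nightly incremental sync of this kind is often done with the AWS CLI's `aws s3 sync`, which copies only files that changed since the last run. The cron entry below is illustrative only: the schedule, data path and bucket name are assumptions rather than IGP's actual configuration, and it presumes the AWS CLI is installed and configured with credentials.

```shell
# Illustrative crontab entry: sync the data directory to an S3 bucket
# every night at 02:30 server time. Bucket name and paths are placeholders.
30 2 * * * aws s3 sync /data s3://example-backup-bucket/daily/ >> /var/log/backup-s3.log 2>&1
```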

The portals.publisherecms.com, which is a SaaS model of IGP:Digital Publisher, is deployed on an AWS xlarge instance in the US East (N. Virginia) region, and the backup is done to the EU (Ireland) region. Note the two different regions; this is important if your goal is high availability of your backup data. Accidental downtime is not expected to affect more than one Amazon EC2 availability zone simultaneously.

A good suggestion would be to employ both local and remote backups, so that even against all the odds your data is secured.

Frequency of backups

For our applications, we recommend a weekly full backup and a daily incremental backup. The timing of incremental backups depends on the velocity of content change in a system. Publisher content production tends to be relatively "slow-motion", so a 24-hour strategy is workable when calculating lost time in a disaster. More dynamic business content systems generally require full replication.

All Infogrid Pacific in-house servers are incrementally backed up on a daily and weekly basis, once a day, onto an external hard drive.

For applications deployed on AWS instances, all Infogrid Pacific LIVE servers are incrementally backed up once a day onto an S3 location. Additionally, a snapshot of the instance is taken fortnightly.
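As a sketch, a weekly-full plus daily-incremental schedule could be wired up in cron like this. The script names, paths and times are hypothetical examples, not our actual jobs:

```shell
# Illustrative crontab: full backup every Sunday at 01:00,
# incremental backup the other six nights. Script paths are placeholders.
0 1 * * 0   /opt/backup/full-backup.sh
0 1 * * 1-6 /opt/backup/incremental-backup.sh
```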

Backup notifications

The backups run automatically on all our servers, scheduled for the middle of the night. They are triggered as cron jobs, at a scheduled time, using backup scripts created for the purpose.

Notification emails are delivered to the people concerned, with logs attached containing complete status information from the start to the end of the backup.
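A backup wrapper of this kind typically captures the run's complete log and then mails it. The sketch below demonstrates only the logging half; the mail command is shown commented out, and the log path and address are assumptions:

```shell
#!/bin/sh
# Sketch of a cron-driven backup wrapper that records a complete log
# from start to finish. Paths are demo placeholders.
LOG=/tmp/backup-demo.log
{
  echo "Backup started: $(date)"
  # ... the real backup commands would run here ...
  echo "Backup finished: $(date)"
} > "$LOG" 2>&1

# In production the log would then be mailed to the people concerned, e.g.:
#   mail -s "Backup report: $(hostname)" ops@example.com < "$LOG"
```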

What to back up?

All Infogrid Pacific publisher solutions are web applications and have a number of distinctly separate components, stored to empower dynamic updating and, of course, reliable, predictable and powerful backup and restore strategies.

  • Application databases
  • Application software (stored in /var/opt/)
  • Application data (stored in /data/)
  • Application components files (stored in /var/www/)
  • System configuration files (stored in /etc/ , /opt/ , /var/spool/cron)

This pattern applies to IGP:Digital Publisher, IGP:Distribution Manager and AZARDI:Content Fulfilment. IGP:ECMS is similar but has extended storage strategies for protection of archive preservation packages and dissemination packages, to conform to the OAIS (Open Archival Information System) model.

Test your backups

It would be completely wrong to assume that everything will be fine once you set up backup policies and procedures, schedule automated backup scripts and start running backups. Every backup taken should be tested against a defined procedure for restoring the application to a working state. A backup must support restoration of a single file as well as restoration of the entire system.

Testing your backup is important since it verifies that your backups actually contain your data and that you know the steps to follow to recover from a data loss. No matter how sophisticated or comprehensive your backup system is, you will never know if it works unless you actually test it. Without testing, you can have no confidence at all.

It is preferable to do a complete restoration of all your data to a second system with an identical configuration. Infogrid Pacific does full restoration rehearsals every six months to make sure the system is humming, to identify any backup strategy change requirements, and to understand the lost business time (and therefore cost) for a full recovery to be executed. This is relatively cost-effective and straightforward to do in "the Cloud", as suitable resources only need to be deployed for the rehearsal duration.
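The core of such a rehearsal can be sketched as: restore the archive to a scratch location, then compare it against the source. All paths below are demo placeholders:

```shell
#!/bin/sh
# Restore-test sketch: back up a directory, restore it elsewhere,
# and verify the restored copy matches the original exactly.
SRC=/tmp/restore-demo-src
RESTORE=/tmp/restore-demo-out
rm -rf "$SRC" "$RESTORE" /tmp/restore-demo.tar.gz
mkdir -p "$SRC" "$RESTORE"
echo "important data" > "$SRC/file.txt"

tar -czf /tmp/restore-demo.tar.gz -C "$SRC" .     # the "backup"
tar -xzf /tmp/restore-demo.tar.gz -C "$RESTORE"   # the restore rehearsal

# diff exits non-zero if the restored copy differs from the source.
diff -r "$SRC" "$RESTORE"
```

A real rehearsal restores to a freshly provisioned server rather than a local directory, but the verification step, comparing restored data against a known-good state, is the same.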

Posted by Savio Barretto (Test and Deployment Manager)
