Abstract

We were recently asked to move an Informix Dynamic Server (IDS) Workgroup Edition (WE) version 14.10.FC8 High-availability Data Replication (HDR) server pair to new hardware in another country. The application is a critical 24×7 public service, and only a short break in service could be tolerated. The instance has around 1.5 TB of used pages, so we needed to have the databases replicated at the new site beforehand. That was a problem because WE only allows 3 servers to be connected in a cluster, and there is also a Remote Standalone Secondary (RSS) instance which was not moving. This article explains how it was achieved.

Content

The process was devised and tested in an Informix Developer Edition Docker container using the latest image from GitHub. The necessary steps were refined and automated in this test environment. For the live change, production Informix server names were substituted and commands were selectively entered manually on the relevant hosts in turn, but otherwise the technique worked perfectly, entirely unchanged.

A script was used to set up 5 test instances in a state comparable to the live systems before the change.
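As an illustration, a much-simplified sketch of such a setup, assuming instance numbers 1 to 5 (1 = primary, 2 = HDR secondary, 3 = RSS, 4 and 5 = the new-site servers) and the “cluster” helper described below, might look like this:

  #!/bin/bash
  # Illustrative sketch only: instance numbering, paths and prompt handling
  # are assumptions. 1 = primary, 2 = HDR secondary, 3 = RSS secondary,
  # 4 and 5 = "new site" servers kept in Continuous Log Restore.

  # Initialise the primary and take a level 0 archive for the others to restore
  # (TAPEDEV and LTAPEDEV point at a directory shared by all five instances).
  ( . ./cluster 1
    oninit -iy                      # initialises the root chunk: destroys data!
    ontape -s -L 0 )

  # HDR secondary: declare the pair on the primary, physically restore the
  # archive, then declare the secondary side.
  ( . ./cluster 1; onmode -d primary ids2 )
  ( . ./cluster 2
    ontape -p                       # physical restore; answer the prompts
    onmode -d secondary ids1 )

  # RSS secondary: register it on the primary (LOG_INDEX_BUILDS must already
  # be enabled there), restore, then declare it.
  ( . ./cluster 1; onmode -d add RSS ids3 )
  ( . ./cluster 3
    ontape -p
    onmode -d RSS ids1 )

  # New-site servers: physical restore only, then leave them in Continuous
  # Log Restore, applying logical log backups as they arrive.
  for i in 4 5
  do
      ( . ./cluster $i
        ontape -p
        ontape -l -C )
  done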



That script relies on two small helper utilities, “cluster” and “all”, which are described below.


The first trick is how to create a cluster on the same host for testing purposes. A replica must have the same chunk device paths as the primary, but must also have its own storage. This is possible by using relative paths, and starting the instances while in different directories containing “cooked” chunks in the file system. That’s why the “cluster” script changes directory as well as setting the required environment variables.
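A minimal bash version of such a utility, with assumed instance names, directories and INFORMIXDIR, might be:

  #!/bin/bash
  # Illustrative sketch only: instance names (ids1..ids5), directories and
  # INFORMIXDIR are assumptions.  Must be sourced, e.g.  . ./cluster 2
  n=$1
  export INFORMIXDIR=/opt/informix
  export INFORMIXSERVER=ids${n}
  export ONCONFIG=onconfig.ids${n}
  export INFORMIXSQLHOSTS=${INFORMIXDIR}/etc/sqlhosts
  export PATH=${INFORMIXDIR}/bin:${PATH}

  # Each instance has its own directory of cooked chunks; because the chunk
  # paths in the configuration are relative, changing directory selects the
  # right storage for that instance.
  cd /home/informix/ids${n}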

The “all” script runs a command against all 5 instances after setting the instance with “cluster” in each case.
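A sketch of that helper, assuming the “cluster” utility above sits in the current directory, is:

  #!/bin/bash
  # Illustrative sketch only: run the given command against all 5 instances.
  # Usage (from the directory holding "cluster"):  ./all onstat -
  for i in 1 2 3 4 5
  do
      (
          . ./cluster $i        # subshell, so the cd and variables do not leak
          "$@"
      )
  done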

The second trick is to use Continuous Log Restore (CLR) to get the new servers ready to take over replication roles. CLR instances are off-line and exempt from the 3-instance WE limit. In a real case, there would be a cron job for user “informix” to apply new logical log backups on each CLR instance in the interim; in our test scenario this was a short wrapper script run from cron.
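As an illustration, assuming the shared backup directory and the “cluster” helper above, the crontab entry for user “informix” might be (the schedule and path are assumptions):

  # apply new logical log backups to the CLR instances every 10 minutes
  */10 * * * * /home/informix/apply_logs >/dev/null 2>&1

with the wrapper itself looking something like:

  #!/bin/bash
  # apply_logs: replay any new logical log backups on the two CLR instances
  # (assumed to be numbers 4 and 5 in the test layout)
  cd /home/informix
  for i in 4 5
  do
      (
          . ./cluster $i
          # ontape -l -C applies the available logical log backups and leaves
          # the instance in log-restore mode, ready for the next batch
          ontape -l -C
      )
  done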


It’s obviously also essential that the CLR servers can see the logical log backups saved by the primary. That’s not an issue for the test, as all instances are on the same machine with LTAPEDEV set to the same directory. In a real scenario, the servers should share an NFS mount, or the primary should distribute log backups using “rsync” after each one completes.
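A minimal example of the latter, with an assumed backup directory and host name, would be:

  # push new logical log backup files to a new-site server
  rsync -a /backups/logs/ newsite1:/backups/logs/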

The cut-over process itself (after disabling the CLR cron jobs) is then tested by a further script; its steps are described below, followed by an illustrative sketch.


The third trick is that, although you can have only 3 instances including the primary connected simultaneously in a WE cluster, you can have any number of other RSS servers defined but disconnected. We can use this fact to switch the primary and HDR roles to the new servers without needing any level 0 restore, as long as we do the required steps in the right order. Otherwise, attempts to connect a further secondary are rejected with errors in the server’s message log.


The fourth trick is that you can convert a CLR instance to be the primary or secondary in an HDR pair using the normal “onmode -d” commands as long as it is then restarted.

The steps in that script are:

  1. You can have only one HDR secondary, but there can be multiple RSS servers. We therefore first convert the old HDR server to an RSS, then stop the instance.
  2. We can then undefine HDR on the old primary, and redeclare HDR to be with what will become the new primary.
  3. That server is then started up, made the HDR secondary, restarted, and the roles reversed.
  4. The previous step leaves the old primary stopped, so it is restarted with “oninit -PHY” (no logical recovery), changed from HDR to RSS, and stopped.
  5. The old HDR – now an RSS server – is brought back up so that the new primary can be declared to it, and then stopped.
  6. The new HDR server is declared on the new primary.
  7. The new HDR server is configured and restarted.

Some steps complete in the background, which is why there are several “sleep 30” statements where necessary, to ensure the previous command is fully effective before continuing.
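As an illustration of those steps, here is a much-simplified sketch using the test numbering (1 = old primary, 2 = old HDR secondary, 4 = new primary, 5 = new HDR secondary) and the “cluster” helper above. The server names, start-up options and the exact command used to undefine HDR are assumptions rather than a reproduction of the real script:

  #!/bin/bash
  # Illustrative sketch only: run from the directory containing "cluster".

  # 1. Convert the old HDR secondary to an RSS secondary, then stop it.
  ( . ./cluster 2; onmode -d RSS ids1; sleep 30; onmode -ky )

  # 2. On the old primary, drop the old HDR definition and declare the new
  #    server (still a CLR instance at this point) as its HDR secondary.
  ( . ./cluster 1; onmode -d standard; onmode -d primary ids4 )

  # 3. Bring the new server up, make it the HDR secondary, restart it, then
  #    reverse the roles so that it becomes the primary.
  ( . ./cluster 4
    oninit -PHY; sleep 30
    onmode -d secondary ids1
    onmode -ky; oninit; sleep 30
    onmode -d make primary ids4 )

  # 4. The previous step leaves the old primary stopped: restart it without
  #    logical recovery, convert it to an RSS secondary of the new primary,
  #    and stop it again.
  ( . ./cluster 1; oninit -PHY; sleep 30; onmode -d RSS ids4; onmode -ky )

  # 5. Point the old HDR server (now an RSS) at the new primary too.
  ( . ./cluster 2; oninit -PHY; sleep 30; onmode -d RSS ids4; onmode -ky )

  # 6. Declare the new HDR secondary on the new primary.
  ( . ./cluster 4; onmode -d primary ids5 )

  # 7. Configure and restart the new HDR secondary.
  ( . ./cluster 5
    oninit -PHY; sleep 30
    onmode -d secondary ids4
    onmode -ky; oninit )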

The old servers – although stopped – are left configured as RSS secondaries to the new primary, so it would be possible to revert the change without any level 0 restore. You can check the status of the whole cluster at any time on the primary, as was done at the end of the test.
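On the primary, “onstat -g cluster” lists each secondary together with its replicated log position and connection status, so the check is simply:

  onstat -g cluster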


For the test, empty chunk files were created in advance in each location as user “informix” with the required permissions.
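Something like the following, run as “informix” in each instance directory, is sufficient (the chunk file names are assumptions):

  # cooked chunk files must be owned by informix:informix with 660 permissions
  touch rootchk plogchk llogchk datachk
  chmod 660 rootchk plogchk llogchk datachk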


Distinct TCP ports were set for each instance in the “sqlhosts” file, where an asterisk in the host name field means listen on every IP address.
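The entries might therefore look like this (the server names and port numbers are assumptions):

  ids1  onsoctcp  *  9091
  ids2  onsoctcp  *  9092
  ids3  onsoctcp  *  9093
  ids4  onsoctcp  *  9094
  ids5  onsoctcp  *  9095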


Only 3 parameters needed to be different in each instance’s configuration file.
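Because the chunk paths are relative and therefore identical in every file, a plausible trio (shown with assumed values for the first instance) is the server name, server number and message log location:

  DBSERVERNAME  ids1
  SERVERNUM     1
  MSGPATH       /home/informix/ids1/online.log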


Some other settings also had to differ from the standard configuration file for the test to work.
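For example (the values shown are assumptions), the chunk paths must be relative, backups must go to a directory shared by all instances, and index page logging must be enabled before an RSS secondary can be added:

  ROOTPATH          ./rootchk        # relative path, resolved from the instance directory
  LTAPEDEV          /backups/logs    # logical log backups to a shared directory
  LOG_INDEX_BUILDS  1                # required for RSS secondaries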

Caveats

Old and new hosts must be networked together, run the same IDS version, and be trusted for user “informix”.

The backup mechanism is assumed to be ontape to a directory. You will need to replace those commands with onbar equivalents if using the Informix Primary Storage Manager (PSM) or a similar third-party product.

Conclusion

This article provides a way to migrate a workgroup cluster to a new site with a downtime of only a few minutes.

Disclaimer

Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.

Contact Us

If you have any questions or would like to find out more about this topic, please contact us.

Author