Abstract

We were recently asked to move an Informix Dynamic Server (IDS) Workgroup Edition (WE) version 14.10.FC8 High-availability Data Replication (HDR) server pair to new hardware in another country. The application is a critical 24×7 public service, and only a short break in service could be tolerated. The instance has around 1.5 TB of used pages, so we needed to have the databases replicated at the new site beforehand. That was a problem because WE only allows 3 servers to be connected in a cluster, and there is also a Remote Standalone Secondary (RSS) instance which was not moving. This article explains how it was achieved.

Content

The process was devised and tested in an Informix Developer Edition Docker container using the latest image from GitHub. The necessary steps were refined and automated in this test environment. For the live change, production Informix server names were substituted and commands were selectively entered manually on the relevant hosts in turn, but otherwise the technique worked perfectly, entirely unchanged.

A script was used to set up 5 test instances in a state comparable to the live systems before the change.
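As an illustration, a much-simplified sketch of such a setup, assuming instance numbers 1 to 5 (1 = primary, 2 = HDR secondary, 3 = RSS, 4 and 5 = the new-site servers) and the “cluster” helper described below, might look like this:

  #!/bin/bash
  # Illustrative sketch only: instance numbering, paths and prompt handling
  # are assumptions. 1 = primary, 2 = HDR secondary, 3 = RSS secondary,
  # 4 and 5 = "new site" servers kept in Continuous Log Restore.

  # Initialise the primary and take a level 0 archive for the others to restore
  # (TAPEDEV and LTAPEDEV point at a directory shared by all five instances).
  ( . ./cluster 1
    oninit -iy                      # initialises the root chunk: destroys data!
    ontape -s -L 0 )

  # HDR secondary: declare the pair on the primary, physically restore the
  # archive, then declare the secondary side.
  ( . ./cluster 1; onmode -d primary ids2 )
  ( . ./cluster 2
    ontape -p                       # physical restore; answer the prompts
    onmode -d secondary ids1 )

  # RSS secondary: register it on the primary (LOG_INDEX_BUILDS must already
  # be enabled there), restore, then declare it.
  ( . ./cluster 1; onmode -d add RSS ids3 )
  ( . ./cluster 3
    ontape -p
    onmode -d RSS ids1 )

  # New-site servers: physical restore only, then leave them in Continuous
  # Log Restore, applying logical log backups as they arrive.
  for i in 4 5
  do
      ( . ./cluster $i
        ontape -p
        ontape -l -C )
  done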



That script relies on two small helper utilities, “cluster” and “all”, which are described below.


The first trick is how to create a cluster on the same host for testing purposes. A replica must have the same chunk device paths as the primary, but must also have its own storage. This is possible by using relative paths, and starting the instances while in different directories containing “cooked” chunks in the file system. That’s why the “cluster” script changes directory as well as setting the required environment variables.
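A minimal bash version of such a utility, with assumed instance names, directories and INFORMIXDIR, might be:

  #!/bin/bash
  # Illustrative sketch only: instance names (ids1..ids5), directories and
  # INFORMIXDIR are assumptions.  Must be sourced, e.g.  . ./cluster 2
  n=$1
  export INFORMIXDIR=/opt/informix
  export INFORMIXSERVER=ids${n}
  export ONCONFIG=onconfig.ids${n}
  export INFORMIXSQLHOSTS=${INFORMIXDIR}/etc/sqlhosts
  export PATH=${INFORMIXDIR}/bin:${PATH}

  # Each instance has its own directory of cooked chunks; because the chunk
  # paths in the configuration are relative, changing directory selects the
  # right storage for that instance.
  cd /home/informix/ids${n}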

The “all” script runs a command against all 5 instances after setting the instance with “cluster” in each case.
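A sketch of that helper, assuming the “cluster” utility above sits in the current directory, is:

  #!/bin/bash
  # Illustrative sketch only: run the given command against all 5 instances.
  # Usage (from the directory holding "cluster"):  ./all onstat -
  for i in 1 2 3 4 5
  do
      (
          . ./cluster $i        # subshell, so the cd and variables do not leak
          "$@"
      )
  done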

The second trick is to use Continuous Log Restore (CLR) to get the new servers ready to take over replication roles. CLR instances are off-line and exempt from the 3-instance WE limit. In a real case, there would be a cron job for user “informix” to apply new logical log backups on each CLR instance in the interim; in our test scenario this was a short wrapper script run from cron.
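As an illustration, assuming the shared backup directory and the “cluster” helper above, the crontab entry for user “informix” might be (the schedule and path are assumptions):

  # apply new logical log backups to the CLR instances every 10 minutes
  */10 * * * * /home/informix/apply_logs >/dev/null 2>&1

with the wrapper itself looking something like:

  #!/bin/bash
  # apply_logs: replay any new logical log backups on the two CLR instances
  # (assumed to be numbers 4 and 5 in the test layout)
  cd /home/informix
  for i in 4 5
  do
      (
          . ./cluster $i
          # ontape -l -C applies the available logical log backups and leaves
          # the instance in log-restore mode, ready for the next batch
          ontape -l -C
      )
  done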


It’s obviously also essential that the CLR servers can see the logical log backups saved by the primary. That’s not an issue for the test, as all instances are on the same machine with LTAPEDEV set to the same directory. In a real scenario, the servers should share an NFS mount, or the primary should distribute log backups using “rsync” after each one completes.
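A minimal example of the latter, with an assumed backup directory and host name, would be:

  # push new logical log backup files to a new-site server
  rsync -a /backups/logs/ newsite1:/backups/logs/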

The cut-over process itself (after disabling the CLR cron jobs) is then tested by a further script; its steps are described below, followed by an illustrative sketch.


The third trick is that, although you can have only 3 instances including the primary connected simultaneously in a WE cluster, you can have any number of other RSS servers defined but disconnected. We can use this fact to switch the primary and HDR roles to the new servers without needing any level 0 restore, as long as we do the required steps in the right order. Otherwise, attempts to connect a further secondary are rejected with errors in the server’s message log.


The fourth trick is that you can convert a CLR instance to be the primary or secondary in an HDR pair using the normal “onmode -d” commands as long as it is then restarted.

The steps in that script are:

  1. You can have only one HDR secondary, but there can be multiple RSS servers. We therefore first convert the old HDR server to an RSS, then stop the instance.
  2. We can then undefine HDR on the old primary, and redeclare HDR to be with what will become the new primary.
  3. That server is then started up, made the HDR secondary, restarted, and the roles reversed.
  4. The previous step leaves the old primary stopped, so it is restarted with “oninit -PHY” (no logical recovery), changed from HDR to RSS, and stopped.
  5. The old HDR – now an RSS server – is brought back up so that the new primary can be declared to it, and then stopped.
  6. The new HDR server is declared on the new primary.
  7. The new HDR server is configured and restarted.

Some steps complete in the background, which is why there are several “sleep 30” statements where necessary, to ensure the previous command is fully effective before continuing.
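As an illustration of those steps, here is a much-simplified sketch using the test numbering (1 = old primary, 2 = old HDR secondary, 4 = new primary, 5 = new HDR secondary) and the “cluster” helper above. The server names, start-up options and the exact command used to undefine HDR are assumptions rather than a reproduction of the real script:

  #!/bin/bash
  # Illustrative sketch only: run from the directory containing "cluster".

  # 1. Convert the old HDR secondary to an RSS secondary, then stop it.
  ( . ./cluster 2; onmode -d RSS ids1; sleep 30; onmode -ky )

  # 2. On the old primary, drop the old HDR definition and declare the new
  #    server (still a CLR instance at this point) as its HDR secondary.
  ( . ./cluster 1; onmode -d standard; onmode -d primary ids4 )

  # 3. Bring the new server up, make it the HDR secondary, restart it, then
  #    reverse the roles so that it becomes the primary.
  ( . ./cluster 4
    oninit -PHY; sleep 30
    onmode -d secondary ids1
    onmode -ky; oninit; sleep 30
    onmode -d make primary ids4 )

  # 4. The previous step leaves the old primary stopped: restart it without
  #    logical recovery, convert it to an RSS secondary of the new primary,
  #    and stop it again.
  ( . ./cluster 1; oninit -PHY; sleep 30; onmode -d RSS ids4; onmode -ky )

  # 5. Point the old HDR server (now an RSS) at the new primary too.
  ( . ./cluster 2; oninit -PHY; sleep 30; onmode -d RSS ids4; onmode -ky )

  # 6. Declare the new HDR secondary on the new primary.
  ( . ./cluster 4; onmode -d primary ids5 )

  # 7. Configure and restart the new HDR secondary.
  ( . ./cluster 5
    oninit -PHY; sleep 30
    onmode -d secondary ids4
    onmode -ky; oninit )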

The old servers – although stopped – are left configured as RSS secondaries to the new primary, so it would be possible to revert the change without any level 0 restore. You can check the status of the whole cluster at any time on the primary, as was done at the end of the test.
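On the primary, “onstat -g cluster” lists each secondary together with its replicated log position and connection status, so the check is simply:

  onstat -g cluster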


For the test, empty chunk files were created in advance in each location as user “informix” with the required permissions.
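Something like the following, run as “informix” in each instance directory, is sufficient (the chunk file names are assumptions):

  # cooked chunk files must be owned by informix:informix with 660 permissions
  touch rootchk plogchk llogchk datachk
  chmod 660 rootchk plogchk llogchk datachk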


Distinct TCP ports were set for each instance in the “sqlhosts” file, where an asterisk in the host name field means listen on every IP address.
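The entries might therefore look like this (the server names and port numbers are assumptions):

  ids1  onsoctcp  *  9091
  ids2  onsoctcp  *  9092
  ids3  onsoctcp  *  9093
  ids4  onsoctcp  *  9094
  ids5  onsoctcp  *  9095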


Only 3 parameters needed to be different in each instance’s configuration file.
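Because the chunk paths are relative and therefore identical in every file, a plausible trio (shown with assumed values for the first instance) is the server name, server number and message log location:

  DBSERVERNAME  ids1
  SERVERNUM     1
  MSGPATH       /home/informix/ids1/online.log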


Some other settings also had to differ from the standard configuration file for the test to work.
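For example (the values shown are assumptions), the chunk paths must be relative, backups must go to a directory shared by all instances, and index page logging must be enabled before an RSS secondary can be added:

  ROOTPATH          ./rootchk        # relative path, resolved from the instance directory
  LTAPEDEV          /backups/logs    # logical log backups to a shared directory
  LOG_INDEX_BUILDS  1                # required for RSS secondaries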

Caveats

Old and new hosts must be networked together, run the same IDS version, and be trusted for user “informix”.

The backup mechanism is assumed to be ontape to a directory. You will need to replace those commands with onbar equivalents if using the Informix Primary Storage Manager (PSM) or a similar third-party product.

Conclusion

This article provides a way to migrate a workgroup cluster to a new site with a downtime of only a few minutes.

Disclaimer

Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.

Contact Us

If you have any questions or would like to find out more about this topic, please contact us.

Author