Abstract
This relatively new feature, documented here, was introduced with IBM Informix Dynamic Server (IDS) version 12.10.xC11 in March 2018, enabling replication of tables within the same database or instance. It is an enhancement to Enterprise Replication (ER), which was previously supported only between different instances. This article gives an example of how it can reduce downtime when slow schema changes are required on large tables in continuous use.
Content
Links in this article not shown in full are for pages within the new IBM Documentation site:
https://www.ibm.com/docs/en/informix-servers/14.10
Included ER “cdr” (continuous data replication) commands are clickable documentation links and not described in detail for brevity.
Existing relevant articles and demonstrations are here:
https://github.com/nagaraju-inturi/loopback-replication
https://databasemusing.wordpress.com/2020/06/15/loopback
Informix Tech Talks: Informix Large Table Operations by Ben Thompson
The test system used for this article is an IDS 14.10.FC5 Developer Edition Linux Docker container created with the image available from:
https://hub.docker.com/r/ibmcom/informix-developer-database
If you are not already using ER, you will need two new dbspaces – with your preferred names – to hold the ER “syscdr” database and smart blobs which contain transaction contents, as was done for this article in the Docker container as follows:
eval cd $(dirname $(onstat -c ROOTPATH))
(umask 006 ; touch cdrdbs.000 cdrsbs.000)
onspaces -c -d cdrdbs -p $(pwd)/cdrdbs.000 -o 0 -s 204800
onspaces -c -S cdrsbs -p $(pwd)/cdrsbs.000 -o 0 -s 512000 -Df "AVG_LO_SIZE=2,LOGGING=ON"
The sizes above are the defaults used by “cdr migrate server”. You may wish to reduce the estimated average blob size to the minimum of 2KB (default 32KB) to avoid running out of metadata space, and/or enable logging if part of a high-availability cluster (default OFF), by appending one or both of the qualifiers in the “-Df” option shown above.
Corresponding configuration parameters then need updating, together with two others as usually recommended:
cdr change config "CDR_DBSPACE cdrdbs"
cdr change config "CDR_QDATA_SBSPACE cdrsbs"
cdr change config "CDR_QUEUEMEM 262144"
cdr change config "CDR_SUPPRESS_ATSRISWARN 1,2,3"
There is another less well documented and non-essential parameter:
$ onstat -g cfg full CDR_QHDR_DBSPACE

IBM Informix Dynamic Server Version 14.10.FC5X4DE -- On-Line -- Up 1 days 03:26:59 -- 472196 Kbytes

Configuration Parameter Info

id   name                      type  maxlen  units  rsvd  tunable
157  CDR_QHDR_DBSPACE          CHAR  129

     default : rootdbs
     onconfig:
     current :

     Description:
     Specifies the location of the dbspace that Enterprise Replication
     uses to store the transaction record headers spooled from the send
     and receive queues.

     The CDR_QHDR_DBSPACE configuration parameter specifies the location
     of the dbspace that Enterprise Replication uses to store the
     transaction record headers spooled from the send and receive queues.
     By default, Enterprise Replication stores the transaction record
     headers in the root dbspace. For more information, see Transaction
     Record dbspace.

     Restriction: Do not change the value of CDR_QHDR_DBSPACE after you
     initialize Enterprise Replication.
If CDR_DBSPACE is set, the default for the above is in fact that and not “rootdbs” as stated. Specifically, setting CDR_QHDR_DBSPACE specifies the dbspace for the following “syscdr” tables (verified by testing):
blkdelete
control_send_stxn
trg_receive_stxn
trg_send_srep
trg_send_stxn
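If you want to confirm this on your own instance, one way (a sketch, assuming the tables are not fragmented, so that “partnum” in “systables” is meaningful) is to decode each table's dbspace with the built-in DBINFO function:

```sql
-- Run against the syscdr database after ER has been initialized.
-- DBINFO('DBSPACE', partnum) returns the dbspace name of a partition.
DATABASE syscdr;

SELECT tabname, DBINFO('DBSPACE', partnum) AS dbspace
    FROM systables
    WHERE tabname IN ('blkdelete', 'control_send_stxn',
                      'trg_receive_stxn', 'trg_send_srep',
                      'trg_send_stxn');
```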
The next prerequisite step is to add a loopback connector and ER group information in the “sqlhosts” file as described here and in the following example:
g_informix group    -          -    i=1
informix   onsoctcp *localhost 9088 g=g_informix
g_loopback group    -          -    i=2
loopback   onsoctcp *localhost 9090 g=g_loopback
Each asterisk instructs IDS to listen on all IP addresses available when started, which is not an ER requirement, but is often useful.
The $ONCONFIG file should be updated with the additional alias, for example:
DBSERVERNAME informix
DBSERVERALIASES loopback
A new listener thread can then be started and tested with:
onmode -P start loopback
echo | INFORMIXSERVER=loopback dbaccess sysmaster
ER can now be initiated with:
cdr define server -I g_informix
cdr define server -I g_loopback -S g_informix
The “syscdr” database will now have been created in the dbspace specified by CDR_DBSPACE.
We can now experiment with loopback replication. The following table was created in empty database “test”:
CREATE TABLE test_table
(
    serial_no  SERIAL NOT NULL PRIMARY KEY,
    batch_name VARCHAR(255) NOT NULL,
    batch_row  SMALLINT NOT NULL,
    UNIQUE (batch_name, batch_row),
    CHECK (batch_row > 0)
);
For brevity, the above does not follow best practice of explicitly named indexes and constraints.
Sample data was inserted with:
CREATE PROCEDURE sp_test_data(nrows SMALLINT)

    -- Much quicker in a transaction to reduce log buffer flushing.
    -- Assumes few enough rows to avoid "Long Transaction Aborted".
    -- Table exclusively locked to prevent lock structure growth.

    DEFINE n LIKE test_table.batch_name;
    DEFINE r SMALLINT;

    LET n = CURRENT YEAR TO SECOND;
    BEGIN WORK;
    LOCK TABLE test_table IN EXCLUSIVE MODE;
    FOR r = 1 TO nrows
        INSERT INTO test_table VALUES (0, n, r);
    END FOR;
    COMMIT WORK;

END PROCEDURE;

EXECUTE PROCEDURE sp_test_data(4000);
Imagine that “batch_name” is in fact being populated with DATETIME data by applications. We therefore want to alter the VARCHAR column to a more appropriate data type:
ALTER TABLE test_table MODIFY
(
    batch_name DATETIME YEAR TO SECOND NOT NULL
);
However, by running the following SQL before and after that on a test system, we see that the “partnum” has changed, indicating that this is a slow alter and not an in-place alter:
SELECT partnum FROM systables WHERE tabname = 'test_table';
Note this relevant Request for Enhancement (RFE):
Add onconfig parameter to not allow slow alters
https://ibm-data-and-ai.ideas.aha.io/ideas/INFX-I-393
We therefore plan to reduce downtime on the live system by creating a replica table, identical apart from the improved data type for “batch_name”:
CREATE TABLE test_table_new
(
serial_no SERIAL NOT NULL PRIMARY KEY,
batch_name DATETIME YEAR TO SECOND NOT NULL,
batch_row SMALLINT NOT NULL,
UNIQUE (batch_name, batch_row),
CHECK (batch_row > 0)
);
Copying existing contents before setting up replication is quicker and lighter. In the example scenario, we have HDR and/or RSS replica instances. The new table must therefore be logged (not RAW) so that contents are propagated to secondary instances. We must therefore avoid “long transaction aborted” by splitting the operation into a few thousand rows per commit. If you have it or can compile it, the easiest method is to use “dbcopy” from Art Kagel’s open source ESQL-C “utils2_ak”:
https://www.askdbmgt.com/my-utilities.html
A solution would then be a shell command in this form:
dbcopy -d test -t test_table -T test_table_new -F -a -w 60
You could UNLOAD the table in “dbaccess” – or copy to an external table, which is quicker – and then use the standard IDS “dbload” tool to break the job into transactions of a few thousand rows each. However, that requires space in a file system, so might not be convenient. You might be able to get round that with FIFO devices, but that is more complex.
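A sketch of the external table plus “dbload” approach follows. The scratch directory and file names are illustrative assumptions, not part of the original example:

```shell
# Sketch only: unload via an external table, then reload in batches
# with dbload. /tmp/unload and the file names are assumptions.
dbaccess test - <<'EOF'
CREATE EXTERNAL TABLE test_table_ext
    SAMEAS test_table
    USING (DATAFILES ('DISK:/tmp/unload/test_table.unl'));
INSERT INTO test_table_ext SELECT * FROM test_table;
EOF

# dbload command file mapping the delimited fields to the new table.
cat > /tmp/unload/test_table.cmd <<'EOF'
FILE /tmp/unload/test_table.unl DELIMITER '|' 3;
INSERT INTO test_table_new;
EOF

# Commit every 2000 rows (-n) to avoid "Long Transaction Aborted".
dbload -d test -c /tmp/unload/test_table.cmd -n 2000 \
    -l /tmp/unload/test_table.err
```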
Alternatively, write a stored procedure such as this:
CREATE FUNCTION sp_test_copy() RETURNING BIGINT AS rows_inserted;

    -- Only a few thousand rows should be committed per transaction:
    -- singleton inserts are too slow as flushes log buffer each time
    -- but must commit regularly to avoid "Long Transaction Aborted".
    -- Table is exclusively locked to further improve performance.

    DEFINE v_serial_no  LIKE test_table_new.serial_no;
    DEFINE v_batch_name LIKE test_table_new.batch_name;
    DEFINE v_batch_row  LIKE test_table_new.batch_row;
    DEFINE v_inserted   BIGINT;

    LET v_inserted = 0;
    BEGIN WORK;
    LOCK TABLE test_table_new IN EXCLUSIVE MODE;
    FOREACH WITH HOLD -- otherwise COMMIT closes cursor
        SELECT * INTO v_serial_no, v_batch_name, v_batch_row
            FROM test_table
        INSERT INTO test_table_new
            VALUES (v_serial_no, v_batch_name, v_batch_row);
        LET v_inserted = v_inserted + 1;
        IF MOD(v_inserted, 2000) = 0 THEN
            COMMIT WORK;
            BEGIN WORK;
            LOCK TABLE test_table_new IN EXCLUSIVE MODE;
        END IF;
    END FOREACH;
    COMMIT WORK;
    RETURN v_inserted;

END FUNCTION;
Bulk data loading is always much quicker with indexes and constraints disabled, and the resulting indexes will be more efficient, so the best process is therefore:
SET CONSTRAINTS FOR test_table_new DISABLED;
SET INDEXES FOR test_table_new DISABLED;
TRUNCATE test_table_new;           -- if rerunning
EXECUTE FUNCTION sp_test_copy();   -- or use dbcopy
SET CONSTRAINTS FOR test_table_new ENABLED;
SET INDEXES FOR test_table_new ENABLED;
Note that index builds, which will be performed by the last statement above, are faster and more stable with sorting in file systems (instead of temp dbspaces) specified in environment variable PSORT_DBTEMP. Furthermore, see:
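For example, sorting could be directed at a RAM-backed file system before enabling the indexes. The directories below are assumptions; my understanding is that PSORT_DBTEMP takes one or more directories separated by colons:

```shell
# Assumed example: use tmpfs directories for sort runs during the
# index builds triggered by SET INDEXES ... ENABLED.
export PSORT_DBTEMP=/dev/shm/sort1:/dev/shm/sort2
dbaccess test - <<'EOF'
SET INDEXES FOR test_table_new ENABLED;
EOF
```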
https://www.oninitgroup.com/faq-items/ram-disk-and-informix
We are now ready to enable replication between the two tables with this shell script:
set -e             # exit on error

DB=test            # database name
TO=informix        # table owner
TN=test_table      # table name
G1=g_informix      # groups in sqlhosts
G2=g_loopback
T1=$TO.$TN         # owner + table names
T2=$T1\_new
RP=repl_$TN        # replicate name

cdr define replicate $RP -C always -S row -A -R \
    "P $DB@$G1:$T1" "select * from $T1" \
    "R $DB@$G2:$T2" "select * from $T2"
cdr start replicate $RP

CT=check_$(date +%Y%m%d_%H%M%S)
echo "Starting task $CT"
cdr check replicate -r $RP -m $G1 -a -R -n $CT
Parameters used to define the replicate:
Short | Long | Description |
---|---|---|
-C always | --conflict=always | "Use the always option if you do not want Enterprise Replication to resolve conflicts, but you do want replicated changes to be applied even if the operations are not the same on the source and target servers. Use the always-apply conflict resolution rule only with a primary-target replication system." This setting is appropriate for one-way replication. Must be specified. |
-S row | --scope=row | "Evaluate one row at a time and apply the replicated rows that win the conflict resolution with the target rows." This setting is appropriate for one-way replication. Default is "transaction". |
-A | --ats | "Activates aborted transaction spooling for replicate transactions that fail to be applied to the target database." Useful for troubleshooting, unlikely to occur with one-way replication. |
-R | --ris | "Activates row-information spooling for replicate row data that fails conflict resolution or encounters replication order problems." Useful for troubleshooting, unlikely to occur with one-way replication. |
"P ..." | | "For primary-target replicates, specifies that the participant is a primary participant, which both sends and receives replicated data." Required for loopback. |
"R ..." | | "For primary-target replicates, specifies that the participant is a receive-only target participant, which only receives data from primary participants." Required for loopback. |
Parameters used to check the replicate:
Short | Long | Description |
---|---|---|
-r repl | --repl=repl | "Specifies the name of the replicate to check." Replicate must be specified. |
-m | --master | "Specifies the database server to use as the reference copy of the data." Source must be specified in some way and this is the simplest. |
-a | --all | "Specifies that all servers defined for the replicate are checked." Target must be specified in some way and this is the simplest. |
-R | --repair | "Specifies that rows that are found to be inconsistent are repaired." This is the purpose of the check command in our script. |
-n task | --name=task | "Specifies that the progress of this command can be monitored. Information about the operation is stored under the specified progress report task name on the server on which the command was run." Cannot easily be monitored without the check task being named. |
Output looks like this:
Interpreting this replicate as a master replicate.
Verification of test@g_informix:informix.test_table started
Verification of test@g_informix:informix.test_table is successful
Verification of test@g_loopback:informix.test_table_new started
Verification of test@g_loopback:informix.test_table_new is successful
Starting task check_20210416_121945
Apr 16 2021 12:20:55 ------ Table scan for repl_test_table start --------
Node             Rows      Extra     Missing   Mismatch  Processed
---------------- --------- --------- --------- --------- ---------
g_informix       4000      0         0         0         0
g_loopback       4000      0         0         0         0
Apr 16 2021 12:21:01 ------ Table scan for repl_test_table end ---------
You could watch the progress of the last stage (updating every 10 seconds) on another screen with:
cdr stats check -r 10 check_20210416_121945 |
Alternatively, on Linux, this might be easier on the eye:
watch -d -n 10 "cdr stats check check_20210416_121945" |
You can list previous check tasks with:
cdr stats check |
Example output is:
Task check_20210415_185458 repl_test_table Completed
    Started Apr 15 18:56:57, Elapsed Time 0:00:02
Task check_20210415_191142 repl_test_table Completed
    Started Apr 15 19:13:39, Elapsed Time 0:00:03
Task check_20210416_121945 repl_test_table Completed
    Started Apr 16 12:20:59, Elapsed Time 0:00:01
You can delete the record of a previous check task with a command such as:
cdr stats check --delete check_20210416_121945
At an agreed time, when ready to go live with the new table, you should check there is no replication backlog with:
onstat -g rqm brief
The cut-over steps would then be:
- Disconnect applications using the old table.
- Stop loopback replication of that table.
- Rename tables so that the new one will be used.
- Reconnect applications.
If we were to use admin mode – in which only user “informix” (by default) can be connected – as a means to kick users off for step 1, and if they automatically reconnect when allowed, a script for the whole process would be:
onmode -jy   # admin mode
cdr delete replicate repl_test_table
dbaccess test <<EOF
RENAME TABLE test_table TO test_table_old;
RENAME TABLE test_table_new TO test_table;
EOF
onmode -m    # on-line mode
Rolling back the change would be achieved by renaming the tables back to how they were. The old table should be dropped once you know it is no longer needed.
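As a sketch, the rollback renames (again while applications are disconnected, for example in admin mode as above) and the eventual cleanup would be:

```sql
-- Roll back: restore the original names.
RENAME TABLE test_table TO test_table_new;
RENAME TABLE test_table_old TO test_table;

-- Cleanup, only once the new table is confirmed good:
-- DROP TABLE test_table_old;
```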
ER could subsequently be stopped completely and the “syscdr” database dropped with:
cdr delete server g_loopback
cdr delete server g_informix
Caveats
IDS 12.10.xC11 or later is required for loopback replication.
ER generally has these prerequisites:
- The database must have logging enabled with “ontape“, “ondblog“, or “alter logmode“.
- Tables being replicated must be logged (cannot be RAW).
- A unique index or ERKEY columns must exist for each replicated table.
- Disk space is required for mandatory ER dbspaces if they need creating.
Conclusion
With those provisos, downtime associated with a slow table alteration can be reduced to a few seconds using this technique. Everything you need is given in this article, though you may want to review the general ER documentation for tuning advice.
Disclaimer
Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.