Abstract
When testing a database migration to a new platform that is not compatible, for example with a different OS or locale, you cannot restore from a level 0 archive on the new server. Data must be copied either via flat files or directly using SQL tools. If this is from a live database that is being updated and cannot be exclusively locked (which rules out dbexport), you cannot guarantee consistency of the data being reloaded, so NOVALIDATE must be set to ensure foreign keys can be reapplied.
This article provides stored procedure language (SPL) code to identify and delete any resulting invalid data.
Content
The following SQL generator was posted in the dbexport alternative thread on the IBM Informix community forum in May 2021, the only change below being that PROCEDURE has been replaced with FUNCTION which is more correct if data is returned:
DROP FUNCTION IF EXISTS sp_revalidate;
CREATE FUNCTION sp_revalidate() RETURNING LVARCHAR AS sql;
/*
Generate SQL to check FOREIGN KEYS added with NOVALIDATE
Doug Lawry, Oninit Consulting, March 2021
Usage in "dbaccess":
UNLOAD TO 'revalidate.sql' DELIMITER ';'
SELECT * FROM TABLE (FUNCTION sp_revalidate());
*/
DEFINE l_constrname VARCHAR(128); -- foreign key constraint name
DEFINE l_dtabname VARCHAR(128); -- detail table name
DEFINE l_ptabname VARCHAR(128); -- parent table name
DEFINE l_dtabid LIKE systables.tabid; -- detail table ID
DEFINE l_ptabid LIKE systables.tabid; -- parent table ID
DEFINE l_dindexkeys LIKE sysindices.indexkeys; -- detail index column numbers
DEFINE l_pindexkeys LIKE sysindices.indexkeys; -- parent index column numbers
DEFINE l_dcolno LIKE syscolumns.colno; -- detail column number
DEFINE l_pcolno LIKE syscolumns.colno; -- parent column number
DEFINE l_dcolname VARCHAR(128); -- detail column name
DEFINE l_pcolname VARCHAR(128); -- parent column name
DEFINE l_keyid SMALLINT; -- index array pointer
DEFINE l_sql LVARCHAR; -- generated SQL
FOREACH
SELECT dc.constrname,
TRIM(dt.tabname), dt.tabid, di.indexkeys,
TRIM(pt.tabname), pt.tabid, pi.indexkeys
INTO l_constrname,
l_dtabname, l_dtabid, l_dindexkeys,
l_ptabname, l_ptabid, l_pindexkeys
FROM sysconstraints AS dc
JOIN sysobjstate AS do ON do.name = dc.constrname
JOIN systables AS dt ON dt.tabid = dc.tabid
JOIN sysindices AS di ON di.idxname = dc.idxname
JOIN sysreferences AS dr ON dr.constrid = dc.constrid
JOIN sysconstraints AS pc ON pc.constrid = dr.primary
JOIN systables AS pt ON pt.tabid = pc.tabid
JOIN sysindices AS pi ON pi.idxname = pc.idxname
WHERE dc.constrtype = 'R'
AND do.objtype = 'C'
AND do.state = 'E'
ORDER BY 2, 1
LET l_sql =
'SELECT d.ROWID AS row_id ' ||
'FROM ' || l_dtabname || ' AS d ' ||
'LEFT OUTER JOIN ' || l_ptabname || ' AS p ';
FOR l_keyid = 0 TO 15
LET l_dcolno = ikeyextractcolno(l_dindexkeys, l_keyid);
LET l_pcolno = ikeyextractcolno(l_pindexkeys, l_keyid);
IF l_dcolno = 0 THEN
EXIT FOR;
END IF
SELECT TRIM(colname)
INTO l_dcolname
FROM syscolumns
WHERE tabid = l_dtabid
AND colno = l_dcolno;
SELECT TRIM(colname)
INTO l_pcolname
FROM syscolumns
WHERE tabid = l_ptabid
AND colno = l_pcolno;
LET l_sql = l_sql ||
DECODE(l_keyid, 0, 'ON', 'AND') ||
' d.' || l_dcolname || " =" ||
' p.' || l_pcolname || " ";
END FOR
FOR l_keyid = 0 TO 15
LET l_dcolno = ikeyextractcolno(l_dindexkeys, l_keyid);
IF l_dcolno = 0 THEN
EXIT FOR;
END IF
SELECT TRIM(colname)
INTO l_dcolname
FROM syscolumns
WHERE tabid = l_dtabid
AND colno = l_dcolno;
LET l_sql = l_sql ||
DECODE(l_keyid, 0, 'WHERE', 'AND') ||
' d.' || l_dcolname || " IS NOT NULL ";
END FOR
LET l_sql = l_sql ||
'AND p.ROWID IS NULL ' ||
'INTO TEMP temp_rowid WITH NO LOG';
RETURN l_sql WITH RESUME;
LET l_sql =
'UNLOAD TO "' || l_dtabname || '.' || l_constrname || '.unl" ' ||
'SELECT * FROM ' || l_dtabname || ' ' ||
'WHERE ROWID IN (SELECT * FROM temp_rowid)';
RETURN l_sql WITH RESUME;
LET l_sql =
'DELETE FROM ' || l_dtabname || ' ' ||
'WHERE ROWID IN (SELECT * FROM temp_rowid)';
RETURN l_sql WITH RESUME;
LET l_sql =
'DROP TABLE temp_rowid';
RETURN l_sql WITH RESUME;
END FOREACH;
END FUNCTION;
The foreign key hierarchy is traversed for the entire current database. You will need to create the function in every database that needs to be checked: table names are assumed to be in the database containing the stored procedure or function unless explicitly stated otherwise, and not in the database to which the calling session is connected.
The main FOREACH SELECT statement examines system catalog tables within the database being checked as follows:
Alias | Table | Description |
---|---|---|
dc | sysconstraints | Foreign key constraint (FK) |
do | sysobjstate | FK object state |
dt | systables | FK table |
di | sysindices | FK index |
dr | sysreferences | Constraint relationship |
pc | sysconstraints | Primary key constraint (PK) |
pt | systables | PK table |
pi | sysindices | PK index |
Criterion | Description |
---|---|
dc.constrtype = 'R' | Referential constraint (FK) |
do.objtype = 'C' | Constraint object |
do.state = 'E' | Enabled (remove to include all) |
As well as the necessary object names and table IDs, “sysindices.indexkeys” are selected for both the foreign and primary key. We are using table sysindices which contains the column numbers in an index as an array so is easier to use here, rather than the more commonly used sysindexes view which represents the array as 16 separate columns. We can then loop through the 16 possible elements using an undocumented internal function with this specification:
CREATE DBA FUNCTION ikeyextractcolno (INDEXKEYARRAY, INTEGER)
RETURNING SMALLINT;
Overall usage is as given in the source code comments:
UNLOAD TO 'revalidate.sql' DELIMITER ';'
SELECT * FROM TABLE (FUNCTION sp_revalidate());
UNLOAD is implemented client-side, so will only work in dbaccess or AGS Server Studio. If using other SQL clients, you will need to save the result set in such a way that semi-colons are added to each line, or edit the SPL function to do that.
Note also the under-documented trick to treat function output as a sub-query with:
TABLE (FUNCTION spl-name(optional-parameters))
The file produced by executing this in the standard “stores_demo” database created by dbaccessdemo contains:
SELECT d.ROWID AS row_id FROM catalog AS d LEFT OUTER JOIN stock AS p ON d.stock_num = p.stock_num AND d.manu_code = p.manu_code WHERE d.stock_num IS NOT NULL AND d.manu_code IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "catalog.aa.unl" SELECT * FROM catalog WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM catalog WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
SELECT d.ROWID AS row_id FROM cust_calls AS d LEFT OUTER JOIN customer AS p ON d.customer_num = p.customer_num WHERE d.customer_num IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "cust_calls.r108_19.unl" SELECT * FROM cust_calls WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM cust_calls WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;text
SELECT d.ROWID AS row_id FROM cust_calls AS d LEFT OUTER JOIN call_type AS p ON d.call_code = p.call_code WHERE d.call_code IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "cust_calls.r108_20.unl" SELECT * FROM cust_calls WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM cust_calls WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
SELECT d.ROWID AS row_id FROM items AS d LEFT OUTER JOIN orders AS p ON d.order_num = p.order_num WHERE d.order_num IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "items.fk_items_orders.unl" SELECT * FROM items WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM items WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
SELECT d.ROWID AS row_id FROM items AS d LEFT OUTER JOIN stock AS p ON d.stock_num = p.stock_num AND d.manu_code = p.manu_code WHERE d.stock_num IS NOT NULL AND d.manu_code IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "items.r105_12.unl" SELECT * FROM items WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM items WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
SELECT d.ROWID AS row_id FROM orders AS d LEFT OUTER JOIN customer AS p ON d.customer_num = p.customer_num WHERE d.customer_num IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "orders.r102_4.unl" SELECT * FROM orders WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM orders WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
SELECT d.ROWID AS row_id FROM stock AS d LEFT OUTER JOIN manufact AS p ON d.manu_code = p.manu_code WHERE d.manu_code IS NOT NULL AND p.ROWID IS NULL INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "stock.r104_9.unl" SELECT * FROM stock WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM stock WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
There are repeating groups of 4 statements per foreign key. An example group with linefeeds inserted for readability follows:
SELECT d.ROWID AS row_id
FROM items AS d LEFT OUTER JOIN orders AS p
ON d.order_num = p.order_num
WHERE d.order_num IS NOT NULL AND p.ROWID IS NULL
INTO TEMP temp_rowid WITH NO LOG;
UNLOAD TO "items.fk_items_orders.unl"
SELECT * FROM items
WHERE ROWID IN (SELECT * FROM temp_rowid);
DELETE FROM items
WHERE ROWID IN (SELECT * FROM temp_rowid);
DROP TABLE temp_rowid;
The first statement identifies rows in the foreign key table for which the column values are not present in the corresponding primary key, and uses the technique described here:
https://www.oninitgroup.com/faq-items/sub-queries-a-faster-alternative-to-not-exists
NULL values in foreign key columns are exempted from referential checking, so are excluded with IS NOT NULL criteria.
The ROWIDs are saved in a temporary table, which is then used to:
- save row contents in standard pipe-delimited UNLOAD format to file “table-name.foreign-key-name.unl”;
- delete those rows.
WITH NO LOG is appended as a precaution, though you should always have configuration parameter TEMPTAB_NOLOG set to 1, making this the default anyway. This does not cause any error if the database is not logged.
The temporary table is then dropped as the name will be reused in the next repeating group of statements.
Should you wish to report invalid data without fixing it, the RETURN of delete statements can be commented out in the code, or lines beginning with DELETE removed from what is generated afterwards.
The result can then be executed to save a flat file per foreign key. Empty files can be ignored, or removed on Unix or Linux with:
for i in *.unl
do test -s $i || rm -f $i
done
If you decided not include DELETE statements, file contents can be reviewed before fixing data manually.
Caveats
The code assumes ROWID is available for all tables involved. If you have Enterprise Edition and are using tables with multiple partitions, WITH ROWIDS must be applied to them. Alternatively, if every table contains a SERIAL, SERIAL8 or BIGSERIAL column, the SPL function could be recoded to use those instead. Another work-around from IDS 14.10.xC6 onwards would be to use the undocumented IFX_ROW_ID pseudo-column: queries on them no longer sequentially scan the table.
Conclusion
The tool provided in this article means that a database can be copied while fully in use, and any resulting inconsistencies in referential integrity identified and fixed afterwards as necessary.
References
Documentation links not shown in full in this article can be queried at the new IBM Informix 14.10 site:
Disclaimer
Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.