Updated June 2024
June 2024 change log:
- sp_load into temp table handled
- sp_load no longer specifies EXPRESS (only caused warning in message log)
September 2023 change log:
- Work-around for tables containing hidden specialized columns such as VERCOLS
- DBDATE external table option removed for better flexibility via environment
- CONSTRAINTS also disabled as necessary for EXPRESS with hidden indexes.
We have raised relevant APARs against these issues, to be fixed in 14.10.xC11.
Abstract
The Informix statements LOAD and UNLOAD allow data to be transferred between a flat file and a database, but are in fact only implemented in certain clients such as DB-Access, 4GL variants, and AGS Server Studio. You may therefore need functions in Informix Dynamic Server (IDS) that do this, such as when coding in Java with JDBC. Furthermore, external tables are much lighter and quicker for this purpose, so you might in any case prefer the functions described in this article, which use them in Informix Stored Procedure Language (SPL).
Content
Examples in this article were produced using the provided “stores_demo” database in a Docker container from ibmcom/informix-developer-database.
An EXTERNAL TABLE definition allows reading and writing a flat file as though it were a normal table, but with many restrictions: in summary, external tables cannot have indexes, triggers, or replication, and support only SELECT, INSERT (which always truncates the file first), and DROP statements.
There are many other uses outside the focus of this article, such as permanent archives of data that you do not want kept in your instance (keeping it smaller), provided you are happy for that data always to be scanned sequentially. External tables can reside on another system if presented to the Informix server via a Networked File System (NFS), which is available on all supported operating systems including Windows.
Because external tables are handled entirely within the database engine, extraction or ingestion of data is much faster. If the table is not logged – either type RAW or in an unlogged database – ingestion is especially quick as this uses Express Mode with “light appends” which bypass the buffer pool.
Before we look at the code and how the functions work, it is easier to see them in action first. Once created, you could do this in a dbaccess session:
> EXECUTE FUNCTION sp_unload('items', '/tmp/items.unl');
rows_unloaded
67
1 row(s) retrieved.
> TRUNCATE items;
Table truncated.
> EXECUTE FUNCTION sp_load('items', '/tmp/items.unl');
rows_loaded
67
1 row(s) retrieved.
Used in the simplest form above, the whole table is unloaded or loaded. The file name should be a full path as seen on the database server, not the client.
An optional third parameter can be given to specify the delimiter character if different to the default “|” pipe symbol. For example:
> EXECUTE FUNCTION sp_unload('items', '/tmp/items.csv', ',');
rows_unloaded
67
1 row(s) retrieved.
> TRUNCATE items;
Table truncated.
> EXECUTE FUNCTION sp_load('items', '/tmp/items.csv', ',');
rows_loaded
67
1 row(s) retrieved.
Beware that this does not produce true CSV content with strings in double-quotes. Any literal occurrence of the delimiter is still preceded by a back-slash, as is back-slash itself. For loading into Excel, a tab-delimited text file will be more reliable:
> EXECUTE FUNCTION sp_unload('items', '/tmp/items.txt', CHR(9));
rows_unloaded
67
1 row(s) retrieved.
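The escaping rules above can be illustrated with a short Python sketch. The helper names are our own, for illustration only; the escape convention (back-slash before a literal delimiter and before back-slash itself) is as described above:

```python
# Sketch of the escaping used by Informix delimited format, as
# described above: a back-slash precedes any literal delimiter and
# any literal back-slash. Helper names are our own, for illustration.

def ifx_escape(value: str, delim: str = "|") -> str:
    """Escape one field value for Informix delimited format."""
    return value.replace("\\", "\\\\").replace(delim, "\\" + delim)

def ifx_unescape(field: str) -> str:
    """Reverse the escaping: any back-slash quotes the next character."""
    out, i = [], 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            out.append(field[i + 1])
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)

# A comma delimiter does not give true CSV: a literal comma is
# back-slash escaped rather than the string being double-quoted.
print(ifx_escape("Smith, John", ","))   # Smith\, John
```

This is why a comma-delimited unload is not safe to open directly as CSV: a CSV reader will not understand the back-slash escapes.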
When unloading, you can in fact give any SELECT statement instead of the table name, for example:
> EXECUTE FUNCTION sp_unload(
> 'SELECT * FROM customer WHERE state = ''CA''',
> '/tmp/california.unl'
> );
rows_unloaded
18
1 row(s) retrieved.
Note above that two single-quotes represent one literal single-quote. Do not use double-quotes, which may be reserved for SQL identifiers: see the DELIMIDENT environment variable.
When loading, external tables require the final column delimiter to be present; LOAD allows it to be omitted when the last column is not empty, and files from other systems (CSV in particular) commonly omit it.
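A file lacking trailing delimiters can be adjusted before calling “sp_load”. The following Python sketch (our own helper, not part of the article’s functions) appends the delimiter to any line that does not already end with an unescaped one:

```python
# Append the final column delimiter that Informix external tables
# require, if a line does not already end with an unescaped one.
# Helper name is ours, for illustration.

def ensure_trailing_delim(line: str, delim: str = "|") -> str:
    stripped = line.rstrip("\n")
    if stripped.endswith(delim):
        # Count back-slashes before the delimiter: an odd number
        # means the delimiter is escaped, i.e. literal data.
        n = len(stripped) - 1
        backslashes = 0
        while n > 0 and stripped[n - 1] == "\\":
            backslashes += 1
            n -= 1
        if backslashes % 2 == 0:
            return stripped + "\n"
    return stripped + delim + "\n"

print(ensure_trailing_delim("1|1001|1|HRO|1|250.00"))  # appends the final "|"
```

Applied line by line to a copy of the file, this makes LOAD-style data acceptable to the external table.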
Listings and explanations of the two functions follow.
- Remove or comment out the DEBUG and TRACE statements as preferred.
- Both functions handle table names in double-quotes containing special characters or mixed case if DELIMIDENT is set in the environment.
- If a user supplies a SQL statement to unload, only SELECT is allowed.
- Semi-colons are rejected throughout to prevent multiple statements.
- For “sp_unload”, the external table temporarily created contains the session ID for concurrent use.
- For “sp_load”, it contains the target table name which is in any case exclusively locked, and indexes are temporarily disabled for speed and as required by Express Mode.
- These functions fail unless the user has RESOURCE or DBA database privilege needed to create tables.
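The safety checks listed above can be sketched in Python. This mirrors the rules as described, not the actual SPL source:

```python
# Sketch of the input checks the article describes: reject any
# semi-colon (to prevent multiple statements) and, when a full SQL
# statement is supplied for unloading, allow only SELECT.

def check_unload_source(source: str) -> str:
    """Return the validated table name or SELECT statement, or raise."""
    if ";" in source:
        raise ValueError("semi-colons are not allowed")
    if " " in source.strip():
        # More than one word: treat it as SQL, which must be a SELECT.
        if not source.lstrip().upper().startswith("SELECT"):
            raise ValueError("only SELECT statements may be unloaded")
    return source

print(check_unload_source("SELECT * FROM customer WHERE state = 'CA'"))
```

The real functions perform these checks in SPL before building the external table DDL shown below.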
You can find the SQL that created external tables in the debug files, as in the following example (with indentation added):
-- sp_unload
SELECT * FROM items INTO EXTERNAL temp_670_ext
USING
(
DATAFILES ('DISK:/tmp/items.unl'),
DELIMITER '|',
ESCAPE
);
-- sp_load
CREATE EXTERNAL TABLE temp_items_ext SAMEAS items
USING
(
DATAFILES ('DISK:/tmp/items.unl'),
REJECTFILE '/tmp/sp_load.items.rej',
MAXERRORS 2,
EXPRESS,
DELIMITER '|',
ESCAPE
);
ESCAPE is required for compatibility with versions prior to IDS 12.
MAXERRORS defaults to 2 in the “sp_load” function, meaning it will abort on the second error when loading data. Files from other systems often have column headings in the first line, which would otherwise abort the load. You can override the default with an optional fourth function parameter. For example, loading “items.csv” with column headings inserted at the top, first with the default MAXERRORS of 2 and then with 1, produces:
> TRUNCATE items;
Table truncated.
> EXECUTE FUNCTION sp_load('items', '/tmp/items.csv', ',');
rows_loaded
67
1 row(s) retrieved.
> TRUNCATE items;
Table truncated.
> EXECUTE FUNCTION sp_load('items', '/tmp/items.csv', ',', 1);
rows_loaded
1213: A character to numeric conversion process failed
The REJECTFILE in /tmp contains the details of any errors. In the above case, it contained:
You can even create an external table to read a reject file in SQL with a “~” tilde column delimiter as follows (change the table name and file path to your own):
CREATE EXTERNAL TABLE temp_items_ext_rej
(
file_name VARCHAR(167),
line_number INTEGER,
error_code VARCHAR(32),
column_name VARCHAR(167),
data LVARCHAR
)
USING
(
DATAFILES ('DISK:/tmp/sp_load.items.rej'),
DELIMITER '~',
ESCAPE
);
> SELECT * FROM temp_items_ext_rej;
file_name /tmp/items.csv
line_number 1
error_code CONVERT_ERR
column_name item_num
data item_num,order_num,stock_num,manu_code,quantity,total_price,
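Instead of an external table, a reject file can also be parsed in a client language. A minimal Python sketch, splitting on the “~” delimiter; the field layout follows the external table definition above:

```python
# Parse an sp_load reject file: five "~"-delimited fields per line,
# matching the external table columns defined above.

from typing import Iterator

def parse_reject_file(path: str) -> Iterator[dict]:
    with open(path) as f:
        for line in f:
            file_name, line_number, error_code, column_name, data = \
                line.rstrip("\n").split("~", 4)
            yield {
                "file_name": file_name,
                "line_number": int(line_number),
                "error_code": error_code,
                "column_name": column_name,
                "data": data,
            }
```

This is convenient when the load is driven from an application that wants to report rejected rows without a second round-trip to the server.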
If you run “sp_load” on the standard “items” table, you will see this in the message log:
Switching load on target table informix.items to DELUXE
The “sp_load” function specifies EXPRESS so that this message serves as a reminder that loading would be faster if the target table were not logged.
The following does not cause that message:
> SELECT * FROM items WHERE ROWID = 0 INTO RAW items_copy;
0 row(s) retrieved into table.
> EXECUTE FUNCTION sp_load('items_copy', '/tmp/items.unl');
rows_loaded
67
1 row(s) retrieved.
For a huge data set, that would be massively quicker and avoid this error:
informix@ifx:~$ finderr -458
-458 Long transaction aborted.
The database server ran out of log space in which to record this
transaction. A transaction that is not fully recorded cannot be rolled
back. To preserve database integrity, the operating system ended the
transaction and rolled it back automatically. All changes made since
the start of the transaction have been removed. Terminate the
application, and replan it so that it modifies fewer rows per
transaction. Alternatively, contact the database server administrator
to discuss increasing the number or the size of the logical logs. Be
prepared to talk about the number of rows being updated or inserted and
the size of each row.
To alter an existing table to RAW, you must first drop referential constraints. The items table was previously created the lazy way with no explicit indexes or constraint names, according to the following beautified output from “dbschema -d stores_demo -t items -nw -q”:
CREATE TABLE items
(
item_num SMALLINT,
order_num INTEGER,
stock_num SMALLINT NOT NULL,
manu_code CHAR(3) NOT NULL,
quantity SMALLINT,
total_price MONEY(8,2),
CHECK (quantity >= 1),
PRIMARY KEY (item_num, order_num)
);
ALTER TABLE items ADD CONSTRAINT
FOREIGN KEY (order_num)
REFERENCES orders;
ALTER TABLE items ADD CONSTRAINT
FOREIGN KEY (stock_num, manu_code)
REFERENCES stock;
To obtain those names, our article on Forest of Trees Indexes shows the following form of SQL:
SELECT t.tabname,
       c.constrtype,
       p.tabname AS refers,
       c.constrname,
       i.idxname
FROM systables AS t
JOIN sysconstraints AS c ON c.tabid = t.tabid
JOIN sysindexes AS i ON i.idxname = c.idxname
LEFT OUTER JOIN sysreferences AS r ON r.constrid = c.constrid
LEFT OUTER JOIN systables AS p ON p.tabid = r.ptabid
ORDER BY 1, 2, 3;
The data returned includes:
tabname | constrtype | refers | constrname | idxname |
---|---|---|---|---|
items | P | | u105_10 | 105_10 |
items | R | orders | r105_11 | 105_11 |
items | R | stock | r105_12 | 105_12 |
SQL to retain indexes but drop constraints might therefore be:
RENAME INDEX 105_10 TO ix_pk_items;
RENAME INDEX 105_11 TO ix_fk_items_orders;
RENAME INDEX 105_12 TO ix_fk_items_stock;
ALTER TABLE items DROP CONSTRAINT u105_10;
ALTER TABLE items DROP CONSTRAINT r105_11;
ALTER TABLE items DROP CONSTRAINT r105_12;
WARNING: dropping a primary key constraint also drops any foreign key constraints on other tables that reference it (none in this case). Record those beforehand as well, so that you can recreate them afterwards.
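Generating the rename-and-drop SQL from the catalog query output can be scripted. A Python sketch, using rows shaped like the query results above; the “ix_” naming convention is our own, not mandated by Informix:

```python
# Build RENAME INDEX / DROP CONSTRAINT statements from rows shaped
# like the catalog query output: (tabname, constrtype, refers,
# constrname, idxname). Index names follow our own ix_ convention.

def constraint_sql(rows):
    renames, drops = [], []
    for tabname, constrtype, refers, constrname, idxname in rows:
        if constrtype == "P":
            new_name = f"ix_pk_{tabname}"
        elif constrtype == "R":
            new_name = f"ix_fk_{tabname}_{refers}"
        else:
            continue  # other constraint types left alone
        renames.append(f"RENAME INDEX {idxname} TO {new_name};")
        drops.append(f"ALTER TABLE {tabname} DROP CONSTRAINT {constrname};")
    return renames + drops

rows = [
    ("items", "P", None, "u105_10", "105_10"),
    ("items", "R", "orders", "r105_11", "105_11"),
    ("items", "R", "stock", "r105_12", "105_12"),
]
for stmt in constraint_sql(rows):
    print(stmt)
```

Run against the example rows, this reproduces the six statements shown above: renames first so that the indexes survive the constraint drops.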
After loading, to reinstate logging, you would run:
> ALTER TABLE items TYPE (STANDARD);
Table altered.
Something like this will have been recorded in the message log:
Warning : The partition '0x400003' for the table stores_demo:"informix".items'
is converted from RAW to STANDARD type.
This table may not be fully restored in the case of server crash.
It is recommended to take a level-0 backup before you use or
modify the table in a transaction.
In some circumstances, you might get this error trying to use a table after an express load:
informix@ifx:~$ finderr -197
-197 ISAM error: Cannot open a recently appended logged partition for writing.
A level-0 backup is needed before you can write to the target database.
If this error recurs after a level-0 backup, note all circumstances and contact
IBM Informix Technical Support.
That’s quite hard to reproduce. The following will fix it without delay, though you would still want to run a real level 0 archive as soon as possible afterwards:
informix@ifx:~$ ontape -s -L 0 -t /dev/null
Archive to tape device '/dev/null' is complete.
Program over.
Best practice would be to recreate constraints with explicit names, using NOVALIDATE for speed (the indexes will already be there):
SET ENVIRONMENT NOVALIDATE ON;
ALTER TABLE items ADD CONSTRAINT
PRIMARY KEY (item_num, order_num)
CONSTRAINT pk_items;
ALTER TABLE items ADD CONSTRAINT
FOREIGN KEY (order_num)
REFERENCES orders
CONSTRAINT fk_items_orders;
ALTER TABLE items ADD CONSTRAINT
FOREIGN KEY (stock_num, manu_code)
REFERENCES stock
CONSTRAINT fk_items_stock;
To minimise down time, a better alternative on a real system might be to complete all the above on a copy of the table, and switch them round afterwards with RENAME TABLE.
If the table must be available to other sessions during a load and therefore cannot be exclusively locked, the following procedure provides a slower but safe alternative, even if the table is logged:
- Nothing is returned, so this is a procedure, not a function.
- The work is done in a sub-process by the dbload command line tool supplied with IDS.
- Unlike external tables, there are separate optional procedure parameters “skip_rows” and “max_errors”, corresponding to dbload argument options “-i” (rows to ignore) and “-e” (errors tolerated), both defaulting to zero (no column headings, abort on first error).
- The number of rows committed per transaction can be tuned, but experience shows you probably won’t need to change it from the default of 2000 rows. A line is output on every commit, providing progress reports when loading a large amount of data.
- Unlike the other two functions, it can be run from or on another database.
- The number of columns in the file is needed by dbload. That can be fetched from system catalog table systables since we are loading all columns. The procedure is aborted if the table name cannot be found.
- File paths are rejected if they contain characters that might enable command injection.
- A dbload command file is created and run, creating an output and reject file.
- All files created are in “/tmp” and can subsequently be overwritten by other users.
- DELIMIDENT is set in the dbload environment and double quotes put around the table name in the command file to handle special characters or mixed case.
The following example loads a file we created earlier, telling it to skip the first row as it contains column headings:
> TRUNCATE items;
Table truncated.
> EXECUTE FUNCTION sp_dbload('items', '/tmp/items.csv', ',', 1);
rows_loaded
67
1 row(s) retrieved.
The command file it created was:
The reject file was empty and the output file contained:
To see what happens if there is an error in the file, we can load the same file but telling it not to skip the first line, but to tolerate one error:
> TRUNCATE items;
Table truncated.
> EXECUTE FUNCTION sp_dbload('items', '/tmp/items.csv', ',', 0, 1);
rows_loaded
67
1 row(s) retrieved.
The same command file is produced, but the reject file is no longer empty, with the error details also in the output file:
Caveats
For all functions:
- Using with IDS versions prior to 11.50 has not been tested.
- Complex and binary column data types are not supported.
- Specifying a comma delimiter does not ensure full CSV file handling.
For “sp_unload” and “sp_load”:
- The user must have DBA or RESOURCE database privilege.
- The function must be installed in each database where it will be used.
For “sp_load” and “sp_dbload”:
- The file must contain all columns in the same order as the table.
- A work-around for that could be devised using a VIEW and column DEFAULT values.
For “sp_load” only:
- The function will fail if the table is in use and therefore cannot be exclusively locked.
- If the table is not logged, loading will be hugely faster, but that raises the recovery considerations discussed above.
- Otherwise, “Long Transaction Aborted” may result with larger data sets.
Disclaimer
Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.
Contact Us
If you have any questions or would like to find out more about this topic, please contact us.