Updated June 2024
June 2024 change log:
- sp_load into temp table handled
- sp_load no longer specifies EXPRESS (only caused warning in message log)
September 2023 change log:
- Work-around for tables containing hidden specialized columns such as VERCOLS
- DBDATE external table option removed for better flexibility via environment
- CONSTRAINTS also disabled as necessary for EXPRESS with hidden indexes.
We have raised relevant APARs for these issues, to be fixed in 14.10.xC11.
Abstract
The Informix statements LOAD and UNLOAD allow data to be transferred between a flat file and a database, but are in fact only implemented in certain clients such as DB-Access, 4GL variants, and AGS Server Studio. You may therefore need functions in Informix Dynamic Server (IDS) that do this, such as when coding in Java with JDBC. Furthermore, external tables are much lighter and quicker for this purpose, so you might in any case prefer the functions described in this article, which use them in Informix Stored Procedure Language (SPL).
Content
Examples in this article were produced using the provided “stores_demo” database in a Docker container from ibmcom/informix-developer-database.
An EXTERNAL TABLE definition allows reading and writing to a flat file as though it were a normal table, but with many restrictions: in summary, external tables cannot have indexes, triggers, or replication, and they support only SELECT, INSERT (which always truncates the file first), and DROP statements.
There are many other uses outside the focus of this article, such as permanent archives of data that you don’t want kept in your instance to keep it smaller, and which you are happy always to be scanned sequentially. They can reside on another system if presented to the Informix server in a Networked File System (NFS), which is available on all supported operating systems including Windows.
Because external tables are handled entirely within the database engine, extraction or ingestion of data is much faster. If the table is not logged – either type RAW or in an unlogged database – ingestion is especially quick as this uses Express Mode with “light appends” which bypass the buffer pool.
Before we look at the code and how the functions work, it is easier to see them in action first. Once created, you could do this in a dbaccess session:
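For illustration, the calls might look like this (the exact signatures are those of the function listings later in this article; the file path is an example):

```sql
-- Unload the whole "items" table to a pipe-delimited file
EXECUTE FUNCTION sp_unload('items', '/tmp/items.unl');

-- Load the file back into the table
EXECUTE FUNCTION sp_load('items', '/tmp/items.unl');
```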
Used in the simplest form above, the whole table is unloaded or loaded. The file name should be a full path as seen on the database server, not the client.
An optional third parameter can be given to specify the delimiter character if different from the default “|” pipe symbol. For example:
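A sketch, assuming the parameter order shown above:

```sql
-- Unload with a comma delimiter instead of the default pipe
EXECUTE FUNCTION sp_unload('items', '/tmp/items.csv', ',');
```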
Beware that this does not produce true CSV content with strings in double quotes. Any literal occurrence of the delimiter is still preceded by a backslash, as is the backslash character itself. For loading into Excel, a tab-delimited text file will be more reliable:
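For example (the third argument below is a literal Tab character typed between the quotes):

```sql
-- Unload tab-delimited; the delimiter argument is a literal Tab
EXECUTE FUNCTION sp_unload('items', '/tmp/items.txt', '	');
```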
When unloading, you can in fact give any SELECT statement instead of the table name, for example:
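An illustrative call (the WHERE clause value is an example from the “stores_demo” database):

```sql
EXECUTE FUNCTION sp_unload(
    'SELECT * FROM items WHERE manu_code = ''HRO''',
    '/tmp/hro_items.unl'
);
```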
Note above that two single-quotes is a literal single-quote. Do not use double-quotes which might be reserved for SQL identifiers: see environment variable DELIMIDENT.
When loading, external tables require the final column delimiter to be present even when the last column is not empty, whereas LOAD allows it to be omitted, as is common in CSV files.
Listings and explanations of the two functions follow.
- Remove or comment out the DEBUG and TRACE statements as preferred.
- Both functions handle table names in double-quotes containing special characters or mixed case if DELIMIDENT is set in the environment.
- If a user supplies a SQL statement to unload, only SELECT is allowed.
- Semi-colons are rejected throughout to prevent multiple statements.
- For “sp_unload”, the name of the external table temporarily created includes the session ID, making concurrent use safe.
- For “sp_load”, it includes the target table name instead; that table is in any case exclusively locked, and its indexes are temporarily disabled for speed and as required by Express Mode.
- These functions fail unless the user has RESOURCE or DBA database privilege needed to create tables.
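The full listings are not reproduced in this extract, but a much-simplified, hypothetical sketch of the “sp_unload” idea (table-name case only, without the SELECT support, DEBUG/TRACE statements, or fuller validation of the real function) could look like this:

```sql
-- Hypothetical sketch only: the real function also accepts a SELECT
-- statement and performs fuller validation than shown here.
CREATE FUNCTION sp_unload_sketch(
    p_table VARCHAR(128),
    p_path  VARCHAR(255),
    p_delim CHAR(1) DEFAULT '|'
) RETURNING INTEGER;

    DEFINE v_ext VARCHAR(128);

    -- Reject semi-colons to prevent multiple statements
    IF p_table MATCHES '*;*' OR p_path MATCHES '*;*' THEN
        RAISE EXCEPTION -746, 0, 'Semi-colons are not allowed';
    END IF;

    -- Include the session ID in the name for safe concurrent use
    LET v_ext = 'sp_unload_' || DBINFO('sessionid');

    EXECUTE IMMEDIATE
        'CREATE EXTERNAL TABLE ' || v_ext ||
        ' SAMEAS ' || p_table ||
        ' USING (DATAFILES("DISK:' || p_path || '"),' ||
        ' DELIMITER "' || p_delim || '", ESCAPE)';

    -- Inserting into the external table writes the flat file
    EXECUTE IMMEDIATE
        'INSERT INTO ' || v_ext || ' SELECT * FROM ' || p_table;

    EXECUTE IMMEDIATE 'DROP TABLE ' || v_ext;

    RETURN DBINFO('sqlca.sqlerrd2'); -- rows processed by the last statement
END FUNCTION;
```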
You can find the SQL that created external tables in the debug files, as in the following example (with indentation added):
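Such an entry might resemble the following; the external table name, file paths and option values here are illustrative:

```sql
CREATE EXTERNAL TABLE sp_load_items
SAMEAS items
USING (
    DATAFILES ( "DISK:/tmp/items.unl" ),
    DELIMITER "|",
    ESCAPE,
    REJECTFILE "/tmp/sp_load_items.rej",
    MAX_ERRORS 2,
    EXPRESS
);
```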
ESCAPE is required for compatibility with versions prior to IDS 12.
MAX_ERRORS defaults to 2 in the “sp_load” function, meaning it will abort on the second error when loading data. Files from other systems often have column headings in the first line which would otherwise abort the load. You can override that with a fourth optional function parameter. For example, loading “items.csv” with column headings inserted at the top and with MAX_ERRORS set to 2 then 1 produces:
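The two invocations might look like this, assuming the fourth parameter maps directly to MAX_ERRORS (the resulting messages are not reproduced here and vary by version):

```sql
-- MAX_ERRORS 2: the heading line counts as one error, so the load completes
EXECUTE FUNCTION sp_load('items', '/tmp/items.csv', ',', 2);

-- MAX_ERRORS 1: aborts on the first error (the heading line)
EXECUTE FUNCTION sp_load('items', '/tmp/items.csv', ',', 1);
```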
The REJECTFILE in /tmp contains the details of any errors. In the above case, it contained:
You can even create an external table to read a reject file in SQL with a “~” tilde column delimiter as follows (change the table name and file path to your own):
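A sketch, assuming the reject file contains no tilde characters, so that each line loads as a single column:

```sql
CREATE EXTERNAL TABLE items_rejects
(
    errline LVARCHAR(4096)
)
USING (
    DATAFILES ( "DISK:/tmp/sp_load_items.rej" ),
    DELIMITER "~"
);

SELECT * FROM items_rejects;

DROP TABLE items_rejects;
```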
If you run “sp_load” on the standard “items” table, you will see this in the message log:
The “sp_load” function specifies EXPRESS, which produces that reminder that loading would be faster if the target table were not logged.
The following does not cause that message:
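For example, loading after altering the table to RAW (which first requires its constraints to be dropped, as covered next):

```sql
-- Assumes referential constraints have already been dropped (see below)
ALTER TABLE items TYPE (RAW);

EXECUTE FUNCTION sp_load('items', '/tmp/items.unl');
```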
For a huge data set, that would be massively quicker and would also avoid a “Long Transaction Aborted” error.
To alter an existing table to RAW, you must first drop referential constraints. The items table was previously created the lazy way with no explicit indexes or constraint names, according to the following beautified output from “dbschema -d stores_demo -t items -nw -q”:
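That output is not reproduced in this extract; from the standard stores_demo schema, the definition is approximately as follows (the system-generated constraint and index names do not appear in this form):

```sql
create table items
(
    item_num    smallint,
    order_num   integer,
    stock_num   smallint,
    manu_code   char(3),
    quantity    smallint,
    total_price money(8,2),

    primary key (item_num, order_num),
    foreign key (order_num) references orders,
    foreign key (stock_num, manu_code) references stock
);
```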
To obtain those names, our article on Forest of Trees Indexes shows the following form of SQL:
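A sketch of that query, joining the system catalog tables (column names as in recent IDS versions):

```sql
SELECT t.tabname, c.constrtype, rt.tabname AS refers,
       c.constrname, c.idxname
  FROM systables t
  JOIN sysconstraints c ON c.tabid = t.tabid
  LEFT JOIN sysreferences r ON r.constrid = c.constrid
  LEFT JOIN systables rt ON rt.tabid = r.ptabid
 WHERE t.tabname = 'items';
```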
The data returned includes:
tabname | constrtype | refers | constrname | idxname |
---|---|---|---|---|
items | P | | u105_10 | 105_10 |
items | R | orders | r105_11 | 105_11 |
items | R | stock | r105_12 | 105_12 |
SQL to retain indexes but drop constraints might therefore be:
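Using the system-generated names returned above, dropping the referential constraints before the primary key:

```sql
ALTER TABLE items DROP CONSTRAINT r105_11;
ALTER TABLE items DROP CONSTRAINT r105_12;
ALTER TABLE items DROP CONSTRAINT u105_10;

-- The table can now be altered to RAW for an unlogged Express load
ALTER TABLE items TYPE (RAW);
```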
WARNING: dropping a primary key constraint also drops any foreign key constraints on other tables that reference it (none in this case); you need to record those beforehand and recreate them afterwards.
After loading, to reinstate logging, you would run:
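That is done by altering the table back to type STANDARD:

```sql
ALTER TABLE items TYPE (STANDARD);
```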
Something like this will have been recorded in the message log:
In some circumstances, you might get this error trying to use a table after an express load:
That’s quite hard to reproduce. The following will fix it without delay, though you would still want to run a real level 0 archive as soon as possible afterwards:
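Two common approaches are an ON-Bar fake backup, or ontape with TAPEDEV set to /dev/null in the ONCONFIG file; both mark the spaces as archived without writing data:

```shell
# Option 1: ON-Bar fake backup (no data is written)
onbar -b -F

# Option 2: ontape level-0, with TAPEDEV set to /dev/null in ONCONFIG
ontape -s -L 0
```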
Best practice would be to recreate constraints with explicit names, using NOVALIDATE for speed (the indexes will already be there):
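For example, with explicit constraint names (the names below are chosen for illustration):

```sql
ALTER TABLE items ADD CONSTRAINT
    (PRIMARY KEY (item_num, order_num) CONSTRAINT pk_items);

ALTER TABLE items ADD CONSTRAINT
    (FOREIGN KEY (order_num) REFERENCES orders
        CONSTRAINT fk_items_orders NOVALIDATE);

ALTER TABLE items ADD CONSTRAINT
    (FOREIGN KEY (stock_num, manu_code) REFERENCES stock
        CONSTRAINT fk_items_stock NOVALIDATE);
```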
To minimise down time, a better alternative on a real system might be to complete all the above on a copy of the table, and switch them round afterwards with RENAME TABLE.
If the table must be available to other sessions during a load and therefore cannot be exclusively locked, the following procedure provides a slower safe alternative even if the table is logged:
- Nothing is returned, so this is a procedure, not a function.
- The work is done in a sub-process by the dbload command line tool supplied with IDS.
- Unlike external tables, there are separate optional procedure parameters “skip_rows” and “max_errors”, corresponding to the dbload argument options “-i” (rows to ignore) and “-e” (errors tolerated), both defaulting to zero (no column headings, abort on first error).
- The number of rows committed per transaction can be tuned, but experience shows you probably won’t need to change it from the default of 2000 rows. A line is output on every commit, providing progress reports when loading a large amount of data.
- Unlike the other two functions, it can be run from or on another database.
- The number of columns in the file is needed by dbload. That can be fetched from system catalog table systables since we are loading all columns. The procedure is aborted if the table name cannot be found.
- File paths are rejected if they contain characters that might enable command injection.
- A dbload command file is created and run, creating an output and reject file.
- All files created are in “/tmp” and can subsequently be overwritten by other users.
- DELIMIDENT is set in the dbload environment and double quotes put around the table name in the command file to handle special characters or mixed case.
The following example loads a file we created earlier, telling it to skip the first row as it contains column headings:
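The exact signature is defined by the listing; assuming a parameter order of table, file, delimiter, then skip_rows, the call might be:

```sql
EXECUTE PROCEDURE sp_dbload('items', '/tmp/items.csv', ',', 1);
```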
The command file it created was:
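That file is not reproduced in this extract; dbload command files follow this general form, so it would have resembled the following, 6 being the column count fetched from systables and the table name double-quoted as described above (file names illustrative):

```
FILE "/tmp/items.csv" DELIMITER ',' 6;
INSERT INTO "items";
```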
The reject file was empty and the output file contained:
To see what happens if there is an error in the file, we can load the same file again, this time telling it not to skip the first line but to tolerate one error:
The same command file is produced, but the reject file is no longer empty, with the error details also in the output file:
Caveats
For all functions:
- Using with IDS versions prior to 11.50 has not been tested.
- Complex and binary column data types are not supported.
- Specifying a comma delimiter does not ensure full CSV file handling.
For “sp_unload” and “sp_load”:
- The user must have DBA or RESOURCE database privilege.
- The function must be installed in each database where it will be used.
For “sp_load” and “sp_dbload”:
- The file must contain all columns in the same order as the table.
- A work-around for that could be devised using a VIEW and column DEFAULT values.
For “sp_load” only:
- The function will fail if the table is in use and therefore cannot be exclusively locked.
- If the table is not logged, that has issues to consider, but loading will be hugely faster.
- Otherwise, “Long Transaction Aborted” may result with larger data sets.
Disclaimer
Suggestions above are provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.
Contact Us
If you have any questions or would like to find out more about this topic, please contact us.