Abstract

This third article in the Easy IWA series provides specific worked examples of creating two Data Marts under IBM Informix Dynamic Server and IBM Informix Warehouse Accelerator 12.10.FC4. Both examples use the “smart mart” utility, with the first example loading from external tables into IBM Informix tables and subsequently into an IWA Data Mart; the second loads directly into an IWA Data Mart from external tables. Both examples use the TPCH data generation utility and schema as detailed in a previous article, with the addition of the removal of unsuitable “*_comment” columns via a simple in-line pipe and sed or awk approach.

Content

The creation and loading of a Data Mart under IBM Informix can at first appear convoluted; however, it is a straight forward set of activities that can be placed in a single, repeatable SQL script.

The process described here is based on the following premises (the scripts are written so that they can be re-run; the first run will report an error regarding the dropping of the source database and the dropping of the Data Mart):

An IBM Informix instance with an IWA Accelerator (JJDWA1) is present and running

No databases exist for the source of data for the Accelerator (“jj_tpch_internal” and “jj_tpch_external”)

Data will be generated using the TPCH data generation utility with redundant “*_comment” column data removed in-line (jj_tpch_data_stripped.sh)

The scripts for this article can be found in the resources section at the end of this article

If using Scale Factor 1, then both Data Marts are able to loaded, but this is running close to the limit imposed by having 2,560 MB RAM allocated to the WORKERS and 512 allocated to the CO-ORDINATOR

Two approaches will be described:

  1. Creation and loading a Data Mart from IBM Informix standard tables (jj_tpch_internal_stripped.sql)
    1. Creation of a TPCH database schema (excluding “*_comment” columns) and defining external tables defined as pipes based on the standard tables
    2. Loading IBM Informix tables with TPCH data from the external tables using the TPCH data generation utility (jj_tpch_data_stripped.sh)
    3. Loading the IWA Data Mart from IBM Informix tables
  2. Creation and loading a Data Mart from IBM Informix external tables (jj_tpch_external_stripped.sql)
    1. Creation of a TPCH database schema (excludes “*_comment” columns) and defining external tables based on the standard tables
    2. Loading IBM IWA directly from the external tables using the TPCH data generation utility (jj_tpch_data_stripped.sh)

As the data generation is the same for both approaches, the first issue to address is removal of the last “*_comment” field from the data generation scripts to ensure effective use of available resources.

Removal of “*_comment” columns from data generation

The following script achieves this, using awk or sed to remove in-line the last “|” delimited field, placing all data on respective pipes for subsequent usage by external tables:

Copy to Clipboard

Please review the full file jj_tpch_internal.sql whilst progressing through the following extracts and note that the above data generation script, jj_tpch_data_stripped.sh, needs to be running prior to actual execution of this SQL. A minimum of 2 GB is recommended for the dbspace where the jj_tpch_internal database is created, noting that SF=1 denotes approximately “1 GB of data” in the data generation script.
  1. Drop any previous incarnation of the “jj_tpch_internal” database
Note that a database cannot be dropped when there exists a table altered from raw to standard (unless a Level 0 archive has been performed), but the table itself can be dropped. Raw tables are used here for speed of loading.
Copy to Clipboard
  1. Create the jj_tpch_internal database with buffered logging in an appropriate dbspace, followed by creation of raw TPCH tables excluding the “*_comment” columns:
Copy to Clipboard
  1. Create the supporting external tables which will subsequently be used to load the raw tables (the working directory is “jj_tpch_external” where all scripts and external tables are):
Copy to Clipboard
  1. Load the raw tables from the external tables (note that the data generation should already be running; it can be started in the background before the SQL scripted is run):
Copy to Clipboard
  1. Alter the raw tables to standard once loading has completed:
Copy to Clipboard
  1. Create unique indexes against the candidate primary keys:
Copy to Clipboard
  1. Run a full set of update statistics
In this example update statistics high is used for the complete database, this is not strictly required and update statistics low will be sufficient; in fact the creation of indexes will suffice as this activity will provide the required statistics
Copy to Clipboard
At this point the process of loading the data into raw tables and converting them to standard has been completed; this may differ from specific implementation requirements where an existing database could be the candidate source of data for a Data Mart. From this point on activity is in relation to Data Mart creation and loading.
  1. Drop any previous incarnation of the “jj_tpch_mart_int” Data Mart from the Accelerator “JJDWA1” and the current database
It should be noted that the database jj_tpch_internal has been dropped. The following command will remove the reference to the Data Mart from sysadmin:iwa_datamarts and also remove the Data Mart from the Accelerator
Copy to Clipboard
  1. Clean up any previous probing data generated by “set environment use_dwa ‘probe start’;”
Copy to Clipboard
  1. Start the SmartMart probing utility to “probe” any queries that are run in this session, gathering information for later processing into a mart definition
Copy to Clipboard
  1. Generally the requirement of running the query to produce data is not required or desired during query probing, so setting “set explain on avoid_execute;” prevents query execution
set explain on avoid execute;
  1. Having previously established the two FACT tables, lineitem and partsupp, for the Data Mart, the following 4 statements dictate:
    1. That lineitem is a FACT table for the first select statement, with the select statement detailing all potential joins between lineitem and the DIMENSION tables
    2. That partsupp is a FACT table for the second select statement, with the select statement detailing all potential joins between partsupp and the DIMENSION tables
It is worth noting, from a relational database design perspective, that nation table can be a parent to either the supplier or the customer tables, therefore two distinct joins to nation are detailed along with respective joins to region
Copy to Clipboard
  1. Turn off set explain and probing now that the candidate queries have been processed:
Copy to Clipboard
  1. At this point a separate logged database is required for processing the captured probe data and converting into an actual Mart definition; here “jj_tpch_internal_mart” is dropped and created for use.
Copy to Clipboard
  1. The following statement processes probing data that was captured against the “jj_tpch_internal” database and creates a Data Mart definition “jj_tpch_mart_int”
Copy to Clipboard
  1. The Data Mart definition jj_tpch_mart_int residing within jj_tpch_internal_mart now needs to be passed to the Accelerator JJDWA1 whilst connected to the database where the source data is held
Copy to Clipboard
  1. All that remains, which is really the main activity being performed here, is to load the Data Mart jj_tpch_mart_int in the Accelerator JJDWA1 with NONE for the locking preferences
During this activity, it is recommended to monitor the Accelerator using “watch ondwa tasks” where each table can be observed being received, compressed, encoded, distributed and flushed out to storage
Copy to Clipboard
  1. For completeness, the Data Mart definition jj_tpch_mart_int can be extracted from the Accelerator JJDWA1 and placed in a file jj_tpch_mart_int_??.xml, where ?? is replaced. This file is in XML format
Copy to Clipboard
Please review the full file jj_tpch_external.sql whilst progressing through the following extracts and note that the data generation script, jj_tpch_data_stripped.sh, needs to be running prior to actual execution of this SQL. A minimum of 2 GB is recommended for the dbspace where the jj_tpch_external database is created, noting that SF=1 denotes approximately “1 GB of data” in the data generation script, however, as the data is not going to be held in standard tables, the dbspace requirements are only several MB to hold the schemas.
  1. Drop any previous incarnation of the “jj_tpch_external” database:
Copy to Clipboard
  1. Create the “jj_tpch_external” database with buffered logging in an appropriate dbspace:
Copy to Clipboard
  1. For ease of use, create a set of standard tables (named tabname_std) which reflect the format of the external tables; these will subsequently be used to create a corresponding set of external tables using the SAMEAS clause.
Copy to Clipboard
  1. Create the required external tables based on the tabname_std table with the corresponding pipe; these will be the source of data used to load directly into the data mart:
Copy to Clipboard
  1. The creation of the data mart requires knowledge of unique keys, and creation of unique disabled indexes against the external tables denotes this:
Copy to Clipboard
  1. Drop any previous incarnation of the “jj_tpch_mart_ext” Data Mart from the Accelerator “JJDWA1” and the current database
It should be noted that the database jj_tpch_external has been dropped. The following command will remove the reference to the Data Mart from sysadmin:iwa_datamarts and also remove the Data Mart from the Accelerator
Copy to Clipboard
  1. Clean up any previous probing data generated by “set environment use_dwa ‘probe start’;”
Copy to Clipboard
  1. Start the SmartMart probing utility to “probe” any queries that are run in this session, gathering information for later processing into a mart definition
Copy to Clipboard
  1. Generally the requirement of running the query to produce data is not required or desired during query probing, so setting “set explain on avoid_execute;” prevents query execution
Copy to Clipboard
  1. Having previously established the two FACT tables, lineitem and partsupp, for the Data Mart, the following 4 statements dictate:
    1. That lineitem is a FACT table for the first select statement, with the select statement detailing all potential joins between lineitem and the DIMENSION tables
    2. That partsupp is a FACT table for the second select statement, with the select statement detailing all potential joins between partsupp and the DIMENSION tables
It is worth noting, from a relational database design perspective, that nation table can be a parent to either the supplier or the customer tables, therefore two distinct joins to nation are detailed along with respective joins to region
Copy to Clipboard
  1. Turn off set explain and probing now that the candidate queries have been processed:
Copy to Clipboard
  1. At this point a separate logged database is required for processing the captured probe data and converting into an actual Mart definition; here “jj_tpch_internal_mart” is dropped and created for use.
Copy to Clipboard
  1. The following statement processes probing data that was captured against the “jj_tpch_external” database and creates a Data Mart definition “jj_tpch_mart_ext”
Copy to Clipboard
  1. The Data Mart definition jj_tpch_mart_ext residing within jj_tpch_external_mart now needs to be passed to the Accelerator JJDWA1 whilst connected to the database where the source data is held
Copy to Clipboard
  1. All that remains, which is really the main activity being performed here, is to load the Data Mart jj_tpch_mart_ext in the Accelerator JJDWA1 with NONE for the locking preferences
During this activity, it is recommended to monitor the Accelerator using “watch ondwa tasks” where each table can be observed being received, compressed, encoded, distributed and flushed out to storage
Copy to Clipboard
  1. For completeness, the Data Mart definition jj_tpch_mart_ext can be extracted from the Accelerator JJDWA1 and placed in a file jj_tpch_mart_ext_??.xml, where ?? is replaced. This file is in XML format
Copy to Clipboard
Monitoring It is definitely worthwhile monitoring IWA, and running “watch ondwa tasks” in a separate window will allow the observation of which table is being loaded into the Data Mart along with various other details.

Conclusion

This article, and the attached files, demonstrate the complete process of creation and loading of an IBM Informix Warehouse Accelerator, using both internal table data (standard tables) and external tables ( based on pipes) as the source of data. The process is straight forward and the provided examples provide a clear initial basis to build upon. Using external tables, any data source can potentially be transformed into a data mart within IWA enabling high performance “what if” analysis in a discrete environment.

Resources

jj_tpch_data_stripped.txt

jj_tpch_external_stripped.txt

jj_tpch_internal_stripped.txt

Disclaimer

The code fix suggested above is provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.

Contact us

If you have any questions regarding data marts and IBM Informix Warehouse Accelerator and would like to find out more, simply contact us.

Author

  • Big Data and RDBMS Senior Technical Engineer and Unix Administrator. Informix Specialist.