Abstract

This second article in the Easy IWA series details how to create an Accelerator under IBM Informix 12.10.FC4 Workgroup and Advanced editions. Worked examples to establish initial memory requirements are provided, which is key to effective and efficient resource usage and ensuring that initial provisions are reasonable. The Accelerator provides the interface between IBM Informix and the Warehouse Accelerator, performing mart administration and query processing.

Content

Before creating and initialising an accelerator there are several considerations that should be made and requirements established and satisfied:

The design of the data mart is key in ensuring effective and efficient usage of available O/S resources.

By limiting the FACT and DIMENSION table columns to only what is required, memory and offline storage requirements are significantly reduced.

Understanding what tables are candidates for the FACTFACT|DIMENSION (aka HYBRID) and DIMENSION may result in reducing the scope of queries that can be satisfied to reduce memory and storage footprint; for example, is a full set of address details really relevant for a Data Warehouse query? Perhaps just the POSTCODE would be more appropriate, saving a significant amount of memory and filesystem resource.

Using the TPCH database from the aforementioned previous article as a Mart, the tables below provide details on actual IWA memory and offline storage requirements.

The first table provides details on the number of rows produced for each Scale Factor used for the TPCH dbgen utility, along with whether a table is a DIMENSION or a FACT table (for a query to be able to be processed by IWA, there has to be at least one FACT table).

The second table shows the actual IDS storage used for each scale factor along with the IWA Data Mart memory requirements, compared to the storage requirements when all “X_comment” columns are removed.

As can be seen, nearly 4 times the data (48,000,000 lineitems compared to 12,000,000 lineitems) can be held in an accelerator with 2,560 MB of memory allocated to 2 WORKERS by not storing the “X_comment” columns. Note that DIMENSION tables are held in full for each WORKER and FACT tables are distributed across all WORKERs and filesystem storage will be of a similar size, so ensure that there is adequate storage (preferably a dedicated filesystem) for offline MART storage.

Scale FactorREGIONNATIONSUPPLIERPARTPARTSUPPCUSTOMERORDERSLINEITEM
Dimension Dimension Dimension Dimension FactDimensionDimensionFact
SF=1 NRows52510,000200,000800,000 150,0001,500,000 6,000,000
SF=2 NRows52520,000400,0001,600,000300,0003,000,00012,000,000
SF=4 Nrows52540,000800,000 3,200,000 600,0006,000,00024,000,000
SF=8 Nrows52580,0001,600,0006,400,0001,200,00012,000,00048,000,000


SF=1
All Columns
SF=1 No X_commentSF=2
All Columns
SF=2 No X_commentSF=4 No X_commentSF=8 No X_comment
IDS Storage MB1,1297642,2601,5283,0566,112
IWA Memory MB8253091,6715881,0652,084
IWA / IDS storage ratio0.730.400.740.380.350.34
This article was developed on a Virtual Machine with 6 GB of RAM and 4 virtual CPUs allocated, running OpenSuse 13.1; the following are based on squeezing in an IBM Informix instance and an IBM Informix Warehouse Accelerator loaded with the Mart as described in the last table above.

Kernel parameters

SHMMAX => Set to at least the shared memory requirements of a WORKER process for the Data Mart data, in this case 4GB; note that a WORKER process requires enough shared memory to hold DIMENSION tables in full and a portion of the FACT table (i.e. shared memory size of FACT table / number of WORKERS). From the above chart it can be seen that 4 GB is adequate to hold the complete Data Mart

Get the current value using:

Copy to Clipboard

Set the value using:

Copy to Clipboard

Alternatively, set it permanently in /etc/sysctl.conf by adding the following line

Copy to Clipboard

SHMALL => Set to at the size of available RAM, in this case 4GB, bearing in mind other requirements like “huge_pages” may limit actual available shared memory

Get the current value using:

Copy to Clipboard

Set the value using:

Copy to Clipboard

Alternatively, set it permanently in /etc/sysctl.conf by adding the following line

Copy to Clipboard

If running on very limited memory appliances, some consideration should be made to “overcommit_memory” and “overcommit_ratio”, especially if encountering OOM killer!

Increase the RAM based tmpfs filesystem

Review the /etc/fstab file, and ensure there is an entry similar to the following:

Copy to Clipboard

Note that the default is usually half the actual RAM available, for smooth running of IBM Informix Warehouse Accelerator, set this to be the same as the available RAM

Copy to Clipboard

and issue, as root, mount –o remount /dev/shm to enable this new size

Ensure that a working java environment is available, alternatively, the java from the IBM Informix engine can be used – $INFORMIXDIR/extend/krakatoa/jre/bin/java

Under $INFORMIXDIR/dwa/examples/cli/source are several source java files which have been built into the $INFORMIXDIR/dwa/example/cli/dwacli.jar; adding this to $CLASSPATH means that a set of IWA administration tools are available. In order to run, a connection file is required, conn.prop and an example is provided in this directory.

The CLASSPATH used for this environment is set to:

$INFORMIXDIR/jdbc/lib/ifxjdbc.jar:$INFORMIXDIR/dwa/example/cli/dwacli.jar:

As an example of usage, to get information about an active mart jj_tpch_mart defined in the accelerator JJIWA1 using the connection file $INFORMIXDIR/dwa/example/cli/conn.prop, run the following noting that jj_tpch_external is the database within the IBM Informix instance that is being connected to:

Copy to Clipboard

The connection file, conn.prop, contains the following:

Copy to Clipboard

Where jj-prepsuse-a-lan is the hostname, 1525 is the port where IBM Informix is listening, jj_tpch_external is the name of the database to connect to and INFORMIXSERVER is set to jj_prepsuse_a_1_t
In order to run IBM Informix across multiple nodes, there are several pre-requisites; although this article is not based on multiple nodes, only the following requirements are needed to be satisfied:

A clustered filesystem sharing the IWA data store across the nodes is required

A trust relationship is between the nodes obviating the need for a password entry
A small SBSPACE is required, less than 64MB, and the $ONCONFIG entry SBSPACENAME updated with the name of the SBSPACE; this will require an instance restart.

There is a warning produced when initialising the Accelerator informing of this:

Copy to Clipboard

Although not a specific requirement, adding a single “dwavp” to the $ONCONFIG will avoid a dynamic allocation of a dwavp when the first IWA related activity occurs.

In systems with significant activity, it is probably worthwhile having two “dwavp”s configured in the $ONCONFIG to ensure timely response to IWA activities (specifically during Mart loading, when the dwavp may be busy).

Add the following line to the $ONCONFIG:

Copy to Clipboard
The configuration file for IWA is located under the $INFORMIXDIR/dwa/etc directory, and named dwainst.conf.

Once the comments have been removed, there are surprisingly few parameters to set, however, it is worth noting that these parameters are interpreted and a much more detailed configuration is determined and placed under the storage area.

The IWA configuration file used in this example contains the following:

Copy to Clipboard

The location where the actual Mart is stored, along with various other components like log files and the detailed node configuration files (which should not be manipulated)

It is generally recommended to provide a distinct filesystem with a minimum size of at least the Mart size plus a generous excess (several GB) for potential Mart growth and supporting components

Copy to Clipboard

This is the default value, and is generally not required to be changed, although a confirmation of port availability of START_PORT-1+NUM_NODES*4 is suggested using netstat -an

Copy to Clipboard

Here this specifies the total number of COORDINATOR and WORKER processes; here, this specifies that there will be one COORDINATOR (if NUM_NODES is less than 8) and 2 WORKERs

Copy to Clipboard

This specifies the maximum amount of shared memory to allocate to the WORKER processes for storing the Data Marts’ data . A very basic rule of thumb here is to ensure that the size of all Data Marts fits within this value, in this example with a single Data Mart of 2,084, where the FACT table distribution and the DIMENSION tables do not exceed 1,280 per WORKER, 2,560 is an appropriate value. It should also be noted that WORKER processes will allocate private memory (i.e. outside of shared memory) for query processing and that this can be a significant requirement, a conservative rule of thumb is to allocate 40% of memory to WORKER_SHM and allow up to 60% of memory for query processing

Copy to Clipboard

This specifies the total amount of shared memory allocated to the COORDINATOR process. If the workload being serviced by the Accelerator performs a lot of sorting and a high amount of data extraction, this may not be sufficient. There is also a requirement for private memory allocation to service query processing.

Copy to Clipboard

This specifies the communication interface between the Accelerator and the supporting IBM Informix instance. Here “lo”, local loopback, is used, as IBM Informix and the Accelerator are running on the same host, although this could be any local network interface which would be required when running on separate hosts.

Copy to Clipboard

For a dedicated IWA host then having this value set to 100 attempts to distribute evenly ALL CPU resources across the COORDINATOR and WORKER processes for Mart loading

Copy to Clipboard

For a dedicated IWA host then having this value set to 100 attempts to distribute evenly ALL CPU resources across the COORDINATOR and WORKER processes for Mart scanning
IWA uses the system logging facility “local6”; under OpenSuse 13.1 the default placement of “local6” messages is in /var/log/messages. For clarity it is recommended to have a separate message file, under OpenSuse 13.1 this is easily achieved by performing the following:

Change the following lines in /etc/rsyslogd.conf:

Copy to Clipboard

and

Copy to Clipboard

To

Copy to Clipboard

and

Copy to Clipboard

Add the following line:

Copy to Clipboard

Restart the rsyslogd service:

Copy to Clipboard

Once this has been done a confirmation by restarting the IWA accelerator can be performed; confirm no messages in /var/log/messages and that /var/log/iwamessages is being populated. It is worth noting that there is quite a lot of potential output!
It is preferable to have a small supporting script file to perform the initialisation; the script detailed below is repeatable performing a full clear down of a single Mart, jj_tpch_mart, and removal of the Accelerator, JJDWA1 before going through initialisation.

Copy to Clipboard

On the very first initialisation, the only required components from the above script are:

Copy to Clipboard

The purpose of ondwa getpin is to populate the active SQLHOSTS file with relevant Accelerator entries and an authentication token.

For reference, the expected output from running the full script above is as follows:

Copy to Clipboard

At this point the Accelerator should be up and running, albeit with no active Data Marts so memory utilisation will not be representative of actual requirements.

Confirmation of the status of the Accelerator is performed using ondwa status, and should be similar to the following output:

Copy to Clipboard

Conclusion

Setup and initialisation of IBM Informix Warehouse Accelerator is a straight forward process, however, it is critical to understand the shared memory resource requirements that will be made by an Accelerator which are entirely dependent on Data Mart design and the number of allocated WORKER processes, along with the complexity of the query being processed which will have an impact on the private memory requirements beyond shared memory. This article attempts to provide a straight forward and clear guide to initial IWA instantiation and explain how the Accelerator configuration in conjunction with Data Mart design impacts on resource usage.

Disclaimer

The code fix suggested above is provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.

Contact us

If you have any questions or would like to find out more about Accelerator and Informix, simply contact us.

Author

  • Big Data and RDBMS Senior Technical Engineer and Unix Administrator. Informix Specialist.