Abstract
This second article in the Easy IWA series details how to create an Accelerator under IBM Informix 12.10.FC4 Workgroup and Advanced editions. Worked examples to establish initial memory requirements are provided, which is key to effective and efficient resource usage and ensuring that initial provisions are reasonable. The Accelerator provides the interface between IBM Informix and the Warehouse Accelerator, performing mart administration and query processing.
Content
Before creating and initialising an accelerator there are several considerations that should be made and requirements established and satisfied:
By limiting the
FACT
and DIMENSION
table columns to only what is required, memory and offline storage requirements are significantly reduced.Understanding what tables are candidates for the
FACT
, FACT|DIMENSION
(aka HYBRID
) and DIMENSION
may result in reducing the scope of queries that can be satisfied to reduce memory and storage footprint; for example, is a full set of address details really relevant for a Data Warehouse query? Perhaps just the POSTCODE
would be more appropriate, saving a significant amount of memory and filesystem resource.Using the TPCH database from the aforementioned previous article as a Mart, the tables below provide details on actual IWA memory and offline storage requirements.
The first table provides details on the number of rows produced for each Scale Factor used for the TPCH dbgen utility, along with whether a table is a
DIMENSION
or a FACT
table (for a query to be able to be processed by IWA, there has to be at least one FACT
table).The second table shows the actual IDS storage used for each scale factor along with the IWA Data Mart memory requirements, compared to the storage requirements when all
“X_comment”
columns are removed.As can be seen, nearly 4 times the data (48,000,000
lineitems
compared to 12,000,000 lineitems
) can be held in an accelerator with 2,560 MB of memory allocated to 2 WORKERS
by not storing the “X_comment”
columns. Note that DIMENSION
tables are held in full for each WORKER
and FACT
tables are distributed across all WORKER
s and filesystem storage will be of a similar size, so ensure that there is adequate storage (preferably a dedicated filesystem) for offline MART
storage.Scale Factor | REGION | NATION | SUPPLIER | PART | PARTSUPP | CUSTOMER | ORDERS | LINEITEM |
---|---|---|---|---|---|---|---|---|
Dimension | Dimension | Dimension | Dimension | Fact | Dimension | Dimension | Fact | |
SF=1 NRows | 5 | 25 | 10,000 | 200,000 | 800,000 | 150,000 | 1,500,000 | 6,000,000 |
SF=2 NRows | 5 | 25 | 20,000 | 400,000 | 1,600,000 | 300,000 | 3,000,000 | 12,000,000 |
SF=4 Nrows | 5 | 25 | 40,000 | 800,000 | 3,200,000 | 600,000 | 6,000,000 | 24,000,000 |
SF=8 Nrows | 5 | 25 | 80,000 | 1,600,000 | 6,400,000 | 1,200,000 | 12,000,000 | 48,000,000 |
SF=1 All Columns | SF=1 No X_comment | SF=2 All Columns | SF=2 No X_comment | SF=4 No X_comment | SF=8 No X_comment | |
---|---|---|---|---|---|---|
IDS Storage MB | 1,129 | 764 | 2,260 | 1,528 | 3,056 | 6,112 |
IWA Memory MB | 825 | 309 | 1,671 | 588 | 1,065 | 2,084 |
IWA / IDS storage ratio | 0.73 | 0.40 | 0.74 | 0.38 | 0.35 | 0.34 |
Kernel parameters
SHMMAX => Set to at least the shared memory requirements of a WORKER process for the Data Mart data, in this case 4GB; note that a WORKER process requires enough shared memory to hold DIMENSION tables in full and a portion of the FACT table (i.e. shared memory size of FACT table / number of WORKERS). From the above chart it can be seen that 4 GB is adequate to hold the complete Data Mart
Get the current value using:
Set the value using:
Alternatively, set it permanently in /etc/sysctl.conf by adding the following line
SHMALL => Set to at the size of available RAM, in this case 4GB, bearing in mind other requirements like “huge_pages” may limit actual available shared memory
Get the current value using:
Set the value using:
Alternatively, set it permanently in /etc/sysctl.conf by adding the following line
If running on very limited memory appliances, some consideration should be made to “overcommit_memory” and “overcommit_ratio”, especially if encountering OOM killer!
Increase the RAM based tmpfs filesystem
Review the /etc/fstab file, and ensure there is an entry similar to the following:
Note that the default is usually half the actual RAM available, for smooth running of IBM Informix Warehouse Accelerator, set this to be the same as the available RAM
and issue, as root, mount –o remount /dev/shm to enable this new size
$INFORMIXDIR/extend/krakatoa/jre/bin/java
Under
$INFORMIXDIR/dwa/examples/cli/source
are several source java files which have been built into the $INFORMIXDIR/dwa/example/cli/dwacli.jar
; adding this to $CLASSPATH
means that a set of IWA administration tools are available. In order to run, a connection file is required, conn.prop
and an example is provided in this directory.The CLASSPATH used for this environment is set to:
$INFORMIXDIR/jdbc/lib/ifxjdbc.jar:$INFORMIXDIR/dwa/example/cli/dwacli.jar:
As an example of usage, to get information about an active mart
jj_tpch_mart
defined in the accelerator JJIWA1
using the connection file $INFORMIXDIR/dwa/example/cli/conn.prop
, run the following noting that jj_tpch_external
is the database within the IBM Informix instance that is being connected to:The connection file,
conn.prop
, contains the following:Where jj-prepsuse-a-lan is the hostname, 1525 is the port where IBM Informix is listening, jj_tpch_external is the name of the database to connect to and INFORMIXSERVER is set to jj_prepsuse_a_1_t
A clustered filesystem sharing the IWA data store across the nodes is required
A trust relationship is between the nodes obviating the need for a password entry
SBSPACE
is required, less than 64MB, and the $ONCONFIG
entry SBSPACENAME
updated with the name of the SBSPACE
; this will require an instance restart.There is a warning produced when initialising the Accelerator informing of this:
Although not a specific requirement, adding a single “dwavp” to the
$ONCONFIG
will avoid a dynamic allocation of a dwavp when the first IWA related activity occurs.In systems with significant activity, it is probably worthwhile having two “dwavp”s configured in the
$ONCONFIG
to ensure timely response to IWA activities (specifically during Mart loading, when the dwavp may be busy).Add the following line to the
$ONCONFIG
:$INFORMIXDIR/dwa/etc
directory, and named dwainst.conf
.Once the comments have been removed, there are surprisingly few parameters to set, however, it is worth noting that these parameters are interpreted and a much more detailed configuration is determined and placed under the storage area.
The IWA configuration file used in this example contains the following:
The location where the actual Mart is stored, along with various other components like log files and the detailed node configuration files (which should not be manipulated)
It is generally recommended to provide a distinct filesystem with a minimum size of at least the Mart size plus a generous excess (several GB) for potential Mart growth and supporting components
This is the default value, and is generally not required to be changed, although a confirmation of port availability of
START_PORT-1+NUM_NODES*4
is suggested using netstat -an
Here this specifies the total number of
COORDINATOR
and WORKER
processes; here, this specifies that there will be one COORDINATOR
(if NUM_NODES
is less than 8) and 2 WORKERs
This specifies the maximum amount of shared memory to allocate to the
WORKER
processes for storing the Data Marts’ data . A very basic rule of thumb here is to ensure that the size of all Data Marts fits within this value, in this example with a single Data Mart of 2,084, where the FACT table distribution and the DIMENSION tables do not exceed 1,280 per WORKER, 2,560 is an appropriate value. It should also be noted that WORKER
processes will allocate private memory (i.e. outside of shared memory) for query processing and that this can be a significant requirement, a conservative rule of thumb is to allocate 40% of memory to WORKER_SHM and allow up to 60% of memory for query processingThis specifies the total amount of shared memory allocated to the
COORDINATOR
process. If the workload being serviced by the Accelerator performs a lot of sorting and a high amount of data extraction, this may not be sufficient. There is also a requirement for private memory allocation to service query processing.This specifies the communication interface between the Accelerator and the supporting IBM Informix instance. Here “lo”, local loopback, is used, as IBM Informix and the Accelerator are running on the same host, although this could be any local network interface which would be required when running on separate hosts.
For a dedicated IWA host then having this value set to 100 attempts to distribute evenly ALL CPU resources across the
COORDINATOR
and WORKER
processes for Mart loadingFor a dedicated IWA host then having this value set to 100 attempts to distribute evenly ALL CPU resources across the
COORDINATOR
and WORKER
processes for Mart scanning/var/log/messages
. For clarity it is recommended to have a separate message file, under OpenSuse 13.1 this is easily achieved by performing the following:Change the following lines in
/etc/rsyslogd.conf
:and
To
and
Add the following line:
Restart the
rsyslogd
service:Once this has been done a confirmation by restarting the IWA accelerator can be performed; confirm no messages in
/var/log/messages
and that /var/log/iwamessages
is being populated. It is worth noting that there is quite a lot of potential output!jj_tpch_mart
, and removal of the Accelerator, JJDWA1
before going through initialisation.On the very first initialisation, the only required components from the above script are:
The purpose of
ondwa getpin
is to populate the active SQLHOSTS
file with relevant Accelerator entries and an authentication token.For reference, the expected output from running the full script above is as follows:
At this point the Accelerator should be up and running, albeit with no active Data Marts so memory utilisation will not be representative of actual requirements.
Confirmation of the status of the Accelerator is performed using ondwa status, and should be similar to the following output:
Conclusion
Setup and initialisation of IBM Informix Warehouse Accelerator is a straight forward process, however, it is critical to understand the shared memory resource requirements that will be made by an Accelerator which are entirely dependent on Data Mart design and the number of allocated WORKER
processes, along with the complexity of the query being processed which will have an impact on the private memory requirements beyond shared memory. This article attempts to provide a straight forward and clear guide to initial IWA instantiation and explain how the Accelerator configuration in conjunction with Data Mart design impacts on resource usage.
Disclaimer
The code fix suggested above is provided “as is” without warranty of any kind, either express or implied, including without limitation any implied warranties of condition, uninterrupted use, merchantability, fitness for a particular purpose, or non-infringement.
Contact us
If you have any questions or would like to find out more about Accelerator and Informix, simply contact us.