DataStage EE environment variables

The default environment variable settings are defined during the DataStage installation and are common to all users.

Users can override the default settings from the DataStage client applications in several ways:

  • DataStage Administrator - project-wide defaults for general environment variables, set per project in the Projects tab under Properties -> General tab -> Environment...
  • DataStage Designer - settings at the job level, in Job Properties
  • DataStage Director - settings per run; these override all other settings and are very useful for testing and debugging.
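
    The layering described above can be pictured as a simple merge in which later layers win. The sketch below is only an illustration of that idea, not DataStage code: the function name resolve_environment, the sample values and paths are hypothetical, and the relative order of project and job settings is an assumption (the text only states that per-run settings override everything else).

```python
from typing import Dict, Optional


def resolve_environment(
    install_defaults: Dict[str, str],
    project_defaults: Optional[Dict[str, str]] = None,
    job_settings: Optional[Dict[str, str]] = None,
    run_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge the setting layers; a later layer overrides an earlier one."""
    resolved = dict(install_defaults)  # copy so the input is not mutated
    # Assumed order: project (Administrator), job (Designer), run (Director).
    for layer in (project_defaults, job_settings, run_overrides):
        if layer:
            resolved.update(layer)
    return resolved


# Hypothetical values, for illustration only.
effective = resolve_environment(
    install_defaults={"APT_DUMP_SCORE": "False", "TMPDIR": "/tmp"},
    project_defaults={"TMPDIR": "/data/tmp"},
    job_settings={"APT_DUMP_SCORE": "True"},
    run_overrides={"APT_DUMP_SCORE": "False"},
)
print(effective)
```

    In this example the per-run value of APT_DUMP_SCORE wins over the job-level value, while TMPDIR keeps the project-wide setting because no later layer touches it.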

    The DataStage environment variables are grouped, and each variable falls into one of the categories described below.
    The default values set up during installation are reasonable, and in most cases there is no need to modify them.

    Setting environment variables for parallel execution in DataStage Administrator


    Environment variables overview

    Listed below are only the environment variables that are likely candidates for adjustment in real-life project deployments. Please refer to the DataStage help for details on variables not listed here.

    General variables

  • LD_LIBRARY_PATH - specifies the location of dynamic libraries on Unix
  • PATH - Unix shell search path
  • TMPDIR - temporary directory
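
    To see what these general variables currently contain on a server, a small check along the following lines may help. It is a generic sketch that only reads the process environment; the helper name describe_general_variables is hypothetical and nothing in it is specific to DataStage.

```python
import os
from typing import Dict, List, Mapping


def describe_general_variables(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return the entries of LD_LIBRARY_PATH, PATH and TMPDIR as lists."""
    report: Dict[str, List[str]] = {}
    # Path-like variables hold several directories joined by os.pathsep.
    for name in ("LD_LIBRARY_PATH", "PATH"):
        value = env.get(name, "")
        report[name] = [entry for entry in value.split(os.pathsep) if entry]
    # TMPDIR holds a single directory (or is unset).
    tmpdir = env.get("TMPDIR")
    report["TMPDIR"] = [tmpdir] if tmpdir else []
    return report


for name, entries in describe_general_variables(os.environ).items():
    print(name, entries)
```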

    Parallel properties

  • APT_CONFIG_FILE - the parallel job configuration file. It points to the active configuration file on the server. Please refer to the DataStage EE configuration guide for more details on creating a config file.
  • APT_DISABLE_COMBINATION - prevents operators (stages) from being combined into one process. Used mainly for benchmarks.
  • APT_ORCHHOME - home path for parallel content.
  • APT_STRING_PADCHAR - defines the pad character used when a varchar is converted to a fixed-length string
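
    The effect of a pad character on a varchar-to-fixed-length conversion can be illustrated roughly as follows. This is a minimal sketch of the padding idea only: the function to_fixed_length is hypothetical, it does not read or parse APT_STRING_PADCHAR itself, and rejecting over-long values is an assumption made here because the text does not describe that case.

```python
def to_fixed_length(value: str, length: int, pad_char: str = " ") -> str:
    """Right-pad value with pad_char until it is exactly `length` characters."""
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    if len(value) > length:
        # Assumption: over-long input is rejected rather than truncated.
        raise ValueError("value is longer than the target length")
    return value.ljust(length, pad_char)


# The same varchar padded with a space and with a NUL character.
space_padded = to_fixed_length("abc", 6, " ")
nul_padded = to_fixed_length("abc", 6, "\0")
print(repr(space_padded))
print(repr(nul_padded))
```

    The two results have the same length but compare differently, which is why the choice of pad character matters when padded strings are later compared or written to a target system.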

    Operator specific

    The operator-specific variables under parallel properties are stage-specific settings and are usually set during installation. The settings apply to the supported parallel database engines (DB2, Oracle, SAS and Teradata).

  • APT_DBNAME - default DB2 database name to use
  • APT_RDBMS_COMMIT_ROWS - RDBMS commit interval

    Reporting

    The reporting variables control logging options and take True/False values only.

  • APT_DUMP_SCORE - shows operators, datasets, nodes, partitions, combinations and processes used in a job.
  • APT_RECORD_COUNTS - helps detect and analyze load imbalance. It prints the number of records consumed by getRecord() and produced by putRecord()
  • OSH_PRINT_SCHEMAS - shows unformatted metadata for all stages (interface schema) and datasets (record schema). Set this variable to verify that runtime schemas match the job design column definitions (especially for data from Oracle).
  • OSH_DUMP - shows an OSH script and produces a verbose description of a step before executing it
  • APT_NO_JOBMON - disables performance statistics and process metadata reporting in Designer.
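
    Once per-partition record counts are available (for example, read off the job log when APT_RECORD_COUNTS is enabled), load imbalance can be quantified with a simple ratio. The sketch below assumes the counts have already been collected into a dictionary by hand; it does not parse any DataStage log format, and the function name partition_skew and the sample numbers are hypothetical.

```python
from typing import Dict


def partition_skew(records_per_partition: Dict[int, int]) -> float:
    """Return max count divided by mean count; 1.0 means perfectly balanced."""
    counts = list(records_per_partition.values())
    if not counts:
        raise ValueError("at least one partition is required")
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 1.0  # no records anywhere, so nothing is imbalanced
    return max(counts) / mean


# Hypothetical record counts for a four-partition run.
counts = {0: 1000, 1: 1000, 2: 4000, 3: 2000}
skew = partition_skew(counts)
print(f"skew ratio: {skew:.2f}")
```

    A ratio well above 1.0, as in this example, suggests that one partition receives far more records than the others and that the partitioning method or key may be worth reviewing.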

    Compiler

  • APT_COMPILER - path to the C++ compiler needed to compile transformer stages

