Datastage EE configuration file

The Datastage EE configuration file is a master control file (a text file stored on the server side) for Enterprise Edition jobs that describes the parallel system resources and architecture. It provides the hardware configuration for architectures such as SMP (a single machine with multiple CPUs, shared memory and disk), grid, cluster, or MPP (multiple CPUs across multiple nodes, with dedicated memory per node).

The configuration file defines all processing and storage resources and can be edited with any text editor or within Datastage Manager.
The main benefit of the configuration file is that it separates hardware and software configuration from job design, so resources can be changed without changing the job. Datastage EE jobs can point to different configuration files by using job parameters, which means that a job can run on different hardware architectures without being recompiled.

The Datastage EE configuration file is specified at runtime by the $APT_CONFIG_FILE environment variable.

Configuration file structure

The Datastage EE configuration file defines the number of nodes, assigns resources to each node, and provides advanced resource optimizations and configuration.
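As a rough illustration of that structure, the sketch below writes a minimal single-node configuration to a temporary file. The node name, host name and directory paths are placeholders chosen for this example, and the keywords simply follow the ones used in the sample files in this tutorial.

```shell
# Minimal sketch: write a one-node configuration file.
# All names and paths are hypothetical placeholders, not values from a real system.
#   node             - logical name of a processing node
#   fastname         - host name (or IP address) of the machine running the node
#   pool             - node pools the node belongs to; "" is the default pool
#   resource disk    - directory used for data storage
#   resource scratchdisk - directory used for temporary (scratch) space
CONFIG_FILE="$(mktemp)"

cat > "$CONFIG_FILE" <<'EOF'
{
    node "node1"
    {
        fastname "localhost"
        pool ""
        resource disk "/tmp/ds/data" { }
        resource scratchdisk "/tmp/ds/scratch" { }
    }
}
EOF

echo "Wrote $CONFIG_FILE"
```

Adding a node means adding another node block inside the outer braces, each with its own fastname, pools and resources.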

Sample configuration files

Configuration file for a simple SMP

A basic configuration file for a single-machine, two-node server (2 CPUs) defines 2 nodes (dev1 and dev2) on a single etltools-dev server (an IP address may be given instead of a hostname) with 3 disk resources (d1 and d2 for data, and temp as scratch space).

The configuration file is shown below:

{
	node "dev1"
	{
		fastname "etltools-dev"
		pool ""
		resource disk "/data/etltools-tutorial/d1" { }
		resource disk "/data/etltools-tutorial/d2" { }		
		resource scratchdisk "/data/etltools-tutorial/temp" { }
	}

	node "dev2"
	{
		fastname "etltools-dev"
		pool ""
		resource disk "/data/etltools-tutorial/d1" { }
		resource scratchdisk "/data/etltools-tutorial/temp" { }
	}	
}

Configuration file for a cluster / MPP / grid

A sample configuration file for a cluster or grid of 4 machines is shown below.
The configuration defines 4 nodes (prod[1-4]), one on each of the hosts etltools-prod[1-4], node pools (n[1-4] and s[1-4]), the resource pools bigdata and sort, and temporary (scratch) space.

{
	node "prod1"
	{
		fastname "etltools-prod1"
		pool "" "n1" "s1" "tutorial2" "sort"
		resource disk "/data/prod1/d1" {}
		resource disk "/data/prod1/d2" {"bigdata"}		
		resource scratchdisk "/etltools-tutorial/temp" {"sort"}
	}

	node "prod2"
	{
		fastname "etltools-prod2"
		pool "" "n2" "s2" "tutorial1"
		resource disk "/data/prod2/d1" {}
		resource disk "/data/prod2/d2" {"bigdata"}		
		resource scratchdisk "/etltools-tutorial/temp" {}
	}

	node "prod3"
	{
		fastname "etltools-prod3"
		pool "" "n3" "s3" "tutorial1"
		resource disk "/data/prod3/d1" {}
		resource scratchdisk "/etltools-tutorial/temp" {}
	}

	node "prod4"
	{
		fastname "etltools-prod4"
		pool "n4" "s4" "tutorial1"
		resource disk "/data/prod4/d1" {}
		resource scratchdisk "/etltools-tutorial/temp" {}
	}
}
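
With several nodes spread over several hosts, it can be handy to see at a glance which node runs on which machine. The helper below is a small sketch, not part of Datastage: list_nodes is a hypothetical function name, and it assumes the layout used in the samples above, with each node and fastname keyword on its own line.

```shell
# Sketch of a helper that prints "node -> host" for each node in a configuration file.
# list_nodes is a hypothetical name; it relies on "node" and "fastname"
# each appearing as the first word on their own line.
list_nodes() {
    awk '
        # Remember the most recent node name, with the quotes stripped.
        $1 == "node"     { gsub(/"/, "", $2); node = $2 }
        # When the host name follows, print the pair.
        $1 == "fastname" { gsub(/"/, "", $2); print node " -> " $2 }
    ' "$1"
}
```

It takes the path of a configuration file as its only argument, for example list_nodes "$APT_CONFIG_FILE".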

Validate configuration file

The easiest way to validate the configuration file is to export the APT_CONFIG_FILE variable, pointing it to the newly created configuration file, and then issue the following command:
orchadmin check
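
Put together, the two steps might look like the sketch below. The file path is a hypothetical placeholder, and the guard around orchadmin is an addition for this example so that the script reports a missing tool instead of failing with a shell error.

```shell
# Hypothetical path: replace it with the location of the file you just created.
export APT_CONFIG_FILE="/opt/etl/configs/new_config.apt"

# Run the check only if the parallel engine tools are on the PATH.
if command -v orchadmin >/dev/null 2>&1; then
    orchadmin check
else
    echo "orchadmin not found on PATH; set up the Datastage environment first" >&2
fi
```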