Extending InfoSphere DataStage

One of the great features of the DataStage Parallel engine is that it can be extended with custom components. This article describes the options for building a DataStage PX stage that handles special processing needs not covered by the native stages.

Why build a custom stage?
The main reasons are to implement complex business logic that is not easily accomplished with the standard DataStage stages, or to reuse existing C, C++, Java, or COBOL transformations.

Both BuildOPS and CustomOPS are primarily C++ code.

Wrappers

Wrappers are a good choice if you cannot or do not want to modify the application and performance is not critical.
An OS-level legacy executable can be wrapped and turned into a DataStage PX stage (capable of parallel execution within the framework). Examples of what can be wrapped: a binary file, a Unix command (ls, grep, etc.), or a shell script.

There are a few conditions that the legacy executable needs to fulfill:

InfoSphere DataStage treats a wrapper as a black box: it has no knowledge of the wrapper's contents and no means of managing anything that occurs inside it. It only knows how to export data to and import data from the wrapper. It is therefore the user's task to know, at design time, the intended behavior of the wrapper and its schema interface.

Example: wrapping the Unix ls command:
ls /dwdev/sourcedata yields a list of files and subdirectories. The wrapper thus consists of the command and a parameter that contains a disk location.

Buildops

Buildops are a good choice if users need custom coding but do not need dynamic (runtime-based) input and output interfaces.
A buildop provides a simple means of extending the functionality provided by PX but, unlike a wrapper, it does not use an existing executable.

The DataStage interface called buildop automatically performs the tedious, error-prone tasks required to compile the program, such as including the needed header files and building the necessary 'plumbing' for correct and efficient parallel execution.
The code needs to be ANSI C/C++ compliant. If the code does not compile outside of DataStage, it will not compile within DataStage PX either. The good thing is that BuildOPS are compiled by DataStage itself.
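Because the code must compile outside DataStage anyway, one workable habit is to write the per-record logic as a plain ANSI C++ function and check it with an ordinary compiler first. The sketch below is an illustration only: the net_amount function and its rounding rule are hypothetical. The record handling generated by the buildop is deliberately left out, so only the calculation appears.

```cpp
#include <cmath>

// Hypothetical per-record calculation: derive the net amount from a gross
// amount and a tax rate (0.2 means 20%), rounded to two decimal places.
double net_amount(double gross, double tax_rate) {
    const double net = gross / (1.0 + tax_rate);
    return std::round(net * 100.0) / 100.0;
}
```

Once the function behaves as expected on its own, the same body can be carried into the buildop definition, where the generated plumbing supplies the input fields and collects the result.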

Custom stage

A custom stage (C++ code written against the framework API) is used when there is a need for both custom coding and dynamic input and output interfaces.
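The sketch below illustrates what a dynamic interface means in practice. It does not use the DataStage framework API, whose classes are not shown in this article. It is plain C++ with hypothetical names (Column, Record, trim_all_columns), in which the column names are carried as data so the same code works for whatever columns arrive at run time.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical runtime description of a record: column names are only known
// when the job runs, so they are carried as data, not as struct members.
using Column = std::pair<std::string, std::string>;  // (name, value)
using Record = std::vector<Column>;

// Trims leading and trailing spaces from every column value, regardless of
// how many columns there are or what they are called.
Record trim_all_columns(const Record& input) {
    Record output;
    output.reserve(input.size());
    for (const Column& col : input) {
        const std::string& value = col.second;
        const std::string::size_type first = value.find_first_not_of(' ');
        if (first == std::string::npos) {
            output.emplace_back(col.first, "");  // value was all spaces
        } else {
            const std::string::size_type last = value.find_last_not_of(' ');
            output.emplace_back(col.first,
                                value.substr(first, last - first + 1));
        }
    }
    return output;
}
```

A buildop, by contrast, fixes its input and output columns at design time, which is why logic of this shape calls for a custom stage.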

CustomOPS vs BuildOPS