Informatica mapping development
The article provides a checklist of topics to consider during the Informatica ETL development development project and covers a variety of tips, guidelines and things to consider before proceeding with the development.
General ETL development tips
Before designing a mapping, it is important to have a clear picture of the end-to-end processes that the data will flow through.
It is a good practice to create a high-level view of the mapping first and document a picture of the process with the mapping, using a textual description to explain exactly what the mapping is supposed to accomplish and the methods or steps it will follow to accomplish its goal.
Next, document the details at the field level, listing each of the target fields and the source fields that are used to create the target field, along with the transformations used to create this field (for example: a sum calculation, a concatenation of two fields, a comparison of two fields, etc.). At this point the designer may have to do some investigation for some business rules with business guys.
Create an inventory of Mappings and Reusable objects (mapplets, worklets). This will be a 'work in progress' list and will have to be continually updated (this is particularly valuable for the lead developer).
The administrator or lead developer should gather all of the potential Sources, Targets and Reusable objects and place these in a shared folder accessible to all who may need access to them.
As for Reusable objects, they need to be properly documented to make it easier for other developers to determine if they can be re-used.
As a developer the specifications for a mapping should include required Sources, Targets and additional information regarding derived ports and finally how the ports relate from the source to the target.
Document any other information about the mapping that is likely to be helpful in developing the mapping. This may, for example, include source and target database connection information (database schema owners, passwords and connect strings), lookups and how to match data in the lookup tables, data cleansing needed at a field level, potential data issues at a field level, any known issues with particular fields, pre or post mapping processing requirements, and any information about specific error handling for the mapping.
The completed mapping design should then be reviewed with one or more team members for completeness and adherence to the business requirements. In addition, the design document should be updated if the business rules change or if more information is gathered during the build process.
Informatica-specific mapping development guidelines
One of the first things to do is to bring in all required source and target objects into the mapping.
Only connect fields that are needed or will be used.
Only connect from the Source Qualifier those fields needed subsequently.
Filter early and often. Only manipulate data that needs to be moved and transformed. Reduce the number non-essential records that are passed through the mapping.
Decide if a Source Qualifier join will net the result needed versus creating a Lookup to retrieve desired results.
Reduce the number of transformations. Excessive number of transformations will increase overhead.
Consider increasing the shared memory when using a large number of transformations.
Make use of variables, local or global, to reduce the number of times functions will have to be used.
Watch the data types. The Informatica engine converts compatible data types automatically. Excessive number of conversions is inefficient.
Make use of variables, reusable transformations and mapplets for reusable code. These will leverage the work done by others.
Use active transformations early in the process to reduce the number of records as early in the mapping as possible.
When joining sources, select appropriate driving and master table.
Utilize single pass reads. Design mappings to utilize one Source Qualifier to populate multiple targets.
Remove or reduce field-level stored procedures. These will be executed for each record and slow performance.