ETL process and concepts

ETL stands for extraction, transformation and loading. Etl is a process that involves the following tasks:

The ETL process is also very often referred to as Data Integration process and ETL tool as a Data Integration platform.
The terms closely related to and managed by ETL processes are: data migration, data management, data cleansing, data synchronization and data consolidation.

The main goal of maintaining an ETL process in an organization is to migrate and transform data from the source OLTP systems to feed a data warehouse and form data marts.

ETL Tools

At present the most popular and widely used ETL tools and applications on the market are:

  • IBM Websphere DataStage (Formerly known as Ascential DataStage and Ardent DataStage)
  • Informatica PowerCenter
  • Oracle ETL
  • Ab Initio
  • Pentaho Data Integration - Kettle Project (open source ETL)
  • SAS ETL studio
  • Cognos Decisionstream
  • Business Objects Data Integrator (BODI)
  • Microsoft SQL Server Integration Services (SSIS)

    ETL process references

    A detailed step-by-step instructions on how the ETL process can be implemented in IBM Websphere Datastage can be found in the Implementing ETL in DataStage tutorial lesson.

    Etl process can be also successfully implemented using an open source ETL tool - Kettle. There is a separate Pentaho Data Integration section on our pages: Kettle ETL transformations