ETL process and concepts
ETL stands for extraction, transformation and loading. ETL is a process that involves the following tasks:
- extracting data from source operational or archive systems, which are the primary source of data for the data warehouse
- transforming the data - which may involve cleaning, filtering, validating and applying business rules
- loading the data into a data warehouse or any other database or application that houses data
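The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the CSV source, the `sales` table and the business rule (a name is required) are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an operational system export.
SOURCE_CSV = """customer_id,name,country,amount
1,Alice,US,100.50
2,Bob,,42.00
3,,DE,7.25
"""

def extract(text):
    """Extract: read raw rows from the source system (here, a CSV export)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean, validate and apply business rules to the rows."""
    cleaned = []
    for row in rows:
        if not row["name"]:                           # validation: drop rows without a name
            continue
        row["country"] = row["country"] or "UNKNOWN"  # cleansing: fill in defaults
        row["amount"] = float(row["amount"])          # type conversion
        cleaned.append(row)
    return cleaned

def load(rows, conn):
    """Load: insert the cleaned rows into the target warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(customer_id INT, name TEXT, country TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (:customer_id, :name, :country, :amount)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
# (2, 142.5) -- one invalid row was filtered out during transformation
```

Real ETL platforms add scheduling, logging, error handling and restartability on top of this basic extract-transform-load flow.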
The ETL process is also often referred to as a Data Integration process, and an ETL tool as a Data Integration platform.
The terms closely related to and managed by ETL processes are: data migration, data management, data cleansing, data synchronization and data consolidation.
The main goal of maintaining an ETL process in an organization is to migrate and transform data from the source OLTP systems to feed a data warehouse and form data marts.
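Feeding data marts from a warehouse usually means building subject-oriented aggregates. A minimal sketch of that idea, again using an in-memory SQLite database with made-up table names (`warehouse_sales`, `mart_sales_by_country`) and sample figures:

```python
import sqlite3

# Hypothetical warehouse table, already loaded from the source OLTP systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?)",
    [("US", 100.0), ("US", 50.0), ("DE", 30.0)],
)

# A data mart is a subject-oriented slice of the warehouse;
# here, total sales per country for a sales department mart.
conn.execute(
    "CREATE TABLE mart_sales_by_country AS "
    "SELECT country, SUM(amount) AS total "
    "FROM warehouse_sales GROUP BY country"
)
print(conn.execute(
    "SELECT * FROM mart_sales_by_country ORDER BY country").fetchall())
# [('DE', 30.0), ('US', 150.0)]
```

In practice the mart would be refreshed on a schedule by the ETL tool rather than rebuilt ad hoc like this.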
At present the most popular and widely used ETL tools and applications on the market are:
- IBM WebSphere DataStage (formerly known as Ascential DataStage and Ardent DataStage)
- Oracle Warehouse Builder
- Pentaho Data Integration - Kettle Project (open source ETL)
- SAS ETL Studio
- Business Objects Data Integrator (BODI)
- Microsoft SQL Server Integration Services (SSIS)
ETL process references
Detailed step-by-step instructions on how the ETL process can be implemented in IBM WebSphere DataStage can be found in the
Implementing ETL in DataStage tutorial lesson.
The ETL process can also be successfully implemented using an open source ETL tool - Kettle. There is a separate Pentaho Data Integration section on our pages:
Kettle ETL transformations