Pentaho Data Integration - Kettle ETL tool

Kettle (K.E.T.T.L.E - Kettle ETTL Environment) has recently been acquired by the Pentaho group and renamed Pentaho Data Integration. Kettle is a leading open source ETL application. Although it is classified as an ETL tool, Kettle slightly modifies the classic ETL process (extract, transform, load): its process is composed of four elements, ETTL, which stands for:

  • Data extraction from source databases
  • Transport of the data
  • Data transformation
  • Loading of data into a data warehouse
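
The four ETTL stages listed above can be illustrated with a minimal Python sketch. This is not Kettle code and does not use any Kettle API: the table names (orders, fact_orders), the sample rows, and the function names are all assumptions made for illustration, and two in-memory SQLite databases stand in for the source database and the data warehouse.

```python
import sqlite3


def extract(source):
    """Extraction: read raw rows from the source database."""
    return source.execute("SELECT id, name, amount FROM orders").fetchall()


def transport(rows):
    """Transport: move the rows to where they will be processed.
    Here an in-memory copy stands in for a real network transfer."""
    return list(rows)


def transform(rows):
    """Transformation: normalise names and drop rows without an amount."""
    return [
        (row_id, name.strip().upper(), round(amount, 2))
        for row_id, name, amount in rows
        if amount is not None
    ]


def load(warehouse, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_ettl():
    # Hypothetical source database with a few sample rows.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " alice ", 10.5), (2, "bob", None), (3, "carol", 7.25)],
    )

    # Hypothetical data warehouse with an empty target table.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (id INTEGER, name TEXT, amount REAL)"
    )

    # Run the four stages in order.
    load(warehouse, transform(transport(extract(source))))
    return warehouse.execute(
        "SELECT id, name, amount FROM fact_orders ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_ettl())
```

In a real deployment the transport stage would typically move data between machines, while in this sketch it is only a placeholder that marks where that step sits in the sequence.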

    Kettle is a set of tools and applications that allow data manipulation across multiple sources.
    The main components of Pentaho Data Integration are:
  • Spoon - a graphical tool which makes ETTL transformations easy to design. It performs the typical data flow functions such as reading, validating, refining, transforming and writing data to a variety of data sources and destinations. Transformations designed in Spoon can be run with Pan and Kitchen.
  • Pan - an application dedicated to running data transformations designed in Spoon.
  • Chef - a tool for creating jobs which automate complex database update processes.
  • Kitchen - an application which executes jobs in batch mode, usually on a schedule, which makes it easy to start and control ETL processing.
  • Carte - a web server which allows remote monitoring of the running Pentaho Data Integration ETL processes through a web browser.
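
As a rough illustration of how Pan and Kitchen fit into batch processing, the Python sketch below builds the command lines a scheduler might invoke. The script names (pan.sh, kitchen.sh), the Unix-style -file= and -level= options, and the file paths are assumptions here; option syntax differs between platforms and versions, so check the documentation of the installed release before relying on it.

```python
import subprocess


def pan_command(ktr_path, level="Basic", pan="pan.sh"):
    """Build a command line that runs one transformation file with Pan.
    The -file= and -level= option spelling is assumed (Unix-style)."""
    return [pan, f"-file={ktr_path}", f"-level={level}"]


def kitchen_command(kjb_path, level="Basic", kitchen="kitchen.sh"):
    """Build a command line that runs one job file with Kitchen.
    The -file= and -level= option spelling is assumed (Unix-style)."""
    return [kitchen, f"-file={kjb_path}", f"-level={level}"]


def run(command):
    """Run the command and return its exit code so that a scheduler
    can decide whether the run succeeded."""
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical paths; replace them with real transformation and job files.
    print(pan_command("/etl/load_customers.ktr"))
    print(kitchen_command("/etl/nightly_update.kjb"))
```

Building the argument list separately from running it keeps the sketch testable without a Kettle installation; a cron entry or similar scheduler would then call run() on the result.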

    Currently, the data sources and supported databases in Kettle ETL are:
    - Any database using ODBC on Windows
    - Oracle
    - MySQL
    - AS/400
    - MS Access
    - MS SQL Server
    - IBM DB2
    - PostgreSQL
    - Intersystems Caché
    - Informix
    - Sybase
    - dBase
    - Firebird SQL
    - MaxDB (SAP DB)
    - Hypersonic
    - CA Ingres
    - SAP R/3 System (using the ProSAPCONN plugin)