Data Warehousing ETL tutorial

The ETL and Data Warehousing tutorial is organized into lessons representing various business intelligence scenarios, each of which describes a typical data warehousing challenge.
This guide might be considered as an ETL process and Data Warehousing knowledge base with a series of examples illustrating how to manage and implement the ETL process in a data warehouse environment.

The purpose of this tutorial is to outline and analyze the most widely encountered real life datawarehousing problems and challenges that need to be taken during the design and architecture phases of a successful data warehouse project deployment.

Going through the sample implementations of the business scenarios is also a good way to compare Business Intelligence and ETL tools and get to know the different approaches to designing the data integration process. This also gives an idea and helps identify strong and weak points of various ETL and data warehousing applications.

This tutorial shows how to use the following BI, ETL and datawarehousing tools: Datastage, SAS, Pentaho, Cognos and Teradata.

Data Warehousing & ETL Tutorial lessons

  • Surrogate key generation example which includes information on business keys and surrogate keys and shows how to design an ETL process to manage surrogate keys in a data warehouse environment. Sample design in Pentaho Data Integration
  • Header and trailer processing - considerations on processing files arranged in blocks consisting of a header record, body items and a trailer. This type of files usually come from mainframes, also it applies to EDI and EPIC files. Solution examples in Datastage, SAS and Pentaho Data Integration
  • Loading customers - a data extract is placed on an FTP server. It is copied to an ETL server and loaded into the data warehouse. Sample loading in Teradata MultiLoad
  • Data allocation ETL process case study for allocating data. Examples in Pentaho Data Integration and Cognos PowerPlay
  • Data masking and scambling algorithms and ETL deployments. Sample Kettle implementation
  • Site traffic analysis - a guide to creating a data warehouse with data marts for website traffic analysis and reporting. Sample design in Pentaho Kettle
  • Real time ETL with Kafka streaming - a guide to transition from traditional ETL to modern real-time streams. Sample design in Pentaho Kettle
  • Data Quality - ETL process design aimed to test and cleanse data in a Data Warehouse. Sample outline in PDI
  • XML ETL processing and parsing example


2009-09-04 05:41:00 by behramkhan:
very informative web site both for students and faculty.
2011-02-24 16:30:43 by Ed:
Are Kettle and PDI and spoon the same thing?
2011-03-15 12:38:18 by ObjectiveC:
PDI = Pentaho Data Integration which is by default Kettle (but you could change this by replacing it with your tool of choice). Furthermore Spoon is part of Kettle. Kettle consists of more than just spoon alone (Pan, Kitchen, ...) .
Cheers ObjectiveC !
Find me on the pentaho forum ^^
2012-03-06 19:41:42 by Vasant Rao:
I would like like to see and learn the ETL tools and Data Warehousing tutorial.