FTP copy and load customers extract
Goal
Load customers data into a data warehouse according to the business requirements.The customers data details is very often changed in a source system and we want to reflect those changes in the data warehouse. We also want to be able to keep track of that changes.
Scenario overview and details
The customers data is extracted from the source system on a monthly basis and placed on a FTP server. The ETL process needs to get the file locally and load it into the data warehouse. There are several possible scenarios which need to be included in the process:- a new customer is added
- a customer already exists and needs to be updated
- an existing customer remains unchanged
- a record may be invalid
Additionally, we want to keep track of the changes made to the customers data and be able to see all the changes to a given record in time. To handle historical data we will use the Type 4 Slowly Changing Dimension. SCD Type 4 means that the changed or deleted data is stored in a separate table. A timestamp and a change index will be added to handle records which change more than once.
Input data:
- D_CUSTOMER - data warehouse table with customers
- Dwh_cust_extract_ 2411.txt - customers extract for the current month
Output data:
- D_CUSTOMER - updated table which stores only current records.
- D_CUSTOMER_HIST table - table with historical data for the customers dimension. It stores deleted records and customers that have already been updated.
- Cust_errors_241121.txt - a log file with loading errors
Proposed solution
The design of an ETL process flow for the customers loading will be as follows:
a. If a record is malformed and does not pass validation, it is redirected to a reject flow.
b. If the lookup does not match records, it means that this is a new customer and it needs to be loaded into the customers table.
c. If the lookup matches, we need to compare the non-key fields to check if the customer details have changed. There are two options available: all fields remain the same (then we leave the record as it is and proceed to the next one) or a field has changed. In that case the current record needs to be insterted into the historical table and replaced by a new one in the main customers table.
Implementation
Loading customers ETL process implementation in various environments: