|
ETL-Tools.Info
|
Business Intelligence - Data warehousing - ETL |
|
Processing a header and trailer textfile in Pentaho Data Integration (PDI)Example case
The input file in our scenario is a stream of records representing invoices. It is a text file in a header-trailer format and has the following structure: Pentaho Spoon solutionThe final header and trailer transform designed in Pentaho Data Integration is depicted below: ![]() Detailed analysis of all components of the transform and the ETL processExtract with the invoices: ![]() var prev_row; var invc_header_id; var invc_line_id; if (prev_row == null) prev_row =
row; var prevInvHead = prev_row.getString("data_line","-"); if (kind.getString()=='H') { invc_header_id = substr(data_line.getString(),0,8); //Populate invoice header fields var cust_id = data_line.Clone().substr(8,14).trim(); var dte = data_line.Clone().substr(14,22).trim(); var crncy = data_line.Clone().substr(22,25).trim(); } if (kind.getString()!='I') { invc_line_id=0; } else { invc_line_id++; //populate invoice lines fields var prod_id = data_line.Clone().substr(1,7).trim(); var prod_age = data_line.Clone().substr(7,9).trim(); var quantity = data_line.Clone().substr(9,16).trim(); var net_value = data_line.Clone().substr(16,27).trim().str2num("0.##",".",""); } PDI header and trailer mapping javascript object: ![]() PDI header and trailer mapping javascript object: ![]() ETL process execution log: ![]() Contents of the invc_headers.csv and invc_items.csv output files invc_header_id;cust_id;dte;crncy INV01696;R006PL;20080316;EUR INV01698;W002UK;20080317;GBP invc_header_id;invc_line_id;prod_id;prod_age;quantity;net_value INV01696;1;CONU01;3;5;22,47 INV01696;2;TRFO00;9;1;49,99 INV01698;1;HOBE01;0;100;1199,99 INV01698;2;RORE02;12;10;149,99 INV01698;3;LOMA02;6;1;49,99 INV01698;4;CONU01;3;50;199,49
|
|
All Rights Reserved |
Add a comment