|
ETL-Tools.Info
|
Business Intelligence - Data warehousing - ETL |
|
Processing a header and trailer textfile in SAS / BASEExample case
The input file in our scenario is a stream of records representing invoices. It is a text file in a header-trailer format and has the following structure: SAS solutionSAS is a powerful application when dealing with unusually structured files. To prove this we will use only 2 simple SAS Base scripts to process the input file organized in header and trailer format. /* Read the entire header and trailer
input file into a ht_input dataset Two fields will be filled: - one with
type of record (kind) - other with
a complete dataline (starting from the second
character) */ data ht_input;
format kind $1.;
format dataline $256.; /* Despite the fact that there are no
commas in the extract, we use it as a separator to tell SAS not to use a default
blank space as a separator */
infile 'D:\data\etl-examples\in_invoices.txt' MISSOVER
dlm=',';
input kind $ 1
dataline $; run; SAS header and trailer input loaded into a dataset: ![]() * Columns definition for 3 output sas datasets: headers, items and trailers
; data headers(keep=invc_header_id
cust_id dte crncy)
items (keep=invc_header_id invc_line_id prod_id prod_age
quantity net_value)
trailers (keep=invc_header_id no_lines total_net_value); set ht_input; * format
records ; format invc_header_id
$8.; format cust_id $6.; format prod_id $8.; format dte $8.; format crncy $3.; format prod_age 2.0; format quantity 7.0; format net_value 8.2; format no_lines 2.0 ; format total_net_value
8.2; * processing in case the dataline is a header ; IF KIND = "H" THEN
do; * populate
fields; invc_header_id = substr(dataline,1,8); cust_id = substr(dataline,9,6); dte = substr(dataline,15,8); crncy = substr(dataline,23,3); * create
record in the headers dataset; output headers; invc_line_id = 0; *
reset invoice lines counter ; end; * invoice line processing (detail,
item); IF KIND = "I" THEN
do; * increase
invoice line sequence; invc_line_id = invc_line_id
+ 1; * populate
fields; prod_id = substr(dataline,1,7); prod_age = substr(dataline,8,2); quantity = substr(dataline,11,7); net_value = substr(dataline,19,11); * create
new record in the items dataset; output items; * retain will make it
possible to remember the current invoice line ID in the next processing loop; retain invc_line_id ; end; * process
trailer ; IF KIND = "T" THEN
do; no_lines = substr(dataline,1,2); total_net_value = substr(dataline,3,11); * create
new trailer record; output trailers; end; * The processed invoice header ID
needs to be stored for the next loop ; retain
invc_header_id; run; Header and trailer format input file processed in SAS: ![]()
|
|
All Rights Reserved |
Add a comment