Processing a header and trailer textfile in SAS / BASE

Example case

The input file in our scenario is a stream of records representing invoices. It is a text file in a header-trailer format and has the following structure:
- A header starts with a letter H which is followed by an invoice number, a customer number, a date and currency
- Every item starts with a letter I which is followed by product ID, product age, quantity and net value
- Trailer starts with a T and contains two values and may be used for validation: total number of invoice lines and total net value.

SAS solution

SAS is a powerful application when dealing with unusually structured files. To prove this we will use only 2 simple SAS Base scripts to process the input file organized in header and trailer format.

  • In the first step the entire extract while the first character (letter H, I or T) is separated from other strings in the line


    Read the entire header and trailer input file into a ht_input dataset

    Two fields will be filled:

    - one with type of record (kind)

    - other with a complete dataline (starting from the second character)


    data ht_input;

    format kind $1.;

    format dataline $256.;



    Despite the fact that there are no commas in the extract, we use it as a separator to tell SAS not to use a default blank space as a separator


    infile 'D:\data\etl-examples\in_invoices.txt' MISSOVER dlm=',';

    input kind $ 1 dataline $;


    SAS header and trailer input loaded into a dataset:
    SAS header and trailer input loaded into a dataset

  • Now the loaded header and items dataset will be processed in the way that records will be placed in three outputs simultaneously: headers, items and trailers
  • RETAIN SAS statement will be used to save appropriate invoice header IDs accross all the input lines. We will also use it to assign corresponding invoice lines.

    * Columns definition for 3 output sas datasets: headers, items and trailers ;


    data headers(keep=invc_header_id cust_id dte crncy)

         items (keep=invc_header_id invc_line_id

    prod_id prod_age quantity net_value)

         trailers (keep=invc_header_id no_lines total_net_value);


    set ht_input;


    * format records ;

          format invc_header_id $8.;

          format cust_id $6.;

          format prod_id $8.;

          format dte $8.;

          format crncy $3.;

          format prod_age 2.0;

          format quantity 7.0;

          format net_value 8.2;

          format no_lines 2.0 ;

          format total_net_value 8.2;


          * processing in case the dataline is a header ;

          IF KIND = "H" THEN


          * populate fields;

          invc_header_id = substr(dataline,1,8);

          cust_id = substr(dataline,9,6);

          dte = substr(dataline,15,8);

          crncy = substr(dataline,23,3);


          * create record in the headers dataset;

          output headers;

          invc_line_id = 0; * reset invoice lines counter ;



          * invoice line processing (detail, item);

          IF KIND = "I" THEN


          * increase invoice line sequence;

          invc_line_id = invc_line_id + 1;


          * populate fields;

          prod_id = substr(dataline,1,7);

          prod_age = substr(dataline,8,2);

          quantity = substr(dataline,11,7);

          net_value = substr(dataline,19,11);


          * create new record in the items dataset;

          output items;


    * retain will make it possible to remember the current invoice line ID in the next processing loop;

          retain invc_line_id ;



          * process trailer ;

          IF KIND = "T" THEN


          no_lines = substr(dataline,1,2);

          total_net_value = substr(dataline,3,11);


    * create new trailer record;

          output trailers;



    * The processed invoice header ID needs to be stored for the next loop ;

    retain invc_header_id;


    Header and trailer format input file processed in SAS:
    Header and trailer format input file processed in SAS

  • Header and trailer example main page

    Back to the Data Warehousing tutorial home