Processing a header and trailer textfile

Goal

Process a text file which contains records arranged in blocks consisting of a header record, details (items, body) and a trailer. The aim is to normalize records and load them into a relational database structure.

Scenario overview and details

Typically, header and item processing needs to be implemented when processing files that origin from mainframe systems, and also EDI transmission files, SWIFT Schemas, EPIC files.

The input file in our scenario is a stream of records representing invoices. Each invoice consists of :
- A header containing invoice number, dates, customer reference and other details
- One or more items representing ordered products, including an item number, quantity, price and value
- A trailer which contains summary information for all the items

The records are distinguished by the first character in each line: H stands for headers, I for items and T for the trailers.
The lines in a section of the input file are fixed length (so for example all headers have the same number of characters but it varies from items).

Input data:

Text file in a Header-trailer format. The file presented in our example is divided into headers, items and trailers.
- A header starts with a letter H and then contains an invoice number, a customer number, a date and invoice currency
- Every item starts with a letter I which is followed by product ID, product age, quantity and net value
- The trailer starts with a T and contains two values which act as a checksum: the total number of invoice lines and a total net value.

Sample header and trailer input textfile:

Output data:

INVC_HEADER and INVC_LINE relational tables: one with invoice headers and the other with the invoice lines

rejects_250601.txt - a text file with loading errors desired output

Problem solution implemented in various ETL tools:

Datastage - header and trailer processing implementation

SAS Base - header and trailer processing implementation

PDI - Pentaho Data Integration header and trailer processing

Back to the Data Warehousing tutorial home