Sorter transformation

Purpose

Sorter is an active transformation which sorts the incoming data based on one or more key values (in an ascending, descending or mixed order). The Sorter transformation attribute, 'Distinct' provides a facility to remove duplicates from the input rows.

Key properties worth looking at:

  • Sorter Cache Size - the Integration Service passes all rows into the Sorter transformation before it performs the sort operation. Any amount between 1 MB and 4 GB can be specified for the Sorter cache size.
  • Case Sensitive and Null Treated Low - options used to precise the output order
  • Work Directory - the directory used to create temporary files while it sorts data. After the rows are sorted, the temporary files get deleted automatically.
  • Distinct - if distinct output rows is checked, the Mapping Designer configures all ports as part of the sort key.

    Aggregator transformation

    Purpose

    The Aggregator transformation calculates aggregates such as sums, minimum or maximum values across multiple groups of rows.
    The Aggregator transformation can apply expressions to its ports however those expressions will be applied to a group of rows unlike the Expression transformation which applies calculations on a row-by-row basis only.

    Aggregate functions are created in output ports only. Function grouping requirements are set using the Aggregator Group By port.

    Available aggregate expressions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM

    Key properties:

  • Sorted Input - should be checked when the input data is already sorted by groups.
  • Aggregator Data Cache Size
  • Aggregator Index Cache Size - default is "Auto", index cache contains all group by ports.

    Examples / useful tips

  • Sorting the records prior to passing them on to an Aggregator transformation may improve the overall performance of the aggregation task.
  • Using a Sorter transformation may improve performance over an 'Order By' clause in a SQL override statement in aggregate session when the source is a database because the source database may not be tuned with the buffer sizes needed for a database sort.
  • In Aggregator checking the sorted input attribute will bypass caching.