Looking up data in Informatica

Purpose and overview

Lookup transformation - returns values from a database table or a flat file associated with a given input value.
In SQL teminology a lookup may be considered as a sub-query.

Example - dynamic lookup

Let's consider the daily updated master table with products. Within a day a product may change its status or an error may be corrected. A new product record may be added first and a change to that record may be added later. The change (insert followed by an update) needs to be detected automatically.

Example - unconnected lookup

Lookup table is usually used within a conditional statement in an Expression. For instance, to look up a product color only for rows which don't have it filled, use the following :

IIF ( ISNULL(colorproduct_id),:lkp.COLORLOOKUP(product_id),colorproduct_id)

Lookup cache

Basically there are two types of cache memory: index and data cache. All port values from the lookup table where the port is part of the lookup condition are loaded into index cache.
The index cache contains all port values from the lookup table where the port is specified in the lookup condition.
The data cache contains all port values from the lookup table that are not in the lookup condition and that are specified as "output" ports.
After the cache is loaded, values from the Lookup input port(s) that are part of the lookup condition are compared to the index cache.
Upon a match the rows from the cache are included in the stream.

Key cache related properties in the Lookup transform:

  • Lookup Caching Enabled - Indicates whether the integration Service caches lookup values during the session.
  • Lookup Cache Persistent - Indicates whether the Integration Service uses a persistent lookup cache.
  • Recache From Lookup Source - can be used only with the lookup cache enabled. When selected, the Integration Service rebuilds the lookup cache from the lookup source when it first calls the Lookup transformation instance. If a persistent lookup cache is used, it rebuilds the persistent cache files before using the cache, otherwise, it rebuilds the lookup cache in memory before using the cache.

    Useful tips / best practices

  • When the source is large, cache lookup table columns for those lookup tables of 500,000 rows or less.
  • Standard rule of thumb is not to cache tables over 500,000 rows.
  • Use equality (=) conditions if possible in the Condition tab of the Lookup.
  • Use IIF or DECODE functions when lookup returns small row sets.
  • Avoid date comparisons in lookup, it's more efficient to convert to string.
  • A large lookup table may require more memory resources than available. SQL override in the lookup transformation can be used to reduce the amount of memory used by the Lookup cache.
  • The unconnected lookup function should normally be called only when a condition (IIF) is evaluated.
  • An unconnected lookup transformation can return only one port, however multiple input ports may be passed to it.
  • One expression transformer can do more than one lookups.
  • Connected lookups use default values, where in unconnected lookups the default values are ignored.
  • Lookup cache - cache if the number (and size) of records in the lookup table is relatively small comparing to the number of rows requiring a lookup.
  • Lookup caching typically improves performance if the time taken to load the lookup cache is less than the time that would be taken to perform the external read requests. To reduce the amount of cache required: turn off or delete any unused output ports, index the lookup file to speed the retrieval time or use where clauses in the SQL override to minimize the amount of data written to cache.