Data Warehouse metadata

The metadata in a data warehouse system unfolds the definitions, meaning, origin and rules of the data used in a Data Warehouse. There are two main types of metadata in a data warehouse system: business metadata and technical metadata. Those two types illustrate both business and technical point of view on the data.
The Data Warehouse Metadata is usually stored in a Metadata Repository which is accessible by a wide range of users.

Business metadata

Business metadata (datawarehouse metadata, front room metadata, operational metadata) - this type of metadata stores business definitions of the data, it contains high-level definitions of all fields present in the data warehouse, information about cubes, aggregates, datamarts.
Business metadata is mainly addressed to and used by the data warehouse users, report authors (for ad-hoc querying), cubes creators, data managers, testers, analysts.

Typically, the following information needs to be provided to describe business metadata:

  • DW Table Name
  • DW Column Name
  • Business Name - short and desctiptive header information
  • Definition - extended description with brief overiview of the business rules for the field
  • Field Type - a flag may indicate whether a given field stores the key or a discrete value, whether is active or not, or what data type is it. The content of that field (or fields) may vary upon business needs.

    Technical metadata

    Technical metadata (ETL process metadata, back room metadata, transformation metadata) is a representation of the ETL process. It stores data mapping and transformations from source systems to the data warehouse and is mostly used by datawarehouse developers, specialists and ETL modellers.
    Most commercial ETL applications provide a metadata repository with an integrated metadata management system to manage the ETL process definition.
    The definition of technical metadata is usually more complex than the business metadata and it sometimes involves multiple dependencies.

    The technical metadata can be structured in the following way:

  • Source Database - or system definition. It can be a source system database, another data warehouse, file system, etc.
  • Target Database - Data Warehouse instance
  • Source Tables - one or more tables which are input to calculate a value of the field
  • Source Columns - one or more columns which are input to calculate a value of the field
  • Target Table - target DW table and column are always single in a metadata repository.
  • Target Column - target DW column
  • Transformation - the descriptive part of a metadata entry. It usually contains a lot of information, so it is important to use a common standard throughout the organisation to keep the data consistent.

    Some tools dedicated to the metadata management(many of them are bundled with ETL tools):
  • Teradata Metadata Services
  • Erwin Data modeller
  • Microsoft Repository
  • IBM (Ascential) MetaStage
  • Pentaho Metadata
  • AbInitio EME (Enterpise Metadata Environment)