Data Masking in Informatica
Today's organizations rely on data more heavily than ever before. Little is left to intuition now that every decision must be justified by proper and accurate analysis. However, most companies today consist of different departments, divisions and subsidiaries, usually spread widely across geographic locations.
Organizations make huge efforts to secure data in production environments. Very often, however, non-production environments such as development, test, or training are overlooked.
As a result, test and development environments are a potentially interesting target for malicious users, especially when the data is sent to an outsourcing or offshore external vendor.
The Informatica PowerCenter Data Masking Option uses randomization algorithms to transform production data into realistic-looking, anonymized data that no longer corresponds to any real record.
In short, the Informatica PowerCenter Data Masking Option ensures that, even if some part of the data leaves the organization, it cannot be properly understood. What outsiders see is fake data that merely resembles what they were looking for.
Although developing a comprehensive data security system is estimated to take as long as half a year, the seamless integration of Data Masking with the PowerCenter platform allows this time to be reduced significantly. Moreover, the speed-up does not come at the cost of accuracy, because the time is saved by using pre-built components.
The Data Masking tool itself creates functional, production-like data that retains the original data's properties and preserves referential integrity.
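To make the idea of preserving referential integrity concrete, here is a minimal Python sketch. It is not Informatica's implementation. It only illustrates the general principle that a deterministic (repeatable) masking function maps the same key to the same masked value in every table, so joins still work after masking. The function name `mask_customer_id`, the key `SECRET_KEY`, and the sample tables are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the masked environment.
SECRET_KEY = b"example-masking-key"


def mask_customer_id(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically map an ID to a digit string of the same length."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big")
    width = len(customer_id)
    # Reduce to the original width and left-pad with zeros to keep the length.
    return str(number % (10 ** width)).zfill(width)


# Two hypothetical tables that share the customer ID as a key.
customers = [{"id": "100234", "name": "John Smith"}]
orders = [
    {"order_no": 1, "customer_id": "100234"},
    {"order_no": 2, "customer_id": "100234"},
]

# Mask the key column in both tables with the same function and key.
masked_customers = [{**c, "id": mask_customer_id(c["id"])} for c in customers]
masked_orders = [
    {**o, "customer_id": mask_customer_id(o["customer_id"])} for o in orders
]
```

Because both tables are masked with the same function and the same key, every masked order still points at the masked customer row. Note that reducing the hash to a fixed number of digits means two different IDs could occasionally collide, so a real tool would need an additional mechanism to guarantee uniqueness for key columns.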
Informatica Data Masking component features
From a technical point of view, the Informatica Data Masking Option:
- Uses a wide variety of algorithms and techniques to mask sensitive fields in the data being passed on. Among others, these include non-deterministic randomization (which replaces each field with randomly chosen data), blurring (which modifies the original value within a specified range), repeatable masking, and substitution.
- Maintains repeatability with built-in methods
- Generates efficient randomized output while preserving the original data properties such as datatypes, formats, and lengths
- Provides its own lists of names and addresses that can be substituted for the original ones (so, for instance, John Smith may be replaced by Timothy Adams rather than 'AABB CDE')
- Contains a set of rules for modifying data in special fields (for instance, rules for generating tax ID numbers and credit card numbers)
- Fully supports PowerCenter connectivity, which ensures that data from all types of sources can be masked
- Extends the platform's transformation options, which helps ensure proper protection
- Randomly substitutes original values with false but realistic-looking values
- Integrates seamlessly into the PowerCenter environment (as a plug-in)
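The techniques named in the list above (blurring, substitution from a list of realistic names, and randomization that preserves format and length) can be sketched in a few lines of Python. This is an illustration of the general ideas only, under stated assumptions, and not the product's actual algorithms: the function names and the `REPLACEMENT_NAMES` list are hypothetical, and the hash-based lookup is just one simple way to make substitution repeatable.

```python
import hashlib
import random
import string

# Hypothetical substitution list; the product ships its own dictionaries.
REPLACEMENT_NAMES = ["Timothy Adams", "Laura Bennett", "Peter Collins", "Maria Douglas"]


def blur(value: float, percent: float, rng: random.Random) -> float:
    """Blurring: shift a number by a random amount within +/- percent of it."""
    spread = abs(value) * percent / 100.0
    return value + rng.uniform(-spread, spread)


def substitute_name(name: str, names=REPLACEMENT_NAMES) -> str:
    """Repeatable substitution: the same input always picks the same replacement."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return names[int.from_bytes(digest[:4], "big") % len(names)]


def randomize_keep_format(text: str, rng: random.Random) -> str:
    """Non-deterministic randomization that keeps length and character classes."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as '-' or ' '
    return "".join(out)


rng = random.Random()
print(blur(52000.0, 10, rng))                        # a salary within +/- 10%
print(substitute_name("John Smith"))                 # a realistic name from the list
print(randomize_keep_format("4556-1234-9876", rng))  # same layout, different digits
```

The blurred value stays close enough to the original to remain plausible, the substituted name is a real-looking name rather than random letters, and the randomized string keeps its separators and length so that downstream format checks still pass. A production tool would additionally apply field-specific rules, such as valid checksums for credit card numbers, which this sketch does not attempt.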