Data scrambling using PHP and MySQL
Business case
The business background and algorithms used in this tutorial is the same as in the Pentaho Data Integration sanitization example on our pages.
Please also refer to the Data blinding considerations on our Data Warehousing Tutorial pages for more background information.
PHP and MySQL solution
Scrambling algorithm
The data scrambling algorithm for generating encoded numbers is as follows:
Environment setup
`acc_no` varchar(16) collate latin1_general_ci NOT NULL,
`md5_hash` varchar(32) collate latin1_general_ci NOT NULL,
`md5_dec` varchar(32) collate latin1_general_ci default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
Implementation
The PHP code to do the processing is attached below. It contains comments which will help understand how the program works:<?php
/************ PHP MySQL data scrambling ************/
//parameters
$rec_cnt=1000000;
$acc_len=16;
//check the max execution time
echo ini_get('max_execution_time')."<br>";
//connect to the database and select the database
$db=mysql_connect ("localhost", "etl-tools", ".info");
mysql_select_db ("etltoolsinfo");
//flush the table first
mysql_query("delete from scrambled_accounts;")
or die('Error, delete query failed');
//main loop
for ($i=0;$i<$rec_cnt;$i++) {
// generate o 16-digit random account number. A sequence is used to avoid generating duplicates.
//function definition attached below the code
$acc=gen_accnumber(0+$i,$acc_len);
//calculate a md5 sum
$h_md5 = md5($acc);
//replace A-F hex digits
$acc_m=str_replace('a','',$h_md5);
$acc_m=str_replace('b','',$acc_m);
$acc_m=str_replace('c','',$acc_m);
$acc_m=str_replace('d','',$acc_m);
$acc_m=str_replace('e','',$acc_m);
$acc_m=str_replace('f','',$acc_m);
$acc_m=substr($acc_m,0,$acc_len);
// construct sql statement
$sql="INSERT INTO
scrambled_accounts(acc_no,md5_hash,md5_dec)
VALUES ('".$acc."','".$h_md5."','".$acc_m."');";
// run query or stop and raise an error
mysql_query($sql) or die('Error, insert query failed');
//loop through
}
// print output
echo "Processing finished";
//function definition
function gen_accnumber($tmp,$il_chars){
//generate random number
$dokl=rand(0,999999);
//number of chars to add to the main number
$ile = $il_chars-strlen($tmp)-strlen($dokl);
$tmpdokl="";
//add zeros where appropriate
for ($j=0;$j<$ile;$j++) {
$tmpdokl="0".$tmpdokl;
}
//return result
return $dokl.$tmpdokl.$tmp;
}
?>