|
ETL-Tools.Info
|
Business Intelligence - Data warehousing - ETL |
|
Data scrambling using PHP and MySQLBusiness case
The business background and algorithms used in this tutorial is the same as in the Pentaho Data Integration sanitization example on our pages. PHP and MySQL solutionScrambling algorithmThe data scrambling algorithm for generating encoded numbers is as follows: Environment setup
CREATE
TABLE
etltoolsinfo.scrambled_accounts
( `acc_no` varchar(16) collate latin1_general_ci NOT NULL, `md5_hash` varchar(32) collate latin1_general_ci NOT NULL, `md5_dec` varchar(32) collate latin1_general_ci default NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci; ImplementationThe PHP code to do the processing is attached below. It contains comments which will help understand how the program works:<?php /************ PHP MySQL data scrambling ************/
//parameters $rec_cnt=1000000; $acc_len=16;
//check the max execution time echo ini_get('max_execution_time')."<br>";
//connect to the database and select the database $db=mysql_connect ("localhost", "etl-tools", ".info"); mysql_select_db ("etltoolsinfo");
//flush the table first mysql_query("delete from scrambled_accounts;") or die('Error, delete query failed');
//main loop for ($i=0;$i<$rec_cnt;$i++) { // generate o 16-digit random account number. A sequence is used to avoid generating duplicates. //function definition attached below the code $acc=gen_accnumber(0+$i,$acc_len);
//calculate a md5 sum $h_md5 = md5($acc);
//replace A-F hex digits $acc_m=str_replace('a','',$h_md5); $acc_m=str_replace('b','',$acc_m); $acc_m=str_replace('c','',$acc_m); $acc_m=str_replace('d','',$acc_m); $acc_m=str_replace('e','',$acc_m); $acc_m=str_replace('f','',$acc_m); $acc_m=substr($acc_m,0,$acc_len);
// construct sql statement $sql="INSERT INTO scrambled_accounts(acc_no,md5_hash,md5_dec) VALUES ('".$acc."','".$h_md5."','".$acc_m."');";
// run query or stop and raise an error mysql_query($sql) or die('Error, insert query failed');
//loop through }
// print output echo "Processing finished";
//function definition function gen_accnumber($tmp,$il_chars){ //generate random number $dokl=rand(0,999999);
//number of chars to add to the main number $ile = $il_chars-strlen($tmp)-strlen($dokl); $tmpdokl="";
//add zeros where appropriate for ($j=0;$j<$ile;$j++) { $tmpdokl="0".$tmpdokl; } //return result return $dokl.$tmpdokl.$tmp;
}
?>
Execution and testing
http://127.0.0.1/scramble_accounts.htm
SELECT count(1)
acc_cnt
FROM
etltoolsinfo.scrambled_accounts
|
|
All Rights Reserved |
Add a comment