Link to home
Start Free TrialLog in
Avatar of sunhux
sunhux

asked on

one-way data "masking" MD5 sql

Refer to attached sql script which my DBA colleague plan to use to
"mask" sensitive data in the DB (I presume it's Oracle).

Q1:
Is this a 1-way hashing or the 'masked' data can be 'unhashed' back?

Q2:
Is this MD5 the industry practice or there are more secure ways of 'masking' it

Q3:
How do people verify the 'hashed' data?   Do we do 'norows export' & randomly
check the exported tables or is there a way to export out table by table of the
hashed tables?

Q4:
What are the things to look out for in the logs to verify this 'masking' process
completed successfully or how can I amend this script further so that it will
log down its activities for verification?

Q5:
How does the script ensure that the keys that link the various tables are 'hashed'
together so that it does not break the entity relationship & the 'masked' database
can still be used by the vendor we're sending this database to?

Q6:
How do people normally send such a 'masked' Oracle DB out : by encrypted tapes
or USB HDD or slowly via sftp over Internet?
Avatar of sunhux
sunhux

ASKER

Somehow couldn't attach the file, so pasted below:


drop table ENTITY_REFERENCE;


create table ENTITY_REFERENCE
(entity_code varchar2(50) NOT NULL,
entity_code_md5_key varchar2(50) NOT NULL,
entity_desc varchar2(100) NULL,
entity_desc_md5_key varchar2(100) NULL
);

insert into ENTITY_REFERENCE
SELECT entity_code,rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(entity_code)))entity_code_md5_key,
entity_desc,rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(coalesce(entity_desc,'blank'))))entity_desc_md5_key
FROM ENTITY;




------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------

--select distinct table_name from SECURITY_REFERENCE

drop table SECURITY_REFERENCE;

create table SECURITY_REFERENCE 
(table_name varchar2(50) NOT NULL,
SECURITY_REFERENCE varchar2(101) NULL,
SECURITY_REFERENCE_md5_key varchar2(101) NULL
);

select table_name, count(1) from SECURITY_REFERENCE group by table_name
select * from SECURITY_REFERENCE 

insert into SECURITY_REFERENCE 
select distinct 'SECURITY_POSITIONS' table_name, SECURITY_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(SECURITY_REFERENCE))) from SECURITY_POSITIONS_MASKTEST;
insert into SECURITY_REFERENCE 
select distinct'REPO' table_name, SECURITY_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(SECURITY_REFERENCE))) from REPO_MASKTEST;
insert into SECURITY_REFERENCE 
select distinct'SECURITY_S' table_name, SECURITY_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(SECURITY_REFERENCE))) from SECURITY_MASKTEST;
insert into SECURITY_REFERENCE 
select distinct'SECURITY_A' table_name, attribute_2, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(attribute_2))) from SECURITY_MASKTEST;
insert into SECURITY_REFERENCE 
select distinct'COLLATERAL' table_name, SECURITY_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(SECURITY_REFERENCE))) from SECURITY_MASKTEST;


------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------

--drop table CONTRACT_REFRENCE
--drop table CONTRACT_REFERENCE

create table CONTRACT_REFERENCE
(table_name varchar2(50) NOT NULL,
CONTRACT_REFERENCE varchar2(101) NULL,
CONTRACT_REFERENCE_md5_key varchar2(101) NULL
);

insert into CONTRACT_REFERENCE 
select distinct 'LOANDEPO' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from LOANDEPO_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'ACCOUNT' table_name, ACCOUNT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(ACCOUNT_REFERENCE))) from ACCOUNT_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'SECURITY_POSITIONS' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from SECURITY_POSITIONS_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'SWAP' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from SWAP_MASKTEST; 
insert into CONTRACT_REFERENCE 
select distinct 'FOREX' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from FOREX_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'DEAL_IMPORT' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from DEAL_IMPORT_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'EXCHANGE_OPTION' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from EXCHANGE_OPTION_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'FACILITY' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from FACILITY_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'REPO' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from REPO_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'CREDIT_RATING' table_name, REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(REFERENCE))) from CREDIT_RATING_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'ENTITY_LINKS_M' table_name, mother_code, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(mother_code))) from entity_links_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'ENTITY_LINKS_C' table_name, child_code, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(child_code))) from entity_links_MASKTEST;
insert into CONTRACT_REFERENCE 
select distinct 'COLLATERAL' table_name, CONTRACT_REFERENCE, rawtohex(DBMS_OBFUSCATION_TOOLKIT.md5 (input => UTL_RAW.cast_to_raw(CONTRACT_REFERENCE))) from COLLATERAL_MASKTEST;

------------------------------------------------------------------------------------------------------------------------------------------------------------

--select table_name, count(1) from SECURITY_REFRENCE group by table_name order by 1
--select table_name, count(1) from CONTRACT_REFERENCE group by table_name order by 1
--select * from ENTITY_REFRENCE

------------------------------------------------------------------------------------------------------------
------------------------------------ENTITY_CODE JOIN--------------------------------------------------------

update ENTITY_MASKTEST E
set E.ENTITY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = E.ENTITY_CODE);
--select * from ENTITY_MASKTEST


update LOANDEPO_MASKTEST  LN
set LN.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = LN.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from LOANDEPO_MASKTEST  


update ACCOUNT_MASKTEST  AC
set AC.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = AC.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from ACCOUNT_MASKTEST  
select count(1) from ENTITY_REFERENCE


update SECURITY_MASKTEST  SC
set SC.ISSUER_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = SC.ISSUER_CODE);
--select ISSUER_CODE from SECURITY_MASKTEST 


update SECURITY_POSITIONS_MASKTEST  SCP
set SCP.CPTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = SCP.CPTY_CODE);
--select CPTY_CODE from SECURITY_POSITIONS_MASKTEST  


update SWAP_MASKTEST SW
set SW.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = SW.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from SWAP_MASKTEST  


update FOREX_MASKTEST FX
set FX.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = FX.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from FOREX_MASKTEST  


update DEAL_IMPORT_MASKTEST DI
set DI.COUNTERPARTY = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = DI.COUNTERPARTY);
--select COUNTERPARTY from DEAL_IMPORT_MASKTEST  


update EXCHANGE_OPTION_MASKTEST EO
set EO.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = EO.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from EXCHANGE_OPTION_MASKTEST  


update FACILITY_MASKTEST FAC
set FAC.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = FAC.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from FACILITY_MASKTEST 


update REPO_MASKTEST R
set R.COUNTERPARTY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = R.COUNTERPARTY_CODE);
--select COUNTERPARTY_CODE from REPO_MASKTEST


update ISSUER_CREDIT_RATINGS_MASKTEST ICR 
set ICR.ENTITY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = ICR.ENTITY_CODE);
--select distinct ENTITY_CODE from ISSUER_CREDIT_RATINGS_MASKTEST;


update COMPANIES_MASKTEST COM 
set COM.ENTITY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = COM.ENTITY_CODE);
--select ENTITY_CODE from COMPANIES 


update COUNTRY_MASKTEST CTRY
set CTRY.ENTITY_CODE = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = CTRY.ENTITY_CODE);
--select ENTITY_CODE from COUNTRY_MASKTEST


update COLLATERAL_MASKTEST CLT
set CLT.COUNTERPARTY = (select ER.ENTITY_CODE_MD5_KEY from ENTITY_REFERENCE ER where ER.ENTITY_CODE = CLT.COUNTERPARTY);
--select COUNTERPARTY from COLLATERAL_MASKTEST 
select * from COLLATERAL_MASKTEST


-----------------------------------------------------------------------------------------------------------------
------------------------------------ENTITY_CODE_DESC JOIN--------------------------------------------------------

Open in new window

Avatar of slightwv (䄆 Netminder)
I'm not going to want through all your code to see everything it is doing.

Hashing is meant to be one-way.  There is not superposed to be any way to turn it back into the passed value.

I use "meant" and "supposed" because some of the algorithms have been cracked.

It isn't masking.  Those are two different terms that mean two different things.

MD5 should be avoided at all costs:
https://en.wikipedia.org/wiki/Hash_function_security_summary

I think the minimum acceptable these days is SHA256.

What are you asking about "verifying" data?  You query it.  Does it look hashed?  If yes, then the hash worked...
Q1:

You can't "unhash", but you can still "crack" it with sufficient computing power, time and techniques


 Q2:
MD5 is definitely NOT a secure hashing mechanism.  Modern systems (like Oracle itself) uses SHA2 hashing.

https://www.experts-exchange.com/articles/855/How-Oracle-Stores-Passwords.html
or for a more up to date version of this article
https://seanstuber.wordpress.com/2016/03/19/how-oracle-stores-passwords/

Also, the obfuscation toolkit is deprecated, you should use dbms_crypto.  If you're on 11g you might want to use a java stored procedure to do the higher bit sha2 methods.  In 12c dbms_crypto has them built in.

 Q3:
You verify a hash by taking a known value, hashing it and checking if it matches an already known hash.  If they don't, then either the hashing algorithm has changed or the original data is not the same.

 Q4:
I don't know what you're trying to find in your logs

 Q5:
You have to manage FKs yourself.  Which usually means either removing the FKs hashing the parent and child tables then reapplying the FKs
or if the FKs are deferred, then updating the parent and child and then commiting
or, if the FKs are not deferred, then deleting the children, updating the parent with the hash, then reinserting the children with the newly hashed keys

 Q6:
It would depend on the size, the relationship with the vendor and the nature of the data and rules around it.    If it's not that big of a deal if it might get lost then a usb in the mail might be sufficient.  If more security is needed then maybe some secure internet transfer.
If even more security then maybe a usb drive or tapes with secured couriers and signed/biometric authentication for transfer of control.
ASKER CERTIFIED SOLUTION
Avatar of Adam Brown
Adam Brown
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunhux

ASKER

Thanks, so
   DBMS_OBFUSCATION_TOOLKIT.md5
should be
   DBMS_CRYPTO  ??
Avatar of sunhux

ASKER

Feedback from DBA:

The size of the encrypted key for DBMS_CRYPTO  SHA256 algorithm  is 64 character while for MD5 it is 32 character. The 64 char exceeds the size of fields ENTITY_CODE and CONTRACT_REFERENCE (only for ENTITY_LINKS table) :

so what do we do?  Will it break anything if we truncate the length of the key?


I have a clarification on the 64 & 32 chars that our DBA says:
from the link
  https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/d_crypto.htm#BJFGFDFG
it says:

HASH_MD4
Produces a 128-bit hash, or message digest of the input message

HASH_MD5
Also produces a 128-bit hash, but is more complex than MD4

HASH_SH1
Secure Hash Algorithm (SHA). Produces a 160-bit hash.
 

So DBMS_CRYPTO  is using HASH_SH1 ?
So is 160bit = 64 char ?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunhux

ASKER

So instead of truncating the hash, we should increase the length of the
hash field (I'm no DBA, so I don't know if I've used the right term here)?

>You can use MD5 but doing so can make the data viewable by those that receive it.
Don't understand the above;  does it mean the hashed data (or database) can
be read by the external vendor whom we are sending the hashed DB to?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunhux

ASKER

As this hashed DB is for vendor's development, I guess collisions wont hamper
development work: so long as truncating it won't increase the security risk or
will it?

http://crypto.stackexchange.com/questions/9435/is-truncating-a-sha512-hash-to-the-first-160-bits-as-secure-as-using-sha1
Referring to above, if SHA256 hash is truncated to MD5's length, what's the chance of
collision?   2 to the power ?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunhux

ASKER

>Can I ask why you are sending hashed data to someone that cannot see the actual data?
This outsourced remote applications vendor is helping us to do work but we can't let
them see the actual sensitive data.  I presume the vendor could work with 'fictitious'
data but the keys linking the various relational tables must not break

>hashes for values with fewer than 10 characters have been mapped in Rainbow tables, so
>MD5 hashing will not be a good solution
But our DBA pointed out that MD5 is 32 character; so it's still mappable in Rainbow tables?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>>I would go with the hash and truncate method ... hash the values, then hash the hash, then hash that hash, do that a bunch of times

Over dbms_random.string?
>>> Over dbms_random.string?

that will work fine if you order the updates/inserts correctly to maintain FKs.

I was suggesting hashing so that there would be a correlation, albeit irreversible, between original and hashed output.  That way when "slightwv" is hashed, it will always be the same value, and thus preserve FKs.
Unless I'm missing something, which is entirely possible, you would need to hash "slightwv", then manually update ALL the keys/columns manually with the hashed value.

Isn't that the same workload if I generate a random string to replace "slightwv" with?  Still the same number of updates.
yes, but you have to manually maintain that.

that is, you generate a random string for slightwv then you have to go find all rows/columns that have that and update them to the same thing.

if you simply do   col=some_hash(col)  

and "some_hash" returns a consistent value then you don't need to figure anything out and find related values, they'll sort themselves out.


Now - it would be possible to write "some_hash" as

dbms_random.seed(my_parameter);
return dbms_random.string('x',20);

but that  essentially writing our own hash, using a non-secure randomizer.
so, definitely more crackable than SHA
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunhux

ASKER

Our DBA proposes to make those keys that are 'hashed' to have Unique Constraint
so if he detects as collision/duplicate hashed keys, then manually adjust which he
thinks will not be many : anyone has any view on our DBA's proposal?
How do you make it a unique constraint if it is the child table of a foreign key constraint in a one-to-many relationship?
The proposal doesn't make any sense to me.

Your application should have constraints in place, hashing the data to mask it does not (or should not) in any way require new constraints to be added.

For example, lets say you have a table of customer names.

My name is Sean, you could hash that to A072A5313C9ACCF4C24B2DC8145937B4
Instead of hashing you could use a random data generator and change my name to Blippo123

In both cases the original name is unrecoverable.

Why would adding a unique constraint help?  What value could that possibly add?

Also, you could have multiple people named "Sean" in your customer base.
Adding a unique constraint would then cause your application to break because you couldn't have multiple people (either hashed or renamed) with the same name in your table, which is a completely artificial and likely erroneous rule.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial