sunhux
asked on
one-way data "masking" MD5 sql
Refer to attached sql script which my DBA colleague plan to use to
"mask" sensitive data in the DB (I presume it's Oracle).
Q1:
Is this a 1-way hashing or the 'masked' data can be 'unhashed' back?
Q2:
Is this MD5 the industry practice or there are more secure ways of 'masking' it
Q3:
How do people verify the 'hashed' data? Do we do 'norows export' & randomly
check the exported tables or is there a way to export out table by table of the
hashed tables?
Q4:
What are the things to look out for in the logs to verify this 'masking' process
completed successfully or how can I amend this script further so that it will
log down its activities for verification?
Q5:
How does the script ensure that the keys that link the various tables are 'hashed'
together so that it does not break the entity relationship & the 'masked' database
can still be used by the vendor we're sending this database to?
Q6:
How do people normally send such a 'masked' Oracle DB out : by encrypted tapes
or USB HDD or slowly via sftp over Internet?
"mask" sensitive data in the DB (I presume it's Oracle).
Q1:
Is this a 1-way hashing or the 'masked' data can be 'unhashed' back?
Q2:
Is this MD5 the industry practice or there are more secure ways of 'masking' it
Q3:
How do people verify the 'hashed' data? Do we do 'norows export' & randomly
check the exported tables or is there a way to export out table by table of the
hashed tables?
Q4:
What are the things to look out for in the logs to verify this 'masking' process
completed successfully or how can I amend this script further so that it will
log down its activities for verification?
Q5:
How does the script ensure that the keys that link the various tables are 'hashed'
together so that it does not break the entity relationship & the 'masked' database
can still be used by the vendor we're sending this database to?
Q6:
How do people normally send such a 'masked' Oracle DB out : by encrypted tapes
or USB HDD or slowly via sftp over Internet?
I'm not going to want through all your code to see everything it is doing.
Hashing is meant to be one-way. There is not superposed to be any way to turn it back into the passed value.
I use "meant" and "supposed" because some of the algorithms have been cracked.
It isn't masking. Those are two different terms that mean two different things.
MD5 should be avoided at all costs:
https://en.wikipedia.org/wiki/Hash_function_security_summary
I think the minimum acceptable these days is SHA256.
What are you asking about "verifying" data? You query it. Does it look hashed? If yes, then the hash worked...
Hashing is meant to be one-way. There is not superposed to be any way to turn it back into the passed value.
I use "meant" and "supposed" because some of the algorithms have been cracked.
It isn't masking. Those are two different terms that mean two different things.
MD5 should be avoided at all costs:
https://en.wikipedia.org/wiki/Hash_function_security_summary
I think the minimum acceptable these days is SHA256.
What are you asking about "verifying" data? You query it. Does it look hashed? If yes, then the hash worked...
Q1:
You can't "unhash", but you can still "crack" it with sufficient computing power, time and techniques
Q2:
MD5 is definitely NOT a secure hashing mechanism. Modern systems (like Oracle itself) uses SHA2 hashing.
https://www.experts-exchange.com/articles/855/How-Oracle-Stores-Passwords.html
or for a more up to date version of this article
https://seanstuber.wordpress.com/2016/03/19/how-oracle-stores-passwords/
Also, the obfuscation toolkit is deprecated, you should use dbms_crypto. If you're on 11g you might want to use a java stored procedure to do the higher bit sha2 methods. In 12c dbms_crypto has them built in.
Q3:
You verify a hash by taking a known value, hashing it and checking if it matches an already known hash. If they don't, then either the hashing algorithm has changed or the original data is not the same.
Q4:
I don't know what you're trying to find in your logs
Q5:
You have to manage FKs yourself. Which usually means either removing the FKs hashing the parent and child tables then reapplying the FKs
or if the FKs are deferred, then updating the parent and child and then commiting
or, if the FKs are not deferred, then deleting the children, updating the parent with the hash, then reinserting the children with the newly hashed keys
Q6:
It would depend on the size, the relationship with the vendor and the nature of the data and rules around it. If it's not that big of a deal if it might get lost then a usb in the mail might be sufficient. If more security is needed then maybe some secure internet transfer.
If even more security then maybe a usb drive or tapes with secured couriers and signed/biometric authentication for transfer of control.
You can't "unhash", but you can still "crack" it with sufficient computing power, time and techniques
Q2:
MD5 is definitely NOT a secure hashing mechanism. Modern systems (like Oracle itself) uses SHA2 hashing.
https://www.experts-exchange.com/articles/855/How-Oracle-Stores-Passwords.html
or for a more up to date version of this article
https://seanstuber.wordpress.com/2016/03/19/how-oracle-stores-passwords/
Also, the obfuscation toolkit is deprecated, you should use dbms_crypto. If you're on 11g you might want to use a java stored procedure to do the higher bit sha2 methods. In 12c dbms_crypto has them built in.
Q3:
You verify a hash by taking a known value, hashing it and checking if it matches an already known hash. If they don't, then either the hashing algorithm has changed or the original data is not the same.
Q4:
I don't know what you're trying to find in your logs
Q5:
You have to manage FKs yourself. Which usually means either removing the FKs hashing the parent and child tables then reapplying the FKs
or if the FKs are deferred, then updating the parent and child and then commiting
or, if the FKs are not deferred, then deleting the children, updating the parent with the hash, then reinserting the children with the newly hashed keys
Q6:
It would depend on the size, the relationship with the vendor and the nature of the data and rules around it. If it's not that big of a deal if it might get lost then a usb in the mail might be sufficient. If more security is needed then maybe some secure internet transfer.
If even more security then maybe a usb drive or tapes with secured couriers and signed/biometric authentication for transfer of control.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks, so
DBMS_OBFUSCATION_TOOLKIT.m d5
should be
DBMS_CRYPTO ??
DBMS_OBFUSCATION_TOOLKIT.m
should be
DBMS_CRYPTO ??
ASKER
Feedback from DBA:
The size of the encrypted key for DBMS_CRYPTO SHA256 algorithm is 64 character while for MD5 it is 32 character. The 64 char exceeds the size of fields ENTITY_CODE and CONTRACT_REFERENCE (only for ENTITY_LINKS table) :
so what do we do? Will it break anything if we truncate the length of the key?
I have a clarification on the 64 & 32 chars that our DBA says:
from the link
https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/d_crypto.htm#BJFGFDFG
it says:
HASH_MD4
Produces a 128-bit hash, or message digest of the input message
HASH_MD5
Also produces a 128-bit hash, but is more complex than MD4
HASH_SH1
Secure Hash Algorithm (SHA). Produces a 160-bit hash.
So DBMS_CRYPTO is using HASH_SH1 ?
So is 160bit = 64 char ?
The size of the encrypted key for DBMS_CRYPTO SHA256 algorithm is 64 character while for MD5 it is 32 character. The 64 char exceeds the size of fields ENTITY_CODE and CONTRACT_REFERENCE (only for ENTITY_LINKS table) :
so what do we do? Will it break anything if we truncate the length of the key?
I have a clarification on the 64 & 32 chars that our DBA says:
from the link
https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/d_crypto.htm#BJFGFDFG
it says:
HASH_MD4
Produces a 128-bit hash, or message digest of the input message
HASH_MD5
Also produces a 128-bit hash, but is more complex than MD4
HASH_SH1
Secure Hash Algorithm (SHA). Produces a 160-bit hash.
So DBMS_CRYPTO is using HASH_SH1 ?
So is 160bit = 64 char ?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
So instead of truncating the hash, we should increase the length of the
hash field (I'm no DBA, so I don't know if I've used the right term here)?
>You can use MD5 but doing so can make the data viewable by those that receive it.
Don't understand the above; does it mean the hashed data (or database) can
be read by the external vendor whom we are sending the hashed DB to?
hash field (I'm no DBA, so I don't know if I've used the right term here)?
>You can use MD5 but doing so can make the data viewable by those that receive it.
Don't understand the above; does it mean the hashed data (or database) can
be read by the external vendor whom we are sending the hashed DB to?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
As this hashed DB is for vendor's development, I guess collisions wont hamper
development work: so long as truncating it won't increase the security risk or
will it?
http://crypto.stackexchange.com/questions/9435/is-truncating-a-sha512-hash-to-the-first-160-bits-as-secure-as-using-sha1
Referring to above, if SHA256 hash is truncated to MD5's length, what's the chance of
collision? 2 to the power ?
development work: so long as truncating it won't increase the security risk or
will it?
http://crypto.stackexchange.com/questions/9435/is-truncating-a-sha512-hash-to-the-first-160-bits-as-secure-as-using-sha1
Referring to above, if SHA256 hash is truncated to MD5's length, what's the chance of
collision? 2 to the power ?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
>Can I ask why you are sending hashed data to someone that cannot see the actual data?
This outsourced remote applications vendor is helping us to do work but we can't let
them see the actual sensitive data. I presume the vendor could work with 'fictitious'
data but the keys linking the various relational tables must not break
>hashes for values with fewer than 10 characters have been mapped in Rainbow tables, so
>MD5 hashing will not be a good solution
But our DBA pointed out that MD5 is 32 character; so it's still mappable in Rainbow tables?
This outsourced remote applications vendor is helping us to do work but we can't let
them see the actual sensitive data. I presume the vendor could work with 'fictitious'
data but the keys linking the various relational tables must not break
>hashes for values with fewer than 10 characters have been mapped in Rainbow tables, so
>MD5 hashing will not be a good solution
But our DBA pointed out that MD5 is 32 character; so it's still mappable in Rainbow tables?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
>>I would go with the hash and truncate method ... hash the values, then hash the hash, then hash that hash, do that a bunch of times
Over dbms_random.string?
Over dbms_random.string?
>>> Over dbms_random.string?
that will work fine if you order the updates/inserts correctly to maintain FKs.
I was suggesting hashing so that there would be a correlation, albeit irreversible, between original and hashed output. That way when "slightwv" is hashed, it will always be the same value, and thus preserve FKs.
that will work fine if you order the updates/inserts correctly to maintain FKs.
I was suggesting hashing so that there would be a correlation, albeit irreversible, between original and hashed output. That way when "slightwv" is hashed, it will always be the same value, and thus preserve FKs.
Unless I'm missing something, which is entirely possible, you would need to hash "slightwv", then manually update ALL the keys/columns manually with the hashed value.
Isn't that the same workload if I generate a random string to replace "slightwv" with? Still the same number of updates.
Isn't that the same workload if I generate a random string to replace "slightwv" with? Still the same number of updates.
yes, but you have to manually maintain that.
that is, you generate a random string for slightwv then you have to go find all rows/columns that have that and update them to the same thing.
if you simply do col=some_hash(col)
and "some_hash" returns a consistent value then you don't need to figure anything out and find related values, they'll sort themselves out.
Now - it would be possible to write "some_hash" as
dbms_random.seed(my_parame ter);
return dbms_random.string('x',20) ;
but that essentially writing our own hash, using a non-secure randomizer.
so, definitely more crackable than SHA
that is, you generate a random string for slightwv then you have to go find all rows/columns that have that and update them to the same thing.
if you simply do col=some_hash(col)
and "some_hash" returns a consistent value then you don't need to figure anything out and find related values, they'll sort themselves out.
Now - it would be possible to write "some_hash" as
dbms_random.seed(my_parame
return dbms_random.string('x',20)
but that essentially writing our own hash, using a non-secure randomizer.
so, definitely more crackable than SHA
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Our DBA proposes to make those keys that are 'hashed' to have Unique Constraint
so if he detects as collision/duplicate hashed keys, then manually adjust which he
thinks will not be many : anyone has any view on our DBA's proposal?
so if he detects as collision/duplicate hashed keys, then manually adjust which he
thinks will not be many : anyone has any view on our DBA's proposal?
How do you make it a unique constraint if it is the child table of a foreign key constraint in a one-to-many relationship?
The proposal doesn't make any sense to me.
Your application should have constraints in place, hashing the data to mask it does not (or should not) in any way require new constraints to be added.
For example, lets say you have a table of customer names.
My name is Sean, you could hash that to A072A5313C9ACCF4C24B2DC814 5937B4
Instead of hashing you could use a random data generator and change my name to Blippo123
In both cases the original name is unrecoverable.
Why would adding a unique constraint help? What value could that possibly add?
Also, you could have multiple people named "Sean" in your customer base.
Adding a unique constraint would then cause your application to break because you couldn't have multiple people (either hashed or renamed) with the same name in your table, which is a completely artificial and likely erroneous rule.
Your application should have constraints in place, hashing the data to mask it does not (or should not) in any way require new constraints to be added.
For example, lets say you have a table of customer names.
My name is Sean, you could hash that to A072A5313C9ACCF4C24B2DC814
Instead of hashing you could use a random data generator and change my name to Blippo123
In both cases the original name is unrecoverable.
Why would adding a unique constraint help? What value could that possibly add?
Also, you could have multiple people named "Sean" in your customer base.
Adding a unique constraint would then cause your application to break because you couldn't have multiple people (either hashed or renamed) with the same name in your table, which is a completely artificial and likely erroneous rule.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Open in new window