HASHBYTE Function in SQL Server | Help | Urgent

Jasmine Sandlas
Jasmine Sandlas used Ask the Experts™
on
Hi Guys,

select hashbytes('SHA2_256','123'+'456'+'1') -- Hashbytes will always give UNIQUE value for n number of rows.

Just wanted to know, if we convert the HASHBYTES function to BIGINT like:-

select convert(bigint, hashbytes('SHA2_256','123'+'456'+'1')) will this conversion will always give UNIQUE values if their are million or billions of rows?

Please help.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Russ SuterSenior Software Developer
Commented:
With any hash function collisions are possible but unlikely. Certainly with a sufficiently large data set you will inevitably run into a hash collision so uniqueness is not a guarantee. In practice, however, actually encountering a collision is like getting struck by lightning twice while holding a winning lottery ticket.

All that being said, as long as the entire set of bytes can be converted to a BIGINT and you don't have any data truncation, you're no more or less likely to encounter a collision than you would be without the conversion.
Topic Advisor, Page Editor
Distinguished Expert 2018
Commented:
No.

Cannot rely on it resulting in a unique value.

Problem is converting 256 bits (32 bytes) for SHA2_256 to bigint which is 8 bytes.

But defeats the purpose of hashbytes anyway....

Maybe better with an identity of bigint - certainly gives the range to handle millions/billions/trillions of rows -2^63  to 2^63-1 (ie up to 9,223,372,036,854,775,807)

In terms of number of unique entries, you could readily map/transform to an equivalent bigint, but again, defeats the purpose.
Nakul VachhrajaniTechnical Architect, Capgemini India
Commented:
HASHBYTES provides one advantage over other change detection mechanisms and that is the ability to detect symmetrical changes. However it comes with an additional storage overhead: http://www.sqlservercentral.com/articles/SQL+Server/95931/.

Please note that there may still be possibilities for duplicate outputs.

The CAST to BIGINT should not have any impact on the output because it is being applied to the outcome of the HASHBYTES function.
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018
Commented:
Ummm....

Careful what you are saying - especially to the comment "applied to the outcome"

Saw this using the same MD5 algorithm as was used in the article 95931 (above) :
I have a table with 126,000,000 unique MD5 hashes.  

       Collisions
LEFT(hashkey,7)      21,726,252
LEFT(hashkey,8)      1,811,652
LEFT(hashkey,9)      115,255
LEFT(hashkey,10)      7,198

That was over on : https://social.msdn.microsoft.com/Forums/en-US/60227214-6bab-4e1c-b255-90a64eb6c7f5/converting-sql-server-hashbytes-varbinary-to-bigint?forum=transactsql

So, the possibility for duplicates (ie collisions) is invariably high when going from a larger column (32 bytes) to a bigint (8 bytes)

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial