Replacements/Alternatives for Hashcode

I have to create some sort of "cache", here's the scenario:

1. User enters keywords or makes a selection from menus
2. Users selections are converted into a string, and its hashcode is derived
3. Results are fetched from the database according to user's selections
4. Cache file is created, hashcode of the user selection string is used as the file name .. database results are dumped into the cache file

The problem arises when there is a chance that next time the app is run, for the same user selection the hashcode might be different. (Ref: MSDN Docs)

Is there any available alternative we can use as the filename instead of hashcodes which stays consistent?

Thanks,
LVL 12
lexxwernAsked:
Who is Participating?
 
JimBrandleyConnect With a Mentor Commented:
I would not hash twice. You gain nothing from it, and it costs roughly double the time. The SHA (Secure Hash Algorithm) algorithm was chosen by the National Institute of Standards and Technology. Since one of the criteria they used to make the choice was how well small changes in an input string propagate through the key, that one would get my vote.

A hash function converts inputs to some number of hash buckets. They are called buckets because they are expected to contain multiple values. The better the hash function, the more evenly it distributes the input values into the set of buckets. SHA256 will distribute inputs into 2^256 buckets. So the probability of collisions is low, but it is non-zero. A map, keyed on the hash value, that relates the search criteria string to the filename is the only absolutely safe way to do this.

Some months ago, I translated a Java implementation of the SHA256 algorithm into C# so I could debug it for a question on this site. The bug is fixed in the C# code. If you have not already looked at the algorithm published by NIST, the below might be enlightening.

Jim

using System;
using System.Collections.Generic;
using System.Text;
 
namespace EncryptionTest
{
	class SHA256FromJS
	{
		/// <summary>
		/// This class is a translation to C# from JS code published at:
		/// http://anmar.eu.org/projects/jssha2/ 
		/// </summary>
		#region Members
		private static int chrsz = 8;	// Bits per input character
		#endregion
 
		#region Public Method
		public static string hex_sha256(string s)
		{
			return binb2hex(core_sha256(str2binb(s), s.Length * chrsz));
		}
 
		#endregion
 
		#region Private methods
		private static uint safe_add(uint x, uint y)
		{
			uint lsw = (x & 0xFFFF) + (y & 0xFFFF);
			uint msw = (x >> 16) + (y >> 16) + (lsw >> 16);
			return (msw << 16) | (lsw & 0xFFFF);
		}
		private static uint S(uint x, int n)
		{
			return ( x >> n ) | (x << (32 - n));
		}
		private static uint R(uint x, int n)
		{
			return (x >> n);
		}
		private static uint Ch(uint x, uint y, uint z)
		{
			return ((x & y) ^ ((~x) & z));
		}
		private static uint Maj(uint x, uint y, uint z)
		{
			return ((x & y) ^ (x & z) ^ (y & z));
		}
		private static uint Sigma0256(uint x)
		{
			return (S(x, 2) ^ S(x, 13) ^ S(x, 22));
		}
		private static uint Sigma1256(uint x)
		{
			return (S(x, 6) ^ S(x, 11) ^ S(x, 25));
		}
		private static uint Gamma0256(uint x)
		{
			return (S(x, 7) ^ S(x, 18) ^ R(x, 3));
		}
		private static uint Gamma1256(uint x)
		{
			return (S(x, 17) ^ S(x, 19) ^ R(x, 10));
		}
		private static uint[] str2binb(string str)
		{
			byte[] plainBytes = ASCIIEncoding.ASCII.GetBytes(str);
 
			int characters = plainBytes.Length;
			ulong textLen = (ulong)characters * (ulong)8;
			int padLength = 512 - (int)((textLen + (ulong)65) % (ulong)512);
			int messageLength = (int)((textLen + (ulong)65 + (ulong)padLength) / (ulong)32);
 
			uint[] bin = new uint[messageLength];
			Array.Clear(bin, 0, bin.Length);	// Initialize to all zeros
 
			uint mask = (uint)((1 << chrsz) - 1);
 
			for (int i = 0; i < plainBytes.Length * chrsz; i += chrsz)
			{
				bin[i >> 5] |= (plainBytes[i / chrsz] & mask) << (24 - i % 32);
			}
			return bin;
		}
		private static string binb2hex(uint[] binarray)
		{
			string hex_tab = "0123456789abcdef";
			StringBuilder result = new StringBuilder(64);
			char c1;
			char c2;
			for (int i = 0; i < binarray.Length * 4; i++)
			{
				c1 = hex_tab[(int)((binarray[i >> 2] >> ((3 - i % 4) * 8 + 4)) & 0xF)];
				c2 = hex_tab[(int)((binarray[i >> 2] >> ((3 - i % 4) * 8)) & 0xF)];
				result.Append(c1);
				result.Append(c2);
			}
			return result.ToString();
		}
		private static uint[] core_sha256(uint[] m, int l)
		{
			uint[] K = new uint[] { 0x428A2F98, 0x71374491, 0xB5C0FBCF, 0xE9B5DBA5, 0x3956C25B, 0x59F111F1, 0x923F82A4, 0xAB1C5ED5,
								  0xD807AA98, 0x12835B01, 0x243185BE, 0x550C7DC3, 0x72BE5D74, 0x80DEB1FE, 0x9BDC06A7, 0xC19BF174,
								  0xE49B69C1, 0xEFBE4786, 0xFC19DC6, 0x240CA1CC, 0x2DE92C6F, 0x4A7484AA, 0x5CB0A9DC, 0x76F988DA, 
								  0x983E5152, 0xA831C66D, 0xB00327C8, 0xBF597FC7, 0xC6E00BF3, 0xD5A79147, 0x6CA6351, 0x14292967, 
								  0x27B70A85, 0x2E1B2138, 0x4D2C6DFC, 0x53380D13, 0x650A7354, 0x766A0ABB, 0x81C2C92E, 0x92722C85,
								  0xA2BFE8A1, 0xA81A664B, 0xC24B8B70, 0xC76C51A3, 0xD192E819, 0xD6990624, 0xF40E3585, 0x106AA070, 
								  0x19A4C116, 0x1E376C08, 0x2748774C, 0x34B0BCB5, 0x391C0CB3, 0x4ED8AA4A, 0x5B9CCA4F, 0x682E6FF3, 
								  0x748F82EE, 0x78A5636F, 0x84C87814, 0x8CC70208, 0x90BEFFFA, 0xA4506CEB, 0xBEF9A3F7, 0xC67178F2 };
			uint[] HASH = new uint[] { 0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19 };
			uint[] W = new uint[64];
			uint a, b, c, d, e, f, g, h;
			uint T1, T2;
 
			// Append padding
			m[l >> 5] |= (uint)(0x80 << (24 - l % 32));
			m[((l + 64 >> 9) << 4) + 15] = (uint)l;
			for (int i = 0; i < m.Length; i += 16)
			{
				a = HASH[0]; 
				b = HASH[1]; 
				c = HASH[2]; 
				d = HASH[3]; 
				e = HASH[4]; 
				f = HASH[5]; 
				g = HASH[6]; 
				h = HASH[7];
				for (int j = 0; j < 64; j++)
				{
					if (j < 16)
						W[j] = m[j]; // Fixed bug here. Was: W[j] = m[j + i];
					else
						W[j] = safe_add(safe_add(safe_add(Gamma1256(W[j - 2]), W[j - 7]), Gamma0256(W[j - 15])), W[j - 16]);
					T1 = safe_add(safe_add(safe_add(safe_add(h, Sigma1256(e)), Ch(e, f, g)), K[j]), W[j]);
					T2 = safe_add(Sigma0256(a), Maj(a, b, c));
					h = g; 
					g = f; 
					f = e; 
					e = safe_add(d, T1); 
					d = c; 
					c = b; 
					b = a; 
					a = safe_add(T1, T2);
				}
				HASH[0] = safe_add(a, HASH[0]);
				HASH[1] = safe_add(b, HASH[1]);
				HASH[2] = safe_add(c, HASH[2]);
				HASH[3] = safe_add(d, HASH[3]);
				HASH[4] = safe_add(e, HASH[4]);
				HASH[5] = safe_add(f, HASH[5]);
				HASH[6] = safe_add(g, HASH[6]);
				HASH[7] = safe_add(h, HASH[7]);
			}
			return HASH;
		}
		#endregion
 
		#region Algorithm from http://anmar.eu.org/projects/jssha2/
		/*
		Note: All variables are unsigned 32 bits and wrap modulo 232 when calculating
 
		Initialize variables
		(first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):
		h0 := 0x6a09e667
		h1 := 0xbb67ae85
		h2 := 0x3c6ef372
		h3 := 0xa54ff53a
		h4 := 0x510e527f
		h5 := 0x9b05688c
		h6 := 0x1f83d9ab
		h7 := 0x5be0cd19
 
		Initialize table of round constants
		(first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):
		k[0..63] :=
		   0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
		   0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
		   0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
		   0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
		   0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
		   0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
		   0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
		   0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
 
		Pre-processing:
		append the bit '1' to the message
		append k bits '0', where k is the minimum number >= 0 such that the resulting message
			length (in bits) is congruent to 448 (mod 512)
		append length of message (before pre-processing), in bits, as 64-bit big-endian integer
 
		Process the message in successive 512-bit chunks:
		break message into 512-bit chunks
		for each chunk
			break chunk into sixteen 32-bit big-endian words w[0..15]
 
			Extend the sixteen 32-bit words into sixty-four 32-bit words:
			for i from 16 to 63
				s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
				s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
				w[i] := w[i-16] + s0 + w[i-7] + s1
 
			Initialize hash value for this chunk:
			a := h0
			b := h1
			c := h2
			d := h3
			e := h4
			f := h5
			g := h6
			h := h7
 
			Main loop:
			for i from 0 to 63
				s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
				maj := (a and b) xor (a and c) xor (b and c)
				t2 := s0 + maj
				s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
				ch := (e and f) xor ((not e) and g)
				t1 := h + s1 + ch + k[i] + w[i]
 
				h := g
				g := f
				f := e
				e := d + t1
				d := c
				c := b
				b := a
				a := t1 + t2
 
			Add this chunk's hash to result so far:
			h0 := h0 + a
			h1 := h1 + b 
			h2 := h2 + c
			h3 := h3 + d
			h4 := h4 + e
			h5 := h5 + f
			h6 := h6 + g 
			h7 := h7 + h
 
		Produce the final hash value (big-endian):
		digest = hash = h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7
		*/
		#endregion
 
 
		#region Original JS Source
		/*
 
		var chrsz = 8;  // bits per input character. 8 - ASCII; 16 - Unicode  
		function safe_add (x, y) {
		  var lsw = (x & 0xFFFF) + (y & 0xFFFF);
		  var msw = (x >> 16) + (y >> 16) + (lsw >> 16);
		  return (msw << 16) | (lsw & 0xFFFF);
		}
		function S (X, n) {return ( X >>> n ) | (X << (32 - n));}
		function R (X, n) {return ( X >>> n );}
		function Ch(x, y, z) {return ((x & y) ^ ((~x) & z));}
		function Maj(x, y, z) {return ((x & y) ^ (x & z) ^ (y & z));}
		function Sigma0256(x) {return (S(x, 2) ^ S(x, 13) ^ S(x, 22));}
		function Sigma1256(x) {return (S(x, 6) ^ S(x, 11) ^ S(x, 25));}
		function Gamma0256(x) {return (S(x, 7) ^ S(x, 18) ^ R(x, 3));}
		function Gamma1256(x) {return (S(x, 17) ^ S(x, 19) ^ R(x, 10));}
		function core_sha256 (m, l) {
			var K = new Array(0x428A2F98,0x71374491,0xB5C0FBCF,0xE9B5DBA5,0x3956C25B,0x59F111F1,0x923F82A4,0xAB1C5ED5,0xD807AA98,0x12835B01,0x243185BE,0x550C7DC3,0x72BE5D74,0x80DEB1FE,0x9BDC06A7,0xC19BF174,0xE49B69C1,0xEFBE4786,0xFC19DC6,0x240CA1CC,0x2DE92C6F,0x4A7484AA,0x5CB0A9DC,0x76F988DA,0x983E5152,0xA831C66D,0xB00327C8,0xBF597FC7,0xC6E00BF3,0xD5A79147,0x6CA6351,0x14292967,0x27B70A85,0x2E1B2138,0x4D2C6DFC,0x53380D13,0x650A7354,0x766A0ABB,0x81C2C92E,0x92722C85,0xA2BFE8A1,0xA81A664B,0xC24B8B70,0xC76C51A3,0xD192E819,0xD6990624,0xF40E3585,0x106AA070,0x19A4C116,0x1E376C08,0x2748774C,0x34B0BCB5,0x391C0CB3,0x4ED8AA4A,0x5B9CCA4F,0x682E6FF3,0x748F82EE,0x78A5636F,0x84C87814,0x8CC70208,0x90BEFFFA,0xA4506CEB,0xBEF9A3F7,0xC67178F2);
			var HASH = new Array(0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19);
			var W = new Array(64);
			var a, b, c, d, e, f, g, h, i, j;
			var T1, T2;
			// append padding
			m[l >> 5] |= 0x80 << (24 - l % 32);
			m[((l + 64 >> 9) << 4) + 15] = l;
			for ( var i = 0; i<m.length; i+=16 ) {
				a = HASH[0]; b = HASH[1]; c = HASH[2]; d = HASH[3]; e = HASH[4]; f = HASH[5]; g = HASH[6]; h = HASH[7];
				for ( var j = 0; j<64; j++) 
				{
					if (j < 16)
		 				W[j] = m[j + i];  // Not - this is a bug. should be: W[j] = m[j];
					else 
						W[j] = safe_add(safe_add(safe_add(Gamma1256(W[j - 2]), W[j - 7]), Gamma0256(W[j - 15])), W[j - 16]);
					T1 = safe_add(safe_add(safe_add(safe_add(h, Sigma1256(e)), Ch(e, f, g)), K[j]), W[j]);
					T2 = safe_add(Sigma0256(a), Maj(a, b, c));
					h = g; g = f; f = e; e = safe_add(d, T1); d = c; c = b; b = a; a = safe_add(T1, T2);
				}
				HASH[0] = safe_add(a, HASH[0]); 
				HASH[1] = safe_add(b, HASH[1]);
				HASH[2] = safe_add(c, HASH[2]);
				HASH[3] = safe_add(d, HASH[3]);
				HASH[4] = safe_add(e, HASH[4]);
				HASH[5] = safe_add(f, HASH[5]);
				HASH[6] = safe_add(g, HASH[6]);
				HASH[7] = safe_add(h, HASH[7]);
			}
			return HASH;
		}
		function str2binb (str) {
		  var bin = Array();
		  var mask = (1 << chrsz) - 1;
		  for(var i = 0; i < str.length * chrsz; i += chrsz)
			bin[i>>5] |= (str.charCodeAt(i / chrsz) & mask) << (24 - i%32);
		  return bin;
		}
		function binb2hex (binarray) {
		  var hexcase = 0; // hex output format. 0 - lowercase; 1 - uppercase 
		  var hex_tab = hexcase ? "0123456789ABCDEF" : "0123456789abcdef";
		  var str = "";
		  for (var i = 0; i < binarray.length * 4; i++) {
			str += hex_tab.charAt((binarray[i>>2] >> ((3 - i%4)*8+4)) & 0xF) + hex_tab.charAt((binarray[i>>2] >> ((3 - i%4)*8  )) & 0xF);
		  }
		  return str;
		}
		function hex_sha256(s){return binb2hex(core_sha256(str2binb(s),s.length * chrsz));}
 
*/
		#endregion
	}
}

Open in new window

0
 
JimBrandleyCommented:
For a given input string, a hash function always returns the same value. For example, many organizations, including my company, do not store user passwords in the database. Instead, we store the hash of the password. Given that same password, the hash always returns the same value.

That said, if the same selections always produce the same string, your hash key will work just fine.

There is one caveat; i.e. more than one input string can produce the same hash key. That's just the nature of hash functions. That should be the only way you can have problems. You can get around that by storing a map that includes the string,  the hash value and the name of the file. Simply add -1 -2 etc to the hash key for file names as collisions occur. Then in the rare event that a collision does occur, you can still get the correct file.

Jim
0
 
anyoneisCommented:
Where in the docs does it say that the hash can change for a given byte array? I think that is not correct. Give the attached snippet, you can run ComputeHash until you are blue in the face, and you will still get the same result, AFAIK.

David
        private void ComputeHash(string input)
        {
            HashAlgorithm shm = new SHA256Managed();
            label1.Text = Convert.ToBase64String(shm.ComputeHash(Encoding.UTF8.GetBytes(input)));
 
        }

Open in new window

0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
lexxwernAuthor Commented:
Thanks for the info.

What are the conditions in which two different strings may have the same Hash code?
0
 
JimBrandleyCommented:
For normal hash functions, such as MD5 and SHA1 or SHA256, that's difficult to predict. Changing a single bit in the input string will have a far-reaching effect on the generated key. If that were not the case, they would be easy to crack. Practically speaking, you could generate millions of keys and not get a single duplicate, or the first two you attempt could produce a collision.

Jim
 
0
 
lexxwernAuthor Commented:
Okay, I have to sell this to my boss this afternoon. Which of the hash functions (MD5, SHA1, SHA256) is known to have the least probability of collision. Is there any reference out there that I can cite? More importantly, is this the best way of coming up with a filename for each distinct string?

Which is better:
Filename = hash of string 1 . hash of string 2
OR
Filename = SHA256 hash of (MD5 hash of string 1)

Also, I have written my own hash function, doing a bunch of mathematical ops, but there is a mathematical probability of filenames repeating. I know there is a very low probability of filenames repeating, but if there is some mathematical assurance that it would never happen, I could sell this idea to my manager with more conviction. :)

Thanks,
0
All Courses

From novice to tech pro — start learning today.