We are scanning sheets that have binary codes printed on them, which map each sheet to other data (sheet IDs, if you will). There are a limited number of IDs, printed as binary codes of up to 20 bits. Each bit is a box of a fixed size at specified x,y coordinates on the page: a black box is a 1 and a blank area is a 0. So, for sheet ID 4, the first 17 boxes show up as blank areas on the page, followed by 1 black box, followed by blank areas for the last 2 boxes.
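To make the layout concrete, here is a minimal Python sketch of the ID-to-boxes mapping. It assumes the most significant bit is printed first (which matches the sheet ID 4 example); the names `id_to_boxes` and `boxes_to_id` are made up for illustration and are not our actual code.

```python
from typing import List

ID_BITS = 20  # width of the printed sheet ID


def id_to_boxes(sheet_id: int, width: int = ID_BITS) -> List[int]:
    """Return the boxes for an ID, first box first (1 = black, 0 = blank)."""
    if not 0 <= sheet_id < (1 << width):
        raise ValueError("sheet ID does not fit in the printed width")
    # Assumption: the most significant bit is the first box on the page.
    return [(sheet_id >> (width - 1 - i)) & 1 for i in range(width)]


def boxes_to_id(boxes: List[int]) -> int:
    """Rebuild the ID from scanned boxes (inverse of id_to_boxes)."""
    value = 0
    for box in boxes:
        value = (value << 1) | (box & 1)
    return value


# Sheet ID 4: 17 blank boxes, 1 black box, 2 blank boxes.
print(id_to_boxes(4))
```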
Never mind WHY we are doing this (we are cheapskates). What we are trying to do now is add a second value that is a hash of the first. The point is to avoid doing the wrong thing when there is a problem in scanning. For a variety of reasons, the ID is sometimes misidentified, and we cannot prevent this from happening. However, we can reduce how often the wrong thing is done with the data by checking the hash value. If the sheet ID and its hash value (printed on the page as additional boxes) do not match, we can say that we do not have enough confidence in the scan, decline to store the other sheet data at that sheet ID in the database, and report this to the user. This prevents us from overwriting existing data in the wrong place due to misrecognition of the sheet ID.
I am looking for a good hash function that will distribute the hash values well over the range of sheet IDs. Since we are actually looking at binary information, we want to minimize similar bit patterns mapping to the same hash code.
It has been a while since I thought about anything like this, and I was wondering if anyone had tried to do something similar and knew a good hash function for this specific case. We are looking for a 5-bit hash of each 20-bit sheet ID.
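For reference, the kind of thing I have in mind is something like a 5-bit CRC. The Python sketch below is only a rough illustration, not something we have settled on. It assumes the polynomial x^5 + x^2 + 1 (the one USB uses for its CRC-5), with no initial value or final XOR; `crc5` and `is_consistent` are hypothetical names. As far as I can tell, because that polynomial is primitive, any one or two misread boxes among the 25 printed boxes (ID plus check) would be caught, while heavier corruption would slip through about 1 time in 32.

```python
ID_BITS = 20     # width of the sheet ID
CHECK_BITS = 5   # width of the check value
POLY = 0b100101  # x^5 + x^2 + 1 (assumed choice, not a given)


def crc5(sheet_id: int) -> int:
    """Remainder of sheet_id * x^5 divided by POLY over GF(2)."""
    if not 0 <= sheet_id < (1 << ID_BITS):
        raise ValueError("sheet ID does not fit in 20 bits")
    reg = sheet_id << CHECK_BITS  # append five zero bits
    # Plain bitwise long division, from the highest bit down to bit 5.
    for bit in range(ID_BITS + CHECK_BITS - 1, CHECK_BITS - 1, -1):
        if reg & (1 << bit):
            reg ^= POLY << (bit - CHECK_BITS)
    return reg & ((1 << CHECK_BITS) - 1)


def is_consistent(scanned_id: int, scanned_check: int) -> bool:
    """True if the scanned check boxes match the scanned ID boxes."""
    return crc5(scanned_id) == scanned_check


# Only store the sheet data when the two scanned values agree.
if is_consistent(4, crc5(4)):
    print("ID accepted")
else:
    print("low confidence in scan, report to user")
```

The appeal over an ad hoc hash is that IDs differing in only one or two bits can never share a check value, which is the "similar bit patterns" concern above.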
Thanks for any suggestions.