Convert this (hash) Script function to PHP code

I'd like to convert this hash function to "native" php code
http://stackoverflow.com/questions/4234499/javascript-implementation-of-jenkins-hash
I've got a lot of it converted, there are just some parts that I don't know how to translate from JS to PHP. It seems the "return" functions in JS and PHP differ enough that I don't know how to proceed. The lines are 15 and 27 mainly. I don't understand the colon in JS and how that would translate to a PHP command or function...
There is also a Python script of this same hash function found here:
http://stackoverflow.com/questions/3279615/python-implementation-of-jenkins-hash
It hasn't helped me understand it either. I want "native" code instead of a PHP extension (of which 2 or more exist from 3rd parties) for ease of use. Speed and efficiency are secondary, so an extension is something I don't want.
I'm not positive about the conversion I've done so far, but I think it's correct given the two examples of the hash.
-rich
<?PHP

function rot($x,$k){
	$rot =((($x)<<($k)) | (($x)>>(32-($k))));
    return $rot;
}

function mix ($a,$b,$c) {
    $a = ($a - $c) | 0;  $a ^= rot($c, 4);  $c = ($c + $b) | 0;
    $b = ($b - $a) | 0;  $b ^= rot($a, 6);  $a = ($a + $c) | 0;
    $c = ($c - $b) | 0;  $c ^= rot($b, 8);  $b = ($b + $a) | 0;
    $a = ($a - $c) | 0;  $a ^= rot($c,16);  $c = ($c + $b) | 0;
    $b = ($b - $a) | 0;  $b ^= rot($a,19);  $a = ($a + $c) | 0;
    $c = ($c - $b) | 0;  $c ^= rot($b, 4);  $b = ($b + $a) | 0;
 // $mix = {$a : $a, $b : $b, $c : $c};
  return $mix;
}

function last($a,$b,$c) {
   $c ^= $b; $c -= rot($b,14) | 0;
   $a ^= $c; $a -= rot($c,11) | 0;
   $b ^= $a; $b -= rot($a,25) | 0;
   $c ^= $b; $c -= rot($b,16) | 0;
   $a ^= $c; $a -= rot($c,4) | 0;
   $b ^= $a; $b -= rot($a,14) | 0;
   $c ^= $b; $c -= rot($b,24) | 0;
//   $last = {$a : $a, $b : $b, $c : $c};
   return $last;
}

function hashlittle2($k, $initval, $initval2) {
    $length = strlen($k);
    $a = $b = $c = 0xdeadbeef + $length + $initval;
    $c += $initval2;

    $offset = 0;
    while ($length > 12) {
        $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24));
        $a = $a>>0;
        $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24));
        $b = $b>>0;
        $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8) + (ord($k[$offset+10])<<16) + (ord($k[$offset+11])<<24));
        $c = $c>>0;
        $o = mix($a,$b,$c);
        $a = $o.$a; $b = $o.$b; $c = $o.$c;
        $length -= 12;
        $offset += 12;
    }

    switch($length) {
        case 12: $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8) + (ord($k[$offset+10])<<16) + (ord($k[$offset+11])<<24));
         $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24));
         $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 11: $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8) + (ord($k[$offset+10])<<16)); $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 10: $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8)); $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 9: $c += (ord($k[$offset+8])); $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 8: $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 7: $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 6: $b += ((ord($k[$offset+5])<<8) + ord($k[$offset+4])); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 5: $b += (ord($k[$offset+4])); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 4: $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 3: $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16)); break;
        case 2: $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8)); break;
        case 1: $a += (ord($k[$offset+0])); break;
        case 0: return array($b, $c);
    }

    $o = last($a,$b,$c);
    $a = $o.$a; $b = $o.$b; $c = $o.$c;

    return array($b>>0, $c>>0);
}
$hashstr = 'Four score and seven years ago';
$hash = hashlittle2($hashstr, 0xdeadbeef, 0xdeadbeef);
print $hashstr . ": " . $hash[0] . " " . $hash[1] . "\n";
?>

Rich RumbleSecurity SamuraiAsked:

Slick812Commented:
Greetings richrumble, I looked at your PHP code and at the JS on the
http://stackoverflow.com/questions/4234499/javascript-implementation-of-jenkins-hash

page. That's some curious coding style in that JavaScript, but I guess it works; it looks like they translated it from another language (maybe C++) without adapting it to idiomatic JavaScript. Anyway this line =
$mix = {$a : $a, $b : $b, $c : $c};
uses JavaScript object notation with the { and }, but it seems pointless to me, since they place the same thing in the property and the value of each key:value pair.

for PHP I would use =
$mix = array($a,$b,$c);
return $mix;

and using this in the function
$o = mix($a,$b,$c);
$a = $o[0]; $b = $o[1]; $c = $o[2];
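
A closer match to the JavaScript `{a : a, b : b, c : c}` literal is a PHP associative array with string keys. A minimal sketch, where `triple()` is a made-up stand-in for `mix()` that only shows the return-and-unpack pattern:

```php
<?php
// Hypothetical stand-in for mix(): returns three results under string keys,
// mirroring the JavaScript object literal {a : a, b : b, c : c}.
function triple($a, $b, $c) {
    return array('a' => $a + 1, 'b' => $b + 1, 'c' => $c + 1);
}

$o = triple(1, 2, 3);
// JavaScript's o.a is written $o['a'] in PHP.
$a = $o['a'];
$b = $o['b'];
$c = $o['c'];
echo "$a $b $c\n"; // prints "2 3 4"
```

String keys cost a little more than positional indexes, so `array($a, $b, $c)` with `$o[0]`, `$o[1]`, `$o[2]` is just as reasonable here.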

however it would be more efficient if you used pass by reference

function mix (&$a &,$b, &$c) {// notice the 3  &  for Reference pass
    $a = ($a - $c) | 0;  $a ^= rot($c, 4);  $c = ($c + $b) | 0;
    $b = ($b - $a) | 0;  $b ^= rot($a, 6);  $a = ($a + $c) | 0;
    $c = ($c - $b) | 0;  $c ^= rot($b, 8);  $b = ($b + $a) | 0;
    $a = ($a - $c) | 0;  $a ^= rot($c,16);  $c = ($c + $b) | 0;
    $b = ($b - $a) | 0;  $b ^= rot($a,19);  $a = ($a + $c) | 0;
    $c = ($c - $b) | 0;  $c ^= rot($b, 4);  $b = ($b + $a) | 0;
}

and using this in the function
mix($a,$b,$c);// changes $a, $b and $c in the caller; no return value is needed
//$a = $o[0]; $b = $o[1]; $c = $o[2];
$length -= 12;

But I wonder about the code =
$c = ($c + $b) | 0;

An OR with ZERO does NOT change an integer value in PHP, so shouldn't it be
$c = ($c + $b);

There's some similar nonsense at
$a += (ord($k[$offset+0]
$offset+0 is redundant, as adding ZERO does not change the value

In the last( ) function it is exactly the same
$last = array($a,$b,$c);
return $last;

using it as =
$o = last($a,$b,$c);
$a = $o[0]; $b = $o[1]; $c = $o[2];

Also, the final return has
return array($b>>0, $c>>0);
I may be missing something, but shifting right by ZERO does NOT change the value, so this seems like another pointless operation; better as =
return array($b, $c);

However, in some strictly typed languages like C++, shifting by ZERO can ensure that the value stays within a BYTE limit, but in PHP it does nothing to an integer as far as I know. Try it yourself to see.
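
In JavaScript, `x | 0` truncates to a signed 32-bit integer and `x >>> 0` converts to an unsigned one, so those operators do real work in the original. In PHP they leave an integer unchanged, and the usual substitute is a bit mask. A small sketch, assuming a 64-bit PHP build (`PHP_INT_SIZE === 8`); `to_uint32()` is an illustrative name:

```php
<?php
// Assumes 64-bit PHP, where 0xdeadbeef is an int rather than a float.
function to_uint32($x) {
    return $x & 0xFFFFFFFF; // keep only the low 32 bits
}

$sum = 0xdeadbeef + 0xdeadbeef;     // 0x1bd5b7dde, which needs 33 bits
var_dump(($sum | 0) === $sum);      // bool(true): OR with zero changes nothing
var_dump(($sum >> 0) === $sum);     // bool(true): neither does a shift by zero
printf("%08x\n", to_uint32($sum));  // bd5b7dde
```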

ask questions if you need more info
Rich RumbleSecurity SamuraiAuthor Commented:
Yeah those last bits are my doing, they should be more like the originals which were
case 0: return {b : b, c : c};
return {b : b>>>0, c : c>>>0};

below is what another person emailed me, let me know if that looks closer to the JS version, right now it doesn't like my offsets in the last line...
I've updated the "returns" to what I think the equivalent will be below.
I'll have a better look at your comments now, I'm in over my head already :)
-rich

<?PHP

function rot($x,$k){
	$rot =((($x)<<($k)) | (($x)>>(32-($k))));
    return $rot;
}

function mix ($a,$b,$c) {
    $a = ($a - $c) | 0;  $a ^= rot($c, 4);  $c = ($c + $b) | 0;
    $b = ($b - $a) | 0;  $b ^= rot($a, 6);  $a = ($a + $c) | 0;
    $c = ($c - $b) | 0;  $c ^= rot($b, 8);  $b = ($b + $a) | 0;
    $a = ($a - $c) | 0;  $a ^= rot($c,16);  $c = ($c + $b) | 0;
    $b = ($b - $a) | 0;  $b ^= rot($a,19);  $a = ($a + $c) | 0;
    $c = ($c - $b) | 0;  $c ^= rot($b, 4);  $b = ($b + $a) | 0;
    $mix = array($a => $a, $b => $b, $c => $c);
  return $mix;
}

function last($a,$b,$c) {
   $c ^= $b; $c -= rot($b,14) | 0;
   $a ^= $c; $a -= rot($c,11) | 0;
   $b ^= $a; $b -= rot($a,25) | 0;
   $c ^= $b; $c -= rot($b,16) | 0;
   $a ^= $c; $a -= rot($c,4) | 0;
   $b ^= $a; $b -= rot($a,14) | 0;
   $c ^= $b; $c -= rot($b,24) | 0;
   $last = array($a => $a, $b => $b, $c => $c);
   return $last;
}

function hashlittle2($k, $initval, $initval2) {
    $length = strlen($k);
    $a = $b = $c = 0xdeadbeef + $length + $initval;
    $c += $initval2;

    $offset = 0;
    while ($length > 12) {
        $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24));
        $a = $a>>0;
        $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24));
        $b = $b>>0;
        $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8) + (ord($k[$offset+10])<<16) + (ord($k[$offset+11])<<24));
        $c = $c>>0;
        $o = mix($a,$b,$c);
        $a = $o.$a; $b = $o.$b; $c = $o.$c;
        $length -= 12;
        $offset += 12;
    }

    switch($length) {
        case 12: $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8) + (ord($k[$offset+10])<<16) + (ord($k[$offset+11])<<24));
         $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24));
         $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 11: $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8) + (ord($k[$offset+10])<<16)); $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 10: $c += (ord($k[$offset+8]) + (ord($k[$offset+9])<<8)); $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 9: $c += (ord($k[$offset+8])); $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 8: $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16) + (ord($k[$offset+7])<<24)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 7: $b += (ord($k[$offset+4]) + (ord($k[$offset+5])<<8) + (ord($k[$offset+6])<<16)); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 6: $b += ((ord($k[$offset+5])<<8) + ord($k[$offset+4])); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 5: $b += (ord($k[$offset+4])); $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 4: $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16) + (ord($k[$offset+3])<<24)); break;
        case 3: $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8) + (ord($k[$offset+2])<<16)); break;
        case 2: $a += (ord($k[$offset+0]) + (ord($k[$offset+1])<<8)); break;
        case 1: $a += (ord($k[$offset+0])); break;
        case 0: return array($b => $b, $c => $c);
    }
    $o = last($a,$b,$c);
    $a = $o.$a; $b = $o.$b; $c = $o.$c;
    return array($b => $b>>0, $c => $c>>0);
}
$hashstr = 'Four score and seven years ago';
$hash = hashlittle2($hashstr, 0xdeadbeef, 0xdeadbeef);
print $hashstr . ": " . $hash[0] . " " . $hash[1] . "\n";
?>


Slick812Commented:
I was not going to say this, but I guess I should: in PHP on 32-bit platforms, the addition and subtraction of integers can produce values NOT consistent with the 32-bit values in other languages, because PHP will interpret them as NEGATIVE if they have the first bit set. This is problematic for binary work like hashes and encryption, which rely on the UNSIGNED 32-bit values available in other languages, whereas PHP only has signed integers.
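
To see the sign problem concretely: the same 32 bits give different numbers when read as unsigned and as signed. A sketch assuming 64-bit PHP; `to_int32()` is an illustrative helper that mimics what a 32-bit signed integer would hold:

```php
<?php
// Reinterpret the low 32 bits of $x as a signed 32-bit value.
function to_int32($x) {
    $x &= 0xFFFFFFFF;
    // If bit 31 is set, the signed reading is the value minus 2^32.
    return ($x & 0x80000000) ? $x - 0x100000000 : $x;
}

echo to_int32(0xdeadbeef), "\n"; // -559038737
echo to_int32(0x7fffffff), "\n"; // 2147483647
```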

Rich RumbleSecurity SamuraiAuthor Commented:
I was reading: http://www.php.net/manual/en/language.operators.bitwise.php#90310
And that got me thinking about PHP's limitations... Also, yes, this code was originally from C: http://burtleburtle.net/bob/c/lookup3.c The uint32 stuff may be something that we do have to compensate for, I don't know yet. While the hash is not "cryptographic" it is nonetheless important to get right :)

Thanks again, I'm keen to get this working, and really hoping it will be compatible with the other implementations. I'm about to hash a few strings using the JS version to see what I should be expecting from PHP in the end.
-rich
Slick812Commented:
I looked at the code in the post ID:37263001, and it is NOT correct. You do not seem to understand that, although it does NOT give you error messages in PHP, the period . is completely different in JavaScript and PHP

$o = last($a,$b,$c);
$a = $o.$a; $b = $o.$b; $c = $o.$c;// BAD BAD BAD, turns the Numbers ( integers) to Strings
return array($b => $b>>0, $c => $c>>0);

Please try to see what these functions and code are trying to do, rather than literally copying line by line and symbol by symbol. This uses numbers as 32-bit integers (actually as unsigned 32-bit values, but PHP does NOT have an UNSIGNED integer type).

$o = last($a,$b,$c);
//$a = $o.$a; $b = $o.$b; $c = $o.$c;
return array($o[1],$o[2]);// with last() returning array($a,$b,$c); WHY have the key and value the same ? ?
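
The difference is easy to check: in PHP the dot joins its operands as strings and never looks up a property, while array elements are read with square brackets. A minimal sketch with made-up values:

```php
<?php
$x = 5;
$y = 7;
$joined = $x . $y;   // string concatenation, not member access
var_dump($joined);   // string(2) "57"

$o = array(10, 20, 30);
var_dump($o[1]);     // int(20)
```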


HOWEVER, in PHP on 32-bit systems (or wherever the PHP build uses 32-bit integers) you cannot store values above the integer maximum in an integer, and that maximum is about HALF the maximum of an UNSIGNED 32-bit integer in other languages.
Slick812Commented:
I will look at the C++ code later, but I've tried to do 32-bit binary work before and it's a hassle. Many times you will need to convert to 4 single-byte (8-bit) values and deal with each byte of the 4-byte, 32-bit value in order to get correct addition and subtraction values. In your code you have a lot like -
$c = ($c + $b);// More efficient to use $c += $b;
Can you see that if the values of $c and $b are both over half the max value of an integer, the result will be MORE than the variable can handle?
Rich RumbleSecurity SamuraiAuthor Commented:
Yeah, I know very little coding in general, I am glad you do! The notation "shorthand" you mentioned "+=" can be found in the python version. I did another "Frankenstein" version for that, but the JS seemed easier to read from my limited knowledge, so I was trying to do what I thought was simpler. The Jenkins hash is what our developers told me to use (specifically hashlittle2), but they are short on time and not permitted to work on my "pet project" so I'm reaching out to EE again :) I'm basically creating an index in a flat file that can be searched quickly (also using metaphone), long story short this is the hash we want :) I will need hand holding; again, this is not my strong point.

I tried the function mix (&$a &,$b, &$c) { ...  etc
and php (5.3) didn't expect the &
-rich
Slick812Commented:
I have looked at the C code and again at the JavaScript at
http://stackoverflow.com/questions/4234499/javascript-implementation-of-jenkins-hash

However, I do not see that the JavaScript in
hashlittle2: function(k, initval, initval2) {
does it correctly. Have you tried this in JavaScript to see if the output given as  return {b : b>>>0, c : c>>>0};  can be accessed? I may not be thinking correctly, but this uses the same questionable object notation for the output, and what is the property value of b? For the JavaScript that I have done, shouldn't this be
return [b>>>0, c>>>0];// as an array to send back
Of course the C code uses a completely different way to return it (through pointer parameters), not available in a scripting language.

Just as a reference for me: why do you need TWO integers returned? This would give you 64 bits of hash length. Do you need that much? What kind of lookup table are you generating? It must be humongous.

Oh and
function mix (&$a &,$b, &$c)
should be
function mix (&$a, &$b, &$c)
I am fairly sure that php 5.3 will take & in function parameters
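
A minimal sketch of the corrected by-reference form, using a made-up `bump()` function so the effect on the caller's variables is easy to see:

```php
<?php
// Illustrative only: bump() is not part of the hash.
// Each & makes the parameter an alias of the caller's variable.
function bump(&$a, &$b, &$c) {
    $a += 1;
    $b += 2;
    $c += 3;
}

$a = $b = $c = 10;
bump($a, $b, $c);      // no return value needed
echo "$a $b $c\n";     // prints "11 12 13"
```

The `&` belongs only in the function definition; the call site passes plain variables.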

I am going to look at the C code some more, but the addition and subtraction used in the mix function may be too much for me to take time to convert to the signed integer crap that PHP is limited to for its variable types. I may or may not be able to use bcadd( ) for that, but again it takes more time than I may have.
Rich RumbleSecurity SamuraiAuthor Commented:
Understood, and I did notice the typo (&, ) and fixed it :) Yes, we are (will be) running terabytes of data against the index, but we may only use the first integer hash for lookup and the second for "uniqueness" in case of collisions. The PY implementation seems to "address" the uint32 issue that PHP has too http://stackoverflow.com/a/4594659/1090023
Below is my Frankenstein version of that (py -> php). Right now it is returning a negative number more often than not, so I'm not sure that's correct. I am going to try the python and js versions next to see if they are working equally well before modification. You may or may not like the PY version any better. I know how precious time is, I'm glad to have any and all help.
BTW I changed "final" to "last" because I think PHP has that keyword reserved, while Python does have a "finally" keyword I don't think that's what the author was going for.
Then again, I'm making a LOT of assumptions with all of this :)
-rich
<?PHP
function rot($x,$k){
	$rot =((($x)<<($k)) | (($x)>>(32-($k)))); 
	return $rot;
  }

function mix(&$a, &$b, &$c){
    $a &= 0xffffffff; $b &= 0xffffffff; $c &= 0xffffffff;
    $a -= $c; $a &= 0xffffffff; $a ^= rot($c,4);  $a &= 0xffffffff; $c += $b; $c &= 0xffffffff;
    $b -= $a; $b &= 0xffffffff; $b ^= rot($a,6);  $b &= 0xffffffff; $a += $c; $a &= 0xffffffff;
    $c -= $b; $c &= 0xffffffff; $c ^= rot($b,8);  $c &= 0xffffffff; $b += $a; $b &= 0xffffffff;
    $a -= $c; $a &= 0xffffffff; $a ^= rot($c,16); $a &= 0xffffffff; $c += $b; $c &= 0xffffffff;
    $b -= $a; $b &= 0xffffffff; $b ^= rot($a,19); $b &= 0xffffffff; $a += $c; $a &= 0xffffffff;
    $c -= $b; $c &= 0xffffffff; $c ^= rot($b,4);  $c &= 0xffffffff; $b += $a; $b &= 0xffffffff;
    return array($a,$b,$c);
  }

function last(&$a, &$b, &$c){
    $a &= 0xffffffff; $b &= 0xffffffff; $c &= 0xffffffff;
    $c ^= $b; $c &= 0xffffffff; $c -= rot($b,14); $c &= 0xffffffff;
    $a ^= $c; $a &= 0xffffffff; $a -= rot($c,11); $a &= 0xffffffff;
    $b ^= $a; $b &= 0xffffffff; $b -= rot($a,25); $b &= 0xffffffff;
    $c ^= $b; $c &= 0xffffffff; $c -= rot($b,16); $c &= 0xffffffff;
    $a ^= $c; $a &= 0xffffffff; $a -= rot($c,4);  $a &= 0xffffffff;
    $b ^= $a; $b &= 0xffffffff; $b -= rot($a,14); $b &= 0xffffffff;
    $c ^= $b; $c &= 0xffffffff; $c -= rot($b,24); $c &= 0xffffffff;
    return array($a,$b,$c);
};

function hashlittle2($data, $initval = 0, $initval2 = 0){
$length = strlen($data); $lenpos = $length;

    $a = (0xdeadbeef + ($length) + $initval); $b = (0xdeadbeef + ($length) + $initval); $c = (0xdeadbeef + ($length) + $initval);

    $c += $initval2; $c &= 0xffffffff; $p = 0;  # string offset
    while ($lenpos > 12){
        $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
        $a &= 0xffffffff;
        $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16) + (ord($data[$p+7])<<24));
        $b &= 0xffffffff;
        $c += (ord($data[$p+8]) + (ord($data[$p+9])<<8) + (ord($data[$p+10])<<16) + (ord($data[$p+11])<<24));
        $c &= 0xffffffff;
        mix($a, $b, $c);
        $p += 12;
        $lenpos -= 12;
    }

    if ($lenpos == 12){
     $c += (ord($data[$p+8]) + (ord($data[$p+9])<<8) + (ord($data[$p+10])<<16) + (ord($data[$p+11])<<24));  $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16) + (ord($data[$p+7])<<24));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 11){
     $c += (ord($data[$p+8]) + (ord($data[$p+9])<<8) + (ord($data[$p+10])<<16));  $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16) + (ord($data[$p+7])<<24));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 10){
     $c += (ord($data[$p+8]) + (ord($data[$p+9])<<8));  $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16) + (ord($data[$p+7])<<24));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 9){
     $c += (ord($data[$p+8]));  $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16) + (ord($data[$p+7])<<24));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 8){
     $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16) + (ord($data[$p+7])<<24));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 7){
     $b += (ord($data[$p+4]) + (ord($data[$p+5])<<8) + (ord($data[$p+6])<<16));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 6){
     $b += ((ord($data[$p+5])<<8) + ord($data[$p+4]));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 5){
     $b += (ord($data[$p+4]));  $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 4){
     $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16) + (ord($data[$p+3])<<24));
   }
    if ($lenpos == 3){
     $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8) + (ord($data[$p+2])<<16));
   }
    if ($lenpos == 2){
     $a += (ord($data[$p+0]) + (ord($data[$p+1])<<8));
   }
    if ($lenpos == 1){
     $a += ord($data[$p+0]);  $a &= 0xffffffff;  $b &= 0xffffffff;  $c &= 0xffffffff;
   }

    if ($lenpos == 0){
    	 return array($c, $b);
    	 }; 
    last($a, $b, $c); 
    return array($c, $b);
  }

function hashlittle($data, $initval=0){
    $c = hashlittle2($data, $initval, 0); return $c;
};
$hashstr = 'Four score and seven years ago';
$hash = hashlittle2($hashstr, 0xdeadbeef, 0xdeadbeef);
print $hashstr . ": " . $hash[0] . " " . $hash[1] . "<br />";

$hash = hashlittle($hashstr, 0);
print $hashstr . ": " . $hash[0] . " " . $hash[1] . "<br />";
?>
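
One thing worth checking in a port like this is `rot()`. In C the operand is a `uint32_t`, so bits shifted past bit 31 are discarded; PHP integers are wider on 64-bit builds, so the rotated value has to be masked back to 32 bits somewhere. Masking inside the helper keeps every caller safe. A sketch assuming 64-bit PHP, with `rot32()` as an illustrative name:

```php
<?php
// Rotate the low 32 bits of $x left by $k positions (0 < $k < 32).
// Assumes 64-bit PHP integers.
function rot32($x, $k) {
    $x &= 0xFFFFFFFF;                                      // drop anything above bit 31
    return (($x << $k) | ($x >> (32 - $k))) & 0xFFFFFFFF;  // wrap the high bits around
}

printf("%08x\n", rot32(0x80000001, 4)); // 00000018
printf("%08x\n", rot32(0xdeadbeef, 8)); // adbeefde
```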


Slick812Commented:
OK, this has the
$c &= 0xffffffff;

which is correct for some languages AND could be used if your code runs on 64-bit PHP, which allows LARGE integers over the 32-bit limit. On 32-bit PHP the integer limit is half of 0xffffffff, so the call is useless there, since the value is already lower than 0xffffffff.

As I have said, a 32-bit PHP integer is not used as an UNSIGNED value; it can be plus or minus, which will completely screw up any addition or subtraction ported from other code sets.
If you do (a - b) with b larger than a in some languages, and a and b are unsigned, you will NOT get a negative number; the value will roll around to a positive number. This is confusing to coders who think of a number the way they learned it in elementary school, and not as a set of BITS in a memory location.
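
The roll-around can be imitated in PHP by masking after every addition and subtraction. A minimal sketch, assuming 64-bit PHP so the intermediate results fit in an integer; `add32()` and `sub32()` are illustrative names:

```php
<?php
// Unsigned 32-bit arithmetic emulated with a mask (64-bit PHP assumed).
function add32($x, $y) { return ($x + $y) & 0xFFFFFFFF; }
function sub32($x, $y) { return ($x - $y) & 0xFFFFFFFF; }

echo sub32(1, 2), "\n";          // 4294967295, not -1
echo add32(0xFFFFFFFF, 1), "\n"; // 0, the sum wraps past 2^32
```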
Slick812Commented:
I do not think I will have time today or tomorrow to try to redo the C code in PHP. Even if I could do it and get two unsigned 32-bit number values, if I output those as 32-bit PHP values and the high bit is set, PHP will treat this as a negative number. That is not a fatal condition, except in how you may use this integer.
Some PHP coders handle this by passing the value as a string or as a HEX value to avoid the problem, but you then have to translate it back to a number.
Not sure if this will help, but I think the PHP hash using the whirlpool algorithm is probably a better hash than Jenkins for a 64-bit hash
$re = hash('whirlpool', $in, true);
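
Passing a 32-bit value around as a hex string, as mentioned above, might look like this. A sketch assuming 64-bit PHP, where `hexdec()` returns an integer for an 8-digit input:

```php
<?php
$value = 0xdeadbeef;

$hex  = sprintf('%08x', $value); // fixed-width hex string: "deadbeef"
$back = hexdec($hex);            // back to a number: 3735928559

var_dump($hex);            // string(8) "deadbeef"
var_dump($back === $value); // bool(true)
```

The fixed width from `%08x` keeps leading zeros, so every stored hash has the same length.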
Rich RumbleSecurity SamuraiAuthor Commented:
The appeal of the Jenkins hash is that it's an integer hash as opposed to an alphanumeric hash, searching digits is much faster. There are other (index)hashes we could use, but this one seems to be more accepted than FNV and others (http://www.azillionmonkeys.com/qed/hash.html) Mr Jenkins has a newer hash (Spooky Hash) that is still in the works, but for our needs we think the Jenkins hash will fit the bill. Again appreciate your insight, it's possible we will require our customers to install the Jenkins ext, but the initial goal was to not make that a requirement if at all possible. Believe it or not, creating the hashes quickly isn't as important to us as not having collisions, but not so unimportant that we use slower hashes like md5/tiger etc... I hope that makes sense? We've used it before (jenkins) but not outside of C.
-rich
Slick812Commented:
OK, important to consider for this, since you are looking for SPEED: if you do a server-side script in PHP and get a halfway correct translation of the Jenkins hash to PHP, it will NOT be that fast. Why? Because PHP is a scripted language, which slows down the calls compared with a compiled language like C or C++. This isn't to say it can't still be faster than calling PHP's md5( ), but it may take longer; you won't know until you test. I did a web search for "php jenkins hash" but did not find any pure-PHP version; it seems like there should be one somewhere if it was faster. There were some for PHP as C extensions that are compiled against the PHP core. So maybe it is not so much faster in a PHP script, OR it is just a hassle to do the conversion with so many UNSIGNED additions and subtractions.
Rich RumbleSecurity SamuraiAuthor Commented:
It's the searching speed, not the generation speed we are concerned with, so an integer hash is what we will require from the searching side. PHP will be used to generate the integer hash for the index, but the searching (of the index) is done using other programs. Requiring our customer base to add an extension to their environment gets a lot of push back, but the customers do like our "idea" and not having to augment their installs with an ext or other library if it was "native php". We do understand why an interpreted/scripted language is slower for the task, and we can be slower on the generation of the index hash if that means the searching side is improved. The application isn't "real-time" so if it is possible to port Jenkins, we'd love to do so using a script as opposed to the "overhaul" we'd be doing creating an executable and or ext for the current base.
The long and the short of it is, we'd like to try it this way, and we're pretty sure it will work. The others are busy, so I thought I'd try to help them but got in over my head pretty quickly :)
-rich
Slick812Commented:
Well, if the generation is not so Speed important I would consider using
$re = hash('whirlpool', $in, true);
although this returns a String, this string is a 64 bit Integer, and PHP has no containers for this sort of number except as a string. You can divide this string in half and convert it to 2 integer values.
Also I looked at a PHP core add on that is compiled into PHP that did jenkins hash and it also returned a String, in PHP this seems to be the "Standard" behavior.

I was drawn to this because it was a challenge for me, so I did a conversion from the C code. There he used a very good method for single-byte placement of the input string, and although I cannot use the pointer arithmetic of C, I did a conversion. HOWEVER, since this is a HASH, I did not take the time to do the difficult catch and push for values that are beyond the max 32-bit integer (2,147,483,647) or for the negative numbers. In a hash this may not matter, or it may; I am not so familiar with statistical math analysis. I allowed the normal PHP signed integer rollover when added or subtracted values go beyond the range of the integer, mostly for SPEED. I guarantee that this will NOT give the same integer output as an unsigned integer model would; the 2 output integers CAN be negative numbers.

it looks like
function hashlittle2($inStr, &$pc, &$pb)
the $pc  and $pb are both INPUT and OUTPUT as the C code version is
In the code below there is the function and code below it to see how to use it

$inStr = 'HashStringHere';
$int1 = 7;
$int2 = 16;
hashlittle2($inStr, $int1, $int2);
echo $int1, ' : ', $int2;// $int1, $int2 are changed to the OUTPUT values to use


Your requirements may not allow negative values, but to have this do full unsigned integer math would require much added code, which I believe would slow this down to the point that it would not be faster.
I had to make some order changes to accommodate the negative values possible in PHP, so this may or may not have more collisions.
You can try this code or not. I have no real way to test for collisions, but in the several that I tried, changing just one BIT of the input did change both outputs.
function hashlittle2($inStr,//hash Input
&$pc,// IN and OUT main Start Integer
&$pb){// IN and OUT second Start Integer
$length = strlen($inStr);
$a = $b = $c = 3735928559 + $length + $pc;
$c += $pb;
//roll through inStr one byte at a time until length is less than 13
$i = 0;// Use increase $i to push to next byte
while ($length > 12){
	// square brackets: the {} string-offset syntax no longer works in PHP 8
	$a += ord($inStr[$i++]);
	$a += ord($inStr[$i++])<<8;
	$a += ord($inStr[$i++])<<16;
	$a += ord($inStr[$i++])<<24;
	$b += ord($inStr[$i++]);
	$b += ord($inStr[$i++])<<8;
	$b += ord($inStr[$i++])<<16;
	$b += ord($inStr[$i++])<<24;
	$c += ord($inStr[$i++]);
	$c += ord($inStr[$i++])<<8;
	$c += ord($inStr[$i++])<<16;
	$c += ord($inStr[$i++])<<24;// $i = 12 after inc
	//mix($a,$b,$c);// don't call function for speed increase
	$a^=($c<<4)|($c>>28);//rot($c,4);
	$a-=$c;
	$c+=$b;
	$b^=($a<<6)|($a>>26);//rot($a,6);  
	$b-=$a;
	$a+=$c;
	$c^=($b<<8)|($b>>24);//rot($b,8);
	$c-=$b;
	$b+=$a;
	$a^=($c<<16)|($c>>16);//rot($c,16);
	$a-=$c;
	$c+=$b;
	$b^=($a<<19)|($a>>25);//rot($a,19);
	$b-=$a;
	$a+=$c;
	$c^=($b<<4)|($b>>28);//rot($b,4);
	$c-=$b;
	$b+=$a;
	$length-=12;
    }

switch($length){// add each remaining byte according to length left
    case 12: $c+=ord($inStr[$i+11])<<24;
    case 11: $c+=ord($inStr[$i+10])<<16;
    case 10: $c+=ord($inStr[$i+9])<<8;
    case 9 : $c+=ord($inStr[$i+8]);
    case 8 : $b+=ord($inStr[$i+7])<<24;
    case 7 : $b+=ord($inStr[$i+6])<<16;
    case 6 : $b+=ord($inStr[$i+5])<<8;
    case 5 : $b+=ord($inStr[$i+4]);
    case 4 : $a+=ord($inStr[$i+3])<<24;
    case 3 : $a+=ord($inStr[$i+2])<<16;
    case 2 : $a+=ord($inStr[$i+1])<<8;
    case 1 : $a+=ord($inStr[$i]); break;
    case 0 : $pc=$c; $pb=$b; return;// do NOT finalize if there are no changes (additions)
    }
//final($a,$b,$c);
$c^=$b; $c-=($b<<14)|($b>>18);//rot($b,14);
$a^=$c; $a-=($c<<11)|($c>>21);//rot($c,11);
$b^=$a; $b-=($a<<25)|($a>>7);//rot($a,25);
$c^=$b; $c-=($b<<16)|($b>>16);//rot($b,16);
$a^=$c; $a-=($c<<4)|($c>>28);//rot($c,4);
$b^=$a; $b-=($a<<14)|($a>>18);//rot($a,14);
$c^=$b; $c-=($b<<24)|($b>>8);//rot($b,24);
$pc=$c; $pb=$b;
} // function END


$inStr = ' 8Kd[f.ij~lmnopqY';
$int1 = 1234567890;
$int2 = 2198765432;
//$int1 = 7;
//$int2 = 16;
hashlittle2($inStr, $int1, $int2);
echo '<br> hashlittle1= ',$int1, ' - ',$int2;
//$int1 = 7;
//$int2 = 16;
$int1 = 1234567890;
$int2 = 2198765432;
$inStr = ' 8Kd[g.ij~lmnopqY';
hashlittle2($inStr, $int1, $int2);
echo '<br> hashlittle2= ',$int1, ' - ',$int2;
$int1 = 1234567889;
$int2 = 2198765432;
hashlittle2($inStr, $int1, $int2);
echo '<br> hashlittle3= ',$int1, ' - ',$int2;
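
For comparison, here is a minimal sketch of the "full unsigned integer" approach mentioned above. It assumes a 64-bit PHP build (PHP_INT_SIZE of 8), where a 32-bit value fits in a native integer and can be masked after every operation. The helper names u32, rot32, add32 and sub32 are hypothetical and are not part of the code above.

```php
<?php
// Hypothetical helpers: emulate unsigned 32-bit arithmetic on a 64-bit
// PHP build by masking every result back to the low 32 bits.

function u32(int $x): int {
    return $x & 0xFFFFFFFF;                 // keep only the low 32 bits
}

function rot32(int $x, int $k): int {
    $x = u32($x);                           // no stray high bits before rotating
    return u32(($x << $k) | ($x >> (32 - $k)));
}

function add32(int $x, int $y): int {
    return u32(u32($x) + u32($y));          // the sum wraps at 2^32
}

function sub32(int $x, int $y): int {
    return u32(u32($x) - u32($y));          // a negative difference wraps around
}

echo rot32(0x80000001, 4), "\n";            // 24
echo sub32(0, 1), "\n";                     // 4294967295
```

Every += , -= and rotation in the function would go through these helpers, which is the extra work (and extra function calls) the comment above is weighing against speed.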


Slick812 Commented:
Sorry, I was way off in my thinking about
$re = hash('whirlpool', $in, true);
This is a 64-BYTE output, not a 64-BIT output. DUHHH.
Anyway, I do not remember ever using an 8-byte output hash.
md2, md4 and md5 are all 128-bit (16-byte) output; adler32 and crc32b are 4-byte output.
Anyway, if you use md5 you can still convert two 4-byte segments to integers.
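
A minimal sketch of that md5 idea, assuming a 64-bit PHP build so that the unsigned values stay positive. The function name md5ToTwoInts is hypothetical; md5(), substr() and unpack() are the standard PHP functions.

```php
<?php
// Hypothetical helper: take the first 8 bytes of the raw MD5 digest and
// read them as two unsigned 32-bit big-endian integers.
function md5ToTwoInts(string $in): array {
    $raw   = md5($in, true);                    // 16 raw bytes, not hex
    $parts = unpack('N2', substr($raw, 0, 8));  // result keys are 1 and 2
    return [$parts[1], $parts[2]];
}

[$first, $second] = md5ToTwoInts('HashStringHere');
printf("%08x %08x\n", $first, $second);
```

The remaining 8 bytes of the digest are simply discarded here, so this is an 8-byte truncation of md5 rather than a different hash.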
Rich Rumble (Security Samurai, Author) Commented:
Thanks again, I'll begin testing straight away!
-rich
Slick812 Commented:
OK, I became more interested in the Jenkins hash, so I used the author's (Jenkins') code to do another PHP version. There was some difference in byte order (big endian versus little endian) from the code I used before. Given the way this does addition and subtraction, I personally cannot see that either order makes any difference mathematically, except when you need to match a hash from another source. I also got the XOR ordering to work as Jenkins has it; it looks like the other guy did not do it right.

I have coded hashes, and most hash code works on the same number of bytes as it returns; md5, for example, keeps a 16-byte working state and returns a 16-byte result. As I researched this I noticed that Jenkins does its work on a 12-byte segment, and yet it supposedly returns a 4-byte result, or in the "extended" version an 8-byte result of two integers. Somehow this seems mathematically screwy to me: how can you abandon 4 bytes of the working value and expect the result to be that unique? But 4 bytes is an awfully low byte count for a hash these days. I guess it only matters that it is fast?

I also do not "get" why this uses two integer inputs for the hash calculation. Is this like a "seed" value? I do not remember seeing other hashes with integer inputs, and how are these two integer inputs related to the collision frequency of the output? If you change one or both integers, it changes the output, but do certain values alter the XOR correspondence to make collisions more likely? In his notes Jenkins claims to have taken some chances for collisions into account by using something at Dr. Bobs coding web site. Not very convincing for me to consider this a bulletproof hash.
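
On the byte-order difference mentioned above: the earlier version adds the first byte of each group unshifted (little-endian), while the new version shifts the first byte left by 24 (big-endian). A small sketch of what that means for the same four bytes, using PHP's unpack() with the 'V' (little-endian) and 'N' (big-endian) format codes; the variable names are only for illustration.

```php
<?php
// The same four input bytes assembled into a 32-bit word in both orders.
$bytes = "\x01\x02\x03\x04";

$little = unpack('V', $bytes)[1];   // first byte lands in the low 8 bits
$big    = unpack('N', $bytes)[1];   // first byte lands in the high 8 bits

printf("little-endian: 0x%08x\n", $little);   // 0x04030201
printf("big-endian:    0x%08x\n", $big);      // 0x01020304
```

Both orders feed every byte into the working values, so either one hashes the whole input; the choice only matters when the result has to match another implementation.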

My new code is below. I changed the way this returns a result: I return ALL THREE working integers in an array from the function, and no longer pass the two integer inputs by reference. The output of hashlittle2( ) is a three-element array of signed integers: the first at [0] is the primary, the next at [1] is the secondary, and the third at [2] is the remainder of the 12-byte working value. Personally I would consider this a 12-byte output hash, but I guess an integer or two was easier to present; I don't really get his thinking on this.
function hashlittle2($inStr,//hash Input
$c,// IN main Start Integer
$b){// IN second Start Integer
$length = strlen($inStr);
$c += 3735928559 + $length;
$a = $c;
$c += $b;
$b = $a;
//roll through inStr one byte at a time until length is less than 13
$i = 0;// Use increase $i to push to next byte
while ($length > 12){
	// square brackets: the {} string-offset syntax was removed in PHP 8
	$a += ord($inStr[$i++])<<24;
	$a += ord($inStr[$i++])<<16;
	$a += ord($inStr[$i++])<<8;
	$a += ord($inStr[$i++]);
	$b += ord($inStr[$i++])<<24;
	$b += ord($inStr[$i++])<<16;
	$b += ord($inStr[$i++])<<8;
	$b += ord($inStr[$i++]);
	$c += ord($inStr[$i++])<<24;
	$c += ord($inStr[$i++])<<16;
	$c += ord($inStr[$i++])<<8;
	$c += ord($inStr[$i++]);// $i = 12 after inc
	//mix($a,$b,$c);// don't call function for speed increase
	$a-=$c;
	$a^=($c<<4)|($c>>28);//rot($c,4);
	$c+=$b;
	$b-=$a;
	$b^=($a<<6)|($a>>26);//rot($a,6);  
	$a+=$c;
	$c-=$b;
	$c^=($b<<8)|($b>>24);//rot($b,8);
	$b+=$a;
	$a-=$c;
	$a^=($c<<16)|($c>>16);//rot($c,16);
	$c+=$b;
	$b-=$a;
	$b^=($a<<19)|($a>>13);//rot($a,19);
	$a+=$c;
	$c-=$b;
	$c^=($b<<4)|($b>>28);//rot($b,4);
	$b+=$a;
	$length-=12;
    }

switch($length){// add each remaining byte according to length left
    case 12: $c+=ord($inStr[$i+11]);
    case 11: $c+=ord($inStr[$i+10])<<8;
    case 10: $c+=ord($inStr[$i+9])<<16;
    case 9 : $c+=ord($inStr[$i+8])<<24;
    case 8 : $b+=ord($inStr[$i+7]);
    case 7 : $b+=ord($inStr[$i+6])<<8;
    case 6 : $b+=ord($inStr[$i+5])<<16;
    case 5 : $b+=ord($inStr[$i+4])<<24;
    case 4 : $a+=ord($inStr[$i+3]);
    case 3 : $a+=ord($inStr[$i+2])<<8;
    case 2 : $a+=ord($inStr[$i+1])<<16;
    case 1 : $a+=ord($inStr[$i])<<24; break;
    case 0 : return array($c,$b,$a);// do NOT finalize if there are no changes (additions)
    }
//final($a,$b,$c);
$c^=$b; $c-=($b<<14)|($b>>18);//rot($b,14);
$a^=$c; $a-=($c<<11)|($c>>21);//rot($c,11);
$b^=$a; $b-=($a<<25)|($a>>7);//rot($a,25);
$c^=$b; $c-=($b<<16)|($b>>16);//rot($b,16);
$a^=$c; $a-=($c<<4)|($c>>28);//rot($c,4);
$b^=$a; $b-=($a<<14)|($a>>18);//rot($a,14);
$c^=$b; $c-=($b<<24)|($b>>8);//rot($b,24);
return array($c,$b,$a);
}

$inStr = ' 8Kd[f.i?~';
$int1 = 88;
$int2 = 17;
$re1 = hashlittle2($inStr, $int1, $int2);
echo '<br>hashlittle1= ',$re1[0], ' [1] ',$re1[1], ' [2] ',$re1[2];
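
If the three-element result is to be treated as one 12-byte value, as suggested above, one option is to print it as fixed-width hex. This is only a sketch: toHex96 is a hypothetical helper, it assumes a 64-bit PHP build, and it keeps just the low 32 bits of each integer, so anything the function accumulated above bit 31 is dropped.

```php
<?php
// Hypothetical helper: render signed integers as one fixed-width hex string
// by masking each to its low 32 bits (8 hex digits per integer).
function toHex96(array $ints): string {
    $hex = '';
    foreach ($ints as $v) {
        $hex .= sprintf('%08x', $v & 0xFFFFFFFF);
    }
    return $hex;
}

echo toHex96([-1, 0, 305419896]), "\n";   // ffffffff0000000012345678
```

With the function above this would be called as toHex96(hashlittle2($inStr, $int1, $int2)), which gives a 24-character string that is the same length for negative and positive results.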

Rich Rumble (Security Samurai, Author) Commented:
Wow, thanks again for giving this so much attention! I did not compare this to the js/py versions, or the C version for that matter, but it looks very good. Are "mix" and "final" being done in the script as it was posted, or should they be uncommented to match other Jenkins implementations, specifically the original C code?
I'll get back to you with my testing as soon as I can.
-rich
Slick812 Commented:
You ask: "Are "mix" and "final" being done in the script as it was posted". I thought you could see from my comments that I took the code out of the functions. (Actually they are NOT technically functions. In the C code they are "defines" (macros), which I do not think PHP has. A define looks like a function, but C also has real function definitions and calls, and a define uses a different mechanism, a sort of "substitution": when you write mix( i1, i2) the code in the define is placed there for execution, which is different from a function call.)
Anyway, if you look you will see ALL of the mix( ) code; it is just NOT in a function, only line-by-line code, which executes faster. There is absolutely no reason for outside functions here, since the mix( ) and final( ) code are each used only once, not several times.
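
For anyone who does prefer a separate function over inlined code, here is a minimal sketch of mix( ) written with by-reference parameters, which is the closest PHP gets to the C define updating a, b and c in place. The step order follows the mix( ) in the original question. The names rotl and mix are hypothetical here, a 64-bit PHP build is assumed, and every result is masked to 32 bits, so the output will not match the signed versions above.

```php
<?php
// Hypothetical helpers; 64-bit PHP assumed, results masked to 32 bits.
function rotl(int $x, int $k): int {
    $x &= 0xFFFFFFFF;
    return (($x << $k) | ($x >> (32 - $k))) & 0xFFFFFFFF;
}

// The & on each parameter makes the function change the caller's variables.
function mix(int &$a, int &$b, int &$c): void {
    $m = 0xFFFFFFFF;
    $a = ($a - $c) & $m;  $a ^= rotl($c, 4);   $c = ($c + $b) & $m;
    $b = ($b - $a) & $m;  $b ^= rotl($a, 6);   $a = ($a + $c) & $m;
    $c = ($c - $b) & $m;  $c ^= rotl($b, 8);   $b = ($b + $a) & $m;
    $a = ($a - $c) & $m;  $a ^= rotl($c, 16);  $c = ($c + $b) & $m;
    $b = ($b - $a) & $m;  $b ^= rotl($a, 19);  $a = ($a + $c) & $m;
    $c = ($c - $b) & $m;  $c ^= rotl($b, 4);   $b = ($b + $a) & $m;
}

$a = 1; $b = 2; $c = 3;
mix($a, $b, $c);                 // $a, $b and $c now hold the mixed values
printf("%08x %08x %08x\n", $a, $b, $c);
```

The function call costs a little on every 12-byte block, which is the overhead the inlined version avoids.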

Are you doing any sort of collision testing? The PHP signed integer should give about as good a result for this as an unsigned integer (my uninformed rough estimate). But be warned: this can output a positive zero and a negative zero for an integer value, and PHP does NOT recognize a negative zero; it treats it as a positive zero, ignoring the high bit being set. So you will definitely have a chance for collisions there, 4 times if you use 2 integers.
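
The signed-zero concern above can be checked directly. PHP integers are two's complement, so the 32-bit pattern with only the high bit set decodes to the most negative value rather than to a second zero. A minimal sketch, assuming a 64-bit PHP build; toSigned32 is a hypothetical helper.

```php
<?php
// Hypothetical helper: reinterpret the low 32 bits of $x as a signed
// two's-complement value (64-bit PHP assumed).
function toSigned32(int $x): int {
    $x &= 0xFFFFFFFF;
    return ($x & 0x80000000) ? $x - 0x100000000 : $x;
}

var_dump(toSigned32(0x80000000));   // int(-2147483648), not a "negative zero"
var_dump(toSigned32(0xFFFFFFFF));   // int(-1)
var_dump(toSigned32(0));            // int(0)
```

Under that assumption each 32-bit pattern maps to exactly one integer, so the signed representation by itself should not add collisions.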
Rich Rumble (Security Samurai, Author) Commented:
That is what confused me: I saw the code but no function built around it and nothing calling it :) I understand now. Thanks for the thorough answers and warnings. Collision testing is going to take place next week. I may accept the answer sooner, I'm keen to get started!
-rich
Rich Rumble (Security Samurai, Author) Commented:
Thanks for your help, I'll have more questions coming soon :)