Question

Help with uniform hash distribution

Asked by: rllibby


I am working on a project that requires me to store and later validate/search for a set of pointers, and I could use a hand with the hash function for calculating the index.

First, the requirements:

1.) I only need comments/assistance with the hash function itself, nothing else. Do not suggest things like sorted lists, binary trees, etc.

2.) My hash table size needs to be kept in the ballpark of 16384 items; I will be using bucket chaining for collisions resolution. The 16384 is designed to be optimal with the storage of appx 256K items (a perfect distribution will mean a maximum of 16 checks per lookup, vs the 18 max checks required by a binary search)

3.) The keys for the hash are memory pointers, and more specifically, will come from either a GetMem or AllocMem call. This does give them (the keys) some traits, such as being 4 byte aligned, etc.

4.) 500 pts will be given to each answer that is an improvement on what I already have.

Here is the test frame work that I currently have set up. Again, I am only looking for enhancements to the following function.

function TForm1.HashIndex(Value: Pointer): Integer;

Thanks to all in advance,
Russell



--- Form Source Code ---

unit main;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, ExtCtrls;

// Prime number for hash size
const
  HASH_SIZE      =  16349;

// Hash definition
type
  THashArray     =  Array [0..Pred(HASH_SIZE)] of Integer;

type
  TForm1         =  class(TForm)
     Button1:    TButton;
     ListBox1:   TListBox;
     procedure   Button1Click(Sender: TObject);
     procedure   FormCreate(Sender: TObject);
     procedure   ListBox1DrawItem(Control: TWinControl; Index: Integer; Rect: TRect; State: TOwnerDrawState);
  private
     // Private declarations
     FMaxCount:  Integer;
     FHashArray: THashArray;
  public
     // Public declarations
     function    HashIndex(Value: Pointer): Integer;
  end;

var
  Form1:         TForm1;

implementation
{$R *.DFM}

function TForm1.HashIndex(Value: Pointer): Integer;
var Key:      Integer;
begin

  // Mod the value with the hash size
  result:=(Integer(Value) shr 6) mod HASH_SIZE;

end;

procedure TForm1.Button1Click(Sender: TObject);
var  List:       TList;
     i:          Integer;
     j:          Integer;
     p:          Pointer;
     dwStats:    Array [0..2] of Integer;
     dwSlots:    Integer;
     arrInts:    Array of Integer;
begin


  // Setup for test
  List:=TList.Create;
  ListBox1.Items.Clear;
  ZeroMemory(@FHashArray, SizeOf(THashArray));
  Randomize;

  try
     // Allocate memory blocks and increment the hash slot of where the item falls
     for i:=0 to 262144 do
     begin
        p:=AllocMem(Succ(Random(32)));
        Inc(FHashArray[HashIndex(p)]);
        List.Add(p);
     end;
     // Calculate the hash stats
     dwStats[0]:=0;       // Max value
     dwStats[1]:=MaxInt;  // Min value
     dwStats[2]:=0;       // Count of empty slots
     for i:=0 to High(FHashArray) do
     begin
        if (FHashArray[i] > dwStats[0]) then dwStats[0]:=FHashArray[i];
        if (FHashArray[i] < dwStats[1]) then dwStats[1]:=FHashArray[i];
        if (FHashArray[i] = 0) then Inc(dwStats[2]);
     end;
     // Allocate memory for the number of slots
     dwSlots:=Succ(dwStats[0]-dwStats[1]);
     SetLength(arrInts, dwSlots);
     // Fill in the count for each slot
     for i:=dwStats[1] to dwStats[0] do
     begin
        for j:=0 to High(FHashArray) do
        begin
           if (FHashArray[j] = i) then Inc(arrInts[i-dwStats[1]]);
        end;
     end;
     // Graph the hash statistics
     ListBox1.Items.BeginUpdate;
     FMaxCount:=0;
     for i:=dwStats[1] to dwStats[0] do
     begin
        j:=arrInts[i-dwStats[1]];
        if (j > FMaxCount) then FMaxCount:=j;
        ListBox1.Items.AddObject(Format('%d bucket(s) with %d items', [j, i]), Ptr(j));
     end;
     ListBox1.Items.EndUpdate;
     Caption:=Format('Min bucket count = %d, Max bucket count = %d', [dwStats[1], dwStats[0]]);
     // Free memory
     SetLength(arrInts, 0);
  finally
     // Free test memory
     for i:=Pred(List.Count) downto 0 do FreeMem(List[i]);
     // Tear down test
     List.Clear;
     List.Free;
  end;

end;

procedure TForm1.FormCreate(Sender: TObject);
begin

  FMaxCount:=0;

end;

procedure TForm1.ListBox1DrawItem(Control: TWinControl; Index: Integer; Rect: TRect; State: TOwnerDrawState);
var  dwCount:       Integer;
     rcFill:        TRect;
     dblPct:        Double;
begin

  // Calculate the percentage based on max and count
  dwCount:=Integer(ListBox1.Items.Objects[Index]);
  dblPct:=(dwCount / Succ(FMaxCount));

  // Get the rect to draw
  with ListBox1.Canvas do
  begin
     FillRect(Rect);
     rcFill:=Rect;
     rcFill.Right:=rcFill.Left+Trunc((rcFill.Right-rcFill.Left) * dblPct);
     Brush.Color:=clYellow;
     FillRect(rcFill);
     Brush.Color:=clWindow;
     SetBkMode(Handle, TRANSPARENT);
     TextOut(Rect.Left+2, Rect.Top+2, ListBox1.Items[Index]);
  end;

end;

end.

--- form ---

object Form1: TForm1
  Left = 80
  Top = 117
  Width = 638
  Height = 373
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 8
    Top = 8
    Width = 75
    Height = 25
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object ListBox1: TListBox
    Left = 136
    Top = 0
    Width = 494
    Height = 339
    Align = alRight
    ItemHeight = 16
    Style = lbOwnerDrawFixed
    TabOrder = 1
    OnDrawItem = ListBox1DrawItem
  end
end



This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2004-02-16 at 09:33:27ID20886643
Tags

hash

Topic

Delphi Programming

Participating Experts
3
Points
500
Comments
21

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. Fast hash/checksum
    I am writing a script that is going to compare the hash of files to determine which matches the hash of a given file. e.g. file1hash == file2hash .... FALSE file1hash == file3hash .... FALSE file1hash == file4hash .... FALSE file1hash == file5hash .... TRUE I could use som...
  2. Constructor on TObject descendant
    The cReate for a TObject descendant is not getting fired, in fact it's not trecognised by the complier (no blue dots). here's some sample code. What have I done wrong? Thanks, Tom. TOne = class(TObject) private protected public published end; TTwo = class(TO...
  3. Examining TObjects
    Hello, I'm (temporarily) coding in delphi 6 and using the Borland Delphi Enterprise IDE. In php we have a function called print_r() that will display the contents of a nested array in an easily understandable manner. In the current project I have a TStringList object with ...
  4. How to Thread Safe my TObject List and optimize it ?
    I'm creating multi thread program which mainly processing list of incoming data and output it into another list ready to send. So now my thread processing mainly data from TObjectList. But the problem with TObjectList in my program is it is strickly array like list instead ...
  5. Hashing to Disk
    Hi, I'd like help converting a hashing program with it's input being a text file, from a hash in memory to a disk hash. As the size of my text documents have increased, I’d like to use this methodology, as I am reaching the limit of in-memory hashing. The program is on the...
  6. Need to determin TObject ClassType & Name
    Hello, I need to determine the ClassType and Name of the TObject created before the sender... procedure MyProcedure(Sender: TObject); var TheAnswer: TObject; begin If Sender.ClassType = TForm then DoMyFormProcedure(TForm(Sender)) else // begin MyProc...

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: kretzschmarPosted on 2004-02-16 at 10:50:22ID: 10374530

listening  . . .

 

by: sftwengPosted on 2004-02-16 at 12:28:23ID: 10375477

How about this (not perfect but better I think):
------------------------------------
implementation
{$R *.DFM}
uses CRC32c;

function TForm1.HashIndex(Value: Pointer): Integer;
var Key:      Integer;
    vCRC32 : Cardinal;
    ik, ix : Integer;
    currentByte : Byte;
begin

  // Seed the CRC calculation
  vCRC32 := CRCSeed;
  ik := Integer(Value) shr 6;
  for ix := 1 to 4 do
  begin
    currentByte := ik MOD 256;
    vCRC32 := CRC32(currentByte,vCRC32);
    ik := ik DIV 256;
  end {for};
  vCRC32 := CRCEnd(vCRC32);
  // Mod the value with the hash size
  result := vCRC32 mod HASH_SIZE;

end;
--------------------------------------------------
where:

Unit CRC32c;
{ Modified from a version posted on the Pascal echo by Floor
  A.C. Naaijkens (note introduction of a Constant and a
  Function and making the Array a Constant outside of
  crc32() which should improve speed a lot; further
  references made to a C version in a File of Snippets from the
  C Echo marked as "Copyright (C) 1986 Gary S. Brown" (mostly to
  compare the large Arrays, which proved to be identical), and to
  "File verification using CRC" by Mark R. Nelson in Dr. Dobbs'
  Journal, May 1992.  The latter provided the final piece of
  crucial information.  Use: 1) Create a LongInt Variable For the
  CRC value; 2) Initialize this With CRCSeed; 3) Build the
  CRC Byte-by-Byte With CRC32(); 4) Finish With CRCend()
  (this was the part I found in Nelson).  The result is a CRC
  value identical to that calculated by PKZip and ARJ.
}
{
Now to use it...

With a LongInt Variable, say vCRC32, first seed it
  vCRC32 := CRCSeed;
Then go Byte-by-Byte thorugh to calculate:
  For P := 1 to Size DO
    vCRC32 := CRC32(Bytes[P], vCRC32);

  Then finish it off With CRCend

    vCRC32 := CRCend(vCRC32);

  You should be able to Write your own Dec2Hex Function =)
}
Interface
  Const
    CRCSeed = Cardinal($ffffffff);
    CRC32tab : Array[0..255] of Cardinal = (
      $00000000, $77073096, $ee0e612c, $990951ba, $076dc419, $706af48f,
      $e963a535, $9e6495a3, $0edb8832, $79dcb8a4, $e0d5e91e, $97d2d988,
      $09b64c2b, $7eb17cbd, $e7b82d07, $90bf1d91, $1db71064, $6ab020f2,
      $f3b97148, $84be41de, $1adad47d, $6ddde4eb, $f4d4b551, $83d385c7,
      $136c9856, $646ba8c0, $fd62f97a, $8a65c9ec, $14015c4f, $63066cd9,
      $fa0f3d63, $8d080df5, $3b6e20c8, $4c69105e, $d56041e4, $a2677172,
      $3c03e4d1, $4b04d447, $d20d85fd, $a50ab56b, $35b5a8fa, $42b2986c,
      $dbbbc9d6, $acbcf940, $32d86ce3, $45df5c75, $dcd60dcf, $abd13d59,
      $26d930ac, $51de003a, $c8d75180, $bfd06116, $21b4f4b5, $56b3c423,
      $cfba9599, $b8bda50f, $2802b89e, $5f058808, $c60cd9b2, $b10be924,
      $2f6f7c87, $58684c11, $c1611dab, $b6662d3d, $76dc4190, $01db7106,
      $98d220bc, $efd5102a, $71b18589, $06b6b51f, $9fbfe4a5, $e8b8d433,
      $7807c9a2, $0f00f934, $9609a88e, $e10e9818, $7f6a0dbb, $086d3d2d,
      $91646c97, $e6635c01, $6b6b51f4, $1c6c6162, $856530d8, $f262004e,
      $6c0695ed, $1b01a57b, $8208f4c1, $f50fc457, $65b0d9c6, $12b7e950,
      $8bbeb8ea, $fcb9887c, $62dd1ddf, $15da2d49, $8cd37cf3, $fbd44c65,
      $4db26158, $3ab551ce, $a3bc0074, $d4bb30e2, $4adfa541, $3dd895d7,
      $a4d1c46d, $d3d6f4fb, $4369e96a, $346ed9fc, $ad678846, $da60b8d0,
      $44042d73, $33031de5, $aa0a4c5f, $dd0d7cc9, $5005713c, $270241aa,
      $be0b1010, $c90c2086, $5768b525, $206f85b3, $b966d409, $ce61e49f,
      $5edef90e, $29d9c998, $b0d09822, $c7d7a8b4, $59b33d17, $2eb40d81,
      $b7bd5c3b, $c0ba6cad, $edb88320, $9abfb3b6, $03b6e20c, $74b1d29a,
      $ead54739, $9dd277af, $04db2615, $73dc1683, $e3630b12, $94643b84,
      $0d6d6a3e, $7a6a5aa8, $e40ecf0b, $9309ff9d, $0a00ae27, $7d079eb1,
      $f00f9344, $8708a3d2, $1e01f268, $6906c2fe, $f762575d, $806567cb,
      $196c3671, $6e6b06e7, $fed41b76, $89d32be0, $10da7a5a, $67dd4acc,
      $f9b9df6f, $8ebeeff9, $17b7be43, $60b08ed5, $d6d6a3e8, $a1d1937e,
      $38d8c2c4, $4fdff252, $d1bb67f1, $a6bc5767, $3fb506dd, $48b2364b,
      $d80d2bda, $af0a1b4c, $36034af6, $41047a60, $df60efc3, $a867df55,
      $316e8eef, $4669be79, $cb61b38c, $bc66831a, $256fd2a0, $5268e236,
      $cc0c7795, $bb0b4703, $220216b9, $5505262f, $c5ba3bbe, $b2bd0b28,
      $2bb45a92, $5cb36a04, $c2d7ffa7, $b5d0cf31, $2cd99e8b, $5bdeae1d,
      $9b64c2b0, $ec63f226, $756aa39c, $026d930a, $9c0906a9, $eb0e363f,
      $72076785, $05005713, $95bf4a82, $e2b87a14, $7bb12bae, $0cb61b38,
      $92d28e9b, $e5d5be0d, $7cdcefb7, $0bdbdf21, $86d3d2d4, $f1d4e242,
      $68ddb3f8, $1fda836e, $81be16cd, $f6b9265b, $6fb077e1, $18b74777,
      $88085ae6, $ff0f6a70, $66063bca, $11010b5c, $8f659eff, $f862ae69,
      $616bffd3, $166ccf45, $a00ae278, $d70dd2ee, $4e048354, $3903b3c2,
      $a7672661, $d06016f7, $4969474d, $3e6e77db, $aed16a4a, $d9d65adc,
      $40df0b66, $37d83bf0, $a9bcae53, $debb9ec5, $47b2cf7f, $30b5ffe9,
      $bdbdf21c, $cabac28a, $53b39330, $24b4a3a6, $bad03605, $cdd70693,
      $54de5729, $23d967bf, $b3667a2e, $c4614ab8, $5d681b02, $2a6f2b94,
      $b40bbe37, $c30c8ea1, $5a05df1b, $2d02ef8d  );
Function CRC32(value: Byte; crc: Cardinal) : Cardinal;
Function CRCend( crc : Cardinal ) : Cardinal ;

Implementation

Function  CRC32(value: Byte; crc: Cardinal) : Cardinal;
begin
  Result := CRC32tab[Byte(crc xor Cardinal(value))] xor
           ((crc shr 8) and $00ffffff);
end;

Function CRCend( crc : Cardinal ): Cardinal;
begin
  CRCend := (crc xor CRCSeed);
end;

end.

 

by: sftwengPosted on 2004-02-16 at 12:33:14ID: 10375524

BTW, changing the order in which the bytes are presented appears not to improve it (I'm not surprised):

function TForm1.HashIndex(Value: Pointer): Integer;
var Key:      Integer;
    vCRC32 : Cardinal;
    ik, ix : Integer;
    currentByte : ARRAY [1..4] of Byte;
begin

  // Seed the CRC calculation
  vCRC32 := CRCSeed;
  ik := Integer(Value) shr 6;
  for ix := 1 to 4 do
  begin
    currentByte[ix] := ik MOD 256;
    ik := ik DIV 256;
  end {for};
  for ix := 4 downto 1 do
  begin
    vCRC32 := CRC32(currentByte[ix],vCRC32);
  end {for};
  vCRC32 := CRCEnd(vCRC32);
  // Mod the value with the hash size
  result := vCRC32 mod HASH_SIZE;

end;

 

by: sftwengPosted on 2004-02-16 at 12:43:00ID: 10375617

There's some more info about CRC16 and CRC32 at http://www.efg2.com/Lab/Mathematics/CRC.htm

 

by: rllibbyPosted on 2004-02-16 at 13:09:21ID: 10375838

sftweng,
First, thank you for the information on crc's; though I already have 16/32 bit crcs, lrcs, and vrc routines in my bag of code. ;-).

My question now, is did you run the new hash routine through the test application to determine the results? I did, and am posting the results that I saw:

Original Baseline
----------------------------------------
8 bucket(s) with 9 items
60 bucket(s) with 10 items
210 bucket(s) with 11 items
714 bucket(s) with 12 items
1349 bucket(s) with 13 items
2037 bucket(s) with 14 items
2464 bucket(s) with 15 items
2651 bucket(s) with 16 items
2508 bucket(s) with 17 items
1954 bucket(s) with 18 items
1240 bucket(s) with 19 items
681 bucket(s) with 20 items
311 bucket(s) with 21 items
107 bucket(s) with 22 items
35 bucket(s) with 23 items
15 bucket(s) with 24 items
4 bucket(s) with 25 items
1 bucket(s) with 26 items
----------------------------------------

Suggested implementation using CRC
----------------------------------------
52 bucket(s) with 0 items
1 bucket(s) with 1 items
115 bucket(s) with 2 items
160 bucket(s) with 3 items
161 bucket(s) with 4 items
314 bucket(s) with 5 items
373 bucket(s) with 6 items
426 bucket(s) with 7 items
608 bucket(s) with 8 items
644 bucket(s) with 9 items
771 bucket(s) with 10 items
852 bucket(s) with 11 items
886 bucket(s) with 12 items
915 bucket(s) with 13 items
890 bucket(s) with 14 items
967 bucket(s) with 15 items
895 bucket(s) with 16 items
888 bucket(s) with 17 items
895 bucket(s) with 18 items
816 bucket(s) with 19 items
714 bucket(s) with 20 items
619 bucket(s) with 21 items
541 bucket(s) with 22 items
495 bucket(s) with 23 items
430 bucket(s) with 24 items
383 bucket(s) with 25 items
312 bucket(s) with 26 items
264 bucket(s) with 27 items
211 bucket(s) with 28 items
169 bucket(s) with 29 items
126 bucket(s) with 30 items
118 bucket(s) with 31 items
83 bucket(s) with 32 items
67 bucket(s) with 33 items
37 bucket(s) with 34 items
40 bucket(s) with 35 items
37 bucket(s) with 36 items
21 bucket(s) with 37 items
14 bucket(s) with 38 items
12 bucket(s) with 39 items
8 bucket(s) with 40 items
3 bucket(s) with 41 items
4 bucket(s) with 42 items
2 bucket(s) with 43 items
6 bucket(s) with 44 items
1 bucket(s) with 45 items
0 bucket(s) with 46 items
1 bucket(s) with 47 items
1 bucket(s) with 48 items
0 bucket(s) with 49 items
0 bucket(s) with 50 items
0 bucket(s) with 51 items
0 bucket(s) with 52 items
1 bucket(s) with 53 items
----------------------------------------

So its not exactly what I would call an improvement, especially when you consider the extra overhead. Now, if you saw a different result set than what I am seeing (taking into account the slight deviation that can be attributed to the random function), then I would like to hear it, as it means that I need to consider system deviations as well.

Thanks again,
Russell

 

by: sftwengPosted on 2004-02-16 at 13:24:22ID: 10375971

Hi Russell, yes I did run the whole test suite and achieved similar results. My observation about improvement was based on the fact that the number of buckets with 16 or fewer items was almost twice as many using the CRC (around 17000 versus 9000).

Alan

 

by: sftwengPosted on 2004-02-16 at 13:36:26ID: 10376086

I withdraw the comment - made a stupid mistake in arithmetic - the result is no better.

 

by: sftwengPosted on 2004-02-16 at 13:50:21ID: 10376198

Have you tried something like this?

         RandSeed := Integer(Value) shr 6;
         result := Random(HASH_SIZE);

I haven't done a valid test yet because the use of "RandSeed" in the hashkey function collides with your use outside (changing the seed used to generate the addresses as a side-effect) but I'd expect a better result.

 

by: rllibbyPosted on 2004-02-16 at 14:00:12ID: 10376278

Alan,

No problem on the previous ;-)
I was going to mention the backside of the slide... meaning there are a lot MORE keys in the 19 and over buckets.

I am also running a Mean on the final result, and currently its between
820-920, and I am trying to improve that number.

Regarding the randseed...
Not sure if the randseed will work, but I will check it out. Bottom line is that this needs to be able to hash the key so the same index is always returned (otherwise I would never find anything)

Will take a look though @ the randseed though, thanks.
Russell

 

by: sftwengPosted on 2004-02-16 at 14:05:36ID: 10376333

Priming RandSeed gives repeatable results.

 

by: sftwengPosted on 2004-02-16 at 14:07:48ID: 10376364

Here's what I tried so far. Knuth multiplicative doesn't seem much better either. I added a RadioGroup to allow choice of algorithm.

 

by: sftwengPosted on 2004-02-16 at 14:08:01ID: 10376367

function TForm1.HashIndex(Value: Pointer): Integer;
var Key:      Integer;
    vCRC32 : Cardinal;
    ik, ix : Integer;
    i64 : Int64;
    currentByte : ARRAY [1..4] of Byte;
begin

  if RadioGroup1.ItemIndex = 0
  then result := (Integer(value) shr 6) mod HASH_SIZE
  else if RadioGroup1.ItemIndex = 2
  then begin
         RandSeed := Integer(Value) shr 6;
         result := Random(HASH_SIZE);
       end
  else if RadioGroup1.ItemIndex = 3
  then begin
         i64 := (Integer(value) shr 6) * 2654435761;
         result := i64 mod HASH_SIZE;
       end
  else begin
    // Seed the CRC calculation
    vCRC32 := CRCSeed;
    ik := Integer(Value) shr 6;
    for ix := 1 to 4 do
    begin
      currentByte[ix] := ik MOD 256;
      ik := ik DIV 256;
    end {for};
    for ix := 4 downto 1 do
    begin
      vCRC32 := CRC32(currentByte[ix],vCRC32);
    end {for};
    vCRC32 := CRCEnd(vCRC32);
    // Mod the value with the hash size
    result := vCRC32 mod HASH_SIZE;
  end {if};

end;

 

by: sftwengPosted on 2004-02-16 at 14:12:59ID: 10376407

There are some more interesting hash functions at http://www.concentric.net/~Ttwang/tech/inthash.htm. Regrettably I have to stop playing now and go prepare dinner. Good luck on your quest. I'll check back in again later to see how it's going.

Alan

 

by: rllibbyPosted on 2004-02-16 at 15:55:18ID: 10377376


Thanks for your help so far Alan, here is where I am currently at:

// Best avgerage mean so far
function TForm1.HashIndex(Value: Pointer): Integer;
begin

  result:=((Integer(Value) shr 6)+(Integer(Value) shr 5)) mod HASH_SIZE;

end;

----

Still think there is something to the numbers that I am missing....

Russell

 

by: bpanaPosted on 2004-02-16 at 16:09:06ID: 10377439

listening  . . .

 

by: sftwengPosted on 2004-02-17 at 00:01:20ID: 10379933

The "Weighted Avg 'Cost'" is coming out best with CRC32 with this code (note the cost calculation in the mainline):
------------------- Code
unit main;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, ExtCtrls;

// Prime number for hash size
const
  HASH_SIZE      =  16349;

// Hash definition
type
  THashArray     =  Array [0..Pred(HASH_SIZE)] of Integer;

type
  TForm1         =  class(TForm)
     Button1:    TButton;
     ListBox1:   TListBox;
    RadioGroup1: TRadioGroup;
     procedure   Button1Click(Sender: TObject);
     procedure   FormCreate(Sender: TObject);
     procedure   ListBox1DrawItem(Control: TWinControl; Index: Integer; Rect: TRect; State: TOwnerDrawState);
  private
     // Private declarations
     FMaxCount:  Integer;
     FHashArray: THashArray;
  public
     // Public declarations
     function    HashIndex(Value: Pointer): Integer;
  end;

var
  Form1:         TForm1;

implementation
{$R *.DFM}
uses CRC32c;

function TForm1.HashIndex(Value: Pointer): Integer;
var Key:      Integer;
    vCRC32 : Cardinal;
    ik, ix : Integer;
    i64 : Int64;
    currentByte : ARRAY [1..4] of Byte;
begin

  if RadioGroup1.ItemIndex = 0
  then result := (Integer(value) shr 3) mod HASH_SIZE
  else if RadioGroup1.ItemIndex = 1
  then begin
         // Seed the CRC calculation
         vCRC32 := CRCSeed;
         ik := (Integer(Value) shr 6);
         for ix := 1 to 4 do
         begin
           currentByte[ix] := ik MOD 256;
           ik := ik DIV 256;
           vCRC32 := CRC32(currentByte[ix],vCRC32);
         end {for};
         vCRC32 := CRCEnd(vCRC32);
         // Mod the value with the hash size
         result := vCRC32 mod HASH_SIZE;
       end
  else if RadioGroup1.ItemIndex = 2
  then begin
         RandSeed := Integer(Value) shr 5;
         result := Random(HASH_SIZE);
       end
  else if RadioGroup1.ItemIndex = 3
  then begin
         i64 := (Integer(value) shr 3) * 2654435761;
         result := (i64 shr 8) mod HASH_SIZE;
       end
  else if RadioGroup1.ItemIndex = 4
  then begin
         i64 := (Integer(value) shr 5) + (Integer(value) shr 6);
         result := i64 mod HASH_SIZE;
       end
  else begin
         result := (Integer(Value) shr 6) mod HASH_SIZE;
       end;

end;

procedure TForm1.Button1Click(Sender: TObject);
var  List:       TList;
     i:          Integer;
     j:          Integer;
     p:          Pointer;
     dwStats:    Array [0..2] of Integer;
     dwSlots:    Integer;
     arrInts:    Array of Integer;
     tot:        Integer;
begin


  // Setup for test
  List:=TList.Create;
  ListBox1.Items.Clear;
  ZeroMemory(@FHashArray, SizeOf(THashArray));
  Randomize;

  try
     // Allocate memory blocks and increment the hash slot of where the item falls
     for i:=0 to 262144 do
     begin
        p:=AllocMem(Succ(Random(32)));
        Inc(FHashArray[HashIndex(p)]);
        List.Add(p);
     end;
     // Calculate the hash stats
     dwStats[0]:=0;       // Max value
     dwStats[1]:=MaxInt;  // Min value
     dwStats[2]:=0;       // Count of empty slots
     for i:=0 to High(FHashArray) do
     begin
        if (FHashArray[i] > dwStats[0]) then dwStats[0]:=FHashArray[i];
        if (FHashArray[i] < dwStats[1]) then dwStats[1]:=FHashArray[i];
        if (FHashArray[i] = 0) then Inc(dwStats[2]);
     end;
     // Allocate memory for the number of slots
     dwSlots:=Succ(dwStats[0]-dwStats[1]);
     SetLength(arrInts, dwSlots);
     // Fill in the count for each slot
     for i:=dwStats[1] to dwStats[0] do
     begin
        for j:=0 to High(FHashArray) do
        begin
           if (FHashArray[j] = i) then Inc(arrInts[i-dwStats[1]]);
        end;
     end;
     // Graph the hash statistics
     ListBox1.Items.BeginUpdate;
     FMaxCount:=0;
     tot := 0;
     for i:=dwStats[1] to dwStats[0] do
     begin
        j:=arrInts[i-dwStats[1]];
        if (j > FMaxCount) then FMaxCount:=j;
        ListBox1.Items.AddObject(Format('%d bucket(s) with %d items', [j, i]), Ptr(j));
        tot := tot + i*j;
     end;
     tot := tot div dwStats[0];
     ListBox1.Items.EndUpdate;
     Caption:=Format('Min bucket count = %d, Max bucket count = %d, Weighted Avg ''Cost''= %d', [dwStats[1], dwStats[0], tot]);
     // Free memory
     SetLength(arrInts, 0);
  finally
     // Free test memory
     for i:=Pred(List.Count) downto 0 do FreeMem(List[i]);
     // Tear down test
     List.Clear;
     List.Free;
  end;

end;

procedure TForm1.FormCreate(Sender: TObject);
begin

  FMaxCount:=0;

end;

procedure TForm1.ListBox1DrawItem(Control: TWinControl; Index: Integer; Rect: TRect; State: TOwnerDrawState);
var  dwCount:       Integer;
     rcFill:        TRect;
     dblPct:        Double;
begin

  // Calculate the percentage based on max and count
  dwCount:=Integer(ListBox1.Items.Objects[Index]);
  dblPct:=(dwCount / Succ(FMaxCount));

  // Get the rect to draw
  with ListBox1.Canvas do
  begin
     FillRect(Rect);
     rcFill:=Rect;
     rcFill.Right:=rcFill.Left+Trunc((rcFill.Right-rcFill.Left) * dblPct);
     Brush.Color:=clYellow;
     FillRect(rcFill);
     Brush.Color:=clWindow;
     SetBkMode(Handle, TRANSPARENT);
     TextOut(Rect.Left+2, Rect.Top+2, ListBox1.Items[Index]);
  end;

end;

end.
--------------------- Form
object Form1: TForm1
  Left = 80
  Top = 117
  Width = 638
  Height = 373
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 8
    Top = 8
    Width = 75
    Height = 25
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object ListBox1: TListBox
    Left = 136
    Top = 0
    Width = 494
    Height = 339
    Style = lbOwnerDrawFixed
    Align = alRight
    ItemHeight = 16
    TabOrder = 1
    OnDrawItem = ListBox1DrawItem
  end
  object RadioGroup1: TRadioGroup
    Left = 8
    Top = 48
    Width = 121
    Height = 89
    Caption = 'Method'
    ItemIndex = 0
    Items.Strings = (
      'Original'
      'CRC32'
      'RandSeed'
      'Knuth'
      'RLLibby')
    TabOrder = 2
  end
end

 

by: rllibbyPosted on 2004-02-17 at 07:30:31ID: 10383019


Sorry Alan, thats not the case; your calculation is skewing the numbers...

Take the CRC example, which gives me a cost/mean of 5140/315. This is not taking into account the fact that 115,329 items fall into buckets with a depth greater than 19. Some buckets are heavy, while others are very (too) light. That is also why I was displaying the min/max of buckets, which for the crc is between 0-50

The we take this example:

result:=((LongWord(Value) shr 5) * 5) mod HASH_SIZE

It displays a cost/mean of 10485/920. In this example, only 17844 items fall into buckets with a depth greater than 19. The min/max for this calculation is between 8 - 25, which when used with the mean, clearly shows a much more even distribution bewteen the buckets.

If I take the following

result:=((LongWord(Value) div 4) shr 12) mod HASH_SIZE;

it returns a cost/mean of 345/22. It may have the lowest "cost", but I can't ignore the fact that 16,050 buckets are empty. ;-)

Kind Regards,
Russell







 

by: sftwengPosted on 2004-02-17 at 07:47:45ID: 10383224

I must be missing something in your analysis. The "Weighted Avg 'Cost'" is the average of the product of the number of entries in the bin times the depth of the bin and should, therefore, represent a figure of merit for the algorithm if the access to the hash table is uniform (random) or is traversed sequentially and the cost of examining each entry in the bin is linear. Therefore, it does take into account the number of entries with large numbers - it's a weighted average.

The bin population with CRC32 is not sparse (empty) so I don't see how that comes into consideration in this particular comparison though I agree that in general it would.

I guess I'm having trouble understanding by what you mean by "improvement" in your original question. It appears to me that the CRC32 has a more random dispersal, lower mean and lower total processing cost on average than the others.

If you could publish a true "cost" algorithm or way to derive a "figure of merit" then it would be easier to know what to aim for. Is the cost associated with looking an entry up in each bin not linear with the number of entries in the bin, as it would be with a sequential search through the entries? That seems to me to be the likliest worst case.

 

by: rllibbyPosted on 2004-02-17 at 10:54:01ID: 10385028


Alan,
Sorry if this was not clear from the original question, though I thought it was. I was looking for a closer uniform hash distribution than I currently have now. The closer I am to this, the flatter the "worst" case cost becomes, in a "perfect" world this would be O(16) for the worst case. I AM concerned with the average running case as well, but the worst case does factor in.

Anyways, I realize there are alot of factors that come into play with this. Such as best case, worst case, average case, overhead for index calculation, etc. So I decided to run this through a "real" world test, where the pointers are added to the hash, the hash is chained for overflows, and then 1 million random pointer lookups are performed. This way I could get a feel for average case, while also taking into consideration the worst case scenarion (and its percentage for occurance).

Needless to say I got interesting results...

Item Count = 262144
Hash Count = 16441

"Perfect Uniform Hash" (I hashed integers to get a perfect spread)
---------------------------
Min / Max - 15..16
Worst case is 16 scans
0 items (0%) will require > 16 scans to locate
78 ms add time
1240 ms locate time / 1 million random lookups

This was what I was shooting for, but now see that it has its flaws because of a slower average case, even though it has a constant O(16) worst case

CRC
----------------
Min / Max - 0..50
Worst case is ~50 scans
45,455 (17%) items will require > 16 scans to locate
125 ms add time
906 ms locate time / 1 million random lookups


(LongWord(Value) shr 6) mod HASH_SIZE
----------------
Min / Max - 3..30
Worst case is ~30 scans  
25,051 (9%) items will require > 16 scans to locate
78 ms add time
782 ms locate time / 1 million random lookups

(LongWord(Value) shr 7) mod HASH_SIZE;
----------------
Min / Max - 0..34
Worst case is ~34 scans  
34,561 (13%) items will require > 16 scans to locate
78 ms add time
610 ms locate time / 1 million random lookups

(Makes my brain hurt) So at this point I'm going to close this question out, and award you the points for the help / comments in regards to optimizing this. I do appreciate the time you have spent on this, as It gave me MANY things to look at when evaluating what is "optimal" for me.

In the end, I am going with:

  result:=(LongWord(Value) shr 7) mod HASH_SIZE;

As the calculation
1.) is faster than the crc routine
2.) produces a better average real world time than the rest
3.) produces an average worst case scenario.

Thanks again,
Russell

 

by: sftwengPosted on 2004-02-17 at 10:58:03ID: 10385084

Thanks, Russell. I found the exercise interesting too. One parting comment - I had also been thinking about doing some benchmarking and you beat me to it. Had I done so, I'd have optimized the CRC32 calculation a bit, probably throwing awa the initial and final conditioning and making one procedure call rather than four.

Nonetheless, this was fun and I enjoyed working on it with you.

Alan

 

by: rllibbyPosted on 2004-02-17 at 11:03:06ID: 10385126


Alan, I did this as well - in fact I broke the crc down to an inline asm version. But it still required about 16 instructions, compared to the 6 required for:

result:=(LongWord(Value) shr 7) mod HASH_SIZE;

I was okay with the extra insert time, but in the end the 600ms locate time was the deal breaker.

Russell


20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...