Link to home
Start Free TrialLog in
Avatar of craig_capel
craig_capel

asked on

String to Number

https://www.experts-exchange.com/jsp/qManageQuestion.jsp?ta=delphi&qid=20205546

Same question however let's simplify it.

I have wrote a chat server and i would like to store information on every user using a hashing table, so all i need to do is take a username (any characters) and hash it, seek to that record number, write out the record and get out...

so craig_capel username for example = 33233 so i seek to that record and write the data, the routine i have is ok, but i know for a fact it's not good for it's size, can anyone come up with a better method which does not use 1.2gig  (my only limit is that the usernames HAVE to be <32 chars in length)

So HashStringToNumber(Username: String): Integer;

OpenMyDataBase;
SeekToRecordNumber(HashToString('craig_capel'));
WriteDataOut;
CloseMyDataBase;

Thanks People (P.S if any admin are reading, please delete that thread at the top so i can finally free it)

-

Craig C.
Avatar of martin_g
martin_g
Flag of United States of America image

If it may help, there are some great examples of hashing string tables and using them as keys at:
http://homepages.borland.com/efg2lab/Library/Delphi/Algorithms/
Look under "Hashing"...
Avatar of DelphiArchitect
DelphiArchitect

Well, the first problem I see is what happens if you have a hash collision?  It is very likely that at some point you will have 2 users who hash to the same value (unless you use an incredibly huge table, which is a waste of space, and you still have a chance of this).

When you refer to "database" what are you using?  If your using any existing database then it should already support indicies which would be much better than implementing your own hash table.  If your not using an existing database, why not?  At the very least use the BDE which comes with Delphi or Access which is common.  You'll be vastly better off going this way.

If you insist on implementing your own database look into B+ trees which is what many database indicies are based on.

The other option is to manually implement your own in-memory index using a hash table.  I'd recommend a more typical hash implementation that uses overflow buckets or similar to handle hash collisions.  At startup read in the user name for each database record.  Add these to your hash table using the record number as the value stored.  Then you can check the hash table to get the record number quickly.  The plus side here is that hash codes don't directly equate to record numbers, thus you don't have a huge problem with hash collisions and you will save a ton of space (since you only have to have records for users, not for every possible hash value).

Hope this helps.

Delphi Architect
James Higgins
Hi craig_capel,

I saw your previous question. There is another solution, without any database engines.
As I understood, you need to have fast access to user's data. So, split your data into  two files.

First file keeps all users specific data excluding username. Second file keeps username and position of associated data in first file. When your application started, just load second file into TStringList, it able to keep strings and some associated data, offset of user's data in your case. Searching in stringlist is fast enought, you can even sort your strings and use some fast  algorithm. E.g. half-division method let you search in 1,000,000 records using only 20 string comparision operations.

-----
Igor.
I agree with Igor.

Also, how many users are you expecting, or how many do you want to support?
Avatar of craig_capel

ASKER

well before it was for my own server, now i am writing a Client based chat program for Yahoo - it would encounter around 200 - 300 users (different daily) i am actually using my old routine at the moment

Function NameToInt(Name: String): Longint;
 var
  num: integer;
  tmp: string;
  gennum: longint;
begin
 Name:=LowerCase(Name);
 tmp:=name;
 gennum:=0;
for num:=1 to length(tmp) do
 begin
   gennum:=gennum+((ord(tmp[num]))*num);
 end;
   Result:=Gennum;
end;

so if this is the best i am going to get, then i shall use this :)

Thanks
With your routine, the users "Z3", "X4", "V5", "T6", "R7", "P8", "N9" will have the same value.Many, many other combinations will also result in the same result of your NameToInt.

So, why aren't you implementing the method proposed by Igor? It would guarantee unique numbering and only use as much space as needed. For some 300 Users, it wouldn't be a problem to hold the information completely in memory anyways, eliminating any need for hashing because a binary search would be just perfect.

If you are after a better hashing method, I can provide one, but the problem of multiple names yialding the same key will always persis as long as you don't want to use a really large hash (which would again be useless because it will not be small enough to serve as index in a file). Assuming that you support 44 different chars in your names (that is, the alphabet plus digits plus a very few special chars) and names up to 32 chars length, you'll get about 39'853'812'061'028'300'000'000'000'000'000'000'000'000'000'000'000'000 different possible combinations (give or take a few). If your hash is any smaller than that, there will be double identifiers. And that's really why hashing is NOT the solution for what you want to do (gettig an unique index for any valid name).
Well ok - i have on offer 300 points, is 300 points worth an example of how to sort the TStringList;

i know how to use them, but i have never sorted the information - so if you can show me how to do that, i will award the points.

- Thanks


Craig C.
var
    List: TStringList;

List:=TStringList.Create;
List.Sorted:=True;

Adding:
List.AddObject(UpperCase(NickToAdd),PointerToYourInformation);

Searching:
List.IndexOf(UpperCase(NickToFind));

As long as the list is marked as Sorted, the IndexOf will be VERY quick (binary search).
List.AddObject(UpperCase(NickToAdd),PointerToYourInformation);

how does the Tobject work when used in conjunction with List.AddObject?

TObject is the ultimate ancestor of all VCL objects and components. - how do i declare my own information, it's just confusing with me, I mainly use straight pascal, Object Pascal is still confusing.

Thanks -


Craig C.
List.AddObject(UpperCase(NickToAdd),PointerToYourInformation);

how does the Tobject work when used in conjunction with List.AddObject?

TObject is the ultimate ancestor of all VCL objects and components. - how do i declare my own information, it's just confusing with me, I mainly use straight pascal, Object Pascal is still confusing.

Thanks -


Craig C.
Okay. I assume that your information is storen in a class, that is, inherits from TObject. This allows you to just insert you calss members to that list uaing AddObject, and you can read and write the values using the Objects[] array-style property.

List.AddObject(UpperCase(NickToAdd),YourInformationClassInstance);

This is the same code as before for adding. Let's assume that you're storeing your information in a class called TYourInformationClass.

YourInformationClassInstance:=List.Objects[List.IndexOf(NickToAdd)] as TYourInformationClass;

This gets an instance back, makes sure that it is a TYourInformationClass instance and not some other class type and then assigns it to the YourInformationClassInstance variable.
Nope, i was using  Type MyRecord = Record
                                        SomeData: String[30];
                                        Username: String[30];
                                        Ignored: Boolean;
                                   End;

And So on. i don't think it would be too hard to convert it to an Object?
Using objects would not be a problem, but it's not even really necessary here. You can use pointers and cast them to and from TObject. To simplify the identification of types, pointer types and variables (instances), I'd suggest that you use T as type prefix and P as pointer prefix. This is widely accepted as standard.

type
    PMyRecord = ^TMyRecord;
    TMyRecord = record
        SomeData: String[30];
        Username: String[30];
        Ignored: Boolean;
    end;

var
    MyRecord: PMyRecord;
    Index: Integer;

// adding
New(MyRecord);
MyRecord.SomeData:='Use The Record here';
MyRecord.Username:='Nickname';
List.Add(MyRecord.Username,TObject(MyRecord));

// searching
Index:=List.IndexOf(NickToFind);
MyRecord:=PMyRecord(List.Objects[Index]);

// deleting
Index:=List.IndexOf(NickToFind);
Dispose(PMyRecord(List.Objects[Index]));
List.Delete(Index);
Why would i add the Record, if all i am going to do is search for the String to return the data to the other File which contains all the records...

Type UserData = Record //to new db
   Loggedin : String[30];
   Loggedout : String[30];
   StartedLogging: String[30];
   UserName: String[50];
   UserAccess: integer;
   Ignored: Boolean;
   IgnoredWhen: String[30];
   Ignoredby: String[30];
   SponsoredBy: string[45];
   ReasonIgnored: String[50];
   CustomName: String[15];
   ShoutCount: Integer;
   SecondsOnline: LongInt;
   PrivateUser: Boolean;
   LastUrl: String[155];
   SecondsInRoom: Integer;
   LastThingSaid: String[30];
   ThingsSaid: LongInt;
   LastRoom: String[40];
   UsingClient: String[30];
   ClientsAge: Integer;
   MostUsedLanguage: String[10];
   XB,Perl,vb,c,delphi,java,cpp,csharp,php: integer;
   Location: String[50];
   Gender: String[10];
   Marital: String[30];
   Occupation: String[60];
   Sweared: Integer;
   AutoFriendsList: Boolean;
   RecPos: Longint;
End;


at the moment i have a File Of UserData, and i seek, i can still continue to seek to get the information out, surely if i add that whole record to TStringList object, then it''s going to be far too slow?

esp if all it is going to do is pull out the Index?
ASKER CERTIFIED SOLUTION
Avatar of AvonWyss
AvonWyss
Flag of Switzerland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial