DBM Database Files in Perl: So Huge!

I just started using DBM files in Perl. Unfortunately, my manuals do a poor job of explaining the finer details. My book consistently uses:
use AnyDBM_File;

I've gathered that it selects whichever DBM package is available on my computer. It occurred to me that if I create my DBM database files this way and then upload my .CGI script and DBM files to a server, the use AnyDBM_File line could select a different DBM package there. Therefore I now use only:
use SDBM_File;
I hope this is correct.
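If you would rather keep use AnyDBM_File, as the book does, the AnyDBM_File documentation describes presetting @AnyDBM_File::ISA in a BEGIN block to control which implementations it tries, and in what order. A minimal sketch, assuming SDBM_File is the only one you want; the file name demo_any is hypothetical:

```perl
use strict;
use warnings;
use Fcntl;    # O_RDONLY, O_RDWR, O_CREAT

# Must run before AnyDBM_File is loaded: AnyDBM_File only applies its own
# default preference list when @AnyDBM_File::ISA is still empty.
BEGIN { @AnyDBM_File::ISA = qw(SDBM_File) }
use AnyDBM_File;

# 'demo_any' is a hypothetical file name for this sketch
my %db;
tie(%db, 'AnyDBM_File', 'demo_any', O_RDWR | O_CREAT, 0600)
  or die "Cannot open demo_any: $!";
$db{answer} = 42;
print "stored: $db{answer}\n";
untie(%db);
```

With the list pinned like this, the script fails loudly if SDBM_File is missing instead of silently switching to another format.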

Anyway, I created my DBM file with the following Perl script:
-----------------
use SDBM_File;

dbmopen(%TOPREF,"topref",0600);
for ($i=0;$i<5000;$i++)
{
  $value="$i"."c" x (29-length($i));
  print "\n$i:[$value]";
  $TOPREF{$i}=$value;
}
dbmclose(%TOPREF);
-----------------
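The same build can be written with tie, which names SDBM_File explicitly instead of going through dbmopen (perlfunc describes dbmopen as largely superseded by tie). A sketch under that assumption, keeping the original file name and 0600 mode but dropping the progress print:

```perl
use strict;
use warnings;
use Fcntl;       # O_RDONLY, O_RDWR, O_CREAT
use SDBM_File;

# tie() names the class, so SDBM is used regardless of what else is installed
my %TOPREF;
tie(%TOPREF, 'SDBM_File', 'topref', O_RDWR | O_CREAT, 0600)
  or die "Cannot open topref: $!";

for my $i (0 .. 4999) {
  # The key's digits, padded with "c" to exactly 29 bytes
  my $value = $i . ('c' x (29 - length($i)));
  $TOPREF{$i} = $value;
}
untie(%TOPREF);
```

This should produce the same topref.dir and topref.pag pair whenever SDBM is the implementation dbmopen would have picked; only the way the implementation is chosen changes.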

It creates a database with 5000 items, each value 29 bytes long. The DBM library creates two files on my computer: a .DIR file and a .PAG file. For some reason the .DIR file is HUGE (1,048,576 bytes). I was only expecting about 150,000 bytes. What happened?
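For comparison, the raw bytes the script stores can be totaled directly. This sketch sums key and value lengths only and ignores the DBM library's own per-page overhead, so it gives a lower bound on the data size, not a prediction of either file's size:

```perl
use strict;
use warnings;

# Sum the bytes of every key and value the build script stores.
my ($key_bytes, $value_bytes) = (0, 0);
for my $i (0 .. 4999) {
  $key_bytes   += length($i);  # keys are the numbers 0..4999 as strings
  $value_bytes += 29;          # every value is padded to 29 bytes
}
my $total = $key_bytes + $value_bytes;
print "keys: $key_bytes, values: $value_bytes, total: $total\n";
# prints "keys: 18890, values: 145000, total: 163890"
```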

From what I gather, DBM files are unusual in that they DON'T load the entire database file into memory. Through the hash, one can access individual items and do whatever is needed. So the 1-megabyte size of the database is irrelevant, since it is never loaded in. Is this all true?
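On the Perl side, a tied DBM hash does work record by record: each lookup or store goes through the DBM library rather than through a copy of the whole file. A minimal sketch; the demo file name demo_each is hypothetical. It stores 100 records, fetches one, then walks them all with each(), which returns one pair per call, rather than keys(), which builds the complete key list in memory:

```perl
use strict;
use warnings;
use Fcntl;       # O_RDWR, O_CREAT
use SDBM_File;

# Hypothetical demo file; 'topref' could be used instead for the real data
my %db;
tie(%db, 'SDBM_File', 'demo_each', O_RDWR | O_CREAT, 0600)
  or die "Cannot open demo_each: $!";
$db{$_} = "value $_" for 0 .. 99;

# A single lookup fetches just that record through the DBM library
my $one = exists $db{42} ? $db{42} : '(missing)';
print "42 => $one\n";

# each() returns one pair per call; keys() would build the whole key list
my $count = 0;
while (my ($key, $value) = each %db) {
  $count++;
}
print "records: $count\n";
untie(%db);
```

For the 5000-record topref file, the same while/each loop avoids holding all 5000 keys at once.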

Now about file locking (using flock). I am testing all my web site CGI scripts on my Pentium Windows 95 computer. I use Perl 5, but it says that flock isn't available on Windows 95.
What I want is to access the database file without a million other users accessing it, altering it, and corrupting data I have just read and will soon write back. Essentially, I don't want to corrupt the database if many users read, write, or update it at once. Alamo suggested flock, but I couldn't find ANY information on it. Alamo, the only part I didn't get was the "recalling" of the script at the same time to test whether flock worked. Maybe there's another method of locking files.
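One common approach (an assumption here, not something proposed in this thread) is to hold an exclusive flock on a separate lock file for the entire read-modify-write, so no other copy of the script can touch the record between your read and your write. The helper update_counter, the lock file topref.lock and the key hits are hypothetical. It needs flock, so it is meant for the server rather than Windows 95:

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_* constants plus LOCK_SH, LOCK_EX, LOCK_UN
use SDBM_File;

# Hypothetical helper: increment a counter stored under $key, holding an
# exclusive lock for the whole read-modify-write cycle.
sub update_counter {
  my ($key) = @_;
  open(my $lock, '>>', 'topref.lock') or die "Cannot open lock file: $!";
  flock($lock, LOCK_EX) or die "Cannot lock: $!";   # waits for other holders

  my %db;
  tie(%db, 'SDBM_File', 'topref', O_RDWR | O_CREAT, 0600)
    or die "Cannot open topref: $!";
  my $new = ($db{$key} || 0) + 1;    # read ...
  $db{$key} = $new;                  # ... and write back while still locked
  untie(%db);                        # close the DBM before releasing the lock

  close($lock);                      # closing the handle releases the lock
  return $new;
}

print update_counter('hits'), "\n";
```

Scripts that only read can take LOCK_SH on the same lock file instead; shared locks can be held together, but none is granted while another process holds LOCK_EX.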
Asked by: mirror

icd commented:
flock will work on the server. If you want to test on Windows 95, you will have to write a 'wrapper' program that is simply a dummy call on your system but makes a real flock call on the server. This will allow you to test the program on your own PC (where you are unlikely to get access conflicts).
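A minimal sketch of the wrapper icd describes, using the hypothetical name safe_flock. perlfunc documents that flock produces a fatal error on a platform that implements none of flock(2), fcntl(2) locking or lockf(3), so the wrapper traps that error with eval and then behaves as a dummy:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # LOCK_SH, LOCK_EX, LOCK_UN

# Hypothetical wrapper: calls flock() where it exists. Where flock() is
# unimplemented it dies; eval catches that and the wrapper reports success.
sub safe_flock {
  my ($fh, $mode) = @_;
  my $ok = eval { flock($fh, $mode) };
  return 1 if $@;    # flock unavailable: pretend the lock was taken
  return $ok;
}

open(my $fh, '>>', 'topref.lock') or die "Cannot open lock file: $!";
safe_flock($fh, LOCK_EX) or die "Cannot lock: $!";
# ... read and update the DBM file here ...
safe_flock($fh, LOCK_UN);
close($fh);
```

Because any fatal error inside flock is treated as success, this is only suitable for local testing; on the server, a direct flock call keeps real failures visible.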
ashishkh commented:
First, DBM files are platform dependent (they depend on, say, the platform's byte order) and hence not portable. So you should not upload the .dir and .pag files to your server; upload only the Perl script. Also, it is better to use AnyDBM_File, as SDBM may not be available on the server.
What you could do is create a CGI script link known only to you that runs the above script on your server and creates the database there. Once that is done, the other CGI scripts that access this database can run without problems.
I ran your script on both NT and UNIX. On UNIX, topref.dir and topref.pag were 4096 and 262,144 bytes respectively; on NT they were 4096 and 1,048,576 bytes. The reason is that topref.pag (in fact, any .pag file) contains "holes": data may be stored at offset 1 and then immediately at offset 1024, and so on, which creates dummy blocks in between. The .dir file records how the data is distributed across the blocks of the .pag file (it uses a dynamic hashing algorithm), so empty (dummy) blocks in the .pag file have no entry in the .dir file. This space is not actually consumed, although the dir command may report it. On UNIX, however, running commands such as touch on these files may sometimes (and on some systems) cause the empty blocks to be allocated; only then is the space really wasted. In most cases the unusually large size of the .pag file is no bother at all, so don't worry. Note also that .pag and .dir files are not reliably copyable: commands such as cp and mv may not work properly on them. See the dbm_open man page on any UNIX system for more information.
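Whether a .pag file really has holes can be checked by comparing its apparent size with the space allocated for it. A sketch, assuming a system whose stat reports allocated blocks in 512-byte units (common on UNIX but not guaranteed; Windows reports no block count). The helper disk_usage is hypothetical:

```perl
use strict;
use warnings;

# Return (apparent size in bytes, allocated bytes or undef) for a file.
# stat() field 7 is the apparent size; field 12 is the number of blocks
# allocated on disk, often (not always) 512-byte units, and may be empty
# on systems that don't report it, such as Windows.
sub disk_usage {
  my ($file) = @_;
  my @st = stat($file) or return;
  my ($size, $blocks) = @st[7, 12];
  my $allocated = (defined $blocks && $blocks ne '') ? $blocks * 512 : undef;
  return ($size, $allocated);
}

for my $file ('topref.dir', 'topref.pag') {
  my ($size, $allocated) = disk_usage($file) or next;
  printf "%s: apparent %d bytes, allocated %s\n", $file, $size,
    defined $allocated ? "$allocated bytes (if 512-byte blocks)" : 'not reported';
}
```

On a filesystem that supports sparse files, a .pag file with holes should show allocated bytes well below its apparent size.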
As for your question about "loading in": if you mean loading the database into RAM, I think the .dir file (which is very small; in your case just 4096 bytes) is loaded into memory. Then a hash function is applied to the key (the one you specify when accessing its value), and the resulting hash code is used to consult the .dir file and find which block the key/value pair belongs to. If the access creates a new key/value pair, a block number is selected based on the hash code. This block number identifies the block in the .pag file where the key/value pair is actually stored.
        Hope this solves your problem.
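For the lookup step in the explanation above, SDBM uses a string hash of the key. The sketch below is a Perl transcription of the classic sdbm hash (h = c + 65599 * h for each byte, here reduced modulo 2**32) plus a deliberately simplified block choice; the helper names are hypothetical, and the fixed modulus is not how SDBM itself picks a block:

```perl
use strict;
use warnings;

# sdbm-style string hash: for each byte c, h = c + 65599 * h (mod 2**32).
# The intermediate value stays below 2**53, so the arithmetic is exact
# even where Perl falls back to floating point.
sub sdbm_hash {
  my ($key) = @_;
  my $h = 0;
  $h = (ord($_) + 65599 * $h) % 4294967296 for split //, $key;
  return $h;
}

# Simplified block choice (an assumption for illustration): SDBM walks a
# split bitmap stored in the .dir file to decide how many low bits of the
# hash to use, rather than taking a fixed modulus like this.
sub block_for {
  my ($key, $nblocks) = @_;
  return sdbm_hash($key) % $nblocks;
}

printf "key %-4s hash %10s block %s\n", $_, sdbm_hash($_), block_for($_, 256)
  for qw(0 1 42 4999);
```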