

DBM Database Files in Perl: So Huge!

Posted on 1997-12-25
Medium Priority
Last Modified: 2008-03-03
I just started to use DBM files in Perl.  Unfortunately my manuals are very poor at teaching the little details.  My book consistently makes use of:
use AnyDBM_File;

I've gathered that this selects a DBM package that is available on my computer.  It occurred to me that if I use this method to create my DBM database files and then upload my .CGI script and DBM file to a server, the
use AnyDBM_File;  line could select a different DBM package there.  Therefore I now exclusively use:  use SDBM_File;
I hope this is correct.

Anyway, I created my DBM file with the following Perl script:
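use Fcntl;   # provides O_RDWR and O_CREAT
use SDBM_File;

# The tie call and the store were missing from the post (assumed here); "topref" is the file name used in the answer.
tie(%h, 'SDBM_File', 'topref', O_RDWR|O_CREAT, 0666) or die "Cannot tie topref: $!";
for ($i=0;$i<5000;$i++) {
  $value="$i"."c" x (29-length($i));   # pad every value to 29 bytes
  $h{$i}=$value;
  print "\n$i:[$value]";
}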

It creates a file with 5000 items in it, each 29 bytes long.  The DBM database is stored as 2 files on my computer: a .DIR file and a .PAG file.  And for some reason the .DIR file is HUGE (1,048,576 bytes).  I was only expecting a file size of 150,000 bytes or so.  What happened?

Now from what I gather, DBM files are unusual in that they DON'T actually load the entire database file into memory.  Using the hash one can access individual items and do whatever is needed.  So the 1 megabyte size of the database is irrelevant since it is never loaded in.  Is this all true?!

Now about file locking (using flock).  I am testing all my web site CGI scripts on my Pentium Windows 95 computer.  I use Perl 5 but it says that flock isn't available on WIN95.
What I want is to access the database file without a million other users accessing it, altering it, and corrupting data I have just read in and will soon write back out.  Essentially I don't want to screw up the database if millions of users access it at once for reading/writing/updating.  Alamo suggested flock but I really couldn't find ANY information on it.  Alamo, the only part I didn't get was the "recalling" of the script at the same time to test if flock worked!?  Maybe there's another method of locking files.
Question by:mirror

Expert Comment

ID: 1209393
flock will work on the server. If you want to test on WIN95, you will have to write a 'wrapper' routine that is simply a dummy call on your system but makes a real flock call on the server. This will allow you to test the program on your own PC (where you are unlikely to get access conflicts).
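The wrapper idea might look something like the sketch below. The names `portable_flock` and `with_locked_db` are hypothetical, locking a separate `.lock` file instead of the DBM files themselves is an assumption of this sketch, and treating any exception raised by flock as "not available on this platform" is a simplification.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_* flags and LOCK_* constants
use SDBM_File;

# Hypothetical wrapper: takes a real lock where flock() exists and is a
# no-op where Perl dies because the platform does not implement it.
sub portable_flock {
    my ($fh, $mode) = @_;
    my $ok = eval { flock($fh, $mode) };
    return 1 if $@;      # flock unavailable here: skip locking
    return $ok;          # true on success, false if the lock failed
}

# Hypothetical helper: hold an exclusive lock on "$base.lock" while the
# callback reads and writes the tied hash.
sub with_locked_db {
    my ($base, $code) = @_;
    open(my $lock, '>>', "$base.lock") or die "Cannot open lock file: $!";
    portable_flock($lock, LOCK_EX)     or die "Cannot lock: $!";
    tie(my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $base: $!";
    my $result = $code->(\%db);
    untie %db;
    close($lock);        # closing the handle releases the lock
    return $result;
}
```

A CGI script could then wrap each read-modify-write in one call, for example passing a callback that increments `$db->{hits}`, so that two copies of the script on the server never update the database at the same moment.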

Accepted Solution

ashishkh earned 40 total points
ID: 1209394
First of all, DBM files are platform dependent (they depend, for example, on the platform's byte order) and hence are not portable. So you should not upload the .dir and .pag files to your server. Upload only the Perl script. Also, it is better to use AnyDBM_File, as SDBM may not be available on the server.
What you can probably do is set up a CGI script, at a link known only to you, that runs the above script on your server and creates the database. Once that is done, the other CGI scripts that access this database can be run without problems.
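If the concern is not knowing which package AnyDBM_File picks on a given machine, the candidate list can be set before the module loads and the choice inspected afterwards. This is a minimal sketch; restricting the list to SDBM_File is only an example, chosen because that module is bundled with the standard Perl distribution.

```perl
use strict;
use warnings;

# Restrict the candidate list before AnyDBM_File is loaded. Listing
# several modules here makes AnyDBM_File try them in that order.
BEGIN { @AnyDBM_File::ISA = qw(SDBM_File) }
use AnyDBM_File;

# Once loaded, @AnyDBM_File::ISA holds only the module actually selected.
my $chosen = $AnyDBM_File::ISA[0];
print "DBM backend in use: $chosen\n";
```

Running the same few lines on the PC and on the server shows whether both ends would use the same package.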
        I have run your script on both NT and UNIX platforms. On UNIX the sizes were 4096 and 262144 bytes for topref.dir and topref.pag respectively. On NT the figures were 4096 and 1,048,576 bytes respectively. The large size arises because topref.pag (in fact every .pag file) contains "holes": the data in this file might be stored at offset 1 and then next at offset 1024, for example, leaving dummy blocks in between. The .dir file records how the data is distributed among the blocks of the .pag file (it uses a dynamic hashing algorithm), so empty (dummy) blocks in the .pag file have no entry in the .dir file. This space is not actually consumed, although the dir command may show it as if it were. On a UNIX system, however, running the touch command on these files may sometimes (and on some systems) cause the empty blocks to actually be created; only in that case is the space really wasted. In most cases the unusual size of the .pag file is no bother at all, so don't worry. By the way, the .pag and .dir files may not even copy correctly: commands such as cp and mv may not work properly on them. See the dbm_open man page on any UNIX system for more information.
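The gap between the size a directory listing reports and the space actually allocated can be seen with a small experiment. This is only a sketch: it creates a throwaway file with a hole in it, not a DBM file, and whether the hole stays unallocated depends on the operating system and file system.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a file with a "hole": seek far past the start and write one byte.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
seek($fh, 1_048_575, 0) or die "seek failed: $!";
print {$fh} "x"         or die "write failed: $!";
close($fh)              or die "close failed: $!";

# stat field 7 is the apparent size; field 12 is the number of blocks
# allocated (often 512 bytes each on UNIX, empty where not supported).
my ($size, $blocks) = (stat $path)[7, 12];
print "apparent size: $size bytes\n";
if (defined $blocks && $blocks ne '') {
    print "allocated: ", $blocks * 512, " bytes (assuming 512-byte blocks)\n";
} else {
    print "block count not reported on this platform\n";
}
```

Where holes are supported, the allocated figure should come out far smaller than the apparent size, which is the situation described for the .pag file.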
 As for your question about "loading in": if you are referring to loading the database into RAM, I think only the .dir file (which is very small, in your case just 4096 bytes) is loaded into memory. Then a hash function is applied to the key (the one you specify when accessing a value), and the resulting hash code is used to look up in the .dir file which block the key:value pair should belong to. If the access creates a new key:value pair, a block number is selected based on the hash code. This block number refers to the block in the .pag file where the key:value pair will actually be stored.
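From the script's side, all of this happens behind an ordinary hash access. The sketch below is self-contained for illustration: it builds a small throwaway database in a temporary directory (the name "demo" and the keys are made up), reopens it read-only, and fetches a single key without iterating over the rest.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/demo";         # hypothetical database name

# Build a small database.
tie(my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot create $base: $!";
$db{$_} = "value-$_" for 1 .. 100;
untie %db;

# Reopen read-only and look up individual keys.
tie(my %ro, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot open $base: $!";
my $found   = $ro{42};                       # one key, fetched on demand
my $missing = exists $ro{9999} ? 1 : 0;      # a key that was never stored
untie %ro;

print "key 42 holds: $found\n";
print "key 9999 present: $missing\n";
```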
        Hope this solves your problem.
