Link to home
Start Free TrialLog in
Avatar of Buzzy Bee
Buzzy BeeFlag for United States of America

asked on

Dbase indexed files/CGI

BACKGROUND:  I am trying to create a database driven web page that accesses data from dbase files (on a Unix box)and puts it into reports.  To do this I am using perl CGI scripts.  I have found a perl module called Xbase.pm that allows me to open and access dbase files which is great.  The problem is that I can't open them by index and seek to particular records, and without this capability, I have to search through the files which is slow because the files are very large.

QUESTION:  How do I access .dbf files by index and be able to seek through them using a perl cgi script? (Preferably without buying comercial software such as WebBase or DBWeb, but those suggestions are welcome!)
Avatar of Fordream
Fordream

Then make a small C program to search the field. I think that programming only this moudle is not too hard though you don't know how to make C program. And you can ask others to do favor for you. It's not too hard with format of dbf file.
 Then perl calls it. And it returns record numbers of it.... Perl use them.
 I also can make the program for you if you tell me DBF format. or simply send me the "Xbase.prn". But I think there need more points to do it ;)
 Bye!

Avatar of Buzzy Bee

ASKER

Adjusted points to 200
This is all the points I have to give!!  Sorry.
The two functions that need to be adjusted in Xbase.pm are open_dbf() and seek().  The Xbase.pm module can be downloaded at:
 
http://www.csuchico.edu/prnt/perl-lib/Xbase.pm

and helpful explanation of it can be found at:

http://ils.unc.edu/crenshaw/Xbase.html

An example of a file I would use looks like this (called data.dbf):
FIELD      LENGTH     TYPE
County       2         Numeric          (County #: 1-87)
MCD          3         Num.     (Cities in corresponding county)
Geo_type     1         Character      (S=State,C=County,M=mcd)
POP70        8         Num.  (actual data for population in 1970)
HOU70        8         Num.     (actual data for housing in 1970)

In using this in our current setting, a Dbase program opens it up on an index tagged "area", which is a complilation of county+mcd. Therefore, if you said SEEK "8712", it would go to the record where county=87 and mcd=12.  I need to be able to do this instead of sequentially searching a file that's not in any order.  Hope this helps.
Hey webslider, before making the program, I have a question of it. Are datas in file is sorted by County and MCD? Then we do not have to seek sequentially but very fast using Binary-search algorithm.

No the data in the file is not ordered by county or mcd.  Data gets added to the end of the file every year and things can be out of place.  That's why we have the index on the files so when using it, we can see the data in logical order, not its actual order.
You cannot seek() because you don't have the corresponding IDX file?

If so, you say data gets updated on a yearly base, so you could just build up a new file from the DBF, getting rid of all the 'header' stuff (DBF has some header information in it), and storing records in increasing order of keys (the keys you need).
On the new file you may go with binary search, as Fordream suggests - which can't be as fast as IDX searching BTW.

Other chance: build up an index yourself:
scan the DBF file;
for each record, put in a array (or a structure) the 'area' information and the offset in the DBF file;
save in an 'idx' file the information collected, SORTED by 'area'.
When you need to seek, issue a binary search in the 'idx' file on the 'area' field, get the offset and do a standard seek in the DBF file.

If you can program C, i could give you some samples, but you may accomplish this in Perl too.
If you need samples, please tell where you are lost.

Cheers, julio
I'm  not totally sure what you are trying to say, Julio.  Maybe my knowledge of the interworkings of dbf files is limited.

The speed of the index doesn't matter.  What I'm trying to get away from is the amount of code I need to write to do a seek.  Right now I'm able to do most of what I need but the code is 700 lines long.  The seek that I was able to do in one line before, now takes me 30 lines.  

I've come apon another problem anyway.  I don't think I can have more than one file open at a time using the Xbase.pm and I need to be able to open more than one.

For the time being, we are going to try using CodeBase 6.2.  I haven't heard much about it but it seems to be able to do what we want.  But in case that doesn't work, I'd like to have a solution to my original problem to fall back on.  All of your help has been great so far.
Adjusted points to 235
I've read Xbase.html.

First i think you can open more than one database with something like:
--//--
$db1 = new Xbase;
$db2 = new Xbase;

$db1->open_dbf($dbf_1_name, $idx_1_name);
$db2->open_dbf($dbf_2_name, $idx_2_name);
--//--

And, you didn't answer my question: "don't you have the corresponding IDX file?". I.e., do you just issue: "$database->open_dbf($dbf_name, (null));"?

The simpler way to solve your problem would be to ask the people who mantain the original DBF to build up a corresponding IDX for you, based on 'area' informations. They CAN, very easily (just a couple of actions in their dbf application environment).

Think also that the only matter here should be speed, since you'll need more than 30 lines (and a hard work) in order to build up your own index informations.

Last (maybe not least): my previous comment was indeed an answer to your original question. Would you let me submit that as an answer even if you're going to give up with DBF (maybe lowering the points before grading)? Think that answer may be of help to other people.

Ok, that's enough... good luck
-julio
Julio-
I'd say feel free to submit your answer but Fordream has been a big help as well.  I guess, if you could tell me exactly what you mean and how to create an idx I'd be more than happy to let you have the points!!!

But before you do, I'll answer a couple of your questions.
Yes, right now I leave the idx part blank so I say:

$database->open_dbf($file_name);

As far as the index goes, I don't know what you mean by build up an IDX file. I didn't realize you could do that with a dbf file, atleast I haven't seen anything relating to it in the books I have.  Right now, we use the files in other Unix based dbase programs so using the indexes is easy.  But in transfering the data to a web page, I can't utilize the indexes that exist on the file.  So, I'm not sure what you mean by building my own index information.  If you want to, my points keep accumulating and as I get them I can add them to the total points if you want to wait another day or two.  

Thanks,
Stephanie

Oh yeah, where can I get those samples you were talking about.  Maybe that might help me understand more.
Hi Stephanie,
it's not a matter of points, you're already giving enough (IMO).

Before starting working on some code, just a few notes:

> Right now, we use the files in other Unix based dbase programs so using the indexes is easy. But in transfering the data to a web page, I can't utilize the indexes that exist on the file.

WHY NOT? I still cannot get this!
Into the 'dbase program' (what i called a 'dbf application environment'), you may create all the indexes you need very easily.
So, why not create an 'area' index by the dbase program, then get both the DBF and the 'area' IDX and issue a:
$database->open_dbf($dbf_name, $area_index_name);
WHY NOT??

This said, if you still think you need some code (but please let me understand why), i must add that i can hardly program Perl, so probably will give more a sample code than a real working code.
That should be a good starting point for you anyway, but do you think it is enough?

Cheers, julio
Into the 'dbase program' (what i called a 'dbf application environment'), you may create all the indexes you need
very easily.

THIS IS TRUE!!  I've have an index on area for the dbf file.  But the statement:
                 
    $database->open_dbf($dbf_name, $area_index_name);
       
DOESN'T WORK!!  This is what I initially tried.  The Xbase module only handles FoxPro indexed files.  Dbase creates a .mdx file that has information about the indexes associated with the .dbf file.

What we are going to attempt to do is write some c/c++(?) code to access the indexed files somehow.  We still need to find out what is in the .mdx file though.  So, unless you can come up with something about this topic, I'm not sure there is anything left to answer.  But if you do think of something that works, don't hesitate to tell me.  Thanks sooooooo much for all of your help.

webslider
Ok, finally i've gotten the problem.

Writing some code to access dbf and mdx files is behond the scope of this forum: it would be like rewriting an Xbase lib in C able to handle mdx indexes: this is really involving, and should be paid for.

What i can give you is some Perl/C code which translates an original dbf archive to a bunch of dbm tables (1 for each field + 1 for the index), and a set of C functions to access those tables allowing the same queries as Xbase does.

If your system, as most unicies, has got the 'ndbm' (or 'dbm', in which case please tell me) library, i could write the code for you in a couple of days or so. Performance should be good enough, since dbm works very fast.

And, we may be running on different platforms (i'm on Digital Unix); will you be able to handle the (very little i think) porting issues?

If you want the code, please also post the full table structure and the index formula. (Writing an all purpose code could go with your next question ;)

Ok, waiting to hear from you...
As far as I know, we don't have either dbm or ndbm libraries so I don't know if that is going to affect what you can do.  

As far as platforms go, we have many.  Solaris, Sun OS, DG Unix, and Linux.  So far no porting problems have occured.  Its important to note that our web server is running Linux and the data is on a Sun OS machine.  These are the two that will be used.

Do you have any idea where I could find the format of the .mdx files?  I've been looking for a month and not come up with anything.

One of the files would look like this:

Field Name    Type     Width
----------------------------
County         Numer   2
Mcd            N       3
Tract          Char    6
Geo_type       C       1
Pop70          N       9
Pop80          (all the rest
Pop90          are N 9)
Hou70            
Hou80            
Hou90            
P2_1...P2_5 (eg. P2_1,P2_2,P2_3...)
P3_1...P3_5
P4_1...P4_6
P5_1...P5_6
Totals -- Fields=32, Width=256

The index formula is county+mcd+tract.BUT!!!...this is a string.
So if county=12, mcd=103, tract=231092 the seek command would be:
 seek "12103231092"
If county=1, the seek string would be " 1103230192".
If mcd=4, the seek string would be " 1004230192".

Hope this makes sense.  I've got one request...please comment your code well (if you can still do it)!!

Thanks so much for all the time you've spent!
-webslider
ASKER CERTIFIED SOLUTION
Avatar of richroth
richroth

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial