Performance issue in DBI

Asked by KjartanM-S:

Hi

I'm using Perl to extract a lot of data from an Oracle database and printing it to a file, using the DBI library to connect to Oracle.
The SQL in my prepare statement is a join between 4 tables (where all access is made on indexes, using index hinting where appropriate) and returns approx. 45K rows (about 1/3 of the total data). I have run this SQL in SQL*Plus, where it returns the result within a reasonable time, approx. 2 min.

If I implement this in Perl I get the following results:
The execute() statement takes about 10 minutes to finish, which is already slower than the above.
But my biggest problem is that when I loop through the results it starts to hang. I started printing the time for every 1K rows and realised it never got that far. I then printed every row: it did about 25-30 rows more or less instantly, and then it stalled.
I have tried both fetchrow_hashref() and fetchrow_array() (I don't know if there is a performance gain between the two?), and I have tried both manipulating the data on the fly and storing it locally (in a hash as well as an array) before proceeding, but with the same results.
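For reference, a minimal sketch of the two fetch styles in question (one or the other would be used on a given statement handle; handle and column names here are illustrative, not the actual code):

# fetchrow_hashref(): convenient, and robust to changes in column
# order, but allocates a fresh hash for every row
while (my $row = $sth->fetchrow_hashref()) {
      print $row->{ACCOUNT_NUM}, "\n";
}

# fetchrow_array(): returns a plain list, typically somewhat faster,
# but depends on the column order of the select list
while (my @row = $sth->fetchrow_array()) {
      print $row[0], "\n";
}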

I have used the same approach before with large amounts of data (mind you, only with single-table access, or possibly a join between 2 tables).

Does anyone know why this is happening, and preferably a solution?

Thanks

-KMS
kandura replied:

Show us some Perl code exhibiting the problem. I routinely do large data transfers with complex joins, and never run into any unusual performance loss.
KjartanM-S (Asker) replied:
Sorry for the delay...

Here is my statement handler:
$customer_info_nodate_sth = $dbh->prepare(q{
      select /*+ INDEX(ad ACCOUNTDETAILS_PK) */
            aa.in_use,
            aa.account_type,
            a.account_num,
            a.customer_ref,
            a.account_status,
            aa.account_type,
            ad.billing_contact_seq,
            c.first_name,
            c.last_name,
            cd.address_seq,
            cd.contact_seq
      from  account a,
            accountdetails ad,
            contactdetails cd,
            accountattributes aa,
            contact c
      where aa.account_type in (2, 4, 6, 8, 90, 99)
      and   a.account_num = aa.account_num
      and   ad.account_num = a.account_num
      and   ad.start_dat < sysdate
      and   (ad.end_dat is null or ad.end_dat > sysdate)
      and   c.customer_ref = a.customer_ref
      and   c.contact_seq = ad.billing_contact_seq
      and   cd.customer_ref = a.customer_ref
      and   cd.contact_seq = ad.billing_contact_seq
      and   cd.start_dat < sysdate
      and   (cd.end_dat is null or cd.end_dat > sysdate)
});

Here are some of the methods that are involved:

sub process {
      $customer_info_nodate_sth->execute();
      print localtime()." INFO: SQL is executed\n";
      process_task($customer_info_nodate_sth);
      .....
}

sub process_task {
      my $info_sth = shift;
      my $type = "I";
      print localtime()." INFO: Now starting to process all records found\n";
      while (my $curr_update = $info_sth->fetchrow_hashref())
      {
            my $account_num  = $curr_update->{ACCOUNT_NUM};
            my $customer_ref = $curr_update->{CUSTOMER_REF};
            my $acc_status   = $curr_update->{ACCOUNT_STATUS};
            my $account_type = $curr_update->{ACCOUNT_TYPE};
            my $in_use       = $curr_update->{IN_USE};
            my $address_seq  = $curr_update->{ADDRESS_SEQ};
            my $contact_seq  = $curr_update->{CONTACT_SEQ};
            my $first_name   = $curr_update->{FIRST_NAME};
            my $last_name    = $curr_update->{LAST_NAME};

            # secondary lookup, executed once per fetched row
            my ($address1, $address2, $zipcode, $address4, $country) =
                  get_additional_info($curr_update->{CUSTOMER_REF}, $curr_update->{ADDRESS_SEQ});

            add_record(
                  $params{RECORD_TYPE_CUSTOMER}, $account_num, $customer_ref, $acc_status, $account_type, $in_use, $contact_seq,
                  $first_name, $last_name, $address1, $address2, $zipcode, $address4, $country, $type
            );
      }
}

sub get_additional_info {
      # another select to get some more info
}

sub add_record {
      # print the line to a file and count the number of lines printed
}


I've had a couple of variations of process_task here, where I have used the while loop to read everything into a local hash and then looped over that to fetch the remaining information, etc. I have also tried to split the statement into two SQLs, so that each SQL only joins two tables, with the second SQL performing a lookup on indexed values only. I don't know which is the best way of doing things: having one SQL fetch all the data, or having one fetch all ACCOUNTs and then executing 45K SQLs after that to get the remaining information?
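For illustration, a minimal sketch of the two-statement variant, preparing the per-row lookup once and re-executing it with bind values; the address table and column names here are assumptions, not the actual schema:

# prepared once, up front, alongside the main statement handle
my $addr_sth = $dbh->prepare(q{
      select address1, address2, zipcode, address4, country
      from   address
      where  customer_ref = ?
      and    address_seq  = ?
});

sub get_additional_info {
      my ($customer_ref, $address_seq) = @_;
      # re-execute the already-prepared handle with new bind values;
      # this avoids re-parsing the SQL for each of the 45K rows
      $addr_sth->execute($customer_ref, $address_seq);
      return $addr_sth->fetchrow_array();
}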

Thanks

-KMS
PS: I noticed the duplicate aa.account_type reference in the select statement. It is a copy & paste error made when posting to the web; it was not part of the SQL that I executed.
-KMS
I doubt the problem is with DBI. Do you have indexes on all of the columns? You might try an explain plan to check on that. If you've eliminated that as a problem, you may want to switch to fetchrow_array, which is faster than fetchrow_hashref. It's far less convenient and less resilient to database changes, though, so unless you have to go that way it's better to stick with your existing code.
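For what it's worth, the DBI documentation describes binding columns as the fastest way to fetch rows; a minimal sketch for a hypothetical two-column select:

$sth->execute();
my ($account_num, $customer_ref);
# bind Perl variables to the output columns once, in select-list order;
# each fetch() then simply updates the bound variables in place
$sth->bind_columns(\$account_num, \$customer_ref);
while ($sth->fetch) {
      print "$account_num $customer_ref\n";
}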
The only column that is not indexed and is a limiting factor on the select is aa.account_type (aa.account_num is indexed, but...). This results in a full table scan on this table. This cannot be helped, unfortunately, but I often use that column as a criterion in my SQLs and it has always performed well before.

I have also tried fetchrow_array() in my implementation. My problem seems to be that it iterates through the first 30 records and then it all stops... in Perl at least... I can see that the Oracle process on the server is running at full power.

I have noticed something similar in TOAD (if anyone is familiar with this SQL*Plus-style tool). After executing an SQL it displays the first 30 results very quickly, but if you try to view the whole result set it takes time to fetch it... though usually no more than a couple of minutes, depending on the size of the result set.
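That first-rows-fast behaviour is usually the driver's row prefetch at work. Purely as a suggestion, not something confirmed in this thread: DBI defines a RowCacheSize hint, which drivers such as DBD::Oracle may honor to prefetch more rows per round trip (or may ignore); a minimal sketch with a hypothetical connect string:

my $dbh = DBI->connect('dbi:Oracle:mydb', $user, $pass, {
      RaiseError   => 1,
      AutoCommit   => 0,
      RowCacheSize => 500,   # hint: fetch rows in batches of ~500
});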

-KMS
ASKER CERTIFIED SOLUTION from cwhicks
(solution text not available)

KjartanM-S (Asker) replied:
I don't have the possibility to load the data into another DB.

I tried to use a profiler, but with no luck. It kept bugging me about unloadable objects in a module...

However... this is what I got as output from running the script:
Thu Nov 11 13:23:20 2004 Export file /home/test4/external/of_export_out/CUS_20041111_1323_TMS opened
Thu Nov 11 13:23:20 2004 INFO: Script running for first time
Thu Nov 11 13:36:39 2004 INFO: SQL is executed
Thu Nov 11 13:36:39 2004 INFO: Now starting to process all records found
Thu Nov 11 13:36:39 2004 INFO: Copy 40000000003 to local storing
... (44 records)
Thu Nov 11 13:36:39 2004 INFO: Copy 40000000062 to local storing

I have tried letting it run for about 24 hrs without it proceeding from this point. This is, however, from the version where I tried to copy all the data into a local hash before fetching the additional information.
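For context, a minimal sketch of what such a copy-everything-first variant can look like using DBI's bulk fetch (variable names are illustrative):

# fetchall_arrayref({}) slurps the remaining rows as an array of hashrefs
my $rows = $sth->fetchall_arrayref({});
print localtime()." INFO: fetched ".scalar(@$rows)." rows\n";
for my $row (@$rows) {
      # process $row->{ACCOUNT_NUM}, $row->{CUSTOMER_REF}, ...
}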

-KMS