Performance issue in DBI

Posted on 2004-11-11
Last Modified: 2012-05-05

I'm using Perl to extract a large amount of data from an Oracle database and print it to a file. I'm using the DBI module to connect to Oracle.
The SQL in my prepare statement is a join across 4 tables (where all access is made on indexes, using index hints where appropriate), and it returns approx. 45K rows (about 1/3 of the total data). I have run this SQL in SQL*Plus, where it returns my result within reasonable time: approx. 2 minutes.

If I implement this in Perl I get the following results:
The execute() statement takes about 10 minutes to finish, which is already slower than the above.
But my biggest problem is that when I loop through the results it starts to hang. I started printing the time every 1K rows and realised it never got that far. I then printed every row: it did about 25-30 rows more or less instantly, and then it stalls.
I have tried both fetchrow_hashref() and fetchrow_array() (I don't know if there is a performance difference between the two?), and I have tried both manipulating the data on the fly and storing it locally (in a hash and in an array) before proceeding, but with the same results.
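For reference, the DBI documentation ranks the fetch methods: fetchrow_hashref is the slowest (a fresh hash is built per row), fetchrow_array/fetchrow_arrayref are faster, and bind_columns with fetch is the fastest row loop DBI offers. A minimal sketch of the bind_columns idiom; the DSN, credentials, and column names are placeholders, not the poster's actual values:

```perl
use strict;
use warnings;
use DBI;

# Placeholder connection details -- substitute your own.
my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });

my $sth = $dbh->prepare('select account_num, customer_ref from account');
$sth->execute();

# bind_columns + fetch: DBI writes each row straight into these
# scalars, avoiding the per-row hash (fetchrow_hashref) or list
# (fetchrow_array) construction.
my ($account_num, $customer_ref);
$sth->bind_columns(\$account_num, \$customer_ref);
while ($sth->fetch) {
    # use $account_num / $customer_ref here
}
$dbh->disconnect;
```

The fetch-method choice only changes per-row overhead, though; it would not by itself explain a hang after 30 rows.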

I have used the same approach before with large amounts of data (mind you, only on single-table access, or possibly a join between 2 tables).

Does anyone know why this is happening, and preferably have a solution?


Question by:KjartanM-S
    LVL 18

    Expert Comment

Show us some Perl code exhibiting the problem. I routinely do large data transfers with complex joins and never run into any unusual performance loss.

    Author Comment

Sorry for the delay...

    Here is my statement handler:
    $customer_info_nodate_sth = $dbh->prepare(q{
                select /*+ INDEX(ad ACCOUNTDETAILS_PK) */
                       ...
                from   account a,
                       accountdetails ad,
                       contactdetails cd,
                       accountattributes aa,
                       contact c
                where  aa.account_type in (2, 4, 6, 8, 90, 99)
                and    a.account_num = aa.account_num
                and    ad.account_num = a.account_num
                and    ad.start_dat < sysdate
                and    (ad.end_dat is null or ad.end_dat > sysdate)
                and    c.customer_ref = a.customer_ref
                and    c.contact_seq = ad.billing_contact_seq
                and    cd.customer_ref = a.customer_ref
                and    cd.contact_seq = ad.billing_contact_seq
                and    cd.start_dat < sysdate
                and    (cd.end_dat is null or cd.end_dat > sysdate)
    });

    Here are some of the methods that are involved:

    sub process {
        # ...
        print localtime()." INFO: SQL er executed\n";
        # ...
    }

    sub process_task {
        my $info_sth = shift;
        my $type = "I";
        print localtime()." INFO: Now starting to process all records found\n";
        while (my $curr_update = $info_sth->fetchrow_hashref()) {
            my $account_num  = $curr_update->{ACCOUNT_NUM};
            my $customer_ref = $curr_update->{CUSTOMER_REF};
            my $acc_status   = $curr_update->{ACCOUNT_STATUS};
            my $account_type = $curr_update->{ACCOUNT_TYPE};
            my $in_use       = $curr_update->{IN_USE};
            my $address_seq  = $curr_update->{ADDRESS_SEQ};
            my $contact_seq  = $curr_update->{CONTACT_SEQ};
            my $first_name   = $curr_update->{FIRST_NAME};
            my $last_name    = $curr_update->{LAST_NAME};
            my ($address1, $address2, $zipcode, $address4, $country) =
                get_additional_info($curr_update->{CUSTOMER_REF}, $curr_update->{ADDRESS_SEQ});
            add_record($params{RECORD_TYPE_CUSTOMER}, $account_num, $customer_ref, $acc_status,
                       $account_type, $in_use, $contact_seq, $first_name, $last_name,
                       $address1, $address2, $zipcode, $address4, $country, $type);
        }
    }

    sub get_additional_info {
        # another select to get some more info
    }

    sub add_record {
        # print the line to a file and count the number of lines printed
    }

    I've had a couple of variations of process_task here, where I have used the while loop to read into a local hash and then looped over that and fetched the remaining information, etc. I have also tried to split the statement into two SQLs, so that I only join two tables in each SQL, with the second SQL performing a lookup on indexed values only. I don't know which is the best way of doing things: having one SQL fetch all the data, or one SQL to fetch all ACCOUNTs and then executing 45K SQLs after that to get the remaining information?
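On the one-SQL-vs-45K-SQLs question: if the same (customer_ref, address_seq) pairs recur across rows, a middle ground is to keep the single lookup sub but cache its results, so repeated pairs cost one query instead of one per row. A sketch of the wrapper; the stub lookup below is hypothetical and stands in for the real get_additional_info query:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real DB lookup in get_additional_info;
# the point here is the cache wrapper, not the query itself.
my $db_calls = 0;
sub get_additional_info {
    my ($customer_ref, $address_seq) = @_;
    $db_calls++;    # count how many real lookups happen
    return ("addr1-$customer_ref", "addr2", "zip-$address_seq", "addr4", "NO");
}

# Cache keyed on the lookup arguments: a repeated pair is served
# from memory instead of hitting the database again.
my %addr_cache;
sub get_additional_info_cached {
    my ($customer_ref, $address_seq) = @_;
    my $key = "$customer_ref\0$address_seq";
    $addr_cache{$key} ||= [ get_additional_info($customer_ref, $address_seq) ];
    return @{ $addr_cache{$key} };
}

# Two rows sharing one address trigger a single real lookup.
my @row1 = get_additional_info_cached(100, 1);
my @row2 = get_additional_info_cached(100, 1);
print "db_calls=$db_calls\n";    # prints db_calls=1
```

Whether this helps depends entirely on how often addresses repeat in the 45K rows; if every row has a distinct address, the single big join is the better shape.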



    Author Comment

    PS: I noticed the duplicate aa.account_type reference in the select statement. It is a copy & paste error and was not part of the SQL that I actually executed.
    LVL 2

    Expert Comment

    I doubt the problem is with DBI. Do you have indexes on all of the columns? You might try an explain plan to check on that. If you've eliminated that as a problem, you may want to switch to fetchrow_array, which is faster than fetchrow_hashref. It's far less convenient and less resilient to database changes, though, so unless you have to go that way it's better to stick with your existing code.
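One hedged way to get that explain plan without leaving the script: run EXPLAIN PLAN from the same DBI session and read the plan back with DBMS_XPLAN (Oracle 9i+, and it assumes the standard PLAN_TABLE exists; the query shown is a trimmed illustration, not the poster's full join):

```perl
# Populate PLAN_TABLE for a (trimmed, illustrative) version of the query.
$dbh->do(q{
    EXPLAIN PLAN FOR
    select a.account_num
    from   account a, accountattributes aa
    where  aa.account_type in (2, 4, 6, 8, 90, 99)
    and    a.account_num = aa.account_num
});

# Read the formatted plan back and print it.
my $plan = $dbh->selectall_arrayref(
    q{SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())});
print "$_->[0]\n" for @$plan;
```

Seeing the plan produced through the DBI connection rules out any difference between the session the script uses and the one SQL*Plus uses.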

    Author Comment

    The only column that is not indexed and is a limiting factor on the select is aa.account_type (aa.account_num is indexed, but...). This results in a full table scan on that table. This cannot be helped, unfortunately, but I often use that column as a criterion in my SQLs and it has always performed well before.

    I have also tried fetchrow_array() in my implementation. My problem seems to be that it iterates through the first 30 records and then it all stops, in Perl at least... I can see that the Oracle process on the server is running at full power.

    I have noticed something similar in TOAD (if anyone is familiar with this SQL*Plus-like tool). After executing an SQL it displays the first 30 results very quickly, but if you try to view the whole result set then it takes time to fetch it... though usually no more than a couple of minutes, depending on the size of the result set.
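That first-30-rows-fast behaviour is typically row prefetching: GUI tools fetch an initial batch and display it before pulling the rest. DBI exposes a similar knob as the RowCacheSize hint, which DBD::Oracle uses to size its prefetch buffer. A sketch, with placeholder connection details and an illustrative cache size:

```perl
use strict;
use warnings;
use DBI;

# RowCacheSize is advisory: it asks the driver to fetch rows from the
# server in batches of roughly this size instead of one at a time.
my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'password',
                       { RaiseError => 1, RowCacheSize => 500 });

# It can also be set on an existing handle; statements prepared
# afterwards pick it up.
$dbh->{RowCacheSize} = 500;
```

A larger prefetch mainly reduces round trips during the fetch loop; it would not fix a query that stalls server-side, but it makes the Perl loop behave more like TOAD's batched display.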

    LVL 2

    Accepted Solution

    The SQL may be the limiting factor.  Oracle can be a real pig.  You might want to load the data into MySQL and see if your situation improves.

    If you're sure it's a Perl issue, we need more of an idea of where the code is chewing up time. There are several profiling tools available (do a search). Try one of those and post the results.  :)
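If a general-purpose Perl profiler keeps choking on the script, DBI ships its own (DBI::Profile, available since DBI 1.24), which only times the database calls and so shows directly whether the time goes into execute() or into the fetch loop. A sketch with placeholder connection details:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'password',
                       { RaiseError => 1 });

# Aggregate DBI method times per SQL statement; a summary is written
# to STDERR when the handle is destroyed at the end of the run.
$dbh->{Profile} = '!Statement';

# ... run the export as normal ...
```

The same thing can be switched on for the whole script without code changes by setting DBI_PROFILE='!Statement' in the environment before running it.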

    Author Comment

    I don't have the option of loading the data into another DB.

    I tried to use a profiler, but with no luck. It kept bugging me about unloadable objects in a module...

    However....this is what i got as output from running the script:
    Thu Nov 11 13:23:20 2004 Export file /home/test4/external/of_export_out/CUS_20041111_1323_TMS opened
    Thu Nov 11 13:23:20 2004 INFO: Script running for first time
    Thu Nov 11 13:36:39 2004 INFO: SQL er executed
    Thu Nov 11 13:36:39 2004 INFO: Now starting to process all records found
    Thu Nov 11 13:36:39 2004 INFO: Copy 40000000003 to local storing
    ... (44 records)
    Thu Nov 11 13:36:39 2004 INFO: Copy 40000000062 to local storing

    I have tried letting it run for about 24 hrs without it proceeding past this point. This is, however, from the version where I tried to copy all the data into a local hash before fetching additional information.
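If the copy-everything-first version is where it stalls, one bounded alternative is the chunked-fetch idiom from the DBI docs: pull the result set in fixed-size slices rather than one giant local copy, so memory stays flat and progress is visible per chunk. A sketch (the chunk size of 1000 is illustrative; $sth is the executed statement handle):

```perl
# fetchall_arrayref({}, $n) returns up to $n rows per call, each row a
# hash ref keyed by column name (matching the fetchrow_hashref usage).
while (my $rows = $sth->fetchall_arrayref({}, 1000)) {
    last unless @$rows;    # an empty slice means the result set is done
    print localtime()." INFO: fetched chunk of ".scalar(@$rows)." rows\n";
    for my $curr_update (@$rows) {
        my $account_num = $curr_update->{ACCOUNT_NUM};
        # ... process one row here ...
    }
}
```

Printing a line per chunk would also narrow down whether the stall is in the fetch itself or in the per-row processing.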

