Help with Perl While Loop

justmorri
justmorri used Ask the Experts™
on
Is there any way to make this script run faster? Right now it is taking almost an hour to
process a 91,000 flat file. Here is the while loop code which runs through each line in the
file and insert it into an oracle database table.  

my $line = 0;
my $c = 0;

while (<IN>) {
  chomp($_);
  $line += 1;

  my ($field1,
    $field2,
    $field3,
    $field4,
    $field5,
    $field6,
    $field7,
    $field8,
    $field9,
    $field10,
    $field11,
    $field12,
    $field13,
    $field14,
    $field15,
    $field16,
    $field17,
    $field18,
    $field19,
    $field20,
    $field21,
    $field22,
    $field23,
    $field24) = split(/ /,$_);

  &fix($field1);
  &fix($field2);
  &fix($field3);
  &fix($field4);
  &fix($field5);
  &fix($field6);
  &fix($field7);
  &fix($field8);
  &fix($field9);
  &fix($field10);
  &fix($field11);
  &fix($field12);
  &fix($field13);
  &fix($field14);
  &fix($field15);
  &fix($field16);
  &fix($field17);
  &fix($field18);
  &fix($field19);
  &fix($field20);
  &fix($field21);
  &fix($field22);                
  &fix($field23);
  &fix($field24);
     
  if ($field1 ne ' ') {
    $field1 =~ s/\///g;
    my ($month, $day, $year) = unpack "A2A2A4", $field1;  
    $field1 = $year.''.$month.''.$day;                    
  }
 
  $field23 = lc($field23);
  $field24 = lc($field24);

  if (($field15 ne "A") && ($field16 ne "B")) {
    eval {
      $sql = "insert into myTable values (\'$field1\',\'$field2\',\'$field3\',\'$field4\',\'$field5\',\'$field6\',\'$field7\',\
'$field8\',\'$field9\',\'$field10\',\'$field11\',\'$field12\',\'$field13\',\'$field14\',\'$field15\',\'$field16\',\
'$field17\',\'$field18\',\'$field19\',\'$field20\',\'$field21\',\'$field22\',\'$field23\',\'$field24\') ";
      $sth = $dbh->prepare($sql);
      $sth->execute();
    };
    if ($@) { print "$@\n"; next; }
    $dbh->commit();
    $c += 1;
  }
}


Subroutine that's called -

sub fix {
  $_[0] =~ s/^ *//;
  $_[0] =~ s/ *$//;              
  $_[0] =~ s/'/''/g;            
  if (! $_[0]) {
    $_[0] = ' ';
    }
  }
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
The perl code can be streamlined, but I would expect that most of the time would be taken by the  $sth->execute();
You might try moving the prepare out of the loop and doing a bind in the loop, or only doing a commit after several inserts.
There are a number of inefficiencies and questionable coding practices in that code, but without knowing where the script is spending most of its time, we can't say for certain which of those inefficiencies you should focus on fixing.

You can and should use the Devel::NYTProf module to learn where the script is spending most of its time.  We can also make some educated guesses, like ozo did, and correct some of those inefficiencies.

If you use ' ' for the pattern in the split statement instead of / /, it will strip the leading and trailing spaces from the fields, so you won't need to do that in the fix() sub.  And, instead of creating 24 sequentially numbered scalars, it would be better to use an array.

Instead of calling the fix() sub 24 times for each line in the file, it would be better to alter the sub to have it accept an array reference.

my @fields = split(' ', $_);
fix(\@fields);

Open in new window

An even better option/approach would be to use the Text::CSV module to parse the lines of the file.

ozo has already suggested moving the prepare statement, which should defiantly be done.  When doing that, you need to use placeholders in the prepare statement instead of passing the $fieldX vars.  Those vars (or the array) would be passed in the execute statement.
Placeholders and Bind Values

There are additional improvements that I can suggest, but start with those and we'll cover the others as needed.

Author

Commented:
Thanks, ozo and FishMonger! I'll try those suggestions and report back.
Success in ‘20 With a Profitable Pricing Strategy

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
my $c = 0;
my @field;
$sth = $dbh->prepare("insert into myTable values (?,?,?,?,?,?,?,?,?,?,?,?.?,?,?,?,?,?,?,?,?,?,?,?)");
while (<IN>) {
  @field[1..24] = split;
  next unless  $field[15] ne "A" && $field[16] ne "B";
  tr/'/"/, $_||=' ' for @field;
  $_=join'',(split m{/*})[4..7,0..3] for $field[1];
  $_ = lc for @field[23,24];
   eval {
      $sth->execute(@field[1..24]);
   };
   if ($@) { print "$@\n"; next; }
   $dbh->commit();
   $c += 1;
}
my $line = $.;
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Do you need Perl at all?

When loading data from a text file into Oracle SQL Loader or External Tables is typically the best/fastest.

Author

Commented:
I think so. I need to process the file on a daily basis out of cron.

I'm having problems with the split splitting properly as some fields have spaces between their values.

Example of a line in the file:
01/01/2013      123456789       username01        P               abc@email.com    Alfa    B       Canine                   Alfa    B       Canine                   123 South 1st Street    New York        NY      10000   Some Park    Main            +1 212 555 1234 +1 212 555 4321 75 Some Blvd    New York   NY      10000   Some Really Long Org Name    Some Really Long Org Name   ABC+D Water   Gone      Status_Range                        

I'm trying -

my @field;
$sth = $dbh->prepare("insert into myTable values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");

while (<IN>) {
  @field[1..37] = split(/ /, $_);
  next unless $field[29] ne "Some Really Long Org Name" && $field[30] ne "Some Really Long Org Name";
  tr/'/"/, $_||=' ' for @field;
  $_=join'',(split m{/*})[4..7,0..3] for $field[1];
  $_ = lc for @field[32,33];
   eval {
      $sth->execute(@field[1..37]);
   };
   if ($@) { print "$@\n"; next; }
   $dbh->commit();
}


There are actually 37 fields in this file not 24. :)  Also, for the second field (123456789, in this example) I'd like to substitute "None" (without the quotes) if the field is blank.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>I'd like to substitute "None" (without the quotes) if the field is blank.

Databases allow null values for a reason.  Use them.  Why store data to say no data exists?

Author

Commented:
True, true. I'm in favor of that! Unfortunately, this is legacy table (which does not allow nulls -ech), and legacy code that I am trying to improve in small steps.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
I hate bad designs that you are forced to live with.

Anyway, did you look into SQL Loader to replace the Perl Script?  SQL Loader was written to load data into Oracle from Text files and does this very efficiently.
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
> some fields have spaces between their values.
Your use of / / instead of ' ' suggested that you wanted multiple spaces to indicate empty fields.
If fields can be separated by multiple spaces, you probably want to use " ", which is the default:
   @field[1..37] = split;
Why wouldn't you be able to use SQL Loader with cron?

Author

Commented:
I'm not very familiar with using SQL Loader from the unix cmd line...
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
> if the field is blank.
If there can be multiple spaces between fields, how do you represent a blank field?
If the number of spaces between fields is always between 4 and 8,
so less than 4 spaces in a row are part of a field, and 19 spaces in a row, as between Canine and 123 indicates an empty field, then you might use
@field[1..37] = split/ {4,8}/;
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
sqlldr is only command line.

Setting up the control file is the hard part.

Once you have it set up the actual command is simple:
sqlldr username/password control=mycontrolfile.ctl


You've provided an example line from the file.  If you can provide the table description and the expected results I'm sure we can come up with a sample control file.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
also, as ozo has suggested, the file layout.  Delimited or fixed width work best for sql loader.
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
With split/ / or split' ', the s/^ *// and s/ *$// would have been unnecessary, since the spaces would separate the fields instead of being part of the field, but with something like split/ {4,8}/, the s/^ *// may become necessary
  s/^ *//, tr/'/"/, $_||=' ' for @field;
And it may have to be done before the "Some Really Long Org Name" check,
or, depending on how often that case happens, it may be faster to make the check allow for leading space:
next if grep/ *Some Really Long Org Name$/,@field[29,30];

Author

Commented:
Is there any way to not have to pass the user/password from the command line?
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>Is there any way to not have to pass the user/password from the command line?

Looks like you can store it in a PARFILE as well.  Never used PARFILEs with sql loader but the docs say you can.

Author

Commented:
Hmm.. I think I'll work on the sql loader way after I get this ancient way with a Perl script working. :)
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Personally I would stop the ancient way and focus efforts on SQL Loader.  Trust me, it is pretty quick at what it does.

If you can:  Create a very simple test case.  Single table, single column.  Should be a simple tweak to your Perl script to handle it and a very simple sql loader script.

Set up some representative sample of rows that you expect in production and test them.

Author

Commented:
Ok, will tackle the SQL Loader way! :)
Here is the table description:

Name            Null     Type        
--------------- -------- ------------
FIELD1          NOT NULL VARCHAR2(8)    # date of birth in file is mm/dd/yyyy but need to reformat to yyyymmdd
FIELD2          NOT NULL VARCHAR2(9)    # replace blanks with None
FIELD3          NOT NULL VARCHAR2(16)
FIELD4          NOT NULL VARCHAR2(2)  
FIELD5          NOT NULL VARCHAR2(64)
FIELD6          NOT NULL VARCHAR2(31)
FIELD7          NOT NULL VARCHAR2(20)
FIELD8          NOT NULL VARCHAR2(32)
FIELD9          NOT NULL VARCHAR2(8)  
FIELD10         NOT NULL VARCHAR2(16)
FIELD11         NOT NULL VARCHAR2(31)
FIELD12         NOT NULL VARCHAR2(20)
FIELD13         NOT NULL VARCHAR2(32)
FIELD14         NOT NULL VARCHAR2(16)
FIELD15         NOT NULL VARCHAR2(16)
FIELD16         NOT NULL VARCHAR2(96)
FIELD17         NOT NULL VARCHAR2(32)
FIELD18         NOT NULL VARCHAR2(32)
FIELD19         NOT NULL VARCHAR2(16)
FIELD20         NOT NULL VARCHAR2(64)
FIELD21         NOT NULL VARCHAR2(24)
FIELD22         NOT NULL VARCHAR2(24)
FIELD23         NOT NULL VARCHAR2(32)
FIELD24         NOT NULL VARCHAR2(32)
FIELD25         NOT NULL VARCHAR2(96)
FIELD26         NOT NULL VARCHAR2(64)
FIELD27         NOT NULL VARCHAR2(32)
FIELD28         NOT NULL VARCHAR2(16)
FIELD29         NOT NULL VARCHAR2(64)   # Some Really Long Org Name
FIELD30         NOT NULL VARCHAR2(64)   # Some Really Long Org Name
FIELD31         NOT NULL VARCHAR2(64)
FIELD32         NOT NULL VARCHAR2(16)
FIELD33         NOT NULL VARCHAR2(24)
FIELD34         NOT NULL VARCHAR2(16)
FIELD35         NOT NULL VARCHAR2(1)  
FIELD36         NOT NULL VARCHAR2(1)  
FIELD37         NOT NULL VARCHAR2(1)  


Fields 6-8 are names and sometimes have apostrophes.

Author

Commented:
I looked at the file in vi with :set list and see that the fields are separated with ^I
and lines are terminated with ^M$  

I was removing the ctrl-Ms with: perl -pi -e 's/\cM//g' $1
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>I looked at the file in vi with :set list and see that the fields are separated with ^I

Tab delimited makes sql loader even easier.

I'm about ready to leave work.  I'll work on this later at home.  Shouldn't be difficult.  Give me a little commute time and I should have something for you in an hour or two.

Can you post some actual tab delimited data (in a code block) for me?

This is a code block

Open in new window


It will cut down on errors when I add them to what you already posted.

Author

Commented:
Wooo, thank you!! :)

Here's a line!

01/01/2013      123456789       username01      P       abc@email.com   Alfa    B       Canine                  Alfa    B       Canine                  123 South 1st Street    New York        NY
      10000   Some Park       Main            +1 212 555 1234 +1 212 555 4321 75 Some Blvd    New York        NY      10000   Some Really Long Org Name       Some Really Long Org Name       ABC+D Water     Gone    Status_Range                            Y               ^M

Open in new window

Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>Here's a line!

This still doesn't have the tab separation.  I'll make some guesses.  It should be enough to get you off and running.
Most Valuable Expert 2012
Distinguished Expert 2018
Commented:
There doesn't appear to be enough fields in the data file for the table definition so its a little off.

Can you also change how the file gets to the Unix box?  The ctrl-M is common when files are tranferred from DOS/Windows to Unix.  doscp or similar command can help with this.

Assuming you cannot, check out FIELD37 in my control file.  The terminated by syntax should work once all the fields line up.

I also added another row based on your requirements for 'None' and possible single quotes in the data.

Here's what I have:

Control File (I named mine q.ctl):
LOAD DATA
INFILE "c:\q.txt"
APPEND
INTO TABLE tab1
FIELDS TERMINATED BY X'9'
TRAILING NULLCOLS
(
        FIELD1 "substr(:FIELD1,7,4) || substr(:FIELD1,1,2) || substr(:FIELD1,4,2)",
	FIELD2 "NVL(:FIELD2,'None')",
	FIELD3,
	FIELD4,
	FIELD5,
	FIELD6,
	FIELD7,
	FIELD8,
	FIELD9,
	FIELD10,
	FIELD11,
	FIELD12,
	FIELD13,
	FIELD14,
	FIELD15,
	FIELD16,
	FIELD17,
	FIELD18,
	FIELD19,
	FIELD20,
	FIELD21,
	FIELD22,
	FIELD23,
	FIELD24,
	FIELD25,
	FIELD26,
	FIELD27,
	FIELD28,
	FIELD29,
	FIELD30,
	FIELD31,
	FIELD32,
	FIELD33,
	FIELD34,
	FIELD35,
	FIELD36,
	FIELD37 TERMINATED BY X'D'
)

Open in new window



My data file (named q.txt.  I added tabs where I thought they should go):
01/01/2013	123456789	username01	P	abc@email.com	Alfa	B	Canine	Alfa	B	Canine	123 South 1st Street	New York	NY	10000	Some Park	Main	+1 212 555 1234	+1 212 555 4321	75 Some Blvd	New York	NY	10000	Some Really Long Org Name	Some Really Long Org Name	ABC+D Water	Gone	Status_Range	Y	^M
02/02/2013		mytest	P	abc@email.com	Al'fa	B	Canine	Alfa	B	Canine	123 South 1st Street	New York	NY	10000	Some Park	Main	+1 212 555 1234	+1 212 555 4321	75 Some Blvd	New York	NY	10000	Some Really Long Org Name	Some Really Long Org Name	ABC+D Water	Gone	Status_Range	Y	^M

Open in new window


PARFILE (name q.par):
userid=scott/tiger
control=q.ctl

Open in new window


My table setup:
create table tab1(
             FIELD1          VARCHAR2(8),
             FIELD2          VARCHAR2(9),
             FIELD3          VARCHAR2(16),
             FIELD4          VARCHAR2(2),
             FIELD5          VARCHAR2(64),
             FIELD6          VARCHAR2(31),
             FIELD7          VARCHAR2(20),
             FIELD8          VARCHAR2(32),
             FIELD9          VARCHAR2(8),
             FIELD10         VARCHAR2(16),
             FIELD11         VARCHAR2(31),
             FIELD12         VARCHAR2(20),
             FIELD13         VARCHAR2(32),
             FIELD14         VARCHAR2(16),
             FIELD15         VARCHAR2(16),
             FIELD16         VARCHAR2(96),
             FIELD17         VARCHAR2(32),
             FIELD18         VARCHAR2(32),
             FIELD19         VARCHAR2(16),
             FIELD20         VARCHAR2(64),
             FIELD21         VARCHAR2(24),
             FIELD22         VARCHAR2(24),
             FIELD23         VARCHAR2(32),
             FIELD24         VARCHAR2(32),
             FIELD25         VARCHAR2(96),
             FIELD26         VARCHAR2(64),
             FIELD27         VARCHAR2(32),
             FIELD28         VARCHAR2(16),
             FIELD29         VARCHAR2(64),
             FIELD30         VARCHAR2(64),
             FIELD31         VARCHAR2(64),
             FIELD32         VARCHAR2(16),
             FIELD33         VARCHAR2(24),
             FIELD34         VARCHAR2(16),
             FIELD35         VARCHAR2(1),
             FIELD36         VARCHAR2(1),
             FIELD37         VARCHAR2(1) 
);
 

Open in new window


The command from the prompt:
sqlldr parfile=q.par

Open in new window

Author

Commented:
This test box is driving me nuts. Now I get this error with the sqlldr -

Message 2100 not found; No message file for product=RDBMS, facility=ULMessage 2100 not found; No message file for product=RDBMS
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Make sure your ORACLE_HOME is properly set.  Typically in the past when I get message not found messages it is from an environment setup issue.

Author

Commented:
Getting there... :) How do you handle possible single quotes in the data?
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>How do you handle possible single quotes in the data?

SQL Loader handles them for you.

Look at my example above.  The second record I added has a single quote but it's hard to see:  Al'fa.

Author

Commented:
Ah, yeah! So this works but unfortunately when I test it out using a real data file, it only loads 124 of the 91k rows before it errors out with "MAXIMUM ERROR COUNT EXCEEDED - Above statistics reflect partial run." It seems some of the records sent have two tabs where there should be one and data is shifted over for some records. The old Perl script seems to have handled that better.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>It seems some of the records sent have two tabs where there should be one

Two tabs side by side typically means a null value for a field.  This by itself is normal.

>> and data is shifted over for some records

Explain?  What it looks like in an editor (like vi) and what it actually is, is two different things.

There should be a 'log' file generated that will tell you why the records failed and a 'bad' file that has the ones that failed.

Author

Commented:
Actually, my mistake. A column was out of place! I fixed that and the data loads in 3 mins 1 second!  THANKS!!!
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
No problem.   Glad to help.  Told you it was very good at what it does!

You might be able to make it even faster but you need to understand what it actually doing.  It isn't for all situations.

Read up on Direct Path loading:
http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_concepts.htm#SUTIL997

Author

Commented:
Ok, thank you! :)

I encountered a problem... how do I compare this new table that contains NULLS with another table that has ' ' instead of NULL? When I compare them using MERGE, all 91+k records having their fields with a ' ' value updated with a NULL value.  I don't want the field to be updated if it has ' ' and this new field has a NULL.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
You have lost me.

In Oracle an empty string, '', is technically considered a null.

For example:
select 'hello' from dual where '' is null;

Going from your previous requirements, you wanted a 'None' string inserted when the field was null.  My posted test case does this.  Look at the NVL function on FIELD2.

You now mention a MERGE statement.  Where did this come into the picture?

Can you update your requirements or explain a lot more about what you are wanting to do?


If the loaded data is being loaded into a staging table, you might be better off using External Tables (mentioned above).  These use SQL Loader style syntax but allow you to use them like a 'normal' table without having to actually load the data into a 'normal' table.

Author

Commented:
Maybe this is something new? Should I start a new question with new points?

tab1 - the table populated by the data file with sqlLoader; contains current records
tab2 - existing table of all records - current and historical

I have a script that compares tab1's fields with tab2's field and makes updates to records
in tab2 that have changed.

My tables -

SQL> desc tab1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FIELD1                                             VARCHAR2(9)
 FIELD2                                             VARCHAR2(16)
 FIELD3                                             VARCHAR2(2)
 FIELD4                                             VARCHAR2(64)
 FIELD5                                             VARCHAR2(8)
 FIELD6                                             VARCHAR2(31)


SQL> desc tab2;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FIELD1                                    NOT NULL VARCHAR2(9)
 FIELD2                                    NOT NULL VARCHAR2(16)
 FIELD3                                    NOT NULL VARCHAR2(2)
 FIELD4                                    NOT NULL VARCHAR2(64)
 FIELD5                                    NOT NULL VARCHAR2(8)
 FIELD6                                    NOT NULL VARCHAR2(31)
 
 
 sql part of my script -
 
 MERGE INTO tab2 a
  USING
    (select field1,field2,field3,field4,field5,field6 from tab1
     minus
     select field1,field2,field3,field4,field5,field6 from tab2) b
  ON (a.field2 = b.field2)
  WHEN MATCHED THEN
  UPDATE SET
    a.field1 = b.field1,
    a.field3 = b.field3,
    a.field4 = b.field4,
    a.field5 = b.field5,
    a.field6 = b.field6
   
The error message:
DBD::Oracle::db do failed: ORA-01407: cannot update ("TAB2"."FIELD1") to NULL
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>> Should I start a new question with new points?

Yes.  You can post the link to this one in the new one as a reference.  The reason is that you asked how to make the load of records faster.  That was answered.  The merge piece is different.

When you ask the new question please provide sample data (for both tables) and expected results.  With what you posted, I believe External Tables is better than SQL Loader into a temp table.

As long as the new question is in the Oracle Zone, I'll see it (as will all the Oracle Experts that monitor that zone).

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial