
Help with Perl While Loop

Is there any way to make this script run faster? Right now it takes almost an hour to
process a 91,000-line flat file. Here is the while loop that runs through each line in the
file and inserts it into an Oracle database table.

my $line = 0;
my $c = 0;

while (<IN>) {
    chomp($_);
    $line += 1;

    my ($field1,  $field2,  $field3,  $field4,  $field5,  $field6,
        $field7,  $field8,  $field9,  $field10, $field11, $field12,
        $field13, $field14, $field15, $field16, $field17, $field18,
        $field19, $field20, $field21, $field22, $field23, $field24) = split(/ /,$_);

    &fix($field1);
    &fix($field2);
    &fix($field3);
    &fix($field4);
    &fix($field5);
    &fix($field6);
    &fix($field7);
    &fix($field8);
    &fix($field9);
    &fix($field10);
    &fix($field11);
    &fix($field12);
    &fix($field13);
    &fix($field14);
    &fix($field15);
    &fix($field16);
    &fix($field17);
    &fix($field18);
    &fix($field19);
    &fix($field20);
    &fix($field21);
    &fix($field22);
    &fix($field23);
    &fix($field24);

    if ($field1 ne ' ') {
        $field1 =~ s/\///g;
        my ($month, $day, $year) = unpack "A2A2A4", $field1;
        $field1 = $year.''.$month.''.$day;
    }

    $field23 = lc($field23);
    $field24 = lc($field24);

    if (($field15 ne "A") && ($field16 ne "B")) {
        eval {
            $sql = "insert into myTable values (\'$field1\',\'$field2\',\'$field3\',\'$field4\',\'$field5\',\'$field6\',\'$field7\',\
                '$field8\',\'$field9\',\'$field10\',\'$field11\',\'$field12\',\'$field13\',\'$field14\',\'$field15\',\'$field16\',\
                '$field17\',\'$field18\',\'$field19\',\'$field20\',\'$field21\',\'$field22\',\'$field23\',\'$field24\') ";
            $sth = $dbh->prepare($sql);
            $sth->execute();
        };
        if ($@) { print "$@\n"; next; }
        $dbh->commit();
        $c += 1;
    }
}

Here is the subroutine it calls:

sub fix {
    $_[0] =~ s/^ *//;
    $_[0] =~ s/ *$//;
    $_[0] =~ s/'/''/g;
    if (! $_[0]) {
        $_[0] = ' ';
    }
}

ozo

The Perl code can be streamlined, but I would expect most of the time to be spent in $sth->execute().
You might try moving the prepare out of the loop and binding the values inside the loop, or committing only after several inserts instead of after every row.
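A minimal sketch of ozo's two suggestions together: prepare once before the loop, execute once per row, and commit every $batch_size rows plus once more for the final partial batch. So that it runs without Oracle, StubDBH and StubSTH are hypothetical stand-ins for the DBI handles that only count calls, and the batch size of 500 is an arbitrary assumption, not a recommendation.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for DBI handles: they only count calls so the
# sketch runs without Oracle. With a real database you would use
#   my $dbh = DBI->connect($dsn, $user, $pass,
#                          { AutoCommit => 0, RaiseError => 1 });
package StubDBH;
sub new     { my ($class) = @_; return bless { prepares => 0, executes => 0, commits => 0 }, $class }
sub prepare { my ($self, $sql) = @_; $self->{prepares}++; return StubSTH->new($self) }
sub commit  { my ($self) = @_; $self->{commits}++; return 1 }

package StubSTH;
sub new     { my ($class, $dbh) = @_; return bless { dbh => $dbh }, $class }
sub execute { my ($self, @values) = @_; $self->{dbh}{executes}++; return 1 }

package main;

my $batch_size = 500;    # assumption: arbitrary batch size, tune as needed
my $dbh = StubDBH->new;

# Prepare once, outside the loop; only the bound values change per row.
my $sth = $dbh->prepare('insert into myTable values '
    . '(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)');

my $rows = 0;
for my $n (1 .. 1234) {                  # stands in for while (<IN>)
    my @fields = map { "row$n-col$_" } 1 .. 24;
    $sth->execute(@fields);
    $dbh->commit if ++$rows % $batch_size == 0;   # commit every $batch_size rows
}
$dbh->commit if $rows % $batch_size;     # commit the last partial batch

printf "%d prepare, %d executes, %d commits\n",
    @{$dbh}{qw(prepares executes commits)};
```

With 1,234 rows this does 1 prepare, 1,234 executes and 3 commits, instead of 1,234 of each as in the original loop.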

FishMonger

There are a number of inefficiencies and questionable coding practices in that code, but without knowing where the script is spending most of its time, we can't say for certain which of those inefficiencies you should focus on fixing.

You can and should use the Devel::NYTProf module to learn where the script is spending most of its time. We can also make some educated guesses, like ozo did, and correct some of those inefficiencies.

If you use ' ' as the pattern in the split statement instead of / /, split will skip leading whitespace and split on runs of whitespace, so no field will have leading or trailing spaces and you won't need to strip them in the fix() sub. And instead of creating 24 sequentially numbered scalars, it would be better to use an array.
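A small comparison of the two split patterns on a made-up line (the sample data is hypothetical). It also shows a caveat worth checking against the real file: because split ' ' collapses runs of spaces, an empty field written as two adjacent spaces disappears and every later column shifts left.

```perl
use strict;
use warnings;

# Hypothetical sample line: one leading space and two spaces between
# "Smith" and "NY" (an empty field if the delimiter is a single space).
my $line = " 01/15/2003 Smith  NY";

my @single = split / /, $line;   # split on every single space
my @ws     = split ' ', $line;   # awk-style: skip leading whitespace,
                                 # split on runs of whitespace

print join('|', @single), "\n";  # |01/15/2003|Smith||NY
print join('|', @ws), "\n";      # 01/15/2003|Smith|NY
```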

Instead of calling the fix() sub 24 times for each line in the file, it would be better to alter the sub to have it accept an array reference.

my @fields = split(' ', $_);
fix(\@fields);
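A minimal sketch of fix() reworked to take an array reference, as suggested above. It assumes the fields came from split ' ', so no trimming is done. One deliberate difference: the original `if (! $_[0])` test also replaced a literal "0" with a space, while the `length` check here leaves "0" alone.

```perl
use strict;
use warnings;

# fix() taking an array reference and cleaning every field in one call.
# The loop variable aliases each array element, so assigning to $f
# modifies @$fields in place.
sub fix {
    my ($fields) = @_;
    for my $f (@$fields) {
        $f = '' unless defined $f;   # treat missing values as empty
        $f =~ s/'/''/g;              # SQL quote escaping; only needed while the
                                     # values are interpolated into the SQL text
        $f = ' ' unless length $f;   # blank default from the original fix()
    }
    return;
}

my @fields = ('abc', "O'Brien", '', '0');
fix(\@fields);
print join('|', @fields), "\n";      # abc|O''Brien| |0
```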


An even better approach would be to use the Text::CSV module to parse the lines of the file.

ozo has already suggested moving the prepare statement, which should definitely be done. When doing that, you need to use placeholders in the prepare statement instead of interpolating the $fieldX variables into the SQL string. Those variables (or the array) are then passed to the execute statement.
See the "Placeholders and Bind Values" section of the DBI documentation.
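A short sketch of what the placeholder version of the INSERT looks like. Generating the list with `('?') x 24` avoids miscounting (or mistyping) 24 question marks by hand. The DBI calls appear only in comments, since running them needs a live connection; $dbh and @fields are assumed to exist as in the original script.

```perl
use strict;
use warnings;

# One '?' per column instead of interpolating values into the SQL text.
my $num_cols     = 24;
my $placeholders = join(',', ('?') x $num_cols);
my $sql          = "insert into myTable values ($placeholders)";

# With a real DBI handle the flow would be:
#   my $sth = $dbh->prepare($sql);   # once, before the loop
#   $sth->execute(@fields);          # once per row, 24 values
# The driver passes each value separately, so the s/'/''/g quote
# escaping in fix() is no longer needed.
print "$sql\n";
```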

There are additional improvements that I can suggest, but start with those and we'll cover the others as needed.

justmorri (asker)

Thanks, ozo and FishMonger! I'll try those suggestions and report back.

ozo

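my $c = 0;
my @field;    # elements 1..24 hold the fields; element 0 is unused
my $sth = $dbh->prepare("insert into myTable values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
while (<IN>) {
    @field[1..24] = split;                    # same as split ' ', $_
    next unless $field[15] ne "A" && $field[16] ne "B";
    $_ ||= ' ' for @field;                    # blank default, as in fix();
                                              # no quote escaping needed with placeholders
    # MM/DD/YYYY -> YYYYMMDD (skipped for blank dates, as in the original)
    $field[1] = join '', (split m{/*}, $field[1])[4..7, 0..3] if $field[1] ne ' ';
    $_ = lc for @field[23, 24];
    eval {
        $sth->execute(@field[1..24]);
    };
    if ($@) { print "$@\n"; next; }
    $dbh->commit();
    $c += 1;
}
my $line = $.;    # $. holds the number of lines read from IN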
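The date line is the least obvious part of that loop, so here it is in isolation next to a more explicit regex version. Both assume MM/DD/YYYY with two-digit month and day, the same assumption the original unpack "A2A2A4" makes; the function names are hypothetical.

```perl
use strict;
use warnings;

# m{/*} can match the empty string, so split breaks the date into single
# characters and swallows each '/'; the slice then reorders the eight
# characters MMDDYYYY into YYYYMMDD.
sub to_yyyymmdd_split {
    my ($date) = @_;
    return join '', (split m{/*}, $date)[4..7, 0..3];
}

# A more explicit alternative that checks the shape first and returns
# anything else (e.g. the ' ' placeholder, or 1/5/2003) unchanged.
sub to_yyyymmdd_regex {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ m{^(\d\d)/(\d\d)/(\d{4})$} or return $date;
    return "$y$m$d";
}

print to_yyyymmdd_split('01/15/2003'), "\n";    # 20030115
print to_yyyymmdd_regex('01/15/2003'), "\n";    # 20030115
```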