Avatar of khanzada19
khanzada19

asked on

Read large data file via perl in chunks

I have a file with more than 700,000 records. I want to load 100,000 into a table, process them, and then get the next 100,000 until I am done. How would I accomplish reading the file in chunks via Perl? (Currently I load all the records at once.)

Thanks
Avatar of Adam314
Adam314

Are the records fixed length?  If so, you could use the read function with
    (record size) * (number of records you want to read)

If they are terminated by something (such as newline), set $/ to the end-of-record character, then read 1 record at a time and save it in memory until you have all the records you want.
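To make that concrete, here is a rough sketch of both approaches; the file names, record size, and batch size are placeholders, not anything from the question:

#!/usr/bin/perl
use strict;
use warnings;

my $batch = 100_000;    # records per chunk (placeholder)

# Fixed-length records: read a whole batch's worth of bytes at once.
my $rec_size = 80;      # placeholder record size in bytes
open(my $fixed, '<', 'records.dat') or die "could not open: $!\n";
while (read($fixed, my $buf, $rec_size * $batch)) {
    my @records = unpack("(a$rec_size)*", $buf);
    # process @records here (up to $batch of them)
}
close($fixed);

# Delimited records: set $/ to the record terminator and collect a batch.
open(my $in, '<', 'records.txt') or die "could not open: $!\n";
local $/ = "\n";        # or whatever character ends each record
my @records;
while (my $rec = <$in>) {
    push @records, $rec;
    if (@records == $batch) {
        # process @records here
        @records = ();
    }
}
# process any leftover records still in @records
close($in);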
Avatar of khanzada19

ASKER

The records are separated by ";". Could you give me a code example? Thanks

open(my $in, "<", "your_file.txt") or die "could not open: $!\n";
local $/ = ";";  #If you meant ; then newline, use ";\n"
 
my @records;
while (<$in>) {
  push @records, $_;
  if (@records == 100_000) {
    #process your records here
    @records = ();  #then clear for the next 100_000
  }
}
#process any leftover records (fewer than 100_000) still in @records here
close($in);


I am doing the following, but it reads 1 record at a time instead of 5000. What am I doing wrong?

  $ret = open(E_P_FILE, "< $PathFileName");
  local $/=";\n";

my @records;
      while ( $line = <EXP_IMP_FILE> )

       if ($#records < 5000){
         print"\n\n NumberOfLineImport := $NumberOfLineImport\n\n";
       }
       else{
         print"\n\n ELSE NumberOfLineImport := $NumberOfLineImport\n\n";
             @records=();
       }
 }#while

$ret = open(E_P_FILE, "< $PathFileName");
local $/ = ";\n";
 
my @records;
while ( $line = <EXP_IMP_FILE> ) {
    push @records, $line;
    if (@records == 5000) {
        #process records here, there will be 5000 of them in @records
        
        #then clear records
        @records = ();
    }
}#while
 
#If there are any records left over here, @records will be non-empty.
#You won't have a full 5000 though... if you want to process those, do so here


I am sorry, but I don't follow what you mean. When I print $#records I am getting -1.
$#records is the highest index in the array.  If it is -1, you don't have any records.  Where are you printing it?
Oh... the file handle name you have is incorrect. Try this: replace the open on line 1 above with this:

open(EXP_IMP_FILE, "< $PathFileName") or die "Could not open file: $!\n";


The file handle name is correct; I forgot to change it when I did the cut and paste. Currently there are more than 500,000 records in the E_P_FILE. I want to first select 5000 records, insert them into a table, and do some processing; then, after processing, Perl would pick another 5000 records, insert them into the table, and process them... and keep going until I am done processing all the records.
>>insert them into a table
Is this a database table, or some table you are keeping in RAM? If it is a database table, there is no need to store the records in @records; you can simply do an insert. Then, after 5000 records, you can process them from the DB.
Note that if you insert your second set of 5000 records to the same table as the first, they will all be there together.
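A rough sketch of that insert-as-you-read approach using DBI; the DSN, table name, and column are hypothetical, and the process/delete steps stand in for whatever work happens between batches:

use strict;
use warnings;
use DBI;

# Hypothetical connection details, table, and column names.
my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });
my $sth = $dbh->prepare('INSERT INTO staging_table (record_text) VALUES (?)');

my $PathFileName = 'your_file.txt';   # placeholder
open(my $in, '<', $PathFileName) or die "Could not open file: $!\n";
local $/ = ";\n";

my $count = 0;
while (my $line = <$in>) {
    $sth->execute($line);             # insert each record as it is read
    if (++$count == 5000) {
        $dbh->commit;
        # process the 5000 rows now in staging_table,
        # then delete them before starting the next batch
        $count = 0;
    }
}
$dbh->commit;                         # final partial batch, if any
# process and delete whatever is left in staging_table here
close($in);
$dbh->disconnect;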
I delete them as soon as I process them, but I am still not getting 5000 records at once; I get 1 and keep incrementing. What I did now is the following. It seems to be working, but I was looking for a better way.

while ( $line = <EXP_IMP_FILE> ) {
    $count = $count + 1;
    if ($count == 5000){
        # insert
        # process
        # delete
        $count = 0;
    }
}#while
Avatar of ozo
> What I did now is the following. It seems to be working, but I was looking for a better way.
What would you consider to be better?
What you posted will only process every 5000th record. It will not insert all of the records and process them in groups of 5000. Is that what you wanted?
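For comparison, a loop that keeps every record and processes them in groups of 5000 (plus the final partial batch) might look like this sketch, with the insert/process/delete steps left as placeholders:

my @records;
while ( my $line = <EXP_IMP_FILE> ) {
    push @records, $line;
    if (@records == 5000) {
        # insert the 5000 records in @records into the table,
        # process them, then delete them from the table
        @records = ();
    }
}#while
if (@records) {
    # handle the final batch of fewer than 5000 records the same way
}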
I guess khanzada19 is looking for bulk reading of the file in batches.
But even if you read them in batches, I don't think there is a bulk table-loading feature available in Perl.
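For what it's worth, the DBI module does offer batched execution via execute_array when the underlying driver supports it; a minimal sketch, with a hypothetical connection, table, and column:

use strict;
use warnings;
use DBI;

# Hypothetical batch of records; in practice @records would hold one chunk read from the file.
my @records = map { "record $_;\n" } 1 .. 5000;

my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });
my $sth = $dbh->prepare('INSERT INTO staging_table (record_text) VALUES (?)');

my @status;
$sth->execute_array({ ArrayTupleStatus => \@status }, \@records);
$dbh->commit;
$dbh->disconnect;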
Yes, that's what I want: to process the records in 5000-record chunks.
ASKER CERTIFIED SOLUTION
Avatar of Adam314
Adam314
