Solved

Read large data file via perl in chunks

Posted on 2008-11-19
16
Medium Priority
682 Views
Last Modified: 2012-05-05
I have a file with more than 700,000 records. I want to load 100,000 into a table, process them, and get the next 100,000 until I am done. How would I accomplish reading the file in chunks via Perl (currently I load all the records at once)?

Thanks
Question by:khanzada19
16 Comments
 
LVL 39

Expert Comment

by:Adam314
ID: 22997213
Are the records fixed length?  If so, you could use the read function with
    (record size) * (number of records you want to read)

If they are terminated by something (such as a newline), set $/ to the end-of-record character, then read one record at a time and save them in memory until you have all the records you want.
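For illustration, both approaches might be sketched like this (the record size, chunk size, and data are hypothetical):

```perl
use strict;
use warnings;

# Fixed-length records: read() pulls (record size) * (count) bytes at once.
sub read_fixed_chunk {
    my ($fh, $rec_size, $count) = @_;
    my $bytes = read($fh, my $buf, $rec_size * $count);
    return () unless $bytes;                  # nothing left
    return unpack("(A$rec_size)*", $buf);     # split the buffer into records
}

# Delimited records: set $/ to the terminator and read one record at a time.
sub read_delim_chunk {
    my ($fh, $count) = @_;
    local $/ = ";";                           # end-of-record character
    my @records;
    while (my $rec = <$fh>) {
        push @records, $rec;
        last if @records == $count;
    }
    return @records;
}
```

Either helper returns an empty list at end-of-file, so the caller can loop until a chunk comes back empty.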
 

Author Comment

by:khanzada19
ID: 22997241
Records are separated by ";". Could you give me a code example? Thanks
 
LVL 39

Expert Comment

by:Adam314
ID: 22997402

open(my $in, "<your_file.txt") or die "could not open: $!\n";
local $/=";";  #If you meant ; then newline, use ";\n"
 
my @records;
while(<$in>) {
  push @records, $_;
  if(@records == 100_000) {
    #process your records here
    @records=();  #then clear for the next 100_000
  }
}
#any leftover records (fewer than 100_000) are still in @records here
close($in);


 

Author Comment

by:khanzada19
ID: 22998265
I am doing the following, but it reads 1 record at a time instead of 5000. What am I doing wrong?

  $ret = open(E_P_FILE, "< $PathFileName");
  local $/=";\n";

my @records;
      while ( $line = <EXP_IMP_FILE> )

       if ($#records < 5000){
         print"\n\n NumberOfLineImport := $NumberOfLineImport\n\n";
       }
       else{
         print"\n\n ELSE NumberOfLineImport := $NumberOfLineImport\n\n";
             @records=();
       }
 }#while
 
LVL 39

Expert Comment

by:Adam314
ID: 22998369

$ret = open(E_P_FILE, "< $PathFileName");
local $/=";\n";
 
my @records;
while ( $line = <EXP_IMP_FILE> ) {
    push @records, $line;
    if (@records == 5000) {
        #process records here, there will be 5000 of them in @records
        
        #then clear records
        @records=();
    }
}#while
 
#If any records remain after the loop, @records will be non-empty.
#There won't be a full 5000 of them though... if you want to process those, do so here

 

Author Comment

by:khanzada19
ID: 22998471
I am sorry, but I don't follow what you mean. When I print $#records, I get -1.
 
LVL 39

Expert Comment

by:Adam314
ID: 22998553
$#records is the highest index in the array.  If it is -1, you don't have any records.  Where are you printing it?
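To illustrate the difference between the highest index and the element count (nothing here is specific to the thread's data):

```perl
use strict;
use warnings;

my @records;
print $#records, "\n";         # -1: highest index of an empty array
print scalar(@records), "\n";  # 0: number of elements

push @records, "rec1", "rec2", "rec3";
print $#records, "\n";         # 2: indexes run 0..2
print scalar(@records), "\n";  # 3
```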
 
LVL 39

Expert Comment

by:Adam314
ID: 22998570
Oh... the file handle name you have is incorrect. Try this: replace line 1 above with the following:



open(EXP_IMP_FILE, "< $PathFileName") or die "Could not open file: $!\n";

 

Author Comment

by:khanzada19
ID: 22998664
The file handle name is correct; I forgot to change it when I did the cut and paste. Currently there are more than 500,000 records in E_P_FILE. I want to first select 5000 records, insert them into a table, and do some processing; after processing, Perl would pick another 5000 records, insert them into the table, and process them... and keep going until I am done processing all the records.
 
LVL 39

Expert Comment

by:Adam314
ID: 22999139
>>insert them into a table
Is this a database table, or some table you are keeping in RAM?  If it is a database table, there is no need to store the records in @records, you can simply do an insert.  Then after 5000 records, you can process them from the DB.
Note that if you insert your second set of 5000 records to the same table as the first, they will all be there together.
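A minimal sketch of that approach, assuming the DBI module and a hypothetical staging table named `staging` (the table name, record separator, and batch size are placeholders):

```perl
use strict;
use warnings;
use DBI;

# Insert each record into the database as it is read; after every full
# batch, the rows can be processed from the table and deleted before
# continuing, so the table never holds more than one batch.
sub load_in_batches {
    my ($dbh, $fh, $batch_size) = @_;
    local $/ = ";\n";                 # one record per read
    my $sth = $dbh->prepare("INSERT INTO staging (record) VALUES (?)");
    my $count = 0;
    while (my $line = <$fh>) {
        $sth->execute($line);
        if (++$count % $batch_size == 0) {
            # process the $batch_size rows now sitting in staging,
            # then DELETE FROM staging before the next batch
        }
    }
    return $count;                    # total records inserted
}
```

Note the caveat from the comment above: without the delete step, the second batch of rows lands in the same table alongside the first.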
 

Author Comment

by:khanzada19
ID: 22999207
I delete them as soon as I process them, but I am still not getting 5000 records at once; I get 1 and it keeps incrementing. What I did instead is the following, and it seems to be working, but I was looking for a better way.

while ( $line = <EXP_IMP_FILE> ) {
    $count = $count + 1;
    if ($count eq 5000){
        insert
        process
        delete
        $count = 0;
    }
}#while
 
LVL 85

Expert Comment

by:ozo
ID: 22999354
> It seems to be working, but I was looking for a better way.
What would you consider to be better?
 
LVL 39

Expert Comment

by:Adam314
ID: 22999496
What you posted will only process every 5000th record. It will not insert all of the records and process them in groups of 5000. Is that what you meant you wanted?
 
LVL 27

Expert Comment

by:sujith80
ID: 23001992
I guess khanzada19 is looking for bulk reading of the file in batches.
But even if you read them in batches, I don't think there is a bulk table-loading feature available in Perl.
 

Author Comment

by:khanzada19
ID: 23003906
Yes, that's what I want: to process in chunks of 5000 records.
 
LVL 39

Accepted Solution

by:
Adam314 earned 1200 total points
ID: 23004445

while ( $line = <EXP_IMP_FILE> ) {
    #insert
    if(!($. % 5000) or eof(EXP_IMP_FILE)) {
        #process
        #delete
    }
}
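Filled out into a self-contained sketch (the handler and batch size are placeholders), the idiom above might look like:

```perl
use strict;
use warnings;

# Read ";\n"-terminated records and hand them to $handler in batches of
# $batch_size. $. is the current record number, so !($. % $batch_size)
# fires on every full batch, and the eof() test flushes the final
# partial batch at end-of-file.
sub process_in_batches {
    my ($fh, $batch_size, $handler) = @_;
    local $/ = ";\n";
    my @batch;
    while (my $line = <$fh>) {
        push @batch, $line;                    # "insert" step
        if (!($. % $batch_size) or eof($fh)) {
            $handler->(@batch);                # "process" step
            @batch = ();                       # "delete" step
        }
    }
}
```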

