hadrons
asked on
Splitting a file based on the number of records
I found a script on this website that splits a file based on a pattern, but when I tried to insert an if statement to split the file based on the number of occurrences of that pattern, it didn't work as I intended (it returned the pattern, but nothing else). Below is what I tried.
Basically, I would like the script to split the file if it contains more than n (in this case 6000) matches of the pattern </product>.
#!/usr/bin/perl
use strict;
my $i=1;
my $open_product_count=0;
open (DATA,"file.txt") or die;
while(<DATA>){
    while (/<\/product>/ig) {
        $open_product_count++;
    }
    if ($open_product_count > 6000) {
        open (FILE,">>file_$i.txt") or die;
        print FILE $_;
        if ($_=~/^<\/product>/){$i++; }
        close (FILE);
    }
}
close (DATA);
ASKER
I'm sorry for not being clearer in my original question, but basically, if a file has over 6,000 occurrences of </product> (this is the end of a record set) then I want the file to split into smaller files with no more than 6,000 record sets.
I used the code to keep track of the number of times </product> appears and to start splitting at that point:
while (/<\/product>/ig) {
    $open_product_count++;
}
if ($open_product_count > 6000) { ...}
I ended up with two files that had just </product> in them.
I think you should have ended up with one file for each time </product> appears at the beginning of a line in file.txt, once </product> has been seen more than 6000 times in total (counting matches anywhere on a line).
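For comparison, here is a minimal sketch of the behaviour the asker describes: start a new output file after every 6,000 closing </product> tags. It is not the accepted solution, which is not shown in this thread. The sub name split_records, its parameters, and the output naming "${prefix}_N.txt" are hypothetical, and the sketch assumes at most one </product> per line; with several closing tags on one line a chunk could exceed the limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $in_path into "${prefix}_1.txt", "${prefix}_2.txt", ...
# Each output file is closed once it holds $max_records closing </product> tags.
# Returns the number of output files written.
sub split_records {
    my ($in_path, $prefix, $max_records) = @_;
    open(my $in, '<', $in_path) or die "Cannot open $in_path: $!";

    my $file_no = 0;    # index of the current output file
    my $count   = 0;    # closing tags written to the current output file
    my $out;            # current output handle, opened only when needed

    while (my $line = <$in>) {
        if (!defined $out) {
            $file_no++;
            my $out_path = "${prefix}_$file_no.txt";
            open($out, '>', $out_path) or die "Cannot open $out_path: $!";
        }
        print {$out} $line;

        # Count every closing tag on this line, not only the first one.
        $count++ while $line =~ /<\/product>/ig;

        # Chunk is full: close it so the next line starts a new file.
        if ($count >= $max_records) {
            close($out) or die "Cannot close output: $!";
            undef $out;
            $count = 0;
        }
    }
    close($out) if defined $out;
    close($in);
    return $file_no;
}

# Runs only when three arguments are given, for example:
#   perl split.pl file.txt file 6000
split_records(@ARGV) if @ARGV == 3;
```

Unlike the original attempt, the counter is reset for each output file and every line is written, not only the lines seen after the 6,000th match. Any lines after the last </product> (a closing root tag, say) end up in one extra file, and the chunks are plain slices of the input, not standalone well-formed XML.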
ASKER CERTIFIED SOLUTION
ASKER
Excellent... thankfully I didn't create 6,000 files with my code, but yours works perfectly.
Which was the if statement you inserted, and what do you mean by "it returned the pattern"?