  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 215

Retrieve entries from a text file

I'm in pretty desperate need of a working Perl script.  I know exactly what the code needs to do but I just don't know Perl!

Here's the problem:

I have a text file consisting of thousands of entries in the following format:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


I need to cut some entries from the file.  The best way for me to do this would be to use an input file listing the entry names I want removed.

e.g.
>Entry 1
>Entry 1000
>Entry Infinity

So I would provide the entry names as an input file, and the Perl script would search for each entry name in turn.  When it finds an entry it is looking for, it would cut the complete entry (the name line plus everything up to the next '>') from the file.  The '>' sign is the handiest delimiter to use, since every entry name is preceded by it.

Any help would be SINCERELY appreciated.

Cheers in Advance,

tb34

0
Asked by: travisbickle34
3 Solutions
 
TalmashCommented:
Hi,

Why Perl? In case it's not homework, here is something that will help:

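open(IN_FILE, "in_file") or die $!;   # the file listing the entries to remove
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open(BIG_FILE, "my_big_file") or die $!;
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE, ">new_big_file") or die $!;
$big_file_index = 0;
# NOTE: the number in each entry name is used as a line index into the big file
foreach $bad_entry (@in_file_lines) {
        next unless $bad_entry =~ /Entry\s*(\d+)/;   # skip names without a number
        $bad_line = $1;
        # copy lines up to the unwanted one, without running past the end of the file
        while ($big_file_index < $bad_line && $big_file_index < @big_file_lines) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
         }
         $big_file_index++;   # skip the unwanted line
}
close(NEW_FILE);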

good luck

tal
0
 
travisbickle34Author Commented:
I'll probably have to alter the script slightly from time to time, and I'm just more comfortable with Perl!

I should probably point out that the actual syntax of the entry names is as follows:

>ADXCAPD.x.C.y

where x and y are numbers...
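
For names of this shape, a header line could be recognised with a pattern along the following lines. This is only a sketch: it assumes the prefix is literally ADXCAPD and that x and y are plain non-negative integers, and the helper name parse_header is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (x, y) for a header such as ">ADXCAPD.12.C.3",
# or an empty list if the line does not have that shape.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    # The dots are escaped so they match a literal "." rather than any character.
    return $line =~ /^>ADXCAPD\.(\d+)\.C\.(\d+)$/ ? ($1, $2) : ();
}

my @parts = parse_header(">ADXCAPD.12.C.3\n");
print "x=$parts[0] y=$parts[1]\n" if @parts;
```

A script that only looks for "Entry" followed by digits will never match headers like these, so the matching pattern has to follow the real names.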
0
 
travisbickle34Author Commented:
I've adapted and run the script.  It runs ok but the output file produced is empty.

Any suggestions??
0
 
travisbickle34Author Commented:
I'm increasing the points for a working solution to this problem - as I said it's pretty important! :)

0
 
FishMongerCommented:
Here's one that I tested.

open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', @delete);

open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      s/>$//;
      next if (/($delete)/ or /^$/);
      $keep{$1} = $_ if /^(Entry.*?\n)/i;
     
   }
}
if (%keep) {
   open OUT, ">travis.txt" or die $!;
   print OUT sort values %keep;
}
0
 
FishMongerCommented:
I forgot, you need to keep the > at the beginning of each section.

So, change:
$keep{$1} = $_ if /^(Entry.*?\n)/i;

to this:
$keep{$1} = ">$_" if /^(Entry.*?\n)/i;
0
 
FishMongerCommented:
If you have a large number of sections that need to be removed, it might be faster if we iterate over each element of the @delete array instead of joining the array.
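
One possible variation on that idea, swapping the loop over @delete for a hash lookup: load the names into a hash and test only the header line of each entry, so the cost per entry does not grow with the length of the delete list. A rough sketch, assuming each line of the delete list is a complete header line including the leading '>'; the sub name filter_entries and the in-memory strings are hypothetical and stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the delete list and the data as strings and
# returns the text of the entries that are kept.
sub filter_entries {
    my ($delete_text, $data_text) = @_;

    # One hash key per name to delete, so each header needs a single lookup.
    my %delete = map { $_ => 1 } grep { length } split /\n/, $delete_text;

    my $kept = '';
    my $skip = 0;
    for my $line (split /^/, $data_text) {     # split into lines, keeping "\n"
        if ($line =~ /^>/) {                   # a header starts a new entry
            (my $name = $line) =~ s/\s+\z//;   # header without trailing whitespace
            $skip = exists $delete{$name};
        }
        $kept .= $line unless $skip;
    }
    return $kept;
}

my $data = ">Entry 1\nblah\nblah\n>Entry 2\netc\n>Entry 1000\nblahetc\n";
print filter_entries(">Entry 1\n>Entry 2\n", $data);
```

Because the lookup is an exact string comparison, ">Entry 1" in the list does not remove ">Entry 1000".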
0
 
travisbickle34Author Commented:
Something's not right.

Using this as a file for processing:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


And using this as a delete list:

>Entry 1
>Entry 2

The output I get is:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc

>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc



I dunno what's happening  :-\



0
 
FishMongerCommented:
Humm, I must have changed something between the time I tested and the posting.

change these 2 lines:
      next if (/($delete)/ or /^$/);
      $keep{$1} = ">$_" if /^(Entry.*?\n)/i;

to this:
      next if (/^($delete)\n/);
      $keep{$1} = ">$_" if (/^(Entry[^\n]+)/i);
0
 
ozoCommented:
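open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
# quote each name so regex metacharacters match literally, then join as alternatives
$delete = join('|', map"\Q$_\E",@delete);
# anchor per line (/m) so a name only matches a complete header line
$delete =qr/^($delete)$/m;
open IN, "<travis.txt" or die $!;
{ local $/ = ">";             # read one ">"-terminated record at a time
   while (<IN>) {
      chomp;                  # removes the trailing ">" (the current $/)
      next unless length;     # skip the empty record before the first ">"
      s/^/>/;                 # put the leading ">" back
      next if /$delete/;
      push @keep, $_;
   }
}
close IN;
   open OUT, ">travis.txt" or die $!;
   print OUT @keep;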
0
 
FishMongerCommented:
ozo's right, using an array would be better than the hash I used.  And, I'm sure that his method of constructing the regex is better, but I'm not exactly sure why.
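
As far as I can tell, two things make that regex safer. \Q...\E quotes metacharacters such as '.', so a name only matches itself. The ^...$ anchors with /m make a name match only a whole header line, so '>Entry 1' cannot also hit '>Entry 1000'. A small demonstration with made-up names and records:

```perl
use strict;
use warnings;

my @delete = ('>Entry 1', '>ADXCAPD.1.C.2');

# Unquoted and unanchored, roughly like a plain joined string.
my $loose  = join '|', @delete;
# Quoted and anchored, in the style of the script above.
my $strict = join '|', map { quotemeta($_) } @delete;
$strict = qr/^($strict)$/m;

for my $record (">Entry 1000\nblah\n", ">ADXCAPDx1yCz2\nblah\n", ">Entry 1\nblah\n") {
    my ($header) = $record =~ /^(.*)/;   # first line of the record
    printf "%-16s loose=%s strict=%s\n", $header,
        ($record =~ /$loose/  ? 'match' : 'no'),
        ($record =~ /$strict/ ? 'match' : 'no');
}
```

The first two records are matched by the loose pattern even though neither name is in the list. The strict pattern matches only the third.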
0
 
travisbickle34Author Commented:
Both scripts are working perfectly now - thanks guys!

One last question - is there a simple way to make ozo's script output the deleted sequences to a second file?

Also - how can I split the points between you two?  It's only fair I think...
0
 
TalmashCommented:
Hi travis, I did not forget you; we just aren't working the same hours.


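open(IN_FILE, "in_file") or die $!;   # the file listing the entries to remove
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open(BIG_FILE, "my_big_file") or die $!;
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE, ">new_big_file") or die $!;

To create another file:
my @del_lines;
open(DEL_LINES, ">deleted_lines.txt") or die $!; # put this line near the "open" of the other file.

$big_file_index = 0;
# NOTE: the number in each entry name is used as a line index into the big file
foreach $bad_entry (@in_file_lines) {
        next unless $bad_entry =~ /Entry\s*(\d+)/;   # skip names without a number
        $bad_line = $1;
        # copy lines up to the unwanted one, without running past the end of the file
        while ($big_file_index < $bad_line && $big_file_index < @big_file_lines) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
         }
         # remember the unwanted line so it can be written to the second file
         push @del_lines, $big_file_lines[$big_file_index] if $big_file_index < @big_file_lines;
         $big_file_index++;
}
print DEL_LINES @del_lines;   # write the removed lines to the second file
close(NEW_FILE);
close(DEL_LINES);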

tal

0
 
ozoCommented:
  if( /$delete/ ){
        print SECOND_FILE;
    }else{
        push @keep,$_;
    }
0
 
travisbickle34Author Commented:
Ozo - can you edit your alteration into this piece of script?  I seem to be making a bollocks of it somehow :(


open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;

   }
}
   open OUT, ">output" or die $!;
   print OUT @keep;
0
 
ozoCommented:
open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
open OUT, ">output" or die $!;
open SECOND_FILE,">second.file" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      if( /$delete/ ){
        print SECOND_FILE;
      }else{
        print OUT;
      }
    }
}
close OUT;
close SECOND_FILE;
0
 
travisbickle34Author Commented:
Ok - I don't know what has happened but the above script now just seems to output all entries to both output files.  My head hurts...
0
 
travisbickle34Author Commented:
Actually - it seems to be working fine now!

I don't know what was happening there.  By any chance, could entries consisting of only a single line mess up the process somehow?

Regardless - I'm allocating points now.

Thanks guys.
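
On the single-line question: as far as the record-splitting logic goes, an entry that is only a header line should not be a problem, because each record is still everything between two '>' characters. A quick way to check that, sketched with in-memory data and a hypothetical split_entries sub rather than the real files:

```perl
use strict;
use warnings;

# Hypothetical helper: sorts entries into (deleted, kept) the same way the
# script above does, but reads from a string instead of a file.
sub split_entries {
    my ($data_text, @names) = @_;
    my $alternatives = join '|', map { quotemeta($_) } @names;
    my $delete = qr/^($alternatives)$/m;

    my (@deleted, @kept);
    open my $in, '<', \$data_text or die $!;
    local $/ = ">";                  # one record per ">"
    while (<$in>) {
        chomp;                       # removes the trailing ">"
        next unless length;          # skip the empty record before the first ">"
        s/^/>/;                      # put the leading ">" back
        if (/$delete/) { push @deleted, $_ } else { push @kept, $_ }
    }
    return (\@deleted, \@kept);
}

# ">Entry 2" is a header line only, with no sequence lines under it.
my ($deleted, $kept) =
    split_entries(">Entry 1\nblah\n>Entry 2\n>Entry 3\netc\n", '>Entry 2');
print "deleted:\n", @$deleted, "kept:\n", @$kept;
```

What would break the exact match is a difference between the list and the headers that is hard to see, such as trailing spaces or Windows line endings in only one of the two files.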
0
