Retrieve entries from a text file

I'm in pretty desperate need of a working Perl script.  I know exactly what the code needs to do but I just don't know Perl!

Here's the problem:

I have a text file consisting of thousands of entries in the following format:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


I need to cut some entries from the file.  The best way for me to do this would be to use an input file listing the entry names I want removed.

e.g.
>Entry 1
>Entry 1000
>Entry Infinity

Thus I would provide the entry names as an input file, and the Perl script would search for each entry name in turn.  When it finds an entry it is searching for, it would cut the complete entry from the file.  The '>' sign is the handiest delimiter to use, since every entry name is preceded by it.

Any help would be SINCERELY appreciated.

Cheers in Advance,

tb34

travisbickle34 Asked:
 
ozo Commented:
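# Read the names to remove, one per line (e.g. ">Entry 1").
open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
# Escape regex metacharacters in each name, then anchor to a whole line.
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
open OUT, ">output" or die $!;
open SECOND_FILE,">second.file" or die $!;
{ local $/ = ">";          # read one ">"-terminated record at a time
   while (<IN>) {
      chomp;               # strip the trailing ">" separator
      next unless length;  # skip the empty record before the first ">"
      s/^/>/;              # put the leading ">" back
      if( /$delete/ ){
        print SECOND_FILE; # entry is on the delete list
      }else{
        print OUT;         # entry is kept
      }
    }
}
close IN;
close OUT;
close SECOND_FILE;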
 
Talmash Commented:
Hi,

Why in Perl? In case it's not homework, here is something that will help:

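open (IN_FILE,"in_file"); # the file listing the entries to remove
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open (BIG_FILE,"my_big_file");
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE,">new_big_file");
$big_file_index = 0;
foreach $bad_entry (@in_file_lines) {
        $bad_entry =~ /Entry\s*(\d*)/;
        $bad_line = $1;      # the entry number is used as a line index
        # copy every line before the unwanted one
        while ($big_file_index < $bad_line) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
         }
         $big_file_index++;  # skip the unwanted line
}
# copy whatever follows the last unwanted line
print NEW_FILE @big_file_lines[$big_file_index .. $#big_file_lines];
close(NEW_FILE);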

good luck

tal
 
travisbickle34 (Author) Commented:
I'll probably have to alter the script slightly from time to time, and I'm just more comfortable with Perl!

I should probably point out that the actual syntax of the entry names is as follows:

>ADXCAPD.x.C.y

where x and y are numbers...
 
travisbickle34 (Author) Commented:
I've adapted and run the script.  It runs ok but the output file produced is empty.

Any suggestions??
 
travisbickle34 (Author) Commented:
I'm increasing the points for a working solution to this problem - as I said it's pretty important! :)

 
FishMonger Commented:
Here's one that I tested.

open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', @delete);

open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      s/>$//;
      next if (/($delete)/ or /^$/);
      $keep{$1} = $_ if /^(Entry.*?\n)/i;
     
   }
}
if (%keep) {
   open OUT, ">travis.txt" or die $!;
   print OUT sort values %keep;
}
 
FishMonger Commented:
I forgot, you need to keep the > at the beginning of each section.

So, change:
$keep{$1} = $_ if /^(Entry.*?\n)/i;

to this:
$keep{$1} = ">$_" if /^(Entry.*?\n)/i;
 
FishMonger Commented:
If you have a large number of sections that need to be removed, it might be faster if we iterate over each element of the @delete array instead of joining the array.
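
A minimal sketch of that idea, with one change: rather than looping over @delete for every record, it puts the names in a hash and looks up each record's header line. The sub name filter_entries and the sample data are hypothetical, and it splits on a '>' at the start of a line instead of setting $/.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (\@keep, \@cut) for the records in $text.
# $names is an array ref of header lines to remove, e.g. ">Entry 1".
sub filter_entries {
    my ($text, $names) = @_;
    my %drop = map { ($_ => 1) } @$names;       # exact-match lookup table
    my (@keep, @cut);
    # Split before every ">" that starts a line; each piece is one entry.
    for my $record (split /^(?=>)/m, $text) {
        my ($header) = $record =~ /\A([^\n]*)/; # first line of the entry
        if ($drop{$header}) { push @cut,  $record }
        else                { push @keep, $record }
    }
    return (\@keep, \@cut);
}

my $text = ">Entry 1\nblah\n>Entry 2\netc\n>Entry 1000\nblahetc\n";
my ($keep, $cut) = filter_entries($text, ['>Entry 1', '>Entry 1000']);
print @$keep;   # only the ">Entry 2" entry remains
```

Because the lookup is an exact string comparison, ">Entry 1" cannot accidentally remove ">Entry 1000", and the cost per record stays the same however long the delete list gets.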
 
travisbickle34 (Author) Commented:
Something's not right.

Using this as a file for processing:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


And using this as a delete list:

>Entry 1
>Entry 2

The output I get is:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc

>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc



I dunno what's happening  :-\



 
FishMonger Commented:
Humm, I must have changed something between the time I tested and the posting.

change these 2 lines:
      next if (/($delete)/ or /^$/);
      $keep{$1} = ">$_" if /^(Entry.*?\n)/i;

to this:
      next if (/^($delete)\n/);
      $keep{$1} = ">$_" if (/^(Entry[^\n]+)/i);
 
ozo Commented:
open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;
     
   }
}
   open OUT, ">travis.txt" or die $!;
   print OUT @keep;
 
FishMonger Commented:
ozo's right, using an array would be better than the hash I used.  And, I'm sure that his method of constructing the regex is better, but I'm not exactly sure why.
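
One way to see the difference, as a minimal sketch rather than ozo's own explanation: a plain join leaves the pattern unanchored and leaves "." as a wildcard, while the escaped and anchored version only matches a whole header line. The sub names loose_pattern and strict_pattern and the sample records are hypothetical; quotemeta is used as the function form of \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for the comparison.
# Plain join: unanchored, and "." still matches any character.
sub loose_pattern {
    my $alt = join '|', @_;
    return qr/($alt)/;
}
# Escape every name, then anchor the alternation to a whole line.
sub strict_pattern {
    my $alt = join '|', map { quotemeta($_) } @_;
    return qr/^($alt)$/m;
}

my @delete = ('>Entry 1', '>ADXCAPD.1.C.2');
my ($loose, $strict) = (loose_pattern(@delete), strict_pattern(@delete));

for my $record (">Entry 1000\nblah\n", ">ADXCAPDx1yCz2\nblah\n", ">Entry 1\nblah\n") {
    my ($header) = $record =~ /\A([^\n]*)/;   # first line, for display
    printf "%-16s loose=%d strict=%d\n", $header,
        ($record =~ $loose ? 1 : 0), ($record =~ $strict ? 1 : 0);
}
```

The loose pattern matches all three records, because ">Entry 1" is a prefix of ">Entry 1000" and each "." accepts any character. The strict pattern matches only ">Entry 1".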
 
travisbickle34 (Author) Commented:
Both scripts are working perfectly now - thanks guys!

One last question - is there a simple way to make ozo's script output the deleted sequences to a second file?

Also - how can I split the points between you two?  It's only fair I think...
 
Talmash Commented:
Hi travis, I did not forget you; we are just not working the same hours.


open (IN_FILE,"in_file"); # the file listing the entries to remove
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open (BIG_FILE,"my_big_file");
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE,">new_big_file");

To create another file:
my @del_lines;
open (DEL_LINES,">deleted_lines.txt"); # put this line near the "open" of the other file

$big_file_index = 0;
foreach $bad_entry (@in_file_lines) {
        $bad_entry =~ /Entry\s*(\d*)/;
        $bad_line = $1;      # the entry number is used as a line index
        # copy every line before the unwanted one
        while ($big_file_index < $bad_line) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
         }
         push @del_lines,$big_file_lines[$big_file_index];  # remember the removed line
         $big_file_index++;
}
# copy whatever follows the last unwanted line, then write out the removed lines
print NEW_FILE @big_file_lines[$big_file_index .. $#big_file_lines];
print DEL_LINES @del_lines;
close(NEW_FILE);
close(DEL_LINES);

tal

 
ozo Commented:
  if( /$delete/ ){
        print SECOND_FILE;
    }else{
        push @keep,$_;
    }
 
travisbickle34 (Author) Commented:
Ozo - can you edit your alteration into this piece of script?  I seem to be making a bollocks of it somehow :(


open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;

   }
}
   open OUT, ">output" or die $!;
   print OUT @keep;
 
travisbickle34 (Author) Commented:
Ok - I don't know what has happened but the above script now just seems to output all entries to both output files.  My head hurts...
 
travisbickle34 (Author) Commented:
Actually - it seems to be working fine now!

I don't know what was happening there.  By any chance, do entries consisting of only a single line mess up the process somehow?

Regardless - I'm allocating points now.

Thanks guys.
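
On the single-line question, here is a minimal sketch for checking it. It assumes the same record loop as ozo's script (with $/ set to ">") but reads from a string instead of a file. The sub name split_records and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the same ">"-separated record loop as the
# script above, reading from an in-memory string instead of a file.
sub split_records {
    my ($text) = @_;
    open my $in, '<', \$text or die $!;
    local $/ = '>';               # one ">"-terminated record per read
    my @records;
    while (my $record = <$in>) {
        chomp $record;            # strips the trailing ">" separator
        next unless length $record;
        push @records, ">$record";
    }
    close $in;
    return @records;
}

# ">Entry 5" has a header line and no body.
my @records = split_records(">Entry 1\nblah\n>Entry 5\n>Entry 6\nmore\n");
my $delete  = qr/^(\Q>Entry 5\E)$/m;
print scalar(@records), " records\n";              # 3 records
print "cut: $_" for grep { /$delete/ } @records;   # cut: >Entry 5
```

In this sketch the header-only entry comes through as its own record and matches the anchored pattern like any other, so a single-line entry on its own should not confuse the loop.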
Question has a verified solution.
