Solved

Retrieve entries from a text file

Posted on 2004-08-04
Last Modified: 2010-03-05
I'm in pretty desperate need of a working Perl script.  I know exactly what the code needs to do but I just don't know Perl!

Here's the problem:

I have a text file consisting of thousands of entries in the following format:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


I need to cut some entries from the file.  The best way for me to do this would be to use an input file listing the entry names I want removed.

e.g.
>Entry 1
>Entry 1000
>Entry Infinity

Thus I would provide the entry names as an input file, and the Perl script would iteratively search for each Entry name provided.  When it finds an entry it is searching for it would cut the complete entry from the file.  The '>' sign is the handiest delimiter to use since every entry name is preceded by it.

Any help would be SINCERELY appreciated.

Cheers in Advance,

tb34
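
The idea in the question of using '>' as the delimiter can be sketched roughly as below. This is only a minimal illustration, not any poster's script: the sub name `split_entries` and the sample text are made up for the example, and it assumes that '>' never appears inside the body of an entry.

```perl
use strict;
use warnings;

# Split text into records that each begin with a ">" header line.
# Returns a list of strings such as ">Entry 1\nblah\n".
sub split_entries {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = ">";                    # read up to and including the next ">"
    my @entries;
    while (my $record = <$in>) {
        chomp $record;                 # strips the trailing ">" (the current $/)
        next unless length $record;    # the text before the first ">" is empty
        push @entries, ">$record";     # put the leading ">" back
    }
    close $in;
    return @entries;
}

# Hypothetical sample data in the format described in the question.
my $sample  = ">Entry 1\nblah\nblah\n>Entry 2\netc\n";
my @entries = split_entries($sample);
print scalar(@entries), " entries\n";
```

Reading from a real file instead of a string only changes the `open` call; the record handling stays the same.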

Question by:travisbickle34
19 Comments
 
LVL 6

Assisted Solution

by:Talmash
Talmash earned 50 total points
ID: 11715769
Hi,

Why in Perl? In case it's not homework, here's something that will help:

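open(IN_FILE, "<", "in_file") or die $!;   # the file with the entry names to remove
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open(BIG_FILE, "<", "my_big_file") or die $!;
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE, ">", "new_big_file") or die $!;
$big_file_index = 0;
# NOTE: this uses the number in each "Entry N" name as a line index into the big file
foreach $bad_entry (@in_file_lines) {
        next unless $bad_entry =~ /Entry\s*(\d+)/;   # skip names that do not match
        $bad_line = $1;
        while ($big_file_index < $bad_line) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
         }
         $big_file_index++;   # skip the unwanted line
}
close(NEW_FILE);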

good luck

tal
 

Author Comment

by:travisbickle34
ID: 11716435
I'll probably have to alter the script slightly from time to time and I'm just more comfortable with perl!

I should probably point out that the actual syntax of the entry names is as follows:

>ADXCAPD.x.C.y

where x and y are numbers...
 

Author Comment

by:travisbickle34
ID: 11716717
I've adapted and run the script.  It runs ok but the output file produced is empty.

Any suggestions??
 

Author Comment

by:travisbickle34
ID: 11716888
I'm increasing the points for a working solution to this problem - as I said it's pretty important! :)

 
LVL 28

Assisted Solution

by:FishMonger
FishMonger earned 150 total points
ID: 11717137
Here's one that I tested.

open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', @delete);

open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      s/>$//;
      next if (/($delete)/ or /^$/);
      $keep{$1} = $_ if /^(Entry.*?\n)/i;
     
   }
}
if (%keep) {
   open OUT, ">travis.txt" or die $!;
   print OUT sort values %keep;
}
 
LVL 28

Expert Comment

by:FishMonger
ID: 11717181
I forgot, you need to keep the > at the beginning of each section.

So, change:
$keep{$1} = $_ if /^(Entry.*?\n)/i;

to this:
$keep{$1} = ">$_" if /^(Entry.*?\n)/i;
 
LVL 28

Expert Comment

by:FishMonger
ID: 11717305
If you have a large number of sections that need to be removed, it might be faster if we iterate over each element of the @delete array instead of joining the array.
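
A related option, sketched here as an assumption rather than as the change suggested above, is to skip the regex altogether and look each header line up in a hash. The sub name `filter_entries` and the sample records are hypothetical, and the lookup only works if the names in the delete list match the header lines exactly (no trailing spaces or stray carriage returns).

```perl
use strict;
use warnings;

# Return only the entries whose header line is NOT in the delete list.
# $entries: array ref of records such as ">Entry 1\nblah\n"
# $delete:  array ref of header names such as ">Entry 1"
sub filter_entries {
    my ($entries, $delete) = @_;
    my %unwanted = map { $_ => 1 } @$delete;     # header => 1, for fast lookups
    my @keep;
    for my $entry (@$entries) {
        my ($header) = $entry =~ /\A([^\n]*)/;   # first line, without the newline
        push @keep, $entry unless $unwanted{$header};
    }
    return @keep;
}

# Hypothetical sample data.
my @entries = (">Entry 1\nblah\n", ">Entry 2\netc\n", ">Entry 1000\nblahetc\n");
my @kept    = filter_entries(\@entries, [">Entry 1", ">Entry 2"]);
print @kept;   # only the ">Entry 1000" record is left
```

With a hash, the cost per entry does not grow with the length of the delete list, which is the case the comment above is concerned with.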
 

Author Comment

by:travisbickle34
ID: 11717415
Something's not right.

Using this as a file for processing:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


And using this as a delete list:

>Entry 1
>Entry 2

The output I get is:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc

>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc



I dunno what's happening  :-\



 
LVL 28

Expert Comment

by:FishMonger
ID: 11717834
Hmm, I must have changed something between the time I tested and the posting.

change these 2 lines:
      next if (/($delete)/ or /^$/);
      $keep{$1} = ">$_" if /^(Entry.*?\n)/i;

to this:
      next if (/^($delete)\n/);
      $keep{$1} = ">$_" if (/^(Entry[^\n]+)/i);
 
LVL 84

Expert Comment

by:ozo
ID: 11717891
open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;
     
   }
}
   open OUT, ">travis.txt" or die $!;
   print OUT @keep;
 
LVL 28

Expert Comment

by:FishMonger
ID: 11718286
ozo's right, using an array would be better than the hash I used.  And, I'm sure that his method of constructing the regex is better, but I'm not exactly sure why.
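
As to why that way of building the regex behaves better, a small comparison may help. This is only an illustration with made-up header names: `quotemeta` does the same job as the `\Q...\E` in ozo's script, and the `^...$` anchors with `/m` force a whole line to match.

```perl
use strict;
use warnings;

# Hypothetical delete list.
my @delete = (">ADXCAPD.1.C.2", ">Entry 1");

# Names joined as-is: "." still means "any character" and nothing is anchored.
my $naive = join '|', @delete;
my $loose = qr/($naive)/;

# Names quoted and anchored, as in ozo's script.
my $quoted = join '|', map { quotemeta($_) } @delete;
my $exact  = qr/^($quoted)$/m;

for my $header (">ADXCAPD.1.C.2", ">ADXCAPDx1yCz2", ">Entry 1000") {
    printf "%-16s loose=%-5s exact=%s\n", $header,
        ($header =~ $loose ? "match" : "no"),
        ($header =~ $exact ? "match" : "no");
}
```

The loose pattern matches all three headers, because the unquoted dots match any character and ">Entry 1" is found inside ">Entry 1000". The quoted, anchored pattern matches only the first header, which is the one actually in the list.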
 

Author Comment

by:travisbickle34
ID: 11723994
Both scripts are working perfectly now - thanks guys!

One last question - is there a simple way to make ozo's script output the deleted sequences to a second file?

Also - how can I split the points between you two?  It's only fair I think...
 
LVL 6

Expert Comment

by:Talmash
ID: 11724178
Hi travis, I did not forget you; we just aren't working the same hours.


open(IN_FILE, "<", "in_file") or die $!;   # the file with the entry names to remove
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open(BIG_FILE, "<", "my_big_file") or die $!;
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE, ">", "new_big_file") or die $!;

# to create another file, open it near the "open" of the other file:
my @del_lines;
open(DEL_LINES, ">", "deleted_lines.txt") or die $!;

$big_file_index = 0;
foreach $bad_entry (@in_file_lines) {
        next unless $bad_entry =~ /Entry\s*(\d+)/;   # skip names that do not match
        $bad_line = $1;
        while ($big_file_index < $bad_line) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
         }
         push @del_lines, $big_file_lines[$big_file_index];
         $big_file_index++;
}
print DEL_LINES @del_lines;   # write the removed lines to the second file
close(NEW_FILE);
close(DEL_LINES);

tal

 
LVL 84

Expert Comment

by:ozo
ID: 11725048
  if( /$delete/ ){
        print SECOND_FILE;
    }else{
        push @keep,$_;
    }
 

Author Comment

by:travisbickle34
ID: 11725289
Ozo - can you edit your alteration into this piece of script?  I seem to be making a bollocks of it somehow :(


open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;

   }
}
   open OUT, ">output" or die $!;
   print OUT @keep;
 
LVL 84

Accepted Solution

by:ozo
ozo earned 200 total points
ID: 11725378
open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
open OUT, ">output" or die $!;
open SECOND_FILE,">second.file" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      if( /$delete/ ){
        print SECOND_FILE;
      }else{
        print OUT;
      }
    }
}
close OUT;
close SECOND_FILE;
 

Author Comment

by:travisbickle34
ID: 11726013
Ok - I don't know what has happened but the above script now just seems to output all entries to both output files.  My head hurts...
 

Author Comment

by:travisbickle34
ID: 11726066
Actually - it seems to be working fine now!

I don't know what was happening there.  By any chance, could entries consisting of only a single line mess up the process somehow?

Regardless - I'm allocating points now.

Thanks guys.
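
On the single-line question, a quick check suggests that header-only entries should not be a problem for the anchored regex in the accepted script. The records below are hypothetical and written as they would look after the `chomp` and `s/^/>/` steps. The last case is only a guess at something that could make an entry slip through; the thread does not establish what actually went wrong.

```perl
use strict;
use warnings;

# Hypothetical delete list, compiled the same way as in the accepted script.
my @delete = (">Entry 5");
my $names  = join '|', map { quotemeta($_) } @delete;
my $delete = qr/^($names)$/m;

# Hypothetical records, keyed by a description of each case.
my %records = (
    "multi-line entry"              => ">Entry 5\nblah\nblah\n",
    "header-only entry"             => ">Entry 5\n",
    "header-only, no final newline" => ">Entry 5",
    "header with trailing space"    => ">Entry 5 \nblah\n",
);

for my $label (sort keys %records) {
    printf "%-30s %s\n", $label,
        ($records{$label} =~ $delete ? "deleted" : "kept");
}
```

The first three cases match and would go to the second file. The header with a trailing space does not match, so stray whitespace in either the entry file or the delete list is worth ruling out if entries end up in the wrong output.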
Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Perl - Mawk 2 87
Strange perl issue 6 126
iSeries PERL Scripts 7 147
Any syntax error for this clone.plscript 6 134
Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Video by: Mark
This lesson goes over how to construct ordered and unordered lists and how to create hyperlinks.

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now