Solved

Retrieve entries from a text file

Posted on 2004-08-04
Last Modified: 2010-03-05
I'm in pretty desperate need of a working Perl script.  I know exactly what the code needs to do but I just don't know Perl!

Here's the problem:

I have a text file consisting of thousands of entries in the following format:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


I need to cut some entries from the file.  The best way for me to do this would be to use an input file listing the entry names I want removed.

e.g.
>Entry 1
>Entry 1000
>Entry Infinity

Thus I would provide the entry names as an input file, and the Perl script would iteratively search for each Entry name provided.  When it finds an entry it is searching for it would cut the complete entry from the file.  The '>' sign is the handiest delimiter to use since every entry name is preceded by it.

Any help would be SINCERELY appreciated.

Cheers in Advance,

tb34

Question by:travisbickle34
19 Comments
 
LVL 6

Assisted Solution

by:Talmash
Talmash earned 50 total points
Comment Utility
Hi,

Why in Perl? In case it's not homework, here's something that will help:

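open(IN_FILE, "in_file") or die $!; # the file with the
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open(BIG_FILE, "my_big_file") or die $!;
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE, ">new_big_file") or die $!;
$big_file_index = 0;
foreach $bad_entry (@in_file_lines) {
        # take the number from the entry name and use it as a line index
        next unless $bad_entry =~ /Entry\s*(\d+)/;
        $bad_line = $1;
        # copy every line before the unwanted one
        while ($big_file_index < $bad_line && $big_file_index < @big_file_lines) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
        }
        # skip the unwanted line itself
        $big_file_index++;
}
# copy whatever remains after the last unwanted line
print NEW_FILE @big_file_lines[$big_file_index .. $#big_file_lines];
close(NEW_FILE);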

good luck

tal
0
 

Author Comment

by:travisbickle34
Comment Utility
I'll probably have to alter the script slightly from time to time and I'm just more comfortable with perl!

I should probably point out that the actual syntax of the entry names is as follows:

>ADXCAPD.x.C.y

where x and y are numbers...
0
 

Author Comment

by:travisbickle34
Comment Utility
I've adapted and run the script.  It runs ok but the output file produced is empty.

Any suggestions??
0
 

Author Comment

by:travisbickle34
Comment Utility
I'm increasing the points for a working solution to this problem - as I said it's pretty important! :)

0
 
LVL 28

Assisted Solution

by:FishMonger
FishMonger earned 150 total points
Comment Utility
Here's one that I tested.

open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', @delete);

open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      s/>$//;
      next if (/($delete)/ or /^$/);
      $keep{$1} = $_ if /^(Entry.*?\n)/i;
     
   }
}
if (%keep) {
   open OUT, ">travis.txt" or die $!;
   print OUT sort values %keep;
}
0
 
LVL 28

Expert Comment

by:FishMonger
Comment Utility
I forgot, you need to keep the > at the beginning of each section.

So, change:
$keep{$1} = $_ if /^(Entry.*?\n)/i;

to this:
$keep{$1} = ">$_" if /^(Entry.*?\n)/i;
0
 
LVL 28

Expert Comment

by:FishMonger
Comment Utility
If you have a large number of sections that need to be removed, it might be faster if we iterate over each element of the @delete array instead of joining the array.
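
A minimal sketch of that idea, not a tested replacement for the script above: instead of looping over @delete for every section, it puts the names in a hash and looks up each section's first line once. The helper name filter_entries and the in-memory sample data are made up for illustration, and it assumes the names in the delete list include the leading '>' exactly as they appear in the file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the sections of $text whose header line
# is NOT listed in @$names. Names are expected to include the leading '>'.
sub filter_entries {
    my ($text, $names) = @_;
    my %delete = map { $_ => 1 } @$names;      # one hash lookup per section

    open my $in, '<', \$text or die $!;        # in-memory filehandle, demo only
    local $/ = '>';                            # read one '>'-delimited section at a time
    my @keep;
    while (my $record = <$in>) {
        chomp $record;                         # strips the trailing '>' (the value of $/)
        next unless length $record;            # skip the empty chunk before the first '>'
        my ($header) = $record =~ /^([^\r\n]*)/;   # first line is the entry name
        next if $delete{">$header"};           # listed for deletion
        push @keep, ">$record";                # put the leading '>' back
    }
    return @keep;
}

my $data = ">Entry 1\nblah\nblah\n>Entry 2\netc\n>Entry 1000\nblahetc\n";
print filter_entries($data, ['>Entry 1', '>Entry 2']);
```

Because the lookup is an exact string comparison on the whole header line, "Entry 1" in the list cannot accidentally remove "Entry 1000".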
0
 

Author Comment

by:travisbickle34
Comment Utility
Something's not right.

Using this as a file for processing:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc


And using this as a delete list:

>Entry 1
>Entry 2

The output I get is:

>Entry 1
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahbla
blahblahblahblahblahblahblahblahblahblahblahblah
>Entry 1000
blahetcblahetcblahetcblahetcblahetcblahetcblahetcb
ahetcblahetcblahetcblahetcblahetcblahetcblahetcbl
hetcblahetc

>Entry 2
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcet
etcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetcetc



I dunno what's happening  :-\



0
 
LVL 28

Expert Comment

by:FishMonger
Comment Utility
Hmm, I must have changed something between the time I tested and the posting.

Change these 2 lines:
      next if (/($delete)/ or /^$/);
      $keep{$1} = ">$_" if /^(Entry.*?\n)/i;

to this:
      next if (/^($delete)\n/);
      $keep{$1} = ">$_" if (/^(Entry[^\n]+)/i);
0
LVL 84

Expert Comment

by:ozo
Comment Utility
open DEL, "<delete.txt" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<travis.txt" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;
     
   }
}
   open OUT, ">travis.txt" or die $!;
   print OUT @keep;
0
 
LVL 28

Expert Comment

by:FishMonger
Comment Utility
ozo's right, using an array would be better than the hash I used.  And, I'm sure that his method of constructing the regex is better, but I'm not exactly sure why.
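
A small sketch of why the regex construction matters, using made-up sample names in the two formats mentioned in this thread. A plain join leaves '.' as a wildcard and does not anchor the match, so a listed name can match other entries; escaping each name (\Q...\E, or quotemeta as written here) and anchoring with ^...$ under /m restricts the match to a whole header line.

```perl
use strict;
use warnings;

my @delete = ('>Entry 1', '>ADXCAPD.1.C.2');   # sample names, for illustration only

# Plain join: unanchored, and '.' still means "any character".
my $loose = join '|', @delete;

# Escaped and anchored: each name must match one complete line.
my $strict = join '|', map { quotemeta($_) } @delete;
$strict = qr/^($strict)$/m;

for my $record (">Entry 1000\nblah\n", ">ADXCAPD.11C.2\nblah\n", ">Entry 1\nblah\n") {
    my ($header) = split /\n/, $record;
    printf "%-16s loose=%-5s strict=%s\n", $header,
        ($record =~ /($loose)/ ? 'match' : 'no'),
        ($record =~ $strict    ? 'match' : 'no');
}
```

With these samples the loose pattern matches all three records, while the strict one matches only ">Entry 1".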
0
 

Author Comment

by:travisbickle34
Comment Utility
Both scripts are working perfectly now - thanks guys!

One last question - is there a simple way to make ozo's script output the deleted sequences to a second file?

Also - how can I split the points between you two?  It's only fair I think...
0
 
LVL 6

Expert Comment

by:Talmash
Comment Utility
Hi travis, I did not forget you; we are just not working the same hours.


open(IN_FILE, "in_file") or die $!; # the file with the
@in_file_lines = <IN_FILE>;
close(IN_FILE);

open(BIG_FILE, "my_big_file") or die $!;
@big_file_lines = <BIG_FILE>;
close(BIG_FILE);

open(NEW_FILE, ">new_big_file") or die $!;

To create another file:
open(DEL_LINES, ">deleted_lines.txt") or die $!; # put this line near the "open" of the other file

$big_file_index = 0;
foreach $bad_entry (@in_file_lines) {
        next unless $bad_entry =~ /Entry\s*(\d+)/;
        $bad_line = $1;
        while ($big_file_index < $bad_line && $big_file_index < @big_file_lines) {
            print NEW_FILE $big_file_lines[$big_file_index];
            $big_file_index++;
        }
        # write the removed line to the second file instead of dropping it
        print DEL_LINES $big_file_lines[$big_file_index] if $big_file_index < @big_file_lines;
        $big_file_index++;
}
# copy whatever remains after the last removed line
print NEW_FILE @big_file_lines[$big_file_index .. $#big_file_lines];
close(NEW_FILE);
close(DEL_LINES);

tal

0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
  if( /$delete/ ){
        print SECOND_FILE;
    }else{
        push @keep,$_;
    }
0
 

Author Comment

by:travisbickle34
Comment Utility
Ozo - can you edit your alteration into this piece of script?  I seem to be making a bollocks of it somehow :(


open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
{ local $/ = ">";
   while (<IN>) {
      chomp;
      next unless length;
      s/^/>/;
      next if /$delete/;
      push @keep, $_;

   }
}
   open OUT, ">output" or die $!;
   print OUT @keep;
0
 
LVL 84

Accepted Solution

by:ozo
ozo earned 200 total points
Comment Utility
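# read the list of entry names to remove (one per line, including the '>')
open DEL, "<list" or die $!;
@delete = <DEL>;
chomp @delete;
close DEL;
# escape regex metacharacters in each name, then match only a whole header line
$delete = join('|', map"\Q$_\E",@delete);
$delete =qr/^($delete)$/m;
open IN, "<input" or die $!;
open OUT, ">output" or die $!;
open SECOND_FILE,">second.file" or die $!;
{ local $/ = ">";             # read one '>'-delimited entry at a time
   while (<IN>) {
      chomp;                  # strip the trailing '>' (the record separator)
      next unless length;     # skip the empty chunk before the first '>'
      s/^/>/;                 # put the leading '>' back
      if( /$delete/ ){
        print SECOND_FILE;    # entry is on the delete list
      }else{
        print OUT;            # entry is kept
      }
    }
}
close OUT;
close SECOND_FILE;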
0
 

Author Comment

by:travisbickle34
Comment Utility
Ok - I don't know what has happened but the above script now just seems to output all entries to both output files.  My head hurts...
0
 

Author Comment

by:travisbickle34
Comment Utility
Actually - it seems to be working fine now!

I don't know what was happening there.  By any chance do entries consisting of only a single line mess up the process somehow?

Regardless - I'm allocating points now.

Thanks guys.
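
On the single-line question, a quick check that can be run without touching the real files. This is only a sketch: split_entries is a hypothetical helper that mirrors the loop in the accepted script on an in-memory string, and the sample data assumes Unix line endings. In the sample, "Entry 2" has a header and no body.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the accepted script's loop on a string.
# Returns two array refs: kept entries and removed entries.
sub split_entries {
    my ($text, $names) = @_;
    my $alt    = join '|', map { quotemeta($_) } @$names;
    my $delete = qr/^($alt)$/m;                # whole header line must match

    my (@kept, @removed);
    open my $in, '<', \$text or die $!;        # in-memory filehandle, demo only
    local $/ = '>';
    while (my $rec = <$in>) {
        chomp $rec;                            # strip the trailing '>'
        next unless length $rec;               # skip the chunk before the first '>'
        $rec = ">$rec";
        if ($rec =~ $delete) { push @removed, $rec } else { push @kept, $rec }
    }
    return (\@kept, \@removed);
}

my $data = ">Entry 1\nblah\n>Entry 2\n>Entry 3\netc\n";   # Entry 2 has no body
my ($kept, $removed) = split_entries($data, ['>Entry 2']);
print "kept:\n", @$kept, "removed:\n", @$removed;
```

Under these assumptions the header-only entry is split and matched like any other, so single-line entries alone should not confuse the record splitting.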
0
