further modify data in text file

I have a text file with which I want to perform some replace operations.

Here's the example [have sort data based on protocol/port/IP basis with a new line character seperating two sorts]:

      disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535            

1. For every unique protocol/port[-range] need a line as below:
      application protocol_port[_range] protocol <protocol> destination-port <port><[-range]>
From example above, output would look like:
      application tcp_445 protocol tcp destination-port 445
      application tcp_1024_65535 protocol tcp destination-port 1024-65535

2. Replace lines with mask with underscore from <tab> [these can be part of either source or destination or be present in both]:
      10_36_16_0      21
      10_36_32_0      21
      10_0_0_0      8
3. Next for every unique source and destination:
      protocol_port[_range] source [ <source> <source> ] destination [ <destination> <destination> ] application protocol_port[_range]
From example above the lines would look like:
      tcp_445      source [ host_10_13_41_100 host_10_13_41_120 ] destination host_10_14_5_252 application tcp_445      
      tcp_445      source [ 10_36_16_0_21 10_36_32_0_21 ] destination host_10_13_5_129 application tcp_445
      tcp_1024_65535 source 10_0_0_0_8 destination [ host_10_13_5_106 host_10_13_5_107 host_10_13_5_11 ] application tcp_1024_65535
Thank you for your help in advance.
LVL 32
Who is Participating?
wilcoxonConnect With a Mentor Commented:
This should do it...

I noticed that you said in #3 that it should be unique for source/dest so that's what I did.  What about if the protocol and/or port differ for the same source/dest pair?

use strict;
use warnings;

my (%prot, %src);
while (<DATA>) {
    unless ($. > 1 and not m{^\s*$}) {
        print "$_\n";
#    s{(\s(?:\d+_){3}\d+)\s+(\d+\s)}{$1$2};
    my @arr = split /\s+/;
    if ($arr[3] =~ m{^\d+$}) {
        splice @arr, 2, 2, "$arr[2]_$arr[3]";
    if ($arr[4] eq 'eq') {
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]";
    } else {
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]_$arr[6]";
    print join("\t", @arr), "\n";

print "\n";
foreach my $p (sort keys %prot) {
    foreach my $beg (sort num keys %{$prot{$p}}) {
        foreach my $end (sort num keys %{$prot{$p}{$beg}}) {
            if ($beg == $end) {
                print "application ${p}_$beg protocol $p destination-port $beg\n";
            } else {
                print "application ${p}_${beg}_$end protocol $p destination-port $beg-$end\n";

print "\n";
foreach my $d (sort keys %src) {
    my @src = (sort keys %{$src{$d}});
    print "$src{$d}{$src[0]} source[@src] destination $d application $src{$d}{$src[0]}\n";

sub num { $a <=> $b }

 disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535

Open in new window

dpk_walConnect With a Mentor Author Commented:
Good point raised by you; there are few lines where I have more than one protocol, eg,
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_32_0      21      host_10_13_5_129      eq            445                  

or more than one port:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            446                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            446                  

As I stated in my original post; the lines are separated by new line character if that helps anything.

I would try the code tomorrow and update on results.

Thank you.
The new output from my script won't be as pretty as the original (just separated columns with tabs).  Will lines with a mask always be quad format (eg xxx_xxx_xxx_xxx  yyy) and not named (host_10_13_41_100)?  If so, you can uncomment the regex at line 15, add print "$_\n" as the next line, and remove the print at line 27 - then the output will maintain the same form as the input except for the merging of mask onto the quad.

Although, rereading, I don't see that you say you want the lines in #2 printed out.  If that's the case just remove the print at line 27 (and they won't be printed).
Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

dpk_walAuthor Commented:
Looked at solution again; we are appending the data at the bottom of the script; can we read the data from a file instead; I have 20k lines in the text file.
Sure.  Just change line 7 ("while (<DATA>) {") to:

my $fil = shift or die "Usage: $0 input_file\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {

Now the script takes the input file name as a command line argument.
dpk_walAuthor Commented:
Not working exactly as I expected; but thank you for your effort.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.