Further modify data in a text file

I have a text file on which I want to perform some replace operations.

Here's an example [I have sorted the data by protocol/port/IP, with a blank line separating each sorted group]:

      disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535            


1. For every unique protocol/port[-range], I need a line like this:
      application protocol_port[_range] protocol <protocol> destination-port <port><[-range]>
      
From example above, output would look like:
      application tcp_445 protocol tcp destination-port 445
      application tcp_1024_65535 protocol tcp destination-port 1024-65535

2. In lines that have a mask, replace the <tab> between the address and the mask with an underscore [the mask can be part of the source, the destination, or both]:
      10_36_16_0      21
      10_36_32_0      21
      10_0_0_0      8
      
To:
      10_36_16_0_21
      10_36_32_0_21
      10_0_0_0_8
      
3. Next, for every unique source and destination, I need a line like this:
      protocol_port[_range] source [ <source> <source> ] destination [ <destination> <destination> ] application protocol_port[_range]
      
From example above the lines would look like:
      tcp_445      source [ host_10_13_41_100 host_10_13_41_120 ] destination host_10_14_5_252 application tcp_445      
      tcp_445      source [ 10_36_16_0_21 10_36_32_0_21 ] destination host_10_13_5_129 application tcp_445
      tcp_1024_65535 source 10_0_0_0_8 destination [ host_10_13_5_106 host_10_13_5_107 host_10_13_5_11 ] application tcp_1024_65535
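
For step 1, the application lines could be built roughly as follows. This is only a minimal sketch: the helper name application_line and the sample @rules list are hypothetical, and it assumes each rule carries one port for eq and two ports for range.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "application" line for one rule.
# $proto is the protocol; @ports holds one port (eq) or two (range).
sub application_line {
    my ($proto, @ports) = @_;
    my $name = join '_', $proto, @ports;    # tcp_445 or tcp_1024_65535
    my $dest = join '-', @ports;            # 445 or 1024-65535
    return "application $name protocol $proto destination-port $dest";
}

# Sample rules (assumed data): protocol followed by its port or port range.
my @rules = (['tcp', 445], ['tcp', 445], ['tcp', 1024, 65535]);

my %seen;
for my $rule (@rules) {
    my $line = application_line(@$rule);
    print "$line\n" unless $seen{$line}++;  # print each unique line only once
}
```

The %seen hash is what keeps the output unique per protocol/port combination, no matter how many input lines share it.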
      
Thank you for your help in advance.
Asked by: dpk_wal
 
wilcoxon commented:
This should do it...

I noticed that you said in #3 that it should be unique for source/dest, so that's what I did.  What if the protocol and/or port differ for the same source/dest pair?
#!/usr/local/bin/perl

use strict;
use warnings;

my (%prot, %src);
while (<DATA>) {
    chomp;
    s{^\s+}{};
    s{\s+$}{};
    unless ($. > 1 and not m{^\s*$}) {
        print "$_\n";
        next;
    }
#    s{(\s(?:\d+_){3}\d+)\s+(\d+)(?=\s)}{${1}_$2}g;
    my @arr = split /\s+/;
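    for my $i (3, 4) {    # a purely numeric field after the source (3) or the destination (4) is a mask
        splice @arr, $i - 1, 2, "$arr[$i-1]_$arr[$i]" if $arr[$i] =~ m{^\d+$};
    }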
    if ($arr[4] eq 'eq') {
        $prot{$arr[1]}{$arr[5]}{$arr[5]}++;
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]";
    } else {
        $prot{$arr[1]}{$arr[5]}{$arr[6]}++;
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]_$arr[6]";
    }
    print join("\t", @arr), "\n";
}

print "\n";
foreach my $p (sort keys %prot) {
    foreach my $beg (sort num keys %{$prot{$p}}) {
        foreach my $end (sort num keys %{$prot{$p}{$beg}}) {
            if ($beg == $end) {
                print "application ${p}_$beg protocol $p destination-port $beg\n";
            } else {
                print "application ${p}_${beg}_$end protocol $p destination-port $beg-$end\n";
            }
        }
    }
}

print "\n";
foreach my $d (sort keys %src) {
    my @src = (sort keys %{$src{$d}});
    my $s = @src > 1 ? "[ @src ]" : $src[0];    # bracket the list only when there are several sources
    print "$src{$d}{$src[0]} source $s destination $d application $src{$d}{$src[0]}\n";
}

sub num { $a <=> $b }

__DATA__
 disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535

 
dpk_wal (author) commented:
Good point; there are a few lines where I have more than one protocol, e.g.:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_32_0      21      host_10_13_5_129      eq            445                  

or more than one port:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            446                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            446                  

As I stated in my original post, the groups are separated by a blank line, if that helps.

I will try the code tomorrow and report the results.

Thank you.
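
The blank-line separation mentioned above could be used to keep each group together, so that rules with a different protocol or port inside one group end up on separate output lines. The following is a minimal sketch, not the script from this thread: the helper names fmt_list and policy_lines are hypothetical, masks are assumed to be already joined to their addresses, the header is assumed to start with "disposition", and every source in a group is assumed to pair with every destination that shares the same protocol/port.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap several values in "[ ... ]", leave a single one bare.
sub fmt_list {
    my @v = @_;
    return @v > 1 ? "[ @v ]" : $v[0];
}

# Hypothetical helper: turn blank-line-separated groups into output lines.
sub policy_lines {
    my ($text) = @_;
    my @out;
    for my $group (split /\n\s*\n/, $text) {
        my (%src, %dst, %seen, @order);
        for my $line (split /\n/, $group) {
            my @f = split ' ', $line;
            next if !@f or $f[0] eq 'disposition';     # skip empty and header lines
            my ($proto, $s, $d, $op, @ports) = @f[1 .. $#f];
            my $app = join '_', $proto, @ports;        # e.g. tcp_445, tcp_1024_65535
            push @order, $app unless $seen{$app}++;    # remember first-seen order
            $src{$app}{$s} = 1;
            $dst{$app}{$d} = 1;
        }
        for my $app (@order) {
            push @out, join ' ', $app,
                'source',      fmt_list(sort keys %{ $src{$app} }),
                'destination', fmt_list(sort keys %{ $dst{$app} }),
                'application', $app;
        }
    }
    return @out;
}

# Sample input (assumed data) with two groups separated by a blank line.
my $sample = join "\n",
    'disposition protocol source destination operator port-range',
    'permit tcp host_10_13_41_100 host_10_14_5_252 eq 445',
    'permit tcp host_10_13_41_120 host_10_14_5_252 eq 445',
    '',
    'permit tcp 10_0_0_0_8 host_10_13_5_106 range 1024 65535',
    'permit tcp 10_0_0_0_8 host_10_13_5_107 range 1024 65535',
    '';
print "$_\n" for policy_lines($sample);
```

Because the hashes are keyed on protocol and port within each group, a group that mixes tcp and udp, or ports 445 and 446, yields one line per combination.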
 
wilcoxon commented:
The new output from my script won't be as pretty as the original (just columns separated by tabs).  Will lines with a mask always be in quad format (e.g. xxx_xxx_xxx_xxx  yyy) and not named (host_10_13_41_100)?  If so, you can uncomment the regex at line 15, add print "$_\n" as the next line, and remove the print at line 27 - then the output will keep the same form as the input except for the merging of the mask onto the quad.

Although, rereading, I don't see that you say you want the lines in #2 printed out.  If that's the case just remove the print at line 27 (and they won't be printed).
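
A layout-preserving merge of the mask onto the quad could look roughly like this. It is a minimal sketch under the assumption stated above (masked addresses are always four numeric groups joined by underscores, followed by whitespace and a numeric mask); the helper name join_masks is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: join each quad to the mask that follows it,
# leaving all other spacing on the line untouched.
sub join_masks {
    my ($line) = @_;
    # (?<!\w) keeps the match out of named hosts such as host_10_13_5_129;
    # (?!\S) requires the mask to be a whole field, not a prefix of one.
    $line =~ s{(?<!\w)((?:\d+_){3}\d+)[ \t]+(\d+)(?!\S)}{${1}_$2}g;
    return $line;
}

# Sample line (assumed data) with a masked source.
print join_masks("permit\ttcp\t10_36_16_0\t21\thost_10_13_5_129\teq\t445"), "\n";
```

The /g flag lets one pass handle a mask on the source, on the destination, or on both.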
 
dpk_wal (author) commented:
I looked at the solution again. The data is appended at the bottom of the script; can we read it from a file instead? I have 20k lines in the text file.
 
wilcoxon commented:
Sure.  Just change line 7 ("while (<DATA>) {") to:

my $fil = shift or die "Usage: $0 input_file\n";
open my $in, '<', $fil or die "could not open $fil: $!";
while (<$in>) {

Now the script takes the input file name as a command line argument.
 
dpk_wal (author) commented:
It is not working exactly as I expected, but thank you for your effort.
Question has a verified solution.