Further modify data in a text file

Posted on 2011-02-19
Last Modified: 2012-05-11
I have a text file on which I want to perform some replace operations.

Here's an example [the data is sorted by protocol/port/IP, with a blank line separating each group]:

      disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535            

1. For every unique protocol/port[-range], I need a line like this:
      application protocol_port[_range] protocol <protocol> destination-port <port><[-range]>
From the example above, the output would look like:
      application tcp_445 protocol tcp destination-port 445
      application tcp_1024_65535 protocol tcp destination-port 1024-65535

2. On lines that have a mask, replace the <tab> between the address and the mask with an underscore, e.g. 10_36_16_0_21 [a mask can be part of the source, the destination, or both]:
      10_36_16_0      21
      10_36_32_0      21
      10_0_0_0      8
3. Next, for every unique source and destination, I need a line like this:
      protocol_port[_range] source [ <source> <source> ] destination [ <destination> <destination> ] application protocol_port[_range]
From the example above, the lines would look like:
      tcp_445      source [ host_10_13_41_100 host_10_13_41_120 ] destination host_10_14_5_252 application tcp_445      
      tcp_445      source [ 10_36_16_0_21 10_36_32_0_21 ] destination host_10_13_5_129 application tcp_445
      tcp_1024_65535 source 10_0_0_0_8 destination [ host_10_13_5_106 host_10_13_5_107 host_10_13_5_11 ] application tcp_1024_65535
Thank you for your help in advance.
Question by: dpk_wal

Accepted Solution

wilcoxon earned 500 total points
ID: 34935878
This should do it...

I noticed that you said in #3 that it should be unique for source/dest, so that's what I did.  What should happen if the protocol and/or port differ for the same source/dest pair?

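use strict;
use warnings;

# %prot: protocol -> first port -> last port (the same for a single port)
# %src:  destination -> source -> application name (protocol_port[_range])
my (%prot, %src);

sub num { $a <=> $b }    # numeric sort helper

while (<DATA>) {
    chomp;
    # Echo the header line and the blank separator lines unchanged.
    unless ($. > 1 and not m{^\s*$}) {
        print "$_\n";
        next;
    }
#    s{(\s(?:\d+_){3}\d+)\s+(\d+\s)}{${1}_$2};
    my @arr = split ' ';    # split ' ' ignores leading whitespace
    # Merge a numeric mask column into the source (10_36_16_0 21 -> 10_36_16_0_21).
    if ($arr[3] =~ m{^\d+$}) {
        splice @arr, 2, 2, "$arr[2]_$arr[3]";
    }
    # The destination may carry a mask as well; merge it the same way.
    if ($arr[4] =~ m{^\d+$}) {
        splice @arr, 3, 2, "$arr[3]_$arr[4]";
    }
    if ($arr[4] eq 'eq') {
        $prot{$arr[1]}{$arr[5]}{$arr[5]} = 1;
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]";
    } else {
        $prot{$arr[1]}{$arr[5]}{$arr[6]} = 1;
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]_$arr[6]";
    }
    print join("\t", @arr), "\n";
}

# One "application" line per unique protocol/port[-range].
print "\n";
foreach my $p (sort keys %prot) {
    foreach my $beg (sort num keys %{$prot{$p}}) {
        foreach my $end (sort num keys %{$prot{$p}{$beg}}) {
            if ($beg == $end) {
                print "application ${p}_$beg protocol $p destination-port $beg\n";
            } else {
                print "application ${p}_${beg}_$end protocol $p destination-port $beg-$end\n";
            }
        }
    }
}

# One line per destination, listing all of its sources.
print "\n";
foreach my $d (sort keys %src) {
    my @src = (sort keys %{$src{$d}});
    print "$src{$d}{$src[0]} source [ @src ] destination $d application $src{$d}{$src[0]}\n";
}

__DATA__
disposition      protocol      source                  destination            operator      port-range
permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445
permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445

permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445
permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445
permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445
permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445

permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535
permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535
permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535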


Assisted Solution

dpk_wal earned 0 total points
ID: 34937298
Good point. There are a few lines where I have more than one protocol, e.g.:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_32_0      21      host_10_13_5_129      eq            445                  

or more than one port:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            446                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            446                  

As I stated in my original post, the groups are separated by a blank line, if that helps.

I will try the code tomorrow and report the results.

Thank you.
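
One way to keep rules with different protocols or ports apart is to key the grouping on the application name as well as on the destination. What follows is only a minimal sketch, not the accepted script: group_rules and its input layout (one array ref per rule, holding protocol, source, destination, operator and port or port range, with masks already merged into the addresses) are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group sources by application name and destination,
# so that tcp_445 and udp_445 (or tcp_446) rules never share an output line.
sub group_rules {
    my @rules = @_;
    my %by_app;    # application -> destination -> source -> 1
    for my $r (@rules) {
        my ($proto, $src, $dst, $op, @ports) = @$r;
        my $app = join '_', $proto, @ports;    # tcp_445 or tcp_1024_65535
        $by_app{$app}{$dst}{$src} = 1;
    }
    my @out;
    for my $app (sort keys %by_app) {
        for my $dst (sort keys %{ $by_app{$app} }) {
            my @src = sort keys %{ $by_app{$app}{$dst} };
            # Use brackets only when there is more than one source.
            my $list = @src > 1 ? "[ @src ]" : $src[0];
            push @out, "$app source $list destination $dst application $app";
        }
    }
    return @out;
}

print "$_\n" for group_rules(
    [qw(tcp 10_36_16_0_21 host_10_13_5_129 eq 445)],
    [qw(tcp 10_36_32_0_21 host_10_13_5_129 eq 445)],
    [qw(udp 10_40_16_0_21 host_10_13_5_129 eq 445)],
    [qw(udp 10_40_32_0_21 host_10_13_5_129 eq 445)],
);
```

This sketch groups sources under each destination only. Collapsing several destinations that share one source into a single line, as in the third expected line of the question, would need a second pass.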

Expert Comment

ID: 34937880
The new output from my script won't be as pretty as the original (the columns are just separated with tabs).  Will lines with a mask always be in quad format (e.g. xxx_xxx_xxx_xxx  yyy) and never named (host_10_13_41_100)?  If so, you can uncomment the regex at line 15 (the commented-out s{...} substitution), add print "$_\n" as the next line, and remove the print at line 27 (the print join("\t", @arr) statement) - then the output will keep the same form as the input except for the merging of the mask onto the quad.

Although, rereading, I don't see that you say you want the lines in #2 printed out.  If that's the case just remove the print at line 27 (and they won't be printed).
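
The substitution described above can be tried on its own. This is a small sketch under the same assumption raised in the comment, namely that a mask only ever follows an address in numeric quad format. merge_mask is a hypothetical helper name, and the pattern is a slightly stricter variant of the commented-out one rather than a copy of it.

```perl
use strict;
use warnings;

# Hypothetical helper: join a quad-style address (10_36_16_0) and the numeric
# mask column that follows it with an underscore, leaving the rest of the
# line untouched.
sub merge_mask {
    my ($line) = @_;
    # (?<!\w) keeps the pattern from matching inside host_10_13_5_129;
    # /g covers a mask on the source, on the destination, or on both.
    $line =~ s{(?<!\w)((?:\d+_){3}\d+)[ \t]+(\d+)(?=\s)}{${1}_$2}g;
    return $line;
}

my $line = "permit\ttcp\t10_36_16_0\t21\thost_10_13_5_129\teq\t445\n";
print merge_mask($line);
```

Because only the address and mask are rewritten, the original column spacing of every other field is preserved.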

Author Comment

ID: 34940395
I looked at the solution again. The data is appended at the bottom of the script; can we read it from a file instead? I have 20k lines in the text file.

Expert Comment

ID: 34940527
Sure.  Just change line 7 ("while (<DATA>) {") to:

my $fil = shift or die "Usage: $0 input_file\n";
open my $in, '<', $fil or die "could not open $fil: $!";
while (<$in>) {

Now the script takes the input file name as a command line argument.

Author Closing Comment

ID: 34986375
Not working exactly as I expected; but thank you for your effort.
