Further modify data in text file

Posted on 2011-02-19
Last Modified: 2012-05-11
I have a text file on which I want to perform some replace operations.

Here's an example [the data is already sorted by protocol/port/IP, with a blank line separating each group]:

      disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535            

1. For every unique protocol/port[-range] I need a line like the one below:
      application protocol_port[_range] protocol <protocol> destination-port <port><[-range]>
From example above, output would look like:
      application tcp_445 protocol tcp destination-port 445
      application tcp_1024_65535 protocol tcp destination-port 1024-65535

2. On lines with a mask, replace the <tab> between the address and the mask with an underscore [a mask can be part of the source, the destination, or both]:
      10_36_16_0      21
      10_36_32_0      21
      10_0_0_0      8
3. Next, for every unique source and destination, I need a line like the one below:
      protocol_port[_range] source [ <source> <source> ] destination [ <destination> <destination> ] application protocol_port[_range]
From example above the lines would look like:
      tcp_445      source [ host_10_13_41_100 host_10_13_41_120 ] destination host_10_14_5_252 application tcp_445      
      tcp_445      source [ 10_36_16_0_21 10_36_32_0_21 ] destination host_10_13_5_129 application tcp_445
      tcp_1024_65535 source 10_0_0_0_8 destination [ host_10_13_5_106 host_10_13_5_107 host_10_13_5_11 ] application tcp_1024_65535
Thank you for your help in advance.
Question by:dpk_wal
LVL 26

Accepted Solution

wilcoxon earned 500 total points
ID: 34935878
This should do it...

I noticed that you said in #3 that it should be unique for source/dest, so that's what I did.  What if the protocol and/or port differ for the same source/dest pair?

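#!/usr/bin/perl

use strict;
use warnings;

my (%prot, %src);
while (<DATA>) {
    unless ($. > 1 and not m{^\s*$}) {
        print;    # pass the header and blank separator lines through as-is
        next;
    }
    chomp;
    s{^\s+}{};    # drop leading whitespace so split yields no empty first field
    s{\s+$}{};    # drop trailing whitespace
#    s{(\s(?:\d+_){3}\d+)\s+(\d+\s)}{${1}_$2};
    my @arr = split /\s+/;
    if ($arr[3] =~ m{^\d+$}) {
        splice @arr, 2, 2, "$arr[2]_$arr[3]";    # join source address and mask
    }
    if ($arr[4] eq 'eq') {
        $prot{$arr[1]}{$arr[5]}{$arr[5]} = 1;    # single port: begin == end
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]";
    } else {
        $prot{$arr[1]}{$arr[5]}{$arr[6]} = 1;    # port range: begin, end
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]_$arr[6]";
    }
    print join("\t", @arr), "\n";
}

# One "application" line per unique protocol/port[-range].
print "\n";
foreach my $p (sort keys %prot) {
    foreach my $beg (sort num keys %{$prot{$p}}) {
        foreach my $end (sort num keys %{$prot{$p}{$beg}}) {
            if ($beg == $end) {
                print "application ${p}_$beg protocol $p destination-port $beg\n";
            } else {
                print "application ${p}_${beg}_$end protocol $p destination-port $beg-$end\n";
            }
        }
    }
}

# One line per destination, listing every source seen for it.
print "\n";
foreach my $d (sort keys %src) {
    my @src = (sort keys %{$src{$d}});
    print "$src{$d}{$src[0]} source [ @src ] destination $d application $src{$d}{$src[0]}\n";
}

sub num { $a <=> $b }

__DATA__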
 disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535


LVL 32

Assisted Solution

dpk_wal earned 0 total points
ID: 34937298
Good point. There are a few lines where I have more than one protocol, e.g.:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_32_0      21      host_10_13_5_129      eq            445                  

or more than one port:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            446                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            446                  

As I stated in my original post, the groups are separated by a blank line, if that helps.

I will try the code tomorrow and report back on the results.

Thank you.
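
One way to handle mixed protocols or ports under the same destination is to key the grouping on the application name as well as the destination, so that tcp_445, udp_445 and tcp_446 rows never merge into one line. The following is only a minimal sketch, assuming each mask has already been joined to its address. The helper names app_name and group_rules are hypothetical and not part of the accepted script, and unlike the requested format it always prints the brackets, even around a single source.

```perl
use strict;
use warnings;

# Build the application name from the protocol, operator and port fields.
sub app_name {
    my ($proto, $op, @ports) = @_;
    return $op eq 'eq' ? "${proto}_$ports[0]" : "${proto}_$ports[0]_$ports[1]";
}

# Group rows by application and destination, keeping first-seen order.
# Each row is [protocol, source, destination, operator, port, (end port)].
sub group_rules {
    my @rows = @_;
    my (%sources, @order);
    for my $row (@rows) {
        my ($proto, $src, $dst, $op, @ports) = @$row;
        my $key = join ' ', app_name($proto, $op, @ports), $dst;
        push @order, $key unless $sources{$key};
        push @{ $sources{$key} }, $src;
    }
    return map {
        my ($app, $dst) = split ' ', $_;
        "$app source [ @{ $sources{$_} } ] destination $dst application $app";
    } @order;
}

print "$_\n" for group_rules(
    [qw(tcp 10_36_16_0_21 host_10_13_5_129 eq 445)],
    [qw(tcp 10_36_32_0_21 host_10_13_5_129 eq 445)],
    [qw(udp 10_40_16_0_21 host_10_13_5_129 eq 445)],
);
```

With the three sample rows, the two tcp sources share one line and the udp source gets its own.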
LVL 26

Expert Comment

ID: 34937880
The rewritten lines from my script won't be as pretty as the original (the columns are just separated by tabs).  Will lines with a mask always be in quad format (e.g. xxx_xxx_xxx_xxx  yyy) and never named (host_10_13_41_100)?  If so, you can uncomment the regex at line 15, add print "$_\n"; as the next line, and remove the print at line 27. The output will then keep the same form as the input, except that the mask is merged onto the quad.

On rereading, though, I don't see that you asked for the lines in #2 to be printed out.  If you don't need them, just remove the print at line 27 (and they won't be printed).
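
The question says a mask can follow the source, the destination, or both, while the script above only checks the field after the source. A regex with the /g flag can cover both positions in one pass. This is a rough sketch rather than a drop-in replacement: merge_masks is a hypothetical name, and it assumes a mask is a bare number of one or two digits directly after a quad written with underscores.

```perl
use strict;
use warnings;

# Join every "a_b_c_d <whitespace> mask" pair with an underscore.
# Assumption: a mask is a bare 1-2 digit number right after a quad.
# Named hosts such as host_10_13_41_100 are left alone, because the
# quad must start at a word boundary and "_" counts as a word character.
sub merge_masks {
    my ($line) = @_;
    $line =~ s{\b((?:\d+_){3}\d+)\s+(\d{1,2})(?=\s)}{${1}_$2}g;
    return $line;
}

print merge_masks("permit tcp 10_36_16_0 21 10_40_0_0 16 eq 445\n");
```

Because only the address and mask are rewritten, the rest of the line keeps its original spacing.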

LVL 32

Author Comment

ID: 34940395
I looked at the solution again. The data is appended at the bottom of the script; can we read it from a file instead? I have 20k lines in the text file.
LVL 26

Expert Comment

ID: 34940527
Sure.  Just change line 7 ("while (<DATA>) {") to:

my $fil = shift or die "Usage: $0 input_file\n";
open my $in, '<', $fil or die "could not open $fil: $!";
while (<$in>) {

Now the script takes the input file name as a command line argument.
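
Since the groups in the input are separated by blank lines, another option when reading from a file is Perl's paragraph mode, which returns one whole group per read. The sketch below is illustrative only: read_blocks is a hypothetical helper, the demo reads from an in-memory string instead of a real file, and it assumes the separator lines are truly empty (a line holding only spaces or tabs would not end a block).

```perl
use strict;
use warnings;

# Return one array ref of lines per blank-line-separated block.
sub read_blocks {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one or more empty lines end a record
    my @blocks;
    while (my $block = <$fh>) {
        my @lines = grep { /\S/ } split /\n/, $block;
        push @blocks, \@lines if @lines;
    }
    return @blocks;
}

# Demo on an in-memory file; a real run would open the input file instead.
my $text = "permit tcp a x eq 445\npermit tcp b x eq 445\n\npermit tcp c y eq 445\n";
open my $fh, '<', \$text or die "could not open in-memory file: $!";
my @blocks = read_blocks($fh);
printf "%d blocks, first has %d lines\n", scalar @blocks, scalar @{ $blocks[0] };
```

Each block can then be processed on its own, which keeps sources and destinations from different groups apart.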
LVL 32

Author Closing Comment

ID: 34986375
Not working exactly as I expected; but thank you for your effort.
