Solved

further modify data in text file

Posted on 2011-02-19
6
216 Views
Last Modified: 2012-05-11
I have a text file with which I want to perform some replace operations.

Here's the example [have sort data based on protocol/port/IP basis with a new line character seperating two sorts]:

      disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535            


1. For every unique protocol/port[-range] need a line as below:
      application protocol_port[_range] protocol <protocol> destination-port <port><[-range]>
      
From example above, output would look like:
      application tcp_445 protocol tcp destination-port 445
      application tcp_1024_65535 protocol tcp destination-port 1024-65535

2. Replace lines with mask with underscore from <tab> [these can be part of either source or destination or be present in both]:
      10_36_16_0      21
      10_36_32_0      21
      10_0_0_0      8
      
To:
      10_36_16_0_21
      10_36_32_0_21
      10_0_0_0_8
      
3. Next for every unique source and destination:
      protocol_port[_range] source [ <source> <source> ] destination [ <destination> <destination> ] application protocol_port[_range]
      
From example above the lines would look like:
      tcp_445      source [ host_10_13_41_100 host_10_13_41_120 ] destination host_10_14_5_252 application tcp_445      
      tcp_445      source [ 10_36_16_0_21 10_36_32_0_21 ] destination host_10_13_5_129 application tcp_445
      tcp_1024_65535 source 10_0_0_0_8 destination [ host_10_13_5_106 host_10_13_5_107 host_10_13_5_11 ] application tcp_1024_65535
      
Thank you for your help in advance.
0
Comment
Question by:dpk_wal
  • 3
  • 3
6 Comments
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 34935878
This should do it...

I noticed that you said in #3 that it should be unique for source/dest so that's what I did.  What about if the protocol and/or port differ for the same source/dest pair?
#!/usr/local/bin/perl

use strict;
use warnings;

my (%prot, %src);
while (<DATA>) {
    chomp;
    s{^\s+}{};
    s{\s+$}{};
    unless ($. > 1 and not m{^\s*$}) {
        print "$_\n";
        next;
    }
#    s{(\s(?:\d+_){3}\d+)\s+(\d+\s)}{$1$2};
    my @arr = split /\s+/;
    if ($arr[3] =~ m{^\d+$}) {
        splice @arr, 2, 2, "$arr[2]_$arr[3]";
    }
    if ($arr[4] eq 'eq') {
        $prot{$arr[1]}{$arr[5]}{$arr[5]}++;
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]";
    } else {
        $prot{$arr[1]}{$arr[5]}{$arr[6]}++;
        $src{$arr[3]}{$arr[2]} = "$arr[1]_$arr[5]_$arr[6]";
    }
    print join("\t", @arr), "\n";
}

print "\n";
foreach my $p (sort keys %prot) {
    foreach my $beg (sort num keys %{$prot{$p}}) {
        foreach my $end (sort num keys %{$prot{$p}{$beg}}) {
            if ($beg == $end) {
                print "application ${p}_$beg protocol $p destination-port $beg\n";
            } else {
                print "application ${p}_${beg}_$end protocol $p destination-port $beg-$end\n";
            }
        }
    }
}

print "\n";
foreach my $d (sort keys %src) {
    my @src = (sort keys %{$src{$d}});
    print "$src{$d}{$src[0]} source[@src] destination $d application $src{$d}{$src[0]}\n";
}

sub num { $a <=> $b }

__DATA__
 disposition      protocol      source                  destination            operator      port-range
      permit            tcp            host_10_13_41_100      host_10_14_5_252      eq            445                  
      permit            tcp            host_10_13_41_120      host_10_14_5_252      eq            445                  

      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            445                  

      permit            tcp            10_0_0_0      8      host_10_13_5_106      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_107      range            1024      65535            
      permit            tcp            10_0_0_0      8      host_10_13_5_11            range            1024      65535

Open in new window

0
 
LVL 32

Assisted Solution

by:dpk_wal
dpk_wal earned 0 total points
ID: 34937298
Good point raised by you; there are few lines where I have more than one protocol, eg,
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_16_0      21      host_10_13_5_129      eq            445                  
      permit            udp            10_40_32_0      21      host_10_13_5_129      eq            445                  

or more than one port:
      permit            tcp            10_36_16_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_36_32_0      21      host_10_13_5_129      eq            445                  
      permit            tcp            10_40_16_0      21      host_10_13_5_129      eq            446                  
      permit            tcp            10_40_32_0      21      host_10_13_5_129      eq            446                  

As I stated in my original post; the lines are separated by new line character if that helps anything.

I would try the code tomorrow and update on results.

Thank you.
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34937880
The new output from my script won't be as pretty as the original (just separated columns with tabs).  Will lines with a mask always be quad format (eg xxx_xxx_xxx_xxx  yyy) and not named (host_10_13_41_100)?  If so, you can uncomment the regex at line 15, add print "$_\n" as the next line, and remove the print at line 27 - then the output will maintain the same form as the input except for the merging of mask onto the quad.

Although, rereading, I don't see that you say you want the lines in #2 printed out.  If that's the case just remove the print at line 27 (and they won't be printed).
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 32

Author Comment

by:dpk_wal
ID: 34940395
Looked at solution again; we are appending the data at the bottom of the script; can we read the data from a file instead; I have 20k lines in the text file.
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34940527
Sure.  Just change line 7 ("while (<DATA>) {") to:

my $fil = shift or die "Usage: $0 input_file\n";
open IN, $fil or die "could not open $fil: $!";
while (<IN>) {

Now the script takes the input file name as a command line argument.
0
 
LVL 32

Author Closing Comment

by:dpk_wal
ID: 34986375
Not working exactly as I expected; but thank you for your effort.
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Recently, an awarded photographer, Selina De Maeyer (http://www.selinademaeyer.com/), completed a photo shoot of a beautiful event (http://www.sintjacobantwerpen.be/verslag-en-fotoreportage-van-de-sacramentsprocessie-door-antwerpen#thumbnails) in An…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now