
extract and modify data from text file

Posted on 2011-02-21
Last Modified: 2012-05-11
Hi,

I have a large text file (about 20k lines) with entries like the following:
# disposition      protocol      source                   destination            operator      port-range ###header for explanation; does not exist in original file
      permit      tcp                 10_12_10_0      23      host_10_14_0_181      range      1525      1527            
      permit      tcp      10_12_10_0      23      host_10_14_4_16            range      1525      1527            
      permit      tcp      10_12_10_0      23      host_10_14_5_217      range      1525      1527            
      permit      tcp      10_12_10_0      23      host_10_14_5_218      range      1525      1527            

      permit      tcp      10_119_160_0      24      host_10_14_0_157      eq      1526                  
      permit      tcp      10_97_163_0      24      host_10_14_0_157      eq      1526                  
      permit      tcp      10_24_18_0      24      host_10_14_0_157      eq      1526                  

      permit      tcp      host_10_14_1_40            host_10_13_5_44      range      1531      1550
      permit      tcp      host_10_14_1_50            host_10_13_5_44      range      1531      1550
      permit      tcp      10_14_42_0      24      host_10_13_5_44      range      1531      1550
      permit      tcp      host_10_14_1_40            host_10_13_5_46      range      1531      1550
      permit      tcp      host_10_14_1_50            host_10_13_5_46      range      1531      1550
      permit      tcp      10_14_42_0      24      host_10_13_5_46      range      1531      1550

I want to format the data as below:
source <unique_source> destination <unique_destination> application <protocol>_<port>[_<end-port>]

All IP subnets start with 10_ and are followed by a two-digit mask; each should be listed as subnet_mask in the final output. For example, 10_12_10_0      23 in the text above should be listed as 10_12_10_0_23.
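The subnet rule can be expressed as one global substitution per line. This is a minimal sketch, assuming every subnet token has four numeric octets starting with 10_ and that its mask (one or two digits, as clarified later in the thread) is separated from it by spaces or tabs; `join_subnet_masks` is a hypothetical name. The negative lookbehind keeps it from touching the 10_ inside host_10_... tokens, and `/g` joins a subnet source and a subnet destination in the same pass.

```perl
use strict;
use warnings;

# Join "10_a_b_c<spaces/tabs><mask>" into "10_a_b_c_<mask>".
# Assumption: a subnet is 10_ plus three more numeric octets, not preceded
# by a word character (so host_10_... is left alone), with a 1-2 digit mask.
sub join_subnet_masks {
    my ($line) = @_;
    $line =~ s/(?<!\w)(10(?:_\d+){3})[ \t]+(\d{1,2})(?=[ \t]|$)/${1}_$2/g;
    return $line;
}

print join_subnet_masks("  permit  tcp  10_12_10_0  23  10_0_0_0  8  range 1525 1527"), "\n";
# prints:   permit  tcp  10_12_10_0_23  10_0_0_0_8  range 1525 1527
```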

Groups of lines are separated by a blank line, as shown above. Within each group we want the unique host or subnet addresses; if a source or destination has more than one, they should be enclosed in [ ] square brackets.
The same address may appear in two different groups, but those groups should not be merged.
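The per-group formatting can be sketched as follows, assuming the addresses of one group have already been collected into a list; `uniq_in_order` and `format_set` are hypothetical helper names. Keeping the first occurrence of each address preserves input order, which is the ordering used in the desired output, and calling the helper once per group is what keeps a repeated address in two different groups from being merged.

```perl
use strict;
use warnings;

# Keep the first occurrence of each item, in input order.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# A single address prints bare; several are wrapped as "[ a b c ]".
sub format_set {
    my @items = uniq_in_order(@_);
    return @items == 1 ? $items[0] : '[ ' . join(' ', @items) . ' ]';
}

print format_set(qw(10_119_160_0_24 10_97_163_0_24 10_24_18_0_24 10_97_163_0_24)), "\n";
# prints: [ 10_119_160_0_24 10_97_163_0_24 10_24_18_0_24 ]
```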

All lines within a group share the same port or port range, but a group may contain both tcp and udp, for example:
      permit      tcp      10_97_163_0      24      host_10_14_0_157      eq      1526                  
      permit      udp      10_97_163_0      24      host_10_14_0_157      eq      1526                  
In the above case, the application should be reported as tcp_udp_<port>[_<end-port>]. If this is hard to code, I can remove such lines so that every line in a group has the same protocol and port.
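A sketch of building the application field for one group, assuming (as stated above) that every line in a group shares one port or port range, so only the protocol names need merging; `application_string` is a hypothetical name. Sorting the distinct protocols gives a stable name such as tcp_udp regardless of which line came first.

```perl
use strict;
use warnings;

# Build "<proto>[_<proto>...]_<port>[_<end-port>]" for one group.
# $protocols is an array ref of the protocol field from each line.
sub application_string {
    my ($protocols, $port, $end_port) = @_;
    my %seen;
    my @protos = grep { !$seen{$_}++ } @$protocols;   # distinct protocols
    @protos = sort @protos;                           # tcp before udp
    my $app = join('_', @protos, $port);
    $app .= "_$end_port" if defined $end_port;        # only for "range" lines
    return $app;
}

print application_string([qw(tcp udp tcp)], 1526), "\n";   # tcp_udp_1526
print application_string(['tcp'], 1525, 1527), "\n";       # tcp_1525_1527
```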

For the sample text above, the output should be:

source 10_12_10_0_23 destination [ host_10_14_0_181 host_10_14_4_16 host_10_14_5_217 host_10_14_5_218 ] application tcp_1525_1527

source [ 10_119_160_0_24 10_97_163_0_24 10_24_18_0_24 ] destination host_10_14_0_157 application tcp_1526

source [ host_10_14_1_40 host_10_14_1_50 10_14_42_0_24 ] destination [ host_10_13_5_44 host_10_13_5_46 ] application tcp_1531_1550

Sorry for the long question.
Question by:dpk_wal
LVL 16

Expert Comment

by:sjklein42
ID: 34945660
How close did I come?

sub FlushIt
{
	my @sources = sort(keys(%sources));
	my $sourceCount = @sources;
	my $sourceString = ($sourceCount > 1 ) ? ( "[ " . join(" ",@sources) . " ]" ) : $sources[0];

	my @destinations = sort(keys(%destinations));
	my $destinationCount = @destinations;
	my $destinationString = ($destinationCount > 1 ) ? ( "[ " . join(" ",@destinations) . " ]" ) : $destinations[0];

	my @protocols = sort(keys(%protocols));
	my $protocolCount = @protocols;
	my $protocolString = join("_",@protocols) . "_" . $minPort . (( $maxPort ne '' ) ? ("_" . $maxPort) : '');

	print "source $sourceString destination $destinationString application $protocolString\n";

	undef %sources;
	undef %destinations;
	undef %protocols;
}


while ( <> )
{
	s/[\r\n]//g;

	if ( $_ eq '' ) { FlushIt(); }
	else
	{
		# All IP subnets starting with 10_ and followed by two digit mask;
		# should get listed as subnet_mask in the final output.
		# Eg, in text above, 10_12_10_0      23 should get listed as 10_12_10_0_23

		s/(\s+10\_[0-9\_]+)\s+([0-9][0-9])(\s+)/$1\_$2$3/;

		#       permit      tcp      host_10_14_1_50            host_10_13_5_46      range      1531      1550

		s/^\s+//;		# trim leading spaces
		s/\s+$//;		# trim trailing spaces

		($disposition, $protocol, $source, $destination, $operator, $minPort, $maxPort) = split(/\s+/);
		##print STDERR join("\n", $disposition, $protocol, $source, $destination, $operator, $minPort, $maxPort) . "\n\n";

		$sources{$source} = 1;
		$destinations{$destination} = 1;
		$protocols{$protocol} = 1;
	}
}

FlushIt();


Input:
      permit      tcp                 10_12_10_0      23      host_10_14_0_181      range      1525      1527            
      permit      tcp      10_12_10_0      23      host_10_14_4_16            range      1525      1527            
      permit      tcp      10_12_10_0      23      host_10_14_5_217      range      1525      1527            
      permit      tcp      10_12_10_0      23      host_10_14_5_218      range      1525      1527            

      permit      tcp      10_119_160_0      24      host_10_14_0_157      eq      1526                  
      permit      tcp      10_97_163_0      24      host_10_14_0_157      eq      1526                  
      permit      tcp      10_24_18_0      24      host_10_14_0_157      eq      1526                  

      permit      tcp      host_10_14_1_40            host_10_13_5_44      range      1531      1550
      permit      tcp      host_10_14_1_50            host_10_13_5_44      range      1531      1550
      permit      tcp      10_14_42_0      24      host_10_13_5_44      range      1531      1550
      permit      tcp      host_10_14_1_40            host_10_13_5_46      range      1531      1550
      permit      tcp      host_10_14_1_50            host_10_13_5_46      range      1531      1550
      permit      tcp      10_14_42_0      24      host_10_13_5_46      range      1531      1550

      permit      tcp      10_97_163_0      24      host_10_14_0_157      eq      1526                  
      permit      udp      10_97_163_0      24      host_10_14_0_157      eq      1526                  


Output:

c:\temp>perl foo.pl foo.txt
source 10_12_10_0_23 destination [ host_10_14_0_181 host_10_14_4_16 host_10_14_5_217 host_10_14_5_218 ] application tcp_1525_1527
source [ 10_119_160_0_24 10_24_18_0_24 10_97_163_0_24 ] destination host_10_14_0_157 application tcp_1526
source [ 10_14_42_0_24 host_10_14_1_40 host_10_14_1_50 ] destination [ host_10_13_5_44 host_10_13_5_46 ] application tcp_1531_1550
source 10_97_163_0_24 destination host_10_14_0_157 application tcp_udp_1526

 
LVL 32

Author Comment

by:dpk_wal
ID: 34946088
Works great, thank you!
Just one problem: if the destination is a subnet, its subnet mask gets truncated.

For example, if I change the sample input lines as below:
Input:
      permit      tcp      10_12_10_0      23      10_12_10_0      23      range      1525      1527            
      permit      tcp      10_12_10_0      23      10_12_10_0      23      range      1525      1527            
      permit      tcp      10_12_10_0      23      10_12_10_0      23      range      1525      1527            
      permit      tcp      10_12_10_0      23      10_12_10_0      23      range      1525      1527            
Output:
-bash-2.05b$ perl flushit.pl subMask
source 10_12_10_0_23 destination 10_12_10_0 application tcp_range_1525

Also, in such cases the port range does not seem to be captured either.

An address starting with 10_ is always followed by a two-digit subnet mask; an address starting with host_ is a single address. Both the source and the destination can be either a host_ or a 10_ address.

Thank you.
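The truncated mask and the tcp_range_1525 output have the same cause: the first script joins only the first subnet/mask pair on a line, so with a positional split on whitespace the destination's mask becomes a field of its own, and the operator and both ports shift one place (destination 10_12_10_0, operator 23, ports "range" and 1525). A token-based parse avoids the shift. This is a sketch, assuming a 10_ address is always followed by exactly one mask token; `parse_rule` is a hypothetical name, not code from this thread.

```perl
use strict;
use warnings;

# Split on whitespace, then let a 10_ address absorb the next token as its
# mask, so the later fields stay in place whether or not addresses are subnets.
sub parse_rule {
    my ($line) = @_;
    my @tok = split ' ', $line;          # ' ' also discards leading whitespace
    my %r;
    @r{qw(disposition protocol)} = splice(@tok, 0, 2);
    for my $field (qw(source destination)) {
        my $addr = shift @tok;
        $addr .= '_' . shift(@tok) if $addr =~ /^10_/;   # subnet: append mask
        $r{$field} = $addr;
    }
    @r{qw(operator min_port max_port)} = @tok;   # max_port stays undef for "eq"
    return \%r;
}

my $r = parse_rule("  permit  tcp  10_12_10_0  23  10_12_10_0  23  range  1525  1527  ");
print join(' ', @$r{qw(source destination min_port max_port)}), "\n";
# prints: 10_12_10_0_23 10_12_10_0_23 1525 1527
```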
 
LVL 32

Author Comment

by:dpk_wal
ID: 34946145
The mask after a 10_ address can also be a single digit, but the layout is always 10_x_x_x, then spaces or tabs, then the mask, e.g. 10_0_0_0     8

Thank you for all your help and support; really appreciate it!
 
LVL 16

Accepted Solution

by:
sjklein42 earned 2000 total points
ID: 34946284
Changed so that more than one of the 10_... mask pairs can appear on a single line.

Also changed so that the mask can be one or two digits, not just two digits.

use strict;
use warnings;

# Per-group state; FlushIt() prints it and clears it at each blank line.
my (%sources, %destinations, %protocols);
my ($minPort, $maxPort);

sub FlushIt
{
	return unless %sources;		# skip empty groups (repeated or trailing blank lines)

	my @sources = sort(keys(%sources));
	my $sourceString = ( @sources > 1 ) ? ( "[ " . join(" ",@sources) . " ]" ) : $sources[0];

	my @destinations = sort(keys(%destinations));
	my $destinationString = ( @destinations > 1 ) ? ( "[ " . join(" ",@destinations) . " ]" ) : $destinations[0];

	# e.g. tcp_1526, tcp_1525_1527 or tcp_udp_1526
	my @protocols = sort(keys(%protocols));
	my $protocolString = join("_",@protocols) . "_" . $minPort . ( defined($maxPort) ? ("_" . $maxPort) : '' );

	print "source $sourceString destination $destinationString application $protocolString\n";

	%sources = ();
	%destinations = ();
	%protocols = ();
}


while ( <> )
{
	s/[\r\n]//g;

	if ( /^\s*$/ ) { FlushIt(); }		# a blank or whitespace-only line ends a group
	else
	{
		# All IP subnets start with 10_ and are followed by a one- or two-digit mask;
		# they should be listed as subnet_mask in the final output.
		# E.g. 10_12_10_0      23 becomes 10_12_10_0_23.
		# Repeat until no pair is left, so a subnet source AND destination are both joined.
		while ( s/(\s+10_[0-9_]+)\s+([0-9]+)(\s+)/${1}_$2$3/ ) {}

		#       permit      tcp      host_10_14_1_50            host_10_13_5_46      range      1531      1550

		s/^\s+//;		# trim leading whitespace
		s/\s+$//;		# trim trailing whitespace

		# Fields: disposition protocol source destination operator port [end-port]
		my ($disposition, $protocol, $source, $destination, $operator);
		($disposition, $protocol, $source, $destination, $operator, $minPort, $maxPort) = split(/\s+/);

		$sources{$source} = 1;
		$destinations{$destination} = 1;
		$protocols{$protocol} = 1;
	}
}

FlushIt();		# print the last group if the file does not end with a blank line
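Because the input is already divided into blank-line-separated groups, Perl's paragraph mode (setting `$/` to the empty string) can read one whole group per iteration instead of testing each line. This is a minimal sketch; `read_groups` is a hypothetical helper. Caveats: paragraph mode only treats truly empty lines as separators (a line holding spaces or tabs does not end a group), and it assumes the file's line endings match the platform's, so the per-line blank check in the answer above is the more forgiving choice for messy input.

```perl
use strict;
use warnings;

# Return a list of groups; each group is an array ref of its non-blank lines.
sub read_groups {
    my ($fh) = @_;
    my @groups;
    local $/ = '';                        # paragraph mode: one record per group
    while (my $para = <$fh>) {
        my @lines = grep { /\S/ } split /\n/, $para;
        push @groups, \@lines if @lines;
    }
    return @groups;
}

my $text = "permit tcp a b eq 1\npermit udp a b eq 1\n\n\npermit tcp c d eq 2\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my @groups = read_groups($fh);
printf "%d groups, first has %d lines\n", scalar(@groups), scalar(@{ $groups[0] });
# prints: 2 groups, first has 2 lines
```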


 
LVL 32

Author Closing Comment

by:dpk_wal
ID: 34948767
Worked like a charm! Many thanks! :)