Solved

strip file for nicer output

Posted on 2010-11-19
24
472 Views
Last Modified: 2012-06-21
Hi,

I have a file named datadb.xml

There are lines like so in it:

<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.dns.host"/><PROPERTY NAME="ttl_option" VALUE="0"/><PROPERTY NAME="disabled" VALUE="false"/><PROPERTY NAME="zone" VALUE="._default.int.lgnrgy"/><PROPERTY NAME="name" VALUE="lvpdpr125520fw01"/><PROPERTY NAME="comment" VALUE="Firewall"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.dns.host_address"/><PROPERTY NAME="host" VALUE="._default.int.ergy.lvpdpr125520fw01"/><PROPERTY NAME="address" VALUE="170.119.66.194"/><PROPERTY NAME="reverse_zone" VALUE="._default.arpa.in-addr.170.119"/><PROPERTY NAME="configure_for_dhcp" VALUE="false"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.dns.host$._default.int.lgergy.lvpdpr125520fw01"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Firewall"/></OBJECT>


I would like to request a shell script or perl to look in this file and provide an output in csv format  like so

IP_addreess=     Hostname=     Object_Class=

So in this case:
IP_addreess= 170.119.66.194    Hostname= lvpdpr125520fw01   Object_Class= Firewall


I would be running this on windows.

Thanks
0
Comment
Question by:richsark
  • 13
  • 10
24 Comments
 
LVL 10

Expert Comment

by:jeromee
Comment Utility
Rich (?),
I don't fully understand the mapping between the 3 lines and that one output line you want.
For example, I see "Firewall" on 2 lines which one do you want to use?

Please explain.
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Hi, The line which has object_class
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
Given your small snippet, the below code will work.  However, it probably won't work on the real file.  In order to properly handle XML processing, we need to know the full hierarchical structure of the file.

To change the code to read a file, just remove the "my $file =" line, change $file in the XMLin line to be the quoted filename (eg 'filename'), and (optionally) delete __DATA__ and below.
use strict;
use warnings;
use XML::Simple;

my $file = join '', <DATA>;
my $xml = XMLin($file, ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my ($host, $class, $ip);
foreach my $obj (@{$xml->{OBJECT}}) {
    if (exists $obj->{PROPERTY}{name}) {
        $host = $obj->{PROPERTY}{name};
    } elsif (exists $obj->{PROPERTY}{value}) {
        $class = $obj->{PROPERTY}{value};
    } elsif (exists $obj->{PROPERTY}{address}) {
        $ip = $obj->{PROPERTY}{address};
    }
}
print "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
__DATA__
<xml>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.dns.host"/><PROPERTY NAME="ttl_option" VALUE="0"/><PROPERTY NAME="disabled" VALUE="false"/><PROPERTY NAME="zone" VALUE="._default.int.lgnrgy"/><PROPERTY NAME="name" VALUE="lvpdpr125520fw01"/><PROPERTY NAME="comment" VALUE="Firewall"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.dns.host_address"/><PROPERTY NAME="host" VALUE="._default.int.ergy.lvpdpr125520fw01"/><PROPERTY NAME="address" VALUE="170.119.66.194"/><PROPERTY NAME="reverse_zone" VALUE="._default.arpa.in-addr.170.119"/><PROPERTY NAME="configure_for_dhcp" VALUE="false"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.dns.host$._default.int.lgergy.lvpdpr125520fw01"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Firewall"/></OBJECT>
</xml>

Open in new window

0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
If you provide additional information about the file hierarchy, I will correct the code I provided to work with the real file.
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
HI, Can I email it to you. Its sorta sensitive
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
I don't actually need all of the data - I just need to know the hierarchy (including case of letters) which probably looks something like this:

<xml ...>
    <some_tag ...>
        <ENTRY ...>
            ... the snippet from above ...
        </ENTRY>
        <ENTRY ...>
            ... similar to above snippet ...
        </ENTRY>
    </some_tag>
</xml>
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Hi,

Here is the complete xml file on a blank database.

I could not follow your example sorry onedb.xml
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
Hmm.  The file format is very flat with no grouping intrinsic in the hierarchy.  How do I know to associate the three OBJECT entries in your snipper rather than some other OBJECT entries in the file (or fewer than these three)?  The __type entries don't match, the zone entries don't match - I can't see anything in those records to relate them...
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Hi Wilcoxon,

Sorry about that. See the new file attached please onedb.zip
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
HI,

Any luck on the complete file?
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Hi,

One other piece may make it easier


Perhaps narrow it down to values that have extensible_attributes_


<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.infoblox.dns.host$._default.int.lgeenergy.brownctfw02"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Others"/></OBJECT>
 .

0
 
LVL 1

Author Comment

by:richsark
Comment Utility
HI wilcoxon,

Just wanted to check-in to see if the complete file is going to work. I am biting at the nails to try it :)

Thank you
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
Sorry.  I was busy last night and today.  I'm looking at the revised full file now.
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Cool. Can't wait to try

Thanks agian for all your help      
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
The joining of elements ended up being a lot more complex than I expected but this should work.  I left in code to warn when it hits values the were already defined - it looks like this is mostly hosts that have multiple IP addresses defined.

Towards the end of the file is a commented out print line that will provide more details - namely the "root" __type used to join all of the elements together.
#!/usr/local/bin/perl
use strict;
use warnings FATAL => qw(all);
use XML::Simple;

my $xml = XMLin('/cygdrive/e/dl/new/onedb.xml', ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my %data;
foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host$});
    next unless (exists $obj->{PROPERTY}{name} and exists $obj->{PROPERTY}{zone});
    $type =~ s{\.dns\.host$}{};
    my $zone = $obj->{PROPERTY}{zone};
    my $host = $obj->{PROPERTY}{name};
    $data{$type}{$zone}{$host}{host} = $obj->{PROPERTY}{name};
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host_address$});
    next unless (exists $obj->{PROPERTY}{address} and exists $obj->{PROPERTY}{host});
    $type =~ s{\.dns\.host_address$}{};
    my @parts = split /\./, $obj->{PROPERTY}{host};
    my $zone = shift @parts;
    while (not exists $data{$type}{$zone} and @parts) {
        $zone .= '.' . shift @parts;
    }
    die "could not find zone in host $obj->{PROPERTY}{host}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{host};
    $host =~ s{^$zone\.}{};
    warn "found ip $obj->{PROPERTY}{address} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{ip}" if $data{$type}{$zone}{$host}{ip};
    $data{$type}{$zone}{$host}{ip} = $obj->{PROPERTY}{address};
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.one\.extensible_attributes_value$});
    next unless (exists $obj->{PROPERTY}{tag} and $obj->{PROPERTY}{tag} eq 'object_class' and exists $obj->{PROPERTY}{value} and exists $obj->{PROPERTY}{object});
    $type =~ s{\.one\.extensible_attributes_value$}{};
    my $str = $obj->{PROPERTY}{object};
    $str =~ s{^.*?\$}{};
    my @parts = split /\./, $str;
    my $zone = shift @parts;
    while (not exists $data{$type}{$zone} and @parts) {
        $zone .= '.' . shift @parts;
    }
    die "could not find zone in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{object};
    $host =~ s{^.*?\$$zone\.}{};
    die "could not find host in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone}{$host});
    warn "found class $obj->{PROPERTY}{value} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{class}" if $data{$type}{$zone}{$host}{class};
    $data{$type}{$zone}{$host}{class} = $obj->{PROPERTY}{value};
}

open CSV, '>onedb.csv' or die "could not write onedb.csv: $!";
foreach my $type (sort keys %data) {
    foreach my $zone (sort keys %{$data{$type}}) {
        foreach my $host (sort keys %{$data{$type}{$zone}}) {
            my ($ip, $host, $class) = @{$data{$type}{$zone}{$host}}{qw(ip host class)};
            $ip = '' unless defined($ip);
            $host = '' unless defined($host);
            $class = '' unless defined($class);
#            print CSV "type=$type,IP_address=$ip,Hostname=$host,Object_Class=$class\n";
            print CSV "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
        }
    }
}
close CSV;

Open in new window

0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Awesome,  did u by chance run it? How does it look.

So ones with hosts with multiple IPs are treated how?

I really appericate it. I will try it later when I get home.

Thanks
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
Yep.  I ran it.  It looks reasonable on the output (to me anyway).  Right now, it will just use the latest IP it finds since you didn't specify any special handling for multiple IPs.  Everything is grouped by the base __type, the zone (see .dns.host entries), and the host name.
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Hi,

Just ran it, it takes a few min in cygwin.

Looks good man !

Let me sleep on it till the afternoon Sunday  and get back to ya. but for now it looks great !

Thanks for all your hard work.
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Real quick,

For the ones in the csv like this (hostname= ( blank) means what?

Line2 from onedb.csv

IP_address=170.119.58.50      Hostname=      Object_Class=
0
 
LVL 1

Author Comment

by:richsark
Comment Utility
Hi !

Since were on this topic, I found a few more that I dont seem to understand why/logic

Rows 17161 to 17164 in the onedb.csv

17161 IP_address=      Hostname=      Object_Class=
17162 IP_address=      Hostname=boileropt4      Object_Class=
17163 IP_address=      Hostname=boileropt5      Object_Class=
17164 IP_address=      Hostname=boileropt6      Object_Class=
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
That would mean one of:
1) the __type of *.dns.host_address did not have any corresponding *.dns.host nor *.one.extensible_attributes_value entries for the zone and host
2) the corresponding *.dns.host entry had name of "" and no *.one.extensible_attributes_value entry

I looked in the onedb.xml and found that in this case it is #2
0
 
LVL 26

Expert Comment

by:wilcoxon
Comment Utility
I'm not sure what line 17161 would be about - best guess is that there is a *.dns.host entry with name = "" and neither of the other two entries.

17162 would be *.dns.host entry but no *.dns.host_address or *.one.extensible_attributes_value entries.

Nope.  It's a bug - the problem is that ._default.int.lgeenergy and ._default.int.lgeenergy.controls are both valid zones - I need to change the code to take the largest zone match rather than the smallest....
0
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
Comment Utility
Here's the revised code (with some additional debugging statements commented out).  I changed it to do multiple IP addresses for a given host separated by pipe '|'.
#!/usr/local/bin/perl
use strict;
use warnings FATAL => qw(all);
use XML::Simple;
use Data::Dumper qw(Dumper);

my $xml = XMLin('onedb.xml', ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my %data;
foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host$});
    next unless (exists $obj->{PROPERTY}{name} and exists $obj->{PROPERTY}{zone});
    $type =~ s{\.dns\.host$}{};
    my $zone = $obj->{PROPERTY}{zone};
    my $host = $obj->{PROPERTY}{name};
    $data{$type}{$zone}{$host}{host} = $host;
#    print "type/zone/host/host = $type/$zone/$host/$data{$type}{$zone}{$host}{host}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{^boileropt});
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host_address$});
    next unless (exists $obj->{PROPERTY}{address} and exists $obj->{PROPERTY}{host});
    $type =~ s{\.dns\.host_address$}{};
    my @parts = split /\./, $obj->{PROPERTY}{host};
    my $zone = shift @parts;
    my $match;
    while (@parts) {
        $zone .= '.' . shift @parts;
        $match = $zone if exists($data{$type}{$zone});
    }
    $zone = $match;
    die "could not find zone in host $obj->{PROPERTY}{host}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{host};
    $host =~ s{^$zone\.}{};
    if (defined $data{$type}{$zone}{$host}{ip}) {
        push @{$data{$type}{$zone}{$host}{ip}}, $obj->{PROPERTY}{address};
    } else {
        $data{$type}{$zone}{$host}{ip} = [$obj->{PROPERTY}{address}];
    }
#    print "type/zone/host/IP = $type/$zone/$host/@{$data{$type}{$zone}{$host}{ip}}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{boileropt});
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.one\.extensible_attributes_value$});
    next unless (exists $obj->{PROPERTY}{tag} and $obj->{PROPERTY}{tag} eq 'object_class' and exists $obj->{PROPERTY}{value} and exists $obj->{PROPERTY}{object});
    $type =~ s{\.one\.extensible_attributes_value$}{};
    my $str = $obj->{PROPERTY}{object};
    $str =~ s{^.*?\$}{};
    my @parts = split /\./, $str;
    my $zone = shift @parts;
    my $match;
    while (@parts) {
        $zone .= '.' . shift @parts;
        $match = $zone if exists($data{$type}{$zone});
    }
    $zone = $match;
    die "could not find zone in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{object};
    $host =~ s{^.*?\$$zone\.}{};
    die "could not find host in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone}{$host});
    warn "found class $obj->{PROPERTY}{value} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{class}" if $data{$type}{$zone}{$host}{class};
    $data{$type}{$zone}{$host}{class} = $obj->{PROPERTY}{value};
#    print "type/zone/host/class = $type/$zone/$host/$data{$type}{$zone}{$host}{class}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{boileropt});
}

open CSV, '>onedb.csv' or die "could not write onedb.csv: $!";
foreach my $type (sort keys %data) {
    foreach my $zone (sort keys %{$data{$type}}) {
        foreach my $host (sort keys %{$data{$type}{$zone}}) {
            my ($ip, $host, $class) = @{$data{$type}{$zone}{$host}}{qw(ip host class)};
            $ip = '' unless defined($ip);
            $ip = join '|', @$ip if ref($ip);
            $host = '' unless defined($host);
            $class = '' unless defined($class);
#            print CSV "type=$type,IP_address=$ip,Hostname=$host,Object_Class=$class\n";
            print CSV "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
        }
    }
}
close CSV;

Open in new window

0
 
LVL 1

Author Closing Comment

by:richsark
Comment Utility
Awesome work and very prompt      
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Suggested Solutions

There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Recently, an awarded photographer, Selina De Maeyer (http://www.selinademaeyer.com/), completed a photo shoot of a beautiful event (http://www.sintjacobantwerpen.be/verslag-en-fotoreportage-van-de-sacramentsprocessie-door-antwerpen#thumbnails) in An…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now