• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 501
  • Last Modified:

strip file for nicer output

Hi,

I have a file named datadb.xml

There are lines like so in it:

<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.dns.host"/><PROPERTY NAME="ttl_option" VALUE="0"/><PROPERTY NAME="disabled" VALUE="false"/><PROPERTY NAME="zone" VALUE="._default.int.lgnrgy"/><PROPERTY NAME="name" VALUE="lvpdpr125520fw01"/><PROPERTY NAME="comment" VALUE="Firewall"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.dns.host_address"/><PROPERTY NAME="host" VALUE="._default.int.ergy.lvpdpr125520fw01"/><PROPERTY NAME="address" VALUE="170.119.66.194"/><PROPERTY NAME="reverse_zone" VALUE="._default.arpa.in-addr.170.119"/><PROPERTY NAME="configure_for_dhcp" VALUE="false"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.dns.host$._default.int.lgergy.lvpdpr125520fw01"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Firewall"/></OBJECT>


I would like to request a shell script or perl to look in this file and provide an output in csv format  like so

IP_addreess=     Hostname=     Object_Class=

So in this case:
IP_addreess= 170.119.66.194    Hostname= lvpdpr125520fw01   Object_Class= Firewall


I would be running this on windows.

Thanks
0
richsark
Asked:
richsark
  • 13
  • 10
1 Solution
 
jeromeeCommented:
Rich (?),
I don't fully understand the mapping between the 3 lines and that one output line you want.
For example, I see "Firewall" on 2 lines which one do you want to use?

Please explain.
0
 
richsarkAuthor Commented:
Hi, The line which has object_class
0
 
wilcoxonCommented:
Given your small snippet, the below code will work.  However, it probably won't work on the real file.  In order to properly handle XML processing, we need to know the full hierarchical structure of the file.

To change the code to read a file, just remove the "my $file =" line, change $file in the XMLin line to be the quoted filename (eg 'filename'), and (optionally) delete __DATA__ and below.
use strict;
use warnings;
use XML::Simple;

my $file = join '', <DATA>;
my $xml = XMLin($file, ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my ($host, $class, $ip);
foreach my $obj (@{$xml->{OBJECT}}) {
    if (exists $obj->{PROPERTY}{name}) {
        $host = $obj->{PROPERTY}{name};
    } elsif (exists $obj->{PROPERTY}{value}) {
        $class = $obj->{PROPERTY}{value};
    } elsif (exists $obj->{PROPERTY}{address}) {
        $ip = $obj->{PROPERTY}{address};
    }
}
print "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
__DATA__
<xml>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.dns.host"/><PROPERTY NAME="ttl_option" VALUE="0"/><PROPERTY NAME="disabled" VALUE="false"/><PROPERTY NAME="zone" VALUE="._default.int.lgnrgy"/><PROPERTY NAME="name" VALUE="lvpdpr125520fw01"/><PROPERTY NAME="comment" VALUE="Firewall"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.dns.host_address"/><PROPERTY NAME="host" VALUE="._default.int.ergy.lvpdpr125520fw01"/><PROPERTY NAME="address" VALUE="170.119.66.194"/><PROPERTY NAME="reverse_zone" VALUE="._default.arpa.in-addr.170.119"/><PROPERTY NAME="configure_for_dhcp" VALUE="false"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.dns.host$._default.int.lgergy.lvpdpr125520fw01"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Firewall"/></OBJECT>
</xml>

Open in new window

0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

 
wilcoxonCommented:
If you provide additional information about the file hierarchy, I will correct the code I provided to work with the real file.
0
 
richsarkAuthor Commented:
HI, Can I email it to you. Its sorta sensitive
0
 
wilcoxonCommented:
I don't actually need all of the data - I just need to know the hierarchy (including case of letters) which probably looks something like this:

<xml ...>
    <some_tag ...>
        <ENTRY ...>
            ... the snippet from above ...
        </ENTRY>
        <ENTRY ...>
            ... similar to above snippet ...
        </ENTRY>
    </some_tag>
</xml>
0
 
richsarkAuthor Commented:
Hi,

Here is the complete xml file on a blank database.

I could not follow your example sorry onedb.xml
0
 
wilcoxonCommented:
Hmm.  The file format is very flat with no grouping intrinsic in the hierarchy.  How do I know to associate the three OBJECT entries in your snipper rather than some other OBJECT entries in the file (or fewer than these three)?  The __type entries don't match, the zone entries don't match - I can't see anything in those records to relate them...
0
 
richsarkAuthor Commented:
Hi Wilcoxon,

Sorry about that. See the new file attached please onedb.zip
0
 
richsarkAuthor Commented:
HI,

Any luck on the complete file?
0
 
richsarkAuthor Commented:
Hi,

One other piece may make it easier


Perhaps narrow it down to values that have extensible_attributes_


<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.infoblox.dns.host$._default.int.lgeenergy.brownctfw02"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Others"/></OBJECT>
 .

0
 
richsarkAuthor Commented:
HI wilcoxon,

Just wanted to check-in to see if the complete file is going to work. I am biting at the nails to try it :)

Thank you
0
 
wilcoxonCommented:
Sorry.  I was busy last night and today.  I'm looking at the revised full file now.
0
 
richsarkAuthor Commented:
Cool. Can't wait to try

Thanks agian for all your help      
0
 
wilcoxonCommented:
The joining of elements ended up being a lot more complex than I expected but this should work.  I left in code to warn when it hits values the were already defined - it looks like this is mostly hosts that have multiple IP addresses defined.

Towards the end of the file is a commented out print line that will provide more details - namely the "root" __type used to join all of the elements together.
#!/usr/local/bin/perl
use strict;
use warnings FATAL => qw(all);
use XML::Simple;

my $xml = XMLin('/cygdrive/e/dl/new/onedb.xml', ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my %data;
foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host$});
    next unless (exists $obj->{PROPERTY}{name} and exists $obj->{PROPERTY}{zone});
    $type =~ s{\.dns\.host$}{};
    my $zone = $obj->{PROPERTY}{zone};
    my $host = $obj->{PROPERTY}{name};
    $data{$type}{$zone}{$host}{host} = $obj->{PROPERTY}{name};
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host_address$});
    next unless (exists $obj->{PROPERTY}{address} and exists $obj->{PROPERTY}{host});
    $type =~ s{\.dns\.host_address$}{};
    my @parts = split /\./, $obj->{PROPERTY}{host};
    my $zone = shift @parts;
    while (not exists $data{$type}{$zone} and @parts) {
        $zone .= '.' . shift @parts;
    }
    die "could not find zone in host $obj->{PROPERTY}{host}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{host};
    $host =~ s{^$zone\.}{};
    warn "found ip $obj->{PROPERTY}{address} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{ip}" if $data{$type}{$zone}{$host}{ip};
    $data{$type}{$zone}{$host}{ip} = $obj->{PROPERTY}{address};
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.one\.extensible_attributes_value$});
    next unless (exists $obj->{PROPERTY}{tag} and $obj->{PROPERTY}{tag} eq 'object_class' and exists $obj->{PROPERTY}{value} and exists $obj->{PROPERTY}{object});
    $type =~ s{\.one\.extensible_attributes_value$}{};
    my $str = $obj->{PROPERTY}{object};
    $str =~ s{^.*?\$}{};
    my @parts = split /\./, $str;
    my $zone = shift @parts;
    while (not exists $data{$type}{$zone} and @parts) {
        $zone .= '.' . shift @parts;
    }
    die "could not find zone in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{object};
    $host =~ s{^.*?\$$zone\.}{};
    die "could not find host in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone}{$host});
    warn "found class $obj->{PROPERTY}{value} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{class}" if $data{$type}{$zone}{$host}{class};
    $data{$type}{$zone}{$host}{class} = $obj->{PROPERTY}{value};
}

open CSV, '>onedb.csv' or die "could not write onedb.csv: $!";
foreach my $type (sort keys %data) {
    foreach my $zone (sort keys %{$data{$type}}) {
        foreach my $host (sort keys %{$data{$type}{$zone}}) {
            my ($ip, $host, $class) = @{$data{$type}{$zone}{$host}}{qw(ip host class)};
            $ip = '' unless defined($ip);
            $host = '' unless defined($host);
            $class = '' unless defined($class);
#            print CSV "type=$type,IP_address=$ip,Hostname=$host,Object_Class=$class\n";
            print CSV "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
        }
    }
}
close CSV;

Open in new window

0
 
richsarkAuthor Commented:
Awesome,  did u by chance run it? How does it look.

So ones with hosts with multiple IPs are treated how?

I really appericate it. I will try it later when I get home.

Thanks
0
 
wilcoxonCommented:
Yep.  I ran it.  It looks reasonable on the output (to me anyway).  Right now, it will just use the latest IP it finds since you didn't specify any special handling for multiple IPs.  Everything is grouped by the base __type, the zone (see .dns.host entries), and the host name.
0
 
richsarkAuthor Commented:
Hi,

Just ran it, it takes a few min in cygwin.

Looks good man !

Let me sleep on it till the afternoon Sunday  and get back to ya. but for now it looks great !

Thanks for all your hard work.
0
 
richsarkAuthor Commented:
Real quick,

For the ones in the csv like this (hostname= ( blank) means what?

Line2 from onedb.csv

IP_address=170.119.58.50      Hostname=      Object_Class=
0
 
richsarkAuthor Commented:
Hi !

Since were on this topic, I found a few more that I dont seem to understand why/logic

Rows 17161 to 17164 in the onedb.csv

17161 IP_address=      Hostname=      Object_Class=
17162 IP_address=      Hostname=boileropt4      Object_Class=
17163 IP_address=      Hostname=boileropt5      Object_Class=
17164 IP_address=      Hostname=boileropt6      Object_Class=
0
 
wilcoxonCommented:
That would mean one of:
1) the __type of *.dns.host_address did not have any corresponding *.dns.host nor *.one.extensible_attributes_value entries for the zone and host
2) the corresponding *.dns.host entry had name of "" and no *.one.extensible_attributes_value entry

I looked in the onedb.xml and found that in this case it is #2
0
 
wilcoxonCommented:
I'm not sure what line 17161 would be about - best guess is that there is a *.dns.host entry with name = "" and neither of the other two entries.

17162 would be *.dns.host entry but no *.dns.host_address or *.one.extensible_attributes_value entries.

Nope.  It's a bug - the problem is that ._default.int.lgeenergy and ._default.int.lgeenergy.controls are both valid zones - I need to change the code to take the largest zone match rather than the smallest....
0
 
wilcoxonCommented:
Here's the revised code (with some additional debugging statements commented out).  I changed it to do multiple IP addresses for a given host separated by pipe '|'.
#!/usr/local/bin/perl
use strict;
use warnings FATAL => qw(all);
use XML::Simple;
use Data::Dumper qw(Dumper);

my $xml = XMLin('onedb.xml', ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my %data;
foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host$});
    next unless (exists $obj->{PROPERTY}{name} and exists $obj->{PROPERTY}{zone});
    $type =~ s{\.dns\.host$}{};
    my $zone = $obj->{PROPERTY}{zone};
    my $host = $obj->{PROPERTY}{name};
    $data{$type}{$zone}{$host}{host} = $host;
#    print "type/zone/host/host = $type/$zone/$host/$data{$type}{$zone}{$host}{host}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{^boileropt});
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host_address$});
    next unless (exists $obj->{PROPERTY}{address} and exists $obj->{PROPERTY}{host});
    $type =~ s{\.dns\.host_address$}{};
    my @parts = split /\./, $obj->{PROPERTY}{host};
    my $zone = shift @parts;
    my $match;
    while (@parts) {
        $zone .= '.' . shift @parts;
        $match = $zone if exists($data{$type}{$zone});
    }
    $zone = $match;
    die "could not find zone in host $obj->{PROPERTY}{host}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{host};
    $host =~ s{^$zone\.}{};
    if (defined $data{$type}{$zone}{$host}{ip}) {
        push @{$data{$type}{$zone}{$host}{ip}}, $obj->{PROPERTY}{address};
    } else {
        $data{$type}{$zone}{$host}{ip} = [$obj->{PROPERTY}{address}];
    }
#    print "type/zone/host/IP = $type/$zone/$host/@{$data{$type}{$zone}{$host}{ip}}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{boileropt});
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.one\.extensible_attributes_value$});
    next unless (exists $obj->{PROPERTY}{tag} and $obj->{PROPERTY}{tag} eq 'object_class' and exists $obj->{PROPERTY}{value} and exists $obj->{PROPERTY}{object});
    $type =~ s{\.one\.extensible_attributes_value$}{};
    my $str = $obj->{PROPERTY}{object};
    $str =~ s{^.*?\$}{};
    my @parts = split /\./, $str;
    my $zone = shift @parts;
    my $match;
    while (@parts) {
        $zone .= '.' . shift @parts;
        $match = $zone if exists($data{$type}{$zone});
    }
    $zone = $match;
    die "could not find zone in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{object};
    $host =~ s{^.*?\$$zone\.}{};
    die "could not find host in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone}{$host});
    warn "found class $obj->{PROPERTY}{value} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{class}" if $data{$type}{$zone}{$host}{class};
    $data{$type}{$zone}{$host}{class} = $obj->{PROPERTY}{value};
#    print "type/zone/host/class = $type/$zone/$host/$data{$type}{$zone}{$host}{class}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{boileropt});
}

open CSV, '>onedb.csv' or die "could not write onedb.csv: $!";
foreach my $type (sort keys %data) {
    foreach my $zone (sort keys %{$data{$type}}) {
        foreach my $host (sort keys %{$data{$type}{$zone}}) {
            my ($ip, $host, $class) = @{$data{$type}{$zone}{$host}}{qw(ip host class)};
            $ip = '' unless defined($ip);
            $ip = join '|', @$ip if ref($ip);
            $host = '' unless defined($host);
            $class = '' unless defined($class);
#            print CSV "type=$type,IP_address=$ip,Hostname=$host,Object_Class=$class\n";
            print CSV "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
        }
    }
}
close CSV;

Open in new window

0
 
richsarkAuthor Commented:
Awesome work and very prompt      
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

  • 13
  • 10
Tackle projects and never again get stuck behind a technical roadblock.
Join Now