Solved

strip file for nicer output

Posted on 2010-11-19
24
483 Views
Last Modified: 2012-06-21
Hi,

I have a file named datadb.xml

There are lines like so in it:

<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.dns.host"/><PROPERTY NAME="ttl_option" VALUE="0"/><PROPERTY NAME="disabled" VALUE="false"/><PROPERTY NAME="zone" VALUE="._default.int.lgnrgy"/><PROPERTY NAME="name" VALUE="lvpdpr125520fw01"/><PROPERTY NAME="comment" VALUE="Firewall"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.dns.host_address"/><PROPERTY NAME="host" VALUE="._default.int.ergy.lvpdpr125520fw01"/><PROPERTY NAME="address" VALUE="170.119.66.194"/><PROPERTY NAME="reverse_zone" VALUE="._default.arpa.in-addr.170.119"/><PROPERTY NAME="configure_for_dhcp" VALUE="false"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.dns.host$._default.int.lgergy.lvpdpr125520fw01"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Firewall"/></OBJECT>


I would like to request a shell script or perl to look in this file and provide an output in csv format  like so

IP_addreess=     Hostname=     Object_Class=

So in this case:
IP_addreess= 170.119.66.194    Hostname= lvpdpr125520fw01   Object_Class= Firewall


I would be running this on windows.

Thanks
0
Comment
Question by:richsark
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 13
  • 10
24 Comments
 
LVL 10

Expert Comment

by:jeromee
ID: 34174703
Rich (?),
I don't fully understand the mapping between the 3 lines and that one output line you want.
For example, I see "Firewall" on 2 lines which one do you want to use?

Please explain.
0
 
LVL 1

Author Comment

by:richsark
ID: 34174733
Hi, The line which has object_class
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34174925
Given your small snippet, the below code will work.  However, it probably won't work on the real file.  In order to properly handle XML processing, we need to know the full hierarchical structure of the file.

To change the code to read a file, just remove the "my $file =" line, change $file in the XMLin line to be the quoted filename (eg 'filename'), and (optionally) delete __DATA__ and below.
use strict;
use warnings;
use XML::Simple;

my $file = join '', <DATA>;
my $xml = XMLin($file, ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my ($host, $class, $ip);
foreach my $obj (@{$xml->{OBJECT}}) {
    if (exists $obj->{PROPERTY}{name}) {
        $host = $obj->{PROPERTY}{name};
    } elsif (exists $obj->{PROPERTY}{value}) {
        $class = $obj->{PROPERTY}{value};
    } elsif (exists $obj->{PROPERTY}{address}) {
        $ip = $obj->{PROPERTY}{address};
    }
}
print "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
__DATA__
<xml>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.dns.host"/><PROPERTY NAME="ttl_option" VALUE="0"/><PROPERTY NAME="disabled" VALUE="false"/><PROPERTY NAME="zone" VALUE="._default.int.lgnrgy"/><PROPERTY NAME="name" VALUE="lvpdpr125520fw01"/><PROPERTY NAME="comment" VALUE="Firewall"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.dns.host_address"/><PROPERTY NAME="host" VALUE="._default.int.ergy.lvpdpr125520fw01"/><PROPERTY NAME="address" VALUE="170.119.66.194"/><PROPERTY NAME="reverse_zone" VALUE="._default.arpa.in-addr.170.119"/><PROPERTY NAME="configure_for_dhcp" VALUE="false"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE=".com.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.dns.host$._default.int.lgergy.lvpdpr125520fw01"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Firewall"/></OBJECT>
</xml>

Open in new window

0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 26

Expert Comment

by:wilcoxon
ID: 34174935
If you provide additional information about the file hierarchy, I will correct the code I provided to work with the real file.
0
 
LVL 1

Author Comment

by:richsark
ID: 34175020
HI, Can I email it to you. Its sorta sensitive
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34175159
I don't actually need all of the data - I just need to know the hierarchy (including case of letters) which probably looks something like this:

<xml ...>
    <some_tag ...>
        <ENTRY ...>
            ... the snippet from above ...
        </ENTRY>
        <ENTRY ...>
            ... similar to above snippet ...
        </ENTRY>
    </some_tag>
</xml>
0
 
LVL 1

Author Comment

by:richsark
ID: 34175467
Hi,

Here is the complete xml file on a blank database.

I could not follow your example sorry onedb.xml
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34175575
Hmm.  The file format is very flat with no grouping intrinsic in the hierarchy.  How do I know to associate the three OBJECT entries in your snipper rather than some other OBJECT entries in the file (or fewer than these three)?  The __type entries don't match, the zone entries don't match - I can't see anything in those records to relate them...
0
 
LVL 1

Author Comment

by:richsark
ID: 34176460
Hi Wilcoxon,

Sorry about that. See the new file attached please onedb.zip
0
 
LVL 1

Author Comment

by:richsark
ID: 34177283
HI,

Any luck on the complete file?
0
 
LVL 1

Author Comment

by:richsark
ID: 34177309
Hi,

One other piece may make it easier


Perhaps narrow it down to values that have extensible_attributes_


<OBJECT><PROPERTY NAME="__type" VALUE=".com.infoblox.one.extensible_attributes_value"/><PROPERTY NAME="object" VALUE=".com.infoblox.dns.host$._default.int.lgeenergy.brownctfw02"/><PROPERTY NAME="tag" VALUE="object_class"/><PROPERTY NAME="value" VALUE="Others"/></OBJECT>
 .

0
 
LVL 1

Author Comment

by:richsark
ID: 34179969
HI wilcoxon,

Just wanted to check-in to see if the complete file is going to work. I am biting at the nails to try it :)

Thank you
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34181223
Sorry.  I was busy last night and today.  I'm looking at the revised full file now.
0
 
LVL 1

Author Comment

by:richsark
ID: 34181472
Cool. Can't wait to try

Thanks agian for all your help      
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34181619
The joining of elements ended up being a lot more complex than I expected but this should work.  I left in code to warn when it hits values the were already defined - it looks like this is mostly hosts that have multiple IP addresses defined.

Towards the end of the file is a commented out print line that will provide more details - namely the "root" __type used to join all of the elements together.
#!/usr/local/bin/perl
use strict;
use warnings FATAL => qw(all);
use XML::Simple;

my $xml = XMLin('/cygdrive/e/dl/new/onedb.xml', ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my %data;
foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host$});
    next unless (exists $obj->{PROPERTY}{name} and exists $obj->{PROPERTY}{zone});
    $type =~ s{\.dns\.host$}{};
    my $zone = $obj->{PROPERTY}{zone};
    my $host = $obj->{PROPERTY}{name};
    $data{$type}{$zone}{$host}{host} = $obj->{PROPERTY}{name};
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host_address$});
    next unless (exists $obj->{PROPERTY}{address} and exists $obj->{PROPERTY}{host});
    $type =~ s{\.dns\.host_address$}{};
    my @parts = split /\./, $obj->{PROPERTY}{host};
    my $zone = shift @parts;
    while (not exists $data{$type}{$zone} and @parts) {
        $zone .= '.' . shift @parts;
    }
    die "could not find zone in host $obj->{PROPERTY}{host}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{host};
    $host =~ s{^$zone\.}{};
    warn "found ip $obj->{PROPERTY}{address} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{ip}" if $data{$type}{$zone}{$host}{ip};
    $data{$type}{$zone}{$host}{ip} = $obj->{PROPERTY}{address};
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.one\.extensible_attributes_value$});
    next unless (exists $obj->{PROPERTY}{tag} and $obj->{PROPERTY}{tag} eq 'object_class' and exists $obj->{PROPERTY}{value} and exists $obj->{PROPERTY}{object});
    $type =~ s{\.one\.extensible_attributes_value$}{};
    my $str = $obj->{PROPERTY}{object};
    $str =~ s{^.*?\$}{};
    my @parts = split /\./, $str;
    my $zone = shift @parts;
    while (not exists $data{$type}{$zone} and @parts) {
        $zone .= '.' . shift @parts;
    }
    die "could not find zone in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{object};
    $host =~ s{^.*?\$$zone\.}{};
    die "could not find host in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone}{$host});
    warn "found class $obj->{PROPERTY}{value} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{class}" if $data{$type}{$zone}{$host}{class};
    $data{$type}{$zone}{$host}{class} = $obj->{PROPERTY}{value};
}

open CSV, '>onedb.csv' or die "could not write onedb.csv: $!";
foreach my $type (sort keys %data) {
    foreach my $zone (sort keys %{$data{$type}}) {
        foreach my $host (sort keys %{$data{$type}{$zone}}) {
            my ($ip, $host, $class) = @{$data{$type}{$zone}{$host}}{qw(ip host class)};
            $ip = '' unless defined($ip);
            $host = '' unless defined($host);
            $class = '' unless defined($class);
#            print CSV "type=$type,IP_address=$ip,Hostname=$host,Object_Class=$class\n";
            print CSV "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
        }
    }
}
close CSV;

Open in new window

0
 
LVL 1

Author Comment

by:richsark
ID: 34181640
Awesome,  did u by chance run it? How does it look.

So ones with hosts with multiple IPs are treated how?

I really appericate it. I will try it later when I get home.

Thanks
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34181705
Yep.  I ran it.  It looks reasonable on the output (to me anyway).  Right now, it will just use the latest IP it finds since you didn't specify any special handling for multiple IPs.  Everything is grouped by the base __type, the zone (see .dns.host entries), and the host name.
0
 
LVL 1

Author Comment

by:richsark
ID: 34181803
Hi,

Just ran it, it takes a few min in cygwin.

Looks good man !

Let me sleep on it till the afternoon Sunday  and get back to ya. but for now it looks great !

Thanks for all your hard work.
0
 
LVL 1

Author Comment

by:richsark
ID: 34181824
Real quick,

For the ones in the csv like this (hostname= ( blank) means what?

Line2 from onedb.csv

IP_address=170.119.58.50      Hostname=      Object_Class=
0
 
LVL 1

Author Comment

by:richsark
ID: 34181976
Hi !

Since were on this topic, I found a few more that I dont seem to understand why/logic

Rows 17161 to 17164 in the onedb.csv

17161 IP_address=      Hostname=      Object_Class=
17162 IP_address=      Hostname=boileropt4      Object_Class=
17163 IP_address=      Hostname=boileropt5      Object_Class=
17164 IP_address=      Hostname=boileropt6      Object_Class=
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34181995
That would mean one of:
1) the __type of *.dns.host_address did not have any corresponding *.dns.host nor *.one.extensible_attributes_value entries for the zone and host
2) the corresponding *.dns.host entry had name of "" and no *.one.extensible_attributes_value entry

I looked in the onedb.xml and found that in this case it is #2
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34182038
I'm not sure what line 17161 would be about - best guess is that there is a *.dns.host entry with name = "" and neither of the other two entries.

17162 would be *.dns.host entry but no *.dns.host_address or *.one.extensible_attributes_value entries.

Nope.  It's a bug - the problem is that ._default.int.lgeenergy and ._default.int.lgeenergy.controls are both valid zones - I need to change the code to take the largest zone match rather than the smallest....
0
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 34182051
Here's the revised code (with some additional debugging statements commented out).  I changed it to do multiple IP addresses for a given host separated by pipe '|'.
#!/usr/local/bin/perl
use strict;
use warnings FATAL => qw(all);
use XML::Simple;
use Data::Dumper qw(Dumper);

my $xml = XMLin('onedb.xml', ForceArray => 1, KeyAttr => ['NAME'], ContentKey => '-VALUE');
my %data;
foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host$});
    next unless (exists $obj->{PROPERTY}{name} and exists $obj->{PROPERTY}{zone});
    $type =~ s{\.dns\.host$}{};
    my $zone = $obj->{PROPERTY}{zone};
    my $host = $obj->{PROPERTY}{name};
    $data{$type}{$zone}{$host}{host} = $host;
#    print "type/zone/host/host = $type/$zone/$host/$data{$type}{$zone}{$host}{host}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{^boileropt});
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.dns\.host_address$});
    next unless (exists $obj->{PROPERTY}{address} and exists $obj->{PROPERTY}{host});
    $type =~ s{\.dns\.host_address$}{};
    my @parts = split /\./, $obj->{PROPERTY}{host};
    my $zone = shift @parts;
    my $match;
    while (@parts) {
        $zone .= '.' . shift @parts;
        $match = $zone if exists($data{$type}{$zone});
    }
    $zone = $match;
    die "could not find zone in host $obj->{PROPERTY}{host}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{host};
    $host =~ s{^$zone\.}{};
    if (defined $data{$type}{$zone}{$host}{ip}) {
        push @{$data{$type}{$zone}{$host}{ip}}, $obj->{PROPERTY}{address};
    } else {
        $data{$type}{$zone}{$host}{ip} = [$obj->{PROPERTY}{address}];
    }
#    print "type/zone/host/IP = $type/$zone/$host/@{$data{$type}{$zone}{$host}{ip}}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{boileropt});
}

foreach my $obj (@{$xml->{OBJECT}}) {
    my $type = $obj->{PROPERTY}{__type};
    next unless ($type and $type =~ m{\.one\.extensible_attributes_value$});
    next unless (exists $obj->{PROPERTY}{tag} and $obj->{PROPERTY}{tag} eq 'object_class' and exists $obj->{PROPERTY}{value} and exists $obj->{PROPERTY}{object});
    $type =~ s{\.one\.extensible_attributes_value$}{};
    my $str = $obj->{PROPERTY}{object};
    $str =~ s{^.*?\$}{};
    my @parts = split /\./, $str;
    my $zone = shift @parts;
    my $match;
    while (@parts) {
        $zone .= '.' . shift @parts;
        $match = $zone if exists($data{$type}{$zone});
    }
    $zone = $match;
    die "could not find zone in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone});
    my $host = $obj->{PROPERTY}{object};
    $host =~ s{^.*?\$$zone\.}{};
    die "could not find host in object $obj->{PROPERTY}{object}"
        unless exists($data{$type}{$zone}{$host});
    warn "found class $obj->{PROPERTY}{value} for $type/$zone/$host which was already defined as $data{$type}{$zone}{$host}{class}" if $data{$type}{$zone}{$host}{class};
    $data{$type}{$zone}{$host}{class} = $obj->{PROPERTY}{value};
#    print "type/zone/host/class = $type/$zone/$host/$data{$type}{$zone}{$host}{class}\n", Dumper($obj->{PROPERTY}), "\n" if ($type eq '.com.infoblox' and $host =~ m{boileropt});
}

open CSV, '>onedb.csv' or die "could not write onedb.csv: $!";
foreach my $type (sort keys %data) {
    foreach my $zone (sort keys %{$data{$type}}) {
        foreach my $host (sort keys %{$data{$type}{$zone}}) {
            my ($ip, $host, $class) = @{$data{$type}{$zone}{$host}}{qw(ip host class)};
            $ip = '' unless defined($ip);
            $ip = join '|', @$ip if ref($ip);
            $host = '' unless defined($host);
            $class = '' unless defined($class);
#            print CSV "type=$type,IP_address=$ip,Hostname=$host,Object_Class=$class\n";
            print CSV "IP_address=$ip,Hostname=$host,Object_Class=$class\n";
        }
    }
}
close CSV;

Open in new window

0
 
LVL 1

Author Closing Comment

by:richsark
ID: 34193318
Awesome work and very prompt      
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Recently, an awarded photographer, Selina De Maeyer (http://www.selinademaeyer.com/), completed a photo shoot of a beautiful event (http://www.sintjacobantwerpen.be/verslag-en-fotoreportage-van-de-sacramentsprocessie-door-antwerpen#thumbnails) in An…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question