fabiano petrone
asked on
Perl processing of an XML file
Hello,
An XML file contains user records like:
<user>
<account_type>EXTERNAL</account_type>
<primary_id>BRMNTN62M16A944E</primary_id>
<first_name>John</first_name>
<last_name>Doe</last_name>
...other fields...
<user_identifier>
<id_type desc="Additional">02</id_type>
<value>john.doe@mymail.com</value>
</user_identifier>
...other fields...
</user>
It happens that some records have the SAME <primary_id>, but the <value> of the <user_identifier> with <id_type> 02 (see above) is different from "<first_name>.<last_name>@mymail.com".
Is it possible to identify and remove all and only these <user> records?
Thanks a lot,
Fabiano
ASKER
Hi, ozo
yes: same as "BRMNTN62M16A944E".
i.e. there are 2 records with
1) same <primary_id>BRMNTN62M16A944E</primary_id>
2) same first & last name, say:
<first_name>John</first_name>
<last_name>Doe</last_name>
3) but different email, say:
<user_identifier>
<id_type desc="Additional">02</id_type>
<value>john.doe@mymail.com</value>
</user_identifier>
for the first record and
<user_identifier>
<id_type desc="Additional">02</id_type>
<value>something.else@mymail.com</value>
</user_identifier>
I want to keep only the first <user>...</user> record and delete the second
Thanks a lot,
Fabiano
ASKER
Hi, ozo
To make things clearer (I hope), I've attached a sample file with 2 <user> records.
They have the same value for the fields:
<primary_id>BRMNTN62M16A944E</primary_id>
<first_name>john</first_name>
<last_name>doe</last_name>
but a different value in the user_identifier field.
For the first record:
<user_identifier>
<id_type desc="Additional">02</id_type>
<value>john.doe@uniud.it</value>
</user_identifier>
and for the second one:
<user_identifier>
<id_type desc="Additional">02</id_type>
<value>someOther.thing@uniud.it</value>
</user_identifier>
I want to keep only the first <user> record and delete the second.
Thanks,
Fabiano
sample.xml
You need to parse your XML with a Perl module which gives you access to all its elements, in a hierarchy that you can use to find whatever you want and delete it. I generally use XML::LibXML for this. Here's a little bit of code that may help you get going; XML::LibXML and its related modules (XML::LibXML::Node and XML::LibXML::Element are the ones you'll use most) will do everything you want.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use XML::LibXML;

# Parse the whole file into a DOM tree.
my $dom = XML::LibXML->load_xml(location => 'sample.xml');

# Collect every <primary_id> element, in document order.
my @primary_ids = $dom->findnodes('/users/user/primary_id');

# Compare each id with the one before it. Note that this only catches
# duplicates that sit next to each other in the file.
my $previous_id = '';
foreach my $id_elt ( @primary_ids ){
    my $this_id = $id_elt->to_literal();
    if ( $this_id eq $previous_id ){
        # Same id as the previous record: list this record's user identifiers.
        my @user_identifiers = $id_elt->parentNode()->findnodes('./user_identifiers/user_identifier');
        print "User identifiers for $this_id: \n";
        print "-----------------\n$_\n" for @user_identifiers;
        print "Here you do other things as required\n";
    }
    else {
        $previous_id = $this_id;
    }
}
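Building on the snippet above, here is a minimal sketch of the removal step itself. It has not been run against your sample.xml: the inline data, the <users> root, the <user_identifiers> wrapper, the $domain variable and the case-insensitive comparison are all assumptions made for illustration. It groups <user> elements by <primary_id> and, only where an id occurs more than once, drops the records whose type-02 value is not first_name.last_name@domain.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inline data standing in for sample.xml (structure assumed).
my $xml = <<'XML';
<users>
  <user>
    <primary_id>BRMNTN62M16A944E</primary_id>
    <first_name>John</first_name>
    <last_name>Doe</last_name>
    <user_identifiers>
      <user_identifier>
        <id_type desc="Additional">02</id_type>
        <value>john.doe@mymail.com</value>
      </user_identifier>
    </user_identifiers>
  </user>
  <user>
    <primary_id>BRMNTN62M16A944E</primary_id>
    <first_name>John</first_name>
    <last_name>Doe</last_name>
    <user_identifiers>
      <user_identifier>
        <id_type desc="Additional">02</id_type>
        <value>something.else@mymail.com</value>
      </user_identifier>
    </user_identifiers>
  </user>
  <user>
    <primary_id>XXXXXX00X00X000X</primary_id>
    <first_name>Jane</first_name>
    <last_name>Roe</last_name>
    <user_identifiers>
      <user_identifier>
        <id_type desc="Additional">02</id_type>
        <value>j.roe@mymail.com</value>
      </user_identifier>
    </user_identifiers>
  </user>
</users>
XML

my $domain = 'mymail.com';    # assumption: the expected mail domain
my $dom    = XML::LibXML->load_xml( string => $xml );

# Group the <user> elements by the text of their <primary_id>.
my %users_by_id;
for my $user ( $dom->findnodes('/users/user') ) {
    push @{ $users_by_id{ $user->findvalue('./primary_id') } }, $user;
}

for my $id ( sort keys %users_by_id ) {
    my @group = @{ $users_by_id{$id} };
    next if @group < 2;    # unique ids are left alone

    for my $user (@group) {
        # Expected address, compared case-insensitively (an assumption).
        my $expected = lc( $user->findvalue('./first_name') . '.'
                         . $user->findvalue('./last_name') . '@' . $domain );
        # Assumes at most one identifier of type 02 per user.
        my $email = lc $user->findvalue('.//user_identifier[id_type="02"]/value');

        # Detach the record from the document if the address differs.
        $user->unbindNode() if $email ne $expected;
    }
}

print $dom->toString();
```

Grouping in a hash rather than comparing with the previous id means the duplicates do not have to be adjacent in the file. The third record keeps its unusual address because its id is unique.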
ASKER
Hi, Thanks a lot for the reply
Is this possible with the XML::Simple module?
Thanks,
Fabiano
You could try, but I personally wouldn't recommend it. XML::Simple makes assumptions about the way the XML data is represented by variables (arrays, hashes and so forth), which may not be the best way of doing it from the programming point of view. There are parameters you can pass to change that, but the way the arrays and hashes are created tends to change with the structure of the data, so your program may fail at some later stage because perfectly valid XML is structured in a new way. The CPAN entry for XML::Simple even says this: "The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces."
If you know XML::Simple then this page may be of interest.
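To illustrate the point about the structure changing with the data, here is a small sketch. The two inline documents are made up for the example; XMLin, ForceArray and KeyAttr are standard XML::Simple names. With default options a single <user> comes back as a hash reference, while two or more come back as an array reference, so code written against one shape breaks on the other.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical documents: one with a single <user>, one with two.
my $one_xml = '<users><user><primary_id>A1</primary_id></user></users>';
my $two_xml = '<users><user><primary_id>A1</primary_id></user>'
            . '<user><primary_id>B2</primary_id></user></users>';

my $one = XMLin($one_xml);
my $two = XMLin($two_xml);

# Default options: the shape depends on how many <user> elements exist.
print ref( $one->{user} ), "\n";    # HASH
print ref( $two->{user} ), "\n";    # ARRAY

# ForceArray asks for an array even when there is only one <user>;
# KeyAttr => [] switches off folding on name/key/id.
my $forced = XMLin( $one_xml, ForceArray => ['user'], KeyAttr => [] );
print ref( $forced->{user} ), "\n"; # ARRAY
```

Passing ForceArray and KeyAttr on every call makes the result predictable, but it is easy to forget for one element, which is part of why a DOM-style module tends to be the safer choice here.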
ASKER
Hi,
Thanks for the reply.
I use Windows 7 with ActivePerl 5.24.0, and in PPM I can find the XML::Simple and XML::Parser modules, but sadly not the XML::LibXML module.
ASKER CERTIFIED SOLUTION
ASKER
Hi, Henry
Thanks: I'll follow your suggestions
Thanks,
Fabiano
What characters are allowed in <first_name> and <last_name>?