Solved

How do i parse XML to csv or to a plain text file and use sqlldr to load to Oracle 10g

Posted on 2009-07-07
10
465 Views
Last Modified: 2013-12-19
Hi,

I'm looking for a way of parsing an XML file into csv or similar a similar flatfile in order to use sqlldr to load it to an Oracle 10g table.

I had thought of perhaps using a perl script to use this but i'm not sure of the best way of going about it.

I have found the source code for some java classes to create a SAX utility however i've had no success in loading the file to an XMLTYPE table.  My first preference would be to parse it to csv or similar anyway.

The XML is an export from the Cramer inventory system if its of any assistance to those interested my dilemma.

(I'm very new to XML and i have limited scripting experience)

Thanks in advance.
Example of my XML input:
 
 
<?xml version = '1.0' encoding = 'UTF-8'?>
<UDM xmlns="http://www.cramer.com/udm" xmlns:udm="http://www.cramer.com/udm" versionNumber="2.0" generatedBy="UDMGenerator" source="ResourceManager" UDMDateTime="2009-06-01T06:52:51" profileName="export.BulkServiceableLocations.CIS.CIS">
   <location UDMID="25275" name="ServiceLocation_PTVFTTP_1" type="Network Structure" status="1760030082" level="1" UPK="alias2_PTVFTTP_1">
      <attr UDMID="25475" name="fullName" value=""/>
      <attr UDMID="25476" name="relativeName" value=""/>
      <attr UDMID="25477" name="alias1" value=""/>
      <attr UDMID="25478" name="alias2" value="alias2_PTVFT"/>
      <attr UDMID="25479" name="objectID" value=""/>
      <attr UDMID="25480" name="subType" value=""/>
      <attr UDMID="25481" name="subStatus" value=""/>
      <attr UDMID="25482" name="createdDate" value="2009-05-26 12:44:18.0"/>
      <attr UDMID="25483" name="lastModifiedDate" value="2009-05-26 12:44:19.0"/>
      <attr UDMID="25484" name="address" value=""/>
      <attr UDMID="25485" name="townCity" value=""/>
      <attr UDMID="25486" name="province" value=""/>
      <attr UDMID="25487" name="zip" value=""/>
      <attr UDMID="25488" name="responsible" value=""/>
      <attr UDMID="25489" name="telephone" value=""/>
      <attr UDMID="25490" name="fax" value=""/>
      <attr UDMID="25491" name="address1" value=""/>
      <attr UDMID="25492" name="address2" value=""/>
      <attr UDMID="25493" name="address3" value=""/>
      <attr UDMID="25494" name="functionalStatus" value=""/>
      <attr UDMID="27475" name="UNIT_NUMBER" value=""/>
      <attr UDMID="27575" name="FLOOR" value=""/>
      <attr UDMID="27675" name="PROPERTY_NUMBER" value=""/>
      <attr UDMID="27775" name="STREET_NAME" value="STN_1866"/>
      <attr UDMID="27776" name="STREET_DESCR" value="STYP"/>
      <attr UDMID="27975" name="DISTRIBUTION_AREA" value="DA_205"/>
      <attr UDMID="27976" name="EXCHANGE_AREA" value="EXA_205"/>
      <attr UDMID="27977" name="DATE_SERVICEABLE" value="2015-01-01 00:00:00.0"/>
      <attr UDMID="27978" name="SYSTEM_CODE" value="CBL"/>
      <attr UDMID="28375" name="BUILDINGID" value=""/>
      <attr UDMID="28475" name="ACTION" value=""/>
      <attr UDMID="28476" name="CABLE_LOCATION" value="F"/>
      <attr UDMID="28477" name="LEADIN_LOCATION" value="F"/>
      <attr UDMID="28478" name="HEADEND" value="DVH"/>
      <attr UDMID="28479" name="NODE" value="VPR"/>
      <attr UDMID="28480" name="HUB" value="VPST_130"/>
      <attr UDMID="28481" name="AMPLIFIER" value=""/>
      <attr UDMID="28482" name="POWER" value="IN_1307"/>
      <attr UDMID="28483" name="NODEABBREVIATION" value="NC_1307"/>
      <sourceObject UDMID="25276" SOClass="6" SOObjID="1866"/>
   </location>
   <location UDMID="25277" name="ServiceLocation_PTVFTTP_2" type="Network Structure" status="1760030082" level="1" UPK="alias2_PTVFTTP_2">
      <attr UDMID="25495" name="fullName" value=""/>
      <attr UDMID="25496" name="relativeName" value=""/>
      <attr UDMID="25497" name="alias1" value=""/>
      <attr UDMID="25498" name="alias2" value="alias2_PTVFT"/>
      <attr UDMID="25499" name="objectID" value=""/>
      <attr UDMID="25500" name="subType" value=""/>
      <attr UDMID="25501" name="subStatus" value=""/>
      <attr UDMID="25502" name="createdDate" value="2009-05-26 12:44:21.0"/>
      <attr UDMID="25503" name="lastModifiedDate" value="2009-05-26 12:44:21.0"/>
      <attr UDMID="25504" name="address" value=""/>
      <attr UDMID="25505" name="townCity" value=""/>
      <attr UDMID="25506" name="province" value=""/>
      <attr UDMID="25507" name="zip" value=""/>
      <attr UDMID="25508" name="responsible" value=""/>
      <attr UDMID="25509" name="telephone" value=""/>
      <attr UDMID="25510" name="fax" value=""/>
      <attr UDMID="25511" name="address1" value=""/>
      <attr UDMID="25512" name="address2" value=""/>
      <attr UDMID="25513" name="address3" value=""/>
      <attr UDMID="25514" name="functionalStatus" value=""/>
      <attr UDMID="27476" name="UNIT_NUMBER" value=""/>
      <attr UDMID="27576" name="FLOOR" value=""/>
      <attr UDMID="27676" name="PROPERTY_NUMBER" value=""/>
      <attr UDMID="27777" name="STREET_NAME" value="STN_1867"/>
      <attr UDMID="27778" name="STREET_DESCR" value="STYP"/>
      <attr UDMID="27979" name="DISTRIBUTION_AREA" value="DA_206"/>
      <attr UDMID="27980" name="EXCHANGE_AREA" value="EXA_206"/>
      <attr UDMID="27981" name="DATE_SERVICEABLE" value="2015-01-01 00:00:00.0"/>
      <attr UDMID="27982" name="SYSTEM_CODE" value="CBL"/>
      <attr UDMID="28376" name="BUILDINGID" value=""/>
      <attr UDMID="28484" name="ACTION" value=""/>
      <attr UDMID="28485" name="CABLE_LOCATION" value="F"/>
      <attr UDMID="28486" name="LEADIN_LOCATION" value="F"/>
      <attr UDMID="28487" name="HEADEND" value="DVH"/>
      <attr UDMID="28488" name="NODE" value="VPR"/>
      <attr UDMID="28489" name="HUB" value="VPST_130"/>
      <attr UDMID="28490" name="AMPLIFIER" value=""/>
      <attr UDMID="28491" name="POWER" value="IN_1308"/>
      <attr UDMID="28492" name="NODEABBREVIATION" value="NC_1308"/>
      <sourceObject UDMID="25278" SOClass="6" SOObjID="1867"/>
   </location>
</UDM>

Open in new window

0
Comment
Question by:liamose
  • 5
  • 4
10 Comments
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24799698
One approach is to just load the whole XML file into an Oracle XMLTYPE or CLOB field and use Oracle's XML API to translate it.

See if this previous question helps:

http://www.experts-exchange.com/Database/Oracle/Q_24401746.html
0
 

Author Comment

by:liamose
ID: 24799817
Hi Mrjoltcola,

That is a useful option alright and that article has a few good insights for my issue. Thanks.

However, i've been requested to provide a solution which would first parse this to plain text first, before ever going near Oracle, if its possible.    

This task will have to performed daily on a ~5mb file, however i'll have an initial load of ~12gb to deal with first and i guess Loading it to an XMLTYPE field is probably to most sensible for this.
0
 
LVL 39

Expert Comment

by:Adam314
ID: 24801240
How do you want the output file formatted?  There are several modules in perl that could be used to parse the XML and convert it to csv.

One of the easiest to use is XML::Simple.  This will load the entire XML file into memory though.  If each XML file you need to load is not very large (5MB would be fine), then this module will likely work well.
http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm

If any one XML file is to large to load into memory, the XML::LibXML module works well.  There are several interfaces it provides.  Which will work best here would depend on how you want the output to look.
http://search.cpan.org/~pajas/XML-LibXML-1.69/LibXML.pod
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 

Author Comment

by:liamose
ID: 24809707
Hi Adam,

Thanks for your reply.

I want to parse the XML into either a comma or tab delimited text file.

The XML::Simple module seems appropriate for processing my 5mb daily file.  Given the snippet of XML i provided in my question, do you know if that's easily digested by the XML:Simple module?
0
 
LVL 39

Expert Comment

by:Adam314
ID: 24810830
Yes, the XML you have is very easily parsed with XML::Simple.  This code snipped is all you need:
    use XML::Simple;    #Put at the top of your script
    my $xml = XMLin('filename.xml');    #Use your actual filename
That's it.  You can then access everything in your XML file through the $xml variable.  There are some options to the XMLin function that might be needed, depending on your exact needs.

For the XML sample you gave, how should the CSV file look?  Would each location tag be a row, with the values from the attr tags the fields? Meaning two rows of 39 fields?  Or some other way?

Here is an example program that works that way - each location tag is a row, with several of the attr tags, identified by their UDMID attribute used as fields of the row.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
 
my $xml = XMLin('file1.xml', keyattr => {location => 'id', attr => 'UDMID'}, forcearray => ['location']);
my @UDMIDs = ('27775', '25489', '28477', '28479');
 
print join(",", @UDMIDs) . "\n";
foreach my $location (@{$xml->{location}}) {
	print join(",", map {$location->{attr}->{$_}->{value}} @UDMIDs) . "\n";
}

Open in new window

0
 

Author Comment

by:liamose
ID: 24836620
Hi Adam,

Thanks for your reply. This is exactly what i needed - with each location tag be a row, with the values from the attr tags the fields.

However, when i ran your script, i got the following error:



27775,25489,28477,28479
STN_1866,,F,VPR
Use of uninitialized value in join or string at adam.pl line 13.
Use of uninitialized value in join or string at adam.pl line 13.
Use of uninitialized value in join or string at adam.pl line 13.
Use of uninitialized value in join or string at adam.pl line 13
0
 
LVL 39

Expert Comment

by:Adam314
ID: 24843100
The "Use of uninitialized value..." means that the field does not have a value.  It is a warning.  You can skip these warnings with this:

print join(",", map {$location->{attr}->{$_}->{value} or ""} @UDMIDs) . "\n";

Open in new window

0
 

Author Comment

by:liamose
ID: 24845678
Thanks Adam.

I have just one last question i think.  I've modified the script slightly to access the data i need (see snippet) but i also need to access the data (name, type, status, level & UPK) at the start of the location tag and include it in each of the records

name="ServiceLocation_PTVFTTP_1" type="Network Structure" status="1760030082" level="1" UPK="alias2_PTVFTTP_1


#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
 
open (MYCSVFILE, '>>output.csv');
 
my $xml = XMLin('sample.xml', keyattr => {location => 'id', attr => 'name'}, forcearray => ['location']);
 
my @name = ('fullName', 'relativeName', 'alias1', 'alias2', 'objectID', 'subType', 'subStatus', 'createdDate', 'lastModifiedDate',
  'address', 'townCity', 'province', 'zip', 'responsible', 'telephone', 'fax', 'address1', 'address2', 'addresss3',
  'functionalStatus', 'UNIT_NUMBER', 'FLOOR', 'PROPERTY_NUMBER', 'STREET_NAME', 'STREET_DESCR', 'DISTRIBUTION_AREA',
  'EXCHANGE_AREA', 'DATE_SERVICEABLE', 'SYSTEM_CODE', 'BUILDINGID', 'ACTION', 'CABLE_LOCATION', 'LEADIN_LOCATION',
  'HEADEND', 'NODE', 'HUB', 'AMPLIFIER', 'POWER', 'NODEABBREVIATION' );
 
print MYCSVFILE join(",", @name) . "\n";
 
foreach my $location (@{$xml->{location}})
 { print MYCSVFILE join(",", map {$location->{attr}->{$_}->{value} or ""} @name) . "\n"; }
 
close (MYCSVFILE);

Open in new window

0
 
LVL 39

Accepted Solution

by:
Adam314 earned 500 total points
ID: 24846327

...
my @locinfo = ('name', 'type', 'status', 'level', 'UPK');
my @name = ('fullName', 'relativeName', 'alias1', 'alias2', 'objectID', 'subType', 'subStatus', 'createdDate', 'lastModifiedDate',
  'address', 'townCity', 'province', 'zip', 'responsible', 'telephone', 'fax', 'address1', 'address2', 'addresss3',
  'functionalStatus', 'UNIT_NUMBER', 'FLOOR', 'PROPERTY_NUMBER', 'STREET_NAME', 'STREET_DESCR', 'DISTRIBUTION_AREA',
  'EXCHANGE_AREA', 'DATE_SERVICEABLE', 'SYSTEM_CODE', 'BUILDINGID', 'ACTION', 'CABLE_LOCATION', 'LEADIN_LOCATION',
  'HEADEND', 'NODE', 'HUB', 'AMPLIFIER', 'POWER', 'NODEABBREVIATION' );
 
print MYCSVFILE join(",", @locinfo, @name) . "\n";
 
foreach my $location (@{$xml->{location}}) {
    print MYCSVFILE join(",", map({$location->{$_} or ""} @locinfo), map({$location->{attr}->{$_}->{value} or ""} @name)) . "\n";
}
...

Open in new window

0
 

Author Closing Comment

by:liamose
ID: 31603140
Hi Adam,
Thanks a million. Thats worked perfectly for me.
Liam.
0

Featured Post

Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
oracle 11g 23 73
Issue with XSLT mapping 10 35
Help on model clause 5 27
Internal Server Error on 1&1 Apache Server running PERL 3 15
I remember the day when someone asked me to create a user for an application developement. The user should be able to create views and materialized views and, so, I used the following syntax: (CODE) This way, I guessed, I would ensure that use…
This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
Via a live example show how to connect to RMAN, make basic configuration settings changes and then take a backup of a demo database
This video shows how to Export data from an Oracle database using the Original Export Utility.  The corresponding Import utility, which works the same way is referenced, but not demonstrated.

815 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now