Link to home
Start Free TrialLog in
Avatar of sunny82
sunny82

asked on

Converting an XML file to a pipe-delimited file taking long time

Hi,

I am trying to load an xml file to Oracle database. I am first converting the xml to a flat file (pipe-delimited) and then using sql loader to load it to oracle database. The problem is when the xml file has large no of records, it is taking long time just to generate the flat file. For e.g. if the xml has 50000 records, it is taking about 27 mins to genearte the flat file.

Is there any issue with the code? Or how can I make it faster? Or Any alternate approach to loading xml data to oracle table?

Here is the code I am using
-----------------------------------------------------

#!/export/local/bin/perl

use warnings;
use strict;
use XML::XPath;

my($xp) = XML::XPath->new( join('', Record.xml') );
my(@records) = $xp->findnodes( '/Records/Record' );
my($firstTime) = 0;

foreach my $record ( @records ) {
my(@fields) = $xp->find( './child::*', $record )->get_nodelist();
unless ( $firstTime++ ) {
print( join( '|', map { $_->getName() } @fields ), "\n");
}

print( join( '|', map { $_->string_value() } @fields ), "\n");

------------------------------------------------------
SOLUTION
Avatar of parparov
parparov
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunny82
sunny82

ASKER

So can you help me with a better solution? maybe using some other modules which can process large xml files very fast? How about using XML::XSLT or XML::LibXML.

I have no idea which modules can be fast in this case.
Did you try print optimization first?

Also, monitor memory usage by your script. If you see it swelling into swap area, that may be the key point we'll try to address.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Also, for a flat list of records I would consider using XML::Simple.
Then slurping everything at once will result in a 50K-member list, each member being a hash which can't be too large (it's DB row).

http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm
SOLUTION
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sunny82

ASKER

Thanks everyone for the answers.

@parparov,@bounsy: you were right, XML::XPath was the problem. I did not try the print optimization though.

I found one module XML::LibXSLTwhich uses a stylesheet to convert it into a csv. 50K rows got converted into a pipe delimited file within 2 mins by applying the stylesheet. I have not tried with XML::Simple though.

@slightw v: I tried the CLOB solution by creating an XMLType column in the table. I created an Oracle stored procedure which loads data into the table. But I could not view the contents of that column after running the procedure as it throws me an error "OCITypeBinder - Cannot convert to Stream". On doing some research, I found that this was a common bug in older DB Artisan and Rapid SQL versions, which they have corrected later. (I am using the old RapidSQL7.6.4 btw). May be you can also throw some more light in this regard if you have encountered anything like this.

Here is the proc code I wrote -->

--------------------------------------------
CREATE OR REPLACE PROCEDURE           LOAD_XML
(  
  p_dir IN VARCHAR2,                                    
  p_filename  IN  VARCHAR2) AS
  l_bfile  BFILE := BFILENAME(p_dir, p_filename);
  l_clob   CLOB;
BEGIN
  DBMS_LOB.createtemporary (l_clob, TRUE);
 
  DBMS_LOB.fileopen(l_bfile, DBMS_LOB.file_readonly);
  DBMS_LOB.loadfromfile(l_clob, l_bfile, DBMS_LOB.getlength(l_bfile));
  DBMS_LOB.fileclose(l_bfile);
  INSERT INTO xml_tab (
    id,
    filename,
    xml
  )
  VALUES (
    xml_tab_seq.NEXTVAL,
    p_filename,
    XMLTYPE.createXML(l_clob)
  );
  COMMIT;
 
  DBMS_LOB.freetemporary (l_clob);
END;
-------------------------------------------------

After running the procedure by passing the parameters, when I am trying to do

select xml from xml_tab, I am getting that OCITypeBinder Error.

Regarding loading XML file into a table by using xml tags as delimiters through RapidSQL, I would love to try that solution which you provided.
Avatar of sunny82

ASKER

Correction, my last comment will be

"Regarding loading XML file into a table by using xml tags as delimiters through SQL Loader, I would love to try that solution which you provided."
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial