sunny82
asked on
Converting an XML file to a pipe-delimited file taking long time
Hi,
I am trying to load an xml file to Oracle database. I am first converting the xml to a flat file (pipe-delimited) and then using sql loader to load it to oracle database. The problem is when the xml file has large no of records, it is taking long time just to generate the flat file. For e.g. if the xml has 50000 records, it is taking about 27 mins to genearte the flat file.
Is there any issue with the code? Or how can I make it faster? Or Any alternate approach to loading xml data to oracle table?
Here is the code I am using
-------------------------- ---------- ---------- -------
#!/export/local/bin/perl
use warnings;
use strict;
use XML::XPath;
my($xp) = XML::XPath->new( join('', Record.xml') );
my(@records) = $xp->findnodes( '/Records/Record' );
my($firstTime) = 0;
foreach my $record ( @records ) {
my(@fields) = $xp->find( './child::*', $record )->get_nodelist();
unless ( $firstTime++ ) {
print( join( '|', map { $_->getName() } @fields ), "\n");
}
print( join( '|', map { $_->string_value() } @fields ), "\n");
-------------------------- ---------- ---------- --------
I am trying to load an xml file to Oracle database. I am first converting the xml to a flat file (pipe-delimited) and then using sql loader to load it to oracle database. The problem is when the xml file has large no of records, it is taking long time just to generate the flat file. For e.g. if the xml has 50000 records, it is taking about 27 mins to genearte the flat file.
Is there any issue with the code? Or how can I make it faster? Or Any alternate approach to loading xml data to oracle table?
Here is the code I am using
--------------------------
#!/export/local/bin/perl
use warnings;
use strict;
use XML::XPath;
my($xp) = XML::XPath->new( join('', Record.xml') );
my(@records) = $xp->findnodes( '/Records/Record' );
my($firstTime) = 0;
foreach my $record ( @records ) {
my(@fields) = $xp->find( './child::*', $record )->get_nodelist();
unless ( $firstTime++ ) {
print( join( '|', map { $_->getName() } @fields ), "\n");
}
print( join( '|', map { $_->string_value() } @fields ), "\n");
--------------------------
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Did you try print optimization first?
Also, monitor memory usage by your script. If you see it swelling into swap area, that may be the key point we'll try to address.
Also, monitor memory usage by your script. If you see it swelling into swap area, that may be the key point we'll try to address.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Also, for a flat list of records I would consider using XML::Simple.
Then slurping everything at once will result in a 50K-member list, each member being a hash which can't be too large (it's DB row).
http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm
Then slurping everything at once will result in a 50K-member list, each member being a hash which can't be too large (it's DB row).
http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks everyone for the answers.
@parparov,@bounsy: you were right, XML::XPath was the problem. I did not try the print optimization though.
I found one module XML::LibXSLTwhich uses a stylesheet to convert it into a csv. 50K rows got converted into a pipe delimited file within 2 mins by applying the stylesheet. I have not tried with XML::Simple though.
@slightw v: I tried the CLOB solution by creating an XMLType column in the table. I created an Oracle stored procedure which loads data into the table. But I could not view the contents of that column after running the procedure as it throws me an error "OCITypeBinder - Cannot convert to Stream". On doing some research, I found that this was a common bug in older DB Artisan and Rapid SQL versions, which they have corrected later. (I am using the old RapidSQL7.6.4 btw). May be you can also throw some more light in this regard if you have encountered anything like this.
Here is the proc code I wrote -->
-------------------------- ---------- --------
CREATE OR REPLACE PROCEDURE LOAD_XML
(
p_dir IN VARCHAR2,
p_filename IN VARCHAR2) AS
l_bfile BFILE := BFILENAME(p_dir, p_filename);
l_clob CLOB;
BEGIN
DBMS_LOB.createtemporary (l_clob, TRUE);
DBMS_LOB.fileopen(l_bfile, DBMS_LOB.file_readonly);
DBMS_LOB.loadfromfile(l_cl ob, l_bfile, DBMS_LOB.getlength(l_bfile ));
DBMS_LOB.fileclose(l_bfile );
INSERT INTO xml_tab (
id,
filename,
xml
)
VALUES (
xml_tab_seq.NEXTVAL,
p_filename,
XMLTYPE.createXML(l_clob)
);
COMMIT;
DBMS_LOB.freetemporary (l_clob);
END;
-------------------------- ---------- ---------- ---
After running the procedure by passing the parameters, when I am trying to do
select xml from xml_tab, I am getting that OCITypeBinder Error.
Regarding loading XML file into a table by using xml tags as delimiters through RapidSQL, I would love to try that solution which you provided.
@parparov,@bounsy: you were right, XML::XPath was the problem. I did not try the print optimization though.
I found one module XML::LibXSLTwhich uses a stylesheet to convert it into a csv. 50K rows got converted into a pipe delimited file within 2 mins by applying the stylesheet. I have not tried with XML::Simple though.
@slightw v: I tried the CLOB solution by creating an XMLType column in the table. I created an Oracle stored procedure which loads data into the table. But I could not view the contents of that column after running the procedure as it throws me an error "OCITypeBinder - Cannot convert to Stream". On doing some research, I found that this was a common bug in older DB Artisan and Rapid SQL versions, which they have corrected later. (I am using the old RapidSQL7.6.4 btw). May be you can also throw some more light in this regard if you have encountered anything like this.
Here is the proc code I wrote -->
--------------------------
CREATE OR REPLACE PROCEDURE LOAD_XML
(
p_dir IN VARCHAR2,
p_filename IN VARCHAR2) AS
l_bfile BFILE := BFILENAME(p_dir, p_filename);
l_clob CLOB;
BEGIN
DBMS_LOB.createtemporary (l_clob, TRUE);
DBMS_LOB.fileopen(l_bfile,
DBMS_LOB.loadfromfile(l_cl
DBMS_LOB.fileclose(l_bfile
INSERT INTO xml_tab (
id,
filename,
xml
)
VALUES (
xml_tab_seq.NEXTVAL,
p_filename,
XMLTYPE.createXML(l_clob)
);
COMMIT;
DBMS_LOB.freetemporary (l_clob);
END;
--------------------------
After running the procedure by passing the parameters, when I am trying to do
select xml from xml_tab, I am getting that OCITypeBinder Error.
Regarding loading XML file into a table by using xml tags as delimiters through RapidSQL, I would love to try that solution which you provided.
ASKER
Correction, my last comment will be
"Regarding loading XML file into a table by using xml tags as delimiters through SQL Loader, I would love to try that solution which you provided."
"Regarding loading XML file into a table by using xml tags as delimiters through SQL Loader, I would love to try that solution which you provided."
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I have no idea which modules can be fast in this case.