dplinnane
asked on
Improving performance of EXTRACTVALUE XML DB
I have posted many times here in the last few weeks. The main problem I am have is this select statement.
If I have a count of 5000 as above then I have to crate a loop from 1 to 500 where I pass in i . The bellow statement takes more then a second.
select EXTRACTVALUE(SYS_NC_ROWINF
from XMLTABLE
WHERE EXISTSNODE (SYS_NC_ROWINFO$, '/PDATA/EVH') = 1;
I will have to process 640 million recs in 8 hours possible? Files will arrive in no greater then 1.5 MB.
if the above takes 2 seconds.
What I retrieve is <EVH>1~2~3~4~5~6~7~8~9~10|
This has to be parse using || delim which represents 1 rec.
This has to be parsed 1~2~3~4~5~6~7~8~9~10 using ~ then inserted or update a table.
So far the EXTRACTVALUE is the bottle neck I really need to improve this. I just posted another question on how to register xsd to see if that will speed thins up.
I am also directly loading xml file into a clob data type.
I basically have two approaches load to xmltype table or direct from xml.
Because the xml file is fairly simlpe 3 tags 1 of which has embeded || and ~ does it make more sesne to build my own parse using instr to sepreate out hdr , evh and evd tags.
I am storing the parsed values in a nested table, is this the most efficient, maybe varray?
Could the temp values be stored in memory to avoid disk reads.
Can I use bulk loading techniques?
Also I am working on a database that was setup by someone else.
There there are only 4 tablespaces
system
sysaux
undo
temp
the system and sysaux are 99% full. This will slow processing down for sure.
The table and index are also use system table space, I can create new table space for table data and for index.
We do not have tablespaces on different disks. That could be an improvement placing tablespaces on diff disks.
Does the XDB schema default to system tablespace? Should that have its own tablespace?
All the code is on www.ihos-t.com/code
Lots of questions, in crisis mode at this time. I don't think it can be done but I want to get to a stage where at least its not dog slow.
Thanks in advance.
Instead of using extractvalue , did you try using XSL style sheet to transform your data ?
ASKER
I'm new to dealing xml so I don't know all the options how would I go about that?
One other suggestion ,try creating the following indexes and see if you see any improvements
CREATE INDEX XML_fn_index ON XMLTABLE e
(EXTRACTVALUE(VALUE(e),'/P DATA/EVH') );
CREATE INDEX xml_index1 ON xmltable(SYS_NC_ROWINFO$) indextype is ctxsys.CTXXPATH;
CREATE INDEX XML_fn_index ON XMLTABLE e
(EXTRACTVALUE(VALUE(e),'/P
CREATE INDEX xml_index1 ON xmltable(SYS_NC_ROWINFO$) indextype is ctxsys.CTXXPATH;
ASKER
FOr the first indes I get
ORA-19025: EXTRACTVALUE returns value of only one node
The second index I get
ORA-19102: XQuery string literal expected
ORA-19025: EXTRACTVALUE returns value of only one node
The second index I get
ORA-19102: XQuery string literal expected
Check this link for the create index
http://www.utexas.edu/its/unix/reference/oracledocs/v92/B10501_01/text.920/a96518/csql.htm
http://www.utexas.edu/its/unix/reference/oracledocs/v92/B10501_01/text.920/a96518/csql.htm
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I'm not a C programmer or Pro C* I'm looking for improvements in what I alreay have.
Did you try creating the index? You can create CTXSYS.CONTEXT index and see if there's any change in your query performance
ASKER
Had to write my own parser in the end , xml functionalit was way to slow. I will always be dealing with 4 tags. The performance gains were huge.
ASKER
DISK_READS BUFFER_GETS PARSE_CALLS USER_IO_WAIT_TIME ROWS_PROCESSED
11968 31173 3 4236831 5
144704 376860 3 67324519 66
144704 376860 3 68225328 66
for HDR, EVH and EVD
select EXTRACTVALUE(SYS_NC_ROWINF
from XMLTABLE
WHERE EXISTSNODE (SYS_NC_ROWINFO$, '/PDATA/EVH') = 1;