Solved

XQuery performance - is unordered the answer

Posted on 2011-09-15
4
305 Views
Last Modified: 2013-11-11
I have an XQuery performance question I hope someone can assist with.

The code below is working fine but I would like to improve performance if possible.
What it is doing is ...
      -  getting all the distinct values of the prodname attribute found in the hits
then
  -  works out how many times each distinct value occurs in the hits
  -  returns those distinct values in order along with the total for each

I sometimes have up to 12000 items in $hits so the whole process can take a while (longer than I would like it too anyway).

I have read that using unordered expressions/functions can have a significant improvement on performance.
So, my question is, is there a way of improving the performance of the above - using unordered or any other way - and what coding changes would need to be made?
I would still need to keep the "order by $d" line 5 in order to keep the distinct values in alpha order for the return

let $tempResult := 
  	for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
	    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]      (: all the hits where prodname attribute has value of $d :)
			order by $d
			return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>

Open in new window

0
Comment
Question by:Letterpart
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 60

Assisted Solution

by:Geert Bormans
Geert Bormans earned 250 total points
ID: 36547626
order is not at all the problem in this XQuery.
Walking the ancestor axis is unnecessary slow here
and distinct-values can be slow on a large set
The combination is killing
 Why don't you make an intermediate variable before calculating the distinct values?
let $p := $hits/ancestor-or-self::DOCUMENT/@prodname
for $d := distinct-values($p)

I would even have a list of all possible prodnames in the entire set (would be too many depending on XML) but you need to calculate disting values only ones
let $p := //DOCUMENT/@prodname
let $d := distinct-values($p)
let $tempResult :=
          let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]
...

Further optimisation requires me to see the actual XML and a hint on where the hits are

optimising XQuery often is dependent on the database + processor or standalone processor,
so, some hints on the XML plus a notion on the products you use can help me to give a better advice
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 250 total points
ID: 36547726
Where / how are you running the xquery ?

The problem as I see it is going down to the individual products and doing the count. It becomes more like "row by row" processing.
 
Might not be avoidable either. You want product (name) and the count, so, your approach is not unreasonable.

But gertone is quite correct in saying getting the distinct values can be fairly intense. I would also say doing so with the order by on a very large collection will be quite intense. So, I am not quite as relaxed about the Order By...

In some processors it is possible to create indexes on XML elements and that can dramatically improve respone, then again, it should also be possible to extract, group and count depending on the xquery processor you are using (for example SQL server).



0
 
LVL 1

Author Closing Comment

by:Letterpart
ID: 36559373
Thanks very much for your help.
Further testing showed me that the bottleneck was the queries of $hits which occurred distinct-values of @prodname times - getting the distinct-values was OK.
I was able to create an index on the @prodname attribute and that dramatically improved performance.

cheers,
Paul
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36559469
welcome Paul
0

Featured Post

MIM Survival Guide for Service Desk Managers

Major incidents can send mastered service desk processes into disorder. Systems and tools produce the data needed to resolve these incidents, but your challenge is getting that information to the right people fast. Check out the Survival Guide and begin bringing order to chaos.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
Create a Windows 10 custom Image with custom task bar and custom start menu using XML for deployment.
Monitoring a network: how to monitor network services and why? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the philosophy behind service monitoring and why a handshake validation is critical in network monitoring. Software utilized …
If you’ve ever visited a web page and noticed a cool font that you really liked the look of, but couldn’t figure out which font it was so that you could use it for your own work, then this video is for you! In this Micro Tutorial, you'll learn yo…

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question