[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 326
  • Last Modified:

XQuery performance - is unordered the answer

I have an XQuery performance question I hope someone can assist with.

The code below is working fine but I would like to improve performance if possible.
What it is doing is ...
      -  getting all the distinct values of the prodname attribute found in the hits
then
  -  works out how many times each distinct value occurs in the hits
  -  returns those distinct values in order along with the total for each

I sometimes have up to 12000 items in $hits so the whole process can take a while (longer than I would like it too anyway).

I have read that using unordered expressions/functions can have a significant improvement on performance.
So, my question is, is there a way of improving the performance of the above - using unordered or any other way - and what coding changes would need to be made?
I would still need to keep the "order by $d" line 5 in order to keep the distinct values in alpha order for the return

let $tempResult := 
  	for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
	    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]      (: all the hits where prodname attribute has value of $d :)
			order by $d
			return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>

Open in new window

0
Letterpart
Asked:
Letterpart
  • 2
2 Solutions
 
Geert BormansInformation ArchitectCommented:
order is not at all the problem in this XQuery.
Walking the ancestor axis is unnecessary slow here
and distinct-values can be slow on a large set
The combination is killing
 Why don't you make an intermediate variable before calculating the distinct values?
let $p := $hits/ancestor-or-self::DOCUMENT/@prodname
for $d := distinct-values($p)

I would even have a list of all possible prodnames in the entire set (would be too many depending on XML) but you need to calculate disting values only ones
let $p := //DOCUMENT/@prodname
let $d := distinct-values($p)
let $tempResult :=
          let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]
...

Further optimisation requires me to see the actual XML and a hint on where the hits are

optimising XQuery often is dependent on the database + processor or standalone processor,
so, some hints on the XML plus a notion on the products you use can help me to give a better advice
0
 
Mark WillsTopic AdvisorCommented:
Where / how are you running the xquery ?

The problem as I see it is going down to the individual products and doing the count. It becomes more like "row by row" processing.
 
Might not be avoidable either. You want product (name) and the count, so, your approach is not unreasonable.

But gertone is quite correct in saying getting the distinct values can be fairly intense. I would also say doing so with the order by on a very large collection will be quite intense. So, I am not quite as relaxed about the Order By...

In some processors it is possible to create indexes on XML elements and that can dramatically improve respone, then again, it should also be possible to extract, group and count depending on the xquery processor you are using (for example SQL server).



0
 
LetterpartAuthor Commented:
Thanks very much for your help.
Further testing showed me that the bottleneck was the queries of $hits which occurred distinct-values of @prodname times - getting the distinct-values was OK.
I was able to create an index on the @prodname attribute and that dramatically improved performance.

cheers,
Paul
0
 
Geert BormansInformation ArchitectCommented:
welcome Paul
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now