Solved

XQuery performance - is unordered the answer

Posted on 2011-09-15
4
303 Views
Last Modified: 2013-11-11
I have an XQuery performance question I hope someone can assist with.

The code below is working fine but I would like to improve performance if possible.
What it is doing is ...
      -  getting all the distinct values of the prodname attribute found in the hits
then
  -  works out how many times each distinct value occurs in the hits
  -  returns those distinct values in order along with the total for each

I sometimes have up to 12000 items in $hits so the whole process can take a while (longer than I would like it too anyway).

I have read that using unordered expressions/functions can have a significant improvement on performance.
So, my question is, is there a way of improving the performance of the above - using unordered or any other way - and what coding changes would need to be made?
I would still need to keep the "order by $d" line 5 in order to keep the distinct values in alpha order for the return

let $tempResult := 
  	for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
	    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]      (: all the hits where prodname attribute has value of $d :)
			order by $d
			return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>

Open in new window

0
Comment
Question by:Letterpart
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 60

Assisted Solution

by:Geert Bormans
Geert Bormans earned 250 total points
ID: 36547626
order is not at all the problem in this XQuery.
Walking the ancestor axis is unnecessary slow here
and distinct-values can be slow on a large set
The combination is killing
 Why don't you make an intermediate variable before calculating the distinct values?
let $p := $hits/ancestor-or-self::DOCUMENT/@prodname
for $d := distinct-values($p)

I would even have a list of all possible prodnames in the entire set (would be too many depending on XML) but you need to calculate disting values only ones
let $p := //DOCUMENT/@prodname
let $d := distinct-values($p)
let $tempResult :=
          let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]
...

Further optimisation requires me to see the actual XML and a hint on where the hits are

optimising XQuery often is dependent on the database + processor or standalone processor,
so, some hints on the XML plus a notion on the products you use can help me to give a better advice
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 250 total points
ID: 36547726
Where / how are you running the xquery ?

The problem as I see it is going down to the individual products and doing the count. It becomes more like "row by row" processing.
 
Might not be avoidable either. You want product (name) and the count, so, your approach is not unreasonable.

But gertone is quite correct in saying getting the distinct values can be fairly intense. I would also say doing so with the order by on a very large collection will be quite intense. So, I am not quite as relaxed about the Order By...

In some processors it is possible to create indexes on XML elements and that can dramatically improve respone, then again, it should also be possible to extract, group and count depending on the xquery processor you are using (for example SQL server).



0
 
LVL 1

Author Closing Comment

by:Letterpart
ID: 36559373
Thanks very much for your help.
Further testing showed me that the bottleneck was the queries of $hits which occurred distinct-values of @prodname times - getting the distinct-values was OK.
I was able to create an index on the @prodname attribute and that dramatically improved performance.

cheers,
Paul
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36559469
welcome Paul
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

The Problem How to write an Xquery that works like a SQL outer join, providing placeholders for absent data on the outer side?  I give a bit more background at the end. The situation expressed as relational data Let’s work through this.  I’ve …
Browsing the questions asked to the Experts of this forum, you will be amazed to see how many times people are headaching about monster regular expressions (regex) to select that specific part of some HTML or XML file they want to extract. The examp…
Nobody understands Phishing better than an anti-spam company. That’s why we are providing Phishing Awareness Training to our customers. According to a report by Verizon, only 3% of targeted users report malicious emails to management. With compan…
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question