Solved

XQuery performance - is unordered the answer

Posted on 2011-09-15
4
299 Views
Last Modified: 2013-11-11
I have an XQuery performance question I hope someone can assist with.

The code below is working fine but I would like to improve performance if possible.
What it is doing is ...
      -  getting all the distinct values of the prodname attribute found in the hits
then
  -  works out how many times each distinct value occurs in the hits
  -  returns those distinct values in order along with the total for each

I sometimes have up to 12000 items in $hits so the whole process can take a while (longer than I would like it too anyway).

I have read that using unordered expressions/functions can have a significant improvement on performance.
So, my question is, is there a way of improving the performance of the above - using unordered or any other way - and what coding changes would need to be made?
I would still need to keep the "order by $d" line 5 in order to keep the distinct values in alpha order for the return

let $tempResult := 
  	for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
	    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]      (: all the hits where prodname attribute has value of $d :)
			order by $d
			return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>

Open in new window

0
Comment
Question by:Letterpart
  • 2
4 Comments
 
LVL 60

Assisted Solution

by:Geert Bormans
Geert Bormans earned 250 total points
ID: 36547626
order is not at all the problem in this XQuery.
Walking the ancestor axis is unnecessary slow here
and distinct-values can be slow on a large set
The combination is killing
 Why don't you make an intermediate variable before calculating the distinct values?
let $p := $hits/ancestor-or-self::DOCUMENT/@prodname
for $d := distinct-values($p)

I would even have a list of all possible prodnames in the entire set (would be too many depending on XML) but you need to calculate disting values only ones
let $p := //DOCUMENT/@prodname
let $d := distinct-values($p)
let $tempResult :=
          let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]
...

Further optimisation requires me to see the actual XML and a hint on where the hits are

optimising XQuery often is dependent on the database + processor or standalone processor,
so, some hints on the XML plus a notion on the products you use can help me to give a better advice
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 250 total points
ID: 36547726
Where / how are you running the xquery ?

The problem as I see it is going down to the individual products and doing the count. It becomes more like "row by row" processing.
 
Might not be avoidable either. You want product (name) and the count, so, your approach is not unreasonable.

But gertone is quite correct in saying getting the distinct values can be fairly intense. I would also say doing so with the order by on a very large collection will be quite intense. So, I am not quite as relaxed about the Order By...

In some processors it is possible to create indexes on XML elements and that can dramatically improve respone, then again, it should also be possible to extract, group and count depending on the xquery processor you are using (for example SQL server).



0
 
LVL 1

Author Closing Comment

by:Letterpart
ID: 36559373
Thanks very much for your help.
Further testing showed me that the bottleneck was the queries of $hits which occurred distinct-values of @prodname times - getting the distinct-values was OK.
I was able to create an index on the @prodname attribute and that dramatically improved performance.

cheers,
Paul
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36559469
welcome Paul
0

Featured Post

The New “Normal” in Modern Enterprise Operations

DevOps for the modern enterprise offers many benefits — increased agility, productivity, and more, but digital transformation isn’t easy, especially if you’re not addressing the right issues. Register for the webinar to dive into the “new normal” for enterprise modern ops.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
API Soap Calls 4 126
Need to take date/time from xml like: '10/12/2016 14:30' and place into a timestamp field 10 75
XML & .net 5 49
TSQL query to generate xml 4 46
Browsing the questions asked to the Experts of this forum, you will be amazed to see how many times people are headaching about monster regular expressions (regex) to select that specific part of some HTML or XML file they want to extract. The examp…
I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
Established in 1997, Technology Architects has become one of the most reputable technology solutions companies in the country. TA have been providing businesses with cost effective state-of-the-art solutions and unparalleled service that is designed…
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

820 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question