?
Solved

XQuery performance - is unordered the answer

Posted on 2011-09-15
4
Medium Priority
?
313 Views
Last Modified: 2013-11-11
I have an XQuery performance question I hope someone can assist with.

The code below is working fine but I would like to improve performance if possible.
What it is doing is ...
      -  getting all the distinct values of the prodname attribute found in the hits
then
  -  works out how many times each distinct value occurs in the hits
  -  returns those distinct values in order along with the total for each

I sometimes have up to 12000 items in $hits so the whole process can take a while (longer than I would like it too anyway).

I have read that using unordered expressions/functions can have a significant improvement on performance.
So, my question is, is there a way of improving the performance of the above - using unordered or any other way - and what coding changes would need to be made?
I would still need to keep the "order by $d" line 5 in order to keep the distinct values in alpha order for the return

let $tempResult := 
  	for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
	    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]      (: all the hits where prodname attribute has value of $d :)
			order by $d
			return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>

Open in new window

0
Comment
Question by:Letterpart
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 60

Assisted Solution

by:Geert Bormans
Geert Bormans earned 1000 total points
ID: 36547626
order is not at all the problem in this XQuery.
Walking the ancestor axis is unnecessary slow here
and distinct-values can be slow on a large set
The combination is killing
 Why don't you make an intermediate variable before calculating the distinct values?
let $p := $hits/ancestor-or-self::DOCUMENT/@prodname
for $d := distinct-values($p)

I would even have a list of all possible prodnames in the entire set (would be too many depending on XML) but you need to calculate disting values only ones
let $p := //DOCUMENT/@prodname
let $d := distinct-values($p)
let $tempResult :=
          let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]
...

Further optimisation requires me to see the actual XML and a hint on where the hits are

optimising XQuery often is dependent on the database + processor or standalone processor,
so, some hints on the XML plus a notion on the products you use can help me to give a better advice
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 1000 total points
ID: 36547726
Where / how are you running the xquery ?

The problem as I see it is going down to the individual products and doing the count. It becomes more like "row by row" processing.
 
Might not be avoidable either. You want product (name) and the count, so, your approach is not unreasonable.

But gertone is quite correct in saying getting the distinct values can be fairly intense. I would also say doing so with the order by on a very large collection will be quite intense. So, I am not quite as relaxed about the Order By...

In some processors it is possible to create indexes on XML elements and that can dramatically improve respone, then again, it should also be possible to extract, group and count depending on the xquery processor you are using (for example SQL server).



0
 
LVL 1

Author Closing Comment

by:Letterpart
ID: 36559373
Thanks very much for your help.
Further testing showed me that the bottleneck was the queries of $hits which occurred distinct-values of @prodname times - getting the distinct-values was OK.
I was able to create an index on the @prodname attribute and that dramatically improved performance.

cheers,
Paul
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36559469
welcome Paul
0

Featured Post

Quiz: What Do These Organizations Have In Common?

Hint: Their teams ended up taking quizzes, too.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction In my previous article (http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/A_9150-Loading-XML-Using-SSIS.html) I showed you how the XML Source component can be used to load XML files into a SQL Server database, us…
I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

718 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question