raymurphy

asked on

JOIN performance problem - resolved but need to understand how ..

Original problem query :

SELECT  a.gbco, a.gbmcu,
        SUM(- a.AP12_sales + a.AP11_sales_advance + a.AP12_sales_arrears) AS [Net Sales],
        a.gbco + N'_' + a.Departure AS [Sort Code],
        SUM(b.AP1_sales + b.AP2_sales) AS Expr1, a.Departure, a.OCT_PREFIX
FROM    SalesFigures1 a LEFT OUTER JOIN
        SalesFigures2 b ON LTRIM(a.gbmcu) = LTRIM(b.gbmcu)
WHERE   (a.Departure = N'2010.12')
GROUP BY a.gbco, a.gbmcu, a.gbco + N'_' + a.Departure, a.Departure, a.OCT_PREFIX
HAVING      (SUM(- a.AP12_sales + a.AP11_sales_advance + a.AP12_sales_arrears) <> 0)
            AND (SUM(a.AP12_Purchases + a.AP12_PO) = 0)
            AND (SUM(b.AP1_Purchases + b.AP2_Purchases + b.AP2_PO) = 0)

SalesFigures1 has 3,857,691 rows
SalesFigures2 has 1,122,659 rows

The customer reported that the original query was just sitting forever without returning any rows. I didn't have any access to this customer's box, so I couldn't check the query execution plan or run Profiler etc.

I established with the customer that there is no index on Departure and no index on gbmcu, so my initial step was to advise them to add an index on Departure and an index on gbmcu. However, the customer would have to involve their DBA resources for this, and they weren't immediately available.
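
For reference, the index advice above might look roughly like the following. This is only a sketch: the index names are hypothetical, and it assumes gbmcu would be indexed on both tables, which the post does not spell out.

```sql
-- Hypothetical index names; table and column names come from the query above.
-- Supports the WHERE a.Departure = N'2010.12' filter.
CREATE NONCLUSTERED INDEX IX_SalesFigures1_Departure
    ON SalesFigures1 (Departure);

-- Support the join on gbmcu (assumed here to be wanted on both tables).
CREATE NONCLUSTERED INDEX IX_SalesFigures1_gbmcu
    ON SalesFigures1 (gbmcu);

CREATE NONCLUSTERED INDEX IX_SalesFigures2_gbmcu
    ON SalesFigures2 (gbmcu);
```

Note that an index on gbmcu can only be used for seeks if the join compares the bare column, not LTRIM(gbmcu).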

Given that, I wondered if using LTRIM in the JOIN might be an issue, so I suggested
that the customer remove it. Doing that had an immediate effect: the query ran and returned
the expected results in around 40 seconds.

I'm just trying to understand how the action of JUST removing the LTRIM in the JOIN on this
query had such an immediate and significant impact. Any suggestions/explanations ?
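
One way to investigate without waiting for the slow query to finish is to ask SQL Server for the estimated plan of each variant and compare the join operators it chooses. The sketch below assumes the customer can run it from a client that understands the GO batch separator (such as SSMS or sqlcmd). The SELECT lists are cut down for brevity, so the plans may differ in detail from those of the full query.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
-- While it is ON, statements are compiled but not executed.
SET SHOWPLAN_TEXT ON;
GO

-- Variant 1: join on LTRIM of both columns (the slow form).
SELECT a.gbmcu, b.gbmcu
FROM   SalesFigures1 a LEFT OUTER JOIN
       SalesFigures2 b ON LTRIM(a.gbmcu) = LTRIM(b.gbmcu)
WHERE  a.Departure = N'2010.12';
GO

-- Variant 2: join on the bare columns (the fast form).
SELECT a.gbmcu, b.gbmcu
FROM   SalesFigures1 a LEFT OUTER JOIN
       SalesFigures2 b ON a.gbmcu = b.gbmcu
WHERE  a.Departure = N'2010.12';
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Differences in the join operator (for example a nested loops join versus a hash match) and in the estimated row counts would be the first things to look at.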
ASKER CERTIFIED SOLUTION
Guy Hengel [angelIII / a3]
This solution is only available to members.
raymurphy

ASKER

Thanks for the prompt reply,  angelIII .... The gbmcu field is varchar(12) on both tables, and contains values such as '  AA7F013552' (i.e. two leading spaces).

Before I got the customer to amend their problem query (i.e. removing the original LTRIM), I got them to run the following queries as a precautionary check :

SELECT     a.gbmcu AS gbmcuSF1, b.gbmcu AS gbmcuSF2
FROM       SalesFigures1 a JOIN SalesFigures2 b
ON LTRIM(a.gbmcu) = LTRIM(b.gbmcu)

SELECT     a.gbmcu AS gbmcuSF1, b.gbmcu AS gbmcuSF2
FROM       SalesFigures1 a JOIN SalesFigures2 b
ON (a.gbmcu) = (b.gbmcu)

Both queries returned the same number of rows (669220), which reassured me that I would be OK to remove the LTRIM from the original problem query. So I'm still wondering exactly why, given that the customer has no index on the gbmcu column, removing the LTRIM from the original problem query allowed it to run through OK, whereas with the LTRIM in the JOIN the original problem query was just sitting forever without returning any rows.
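
The row-count comparison is a sound check: any pair of rows that matches on the bare column also matches after LTRIM, so equal counts mean the two joins return the same pairs. A cheaper complementary check, sketched below, scans each table once and shows how many leading spaces the gbmcu values carry. The column aliases are made up for illustration.

```sql
-- LEN ignores trailing spaces, so LEN(x) - LEN(LTRIM(x)) is the number
-- of leading spaces in x.
SELECT 'SalesFigures1' AS source_table,
       LEN(gbmcu) - LEN(LTRIM(gbmcu)) AS leading_spaces,
       COUNT(*) AS row_count
FROM   SalesFigures1
GROUP BY LEN(gbmcu) - LEN(LTRIM(gbmcu))
UNION ALL
SELECT 'SalesFigures2',
       LEN(gbmcu) - LEN(LTRIM(gbmcu)),
       COUNT(*)
FROM   SalesFigures2
GROUP BY LEN(gbmcu) - LEN(LTRIM(gbmcu))
ORDER BY source_table, leading_spaces;
```

If every row in both tables reports the same number of leading spaces, the LTRIM cannot change which rows match.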
SOLUTION
The change to the original problem query, vandalesm, wasn't to remove the LEFT OUTER JOIN but just to change the ON LTRIM(a.gbmcu) = LTRIM(b.gbmcu) in the LEFT OUTER JOIN so that it became
ON (a.gbmcu) = (b.gbmcu)  .....
Hi,

The reason is very simple.

An index holds the data for your column as it is stored in the table. When you apply a function to the column, you render the indexes useless, since the index doesn't hold the data as it looks after the function has manipulated it. (Oracle, for example, has what's called a function-based index: an index created on the result of applying a function to a column, thus storing the manipulated data and the RID.)
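
In SQL Server, the nearest counterpart to a function-based index is an index on a computed column. The following is a minimal sketch, assuming the trimmed value really is what should be joined on. The column and index names are hypothetical, and the same would be needed on SalesFigures1.

```sql
-- Hypothetical column and index names.
-- LTRIM is deterministic, so the computed column can be persisted and indexed.
ALTER TABLE SalesFigures2
    ADD gbmcu_trimmed AS LTRIM(gbmcu) PERSISTED;
GO

CREATE NONCLUSTERED INDEX IX_SalesFigures2_gbmcu_trimmed
    ON SalesFigures2 (gbmcu_trimmed);
GO
```

With a matching column on both tables, the join could then be written as ON a.gbmcu_trimmed = b.gbmcu_trimmed, which compares stored values rather than calling a function on every row.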

You could consider using an indexed view to store the query results; the SQL Server optimizer will then use it automatically when the original query is executed. Bear in mind that this could have an impact on your DML operations.
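
As a rough illustration of the indexed view idea, the sketch below pre-aggregates SalesFigures2 by gbmcu. Several things here are assumptions: the dbo schema, the view and index names, and the use of ISNULL (an indexed view cannot SUM a nullable expression, and ISNULL treats each NULL as 0, which is not quite the same as the original SUM). Indexed views also cannot contain outer joins, so the original query could not be stored as-is, and this view is not a drop-in replacement for it.

```sql
-- Hypothetical names; dbo schema assumed. SCHEMABINDING requires
-- two-part table names and an explicit column list.
CREATE VIEW dbo.vSalesFigures2ByMcu
WITH SCHEMABINDING
AS
SELECT gbmcu,
       SUM(ISNULL(AP1_sales, 0) + ISNULL(AP2_sales, 0)) AS total_sales,
       COUNT_BIG(*) AS row_count   -- required when the view uses GROUP BY
FROM   dbo.SalesFigures2
GROUP BY gbmcu;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesFigures2ByMcu
    ON dbo.vSalesFigures2ByMcu (gbmcu);
GO
```

Automatic matching of queries to an indexed view is an Enterprise edition feature; on other editions the view has to be referenced directly with the NOEXPAND hint. Every insert, update, or delete on SalesFigures2 also has to maintain the view, which is the DML cost mentioned above.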