Solved

making an self join more efficient

Posted on 2013-10-25
32
446 Views
Last Modified: 2013-11-06
can you suggest how to make this self join more efficient?

there are 30 M records in the ABC table.


select
a.COL1,
a.COL2,
b.COL1,
b.COL2
from
ABC a
JOIN
ABC b
ON a.COL2 = b.COL2
WHERE
a.COL1 = '112672803' AND
a.DATECOL = '2012-1-01' AND
b.COL1 <> '112672803' AND
b.DATECOL = '2012-1-01'  
AND a.COL2 = '45673'
0
Comment
Question by:25112
  • 15
  • 10
  • 3
  • +3
32 Comments
 
LVL 39

Assisted Solution

by:lcohan
lcohan earned 39 total points
ID: 39601494
That query is wrong in my opinion and result will be always null because...

a.COL1 = '112672803' AND
a.DATECOL = '2012-1-01' AND

AND at the same time you want...

a.COL1 <> '112672803' AND
b.DATECOL = '2012-1-01'


or in other words ALL rows for DATECOL = '2012-1-01' WHERE COL1 = '112672803' AND
COL1 <> '112672803' ...
0
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 39 total points
ID: 39601500
Hi 25112,

You may be able to make this query a lot faster if you prefilter one of the tables.

a.COL1 = '112672803' AND
a.DATECOL = '2012-1-01' AND
b.COL1 <> '112672803' AND
b.DATECOL = '2012-1-01'  
AND a.COL2 = '45673'

You're looking for specific rows in table A to join with specific rows in table B.  If applying the filter before the join returns only a few hundred (or even thousand) rows, that should help.

Here's the query with a pre-filter on table A.

select
  a.COL1,
  a.COL2,
  b.COL1,
  b.COL2
from
(
  SELECT * 
  FROM ABC
  WHERE COL1 = '112672803' 
    AND DATECOL = '2012-1-01' 
    AND COL2 = '45673' 
) a
JOIN ABC b
  ON a.COL2 = b.COL2
WHERE b.COL1 <> '112672803' 
  AND b.DATECOL = '2012-1-01'  

Open in new window

0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39601854
just being a little pedantic, but at 30M rows I wouldn't use "select *"
we do tend to use this as shorthand in our comments - but please don't mimic that - just list the fields needed instead. (no offence intended kdo - just a reminder - I fall into this too)

lcohan raises an excellent point and I also don't see how this query will be worth running, put them side by side:
   ---- A -----                ---- B ----
SELECT COL1, COL2          SELECT COL1, COL2
FROM ABC                   FROM ABC
WHERE COL1 = '112672803'   WHERE COL1 <> '112672803'
AND DATECOL = '2012-1-01'  AND DATECOL = '2012-1-01'
AND COL2 = '45673'         AND COL2 = '45673' -- (ON a.COL2 = b.COL2)

Open in new window

By the way I'm really suspicious of '2012-1-01' also:
a. it's not a standard format and probably should be '2012-01-01'
b. if DATECOL really IS a datecol then I would avoid the implicit conversion of string to datetime and use convert(datetime,'2012-01-01',121)
(or date, use whatever the field actually is to avoid implicit conversions)
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39601947
Mmmmmm, that query will produce a Cartesian product of results. Assuming there is a record structure in ABC where 2 sets of rows exist with common COL2 and DATECOL; but a difference in COL1, those results get multiplied by the query.

see: http://sqlfiddle.com/#!3/febbe/10

I think you might be better off without the self-join
DECLARE @want AS varchar(20)
SET @want = '112672803'

SELECT
        @want A_COL1
      , COL1 B_COL1
      , COL2 A_COL2
FROM ABC
WHERE COL1 <> @want
        AND DATECOL = '2012-01-01'
;

Open in new window

further details
    
    CREATE TABLE abc
    	([COL2] varchar(20), [COL1] varchar(20), [DATECOL] datetime)
    ;
    	
    INSERT INTO abc
    	([COL2], [COL1], [DATECOL])
    VALUES
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', '112672803', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
    	('45673', 'XXXXXXXXX', '2012-01-01 00:00:00')
    ;

**Query 1**:

    DECLARE @want AS varchar(20)
    SET @want = '112672803'
    
    SELECT
            @want A_COL1
          , COL1 B_COL1
          , COL2 A_COL2
    FROM ABC
    WHERE COL1 <> @want
            AND DATECOL = '2012-01-01'
    

**[Results][2]**:
    
    |    A_COL1 |    B_COL1 | A_COL2 |
    |-----------|-----------|--------|
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |
    | 112672803 | XXXXXXXXX |  45673 |


**Query 2**:

    SELECT
            a.COL1
          , a.COL2
          , b.COL1
          , b.COL2
    FROM ABC a
            JOIN ABC b
                    ON a.COL2 = b.COL2
    WHERE a.COL1 = '112672803'
            AND a.DATECOL = '2012-1-01'
            AND b.COL1 <> '112672803'
            AND b.DATECOL = '2012-1-01'
            AND a.COL2 = '45673'
    

**[Results][3]**:
    
    |      COL1 |  COL2 |
    |-----------|-------|
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |
    | 112672803 | 45673 |


**Query 3**:

    SELECT
            a.COL1 A_COL1
          , a.COL2 A_COL2
          , b.COL1 B_COL1
          , b.COL2 B_COL2
    FROM (
                    SELECT
                            *
                    FROM ABC
                    WHERE COL1 = '112672803'
                            AND DATECOL = '2012-01-01'
                            AND COL2 = '45673'
            ) a
            JOIN ABC b
                    ON a.COL2 = b.COL2
    WHERE b.COL1 <> '112672803'
            AND b.DATECOL = '2012-01-01'
    

**[Results][4]**:
    
    |    A_COL1 | A_COL2 |    B_COL1 | B_COL2 |
    |-----------|--------|-----------|--------|
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |
    | 112672803 |  45673 | XXXXXXXXX |  45673 |



  [1]: http://sqlfiddle.com/#!3/febbe/10

  [2]: http://sqlfiddle.com/#!3/febbe/10/0

  [3]: http://sqlfiddle.com/#!3/febbe/10/1

Open in new window

0
 
LVL 5

Author Comment

by:25112
ID: 39602240
lcohan,
if i am not missing something, it is
b.COL1 <> '112672803' AND
and not..
a.COL1 <> '112672803' AND

wouldnt that be OK?
0
 
LVL 5

Author Comment

by:25112
ID: 39602241
Kdo, thanks for the filter idea and do the join as you suggest. (without the *)
0
 
LVL 5

Author Comment

by:25112
ID: 39602242
Paul, yes,  '2012-01-01'  is it...
-------------------------------------
>>Assuming there is a record structure in ABC where 2 sets of rows exist with common COL2 and DATECOL
true... thanks for the 3 scenarios..

can you run through below and give me your feedback.. in your example, it should not return anything for 6, but it is ... if we can fix that, we are doing great by avoid self join. thanks.

http://sqlfiddle.com/#!3/19d01/7
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39602305
it should not return anything for 6, but it is

sorry. but I'm not sure what this you mean
"it" is what exactly?

are you referring to this perhaps?

WHY would the expected result exclude "6"?
    CREATE TABLE abc
    	([COL2] varchar(20), [COL1] varchar(20), [DATECOL] datetime)
    ;
    	
    INSERT INTO abc
    	([COL2], [COL1], [DATECOL])
    VALUES
    	('45673', '112672803','2012-01-01 00:00:00'),
    	('45673', '12', '2012-01-01 00:00:00'),
    	('45673', '13', '2012-01-01 00:00:00'),
    	('45673', '14', '2012-01-01 00:00:00'),
    	('2', '11', '2012-01-01 00:00:00'),
    	('2', '112672803', '2012-01-01 00:00:00'),
    	('2', '17', '2012-01-01 00:00:00'),
    	('2', '18', '2012-01-01 00:00:00'),
    	('2', '19', '2012-01-01 00:00:00'),
    	('2', '20', '2012-01-01 00:00:00'),
    	('2', '21', '2012-01-01 00:00:00'),
    	('3', '112672803', '2012-01-01 00:00:00'),
    	('4', '22', '2012-01-01 00:00:00'),
    	('4', '112672803', '2012-01-01 00:00:00'),
    	('5', '14', '2012-01-01 00:00:00'),
    	('5', '19', '2012-01-01 00:00:00'),
    	('5', '112672803', '2012-01-01 00:00:00'),
    	('5', '24', '2012-01-01 00:00:00'),
    	('5', '25', '2012-01-01 00:00:00'),
    	('6', '11', '2012-01-01 00:00:00')

**Query 1**:

        DECLARE @want AS varchar(20)
        SET @want = '112672803'
        
        SELECT
                @want A_COL1
              , COL1 B_COL1
              , COL2 A_COL2
        FROM ABC
        WHERE COL1 <> @want
                AND DATECOL = '2012-01-01'
                  --AND  COL2 = '45673'
    

**[Results][2]**:
    
    |    A_COL1 | B_COL1 | A_COL2 |
    |-----------|--------|--------|
    | 112672803 |     12 |  45673 |
    | 112672803 |     13 |  45673 |
    | 112672803 |     14 |  45673 |
    | 112672803 |     11 |      2 |
    | 112672803 |     17 |      2 |
    | 112672803 |     18 |      2 |
    | 112672803 |     19 |      2 |
    | 112672803 |     20 |      2 |
    | 112672803 |     21 |      2 |
    | 112672803 |     22 |      4 |
    | 112672803 |     14 |      5 |
    | 112672803 |     19 |      5 |
    | 112672803 |     24 |      5 |
    | 112672803 |     25 |      5 |
    | 112672803 |     11 |      6 |



  [1]: http://sqlfiddle.com/#!3/19d01/24

Open in new window

0
 
LVL 31

Assisted Solution

by:awking00
awking00 earned 38 total points
ID: 39602837
You might try -
select a.col1 as col1a, a.col2 as col2a, b.col1 as col1b, b.col2 as col2b
from abc as a inner join abc as b
on a.col2 = b.col2 and a.datecol = b.datecol and a.col1 <> b.col1
where a.col1 = '112672803 and c.datecol = '2012-01-01'

As for efficiency, you probably need indexes on col1, col2, and datecol.
0
 
LVL 31

Expert Comment

by:awking00
ID: 39602842
Sorry for the typo, c.datecol should have been a.datecol.
0
 
LVL 16

Assisted Solution

by:DcpKing
DcpKing earned 38 total points
ID: 39602940
Assuming that the query, as-is, does work (albeit slowly) , despite the various opinions expressed, then the fastest way to speed things up would probably be to eliminate as many rows as you can at the start:
Select Col1, Col2
into #abc
from ABC
where DATECOL = '2012-01-01'
and COL2 = '45673'
-- -----------index the temp table here on Col1 and Col2
select a.COL1, a.COL2, b.COL1, b.COL2 
from #abc a 
JOIN #abc b 
ON a.COL2 = b.COL2
WHERE a.COL1 =   '112672803' 
AND     b.COL1 !=  '112672803' 

Open in new window


hth

Mike
0
 
LVL 5

Author Comment

by:25112
ID: 39602946
Paul,
Col1,Col2,DateCol will be distinct in ABC
Col1->Col2 is many->1 (One Col2 can belong to many Col1 for the same date)..

we are trying to match where Col2 belongs to Col1=112672803 and also some other Col1's..

In case of Col2=6, it only belongs to 11, so it is disqualified.. we could have missed it for 3 also.. but by the nature of query, it was disqualified in a different manner.. but if we fix it for 6, it should fix it for 3 also through the code itself.

i should have noted that the self join maybe was needed only because of these special circumstance of having to eliminate cases like '3' and '6' (Col2).

Thanks.
0
 
LVL 5

Author Comment

by:25112
ID: 39602975
awking00, i see you bought most of the conditions on ON instead of WHERE.. does that make a big difference?
0
 
LVL 31

Expert Comment

by:awking00
ID: 39602978
I'm not sure and, unfortunately, don't have a system available to obtain explain plans on the methods, but you might try and do that to see if and how much difference it makes.
0
 
LVL 5

Author Comment

by:25112
ID: 39603002
DcpKing, your idea is very good & would have been ideal if COL2 also gets passed.. .. i need to clarify something here.. i mentioned
"and COL2 = '45673'" in the query to simplify the examples.. but the only 2 parameters that will be sent in are:
'112672803'
and
DateValue (example: '2012-01-01')

the results will be expanded like in PortletPaul's output..
0
 
LVL 16

Expert Comment

by:DcpKing
ID: 39603029
25112;
Then cut line 5 from my suggestion. If you can replace it with any other "hint" from the outside world then do that too. The idea really is to start by eliminating as much work as you can ("if you don't have to do it then don't do it"!) from the query.

After doing that, add indexes to make the query faster.

hth

Mike
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 
LVL 5

Author Comment

by:25112
ID: 39603074
Mike,

the problem then would become that
-------------
Select Col1, Col2
into #abc
from ABC
where DATECOL = '2012-01-01'
-------------

alone could bring 2 million records. and DATECOL is not a good (best) candidate for index. But in the big picture, yes, that would cut/filter to get started. but you don't see any way to avoid the self join, do you?
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39603317
>> Col1,Col2,DateCol will be distinct in ABC
Q1: Is the clustered index of ABC based on those 3 fields?
     & might as well ask:
Q2: What are the indexes on ABC? (and perhas you might just use the real table name etc.)

>> Col1->Col2 is many->1 (One Col2 can belong to many Col1 for the same date)..
so the following (ignoring the constant datecol for now) lists the col1/col2 values we are interested in:
/*
select distinct col1, col2 from abc where col1 = '112672803'

112672803      2
112672803      3
112672803      4
112672803      45673
112672803      5
*/
>> we are trying to match where Col2 belongs to Col1=112672803 and also some other Col1's..
Q3: how many other col1's are you interested in? this may influence the query too

>>In case of Col2=6, it only belongs to 11, so it is disqualified..
>>we could have missed it for 3 also..
I assume you mean "should have" :)

>>but by the nature of query, it was disqualified in a different manner..
>>but if we fix it for 6, it should fix it for 3 also through the code itself.
>>i should have noted that ... having to eliminate cases like '3' and '6' (Col2).
It's my view that when queries get simplified for us that it takes longer to answer the question as often the full detail get revealed progressively.

Q4: Is this the expected result? (from the sample data)
|      WANT | COL1 |  COL2 |
|-----------|------|-------|
| 112672803 |   11 |     2 |
| 112672803 |   12 | 45673 |
| 112672803 |   13 | 45673 |
| 112672803 |   14 | 45673 |
| 112672803 |   14 |     5 |
| 112672803 |   17 |     2 |
| 112672803 |   18 |     2 |
| 112672803 |   19 |     2 |
| 112672803 |   19 |     5 |
| 112672803 |   20 |     2 |
| 112672803 |   21 |     2 |
| 112672803 |   22 |     4 |
| 112672803 |   24 |     5 |
| 112672803 |   25 |     5 |

Open in new window

This was produced by the following:
DECLARE @want AS varchar(20)
    SET @want = '112672803'

    SELECT
            @want want
          , COL1
          , COL2
    FROM ABC
    WHERE COL1 <> @want
            AND DATECOL = '2012-01-01'
            AND EXISTS (
                        SELECT 1
                        FROM abc AS x
                        WHERE col1= @want
                        AND x.col2=abc.col2
                       )
order by
            COL1
          , COL2
;


    CREATE TABLE abc
    	([COL2] varchar(20), [COL1] varchar(20), [DATECOL] datetime)
    ;
    	
    INSERT INTO abc
    	([COL2], [COL1], [DATECOL])
    VALUES
    	('45673', '112672803','2012-01-01 00:00:00'),
    	('45673', '12', '2012-01-01 00:00:00'),
    	('45673', '13', '2012-01-01 00:00:00'),
    	('45673', '14', '2012-01-01 00:00:00'),
    	('2', '11', '2012-01-01 00:00:00'),
    	('2', '112672803', '2012-01-01 00:00:00'),
    	('2', '17', '2012-01-01 00:00:00'),
    	('2', '18', '2012-01-01 00:00:00'),
    	('2', '19', '2012-01-01 00:00:00'),
    	('2', '20', '2012-01-01 00:00:00'),
    	('2', '21', '2012-01-01 00:00:00'),
    	('3', '112672803', '2012-01-01 00:00:00'),
    	('4', '22', '2012-01-01 00:00:00'),
    	('4', '112672803', '2012-01-01 00:00:00'),
    	('5', '14', '2012-01-01 00:00:00'),
    	('5', '19', '2012-01-01 00:00:00'),
    	('5', '112672803', '2012-01-01 00:00:00'),
    	('5', '24', '2012-01-01 00:00:00'),
    	('5', '25', '2012-01-01 00:00:00'),
    	('6', '11', '2012-01-01 00:00:00')

    

  [1]: http://sqlfiddle.com/#!3/19d01/33 

Open in new window

0
 
LVL 5

Author Comment

by:25112
ID: 39604221
Hi Paul,

>>It's my view that when queries get simplified for us that it takes longer to answer the question as often the full detail get revealed progressively.

ouch.. sorry.. i am learning.. thanks for your patience... basically, the idea to report the duplicate entries of an Employee in multiple offices.. one office will run the report and see which other office are sharing the employee. two different format of reporting is needed for this logic.. one is already figured out.. ( http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_28274064.html ) the one we are working has a different report layout needed.. and hence we are attempting.
The first formatting of results had to give the duplicate results all in one row...
In this, we are displaying each duplicate in each row... that being the difference between the 2 formats.

------------------------------

Q1: Is the clustered index of ABC based on those 3 fields?
No
Should it be on the 2 keys or include the date field also?
-----------------
existing: Clustered Key:
EMPHIST_KEY (artificial key)
--------------------
NonClustered Key:
OFFICE_KEY, LOCATION_KEY, EMPHIST_DATE
OFFICE_KEY, EMPHIST_DATE
EMPID
EMP_YEAR, OFFICE_KEY, EMPHIST_DATE
EMP_KEY, EMP_YEAR, DOT_KEY, EMPHIST_DATE, OFFICE_KEY
EMP_KEY, DOT_KEY
--------------------
(Latest) Query with real life column names:
http://sqlfiddle.com/#!3/f46c3/6
--------------------
so the following (ignoring the constant datecol for now) lists the col1/col2 values we are interested in:
Yes...
--------------------
Q3: how many other col1's are you interested in?
All of them. (the result from the last query you sent does this...
--------------------
I assume you mean "should have" :)
right on... :)
--------------------


The EXISTS clause - is that a join or subquery? The query is still running after 2 hours..  maybe because of the 30.5M records it has to go through? maybe it will cache it after the first time of complete run?maybe the clustered index you mentioned is needed.. but there is already a clustered index.. would a nonclustered index suffice on the 3 columns?
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39604528
>>The EXISTS clause - is that a join or subquery? The query is still running after 2 hours
my turn for "ouch"
there is something missing from that query inside the EXISTS, which is:
AND DATECOL = '2012-01-01'
It worked without it in the limited sqlfiddle because everything is of the same date and I overlooked this - sorry.
EXISTS is also known as a "left semi-join" (it is a join type, but it will not multiply the number of rows returned in the resultset).
{+ an edit, sorry}
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39604559
Here are 3 variants. The first is just that date correction. The second and third introduce use of a CTE (as this seems to have worked in that other query you mentioned). The third variant also suggests a method for filtering by multiple office_keys (but I'd suggest a testing this of a broader sample). Anyway, here are 3 variants all producing the same result
    CREATE TABLE EMPHIST
    	([EmpID] varchar(20), [OFFICE_KEY] varchar(20), [EMPHIST_DATE] datetime)
    ;
    	
    INSERT INTO EMPHIST
    	([EmpID], [OFFICE_KEY], [EMPHIST_DATE]) 
    VALUES
    	('45673', '112672803','2012-01-01 00:00:00'),
    	('45673', '12', '2012-01-01 00:00:00'),
    	('45673', '13', '2012-01-01 00:00:00'),
    	('45673', '14', '2012-01-01 00:00:00'),
    	('2', '11', '2012-01-01 00:00:00'),
    	('2', '112672803', '2012-01-01 00:00:00'),
    	('2', '17', '2012-01-01 00:00:00'),
    	('2', '18', '2012-01-01 00:00:00'),
    	('2', '19', '2012-01-01 00:00:00'),
    	('2', '20', '2012-01-01 00:00:00'),
    	('2', '21', '2012-01-01 00:00:00'),
    	('3', '112672803', '2012-01-01 00:00:00'),
    	('4', '22', '2012-01-01 00:00:00'),
    	('4', '112672803', '2012-01-01 00:00:00'),
    	('5', '14', '2012-01-01 00:00:00'),
    	('5', '19', '2012-01-01 00:00:00'),
    	('5', '112672803', '2012-01-01 00:00:00'),
    	('5', '24', '2012-01-01 00:00:00'),
    	('5', '25', '2012-01-01 00:00:00'),
    	('6', '11', '2012-01-01 00:00:00')

**Query 1**:

    DECLARE @want AS varchar(20)
    SET @want = '112672803'
    
    SELECT
                EmpID
              , OFFICE_KEY
              , @want want
    FROM EMPHIST
    WHERE OFFICE_KEY <> @want
           AND EMPHIST_DATE = '2012-01-01'
           AND EXISTS (
                            SELECT 1
                            FROM EMPHIST AS x
                            WHERE x.OFFICE_KEY = @want
                            AND x.EmpID = EMPHIST.EmpID
                            AND x.EMPHIST_DATE = '2012-01-01'
                           )
    order by
                EmpID DESC
              , OFFICE_KEY
    

**[Results][2]**:
    
    | EMPID | OFFICE_KEY |      WANT |
    |-------|------------|-----------|
    |     5 |         14 | 112672803 |
    |     5 |         19 | 112672803 |
    |     5 |         24 | 112672803 |
    |     5 |         25 | 112672803 |
    | 45673 |         12 | 112672803 |
    | 45673 |         13 | 112672803 |
    | 45673 |         14 | 112672803 |
    |     4 |         22 | 112672803 |
    |     2 |         11 | 112672803 |
    |     2 |         17 | 112672803 |
    |     2 |         18 | 112672803 |
    |     2 |         19 | 112672803 |
    |     2 |         20 | 112672803 |
    |     2 |         21 | 112672803 |


**Query 2**:

    /* single office CTE method */
    
    DECLARE @OFFICE_KEY AS varchar(20)
    SET @OFFICE_KEY = '112672803'
    
    ;WITH
    EMPHIST_CTE AS (
                      SELECT * --<< please actually list just the fields needed
                      FROM EMPHIST
                      WHERE EMPHIST_DATE = convert(datetime,'2012-01-01',121)
                                           /* this should EXACTLY match data type of
                                              EMPHIST_DATE */
                    )
    SELECT
           * --<< please actually list just the fields needed
    FROM EMPHIST_CTE
    WHERE OFFICE_KEY <> @OFFICE_KEY
    AND EXISTS (
                    SELECT 1
                    FROM EMPHIST_CTE AS x
                    WHERE x.OFFICE_KEY = @OFFICE_KEY
                    AND x.EmpID = EMPHIST_CTE.EmpID
                   )
    /* the order by is optional */
    ORDER BY
                EMPID DESC
              , OFFICE_KEY
    

**[Results][3]**:
    
    | EMPID | OFFICE_KEY |                   EMPHIST_DATE |
    |-------|------------|--------------------------------|
    |     5 |         14 | January, 01 2012 00:00:00+0000 |
    |     5 |         19 | January, 01 2012 00:00:00+0000 |
    |     5 |         24 | January, 01 2012 00:00:00+0000 |
    |     5 |         25 | January, 01 2012 00:00:00+0000 |
    | 45673 |         12 | January, 01 2012 00:00:00+0000 |
    | 45673 |         13 | January, 01 2012 00:00:00+0000 |
    | 45673 |         14 | January, 01 2012 00:00:00+0000 |
    |     4 |         22 | January, 01 2012 00:00:00+0000 |
    |     2 |         11 | January, 01 2012 00:00:00+0000 |
    |     2 |         17 | January, 01 2012 00:00:00+0000 |
    |     2 |         18 | January, 01 2012 00:00:00+0000 |
    |     2 |         19 | January, 01 2012 00:00:00+0000 |
    |     2 |         20 | January, 01 2012 00:00:00+0000 |
    |     2 |         21 | January, 01 2012 00:00:00+0000 |


**Query 3**:

    /* multiple office CTE method */
    
    ;WITH
    EMPHIST_CTE AS (
                      SELECT * --<< please actually list just the fields needed
                      FROM EMPHIST
                      WHERE EMPHIST_DATE = convert(datetime,'2012-01-01',121)
                                           /* this should EXACTLY match data type of
                                              EMPHIST_DATE */
                    )
    SELECT
           * --<< please actually list just the fields needed
    FROM EMPHIST_CTE
    WHERE OFFICE_KEY NOT IN ('112672803','xxxxxxxxx')
    AND EXISTS (
                    SELECT 1
                    FROM EMPHIST_CTE AS x
                    WHERE x.OFFICE_KEY IN ('112672803','xxxxxxxxx')
                    AND x.EmpID = EMPHIST_CTE.EmpID
                   )
    /* the order by is optional */
    ORDER BY
                EMPID DESC
              , OFFICE_KEY
    

**[Results][4]**:
    
    | EMPID | OFFICE_KEY |                   EMPHIST_DATE |
    |-------|------------|--------------------------------|
    |     5 |         14 | January, 01 2012 00:00:00+0000 |
    |     5 |         19 | January, 01 2012 00:00:00+0000 |
    |     5 |         24 | January, 01 2012 00:00:00+0000 |
    |     5 |         25 | January, 01 2012 00:00:00+0000 |
    | 45673 |         12 | January, 01 2012 00:00:00+0000 |
    | 45673 |         13 | January, 01 2012 00:00:00+0000 |
    | 45673 |         14 | January, 01 2012 00:00:00+0000 |
    |     4 |         22 | January, 01 2012 00:00:00+0000 |
    |     2 |         11 | January, 01 2012 00:00:00+0000 |
    |     2 |         17 | January, 01 2012 00:00:00+0000 |
    |     2 |         18 | January, 01 2012 00:00:00+0000 |
    |     2 |         19 | January, 01 2012 00:00:00+0000 |
    |     2 |         20 | January, 01 2012 00:00:00+0000 |
    |     2 |         21 | January, 01 2012 00:00:00+0000 |



  [1]: http://sqlfiddle.com/#!3/aae0a2/1

Open in new window

0
 
LVL 5

Author Comment

by:25112
ID: 39605831
thanks Paul- for nicely putting those 3 variants.. for some reason, the first one is not running fast.. not predictable.. sometimes takes 4 minutes, sometimes 10.. sometimes more.. and i have to cancel it.. but it has the same execution plan as the second one.. so not sure what could be going on.. but second is consistenly executing within 2 minutes.. and also after it is run once when you run the next time with the same parameter OFFICE_KEY = '112672803', it run in ms (less than one second).. but when i change parameter, let's say OFFICE_KEY = '212672603' then it again takes couple of minutes and on next execution it is fast.

for now, 3rd variant is not needed right, but i can dig it out later when needed..

now i saw you are an BETWEEN expert.. http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/A_11210-Beware-of-Between.html
so i need your input on this..

in the above example, '2012-01-01' is a datetime, even though only date is displayed (not efficient, i know.. but that is another fire/issue for another day). but in actuality, what it is, a date will be given, and the previous years Jan1st through 3rd has to be passed in the query.. for easy example, i just mentioned '2012-01-01'.. but in reality the parameter can be any date in 2013, and then automatically, the logic should be
*go back one year (in this case 2012) and then automatically choose a range of Jan1st-3rd.
2 more examples:
2013-04-04 will lead to 2012-01-01 thru 2012-01-03
2011-11-24 will lead to 2010-01-01 thru 2010-01-03

so i put it in variables and passing in query.. but immediately execution plan suggests many indexes.. i have attached the 2 cte execution plans. can you suggest passing 2 datetime variable can alter the plan so much?

thanks-
cte-plan.sqlplan
CTE-plan-with-Between.sqlplan
0
 
LVL 5

Author Comment

by:25112
ID: 39605905
Paul, just one more step- maybe i jumped quickly with the prev post.. maybe we should address the sqlplan (BETWEEN condition) after we look at this below....

Just to avoid another 'ouch' on my part... here is hopefully the last bit of clarification to that will bring the whole prospect of this endevour in good light, and also good appreciation of your assistance & time..

below is the existing code that we are trying to rewrite.. there are couple of other simple logic (conditions) involved; let's incorporate them also, before we look at the indexes the exec plan recommends.. because each condition may vary the final index requirement, it would seem...

basically you will see 3 additional conditions, but in the SELF join, there will be 2 variations of OFFICE table (OFFICE & REP_OFFICE), instead of just one used twice. hope it make sense.
~~~~~~~~~~~~~~
select 
*
from 
OFFICE EMP00, 
LOCATION EMP01, 
EMPHIST EMP02,  /* SECOND EMPHIST FOR SELF JOIN */
(
select 
*
from 
(
SELECT *
from 
REP_OFFICE  /* REP_OFFICE IS VERY SIMILAR TO OFFICE TABLE, but DIFFERENT */
where REP_OFFICE.REP_OFFICE_KEY <> '11267280' and 
REP_OFFICE.REP_OFFICE_TYPE in ('SDD','PTC') /*New Condition not covered so far */
) Reporting_Office, 

LOCATION EMP10, 
EMPHIST EMP11  /* SECOND EMPHIST FOR SELF JOIN*/
where 
(Reporting_Office.REPORTING_OFFICE_TYPE in ('SDD') or EMP10.OFF_TYPE_CODE = 'TOMKET') and   /*New Condition not covered so far */
EMP11.OFFICE_KEY = Reporting_Office.OFFICE_KEY and 
EMP11.EMPHIST_DATE BETWEEN '2012-01-01' and '2012-01-03'
EMP10.LOCATION_KEY = EMP11.LOCATION_KEY
) EmpInOtherOffice

where  
EMP00.OFFICE_KEY = '11267280' and /* 11267280 is reporting office in this example */
EMP02.EMPHIST_DATE BETWEEN '2012-01-01' and '2012-01-03' and /* What actually will be passed is a date in 2013 which will translate to Jan 1-3 2012 range */
(EMP00.OFF_LEVEL in ('SDD') or EMP01.OFF_TYPE_CODE = 'TOMKET') and   /*New Condition not covered so far */
EMP00.OFFICE_KEY = EMP02.OFFICE_KEY and 
EMP01.LOCATION_KEY = EMP02.LOCATION_KEY and 
EMP02.EMPID = EmpInOtherOffice.EMPID  /*Essentially, the self join on EMPID */

Open in new window

~~~~~~~~~~~~~~~

OFFICE joins with EMPHIST on OFFICE_KEY (PK& FK)
LOCATION joins with EMPHIST on LOCATION_KEY (PK& FK)

kindly suggest an efficient join of these conditions on the CTE we have developed so far that is working good.  thanks paul
0
 
LVL 5

Author Comment

by:25112
ID: 39605916
Regarding the BETWEEN syntax, what I was thinking is to introduce a variable after @OFFICE_KEY and then calculating the start and end date and then using it in the query as below:
   SET @OFFICE_KEY = '...'
    Declare @dt date = 'somedate'
    Declare @start_date datetime  
    Declare @end_date datetime

SELECT @start_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-01' as date)  /* go to prev year January 1st */
SELECT @end_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-03' as date)    /* go to prev year January 3rd */

    
    ;WITH
    EMPHIST_CTE AS (
                      SELECT                 EMPID
              , OFFICE_KEY
              , @OFFICE_KEY want
                      FROM EMPHIST
                      WHERE EMPHIST_DATE BETWEEN @start_date and @end_date /*Implementation of the BETWEEN Clause */

Open in new window

0
 
LVL 5

Author Comment

by:25112
ID: 39606508
paul, here is the latest code including all the logic required.. pl comment on efficiency of overall code & index/
    DECLARE @OFFICE_KEY AS varchar(20)
    SET @OFFICE_KEY = '112672803'
    Declare @dt date = '2013-03-31'
    Declare @start_date datetime  
    Declare @end_date datetime

SELECT @start_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-01' as date)  
SELECT @end_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-03' as date) 
   
    ;WITH
    EMPHIST_CTE AS (
                      SELECT                 
              EMPID,
              OFFICE_KEY, 
              LOCATION_KEY,
              @OFFICE_KEY want
                      FROM EMPHIST
                      WHERE EMPHIST_DATE BETWEEN @start_date and @end_date 
                    )
    
    SELECT *  
    FROM EMPHIST_CTE CTE  JOIN REP_OFFICE RO ON CTE.@OFFICE_KEY = RO.@OFFICE_KEY AND CTE.@OFFICE_KEY <> @OFFICE_KEY     
    JOIN LOCATION LOC ON LOC.LOCATION_KEY = CTE.LOCATION_KEY
    WHERE  (REP_OFFICE_TYPE IN ('SDD','PTC') or LOC.OFF_TYPE_CODE = 'TOMKET')
    AND
     EXISTS (
                    SELECT 1
                    FROM EMPHIST_CTE AS x JOIN OFFICE D ON x.@OFFICE_KEY = D.OFFICE_KEY  
                    JOIN LOCATION L ON L.LOCATION_KEY = x.LOCATION_KEY
                    WHERE (D.LEVEL_AGGR IN ('SDD') or L.OFF_TYPE_CODE = 'TOMKET')) AND
                    x.@OFFICE_KEY = @OFFICE_KEY
                    AND x.EMPID = CTE.EMPID
                   )  
                   order by 3,1

Open in new window

question: you can't bring any columns from the query inside WHERE EXISTS to be displayed with the SELECT right after the CTE Definition, can we ?
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39607279
No "ouch" intended, but by now I hope you see that progressively revealing requirements actually makes the answering cycle longer. A tip for the future.

In the code block immediately above you have some syntax errors so I'm not fully confident that it works (eg. you can't join on x.@office_key). You have also thrown in a few joins I'm not familiar with - BUT; they don't look like they will multiply the records of EMPHIST. On this assumption what I would suggest is that you place as much of those as possible into the CTE.

Think of the CTE as "do it once, then re-use" - so anything that is repeated should go into the CTE - and you are repeating some joins.

Given that your datetime field only holds the date (i.e. time is always 00:00:00) then using between is OK. I'd prefer to see you using the better >= with < technique but it is optional as long as time of day isn't a factor.

Oh, and no you cannot access any fields of the EXISTS subquery, basically all this does is says "do I exist? yes. In fact several experts here will use this syntax:
    where exists (select NULL from abc ...)
In other words the select list inside those parentheses does not have to return any data at all. So, definitely a no.

This is my suggestion based on your code immediately above (untested):
DECLARE @OFFICE_KEY AS varchar(20)
SET @OFFICE_KEY = '112672803'
DECLARE @dt date = '2013-03-31'
DECLARE @start_date datetime
DECLARE @end_date datetime

SELECT
        @start_date = CAST(CAST(YEAR(@dt) - 1 AS char(4)) + '-01-01' AS datetime) --<<might as well specify datetime here
SELECT
        @end_date = CAST(CAST(YEAR(@dt) - 1 AS char(4)) + '-01-03' AS datetime)

;
WITH
        EMPHIST_CTE
        AS (
                SELECT
                        EMPHIST.EMPID
                      , EMPHIST.OFFICE_KEY
                      , EMPHIST.LOCATION_KEY
                      , @OFFICE_KEY want
                      , RO.REP_OFFICE_TYPE
                      , LOC.OFF_TYPE_CODE
                      , D.LEVEL_AGGR
                FROM EMPHIST
                JOIN REP_OFFICE RO
                        ON EMPHIST.OFFICE_KEY = RO.OFFICE_KEY
                JOIN LOCATION LOC
                        ON LOC.LOCATION_KEY = CTE.LOCATION_KEY
                JOIN OFFICE D
                        ON EMPHIST.OFFICE_KEY = D.OFFICE_KEY
                WHERE EMPHIST_DATE BETWEEN @start_date AND @end_date
                        AND (REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOC.OFF_TYPE_CODE = 'TOMKET')
                )

SELECT
        * --<< Grrrr, specify your columns please
FROM EMPHIST_CTE CTE
WHERE CTE.OFFICE_KEY <> @OFFICE_KEY
        AND EXISTS (
                        SELECT 1
                        FROM EMPHIST_CTE AS x
                        WHERE (LEVEL_AGGR IN ('SDD') OR OFF_TYPE_CODE = 'TOMKET')
                                AND x.OFFICE_KEY = @OFFICE_KEY
                                AND x.EMPID = CTE.EMPID
                   )
ORDER BY
        3, 1 --<< Grrrrrrr, what are columns 3 and 1? (I don't want to translate that) nor will the next person

Open in new window

It may be necessary to "play with" where you place the joins i.e. experiment to see where the best location for each one is. but I'd start there and see if it still matches the expectations.

As I don't know what fields you actually want as output I can't make code additions for that - BUT please, please, please don't rely on "select *" - that is not good practice. (nb: we have to use it in our answers when we don't know the fields, or we use it because we are abbreviating our replies - our job is to answer not to fully develop code :)

I really don't like using column numbers in the order by.
0
 
LVL 5

Author Comment

by:25112
ID: 39610712
Hi Paul, your points are well taken... thanks for your patient tip.. i will be careful with that..

yes, time of day does not matter.. so i left it as such..

yes, have and will avoid SELECT * and also waiting for final ORDER clause confirmation and then will hardcode those columns.. thanks for enforcing those good programming practises..

now with your review, i made small changes.. and could not avoid using CTE less than 3 times... please review if we have done our best, please: In the code, EMPHIST is joined with REPORT_OFFICE and LOCATION in the CTE. This will give details of the reported office.

The CTE is used 3 times in the code: details are...

First time it is used to get the reporting Office info (CTE2).
Second time it is used to get the reported office info.(CTE)
Third time, it is used in the EXISTS clause. (x)

in CTE2 the CTE results are joined with OFFICE and LOCATION to get the details needed to be displayed in the report.

CTE is joined with OFFICE to get certain details that only OFFICE has.. then it is joined (based on EMPID) with LOCATION to get certain other details.

Together, EMPHIST+OFFICE (or REPORT_OFFICE)+LOCATION combo has the results we need.

Can you kindly suggest if we have exhaused every measure to reduce the join to minimal need. Also index suggestions, please?

Thanks

code is below:
----------------
  DECLARE @OFFICE_KEY AS varchar(20)
SET @OFFICE_KEY = '112672803'
DECLARE @dt datetime
  set @dt = '2013-01-21'
  
    Declare @START_DATE datetime    
    Declare @END_DATE datetime  
  
SELECT @START_DATE= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-01' as date)    
SELECT @END_DATE= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-03' as date)   
   
    ;WITH  
    EMPHIST_CTE AS (  
                      SELECT    
                @OFFICE_KEY REP_OFFICE,                  
                EMPID EMPID,  
                EMP_LNAME Other_EMP_Last_Name,   
                EMP_FNAME Other_EMP_First_Name,  
                MainTable.OFFICE_KEY ,   
                RO.REP_OFFICE_NAME Other_OFFICE_Name,   
                RO.COUNTY Other_County_Name,   
                LOC.LOCATION_ID Other_Office_Number,   
                LOC.LOCATION_KEY,   
                LOC.LOCATION_NAME Other_Office_Name  
                 
                      FROM   
                      EMPHIST MainTable  
                JOIN   
                REPORT_OFFICE RO
                        ON MainTable.OFFICE_KEY = RO.OFFICE_KEY  
                JOIN LOCATION LOC  
                        ON MainTable.LOCATION_KEY = LOC.LOCATION_KEY AND RO.OFFICE_KEY = LOC.OFFICE_KEY  
                WHERE EMPHIST_DATE BETWEEN @START_DATE AND @END_DATE AND  
                (REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOC.OFF_TYPE_CODE = 'TOMKET')                           
                    )   
SELECT   
CTE.REP_OFFICE,          
O.OFFICE_NAME OFFICE_Name,     
O.COUNTY_CODE County_Code,      
O.COUNTY County_Name,   
L.LOCATION_ID Office_Number,  
L.LOCATION_NAME Office_Name,       
CTE2.EMPID,  
CTE2.Other_EMP_Last_Name EMP_Last_Name,  
CTE2.Other_EMP_First_Name EMP_First_Name,  
CTE.OFFICE_KEY Other_REP_OFFICE,   
CTE.Other_OFFICE_Name,   
CTE.Other_County_Name,   
CTE.Other_Office_Number,   
CTE.Other_Office_Name,  
CTE.Other_EMP_Last_Name,   
CTE.Other_EMP_First_Name  
                 
FROM EMPHIST_CTE CTE   
JOIN OFFICE O
ON   CTE.REP_OFFICE = O.OFFICE_KEY  
  
JOIN EMPHIST_CTE CTE2   
ON   
CTE.EMPID = CTE2.EMPID AND   
CTE.REP_OFFICE = CTE2.OFFICE_KEY  
  
JOIN LOCATION L   
ON L.OFFICE_KEY = CTE.REP_OFFICE AND CTE2.LOCATION_KEY = L.LOCATION_KEY AND     
  (O.LEVEL_AGGR IN ('SDD', 'PTC') OR L.OFF_TYPE_CODE = 'TOMKET')

WHERE CTE.OFFICE_KEY <> @OFFICE_KEY and  
     EXISTS (  
                    SELECT 0  
                    FROM EMPHIST_CTE AS x     
                    WHERE   
                    x.OFFICE_KEY = CTE.REP_OFFICE  
                    AND x.EMPID = CTE.EMPID   
                   )

Open in new window

0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 346 total points
ID: 39610730
That looks fine to me, here the purpose of the CTE is to be re-used, so that is fine

:)
----------------------------------
except the use of between,
I do not regard that as best practice as you probably know already.even if this is true:
>>yes, time of day does not matter..

there is nothing gained (IMHO) by using between - but potentially much to lose

the alternative is quite simple, and the big advantage is
"it works 100% reliably in all scenarios" so one does not have to flip technique depending on the data precision
e.g.
[date_field] >= low_date and [date_field] < ( high_date + 1)

see: "Beware of Between"
0
 
LVL 5

Author Comment

by:25112
ID: 39611383
that makes very good sense... thanks Paul.

1)
i changed
" WHERE EMPHIST_DATE BETWEEN @START_DATE AND @END_DATE AND  "
to
" WHERE EMPHIST_DATE >= @START_DATE AND EMPHIST_DATE < ( @END_DATE + 1) AND  "

 performance wise, BETWEEN and the above approach is almost the same, right?

2)index wise, i should only focus on the query within the first main CTE, right? (the other CTEs just hold data, whereas the main cte has main real tables, right?) the below is the main join of the main CTE part:

                   ---------------------------------
set  statistics io on
                      SELECT      
COUNT(*)
FROM  
                      EMPHIST MainTable  
                JOIN  
                REPORT_OFFICE RO
                        ON MainTable.OFFICE_KEY = RO.OFFICE_KEY  
                JOIN LOCATION LOC  
                        ON MainTable.LOCATION_KEY = LOC.LOCATION_KEY AND RO.OFFICE_KEY = LOC.OFFICE_KEY  
                WHERE EMPHIST_DATE BETWEEN @START_DATE AND @END_DATE AND  
                (REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOC.OFF_TYPE_CODE = 'TOMKET')      
                       set statistics io  off
                       
(1 row(s) affected)
Table 'LOCATION'. Scan count 0, logical reads 3329236, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EMPHIST'. Scan count 824, logical reads 6676728, physical reads 1878, read-ahead reads 14973, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'REPORT_OFFICE'. Scan count 1, logical reads 11, physical reads 3, read-ahead reads 58, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
                     
                   ---------------------------------
that is: 5 conditions:

EMPHIST.OFFICE_KEY  = REPORT_OFFICE.OFFICE_KEY
                                REPORT_OFFICE.OFFICE_KEY = LOCATION.OFFICE_KEY
EMPHIST.LOCATION_KEY = LOCATION.LOCATION_KEY

EMPHIST.EMPHIST_DATE >= '2012-01-01' AND SNAPSHOT_DATE < '2012-01-03'

(REPORT_OFFICE.REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOCATION.OFF_TYPE_CODE = 'TOMKET')
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 346 total points
ID: 39613029
performance wise, BETWEEN and the above approach is almost the same, right?
correct
in fact what happens to between is it is "translated" into the following by the optimizer

field >= value and field <= value

"between" is a script convenience only, technically it is executed by the basic operators >= and <=

so substituting >= and < into the script makes no noticeable difference

2)index wise, i should only focus on the query within the first main CTE, right? (the other CTEs just hold data, whereas the main cte has main real tables, right?)
 yes & yes
0
 
LVL 5

Author Comment

by:25112
ID: 39627670
thanks a lot- the solution has been tested for a few days and performing good. thanks again!
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39629060
excellent - glad to have been of assistance. cheers, Paul
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
SQL Server engine let you use a Windows account or a SQL Server account to connect to a SQL Server instance. This can be configured immediatly during the SQL Server installation or after in the Server Authentication section in the Server properties …
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …
This tutorial demonstrates a quick way of adding group price to multiple Magento products.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now