x
Solved

# making an self join more efficient

Posted on 2013-10-25
Medium Priority
461 Views
can you suggest how to make this self join more efficient?

there are 30 M records in the ABC table.

select
a.COL1,
a.COL2,
b.COL1,
b.COL2
from
ABC a
JOIN
ABC b
ON a.COL2 = b.COL2
WHERE
a.COL1 = '112672803' AND
a.DATECOL = '2012-1-01' AND
b.COL1 <> '112672803' AND
b.DATECOL = '2012-1-01'
AND a.COL2 = '45673'
0
Question by:25112
• 15
• 10
• 3
• +3

LVL 40

Assisted Solution

lcohan earned 156 total points
ID: 39601494
That query is wrong in my opinion and result will be always null because...

a.COL1 = '112672803' AND
a.DATECOL = '2012-1-01' AND

AND at the same time you want...

a.COL1 <> '112672803' AND
b.DATECOL = '2012-1-01'

or in other words ALL rows for DATECOL = '2012-1-01' WHERE COL1 = '112672803' AND
COL1 <> '112672803' ...
0

LVL 46

Assisted Solution

Kent Olsen earned 156 total points
ID: 39601500
Hi 25112,

You may be able to make this query a lot faster if you prefilter one of the tables.

a.COL1 = '112672803' AND
a.DATECOL = '2012-1-01' AND
b.COL1 <> '112672803' AND
b.DATECOL = '2012-1-01'
AND a.COL2 = '45673'

You're looking for specific rows in table A to join with specific rows in table B.  If applying the filter before the join returns only a few hundred (or even thousand) rows, that should help.

Here's the query with a pre-filter on table A.

``````select
a.COL1,
a.COL2,
b.COL1,
b.COL2
from
(
SELECT *
FROM ABC
WHERE COL1 = '112672803'
AND DATECOL = '2012-1-01'
AND COL2 = '45673'
) a
JOIN ABC b
ON a.COL2 = b.COL2
WHERE b.COL1 <> '112672803'
AND b.DATECOL = '2012-1-01'
``````
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39601854
just being a little pedantic, but at 30M rows I wouldn't use "select *"
we do tend to use this as shorthand in our comments - but please don't mimic that - just list the fields needed instead. (no offence intended kdo - just a reminder - I fall into this too)

lcohan raises an excellent point and I also don't see how this query will be worth running, put them side by side:
``````   ---- A -----                ---- B ----
SELECT COL1, COL2          SELECT COL1, COL2
FROM ABC                   FROM ABC
WHERE COL1 = '112672803'   WHERE COL1 <> '112672803'
AND DATECOL = '2012-1-01'  AND DATECOL = '2012-1-01'
AND COL2 = '45673'         AND COL2 = '45673' -- (ON a.COL2 = b.COL2)
``````
By the way I'm really suspicious of '2012-1-01' also:
a. it's not a standard format and probably should be '2012-01-01'
b. if DATECOL really IS a datecol then I would avoid the implicit conversion of string to datetime and use convert(datetime,'2012-01-01',121)
(or date, use whatever the field actually is to avoid implicit conversions)
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39601947
Mmmmmm, that query will produce a Cartesian product of results. Assuming there is a record structure in ABC where 2 sets of rows exist with common COL2 and DATECOL; but a difference in COL1, those results get multiplied by the query.

see: http://sqlfiddle.com/#!3/febbe/10

I think you might be better off without the self-join
``````DECLARE @want AS varchar(20)
SET @want = '112672803'

SELECT
@want A_COL1
, COL1 B_COL1
, COL2 A_COL2
FROM ABC
WHERE COL1 <> @want
AND DATECOL = '2012-01-01'
;
``````
further details
``````
CREATE TABLE abc
([COL2] varchar(20), [COL1] varchar(20), [DATECOL] datetime)
;

INSERT INTO abc
([COL2], [COL1], [DATECOL])
VALUES
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', '112672803', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00'),
('45673', 'XXXXXXXXX', '2012-01-01 00:00:00')
;

**Query 1**:

DECLARE @want AS varchar(20)
SET @want = '112672803'

SELECT
@want A_COL1
, COL1 B_COL1
, COL2 A_COL2
FROM ABC
WHERE COL1 <> @want
AND DATECOL = '2012-01-01'

**[Results][2]**:

|    A_COL1 |    B_COL1 | A_COL2 |
|-----------|-----------|--------|
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |
| 112672803 | XXXXXXXXX |  45673 |

**Query 2**:

SELECT
a.COL1
, a.COL2
, b.COL1
, b.COL2
FROM ABC a
JOIN ABC b
ON a.COL2 = b.COL2
WHERE a.COL1 = '112672803'
AND a.DATECOL = '2012-1-01'
AND b.COL1 <> '112672803'
AND b.DATECOL = '2012-1-01'
AND a.COL2 = '45673'

**[Results][3]**:

|      COL1 |  COL2 |
|-----------|-------|
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |
| 112672803 | 45673 |

**Query 3**:

SELECT
a.COL1 A_COL1
, a.COL2 A_COL2
, b.COL1 B_COL1
, b.COL2 B_COL2
FROM (
SELECT
*
FROM ABC
WHERE COL1 = '112672803'
AND DATECOL = '2012-01-01'
AND COL2 = '45673'
) a
JOIN ABC b
ON a.COL2 = b.COL2
WHERE b.COL1 <> '112672803'
AND b.DATECOL = '2012-01-01'

**[Results][4]**:

|    A_COL1 | A_COL2 |    B_COL1 | B_COL2 |
|-----------|--------|-----------|--------|
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |
| 112672803 |  45673 | XXXXXXXXX |  45673 |

[1]: http://sqlfiddle.com/#!3/febbe/10

[2]: http://sqlfiddle.com/#!3/febbe/10/0

[3]: http://sqlfiddle.com/#!3/febbe/10/1
``````
0

LVL 5

Author Comment

ID: 39602240
lcohan,
if i am not missing something, it is
b.COL1 <> '112672803' AND
and not..
a.COL1 <> '112672803' AND

wouldnt that be OK?
0

LVL 5

Author Comment

ID: 39602241
Kdo, thanks for the filter idea and do the join as you suggest. (without the *)
0

LVL 5

Author Comment

ID: 39602242
Paul, yes,  '2012-01-01'  is it...
-------------------------------------
>>Assuming there is a record structure in ABC where 2 sets of rows exist with common COL2 and DATECOL
true... thanks for the 3 scenarios..

can you run through below and give me your feedback.. in your example, it should not return anything for 6, but it is ... if we can fix that, we are doing great by avoid self join. thanks.

http://sqlfiddle.com/#!3/19d01/7
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39602305
it should not return anything for 6, but it is

sorry. but I'm not sure what this you mean
"it" is what exactly?

are you referring to this perhaps?

WHY would the expected result exclude "6"?
``````    CREATE TABLE abc
([COL2] varchar(20), [COL1] varchar(20), [DATECOL] datetime)
;

INSERT INTO abc
([COL2], [COL1], [DATECOL])
VALUES
('45673', '112672803','2012-01-01 00:00:00'),
('45673', '12', '2012-01-01 00:00:00'),
('45673', '13', '2012-01-01 00:00:00'),
('45673', '14', '2012-01-01 00:00:00'),
('2', '11', '2012-01-01 00:00:00'),
('2', '112672803', '2012-01-01 00:00:00'),
('2', '17', '2012-01-01 00:00:00'),
('2', '18', '2012-01-01 00:00:00'),
('2', '19', '2012-01-01 00:00:00'),
('2', '20', '2012-01-01 00:00:00'),
('2', '21', '2012-01-01 00:00:00'),
('3', '112672803', '2012-01-01 00:00:00'),
('4', '22', '2012-01-01 00:00:00'),
('4', '112672803', '2012-01-01 00:00:00'),
('5', '14', '2012-01-01 00:00:00'),
('5', '19', '2012-01-01 00:00:00'),
('5', '112672803', '2012-01-01 00:00:00'),
('5', '24', '2012-01-01 00:00:00'),
('5', '25', '2012-01-01 00:00:00'),
('6', '11', '2012-01-01 00:00:00')

**Query 1**:

DECLARE @want AS varchar(20)
SET @want = '112672803'

SELECT
@want A_COL1
, COL1 B_COL1
, COL2 A_COL2
FROM ABC
WHERE COL1 <> @want
AND DATECOL = '2012-01-01'
--AND  COL2 = '45673'

**[Results][2]**:

|    A_COL1 | B_COL1 | A_COL2 |
|-----------|--------|--------|
| 112672803 |     12 |  45673 |
| 112672803 |     13 |  45673 |
| 112672803 |     14 |  45673 |
| 112672803 |     11 |      2 |
| 112672803 |     17 |      2 |
| 112672803 |     18 |      2 |
| 112672803 |     19 |      2 |
| 112672803 |     20 |      2 |
| 112672803 |     21 |      2 |
| 112672803 |     22 |      4 |
| 112672803 |     14 |      5 |
| 112672803 |     19 |      5 |
| 112672803 |     24 |      5 |
| 112672803 |     25 |      5 |
| 112672803 |     11 |      6 |

[1]: http://sqlfiddle.com/#!3/19d01/24
``````
0

LVL 32

Assisted Solution

awking00 earned 152 total points
ID: 39602837
You might try -
select a.col1 as col1a, a.col2 as col2a, b.col1 as col1b, b.col2 as col2b
from abc as a inner join abc as b
on a.col2 = b.col2 and a.datecol = b.datecol and a.col1 <> b.col1
where a.col1 = '112672803 and c.datecol = '2012-01-01'

As for efficiency, you probably need indexes on col1, col2, and datecol.
0

LVL 32

Expert Comment

ID: 39602842
Sorry for the typo, c.datecol should have been a.datecol.
0

LVL 16

Assisted Solution

DcpKing earned 152 total points
ID: 39602940
Assuming that the query, as-is, does work (albeit slowly) , despite the various opinions expressed, then the fastest way to speed things up would probably be to eliminate as many rows as you can at the start:
``````Select Col1, Col2
into #abc
from ABC
where DATECOL = '2012-01-01'
and COL2 = '45673'
-- -----------index the temp table here on Col1 and Col2
select a.COL1, a.COL2, b.COL1, b.COL2
from #abc a
JOIN #abc b
ON a.COL2 = b.COL2
WHERE a.COL1 =   '112672803'
AND     b.COL1 !=  '112672803'
``````

hth

Mike
0

LVL 5

Author Comment

ID: 39602946
Paul,
Col1,Col2,DateCol will be distinct in ABC
Col1->Col2 is many->1 (One Col2 can belong to many Col1 for the same date)..

we are trying to match where Col2 belongs to Col1=112672803 and also some other Col1's..

In case of Col2=6, it only belongs to 11, so it is disqualified.. we could have missed it for 3 also.. but by the nature of query, it was disqualified in a different manner.. but if we fix it for 6, it should fix it for 3 also through the code itself.

i should have noted that the self join maybe was needed only because of these special circumstance of having to eliminate cases like '3' and '6' (Col2).

Thanks.
0

LVL 5

Author Comment

ID: 39602975
awking00, i see you bought most of the conditions on ON instead of WHERE.. does that make a big difference?
0

LVL 32

Expert Comment

ID: 39602978
I'm not sure and, unfortunately, don't have a system available to obtain explain plans on the methods, but you might try and do that to see if and how much difference it makes.
0

LVL 5

Author Comment

ID: 39603002
DcpKing, your idea is very good & would have been ideal if COL2 also gets passed.. .. i need to clarify something here.. i mentioned
"and COL2 = '45673'" in the query to simplify the examples.. but the only 2 parameters that will be sent in are:
'112672803'
and
DateValue (example: '2012-01-01')

the results will be expanded like in PortletPaul's output..
0

LVL 16

Expert Comment

ID: 39603029
25112;
Then cut line 5 from my suggestion. If you can replace it with any other "hint" from the outside world then do that too. The idea really is to start by eliminating as much work as you can ("if you don't have to do it then don't do it"!) from the query.

After doing that, add indexes to make the query faster.

hth

Mike
0

LVL 5

Author Comment

ID: 39603074
Mike,

the problem then would become that
-------------
Select Col1, Col2
into #abc
from ABC
where DATECOL = '2012-01-01'
-------------

alone could bring 2 million records. and DATECOL is not a good (best) candidate for index. But in the big picture, yes, that would cut/filter to get started. but you don't see any way to avoid the self join, do you?
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39603317
>> Col1,Col2,DateCol will be distinct in ABC
Q1: Is the clustered index of ABC based on those 3 fields?
Q2: What are the indexes on ABC? (and perhas you might just use the real table name etc.)

>> Col1->Col2 is many->1 (One Col2 can belong to many Col1 for the same date)..
so the following (ignoring the constant datecol for now) lists the col1/col2 values we are interested in:
/*
select distinct col1, col2 from abc where col1 = '112672803'

112672803      2
112672803      3
112672803      4
112672803      45673
112672803      5
*/
>> we are trying to match where Col2 belongs to Col1=112672803 and also some other Col1's..
Q3: how many other col1's are you interested in? this may influence the query too

>>In case of Col2=6, it only belongs to 11, so it is disqualified..
>>we could have missed it for 3 also..
I assume you mean "should have" :)

>>but by the nature of query, it was disqualified in a different manner..
>>but if we fix it for 6, it should fix it for 3 also through the code itself.
>>i should have noted that ... having to eliminate cases like '3' and '6' (Col2).
It's my view that when queries get simplified for us that it takes longer to answer the question as often the full detail get revealed progressively.

Q4: Is this the expected result? (from the sample data)
``````|      WANT | COL1 |  COL2 |
|-----------|------|-------|
| 112672803 |   11 |     2 |
| 112672803 |   12 | 45673 |
| 112672803 |   13 | 45673 |
| 112672803 |   14 | 45673 |
| 112672803 |   14 |     5 |
| 112672803 |   17 |     2 |
| 112672803 |   18 |     2 |
| 112672803 |   19 |     2 |
| 112672803 |   19 |     5 |
| 112672803 |   20 |     2 |
| 112672803 |   21 |     2 |
| 112672803 |   22 |     4 |
| 112672803 |   24 |     5 |
| 112672803 |   25 |     5 |
``````
This was produced by the following:
``````DECLARE @want AS varchar(20)
SET @want = '112672803'

SELECT
@want want
, COL1
, COL2
FROM ABC
WHERE COL1 <> @want
AND DATECOL = '2012-01-01'
AND EXISTS (
SELECT 1
FROM abc AS x
WHERE col1= @want
AND x.col2=abc.col2
)
order by
COL1
, COL2
;

CREATE TABLE abc
([COL2] varchar(20), [COL1] varchar(20), [DATECOL] datetime)
;

INSERT INTO abc
([COL2], [COL1], [DATECOL])
VALUES
('45673', '112672803','2012-01-01 00:00:00'),
('45673', '12', '2012-01-01 00:00:00'),
('45673', '13', '2012-01-01 00:00:00'),
('45673', '14', '2012-01-01 00:00:00'),
('2', '11', '2012-01-01 00:00:00'),
('2', '112672803', '2012-01-01 00:00:00'),
('2', '17', '2012-01-01 00:00:00'),
('2', '18', '2012-01-01 00:00:00'),
('2', '19', '2012-01-01 00:00:00'),
('2', '20', '2012-01-01 00:00:00'),
('2', '21', '2012-01-01 00:00:00'),
('3', '112672803', '2012-01-01 00:00:00'),
('4', '22', '2012-01-01 00:00:00'),
('4', '112672803', '2012-01-01 00:00:00'),
('5', '14', '2012-01-01 00:00:00'),
('5', '19', '2012-01-01 00:00:00'),
('5', '112672803', '2012-01-01 00:00:00'),
('5', '24', '2012-01-01 00:00:00'),
('5', '25', '2012-01-01 00:00:00'),
('6', '11', '2012-01-01 00:00:00')

[1]: http://sqlfiddle.com/#!3/19d01/33
``````
0

LVL 5

Author Comment

ID: 39604221
Hi Paul,

>>It's my view that when queries get simplified for us that it takes longer to answer the question as often the full detail get revealed progressively.

ouch.. sorry.. i am learning.. thanks for your patience... basically, the idea to report the duplicate entries of an Employee in multiple offices.. one office will run the report and see which other office are sharing the employee. two different format of reporting is needed for this logic.. one is already figured out.. ( http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_28274064.html ) the one we are working has a different report layout needed.. and hence we are attempting.
The first formatting of results had to give the duplicate results all in one row...
In this, we are displaying each duplicate in each row... that being the difference between the 2 formats.

------------------------------

Q1: Is the clustered index of ABC based on those 3 fields?
No
Should it be on the 2 keys or include the date field also?
-----------------
existing: Clustered Key:
EMPHIST_KEY (artificial key)
--------------------
NonClustered Key:
OFFICE_KEY, LOCATION_KEY, EMPHIST_DATE
OFFICE_KEY, EMPHIST_DATE
EMPID
EMP_YEAR, OFFICE_KEY, EMPHIST_DATE
EMP_KEY, EMP_YEAR, DOT_KEY, EMPHIST_DATE, OFFICE_KEY
EMP_KEY, DOT_KEY
--------------------
(Latest) Query with real life column names:
http://sqlfiddle.com/#!3/f46c3/6
--------------------
so the following (ignoring the constant datecol for now) lists the col1/col2 values we are interested in:
Yes...
--------------------
Q3: how many other col1's are you interested in?
All of them. (the result from the last query you sent does this...
--------------------
I assume you mean "should have" :)
right on... :)
--------------------

The EXISTS clause - is that a join or subquery? The query is still running after 2 hours..  maybe because of the 30.5M records it has to go through? maybe it will cache it after the first time of complete run?maybe the clustered index you mentioned is needed.. but there is already a clustered index.. would a nonclustered index suffice on the 3 columns?
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39604528
>>The EXISTS clause - is that a join or subquery? The query is still running after 2 hours
my turn for "ouch"
there is something missing from that query inside the EXISTS, which is:
AND DATECOL = '2012-01-01'
It worked without it in the limited sqlfiddle because everything is of the same date and I overlooked this - sorry.
EXISTS is also known as a "left semi-join" (it is a join type, but it will not multiply the number of rows returned in the resultset).
{+ an edit, sorry}
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39604559
Here are 3 variants. The first is just that date correction. The second and third introduce use of a CTE (as this seems to have worked in that other query you mentioned). The third variant also suggests a method for filtering by multiple office_keys (but I'd suggest a testing this of a broader sample). Anyway, here are 3 variants all producing the same result
``````    CREATE TABLE EMPHIST
([EmpID] varchar(20), [OFFICE_KEY] varchar(20), [EMPHIST_DATE] datetime)
;

INSERT INTO EMPHIST
([EmpID], [OFFICE_KEY], [EMPHIST_DATE])
VALUES
('45673', '112672803','2012-01-01 00:00:00'),
('45673', '12', '2012-01-01 00:00:00'),
('45673', '13', '2012-01-01 00:00:00'),
('45673', '14', '2012-01-01 00:00:00'),
('2', '11', '2012-01-01 00:00:00'),
('2', '112672803', '2012-01-01 00:00:00'),
('2', '17', '2012-01-01 00:00:00'),
('2', '18', '2012-01-01 00:00:00'),
('2', '19', '2012-01-01 00:00:00'),
('2', '20', '2012-01-01 00:00:00'),
('2', '21', '2012-01-01 00:00:00'),
('3', '112672803', '2012-01-01 00:00:00'),
('4', '22', '2012-01-01 00:00:00'),
('4', '112672803', '2012-01-01 00:00:00'),
('5', '14', '2012-01-01 00:00:00'),
('5', '19', '2012-01-01 00:00:00'),
('5', '112672803', '2012-01-01 00:00:00'),
('5', '24', '2012-01-01 00:00:00'),
('5', '25', '2012-01-01 00:00:00'),
('6', '11', '2012-01-01 00:00:00')

**Query 1**:

DECLARE @want AS varchar(20)
SET @want = '112672803'

SELECT
EmpID
, OFFICE_KEY
, @want want
FROM EMPHIST
WHERE OFFICE_KEY <> @want
AND EMPHIST_DATE = '2012-01-01'
AND EXISTS (
SELECT 1
FROM EMPHIST AS x
WHERE x.OFFICE_KEY = @want
AND x.EmpID = EMPHIST.EmpID
AND x.EMPHIST_DATE = '2012-01-01'
)
order by
EmpID DESC
, OFFICE_KEY

**[Results][2]**:

| EMPID | OFFICE_KEY |      WANT |
|-------|------------|-----------|
|     5 |         14 | 112672803 |
|     5 |         19 | 112672803 |
|     5 |         24 | 112672803 |
|     5 |         25 | 112672803 |
| 45673 |         12 | 112672803 |
| 45673 |         13 | 112672803 |
| 45673 |         14 | 112672803 |
|     4 |         22 | 112672803 |
|     2 |         11 | 112672803 |
|     2 |         17 | 112672803 |
|     2 |         18 | 112672803 |
|     2 |         19 | 112672803 |
|     2 |         20 | 112672803 |
|     2 |         21 | 112672803 |

**Query 2**:

/* single office CTE method */

DECLARE @OFFICE_KEY AS varchar(20)
SET @OFFICE_KEY = '112672803'

;WITH
EMPHIST_CTE AS (
SELECT * --<< please actually list just the fields needed
FROM EMPHIST
WHERE EMPHIST_DATE = convert(datetime,'2012-01-01',121)
/* this should EXACTLY match data type of
EMPHIST_DATE */
)
SELECT
* --<< please actually list just the fields needed
FROM EMPHIST_CTE
WHERE OFFICE_KEY <> @OFFICE_KEY
AND EXISTS (
SELECT 1
FROM EMPHIST_CTE AS x
WHERE x.OFFICE_KEY = @OFFICE_KEY
AND x.EmpID = EMPHIST_CTE.EmpID
)
/* the order by is optional */
ORDER BY
EMPID DESC
, OFFICE_KEY

**[Results][3]**:

| EMPID | OFFICE_KEY |                   EMPHIST_DATE |
|-------|------------|--------------------------------|
|     5 |         14 | January, 01 2012 00:00:00+0000 |
|     5 |         19 | January, 01 2012 00:00:00+0000 |
|     5 |         24 | January, 01 2012 00:00:00+0000 |
|     5 |         25 | January, 01 2012 00:00:00+0000 |
| 45673 |         12 | January, 01 2012 00:00:00+0000 |
| 45673 |         13 | January, 01 2012 00:00:00+0000 |
| 45673 |         14 | January, 01 2012 00:00:00+0000 |
|     4 |         22 | January, 01 2012 00:00:00+0000 |
|     2 |         11 | January, 01 2012 00:00:00+0000 |
|     2 |         17 | January, 01 2012 00:00:00+0000 |
|     2 |         18 | January, 01 2012 00:00:00+0000 |
|     2 |         19 | January, 01 2012 00:00:00+0000 |
|     2 |         20 | January, 01 2012 00:00:00+0000 |
|     2 |         21 | January, 01 2012 00:00:00+0000 |

**Query 3**:

/* multiple office CTE method */

;WITH
EMPHIST_CTE AS (
SELECT * --<< please actually list just the fields needed
FROM EMPHIST
WHERE EMPHIST_DATE = convert(datetime,'2012-01-01',121)
/* this should EXACTLY match data type of
EMPHIST_DATE */
)
SELECT
* --<< please actually list just the fields needed
FROM EMPHIST_CTE
WHERE OFFICE_KEY NOT IN ('112672803','xxxxxxxxx')
AND EXISTS (
SELECT 1
FROM EMPHIST_CTE AS x
WHERE x.OFFICE_KEY IN ('112672803','xxxxxxxxx')
AND x.EmpID = EMPHIST_CTE.EmpID
)
/* the order by is optional */
ORDER BY
EMPID DESC
, OFFICE_KEY

**[Results][4]**:

| EMPID | OFFICE_KEY |                   EMPHIST_DATE |
|-------|------------|--------------------------------|
|     5 |         14 | January, 01 2012 00:00:00+0000 |
|     5 |         19 | January, 01 2012 00:00:00+0000 |
|     5 |         24 | January, 01 2012 00:00:00+0000 |
|     5 |         25 | January, 01 2012 00:00:00+0000 |
| 45673 |         12 | January, 01 2012 00:00:00+0000 |
| 45673 |         13 | January, 01 2012 00:00:00+0000 |
| 45673 |         14 | January, 01 2012 00:00:00+0000 |
|     4 |         22 | January, 01 2012 00:00:00+0000 |
|     2 |         11 | January, 01 2012 00:00:00+0000 |
|     2 |         17 | January, 01 2012 00:00:00+0000 |
|     2 |         18 | January, 01 2012 00:00:00+0000 |
|     2 |         19 | January, 01 2012 00:00:00+0000 |
|     2 |         20 | January, 01 2012 00:00:00+0000 |
|     2 |         21 | January, 01 2012 00:00:00+0000 |

[1]: http://sqlfiddle.com/#!3/aae0a2/1
``````
0

LVL 5

Author Comment

ID: 39605831
thanks Paul- for nicely putting those 3 variants.. for some reason, the first one is not running fast.. not predictable.. sometimes takes 4 minutes, sometimes 10.. sometimes more.. and i have to cancel it.. but it has the same execution plan as the second one.. so not sure what could be going on.. but second is consistenly executing within 2 minutes.. and also after it is run once when you run the next time with the same parameter OFFICE_KEY = '112672803', it run in ms (less than one second).. but when i change parameter, let's say OFFICE_KEY = '212672603' then it again takes couple of minutes and on next execution it is fast.

for now, 3rd variant is not needed right, but i can dig it out later when needed..

now i saw you are an BETWEEN expert.. http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/A_11210-Beware-of-Between.html
so i need your input on this..

in the above example, '2012-01-01' is a datetime, even though only date is displayed (not efficient, i know.. but that is another fire/issue for another day). but in actuality, what it is, a date will be given, and the previous years Jan1st through 3rd has to be passed in the query.. for easy example, i just mentioned '2012-01-01'.. but in reality the parameter can be any date in 2013, and then automatically, the logic should be
*go back one year (in this case 2012) and then automatically choose a range of Jan1st-3rd.
2 more examples:
2013-04-04 will lead to 2012-01-01 thru 2012-01-03
2011-11-24 will lead to 2010-01-01 thru 2010-01-03

so i put it in variables and passing in query.. but immediately execution plan suggests many indexes.. i have attached the 2 cte execution plans. can you suggest passing 2 datetime variable can alter the plan so much?

thanks-
cte-plan.sqlplan
CTE-plan-with-Between.sqlplan
0

LVL 5

Author Comment

ID: 39605905
Paul, just one more step- maybe i jumped quickly with the prev post.. maybe we should address the sqlplan (BETWEEN condition) after we look at this below....

Just to avoid another 'ouch' on my part... here is hopefully the last bit of clarification to that will bring the whole prospect of this endevour in good light, and also good appreciation of your assistance & time..

below is the existing code that we are trying to rewrite.. there are couple of other simple logic (conditions) involved; let's incorporate them also, before we look at the indexes the exec plan recommends.. because each condition may vary the final index requirement, it would seem...

basically you will see 3 additional conditions, but in the SELF join, there will be 2 variations of OFFICE table (OFFICE & REP_OFFICE), instead of just one used twice. hope it make sense.
~~~~~~~~~~~~~~
``````select
*
from
OFFICE EMP00,
LOCATION EMP01,
EMPHIST EMP02,  /* SECOND EMPHIST FOR SELF JOIN */
(
select
*
from
(
SELECT *
from
REP_OFFICE  /* REP_OFFICE IS VERY SIMILAR TO OFFICE TABLE, but DIFFERENT */
where REP_OFFICE.REP_OFFICE_KEY <> '11267280' and
REP_OFFICE.REP_OFFICE_TYPE in ('SDD','PTC') /*New Condition not covered so far */
) Reporting_Office,

LOCATION EMP10,
EMPHIST EMP11  /* SECOND EMPHIST FOR SELF JOIN*/
where
(Reporting_Office.REPORTING_OFFICE_TYPE in ('SDD') or EMP10.OFF_TYPE_CODE = 'TOMKET') and   /*New Condition not covered so far */
EMP11.OFFICE_KEY = Reporting_Office.OFFICE_KEY and
EMP11.EMPHIST_DATE BETWEEN '2012-01-01' and '2012-01-03'
EMP10.LOCATION_KEY = EMP11.LOCATION_KEY
) EmpInOtherOffice

where
EMP00.OFFICE_KEY = '11267280' and /* 11267280 is reporting office in this example */
EMP02.EMPHIST_DATE BETWEEN '2012-01-01' and '2012-01-03' and /* What actually will be passed is a date in 2013 which will translate to Jan 1-3 2012 range */
(EMP00.OFF_LEVEL in ('SDD') or EMP01.OFF_TYPE_CODE = 'TOMKET') and   /*New Condition not covered so far */
EMP00.OFFICE_KEY = EMP02.OFFICE_KEY and
EMP01.LOCATION_KEY = EMP02.LOCATION_KEY and
EMP02.EMPID = EmpInOtherOffice.EMPID  /*Essentially, the self join on EMPID */
``````
~~~~~~~~~~~~~~~

OFFICE joins with EMPHIST on OFFICE_KEY (PK& FK)
LOCATION joins with EMPHIST on LOCATION_KEY (PK& FK)

kindly suggest an efficient join of these conditions on the CTE we have developed so far that is working good.  thanks paul
0

LVL 5

Author Comment

ID: 39605916
Regarding the BETWEEN syntax, what I was thinking is to introduce a variable after @OFFICE_KEY and then calculating the start and end date and then using it in the query as below:
``````   SET @OFFICE_KEY = '...'
Declare @dt date = 'somedate'
Declare @start_date datetime
Declare @end_date datetime

SELECT @start_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-01' as date)  /* go to prev year January 1st */
SELECT @end_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-03' as date)    /* go to prev year January 3rd */

;WITH
EMPHIST_CTE AS (
SELECT                 EMPID
, OFFICE_KEY
, @OFFICE_KEY want
FROM EMPHIST
WHERE EMPHIST_DATE BETWEEN @start_date and @end_date /*Implementation of the BETWEEN Clause */
``````
0

LVL 5

Author Comment

ID: 39606508
paul, here is the latest code including all the logic required.. pl comment on efficiency of overall code & index/
``````    DECLARE @OFFICE_KEY AS varchar(20)
SET @OFFICE_KEY = '112672803'
Declare @dt date = '2013-03-31'
Declare @start_date datetime
Declare @end_date datetime

SELECT @start_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-01' as date)
SELECT @end_date= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-03' as date)

;WITH
EMPHIST_CTE AS (
SELECT
EMPID,
OFFICE_KEY,
LOCATION_KEY,
@OFFICE_KEY want
FROM EMPHIST
WHERE EMPHIST_DATE BETWEEN @start_date and @end_date
)

SELECT *
FROM EMPHIST_CTE CTE  JOIN REP_OFFICE RO ON CTE.@OFFICE_KEY = RO.@OFFICE_KEY AND CTE.@OFFICE_KEY <> @OFFICE_KEY
JOIN LOCATION LOC ON LOC.LOCATION_KEY = CTE.LOCATION_KEY
WHERE  (REP_OFFICE_TYPE IN ('SDD','PTC') or LOC.OFF_TYPE_CODE = 'TOMKET')
AND
EXISTS (
SELECT 1
FROM EMPHIST_CTE AS x JOIN OFFICE D ON x.@OFFICE_KEY = D.OFFICE_KEY
JOIN LOCATION L ON L.LOCATION_KEY = x.LOCATION_KEY
WHERE (D.LEVEL_AGGR IN ('SDD') or L.OFF_TYPE_CODE = 'TOMKET')) AND
x.@OFFICE_KEY = @OFFICE_KEY
AND x.EMPID = CTE.EMPID
)
order by 3,1
``````
question: you can't bring any columns from the query inside WHERE EXISTS to be displayed with the SELECT right after the CTE Definition, can we ?
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39607279
No "ouch" intended, but by now I hope you see that progressively revealing requirements actually makes the answering cycle longer. A tip for the future.

In the code block immediately above you have some syntax errors so I'm not fully confident that it works (eg. you can't join on x.@office_key). You have also thrown in a few joins I'm not familiar with - BUT; they don't look like they will multiply the records of EMPHIST. On this assumption what I would suggest is that you place as much of those as possible into the CTE.

Think of the CTE as "do it once, then re-use" - so anything that is repeated should go into the CTE - and you are repeating some joins.

Given that your datetime field only holds the date (i.e. time is always 00:00:00) then using between is OK. I'd prefer to see you using the better >= with < technique but it is optional as long as time of day isn't a factor.

Oh, and no you cannot access any fields of the EXISTS subquery, basically all this does is says "do I exist? yes. In fact several experts here will use this syntax:
where exists (select NULL from abc ...)
In other words the select list inside those parentheses does not have to return any data at all. So, definitely a no.

This is my suggestion based on your code immediately above (untested):
``````DECLARE @OFFICE_KEY AS varchar(20)
SET @OFFICE_KEY = '112672803'
DECLARE @dt date = '2013-03-31'
DECLARE @start_date datetime
DECLARE @end_date datetime

SELECT
@start_date = CAST(CAST(YEAR(@dt) - 1 AS char(4)) + '-01-01' AS datetime) --<<might as well specify datetime here
SELECT
@end_date = CAST(CAST(YEAR(@dt) - 1 AS char(4)) + '-01-03' AS datetime)

;
WITH
EMPHIST_CTE
AS (
SELECT
EMPHIST.EMPID
, EMPHIST.OFFICE_KEY
, EMPHIST.LOCATION_KEY
, @OFFICE_KEY want
, RO.REP_OFFICE_TYPE
, LOC.OFF_TYPE_CODE
, D.LEVEL_AGGR
FROM EMPHIST
JOIN REP_OFFICE RO
ON EMPHIST.OFFICE_KEY = RO.OFFICE_KEY
JOIN LOCATION LOC
ON LOC.LOCATION_KEY = CTE.LOCATION_KEY
JOIN OFFICE D
ON EMPHIST.OFFICE_KEY = D.OFFICE_KEY
WHERE EMPHIST_DATE BETWEEN @start_date AND @end_date
AND (REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOC.OFF_TYPE_CODE = 'TOMKET')
)

SELECT
FROM EMPHIST_CTE CTE
WHERE CTE.OFFICE_KEY <> @OFFICE_KEY
AND EXISTS (
SELECT 1
FROM EMPHIST_CTE AS x
WHERE (LEVEL_AGGR IN ('SDD') OR OFF_TYPE_CODE = 'TOMKET')
AND x.OFFICE_KEY = @OFFICE_KEY
AND x.EMPID = CTE.EMPID
)
ORDER BY
3, 1 --<< Grrrrrrr, what are columns 3 and 1? (I don't want to translate that) nor will the next person
``````
It may be necessary to "play with" where you place the joins i.e. experiment to see where the best location for each one is. but I'd start there and see if it still matches the expectations.

As I don't know what fields you actually want as output I can't make code additions for that - BUT please, please, please don't rely on "select *" - that is not good practice. (nb: we have to use it in our answers when we don't know the fields, or we use it because we are abbreviating our replies - our job is to answer not to fully develop code :)

I really don't like using column numbers in the order by.
0

LVL 5

Author Comment

ID: 39610712
Hi Paul, your points are well taken... thanks for your patient tip.. i will be careful with that..

yes, time of day does not matter.. so i left it as such..

yes, have and will avoid SELECT * and also waiting for final ORDER clause confirmation and then will hardcode those columns.. thanks for enforcing those good programming practises..

now with your review, i made small changes.. and could not avoid using CTE less than 3 times... please review if we have done our best, please: In the code, EMPHIST is joined with REPORT_OFFICE and LOCATION in the CTE. This will give details of the reported office.

The CTE is used 3 times in the code: details are...

First time it is used to get the reporting Office info (CTE2).
Second time it is used to get the reported office info.(CTE)
Third time, it is used in the EXISTS clause. (x)

in CTE2 the CTE results are joined with OFFICE and LOCATION to get the details needed to be displayed in the report.

CTE is joined with OFFICE to get certain details that only OFFICE has.. then it is joined (based on EMPID) with LOCATION to get certain other details.

Together, EMPHIST+OFFICE (or REPORT_OFFICE)+LOCATION combo has the results we need.

Can you kindly suggest if we have exhaused every measure to reduce the join to minimal need. Also index suggestions, please?

Thanks

code is below:
----------------
``````  DECLARE @OFFICE_KEY AS varchar(20)
SET @OFFICE_KEY = '112672803'
DECLARE @dt datetime
set @dt = '2013-01-21'

Declare @START_DATE datetime
Declare @END_DATE datetime

SELECT @START_DATE= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-01' as date)
SELECT @END_DATE= CAST(CAST(YEAR(@dt)-1 as char(4)) + '-01-03' as date)

;WITH
EMPHIST_CTE AS (
SELECT
@OFFICE_KEY REP_OFFICE,
EMPID EMPID,
EMP_LNAME Other_EMP_Last_Name,
EMP_FNAME Other_EMP_First_Name,
MainTable.OFFICE_KEY ,
RO.REP_OFFICE_NAME Other_OFFICE_Name,
RO.COUNTY Other_County_Name,
LOC.LOCATION_ID Other_Office_Number,
LOC.LOCATION_KEY,
LOC.LOCATION_NAME Other_Office_Name

FROM
EMPHIST MainTable
JOIN
REPORT_OFFICE RO
ON MainTable.OFFICE_KEY = RO.OFFICE_KEY
JOIN LOCATION LOC
ON MainTable.LOCATION_KEY = LOC.LOCATION_KEY AND RO.OFFICE_KEY = LOC.OFFICE_KEY
WHERE EMPHIST_DATE BETWEEN @START_DATE AND @END_DATE AND
(REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOC.OFF_TYPE_CODE = 'TOMKET')
)
SELECT
CTE.REP_OFFICE,
O.OFFICE_NAME OFFICE_Name,
O.COUNTY_CODE County_Code,
O.COUNTY County_Name,
L.LOCATION_ID Office_Number,
L.LOCATION_NAME Office_Name,
CTE2.EMPID,
CTE2.Other_EMP_Last_Name EMP_Last_Name,
CTE2.Other_EMP_First_Name EMP_First_Name,
CTE.OFFICE_KEY Other_REP_OFFICE,
CTE.Other_OFFICE_Name,
CTE.Other_County_Name,
CTE.Other_Office_Number,
CTE.Other_Office_Name,
CTE.Other_EMP_Last_Name,
CTE.Other_EMP_First_Name

FROM EMPHIST_CTE CTE
JOIN OFFICE O
ON   CTE.REP_OFFICE = O.OFFICE_KEY

JOIN EMPHIST_CTE CTE2
ON
CTE.EMPID = CTE2.EMPID AND
CTE.REP_OFFICE = CTE2.OFFICE_KEY

JOIN LOCATION L
ON L.OFFICE_KEY = CTE.REP_OFFICE AND CTE2.LOCATION_KEY = L.LOCATION_KEY AND
(O.LEVEL_AGGR IN ('SDD', 'PTC') OR L.OFF_TYPE_CODE = 'TOMKET')

WHERE CTE.OFFICE_KEY <> @OFFICE_KEY and
EXISTS (
SELECT 0
FROM EMPHIST_CTE AS x
WHERE
x.OFFICE_KEY = CTE.REP_OFFICE
AND x.EMPID = CTE.EMPID
)
``````
0

LVL 50

Assisted Solution

Paul earned 1384 total points
ID: 39610730
That looks fine to me, here the purpose of the CTE is to be re-used, so that is fine

:)
----------------------------------
except the use of between,
I do not regard that as best practice as you probably know already.even if this is true:
>>yes, time of day does not matter..

there is nothing gained (IMHO) by using between - but potentially much to lose

the alternative is quite simple, and the big advantage is
"it works 100% reliably in all scenarios" so one does not have to flip technique depending on the data precision
e.g.
[date_field] >= low_date and [date_field] < ( high_date + 1)

see: "Beware of Between"
0

LVL 5

Author Comment

ID: 39611383
that makes very good sense... thanks Paul.

1)
i changed
" WHERE EMPHIST_DATE BETWEEN @START_DATE AND @END_DATE AND  "
to
" WHERE EMPHIST_DATE >= @START_DATE AND EMPHIST_DATE < ( @END_DATE + 1) AND  "

performance wise, BETWEEN and the above approach is almost the same, right?

2)index wise, i should only focus on the query within the first main CTE, right? (the other CTEs just hold data, whereas the main cte has main real tables, right?) the below is the main join of the main CTE part:

---------------------------------
set  statistics io on
SELECT
COUNT(*)
FROM
EMPHIST MainTable
JOIN
REPORT_OFFICE RO
ON MainTable.OFFICE_KEY = RO.OFFICE_KEY
JOIN LOCATION LOC
ON MainTable.LOCATION_KEY = LOC.LOCATION_KEY AND RO.OFFICE_KEY = LOC.OFFICE_KEY
WHERE EMPHIST_DATE BETWEEN @START_DATE AND @END_DATE AND
(REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOC.OFF_TYPE_CODE = 'TOMKET')
set statistics io  off

(1 row(s) affected)

---------------------------------
that is: 5 conditions:

EMPHIST.OFFICE_KEY  = REPORT_OFFICE.OFFICE_KEY
REPORT_OFFICE.OFFICE_KEY = LOCATION.OFFICE_KEY
EMPHIST.LOCATION_KEY = LOCATION.LOCATION_KEY

EMPHIST.EMPHIST_DATE >= '2012-01-01' AND SNAPSHOT_DATE < '2012-01-03'

(REPORT_OFFICE.REP_OFFICE_TYPE IN ('SDD', 'PTC') OR LOCATION.OFF_TYPE_CODE = 'TOMKET')
0

LVL 50

Accepted Solution

Paul earned 1384 total points
ID: 39613029
performance wise, BETWEEN and the above approach is almost the same, right?
correct
in fact what happens to between is it is "translated" into the following by the optimizer

field >= value and field <= value

"between" is a script convenience only, technically it is executed by the basic operators >= and <=

so substituting >= and < into the script makes no noticeable difference

2)index wise, i should only focus on the query within the first main CTE, right? (the other CTEs just hold data, whereas the main cte has main real tables, right?)
yes & yes
0

LVL 5

Author Comment

ID: 39627670
thanks a lot- the solution has been tested for a few days and performing good. thanks again!
0

LVL 50

Expert Comment

ID: 39629060
excellent - glad to have been of assistance. cheers, Paul
0

## Featured Post

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.