Solved

how to force a HASH JOIN in SQL 2008

Posted on 2010-11-20
7
956 Views
Last Modified: 2012-05-10
--On Server 2008
SELECT ID
FROM Table1
 WHERE ID NOT IN (SELECT FNAME FROM [SERVER2005].[DataBase2].DBO.Table2)
---execution time  7min  
table1  ID is PK and table has 10177 records
 
--On Server 2005
SELECT ID
FROM  [SERVER2008].[DataBase1].DBO.Table1
WHERE ID NOT IN (SELECT FNAME FROM [SERVER2005].[DataBase2].DBO.Table2)
--less than 1second.
table2 no indexes and has 103180 records.

in execution plan SERVER2008 use NestedLoops cost 96%
In     ---  ||  ----      SERVER2005 use Hash Match  cost 10%

Question is :
How to force a hash join in this case.
Or why the optimizer is choosing NestedLoops.
0
Comment
Question by:PDF
7 Comments
 
LVL 7

Expert Comment

by:twol
ID: 34181833
here is an example of how to force a hash join:

USE AdventureWorks2008R2;
GO
SELECT p.Name, pr.ProductReviewID
FROM Production.Product p
LEFT OUTER HASH JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductID
ORDER BY ProductReviewID DESC;
0
 
LVL 16

Expert Comment

by:BurnieP
ID: 34183288
This should be a working solution :
SELECT ID 

  FROM Table1

  LEFT OUTER HASH JOIN [SERVER2005].[DataBase2].DBO.Table2 Table2 ON Table1.ID = Table2.FNAME

WHERE ID IS NULL

Open in new window

0
 
LVL 6

Expert Comment

by:subhashpunia
ID: 34187022
Hash join runs faster then Neste loop join in case no of record in table are more and columns do not have indexes. But HASH join uses the processing power a lot and you have to consider it before changing queries on production server.
0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 
LVL 2

Expert Comment

by:vasureddym
ID: 34187818
I'm actually thinking why does SQL Server 2008 (in your case) has chosen less efficient plan? Is this a hardware issue? subhashpunia has actually raised a good point.

can you please send (attach) the 2 execution plans (.sqlplan files)? and also the hardware details of the 2 servers.
0
 

Author Comment

by:PDF
ID: 34193319
@Twol
Thanks for example.I guess i was'not clear i need to now reason way is that hapening.

@ vasureddym
here are  execution plans  attached and Servers\Hardwares details .
Server 2005:
-------------      
Runing on:
M.Windows Server2003 Standard Edition SP1
Processor: Intel(R) Xeon(R) CPU 3.00GHz proc.2.99GHz
Mem. 4 GB
64-bit Operating System

@@VERSION:
Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86)  
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)


Server 2008:
-------------        
Runing on:
Windows Server@Standard SP2
Processor: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz 2.49GHz (2 processor)
Mem. 16 GB
64-bit Operating System

@@VERSION:
Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64)
Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)

Thanks a lot.
0
 
LVL 2

Accepted Solution

by:
vasureddym earned 250 total points
ID: 34195908
I'm successfully able to replicate the issue on my local box.

Environment:
Server 1: SQL server 2008 express instance
Server 2: SQL server 2008 enterprise instance (trail version)
These servers have linked servers setup between them

execute the below script on the server 1
--== server 1 script (DESKTOP68\SQLEXPRESS1)
create table dbo.TestMasterTable (id int identity Primary Key, nme varchar(100))
go
insert into TestMasterTable values('A')
insert into TestMasterTable values('B')
insert into TestMasterTable values('D')
go 3000

Open in new window


execute the below script on the server 2
--== server 2 script (DESKTOP68)
create table dbo.TestChildTableTemp (id int identity, nme varchar(100), id2 uniqueidentifier default(newid()))
go

insert into TestChildTableTemp(nme) values('A')
insert into TestChildTableTemp(nme) values('B')
insert into TestChildTableTemp(nme) values('C')
go 30000

-- loading from the TestChildTableTemp instead of directly loading, to generate random values in 'id' column
create table dbo.TestChildTable (id int, nme varchar(100))
insert into TestChildTable
select id, nme from TestChildTableTemp
order by id2
go

-- delete few ids so that the NOT IN query has some results
delete from TestChildTable where id between 100 and 200
go

Open in new window


Now execute the below query on each of the servers and check the execution time and execution plan. (Note: change the servers names to match your environment)
--== Query 1: using subquery
select id
from [DESKTOP68\SQLEXPRESS1].test.dbo.TestMasterTable t1
where id not in (select id from DESKTOP68.test.dbo.TestChildTable)

Open in new window


You will note that the query executed on server 2 uses Hash Match join and runs fast (just like in your case). And the query executed on server 1 uses the Nested Loop join and takes long time.

I have spent a lot of time and here is my conclusions.

1) The original culprit is not just the Nested Loop join, but is the Table Spool operator placed just before the join. The Actual Rows returned by the Table Spool operator are many hundred times the actual data in the table. I'm still not sure why that is so, but the amount of data for the Nested Loop Join is huge (can note that the CPU utilization amount in the execution plan) and is taking a lot of time to execute.

2) On the other hand, for the same query when executed on the server 1, Row Count Spool is used which has resulted in less number of records. When you carefully observe the query, it is slow when we the query in the NOT IN subquery is from different server. To confirm this, I have swapped the servers and recreated the scenario.
In other words, if the NOT IN subquery is a remote query, we have the trouble.

3) If you actually use the LEFT OUTER JOIN (even without using the HASH JOIN hint), the optimizer is able to make better decision, and is actually avoiding the SPOOL operation, and using the MERGE join (in my case). Results are also faster. code below
select t1.id
from [DESKTOP68\SQLEXPRESS1].test.dbo.TestMasterTable t1
left join DESKTOP68.test.dbo.TestChildTable t2 on t1.id = t2.id
where t2.id is null

Open in new window


Note: these are the observations that I made based on the scenario that I replicated on my local box. I tried to make it as close as that of yours based on the details you have provided. I could actually get the my execution plans same as that of the snapshots you sent.

Hope this explanation helps you understanding the issue. Please get back to me if you need any clarification, or if you have any further questions.
0
 

Author Closing Comment

by:PDF
ID: 34202262
Excellent!
Thank you very much.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
SQL 2012 Syntax Error 5 24
query returning everything 11 68
SQL Help joining two tables 7 33
SQL Maintenance Plan 3 16
CCModeler offers a way to enter basic information like entities, attributes and relationships and export them as yEd or erviz diagram. It also can import existing Access or SQL Server tables with relationships.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now