how to force a HASH JOIN in SQL 2008

Posted on 2010-11-20
Last Modified: 2012-05-10
--On Server 2008
FROM Table1
---execution time  7min  
table1  ID is PK and table has 10177 records
--On Server 2005
FROM  [SERVER2008].[DataBase1].DBO.Table1
--less than 1second.
table2 no indexes and has 103180 records.

in execution plan SERVER2008 use NestedLoops cost 96%
In     ---  ||  ----      SERVER2005 use Hash Match  cost 10%

Question is :
How to force a hash join in this case.
Or why the optimizer is choosing NestedLoops.
Question by:PDF
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

Expert Comment

ID: 34181833
here is an example of how to force a hash join:

USE AdventureWorks2008R2;
SELECT p.Name, pr.ProductReviewID
FROM Production.Product p
LEFT OUTER HASH JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductID
ORDER BY ProductReviewID DESC;
LVL 16

Expert Comment

ID: 34183288
This should be a working solution :
  FROM Table1
  LEFT OUTER HASH JOIN [SERVER2005].[DataBase2].DBO.Table2 Table2 ON Table1.ID = Table2.FNAME

Open in new window


Expert Comment

ID: 34187022
Hash join runs faster then Neste loop join in case no of record in table are more and columns do not have indexes. But HASH join uses the processing power a lot and you have to consider it before changing queries on production server.
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!


Expert Comment

ID: 34187818
I'm actually thinking why does SQL Server 2008 (in your case) has chosen less efficient plan? Is this a hardware issue? subhashpunia has actually raised a good point.

can you please send (attach) the 2 execution plans (.sqlplan files)? and also the hardware details of the 2 servers.

Author Comment

ID: 34193319
Thanks for example.I guess i was'not clear i need to now reason way is that hapening.

@ vasureddym
here are  execution plans  attached and Servers\Hardwares details .
Server 2005:
Runing on:
M.Windows Server2003 Standard Edition SP1
Processor: Intel(R) Xeon(R) CPU 3.00GHz proc.2.99GHz
Mem. 4 GB
64-bit Operating System

Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86)  
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

Server 2008:
Runing on:
Windows Server@Standard SP2
Processor: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz 2.49GHz (2 processor)
Mem. 16 GB
64-bit Operating System

Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64)
Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)

Thanks a lot.

Accepted Solution

vasureddym earned 250 total points
ID: 34195908
I'm successfully able to replicate the issue on my local box.

Server 1: SQL server 2008 express instance
Server 2: SQL server 2008 enterprise instance (trail version)
These servers have linked servers setup between them

execute the below script on the server 1
--== server 1 script (DESKTOP68\SQLEXPRESS1)
create table dbo.TestMasterTable (id int identity Primary Key, nme varchar(100))
insert into TestMasterTable values('A')
insert into TestMasterTable values('B')
insert into TestMasterTable values('D')
go 3000

Open in new window

execute the below script on the server 2
--== server 2 script (DESKTOP68)
create table dbo.TestChildTableTemp (id int identity, nme varchar(100), id2 uniqueidentifier default(newid()))

insert into TestChildTableTemp(nme) values('A')
insert into TestChildTableTemp(nme) values('B')
insert into TestChildTableTemp(nme) values('C')
go 30000

-- loading from the TestChildTableTemp instead of directly loading, to generate random values in 'id' column
create table dbo.TestChildTable (id int, nme varchar(100))
insert into TestChildTable
select id, nme from TestChildTableTemp
order by id2

-- delete few ids so that the NOT IN query has some results
delete from TestChildTable where id between 100 and 200

Open in new window

Now execute the below query on each of the servers and check the execution time and execution plan. (Note: change the servers names to match your environment)
--== Query 1: using subquery
select id
from [DESKTOP68\SQLEXPRESS1].test.dbo.TestMasterTable t1
where id not in (select id from DESKTOP68.test.dbo.TestChildTable)

Open in new window

You will note that the query executed on server 2 uses Hash Match join and runs fast (just like in your case). And the query executed on server 1 uses the Nested Loop join and takes long time.

I have spent a lot of time and here is my conclusions.

1) The original culprit is not just the Nested Loop join, but is the Table Spool operator placed just before the join. The Actual Rows returned by the Table Spool operator are many hundred times the actual data in the table. I'm still not sure why that is so, but the amount of data for the Nested Loop Join is huge (can note that the CPU utilization amount in the execution plan) and is taking a lot of time to execute.

2) On the other hand, for the same query when executed on the server 1, Row Count Spool is used which has resulted in less number of records. When you carefully observe the query, it is slow when we the query in the NOT IN subquery is from different server. To confirm this, I have swapped the servers and recreated the scenario.
In other words, if the NOT IN subquery is a remote query, we have the trouble.

3) If you actually use the LEFT OUTER JOIN (even without using the HASH JOIN hint), the optimizer is able to make better decision, and is actually avoiding the SPOOL operation, and using the MERGE join (in my case). Results are also faster. code below
from [DESKTOP68\SQLEXPRESS1].test.dbo.TestMasterTable t1
left join DESKTOP68.test.dbo.TestChildTable t2 on =
where is null

Open in new window

Note: these are the observations that I made based on the scenario that I replicated on my local box. I tried to make it as close as that of yours based on the details you have provided. I could actually get the my execution plans same as that of the snapshots you sent.

Hope this explanation helps you understanding the issue. Please get back to me if you need any clarification, or if you have any further questions.

Author Closing Comment

ID: 34202262
Thank you very much.

Featured Post

Free eBook: Backup on AWS

Everything you need to know about backup and disaster recovery with AWS, for FREE!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question