Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium


how to force a HASH JOIN in SQL 2008

Posted on 2010-11-20
Medium Priority
Last Modified: 2012-05-10
--On Server 2008
FROM Table1
---execution time  7min  
table1  ID is PK and table has 10177 records
--On Server 2005
FROM  [SERVER2008].[DataBase1].DBO.Table1
--less than 1second.
table2 no indexes and has 103180 records.

in execution plan SERVER2008 use NestedLoops cost 96%
In     ---  ||  ----      SERVER2005 use Hash Match  cost 10%

Question is :
How to force a hash join in this case.
Or why the optimizer is choosing NestedLoops.
Question by:PDF

Expert Comment

ID: 34181833
here is an example of how to force a hash join:

USE AdventureWorks2008R2;
SELECT p.Name, pr.ProductReviewID
FROM Production.Product p
LEFT OUTER HASH JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductID
ORDER BY ProductReviewID DESC;
LVL 16

Expert Comment

ID: 34183288
This should be a working solution :
  FROM Table1
  LEFT OUTER HASH JOIN [SERVER2005].[DataBase2].DBO.Table2 Table2 ON Table1.ID = Table2.FNAME

Open in new window


Expert Comment

ID: 34187022
Hash join runs faster then Neste loop join in case no of record in table are more and columns do not have indexes. But HASH join uses the processing power a lot and you have to consider it before changing queries on production server.
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!


Expert Comment

ID: 34187818
I'm actually thinking why does SQL Server 2008 (in your case) has chosen less efficient plan? Is this a hardware issue? subhashpunia has actually raised a good point.

can you please send (attach) the 2 execution plans (.sqlplan files)? and also the hardware details of the 2 servers.

Author Comment

ID: 34193319
Thanks for example.I guess i was'not clear i need to now reason way is that hapening.

@ vasureddym
here are  execution plans  attached and Servers\Hardwares details .
Server 2005:
Runing on:
M.Windows Server2003 Standard Edition SP1
Processor: Intel(R) Xeon(R) CPU 3.00GHz proc.2.99GHz
Mem. 4 GB
64-bit Operating System

Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86)  
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

Server 2008:
Runing on:
Windows Server@Standard SP2
Processor: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz 2.49GHz (2 processor)
Mem. 16 GB
64-bit Operating System

Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64)
Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)

Thanks a lot.

Accepted Solution

vasureddym earned 1000 total points
ID: 34195908
I'm successfully able to replicate the issue on my local box.

Server 1: SQL server 2008 express instance
Server 2: SQL server 2008 enterprise instance (trail version)
These servers have linked servers setup between them

execute the below script on the server 1
--== server 1 script (DESKTOP68\SQLEXPRESS1)
create table dbo.TestMasterTable (id int identity Primary Key, nme varchar(100))
insert into TestMasterTable values('A')
insert into TestMasterTable values('B')
insert into TestMasterTable values('D')
go 3000

Open in new window

execute the below script on the server 2
--== server 2 script (DESKTOP68)
create table dbo.TestChildTableTemp (id int identity, nme varchar(100), id2 uniqueidentifier default(newid()))

insert into TestChildTableTemp(nme) values('A')
insert into TestChildTableTemp(nme) values('B')
insert into TestChildTableTemp(nme) values('C')
go 30000

-- loading from the TestChildTableTemp instead of directly loading, to generate random values in 'id' column
create table dbo.TestChildTable (id int, nme varchar(100))
insert into TestChildTable
select id, nme from TestChildTableTemp
order by id2

-- delete few ids so that the NOT IN query has some results
delete from TestChildTable where id between 100 and 200

Open in new window

Now execute the below query on each of the servers and check the execution time and execution plan. (Note: change the servers names to match your environment)
--== Query 1: using subquery
select id
from [DESKTOP68\SQLEXPRESS1].test.dbo.TestMasterTable t1
where id not in (select id from DESKTOP68.test.dbo.TestChildTable)

Open in new window

You will note that the query executed on server 2 uses Hash Match join and runs fast (just like in your case). And the query executed on server 1 uses the Nested Loop join and takes long time.

I have spent a lot of time and here is my conclusions.

1) The original culprit is not just the Nested Loop join, but is the Table Spool operator placed just before the join. The Actual Rows returned by the Table Spool operator are many hundred times the actual data in the table. I'm still not sure why that is so, but the amount of data for the Nested Loop Join is huge (can note that the CPU utilization amount in the execution plan) and is taking a lot of time to execute.

2) On the other hand, for the same query when executed on the server 1, Row Count Spool is used which has resulted in less number of records. When you carefully observe the query, it is slow when we the query in the NOT IN subquery is from different server. To confirm this, I have swapped the servers and recreated the scenario.
In other words, if the NOT IN subquery is a remote query, we have the trouble.

3) If you actually use the LEFT OUTER JOIN (even without using the HASH JOIN hint), the optimizer is able to make better decision, and is actually avoiding the SPOOL operation, and using the MERGE join (in my case). Results are also faster. code below
select t1.id
from [DESKTOP68\SQLEXPRESS1].test.dbo.TestMasterTable t1
left join DESKTOP68.test.dbo.TestChildTable t2 on t1.id = t2.id
where t2.id is null

Open in new window

Note: these are the observations that I made based on the scenario that I replicated on my local box. I tried to make it as close as that of yours based on the details you have provided. I could actually get the my execution plans same as that of the snapshots you sent.

Hope this explanation helps you understanding the issue. Please get back to me if you need any clarification, or if you have any further questions.

Author Closing Comment

ID: 34202262
Thank you very much.

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Despite its rising prevalence in the business world, "the cloud" is still misunderstood. Some companies still believe common misconceptions about lack of security in cloud solutions and many misuses of cloud storage options still occur every day. …

580 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question