Solved

SQL 2008 query: Is a numeric column faster for TOP than VARCHAR?

Posted on 2010-09-23
6
336 Views
Last Modified: 2012-05-10
I have a stored procedure that pushes data from one server to another. I have tried various approaches to get data from one place to another, but it seems just doing an insert into in 5000 row chunks is the best way to get data from siteA to siteB.
The primary table has about 20 columns, but for this post, only 3 are relevant. This table can have up to 1 billion rows depending on the month. This 2008 standard edition, so I don’t think partitioning is an option.
-      * PKID (Numeric (10,0) Primary Key) [Clustered index]
-      * TRANSACTION_DT (Datetime, Primary Key) [Clustered and non-clustered index]
-      TRANS_ID_NUM (Varchar(17) – this is created once the data is inserted, it is a concatenation of TRANSACTION_DT columns YYYYMM + PKID, since the PKID column get re-seeded every year, this number was created to give each transaction a unique number) [non-clustered index]
I recently had an issue where the proc stopped working. When I started troubleshooting the issue, I noticed that what I was doing was getting the MAX(TRANS_ID_NUM) from siteB.table, then getting the top 5000 TRANS_ID_NUM’s that were > siteB.table. MAX(TRANS_ID_NUM). I was taking these and creating a #table, then performing an INSERT INTO and doing an INNER JOIN on the #table.
So, to troubleshoot, I started just running a query to get the top 5000 rows where siteA.table > siteB.table. MAX(TRANS_ID_NUM). This was taking over 15 minutes to return 5000 TRANS_ID_NUM’s. I could not get the query to run any faster even when using the query below.

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’

I was finally able to get the query to return in seconds by using this:

SELECT TOP 5000 PKID
FROM sitaA.table
WHERE TRANSACTION_DT > ‘2010-09-01 00:00:000’

Why is it faster? Is it because PKID is numeric? Or is because both columns in the second query are part of the composite key? I am happy I got things working, however I would feel much more comfortable knowing why.

0
Comment
Question by:sqlagent007
  • 3
  • 3
6 Comments
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33749028
You are missing an order by.  Try this

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’
ORDER BY PKID

It should return instantly.  5000 should not be too long.
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33749087
Explanation about indexes.  If you have an index on TRANSACTION_DT, each index node (internal storage) contains the index key (distinct TRANSACTION_DT) and within that node a list of RID (record id, clustering key i.e. the PKID) matching that index key.

So when it collects data from the index, it already has the PKID ready.
You can make TRANS_ID_NUM available by using

create index ix_fast on siteA.[table](TRANSACTION_DT, TRANS_ID_NUM)
or
create index ix_fast on siteA.[table](TRANSACTION_DT) include (TRANS_ID_NUM)

However, I just realised the order by PKID does not really help in the earlier comment.  Since you are traversing on TRANSACTION_DT (the where clause), it will have to look up all the index nodes that satisfy the condition, then retrieve the PKIDs, then sort them to do the order by, then using the PKID, look up the TRANS_ID_NUM.  It could be faster, but maybe not fast enough.

In which case, I suggest using this query:

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’
ORDER BY TRANS_ID_NUM

or

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’
ORDER BY TRANSACTION_DT

along with this index

create index ix_fast on siteA.[table](TRANSACTION_DT, TRANS_ID_NUM)

So that it can collect the TRANS_ID_NUM in sequential order from the index keys alone (sorted) without going into the included data or record lookup.
0
 
LVL 1

Author Comment

by:sqlagent007
ID: 33749196
This query takes 0:00 seconds to run

SELECT TOP 5000
PKID
FROM siteA.table
WHERE TRANSACTION_DT > '2010-09-12 00:00:00:000'

This query takes 0:45 seconds to run

SELECT TOP 5000
TRANS_ID_NUM
FROM siteA.table
WHERE TRANSACTION_DT > '2010-09-12 00:00:00:000'

The only differences in the column PKID and TRANS_ID_NUM is that PKID is numeric, and PKID is part of the primary key, so it has a clustered index. TRANS_ID_NUM has a non clustered index on it, and it is VARCHAR.
0
Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

 
LVL 1

Author Comment

by:sqlagent007
ID: 33749231
Oh...I see. If I had
- TRANSACTION_DT
- PKID
- TRANS_ID_NUM
all in 1 index, then the data would be collected in the sequential order I need for my query....right?
0
 
LVL 58

Accepted Solution

by:
cyberkiwi earned 500 total points
ID: 33749424
Not the triplet, unless you really want the result ordered by PKID?

Index on the pair : (TRANSACTION_DT, TRANS_ID_NUM)
The first part is for the WHERE, the 2nd for the ORDER BY and TOP
TOP should always be paired with ORDER BY.

If you have TRANS_ID_NUM and TRANSACTION_DT in separate indexes, you can imagine it doesn't really help since it cannot really use the TRANS_ID_NUM index in tandem.
0
 
LVL 1

Author Closing Comment

by:sqlagent007
ID: 33749442
I understand now, all the comments you provided were very helpful. Thank you.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

SQL Server engine let you use a Windows account or a SQL Server account to connect to a SQL Server instance. This can be configured immediatly during the SQL Server installation or after in the Server Authentication section in the Server properties …
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now