Link to home
Start Free TrialLog in
Avatar of sqlagent007
sqlagent007Flag for United States of America

asked on

SQL 2008 query: Is a numeric column faster for TOP than VARCHAR?

I have a stored procedure that pushes data from one server to another. I have tried various approaches to get data from one place to another, but it seems just doing an insert into in 5000 row chunks is the best way to get data from siteA to siteB.
The primary table has about 20 columns, but for this post, only 3 are relevant. This table can have up to 1 billion rows depending on the month. This 2008 standard edition, so I don’t think partitioning is an option.
-      * PKID (Numeric (10,0) Primary Key) [Clustered index]
-      * TRANSACTION_DT (Datetime, Primary Key) [Clustered and non-clustered index]
-      TRANS_ID_NUM (Varchar(17) – this is created once the data is inserted, it is a concatenation of TRANSACTION_DT columns YYYYMM + PKID, since the PKID column get re-seeded every year, this number was created to give each transaction a unique number) [non-clustered index]
I recently had an issue where the proc stopped working. When I started troubleshooting the issue, I noticed that what I was doing was getting the MAX(TRANS_ID_NUM) from siteB.table, then getting the top 5000 TRANS_ID_NUM’s that were > siteB.table. MAX(TRANS_ID_NUM). I was taking these and creating a #table, then performing an INSERT INTO and doing an INNER JOIN on the #table.
So, to troubleshoot, I started just running a query to get the top 5000 rows where siteA.table > siteB.table. MAX(TRANS_ID_NUM). This was taking over 15 minutes to return 5000 TRANS_ID_NUM’s. I could not get the query to run any faster even when using the query below.

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’

I was finally able to get the query to return in seconds by using this:

SELECT TOP 5000 PKID
FROM sitaA.table
WHERE TRANSACTION_DT > ‘2010-09-01 00:00:000’

Why is it faster? Is it because PKID is numeric? Or is because both columns in the second query are part of the composite key? I am happy I got things working, however I would feel much more comfortable knowing why.

Avatar of cyberkiwi
cyberkiwi
Flag of New Zealand image

You are missing an order by.  Try this

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’
ORDER BY PKID

It should return instantly.  5000 should not be too long.
Explanation about indexes.  If you have an index on TRANSACTION_DT, each index node (internal storage) contains the index key (distinct TRANSACTION_DT) and within that node a list of RID (record id, clustering key i.e. the PKID) matching that index key.

So when it collects data from the index, it already has the PKID ready.
You can make TRANS_ID_NUM available by using

create index ix_fast on siteA.[table](TRANSACTION_DT, TRANS_ID_NUM)
or
create index ix_fast on siteA.[table](TRANSACTION_DT) include (TRANS_ID_NUM)

However, I just realised the order by PKID does not really help in the earlier comment.  Since you are traversing on TRANSACTION_DT (the where clause), it will have to look up all the index nodes that satisfy the condition, then retrieve the PKIDs, then sort them to do the order by, then using the PKID, look up the TRANS_ID_NUM.  It could be faster, but maybe not fast enough.

In which case, I suggest using this query:

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’
ORDER BY TRANS_ID_NUM

or

SELECT TOP 1 TRANS_ID_NUM
FROM siteA.table
WHERE  TRANSACTION_DT > ‘2010-09-01 00:00:000’
ORDER BY TRANSACTION_DT

along with this index

create index ix_fast on siteA.[table](TRANSACTION_DT, TRANS_ID_NUM)

So that it can collect the TRANS_ID_NUM in sequential order from the index keys alone (sorted) without going into the included data or record lookup.
Avatar of sqlagent007

ASKER

This query takes 0:00 seconds to run

SELECT TOP 5000
PKID
FROM siteA.table
WHERE TRANSACTION_DT > '2010-09-12 00:00:00:000'

This query takes 0:45 seconds to run

SELECT TOP 5000
TRANS_ID_NUM
FROM siteA.table
WHERE TRANSACTION_DT > '2010-09-12 00:00:00:000'

The only differences in the column PKID and TRANS_ID_NUM is that PKID is numeric, and PKID is part of the primary key, so it has a clustered index. TRANS_ID_NUM has a non clustered index on it, and it is VARCHAR.
Oh...I see. If I had
- TRANSACTION_DT
- PKID
- TRANS_ID_NUM
all in 1 index, then the data would be collected in the sequential order I need for my query....right?
ASKER CERTIFIED SOLUTION
Avatar of cyberkiwi
cyberkiwi
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I understand now, all the comments you provided were very helpful. Thank you.