Solved

How to select a large number of records in SQL Server

Posted on 2009-02-20
62
Medium Priority
458 Views
Last Modified: 2012-05-06
Select records from a table with 1.2 million records. The returned records number about 1.1 million.
If using the normal "Select * from table1 where field1 = xxx", it takes about 9 seconds, which is too long for the application.
I have an index on "field1".
Is there any way to improve the performance? The ideal speed would be within 1 second.
0
Comment
Question by:treehouse2008
60 Comments
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698361
If you use Select [field1], [field2], [field3] instead of Select *, it will speed it up quite a bit.

Also, are there any binary or image columns in the table?  Are you connecting to a local server or a remote one?
0
 

Author Comment

by:treehouse2008
ID: 23698372
Actually I only need one field, that is: "select field1 from table1 where field1 = xxx".
This field is int.
The server and my machine are in the same LAN.
0
 

Author Comment

by:treehouse2008
ID: 23698381
There is a binary field in the table, but I do not need to select it. I only need one field, which is int and is the primary key.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698387
Ok, if you had select * before, then changing that to select field1 would speed it up drastically.

Also, what type of data field is the field you are querying?
0
 

Expert Comment

by:Prasenjit_Dutta
ID: 23698393
Can you please post your exact query ?

As a primary suggestion, the following query will help with speedy execution:
select field1, field2 from table1 with (nolock) where field1 = 'xxx'
0
 

Author Comment

by:treehouse2008
ID: 23698396
In this case, I always select only one field, which is of type "int" and is the primary key.
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698401
Is the column in the SELECT clause the same as the one in the WHERE clause? If yes, you will end up with xxx in the query result as many times as it is repeated in your table.
select field1 from table1 where field1 = xxx
0
 

Author Comment

by:treehouse2008
ID: 23698404
My exact query is simple:
Select field1 from table1 with (nolock) where field1 = 'xxx'
The table has about 1.2 million records.
The returned records are about 1.1 million.
 
0
 

Author Comment

by:treehouse2008
ID: 23698413
Sorry. The exact query should be:
Select field1 from table1 with (nolock) where field2 = 'xxx'
The SELECT column is not the same as the WHERE column.
 
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698416
Ok, what type of data field is field2?  Is it indexed?
0
 

Author Comment

by:treehouse2008
ID: 23698418
I did add an index on field2 (the WHERE clause column).
0
 

Author Comment

by:treehouse2008
ID: 23698419
field2 is also "int" type.
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698424
My point is that if you run that query, you will get the result below.
field1
-------
xxx
xxx
xxx
....
(1.1 million times)
Where are you using this value? You already know that the value is xxx; you only want to know how many times it is repeated in your table.
You can try it like this:
select count(field1) field1_count from table1 with (nolock) where field1 = xxx
I guess xxx is a numeric value, so I didn't enclose it in single quotes. The above query will give you the number of times the value xxx appears in field1 in your table. You can use that value further in your application.
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698431
Please ignore my previous post (I overlooked that the columns in the SELECT and WHERE clauses are the same).
0
 

Author Comment

by:treehouse2008
ID: 23698438
My query is:
select field1 from table1 with (nolock) where field2 = xxx
The "field1" in the SELECT clause is not the same as the "field2" in the WHERE clause.
I made a mistake when I first posted my query; sorry for that.
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698439

Select field1 from table1 with (nolock) where field2 = xxx
Add a non-clustered index on field2 and your query performance will change drastically.
0
 

Author Comment

by:treehouse2008
ID: 23698445
I did add a non-clustered index on field2.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698450
SQL Server provides a wonderful facility known as the Index Tuning Wizard which greatly enhances the index selection process. To use this tool, first use SQL Profiler to capture a trace of the activity for which you wish to optimize performance. You may wish to run the trace for an extended period of time to capture a wide range of activity. Then, using Enterprise Manager, start the Index Tuning Wizard and instruct it to recommend indexes based upon the captured trace. It will not only suggest appropriate columns for queries but also provide you with an estimate of the performance increase you'll experience after making those changes!
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698459
>> I did add non-clustered index on field2
what is the query run time now?
0
 

Author Comment

by:treehouse2008
ID: 23698460
Thank you for your advice. But in my case, only two fields are involved, and both of them have indexes.
0
 

Author Comment

by:treehouse2008
ID: 23698467
The 9 seconds in my original question is the query run time after I added the non-clustered index on field2.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698477
Does it take 9 seconds when running the query in SQL Profiler? Or are you pulling the data into some other software?
0
 

Author Comment

by:treehouse2008
ID: 23698487
I ran it in Management Studio. I used Profiler to trace it; Profiler also shows the same query run time.
0
 
LVL 41

Accepted Solution

by:
Sharath earned 500 total points
ID: 23698488
I guess that is the best time in which SQL Server can give you the result. As your table contains millions of records, you cannot get all the results in one second irrespective of how many indexes or other performance-related techniques you apply. That is the optimum time for your query.
0
 
LVL 15

Assisted Solution

by:aibusinesssolutions
aibusinesssolutions earned 500 total points
ID: 23698493
Yeah, I agree with Sharath; returning 1.1 million records is not a normal SQL query. What do you want to do with that many records? There may be a better way to do what you need.
0
 

Author Comment

by:treehouse2008
ID: 23698496
Can some kind of temp table help?
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698499
You would still need to select into the temp table; what are you wanting to do exactly?

I can't see needing to return 1.1 million records over and over.
0
 

Author Comment

by:treehouse2008
ID: 23698506
As to why I need to return 1.1 million records, it's kind of too complicated to explain in a few words here. It's related to the whole application.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698526
I suggest using some sort of pagination then, if you are going to display result sets in your application. Storing 1.1 million records' worth of data in a variable in an application wouldn't be good at all for memory usage.
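Just as an illustration, here is a minimal ROW_NUMBER-based paging sketch. It assumes the table1 / field1 / field2 names from this thread and a page size you would tune for your application; it is not necessarily faster overall, it just spreads the cost across round trips:

DECLARE @PageNumber int, @PageSize int
SET @PageNumber = 1        -- 1-based page index
SET @PageSize   = 10000    -- rows returned per round trip

SELECT field1
FROM (
    SELECT field1,
           ROW_NUMBER() OVER (ORDER BY field1) AS rn   -- number the matching rows
    FROM table1 WITH (NOLOCK)
    WHERE field2 = xxx
) AS numbered
WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1
             AND @PageNumber * @PageSize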
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698534
>> As to why I need to return 1.1 million records, it's kind of too complicated to explain in a few words here. It's related to the whole application.
That's up to you and your application. But I think that is the best time for your query. Creating a temp table won't solve your problem, as it may take extra seconds to execute.
0
 

Author Comment

by:treehouse2008
ID: 23698540
My task is only part of a whole project. My task is just to get the 1.1 million records with good performance. Other things are beyond my scope.
0
 

Author Comment

by:treehouse2008
ID: 23698546
Thank you both. Yeah, the best solution may be not to retrieve the 1.1 million records at all.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698548
If there are 1.2 million records, and your query "where field2 = xxx" returns 1.1 million, does that mean they all have the same value for field2 except for the other 0.1 million?

And yes, there is no other way to increase the speed besides increasing the processing power and the amount of RAM in your machine. 1.1 million records carries a lot of overhead.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698554
I do have a suggestion: if you just need the IDs for those 1.1 million records, you could write a stored procedure to put the IDs in a table as a single comma-delimited string. You could then query that single string as one variable and split the data into an array in your application.
0
 

Author Comment

by:treehouse2008
ID: 23698562
The problem is that before I "put the IDs in a table as a single comma delimited string",  I need to get the 1.1 million first.
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698599
Yes, I understood. Putting 1.1 million IDs in a comma-delimited string is not a good idea.
0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698605
>>The problem is that before I "put the IDs in a table as a single comma delimited string",  I need to get the 1.1 million first.

It may or may not run faster, I'm not sure; you can try it out and see how fast it is.
DECLARE @IdCSV VARCHAR(MAX)
-- Concatenate every matching ID into a single comma-delimited string
SELECT @IdCSV = COALESCE(@IdCSV + ',','') + CAST(field1 AS VARCHAR)
FROM table1
WHERE field2 = xxx

SELECT @IdCSV


0
 
LVL 15

Expert Comment

by:aibusinesssolutions
ID: 23698701
@Sharath:  Pulling 1 string instead of 1.1 million records through a SELECT command may be faster, it was just a suggestion.
0
 
LVL 42

Assisted Solution

by:dqmq
dqmq earned 500 total points
ID: 23698749
Here's my conjecture. You want to read 1.1 million rows. That's a lot of fudge to lick off the spoon, any way you stir it. Since that's over 90% of the rows, the optimizer probably disregards the indexes, on the assumption that a full table scan is more efficient than navigating down the index structure that high a percentage of the time.

I think your best bet is to reduce the number of pages needed to retrieve the single column you want. I suggest creating an indexed view of the two columns and then querying that instead of the table.
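A rough sketch of that indexed-view idea, assuming the table lives in the dbo schema and that field1 (the primary key) can carry the view's required unique clustered index - adjust the names to suit:

CREATE VIEW dbo.vw_table1_narrow
WITH SCHEMABINDING            -- required for an indexed view
AS
SELECT field1, field2
FROM dbo.table1;              -- two-part name is also required
GO

-- The view is only materialized once it has a unique clustered index
CREATE UNIQUE CLUSTERED INDEX ix_vw_table1_narrow
ON dbo.vw_table1_narrow (field1);
GO

-- Then index field2 so the WHERE clause can seek inside the narrow view
-- (field1 rides along as the clustering key)
CREATE NONCLUSTERED INDEX ix_vw_table1_narrow_field2
ON dbo.vw_table1_narrow (field2);
GO

-- On non-Enterprise editions the optimizer ignores view indexes unless
-- the view is referenced with NOEXPAND
SELECT field1
FROM dbo.vw_table1_narrow WITH (NOEXPAND)
WHERE field2 = xxx;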
0
 
LVL 41

Expert Comment

by:Sharath
ID: 23698919
aibusinesssolutions - I don't think so. The asker can run both queries and post the query times.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23700422
I would be inclined to create a covering index on field2 that includes field1:

create nonclustered index idx_mytable_field2 on mytable
(field2)
include
(field1);

Then "select field1 from mytable with (nolock) where field2 = 'xxx'" should be handled completely within the index, without needing to go to the underlying data.

It is going to be hard for any user to do anything meaningful with 1.1 million rows - are there any dupes? Maybe "select distinct field1 from mytable with (nolock) where field2 = 'xxx'" would work better in that case.

If there are 1.1 million rows for field2, then how big is the dataset / table, and is field2 reasonably unique? Is there anything else that might help, such as a period of time?

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23700437
Hang on, the table has 1.2 million rows and "where field2 = 'xxx' " returns 1.1 million ?  

Gosh batman, or Holy Auto Scan, field2 is not a good candidate for an index...

Can you show the structure of the table ?
0
 

Author Comment

by:treehouse2008
ID: 23700604
I ran
SELECT @IdCSV = COALESCE(@IdCSV + ',','') + CAST(field1 AS VARCHAR)
FROM table1 WHERE field2 = xxx
It took more than 1 minute (I canceled the query after 1 minute had passed).
Also, VARCHAR(MAX) can only handle at most 8000 bytes.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23700616
No, varchar(MAX) can handle 2 GB - you can only see 8000 characters (an SSMS / query view display limit, and some functions).

That is a fair bit different from: select field1 from mytable where field2 = xxx

The convert to varchar will slow it down, and the coalesce will slow it down even more.

0
 

Author Comment

by:treehouse2008
ID: 23700632
mark_wills:
You are right, field2 is not a good candidate for an index. And I did create some indexes which combined field2 with other fields. It helped a little. The 9 seconds is the time after I added those combined indexes.
Field1 has no dupes; it's the primary key.
The 1.1 million results are just going to be put into a table variable, which will be used by other modules.
Field2 is not unique. Among the 1.2 million records, 1.1 million of them have the same field2. That is what I want to get.
There are many fields in the table, but all I care about are field1 and field2.
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 500 total points
ID: 23700644
see how long this takes - just curious :

select convert(varchar,field1) from mytable where field2 = xxx for xml path('')
0
 

Author Comment

by:treehouse2008
ID: 23700673
Wow, it's amazing. It only took about 1.5 seconds. Can't believe it.
0
 

Author Comment

by:treehouse2008
ID: 23700693
mark_wills:   Thank you.
Could you please  give me a script on how to put the result XML into a table variable? Thanks.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23700773
OK, so to get something a bit more like a CSV...

select (select ','+convert(varchar,[task]) from tablea for xml path('')) as idcsv

You will need to tidy up either the leading (or, if you prefer, the trailing) comma. Sometimes you get the top 1 and put that in front and then the rest as the concatenated string, or leave the comma at the end and let it be NULL, or substring it, etc...
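For example, STUFF can strip the leading comma in the same statement - just a sketch, swapping in the field1 / field2 / table1 names from this question rather than my test table:

SELECT STUFF(
    (SELECT ',' + CONVERT(varchar(12), field1)
     FROM table1 WITH (NOLOCK)
     WHERE field2 = xxx
     FOR XML PATH('')),
    1, 1, '') AS idcsv   -- STUFF replaces the first character (the leading comma)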

0
 

Author Comment

by:treehouse2008
ID: 23700863
mark_wills:
I ran "select (select ','+convert(varchar,[task]) from tablea for xml path('')) as idcsv"; this time it took more than 3 seconds.
Besides, I want to put the 1.1 million results in a table variable, like:
Declare @temp Table (field1 Int Primary Key)
That is, @temp will have 1.1 million records.
 
Thanks.
 
 
 
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23700944
OK, then there is no need to concatenate, and it is no longer a simple query; we now need to insert... And that will take a lot longer than 1.5 seconds.


declare @tbl table (field1 int)         -- it is a memory table - no need for a primary key?!
insert @tbl
select field1 from t_million_rows t where t.field2 = 2
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23700962
Bit puzzled...

first it was a select,
then a concatenation,
now it is an insert...

Kind of an interesting application you have going there...
0
 

Author Comment

by:treehouse2008
ID: 23700992
declare @tbl table (field1 int)
insert @tbl
select field1 from t_million_rows t where t.field2 = 2
This is what I originally did. It took about 9 seconds. The time is the same whether @tbl has a primary key definition or not.


0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23701085
I do not think it is going to happen any faster - you are doing row-wise inserts.

t_million_rows for me has two million rows and it takes around 17 seconds - on a laptop.

We had one question here on EE where they needed to load two files, one of 6 million rows and the other of 2 million, in a bulk insert - it all happened in less than 30 seconds - but it was a big machine.

So it kind of indicates that 9 seconds would seem to be reasonable.

Is there a more lateral solution? Like getting rid of the 100,000 rows - or rather, moving them somewhere else?
0
 
LVL 42

Expert Comment

by:dqmq
ID: 23701319
Have you tried the covering index or indexed view solution? Either of those creates a very narrow structure that your query can access in lieu of the larger table. Let's say rows in your table are 80 bytes; leaf rows in the index will be 8 bytes. That reduces I/O by a factor of 10!
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23701382
I think that covering index is still a good thing to try - so long as the query optimiser doesn't step in and decide a table scan is better. In 2008 at least we can tell it what index to use, but in 2005 we have to do the plan thingy - never had too much joy with that...
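If the optimiser does decide to scan, one thing worth trying (just a sketch, assuming the idx_mytable_field2 covering index from my earlier post actually exists) is to pin that index with a table hint - an explicit hint should only stay in place if the plan is reliably better:

SELECT field1
FROM mytable WITH (NOLOCK, INDEX(idx_mytable_field2))   -- force the covering index
WHERE field2 = xxx;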
0
 
LVL 42

Expert Comment

by:dqmq
ID: 23701504
>so long as the query optimiser doesn't step in and decide a table scan is better.

That's true and partly why I prefer to try an indexed view.  I doubt the optimizer would consider a table scan when all of the required columns are materialized in the view.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 23701524
good call - might be worthwhile...
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 23715549
Interesting, but I don't see how even a materialized view would help much in this situation.

I wouldn't think reading 1.1M rows is going to be that much faster than reading 1.2M rows by ANY method, since the difference in rows is less than 10%.
0
 
LVL 7

Expert Comment

by:aplusexpert
ID: 25568939
Hi,

First of all, scan your table for unused indexes and drop them.

As suggested above, use the field list in place of *.

Thanks
0
